I have a data frame `final` where each observation has an ID in a column called `final$workerId`. I want to remove the rows of this data frame whose ID appears in another vector called `omit`.
Here is what I've tried:

```r
final <- read.csv("the data.csv")
omit <- c("A3E9N7HDRLT8KV","A39HQTITNY9TVJ","A272A0JGRTBFCR","A1QPHQ1C27ZFI7")
final <- final[,-final$workerId %in% omit]
```
I know how I could do this with a `for` loop, but I am looking for a solution without loops if possible.
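`%in%` returns a logical vector. The opposite of a logical vector is obtained with `!`, not `-`, so `final[!final$workerId %in% omit, ]` is what you want. Two details matter here. First, `!` has lower precedence than `%in%`, so it negates the result of the match, whereas unary `-` binds more tightly and tries to negate `final$workerId` itself (an error for character IDs). Second, the row index goes before the comma; `final[, ...]` selects columns, not rows.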
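A small sketch on made-up data shows the difference; the data frame and IDs below are hypothetical stand-ins for `final` and `omit`. Even when the match is stored in a variable first, `-` coerces `TRUE`/`FALSE` to `-1`/`0`, and as a row index the zeros are ignored while `-1` drops the first row, regardless of which IDs matched.

```r
# Hypothetical toy data standing in for `final` and `omit`
final <- data.frame(
  workerId = c("A1", "B2", "C3", "D4"),
  score    = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
omit <- c("B2", "D4")

in_omit <- final$workerId %in% omit  # FALSE TRUE FALSE TRUE

# Unary minus coerces the logicals to integers: c(0, -1, 0, -1).
# As an index, 0 is ignored and -1 drops row 1, so the wrong row goes.
wrong <- final[-in_omit, ]

# `!` flips each element (TRUE FALSE TRUE FALSE), dropping B2 and D4.
right <- final[!in_omit, ]

wrong$workerId  # "B2" "C3" "D4"
right$workerId  # "A1" "C3"
```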
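You could also use `which()` to turn the logical vector into an integer index vector and then negate it with `-`, like this: `final[-which(final$workerId %in% omit), ]`. The first way is simpler, though, and safer: if no ID matches, `which()` returns `integer(0)`, and `final[-integer(0), ]` returns zero rows instead of the whole data frame.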
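Here is a quick check of the empty-match case on hypothetical toy data where nothing in `omit` occurs in `final`, plus a simple length guard in case you prefer the `which()` form. The IDs are made up for illustration.

```r
# Hypothetical toy data: none of the IDs in `omit` occur in `final`
final <- data.frame(
  workerId = c("A1", "B2", "C3"),
  score    = c(10, 20, 30),
  stringsAsFactors = FALSE
)
omit <- c("Z9")

idx <- which(final$workerId %in% omit)  # integer(0): nothing matched

via_which <- final[-idx, ]                       # -integer(0) is integer(0): no rows selected
via_not   <- final[!final$workerId %in% omit, ]  # all three rows kept

# Guarded version of the which() form
via_guard <- if (length(idx) > 0) final[-idx, ] else final

nrow(via_which)  # 0
nrow(via_not)    # 3
nrow(via_guard)  # 3
```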
Example, dropping the 4- and 6-cylinder cars from the built-in `mtcars` data set:
```r
mtcars[!mtcars$cyl %in% c(4, 6), ]
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
```
Here's a dplyr solution that may be of interest. The logic of the syntax is similar to the base R attempt in your question.

```r
library(dplyr)

omit <- c("A3E9N7HDRLT8KV","A39HQTITNY9TVJ","A272A0JGRTBFCR","A1QPHQ1C27ZFI7")
final <- filter(final, !(workerId %in% omit))
```
dplyr's `filter()` keeps the rows for which a condition is `TRUE`. The condition here is that `workerId` is not (`!`) in (`%in%`) the vector `omit`. Because `filter()` evaluates its conditions inside the data frame passed as its first argument, you can refer to the column as plain `workerId` rather than `final$workerId`.
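For comparison, base R's `subset()` offers a similar convenience without an extra package: its condition is evaluated inside the data frame, so `workerId` can be written bare. This is a swapped-in base R alternative rather than part of the dplyr answer, and the toy data below is hypothetical. Note that R's own help page describes `subset()` as intended for interactive use and recommends `[` for programming.

```r
# Hypothetical toy data standing in for `final` and `omit`
final <- data.frame(
  workerId = c("A1", "B2", "C3", "D4"),
  score    = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)
omit <- c("B2", "D4")

# Like dplyr::filter(), subset() evaluates the condition inside `final`,
# so the column can be referenced as plain `workerId`.
kept_subset <- subset(final, !(workerId %in% omit))

# The equivalent bracket form, for comparison
kept_bracket <- final[!final$workerId %in% omit, ]

kept_subset$workerId   # "A1" "C3"
kept_bracket$workerId  # "A1" "C3"
```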