Let's say I have this dataset:
data1 = sample(1:250, 250)
data2 = sample(1:250, 250)
data <- data.frame(data1,data2)
If I want to subset 'data' by 30 values in both 'data1' and 'data2', what would be the best way to do that? For example, from 'data' I want to select all rows where data1 = 4 or 12 or 13 or 24 and data2 = 4 or 12 or 13 or 24. I want rows where both conditions are true.
I wrote this out like:
subdata <- subset(data, data1 == 4 |data1 == 12 |data1 == 13 |data1 == 24 & data2 == 4 |data2 == 12 |data2 == 13 |data2 == 24)
But this doesn't seem to meet both conditions; rather, it's one or the other.
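Note that in your original subset, you didn't wrap your | tests for data1 and data2 in brackets. Because & takes precedence over | in R, only data1 == 24 & data2 == 4 is evaluated as a pair, and the result is roughly the wrong subset of "data1 = 4 or 12 or 13 or 24 OR data2 = 4 or 12 or 13 or 24". You actually want: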
subdata <- subset(data, (data1 == 4 |data1 == 12 |data1 == 13 |data1 == 24) & (data2 == 4 |data2 == 12 |data2 == 13 |data2 == 24))
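To see why the brackets matter, here is a minimal sketch of the precedence rule on plain logical values. The names x, y and z are placeholders chosen for illustration only.

```r
# Placeholder logical values, chosen only to illustrate grouping
x <- TRUE
y <- FALSE
z <- FALSE

# `&` binds more tightly than `|`, so this is read as x | (y & z)
unbracketed <- x | y & z

# Brackets force the `|` test to be evaluated first
bracketed <- (x | y) & z

unbracketed  # TRUE
bracketed    # FALSE
```

The same grouping happens inside subset(): without brackets, each & only joins the two comparisons directly next to it.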
Here is how you would modify your subset function with %in%:
subdata <- subset(data, (data1 %in% c(4, 12, 13, 24)) & (data2 %in% c(4, 12, 13, 24)))
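Since the question mentions 30 values, it may be easier to store them once in a vector and reuse it for both columns. The sketch below assumes a vector named keep (a name introduced here, not from the question) and uses only four values for brevity. It also shows an equivalent form with plain bracket indexing.

```r
set.seed(1)
data1 <- sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
data2 <- sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
data <- data.frame(data1, data2)

# Hypothetical lookup vector; extend it to all 30 values as needed
keep <- c(4, 12, 13, 24)

# Rows where both columns hold one of the values in `keep`
subdata <- subset(data, data1 %in% keep & data2 %in% keep)

# Same filter using logical indexing instead of subset()
subdata_idx <- data[data$data1 %in% keep & data$data2 %in% keep, ]
```

With the values in one place, changing the list only requires editing keep.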
Alternatively, here is a dplyr approach with filter_all, which applies the same test to every column:
library(dplyr)
data %>%
  filter_all(all_vars(. %in% c(4, 12, 13, 24)))
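In more recent dplyr releases the scoped verbs such as filter_all are marked as superseded in favour of if_all() and if_any() inside filter(). The following is a sketch of that form, assuming dplyr 1.0.4 or later is installed. The keep vector is a name introduced here for illustration.

```r
library(dplyr)

set.seed(1)
data <- data.frame(
  data1 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE),
  data2 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
)

# Hypothetical lookup vector of the values to keep
keep <- c(4, 12, 13, 24)

# Keep rows where every column's value is found in `keep`
subdata <- data %>%
  filter(if_all(everything(), ~ .x %in% keep))
```

Replacing everything() with c(data1, data2) would restrict the test to those two columns if the data frame had others.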
Note:
Your sample calls rarely produce rows where both tests are true, so the solutions above would likely return zero rows on your data. I've therefore modified your sample dataset so that it contains matching rows you can subset.
Data:
set.seed(1)
data1 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
data2 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
data <- data.frame(data1,data2)
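As a rough check of the point about matches, the sketch below counts the rows that satisfy both tests under each setup. The names orig, mod and keep are introduced here for illustration, and the exact counts depend on the seed.

```r
keep <- c(4, 12, 13, 24)

set.seed(1)
# Original setup: each column is a shuffle of 1:250
orig <- data.frame(data1 = sample(1:250, 250), data2 = sample(1:250, 250))

# Modified setup: values drawn with replacement from a small pool
mod <- data.frame(
  data1 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE),
  data2 = sample(c(4, 12, 13, 24, 100, 123), 500, replace = TRUE)
)

# Summing a logical vector counts the rows where both tests are TRUE
n_orig <- sum(orig$data1 %in% keep & orig$data2 %in% keep)
n_mod  <- sum(mod$data1 %in% keep & mod$data2 %in% keep)

c(original = n_orig, modified = n_mod)
```

In the original setup each value appears exactly once per column, so at most four rows can ever match.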