Say I have this sort of dataframe:
day value group type id
1 1 0.1 A X 1
2 1 0.4 A Y 1
3 2 0.2 A X 3
4 2 0.5 A Y 3
5 3 0.3 A X 5
6 3 0.2 A Y 6
7 1 0.1 B X 3
8 1 0.3 B Y 3
9 2 0.1 B X 11
10 2 0.4 B Y 10
11 3 0.2 B X 12
12 3 0.3 B Y 12
13 1 0.1 C X 12
14 1 0.3 C Y 12
15 2 0.3 C X 5
16 2 0.2 C Y 5
17 3 0.2 C X 3
18 3 0.2 C Y 2
Data:
library(dplyr)
df1 <- data.frame(
day = rep(1:3,6),
value = c(0.1,0.2,0.3,0.4,0.5,0.2,0.1,0.1,0.2,0.3,0.4,0.3, 0.1,0.3,0.2,0.3,0.2,0.2),
group = rep(LETTERS[1:3], each=6)
) %>%
arrange(group,day) %>%
mutate(type=rep(LETTERS[24:25],9),
id = c(1,1,3,3,5,6,3,3,11,10,12,12,12,12,5,5,3,2))
df1
I want to filter this dataframe based upon a conditional filter. I want to group_by(day, group) and if all id within each grouping are all equal, I want to filter out all rows of type Y but keep rows of type X.
I can do this by running a loop or through several steps of dataframe subsetting, but I was wondering if there is a one/two liner in dplyr or data.table that I am somehow overlooking.
This would be the desired output:
day value group type id
1 1 0.1 A X 1
3 2 0.2 A X 3
5 3 0.3 A X 5
6 3 0.2 A Y 6
7 1 0.1 B X 3
9 2 0.1 B X 11
10 2 0.4 B Y 10
11 3 0.2 B X 12
13 1 0.1 C X 12
15 2 0.3 C X 5
17 3 0.2 C X 3
18 3 0.2 C Y 2
Here is an one-liner with data.table.
We convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'day', 'group', if the length of unique elements of 'id' is 1, we get the Subset of Data.table (.SD) rows where 'type' is 'X' or else get the .SD.
library(data.table)#v1.9.6+
setDT(df1)[, if(uniqueN(id)==1) .SD[type=='X'] else .SD, .(day, group)]
# day group value type id
# 1: 1 A 0.1 X 1
# 2: 2 A 0.2 X 3
# 3: 3 A 0.3 X 5
# 4: 3 A 0.2 Y 6
# 5: 1 B 0.1 X 3
# 6: 2 B 0.1 X 11
# 7: 2 B 0.4 Y 10
# 8: 3 B 0.2 X 12
# 9: 1 C 0.1 X 12
#10: 2 C 0.3 X 5
#11: 3 C 0.2 X 3
#12: 3 C 0.2 Y 2
Or if the 'type' is already ordered as in the example data
unique(setDT(df1), by = c('day', 'group', 'id'))
If not ordered,
unique(setDT(df1)[order(group,day, id, type)],by = c('day', 'group' , 'id'))
df1 <- structure(list(day = c(1L, 1L, 2L, 2L, 3L, 3L, 1L, 1L,
2L, 2L,
3L, 3L, 1L, 1L, 2L, 2L, 3L, 3L), value = c(0.1, 0.4, 0.2, 0.5,
0.3, 0.2, 0.1, 0.3, 0.1, 0.4, 0.2, 0.3, 0.1, 0.3, 0.3, 0.2, 0.2,
0.2), group = c("A", "A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "B", "C", "C", "C", "C", "C", "C"), type = c("X", "Y",
"X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X", "Y", "X",
"Y", "X", "Y"), id = c(1L, 1L, 3L, 3L, 5L, 6L, 3L, 3L, 11L, 10L,
12L, 12L, 12L, 12L, 5L, 5L, 3L, 2L)), .Names = c("day", "value",
"group", "type", "id"), class = "data.frame",
row.names = c(NA, -18L))
Similar to P Lapointe, I had the following. I initially wanted to use all() to check if all id values are identical or not, but that attempt failed. So, I chose to use diff(). Using mutate() I checked if all id values are identical or not for each group. Then, I chose rows which does not have the combination of check == TRUE and type == "Y". Finally, I removed the check column.
group_by(df1, day, group) %>%
mutate(check = any(diff(id) == 0)) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)
# day value group type id
# (int) (dbl) (fctr) (chr) (dbl)
#1 1 0.1 A X 1
#2 2 0.2 A X 3
#3 3 0.3 A X 5
#4 3 0.2 A Y 6
#5 1 0.1 B X 3
#6 2 0.1 B X 11
#7 2 0.4 B Y 10
#8 3 0.2 B X 12
#9 1 0.1 C X 12
#10 2 0.3 C X 5
#11 3 0.2 C X 3
#12 3 0.2 C Y 2
EDIT
After communicating with akrun, I revised the code above. Here it is.
group_by(df1, day, group) %>%
mutate(check = n_distinct(id) == 1) %>%
filter(!(check == TRUE & type == "Y")) %>%
select(-check)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With