I have a table containing two rows for each ID.
table <- tibble(
id = c(1,1,2,2,3,3,4,4,5,5),
row1 = c(2,5,2,5,1,3,2,5,3,2),
row2 = c("foo", "other foo", "bar", "bar", "bar", "bar other", "other", "foo", "other", "other")
)
> table
# A tibble: 10 × 3
id row1 row2
<dbl> <dbl> <chr>
1 1 2 foo
2 1 5 other foo
3 2 2 bar
4 2 5 bar
5 3 1 bar
6 3 3 bar other
7 4 2 other
8 4 4 foo
9 5 3 other
10 5 2 other
I would like to resolve the table to a single row for each ID based on three rules in succession:
I feel there must be a more straightforward way of doing this. This is my attempt so far, but I've can't work out how to resolve the NA to return 'bar'.
table %>%
group_by(id) %>%
summarise(
row1 = ifelse(max(row1) >= 5,
first(row1[row1 < 5]),
ifelse(
grep("other", row2),
ifelse(
!is.na(first(row1[grep("other", row2, invert = T)])),
first(row1[grep("other", row2, invert = T)]),
first(row1)),
first(row1))
),
row2 = ifelse(
max(row1) >= 5,
first(row2[row1 < 5]),
ifelse(
grep("other", row2),
ifelse(
!is.na(first(row2[grep("other", row2, invert = T)])),
first(row2[grep("other", row2, invert = T)]),
first(row2)),
first(row2)
)
)
)
# A tibble: 5 × 3
id row1 row2
<dbl> <dbl> <chr>
1 1 2 foo
2 2 2 NA
3 3 1 bar
4 4 2 foo
5 5 3 other
Desired output:
| id | row1 | row2 |
|---|---|---|
| 1 | 2 | foo |
| 2 | 2 | bar |
| 3 | 1 | bar |
| 4 | 2 | other |
| 5 | 3 | other |
Many thanks for your help.
Here is how we can do it:
library(dplyr)
library(tidyr)
library(stringr)
table %>%
group_by(id) %>%
separate_rows(row2) %>%
mutate(x = ifelse(row1>=5, min(row1),NA),
y = ifelse(str_detect(row2, 'other'), !str_detect(row2, 'other'), NA)) %>%
slice(1) %>%
select(-c(x, y))
id row1 row2
<dbl> <dbl> <chr>
1 1 2 foo
2 2 2 bar
3 3 1 bar
4 4 2 other
5 5 3 other
table %>%
group_by(id) %>%
subset(
case_when(
any(row1 >= 5) ~ row1 < 5,
any(grepl("other", row2)) ~ !grepl("other", row2),
T ~ T
)
) %>%
filter(row_number() == 1) %>%
ungroup()
This answer takes advantage of dplyr's grouping abilities to check for any() within each group, so it gets easy to know if a certain condition happens within a group.
It also uses case_when() to check for a series of conditions in a prioritized order, implementing what would be a series of if/else's.
Finally, since in whatever case we would like only the first row that matches the criteria, it uses the function row_number() to check whether we're on the first row within the group, in order to select it.
Output is:
# A tibble: 5 x 3
id row1 row2
<dbl> <dbl> <chr>
1 1 2 foo
2 2 2 bar
3 3 1 bar other
4 4 2 other
5 5 3 other
>
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With