How to use R summarise with multiple numeric and text-based conditional subsets

Question

I have a table containing two rows for each ID.

library(dplyr)

table <- tibble(
  id = c(1,1,2,2,3,3,4,4,5,5),
  row1 = c(2,5,2,5,1,3,2,5,3,2),
  row2 = c("foo", "other foo", "bar", "bar", "bar", "bar other", "other", "foo", "other", "other")
)
> table
# A tibble: 10 × 3
      id  row1 row2 
   <dbl> <dbl> <chr>
 1     1     2 foo  
 2     1     5 other foo
 3     2     2 bar  
 4     2     5 bar  
 5     3     1 bar
 6     3     3 bar other
 7     4     2 other
 8     4     5 foo  
 9     5     3 other
10     5     2 other

I would like to resolve the table to a single row for each ID based on three rules in succession:

If, for each ID, there is one row in which row1 is 5 or more, then choose the row where row1 is less than 5.
Else if, for each ID, there is a row in which row2 contains the word 'other', choose the row where row2 does not contain the word 'other'.
Otherwise, for each ID, pick the first row.

I feel there must be a more straightforward way of doing this. This is my attempt so far, but I can't work out how to resolve the NA so that it returns 'bar'.

table %>%
  group_by(id) %>%
  summarise(
    row1 = ifelse(max(row1) >= 5,
            first(row1[row1 < 5]),
            ifelse(
              grep("other", row2),
              ifelse(
                !is.na(first(row1[grep("other", row2, invert = T)])),
                first(row1[grep("other", row2, invert = T)]),
                first(row1)),
              first(row1))
    ),
    row2 = ifelse(
      max(row1) >= 5,
      first(row2[row1 < 5]),
      ifelse(
        grep("other", row2),
        ifelse(
          !is.na(first(row2[grep("other", row2, invert = T)])),
          first(row2[grep("other", row2, invert = T)]),
          first(row2)),
        first(row2)
    )
)
)

# A tibble: 5 × 3
     id  row1 row2 
  <dbl> <dbl> <chr>
1     1     2 foo  
2     2     2 NA   
3     3     1 bar  
4     4     2 foo  
5     5     3 other

Desired output:

id	row1	row2
1	2	foo
2	2	bar
3	1	bar
4	2	other
5	3	other

Many thanks for your help.

TarJae · Accepted Answer

Here is how we can do it:

library(dplyr)
library(tidyr)
library(stringr)

table %>%
  group_by(id) %>% 
  separate_rows(row2) %>%
  mutate(x = ifelse(row1>=5, min(row1),NA),
         y = ifelse(str_detect(row2, 'other'), !str_detect(row2, 'other'), NA)) %>% 
  slice(1) %>% 
  select(-c(x, y))

    id  row1 row2 
  <dbl> <dbl> <chr>
1     1     2 foo  
2     2     2 bar  
3     3     1 bar  
4     4     2 other
5     5     3 other

user2332849 · Answer

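library(dplyr)

table %>%
  group_by(id) %>%
  # filter() evaluates the condition within each group; base subset() ignores the grouping
  filter(
    case_when(
      any(row1 >= 5) ~ row1 < 5,                          # rule 1
      any(grepl("other", row2)) ~ !grepl("other", row2),  # rule 2
      TRUE ~ TRUE                                         # rule 3: keep every row
    )
  ) %>%
  filter(row_number() == 1) %>%  # first remaining row per id
  ungroup()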

This answer takes advantage of dplyr's grouping to evaluate any() within each group, which makes it easy to tell whether a condition holds for any row of that group.
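To see the per-group any() on its own, here is a minimal sketch using the data from the question. The column names has_5_or_more and has_other are made up for illustration; they are not part of the answer's pipeline.

```r
library(dplyr)

# Same data as in the question
table <- tibble(
  id = c(1,1,2,2,3,3,4,4,5,5),
  row1 = c(2,5,2,5,1,3,2,5,3,2),
  row2 = c("foo", "other foo", "bar", "bar", "bar", "bar other", "other", "foo", "other", "other")
)

# One logical flag per id: does any row in the group meet the condition?
# (has_5_or_more and has_other are illustrative names)
flags <- table %>%
  group_by(id) %>%
  summarise(
    has_5_or_more = any(row1 >= 5),
    has_other     = any(grepl("other", row2))
  )

flags
```

With these flags, ids 1, 2 and 4 fall under the first rule, ids 3 and 5 under the second.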

It also uses case_when() to check a series of conditions in priority order, implementing what would otherwise be a chain of if/else statements.
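A small standalone sketch of that first-match behaviour, using a made-up vector x and made-up labels rather than the question's data:

```r
library(dplyr)

x <- c(1, 5, 10)  # illustrative values only

# Conditions are checked top to bottom; the first one that is TRUE decides the result
size <- case_when(
  x >= 10 ~ "large",
  x >= 5  ~ "medium",  # 10 also satisfies this, but it was already matched above
  TRUE    ~ "small"    # fallback, like a final else
)

size
```

This is why the order of the rules inside case_when() matters: a later condition only applies to elements that no earlier condition matched.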

Finally, since in every case we want only the first row that matches the criteria, it uses row_number() to check whether a row is the first within its group and selects it.
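As a quick illustration of row_number() inside groups, here is a sketch on a small made-up data frame (df, g, v and n are hypothetical names):

```r
library(dplyr)

# Illustrative data, not from the question
df <- tibble(
  g = c("a", "a", "b", "b", "b"),
  v = c(10, 20, 30, 40, 50)
)

# row_number() restarts at 1 in every group
numbered <- df %>%
  group_by(g) %>%
  mutate(n = row_number()) %>%
  ungroup()

# Keeping row_number() == 1 yields the first row of each group
firsts <- df %>%
  group_by(g) %>%
  filter(row_number() == 1) %>%
  ungroup()

firsts
```

On a grouped data frame, slice(1) should give the same rows as filter(row_number() == 1).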

Output is:

# A tibble: 5 x 3
     id  row1 row2     
  <dbl> <dbl> <chr>    
1     1     2 foo      
2     2     2 bar      
3     3     1 bar      
4     4     2 other    
5     5     3 other    

Tags: r, dplyr, summarize

Jamie Scuffell
