Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Drop rows conditional on value on other rows using dplyr in R

Tags:

r

dplyr

tidyverse

Using the example data provided below: For each institution type ("a" and "b") I want to drop rows with fac == "no" if there exists a row with fac == "yes" for the same year. I then want to sum the values by year. I am, however, not able to figure out how to drop the correct "no"-rows. Below are a couple of my attempts based on answers give here.

set.seed(123)
ext <- tibble(
  institution = c(rep("a", 7), rep("b", 7)),
  year = rep(c("2005", "2005", "2006", "2007", "2008", "2009", "2009"), 2),
  fac = rep(c("yes", "no", "no", "no", "no", "yes", "no"), 2),
  value = sample(1:100, 14, replace=T)
)

ext %>%
  group_by(institution, year) %>%
  filter(if (fac == "yes") fac != "no")

ext %>%
  group_by(institution, year) %>%
  case_when(fac == "yes" ~ filter(., fac != "no"))

ext %>%
  group_by(institution, year) %>%
  {if (fac == "yes") filter(., fac != "no")}
like image 702
humperderp Avatar asked Oct 29 '25 10:10

humperderp


2 Answers

Another way would be:

library(dplyr)
ext %>%
  group_by(institution, year) %>%
  filter(fac == 'yes' | n() < 2)

# institution year  fac   value
# 1 a           2005  yes      31
# 2 a           2006  no       51
# 3 a           2007  no       14
# 4 a           2008  no       67
# 5 a           2009  yes      42
# 6 b           2005  yes      43
# 7 b           2006  no       25
# 8 b           2007  no       90
# 9 b           2008  no       91
# 10 b          2009  yes      69

In case you want the overall amounts by year, add these two lines, which will yield the following output:

group_by(year) %>%
summarise(value=sum(value))

# year  value
# <chr> <int>
# 1 2005     74
# 2 2006     76
# 3 2007    104
# 4 2008    158
# 5 2009    111
like image 85
AlexB Avatar answered Nov 01 '25 00:11

AlexB


Does this work: by summarise, I assumed you want to sum by year after applying the filtering.

library(dplyr)
ext %>% group_by(institution, year) %>% filter(fac == 'yes'|all(fac == 'no'))
# A tibble: 10 x 4
# Groups:   institution, year [10]
   institution year  fac   value
   <chr>       <chr> <chr> <int>
 1 a           2005  yes      31
 2 a           2006  no       51
 3 a           2007  no       14
 4 a           2008  no       67
 5 a           2009  yes      42
 6 b           2005  yes      43
 7 b           2006  no       25
 8 b           2007  no       90
 9 b           2008  no       91
10 b           2009  yes      69
ext %>% group_by(institution, year) %>% filter(fac == 'yes'|all(fac == 'no')) %>% 
ungroup() %>% group_by(year) %>% summarise(value = sum(value))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 5 x 2
  year  value
  <chr> <int>
1 2005     74
2 2006     76
3 2007    104
4 2008    158
5 2009    111
like image 41
Karthik S Avatar answered Nov 01 '25 00:11

Karthik S