Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to subset identical rows from dataframe by condition in R?

Tags:

dataframe

r

dplyr

From this dataframe I want to subset rows which have the identical "model", "Country" and "year" values but different "factor"

model <- c("A","B","C","A","A","C","B","A")
country <- c("Italy","Germany","Norway","Italy","France","Germany","Norway","Italy")
year <- c(2016,2016,2016,2016,2015,2015,2015,2015)
value <-  c(14,24,11,34,73,11,33,22)
factor <- c("NEW","OLD","OLD","OLD","OLD","OLD","NEW","NEW")

df <- data.frame(model,country,year,value, factor)

  model country year value factor
1     A   Italy 2016    14    NEW
2     B Germany 2016    24    OLD
3     C  Norway 2016    11    OLD
4     A   Italy 2016    34    OLD
5     A  France 2015    73    OLD
6     C Germany 2015    11    OLD
7     B  Norway 2015    33    NEW
8     A   Italy 2015    22    NEW

Expected output:

1     A   Italy 2016    22    NEW
4     A   Italy 2016    34    OLD

2 Answers

What you could do is create a column which tells the number of duplicated based on the groups (model, country and year) and then filter like this:

model <- c("A","B","C","A","A","C","B","A")
country <- c("Italy","Germany","Norway","Italy","France","Germany","Norway","Italy")
year <- c(2016,2016,2016,2016,2015,2015,2015,2015)
value <-  c(14,24,11,34,73,11,33,22)
factor <- c("NEW","OLD","OLD","OLD","OLD","OLD","NEW","NEW")

df <- data.frame(model,country,year,value, factor)

library(dplyr)

df %>% 
  group_by(model, country, year) %>% 
  mutate(number_dups = n()) %>% 
  filter(number_dups > 1) %>%
  select(-number_dups) %>%
  ungroup()
#> # A tibble: 2 × 5
#>   model country  year value factor
#>   <chr> <chr>   <dbl> <dbl> <chr> 
#> 1 A     Italy    2016    14 NEW   
#> 2 A     Italy    2016    34 OLD

Created on 2022-08-12 by the reprex package (v2.0.1)

Please note: I assume that the expected output should be 14 instead of 22.

like image 112
Quinten Avatar answered Sep 09 '25 14:09

Quinten


In base R, you can use subset + ave

> subset(df, ave(seq_along(year), model, country, year, FUN = length) > 1)
  model country year value factor
1     A   Italy 2016    14    NEW
4     A   Italy 2016    34    OLD

Or, try Filter + split

> Filter(function(x) nrow(x) > 1, split(df, ~ model + country + year))
$A.Italy.2016
  model country year value factor
1     A   Italy 2016    14    NEW
4     A   Italy 2016    34    OLD

or

> do.call(rbind, unname(Filter(function(x) nrow(x) > 1, split(df, ~ model + country + year))))
  model country year value factor
1     A   Italy 2016    14    NEW
4     A   Italy 2016    34    OLD
like image 42
ThomasIsCoding Avatar answered Sep 09 '25 15:09

ThomasIsCoding