From this dataframe I want to subset rows which have the identical "model", "Country" and "year" values but different "factor"
model <- c("A","B","C","A","A","C","B","A")
country <- c("Italy","Germany","Norway","Italy","France","Germany","Norway","Italy")
year <- c(2016,2016,2016,2016,2015,2015,2015,2015)
value <- c(14,24,11,34,73,11,33,22)
factor <- c("NEW","OLD","OLD","OLD","OLD","OLD","NEW","NEW")
df <- data.frame(model,country,year,value, factor)
model country year value factor
1 A Italy 2016 14 NEW
2 B Germany 2016 24 OLD
3 C Norway 2016 11 OLD
4 A Italy 2016 34 OLD
5 A France 2015 73 OLD
6 C Germany 2015 11 OLD
7 B Norway 2015 33 NEW
8 A Italy 2015 22 NEW
Expected output:
1 A Italy 2016 22 NEW
4 A Italy 2016 34 OLD
What you could do is create a column which tells the number of duplicated based on the groups (model, country and year) and then filter
like this:
model <- c("A","B","C","A","A","C","B","A")
country <- c("Italy","Germany","Norway","Italy","France","Germany","Norway","Italy")
year <- c(2016,2016,2016,2016,2015,2015,2015,2015)
value <- c(14,24,11,34,73,11,33,22)
factor <- c("NEW","OLD","OLD","OLD","OLD","OLD","NEW","NEW")
df <- data.frame(model,country,year,value, factor)
library(dplyr)
df %>%
group_by(model, country, year) %>%
mutate(number_dups = n()) %>%
filter(number_dups > 1) %>%
select(-number_dups) %>%
ungroup()
#> # A tibble: 2 × 5
#> model country year value factor
#> <chr> <chr> <dbl> <dbl> <chr>
#> 1 A Italy 2016 14 NEW
#> 2 A Italy 2016 34 OLD
Created on 2022-08-12 by the reprex package (v2.0.1)
Please note: I assume that the expected output should be 14 instead of 22.
In base R, you can use subset
+ ave
> subset(df, ave(seq_along(year), model, country, year, FUN = length) > 1)
model country year value factor
1 A Italy 2016 14 NEW
4 A Italy 2016 34 OLD
Or, try Filter
+ split
> Filter(function(x) nrow(x) > 1, split(df, ~ model + country + year))
$A.Italy.2016
model country year value factor
1 A Italy 2016 14 NEW
4 A Italy 2016 34 OLD
or
> do.call(rbind, unname(Filter(function(x) nrow(x) > 1, split(df, ~ model + country + year))))
model country year value factor
1 A Italy 2016 14 NEW
4 A Italy 2016 34 OLD
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With