Assume this dataframe:
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- c(1:9)
df <- data.frame(country, number)
I want to be able to subset only the rows where the country count is greater than 4 or less than 2. So in this case, it would return:
country number
USA 1
USA 2
USA 3
USA 4
USA 5
Canada 9
I am able to make it work with this:
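library(dplyr)
totalcounts <- filter(count(df, country), n > 4 | n < 2) # a df of the kept countries and their counts
result <- NULL
for (i in seq_len(nrow(totalcounts))) {
  # rbind the rows of df whose country matches the i-th kept country
  result <- rbind(result, df[df$country == totalcounts$country[i], ])
}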
But I feel there has to be an easier way. I haven't gotten the grasp of sapply and such yet, so I feel like I'm missing something here. It just seems like I am going the long way around and there is already something in place that does this.
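Here is a base R option using subset + ave. Since group counts are whole numbers, dropping the groups whose count is in 2:4 is the same as keeping those with a count greater than 4 or less than 2:
subset(df, !ave(number, country, FUN = function(x) length(x) %in% 2:4))
or a shorter version (thanks to @Onyambu):
subset(df, !ave(number, country, FUN = length) %in% 2:4)
which gives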
country number
1 USA 1
2 USA 2
3 USA 3
4 USA 4
5 USA 5
9 Canada 9
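To see why the ave one-liner works, it may help to look at the intermediate vector it produces. The following is only a sketch of that idea: the names group_size and result are introduced here for illustration and do not appear in the answer above, and the condition is written out as "greater than 4 or less than 2" instead of the 2:4 shortcut.

```r
# Rebuild the example data so the snippet runs on its own
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

# ave() applies FUN within each country and returns one value per row,
# so every row ends up carrying the size of its own group
# (group_size is an illustrative name, not from the original answer)
group_size <- ave(df$number, df$country, FUN = length)
print(group_size)  # 5 5 5 5 5 3 3 3 1

# Keep the rows whose group has more than 4 or fewer than 2 members
result <- df[group_size > 4 | group_size < 2, ]
print(result)
```

Spelling the condition out this way makes it easier to change the thresholds later, at the cost of one extra line.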
Base R option using table:
tab <- table(df$country)
subset(df, country %in% names(tab[tab > 4 | tab < 2]))
# country number
#1 USA 1
#2 USA 2
#3 USA 3
#4 USA 4
#5 USA 5
#9 Canada 9
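Since the question already uses dplyr's count and filter, a grouped filter might be the shortest route there. This is a sketch and not one of the original answers; it assumes dplyr is installed, and result is just an illustrative name. Inside a grouped filter, n() gives the number of rows in the current group.

```r
library(dplyr)

# Rebuild the example data so the snippet runs on its own
country <- c('USA', 'USA', 'USA', 'USA', 'USA', 'UK', 'UK', 'UK', 'Canada')
number <- 1:9
df <- data.frame(country, number)

# Group by country, keep only groups with more than 4 or fewer than 2 rows,
# then drop the grouping so later steps see an ordinary tibble
result <- df %>%
  group_by(country) %>%
  filter(n() > 4 | n() < 2) %>%
  ungroup()

print(result)
```

This avoids both the intermediate count table and the loop, and the filter condition reads the same way as the requirement in the question.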