 

Removing infrequent rows in a data frame

Let's say I have the following very simple data frame:

a <- rep(5,30)
b <- rep(4,80)
d <- rep(7,55)

df <- data.frame(Column = c(a,b,d))
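A quick way to see which values would be affected is to count how often each one occurs. The sketch below uses base R's `table()` and recreates the same data frame in one line so it runs on its own; the threshold of 60 comes from the question.

```r
# Same data as above: 30 fives, 80 fours, 55 sevens
df <- data.frame(Column = c(rep(5, 30), rep(4, 80), rep(7, 55)))

# Number of rows carrying each distinct value (4 -> 80, 5 -> 30, 7 -> 55)
counts <- table(df$Column)
print(counts)

# Values that occur fewer than 60 times, i.e. the rows to drop
rare <- names(counts)[counts < 60]
print(rare)
```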

What would be the most generic way to remove all rows whose value appears fewer than 60 times?

I know you could say "in this case it's just a", but my real data has many more distinct values with different frequencies, so I don't want to specify them one by one.

I was thinking of writing a loop over the distinct values that deletes the rows of each value i occurring fewer than 60 times, but perhaps you have other ideas. Thanks in advance.
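The loop idea can work. A minimal sketch, assuming a threshold of 60 held in a variable named `min_n` (a name chosen here for illustration), marks the rows to keep in a logical vector instead of deleting rows inside the loop, since deleting while iterating shifts the row positions.

```r
df <- data.frame(Column = c(rep(5, 30), rep(4, 80), rep(7, 55)))
min_n <- 60  # hypothetical name: minimum occurrences a value needs to be kept

keep <- logical(nrow(df))  # starts as all FALSE
for (i in unique(df$Column)) {
  rows_i <- which(df$Column == i)  # positions of the rows holding value i
  if (length(rows_i) >= min_n) {
    keep[rows_i] <- TRUE           # value i is frequent enough: keep its rows
  }
}

df2 <- df[keep, , drop = FALSE]
```

Using `which()` means an `NA` value simply matches no rows, so such rows are dropped rather than raising an error in the `if`.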

Yaahtzeck asked Sep 11 '25 20:09


1 Answer

A solution using dplyr.

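library(dplyr)

df2 <- df %>%
  group_by(Column) %>%   # one group per distinct value
  filter(n() >= 60) %>%  # n() is the number of rows in the current group
  ungroup()              # drop the grouping from the result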

Or a solution in base R:

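# split() orders its groups by sorted value, so sort the IDs to line up with targetID
uniqueID <- sort(unique(df$Column))
# TRUE for each value that occurs at least 60 times
targetID <- sapply(split(df, df$Column), function(x) nrow(x) >= 60)

df2 <- df[df$Column %in% uniqueID[targetID], , drop = FALSE]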
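For a more reusable base R version, the threshold and column can be made parameters. The helper below, `drop_infrequent()`, is a hypothetical name used only for illustration; it relies on `ave()` to give every row the size of its group, and it assumes the column contains no `NA` values.

```r
# Hypothetical helper: keep only rows whose value in `col` occurs at least `min_n` times
drop_infrequent <- function(data, col, min_n) {
  x <- data[[col]]
  # ave() applies length() within each group of x and returns one result per row.
  # seq_along(x) is used so the counts stay integer even if x is character.
  freq <- ave(seq_along(x), x, FUN = length)
  data[freq >= min_n, , drop = FALSE]
}

df <- data.frame(Column = c(rep(5, 30), rep(4, 80), rep(7, 55)))
df2 <- drop_infrequent(df, "Column", 60)
table(df2$Column)
```

Because the frequency is computed per row, there is no need to keep the list of unique values aligned with the grouping order, which is the subtle part of the `split()` approach.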
www answered Sep 13 '25 09:09