I want to lump the infrequent levels of several factor variables into 'other'. I reproduce the problem below: animal and color are two factor variables that I want to lump. It works for a single variable, but not when I put the column names in a vector and loop over them. My actual data set has tens of such variables, and I am looking for a clean way to do this with dplyr.
library(tidyverse)  # also attaches forcats
library(forcats)
data <- data.frame(
  ID = 1:12,
  animal = c('dog', 'cat', 'fish', 'dog', 'dog', 'dog',
             'fish', 'fish', 'fish', 'snake', 'fish', 'dog'),
  color = c('red', 'green', 'blue', 'red', 'green', 'red',
            'green', 'red', 'green', 'red', 'green', 'red')
)
### Does not work when I use a list and for loop
factor_columns <- c('animal', 'color')
for (feature in factor_columns) {
  data <- data %>%
    mutate(feature = fct_lump_prop(
      f = feature,
      prop = 0.2,
      other_level = 'other'
    ))
}
### Works with one column
data <- data %>%
  mutate(animal = fct_lump_prop(
    f = animal,
    prop = 0.2,
    other_level = 'other'
  ))
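The loop fails because mutate(feature = ...) writes to a column literally named feature, and f = feature does not look up the column whose name is stored in the loop variable, so animal and color are never modified. You can use across() instead: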
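library(dplyr)
library(forcats)
data %>%
  # all_of() selects the columns named in the character vector;
  # the lambda passes the extra arguments to fct_lump_prop()
  mutate(across(all_of(factor_columns),
                ~ fct_lump_prop(.x, prop = 0.2, other_level = 'other')))
# mutate_at in older dplyr (before across() was added in 1.0.0):
# mutate_at(vars(factor_columns), fct_lump_prop, prop = 0.2, other_level = 'other')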
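Since the real data set has tens of such variables, it may be easier to select the columns by type than to list them by name. The following is only a sketch under assumptions: it rebuilds the example data from the question, assumes every character or factor column should be lumped, and stores the result in a new object called lumped (a name chosen here for illustration).

```r
library(dplyr)
library(forcats)

# Example data from the question
data <- data.frame(
  ID = 1:12,
  animal = c('dog', 'cat', 'fish', 'dog', 'dog', 'dog',
             'fish', 'fish', 'fish', 'snake', 'fish', 'dog'),
  color = c('red', 'green', 'blue', 'red', 'green', 'red',
            'green', 'red', 'green', 'red', 'green', 'red')
)

# Assumption: every character or factor column is a candidate for lumping.
# where() picks columns by a predicate, so ID (integer) is left untouched.
lumped <- data %>%
  mutate(across(
    where(~ is.character(.x) || is.factor(.x)),
    ~ fct_lump_prop(.x, prop = 0.2, other_level = 'other')
  ))

# Inspect the resulting levels of one column
levels(lumped$animal)
```

If only some of the character or factor columns should be lumped, the explicit all_of(factor_columns) selection is the safer choice.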
You can also use lapply() from base R (fct_lump_prop() still comes from forcats):
data[factor_columns] <- lapply(data[factor_columns],
                               fct_lump_prop, prop = 0.2, other_level = 'other')
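If you would rather keep the for loop from the question, it can probably be repaired by telling mutate() that feature holds a column name. This is a minimal sketch, assuming a dplyr version that supports the .data pronoun and the := operator from rlang: .data[[feature]] reads the column named by the string, and !!feature := writes the result back to that same column.

```r
library(dplyr)
library(forcats)

# Example data from the question
data <- data.frame(
  ID = 1:12,
  animal = c('dog', 'cat', 'fish', 'dog', 'dog', 'dog',
             'fish', 'fish', 'fish', 'snake', 'fish', 'dog'),
  color = c('red', 'green', 'blue', 'red', 'green', 'red',
            'green', 'red', 'green', 'red', 'green', 'red')
)

factor_columns <- c('animal', 'color')

for (feature in factor_columns) {
  data <- data %>%
    # !!feature := uses the string in `feature` as the output column name;
    # .data[[feature]] fetches the input column with that name
    mutate(!!feature := fct_lump_prop(
      f = .data[[feature]],
      prop = 0.2,
      other_level = 'other'
    ))
}

# Inspect the resulting levels of one column
levels(data$animal)
```

The across() version does the same thing in one call, so the loop is mainly useful when each column needs different handling.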