
mutate() with an if/else function

Tags: r, dplyr

I have an example data frame:

library(dplyr)

df <- data.frame(cust = sample(1:100, 1000, TRUE),
                 channel = sample(c("WEB", "POS"), 1000, TRUE))

that I'm trying to mutate with this function:

get_channels <- function(data) {
    d <- data
    if (unique(d) %>% length() == 2) {
        d <- "Both"
    } else {
        if (unique(d) %>% length() < 2 && unique(d) == "WEB") {
            d <- "Web"
        } else {
            d <- "POS"
        }
    }
    return(d)
}

This works without issue, and on small data frames it takes almost no time:

start.time <- Sys.time()

df %>%
    group_by(cust) %>%
    mutate(chan = get_channels(channel)) %>%
    group_by(cust) %>% 
    slice(1) %>%
    group_by(chan) %>%
    summarize(count = n()) %>%
    mutate(perc = count/sum(count))

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken

Time difference of 0.34602 secs

However, when the data frame gets large, say on the order of 1,000,000 or more customers, my basic if/else function takes much, much longer.

How can I streamline this function to make it run more quickly?

Asked by Steven, Feb 02 '26 16:02


1 Answer

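You should use a data.table for this. Grouping by cust and computing one value per customer directly avoids mutating every row and then slicing out the first row of each group.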

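library(data.table)

setDT(df)  # convert df to a data.table in place
t1 = Sys.time()
# one row per customer: "both" if two distinct channels, else the single channel used
df = df[ , .(channels = ifelse(uniqueN(channel) == 2, "both", as.character(channel[1]))), by = .(cust)]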

> Sys.time() - t1
Time difference of 0.00500083 secs

> head(df)
   cust channels
1:   37     both
2:   45     both
3:   74     both
4:   20     both
5:    1     both
6:   68     both
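
If you would rather stay in dplyr, the same idea (one summarised row per customer instead of mutate() followed by slice(1)) might look roughly like the sketch below. It is not benchmarked here, so treat any speed-up as something to measure on your own data. The seed, the per_cust and result names, and the use of `.groups = "drop"` (which assumes dplyr 1.0.0 or later) are my own assumptions, not part of the question.

```r
library(dplyr)

set.seed(42)  # assumption: fixed seed so the example is reproducible
df <- data.frame(cust = sample(1:100, 1000, TRUE),
                 channel = sample(c("WEB", "POS"), 1000, TRUE))

# One row per customer: label each customer by the number of distinct
# channels, so no per-row mutate() and no slice(1) are needed.
per_cust <- df %>%
    group_by(cust) %>%
    summarise(chan = if (n_distinct(channel) == 2) "Both"
                     else if (first(channel) == "WEB") "Web"
                     else "POS",
              .groups = "drop")

# Count customers per label and convert the counts to proportions.
result <- per_cust %>%
    count(chan, name = "count") %>%
    mutate(perc = count / sum(count))

result
```

The labels here follow the question's get_channels() ("Both", "Web", "POS") rather than the lower-case "both" used in the data.table call above.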
Answered by Kristofersen, Feb 05 '26 05:02


