
Transform multiple columns with a function that uses different arguments per column

I write and review a fair amount of R code like this:

df <- data.frame(replicate(10, sample(0:5, 10, replace = TRUE)))
my.func <- function(col, y) {col %in% y}

df$X2 <- my.func(df$X2, c(1,2))
df$X3 <- my.func(df$X3, c(4,5))
df$X5 <- my.func(df$X5, c(1,2))
df$X6 <- my.func(df$X6, c(4,5))
df$X8 <- my.func(df$X8, c(4,5))
df$X9 <- my.func(df$X9, c(1,2))
df$X10 <- my.func(df$X10, c(1))

That is, certain columns in a data.frame (or data.table) are transformed using a function, where one argument is a column and the other is some arbitrary, somewhat-unique-to-that-column value.

What's a more concise way to make such transformations?

I've tried using data.table's := (assignment by reference) operator, which makes things slightly cleaner, but each column name must still appear twice and the function must still appear once per column.

Emerson asked Mar 15 '26 14:03


2 Answers

A concise way is Map, with the dataset ('df') and a list of vectors as inputs; each vector is passed as the second argument of my.func. Each column of the data.frame is treated as one unit and is paired with the vector at the same position in the list.

df[] <- Map(my.func, df, list(1:2, 4:5, 3:4))

NOTE: The OP's function and a minimal reproducible example were not provided, so this is untested.

NOTE2: This assumes the data has exactly 3 columns. With more columns, extend the list so that it has one element per column.
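
To make the pairing concrete, here is a minimal sketch on a hypothetical 3-column data frame (the name toy and its values are made up for illustration and are not the OP's data). Map hands the first column and the first list element to my.func, then the second pair, and so on:

```r
# Hypothetical 3-column toy data, small enough to check by eye
toy <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(5, 4, 3))
my.func <- function(col, y) {col %in% y}

# Map() walks the columns and the list in parallel, i.e. it calls
# my.func(toy$a, 1:2), my.func(toy$b, 4:5) and my.func(toy$c, 3:4)
toy[] <- Map(my.func, toy, list(1:2, 4:5, 3:4))
toy
```

Assigning into toy[] rather than toy keeps the result a data.frame instead of replacing it with the plain list that Map returns.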


The above can also be converted to data.table syntax

library(data.table)
setDT(df)[, names(df) := Map(my.func, .SD, list(1:2, 4:5, 3:4))]

If only a subset of columns needs to be changed, specify those columns in .SDcols and replace names(df) on the left-hand side with the same subset of names.
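
A rough sketch of that subset variant, assuming the data.table package is installed. The table dt and the helper names cols and args are hypothetical and only serve the illustration:

```r
library(data.table)

# Hypothetical toy data; only b and c are transformed, a is left alone
dt <- data.table(a = c(1, 2, 3), b = c(4, 5, 6), c = c(5, 4, 3))
my.func <- function(col, y) {col %in% y}

cols <- c("b", "c")      # subset of columns to change
args <- list(4:5, 3:4)   # one argument vector per entry in cols, same order

# (cols) on the left of := means "the columns whose names are stored in cols";
# .SDcols restricts .SD to the same columns so the two line up one to one
dt[, (cols) := Map(my.func, .SD, args), .SDcols = cols]
dt
```

The order of args has to match the order of cols here, since nothing but position ties them together.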


Or with tidyverse

library(tidyverse)
map2_dfc(df, list(1:2, 4:5, 3:4), my.func)
akrun answered Mar 17 '26 05:03


OP's request from a comment:

make the association between column names and function argument(s) for those columns more explicit

Adjusting the Map approach seen in the other answers:

yL <- list(X2 = 1:2, X3 = 4:5, X5 = 1:2, X6 = 4:5, X8 = 4:5, X9 = 1:2, X10 = 1)
df[names(yL)] <- Map(my.func, df[names(yL)], y = yL)
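
One way to sanity-check this, sketched under a few assumptions: the seed is arbitrary, the copies manual and mapped are hypothetical names, and the values in yL follow the question's line-by-line code. The idea is to run both versions on copies of the same data and compare:

```r
set.seed(1)  # arbitrary seed, only so the comparison is repeatable
df <- data.frame(replicate(10, sample(0:5, 10, replace = TRUE)))
my.func <- function(col, y) {col %in% y}

# Column-to-argument mapping, taken from the question's code
yL <- list(X2 = 1:2, X3 = 4:5, X5 = 1:2, X6 = 4:5, X8 = 4:5, X9 = 1:2, X10 = 1)

# Line-by-line version, as written in the question
manual <- df
manual$X2 <- my.func(manual$X2, c(1,2))
manual$X3 <- my.func(manual$X3, c(4,5))
manual$X5 <- my.func(manual$X5, c(1,2))
manual$X6 <- my.func(manual$X6, c(4,5))
manual$X8 <- my.func(manual$X8, c(4,5))
manual$X9 <- my.func(manual$X9, c(1,2))
manual$X10 <- my.func(manual$X10, c(1))

# Named-list version: each column is looked up by the name of its argument
mapped <- df
mapped[names(yL)] <- Map(my.func, mapped[names(yL)], y = yL)

# Compare the two results
all.equal(manual, mapped)
```

Because the columns are selected with names(yL), adding or removing a column only means editing yL.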

With data.table:

library(data.table)
DT <- as.data.table(df)  # assumes df has not been transformed yet
# this saves you from writing DT twice
DT[, names(yL) := Map(my.func, .SD, y = yL), .SDcols = names(yL)]
Frank answered Mar 17 '26 03:03


