
R: purrr: using pmap for row-wise operations, but this time involving LOTS of columns

This is not a duplicate of questions such as "Row-wise iteration like apply with purrr".

I understand how to use pmap() to do a row-wise operation on a data-frame:

library(tidyverse)

df1 = tribble(~col_1, ~col_2, ~col_3,
               1,      5,      12,
               9,      3,      3,
               6,     10,     7)

foo = function(col_1, col_2, col_3) {
  mean(c(col_1, col_2, col_3))
}

df1 %>% pmap_dbl(foo)

This gives the function foo applied to every row:

[1] 6.000000 5.000000 7.666667
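pmap() passes each column to the function as an argument named after that column, so arguments are matched by name rather than by position. A small check of that, assuming the tidyverse is installed (diff_13 is a hypothetical helper, not part of the question):

```r
library(tidyverse)

df1 <- tribble(~col_1, ~col_2, ~col_3,
               1,      5,      12,
               9,      3,      3,
               6,     10,     7)

# An order-sensitive function: col_1 minus col_3
diff_13 <- function(col_1, col_2, col_3) col_1 - col_3

a <- df1 %>% pmap_dbl(diff_13)
# Same data with the columns reversed; pmap() still matches them by name
b <- df1 %>% select(col_3, col_2, col_1) %>% pmap_dbl(diff_13)
a
```

Both calls give the same result, which is why renaming a column (rather than reordering it) is what breaks this style of code.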

But this gets unwieldy when I have more than a few columns, because I have to pass each one in explicitly. What if I had, say, 8 columns in my data frame df2 and wanted to apply a function bar that could involve every one of them?

set.seed(12345)
df2 = rnorm(n=24) %>%
  matrix(nrow=3, dimnames=list(NULL, paste0("col_", 1:8))) %>%
  as_tibble()

bar = function(col_1, col_2, col_3, col_4, col_5, col_6, col_7, col_8) {
  # imagine we do some complicated row-wise operation here
  mean(c(col_1, col_2, col_3, col_4, col_5, col_6, col_7, col_8))
}

df2 %>% pmap_dbl(bar)

Gives:

[1]  0.45085420  0.02639697 -0.28121651

This is clearly inadequate: I have to add a new argument to bar for every single column. It's a lot of typing, and it makes the code less readable and more fragile. It seems like there should be a way to have bar take a single argument x and then access the variables I want as x$col_1 and so on, or at least something more elegant than the above. Is there any way to clean this code up using purrr?

dain asked Jan 21 '26 10:01



1 Answer

You can use ... (dots) in the function signature and collect the arguments into a list with list(...) once inside the function.

dot_tester <- function(...) {
  dots <- list(...)
  dots$Sepal.Length + dots$Petal.Width
}

purrr::pmap(head(iris), dot_tester)
[[1]]
[1] 5.3

[[2]]
[1] 5.1

[[3]]
[1] 4.9

[[4]]
[1] 4.8

[[5]]
[1] 5.2

[[6]]
[1] 5.8

However, this doesn't make your code any less "fragile", since the names you use inside the function must still match your column names exactly. The benefit is that you no longer have to list every column in the function() signature.
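One way to make a name mismatch fail loudly and early is to select the needed columns before calling pmap(). A sketch, assuming dplyr's select() with all_of() (attached by the tidyverse); the needed vector and add_two are hypothetical names for illustration:

```r
library(tidyverse)

# Columns the row-wise function relies on (an assumption for this sketch)
needed <- c("Sepal.Length", "Petal.Width")

add_two <- function(Sepal.Length, Petal.Width) {
  Sepal.Length + Petal.Width
}

sums <- head(iris) %>%
  select(all_of(needed)) %>%  # errors up front if a needed column is missing
  pmap_dbl(add_two)           # only the selected columns reach add_two
sums
```

Selecting first also keeps extra columns such as Species from being passed in, which would otherwise trigger an "unused argument" error for a function without ....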

Brian answered Jan 25 '26 20:01


