I'd like to iterate over a series of dataframes and apply the same function to them all.
I'm trying this using tidyr::nest and purrr::map_df. Here's a reprex of the sort of thing I'm trying to achieve.
```r
library(dplyr)  # needed for group_by()
library(purrr)
library(tidyr)

data(iris)
iris_df <- as.data.frame(iris)
my_var <- 2

my_fun <- function(df) {
  # total of every numeric cell in df, plus a constant
  sum(df) + my_var
}

iris_df %>% group_by(Species) %>% nest() %>% map_df(.$data, my_fun)
# Error: Index 1 must have length 1
```
What am I doing wrong? Is there a different approach?
EDIT: To clarify my desired output, I'm aiming for a new column containing the function's output, e.g.:
|Species|Data|my_function_output|
|:------|:---|:-----------------|
|setosa |<tibble>|509.1 |
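The problem is that `nest()` gives you a data frame with a column `data`, which is a list of data frames. You need to `map` or `sapply` over the `data` column of the `nest()` output, not over the entire output. In your call, the pipe still passes the whole nested tibble to `map_df()` as its first argument (using `.` inside `.$data` doesn't prevent that), so `.$data` ends up in the `.f` slot and purrr tries to use that list as extraction indices, which is where "Index 1 must have length 1" comes from. I use `sapply()` below, but you could also use `map_dbl()`. If you use `map()` you will end up with a list column, and `map_df()` will not work here because it requires named input.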
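A minimal sketch of the pipe rule behind that error, using a hypothetical helper `count_args()` (not part of any package): magrittr only skips inserting the left-hand side as the first argument when `.` appears as a top-level argument, and a `.` nested inside an expression such as `.$data` does not count. Wrapping the right-hand side in braces turns the insertion off.

```r
library(magrittr)

# hypothetical helper: reports how many arguments it was called with
count_args <- function(...) length(list(...))

x <- list(data = 1:3)

# `.` appears only inside `.$data`, so x is still inserted first:
# this evaluates count_args(x, x$data, "f")
n_piped <- x %>% count_args(.$data, "f")

# braces disable first-argument insertion: count_args(x$data, "f")
n_braced <- x %>% { count_args(.$data, "f") }

n_piped   # 3
n_braced  # 2
```

By the same rule, `nested %>% map_df(.$data, my_fun)` is evaluated as `map_df(nested, nested$data, my_fun)`.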
```r
iris_df %>%
  group_by(Species) %>%
  nest() %>%
  mutate(my_fun_out = sapply(data, my_fun))
```

```
# A tibble: 3 x 3
  Species    data              my_fun_out
  <fct>      <list>                 <dbl>
1 setosa     <tibble [50 x 4]>        509
2 versicolor <tibble [50 x 4]>        717
3 virginica  <tibble [50 x 4]>        859
```
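A sketch of the `map_dbl()` variant mentioned above, assuming dplyr, tidyr (>= 1.0) and purrr are installed. Unlike `sapply()`, `map_dbl()` always returns a double vector the same length as `data` and errors if `my_fun` returns anything other than a single number. For comparison, it also includes a base-R version without `nest()`, which splits the four measurement columns by `Species`; that second approach is an alternative, not part of the answer above.

```r
library(dplyr)
library(tidyr)
library(purrr)

my_var <- 2
# same logic as the question's function: total of all numeric cells plus a constant
my_fun <- function(df) sum(df) + my_var

# map_dbl() over the list column `data` created by nest()
result <- iris %>%
  group_by(Species) %>%
  nest() %>%
  mutate(my_fun_out = map_dbl(data, my_fun))

# base-R alternative: one data frame per species, then apply my_fun to each
base_out <- sapply(split(iris[, 1:4], iris$Species), my_fun)

result$my_fun_out  # 509.1 716.6 859.0
base_out           # named vector: setosa, versicolor, virginica
```

The unrounded values (509.1, 716.6, 859.0) are what the tibble print above rounds to 509, 717 and 859.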