
purrr combine pmap and nest

Tags:

r

purrr

I am trying to learn purrr to simulate data using rnorm with different means, sd, and n in each iteration. This code generates my dataframe:

library(dplyr)  # group_by, mutate, if_else, row_number
library(tidyr)  # crossing, nest
library(purrr)  # map, pmap

parameter = crossing(n = c(60, 80, 100),
                     agegroup = c("a", "b", "c"),
                     effectsize = c(0.2, 0.5, 0.8),
                     sd = 2) %>%
  # create a simulation id number within each age group
  group_by(agegroup) %>%
  mutate(sim = row_number()) %>%
  ungroup() %>%
  mutate(# change effect size so that only group "a" has an effect (d = 0 elsewhere)
         effectsize = if_else(agegroup == "a", effectsize, 0),
         # calculate the mean of the distribution from the effect size
         mean = effectsize * sd)

Now I want to iterate over the different simulations and, for each row, generate data according to mean, sd, and n using rnorm.

# create a nested dataframe to iterate over each simulation and agegroup
nested_df =  parameter %>%
  group_by(sim, agegroup, effectsize)%>%
  nest() %>% arrange(sim)

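The resulting dataframe has one row per combination of sim, agegroup, and effectsize, with the remaining columns (n, sd, mean) nested in a list-column called "data" (picture of dataframe omitted).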

Now I want to create normally distributed data with the mean, sd, and n given in the "data" column:

nested_df = nested_df %>%  
  mutate(data_points = pmap(data,rnorm))

However, the code above gives an error that I haven't been able to find a solution to:

Error in mutate_impl(.data, dots) : 
  Evaluation error: unused arguments 

I read the Iteration chapter in R for Data Science and googled a bunch, but I can't figure out how to combine pmap and nest. The reason I would like to use those functions is that it would make it easier to keep the parameters, simulated data, and output all in one dataframe.

Esther asked Oct 19 '25 05:10


1 Answer

You don't necessarily need to nest the parameters. For example:

parameter %>%
  # Use `pmap` because we explicitly specify three arguments
  mutate(data_points = pmap(list(n, mean, sd), rnorm))
# A tibble: 27 x 7
#         n agegroup effectsize    sd   sim  mean data_points
#     <dbl> <chr>         <dbl> <dbl> <int> <dbl> <list>     
#   1    60 a               0.2     2     1   0.4 <dbl [60]> 
#   2    60 a               0.5     2     2   1   <dbl [60]> 
#   3    60 a               0.8     2     3   1.6 <dbl [60]> 
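
As a quick check on the approach above, here is a minimal sketch on a smaller stand-in grid (the `params` and `sims` names, the grid values, and the seed are assumptions for illustration, not part of the original question). It names the list elements passed to `pmap` so they are matched to `rnorm`'s arguments by name rather than by position, then confirms that each row drew the requested number of values.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(1)  # assumption: fixed seed only so the sketch is reproducible

# Hypothetical, smaller parameter grid (4 rows) standing in for `parameter`
params <- crossing(n = c(60, 80), effectsize = c(0, 0.5), sd = 2) %>%
  mutate(mean = effectsize * sd)

sims <- params %>%
  mutate(
    # Named list elements are passed to rnorm() as n =, mean =, sd =
    data_points = pmap(list(n = n, mean = mean, sd = sd), rnorm),
    # Number of values actually drawn in each row
    n_drawn = map_int(data_points, length),
    # base::mean() is spelled out because `mean` is also a column name here
    sample_mean = map_dbl(data_points, ~ base::mean(.x))
  )

# Each row should contain exactly n simulated values
stopifnot(all(sims$n_drawn == sims$n))
```

Naming the arguments is optional when the order already matches `rnorm(n, mean, sd)`, but it guards against silently swapping `mean` and `sd` if the columns are listed in a different order.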

With the nested data frame, you can use map rather than pmap:

nested_df %>%
  # Use `map` because there is really one argument, `data`,
  # but then refer to three different columns of `data`.
  mutate(data_points = map(data, ~ rnorm(.$n, .$mean, .$sd)))
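
To keep parameters and simulated values together in one data frame, as the question asks, the nested result can also be expanded into long format. The sketch below assumes tidyr 1.0 or later (for `unnest(cols = ...)`) and uses a small hypothetical grid; `params`, `nested`, `simulated`, and `long` are illustrative names. Depending on the tidyr version, `nest()` on a grouped data frame may return a grouped result, so `ungroup()` is added as a harmless precaution.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

# Hypothetical small grid: 4 simulations with n, sd and mean per row
params <- crossing(n = c(5, 10), agegroup = c("a", "b"), sd = 2) %>%
  mutate(mean = if_else(agegroup == "a", 1, 0),
         sim = row_number())

# One row per simulation; n, sd and mean end up inside the `data` list-column
nested <- params %>%
  group_by(sim, agegroup) %>%
  nest() %>%
  ungroup()

# `map` iterates over the one-row tibbles stored in `data`
simulated <- nested %>%
  mutate(data_points = map(data, ~ rnorm(.x$n, .x$mean, .x$sd)))

# Long format: one row per simulated value, parameters repeated alongside
long <- simulated %>%
  unnest(cols = data) %>%
  unnest(cols = data_points)

# Total rows should equal the total number of requested draws
stopifnot(nrow(long) == sum(params$n))
```

From `long`, per-simulation summaries can be computed with an ordinary `group_by(sim)` followed by `summarise()`.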

dipetkov answered Oct 21 '25 19:10


