I often find myself choosing the wrong variable names when using purrr.
For example, take the code on the GitHub page of purrr.
library(purrr)
mtcars %>%
split(.$cyl)
Where the code has split(.$cyl), I often make the mistake of writing split(cyl) instead. That seems the most obvious choice, as it is consistent with other tidyverse commands such as select(cyl).
My question is: why is the .$ needed in front of the variable name?
The . represents the data object piped in from the left-hand side, and $ extracts the column from it. The column can equally be extracted with [[:
mtcars %>%
split(.[['cyl']])
Within mutate/summarise/group_by/select/arrange etc. we can simply pass bare column names, because those dplyr verbs evaluate their arguments inside the data frame. split is different: it is a base R function that evaluates its arguments in the calling environment, so it cannot find the column 'cyl' unless we extract it from the dataset explicitly.
One option in the tidyverse is to nest all variables other than 'cyl' (nest comes from tidyr), i.e.
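To make the difference concrete, here is a minimal sketch using the built-in mtcars data. It assumes that no object named cyl exists in your workspace; the variable names (by_dollar, by_bracket, by_base, bare_name_fails) are only illustrative.

```r
library(magrittr)  # provides %>% (purrr and dplyr re-export the same pipe)

# With the pipe, the left-hand side becomes the first argument and `.`
# refers to that same object inside nested calls, so these are equivalent:
by_dollar  <- mtcars %>% split(.$cyl)
by_bracket <- mtcars %>% split(.[["cyl"]])
by_base    <- split(mtcars, mtcars$cyl)   # what the pipe expands to

identical(by_dollar, by_base)     # TRUE
identical(by_dollar, by_bracket)  # TRUE
names(by_dollar)                  # "4" "6" "8"

# A bare column name is looked up as an ordinary variable in the calling
# environment. Assuming no object called `cyl` exists there, this errors:
bare_name_fails <- inherits(try(split(mtcars, cyl), silent = TRUE), "try-error")
bare_name_fails                   # TRUE under the assumption above
```

The result is a named list of three data frames, one per value of cyl, which is the shape purrr functions such as map expect.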
mtcars %>%
nest(-cyl)
Now we have a list column named 'data', which holds the remaining columns as one data frame per value of 'cyl'.
With newer versions of dplyr (tested with 0.8.1), there is also group_split, as commented by @Moody_Mudskipper:
mtcars %>%
group_split(cyl)
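One practical difference from split is worth checking before swapping one for the other. The sketch below assumes dplyr 0.8.0 or later is installed; the names parts_base and parts_dplyr are only illustrative.

```r
library(dplyr)  # assumes dplyr >= 0.8.0, which introduced group_split()

parts_base  <- split(mtcars, mtcars$cyl)      # base R: named list of data frames
parts_dplyr <- mtcars %>% group_split(cyl)    # dplyr: list of tibbles

length(parts_base)    # 3
length(parts_dplyr)   # 3

# split() names each element after its group; group_split() does not,
# so code that indexes by name (e.g. parts[["4"]]) only works with split().
names(parts_base)     # "4" "6" "8"
names(parts_dplyr)    # NULL

# Both keep every row of the original data.
sum(vapply(parts_dplyr, nrow, integer(1)))    # 32
```

If the group labels are needed downstream, split(.$cyl) keeps them as list names, whereas with group_split they are available only through the cyl column inside each piece.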