I have a dataframe that needs to be split into individual files based on the value of a variable in the dataframe. There are scores of individuals and confidential information in the dataframe, thus a simplified example is below. I want the split to be based on the variable "first".
first <- c("Jon", "Bill", "Bill" , "Maria", "Ben", "Tina")
age <- c(23, 41, 41 , 32, 58, 26)
df <- data.frame(first , age)
df
For example, I want the file with Jon to have one line and the file with Bill to have two lines. I've attempted the following but I'm stuck. I don't know how to get individual dataframes from the list df.split.
library(tidyverse)
df.grped <-
df %>%
group_by(first)
df.split <-
group_split(df.grped)
So I would like to end up with separate dataframes named df.split_Jon, df.split_Bill, df.split_Maria, etc. The actual source file is large, so I don't want to specify each one by hand.
Since I understand working in tidyverse the best I'd like to have the solution there, if possible. Thanks for any help!!
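After splitting the data set by the values of the first column, we can use list2env() to create a separate dataframe for each subset in the global environment. Note that group_split() returns the groups ordered by key (Ben, Bill, Jon, ...), not in order of appearance, so naming them with unique(df$first) would attach the wrong labels. Taking the names from group_keys(), which uses the same order as group_split(), avoids that:
library(tidyverse)
df.grped <- df %>%
  group_by(first)
setNames(group_split(df.grped), paste0("df.split_", group_keys(df.grped)$first)) %>%
  list2env(envir = globalenv())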
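If the goal is only to reach each subset by name, a named list may be enough and avoids filling the global environment with objects. The following is a minimal sketch using the example data from the question; the names `grouped` and `by_first` are hypothetical and not part of the original post.

```r
library(dplyr)

# Example data from the question
first <- c("Jon", "Bill", "Bill", "Maria", "Ben", "Tina")
age <- c(23, 41, 41, 32, 58, 26)
df <- data.frame(first, age)

grouped <- df %>% group_by(first)

# group_keys() lists the keys in the same order as group_split(),
# so each name lines up with the matching piece.
by_first <- setNames(group_split(grouped), group_keys(grouped)$first)

names(by_first)          # one entry per distinct value of `first`
by_first[["Bill"]]       # the rows where first == "Bill"
nrow(by_first[["Jon"]])  # number of rows for Jon
```

Each element can then be handled with `by_first[["Bill"]]` or passed to functions such as `purrr::map()`, without naming every person explicitly.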
Another alternative:
library(tidyverse)
df %>%
  group_split(first) %>%
  # name each dataframe after the value of `first` in its first row
  walk(~ assign(str_c("df.split_", .x$first[1]), value = .x, envir = .GlobalEnv))
names(.GlobalEnv)
#> [1] "df.split_Bill" "first" "df.split_Maria" "df.split_Ben"
#> [5] "df.split_Tina" "age" "df.split_Jon" "df"
Created on 2022-01-01 by the reprex package (v2.0.1)
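Since the question also mentions individual files, the same split can be written to disk, one CSV per value of first. This is only a sketch under assumptions: the output folder `out_dir` (placed in a temporary directory here), the `.csv` format, and the helper names `grouped` and `pieces` are illustrative choices, not something stated in the original post.

```r
library(dplyr)
library(purrr)
library(readr)

# Example data from the question
first <- c("Jon", "Bill", "Bill", "Maria", "Ben", "Tina")
age <- c(23, 41, 41, 32, 58, 26)
df <- data.frame(first, age)

# Hypothetical output folder; replace with a real path as needed
out_dir <- file.path(tempdir(), "split_by_first")
dir.create(out_dir, showWarnings = FALSE)

grouped <- df %>% group_by(first)

# Name each piece by its group key (same order as group_split())
pieces <- setNames(group_split(grouped), group_keys(grouped)$first)

# iwalk() passes each dataframe as .x and its name as .y
iwalk(pieces, ~ write_csv(.x, file.path(out_dir, paste0("df.split_", .y, ".csv"))))

list.files(out_dir)
```

With the example data this writes one file per person, and the file for Bill contains both of his rows.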