Convert a single data frame into list of data frames (parsing column names into prefixes and suffixes)

Question

I am hoping to determine an efficient way to convert a single data frame into a list of data frames. Below is my reproducible MWE:

set.seed(1)
ABAge = runif(100)
ABPoints = rnorm(100)
ACAge = runif(100)
ACPoints = rnorm(100)
BCAge = runif(100)
BCPoints = rnorm(100)

A_B <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ABAge, Points = ABPoints)
A_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = ACAge, Points = ACPoints)
B_C <- data.frame(ID = as.character(paste0("ID", 1:100)), Age = BCAge, Points = BCPoints)
A_B$ID <- as.character(A_B$ID)
A_C$ID <- as.character(A_C$ID)
B_C$ID <- as.character(B_C$ID)

listFormat <- list("A_B" = A_B, "A_C" = A_C, "B_C" = B_C)

dfFormat <- data.frame(ID = as.character(paste0("ID", 1:100)), A_B.Age = ABAge, A_B.Points = ABPoints, A_C.Age = ACAge, A_C.Points = ACPoints, B_C.Age = BCAge, B_C.Points = BCPoints)
dfFormat$ID = as.character(dfFormat$ID)

This results in a data frame format (dfFormat) that looks like this:

'data.frame':   100 obs. of  7 variables:
 $ ID        : chr  "ID1" "ID2" "ID3" "ID4" ...
 $ A_B.Age   : num  0.266 0.372 0.573 0.908 0.202 ...
 $ A_B.Points: num  0.398 -0.612 0.341 -1.129 1.433 ...
 $ A_C.Age   : num  0.6737 0.0949 0.4926 0.4616 0.3752 ...
 $ A_C.Points: num  0.409 1.689 1.587 -0.331 -2.285 ...
 $ B_C.Age   : num  0.814 0.929 0.147 0.75 0.976 ...
 $ B_C.Points: num  1.474 0.677 0.38 -0.193 1.578 ...

and a list of data frames listFormat that looks like this:

List of 3
 $ A_B:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.266 0.372 0.573 0.908 0.202 ...
  ..$ Points: num [1:100] 0.398 -0.612 0.341 -1.129 1.433 ...
 $ A_C:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.6737 0.0949 0.4926 0.4616 0.3752 ...
  ..$ Points: num [1:100] 0.409 1.689 1.587 -0.331 -2.285 ...
 $ B_C:'data.frame':    100 obs. of  3 variables:
  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
  ..$ Age   : num [1:100] 0.814 0.929 0.147 0.75 0.976 ...
  ..$ Points: num [1:100] 1.474 0.677 0.38 -0.193 1.578 ...

I am hoping to come up with an automated way to convert the dfFormat to listFormat. As can be seen in the above objects there are two main conditions:

The column ID is always the first column in dfFormat and it is always the first column in each sublist of listFormat.
The number of sublists is equal to the number of unique column names in dfFormat before the underscore ('_'). In this case, that is three prefixes (ex. "A_B", "A_C", and "B_C"). These prefixes are also the names of the three sublists.
Within each sublist, it contains the number of columns that had the associated prefix ("A_B"). For each sublist, this was two ("Age" and "Points"). These suffixes are the names of the columns.

I asked the reverse question here (i.e. how to go from listFormat to dfFormat) and got some helpful answers that I am learning from. I need to have code to reverse both directions and it seems the reverse direction may need new types of code. I put my attempt below to show how I am stuck!

conUnd <- which(sapply(colnames(dfFormat), function(x) grepl("_", x)))
listName <- sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][1])
uListName <- unique(sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][1]))
listCol <- sapply(colnames(dfFormat[,conUnd]), function(x) strsplit(x, "[.]")[[1]][2])

listFormat = list()
for (i in 1:length(uListName)){
   [Gets messy here trying to define column names based on string variables]
}

Any advice would be greatly appreciated. I know my code is not efficient.

Ronak Shah · Accepted Answer

You can use split.default in base R -

output <- lapply(split.default(dfFormat[-1], sub("\..*", "",names(dfFormat[-1]))), 
          function(x) cbind(dfFormat[1], setNames(x, sub(".*\.", "", names(x)))))
str(output)

#List of 3
# $ A_B:'data.frame':   100 obs. of  3 variables:
#  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
#  ..$ Age   : num [1:100] 0.266 0.372 0.573 0.908 0.202 ...
#  ..$ Points: num [1:100] 0.398 -0.612 0.341 -1.129 1.433 ...
# $ A_C:'data.frame':   100 obs. of  3 variables:
#  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
#  ..$ Age   : num [1:100] 0.6737 0.0949 0.4926 0.4616 0.3752 ...
#  ..$ Points: num [1:100] 0.409 1.689 1.587 -0.331 -2.285 ...
# $ B_C:'data.frame':   100 obs. of  3 variables:
#  ..$ ID    : chr [1:100] "ID1" "ID2" "ID3" "ID4" ...
#  ..$ Age   : num [1:100] 0.814 0.929 0.147 0.75 0.976 ...
#  ..$ Points: num [1:100] 1.474 0.677 0.38 -0.193 1.578 ...

Convert a single data frame into list of data frames (parsing column names into prefixes and suffixes)

Tags:

list

dataframe

r

lavenderGem

1 Answers

Ronak Shah

Recent Activity

Donate For Us

Convert a single data frame into list of data frames (parsing column names into prefixes and suffixes)

Tags:

list

dataframe

r

lavenderGem

1 Answers

Ronak Shah

Related questions

Recent Activity

Donate For Us