I have a list of the following structure,
myList <- replicate(5, data.frame(id = 1:10, mean = runif(10)), simplify = FALSE)
and I want to reduce it with a merge
myList %>% reduce(function(x, y) merge(x, y, by = 'id'))
That, however, leads to the following colnames:
id mean.x mean.y mean.x mean.y mean
While I would like something like
id mean1 mean2 mean3 mean4 mean5
Where the numbers are based on the order of myList.
Obviously I could iterate over 1:length(myList), but I find this solution inelegant. Another option would be to introduce a check in the reducing function, but that would induce a new linear-time search for each element of the list, so I don't believe it would be very efficient.
Is there another way to achieve this?
New answer:
Using rbindlist and dcast from the data.table-package:
library(data.table)
mydata <- rbindlist(myList, idcol = 'df')
dcast(mydata, id ~ paste0('mean',df), value.var = 'mean')
Or with the tidyverse packages:
library(dplyr)
library(tidyr)
myList %>%
  bind_rows(.id = 'df') %>%
  spread(df, mean) %>%
  # funs() is deprecated in recent dplyr versions; a formula lambda does the same job
  rename_at(-1, ~ paste0('mean', .))
which both give (data.table-output is shown):
    id     mean1       mean2     mean3      mean4      mean5
 1:  1 0.6937674 0.005642891 0.4155868 0.74184186 0.54513885
 2:  2 0.3602352 0.569412043 0.8018570 0.29177043 0.34521060
 3:  3 0.6353133 0.512876032 0.8711914 0.44660086 0.16338451
 4:  4 0.2106574 0.555638598 0.8240744 0.37495213 0.57443740
 5:  5 0.9530160 0.059930577 0.0930678 0.39862717 0.91568414
 6:  6 0.3723244 0.598526326 0.4970844 0.01978011 0.07832631
 7:  7 0.2923137 0.712971846 0.3805590 0.25676592 0.11682605
 8:  8 0.6208868 0.426853621 0.5533876 0.64054247 0.78949419
 9:  9 0.9032609 0.274705843 0.3525957 0.46994429 0.32883110
10: 10 0.9707088 0.351394642 0.1089803 0.97969335 0.77791085
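On tidyr 1.0.0 or later, where spread() is superseded, the same reshaping can be sketched with pivot_wider(). This assumes a recent tidyr is installed and is not part of the original answer; the names_prefix argument adds the 'mean' prefix directly, so no separate renaming step is needed.

```r
library(dplyr)
library(tidyr)

# same structure as myList in the question
myList <- replicate(5, data.frame(id = 1:10, mean = runif(10)), simplify = FALSE)

wide <- myList %>%
  bind_rows(.id = "df") %>%   # stack the data frames, recording the list position in 'df'
  pivot_wider(names_from = df, values_from = mean, names_prefix = "mean")

names(wide)
```

The new columns appear in the order in which the values of df first occur, which here is the order of myList.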
When there are duplicates in id in one or more of the dataframes in myList, you have to adapt the dcast-step to dcast(mydata, id + rowid(id,df) ~ paste0('mean',df), value.var = 'mean') to get the correct outcome. Check the following example to see the result:
myList <- replicate(5, data.frame(id = sample(1:10, 10, TRUE), mean = runif(10)), simplify = FALSE)
mydata <- rbindlist(myList, idcol = 'df')
dcast(mydata, id + rowid(id,df) ~ paste0('mean',df), value.var = 'mean')
This also works when there are no duplicates in id.
The tidyverse-code has then to be adapted to:
myList %>%
  bind_rows(.id = 'df') %>%
  group_by(df, id) %>%
  mutate(ri = row_number()) %>%
  ungroup() %>%
  spread(df, mean) %>%
  # columns 1 and 2 are id and ri; funs() is deprecated, so a formula lambda is used
  rename_at(3:7, ~ paste0('mean', .))
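To decide which variant is needed, one can first check whether any data frame in the list repeats an id. A minimal base R sketch; has_dup_id is a hypothetical helper name used only for illustration, not a function from any package.

```r
# hypothetical helper: TRUE if at least one data frame in the list repeats an id
has_dup_id <- function(lst) {
  any(vapply(lst, function(d) anyDuplicated(d$id) > 0, logical(1)))
}

unique_ids   <- replicate(3, data.frame(id = 1:4, mean = runif(4)), simplify = FALSE)
repeated_ids <- list(data.frame(id = c(1, 1, 2), mean = runif(3)))

has_dup_id(unique_ids)    # FALSE
has_dup_id(repeated_ids)  # TRUE
```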
Old answer (still valid):
A possible solution:
# option 1
myList <- mapply(function(x, y) {names(x)[2] <- paste0('mean', y); x}, myList, seq_along(myList), SIMPLIFY = FALSE)
Reduce(function(x, y) merge(x, y, by = 'id'), myList)
# option 2 (quite similar to @zx8754's solution)
mydata <- Reduce(function(x, y) merge(x, y, by = 'id'), myList)
setNames(mydata, c('id', paste0('mean', seq_along(myList))))
which gives:
   id     mean1     mean2      mean3      mean4      mean5
1   1 0.1119114 0.4193226 0.86619590 0.52543072 0.52879193
2   2 0.4630863 0.8786721 0.02012432 0.77274088 0.09227344
3   3 0.9832522 0.4687838 0.49074271 0.01611625 0.69919423
4   4 0.7017467 0.7845002 0.44692958 0.64485570 0.40808345
5   5 0.6204856 0.1687563 0.54407165 0.54236973 0.09947167
6   6 0.1480965 0.7654041 0.43591864 0.22468554 0.84557988
7   7 0.0179509 0.3610114 0.45420122 0.20612154 0.76899342
8   8 0.9862083 0.5579173 0.13540519 0.97311401 0.13947602
9   9 0.3140737 0.2213044 0.05187671 0.07870425 0.23880332
10 10 0.4515313 0.2367271 0.65728768 0.22149073 0.90578043
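Since the question already uses reduce() from purrr, option 1 can also be sketched with imap(), which hands each element's position to the function as its second argument when the list is unnamed. This is a variant under the assumption that purrr is installed and that myList has no names (for a named list, .y would be the name instead of the position); it is not the original answer's code.

```r
library(purrr)

myList <- replicate(5, data.frame(id = 1:10, mean = runif(10)), simplify = FALSE)

merged <- myList %>%
  # for an unnamed list, .y is the integer position of .x
  imap(~ setNames(.x, c("id", paste0("mean", .y)))) %>%
  # the columns are already uniquely named, so merge() adds no .x/.y suffixes
  reduce(function(x, y) merge(x, y, by = "id"))

names(merged)
```

Renaming before merging keeps the reducing function free of any name lookup, which addresses the efficiency concern raised in the question.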
You can also modify the function passed to Reduce (or reduce) so that the indices are added automatically:
Reduce(function(x, y) {
  # indices of the columns other than the common key, in x and in y
  col_noby_x <- which(colnames(x) != "id")
  col_noby_y <- which(colnames(y) != "id")
  # largest numeric suffix already present in x; NA when no name ends in digits yet
  # (the pattern captures the whole trailing number, so "mean10" yields 10, not 0)
  ind_x <- max(suppressWarnings(as.numeric(sub(".*[^0-9]([0-9]+)$", "\\1", colnames(x)[col_noby_x]))))
  if (!is.na(ind_x)) {
    # x is already indexed: only y needs a suffix, one above the current maximum
    colnames(y)[col_noby_y] <- paste0(colnames(y)[col_noby_y], ind_x + 1)
  } else {
    # first merge: suffix x with 1 and y with 2
    colnames(x)[col_noby_x] <- paste0(colnames(x)[col_noby_x], 1)
    colnames(y)[col_noby_y] <- paste0(colnames(y)[col_noby_y], 2)
  }
  # finally merge
  merge(x, y, by = "id")
}, myList)
# id mean1 mean2 mean3 mean4 mean5
#1 1 0.10698388 0.0277198 0.5109345 0.8885772 0.79983437
#2 2 0.29750846 0.7951743 0.9558739 0.9691619 0.31805857
#3 3 0.07115142 0.2401011 0.8106464 0.5101563 0.78697618
#4 4 0.39564336 0.7225532 0.7583893 0.4275574 0.77151883
#5 5 0.55860511 0.4111913 0.8403031 0.4284490 0.51489116
#6 6 0.92191777 0.9142926 0.4708712 0.2451099 0.84142501
#7 7 0.08218166 0.2741819 0.6772842 0.7939364 0.86930336
#8 8 0.35392512 0.2088531 0.0801731 0.2734870 0.62963218
#9 9 0.64068537 0.8427225 0.1904426 0.2389339 0.73145206
#10 10 0.31304719 0.9898133 0.8173664 0.2013031 0.04658273