I need a base R solution to convert a nested list with different names to a data.frame:
mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z=list('k')))
convert(mylist)
## returns a data.frame:
##
##    a  b         z
##    1  2    <NULL>
##    3 NA    <NULL>
##   NA  5    <NULL>
##    9 NA <chr [1]>
I know this could easily be done with dplyr::bind_rows or with data.table::rbindlist with fill = TRUE (not ideal, though, since it fills character columns with NULL, not NA), but I really do need a solution in base R. To simplify the problem, a solution for a 2-level nested list that has no 3rd-level lists is also fine, such as
mylist <- list(list(a=1,b=2), list(a=3), list(b=5), list(a=9, z='k'))
convert(mylist)
## returns a data.frame:
##
##    a  b  z
##    1  2 NA
##    3 NA NA
##   NA  5 NA
##    9 NA  k
I have tried something like
convert <- function(L) as.data.frame(do.call(rbind, L))
This neither fills the missing values with NA nor adds the additional column z.
mylist here is just a simplified example. In reality I cannot assume the names of the sublist elements (a, b and z in the example), nor the lengths of the sublists (2, 1, 1, 2 in the example).
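To make the failure concrete, here is a small sketch that inspects what do.call(rbind, L) returns for the simplified list. It uses only base R; the variable name m is introduced here purely for illustration.

```r
mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))

# rbind() treats each sublist as a plain vector: values are matched by
# position, not by name, and shorter sublists are recycled.
m <- do.call(rbind, mylist)

dim(m)       # one row per sublist, one column per position in the longest sublist
colnames(m)  # column names come from the first sublist only
m[[3, 1]]    # the b value of the third sublist lands in the first column
```

So the z column never appears, and values can end up under the wrong name.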
Here are the assumptions for expected data.frame and the input mylist:
1. The number of columns of the expected data.frame is determined by the maximum length of the sublists, which can vary from 1 to several hundred. There is no explicit source of information about the length of each sublist (which names will appear or disappear in which sublist is unknown).
max(sapply(mylist, length)) <= 1000 ## ==> TRUE
2. The number of rows of the expected data.frame is determined by the length of mylist, which can vary from 1 to several thousand.
dplyr::between(length(mylist), 0, 10000) ## ==> TRUE
3. The structure of the expected data.frame can only be determined intrinsically from mylist.
4. The sublist elements are numeric, character or list. To simplify the problem, consider only numeric and character.
A shorter solution in base R would be
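# Each argument defaults to NA, so names missing from a sublist are filled in
make_df <- function(a = NA, b = NA, z = NA) {
  data.frame(a = unlist(a), b = unlist(b), z = unlist(z))
}
# Build one single-row data.frame per sublist, then stack them
do.call(rbind, lapply(mylist, function(x) do.call(make_df, x)))
#>    a  b    z
#> 1  1  2 <NA>
#> 2  3 NA <NA>
#> 3 NA  5 <NA>
#> 4  9 NA    k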
Update
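A more general solution using the same method, but which does not require specific names, would be:
build_data_frame <- function(obj) {
  # Union of all element names found across the sublists
  nms <- unique(unlist(lapply(obj, names)))
  # Formal arguments: one per name, each defaulting to NA
  frmls <- as.list(setNames(rep(NA, length(nms)), nms))
  # Unevaluated calls of the form unlist(<name>), one per column
  dflst <- setNames(lapply(nms, function(x) call("unlist", as.symbol(x))), nms)
  # Function whose body is do.call("data.frame", dflst)
  make_df <- as.function(c(frmls, call("do.call", "data.frame", dflst)))
  # Use the argument obj here, not the global mylist
  do.call(rbind, lapply(obj, function(x) do.call(make_df, x)))
}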
This allows
build_data_frame(mylist)
#>    a  b    z
#> 1  1  2 <NA>
#> 2  3 NA <NA>
#> 3 NA  5 <NA>
#> 4  9 NA    k
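The same idea (take the union of the names, then fill whatever is missing with NA) can also be sketched column by column, without generating a function on the fly. The helper name convert_by_column is made up for this illustration, and the sketch assumes the simplified case in which every sublist element is a single numeric or character value.

```r
# Hypothetical helper: builds each column directly from the list.
# Assumes every sublist element is a length-1 numeric or character value.
convert_by_column <- function(L) {
  # Union of all names, in order of first appearance
  nms <- unique(unlist(lapply(L, names)))
  # For each name, pull the value from every sublist, or NA when it is absent
  cols <- lapply(nms, function(n) {
    sapply(L, function(x) if (is.null(x[[n]])) NA else x[[n]])
  })
  as.data.frame(setNames(cols, nms), stringsAsFactors = FALSE)
}

mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))
res <- convert_by_column(mylist)
res
```

Because this fills whole columns at once instead of calling rbind on many one-row data.frames, it may scale better when mylist has thousands of sublists, although that has not been measured here.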
We can try the base R code below
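subset(
  Reduce(
    # Full outer join on all shared columns
    function(...) {
      merge(..., all = TRUE)
    },
    # One single-row data.frame per sublist, tagged with its position as id
    Map(
      function(k, x) cbind(id = k, list2DF(x)),
      seq_along(mylist), mylist
    )
  ),
  select = -id
)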
which gives
   a  b  z
1  1  2 NA
2  3 NA NA
3 NA  5 NA
4  9 NA  k
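To see how the pieces fit together, the intermediate steps can be looked at one at a time. This is only a sketch of the code above split into parts; the names pieces and first_two are introduced for illustration, and list2DF() assumes R 4.0.0 or later.

```r
mylist <- list(list(a = 1, b = 2), list(a = 3), list(b = 5), list(a = 9, z = "k"))

# Step 1: one single-row data.frame per sublist, each tagged with its position
pieces <- Map(
  function(k, x) cbind(id = k, list2DF(x)),
  seq_along(mylist), mylist
)
names(pieces[[4]])  # id plus the names present in the fourth sublist

# Step 2: a full outer join of the first two pieces; merge() joins on the
# columns they share (id and a), and the distinct id values keep the rows apart
first_two <- merge(pieces[[1]], pieces[[2]], all = TRUE)
first_two
```

Reduce() simply repeats step 2 across all pieces, and subset(..., select = -id) drops the helper column at the end.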