I have some data that is formatted in a way that's difficult to use, so I'm trying to flatten it out. The minimum reproducible example is here.
> str(sampleData)
List of 4
$ Events :'data.frame': 2 obs. of 3 variables:
..$ CateringOptions:List of 2
.. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. ..$ Agreed : logi TRUE
.. .. ..$ Tnc :'data.frame': 1 obs. of 5 variables:
.. .. .. ..$ Identity : chr "SpicyOWing"
.. .. .. ..$ Schema : logi NA
.. .. .. ..$ ElementId : chr "105031"
.. .. .. ..$ ElementType : logi NA
.. .. .. ..$ ElementVersion: logi NA
.. .. ..$ Address: chr "New York"
.. ..$ :'data.frame': 1 obs. of 3 variables:
.. .. ..$ Agreed : logi TRUE
.. .. ..$ Tnc :'data.frame': 1 obs. of 5 variables:
.. .. .. ..$ Identity : chr "BaconEggs"
.. .. .. ..$ Schema : logi NA
.. .. .. ..$ ElementId : chr "105032"
.. .. .. ..$ ElementType : logi NA
.. .. .. ..$ ElementVersion: logi NA
.. .. ..$ Address: chr "Seattle"
..$ Action : num [1:2] 1 1
..$ Volume : num [1:2] 1000 2000
$ Host :List of 5
..$ Identity : chr "John"
..$ Schema : logi NA
..$ ElementId : chr "101505"
..$ ElementType : logi NA
..$ ElementVersion: logi NA
$ Sender :List of 5
..$ Identity : chr "Jane"
..$ Schema : logi NA
..$ ElementId : chr "101005"
..$ ElementType : logi NA
..$ ElementVersion: logi NA
$ CompletedDate: chr "/Date(1490112000000)/"
Expected
> expectedOutcome
Events.CateringOptions.Agreed Events.CateringOptions.Tnc.Identity Events.CateringOptions.Tnc.Schema Events.CateringOptions.Tnc.ElementId
1 NA SpicyOWing TRUE 105031
2 NA BaconEggs TRUE 105032
Events.CateringOptions.Tnc.ElementType Events.CateringOptions.Tnc.ElementVersion Events.CateringOptions.Address Events.Action Events.Volume Host.Identity
1 NA NA New York 1 1000 John
2 NA NA Seattle 1 2000 John
Host.Schema Host.ElementId Host.ElementType Host.ElementVersion Sender.Identity Sender.Schema Sender.ElementId Sender.ElementType Sender.ElementVersion
1 NA 101505 NA NA Jane NA 101005 NA NA
2 NA 101505 NA NA Jane NA 101005 NA NA
CompletedDate
1 /Date(1490112000000)/
2 /Date(1490112000000)/
The check function
check<-function(li){
areDF<-sapply(1:length(li), function(i) class(li[[i]]) == "data.frame")
areList<-sapply(1:length(li), function(i) class(li[[i]]) == "list")
tmp1 <- NULL
tmp2 <- NULL
if(any(areDF)){
for(j in which(areDF)){
columns <- jsonlite::flatten(li[[j]])
li[[j]] <- check(columns)
}
tmp1<-plyr::rbind.fill(li[areDF])
#return(tmp1)
}
if(any(areList)){
for(j in which(areList)){
li[[j]]<-check(li[[j]])
}
tmp2<-do.call(cbind,li)
#return(tmp2)
}
if(!is.null(tmp1) & !is.null(tmp2)){
return (cbind(tmp1,tmp2))
}
else if(!is.null(tmp1)){
return (tmp1)
}
else if(!is.null(tmp2)){
return (tmp2)
}
return(li)
}
Results
> str(check(sampleData))
'data.frame': 2 obs. of 29 variables:
$ CateringOptions.Agreed : logi TRUE TRUE
$ CateringOptions.Address : chr "New York" "Seattle"
$ CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
$ CateringOptions.Tnc.Schema : logi NA NA
$ CateringOptions.Tnc.ElementId : chr "105031" "105032"
$ CateringOptions.Tnc.ElementType : logi NA NA
$ CateringOptions.Tnc.ElementVersion : logi NA NA
$ Action : num 1 1
$ Volume : num 1000 2000
$ Events.CateringOptions.Agreed : logi TRUE TRUE
$ Events.CateringOptions.Address : chr "New York" "Seattle"
$ Events.CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
$ Events.CateringOptions.Tnc.Schema : logi NA NA
$ Events.CateringOptions.Tnc.ElementId : chr "105031" "105032"
$ Events.CateringOptions.Tnc.ElementType : logi NA NA
$ Events.CateringOptions.Tnc.ElementVersion: logi NA NA
$ Events.Action : num 1 1
$ Events.Volume : num 1000 2000
$ Host.Identity : Factor w/ 1 level "John": 1 1
$ Host.Schema : logi NA NA
$ Host.ElementId : Factor w/ 1 level "101505": 1 1
$ Host.ElementType : logi NA NA
$ Host.ElementVersion : logi NA NA
$ Sender.Identity : Factor w/ 1 level "Jane": 1 1
$ Sender.Schema : logi NA NA
$ Sender.ElementId : Factor w/ 1 level "101005": 1 1
$ Sender.ElementType : logi NA NA
$ Sender.ElementVersion : logi NA NA
$ CompletedDate : Factor w/ 1 level "/Date(1490112000000)/": 1 1
I almost have it, but the nested dataframe is being duped. Also, my code takes fairly long. Does anyone have any idea how I can go about flattening this?
I added my solution in the end in the gist
Here is my take at it, with help from purrr.
The idea is similar to yours, only with a different syntax: flatten() the most nested dataframes, then rbind() them.
If I understand your code properly, mine is slightly different at the end, since I'll try to get a more "jsonlite::flatten-friendly" structure to apply it once more to the end result:
library(jsonlite)
library(purrr)
res <-
sampleData %>%
modify_if(
is.list,
.f = ~ modify_if(
.x,
.p = function(x) all(sapply(x, is.data.frame)),
.f = ~ do.call("rbind", lapply(.x, jsonlite::flatten))
)
) %>%
as.data.frame() %>%
jsonlite::flatten()
str(res)
# 'data.frame': 2 obs. of 20 variables:
# $ Events.Action : num 1 1
# $ Events.Volume : num 1000 2000
# $ Host.Identity : chr "John" "John"
# $ Host.Schema : logi NA NA
# $ Host.ElementId : chr "101505" "101505"
# $ Host.ElementType : logi NA NA
# $ Host.ElementVersion : logi NA NA
# $ Sender.Identity : chr "Jane" "Jane"
# $ Sender.Schema : logi NA NA
# $ Sender.ElementId : chr "101005" "101005"
# $ Sender.ElementType : logi NA NA
# $ Sender.ElementVersion : logi NA NA
# $ CompletedDate : chr "/Date(1490112000000)/" "/Date(1490112000000)/"
# $ Events.CateringOptions.Agreed : logi TRUE TRUE
# $ Events.CateringOptions.Address : chr "New York" "Seattle"
# $ Events.CateringOptions.Tnc.Identity : chr "SpicyOWing" "BaconEggs"
# $ Events.CateringOptions.Tnc.Schema : logi NA NA
# $ Events.CateringOptions.Tnc.ElementId : chr "105031" "105032"
# $ Events.CateringOptions.Tnc.ElementType : logi NA NA
# $ Events.CateringOptions.Tnc.ElementVersion: logi NA NA
I've got one mismatch with your expectedOutcome but if I may, it might be on your side:
all.equal(expectedOutcome[sort(names(expectedOutcome))], res[sort(names(res))])
# [1] "Component “Events.CateringOptions.Agreed”: 'is.NA' value mismatch: 0 in current 2 in target"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With