
How do I convert a hierarchical dataframe to a list in R?

If I had a hierarchical dataframe like this:

level_1<-c("a","a","a","b","c","c")
level_2<-c("flower","flower","tree","mushroom","dog","cat")
level_3<-c("rose","sunflower","pine",NA,"spaniel",NA)
level_4<-c("pink",NA,NA,NA,NA,NA)
df<-data.frame(level_1,level_2,level_3,level_4)

How do I convert this to a list that is ordered according to the hierarchy, like this:

> list
 [1] "a"         "flower"    "rose"      "pink"      "sunflower" "tree"      "pine"      "b"         "mushroom"  "c"        
[11] "dog"       "spaniel"   "c"         "cat"      

So for each value in level 1, it lists all the level 2 values, each expanded across the remaining levels. Hopefully that makes sense?

Thanks in advance!

EmmaH asked Oct 18 '25 21:10



2 Answers

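We can transpose the data frame, flatten it so the values are read row by row, drop the NAs, and keep the unique values: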

> unique(na.omit(c(t(df))))
 [1] "a"         "flower"    "rose"      "pink"      "sunflower" "tree"
 [7] "pine"      "b"         "mushroom"  "c"         "dog"       "spaniel"
[13] "cat"
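
To see why the one-liner works, it may help to run it one step at a time. The following is only a sketch that rebuilds the question's data frame; the intermediate names m, flat, kept and result are made up for illustration and are not part of the original answer.

```r
# Rebuild the data frame from the question
df <- data.frame(
  level_1 = c("a", "a", "a", "b", "c", "c"),
  level_2 = c("flower", "flower", "tree", "mushroom", "dog", "cat"),
  level_3 = c("rose", "sunflower", "pine", NA, "spaniel", NA),
  level_4 = c("pink", NA, NA, NA, NA, NA)
)

m      <- t(df)         # character matrix: one column per original row
flat   <- c(m)          # column-major flattening, i.e. df read row by row
kept   <- na.omit(flat) # drop the missing values
result <- unique(kept)  # keep only the first occurrence of each value

print(result)
```

Because the transposed matrix is flattened column by column, each original row contributes its values in level order, which is what produces the hierarchical ordering.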
ThomasIsCoding answered Oct 20 '25 12:10



In the question, "c" appears twice in the desired answer but "a" and "b" appear only once. We assume that this is an error and that each value should appear only once within its level 1 group.

uniq <- function(x) unique(na.omit(c(t(x))))
unname(unlist(by(df, df$level_1, uniq)))
##  [1] "a"         "flower"    "rose"      "pink"      "sunflower" "tree"     
##  [7] "pine"      "b"         "mushroom"  "c"         "dog"       "spaniel"  
## [13] "cat"

It could also be expressed using the native pipe and the lambda shorthand (both require R 4.1 or later):

uniq <- \(x) x |> t() |> c() |> na.omit() |> unique()
by(df, df$level_1, uniq) |> unlist() |> unname()

As one of the other answers points out, the same result could be obtained for this data using just uniq(df).
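
The two approaches agree on the question's data, but they can diverge when the same value occurs under more than one level_1 group. Here is a minimal sketch, assuming a hypothetical data frame df2 (not from the question) in which "flower" appears under both "a" and "b":

```r
# Hypothetical data: "flower" occurs under both "a" and "b"
df2 <- data.frame(
  level_1 = c("a", "b"),
  level_2 = c("flower", "flower"),
  level_3 = c("rose", NA)
)

uniq <- function(x) unique(na.omit(c(t(x))))

# De-duplicate across the whole data frame
global <- uniq(df2)

# De-duplicate within each level_1 group, then concatenate the groups
grouped <- unname(unlist(by(df2, df2$level_1, uniq)))

print(global)   # "a" "flower" "rose" "b"
print(grouped)  # "a" "flower" "rose" "b" "flower"
```

Which behaviour is wanted depends on whether a repeated child value should be listed again under each parent. Note also that by() processes the groups in the sorted order of level_1, whereas uniq(df2) follows the row order of the data frame; the two coincide here because the rows are already sorted.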

G. Grothendieck answered Oct 20 '25 11:10
