If I had a hierarchical dataframe like this:
level_1<-c("a","a","a","b","c","c")
level_2<-c("flower","flower","tree","mushroom","dog","cat")
level_3<-c("rose","sunflower","pine",NA,"spaniel",NA)
level_4<-c("pink",NA,NA,NA,NA,NA)
df<-data.frame(level_1,level_2,level_3,level_4)
How do I convert this into a single vector, ordered according to the hierarchy, like this:
> list
[1] "a" "flower" "rose" "pink" "sunflower" "tree" "pine" "b" "mushroom" "c"
[11] "dog" "spaniel" "c" "cat"
So for each value in level 1, it lists all the level 2 values, each expanded across the remaining levels. Hopefully that makes sense?
Thanks in advance!
We can try this
> unique(na.omit(c(t(df))))
[1] "a" "flower" "rose" "pink" "sunflower" "tree"
[7] "pine" "b" "mushroom" "c" "dog" "spaniel"
[13] "cat"
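The transpose is what produces the hierarchical order. A data frame is a list of columns, so flattening it directly reads all of level_1, then all of level_2, and so on. t() turns it into a character matrix whose columns are the original rows, and c() flattens a matrix column by column, so the values come out row by row. A minimal sketch comparing the two orders on the question's data (the stringsAsFactors = FALSE argument is an addition, assumed here so the sketch behaves the same on R versions before 4.0):

```r
level_1 <- c("a", "a", "a", "b", "c", "c")
level_2 <- c("flower", "flower", "tree", "mushroom", "dog", "cat")
level_3 <- c("rose", "sunflower", "pine", NA, "spaniel", NA)
level_4 <- c("pink", NA, NA, NA, NA, NA)
# stringsAsFactors = FALSE keeps the columns as character on older R versions
df <- data.frame(level_1, level_2, level_3, level_4, stringsAsFactors = FALSE)

# Column-wise flattening: all level_1 values first, then all level_2 values, ...
col_order <- unique(na.omit(unlist(df, use.names = FALSE)))

# Row-wise flattening: each row of df becomes a column of t(df),
# and c() reads a matrix column by column, so rows come out in order
row_order <- unique(na.omit(c(t(df))))

print(col_order)
print(row_order)
```

Only the row-wise version keeps each value next to its parent, which is why the answer transposes before flattening.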
In the question, "c" appears twice in the desired output, while "a" and "b" appear only once. We assume this is an error and that each value should appear only once.
uniq <- function(x) unique(na.omit(c(t(x))))
unname(unlist(by(df, df$level_1, uniq)))
## [1] "a" "flower" "rose" "pink" "sunflower" "tree"
## [7] "pine" "b" "mushroom" "c" "dog" "spaniel"
## [13] "cat"
It could also be expressed using pipes (the native |> pipe and the \(x) lambda shorthand require R 4.1.0 or later):
uniq <- \(x) x |> t() |> c() |> na.omit() |> unique()
by(df, df$level_1, uniq) |> unlist() |> unname()
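For readers less familiar with by(), the grouped step can be sketched with split() and lapply(), which makes the per-group work explicit: split() yields one data frame per level_1 value (in sorted order of those values), uniq() flattens each group row by row, and unlist() joins the pieces. This is an alternative formulation assumed to be equivalent for this task, not the answer's original code:

```r
level_1 <- c("a", "a", "a", "b", "c", "c")
level_2 <- c("flower", "flower", "tree", "mushroom", "dog", "cat")
level_3 <- c("rose", "sunflower", "pine", NA, "spaniel", NA)
level_4 <- c("pink", NA, NA, NA, NA, NA)
df <- data.frame(level_1, level_2, level_3, level_4, stringsAsFactors = FALSE)

# Flatten one group row by row, dropping NAs and repeats (as in the answer)
uniq <- function(x) unique(na.omit(c(t(x))))

# One data frame per level_1 value; groups come out in sorted order
groups <- split(df, df$level_1)

# Apply uniq() to each group and concatenate, discarding the group names
result <- unlist(lapply(groups, uniq), use.names = FALSE)
print(result)
```

Passing use.names = FALSE to unlist() plays the same role as the unname() call in the by() version.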
As one of the other answers points out, for this data frame the same result can be obtained with just uniq(df).
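The two approaches agree on the question's data, but they are not equivalent in general. uniq(df) keeps only the first occurrence of each value anywhere in the flattened data frame, while the by() version de-duplicates within each level_1 group. A small hypothetical data frame (df2, not from the question) in which "flower" sits under both "a" and "b" shows the difference:

```r
uniq <- function(x) unique(na.omit(c(t(x))))

# Hypothetical data: "flower" appears under two different level_1 values
df2 <- data.frame(
  level_1 = c("a", "b"),
  level_2 = c("flower", "flower"),
  level_3 = c("rose", "tulip"),
  stringsAsFactors = FALSE
)

# Global de-duplication: the second "flower" (under "b") is dropped
global <- uniq(df2)

# Per-group de-duplication: "flower" is kept under both "a" and "b"
grouped <- unname(unlist(by(df2, df2$level_1, uniq)))

print(global)
print(grouped)
```

If a child value can legitimately repeat under different parents, the grouped version is the one that preserves the full hierarchy.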