I am having difficulty replicating the functionality of a typical SQL GROUP_CONCAT function in dplyr. I would also like to make sure the ordering inside the groups can be controlled. Ideally I want to use the hadleyverse/tidyverse but base R or other packages will work too.
Example data:
ID name
1 apple
1 orange
2 orange
3 orange
3 apple
Desired output:
ID name
1 apple,orange
2 orange
3 apple,orange
Note that for ID=3, the ordering is in alpha order, not how the rows are ordered. I think this can probably be handled by doing an arrange first, but it would be nice to control inside the summarise statement or the like.
In R, we can use one of the group by operations.
library(dplyr)
df1 %>%
group_by(ID) %>%
summarise(name = toString(sort(unique(name))))
# ID name
# <int> <chr>
#1 1 apple, orange
#2 2 orange
#3 3 apple, orange
Or using data.table
library(data.table)
setDT(df1)[, .(name = toString(sort(unique(name)))), by = ID]
# ID name
#1: 1 apple, orange
#2: 2 orange
#3: 3 apple, orange
For base R use this
aggregate(data=df,name~ID,FUN = function(t) sort(paste(t)))
Data
df<-read.table(header = T,text = "ID name
1 apple
1 orange
2 orange
3 orange
3 apple")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With