In my table, some cells hold vectors instead of single values, i.e. the column is a list rather than an atomic vector:
dt1 <- data.table(
  colA=   c('A1','A2','A3'), 
  colB=list('B1',c('B2a','B2b'),'B3'),
  colC=   c('C1','C2','C3'), 
  colD=   c('D1','D2','D3')
)
dt1
#   colA    colB colC colD
#1:   A1      B1   C1   D1
#2:   A2 B2a,B2b   C2   D2
#3:   A3      B3   C3   D3 
I need to reshape it to long format by unlisting the column colB. So far I do it like this:
dt1[,.(colB=unlist(colB)),by=.(colA,colC,colD)]
#   colA colC colD colB
#1:   A1   C1   D1   B1
#2:   A2   C2   D2  B2a
#3:   A2   C2   D2  B2b
#4:   A3   C3   D3   B3
It does the job, but I don't like having to name all the other columns explicitly in by=. Is there a better way to do this?
(I'm sure this has already been answered elsewhere, but I couldn't find it so far.)
P.S. Ideally I would like to manage without any additional packages.
The unlist R function converts a list to a single vector.
Promoting my comment to an answer. Using:
dt1[,.(colB = unlist(colB)), by = setdiff(names(dt1), 'colB')]
gives:
#   colA colC colD colB
#1:   A1   C1   D1   B1
#2:   A2   C2   D2  B2a
#3:   A2   C2   D2  B2b
#4:   A3   C3   D3   B3
Or as an alternative (a slight variation of @Frank's proposal):
dt1[rep(dt1[,.I], lengths(colB))][, colB := unlist(dt1$colB)][]
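The one-liner above packs several steps together, so here is a rough step-by-step sketch of what it appears to do. The intermediate names n, idx and long are only illustrative and are not part of the original answer; seq_len(nrow(dt1)) is used in place of dt1[,.I] and should give the same row indices.

```r
library(data.table)

dt1 <- data.table(
  colA = c('A1', 'A2', 'A3'),
  colB = list('B1', c('B2a', 'B2b'), 'B3'),
  colC = c('C1', 'C2', 'C3'),
  colD = c('D1', 'D2', 'D3')
)

# number of elements held in each cell of the list column
n <- lengths(dt1$colB)              # 1 2 1

# repeat each row index once per element of its colB cell
idx <- rep(seq_len(nrow(dt1)), n)   # 1 2 2 3

# subsetting by the repeated indices duplicates the rows;
# at this point colB is still a list column
long <- dt1[idx]

# replace the list column with the flattened values,
# whose total length now matches the number of rows
long[, colB := unlist(dt1$colB)]
long[]
```

Because the rows are replicated by position rather than grouped, no other column names have to be listed, and the original column order is kept.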
I think @Jaap's is easiest, but here's another alternative to chew over:
#create ID column
dt1[ , ID := .I]
#unnest colB, keep ID column
dt_unnest = dt1[ , .(ID = rep(ID, lengths(colB)),
                     colB = unlist(colB))]
#merge
dt_unnest = dt_unnest[dt1[ , !'colB'], on = 'ID']
#    ID colB colA colC colD
# 1:  1   B1   A1   C1   D1
# 2:  2  B2a   A2   C2   D2
# 3:  2  B2b   A2   C2   D2
# 4:  3   B3   A3   C3   D3
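If this comes up often, the row-replication idea could be wrapped in a small helper so that only the list column has to be named. This is just a sketch: unnest_list_col is a hypothetical name, not a data.table function, and it assumes every cell of the list column holds at least one element (rows with empty cells would be dropped).

```r
library(data.table)

# Hypothetical helper, not part of data.table: expands one list column
# into long format without naming the remaining columns.
unnest_list_col <- function(dt, col) {
  stopifnot(is.data.table(dt), is.list(dt[[col]]))
  # replicate each row once per element of its list cell (returns a copy)
  out <- dt[rep(seq_len(nrow(dt)), lengths(dt[[col]]))]
  # overwrite the list column in the copy with the flattened values
  set(out, j = col, value = unlist(dt[[col]]))
  out
}

dt1 <- data.table(
  colA = c('A1', 'A2', 'A3'),
  colB = list('B1', c('B2a', 'B2b'), 'B3'),
  colC = c('C1', 'C2', 'C3'),
  colD = c('D1', 'D2', 'D3')
)

res <- unnest_list_col(dt1, 'colB')
res
```

The input table is left untouched, since the subset in the first step creates a new table before set() modifies it by reference.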