
make combinations from one column of a data.table and sums from corresponding

Tags: r, data.table

In:

   V1  V2
1:  A 0.6
2:  B 0.3
3:  C 0.1

Out (V1 are the combinations, and V2 their sum):

   V1  V2
1: AA 1.2
2: AB 0.9
3: AC 0.7
4: BA 0.9
5: BB 0.6
6: BC 0.4
7: CA 0.7
8: CB 0.4
9: CC 0.2

I currently do this with a double for loop, which seems slow. Is there a faster, more data.table-idiomatic way?

Script:

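library(data.table)

dtIn <- data.table(LETTERS[1:3], c(0.6, 0.3, 0.1))
dtOut <- list()
# Build one single-row data.table per (i, j) pair, then stack them.
# The "_" separator keeps list names unique once there are more than 9 rows.
for (i in seq_len(nrow(dtIn)))
  for (j in seq_len(nrow(dtIn)))
    dtOut[[paste(i, j, sep = "_")]] <- data.table(paste0(c(dtIn[i, V1], dtIn[j, V1]), collapse = ""),
                                                  dtIn[i, V2] + dtIn[j, V2])
dtOut <- rbindlist(dtOut)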
John Smith asked Feb 04 '26 00:02



2 Answers

You can use CJ to "cross join" each column with itself:

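# sorted = FALSE keeps the original row order; by default CJ sorts each input,
# which would misalign V1 and V2 here because V2 is in decreasing order.
dtIn[, lapply(.SD, function(x)
  Reduce(if (is.character(x)) paste0 else `+`, CJ(x, x, sorted = FALSE)))]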

This has a few advantages vs outer:

  • It doesn't involve coercion of a matrix (from outer) back to a vector.
  • It doesn't require repeatedly typing the var names.
  • If dtIn columns have names (other than the defaults), they will be preserved.
  • It can be extended fairly cleanly (e.g., CJ(x,x,x) to get three-way combinations).
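
As a rough illustration of the last point, here is a minimal sketch of the three-way case. The column names V1 and V2 and the variable dt3 are only chosen for this example, and sorted = FALSE is used on the assumption that the input order should be preserved so the two columns stay aligned.

```r
library(data.table)

dtIn <- data.table(V1 = LETTERS[1:3], V2 = c(0.6, 0.3, 0.1))

# Cross join each column with itself three times, then fold the three
# resulting columns together: paste0 for characters, `+` for numbers.
dt3 <- dtIn[, lapply(.SD, function(x)
  Reduce(if (is.character(x)) paste0 else `+`, CJ(x, x, x, sorted = FALSE)))]

nrow(dt3)    # 27 rows, one per three-way combination (3^3)
dt3$V1[1:4]  # "AAA" "AAB" "AAC" "ABA"
```

The last argument to CJ varies fastest, which is why the rows run AAA, AAB, AAC, ABA and so on.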



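In base R, this can be done with expand.grid, which is essentially the same as CJ except that it varies the first argument fastest, so the rows come out in a different order (AA, BA, CA, ...):

DF = data.frame(dtIn)
data.frame(lapply(DF, function(x) 
  Reduce(if (is.character(x)) paste0 else `+`, expand.grid(x,x))))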
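
If the row order from the question (AA, AB, AC, ...) matters, one possible adjustment is to swap the two expand.grid columns before combining them. This is only a sketch: the names DF, g, and res are illustrative, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version's default.

```r
DF <- data.frame(V1 = LETTERS[1:3], V2 = c(0.6, 0.3, 0.1),
                 stringsAsFactors = FALSE)

res <- data.frame(lapply(DF, function(x) {
  # expand.grid varies its first column fastest
  g <- expand.grid(x, x, stringsAsFactors = FALSE)
  # g[2:1] puts the slowest-varying column first before combining
  Reduce(if (is.character(x)) paste0 else `+`, g[2:1])
}), stringsAsFactors = FALSE)

res$V1  # "AA" "AB" "AC" "BA" "BB" "BC" "CA" "CB" "CC"
```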
Frank answered Feb 05 '26 14:02



The double-loop suggests the use of outer here:

dtIn[, list(outer(V1, V1, paste0), outer(V2, V2, "+"))]

#    V1  V2
# 1: AA 1.2
# 2: BA 0.9
# 3: CA 0.7
# 4: AB 0.9
# 5: BB 0.6
# 6: CB 0.4
# 7: AC 0.7
# 8: BC 0.4
# 9: CC 0.2
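
Note that the rows above come out in column-major order (AA, BA, CA, ...), not the order shown in the question. A minimal sketch of one way to get the question's order, assuming it matters, is to transpose each matrix before flattening it; the explicit column names and the variable dtOut are choices made for this example.

```r
library(data.table)

dtIn <- data.table(V1 = LETTERS[1:3], V2 = c(0.6, 0.3, 0.1))

# t() switches the flattening from column-major to row-major;
# as.vector() drops the matrix dimensions so each column is a plain vector.
dtOut <- dtIn[, list(V1 = as.vector(t(outer(V1, V1, paste0))),
                     V2 = as.vector(t(outer(V2, V2, `+`))))]

dtOut$V1  # "AA" "AB" "AC" "BA" "BB" "BC" "CA" "CB" "CC"
```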
agstudy answered Feb 05 '26 14:02
