I have a data set with individuals (ID) that can be part of more than one group.
Example:
library(data.table)
DT <- data.table(
  ID = rep(1:5, c(3:1, 2:3)),
  Group = c("A", "B", "C", "B",
            "C", "A", "A", "C",
            "A", "B", "C")
)
DT
#     ID Group
#  1:  1     A
#  2:  1     B
#  3:  1     C
#  4:  2     B
#  5:  2     C
#  6:  3     A
#  7:  4     A
#  8:  4     C
#  9:  5     A
# 10:  5     B
# 11:  5     C
I want to know the sum of identical individuals for 2 groups.
The result should look like this:
  Group.1    Group.2    Sum
    A           B        2
    A           C        3
    B           C        3
Where Sum indicates the number of individuals the two groups have in common.
Here's my version:
# size-1 IDs can't contribute; skip
DT[ , if (.N > 1) 
  # simplify = FALSE returns a list;
  #   transpose turns the 3-length list of 2-length vectors
  #   into a length-2 list of 3-length vectors (efficiently)
  transpose(combn(Group, 2L, simplify = FALSE)), by = ID
  ][ , .(Sum = .N), keyby = .(Group.1 = V1, Group.2 = V2)]
With output:
#    Group.1 Group.2 Sum
# 1:       A       B   2
# 2:       A       C   3
# 3:       B       C   3
As of version 1.9.8 (on CRAN 25 Nov 2016), data.table has gained the ability to do non-equi joins. So, a self non-equi join can be used:
library(data.table) # v1.9.8+
setDT(DT)[, Group:= factor(Group)]
DT[DT, on = .(ID, Group < Group), nomatch = 0L, .(ID, x.Group, i.Group)][
  , .N, by = .(x.Group, i.Group)]
x.Group i.Group N 1: A B 2 2: A C 3 3: B C 3
The non-equi join on ID, Group < Group is a data.table version of combn() (but applied group-wise):
DT[DT, on = .(ID, Group < Group), nomatch = 0L, .(ID, x.Group, i.Group)]
ID x.Group i.Group 1: 1 A B 2: 1 A C 3: 1 B C 4: 2 B C 5: 4 A C 6: 5 A B 7: 5 A C 8: 5 B C
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With