I have the following table.
library(data.table)
dt = data.table(id = 1:5, intMask = c(11,14,8,1,13), imprint = c("1011", "1110", "1000", "0001", "1101"), N = c(3,3,1,1,3), mass = c(.05,.1,.15,.3,.4))
   id intMask imprint N mass
1:  1      11    1011 3 0.05
2:  2      14    1110 3 0.10
3:  3       8    1000 1 0.15
4:  4       1    0001 1 0.30
5:  5      13    1101 3 0.40
Assume that the imprint column is the binary representation of a set (here, subsets of a set of cardinality 4). intMask is the integer corresponding to that binary representation, and N is the cardinality of the subset, i.e. the number of 1s in the representation.
I would like to add a column newMass that, for each row, sums the mass of all rows whose set is a superset of that row's set (the row itself included). I propose using the bitwAnd() function on column intMask to find the supersets efficiently.
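To make the encoding concrete, here is a minimal sketch of how intMask and N could be derived from imprint. It uses only base R; the variable names mirror the table's columns, and the derivation itself is an illustration, not something stated in the question.

```r
imprint <- c("1011", "1110", "1000", "0001", "1101")

# Read each bit string as a base-2 integer
intMask <- strtoi(imprint, base = 2L)

# Cardinality = number of 1s: drop the 0s and count what is left
N <- nchar(gsub("0", "", imprint, fixed = TRUE))
```

Both vectors match the intMask and N columns shown in the table above.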
for (i in 1:nrow(dt)) {
  i.intMask <- dt[i, intMask]
  i.N <- dt[i, N]
  dt[i, newMass := sum(dt[N >= i.N, ][bitwAnd(intMask, i.intMask) == i.intMask, mass])]
}
I.e. to get
dt[]
   id intMask imprint N mass newMass
1:  1      11    1011 3 0.05    0.05
2:  2      14    1110 3 0.10    0.10
3:  3       8    1000 1 0.15    0.70
4:  4       1    0001 1 0.30    0.75
5:  5      13    1101 3 0.40    0.40
Assume thousands of rows. Do you have an idea of how to do it efficiently? Preferably using data.table updating?
One option is a non-equi self-join on N, followed by a sum per id:
dt[
  dt[
    dt,
    c(
      id = .(i.id),
      newMass = .(mass * (bitwAnd(intMask, i.intMask) == i.intMask))
    ),
    on = .(N >= N)
  ][, lapply(.SD, sum), id],
  on = .(id)
]
which gives
      id intMask imprint     N  mass newMass
   <int>   <num>  <char> <num> <num>   <num>
1:     1      11    1011     3  0.05    0.05
2:     2      14    1110     3  0.10    0.10
3:     3       8    1000     1  0.15    0.70
4:     4       1    0001     1  0.30    0.75
5:     5      13    1101     3  0.40    0.40
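Another possible route, offered as an unbenchmarked sketch: build the full superset relation once with outer() and bitwAnd(), then get every sum from a single matrix product. The name isSuperset is made up for this illustration. The N >= N filter is dropped here because a superset always has at least as many elements, so it only served as a pre-filter.

```r
library(data.table)

dt <- data.table(
  id = 1:5,
  intMask = c(11, 14, 8, 1, 13),
  mass = c(.05, .1, .15, .3, .4)
)

m <- as.integer(dt$intMask)  # bitwAnd() operates on integer vectors

# isSuperset[i, j] is TRUE when the set of row j contains the set of row i
isSuperset <- outer(m, m, function(a, b) bitwAnd(a, b) == a)

# Row i of the product sums mass over all supersets of row i;
# multiplying by 1 turns the logical matrix into a numeric one
dt[, newMass := as.vector((isSuperset * 1) %*% mass)]
```

This still updates dt by reference through :=. The matrix has one cell per pair of rows, so memory grows quadratically with the row count; it may suit a few thousand rows, but it is worth timing against the join on your real data.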