I need to aggregate all my columns, but only if their values are 1, and group by an another columns.
Here is how I could perform this, step by step, i.e. column by column, from foo to out, the expected output.
set.seed(123)
foo <- data.table(id = rep(LETTERS[1:10], each = 10),
code = sample(c(111, 222, 333, 444), 100, replace = TRUE),
x1 = sample(0:1, 100, replace = TRUE),
x2 = sample(0:1, 100, replace = TRUE),
x3 = sample(0:1, 100, replace = TRUE),
x4 = sample(0:1, 100, replace = TRUE),
x5 = sample(0:1, 100, replace = TRUE),
x6 = sample(0:1, 100, replace = TRUE))
f1 <- foo[x1 == 1, .(x1 = uniqueN(code)), by = id]
f2 <- foo[x2 == 1, .(x2 = uniqueN(code)), by = id]
f3 <- foo[x3 == 1, .(x3 = uniqueN(code)), by = id]
setkey(f1, id)
setkey(f2, id)
setkey(f3, id)
out <- f1[f2,][f3,]
I'm sure there is an elegant and fast way to do this in data.table, maybe with a function and the .SDcols parameter, right?
But I can't find it...
Many thanks !!
With .SD and .SDcols:
foo[,lapply(.SD, function(x) uniqueN(code[x==1])), .SDcols = x1:x3, by = id]
id x1 x2 x3
1: A 3 3 2
2: B 4 4 4
3: C 2 2 3
4: D 4 3 3
5: E 4 4 3
6: F 1 3 3
7: G 3 2 2
8: H 4 2 3
9: I 3 3 3
10: J 3 2 2
Using tidyverse
library(dplyr)
foo %>%
group_by(id) %>%
summarise(across(x1:x3, ~ n_distinct(code[.==1])))
-output
# A tibble: 10 x 4
# id x1 x2 x3
# * <chr> <int> <int> <int>
# 1 A 3 3 2
# 2 B 4 4 4
# 3 C 2 2 3
# 4 D 4 3 3
# 5 E 4 4 3
# 6 F 1 3 3
# 7 G 3 2 2
# 8 H 4 2 3
# 9 I 3 3 3
#10 J 3 2 2
Or can use collapse
library(collapse)
collap(slt(foo, id, x1:x3), ~ id, fNdistinct)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With