Firstly I'm pretty sure this has been answered before but the search terms seem difficult to hit, apologies if there is a duplicate out there.
Say I have a vector of factors:
all <- factor(letters)
And I've gone on to use all combinations of those factor levels as part of a modelling pipeline:
combos <- t(combn(as.character(all), 5))
head(combos)
#     [,1] [,2] [,3] [,4] [,5]
# [1,] "a"  "b"  "c"  "d"  "e" 
# [2,] "a"  "b"  "c"  "d"  "f" 
# [3,] "a"  "b"  "c"  "d"  "g" 
# ...
My question is: How can I convert this second matrix to one showing presence/absence of all levels, like:
      a   b   c   d   e   f   g  ...
[1,]  1   1   1   1   1   0   0  ...
[2,]  1   1   1   1   0   1   0  ...
[3,]  1   1   1   1   0   0   1  ...
...
In terms of what I've tried, my first thought was a row-wise application of ifelse using apply, but I haven't been able to put anything workable together. Any smart way of doing this?
You can use matrix indexing to get even better speeds. Here is a much improved solution that does not use a for loop.
all <- factor(letters)
combos <- t(combn(as.character(all), 5))
A <- match(c(t(combos)), letters)
B <- 0:(length(A)-1) %/% 5 + 1
a <- unique(as.vector(combos))
x <- matrix(0, ncol = length(a), nrow = nrow(combos), 
            dimnames = list(NULL, a))
x[cbind(B, A)] <- 1L
orig <- function() {
  a <- unique(as.vector(combos))
  x <- matrix(0, ncol = length(a), nrow = nrow(combos), 
              dimnames = list(NULL, a))
  for (i in 1:nrow(combos)) {
    x[i, combos[i, ]] <- 1
  }
  x
}
new <- function() {
  A <- match(c(t(combos)), letters)
  B <- 0:(length(A)-1) %/% 5 + 1
  a <- unique(as.vector(combos))
  x <- matrix(0, ncol = length(a), nrow = nrow(combos), 
              dimnames = list(NULL, a))
  x[cbind(B, A)] <- 1L
  x
}
identical(orig(), new())
# [1] TRUE
library(microbenchmark)
microbenchmark(orig(), new(), times = 20)
# Unit: milliseconds
#    expr       min        lq    median       uq      max neval
#  orig() 476.85206 486.11091 497.48429 512.4333 579.2695    20
#   new()  87.02026  91.17021  96.88463 111.6414 175.6339    20
In a problem like this, a for loop would work just fine and can be easily preallocated:
a <- unique(as.vector(combos))
x <- matrix(0, ncol = length(a), nrow = nrow(combos), 
            dimnames = list(NULL, a))
for (i in 1:nrow(combos)) {
  x[i, combos[i, ]] <- 1
}
head(x)
#      a b c d e f g h i j k l m n o p q r s t u v w x y z
# [1,] 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [2,] 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [3,] 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [4,] 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [5,] 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [6,] 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Here's my attempt:
combos.out <- t(apply(combos, 1, function(x) table(factor(x, levels = letters))))
head(combos.out)
#      a b c d e f g h i j k l m n o p q r s t u v w x y z
# [1,] 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [2,] 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [3,] 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [4,] 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [5,] 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [6,] 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
WRT @Ananda Mahto's comment, the manipulation through transformation and factorisation definitely slows things down - a quick and dirty benchmark:
#Unit: milliseconds
#             expr        min         lq     median         uq        max neval
#   forfun(combos)   416.6027   534.6973   652.7919   718.4231   784.0544     3
# applyfun(combos) 13892.7020 15755.8570 17619.0121 22559.8271 27500.6421     3
Score one for the for loop!
A simple, and pretty efficient solution:
t(apply(combos,1,function(x){all %in% x}))*1
The for loop solution by Ananda Mahto is still about twice as fast:
      min       lq  median       uq      max neval
 561.2153 638.4648 643.439 650.7053 1199.857   100
versus
      min       lq   median       uq      max neval
 295.8798 305.0586 311.9961 370.6028 406.9336   100
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With