I have a matrix with 10M rows of integer values.
A row in this matrix can look as follows:
1 1 1 1 2
I need to transform the row above into a vector of counts, where element k is the number of times the value k occurs in the row:
4 1 0 0 0 0 0 0 0
Another example:
1 2 3 4 5
To:
1 1 1 1 1 0 0 0 0
How can I do this efficiently in R?
Update:
There is a function that does exactly what I need, base::tabulate (suggested here before), but it is extremely slow: it took at least 15 minutes to go over my initial matrix.
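For reference, here is what tabulate produces on the two example rows. This sketch assumes the values lie in 1..9, as implied by the 9-element target vectors; the nbins argument is what pads the result to length 9 when a row's maximum is smaller.

```r
# The two example rows from the question (values assumed to lie in 1..9)
r1 <- c(1L, 1L, 1L, 1L, 2L)
r2 <- c(1L, 2L, 3L, 4L, 5L)

# nbins = 9 forces a length-9 result regardless of the row's maximum
tabulate(r1, nbins = 9)  # 4 1 0 0 0 0 0 0 0
tabulate(r2, nbins = 9)  # 1 1 1 1 1 0 0 0 0

# Without nbins, the result length defaults to max(r2), i.e. 5
tabulate(r2)             # 1 1 1 1 1
```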
I would try something like this:
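# x: matrix of positive integers (values 1..max(x))
m <- nrow(x)
n <- ncol(x)
i.idx <- seq_len(m)
j.idx <- seq_len(n)
# one output row per input row, one column per possible value
out <- matrix(0L, m, max(x))
for (j in j.idx) {
  # (row, value) pairs for column j, used as a two-column matrix index
  ij <- cbind(i.idx, x[, j])
  # each row appears once per column, so every (i, value) cell is hit at most once here
  out[ij] <- out[ij] + 1L
}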
A for loop might seem surprising for a question that asks for an efficient implementation. However, the work inside the loop is vectorized over all rows of a given column, and the loop runs only five times (once per column). This is far faster than looping over 10 million rows with apply.
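Note that the loop above sizes the output with max(x), so a matrix whose largest value is 5 yields only 5 columns. A minimal sketch that wraps the same loop in a function with an explicit bin count follows; the name row_tabulate and its nbins argument are hypothetical, and it assumes all values lie in 1..nbins (larger values cause a subscript-out-of-bounds error).

```r
# Hypothetical wrapper around the column loop; assumes values lie in 1..nbins
row_tabulate <- function(x, nbins = max(x)) {
  m <- nrow(x)
  i.idx <- seq_len(m)
  out <- matrix(0L, m, nbins)
  for (j in seq_len(ncol(x))) {
    # (row, value) pairs: each row appears once per column, so no index repeats
    ij <- cbind(i.idx, x[, j])
    out[ij] <- out[ij] + 1L
  }
  out
}

# The two example rows from the question
x <- rbind(c(1, 1, 1, 1, 2),
           c(1, 2, 3, 4, 5))
res <- row_tabulate(x, nbins = 9)
res[1, ]  # 4 1 0 0 0 0 0 0 0
res[2, ]  # 1 1 1 1 1 0 0 0 0
```

Passing nbins = 9 reproduces the 9-element vectors from the question even though neither row contains a 9.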
Testing with:
# 10 million rows, 5 columns, values drawn from 1..9
n <- 1e7
m <- 5
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)
This approach takes less than six seconds, while a naive t(apply(x, 1, tabulate, 9)) takes close to two minutes.
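To check that the column loop and the apply version agree, and to compare their timings on your own machine, something like the sketch below could be used. It reuses the hypothetical row_tabulate wrapper (redefined here so the snippet runs on its own) and deliberately uses 1e5 rows instead of 1e7 so the apply version finishes quickly; no particular timings are implied.

```r
# Hypothetical wrapper around the column loop; assumes values lie in 1..nbins
row_tabulate <- function(x, nbins = max(x)) {
  i.idx <- seq_len(nrow(x))
  out <- matrix(0L, nrow(x), nbins)
  for (j in seq_len(ncol(x))) {
    ij <- cbind(i.idx, x[, j])
    out[ij] <- out[ij] + 1L
  }
  out
}

set.seed(1)
n <- 1e5  # smaller than 1e7 so the apply version stays fast
m <- 5
x <- matrix(sample(1:9, n * m, replace = TRUE), n, m)

t_loop  <- system.time(fast <- row_tabulate(x, nbins = 9))
t_apply <- system.time(slow <- t(apply(x, 1, tabulate, 9)))

# Both approaches should produce the same n x 9 count matrix
stopifnot(identical(dim(fast), dim(slow)), all(fast == slow))
t_loop["elapsed"]
t_apply["elapsed"]
```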