R: Summing up neighboring matrix elements. How to speed up?

Question

I'm working with large matrices of about 2500x2500x50 (lonxlatxtime). The matrix contains only 1 and 0. I need to know for each timestep the sum of the 24 surrounding elements. So far I did it about this way:

xdim <- 2500
ydim <- 2500
tdim <- 50
a <- array(0:1,dim=c(xdim,ydim,tdim))
res <- array(0:1,dim=c(xdim,ydim,tdim))

for (t in 1:tdim){
  for (x in 3:(xdim-2)){
    for (y in 3:(ydim-2)){
      res[x,y,t] <- sum(a[(x-2):(x+2),(y-2):(y+2),t])
    }
  }
}

This works, but it is much too slow for my needs. Has anybody please an advice how to speed up?

coatless · Accepted Answer

Intro

I have to say, there are so many hidden things behind just the setup of the arrays. The remainder of the problem is trivial though. As a result, there are two ways to go about it really:

Bruteforce given by @Alex (written in C++)
Observing replication patterns

Bruteforce with OpenMP

If we want to 'brute force' it, then we can use the suggestion given by @Alex to employ OpenMP with Armadillo

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// Add a flag to enable OpenMP at compile time
// [[Rcpp::plugins(openmp)]]

// Protect against compilers without OpenMP
#ifdef _OPENMP
  #include <omp.h>
#endif

// [[Rcpp::export]]
arma::cube cube_parallel(arma::cube a, arma::cube res, int cores = 1) {

  // Extract the different dimensions
  unsigned int tdim = res.n_slices;

  unsigned int xdim = res.n_rows;

  unsigned int ydim = res.n_cols;

  // Same calculation loop
  #pragma omp parallel for num_threads(cores)
  for (unsigned int t = 0; t < tdim; t++){
    // pop the T
    arma::mat temp_mat = a.slice(t);

    // Subset the rows
    for (unsigned int x = 2; x < xdim-2; x++){

      arma::mat temp_row_sub = temp_mat.rows(x-2, x+2);

      // Iterate over the columns with unit accumulative sum
      for (unsigned int y = 2; y <  ydim-2; y++){
        res(x,y,t) = accu(temp_row_sub.cols(y-2,y+2));
      }
    }
  }

  return res;
}

Replication Patterns

However, the smarter approach is understanding how the array(0:1, dims) is being constructed.

Most notably:

Case 1: If xdim is even, then only the rows of a matrix alternate.
Case 2: If xdim is odd and ydim is odd, then rows alternate as well as the matrices alternate.
Case 3: If xdim is odd and ydim is even, then only the rows alternate

Examples

Let's see the cases in action to observe the patterns.

Case 1:

xdim <- 2
ydim <- 3
tdim <- 2
a <- array(0:1,dim=c(xdim,ydim,tdim))

Output:

, , 1

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    1    1

, , 2

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    1    1    1

Case 2:

xdim <- 3
ydim <- 3
tdim <- 3
a <- array(0:1,dim=c(xdim,ydim,tdim))

Output:

, , 1

     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    0    1
[3,]    0    1    0

, , 2

     [,1] [,2] [,3]
[1,]    1    0    1
[2,]    0    1    0
[3,]    1    0    1

, , 3

     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    1    0    1
[3,]    0    1    0

Case 3:

xdim <- 3
ydim <- 4
tdim <- 2
a <- array(0:1,dim=c(xdim,ydim,tdim))

Output:

, , 1

     [,1] [,2] [,3] [,4]
[1,]    0    1    0    1
[2,]    1    0    1    0
[3,]    0    1    0    1

, , 2

     [,1] [,2] [,3] [,4]
[1,]    0    1    0    1
[2,]    1    0    1    0
[3,]    0    1    0    1

Pattern Hacking

Alrighty, based on the above discussion, we opt to make a bit of code the exploits this unique pattern.

Creating Alternating Vectors

An alternating vector in this case switches between two different values.

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// ------- Make Alternating Vectors

arma::vec odd_vec(unsigned int xdim){

  // make a temporary vector to create alternating 0-1 effect by row.
  arma::vec temp_vec(xdim);

  // Alternating vector (anyone have a better solution? )
  for (unsigned int i = 0; i < xdim; i++) {
    temp_vec(i) = (i % 2 ? 0 : 1);
  }

  return temp_vec;
}

arma::vec even_vec(unsigned int xdim){

  // make a temporary vector to create alternating 0-1 effect by row.
  arma::vec temp_vec(xdim);

  // Alternating vector (anyone have a better solution? )
  for (unsigned int i = 0; i < xdim; i++) {
    temp_vec(i) = (i % 2 ? 1 : 0); // changed
  }

  return temp_vec;
}

Creating the three cases of matrix

As mentioned above, there are three cases of matrix. The even, first odd, and second odd cases.

// --- Handle the different cases 

// [[Rcpp::export]]
arma::mat make_even_matrix(unsigned int xdim, unsigned int ydim){

  arma::mat temp_mat(xdim,ydim);

  temp_mat.each_col() = even_vec(xdim);

  return temp_mat;
}

// xdim is odd and ydim is even
// [[Rcpp::export]]
arma::mat make_odd_matrix_case1(unsigned int xdim, unsigned int ydim){

  arma::mat temp_mat(xdim,ydim);

  arma::vec e_vec = even_vec(xdim);
  arma::vec o_vec = odd_vec(xdim);

  // Alternating column 
  for (unsigned int i = 0; i < ydim; i++) {
    temp_mat.col(i) = (i % 2 ? o_vec : e_vec);
  }

  return temp_mat;
}

// xdim is odd and ydim is odd    
// [[Rcpp::export]]
arma::mat make_odd_matrix_case2(unsigned int xdim, unsigned int ydim){

  arma::mat temp_mat(xdim,ydim);

  arma::vec e_vec = even_vec(xdim);
  arma::vec o_vec = odd_vec(xdim);

  // Alternating column 
  for (unsigned int i = 0; i < ydim; i++) {
    temp_mat.col(i) = (i % 2 ? e_vec : o_vec); // slight change
  }

  return temp_mat;
}

Calculation Engine

Same as the previous solution, just without the t as we no longer need to repeat calculations.

// --- Calculation engine

// [[Rcpp::export]]
arma::mat calc_matrix(arma::mat temp_mat){

  unsigned int xdim = temp_mat.n_rows;

  unsigned int ydim = temp_mat.n_cols;

  arma::mat res = temp_mat;

  // Subset the rows
  for (unsigned int x = 2; x < xdim-2; x++){

    arma::mat temp_row_sub = temp_mat.rows(x-2, x+2);

    // Iterate over the columns with unit accumulative sum
    for (unsigned int y = 2; y <  ydim-2; y++){
      res(x,y) = accu(temp_row_sub.cols(y-2,y+2));
    }
  }

  return res;
}

Call Main Function

Here is the core function that pieces everything together. This gives us the desired distance arrays.

// --- Main Engine

// Create the desired cube information
// [[Rcpp::export]]
arma::cube dim_to_cube(unsigned int xdim = 4, unsigned int ydim = 4, unsigned int tdim = 3) {

  // Initialize values in A
  arma::cube res(xdim,ydim,tdim);

  if(xdim % 2 == 0){
    res.each_slice() = calc_matrix(make_even_matrix(xdim, ydim));
  }else{

    if(ydim % 2 == 0){

      res.each_slice() = calc_matrix(make_odd_matrix_case1(xdim, ydim));

    }else{

      arma::mat first_odd_mat = calc_matrix(make_odd_matrix_case1(xdim, ydim));

      arma::mat sec_odd_mat = calc_matrix(make_odd_matrix_case2(xdim, ydim));

      for(unsigned int t = 0; t < tdim; t++){
        res.slice(t) = (t % 2 ? sec_odd_mat : first_odd_mat);
      }

    }

  }

  return res;
}

Timing

Now, the real truth is how well does this perform:

Unit: microseconds
       expr      min        lq       mean    median        uq       max neval
    r_1core 3538.022 3825.8105 4301.84107 3957.3765 4043.0085 16856.865   100
 alex_1core 2790.515 2984.7180 3461.11021 3076.9265 3189.7890 15371.406   100
  cpp_1core  174.508  180.7190  197.29728  194.1480  204.8875   338.510   100
  cpp_2core  111.960  116.0040  126.34508  122.7375  136.2285   162.279   100
  cpp_3core   81.619   88.4485  104.54602   94.8735  108.5515   204.979   100
  cpp_cache   40.637   44.3440   55.08915   52.1030   60.2290   302.306   100

Script used for timing:

cpp_parallel = cube_parallel(a,res, 1)
alex_1core = alex(a,res,xdim,ydim,tdim)
cpp_cache = dim_to_cube(xdim,ydim,tdim)
op_answer = cube_r(a,res,xdim,ydim,tdim)

all.equal(cpp_parallel, op_answer)
all.equal(cpp_cache, op_answer)
all.equal(alex_1core, op_answer)

xdim <- 20
ydim <- 20
tdim <- 5
a <- array(0:1,dim=c(xdim,ydim,tdim))
res <- array(0:1,dim=c(xdim,ydim,tdim))


ga = microbenchmark::microbenchmark(r_1core = cube_r(a,res,xdim,ydim,tdim),
                                    alex_1core = alex(a,res,xdim,ydim,tdim),
                                    cpp_1core = cube_parallel(a,res, 1), 
                                    cpp_2core = cube_parallel(a,res, 2), 
                                    cpp_3core = cube_parallel(a,res, 3),
                                    cpp_cache = dim_to_cube(xdim,ydim,tdim))

Gabe · Answer

Here's one solution that's fast for a large array:

res <- apply(a, 3, function(a) t(filter(t(filter(a, rep(1, 5), circular=TRUE)), rep(1, 5), circular=TRUE)))
dim(res) <- c(xdim, ydim, tdim)

I filtered the array using rep(1,5) as the weights (i.e. sum values within a neighborhood of 2) along each dimension. I then modified the dim attribute since it initially comes out as a matrix.

Note that this wraps the sum around at the edges of the array (which might make sense since you're looking at latitude and longitude; if not, I can modify my answer).

For a concrete example:

xdim <- 500
ydim <- 500
tdim <- 15
a <- array(0:1,dim=c(xdim,ydim,tdim))

and here's what you're currently using (with NAs at the edges) and how long this example takes on my laptop:

f1 <- function(a, xdim, ydim, tdim){
  res <- array(NA_integer_,dim=c(xdim,ydim,tdim))
  for (t in 1:tdim){
    for (x in 3:(xdim-2)){
      for (y in 3:(ydim-2)){
        res[x,y,t] <- sum(a[(x-2):(x+2),(y-2):(y+2),t])
      }
    }
  }
  return(res)
}

system.time(res1 <- f1(a, xdim, ydim, tdim))
#   user  system elapsed
# 14.813   0.005  14.819

And here's a comparison with the version I described:

f2 <- function(a, xdim, ydim, tdim){
  res <- apply(a, 3, function(a) t(filter(t(filter(a, rep(1, 5), circular=TRUE)), rep(1, 5), circular=TRUE)))
  dim(res) <- c(xdim, ydim, tdim)
  return(res)
}

system.time(res2 <- f2(a, xdim, ydim, tdim))
#  user  system elapsed
# 1.188   0.047   1.236

You can see there's a significant speed boost (for large arrays). And to check that it's giving the correct solution (note that I'm adding NAs so both results match, since the one I gave filters in a circular manner):

## Match NAs
res2NA <- ifelse(is.na(res1), NA, res2)

all.equal(res2NA, res1)
# [1] TRUE

I'll add that your full array (2500x2500x50) took just under a minute (about 55 seconds), although it did use a lot of memory in the process, FYI.

R: Summing up neighboring matrix elements. How to speed up?

Tags:

r

matrix

rcpp

openmp

armadillo

Steffen

2 Answers

Intro

Bruteforce with OpenMP

Replication Patterns

Examples

Pattern Hacking

Creating Alternating Vectors

Creating the three cases of matrix

Calculation Engine

Call Main Function

Timing

coatless

Gabe

Recent Activity

Donate For Us

R: Summing up neighboring matrix elements. How to speed up?

Tags:

r

matrix

rcpp

openmp

armadillo

Steffen

2 Answers

Intro

Bruteforce with OpenMP

Replication Patterns

Examples

Pattern Hacking

Creating Alternating Vectors

Creating the three cases of matrix

Calculation Engine

Call Main Function

Timing

coatless

Gabe

Related questions

Recent Activity

Donate For Us