Sum the vectors stored in the list using Rcpp

Tags: list, r, vector, sum, rcpp

Suppose I have the following list of vectors:

List <- list(c(1:3), c(4:6), c(7:9))

To get the sum of each vector, I have the following Rcpp code:

library(Rcpp)

totalCpp <- "
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
List t_list(List r_list) {
  List results;
  for (int i = 0; i < r_list.size(); ++i) {
    NumericVector vec = as<NumericVector>(r_list[i]);
    double sum = 0;  // double, not int, so non-integer values are not truncated
    for (int j = 0; j < vec.size(); ++j) {
      sum += vec[j];
    }
    results.push_back(sum); // Add the sum to the results list
  }
  return results;
}
"
sourceCpp(code = totalCpp)

which returns the following

> t_list(List)
[[1]]
[1] 6

[[2]]
[1] 15

[[3]]
[1] 24

Is it possible to write this Rcpp code without two for loops, or is there a more elegant way to write it in Rcpp?

asked Oct 22 '25 15:10 by Cantor_Set


1 Answer

Rcpp has a built-in sum() (part of Rcpp sugar), so the inner loop is not needed:

library(inline)

builtin_sum <- cxxfunction(
  signature(r_list = "list"), 
  body = '
   List input_list(r_list);
   List results;
   for (int i = 0; i < input_list.size(); ++i) {
     NumericVector vec = as<NumericVector>(input_list[i]);
     double vec_sum = sum(vec);
     results.push_back(vec_sum);
   }
   return results;
 ', 
  plugin = "Rcpp")

That said, base R's lapply() already does this without any C++:

lapply(List, sum)

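To be more elegant and actually gain some performance, we can pre-allocate the results vector and assign into it directly instead of calling push_back(), which on an Rcpp vector allocates a new vector and copies all existing elements on every call. Note that this version returns a numeric vector rather than a list.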

improved_sum <- cxxfunction(
  signature(r_list = "list"),
  body = '
    List input_list(r_list);
    int n = input_list.size();
    NumericVector results(n);  // Pre-allocate numeric vector
                             
    for (int i = 0; i < n; ++i) {
      NumericVector vec = input_list[i];
      results[i] = sum(vec);  // Direct assignment, no push_back
    }
    return results;
    ', 
  plugin = "Rcpp")

Here's a benchmark:

set.seed(42)
large_list <- replicate(10000, sample(1:100, 50), simplify = FALSE)

microbenchmark::microbenchmark(
  lapply = lapply(large_list, sum),
  two_loops = t_list(large_list),   # t_list() is the two-loop function from the question
  builtin_sum = builtin_sum(large_list),
  improved = improved_sum(large_list),
  times = 100
) -> res

res
ggplot2::autoplot(res) +
  ggplot2::theme_bw()

Unit: milliseconds
       expr      min       lq       mean    median        uq      max neval cld
     lapply   2.4638   2.7633   3.224807   3.04370   3.51925   5.6379   100   a 
  two_loops 265.7754 307.4380 327.912011 320.43895 336.63080 631.5728   100   b
builtin_sum 273.9828 309.8691 328.088739 324.40175 336.75415 608.7544   100   b
   improved   1.5470   1.7755   2.390364   1.89355   2.12300  19.0634   100   a 

answered Oct 25 '25 04:10 by M--