Summing over all previous rows in large column efficiently

Question

I have a large data set (>100,000 rows) and would like to create a new column that sums all previous values of another column.

For a simulated data set test.data with 100,000 rows and 2 columns, I create the new vector that sums the contents of column 2 with:

sapply(1:100000, function(x) sum(test.data[1:x, 2]))

I append this vector to test.data later with cbind(). This is too slow, however. Is there a faster way to accomplish this, or a way to reference the vector that sapply is building from inside sapply, so I can just update the cumulative sum instead of performing the whole calculation again?

Mike H. · Accepted Answer

Per my comment above, it will be faster to do a direct assignment and use cumsum instead of sapply. cumsum was built for exactly this: it returns the running total of a vector in a single pass, rather than re-summing rows 1 through x for every x.

This should work:

test.data$sum <- cumsum(test.data[, 2])
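
To make the difference concrete, here is a minimal sketch comparing the two approaches. The simulated data frame below is an assumption (the original test.data is not shown), with hypothetical column names id and value and a smaller row count so the slow version finishes quickly.

```r
# Hypothetical stand-in for the asker's test.data: 2 columns, fewer rows.
set.seed(1)
n <- 2000
test.data <- data.frame(id = seq_len(n), value = rnorm(n))

# Original approach: re-sums rows 1..x for every x, so the work grows as n^2.
slow <- sapply(seq_len(n), function(x) sum(test.data[1:x, 2]))

# cumsum makes one pass over the column, carrying the running total forward.
fast <- cumsum(test.data[, 2])

# Compare the two results, allowing for floating-point rounding.
all.equal(slow, fast)

# Direct assignment adds the new column without a separate cbind() step.
test.data$sum <- fast
```

Wrapping each version in system.time() is one way to check the speed-up on your own data.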

Tags:

r

J.Streb
