Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Averaging an increasing number of columns of a dataframe

Tags:

r

I have a data frame (wc2) with 7 columns:

    cm5      cm10      cm15      cm20      cm25      cm30       run_time
1 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060        0
2 0.4084525 0.4028010 0.3617393 0.2595060 0.1294412 0.1220099        2
3 0.4087809 0.4042515 0.3711077 0.3119956 0.2241836 0.1290348        4
4 0.4088547 0.4045780 0.3732053 0.3218224 0.2611785 0.1720426        6
5 0.4088770 0.4046887 0.3739936 0.3255557 0.2739738 0.2081264        8
6 0.4088953 0.4047649 0.3744183 0.3273794 0.2798225 0.2273250       10

For every row (run_time) I want to average first the 1st column, then the 1st and 2nd columns, then the 1st, 2nd and 3rd columns and so on until the 6th column. The averaged result should be in a new column or a new data frame (I don't mind). I did it using the following code:

wc2$dia10 <- wc2$cm5
wc2$dia20 <- rowMeans(wc2[c("cm5", "cm10")])
wc2$dia30 <- rowMeans(wc2[c("cm5", "cm10", "cm15")])
wc2$dia40 <- rowMeans(wc2[c("cm5", "cm10", "cm15", "cm20")])
wc2$dia50 <- rowMeans(wc2[c("cm5", "cm10", "cm15", "cm20", "cm25")])
wc2$dia60 <- rowMeans(wc2[c("cm5", "cm10", "cm15", "cm20", "cm25", "cm30")])

From my basic knowledge of R I gather there is a much better way of doing that but I can't figure out how. Especially for when I'll have a bigger number of columns. I had a look at the answer for "Sum over and increasing number of columns of a data frame in R" but couldn't understand or apply it to my data.

Thanks for any help

like image 543
Jack Avatar asked Dec 12 '25 07:12

Jack


2 Answers

You can use Reduce with accumulate = TRUE argument as follows,

sapply(Reduce(c, 1:(ncol(df)-1), accumulate = TRUE)[-1], function(i) rowMeans(df[i]))

Or to get the exact output,

setNames(data.frame(df[1],sapply(Reduce(c, 1:(ncol(df)-1),accumulate = TRUE)[-1], function(i) 
           rowMeans(df[i]))), paste0('dia', seq(from = 10, to = ncol(df[-1])*10, by = 10)))

Or as @A5C1D2H2I1M1N2O1R2T1 suggests in comments,

do.call(cbind, setNames(lapply(1:6, function(x) rowMeans(df[1:x])),
                                        paste0("dia", seq(10, 60, 10)))

Both giving,

    dia10     dia20     dia30     dia40     dia50     dia60
1 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060
2 0.4084525 0.4056268 0.3909976 0.3581247 0.3123880 0.2806583
3 0.4087809 0.4065162 0.3947134 0.3740339 0.3440639 0.3082257
4 0.4088547 0.4067164 0.3955460 0.3771151 0.3539278 0.3236136
5 0.4088770 0.4067829 0.3958531 0.3782787 0.3574178 0.3325359
6 0.4088953 0.4068301 0.3960262 0.3788645 0.3590561 0.3371009

Or to add it to the original data frame, then,

cbind(df, setNames(lapply(1:6, function(x) rowMeans(df[1:x])),
                                    paste0("dia", seq(10, 60, 10))))
like image 74
Sotos Avatar answered Dec 13 '25 20:12

Sotos


Here is an alternative method with apply and cumsum. Using rowMeans is almost surely preferable, but this method runs through the calculation in one pass.

setNames(data.frame(t(apply(dat[1:6], 1, cumsum) / 1:6)),
         paste0("dia", seq(10, 60, 10)))
      dia10     dia20     dia30     dia40     dia50     dia60
1 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060
2 0.4084525 0.4056268 0.3909976 0.3581247 0.3123880 0.2806583
3 0.4087809 0.4065162 0.3947134 0.3740339 0.3440639 0.3082257
4 0.4088547 0.4067164 0.3955460 0.3771151 0.3539278 0.3236136
5 0.4088770 0.4067829 0.3958531 0.3782787 0.3574178 0.3325359
6 0.4088953 0.4068301 0.3960262 0.3788645 0.3590561 0.3371009

Using the smarter Reduce("+" with accumulate suggested by @alexis-laz, we could do

mapply("/", Reduce("+", dat[1:6], accumulate = TRUE), 1:6)

or to get a data.frame with the desired names

setNames(data.frame(mapply("/", Reduce("+", dat[1:6], accumulate = TRUE), 1:6)),
         paste0("dia", seq(10, 60, 10)))

The uglier code below follows the same idea, without mapply

setNames(data.frame(Reduce("+", dat[1:6], accumulate = TRUE)) /
                    rep(1:6, each=nrow(dat)), paste0("dia", seq(10, 60, 10)))
like image 34
lmo Avatar answered Dec 13 '25 21:12

lmo