I have a data frame (wc2) with 7 columns:
cm5 cm10 cm15 cm20 cm25 cm30 run_time
1 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060 0
2 0.4084525 0.4028010 0.3617393 0.2595060 0.1294412 0.1220099 2
3 0.4087809 0.4042515 0.3711077 0.3119956 0.2241836 0.1290348 4
4 0.4088547 0.4045780 0.3732053 0.3218224 0.2611785 0.1720426 6
5 0.4088770 0.4046887 0.3739936 0.3255557 0.2739738 0.2081264 8
6 0.4088953 0.4047649 0.3744183 0.3273794 0.2798225 0.2273250 10
For every row (run_time) I want to average first the 1st column, then the 1st and 2nd columns, then the 1st, 2nd and 3rd columns and so on until the 6th column. The averaged result should be in a new column or a new data frame (I don't mind). I did it using the following code:
wc2$dia10 <- wc2$cm5
wc2$dia20 <- rowMeans(wc2[c("cm5", "cm10")])
wc2$dia30 <- rowMeans(wc2[c("cm5", "cm10", "cm15")])
wc2$dia40 <- rowMeans(wc2[c("cm5", "cm10", "cm15", "cm20")])
wc2$dia50 <- rowMeans(wc2[c("cm5", "cm10", "cm15", "cm20", "cm25")])
wc2$dia60 <- rowMeans(wc2[c("cm5", "cm10", "cm15", "cm20", "cm25", "cm30")])
From my basic knowledge of R I gather there is a much better way of doing that but I can't figure out how. Especially for when I'll have a bigger number of columns. I had a look at the answer for "Sum over and increasing number of columns of a data frame in R" but couldn't understand or apply it to my data.
Thanks for any help
You can use Reduce with accumulate = TRUE argument as follows,
sapply(Reduce(c, 1:(ncol(df)-1), accumulate = TRUE)[-1], function(i) rowMeans(df[i]))
Or to get the exact output,
setNames(data.frame(df[1],sapply(Reduce(c, 1:(ncol(df)-1),accumulate = TRUE)[-1], function(i)
rowMeans(df[i]))), paste0('dia', seq(from = 10, to = ncol(df[-1])*10, by = 10)))
Or as @A5C1D2H2I1M1N2O1R2T1 suggests in comments,
do.call(cbind, setNames(lapply(1:6, function(x) rowMeans(df[1:x])),
paste0("dia", seq(10, 60, 10)))
Both giving,
dia10 dia20 dia30 dia40 dia50 dia60 1 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060 2 0.4084525 0.4056268 0.3909976 0.3581247 0.3123880 0.2806583 3 0.4087809 0.4065162 0.3947134 0.3740339 0.3440639 0.3082257 4 0.4088547 0.4067164 0.3955460 0.3771151 0.3539278 0.3236136 5 0.4088770 0.4067829 0.3958531 0.3782787 0.3574178 0.3325359 6 0.4088953 0.4068301 0.3960262 0.3788645 0.3590561 0.3371009
Or to add it to the original data frame, then,
cbind(df, setNames(lapply(1:6, function(x) rowMeans(df[1:x])),
paste0("dia", seq(10, 60, 10))))
Here is an alternative method with apply and cumsum. Using rowMeans is almost surely preferable, but this method runs through the calculation in one pass.
setNames(data.frame(t(apply(dat[1:6], 1, cumsum) / 1:6)),
paste0("dia", seq(10, 60, 10)))
dia10 dia20 dia30 dia40 dia50 dia60
1 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060 0.1221060
2 0.4084525 0.4056268 0.3909976 0.3581247 0.3123880 0.2806583
3 0.4087809 0.4065162 0.3947134 0.3740339 0.3440639 0.3082257
4 0.4088547 0.4067164 0.3955460 0.3771151 0.3539278 0.3236136
5 0.4088770 0.4067829 0.3958531 0.3782787 0.3574178 0.3325359
6 0.4088953 0.4068301 0.3960262 0.3788645 0.3590561 0.3371009
Using the smarter Reduce("+" with accumulate suggested by @alexis-laz, we could do
mapply("/", Reduce("+", dat[1:6], accumulate = TRUE), 1:6)
or to get a data.frame with the desired names
setNames(data.frame(mapply("/", Reduce("+", dat[1:6], accumulate = TRUE), 1:6)),
paste0("dia", seq(10, 60, 10)))
The uglier code below follows the same idea, without mapply
setNames(data.frame(Reduce("+", dat[1:6], accumulate = TRUE)) /
rep(1:6, each=nrow(dat)), paste0("dia", seq(10, 60, 10)))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With