Let's consider this simple dataset:
set.seed(12345)
df <- data.frame(a1 = rnorm(5), a2 = rnorm(5), a3 = rnorm(5),
b1 = rnorm(5), b2 = rnorm(5), b3 = rnorm(5),
c1 = rnorm(5), c2 = rnorm(5), c3 = rnorm(5))
which looks like this:
a1 a2 a3 b1 b2 b3 c1 c2 c3
1 0.5855288 -1.8179560 -0.1162478 0.8168998 0.7796219 1.8050975 0.8118732 0.49118828 1.1285108
2 0.7094660 0.6300986 1.8173120 -0.8863575 1.4557851 -0.4816474 2.1968335 -0.32408658 -2.3803581
3 -0.1093033 -0.2761841 0.3706279 -0.3315776 -0.6443284 0.6203798 2.0491903 -1.66205024 -1.0602656
4 -0.4534972 -0.2841597 0.5202165 1.1207127 -1.5531374 0.6121235 1.6324456 1.76773385 0.9371405
5 0.6058875 -0.9193220 -0.7505320 0.2987237 -1.5977095 -0.1623110 0.2542712 0.02580105 0.8544517
Now, I would like to get the row-wise mean of the columns whose names start with a given letter, for each letter specified in a vector.
So, for instance, if I have
cols <- c("a", "c")
I'd like to output a data frame with two columns (a and c) containing the row-wise means of the a1/a2/a3 and c1/c2/c3 columns respectively:
a c
1 -0.449558319 0.8105241
2 1.052292204 -0.1692037
3 -0.004953185 -0.2243752
4 -0.072480153 1.4457733
5 -0.354655514 0.3781747
I've been playing around with starts_with() and rowwise(), but I can't quite get the syntax right.
Use select to keep the columns whose names start with a or c (starts_with accepts the cols vector directly), then use split.default to split those columns into groups by prefix, and apply rowMeans to each group:
library(dplyr)
library(purrr)
select(df, starts_with(cols)) %>%
split.default(gsub("\\d", "", names(.))) %>%
map_dfc(rowMeans)
a c
1 -0.450 0.811
2 1.05 -0.169
3 -0.00495 -0.224
4 -0.0725 1.45
5 -0.355 0.378
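The same three steps (pick the matching columns, group them by prefix, average each row) can also be sketched in base R without dplyr or purrr. This is only an illustrative alternative, not part of the answer above, and it assumes that every prefix in cols matches at least one numeric column. The name res is introduced here for illustration.

```r
set.seed(12345)
df <- data.frame(a1 = rnorm(5), a2 = rnorm(5), a3 = rnorm(5),
                 b1 = rnorm(5), b2 = rnorm(5), b3 = rnorm(5),
                 c1 = rnorm(5), c2 = rnorm(5), c3 = rnorm(5))
cols <- c("a", "c")

# For each prefix, keep the columns whose names start with it
# and average them row by row.
res <- sapply(cols, function(p) {
  rowMeans(df[startsWith(names(df), p)])
})

# sapply returns a matrix with one column per prefix; convert it
# to a data frame with columns named after the prefixes.
res <- as.data.frame(res)
```

Because startsWith matches on the literal prefix, this version needs no regular expression to derive the group names, but it would also match a longer name that merely begins with the same letter.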
Note that the gsub part might need to be changed depending on the structure of your column names.
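To illustrate when the grouping pattern would need to change, here is a small sketch with hypothetical column names (grp1_1, grp1_2, grp2_1, grp2_2) in which the prefixes themselves contain digits. The data frame df2 and the "_<number>" suffix convention are assumptions made up for this example.

```r
# Hypothetical data: the prefixes themselves contain digits.
df2 <- data.frame(grp1_1 = c(1, 2), grp1_2 = c(3, 4),
                  grp2_1 = c(10, 20), grp2_2 = c(30, 40))

# gsub("\\d", "", ...) would collapse every name to "grp_",
# so strip only the trailing "_<number>" suffix instead.
keys <- sub("_\\d+$", "", names(df2))

# Split the columns by key and take row means within each group.
res2 <- as.data.frame(lapply(split.default(df2, keys), rowMeans))
```

Here keys is c("grp1", "grp1", "grp2", "grp2"), so the two groups stay separate; removing all digits would have merged them into one.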