I have many columns whose names always start with one of three prefixes: n_ for the number of students, score_ for the percent of students who passed, and loc_ for the room number.
I want to multiply each n_ column by its respective score_ column (so n_math * score_math, n_sci * score_sci, etc.) and create new columns called n_*_success for the number of students who passed the class.
If I had just a few columns like in this sample dataset, I would do something like this for each column:
mutate(n_sci_success = n_sci * score_sci)
But I have many columns and I'd like to write some expression that will match column names.
I think I have to use regex and across (like across(starts_with("n_"))), but I just can't figure it out. Any help would be much appreciated!
Here's a sample dataset:
library(tidyverse)
test <- tibble(id = c(1:4),
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))
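Here's one way using across and the new pick function from dplyr 1.1.0. Note that the code multiplies all three column groups (n_, score_ and loc_), matching columns by position; drop the last pick() term to multiply only n_ by score_.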
library(dplyr)
out <- test %>%
  mutate(across(starts_with('n_'), .names = 'res_{col}') *
           pick(starts_with('score_')) * pick(starts_with('loc_')))
out %>% select(starts_with('res'))
#  res_n_sci res_n_math res_n_hist
#      <dbl>      <dbl>      <dbl>
#1      10         200         10
#2      36          90         25
#3      67.5        56        108
#4     112          24         54
This should also work if you replace every pick with across. pick is useful for selecting columns; across is useful when you need to apply a function to the selected columns.
I use across in the first case (with starts_with('n_')) because I want to give the new columns unique names using .names, an argument that pick does not have.
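For the n_ times score_ product described in the question, the columns can also be paired by name rather than by position. The following is only a sketch under a stated assumption: every n_ column has a score_ column with the same suffix. It relies on dplyr's cur_column() together with base R's sub() and get(); the name out_success is made up for illustration.

```r
library(dplyr)

# Sample data from the question
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

# out_success is a hypothetical name used only in this sketch
out_success <- test %>%
  mutate(across(
    starts_with("n_"),
    # cur_column() is the name of the n_ column currently being processed;
    # swapping its prefix gives the name of the matching score_ column,
    # which get() then looks up among the data frame's columns
    ~ .x * get(sub("^n_", "score_", cur_column())),
    # assumption: the question's n_*_success naming scheme
    .names = "{.col}_success"
  ))

out_success %>% select(ends_with("_success"))
# n_sci_success:  10 18 22.5 28
# n_math_success: 50 30 28   24
# n_hist_success:  5 25 27   18
```

Because the score_ column is looked up by name, this does not depend on the n_ and score_ columns appearing in the same order in the data frame.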
pick() is very nice, thanks for sharing. Here is a way using reduce from the purrr package:
We first use split.default to get a list, then apply reduce via map_dfr:
library(purrr)
library(stringr)
test %>%
  split.default(str_remove(names(.), ".*_")) %>%
  map_dfr(reduce, `*`)
# A tibble: 4 × 4
   hist    id  math   sci
  <dbl> <int> <dbl> <dbl>
1    10     1   200  10
2    25     2    90  36
3   108     3    56  67.5
4    54     4    24 112
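To make the split-then-reduce idea easier to follow, here is a step-by-step sketch restricted to the n_ and score_ columns, so that each product is the number of students who passed. It is an illustration rather than a replacement for the pipeline above: the names wanted, groups and success are made up for this example, base R's sub() stands in for str_remove, and map() plus as_tibble() stand in for map_dfr().

```r
library(dplyr)
library(purrr)

# Sample data from the question
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

# Step 1: keep only the columns that should be multiplied
wanted <- select(test, starts_with("n_"), starts_with("score_"))

# Step 2: split the columns (not the rows) into one small data frame per
# subject, using the part of each name after the underscore as the group label
groups <- split.default(wanted, sub(".*_", "", names(wanted)))
names(groups)
# "hist" "math" "sci"

# Step 3: within each group, multiply the columns together element-wise
success <- as_tibble(map(groups, ~ reduce(.x, `*`)))

# Step 4 (optional): rename to the question's n_*_success scheme
names(success) <- paste0("n_", names(success), "_success")
success
# n_hist_success:  5 25 27   18
# n_math_success: 50 30 28   24
# n_sci_success:  10 18 22.5 28
```

The result can be attached to the original data with bind_cols(test, success) if the new columns are needed alongside the existing ones.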