How do I use mutate and across to multiply together columns that have similar names in R?

Question

I have many columns whose names always start with one of the same prefixes: n_ for the number of students, score_ for the percent of students who passed, and loc_ for the room number.

Here, I want to multiply each n_ column by its respective score_ column (so n_math * score_math, n_sci * score_sci, etc.) and create new columns called n_*_success for the number of students who passed each class.

If I had just a few columns like in this sample dataset, I would do something like this for each column:

mutate(n_sci_success = n_sci * score_sci)

But I have many columns and I'd like to write some expression that will match column names.

I think I have to use regex and across (like across(starts_with("n_"))), but I just can't figure it out. Any help would be much appreciated!

Here's a sample dataset:

library(tidyverse)

test <- tibble(id = c(1:4),
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

Ronak Shah · Accepted Answer

Here's one way using across and the new pick function from dplyr 1.1.0. Each n_ column is multiplied by the score_ and loc_ columns in the same position:

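library(dplyr)

# across() handles the n_ columns so that .names can label the results;
# pick() just selects the matching score_ and loc_ columns.
out <- test %>%
  mutate(across(starts_with('n_'), .names = 'res_{.col}') *
           pick(starts_with('score_')) * pick(starts_with('loc_')))

# Show only the newly created columns
out %>% select(starts_with('res'))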

#  res_n_sci res_n_math res_n_hist
#      <dbl>      <dbl>      <dbl>
#1      10          200         10
#2      36           90         25
#3      67.5         56        108
#4     112           24         54

This should also work if you replace every pick with across. pick is useful for selecting columns; across is useful when you need to apply a function to the selected columns.

I am using across in the first case (with starts_with('n_')) because I want to give the new columns unique names using .names, an argument that pick does not have.
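The question itself asks only for n_ times score_, with the results named n_*_success. A minimal sketch of that variant, assuming dplyr 1.1.0 or later and the sample data from the question. Passing identity as .fns is an illustrative choice, made only so that across has an explicit function while still applying .names:

```r
library(dplyr)

# Sample data copied from the question
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

# The n_ columns (renamed via .names) are multiplied position by position
# with the score_ columns; the loc_ columns are left out of the product.
out <- test %>%
  mutate(across(starts_with("n_"), .fns = identity, .names = "{.col}_success") *
           pick(starts_with("score_")))

out %>% select(ends_with("_success"))
```

The pairing is positional, so this relies on the n_ and score_ columns appearing in the same subject order in the data, as they do in the sample.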

TarJae · Answer

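pick() is very nice, thanks for sharing. Here is a way using reduce from the purrr package:

We first use split.default to split the columns into a list of data frames, one per name suffix (sci, math, hist, plus id on its own), then multiply the columns within each group with reduce via map_dfr: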

library(purrr)
library(stringr)
test %>%
  split.default(str_remove(names(.), ".*_")) %>% 
  map_dfr(reduce, `*`)

# A tibble: 4 × 4
   hist    id  math   sci
  <dbl> <int> <dbl> <dbl>
1    10     1   200  10  
2    25     2    90  36  
3   108     3    56  67.5
4    54     4    24 112
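In the result above, the id column is carried through as its own group and the loc_ columns take part in the product. A sketch that limits the same split-and-reduce idea to the n_ and score_ columns and names the results n_*_success, assuming the sample data from the question. The matches() pattern and the rename step are illustrative additions, and map() followed by as_tibble() stands in for map_dfr, which purrr 1.0.0 marks as superseded:

```r
library(dplyr)
library(purrr)
library(stringr)

# Sample data copied from the question
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

success <- test %>%
  # Keep only the columns that belong in the product
  select(matches("^(n|score)_")) %>%
  # Group the columns by subject suffix: sci, math, hist
  split.default(str_remove(names(.), "^(n|score)_")) %>%
  # Multiply the columns within each group
  map(reduce, `*`) %>%
  as_tibble() %>%
  # hist -> n_hist_success, and so on
  rename_with(~ paste0("n_", .x, "_success"))

# Attach the new columns to the original data
bind_cols(test, success)
```

Because the grouping is by suffix rather than by position, this version does not depend on the column order, although split.default returns the groups in alphabetical order (hist, math, sci).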

Tags: regex, r, dplyr, across, mutate

J.Sabree