
How do I mutate across multiple columns with similar names together in R?

I have many columns with similar names that always start with the same string: n_ for the number of students, score_ for the proportion of students who passed, and loc_ for the room number.

I want to multiply each n_ column by its respective score_ column (so n_math * score_math, n_sci * score_sci, etc.) and create new columns called n_*_success for the number of students who passed the class.

If I had just a few columns like in this sample dataset, I would do something like this for each column:

mutate(n_sci_success = n_sci * score_sci)

But I have many columns and I'd like to write some expression that will match column names.

I think I have to use regex and across (like across(starts_with("n_"))), but I just can't figure it out. Any help would be much appreciated!

Here's a sample dataset:

library(tidyverse)

test <- tibble(id = c(1:4),
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1),
               n_hist = c(10, 50, 30, 20),
               score_hist = c(.5, .5, .9, .9),
               loc_hist = c(2, 1, 4, 3))

J.Sabree asked Dec 13 '25 13:12


2 Answers

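Here's one way, using across and the new pick function from dplyr 1.1.0. Note that it multiplies the n_, score_ and loc_ columns together, matching them by position; to get only n_ * score_, drop the pick(starts_with('loc_')) term.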

library(dplyr)

out <- test %>%
  mutate(across(starts_with('n_'), .names = 'res_{col}') * 
           pick(starts_with('score_')) * pick(starts_with('loc_')))

out %>% select(starts_with('res'))

#  res_n_sci res_n_math res_n_hist
#      <dbl>      <dbl>      <dbl>
#1      10          200         10
#2      36           90         25
#3      67.5         56        108
#4     112           24         54

This should also work if you replace every pick with across. pick is useful for selecting columns, whereas across is useful when you need to apply a function to the selected columns.

I use across in the first case (with starts_with('n_')) because I want to give the new columns unique names using .names, which pick does not have.
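
For the product the question literally asks for (only n_ * score_, with the new columns named n_*_success), the same pattern might be adapted roughly as below. This is a sketch under a few assumptions: the n_ and score_ columns appear in the same subject order (the two selections are multiplied position by position, not by name), the sample data is trimmed to two subjects for brevity, and identity is passed to across only so that the columns are returned unchanged and .names can rename them.

```r
library(dplyr)

# Trimmed copy of the sample data from the question (two subjects only)
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1))

out <- test %>%
  # across() returns the n_ columns under their new names;
  # pick() returns the score_ columns in the same subject order,
  # and `*` multiplies the two data frames column by column
  mutate(across(starts_with("n_"), identity, .names = "{.col}_success") *
           pick(starts_with("score_")))

out %>% select(ends_with("_success"))
# n_sci_success:  10, 18, 22.5, 28
# n_math_success: 50, 30, 28,   24
```

Because the pairing is positional, a data set whose n_ and score_ columns are ordered differently would silently multiply the wrong pairs, so it is worth checking the column order first.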

Ronak Shah answered Dec 16 '25 05:12


pick() is very nice, thanks for sharing. Here is a way using reduce from the purrr package:

We first use split.default to get a list of data frames, one per column-name suffix, then apply reduce to each via map_dfr:

library(purrr)
library(stringr)
test %>%
  split.default(str_remove(names(.), ".*_")) %>% 
  map_dfr(reduce, `*`) 
# A tibble: 4 × 4
#    hist    id  math   sci
#   <dbl> <int> <dbl> <dbl>
# 1    10     1   200  10  
# 2    25     2    90  36  
# 3   108     3    56  67.5
# 4    54     4    24 112  
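
To see what split.default is grouping on, it can help to look at the key that str_remove produces. The sketch below prints that key and then, on the assumption that only n_ * score_ is wanted, selects those columns before splitting so that id and loc_ are left out of the product. The data is a trimmed two-subject copy of the sample, and map() followed by as_tibble() is used here in place of map_dfr().

```r
library(dplyr)
library(purrr)
library(stringr)

# Trimmed copy of the sample data from the question (two subjects only)
test <- tibble(id = 1:4,
               n_sci = c(10, 20, 30, 40),
               score_sci = c(1, .9, .75, .7),
               loc_sci = c(1, 2, 3, 4),
               n_math = c(100, 50, 40, 30),
               score_math = c(.5, .6, .7, .8),
               loc_math = c(4, 3, 2, 1))

# Grouping key: everything up to and including the last "_" is removed,
# so "id" (which has no underscore) stays as its own group
keys <- str_remove(names(test), ".*_")
keys
# "id" "sci" "sci" "sci" "math" "math" "math"

success <- test %>%
  select(starts_with(c("n_", "score_"))) %>%   # keep only the columns to multiply
  split.default(str_remove(names(.), ".*_")) %>% # one data frame per subject
  map(reduce, `*`) %>%                          # multiply the columns within each
  as_tibble()

success
# math: 50, 30, 28,   24
# sci:  10, 18, 22.5, 28
```

The result columns are named by subject and come out in alphabetical order, because split.default turns the key into a factor.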

TarJae answered Dec 16 '25 06:12


