Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding the differences of paired-columns using dplyr

Tags:

r

dplyr

set.seed(3)
library(dplyr)
dat <- tibble(Measure = c("Height","Weight","Width","Length"),
             AD1_1= rpois(4,10),
             AD1_2= rpois(4,9),
             AD2_1= rpois(4,10),
             AD2_2= rpois(4,9),
             AD3_1= rpois(4,10),
             AD3_2= rpois(4,9),
             AD4_1= rpois(4,10),
             AD4_2= rpois(4,9),
             AD5_1= rpois(4,10),
             AD5_2= rpois(4,9),
             AD6_1= rpois(4,10),
             AD6_2= rpois(4,9))

Suppose I have data that looks like this. I wish to calculate the difference for each AD, paired with underscored number, i.e., AD1diff, AD2diff,AD3diff.

Instead of writing

dat %>%
mutate(AD1diff = AD1_1 - AD1_2,
       AD2diff = AD2_1 - AD2_2,
...)

what would be an efficient way to write this?

like image 941
aiorr Avatar asked Oct 16 '25 01:10

aiorr


2 Answers

One dplyr option could be:

dat %>%
 mutate(across(ends_with("_1"), .names = "{col}_diff") - across(ends_with("_2"))) %>%
 rename_with(~ sub("_\\d+", "", .), ends_with("_diff"))

  Measure AD1_1 AD1_2 AD2_1 AD2_2 AD3_1 AD3_2 AD4_1 AD4_2 AD5_1 AD5_2 AD6_1 AD6_2 AD1_diff AD2_diff AD3_diff AD4_diff AD5_diff AD6_diff
  <chr>   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>    <int>    <int>    <int>    <int>    <int>    <int>
1 Height      6    10    10     3    12     8     7     5     7     5     8     9       -4        7        4        2        2       -1
2 Weight      8     9    13     6    14     7     8     7    13    11    10     9       -1        7        7        1        2        1
3 Width      10     9    11     5    12     8     7    11     9     5     5     6        1        6        4       -4        4       -1
4 Length      8     9     8     7     8    13     8     7     6    11    14     6       -1        1       -5        1       -5        8
like image 98
tmfmnk Avatar answered Oct 17 '25 16:10

tmfmnk


The "tidy" way to do this would be to convert your data from wide to long, do a grouped subtraction, and then go back to wide format:

library(tidyr)
dat_long = dat %>% pivot_longer(
  cols = starts_with("AD"),
  names_sep = "_",
  names_to = c("group", "obs")
) 

dat_long %>% head
# # A tibble: 48 x 4
#    Measure group obs   value
#    <chr>   <chr> <chr> <int>
#  1 Height  AD1   1         6
#  2 Height  AD1   2        10
#  3 Height  AD2   1        10
#  4 Height  AD2   2         3
#  5 Height  AD3   1        12
#  6 Height  AD3   2         8

dat_long %>%
  group_by(Measure, group) %>% 
  summarize(diff = value[obs == 1] - value[obs == 2]) %>%
  pivot_wider(names_from = "group", values_from = "diff") %>%
  rename_with(.fn = ~ paste0(., "diff"), .cols = starts_with("AD"))
# # A tibble: 4 x 7
# # Groups:   Measure [4]
#   Measure AD1diff AD2diff AD3diff AD4diff AD5diff AD6diff
#   <chr>     <int>   <int>   <int>   <int>   <int>   <int>
# 1 Height       -4       7       4       2       2      -1
# 2 Length       -1       1      -5       1      -5       8
# 3 Weight       -1       7       7       1       2       1
# 4 Width         1       6       4      -4       4      -1

like image 31
Gregor Thomas Avatar answered Oct 17 '25 15:10

Gregor Thomas