 

Switch column values in a data frame in R based on a combination of two columns

Tags:

dataframe

r

I have a question about changing character values in a data.frame according to combinations based on two columns. I will try to give an example of how the data.frame looks:

data <- data.frame(A1 = c("A", "T", "C"), A2 = c("C", "G", "T"), 
                   Ind1 = c("AA", "TG", "TT"), Ind2 = c("CA", "GT", "CT"),
                   Ind3 = c("AC", "GG", "TC"))

> data
  A1 A2 Ind1 Ind2 Ind3
1  A  C   AA   CA   AC
2  T  G   TG   GT   GG
3  C  T   TT   CT   TC

I want to change the values in columns Ind1 to Ind3 that don't match the possible combinations of columns A1 and A2. For example, in the first row A1 is A and A2 is C, so the possible combinations are AA, AC and CC (built from A1 and A2, in that order). Therefore Ind2 should be AC instead of CA.
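The rule above can be spelled out per row. A minimal base R sketch, assuming every Ind value is exactly two characters and the columns are character (the data.frame() default since R 4.0): for each row it builds the three valid genotypes (A1A1, A1A2, A2A2) and flags the cells that are not among them. The helper name valid_genotypes is hypothetical, used only for illustration.

```r
data <- data.frame(A1 = c("A", "T", "C"), A2 = c("C", "G", "T"),
                   Ind1 = c("AA", "TG", "TT"), Ind2 = c("CA", "GT", "CT"),
                   Ind3 = c("AC", "GG", "TC"))

# Hypothetical helper: the three valid genotypes for one pair of alleles,
# always written with A1 first
valid_genotypes <- function(a1, a2) c(paste0(a1, a1), paste0(a1, a2), paste0(a2, a2))

ind_cols <- grep("^Ind", names(data), value = TRUE)

# Logical matrix (one row per data row): TRUE where a cell is NOT valid for its row
invalid <- t(sapply(seq_len(nrow(data)), function(i) {
  !unlist(data[i, ind_cols]) %in% valid_genotypes(data$A1[i], data$A2[i])
}))
colnames(invalid) <- ind_cols
invalid
```

With the example data, only Ind2 in rows 1 and 2 (CA, GT) and Ind3 in row 3 (TC) are flagged, which are exactly the cells that differ in the desired output.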

The desired output would be this one:

> data
  A1 A2 Ind1 Ind2 Ind3
1  A  C   AA   AC   AC
2  T  G   TG   TG   GG
3  C  T   TT   CT   CT

I have tried using switch, but it doesn't work. Any help would be appreciated. Thanks

user2380782 asked Nov 01 '25 14:11



2 Answers

If I understand the question correctly, and assuming you only have two letters to deal with, there is only one case that needs editing: when the letters are in reverse order, i.e. 'A2A1'. All other cases are already correct, so you can handle this with a simple ifelse() inside mutate().

data <- data.frame(A1 = c("A", "T", "C"), A2 = c("C", "G", "T"), 
                   Ind1 = c("AA", "TG", "TT"), Ind2 = c("CA", "GT", "CT"),
                   Ind3 = c("AC", "GG", "TC"))

library(dplyr)

data |> 
  # in every Ind* column, swap the reversed pair (A2A1) back to A1A2
  mutate(across(starts_with("Ind"), ~ ifelse(.x == paste0(A2, A1), paste0(A1, A2), .x)))
#>   A1 A2 Ind1 Ind2 Ind3
#> 1  A  C   AA   AC   AC
#> 2  T  G   TG   TG   GG
#> 3  C  T   TT   CT   CT
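For comparison, the same "swap only the reversed pair" idea can be written without dplyr. A minimal base R sketch, assuming the Ind columns are character (the data.frame() default since R 4.0):

```r
data <- data.frame(A1 = c("A", "T", "C"), A2 = c("C", "G", "T"),
                   Ind1 = c("AA", "TG", "TT"), Ind2 = c("CA", "GT", "CT"),
                   Ind3 = c("AC", "GG", "TC"))

ind_cols <- startsWith(names(data), "Ind")
reversed <- paste0(data$A2, data$A1)  # the only wrong order, e.g. "CA" for A/C
correct  <- paste0(data$A1, data$A2)  # what it should become, e.g. "AC"

# Row-wise comparison: element i of each column is checked against row i only
data[ind_cols] <- lapply(data[ind_cols], function(x) ifelse(x == reversed, correct, x))
data
```

Because == compares element by element, each cell is only ever tested against the allele pair from its own row.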

In response to the OP's comments, using "real" data:


df2 <- structure(list(chr = "chr11", pos = "74565122", snp_id = "chr11_74565122_C_T_b38",
                      Allele1 = "C", Allele2 = "T",
                      GTEX_111CU = "TT", GTEX_111YS = "CT", GTEX_1122O = "TC",
                      GTEX_117XS = "TC", GTEX_117YX = "TC"),
                 class = "data.frame", row.names = c(NA, -1L))

df2
#>     chr      pos                 snp_id Allele1 Allele2 GTEX_111CU GTEX_111YS
#> 1 chr11 74565122 chr11_74565122_C_T_b38       C       T         TT         CT
#>   GTEX_1122O GTEX_117XS GTEX_117YX
#> 1         TC         TC         TC

mutate(df2, across(starts_with("GTEX"), ~ ifelse(.x %in% paste0(Allele2, Allele1), paste0(Allele1, Allele2), .x)))
#>     chr      pos                 snp_id Allele1 Allele2 GTEX_111CU GTEX_111YS
#> 1 chr11 74565122 chr11_74565122_C_T_b38       C       T         TT         CT
#>   GTEX_1122O GTEX_117XS GTEX_117YX
#> 1         CT         CT         CT
Peter answered Nov 03 '25 06:11



We could use a regex pattern to test the validity of the combo, then reverse the string if it is not valid:

library(dplyr)
library(stringr)
data |>
  mutate(across(starts_with("Ind"), \(x) ifelse(
    str_detect(x, pattern = sprintf("^%s{0,2}%s{0,2}$", A1, A2)),
    x,
    stringi::stri_reverse(x))
  ))
#   A1 A2 Ind1 Ind2 Ind3
# 1  A  C   AA   AC   AC
# 2  T  G   TG   TG   GG
# 3  C  T   TT   CT   CT
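To see what the regex is checking, it can help to print the per-row patterns that sprintf() builds and test them with base R's grepl(). A minimal sketch; note that a pattern such as ^A{0,2}C{0,2}$ would also accept strings of other lengths (e.g. "A" or "AAC"), so the approach relies on every genotype having exactly two characters.

```r
data <- data.frame(A1 = c("A", "T", "C"), A2 = c("C", "G", "T"))

# One anchored pattern per row: up to two A1s followed by up to two A2s
patterns <- sprintf("^%s{0,2}%s{0,2}$", data$A1, data$A2)
patterns

# Test every two-letter genotype against row 1's pattern (A1 = "A", A2 = "C")
candidates <- c("AA", "AC", "CA", "CC")
setNames(grepl(patterns[1], candidates), candidates)

# Row-wise: apply each row's pattern to that row's value (here the original Ind2)
mapply(grepl, patterns, c("CA", "GT", "CT"), USE.NAMES = FALSE)
```

Only the reversed pair (CA for row 1) fails its row's pattern, which is why reversing the failing strings gives the desired output here.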
Gregor Thomas answered Nov 03 '25 05:11



