I have a question about changing character values in a data.frame according to combinations based on two columns. Here is an example of what the data.frame looks like:
data <- data.frame(A1 = c("A", "T", "C"), A2 = c("C", "G", "T"), 
                   Ind1 = c("AA", "TG", "TT"), Ind2 = c("CA", "GT", "CT"),
                   Ind3 = c("AC", "GG", "TC"))
> data
  A1 A2 Ind1 Ind2 Ind3
1  A  C   AA   CA   AC
2  T  G   TG   GT   GG
3  C  T   TT   CT   TC
I want to change the values in columns Ind1 to Ind3 that don't match the possible combinations of columns A1 and A2. For example, in the first row A1 is A and A2 is C, so the possible combinations are AA, AC and CC (combinations built from A1 and A2, in that order). Therefore Ind2 should be AC instead of CA.
The desired output would be this one:
> data
  A1 A2 Ind1 Ind2 Ind3
1  A  C   AA   AC   AC
2  T  G   TG   TG   GG
3  C  T   TT   CT   CT
I have tried with switch but it doesn't work. Any help would be appreciated.
Thanks
If I understand the question correctly, and assuming each value is made of only two letters, there is only one case that needs editing: when the letters are in reverse order, i.e. A2 followed by A1. All other cases are already correct. So you could handle this with a simple ifelse() inside mutate().
data <- data.frame(A1 = c("A", "T", "C"), A2 = c("C", "G", "T"), 
                   Ind1 = c("AA", "TG", "TT"), Ind2 = c("CA", "GT", "CT"),
                   Ind3 = c("AC", "GG", "TC"))
library(dplyr)
data |> 
  mutate(across(starts_with("Ind"), ~ ifelse(.x == paste0(A2, A1), paste0(A1, A2), .x)))
#>   A1 A2 Ind1 Ind2 Ind3
#> 1  A  C   AA   AC   AC
#> 2  T  G   TG   TG   GG
#> 3  C  T   TT   CT   CT
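For readers who prefer not to load dplyr, the same row-wise swap can be sketched in base R. This is only an illustration under the same assumption (two-letter values, and the only invalid case is the reversed pair); the helper names `ind_cols`, `reversed` and `canonical` are made up for this example.

```r
# Same toy data as in the question; stringsAsFactors = FALSE keeps the
# columns as character vectors on older R versions too.
data <- data.frame(A1 = c("A", "T", "C"), A2 = c("C", "G", "T"),
                   Ind1 = c("AA", "TG", "TT"), Ind2 = c("CA", "GT", "CT"),
                   Ind3 = c("AC", "GG", "TC"),
                   stringsAsFactors = FALSE)

# Columns to fix (hypothetical helper names, not from the original answer)
ind_cols  <- grep("^Ind", names(data), value = TRUE)
reversed  <- paste0(data$A2, data$A1)  # the one invalid form, per row
canonical <- paste0(data$A1, data$A2)  # what it should be replaced with

# Compare each Ind column to the reversed pair, row by row, and swap on a match
data[ind_cols] <- lapply(data[ind_cols],
                         function(x) ifelse(x == reversed, canonical, x))
data
```

Because `x == reversed` compares element by element, each value is checked only against the alleles of its own row.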
In response to OP comments, using "real" data:
df2 <- structure(list(chr = "chr11", pos = "74565122", snp_id = "chr11_74565122_C_T_b38",
                      Allele1 = "C", Allele2 = "T", GTEX_111CU = "TT", GTEX_111YS = "CT",
                      GTEX_1122O = "TC", GTEX_117XS = "TC", GTEX_117YX = "TC"),
                 class = "data.frame", row.names = c(NA, -1L))
df2
#>     chr      pos                 snp_id Allele1 Allele2 GTEX_111CU GTEX_111YS
#> 1 chr11 74565122 chr11_74565122_C_T_b38       C       T         TT         CT
#>   GTEX_1122O GTEX_117XS GTEX_117YX
#> 1         TC         TC         TC
mutate(df2, across(starts_with("GTEX"), ~ ifelse(.x %in% paste0(Allele2, Allele1), paste0(Allele1, Allele2), .x)))
#>     chr      pos                 snp_id Allele1 Allele2 GTEX_111CU GTEX_111YS
#> 1 chr11 74565122 chr11_74565122_C_T_b38       C       T         TT         CT
#>   GTEX_1122O GTEX_117XS GTEX_117YX
#> 1         CT         CT         CT
We could use a regex pattern to test the validity of the combo, then reverse the string if it is not valid:
library(dplyr)
library(stringr)
data |>
  mutate(across(starts_with("Ind"), \(x) ifelse(
    str_detect(x, pattern = sprintf("^%s{0,2}%s{0,2}$", A1, A2)),
    x,
    stringi::stri_reverse(x))
  ))
#   A1 A2 Ind1 Ind2 Ind3
# 1  A  C   AA   AC   AC
# 2  T  G   TG   TG   GG
# 3  C  T   TT   CT   CT
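To see what the pattern accepts, here is a small check for the first row (A1 = "A", A2 = "C"), written with base `grepl()` rather than `str_detect()`. The names `pattern`, `candidates` and `is_valid` are illustrative only.

```r
# Pattern built the same way as in the answer, for A1 = "A" and A2 = "C"
pattern <- sprintf("^%s{0,2}%s{0,2}$", "A", "C")  # "^A{0,2}C{0,2}$"

# The three valid combinations plus the reversed one
candidates <- c("AA", "AC", "CC", "CA")

# TRUE where the string is zero to two A's followed by zero to two C's
is_valid <- grepl(pattern, candidates)
setNames(is_valid, candidates)
```

Only "CA" fails the test, so it is the only one that gets reversed. Note that reversing will not repair a value that is invalid for some other reason, such as a letter that matches neither A1 nor A2.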