
Replace NA values in R dataframe across multiple columns using truncated names of other columns [duplicate]

I have the following data frame (example):

myfile <- data.frame(C1=c(1,3,4,5),
                     C2=c(5,4,6,7),
                     C3=c(0,1,3,2),
                     C1_A=c(NA,NA,1,2),
                     C2_A=c(NA,9,8,7),
                     C3_A=c(NA,NA,NA,1))

I would like to replace every NA in the last three "_A" columns with the value from the same row of the matching column among C1 to C3. For example, C1_A should become 1, 3, 1, 2.

I tried the following line

myfile <- myfile %>% mutate(across(c(C1_A:C3_A), ~ if_else(is.na(.)==TRUE, eval(parse(text=str_replace(., "_A", ""))), .)))

but it does not work: every NA is replaced with the bottom-row value of the _A column. I also tried it with dplyr's rowwise option, still without success.

My real dataset has many column pairs like the ones in the example, so it doesn't make sense to mutate each one individually. How can I best resolve this?

JohnPat asked Nov 01 '25 02:11


2 Answers

An option with tidyverse:

library(dplyr)
library(stringr)

myfile %>%
 mutate(across(ends_with("_A"), ~ if_else(is.na(.), get(str_remove(cur_column(), "_A")), .)))

  C1 C2 C3 C1_A C2_A C3_A
1  1  5  0    1    5    0
2  3  4  1    3    9    1
3  4  6  3    1    8    3
4  5  7  2    2    7    1
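
The reason this works where the attempt in the question does not: inside across(), `.` holds the column's values, not its name, so the partner column has to be looked up through cur_column(). As a rough sketch of the same idea, coalesce() could stand in for the if_else()/is.na() pair, and anchoring the pattern as "_A$" strips only a trailing suffix. This assumes dplyr 1.0.0 or later and that every "_A" column has a partner of the same type; the name `patched` is only for illustration.

```r
library(dplyr)

myfile <- data.frame(C1=c(1,3,4,5),
                     C2=c(5,4,6,7),
                     C3=c(0,1,3,2),
                     C1_A=c(NA,NA,1,2),
                     C2_A=c(NA,9,8,7),
                     C3_A=c(NA,NA,NA,1))

patched <- myfile %>%
  mutate(across(ends_with("_A"),
                # keep the _A value where present, else fall back to the
                # partner column whose name is the current name minus "_A"
                ~ coalesce(.x, get(sub("_A$", "", cur_column())))))
patched
```

With the example data this should give the same table as above.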
tmfmnk answered Nov 02 '25 19:11


If the data frame consists of a set of complete columns followed by a matching set of incomplete columns in the same order, we could naively (1) locate the NA positions, (2) get the matching source positions by subtracting the number of columns in a set (3 here) from the column index, and (3) overwrite the NA positions with the source values:

myfile <- data.frame(C1=c(1,3,4,5),
                     C2=c(5,4,6,7),
                     C3=c(0,1,3,2),
                     C1_A=c(NA,NA,1,2),
                     C2_A=c(NA,9,8,7),
                     C3_A=c(NA,NA,NA,1))
# 1 - get NA locations
( na_idx <- src_idx <- which(is.na(myfile), arr.ind = TRUE) )
#>      row col
#> [1,]   1   4
#> [2,]   2   4
#> [3,]   1   5
#> [4,]   1   6
#> [5,]   2   6
#> [6,]   3   6

# 2 - update index col
src_idx[,2] <- src_idx[,2] - 3
src_idx
#>      row col
#> [1,]   1   1
#> [2,]   2   1
#> [3,]   1   2
#> [4,]   1   3
#> [5,]   2   3
#> [6,]   3   3

# 3 - update values
myfile[na_idx] <- myfile[src_idx]
myfile
#>   C1 C2 C3 C1_A C2_A C3_A
#> 1  1  5  0    1    5    0
#> 2  3  4  1    3    9    1
#> 3  4  6  3    1    8    3
#> 4  5  7  2    2    7    1

Created on 2025-10-08 with reprex v2.1.1
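
The index arithmetic above depends on the column order. If the real data does not keep the two sets side by side, a name-based variant in base R might be safer. This is only a sketch, assuming every incomplete column is named like its source column plus an "_A" suffix; `patch_cols`, `src_cols` and `miss` are illustrative names.

```r
myfile <- data.frame(C1=c(1,3,4,5),
                     C2=c(5,4,6,7),
                     C3=c(0,1,3,2),
                     C1_A=c(NA,NA,1,2),
                     C2_A=c(NA,9,8,7),
                     C3_A=c(NA,NA,NA,1))

# columns to patch, and their sources found by dropping the "_A" suffix
patch_cols <- grep("_A$", names(myfile), value = TRUE)
src_cols   <- sub("_A$", "", patch_cols)

for (i in seq_along(patch_cols)) {
  # rows where the incomplete column is missing
  miss <- is.na(myfile[[patch_cols[i]]])
  # copy the same-row values from the matching source column
  myfile[[patch_cols[i]]][miss] <- myfile[[src_cols[i]]][miss]
}
myfile
```

Matching by name means the loop does not care where the columns sit in the data frame, at the cost of relying on the naming convention.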

margusl answered Nov 02 '25 18:11