
Is there an elegant way to replace NAs with values from a corresponding column, for multiple columns, in R?

Tags: r, dplyr

I'm working with a data frame of trial participants' blood test results, with some sporadic missing values (the analyte failed). Fortunately we have two time points quite close together, so for missing values at timepoint 1, I'm hoping to impute the corresponding value from timepoint 2. Is there an elegant way to code this in R/tidyverse for multiple test results?

Here is some sample data:

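# Opening line reconstructed: the code below needs an ID column, but the ID values here are assumed
test <- data.frame(ID = rep(1:5, times = 2),
           timepoint = c(1,1,1,1,1,2,2,2,2,2),
           fst_test = c(NA, sample(1:40, 9, replace = FALSE)),
           scd_test = c(sample(1:20, 8, replace = FALSE), NA, NA))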

So far I have been pivoting wider, then manually coalescing the corresponding test results, like so:

test %>% 
  pivot_wider(names_from = timepoint, 
              values_from = fst_test:scd_test) %>%
  mutate(fst_test_imputed = coalesce(fst_test_1, fst_test_2),
         scd_test_imputed = coalesce(scd_test_1, scd_test_2)) %>% 
  select(ID, fst_test_imputed, scd_test_imputed)

However, for 15 tests this is cumbersome. Is there an elegant R/dplyr solution for this situation?

Many thanks in advance for your help!!

agun7 asked Oct 15 '25 16:10


1 Answer

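We can use fill after creating a grouping column with rowid on 'timepoint' (the OP wants each NA replaced with the corresponding data point from the other timepoint). rowid numbers the rows within each timepoint, so the n-th row of timepoint 1 lands in the same group as the n-th row of timepoint 2; this relies on both timepoints listing the participants in the same order. Then we only need fill with .direction = "updown", which fills each NA from the value below it in its group first and, if there is none, from the value above. If only the NAs in 'timepoint' 1 should be filled, change to .direction = "up".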

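library(dplyr)
library(tidyr)
library(data.table)  # provides rowid()

test %>%
    # pair the n-th row of timepoint 1 with the n-th row of timepoint 2
    group_by(grp = rowid(timepoint)) %>%
    # within each pair, fill an NA from the other row
    fill(fst_test, scd_test, .direction = "updown") %>%
    ungroup() %>%
    # drop the helper grouping column
    select(-grp)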
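
To make the effect of .direction concrete, here is a small sketch on toy data. The data frame toy, its column val, and the helper fill_pairs are all hypothetical names made up for this illustration; they are not part of the question's data.

```r
library(dplyr)
library(tidyr)
library(data.table)  # provides rowid()

# rowid() restarts its count for each distinct value
rowid(c(1, 1, 1, 2, 2, 2))
# 1 2 3 1 2 3

# Hypothetical toy data: two participants measured at two timepoints
toy <- data.frame(timepoint = c(1, 1, 2, 2),
                  val       = c(NA, 5, 7, NA))

# Hypothetical helper: pair rows across timepoints, then fill within each pair
fill_pairs <- function(df, direction) {
  df %>%
    group_by(grp = rowid(timepoint)) %>%
    fill(val, .direction = direction) %>%
    ungroup() %>%
    select(-grp)
}

up_only <- fill_pairs(toy, "up")      # only timepoint 1 can be filled
both    <- fill_pairs(toy, "updown")  # either timepoint can be filled
```

Here up_only$val is 7, 5, 7, NA (the NA at timepoint 2 is left alone), while both$val is 7, 5, 7, 5.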

data

test <- structure(list(
    timepoint = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
    fst_test = c(NA, 16L, 30L, 29L, 14L, 32L, 21L, 20L, 3L, 23L),
    scd_test = c(18L, 17L, 8L, 20L, 1L, 10L, 14L, 19L, NA, NA)),
    class = "data.frame", row.names = c(NA, -10L))
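
For the 15-test case mentioned in the question, the columns probably do not have to be listed one by one, since fill accepts tidyselect helpers. The following is only a sketch under two assumptions: every test column ends in "_test" (a convention inferred from the sample data, not confirmed for the real data), and rows appear in the same participant order within each timepoint. It also swaps data.table's rowid for a dplyr-only row_number() within 'timepoint', which should give the same grouping.

```r
library(dplyr)
library(tidyr)

# Sample data from the answer
test <- data.frame(
  timepoint = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  fst_test  = c(NA, 16L, 30L, 29L, 14L, 32L, 21L, 20L, 3L, 23L),
  scd_test  = c(18L, 17L, 8L, 20L, 1L, 10L, 14L, 19L, NA, NA)
)

filled <- test %>%
  # number the rows within each timepoint (dplyr-only stand-in for rowid)
  group_by(timepoint) %>%
  mutate(grp = row_number()) %>%
  # regroup so that each group holds one row per timepoint
  group_by(grp) %>%
  # fill every column whose name ends in "_test" (assumed naming convention)
  fill(ends_with("_test"), .direction = "updown") %>%
  ungroup() %>%
  select(-grp)
```

With the sample data, the NA in fst_test at timepoint 1 becomes 32, and the two NAs in scd_test at timepoint 2 become 20 and 1. If the real data has an ID column, grouping by ID instead of a positional index would be more robust to row order.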
akrun answered Oct 17 '25 07:10


