Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fill NA values in a data frame according to another data frame

Tags:

dataframe

r

na

I have two data frames that almost have the same samples. df1 has many samples and it contains almost all the samples that are found in df2, apart from 2 or 3 samples.

In df1 there is a column, lets say it's the gender, that has NA values. Those gender values are found in df2.

I want to fill the NA values for each sample in df1 according to df2, for the shared samples between them.

How can I do that? especially that df1 is much bigger than df2 and the samples are not in the same order.

So for example let's say this is df1:

       samples       gender
1        Pt8           NA
2        Pt102         NA
3        Pt87          NA
4        Pt1           NA

And this is df2:

      subject_id     gender
1        Pt1          male
2        Pt102        male
3        Pt6          female
4        Pt8          male

So I just fill in the NA values that are in df1 according to the sample name.

like image 443
Programming Noob Avatar asked Oct 14 '25 09:10

Programming Noob


2 Answers

We may use a join

library(data.table)
setDT(df1)[df2, gender := fcoalesce(as.character(gender), i.gender), 
   on= .(samples = subject_id)]
like image 54
akrun Avatar answered Oct 17 '25 02:10

akrun


Update: Please see comments (removed wrong first answer):

library(dplyr)

bind_rows(df1, df2 %>% 
            rename_with(~colnames(df1))) %>% 
  arrange(gender) %>% 
  distinct(samples, .keep_all = TRUE) %>% 
  semi_join(df1, by="samples") %>% 
  mutate(samples = factor(samples, levels = df1$samples)) %>%
  arrange(samples)

  samples gender
4     Pt8   male
2   Pt102   male
3    Pt87   <NA>
1     Pt1   male
like image 38
TarJae Avatar answered Oct 17 '25 02:10

TarJae



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!