I have two data frames that almost have the same samples. df1
has many samples and it contains almost all the samples that are found in df2
, apart from 2 or 3 samples.
In df1
there is a column, lets say it's the gender, that has NA
values. Those gender values are found in df2
.
I want to fill the NA values for each sample in df1
according to df2
, for the shared samples between them.
How can I do that? especially that df1
is much bigger than df2
and the samples are not in the same order.
So for example let's say this is df1
:
samples gender
1 Pt8 NA
2 Pt102 NA
3 Pt87 NA
4 Pt1 NA
And this is df2
:
subject_id gender
1 Pt1 male
2 Pt102 male
3 Pt6 female
4 Pt8 male
So I just fill in the NA values that are in df1
according to the sample name.
We may use a join
library(data.table)
setDT(df1)[df2, gender := fcoalesce(as.character(gender), i.gender),
on= .(samples = subject_id)]
Update: Please see comments (removed wrong first answer):
library(dplyr)
bind_rows(df1, df2 %>%
rename_with(~colnames(df1))) %>%
arrange(gender) %>%
distinct(samples, .keep_all = TRUE) %>%
semi_join(df1, by="samples") %>%
mutate(samples = factor(samples, levels = df1$samples)) %>%
arrange(samples)
samples gender
4 Pt8 male
2 Pt102 male
3 Pt87 <NA>
1 Pt1 male
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With