Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Combining columns, update columns based on other df, fill NAs

Tags:

merge

r

dplyr

at the beginning I'd like to note that I found multiple solutions on SO but none of them met my expectations.

I have to DF's:

1.

E                           F              G        H
chr1_100203723_100203724    NA             NA       NA
chr1_100212951_100212952    rs760764323    A,G,     0.000008,0.999992,
chr1_10032235_10032236      NA             NA       NA
chr1_100327060_100327061    NA             NA       NA
chr1_100346889_100346890    NA             NA       NA
chr1_100347237_100347238    rs749372877    C,G,T,   0.000008,0.000008,0.999983,
chr1_100357190_100357191    NA             NA       NA
chr1_100358057_100358058    NA             NA       NA
chr2_182852606_182852607    NA             NA       NA
chr2_202492077_202492078    NA             NA       NA
chr2_203760838_203760839    NA             NA       NA
chr2_215976351_215976352    NA             NA       NA
chr2_220354644_220354645    NA             NA       NA
chr2_234749403_234749404    NA             NA       NA
chr2_11802110_11802111      NA             NA       NA
chr2_31167747_31167748      NA             NA       NA

2.

E                           F               G       H
chr1_100203723_100203724    NA              NA      NA
chr1_100212951_100212952    NA              NA      NA
chr1_10032235_10032236      NA              NA      NA
chr1_100327060_100327061    NA              NA      NA
chr1_100346889_100346890    NA              NA      NA
chr1_100347237_100347238    NA              NA      NA
chr1_100357190_100357191    NA              NA      NA
chr1_100358057_100358058    NA              NA      NA
chr2_182852606_182852607    rs773426830     C,T,    0.999967,0.000033,
chr2_202492077_202492078    rs750583431     C,G,    0.000013,0.999987,
chr2_203760838_203760839    NA              NA      NA
chr2_215976351_215976352    rs113648834     C,T,    0.999934,0.000066,
chr2_220354644_220354645    NA              NA      NA
chr2_234749403_234749404    NA              NA      NA
chr2_11802110_11802111      rs371327070     A,G,    0.000044,0.999956,
chr2_31167747_31167748      rs201375957     A,C,T,  0.000008,0.999887,0.000105,

Desired output:

E                           F               G       H
chr1_100203723_100203724    NA              NA      NA
chr1_100212951_100212952    rs760764323     A,G,    0.000008,0.999992,
chr1_10032235_10032236      NA              NA      NA
chr1_100327060_100327061    NA              NA      NA
chr1_100346889_100346890    NA              NA      NA
chr1_100347237_100347238    rs749372877     C,G,T,  0.000008,0.000008,0.999983,
chr1_100357190_100357191    NA              NA      NA
chr1_100358057_100358058    NA              NA      NA
chr2_182852606_182852607    rs773426830     C,T,    0.999967,0.000033,
chr2_202492077_202492078    rs750583431     C,G,    0.000013,0.999987,
chr2_203760838_203760839    NA              NA      NA
chr2_215976351_215976352    rs113648834     C,T,    0.999934,0.000066,
chr2_220354644_220354645    NA              NA      NA
chr2_234749403_234749404    NA              NA      NA
chr2_11802110_11802111      rs371327070     A,G,    0.000044,0.999956,
chr2_31167747_31167748      rs201375957     A,C,T,  0.000008,0.999887,0.000105,

As you see DF1 is updated by DF2 columns F, G, H, where column E is my unique index. I tried to do merge() but this function didin't update my rows, it jus added columns of DF2 to DF1. I also tried updating with data.table and tidyverse, and my rows have been updated but other ones went to NAs... Finally I decided to do simple lapply() with nested ifelse(), however I don't know how to update all three columns simultaneously, what is more this is terrible slow for my over 50000 rows of data in each DF...

What I did so far:

DF1$F <- sapply(1:nrow(DF1), function(i) ifelse(DF1[i,1]==DF2[i,1] & is.na(DF1[i,1]), DF2[i,1], DF[i,1]))
like image 610
Adamm Avatar asked Dec 19 '25 20:12

Adamm


1 Answers

You cand do this in base R:

as.data.frame(Map(function(x,y) ifelse(is.na(x),y,x),DF1,DF2))

with library purrr you can have a prettier more compact form (see Soto's answer for an even more compact one with dplyr):

library(purrr)
map2_df(DF1,DF2,~ifelse(is.na(.x),.y,.x))

In both cases (technically a data.frame in the first case and a tibble in the second case):

output

                            E           F      G                           H
1    chr1_100203723_100203724        <NA>   <NA>                        <NA>
2    chr1_100212951_100212952 rs760764323   A,G,          0.000008,0.999992,
3    chr1_10032235_10032236        <NA>   <NA>                        <NA>
4    chr1_100327060_100327061        <NA>   <NA>                        <NA>
5    chr1_100346889_100346890        <NA>   <NA>                        <NA>
6    chr1_100347237_100347238 rs749372877 C,G,T, 0.000008,0.000008,0.999983,
7    chr1_100357190_100357191        <NA>   <NA>                        <NA>
8    chr1_100358057_100358058        <NA>   <NA>                        <NA>
9    chr2_182852606_182852607 rs773426830   C,T,          0.999967,0.000033,
10   chr2_202492077_202492078 rs750583431   C,G,          0.000013,0.999987,
11   chr2_203760838_203760839        <NA>   <NA>                        <NA>
12   chr2_215976351_215976352 rs113648834   C,T,          0.999934,0.000066,
13   chr2_220354644_220354645        <NA>   <NA>                        <NA>
14   chr2_234749403_234749404        <NA>   <NA>                        <NA>
15   chr2_11802110_11802111 rs371327070   A,G,          0.000044,0.999956,
16   chr2_31167747_31167748 rs201375957 A,C,T, 0.000008,0.999887,0.000105,

data

DF1 <- read.table(text="E                           F              G        H
chr1_100203723_100203724    NA             NA       NA
chr1_100212951_100212952    rs760764323    A,G,     0.000008,0.999992,
chr1_10032235_10032236      NA             NA       NA
chr1_100327060_100327061    NA             NA       NA
chr1_100346889_100346890    NA             NA       NA
chr1_100347237_100347238    rs749372877    C,G,T,   0.000008,0.000008,0.999983,
chr1_100357190_100357191    NA             NA       NA
chr1_100358057_100358058    NA             NA       NA
chr2_182852606_182852607    NA             NA       NA
chr2_202492077_202492078    NA             NA       NA
chr2_203760838_203760839    NA             NA       NA
chr2_215976351_215976352    NA             NA       NA
chr2_220354644_220354645    NA             NA       NA
chr2_234749403_234749404    NA             NA       NA
chr2_11802110_11802111      NA             NA       NA
chr2_31167747_31167748      NA             NA       NA",header=T,stringsAsFactors=F)


DF2 <- read.table(text="E                           F               G       H
chr1_100203723_100203724    NA              NA      NA
chr1_100212951_100212952    NA              NA      NA
chr1_10032235_10032236      NA              NA      NA
chr1_100327060_100327061    NA              NA      NA
chr1_100346889_100346890    NA              NA      NA
chr1_100347237_100347238    NA              NA      NA
chr1_100357190_100357191    NA              NA      NA
chr1_100358057_100358058    NA              NA      NA
chr2_182852606_182852607    rs773426830     C,T,    0.999967,0.000033,
chr2_202492077_202492078    rs750583431     C,G,    0.000013,0.999987,
chr2_203760838_203760839    NA              NA      NA
chr2_215976351_215976352    rs113648834     C,T,    0.999934,0.000066,
chr2_220354644_220354645    NA              NA      NA
chr2_234749403_234749404    NA              NA      NA
chr2_11802110_11802111      rs371327070     A,G,    0.000044,0.999956,
chr2_31167747_31167748      rs201375957     A,C,T,  0.000008,0.999887,0.000105,",header=T,stringsAsFactors=F)
like image 165
Moody_Mudskipper Avatar answered Dec 21 '25 11:12

Moody_Mudskipper



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!