I have two data frames. One (df1) contains all columns and rows of interest, but includes missing observations. The other (df2) includes values to be used in place of missing observations, and only includes columns and rows for which at least one NA was present in df1. I would like to merge the two data sets somehow to obtain the desired.result.
This seems like a very simple problem to solve, but I am drawing a blank. I cannot get merge to work. Maybe I could write nested for-loops, but I have not done so yet. I also tried aggregate a few times. I am a little afraid to post this question, fearing my R card might be revoked. Sorry if this is a duplicate. I did search here and with Google fairly intensively. Thank you for any advice. A solution in base R is preferable.
df1 = read.table(text = "
county year1 year2 year3
aa 10 20 30
bb 1 NA 3
cc 5 10 NA
dd 100 NA 200
", sep = "", header = TRUE)
df2 = read.table(text = "
county year2 year3
bb 2 NA
cc NA 15
dd 150 NA
", sep = "", header = TRUE)
desired.result = read.table(text = "
county year1 year2 year3
aa 10 20 30
bb 1 2 3
cc 5 10 15
dd 100 150 200
", sep = "", header = TRUE)
This will do:
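# Outer-merge on county; columns present in both frames get .x (df1) and .y (df2) suffixes
m <- merge(df1, df2, by = "county", all = TRUE)
dotx <- m[, grepl("\\.x$", names(m)), drop = FALSE]  # df1 versions of the shared columns
doty <- m[, grepl("\\.y$", names(m)), drop = FALSE]  # df2 versions, in the same order
# Fill each NA in the df1 columns from the matching cell of the df2 columns
dotx[is.na(dotx)] <- doty[is.na(dotx)]
names(dotx) <- sub("\\.x$", "", names(dotx))  # strip the .x suffix
# Re-attach the columns that were never suffixed (county, year1)
result <- cbind(m[, !grepl("\\.[xy]$", names(m)), drop = FALSE], dotx)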
Checking:
> result
county year1 year2 year3
1 aa 10 20 30
2 bb 1 2 3
3 cc 5 10 15
4 dd 100 150 200
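For comparison, the same fill can be sketched without merge by looking up each df1 row in df2 with match(). This is only a sketch under assumptions: fill_na is a hypothetical helper name (not a base R function), and county is assumed to appear at most once in df2.

```r
df1 <- read.table(text = "
county year1 year2 year3
aa 10 20 30
bb 1 NA 3
cc 5 10 NA
dd 100 NA 200
", header = TRUE)

df2 <- read.table(text = "
county year2 year3
bb 2 NA
cc NA 15
dd 150 NA
", header = TRUE)

# Hypothetical helper: fill NAs in `target` from `patch`, matching rows on `key`.
fill_na <- function(target, patch, key = "county") {
  # Row of `patch` corresponding to each row of `target` (NA if no match)
  rows <- match(target[[key]], patch[[key]])
  # Only columns present in both frames can be filled
  shared <- setdiff(intersect(names(target), names(patch)), key)
  for (col in shared) {
    # Cells that are missing in `target` and have a matching row in `patch`
    missing <- is.na(target[[col]]) & !is.na(rows)
    target[[col]][missing] <- patch[[col]][rows[missing]]
  }
  target
}

result <- fill_na(df1, df2)
```

This keeps the row and column order of df1, so no renaming or re-binding of columns is needed afterwards.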
aggregate can do this:
aggregate(. ~ county,
          data = merge(df1, df2, all = TRUE),  # Merged data, including NAs
          na.action = na.pass,                 # Aggregate rows with missing values...
          FUN = sum, na.rm = TRUE)             # ...but instruct "sum" to ignore them.
## county year2 year3 year1
## 1 aa 20 30 10
## 2 bb 2 3 1
## 3 cc 10 15 5
## 4 dd 150 200 100
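One caveat worth noting: the sum trick relies on each county having at most one non-missing value per column across df1 and df2. If both frames held a value for the same cell, the two would be added together, and a cell missing in both would come back as 0 rather than NA, because sum(NA, na.rm = TRUE) is 0. A hedged variant that takes the first non-missing value instead might look like this (first_non_na is a hypothetical helper name, and the final column reordering is an addition to match df1):

```r
df1 <- read.table(text = "
county year1 year2 year3
aa 10 20 30
bb 1 NA 3
cc 5 10 NA
dd 100 NA 200
", header = TRUE)

df2 <- read.table(text = "
county year2 year3
bb 2 NA
cc NA 15
dd 150 NA
", header = TRUE)

# Hypothetical helper: first non-missing value of x, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

res <- aggregate(. ~ county,
                 data = merge(df1, df2, all = TRUE),  # stacked rows, NAs kept
                 na.action = na.pass,                 # do not drop rows with NAs
                 FUN = first_non_na)

# Put the columns back in the order used by df1
res <- res[names(df1)]
```

Because df1 rows sort ahead of df2 rows only by accident of the merge, this variant is safest when, as here, a given cell has a value in at most one of the two frames.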