I have 3 data frames (rows: sites, columns: species names) of species abundances within sites. The number of rows is identical, but the number of columns differs because not all species occur in all three data frames. I would like to merge them into one data frame in which the abundances of identical species are summed. For example:
data.frame1
       Sp1  Sp2  Sp3  Sp4
site1   1    2    3    1
site2   0    2    0    1
site3   1    1    1    1
data.frame2
       Sp1  Sp2  Sp4
 site1  0    1    2
 site2  1    2    0
 site3  1    1    1
data.frame3
       Sp1  Sp2  Sp5  Sp6
 site1  0    1    1    1     
 site2  1    1    1    5
 site3  2    0    0    0
What I want to have is something like:
       Sp1  Sp2  Sp3  Sp4  Sp5  Sp6
 site1  1    4    3    3    1    1
 site2  2    5    0    1    1    5
 site3  4    2    1    2    0    0
I guess I'd have to work with merge, but so far my attempts have not produced what I want.
Any help is appreciated.
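To make the example reproducible, the three tables can be built as follows. The object names df1, df2 and df3 are an assumption chosen to match the answers; the question itself labels them data.frame1 to data.frame3.

```r
# Example data from the question; sites are stored as row names.
sites <- c("site1", "site2", "site3")

df1 <- data.frame(Sp1 = c(1, 0, 1), Sp2 = c(2, 2, 1),
                  Sp3 = c(3, 0, 1), Sp4 = c(1, 1, 1),
                  row.names = sites)

df2 <- data.frame(Sp1 = c(0, 1, 1), Sp2 = c(1, 2, 1),
                  Sp4 = c(2, 0, 1),
                  row.names = sites)

df3 <- data.frame(Sp1 = c(0, 1, 2), Sp2 = c(1, 1, 0),
                  Sp5 = c(1, 1, 0), Sp6 = c(1, 5, 0),
                  row.names = sites)
```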
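I'd use plyr's rbind.fill like this:
library(plyr)
# Stack the three data frames; species missing from a data frame are filled with NA
pp <- cbind(names = c(rownames(df1), rownames(df2), rownames(df3)),
            rbind.fill(list(df1, df2, df3)))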
#   names Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# 1 site1   1   2   3   1  NA  NA
# 2 site2   0   2   0   1  NA  NA
# 3 site3   1   1   1   1  NA  NA
# 4 site1   0   1  NA   2  NA  NA
# 5 site2   1   2  NA   0  NA  NA
# 6 site3   1   1  NA   1  NA  NA
# 7 site1   0   1  NA  NA   1   1
# 8 site2   1   1  NA  NA   1   5
# 9 site3   2   0  NA  NA   0   0
Then, aggregate with plyr's ddply as follows:
ddply(pp, .(names), function(x) colSums(x[,-1], na.rm = TRUE))
#   names Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# 1 site1   1   4   3   3   1   1
# 2 site2   2   5   0   1   1   5
# 3 site3   4   2   1   2   0   0
Adding to the options available, here are two more that stick with base R.
First option: Wide aggregation (sort of)
temp <- cbind(df1, df2, df3)
temp
#       Sp1 Sp2 Sp3 Sp4 Sp1 Sp2 Sp4 Sp1 Sp2 Sp5 Sp6
# site1   1   2   3   1   0   1   2   0   1   1   1
# site2   0   2   0   1   1   2   0   1   1   1   5
# site3   1   1   1   1   1   1   1   2   0   0   0
sapply(unique(colnames(temp)), 
       function(x) rowSums(temp[, colnames(temp) == x, drop = FALSE]))
#       Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
# site1   1   4   3   3   1   1
# site2   2   5   0   1   1   5
# site3   4   2   1   2   0   0
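As a possible variation on this first option (not part of the original answer), base R's rowsum() can do the grouping by column name in a single call. The following is a minimal sketch, assuming df1, df2 and df3 hold the example data from the question; the name res is just illustrative.

```r
# Example data from the question (assumed object names df1, df2, df3).
sites <- c("site1", "site2", "site3")
df1 <- data.frame(Sp1 = c(1, 0, 1), Sp2 = c(2, 2, 1),
                  Sp3 = c(3, 0, 1), Sp4 = c(1, 1, 1), row.names = sites)
df2 <- data.frame(Sp1 = c(0, 1, 1), Sp2 = c(1, 2, 1),
                  Sp4 = c(2, 0, 1), row.names = sites)
df3 <- data.frame(Sp1 = c(0, 1, 2), Sp2 = c(1, 1, 0),
                  Sp5 = c(1, 1, 0), Sp6 = c(1, 5, 0), row.names = sites)

# Bind side by side; duplicated species names are kept as-is.
temp <- cbind(df1, df2, df3)

# rowsum() sums rows by group, so transpose to put species in rows,
# group by species name, then transpose back to sites x species.
res <- t(rowsum(t(temp), group = colnames(temp)))
res
```

Note that res is a numeric matrix rather than a data frame, and that rowsum() returns the groups in sorted order by default. Wrap the result in as.data.frame() if a data frame is needed.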
Second option: semi-wide to long to wide
Conceptually, this is similar to Maxim. K's answer: get the data into long form, which makes it much easier to manipulate:
> temp1 <- t(cbind(df1, df2, df3))
> # You'll get a warning in the next step
> # Safe to ignore though...
> temp2 <- data.frame(var = rownames(temp1), stack(data.frame(temp1)))
Warning message:
In data.row.names(row.names, rowsi, i) :
  some row.names duplicated: 5,6,7,8,9 --> row.names NOT used
> xtabs(values ~ ind + var, temp2)
       var
ind     Sp1 Sp2 Sp3 Sp4 Sp5 Sp6
  site1   1   4   3   3   1   1
  site2   2   5   0   1   1   5
  site3   4   2   1   2   0   0
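If the warning is a concern, one way to sidestep it is to reshape each data frame to long form before combining, so that duplicated species names never end up as row names. This is only a sketch: to_long is a hypothetical helper, and df1, df2 and df3 are assumed to hold the example data from the question.

```r
# Example data from the question (assumed object names df1, df2, df3).
sites <- c("site1", "site2", "site3")
df1 <- data.frame(Sp1 = c(1, 0, 1), Sp2 = c(2, 2, 1),
                  Sp3 = c(3, 0, 1), Sp4 = c(1, 1, 1), row.names = sites)
df2 <- data.frame(Sp1 = c(0, 1, 1), Sp2 = c(1, 2, 1),
                  Sp4 = c(2, 0, 1), row.names = sites)
df3 <- data.frame(Sp1 = c(0, 1, 2), Sp2 = c(1, 1, 0),
                  Sp5 = c(1, 1, 0), Sp6 = c(1, 5, 0), row.names = sites)

# Hypothetical helper: one row per (site, species) pair.
to_long <- function(d) {
  m <- as.matrix(d)
  data.frame(ind    = rownames(m)[as.vector(row(m))],  # site
             var    = colnames(m)[as.vector(col(m))],  # species
             values = as.vector(m))                    # abundance
}

# Stack the long tables, then sum abundances per site and species.
long <- do.call(rbind, lapply(list(df1, df2, df3), to_long))
res <- xtabs(values ~ ind + var, long)
res
```

Species that are absent from a given data frame simply contribute no rows, so xtabs() fills those cells with 0 without any NA handling.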