Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Conditional summing across columns with dplyr

I have a data frame with four habitats sampled over eight months. Ten samples were collected from each habitat each month. The number of individuals for species in each sample was counted. The following code generates a smaller data frame of a similar structure.

# Pseudo data
Habitat <- factor(c(rep("Dry",6), rep("Wet",6)), levels = c("Dry","Wet"))
Month <- factor(rep(c(rep("Jan",2), rep("Feb",2), rep("Mar",2)),2), levels=c("Jan","Feb","Mar"))
Sample <- rep(c(1,2),6)
Species1 <- rpois(12,6)
Species2 <- rpois(12,6)
Species3 <- rpois(12,6)

df <- data.frame(Habitat,Month, Sample, Species1, Species2, Species3)

I want to sum the total number of individuals by month, across all species sampled. I'm using ddply (preferred) but I'm open to other suggestions.

The closest I get is to add together the sum of each column, as shown here.

library(plyr)
ddply(df, ~ Month, summarize, tot_by_mon = sum(Species1) + sum(Species2) + sum(Species3))

#   Month tot_by_mon
# 1   Jan         84
# 2   Feb         92
# 3   Mar         67

This works, but I wonder if there is a generic method to handle cases with an "unknown" number of species. That is, the first species always begins in the 4th column but the last species could be in the 10th or 42nd column. I do not want to hard code the actual species names into the summary function. Note that the species names vary widely, such as Doryflav and Pheibica.

like image 421
Michael S Taylor Avatar asked Dec 04 '25 18:12

Michael S Taylor


2 Answers

Similar to @useR's answer with data.table's melt, you can use tidyr to reshape with gather:

library(tidyr)
library(dplyr)
gather(df, Species, Value, matches("Species")) %>% 
  group_by(Month) %>% summarise(z = sum(Value))

# A tibble: 3 x 2
   Month     z
  <fctr> <int>
1    Jan    90
2    Feb    81
3    Mar    70

If you know the columns by position instead of a pattern to be "matched"...

gather(df, Species, Value, -(1:3)) %>% 
  group_by(Month) %>% summarise(z = sum(Value))

(Results shown using @akrun's set.seed(123) example data.)

like image 54
Frank Avatar answered Dec 06 '25 06:12

Frank


Here's another solution with data.table without needing to know the names of the "Species" columns:

library(data.table)

DT = melt(setDT(df), id.vars = c("Habitat", "Month", "Sample"))    
DT[, .(tot_by_mon=sum(value)), by = "Month"]

or if you want it compact, here's a one-liner:

melt(setDT(df), 1:3)[, .(tot_by_mon=sum(value)), by = "Month"]

Result:

   Month tot_by_mon
1:   Jan         90
2:   Feb         81
3:   Mar         70

Data: (Setting seed to make example reproducible)

set.seed(123)
Habitat <- factor(c(rep("Dry",6), rep("Wet",6)), levels = c("Dry","Wet"))
Month <- factor(rep(c(rep("Jan",2), rep("Feb",2), rep("Mar",2)),2), levels=c("Jan","Feb","Mar"))
Sample <- rep(c(1,2),6)
Species1 <- rpois(12,6)
Species2 <- rpois(12,6)
Species3 <- rpois(12,6)

df <- data.frame(Habitat,Month, Sample, Species1, Species2, Species3)
like image 36
acylam Avatar answered Dec 06 '25 06:12

acylam



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!