I have a dataframe containing dates when a given event occurred. Some events go on for several days, and I want to summarise each event based on its start date and its total length (in days).
I want to go from this:
| Date |
|---|
| 2020-01-01 |
| 2020-01-02 |
| 2020-01-03 |
| 2020-01-15 |
| 2020-01-20 |
| 2020-01-21 |
To this:
| StartDate | EventLength |
|---|---|
| 2020-01-01 | 3 |
| 2020-01-15 | 1 |
| 2020-01-20 | 2 |
I've tried various approaches with aggregate, ave, seq_along and lag, but I haven't managed to get a count of event length that resets when the dates aren't sequential.
Code for the example data frame in case it's helpful:
Date <- c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-15", "2020-01-20", "2020-01-21")
df <- data.frame(Date)
df$Date <- as.Date(df$Date, origin = "1970-01-01")
You can split by cumsum(c(0, diff(df$Date) != 1) and then take the first date and combine it with the length assuming the dates are sorted.
do.call(rbind, lapply(split(df$Date, cumsum(c(0, diff(df$Date) != 1))),
function(x) data.frame(StartDate=x[1], EventLength=length(x))))
# StartDate EventLength
#0 2020-01-01 3
#1 2020-01-15 1
#2 2020-01-20 2
or another option using rle:
i <- cumsum(c(0, diff(df$Date) != 1))
data.frame(StartDate = df$Date[c(1, diff(i)) == 1], EventLength=rle(i)$lengths)
# StartDate EventLength
#1 2020-01-01 3
#2 2020-01-15 1
#3 2020-01-20 2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With