I'm trying to group a data frame by 3 date ranges based on "2016-04-10" and "2016-04-24".
df <- structure(list(date = structure(c(16803, 16810, 16817, 16824,
16831, 16838, 16845, 16852, 16859, 16866, 16873, 16880, 16887,
16894, 16901, 16908, 16915, 16922, 16929, 16936, 16943), class = "Date"),
new = c(1507L, 2851L, 3550L, 5329L, 7557L, 5546L, 6264L,
7160L, 9468L, 5789L, 5928L, 4642L, 8145L, 4867L, 4846L, 5231L,
7137L, 3938L, 3741L, 2937L, 194L), resolved = c(21, 27, 15,
16, 56, 2773, 8490, 8748, 9325, 7734, 10264, 6739, 6110,
9613, 10314, 10349, 7200, 9637, 10831, 11170, 5666), ost = c(1486,
2824, 3535, 5313, 7501, 2773, -2226, -1588, 143, -1945, -4336,
-2097, 2035, -4746, -5468, -5118, -63, -5699, -7090, -8233,
-5472)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-21L), .Names = c("date", "new", "resolved", "ost"))
Tried the following:
df1 <- df %>% group_by(dr=cut(date,breaks=as.Date(c("2016-04-10","2016-04-24")))) %>%
summarise(ost = sum(ost))
Which gives the wrong result as below.
dr ost
2016-04-10 -10586
NA -17885
Help appreciated!
We create a grouping variable 'dr' with cut. The breaks mentioned are the range of the 'date' i.e. the min and max values of the 'date' along with the dates specified by the OP, concatenate it (c), use the option include.lowest and get the sum of 'ost' based on this grouping variable.
df %>%
group_by(dr = cut(date, breaks = c(range(date),
as.Date(c("2016-04-10", "2016-04-24"))), include.lowest=TRUE) ) %>%
summarise(ost =sum(ost))
# dr ost
# <fctr> <dbl>
#1 2016-01-03 8672
#2 2016-04-10 -10586
#3 2016-04-24 -26557
Or another option is findInterval which could be faster compared to cut.
df %>%
group_by(dr = findInterval(date, as.Date(c("2016-04-10", "2016-04-24")))) %>%
summarise(ost = sum(ost))
# dr ost
# <int> <dbl>
#1 0 8672
#2 1 -10586
#3 2 -26557
NOTE: The OP asked the question about cut and this solution gives that.
You can create a grouping variable first,
df %>%
mutate(group = cumsum(grepl('2016-04-10|2016-04-24', date))) %>%
group_by(group) %>%
summarise(ost = sum(ost))
#Source: local data frame [3 x 2]
# group ost
# (int) (dbl)
#1 0 8672
#2 1 -10586
#3 2 -26557
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With