Count the number of days in (long) weekends in a dataframe

Question

I have the following dataframe:

DAYS7 <- c('Monday','Tuesday','Wednesday','Thursday','Friday', 'Saturday', 'Sunday')   
DAYS <- rep(DAYS7,5)   
A <- c(0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,1)   
B <- c(0,0,0,0,0,1,2,0,0,0,0,1,2,3,0,0,0,0,1,2,3,4,5,0,0,1,2,3,0,0,0,0,0,1,2)   
DESIRED_OUTPUT <- c(2,2,2,2,2,2,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,3,3,3,3,3,2,2,2,2,2,2,2)   
dataframe <- as.data.frame(cbind(DAYS, A, B, DESIRED_OUTPUT))

If it's a working day, the corresponding cell in column A equals zero. If it is holiday, column A shows 1. (Note that there might be long weekends, e.g. row 12, row 19, rows 22-23!) Column B is essentially a cummulative sum of column A, with the condition that whenever a zero occurs, the cummulation resets. What I intend to do is to calculate the length of each weekend (in days) and then print the result in an additional column like this (see column DESIRED_OUTPUT):
enter image description here

In Excel it is relatively easy…
enter image description here

…but I'm struggling to find a proper solution in R. Any suggestions?

Ronak Shah · Accepted Answer

The tricky part is to define one week which is group of 0's followed by group of 1's. We can do that using rle, once that is done we can use that as a grouping variable in ave and count number of 1's in each group using sum.

with(dataframe, ave(A, with(rle(A == 0), rep(cumsum(values), lengths)), FUN = sum))
#[1] 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 3 3 3 3 3 2 2 2 2 2 2 2

Since B is cumulative sum of A, we can use that as well here and count the max value in group.

with(dataframe, ave(B, with(rle(A == 0), rep(cumsum(values), lengths)), FUN = max))

Georgery · Answer

The dplyr way...

library(tidyverse)


DAYS7 <- c('Monday','Tuesday','Wednesday','Thursday','Friday', 'Saturday', 'Sunday')   
DAYS <- rep(DAYS7,5)   
A <- c(0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,1)   
B <- c(0,0,0,0,0,1,2,0,0,0,0,1,2,3,0,0,0,0,1,2,3,4,5,0,0,1,2,3,0,0,0,0,0,1,2)   
DESIRED_OUTPUT <- c(2,2,2,2,2,2,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,3,3,3,3,3,2,2,2,2,2,2,2)   
dataframe <- data.frame(DAYS, A, B, DESIRED_OUTPUT)  


dataframe %>%
    mutate(beginningOfWeek = (B - lag(B)) < 0) %>%
    mutate(beginningOfWeek = replace_na(beginningOfWeek, TRUE)) %>%
    mutate(week_nr = cumsum(beginningOfWeek)) %>%
    group_by(week_nr) %>%
    mutate(desired_output = max(B))

   DAYS          A     B DESIRED_OUTPUT beginningOfWeek week_nr desired_output
   <fct>     <dbl> <dbl>          <dbl> <lgl>             <int>          <dbl>
 1 Monday        0     0              2 TRUE                  1              2
 2 Tuesday       0     0              2 FALSE                 1              2
 3 Wednesday     0     0              2 FALSE                 1              2
 4 Thursday      0     0              2 FALSE                 1              2
 5 Friday        0     0              2 FALSE                 1              2
 6 Saturday      1     1              2 FALSE                 1              2
 7 Sunday        1     2              2 FALSE                 1              2
 8 Monday        0     0              3 TRUE                  2              3
 9 Tuesday       0     0              3 FALSE                 2              3
10 Wednesday     0     0              3 FALSE                 2              3
11 Thursday      0     0              3 FALSE                 2              3
12 Friday        1     1              3 FALSE                 2              3
13 Saturday      1     2              3 FALSE                 2              3
14 Sunday        1     3              3 FALSE                 2              3
15 Monday        0     0              5 TRUE                  3              5
16 Tuesday       0     0              5 FALSE                 3              5
17 Wednesday     0     0              5 FALSE                 3              5
18 Thursday      0     0              5 FALSE                 3              5
19 Friday        1     1              5 FALSE                 3              5
20 Saturday      1     2              5 FALSE                 3              5
21 Sunday        1     3              5 FALSE                 3              5
22 Monday        1     4              5 FALSE                 3              5
23 Tuesday       1     5              5 FALSE                 3              5
24 Wednesday     0     0              3 TRUE                  4              3
25 Thursday      0     0              3 FALSE                 4              3
26 Friday        1     1              3 FALSE                 4              3
27 Saturday      1     2              3 FALSE                 4              3
28 Sunday        1     3              3 FALSE                 4              3
29 Monday        0     0              2 TRUE                  5              2
30 Tuesday       0     0              2 FALSE                 5              2
31 Wednesday     0     0              2 FALSE                 5              2
32 Thursday      0     0              2 FALSE                 5              2
33 Friday        0     0              2 FALSE                 5              2
34 Saturday      1     1              2 FALSE                 5              2
35 Sunday        1     2              2 FALSE                 5              2

So, I didn't remove the intermediate columns, so that it's clearer, what they are actually for.

Count the number of days in (long) weekends in a dataframe

Tags:

dataframe

r

Laszlo

2 Answers

Ronak Shah

Georgery

Recent Activity

Donate For Us

Count the number of days in (long) weekends in a dataframe

Tags:

dataframe

r

Laszlo

2 Answers

Ronak Shah

Georgery

Related questions

Recent Activity

Donate For Us