Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Count the number of days in (long) weekends in a dataframe

Tags:

dataframe

r

I have the following dataframe:

DAYS7 <- c('Monday','Tuesday','Wednesday','Thursday','Friday', 'Saturday', 'Sunday')   
DAYS <- rep(DAYS7,5)   
A <- c(0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,1)   
B <- c(0,0,0,0,0,1,2,0,0,0,0,1,2,3,0,0,0,0,1,2,3,4,5,0,0,1,2,3,0,0,0,0,0,1,2)   
DESIRED_OUTPUT <- c(2,2,2,2,2,2,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,3,3,3,3,3,2,2,2,2,2,2,2)   
dataframe <- as.data.frame(cbind(DAYS, A, B, DESIRED_OUTPUT))  

If it's a working day, the corresponding cell in column A equals zero. If it is holiday, column A shows 1. (Note that there might be long weekends, e.g. row 12, row 19, rows 22-23!) Column B is essentially a cummulative sum of column A, with the condition that whenever a zero occurs, the cummulation resets. What I intend to do is to calculate the length of each weekend (in days) and then print the result in an additional column like this (see column DESIRED_OUTPUT):
enter image description here

In Excel it is relatively easy…
enter image description here

…but I'm struggling to find a proper solution in R. Any suggestions?

like image 304
Laszlo Avatar asked Dec 15 '25 16:12

Laszlo


2 Answers

The tricky part is to define one week which is group of 0's followed by group of 1's. We can do that using rle, once that is done we can use that as a grouping variable in ave and count number of 1's in each group using sum.

with(dataframe, ave(A, with(rle(A == 0), rep(cumsum(values), lengths)), FUN = sum))
#[1] 2 2 2 2 2 2 2 3 3 3 3 3 3 3 5 5 5 5 5 5 5 5 5 3 3 3 3 3 2 2 2 2 2 2 2

Since B is cumulative sum of A, we can use that as well here and count the max value in group.

with(dataframe, ave(B, with(rle(A == 0), rep(cumsum(values), lengths)), FUN = max))
like image 78
Ronak Shah Avatar answered Dec 17 '25 11:12

Ronak Shah


The dplyr way...

library(tidyverse)


DAYS7 <- c('Monday','Tuesday','Wednesday','Thursday','Friday', 'Saturday', 'Sunday')   
DAYS <- rep(DAYS7,5)   
A <- c(0,0,0,0,0,1,1,0,0,0,0,1,1,1,0,0,0,0,1,1,1,1,1,0,0,1,1,1,0,0,0,0,0,1,1)   
B <- c(0,0,0,0,0,1,2,0,0,0,0,1,2,3,0,0,0,0,1,2,3,4,5,0,0,1,2,3,0,0,0,0,0,1,2)   
DESIRED_OUTPUT <- c(2,2,2,2,2,2,2,3,3,3,3,3,3,3,5,5,5,5,5,5,5,5,5,3,3,3,3,3,2,2,2,2,2,2,2)   
dataframe <- data.frame(DAYS, A, B, DESIRED_OUTPUT)  


dataframe %>%
    mutate(beginningOfWeek = (B - lag(B)) < 0) %>%
    mutate(beginningOfWeek = replace_na(beginningOfWeek, TRUE)) %>%
    mutate(week_nr = cumsum(beginningOfWeek)) %>%
    group_by(week_nr) %>%
    mutate(desired_output = max(B))

   DAYS          A     B DESIRED_OUTPUT beginningOfWeek week_nr desired_output
   <fct>     <dbl> <dbl>          <dbl> <lgl>             <int>          <dbl>
 1 Monday        0     0              2 TRUE                  1              2
 2 Tuesday       0     0              2 FALSE                 1              2
 3 Wednesday     0     0              2 FALSE                 1              2
 4 Thursday      0     0              2 FALSE                 1              2
 5 Friday        0     0              2 FALSE                 1              2
 6 Saturday      1     1              2 FALSE                 1              2
 7 Sunday        1     2              2 FALSE                 1              2
 8 Monday        0     0              3 TRUE                  2              3
 9 Tuesday       0     0              3 FALSE                 2              3
10 Wednesday     0     0              3 FALSE                 2              3
11 Thursday      0     0              3 FALSE                 2              3
12 Friday        1     1              3 FALSE                 2              3
13 Saturday      1     2              3 FALSE                 2              3
14 Sunday        1     3              3 FALSE                 2              3
15 Monday        0     0              5 TRUE                  3              5
16 Tuesday       0     0              5 FALSE                 3              5
17 Wednesday     0     0              5 FALSE                 3              5
18 Thursday      0     0              5 FALSE                 3              5
19 Friday        1     1              5 FALSE                 3              5
20 Saturday      1     2              5 FALSE                 3              5
21 Sunday        1     3              5 FALSE                 3              5
22 Monday        1     4              5 FALSE                 3              5
23 Tuesday       1     5              5 FALSE                 3              5
24 Wednesday     0     0              3 TRUE                  4              3
25 Thursday      0     0              3 FALSE                 4              3
26 Friday        1     1              3 FALSE                 4              3
27 Saturday      1     2              3 FALSE                 4              3
28 Sunday        1     3              3 FALSE                 4              3
29 Monday        0     0              2 TRUE                  5              2
30 Tuesday       0     0              2 FALSE                 5              2
31 Wednesday     0     0              2 FALSE                 5              2
32 Thursday      0     0              2 FALSE                 5              2
33 Friday        0     0              2 FALSE                 5              2
34 Saturday      1     1              2 FALSE                 5              2
35 Sunday        1     2              2 FALSE                 5              2

So, I didn't remove the intermediate columns, so that it's clearer, what they are actually for.

like image 43
Georgery Avatar answered Dec 17 '25 10:12

Georgery



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!