Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to determine groups which fulfill certain condition for two consecutive time periods in R

Tags:

r

data.table

let's say I have this simple dataset called data:

customer_id <- c("1","1","1","2","2","2","2","3","3","3")
account_id <- as.character(c(11,11,11,55,55,55,55,38,38,38))
obs_date <- c(as.Date("2017-01-01","%Y-%m-%d"), as.Date("2017-02-01","%Y-%m-%d"), as.Date("2017-03-01","%Y-%m-%d"),
          as.Date("2017-12-01","%Y-%m-%d"), as.Date("2018-01-01","%Y-%m-%d"), as.Date("2018-02-01","%Y-%m-%d"),
          as.Date("2018-03-01","%Y-%m-%d"), as.Date("2018-04-01","%Y-%m-%d"), as.Date("2018-05-01","%Y-%m-%d"),
          as.Date("2018-06-01","%Y-%m-%d"))
variable <- c(87,90,100,120,130,150,12,13,15,14)
data <- data.table(customer_id,account_id,obs_date,variable)

and I would like to add another variable called indicator, that would equal to 1 for those customer_id, account_id pairs that have variable <= 90 for two or more consecutive observation dates (obs_date) and zero otherwise. Therefore, indicator would be equal to 1 for the first and the third customer_id, account_id pair and it would be like:

indicator <- c(1,1,1,0,0,0,0,1,1,1)
data <- data.table(customer_id,account_id,obs_date,variable, indicator)

Do you please have any ideas how to create this variable called indicator? I need to group by customer_id, account_id and identify those that have variable <= 90 for at least two consecutive time periods. Thanks very much.

like image 857
doremi Avatar asked Jan 27 '26 17:01

doremi


1 Answers

You could do...

data[, v := with(rle(variable <= 90), 
  any(lengths >= 2 & values)
), by=.(customer_id, account_id)]

    customer_id account_id   obs_date variable indicator     v
 1:           1         11 2017-01-01       87         1  TRUE
 2:           1         11 2017-02-01       90         1  TRUE
 3:           1         11 2017-03-01      100         1  TRUE
 4:           2         55 2017-12-01      120         0 FALSE
 5:           2         55 2018-01-01      130         0 FALSE
 6:           2         55 2018-02-01      150         0 FALSE
 7:           2         55 2018-03-01       12         0 FALSE
 8:           3         38 2018-04-01       13         1  TRUE
 9:           3         38 2018-05-01       15         1  TRUE
10:           3         38 2018-06-01       14         1  TRUE

To see how it works, look at a simpler line:

data[, rle(variable <= 90), by=.(customer_id, account_id)]

   customer_id account_id lengths values
1:           1         11       2   TRUE
2:           1         11       1  FALSE
3:           2         55       3  FALSE
4:           2         55       1   TRUE
5:           3         38       3   TRUE
like image 50
Frank Avatar answered Jan 29 '26 06:01

Frank



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!