Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Enumerate events where n consecutive values are not NA [duplicate]

Tags:

r

dplyr

I would like to compute a new column in a dataframe that finds n consecutive values not being null and adds an enumeration to it, per user group. Let df be our sample dataframe:

n <- 4

# Sample DataFrame
df <- tibble(
  user_id = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2),
  date = as.Date('2024-01-01') + 0:19,
  value1 = c(NA, 5, 6, 7, 8, NA, NA, 2, 3, 4, 5, NA, 1, 2, 3, NA, 1, 2, 3, 4)
)

Expected result:

# A tibble: 20 × 4
   user_id date       value1 expected_result
     <dbl> <date>      <dbl>           <dbl>
 1       1 2024-01-01     NA              NA
 2       1 2024-01-02      5               1
 3       1 2024-01-03      6               2
 4       1 2024-01-04      7               3
 5       1 2024-01-05      8               4
 6       1 2024-01-06     NA              NA
 7       1 2024-01-07     NA              NA
 8       1 2024-01-08      2               5
 9       1 2024-01-09      3               6
10       1 2024-01-10      4               7
11       1 2024-01-11      5               8
12       1 2024-01-12     NA              NA
13       2 2024-01-13      1              NA
14       2 2024-01-14      2              NA
15       2 2024-01-15      3              NA
16       2 2024-01-16     NA              NA
17       2 2024-01-17      1               1
18       2 2024-01-18      2               2
19       2 2024-01-19      3               3
20       2 2024-01-20      4               4

like image 283
Codutie Avatar asked Oct 25 '25 05:10

Codutie


2 Answers

You can use cumsum

df %>%
  mutate(gp=value1/value1 * cumsum(is.na(value1)), .by=user_id) %>%
  mutate(n=(n()>=n & !is.na(value1)), .by=c(user_id, gp)) %>%
  mutate(expected=na_if(value1/value1 * cumsum(!is.na(value1) & n), 0), .by=user_id) %>%
  select(-c(gp, n))

Gives

# A tibble: 20 × 4
   user_id date       value1 expected
     <dbl> <date>      <dbl>    <dbl>
 1       1 2024-01-01     NA       NA
 2       1 2024-01-02      5        1
 3       1 2024-01-03      6        2
 4       1 2024-01-04      7        3
 5       1 2024-01-05      8        4
 6       1 2024-01-06     NA       NA
 7       1 2024-01-07     NA       NA
 8       1 2024-01-08      2        5
 9       1 2024-01-09      3        6
10       1 2024-01-10      4        7
11       1 2024-01-11      5        8
12       1 2024-01-12     NA       NA
13       2 2024-01-13      1       NA 
14       2 2024-01-14      2       NA 
15       2 2024-01-15      3       NA  
16       2 2024-01-16     NA       NA
17       2 2024-01-17      1        1
18       2 2024-01-18      2        2
19       2 2024-01-19      3        3
20       2 2024-01-20      4        4
like image 67
Edward Avatar answered Oct 27 '25 17:10

Edward


You could use a combination of rle and collapse::seqid. Basically rle creates a vector l that is 1 if x is NA, and otherwise is the length of consecutive non-NAs. Then, replace to NA if the length if below n, and compute the seqid on the non-NAs.

library(collapse)
library(dplyr)

f <- function(x, n){
  l <- with(rle(!is.na(x)), rep(lengths, lengths))
  qF(seqid(replace(l, l < n, NA), na.skip = TRUE))
}

df |>
  mutate(new = f(value1, n), .by = user_id)

   user_id       date value1  new
1        1 2024-01-01     NA <NA>
2        1 2024-01-02      5    1
3        1 2024-01-03      6    2
4        1 2024-01-04      7    3
5        1 2024-01-05      8    4
6        1 2024-01-06     NA <NA>
7        1 2024-01-07     NA <NA>
8        1 2024-01-08      2    5
9        1 2024-01-09      3    6
10       1 2024-01-10      4    7
11       1 2024-01-11      5    8
12       1 2024-01-12     NA <NA>
13       2 2024-01-13      1 <NA>
14       2 2024-01-14      2 <NA>
15       2 2024-01-15      3 <NA>
16       2 2024-01-16     NA <NA>
17       2 2024-01-17      1    1
18       2 2024-01-18      2    2
19       2 2024-01-19      3    3
20       2 2024-01-20      4    4
like image 23
Maël Avatar answered Oct 27 '25 18:10

Maël