Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find closest value above current row that meets condition and apply function for each group

I want to do a simple calculation for each row by group, but I need to refer to a previous row that meets certain conditions. I want to create a new variable, results. For each row in each group, I want to find the closest row above where tag == "Y" and code is not NA. Then, I want to use the value from that row, and multiply by the value in the current row.

Minimal Example

df <- structure(list(name = c("apples", "apples", "apples", "apples", 
                              "oranges", "oranges", "oranges", "oranges"), 
               id = 1:8, 
               tag = c("X", "Y", "Y", "X", "X", "Y", "X", "X"), 
               code = c(1, 1, NA, 1, NA, 1, NA, NA),
               value = c(1, 11, 4, 3, 9, 5, 7, 8)), 
          class = "data.frame", row.names = c(NA, -8L))

     name id tag code value
1  apples  1   X    1     1
2  apples  2   Y    1    11
3  apples  3   Y   NA     4
4  apples  4   X    1     3
5 oranges  5   X   NA     9
6 oranges  6   Y    1     5
7 oranges  7   X   NA     7
8 oranges  8   X   NA     8

Expected Output

For example, for row 3, row 2 would be the closest that meets the conditions, so multiply 4 by 11 (to get 44). For row 4, row 3 does not meet the conditions, so we go to row 2, and multiply 3 by 11 (to get 33). And so on.

     name id tag code value results
1  apples  1   X    1     1      NA
2  apples  2   Y    1    11      NA
3  apples  3   Y   NA     4      44
4  apples  4   X    1     3      33
5 oranges  5   X   NA     9      NA
6 oranges  6   Y    1     5      NA
7 oranges  7   X   NA     7      35
8 oranges  8   X   NA     8      40

I am guessing that I will need to use cumsum and/or fill, but unsure how to use it here. I know that if I was preforming a calculation on the previous row, then I could use lag, but unsure how to search multiple values above. I am open to base R, data.table, tidyverse, or other solutions.

like image 444
AndrewGB Avatar asked Sep 06 '25 03:09

AndrewGB


2 Answers

df %>% 
  group_by(name) %>%
  mutate(t = na_if(lag(value * (tag == 'Y' & !is.na(code))), 0)) %>%
  fill(t) %>%
  mutate(results = t * value)

# A tibble: 8 x 7
# Groups:   name [2]
  name       id tag    code value     t results
  <chr>   <int> <chr> <dbl> <dbl> <dbl>   <dbl>
1 apples      1 X         1     1    NA      NA
2 apples      2 Y         1    11    NA      NA
3 apples      3 Y        NA     4    11      44
4 apples      4 X         1     3    11      33
5 oranges     5 X        NA     9    NA      NA
6 oranges     6 Y         1     5    NA      NA
7 oranges     7 X        NA     7     5      35
8 oranges     8 X        NA     8     5      40
like image 83
KU99 Avatar answered Sep 08 '25 00:09

KU99


with data.table:

library(data.table)
setDT(df)

df[,result:=value*shift(nafill(fifelse(tag=='Y'&!is.na(code),value,NA),type = 'locf')), 
   by=name][]

      name    id    tag  code value result
    <char> <int> <char> <num> <num>  <num>
1:  apples     1      X     1     1     NA
2:  apples     2      Y     1    11     NA
3:  apples     3      Y    NA     4     44
4:  apples     4      X     1     3     33
5: oranges     5      X    NA     9     NA
6: oranges     6      Y     1     5     NA
7: oranges     7      X    NA     7     35
8: oranges     8      X    NA     8     40
like image 40
Waldi Avatar answered Sep 07 '25 23:09

Waldi