I want to do a simple calculation for each row by group, but I need to refer to a previous row that meets certain conditions. I want to create a new variable, results
. For each row in each group, I want to find the closest row above where tag == "Y"
and code
is not NA
. Then, I want to use the value
from that row, and multiply by the value in the current row.
Minimal Example
df <- structure(list(name = c("apples", "apples", "apples", "apples",
"oranges", "oranges", "oranges", "oranges"),
id = 1:8,
tag = c("X", "Y", "Y", "X", "X", "Y", "X", "X"),
code = c(1, 1, NA, 1, NA, 1, NA, NA),
value = c(1, 11, 4, 3, 9, 5, 7, 8)),
class = "data.frame", row.names = c(NA, -8L))
name id tag code value
1 apples 1 X 1 1
2 apples 2 Y 1 11
3 apples 3 Y NA 4
4 apples 4 X 1 3
5 oranges 5 X NA 9
6 oranges 6 Y 1 5
7 oranges 7 X NA 7
8 oranges 8 X NA 8
Expected Output
For example, for row 3, row 2 would be the closest that meets the conditions, so multiply 4 by 11 (to get 44). For row 4, row 3 does not meet the conditions, so we go to row 2, and multiply 3 by 11 (to get 33). And so on.
name id tag code value results
1 apples 1 X 1 1 NA
2 apples 2 Y 1 11 NA
3 apples 3 Y NA 4 44
4 apples 4 X 1 3 33
5 oranges 5 X NA 9 NA
6 oranges 6 Y 1 5 NA
7 oranges 7 X NA 7 35
8 oranges 8 X NA 8 40
I am guessing that I will need to use cumsum
and/or fill
, but unsure how to use it here. I know that if I was preforming a calculation on the previous row, then I could use lag
, but unsure how to search multiple values above. I am open to base R, data.table
, tidyverse
, or other solutions.
df %>%
group_by(name) %>%
mutate(t = na_if(lag(value * (tag == 'Y' & !is.na(code))), 0)) %>%
fill(t) %>%
mutate(results = t * value)
# A tibble: 8 x 7
# Groups: name [2]
name id tag code value t results
<chr> <int> <chr> <dbl> <dbl> <dbl> <dbl>
1 apples 1 X 1 1 NA NA
2 apples 2 Y 1 11 NA NA
3 apples 3 Y NA 4 11 44
4 apples 4 X 1 3 11 33
5 oranges 5 X NA 9 NA NA
6 oranges 6 Y 1 5 NA NA
7 oranges 7 X NA 7 5 35
8 oranges 8 X NA 8 5 40
with data.table
:
library(data.table)
setDT(df)
df[,result:=value*shift(nafill(fifelse(tag=='Y'&!is.na(code),value,NA),type = 'locf')),
by=name][]
name id tag code value result
<char> <int> <char> <num> <num> <num>
1: apples 1 X 1 1 NA
2: apples 2 Y 1 11 NA
3: apples 3 Y NA 4 44
4: apples 4 X 1 3 33
5: oranges 5 X NA 9 NA
6: oranges 6 Y 1 5 NA
7: oranges 7 X NA 7 35
8: oranges 8 X NA 8 40
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With