I have a large data.table, similar in structure to df:
library("data.table")
df <- data.frame(part = c("A", "B", "A", "C", "A", "D", "B", "D", "E"),
day = c(1, 2, 3, 4, 5, 6, 6, 7, 15),
code = c("S", "S", "P", "X", "P", "S", "P", "P", "P"))
setDT(df)
df
part day code
1: A 1 S
2: B 2 S
3: A 3 P
4: C 4 X
5: A 5 P
6: D 6 S
7: B 6 P
8: D 7 P
9: E 15 P
How can I add a column that flags records where code = S and the same part has code = P within 3 subsequent days? Expected result:
part day code flag
1: A 1 S TRUE
2: B 2 S FALSE
3: A 3 P FALSE
4: C 4 X FALSE
5: A 5 P FALSE
6: D 6 S TRUE
7: B 6 P FALSE
8: D 7 P FALSE
9: E 15 P FALSE
I think this does it
df[, v := FALSE ]
df[code == "S", v := !is.na(
df[code == "P"][df[code == "S"], on=c("part", "day"), roll=-3, which=TRUE]
)]
part day code v
1: A 1 S TRUE
2: B 2 S FALSE
3: A 3 P FALSE
4: C 4 X FALSE
5: A 5 P FALSE
6: D 6 S TRUE
7: B 6 P FALSE
8: D 7 P FALSE
9: E 15 P FALSE
How it works !is.na(x[i, which=TRUE]) tells us whether each row of i found a match in x. (This might break down if i finds multiple matches.) The roll part expands the scope of matches to cover rows that are nearby in terms of the last column being joined on.
I'm not sure that the roll value there is correct since I have never used it this way.
This can work too-
df$v <- as.logical((df$code== "S") *
c(sapply(seq(1:(nrow(df)-2)), function(x)
{
max(df[(x:x+2),"code"] == "P")
}),
df[nrow(df)-1,"code"]=="P",
df[nrow(df),"code"]=="P"))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With