I have a dataset comprised of students (id) and the grade they where in every year:
library(data.table)
set.seed(1)
students <- data.table("id" = rep(1:10, each = 10),
"year" = rep(2000:2009, 10),
"grade" = sample(c(9:11, rep(NA, 5)), 100, replace = T))
Here is a sample for student 1:
id year grade
1: 1 2000 9
2: 1 2001 NA
3: 1 2002 NA
4: 1 2003 9
5: 1 2004 10
6: 1 2005 NA
7: 1 2006 NA
8: 1 2007 11
9: 1 2008 NA
I would like to have a way to access each students prior and future grades to preform different operations. Say for example, adding the last three grades of the student. This would result in a dataset like this one:
id year grade sum_lag_3
1: 1 2000 9 9 # 1st window, size 1: 9
2: 1 2001 NA 9
3: 1 2002 NA 9
4: 1 2003 9 18 # 2nd, size 2: 9 + 9 = 18
5: 1 2004 10 28 # 3rd, size 3: 9 + 9 + 10 = 28
6: 1 2005 NA 28
7: 1 2006 NA 28
8: 1 2007 11 30 # 4th, size 3: 9 + 10 + 11 = 30
9: 1 2008 NA 30
10: 1 2009 10 31 # 5th, size 3: 10 + 11 + 10 = 31
11: 2 2001 11 11 # 1st window, size 1: 11
(All results would look like this).
So in the case of the first row, since there are no previous observations this would mean the 'past' vector is empty but the 'future' vector one would be NA NA 9 10 NA NA 11 NA 10.
Similarly, for the second row the 'past' vector would be 9 and the 'future' vector would be:
NA 9 10 NA NA 11 NA 10
And for the third row the 'past' vector would be 9 NA and the 'future' vector would be:
9 10 NA NA 11 NA 10
This is the information I want reference to make different calculations. Calculations that are only within each group and vary depending on the context. Preferably I would like to do this using data.table and without reshaping my data in to a wide format.
I've tried doing the following:
students[, .SD[, sum_last_3:= ...], by = id]
but I get an error message saying this feature is not yet available on data.table (where ... is a placeholder for any operation.).
Thank you all.
Similar to @chinsoon12, but using zoo::rollapply to easily apply sum to a partial window.
d[!is.na(grade), rs := rollapply(grade, 3, sum, align = "right", partial = TRUE), by = id]
d[ , rs := nafill(rs, type = "locf"), by = id]
# id year grade sum_lag_3 rs
# 1: 1 2000 9 9 9
# 2: 1 2001 NA 9 9
# 3: 1 2002 NA 9 9
# 4: 1 2003 9 18 18
# 5: 1 2004 10 28 28
# 6: 1 2005 NA 28 28
# 7: 1 2006 NA 28 28
# 8: 1 2007 11 30 30
# 9: 1 2008 NA 30 30
# 10: 1 2009 10 31 31
# 11: 2 2001 11 11 11
In data.table::frollsum, "partial window feature is not supported, although it can be accomplished by using adaptive=TRUE", and an adaptive rolling function (see ?frollsum):
arf = function(n, len) if(len < n) seq.int(len) else c(seq.int(n), rep(n, len - n))
# if no 'grade' is shorter than n (the full window width), you only need:
# c(seq.int(n), rep(n, len - n))
d[!is.na(grade) , rs2 := frollsum(grade, n = arf(3, .N), align = "right", adaptive = TRUE),
by = id]
d[ , rs2 := nafill(rs, type = "locf"), by = id]
# id year grade sum_lag_3 rs rs2
# 1: 1 2000 9 9 9 9
# 2: 1 2001 NA 9 9 9
# 3: 1 2002 NA 9 9 9
# 4: 1 2003 9 18 18 18
# 5: 1 2004 10 28 28 28
# 6: 1 2005 NA 28 28 28
# 7: 1 2006 NA 28 28 28
# 8: 1 2007 11 30 30 30
# 9: 1 2008 NA 30 30 30
# 10: 1 2009 10 31 31 31
# 11: 2 2001 11 11 11 11
A note on your comment:
I want to be able to preform operations utilizing the past and future of a student for all kinds of operations not just a sum
In zoo::rollapply you can put other functions in the FUN argument. Currently the data.table equivalent, frollapply, does not have the adaptive argument. Thus, the method I used for frollsum above can not yet be applied in frollapply.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With