I'm having some trouble understanding data.table's rollends argument when doing a rolling join.
The docs for reference:
A logical vector length 2 (a single logical is recycled) indicating whether values falling before the first value or after the last value for a group should be rolled as well.
If rollends[2]=TRUE, it will roll the last value forward. TRUE by default for LOCF and FALSE for NOCB rolls.
If rollends[1]=TRUE, it will roll the first value backward. TRUE by default for NOCB and FALSE for LOCF rolls.
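The sketch below isolates what each element of rollends controls. It uses a small lookup table x with keys 2 and 4, and queries it with one key before the first value, one in the middle, and one after the last. The names x, i, k and v are made up for illustration. Only roll and rollends come from data.table. It assumes that x[i] joins on key(x), which is what data.table does when no `on` is given.

```r
library(data.table)

# Hypothetical lookup table keyed on k: "a" at k = 2, "b" at k = 4
x <- data.table(k = c(2L, 4L), v = c("a", "b"), key = "k")
# Hypothetical query keys: 1 is before the first key, 3 is in the middle, 5 is after the last
i <- data.table(k = c(1L, 3L, 5L))

# roll = TRUE (LOCF) with its default rollends = c(FALSE, TRUE)
default_ends <- x[i, roll = TRUE]$v
# rollends[1] = TRUE rolls "a" backward; rollends[2] = FALSE leaves k = 5 unfilled
first_only <- x[i, roll = TRUE, rollends = c(TRUE, FALSE)]$v
# A single logical is recycled, so FALSE fills only the middle query
neither <- x[i, roll = TRUE, rollends = FALSE]$v

print(default_ends)  # values: NA, "a", "b"
print(first_only)    # values: "a", "a", NA
print(neither)       # values: NA, "a", NA
```

The middle query (k = 3) is filled in every case. rollends only decides what happens to queries that fall outside the range of x's keys.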
Now a confusing example. Here, I build a table of commercials and two different tables of sales.
library(data.table)
# commercials
commercials<-data.table(commercialID=c("C1","C2","C3","C4"), commercialDate=as.Date(c("2014-1-1","2014-4-1","2014-7-1","2014-9-15")))
commercials[, rollDate:=commercialDate] #Add a column, rollDate equal to commercialDate
setkey(commercials, "rollDate")
commercials
   commercialID commercialDate   rollDate
1:           C1     2014-01-01 2014-01-01
2:           C2     2014-04-01 2014-04-01
3:           C3     2014-07-01 2014-07-01
4:           C4     2014-09-15 2014-09-15
# sales1 (A single sale before all commercials)
sales1 <- data.table(saleID=c("S0"), saleDate=as.Date(c("2010-12-31")))
sales1[, rollDate:=saleDate]
setkey(sales1, "rollDate")
sales1
   saleID   saleDate   rollDate
1:     S0 2010-12-31 2010-12-31
# sales2 (A sale before all commercials and a sale after commercial C1)
sales2 <- data.table(saleID=c("S0", "S1"), saleDate=as.Date(c("2010-12-31", "2014-2-1")))
sales2[, rollDate:=saleDate]
setkey(sales2, "rollDate")
sales2
   saleID   saleDate   rollDate
1:     S0 2010-12-31 2010-12-31
2:     S1 2014-02-01 2014-02-01
Now for some rolling joins:
sales1[commercials, roll=TRUE, rollends=c(TRUE, FALSE)]
   saleID saleDate   rollDate commercialID commercialDate
1:     NA     <NA> 2014-01-01           C1     2014-01-01
2:     NA     <NA> 2014-04-01           C2     2014-04-01
3:     NA     <NA> 2014-07-01           C3     2014-07-01
4:     NA     <NA> 2014-09-15           C4     2014-09-15
sales2[commercials, roll=TRUE, rollends=c(TRUE, FALSE)]
   saleID   saleDate   rollDate commercialID commercialDate
1:     S0 2010-12-31 2014-01-01           C1     2014-01-01
2:     NA       <NA> 2014-04-01           C2     2014-04-01
3:     NA       <NA> 2014-07-01           C3     2014-07-01
4:     NA       <NA> 2014-09-15           C4     2014-09-15
I can't figure out what rollends is doing here. Oh, and I'm currently using the development version, 1.9.7.
In the first case,
sales1[commercials, roll=TRUE, rollends=c(TRUE, FALSE)]
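the 2014-01-01 row in commercials falls after 2010-12-31, so with roll=TRUE the prevailing value (S0) would normally be carried forward. But that date also falls beyond the end of sales1, i.e., after its last key value, and you've set rollends[2] = FALSE, so S0 is not rolled forward. The same applies to the other three commercials, which is why every row comes back NA.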
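One way to confirm this is to rerun the first join with rollends[2] switched to TRUE, which is the default for roll=TRUE. The sketch below rebuilds the question's commercials and sales1 tables. res_false and res_true are illustrative names only.

```r
library(data.table)

# Rebuild the question's tables
commercials <- data.table(commercialID = c("C1", "C2", "C3", "C4"),
                          commercialDate = as.Date(c("2014-01-01", "2014-04-01",
                                                     "2014-07-01", "2014-09-15")))
commercials[, rollDate := commercialDate]
setkey(commercials, "rollDate")

sales1 <- data.table(saleID = "S0", saleDate = as.Date("2010-12-31"))
sales1[, rollDate := saleDate]
setkey(sales1, "rollDate")

# As in the question: every commercial lies after the last sale, so nothing is filled
res_false <- sales1[commercials, roll = TRUE, rollends = c(TRUE, FALSE)]
# With rollends[2] = TRUE, S0 is carried forward past the end of sales1
res_true <- sales1[commercials, roll = TRUE, rollends = c(TRUE, TRUE)]

print(res_false$saleID)  # values: NA, NA, NA, NA
print(res_true$saleID)   # values: "S0", "S0", "S0", "S0"
```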
In the second case,
sales2[commercials, roll=TRUE, rollends=c(TRUE, FALSE)]
the 2014-01-01 row in commercials falls between 2010-12-31 and 2014-02-01. The rollends argument has no effect on this row because it doesn't fall beyond either end of sales2, so the prevailing value (S0) is rolled forward as usual.
All the other commercial dates fall after the last sale (2014-02-01), i.e., outside the range of sales2, so rollends comes into play. Since rollends[2] = FALSE, the last value (S1) isn't rolled forward and those rows are NA.
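The sketch below checks which commercials fall outside sales2 and are therefore subject to rollends. It compares each commercial date with the last sale date. It also shows that setting rollends[2] = TRUE fills C2 to C4 with S1. The tables are rebuilt from the question. beyond_end, res_orig and res_both are illustrative names only.

```r
library(data.table)

# Rebuild the question's tables
commercials <- data.table(commercialID = c("C1", "C2", "C3", "C4"),
                          commercialDate = as.Date(c("2014-01-01", "2014-04-01",
                                                     "2014-07-01", "2014-09-15")))
commercials[, rollDate := commercialDate]
setkey(commercials, "rollDate")

sales2 <- data.table(saleID = c("S0", "S1"),
                     saleDate = as.Date(c("2010-12-31", "2014-02-01")))
sales2[, rollDate := saleDate]
setkey(sales2, "rollDate")

# Commercials after the last sale date are the ones rollends[2] decides about
beyond_end <- commercials$rollDate > max(sales2$rollDate)

# Original join: C1 is in the middle (gets S0); C2 to C4 are beyond the end (NA)
res_orig <- sales2[commercials, roll = TRUE, rollends = c(TRUE, FALSE)]
# Allowing the last value to roll forward fills C2 to C4 with S1
res_both <- sales2[commercials, roll = TRUE, rollends = c(TRUE, TRUE)]

print(beyond_end)        # values: FALSE, TRUE, TRUE, TRUE
print(res_orig$saleID)   # values: "S0", NA, NA, NA
print(res_both$saleID)   # values: "S0", "S1", "S1", "S1"
```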