Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Calculating expired vouchers, given a fixed window

Tags:

sql

sql-server

r

Problem:

I have time series data for vouchers being earned and redeemed. Vouchers only last for a 3 day fixed window including the day they were earnt eg. a voucher earnt on 1st Jan would be active for 1st, 2nd & 3rd.

I need to make an assumption that it's first come first served when it comes to redemptions happening against them being earned. E.g. if we have the data

Date       VouchersEarned VouchersRedeemed
01/01/2020             10                0
02/01/2020              8                9
03/01/2020              4                4
04/01/2020              2                4
05/01/2020              1                4

then on 2nd, those 9 vouchers are from the 1st ie we have remaining vouchers

Date       VouchersEarned VouchersRedeemed RemainingVouchers
01/01/2020             10                0                 1
02/01/2020              8                9                 8

then on the 3rd, those 4 redemptions will be the remaining 1 from the 1st and 3 from the 2nd

Date       VouchersEarned VouchersRedeemed RemainingVouchers
01/01/2020             10                0                 0
02/01/2020              8                9                 5
03/01/2020              4                4                 0

the 4th:

Date       VouchersEarned VouchersRedeemed RemainingVouchers
01/01/2020             10                0                 0
02/01/2020              8                9                 1
03/01/2020              4                4                 4
04/01/2020              2                4                 2

the 5th:

Date       VouchersEarned VouchersRedeemed RemainingVouchers
01/01/2020             10                0                 0
02/01/2020              8                9                 1**
03/01/2020              4                4                 0
04/01/2020              2                4                 2
05/01/2020              1                4                 1

So on the 5th we have our first expired voucher as the ** one wasn't used up in it's 3 day period.

I need to calculate at any point in time how many vouchers there are that expire using this logic flow. I can figure out how to do it in my head, going through line by line like above. But I'm struggling to see how to do it in a vectorised way, that could also work in MSSQL. If it was just in R I could brute force it with a loop. I have also posted this on Cross Validated but had no feedback yet so extending this to the programming community.

Data:

I have included an R dput at the end but here is what it looks like Edit: There is some new data at the end of this summary that's a more tricky example

        Date VouchersEarned VouchersRedeemed ActualActive ActualExpired CumulativeEarned
1 01/01/2020             10                0           10             0               10
2 02/01/2020              8                9            9             0               18
3 03/01/2020              4                4            9             0               22
4 04/01/2020              2                4            7             0               24
5 05/01/2020              1                4            3             1               25
6 06/01/2020              0                1            2             0               25
7 07/01/2020              0                1            0             1               25
8 08/01/2020              0                0            0             0               25
  CumulativeRedeemed CumulativeDiff
1                  0             10
2                  9              9
3                 13              9
4                 17              7
5                 21              4
6                 22              3
7                 23              2
8                 23              2

ActualActive & ActualExpired are the numbers I get to with pen and paper. Note that I put them on the date they expired, rather than against the date they were earned. Either would work for me, just changes the reporting view. I can get the total vouchers in play by looking at the CumulativeEarned & CumulativeRedeemed and then taking the difference. I think then if I can just get the Expired ones, then calculating the active is simple.

If anyone has any ideas I'd really appreciate it as I seem to have a mental block on this today! Thank you! :)

Edit: My actual problem is a 28 day window, this is a simplified view :)

df <- structure(list(Date = c("01/01/2020", "02/01/2020", "03/01/2020", 
"04/01/2020", "05/01/2020", "06/01/2020", "07/01/2020", "08/01/2020"
), VouchersEarned = c(10L, 8L, 4L, 2L, 1L, 0L, 0L, 0L), VouchersRedeemed = c(0L, 
9L, 4L, 4L, 4L, 1L, 1L, 0L), ActualActive = c(10L, 9L, 9L, 7L, 
3L, 2L, 0L, 0L), ActualExpired = c(0L, 0L, 0L, 0L, 1L, 0L, 1L, 
0L), CumulativeEarned = c(10L, 18L, 22L, 24L, 25L, 25L, 25L, 
25L), CumulativeRedeemed = c(0L, 9L, 13L, 17L, 21L, 22L, 23L, 
23L), CumulativeDiff = c(10L, 9L, 9L, 7L, 4L, 3L, 2L, 2L)), class = "data.frame", row.names = c(NA, 
-8L))

Edit 2: My latest attempt in R. Something is off though, but I feel like it's not impossible with the right combination of lagged columns

library(data.table)
dt <- as.data.table(df)
dt[, Date := lubridate::dmy(Date)]  

# functional form
findExpiredVouchers <- function(dt, period=3){
  
  #  generation of cumulative data
  dt[, CumulativeEarned := cumsum(VouchersEarned)]
  dt[, CumulativeRedeemed := cumsum(VouchersRedeemed)]
  dt[, CumulativeDiff := CumulativeEarned-CumulativeRedeemed]
  
  # I think if we look at the cumulative earned against the cumulative redeemed,
  # plus it's lag period from that point, ie the cumulative redeemed in 2 days,
  # then we can see for data prior to last 3 which have expired
  dt[, LaggedCumulativeRedeemed := shift(CumulativeRedeemed, period-1, type="lead")]
  dt[, ExpiredCumulative := CumulativeEarned - LaggedCumulativeRedeemed]
  
  # Now this creates negative values though for eg the first case, I'm not 100%
  # if I need to worry about these
  dt[ExpiredCumulative < 0, ExpiredCumulative := 0]
  
  # so now it should be the difference in this series that captures the origin
  # of an expiration
  dt[, Expired := c(NA, diff(ExpiredCumulative))]
  dt[1, Expired := ExpiredCumulative]
  
  # and I can shift this by the lag period to get the end result
  dt[, OutputExpired := shift(Expired, period, type="lag")]
  dt[is.na(OutputExpired), OutputExpired := 0]
  
  # and active
  dt[, CumulativeExpired := cumsum(OutputExpired)]
  dt[, OutputActive := CumulativeDiff-CumulativeExpired]
}
dt <- findExpiredVouchers(dt, 3)
dt[] # OutputExpired & OutputActive

With some new fake data, flaws are exposed as negative expirations come out :(

set.seed(1)
p = 0.2
new_dt <- data.table(
  Date = 1:10,
  VouchersEarned = sample(0:15, 10, replace=TRUE)
  )
new_dt[, CumulativeEarned := cumsum(VouchersEarned)]

# fake VouchersRedeemed
new_dt[, VouchersRedeemed := as.integer(NA)]
new_dt[, CumulativeDiff := CumulativeEarned]
for(i in 1:nrow(new_dt)){
  new_value <- sum(rbinom(new_dt$CumulativeDiff[i], 1, p))
  new_dt[i, VouchersRedeemed := new_value]
  new_dt[i:.N, CumulativeDiff := CumulativeDiff - new_value]
}

new_dt <- findExpiredVouchers(dt=new_dt, 3)
new_dt[] # OutputExpired < 0

new example

    Date VouchersEarned VouchersRedeemed OutputExpired
 1:    1              8                1             0
 2:    2              3                1             0
 3:    3              6                3             0
 4:    4              0                1             3
 5:    5              1                1             2
 6:    6             12                5             5
 7:    7              6                5            -5
 8:    8             10                3            -4
 9:    9             13                7             9
10:   10              1                8            -1

Running Waldi's loop shows a similar result but the -5, -4 and 9 cancel to be 0's (as they should!)

like image 459
Jonny Phelps Avatar asked Jan 26 '26 00:01

Jonny Phelps


2 Answers

Using the tidyverse libraries, in particular dplyr, and magrittr, you can write this code

df %<>%  
  mutate(VouchersEarnedCumsum = cumsum(VouchersEarned),
         VouchersRedeemedCumsum = cumsum(VouchersRedeemed),
         VouchersCumsumDifference = VouchersEarnedCumsum - lead(VouchersRedeemedCumsum, 2, default = max(VouchersRedeemedCumsum)),
         VouchersCumsumDifference = as.numeric(VouchersCumsumDifference),
         VouchersCumsumDifference = case_when(VouchersCumsumDifference < 0 ~ 0, T ~ VouchersCumsumDifference),
         ExpiredVouchers = VouchersCumsumDifference - lag(VouchersCumsumDifference, default = 0),
         ExpiredVouchersInDate = lag(ExpiredVouchers, 3, default = 0))

The resulting table, starting from your dataframe df, is

    # A tibble: 8 x 8
      Date       VouchersEarned VouchersRedeemed VouchersEarnedCumsum VouchersRedeemedCumsum VouchersCumsumDifference ExpiredVouchers ExpiredVouchersInDate
      <chr>               <int>            <int>                <int>                  <int>                    <dbl>           <dbl>                 <dbl>
    1 01/01/2020             10                0                   10                      0                        0               0                     0
    2 02/01/2020              8                9                   18                      9                        1               1                     0
    3 03/01/2020              4                4                   22                     13                        1               0                     0
    4 04/01/2020              2                4                   24                     17                        2               1                     0
    5 05/01/2020              1                4                   25                     21                        2               0                     1
    6 06/01/2020              0                1                   25                     22                        2               0                     0
    7 07/01/2020              0                1                   25                     23                        2               0                     1
    8 08/01/2020              0                0                   25                     23                        2               0                     0

Then, it is quite easy to translate each function in corresponding SQL queries using dbplyr, with translate_sql.

I'm not sure this code works in every situation you're going to face, I recommend to test it extensively.

EDIT I think this corrected code solves the problem.

df %<>%  
  mutate(VouchersEarnedCumsum = cumsum(VouchersEarned),
         VouchersRedeemedCumsum = cumsum(VouchersRedeemed),
         VouchersCumsumDifference = VouchersEarnedCumsum - lead(VouchersRedeemedCumsum, 2, default = max(VouchersRedeemedCumsum)),
         VouchersCumsumDifference = as.numeric(VouchersCumsumDifference),
         VouchersCumsumDifference = case_when(VouchersCumsumDifference < 0 ~ 0, T ~ VouchersCumsumDifference),
         VouchersCumsumDifference = accumulate(VouchersCumsumDifference, max),
         ExpiredVouchers = VouchersCumsumDifference - lag(VouchersCumsumDifference, default = 0),
         ExpiredVouchersInDate = lag(ExpiredVouchers, 3, default = 0))
like image 145
Edoch Avatar answered Jan 28 '26 13:01

Edoch


If you just need the total vouchers expired at a point in time you can use the following query and get the value from the max date line. If not exactly what you need, perhaps it can be a good starting point.

declare @vouchers table (VDate date, VouchersEarned int, VouchersRedeemed int);
insert  @vouchers values
('2020-01-01', 10, 0),
('2020-01-02', 8, 9),
('2020-01-03', 4, 4),
('2020-01-04', 2, 4),
('2020-01-05', 1, 4),
('2020-01-06', 0, 1),
('2020-01-07', 0, 1),
('2020-01-08', 0, 0);

--select * from @vouchers;

declare @expiry_days int = 3;
declare @start_date date = '2020-01-01';
declare @end_date date = '2020-01-08';

with tally_cte (N) as ( --just a list of numbers 1-n
        --need at lease as many rows as dates in date range other options including a permanent tally table here: https://www.sqlservercentral.com/blogs/tally-tables-in-t-sql
        -- 1000 rows
        SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
        FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
        CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
), days_cte as (    --a list of all dates in date range to include records for days with no earned or redeemed
        select VDate = dateadd(day, t.N - 1, @start_date)
        from tally_cte t
        where t.N <= datediff(day, @start_date, @end_date) + 1
)
select  xed.VDate, xed.VouchersEarned, xed.VouchersRedeemed, xed.CumulativeEarned, xed.CumulativeRedeemed
                , Expiring = lag(xed.Expiring, 1, 0) over (order by xed.VDate)
from    (--determine expired vouchers with a lag
        select  xing.VDate, xing.VouchersEarned, xing.VouchersRedeemed, xing.CumulativeEarned, xing.CumulativeRedeemed
                , Expiring = case when xing.Expiring < 0 then 0 else xing.Expiring end
        from    (--determine expiring vouchers
                select  v.*
                        , Expiring = lag(v.CumulativeEarned, @expiry_days - 1, 0) over (order by v.VDate) - v.CumulativeRedeemed
                from    (--basic query with cumulative earned and redeemed
                        select  v.*
                                , CumulativeEarned = sum(v.VouchersEarned) over (order by v.VDate)
                                , CumulativeRedeemed = sum(v.VouchersRedeemed) over (order by v.vDate)
                        from    days_cte d
                        join    @vouchers v
                                on  v.VDate = d.VDate
                        ) v
                ) xing
        ) xed
order   by xed.VDate;

Output:

VDate      VouchersEarned VouchersRedeemed CumulativeEarned CumulativeRedeemed Expiring
---------- -------------- ---------------- ---------------- ------------------ -----------
2020-01-01 10             0                10               0                  0
2020-01-02 8              9                18               9                  0
2020-01-03 4              4                22               13                 0
2020-01-04 2              4                24               17                 0
2020-01-05 1              4                25               21                 1
2020-01-06 0              1                25               22                 1
2020-01-07 0              1                25               23                 2
2020-01-08 0              0                25               23                 2
like image 29
SMM Avatar answered Jan 28 '26 15:01

SMM



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!