I'm tracking violent events around a treatment-week to see if the treatment has an effect on the event count around it. The table is broken into country-week and records violent events for most countries between 1989-2019. It is similar to this, but with a total of approx. 120 countries and 70k rows:
Country Year Week Event_Count Treatment
------- ---- ---- ----------- ---------
Angola 1995 33 3 NA
Angola 1995 34 2 NA
Angola 1995 35 3 NA
Angola 1995 36 0 NA
Angola 1995 37 4 1
Angola 1995 38 1 NA
Angola 1995 39 0 NA
Angola 1995 40 2 NA
Angola 1995 41 3 NA
I'd like to find the average weekly event count broken into four-week intervals around the treatment week (in this case, week 37 denoted by the "1" in the Treatment column) up until twelve weeks in both directions. In other words, weeks -1 to -4 would be 2 events as this is the average weekly event count for weeks 33 thru 36. Week 0 (in this case, week 37) would be 4. Weeks 1 to 4 (38 thru 41) would be 1.5. And weeks -5 to -8 (weeks 29 to 32), 5 to 8 (42 to 45), -9 to -12 (25 to 28), and 9 to 12 (46 to 49) would be the average weekly values within their respective bins. I'm using a regression discontinuity design, so I'd like to do the same thing for countries that just missed receiving the treatment, denoted by "0" in the treatment column. All other values in the treatment column are "NA". I was thinking about coding like this:
aggregate(df[row_interval, column_interval], list(df$Country), mean)
but since I'll be looking at several treatment weeks within the same country, and the treatment weeks change regularly (e.g., Angola, 1995 Week 37; Algeria, 1998 Week 12, etc.), this wouldn't work. Ideally, I'd like to place the output for treatments of both "0" and "1" in the same row as the treatment week. For example:
Country Year Week Event_Count Treatment -12to-9 -8to-5 -4to-1 0 1to4 5to8 9to12
------- ---- ---- ----------- --------- ------- ------ ------ - ---- ---- -----
Algeria 2002 14 4 0 3 0.5 1 4 2 5 2
I'm trying to put the outputs for similar weeks in the same column, regardless of receiving "0" or "1" for treatment. E.g., all values for weeks 1 to 4 after the treatment would fall under the "1to4" column.
Thank you in advance for your help! I'm fairly new to stack and r, so I apologize for any confusion.
I write longer data that contains total 25 weeks.
Country Year Week Event_Count Treatment
<chr> <dbl> <dbl> <dbl> <dbl>
1 Angola 1995 25 3 NA
2 Angola 1995 26 2 NA
3 Angola 1995 27 4 NA
4 Angola 1995 28 1 NA
5 Angola 1995 29 0 NA
6 Angola 1995 30 4 NA
7 Angola 1995 31 1 NA
8 Angola 1995 32 0 NA
9 Angola 1995 33 3 NA
10 Angola 1995 34 2 NA
# ... with 15 more rows
Same as your example, Week 37 has Treatment.
Define function gola as,
gola <- function(df) {
idx <- which(!is.na(df$Treatment))
res <- c()
for (i in idx){
trt <- df$Event_Count
x.3 <- mean(trt[(i-12):(i-9)])
x.2 <- mean(trt[(i-8):(i-5)])
x.1 <- mean(trt[(i-4):(i-1)])
x <- trt[i]
x1 <- mean(trt[(i+1):(i+4)])
x2 <- mean(trt[(i+5):(i+8)])
x3 <- mean(trt[(i+9):(i+12)])
res <- rbind(res,c(x.3, x.2, x.1, x, x1, x2, x3))
}
colnames(res) <- c("_12to_9", "_8to_5", "_4to_1", "0", "1to4", "5to8", "9to12")
res <- cbind(df[idx,], res)
return(res)
}
Then, gola(df) will return
Country Year Week Event_Count Treatment _12to_9 _8to_5 _4to_1 0 1to4 5to8 9to12
1 Angola 1995 37 4 1 2.5 1.25 2 4 1.5 2 1.75
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With