Suppose I have the following dataset:
library(dplyr)
library(zoo)
df <- data.frame(date = seq.Date(from = "2025-01-01", to = "2025-01-10"),
                 value = 1:10)
df
#> date value
#> 1 2025-01-01 1
#> 2 2025-01-02 2
#> 3 2025-01-03 3
#> 4 2025-01-04 4
#> 5 2025-01-05 5
#> 6 2025-01-06 6
#> 7 2025-01-07 7
#> 8 2025-01-08 8
#> 9 2025-01-09 9
#> 10 2025-01-10 10
When I calculate the simple moving average for, let's say, the last 5 observations, this is what I get:
df |>
  mutate(value_roll = rollapply(value, width = 5, FUN = mean, fill = NA, align = "right"))
#> date value value_roll
#> 1 2025-01-01 1 NA
#> 2 2025-01-02 2 NA
#> 3 2025-01-03 3 NA
#> 4 2025-01-04 4 NA
#> 5 2025-01-05 5 3
#> 6 2025-01-06 6 4
#> 7 2025-01-07 7 5
#> 8 2025-01-08 8 6
#> 9 2025-01-09 9 7
#> 10 2025-01-10 10 8
As expected, the first 4 values are NA. However, for a simple moving average of order k, I'd like the first k-1 values to be the simple moving averages of order j, for j = 1, ..., k-1 (that is, the j-th value is the mean of the first j observations). For example,
#> date value value_roll
#> 1 2025-01-01 1 1
#> 2 2025-01-02 2 1.5
#> 3 2025-01-03 3 2
#> 4 2025-01-04 4 2.5
#> 5 2025-01-05 5 3
#> 6 2025-01-06 6 4
#> 7 2025-01-07 7 5
#> 8 2025-01-08 8 6
#> 9 2025-01-09 9 7
#> 10 2025-01-10 10 8
Also, I want to be able to pass other functions to the FUN argument of rollapply, such as sum (i.e., the sum of the last k values) and sd (i.e., the standard deviation of the last k values), with the first k-1 values handled in the same fashion as for the moving average.
Is there a simple way to do this in R? I bet there is, but I couldn't come up with one. Everything I tried was too complex for my taste.
From help(zoo::rollapply):
partial
logical or numeric. If FALSE (default) then FUN is only applied when all indexes of the rolling window are within the observed time range. If TRUE, then the subset of indexes that are in range are passed to FUN. A numeric argument to partial can be used to determine the minimal window size for partial computations. See below for more details.
e.g.
data.frame(date = seq.Date(from='2025-01-01', to='2025-01-10'), value = 1:10) |>
  transform(value_roll = zoo::rollapply(
    value, width=5, FUN=mean, fill=NA, align='right', partial=TRUE))
gives
date value value_roll
1 2025-01-01 1 1.0
2 2025-01-02 2 1.5
3 2025-01-03 3 2.0
4 2025-01-04 4 2.5
5 2025-01-05 5 3.0
6 2025-01-06 6 4.0
7 2025-01-07 7 5.0
8 2025-01-08 8 6.0
9 2025-01-09 9 7.0
10 2025-01-10 10 8.0
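The question also asks about sum and sd. Here is a minimal sketch of the same partial = TRUE approach with those functions. It uses rollapplyr(), zoo's right-aligned wrapper around rollapply(), and builds the dates with as.Date() and by = "day" only so that it should also run on older R versions; the column names sum_roll and sd_roll are my own choice.

```r
library(zoo)

df <- data.frame(
  date  = seq(as.Date("2025-01-01"), as.Date("2025-01-10"), by = "day"),
  value = 1:10
)

# partial = TRUE passes the shorter, in-range window to FUN
# for the first width - 1 rows instead of returning NA
df$sum_roll <- rollapplyr(df$value, width = 5, FUN = sum, partial = TRUE)
df$sd_roll  <- rollapplyr(df$value, width = 5, FUN = sd,  partial = TRUE)

df$sum_roll
#> [1]  1  3  6 10 15 20 25 30 35 40
```

Keep in mind that sd() of a single observation is NA, so the first element of sd_roll is still NA even with partial = TRUE.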
Note
Session info added in response to the comments on this answer.
> sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Berlin
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] zoo_1.8-14 compiler_4.5.0 tools_4.5.0 grid_4.5.0 lattice_0.22-6
You can create a "sub-frame" with slider::slide_dfr() with all your windowed calculations; mutate(), when called without a name, unnests it to columns:
tibble::tibble(date = seq.Date(from = "2025-01-01", to = "2025-01-10"), value = 1:10) |>
  dplyr::mutate(
    slider::slide_dfr(
      value,
      .f = \(x) data.frame(mean_roll = mean(x), sum_roll = sum(x), sd_roll = sd(x)),
      .before = 4
    )
  )
#> # A tibble: 10 × 5
#> date value mean_roll sum_roll sd_roll
#> <date> <int> <dbl> <int> <dbl>
#> 1 2025-01-01 1 1 1 NA
#> 2 2025-01-02 2 1.5 3 0.707
#> 3 2025-01-03 3 2 6 1
#> 4 2025-01-04 4 2.5 10 1.29
#> 5 2025-01-05 5 3 15 1.58
#> 6 2025-01-06 6 4 20 1.58
#> 7 2025-01-07 7 5 25 1.58
#> 8 2025-01-08 8 6 30 1.58
#> 9 2025-01-09 9 7 35 1.58
#> 10 2025-01-10 10 8 40 1.58
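For comparison, the same "shorter window at the start" logic can be sketched in base R without any extra package. roll_partial() below is a hypothetical helper written for illustration, not a function from zoo or slider, and it assumes fun returns a single number for each window.

```r
# Hypothetical helper: apply `fun` to the last `k` observations,
# falling back to a shorter window where fewer than `k` are available.
roll_partial <- function(x, k, fun) {
  vapply(
    seq_along(x),
    function(i) as.numeric(fun(x[max(1L, i - k + 1L):i])),
    numeric(1)
  )
}

value <- 1:10
roll_partial(value, 5, mean)
#> [1] 1.0 1.5 2.0 2.5 3.0 4.0 5.0 6.0 7.0 8.0
roll_partial(value, 5, sum)
#> [1]  1  3  6 10 15 20 25 30 35 40
```

This loops over every position in R code, so on long series the zoo or slider versions are probably the better choice; the sketch is mainly useful for seeing exactly which observations end up in each window.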