
Calculate extra values for moving average (and other functions)

Suppose I have the following dataset:

library(dplyr)
library(zoo)

# as.Date() and an explicit `by` keep this portable across R versions
df <- data.frame(date = seq(as.Date("2025-01-01"), as.Date("2025-01-10"), by = "day"),
                 value = 1:10)

df
#>          date value
#> 1  2025-01-01     1
#> 2  2025-01-02     2
#> 3  2025-01-03     3
#> 4  2025-01-04     4
#> 5  2025-01-05     5
#> 6  2025-01-06     6
#> 7  2025-01-07     7
#> 8  2025-01-08     8
#> 9  2025-01-09     9
#> 10 2025-01-10    10

When I calculate the simple moving average for, let's say, the last 5 observations, this is what I get:

df |> 
  mutate(value_roll = rollapply(value, width = 5, FUN = mean, fill = NA, align = "right"))
#>          date value value_roll
#> 1  2025-01-01     1         NA
#> 2  2025-01-02     2         NA
#> 3  2025-01-03     3         NA
#> 4  2025-01-04     4         NA
#> 5  2025-01-05     5          3
#> 6  2025-01-06     6          4
#> 7  2025-01-07     7          5
#> 8  2025-01-08     8          6
#> 9  2025-01-09     9          7
#> 10 2025-01-10    10          8

As expected, the first 4 values are NA. However, for a simple moving average of order k, I'd like the first k-1 values to be the simple moving averages of order j, for j = 1, ..., k-1, computed over the observations available so far. For example,

#>          date value   value_roll
#> 1  2025-01-01     1          1
#> 2  2025-01-02     2          1.5
#> 3  2025-01-03     3          2
#> 4  2025-01-04     4          2.5
#> 5  2025-01-05     5          3
#> 6  2025-01-06     6          4
#> 7  2025-01-07     7          5
#> 8  2025-01-08     8          6
#> 9  2025-01-09     9          7
#> 10 2025-01-10    10          8

Also, I want to be able to pass other functions to the FUN argument of rollapply, such as sum (i.e., the sum of the last k values) and sd (i.e., the standard deviation of the last k values), with the first k-1 values handled in the same fashion as for the moving average.

Is there a simple way to do this in R? I bet there is, but I couldn't come up with one. Every idea I had was too complex for my taste.

Marcus Nunes asked Oct 30 '25 17:10


2 Answers

From help(zoo::rollapply):

partial
logical or numeric. If FALSE (default) then FUN is only applied when all indexes of the rolling window are within the observed time range. If TRUE, then the subset of indexes that are in range are passed to FUN. A numeric argument to partial can be used to determine the minimal window size for partial computations. See below for more details.

e.g.

data.frame(date = seq.Date(from='2025-01-01', to='2025-01-10'), value = 1:10) |>
  transform(value_roll = zoo::rollapply(
    value, width=5, FUN=mean, fill=NA, align='right', partial=TRUE))

gives

         date value value_roll
1  2025-01-01     1        1.0
2  2025-01-02     2        1.5
3  2025-01-03     3        2.0
4  2025-01-04     4        2.5
5  2025-01-05     5        3.0
6  2025-01-06     6        4.0
7  2025-01-07     7        5.0
8  2025-01-08     8        6.0
9  2025-01-09     9        7.0
10 2025-01-10    10        8.0
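
The same argument should carry over to the other functions the question mentions. The following is a minimal sketch, assuming the zoo package is installed; `roll_right` is a hypothetical helper name introduced here for illustration, not part of zoo:

```r
library(zoo)

df <- data.frame(
  date  = seq(as.Date("2025-01-01"), as.Date("2025-01-10"), by = "day"),
  value = 1:10
)

# Hypothetical wrapper: right-aligned rolling window of width k that
# also evaluates FUN on the shorter windows at the start of the series.
roll_right <- function(x, k, FUN) {
  rollapply(x, width = k, FUN = FUN, fill = NA, align = "right", partial = TRUE)
}

res <- transform(
  df,
  mean_roll = roll_right(value, 5, mean),
  sum_roll  = roll_right(value, 5, sum),
  sd_roll   = roll_right(value, 5, sd)
)
res
```

Note that `sd_roll` is NA in the first row, because the standard deviation of a single observation is undefined.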

Note

Edit to add session info, see comments below this answer.

> sessionInfo()
R version 4.5.0 (2025-04-11)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.2

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] zoo_1.8-14     compiler_4.5.0 tools_4.5.0    grid_4.5.0     lattice_0.22-6

Friede answered Nov 01 '25 07:11


You can build a "sub-frame" of all your windowed calculations with slider::slide_dfr(). When mutate() receives an unnamed data-frame argument, it unpacks it into separate columns:

tibble::tibble(date = seq.Date(from = "2025-01-01", to = "2025-01-10"), value = 1:10) |>
  dplyr::mutate(
    slider::slide_dfr(
      value, 
      .f = \(x) data.frame(mean_roll = mean(x), sum_roll = sum(x), sd_roll = sd(x)), 
      .before = 4
    )
  )
#> # A tibble: 10 × 5
#>    date       value mean_roll sum_roll sd_roll
#>    <date>     <int>     <dbl>    <int>   <dbl>
#>  1 2025-01-01     1       1          1  NA    
#>  2 2025-01-02     2       1.5        3   0.707
#>  3 2025-01-03     3       2          6   1    
#>  4 2025-01-04     4       2.5       10   1.29 
#>  5 2025-01-05     5       3         15   1.58 
#>  6 2025-01-06     6       4         20   1.58 
#>  7 2025-01-07     7       5         25   1.58 
#>  8 2025-01-08     8       6         30   1.58 
#>  9 2025-01-09     9       7         35   1.58 
#> 10 2025-01-10    10       8         40   1.58
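
To see what each window contains, here is a dependency-free sketch of the same idea in base R. It is an illustration only, not how slider or zoo work internally, and `roll_partial` is a hypothetical helper name:

```r
# Hypothetical helper: apply FUN to the current value and up to k - 1
# preceding values, so the first k - 1 windows are simply shorter.
roll_partial <- function(x, k, FUN, ...) {
  vapply(seq_along(x), function(i) {
    window <- x[max(1L, i - k + 1L):i]  # truncated at the start of the series
    as.numeric(FUN(window, ...))
  }, numeric(1))
}

value <- 1:10
mean_roll <- roll_partial(value, 5, mean)
sum_roll  <- roll_partial(value, 5, sum)
sd_roll   <- roll_partial(value, 5, sd)

mean_roll
#> [1] 1.0 1.5 2.0 2.5 3.0 4.0 5.0 6.0 7.0 8.0
sum_roll
#> [1]  1  3  6 10 15 20 25 30 35 40
```

A window of width 5 corresponds to `.before = 4` above: the current observation plus up to four preceding ones.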

margusl answered Nov 01 '25 06:11