Aggregate time series data, get average without NA or 0s in R

I am trying to aggregate my time series data into fixed intervals and take the average within each interval, ignoring NAs and 0s.

Here is what my data looks like.

                     V1      423 470  473 626
 1: 2018-01-01 00:00:00  0.00000   0  0.0   0
 2: 2018-01-01 00:01:00  8.00000   0 95.0   0
 3: 2018-01-01 00:02:00  0.00000   0  0.0   0
 4: 2018-01-01 00:03:00 31.00000   0 24.5   0
 5: 2018-01-01 00:04:00 37.00000  28 33.0  31

I am trying to aggregate into 5-minute intervals, and my expected output is

                   V1      423 470  473 626
  2018-01-01 00:05:00 34.00000  28 50.8  31
                         ~
: 2018-01-01 00:10:00        A   B    C   D

How do I aggregate the data into 5-minute intervals while taking the average excluding 0s and NAs?

EDIT

structure(list(V1 = c("2018-01-01 00:00:00", "2018-01-01 00:01:00", 
"2018-01-01 00:02:00", "2018-01-01 00:03:00", "2018-01-01 00:04:00", 
"2018-01-01 00:05:00", "2018-01-01 00:06:00", "2018-01-01 00:07:00", 
"2018-01-01 00:08:00", "2018-01-01 00:09:00", "2018-01-01 00:10:00", 
"2018-01-01 00:11:00", "2018-01-01 00:12:00", "2018-01-01 00:13:00", 
"2018-01-01 00:14:00", "2018-01-01 00:15:00", "2018-01-01 00:16:00", 
"2018-01-01 00:17:00", "2018-01-01 00:18:00", "2018-01-01 00:19:00"
), `423` = c(0, 8, 0, 31, 37, 31, 26.1111111111111, 39.375, 35.5, 
19.3, 21.5454545454545, 41.2, 31, 27.375, 31, 24.3076923076923, 
26.1666666666667, 24, 26.8, 30.8181818181818), `470` = c(0, 0, 
0, 0, 28, 0, 0, 0, 27, 21, 0, 21.5, 0, 0, 0, 0, 10, 46, 19.5, 
0), `473` = c(0, 95, 0, 24.5, 33, 55, 50, 0, 47, 45, 0, 0, 35.4, 
0, 0, 23, 32.5, 0, 0, 55), `626` = c(0, 0, 0, 0, 31, 26, 0, 0, 
0, 16, 0, 0, 0, 0, 75, 0, 0, 48, 0, 0)), row.names = c(NA, -20L
), class = c("data.table", 
"data.frame"))
Yun Hyunsoo asked Dec 10 '25 06:12


1 Answer

The following uses cut to turn column V1 into a grouping variable of 5-minute intervals, then summarizes each group with a custom function that computes the mean while ignoring NAs and zero values. In the full code below the function is written over two lines to make it more readable, but it could simply be

f <- function(x) mean(x[x != 0], na.rm = TRUE)
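
To see what this helper does on its own, here is a minimal sketch on a few small vectors. The first vector is the first five values of column 423 from the question; the others are made up for illustration. The variant f_na is a hypothetical addition, not part of the original answer: it assumes you would rather get NA than NaN when an interval contains only zeros or NAs.

```r
# Helper from the answer: mean of the non-zero, non-NA values.
f <- function(x) mean(x[x != 0], na.rm = TRUE)

f(c(0, 8, 0, 31, 37))   # averages 8, 31 and 37 only
f(c(0, NA, 4, 6))       # both the 0 and the NA are ignored
f(c(0, 0, 0))           # nothing left to average, so the result is NaN

# Hypothetical variant (an assumption, not from the answer):
# return NA instead of NaN when a group has no usable values.
f_na <- function(x) {
  y <- x[!is.na(x) & x != 0]
  if (length(y) == 0) NA_real_ else mean(y)
}
f_na(c(0, 0, 0))
```

Whether NaN or NA is the better marker for an empty interval depends on what the downstream code expects; either function can be passed to lapply(.SD, ...) in the same way.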

The date-time column V1 is first coerced to class "POSIXct".

library(data.table)

f <- function(x){
  y <- x[x != 0]
  mean(y, na.rm = TRUE)
}

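# Parse the character timestamps, then bin them into 5-minute groups.
# Note: := modifies df by reference, so V1 is overwritten in place.
df[, V1 := as.POSIXct(V1)]
df[, V1 := cut(V1, "5 mins")]
# Apply f to every remaining column within each 5-minute group.
df[, lapply(.SD, f), by = V1]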
#                    V1      423      470      473 626
#1: 2018-01-01 00:00:00 25.33333 28.00000 50.83333  31
#2: 2018-01-01 00:05:00 30.25722 24.00000 49.25000  21
#3: 2018-01-01 00:10:00 30.42409 21.50000 35.40000  75
#4: 2018-01-01 00:15:00 26.41851 25.16667 36.83333  48

A one-liner could be

df[, lapply(.SD, f), by = cut(as.POSIXct(V1), "5 mins")]
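
Note that cut labels each group by the start of its interval, whereas the expected output in the question labels the first window 2018-01-01 00:05:00, that is, by its end. If end labels are wanted, one possible sketch is to floor the timestamps numerically and shift by 300 seconds. This is an alternative to cut, not the method used above. The column names time and bin_end, the UTC time zone, and the reduced data set (first ten rows, columns 423 and 470 only) are assumptions made for illustration.

```r
library(data.table)

f <- function(x) mean(x[x != 0], na.rm = TRUE)

# First ten rows of the question's data, columns `423` and `470` only.
dt <- data.table(
  V1    = sprintf("2018-01-01 00:%02d:00", 0:9),
  `423` = c(0, 8, 0, 31, 37, 31, 26.1111111111111, 39.375, 35.5, 19.3),
  `470` = c(0, 0, 0, 0, 28, 0, 0, 0, 27, 21)
)

# Parse with an explicit time zone (UTC is an assumption; use your data's zone).
dt[, time := as.POSIXct(V1, tz = "UTC")]

# Floor each timestamp to its 5-minute window (300 s), then add 300 s
# so that the label is the end of the window rather than the start.
dt[, bin_end := as.POSIXct(floor(as.numeric(time) / 300) * 300 + 300,
                           origin = "1970-01-01", tz = "UTC")]

# Summarize only the value columns within each window.
res <- dt[, lapply(.SD, f), by = bin_end, .SDcols = c("423", "470")]
print(res)
```

The averages are the same as in the first two rows of the cut-based result; only the labels move from 00:00:00 and 00:05:00 to 00:05:00 and 00:10:00.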
Rui Barradas answered Dec 11 '25 21:12


