I am trying to aggregate my time series data, and I want the average of each aggregated interval without including NAs or 0s.
Here is what my data looks like.
V1 423 470 473 626
1: 2018-01-01 00:00:00 0.00000 0 0.0 0
2: 2018-01-01 00:01:00 8.00000 0 95.0 0
3: 2018-01-01 00:02:00 0.00000 0 0.0 0
4: 2018-01-01 00:03:00 31.00000 0 24.5 0
5: 2018-01-01 00:04:00 37.00000 28 33.0 31
I am trying to aggregate into 5-minute intervals, and my expected output is
V1 423 470 473 626
2018-01-01 00:05:00 34.00000 28 50.8 31
~
: 2018-01-01 00:10:00 A B C D
How do I aggregate them into 5-minute intervals while getting the average value excluding 0s and NAs?
EDIT
structure(list(V1 = c("2018-01-01 00:00:00", "2018-01-01 00:01:00",
"2018-01-01 00:02:00", "2018-01-01 00:03:00", "2018-01-01 00:04:00",
"2018-01-01 00:05:00", "2018-01-01 00:06:00", "2018-01-01 00:07:00",
"2018-01-01 00:08:00", "2018-01-01 00:09:00", "2018-01-01 00:10:00",
"2018-01-01 00:11:00", "2018-01-01 00:12:00", "2018-01-01 00:13:00",
"2018-01-01 00:14:00", "2018-01-01 00:15:00", "2018-01-01 00:16:00",
"2018-01-01 00:17:00", "2018-01-01 00:18:00", "2018-01-01 00:19:00"
), `423` = c(0, 8, 0, 31, 37, 31, 26.1111111111111, 39.375, 35.5,
19.3, 21.5454545454545, 41.2, 31, 27.375, 31, 24.3076923076923,
26.1666666666667, 24, 26.8, 30.8181818181818), `470` = c(0, 0,
0, 0, 28, 0, 0, 0, 27, 21, 0, 21.5, 0, 0, 0, 0, 10, 46, 19.5,
0), `473` = c(0, 95, 0, 24.5, 33, 55, 50, 0, 47, 45, 0, 0, 35.4,
0, 0, 23, 32.5, 0, 0, 55), `626` = c(0, 0, 0, 0, 31, 26, 0, 0,
0, 16, 0, 0, 0, 0, 75, 0, 0, 48, 0, 0)), row.names = c(NA, -20L
), class = c("data.table", "data.frame"))
The following uses cut to turn column V1 into a grouping variable with 5-minute intervals, then summarizes each group with a custom function that computes the mean while ignoring NAs and zeros. I have written this function on two lines below to make it more readable, but it could simply be
f <- function(x) mean(x[x != 0], na.rm = TRUE)
The date time column V1 is first coerced to class "POSIXct".
library(data.table)
f <- function(x){
y <- x[x != 0]
mean(y, na.rm = TRUE)
}
df[, V1 := as.POSIXct(V1)]
df[, V1 := cut(V1, "5 mins")]
df[, lapply(.SD, f), by = V1]
# V1 423 470 473 626
#1: 2018-01-01 00:00:00 25.33333 28.00000 50.83333 31
#2: 2018-01-01 00:05:00 30.25722 24.00000 49.25000 21
#3: 2018-01-01 00:10:00 30.42409 21.50000 35.40000 75
#4: 2018-01-01 00:15:00 26.41851 25.16667 36.83333 48
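Note that the groups above are labelled by the start of each interval, while the expected output in the question appears to label them by the end (00:05:00 for the first five rows). If end labels are wanted, one possible sketch in base R is below. It is not part of the original answer: it assumes the timestamps are parsed as UTC, it bins on fixed 5-minute clock boundaries rather than starting from the first observation as cut does, and the names width, bin_start and bin_end are purely illustrative.

```r
# Assumption: timestamps are parsed as UTC so the example is reproducible.
times <- as.POSIXct(c("2018-01-01 00:00:00", "2018-01-01 00:04:00",
                      "2018-01-01 00:05:00"), tz = "UTC")

width <- 5 * 60  # bin width in seconds (illustrative name)

# Round each timestamp down to the start of its 5-minute bin ...
bin_start <- as.POSIXct(floor(as.numeric(times) / width) * width,
                        origin = "1970-01-01", tz = "UTC")

# ... then shift by one bin width to label the bin by its end instead.
bin_end <- bin_start + width

format(bin_end, "%Y-%m-%d %H:%M:%S")
```

A column built this way could be used in by = in place of the cut result.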
A one-liner could be
df[, lapply(.SD, f), by = cut(as.POSIXct(V1), "5 mins")]
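One edge case worth checking: if every value in a 5-minute group is 0 or NA, nothing is left to average and f returns NaN. The small base R sketch below illustrates that behaviour on toy vectors. The variant f_na is a hypothetical helper, not part of the answer above, that returns NA instead of NaN in that situation.

```r
# Helper from the answer: mean of the non-zero, non-NA values.
f <- function(x) mean(x[x != 0], na.rm = TRUE)

f(c(0, 8, 0, 31, 37))  # averages only 8, 31 and 37
f(c(0, NA, 28))        # the NA is dropped by na.rm = TRUE
f(c(0, 0, 0))          # NaN, because no values remain

# Hypothetical variant (illustrative name): return NA for empty groups.
f_na <- function(x) {
  y <- x[!is.na(x) & x != 0]          # keep non-missing, non-zero values
  if (length(y) == 0) NA_real_ else mean(y)
}

f_na(c(0, 0, NA))      # NA rather than NaN
```

Either function can be passed to lapply(.SD, ...) in the same way.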