I'm trying to take something like this
df <- data.frame(times = c("0915", "0930", "0945", "1000", "1015", "1030", "1045", "1100",
                           "1130", "1145", "1200", "1215", "1245", "1300", "1330", "1345"),
                 values = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 3, 4, 1, 3, 4, 2, 4))
> df
   times values
1   0915      1
2   0930      2
3   0945      3
4   1000      4
5   1015      1
6   1030      2
7   1045      3
8   1100      4
9   1130      1
10  1145      3
11  1200      4
12  1215      1
13  1245      3
14  1300      4
15  1330      2
16  1345      4
And turn it into something like this
> df2
   times values
1   0930      3
2   1000      7
3   1030      3
4   1100      7
5   1130     NA
6   1200      7
7   1230     NA
8   1300      7
9   1330     NA
10  1400     NA
Essentially, I want to take values measured in 15-minute intervals and convert them into values measured across 30-minute intervals (summing is sufficient for this).
I can think of an okay solution if I can be certain I have two 15-minute readings for each half-hourly reading: I could just add elements pairwise and get what I want. But I can't be certain of that in my data set. As my demo also shows, there could be multiple consecutive values missing.
So I thought some kind of time recognition was necessary, e.g. recognise that a time falls between 9:15 and 9:30 and sum just those two readings. I already have a function called hr2dec, which I created to convert these times to decimal, so it looks like this:
> hr2dec(df$times)
 [1]  9.25  9.50  9.75 10.00 10.25 10.50 10.75 11.00 11.50 11.75 12.00 12.25
[13] 12.75 13.00 13.50 13.75
I mention this in case it's easier to solve this problem with decimal instead of 4 digit time.
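(The actual hr2dec implementation isn't shown here; an assumed equivalent for "HHMM" strings would be something like:

```r
# Assumed equivalent of hr2dec: convert "HHMM" strings to decimal hours
hr2dec <- function(x) {
  as.numeric(substr(x, 1, 2)) + as.numeric(substr(x, 3, 4)) / 60
}
hr2dec(c("0915", "0930"))  # 9.25 9.50
```

)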
I also have this data across full 24-hour days, for multiple days. So if I have a solution that loops, it would need to reset to 0015 after 2400, as these are the first and last measurements of each day. A full set of data with dates included can be generated like so (with decimal times; like I said, either format is fine for me):
set.seed(42)
full_df <- data.frame(date = rep(as.Date(c("2010-02-02", "2010-02-03")), each = 96),
                      dec_times = seq(0.25, 24, 0.25),
                      values = rnorm(192))  # 192 draws: one per 15-minute slot across both days
full_df <- full_df[-c(2, 13, 15, 19, 95, 131, 192), ]
The best solution I can come up with so far is a pairwise comparative loop, but even that is not perfect.
Is there some elegant way to do what I'm after? I.e. check the first and last values (in terms of date and time), and sum each half-hourly interval? I'm not satisfied with the loop I have.
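For the toy example above, one base-R sketch of the idea (assuming "HHMM" inputs, and wanting NA whenever a half hour is missing a reading) might be:

```r
# Sketch: sum 15-minute readings into 30-minute bins, NA when a reading is missing
times  <- c("0915", "0930", "0945", "1000", "1015", "1030", "1045", "1100",
            "1130", "1145", "1200", "1215", "1245", "1300", "1330", "1345")
values <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 3, 4, 1, 3, 4, 2, 4)

dec  <- as.numeric(substr(times, 1, 2)) + as.numeric(substr(times, 3, 4)) / 60
# Build the complete 15-minute grid, so gaps become NA
grid <- seq(min(dec), max(dec) + 0.25, by = 0.25)  # pad to close the last bin
full <- values[match(grid, dec)]                   # NA where a reading is absent
# Each half hour ends on the hour or half hour: bin by rounding up to 0.5
bins <- ceiling(grid * 2) / 2
tapply(full, bins, sum)  # 3 7 3 7 NA 7 NA 7 NA NA for bins 9.5, 10, ..., 14
```

The key move is padding to the full grid first, so `sum()` sees an explicit NA for every missing 15-minute slot and incomplete half hours come out as NA rather than a partial sum.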
You should check out the tibbletime package -- specifically, collapse_by(), which collapses a tbl_time object by a time period.
library(tibbletime)
library(dplyr)
# create a series of 7 days
# 2018-01-01 to 2018-01-07 by 15 minute intervals
df <- create_series('2018-01-01' ~ '2018-01-07', period = "15 minute")
df$values <- rnorm(nrow(df))
df
#> # A time tibble: 672 x 2
#> # Index: date
#> date values
#> <dttm> <dbl>
#> 1 2018-01-01 00:00:00 -0.365
#> 2 2018-01-01 00:15:00 -0.275
#> 3 2018-01-01 00:30:00 -1.50
#> 4 2018-01-01 00:45:00 -1.64
#> 5 2018-01-01 01:00:00 -0.341
#> 6 2018-01-01 01:15:00 -1.05
#> 7 2018-01-01 01:30:00 -0.544
#> 8 2018-01-01 01:45:00 -1.10
#> 9 2018-01-01 02:00:00 0.0824
#> 10 2018-01-01 02:15:00 0.477
#> # ... with 662 more rows
# Collapse into 30 minute intervals, group, and sum
df %>%
collapse_by("30 minute") %>%
group_by(date) %>%
summarise(sum_values = sum(values))
#> # A time tibble: 336 x 2
#> # Index: date
#> date sum_values
#> <dttm> <dbl>
#> 1 2018-01-01 00:15:00 -0.640
#> 2 2018-01-01 00:45:00 -3.14
#> 3 2018-01-01 01:15:00 -1.39
#> 4 2018-01-01 01:45:00 -1.64
#> 5 2018-01-01 02:15:00 0.559
#> 6 2018-01-01 02:45:00 0.581
#> 7 2018-01-01 03:15:00 -1.50
#> 8 2018-01-01 03:45:00 1.36
#> 9 2018-01-01 04:15:00 0.872
#> 10 2018-01-01 04:45:00 -0.835
#> # ... with 326 more rows
# Alternatively, use clean = TRUE so each group is labelled with a "clean"
# interval boundary (00:30, 01:00, ...) instead of the last observation in it
df %>%
collapse_by("30 minute", clean = TRUE) %>%
group_by(date) %>%
summarise(sum_values = sum(values))
#> # A time tibble: 336 x 2
#> # Index: date
#> date sum_values
#> <dttm> <dbl>
#> 1 2018-01-01 00:30:00 -0.640
#> 2 2018-01-01 01:00:00 -3.14
#> 3 2018-01-01 01:30:00 -1.39
#> 4 2018-01-01 02:00:00 -1.64
#> 5 2018-01-01 02:30:00 0.559
#> 6 2018-01-01 03:00:00 0.581
#> 7 2018-01-01 03:30:00 -1.50
#> 8 2018-01-01 04:00:00 1.36
#> 9 2018-01-01 04:30:00 0.872
#> 10 2018-01-01 05:00:00 -0.835
#> # ... with 326 more rows
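One caveat for the question as asked: collapse_by() only groups the rows that exist, so a half hour with a missing 15-minute reading still yields a (partial) sum rather than NA. A sketch that handles this, padding the series back to the full 15-minute grid first (assuming tidyr is available; complete() drops the tbl_time class, hence the as_tbl_time() call), is:

```r
library(tidyr)

df %>%
  complete(date = seq(min(date), max(date), by = "15 min")) %>%  # reinsert gaps as NA
  as_tbl_time(index = date) %>%
  collapse_by("30 minute", clean = TRUE) %>%
  group_by(date) %>%
  summarise(sum_values = sum(values))  # sum() returns NA if any reading is NA
```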
If you're more into videos (< 20 minutes), check out The Future of Time Series and Financial Analysis in the Tidyverse by David Vaughan.