Let's assume I have a Ruby array with arrays of time/value pairs, like:
[
# about 9:00 AM on consecutive days
[<DateTime: 2014-05-15T09:00:00Z>, 56],
[<DateTime: 2014-05-16T09:06:00Z>, 57],
# ... missing data for May 17th, 2014
# ... missing data for May 18th, 2014
[<DateTime: 2014-05-19T08:57:00Z>, 61],
# ...
]
Notice that (1) the values aren't collected at the same time each day, and (2) some of the values are missing.
I want to normalize the data by:
What's the right way to programmatically do that?
How do you want to interpolate? In your example [58, 59], [58, 60] and [59, 60] would be equally plausible.
The expected value will depend on the interpolation strategy that gets used (e.g., linear, quadratic, etc.), so I can't provide an exact answer.
I'm willing to accept any interpolation strategy that predicts the original, actual data points with minimal error (e.g. < 0.1%). I'm willing to accept any normalization strategy that results in the timeseries observations being equally spaced.
You could use spline interpolation. Here's an example using the Spliner gem:
require 'date'
require 'spliner'

arr = [
  [DateTime.new(2014,5,15,9), 56],
  [DateTime.new(2014,5,16,9,6), 57],
  [DateTime.new(2014,5,19,8,57), 61]
]

spline = Spliner::Spliner.new(arr.to_h, extrapolate: '10%')

(DateTime.new(2014,5,15,9)..DateTime.new(2014,5,19,9)).each do |date|
  puts "#{date}: #{spline[date]}"
end
Output:
2014-05-15T09:00:00+00:00: 56.0 # exact value
2014-05-16T09:00:00+00:00: 56.995496729398646 # interpolated value
2014-05-17T09:00:00+00:00: 58.18937752978536 # interpolated value
2014-05-18T09:00:00+00:00: 59.55365781173006 # interpolated value
2014-05-19T09:00:00+00:00: 61.0030489943531 # extrapolated value
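If you would rather avoid a gem dependency, plain linear interpolation between neighbouring observations is a simpler (and less smooth) alternative to a spline. The following is only a rough sketch using the standard library: `linear_interpolate` is a hypothetical helper, not part of Spliner, and it assumes the input is sorted by time and that you don't want extrapolation.

```ruby
require 'date'

# Hypothetical helper (not part of Spliner): linearly interpolate the value at
# time `t` from `points`, an array of [DateTime, number] pairs sorted by time.
# Returns nil when `t` lies outside the observed range (no extrapolation).
def linear_interpolate(points, t)
  points.each_cons(2) do |(t0, v0), (t1, v1)|
    next unless t >= t0 && t <= t1

    # DateTime - DateTime yields a Rational number of days, so this is exact.
    fraction = (t - t0) / (t1 - t0)
    return (v0 + (v1 - v0) * fraction).to_f
  end
  nil
end

arr = [
  [DateTime.new(2014, 5, 15, 9),     56],
  [DateTime.new(2014, 5, 16, 9, 6),  57],
  [DateTime.new(2014, 5, 19, 8, 57), 61]
]

# Resample on an evenly spaced grid: 09:00 UTC on each day, May 15 to May 19.
grid = (DateTime.new(2014, 5, 15, 9)..DateTime.new(2014, 5, 19, 9)).to_a
normalized = grid.map { |t| [t, linear_interpolate(arr, t)] }

normalized.each { |t, v| puts "#{t}: #{v.inspect}" }
```

Note that the last grid point (May 19th, 09:00) comes out as nil here, because it falls three minutes after the final observation and this sketch deliberately does not extrapolate. Spliner's `extrapolate` option covers that case in the example above.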