Do you know any parallel modified moving average algorithm?
I want to calculate a moving average quickly, but not with a sequential algorithm. I want to use a parallel algorithm, but I have not yet found a solution.
The best algorithm I have found is the sequential modified moving average used for measuring computer performance:
new_avg =  alfa(new_time, previous_time) * new_value + (1-alfa(new_time, previous_time)) * previous_avg
alfa(new_time, previous_time) = 1- exp(-(new_time - previous_time)/moving_period)
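For reference, the two formulas above can be sketched in Python as follows. This is only an illustration of the sequential update: the function name `update_average`, the starting average of 0.0, and the sample events are assumptions, not part of any existing library.

```python
import math

def update_average(previous_avg, previous_time, new_value, new_time, moving_period):
    """One sequential step of the time-weighted exponential average."""
    # alfa grows towards 1 as the gap between the two samples grows.
    alfa = 1.0 - math.exp(-(new_time - previous_time) / moving_period)
    return alfa * new_value + (1.0 - alfa) * previous_avg

# Hypothetical events as (time, value) pairs, processed in time order.
avg, last_time = 0.0, 0.0
for t, value in [(1.0, 10.0), (2.0, 10.0), (5.0, 10.0)]:
    avg = update_average(avg, last_time, value, t, moving_period=2.0)
    last_time = t
```

Only `avg` and `last_time` are kept between updates, which is the memory constraint I care about.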
Some other algorithms are also good, but I have not found any parallel ones.
It is a hard question and I need some help with it.
Assume that I want to count events that arrive in random time order: early events can arrive later than late events. You may assume that an early event can be skipped or treated as obsolete once later events have been processed (or after some timeout). Do not assume that events arrive in sequential time order, or that events from the same time will arrive at the same time.
I do not want to use any algorithm that requires remembering many samples (especially all of them). It should only remember the time and the previous average value, and maybe some additional value, but not all or even some of the samples. The algorithm may make minor errors; it does not need to be perfect if the inaccuracy buys a performance boost.
It would be very nice if it used sharding, but that is not required.
A moving average where events arrive in sequence could be done like this:
newMovingAverage = ((MovingAverage * (n - 1)) + newSample) / n
where n dictates how big (or little) influence this sample should have on the moving average. The greater the n, the smaller the influence. Over time, older samples will have less and less influence on the moving average as new samples arrive.
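As a minimal sketch, that update could be written in Python like this (the function name `update_moving_average` is just an illustrative choice):

```python
def update_moving_average(moving_average, new_sample, n):
    """Blend new_sample into the running average with weight 1/n."""
    return (moving_average * (n - 1) + new_sample) / n

# With n = 5 the new sample contributes one fifth of the result.
print(update_moving_average(10.0, 20.0, 5))  # prints 12.0
```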
With samples coming out of sequence, you can try to mimic that behaviour by letting the age of the sample dictate how much influence it should have on the moving average. This could, for example, be done like this:
influence = (1 + sampleAge)^2 * n 
newMovingAverage = ((MovingAverage * (influence - 1)) + newSample) / influence 
Where I let the sampleAge dictate how much the newSample should influence the moving average.
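A small Python sketch of this age-weighted update, assuming sampleAge is a non-negative number in whatever time unit you choose (the function name `update_with_age` is hypothetical):

```python
def update_with_age(moving_average, new_sample, sample_age, n):
    """Blend new_sample into the average; older samples get less weight."""
    # Assumption: sample_age >= 0, with 0 meaning "just happened".
    influence = (1 + sample_age) ** 2 * n
    return (moving_average * (influence - 1) + new_sample) / influence

print(update_with_age(10.0, 20.0, 0, 5))  # fresh sample: prints 12.0
print(update_with_age(10.0, 20.0, 1, 5))  # older sample: prints 10.5
```

Note that this is an approximation: the result still depends on the order in which the samples happen to arrive.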
The possibility of having a parallel algorithm would depend on the nature of the moving average that you are using.
The algorithm that you show in your question is an exponential smoother. Thus, the first value of the data has an influence on every calculated average value. The amount of influence that the first value has decreases with every new data point, but even the last average in the sequence will be slightly influenced by the first data point.
This sort of moving average can't be parallelised because you can't calculate any average without using (explicitly or implicitly) all the previous data that has been received.
However, Wikipedia's article on moving averages nicely summarises a range of moving average methods, some of which are easily implemented in parallel.
For example, a simple moving average takes the following form (for odd n)**:
n2 = int(n/2)
moving_average[i] = (data[i-n2] + data[i-n2+1] ... + 
    data[i] + ... + data[i+n2-1] + data[i+n2])/n
This method doesn't make use of any data earlier than int(n/2) points before i to calculate the moving average at point i. Therefore, you could calculate the moving average of a data set of m items in parallel with p threads by dividing the m items into p sub-sequences, each of which overlaps the next and previous (excepting the first and last sub-sequences) sub-sequence by int(n/2) data points, and have each thread calculate the moving averages for its sub-sequence. 
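The splitting scheme described above might look roughly like this in Python. It is a sketch under some assumptions: n is odd, the data is already in a list at regular intervals, only points with a full window get an average, and the names `centred_sma` and `parallel_sma` are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def centred_sma(data, n, start, stop):
    """Centred moving averages for indices start..stop-1 (n odd)."""
    n2 = n // 2
    # Each window reads up to n2 points outside [start, stop): the overlap.
    return [sum(data[i - n2:i + n2 + 1]) / n for i in range(start, stop)]

def parallel_sma(data, n, p):
    """Split the valid indices into at most p chunks, one task per chunk."""
    n2 = n // 2
    first, last = n2, len(data) - n2  # indices that have a full window
    count = last - first
    if count <= 0:
        return []
    step = -(-count // p)  # ceiling division
    bounds = [(s, min(s + step, last)) for s in range(first, last, step)]
    with ThreadPoolExecutor(max_workers=p) as pool:
        # map() yields the chunk results in submission order.
        parts = list(pool.map(lambda b: centred_sma(data, n, *b), bounds))
    return [x for part in parts for x in part]

print(parallel_sma([1, 2, 3, 4, 5, 6, 7], n=3, p=2))
# prints [2.0, 3.0, 4.0, 5.0, 6.0]
```

In CPython, threads will not speed up this kind of CPU-bound loop because of the global interpreter lock, so the thread pool here only illustrates the partitioning. The same chunk boundaries could be handed to separate processes or machines instead.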
You can find an efficient sequential implementation of this algorithm (which would be applicable to each thread of the parallel implementation) in the question "Simple Moving Average summation/offset issue" and its answer. That method calculates a trailing moving average rather than the (arguably preferred) centrally-located moving average that I've shown above. That is, it puts the value that I calculated above at moving_average[i+n2] instead of at moving_average[i].
** This leaves aside the possibility that the data may be at irregular time intervals. The method you've shown addresses that issue and it can be dealt with the same way in other methods.