 

Adding/Combining Standard Deviations

Short Version:
Can standard deviations be added or combined? That is,

if StdDev(11,14,16,17)=X and StdDev(21,34,43,12)=Y,
can we calculate StdDev(11,14,16,17,21,34,43,12) from X and Y?

Long Version:
I am designing a star schema. The schema has a fact_table (grain=transaction) which stores individual transaction response_time. The schema also has an aggregate_table (grain=day) which stores the response_time_sum per day.
In my report I need to calculate the standard deviation of the response time for a given time dimension, such as day, week, or month. How can I calculate the standard deviation using the aggregate_table instead of touching the huge fact_table?

Riyaz asked Sep 01 '25 10:09


1 Answer

Yes, you can combine them, but not from the standard deviations alone. You need to know the number of observations, the mean, and the standard deviation for each day. The variance is easier to work with than the standard deviation, so I'll express everything in terms of variance. (The standard deviation is the square root of the variance.)

Denote:

n[i]  # number of observations for day i
m[i]  # mean for day i
v[i]  # variance for day i
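The question's aggregate_table stores only response_time_sum per day, which by itself is not enough to recover v[i]. A minimal sketch, assuming the daily aggregate also keeps a row count and a sum of squared response times (the fields `count`, `total`, and `sum_sq` and the helper `day_stats` are hypothetical, not from the question), of recovering n[i], m[i], and v[i] for one day:

```python
from dataclasses import dataclass


@dataclass
class DailyAggregate:
    # Hypothetical per-day row: the question's table only has response_time_sum.
    count: int      # number of transactions that day
    total: float    # sum of response_time (i.e. response_time_sum)
    sum_sq: float   # sum of response_time ** 2


def day_stats(row: DailyAggregate) -> tuple[int, float, float]:
    """Return (n, m, v) for one day, where v is the population variance."""
    n = row.count
    m = row.total / n
    # Mean of squares minus square of mean. Fine for a sketch, but it can
    # lose precision when the variance is tiny relative to the mean.
    v = row.sum_sq / n - m * m
    return n, m, v


# Day with values 11, 14, 16, 17: count 4, sum 58, sum of squares 862.
print(day_stats(DailyAggregate(count=4, total=58, sum_sq=862)))
```

With these three columns per day, any coarser grain (week, month) can be answered from the aggregate table alone.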

You'll need to calculate the total number of observations N and the overall mean M. This is easy:

days = [day1, day2, ..., day_final]
N = sum(n[i] for i in days)
M = sum(n[i] * m[i] for i in days) / N
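As a concrete check, treating the two groups from the short version of the question as two "days", the N and M formulas reproduce the size and mean of the pooled data. The dictionary keys `"day1"` and `"day2"` are illustrative:

```python
# Per-day counts and means for the two groups in the question.
n = {"day1": 4, "day2": 4}
m = {"day1": (11 + 14 + 16 + 17) / 4, "day2": (21 + 34 + 43 + 12) / 4}

days = list(n)
N = sum(n[i] for i in days)
M = sum(n[i] * m[i] for i in days) / N

# Mean computed directly from all eight observations, for comparison.
pooled = [11, 14, 16, 17, 21, 34, 43, 12]
print(N, M, sum(pooled) / len(pooled))  # prints: 8 21.0 21.0
```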

The overall variance V is more complicated, but it can still be calculated as a within-day term s1 plus a between-day term s2, divided by N:

s1 = sum(n[i] * v[i] for i in days)
s2 = sum(n[i] * (m[i] - M)**2 for i in days)
V = (s1 + s2) / N
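Why this works: split each observation's deviation from M into a within-day part and a between-day part. Writing x_{ij} for the j-th observation on day i (notation introduced here only for the derivation), with v_i the population variance of day i:

```latex
\begin{aligned}
N V &= \sum_i \sum_{j=1}^{n_i} (x_{ij} - M)^2
     = \sum_i \sum_{j=1}^{n_i} \big[(x_{ij} - m_i) + (m_i - M)\big]^2 \\
    &= \sum_i \sum_{j=1}^{n_i} (x_{ij} - m_i)^2
     + 2 \sum_i (m_i - M) \sum_{j=1}^{n_i} (x_{ij} - m_i)
     + \sum_i n_i (m_i - M)^2 \\
    &= \sum_i n_i v_i + 0 + \sum_i n_i (m_i - M)^2 = s_1 + s_2 .
\end{aligned}
```

The cross term vanishes because deviations from a day's own mean sum to zero.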

The formulas above give the population variance. If v[i] is instead the sample variance, s1 and V need minor modifications:

s1_sample = sum((n[i] - 1) * v[i] for i in days)
V_sample = (s1_sample + s2) / (N - 1)
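Putting the pieces together, here is a minimal sketch of a function that combines per-day (n, m, v) triples into an overall standard deviation under either convention. The name `combine_std` and its signature are illustrative, not part of the original answer. Applied to the two groups from the question, it agrees with computing the standard deviation of all eight values directly with Python's `statistics` module:

```python
import math
import statistics


def combine_std(groups, sample=False):
    """Combine per-group (n, mean, variance) triples into an overall std dev.

    sample=False: each variance is a population variance; returns the
    population std dev. sample=True: each variance is a sample variance
    (divisor n - 1); returns the sample std dev.
    """
    N = sum(n for n, _, _ in groups)
    M = sum(n * m for n, m, _ in groups) / N
    # Between-group term: spread of the group means around the overall mean.
    s2 = sum(n * (m - M) ** 2 for n, m, _ in groups)
    if sample:
        # Within-group term rebuilt from sample variances.
        s1 = sum((n - 1) * v for n, _, v in groups)
        return math.sqrt((s1 + s2) / (N - 1))
    s1 = sum(n * v for n, _, v in groups)
    return math.sqrt((s1 + s2) / N)


a = [11, 14, 16, 17]
b = [21, 34, 43, 12]

# Population convention: per-group pvariance -> overall pstdev.
pop = [(len(g), statistics.mean(g), statistics.pvariance(g)) for g in (a, b)]
print(combine_std(pop), statistics.pstdev(a + b))

# Sample convention: per-group variance -> overall stdev.
smp = [(len(g), statistics.mean(g), statistics.variance(g)) for g in (a, b)]
print(combine_std(smp, sample=True), statistics.stdev(a + b))
```

Each line prints two values that agree up to floating-point rounding (about 10.747 for the population version and 11.489 for the sample version).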

Michael J. Barber answered Sep 04 '25 01:09