Short Version:
Can StdDevs be added/combined? i.e.
if StdDev(11,14,16,17)=X and StdDev(21,34,43,12)=Y,
can we calculate StdDev(11,14,16,17,21,34,43,12) from X & Y?
Long Version:
I am designing a star schema. The schema has a fact_table (grain=transaction) which stores individual transaction response_time. The schema also has an aggregate_table (grain=day) which stores the response_time_sum per day.
In my report I need to calculate the standard deviation of the response time for a given time dimension, say day, week, month etc. How can I calculate the standard deviation using the aggregate_table instead of touching the huge fact_table?
Yes, you can combine them. You need to know the number of observations, mean, and standard deviation for each day, so the aggregate_table has to store these per day; response_time_sum alone is not enough. The variance is easier to work with than the standard deviation, so I'll express everything else in terms of variance. (Standard deviation is defined as the square root of the variance.)
Denote:
n[i] # observations for day i
m[i] # mean for day i
v[i] # variance for day i
You'll need to calculate the total number of observations N and the overall mean M. This is easy:
days = [day1, day2, ..., day_final]
N = sum(n[i] for i in days)
M = sum(n[i] * m[i] for i in days) / N
The overall variance V is more complicated, but it can still be calculated:
s1 = sum(n[i] * v[i] for i in days)
s2 = sum(n[i] * (m[i] - M)**2 for i in days)
V = (s1 + s2) / N
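As a rough check of the population formulas above, here is a minimal Python sketch that applies them to the two groups from the question and compares the result with computing the standard deviation directly over all eight values. The function name combine_population and the (n, mean, variance) tuple layout are assumptions made for illustration; they are not part of the original answer or of any schema.

```python
import math
import statistics


def combine_population(groups):
    """Combine per-day statistics into overall statistics.

    groups: iterable of (n, mean, population_variance) tuples, one per day.
    Returns (N, M, V): total count, overall mean, overall population variance.
    """
    groups = list(groups)
    N = sum(n for n, _, _ in groups)
    M = sum(n * m for n, m, _ in groups) / N
    s1 = sum(n * v for n, _, v in groups)             # within-day spread
    s2 = sum(n * (m - M) ** 2 for n, m, _ in groups)  # between-day spread
    return N, M, (s1 + s2) / N


# The two groups from the question, treated as two "days".
day1 = [11, 14, 16, 17]
day2 = [21, 34, 43, 12]

# What the aggregate table would hold per day (hypothetical layout).
groups = [
    (len(d), statistics.mean(d), statistics.pvariance(d))
    for d in (day1, day2)
]

N, M, V = combine_population(groups)
combined_std = math.sqrt(V)

# Reference value computed from the raw observations.
direct_std = statistics.pstdev(day1 + day2)

print(combined_std, direct_std)
```

Note that only the per-day count, mean, and variance are used in the combining step; the raw values appear only to build the inputs and to produce the reference value.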
The above are for the population variance. If you instead have v[i] as the sample variance, some minor modifications to s1 and V are needed:
s1_sample = sum((n[i] - 1) * v[i] for i in days)
V_sample = (s1_sample + s2) / (N - 1)
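The sample-variance variant can be sketched the same way. This is an illustrative Python example, assuming each day contributes a (n, mean, sample_variance) tuple with n of at least 2; the name combine_sample is hypothetical.

```python
import math
import statistics


def combine_sample(groups):
    """Combine per-day statistics where each variance is a sample variance.

    groups: iterable of (n, mean, sample_variance) tuples, one per day.
    Returns (N, M, V_sample): total count, overall mean, overall sample variance.
    """
    groups = list(groups)
    N = sum(n for n, _, _ in groups)
    M = sum(n * m for n, m, _ in groups) / N
    # (n - 1) * sample variance recovers each day's sum of squared deviations.
    s1_sample = sum((n - 1) * v for n, _, v in groups)
    s2 = sum(n * (m - M) ** 2 for n, m, _ in groups)
    return N, M, (s1_sample + s2) / (N - 1)


# The two groups from the question, treated as two "days".
day1 = [11, 14, 16, 17]
day2 = [21, 34, 43, 12]

groups = [
    (len(d), statistics.mean(d), statistics.variance(d))
    for d in (day1, day2)
]

N, M, V_sample = combine_sample(groups)
combined_sample_std = math.sqrt(V_sample)

# Reference value computed from the raw observations.
direct_sample_std = statistics.stdev(day1 + day2)

print(combined_sample_std, direct_sample_std)
```

The s2 term is unchanged from the population case; only the weighting of the per-day variances and the final divisor differ.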