Pandas variance and Standard deviation result differing with manual calculation

Question

I'm trying to find the mean, variance and SD using pandas. However, my manual calculation differs from the pandas output. Is there anything I'm missing when using pandas? I've attached an Excel screenshot for reference: Mean = 394, Variance = 21704, SD = 147.32.

import pandas as pd

dg_df = pd.DataFrame(
            data=[600,470,170,430,300],
            index=['a','b','c','d','e'])

print(dg_df.mean(axis=0)) # 394.0, matches the manual calculation
print(dg_df.var())        # 27130.0, does not match the manual calculation (21704)
print(dg_df.std(axis=0))  # 164.71187, does not match the manual calculation (147.32)

jpp · Accepted Answer

There is more than one definition of standard deviation. Your manual calculation is the equivalent of Excel's STDEV.P, which has the description: "Calculates standard deviation based on the entire population...". pandas returns the sample standard deviation by default; if you need the sample standard deviation in Excel, use STDEV.S.

pd.DataFrame.std uses ddof=1 (delta degrees of freedom) by default, so it divides by N - 1. This is the sample standard deviation.

numpy.std uses ddof=0 by default, so it divides by N. This is the population standard deviation.

See Bessel's correction to understand the difference between sample and population.
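To make the difference concrete, here is a minimal sketch in plain Python that computes the variance of the question's data with both divisors, N and N - 1. The helper name variance is hypothetical and only for illustration; it is not a pandas or NumPy function.

```python
import math

def variance(values, ddof=0):
    """Variance with divisor N - ddof (hypothetical helper for illustration)."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean
    squared_dev_sum = sum((x - mean) ** 2 for x in values)
    return squared_dev_sum / (n - ddof)

data = [600, 470, 170, 430, 300]

pop_var = variance(data, ddof=0)     # divide by N (population)
sample_var = variance(data, ddof=1)  # divide by N - 1 (Bessel's correction)

print(pop_var, math.sqrt(pop_var))
print(sample_var, math.sqrt(sample_var))
```

The first line should reproduce the Excel figures from the question (21704 and roughly 147.32), and the second the pandas defaults (27130 and roughly 164.71).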

You can also specify ddof=0 with Pandas std / var methods:

dg_df.std(ddof=0)
dg_df.var(ddof=0)
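As a quick cross-check, the sketch below compares the two libraries on the question's data, assuming pandas and NumPy are both installed. The variable names are illustrative; the single column is selected as dg_df[0] because the DataFrame was built without column names.

```python
import numpy as np
import pandas as pd

dg_df = pd.DataFrame(
    data=[600, 470, 170, 430, 300],
    index=['a', 'b', 'c', 'd', 'e'])

values = dg_df[0].to_numpy()

# pandas with ddof=0 should agree with NumPy's default (population) result
pandas_pop_std = dg_df[0].std(ddof=0)
numpy_pop_std = np.std(values)

# NumPy with ddof=1 should agree with pandas' default (sample) result
pandas_sample_std = dg_df[0].std()
numpy_sample_std = np.std(values, ddof=1)

print(np.isclose(pandas_pop_std, numpy_pop_std))
print(np.isclose(pandas_sample_std, numpy_sample_std))
```

Passing ddof explicitly in both libraries avoids relying on either default.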

Tags:

python

pandas

statistics

standard-deviation

variance

luckyluke
