
Pandas Rolling vs Scipy kurtosis - serious numerical inaccuracy

Tags: python, pandas

First of all, apologies that the examples below are clearly not minimal. I am aware this falls short of SO's minimal reproducible example requirement, but after hours of experimenting, it seems the issue only arises when the calculation runs over at least a couple of hundred values.

I have a dataframe with millions of values and want to calculate the kurtosis of each column on a rolling basis. Initially I used pandas' rolling kurt:

df.rolling(20, min_periods=3).kurt()

but noticed two serious issues with that approach:

  1. accuracy is not satisfactory: the pandas method gives an approximately correct result, but for my use case a deviation on the order of 1e-4 is hard to accept;
  2. even more worrying, the kurtosis values regularly "explode": for no visible reason they suddenly diverge into the +/-10,000s, entirely distorting the intended output.

I created three series, s1, s2, and s3, with 300, 600, and 900 values respectively. (The assignments with the exact values are at the end of this post to keep it readable.) The three series are slices from one column of the dataframe, all ending at the same position: s1 holds the values from N-299 to N, s2 from N-599 to N, and s3 from N-899 to N. Running the rolling kurt on these three series and printing the tail of each result (where the issue appears) gives the following:

>>> s1.rolling(20,min_periods=3).kurt().tail(10)
290     9.591067
291     9.591067
292     9.591067
293     9.591067
294    19.663666
295    14.872262
296    14.147157
297    16.716964
298     7.032522
299    19.983796
>>> s2.rolling(20,min_periods=3).kurt().tail(10)
590     9.591067
591     9.591067
592     9.591067
593     9.591067
594    19.663666
595    14.872262
596    14.147157
597    16.716964
598     7.032522
599    19.983796
>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890         9.591071
891         9.591071
892         9.591071
893         9.591071
894        19.663685
895        15.248361
896        40.444894
897      1368.233241
898    251407.375343
899    902540.031652

I performed the same computation in Excel. For the last ten indices, the kurtosis values should be as follows (the notation 290 / 590 / 890 saves space: the three output series should have identical values at indices 290-299, 590-599, and 890-899):

290 / 590 / 890      9.591067361
291 / 591 / 891      9.591067361
292 / 592 / 892      9.591067361
293 / 593 / 893      9.591067361
294 / 594 / 894      19.66366573
295 / 595 / 895      14.87226197
296 / 596 / 896      14.14715754
297 / 597 / 897      16.7169886
298 / 598 / 898      7.037037037
299 / 599 / 899      20
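
The last two reference values can be checked by hand. This is a sketch under two assumptions: that Excel and scipy.stats.kurtosis(x, bias=False) both compute the usual bias-corrected sample excess kurtosis, and that (as the trailing values of s2 suggest) the final window holds one non-zero value and 19 zeros, while the window before it holds two nearly equal non-zero values and 18 zeros. The symbols G_2, m_2, m_4, and a are notation introduced here, not taken from the post.

```latex
G_2 = \frac{n-1}{(n-2)(n-3)}\left[(n+1)\left(\frac{m_4}{m_2^2}-3\right)+6\right],
\qquad m_k = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^k

% n = 20, one value a and 19 zeros (index 299 / 599 / 899):
m_2 = \frac{19}{400}\,a^2,\quad
m_4 = \frac{6517}{160000}\,a^4,\quad
\frac{m_4}{m_2^2} = \frac{343}{19}
\;\Longrightarrow\;
G_2 = \frac{19}{306}\left[21\left(\frac{343}{19}-3\right)+6\right] = 20

% n = 20, two equal values a and 18 zeros (index 298 / 598 / 898):
m_2 = \frac{9}{100}\,a^2,\quad
m_4 = \frac{657}{10000}\,a^4,\quad
\frac{m_4}{m_2^2} = \frac{73}{9}
\;\Longrightarrow\;
G_2 = \frac{19}{306}\left[21\left(\frac{73}{9}-3\right)+6\right] = \frac{190}{27} \approx 7.037037
```

Both results are independent of the magnitude a, which is why values as small as 1e-6 still give kurtosis values of exactly 20 and 7.037037 in exact arithmetic.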

Comparing the rolling kurt outputs, the first two are identical to each other, although they do not match the reference values I computed in Excel (see the last three indices). The larger problem is the third output, where the values explode, as if the total number of values in the series somehow influenced the kurtosis, even though in all three cases I used a rolling window of 20 with a minimum of 3 observations. If my understanding is correct, nothing besides the current row and the 19 preceding rows should affect the kurtosis output. I'm puzzled how these "exploding" values can appear.
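
One plausible mechanism, offered as an assumption rather than a confirmed diagnosis of pandas: if a rolling kurtosis is computed from running power sums that are updated as the window slides (add the new value, subtract the one that left), rounding errors from large values can stay in the sums long after those values have left the window. The pure-Python sketch below compares that approach with recomputing each window from scratch. The function names and the sample series are hypothetical and chosen only to make the effect visible.

```python
def _fisher_unbiased(n, m2, m4):
    """Bias-corrected excess kurtosis from the biased central moments m2, m4."""
    g2 = m4 / m2 ** 2 - 3.0
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)


def kurt_recomputed(values, window):
    """Recompute every full window from scratch using centered deviations."""
    out = []
    for end in range(window, len(values) + 1):
        w = values[end - window:end]
        mean = sum(w) / window
        m2 = sum((v - mean) ** 2 for v in w) / window
        m4 = sum((v - mean) ** 4 for v in w) / window
        out.append(_fisher_unbiased(window, m2, m4))
    return out


def kurt_running_sums(values, window):
    """Slide the window by updating running sums of x, x^2, x^3, x^4."""
    s1 = s2 = s3 = s4 = 0.0
    out = []
    for i, v in enumerate(values):
        # Add the value entering the window.
        s1 += v
        s2 += v ** 2
        s3 += v ** 3
        s4 += v ** 4
        if i >= window:
            # Subtract the value leaving the window; rounding error stays behind.
            old = values[i - window]
            s1 -= old
            s2 -= old ** 2
            s3 -= old ** 3
            s4 -= old ** 4
        if i >= window - 1:
            mean = s1 / window
            m2 = s2 / window - mean ** 2
            m4 = (s4 - 4 * mean * s3 + 6 * mean ** 2 * s2) / window - 3 * mean ** 4
            out.append(_fisher_unbiased(window, m2, m4))
    return out


# Hypothetical data: one large value followed by values ~1e5 times smaller.
series = [1.0, 0.0, 0.0, 3e-5, 1e-5, -1e-5, 2e-5]
for a, b in zip(kurt_recomputed(series, 4), kurt_running_sums(series, 4)):
    print(f"recomputed={a:14.6f}  running sums={b:14.6f}")
```

In double precision, the fourth powers of the small values (around 1e-19) are absorbed completely when added to a running sum of 1.0, so once the 1.0 is subtracted again that information is lost and the last window is badly wrong, even though the large value is no longer inside it. This would be consistent with s3, whose early values (around -0.116) are far larger than the values near its end.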

I then recomputed the kurtosis values for the same series using scipy.stats.kurtosis. This gave me the following output:

>>> from scipy.stats import kurtosis
>>> s1.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
290     9.591067
291     9.591067
292     9.591067
293     9.591067
294    19.663666
295    14.872262
296    14.147158
297    16.716989
298     7.037037
299    20.000000
>>> s2.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
590     9.591067
591     9.591067
592     9.591067
593     9.591067
594    19.663666
595    14.872262
596    14.147158
597    16.716989
598     7.037037
599    20.000000
>>> s3.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
890     9.591067
891     9.591067
892     9.591067
893     9.591067
894    19.663666
895    14.872262
896    14.147158
897    16.716989
898     7.037037
899    20.000000

This computes the kurtosis correctly. However, the .apply(lambda x: kurtosis(x, ...)) construct is far slower than the built-in pandas method, pushing the total processing time for the entire dataframe from a couple of minutes to more than an hour. I am aware that a built-in optimized implementation may trade numerical accuracy for speed, which would explain the first issue listed above; for the second issue (the "exploding" values), however, I see no justification.

Is there any way to compute the kurtosis efficiently without values diverging and invalidating my whole output?
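
One possible middle ground, sketched here as an assumption rather than a tested drop-in replacement: compute every window independently, but in vectorized NumPy instead of a Python-level apply. The function name rolling_kurt_full_windows is hypothetical. It relies on numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20 or later), handles only full windows (so it does not reproduce min_periods=3), and propagates any NaN in the input.

```python
import numpy as np


def rolling_kurt_full_windows(values, window):
    """Bias-corrected excess kurtosis of each full window, NaN elsewhere."""
    x = np.asarray(values, dtype=float)
    n = window
    out = np.full(x.shape[0], np.nan)
    if x.shape[0] < n:
        return out

    # Shape (len(x) - n + 1, n); a view, so no data is copied at this point.
    windows = np.lib.stride_tricks.sliding_window_view(x, n)

    # Center each window on its own mean, so earlier rows cannot leak in.
    dev = windows - windows.mean(axis=1, keepdims=True)
    m2 = (dev ** 2).mean(axis=1)
    m4 = (dev ** 4).mean(axis=1)

    # Constant windows give 0/0; let them become NaN without warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        g2 = m4 / m2 ** 2 - 3.0

    out[n - 1:] = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return out


# One non-zero value among 19 zeros; large earlier values should not matter.
demo = [0.5, -0.1] + [0.0] * 19 + [1e-6]
print(rolling_kurt_full_windows(demo, 20)[-1])
```

The centered deviations are materialized as a (rows, window) array, so for millions of rows it may be worth processing one column, or one chunk of rows, at a time.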


Series definitions

Here come the exact values I used to compute the aforementioned outputs:

s1 = pd.Series([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0001499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0
,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])

s2 = pd.Series([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00
01499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])

s3 = pd.Series([0.0006613932897393013,0.0002659978876289742,0.000658737582405648,0.0005623339888467145,0.0008417590777197284,0.000542090011101782,0.0007813756301534222,0.0003713395103963933,0.0001847566192768637,0.0005892778635844672,-0.0001955367110279687,0.0004436264576506058,0.000302660947173135,0.0007556577955957223,0.0004099113835531532,0.0002143017625986564,1.052211101549051e-05,6.481751166152551e-05,6.615670911548045e-05,-2.169766854576383e-05,-1.302819997635433e-05,-7.303052044212008e-06,-0.1163297855507419,-0.06335289603465369,-0.03314811069814094,-0.01697505737063765,-0.008591697883893402,-0.004342398361182662,-0.002157940126839023,-0.001100682037128825,-0.0005507856703497119,-0.0002554269710891206,-0.0001277329565522002,-8.395111298446951e-05,-2.189884089509773e-05,-1.094960028496637e-05,-5.479844975342307e-06,-2.739933748392279e-06,-1.369969689294177e-06,-6.799856523827107e-07,-3.399929995978179e-07,-1.79996340600251e-07,-7.999838400850306e-08,-3.999919442393075e-08,-2.999939675042158e-08,-2.007979819879551e-05,-1.004005030070562e-05,-5.52007060169889e-06,-2.760046727695654e-06,9.150125677134498e-06,4.580031464668292e-06,2.2900078662783e-06,1.150001972312828e-06,5.700004873407606e-07,2.80000120302654e-07,1.50000032247295e-07,7.000000733862829e-08,3.000000181016647e-08,2.000000056662899e-08,1.00000003333145e-08,1.000000011126989e-08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0001499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])

asked Dec 14 '25 21:12 by lazarea


1 Answer

It looks like a bug in older versions of Pandas. I could reproduce it on an old installation (Python 3.6.2 64-bit on win32, Pandas 1.0.3, numpy 1.15.4):

>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890         9.591071
891         9.591071
892         9.591071
893         9.591071
894        19.663685
895        15.248361
896        40.444894
897      1368.233241
898    251407.375343
899    902540.031652
dtype: float64

It seems to be fixed in my newer installation (Python 3.8.4 64-bit, Pandas 1.2.2, numpy 1.20.1):

>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890     9.591067
891     9.591067
892     9.591067
893     9.591067
894    19.663666
895    14.872262
896    14.147158
897    16.716989
898     7.037037
899    20.000000
dtype: float64

Both installations are on the same Windows 10 machine.

I cannot say which component (Pandas or numpy) is the cause. As your tests using scipy.stats.kurtosis give the correct result, I would suspect Pandas, but without further analysis by Pandas experts (and I am not one) I cannot be sure.

IMHO, the most reasonable solution is either to upgrade your system or to add a fresh, independent Python installation with the latest possible Pandas version.
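
Before upgrading, it may help to check what is actually installed. The sketch below uses only the standard library (importlib.metadata, Python 3.8 or later). The helper names are hypothetical, and the 1.2.2 threshold is merely the version that worked in the test above, not necessarily the first release containing the fix.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn '1.2.2' or '2.1.0rc0' into a comparable tuple such as (1, 2, 2)."""
    numbers = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits, so '0rc0' becomes 0.
        digits = "".join(takewhile(str.isdigit, piece))
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers)


def is_at_least(installed, minimum):
    """True if the installed version is not older than the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumption: 1.2.2 is the version reported as working in this answer.
KNOWN_GOOD_PANDAS = "1.2.2"

found = installed_version("pandas")
if found is None:
    print("pandas is not installed in this environment")
elif is_at_least(found, KNOWN_GOOD_PANDAS):
    print(f"pandas {found} is at least {KNOWN_GOOD_PANDAS}")
else:
    print(f"pandas {found} is older than {KNOWN_GOOD_PANDAS}; consider upgrading")
```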

answered Dec 16 '25 11:12 by Serge Ballesta


