The source code in question is
import numpy as np
dd=lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))
df.rolling(window=period, min_periods=1).apply(dd)
It takes an extremely long time to execute the above two lines of code. This is with the latest pandas version (1.4.0); the dataframe has only 3000 rows and 2000 columns, and period is an int variable with value 250.
The same code with a previous pandas version (0.23.x) produces the result much faster.
I've tried other suggestions from questions like Slow performance of pandas groupby/apply, but they are not of much help.
These are not solutions, at most workarounds for simple cases like the example function. But they confirm the suspicion that the processing speed of df.rolling.apply is anything but optimal.
Using a much smaller dataset for obvious reasons
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(200, 100))
period = 10
res = [0, 0]  # mutated inside the %%timeit cells so both results survive for comparison
Running time with pandas v1.3.5
%%timeit -n1 -r1
dd=lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))
res[0] = df.rolling(window=period, min_periods=1).apply(dd)
# 1 loop, best of 1: 8.72 s per loop
Against a NumPy implementation using sliding_window_view: prepending period - 1 rows of NaN gives every row a full-length window, and since np.fmax and np.nanmax ignore the NaN padding, this reproduces min_periods=1.
from numpy.lib.stride_tricks import sliding_window_view as window
%%timeit
x = window(np.vstack([np.full((period-1,df.shape[1]), np.nan),df.to_numpy()]), period, axis=0)
res[1] = np.nanmax(1.0 - x / np.fmax.accumulate(x, axis=-1), axis=-1)
# 100 loops, best of 5: 3.39 ms per loop
np.testing.assert_allclose(res[0], res[1])
8.72 * 1000 / 3.39 ≈ a 2572x speedup.
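For intuition, sliding_window_view returns a read-only view whose last axis holds each rolling window; a minimal sketch on a single NaN-padded column (values illustrative only):
a = np.arange(5.0).reshape(5, 1)                                # one column: 0..4
w = window(np.vstack([np.full((2, 1), np.nan), a]), 3, axis=0)  # period = 3 here
print(w.shape)  # (5, 1, 3): one length-3 window per original row
print(w[0, 0])  # [nan nan  0.], the first row's NaN-padded window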
Applying the same approach to the full-size problem materializes intermediate arrays of shape (rows, columns, period), which no longer fit comfortably in memory, so we process the columns in chunks:
l = []
for arr in np.array_split(df.to_numpy(), 100, axis=1):
    x = window(np.vstack([np.full((period - 1, arr.shape[1]), np.nan), arr]), period, axis=0)
    l.append(np.nanmax(1.0 - x / np.fmax.accumulate(x, axis=-1), axis=-1))
res[1] = np.hstack(l)
# 1 loop, best of 5: 9.15 s per loop for df.shape (2000, 2000)
Pandas numba engine
We can get even faster with pandas' support for numba-jitted functions. Unfortunately numba v0.55.1 can't compile ufunc.accumulate, so we have to write our own implementation of np.fmax.accumulate (no guarantees on my implementation). Please note that the first call is slower because the function needs to be compiled.
def dd_numba(x):
    # running NaN-skipping maximum, standing in for np.fmax.accumulate
    res = np.empty_like(x)
    res[0] = x[0]
    for i in range(1, len(res)):
        if res[i - 1] > x[i] or np.isnan(x[i]):
            res[i] = res[i - 1]
        else:
            res[i] = x[i]
    return np.nanmax(1.0 - x / res)
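A quick sanity check that the hand-rolled running maximum agrees with the np.fmax.accumulate-based lambda on data containing NaNs (both should give 0.5 here):
dd = lambda x: np.nanmax(1.0 - x / np.fmax.accumulate(x))
sample = np.array([1.0, np.nan, 0.5, 2.0, np.nan])
assert np.isclose(dd_numba(sample), dd(sample))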
df.rolling(window=period, min_periods=1).apply(dd_numba, engine='numba', raw=True)
We can use the familiar pandas interface, and it's ~1.16x faster than my chunked numpy approach for df.shape (2000, 2000).
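Note that the first call pays the JIT compilation cost; pandas caches the compiled function, so subsequent calls with the same engine_kwargs are fast. A sketch using the documented pandas defaults for engine_kwargs:
roll = df.rolling(window=period, min_periods=1)
kwargs = {'nopython': True, 'nogil': False, 'parallel': False}  # pandas defaults
roll.apply(dd_numba, engine='numba', raw=True, engine_kwargs=kwargs)  # slow: compiles
roll.apply(dd_numba, engine='numba', raw=True, engine_kwargs=kwargs)  # fast: cached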