Suppose my data looks like this:
data = {
    'value': [1, 9, 6, 7, 3, 2, 4, 5, 1, 9]
}
For each row, I would like to find the distance to the latest previous element larger than the current one.
So, my expected output is:
[None, 0, 1, 2, 1, 1, 3, 4, 1, 0]
- 1 has no previous element, so I want None in the result.
- 9 is at least as large as all its previous elements, so I want 0 in the result.
- 6 has its previous element 9, which is larger than it. The distance between them is 1, so I want 1 in the result here.

I'm aware that I can do this in a loop in Python (or in C / Rust if I write an extension).
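For concreteness, the loop I'd like to avoid looks roughly like this (a minimal sketch; `prev_larger_distance` is just an illustrative name):

```python
def prev_larger_distance(values):
    """For each element, the distance back to the latest previous
    strictly larger element; 0 if none exists, None for the first row."""
    out = []
    for i, v in enumerate(values):
        if i == 0:
            out.append(None)
            continue
        dist = 0  # stays 0 when no previous element is larger
        for j in range(i - 1, -1, -1):  # scan backwards
            if values[j] > v:
                dist = i - j
                break
        out.append(dist)
    return out

print(prev_larger_distance([1, 9, 6, 7, 3, 2, 4, 5, 1, 9]))
# → [None, 0, 1, 2, 1, 1, 3, 4, 1, 0]
```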
My question: is it possible to solve this using entirely dataframe operations? pandas or Polars, either is fine. But only dataframe operations.
So, none of the following please:

- apply
- map_elements
- map_rows
- iter_rows

It's hard to vectorize this kind of problem, but you can use the numba module to speed up the task. The problem also parallelizes very easily:
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def get_values(values):
    out = np.zeros_like(values, dtype=np.float64)
    for i in prange(len(values)):
        idx = np.int64(i)
        v = values[idx]
        # walk backwards until a strictly larger element is found
        while idx > -1 and values[idx] <= v:
            idx -= 1
        if idx > -1:
            out[i] = i - idx  # distance; otherwise stays 0
    out[0] = np.nan  # the first row has no previous element
    return out
import pandas as pd

data = {
    "value": [1, 9, 6, 7, 3, 2, 4, 5, 1, 9],
    "out": [None, 0, 1, 2, 1, 1, 3, 4, 1, 0],
}
df = pd.DataFrame(data)
df["out2"] = get_values(df["value"].values)
print(df)
Prints:
value out out2
0 1 NaN NaN
1 9 0.0 0.0
2 6 1.0 1.0
3 7 2.0 2.0
4 3 1.0 1.0
5 2 1.0 1.0
6 4 3.0 3.0
7 5 4.0 4.0
8 1 1.0 1.0
9 9 0.0 0.0
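As for the "dataframe operations only" part of the question: it is possible with a cross join, but that materializes O(n²) row pairs, so treat this as a sketch for small frames rather than a practical answer:

```python
import numpy as np
import pandas as pd

values = [1, 9, 6, 7, 3, 2, 4, 5, 1, 9]
frame = pd.DataFrame({"value": values}).reset_index()  # 'index' = row number

# Pair every row with every other row (O(n^2) rows!)
pairs = frame.merge(frame, how="cross", suffixes=("", "_prev"))

# Keep only earlier rows whose value is strictly larger
mask = (pairs["index_prev"] < pairs["index"]) & (pairs["value_prev"] > pairs["value"])
nearest = pairs[mask].groupby("index")["index_prev"].max()

# Distance to that row; 0 where no larger previous exists, NaN for row 0
out = (nearest.index.to_series() - nearest).reindex(range(len(values))).fillna(0.0)
out.iloc[0] = np.nan
print(out.tolist())
```
The memory cost of the cross join is the reason a compiled loop (numba above, or C / Rust) is usually the better trade-off here.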
Benchmark (with 1_000_000 items from 1-100):
from timeit import timeit

data = {
    "value": np.random.randint(1, 100, size=1_000_000),
}
df = pd.DataFrame(data)

t = timeit('df["out"] = get_values(df["value"].values)', globals=globals(), number=1)
print(t)
Prints on my machine (AMD 5700x):
0.3559090679627843