Suppose I have a function:
def f(prev, curr):
    return prev * 2 + curr
(Just an example, could have been anything)
And a Polars dataframe:
| some_col | other_col |
|----------|-----------|
| 7        | ...       |
| 3        |           |
| 9        |           |
| 2        |           |
I would like to use f on my dataframe cumulatively, and the output would be:
| some_col | other_col |
|----------|-----------|
| 7        | ...       |
| 17       |           |
| 43       |           |
| 88       |           |
I understand that, naturally, this type of calculation isn't going to be very efficient since it has to be done one row at a time (at least in the general case).
I can obviously loop over rows. But is there an elegant, idiomatic way to do this in Polars?
It depends on the exact operation you need to perform.
The example you've given can be expressed in terms of .cum_sum() with additional arithmetic:
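import polars as pl

def plus_prev_times_2(col):
    # Weights 2^(n-2), ..., 2^0 for the rows after the first (n = column length)
    x = 2 ** pl.int_range(pl.len() - 1).reverse()
    # Powers 2^1, ..., 2^(n-1) that scale the first value
    y = 2 ** pl.int_range(1, pl.len())
    # Weighted running sum over every row after the first
    cs = (x * col.slice(1)).cum_sum()
    # Undo the weighting and add the first value's contribution
    return cs / x + col.first() * y

df = pl.DataFrame({"some_col": [7, 3, 9, 2]})
df.with_columns(
    pl.col.some_col.first()
    .append(pl.col.some_col.pipe(plus_prev_times_2))
    .alias("plus_prev_times_2")
)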
shape: (4, 2)
┌──────────┬───────────────────┐
│ some_col ┆ plus_prev_times_2 │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞══════════╪═══════════════════╡
│ 7 ┆ 7.0 │
│ 3 ┆ 17.0 │
│ 9 ┆ 43.0 │
│ 2 ┆ 88.0 │
└──────────┴───────────────────┘
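To see why the cumulative sum works here, it may help to unroll the recurrence. This is a sketch in my own notation, not from the Polars docs: c_0, ..., c_{N-1} are the values of some_col, r_n is the desired output at row n, and x, y, cs refer to the expressions in plus_prev_times_2.

```latex
\begin{aligned}
r_0 &= c_0, \qquad r_n = 2\,r_{n-1} + c_n \\
r_n &= 2^{n} c_0 + \sum_{k=1}^{n} 2^{\,n-k} c_k \\
    &= \underbrace{2^{n}}_{y} \, c_0
     + \frac{1}{\underbrace{2^{\,N-1-n}}_{x}}
       \underbrace{\sum_{k=1}^{n} 2^{\,N-1-k} c_k}_{\texttt{cs}}
\end{aligned}
```

For the example column, r_3 = 2^3 * 7 + (4*3 + 2*9 + 1*2) / 1 = 56 + 32 = 88. The powers of two grow with the column length, so on long columns this trick would be expected to overflow or lose precision.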
In general, I believe what you're asking for is called a "vertical fold/scan".
Polars only offers a horizontal version, pl.cum_fold:
df = pl.DataFrame(dict(a=[7], b=[3], c=[9], d=[2]))
df.with_columns(
    pl.cum_fold(acc=0, function=lambda acc, x: acc * 2 + x, exprs=pl.all())
)
shape: (1, 5)
┌─────┬─────┬─────┬─────┬──────────────┐
│ a ┆ b ┆ c ┆ d ┆ cum_fold │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ i64 ┆ struct[4] │
╞═════╪═════╪═════╪═════╪══════════════╡
│ 7 ┆ 3 ┆ 9 ┆ 2 ┆ {7,17,43,88} │
└─────┴─────┴─────┴─────┴──────────────┘
As discussed in the issue, a vertical equivalent would be hugely inefficient.
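For comparison, here is a minimal pure-Python sketch of what a vertical scan would compute, using only the standard library's itertools.accumulate. The helper name vertical_scan is hypothetical (it is not a Polars function), and f is the function from the question.

```python
from itertools import accumulate


def f(prev, curr):
    # The example function from the question
    return prev * 2 + curr


def vertical_scan(values, func):
    # Hypothetical helper: the first value is kept as-is, and every later
    # value is combined with the previous result via func(prev_result, value).
    return list(accumulate(values, func))


print(vertical_scan([7, 3, 9, 2], f))  # [7, 17, 43, 88]
```

This runs one Python call per row, which is the inefficiency mentioned above. If you want the result back in a dataframe, something like df["some_col"].to_list() on the way in and a new pl.Series on the way out should work, but it bypasses the expression engine entirely.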
For an efficient approach, you can write plugins in Rust:
But using something like numba is probably easier to implement.
There are several existing numba answers, e.g.