How to group_by and rolling sum in polars？

Question

import polars as pl

data = {'type': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
        'value': [5, 9, 1, 0, 3, 2, 5, 8, 9, 1, 0, 3, 3, 1, 1, 0, 2, 0, 0, 5, 7, 4, 7, 8, 9, 11, 1, 1, 0, 1, 4, 3, 21]}
df = pl.DataFrame(data)
print(df)

Given two columns of data, how can we group them by the 'type' column, sum the 'value' column using a rolling window of size 5, and then place the resulting data into a column named 'result'?

The results are as follows:

[None, None, None, None, 18, 15, 11, 18, 27, 25, 23, 21, 16, None, None, None, None, 4, 3, 7, 14, 16, None, None, None, None, 36, 30, 22, 14, 7, 9, 29]

(Please using the polars library only, Polars version = 0.17.9)

jqurious · Accepted Answer

You can use .rolling_sum() as a Window function.

with pl.Config(tbl_rows=33):
    df.with_columns(
        pl.col("value").rolling_sum(window_size=5).over("type").alias("result")
    )

shape: (33, 3)
┌──────┬───────┬────────┐
│ type ┆ value ┆ result │
│ ---  ┆ ---   ┆ ---    │
│ str  ┆ i64   ┆ i64    │
╞══════╪═══════╪════════╡
│ A    ┆ 5     ┆ null   │
│ A    ┆ 9     ┆ null   │
│ A    ┆ 1     ┆ null   │
│ A    ┆ 0     ┆ null   │
│ A    ┆ 3     ┆ 18     │
│ A    ┆ 2     ┆ 15     │
│ A    ┆ 5     ┆ 11     │
│ A    ┆ 8     ┆ 18     │
│ A    ┆ 9     ┆ 27     │
│ A    ┆ 1     ┆ 25     │
│ A    ┆ 0     ┆ 23     │
│ A    ┆ 3     ┆ 21     │
│ A    ┆ 3     ┆ 16     │
│ B    ┆ 1     ┆ null   │
│ B    ┆ 1     ┆ null   │
│ B    ┆ 0     ┆ null   │
│ B    ┆ 2     ┆ null   │
│ B    ┆ 0     ┆ 4      │
│ B    ┆ 0     ┆ 3      │
│ B    ┆ 5     ┆ 7      │
│ B    ┆ 7     ┆ 14     │
│ B    ┆ 4     ┆ 16     │
│ C    ┆ 7     ┆ null   │
│ C    ┆ 8     ┆ null   │
│ C    ┆ 9     ┆ null   │
│ C    ┆ 11    ┆ null   │
│ C    ┆ 1     ┆ 36     │
│ C    ┆ 1     ┆ 30     │
│ C    ┆ 0     ┆ 22     │
│ C    ┆ 1     ┆ 14     │
│ C    ┆ 4     ┆ 7      │
│ C    ┆ 3     ┆ 9      │
│ C    ┆ 21    ┆ 29     │
└──────┴───────┴────────┘

Hericks · Answer

I agree that using pl.Expr.rolling_sum in conjunction with pl.Expr.over is the most elegant solution - especially, if the dataframe has multiple columns one wants to keep.

Still, if one doesn't want to rely pl.Expr.rolling_sum (which is marked unstable) the same behaviour can be achieved using pl.DataFrame.rolling as follows.

(
    df
    .rolling(pl.int_range(pl.len()).alias("index"), period="5i", group_by="type")
    .agg(
        pl.col("value").alias("window"),
        pl.col("value").last(),
        pl.when(pl.len() == 5).then(pl.col("value").sum()).alias("result"),
    )
    .drop("index")
)

Here, the column "window" is only created to highlight the window aggregated by agg.

shape: (33, 4)
┌──────┬──────────────────┬───────┬────────┐
│ type ┆ window           ┆ value ┆ result │
│ ---  ┆ ---              ┆ ---   ┆ ---    │
│ str  ┆ list[i64]        ┆ i64   ┆ i64    │
╞══════╪══════════════════╪═══════╪════════╡
│ A    ┆ [5]              ┆ 5     ┆ null   │
│ A    ┆ [5, 9]           ┆ 9     ┆ null   │
│ A    ┆ [5, 9, 1]        ┆ 1     ┆ null   │
│ A    ┆ [5, 9, 1, 0]     ┆ 0     ┆ null   │
│ A    ┆ [5, 9, 1, 0, 3]  ┆ 3     ┆ 18     │
│ A    ┆ [9, 1, 0, 3, 2]  ┆ 2     ┆ 15     │
│ A    ┆ [1, 0, 3, 2, 5]  ┆ 5     ┆ 11     │
│ A    ┆ [0, 3, 2, 5, 8]  ┆ 8     ┆ 18     │
│ …    ┆ …                ┆ …     ┆ …      │
│ C    ┆ [7, 8, 9, 11, 1] ┆ 1     ┆ 36     │
│ C    ┆ [8, 9, 11, 1, 1] ┆ 1     ┆ 30     │
│ C    ┆ [9, 11, 1, 1, 0] ┆ 0     ┆ 22     │
│ C    ┆ [11, 1, 1, 0, 1] ┆ 1     ┆ 14     │
│ C    ┆ [1, 1, 0, 1, 4]  ┆ 4     ┆ 7      │
│ C    ┆ [1, 0, 1, 4, 3]  ┆ 3     ┆ 9      │
│ C    ┆ [0, 1, 4, 3, 21] ┆ 21    ┆ 29     │
└──────┴──────────────────┴───────┴────────┘

Note. I've used the following polars configuration to obtain the exact output printed above.

pl.Config.set_tbl_rows(15)
pl.Config.set_fmt_table_cell_list_len(5)

How to group_by and rolling sum in polars？

Tags:

python

dataframe

rolling-computation

python-polars

Young Eddie

2 Answers

jqurious

Hericks

Recent Activity

Donate For Us

How to group_by and rolling sum in polars？

Tags:

python

dataframe

rolling-computation

python-polars

Young Eddie

2 Answers

jqurious

Hericks

Related questions

Recent Activity

Donate For Us