import polars as pl
data = {'type': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C'],
'value': [5, 9, 1, 0, 3, 2, 5, 8, 9, 1, 0, 3, 3, 1, 1, 0, 2, 0, 0, 5, 7, 4, 7, 8, 9, 11, 1, 1, 0, 1, 4, 3, 21]}
df = pl.DataFrame(data)
print(df)
Given two columns of data, how can we group them by the 'type' column, sum the 'value' column using a rolling window of size 5, and then place the resulting data into a column named 'result'?
The results are as follows:
[None, None, None, None, 18, 15, 11, 18, 27, 25, 23, 21, 16, None, None, None, None, 4, 3, 7, 14, 16, None, None, None, None, 36, 30, 22, 14, 7, 9, 29]
(Please using the polars library only, Polars version = 0.17.9)
You can use .rolling_sum() as a Window function.
with pl.Config(tbl_rows=33):
df.with_columns(
pl.col("value").rolling_sum(window_size=5).over("type").alias("result")
)
shape: (33, 3)
┌──────┬───────┬────────┐
│ type ┆ value ┆ result │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞══════╪═══════╪════════╡
│ A ┆ 5 ┆ null │
│ A ┆ 9 ┆ null │
│ A ┆ 1 ┆ null │
│ A ┆ 0 ┆ null │
│ A ┆ 3 ┆ 18 │
│ A ┆ 2 ┆ 15 │
│ A ┆ 5 ┆ 11 │
│ A ┆ 8 ┆ 18 │
│ A ┆ 9 ┆ 27 │
│ A ┆ 1 ┆ 25 │
│ A ┆ 0 ┆ 23 │
│ A ┆ 3 ┆ 21 │
│ A ┆ 3 ┆ 16 │
│ B ┆ 1 ┆ null │
│ B ┆ 1 ┆ null │
│ B ┆ 0 ┆ null │
│ B ┆ 2 ┆ null │
│ B ┆ 0 ┆ 4 │
│ B ┆ 0 ┆ 3 │
│ B ┆ 5 ┆ 7 │
│ B ┆ 7 ┆ 14 │
│ B ┆ 4 ┆ 16 │
│ C ┆ 7 ┆ null │
│ C ┆ 8 ┆ null │
│ C ┆ 9 ┆ null │
│ C ┆ 11 ┆ null │
│ C ┆ 1 ┆ 36 │
│ C ┆ 1 ┆ 30 │
│ C ┆ 0 ┆ 22 │
│ C ┆ 1 ┆ 14 │
│ C ┆ 4 ┆ 7 │
│ C ┆ 3 ┆ 9 │
│ C ┆ 21 ┆ 29 │
└──────┴───────┴────────┘
I agree that using pl.Expr.rolling_sum in conjunction with pl.Expr.over is the most elegant solution - especially, if the dataframe has multiple columns one wants to keep.
Still, if one doesn't want to rely pl.Expr.rolling_sum (which is marked unstable) the same behaviour can be achieved using pl.DataFrame.rolling as follows.
(
df
.rolling(pl.int_range(pl.len()).alias("index"), period="5i", group_by="type")
.agg(
pl.col("value").alias("window"),
pl.col("value").last(),
pl.when(pl.len() == 5).then(pl.col("value").sum()).alias("result"),
)
.drop("index")
)
Here, the column "window" is only created to highlight the window aggregated by agg.
shape: (33, 4)
┌──────┬──────────────────┬───────┬────────┐
│ type ┆ window ┆ value ┆ result │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ list[i64] ┆ i64 ┆ i64 │
╞══════╪══════════════════╪═══════╪════════╡
│ A ┆ [5] ┆ 5 ┆ null │
│ A ┆ [5, 9] ┆ 9 ┆ null │
│ A ┆ [5, 9, 1] ┆ 1 ┆ null │
│ A ┆ [5, 9, 1, 0] ┆ 0 ┆ null │
│ A ┆ [5, 9, 1, 0, 3] ┆ 3 ┆ 18 │
│ A ┆ [9, 1, 0, 3, 2] ┆ 2 ┆ 15 │
│ A ┆ [1, 0, 3, 2, 5] ┆ 5 ┆ 11 │
│ A ┆ [0, 3, 2, 5, 8] ┆ 8 ┆ 18 │
│ … ┆ … ┆ … ┆ … │
│ C ┆ [7, 8, 9, 11, 1] ┆ 1 ┆ 36 │
│ C ┆ [8, 9, 11, 1, 1] ┆ 1 ┆ 30 │
│ C ┆ [9, 11, 1, 1, 0] ┆ 0 ┆ 22 │
│ C ┆ [11, 1, 1, 0, 1] ┆ 1 ┆ 14 │
│ C ┆ [1, 1, 0, 1, 4] ┆ 4 ┆ 7 │
│ C ┆ [1, 0, 1, 4, 3] ┆ 3 ┆ 9 │
│ C ┆ [0, 1, 4, 3, 21] ┆ 21 ┆ 29 │
└──────┴──────────────────┴───────┴────────┘
Note. I've used the following polars configuration to obtain the exact output printed above.
pl.Config.set_tbl_rows(15)
pl.Config.set_fmt_table_cell_list_len(5)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With