Given a polars DataFrame:
import polars as pl
data = pl.DataFrame({"user_id": [1, 1, 1, 2, 2, 2], "login": [False, True, False, False, False, True]})
How can I add a column containing the number of rows until each user next logs in, with any rows after that user's last login set to None? The expected output for the data above is:
[1, 0, None, 2, 1, 0]
I have tried adapting the answer from here with a backward_fill(), but cannot get it working.
IIUC, you have to use backward_fill and invert the subtraction:
(data
 .with_row_index()
 .with_columns(distance =
     pl.when("login").then("index").backward_fill().over("user_id") - pl.col.index
 )
)
Output:
┌───────┬─────────┬───────┬──────────┐
│ index ┆ user_id ┆ login ┆ distance │
│ ---   ┆ ---     ┆ ---   ┆ ---      │
│ u32   ┆ i64     ┆ bool  ┆ u32      │
╞═══════╪═════════╪═══════╪══════════╡
│ 0     ┆ 1       ┆ false ┆ 1        │
│ 1     ┆ 1       ┆ true  ┆ 0        │
│ 2     ┆ 1       ┆ false ┆ null     │
│ 3     ┆ 2       ┆ false ┆ 2        │
│ 4     ┆ 2       ┆ false ┆ 1        │
│ 5     ┆ 2       ┆ true  ┆ 0        │
└───────┴─────────┴───────┴──────────┘
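For intuition, the three steps in that expression (keep the index only on login rows, backward fill within each user, subtract the current index) can be sketched in plain Python. This is only an illustration of the logic, not how polars implements it; the function name rows_until_next_login is made up for this sketch.

```python
def rows_until_next_login(user_ids, logins):
    """Plain-Python illustration of the when/then + backward_fill + subtract logic."""
    n = len(logins)
    # Step 1: keep the row index only where login is True (like when("login").then("index")).
    marked = [i if flag else None for i, flag in enumerate(logins)]

    # Step 2: backward fill per user: walk from the last row to the first,
    # remembering the most recently seen (i.e. next) login index for each user.
    filled = [None] * n
    next_login = {}
    for i in range(n - 1, -1, -1):
        if marked[i] is not None:
            next_login[user_ids[i]] = marked[i]
        filled[i] = next_login.get(user_ids[i])

    # Step 3: next login index minus current index; None where no later login exists.
    return [None if f is None else f - i for i, f in enumerate(filled)]


distances = rows_until_next_login(
    [1, 1, 1, 2, 2, 2],
    [False, True, False, False, False, True],
)
print(distances)
```

Rows after a user's last login have nothing to fill from, which is why they stay None (null in polars).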
You can create a reversed index using the step parameter of pl.int_range:
i_expr = pl.int_range(pl.len(), 0, step=-1)
(
    data.with_columns(
        (i_expr - pl.when("login").then(i_expr).backward_fill())
        .over('user_id')
        .alias('distance')
    )
)
┌─────────┬───────┬──────────┐
│ user_id ┆ login ┆ distance │
│ ---     ┆ ---   ┆ ---      │
│ i64     ┆ bool  ┆ i64      │
╞═════════╪═══════╪══════════╡
│ 1       ┆ false ┆ 1        │
│ 1       ┆ true  ┆ 0        │
│ 1       ┆ false ┆ null     │
│ 2       ┆ false ┆ 2        │
│ 2       ┆ false ┆ 1        │
│ 2       ┆ true  ┆ 0        │
└─────────┴───────┴──────────┘
Or just reverse the subtraction, as in @mozway's answer:
i_expr = pl.int_range(pl.len())
(
    data.with_columns(
        (pl.when("login").then(i_expr).backward_fill() - i_expr)
        .over('user_id')
        .alias('distance')
    )
)
┌─────────┬───────┬──────────┐
│ user_id ┆ login ┆ distance │
│ ---     ┆ ---   ┆ ---      │
│ i64     ┆ bool  ┆ i64      │
╞═════════╪═══════╪══════════╡
│ 1       ┆ false ┆ 1        │
│ 1       ┆ true  ┆ 0        │
│ 1       ┆ false ┆ null     │
│ 2       ┆ false ┆ 2        │
│ 2       ┆ false ┆ 1        │
│ 2       ┆ true  ┆ 0        │
└─────────┴───────┴──────────┘
Note that I've also moved the index calculation into a separate i_expr variable and applied over() to the whole expression, so it only has to be used once, which makes the solution easier to adjust.