
How to update fields with previous fields value in polars?

I have this dataframe:

import polars as pl

df = pl.DataFrame({
    'file':['a','a','a','a','b','b'],
    'ru':['fe','fe','ev','ev','ba','br'],
    'rt':[0,0,1,1,1,0],
})
shape: (6, 3)
┌──────┬─────┬─────┐
│ file ┆ ru  ┆ rt  │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ i64 │
╞══════╪═════╪═════╡
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ a    ┆ ev  ┆ 1   │
│ a    ┆ ev  ┆ 1   │
│ b    ┆ ba  ┆ 1   │
│ b    ┆ br  ┆ 0   │
└──────┴─────┴─────┘

Within each group defined by "file", I'd like to replace the values in "ru" and "rt" with the values from the group's first row, but only if the first "rt" value in that group is 0.

The desired output would look as follows.

shape: (6, 3)
┌──────┬─────┬─────┐
│ file ┆ ru  ┆ rt  │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ i64 │
╞══════╪═════╪═════╡
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ b    ┆ ba  ┆ 1   │
│ b    ┆ br  ┆ 0   │
└──────┴─────┴─────┘

How can I achieve that?

asked Sep 07 '25 19:09 by lmocsi


1 Answer

Get the first value of each column within each "file" group.

This can be achieved with window functions (pl.Expr.over in polars).

df.with_columns(
    pl.col("ru").first().over("file"),
    pl.col("rt").first().over("file"),
)

Polars also accepts multiple column names in pl.col and evaluates the expression for each column independently (just as above).

df.with_columns(
    pl.col("ru", "rt").first().over("file")
)
shape: (6, 3)
┌──────┬─────┬─────┐
│ file ┆ ru  ┆ rt  │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ i64 │
╞══════╪═════╪═════╡
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ b    ┆ ba  ┆ 1   │
│ b    ┆ ba  ┆ 1   │
└──────┴─────┴─────┘

Add a condition for using the first values.

To use the first values only when the first "rt" value within the group is 0, we can use a pl.when().then().otherwise() construct. The condition is again built with a window function, so every row is checked against the first "rt" value of its own group.

df.with_columns(
    pl.when(
        pl.col("rt").first().over("file") == 0
    ).then(
        pl.col("ru", "rt").first().over("file")
    ).otherwise(
        pl.col("ru", "rt")
    )
)
shape: (6, 3)
┌──────┬─────┬─────┐
│ file ┆ ru  ┆ rt  │
│ ---  ┆ --- ┆ --- │
│ str  ┆ str ┆ i64 │
╞══════╪═════╪═════╡
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ a    ┆ fe  ┆ 0   │
│ b    ┆ ba  ┆ 1   │
│ b    ┆ br  ┆ 0   │
└──────┴─────┴─────┘
answered Sep 10 '25 06:09 by Hericks