Is there an elegant way to recode values in a Polars DataFrame?
For example
1->0,
2->0,
3->1...
In Pandas it is as simple as:
df.replace([1,2,3,4,97,98,99],[0,0,1,1,2,2,2])
Polars has dedicated replace and replace_strict expressions.
import polars as pl

df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5]
})
mapper = {
    1: 0,
    2: 0,
    3: 10,
    4: 10
}
# values not present in the mapping (here 5) are left unchanged
df.select(
    pl.all().replace(mapper)
)
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 0   │
│ 10  │
│ 10  │
│ 5   │
└─────┘
In Polars you can build columnar if/else statements using when -> then -> otherwise expressions.
So let's say we have this DataFrame.
df = pl.DataFrame({
    "a": [1, 2, 3, 4, 5]
})
And we'd like to replace these with the following values:
from_ = [1, 2]
to_ = [99, 12]
We could write:
df.with_columns(
    pl.when(pl.col("a") == from_[0])
    .then(to_[0])
    .when(pl.col("a") == from_[1])
    .then(to_[1])
    .otherwise(pl.col("a")).alias("a")
)
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 99  │
│ 12  │
│ 3   │
│ 4   │
│ 5   │
└─────┘
Now, this becomes very tedious to write really fast, so we could write a function that generates these expressions for us. We are programmers, aren't we!
So to replace with the values you have suggested, you could do:
from_ = [1,2,3,4,97,98,99]
to_ = [0,0,1,1,2,2,2]
def replace(column, from_, to_):
    # start the expression with the first `pl.when(...).then(...)` pair
    branch = pl.when(pl.col(column) == from_[0]).then(to_[0])
    # chain a `when.then` for every remaining pair
    for from_value, to_value in zip(from_[1:], to_[1:]):
        branch = branch.when(pl.col(column) == from_value).then(to_value)
    # finish with an `otherwise` that keeps unmatched values unchanged
    return branch.otherwise(pl.col(column)).alias(column)

df.with_columns(replace("a", from_, to_))
Which outputs:
shape: (5, 1)
┌─────┐
│ a   │
│ --- │
│ i64 │
╞═════╡
│ 0   │
│ 0   │
│ 1   │
│ 1   │
│ 5   │
└─────┘