In Python Polars, I have a dataframe with 3 columns: x (integers in steps of 5, with some values missing), y (integers) and z (str, a category).
I want to group by the column z and interpolate column x and y. Here is an example dataframe:
┌─────┬─────┬─────┐
│ x ┆ y ┆ z │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 5 ┆ 1 ┆ A │
│ 10 ┆ 2 ┆ A │
│ 20 ┆ 4 ┆ A │
│ 25 ┆ 5 ┆ A │
│ 10 ┆ 2 ┆ B │
│ 20 ┆ 4 ┆ B │
│ 30 ┆ 6 ┆ B │
└─────┴─────┴─────┘
And here is the desired output:
┌─────┬─────┬─────┐
│ x ┆ y ┆ z │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 5 ┆ 1 ┆ A │
│ 10 ┆ 2 ┆ A │
│ 15 ┆ 3 ┆ A │
│ 20 ┆ 4 ┆ A │
│ 25 ┆ 5 ┆ A │
│ 10 ┆ 2 ┆ B │
│ 15 ┆ 3 ┆ B │
│ 20 ┆ 4 ┆ B │
│ 25 ┆ 5 ┆ B │
│ 30 ┆ 6 ┆ B │
└─────┴─────┴─────┘
The step between consecutive x values (within each category) should always be 5. My real dataframe is very large, so I would like to work with pl.LazyFrame instead of pl.DataFrame.
Without the category column z I solved the issue with a join:
import polars as pl
# Main dataframe
data = dict(x=[10, 20, 30], y=[2, 4, 6])
df = pl.DataFrame(data)
# Dataframe with all x values
step = 5
df_1 = pl.DataFrame(dict(x=range(df["x"].min(), df["x"].max() + step, step)))
# merging and interpolation
print((
    df_1
    .join(df, on="x", how="left")
    .with_columns(pl.col("y").interpolate())
))
and the result was:
┌─────┬─────┐
│ x ┆ y │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 10 ┆ 2 │
│ 15 ┆ 3 │
│ 20 ┆ 4 │
│ 25 ┆ 5 │
│ 30 ┆ 6 │
└─────┴─────┘
This works as expected, but I cannot figure out how to apply it in a group_by context.
You could extend your example based on pl.DataFrame.join by joining on x and z as follows.
First, we create an upsampled DataFrame (for all groups defined by z) to join on.
upsampled = (
    df
    .group_by("z")
    .agg(
        pl.int_range(pl.col("x").min(), pl.col("x").max()+5, step=5).alias("x")
    )
    .explode("x")
)
Next, we perform a left-join on the upsampled DataFrame and interpolate column y.
(
    upsampled
    .join(
        df,
        on=["x", "z"],
        how="left"
    )
    .with_columns(
        pl.col("y").interpolate()
    )
)
Output (row order may differ unless maintain_order=True is set in the group_by):
shape: (10, 3)
┌─────┬─────┬─────┐
│ z ┆ x ┆ y │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ A ┆ 5 ┆ 1.0 │
│ A ┆ 10 ┆ 2.0 │
│ A ┆ 15 ┆ 3.0 │
│ A ┆ 20 ┆ 4.0 │
│ A ┆ 25 ┆ 5.0 │
│ B ┆ 10 ┆ 2.0 │
│ B ┆ 15 ┆ 3.0 │
│ B ┆ 20 ┆ 4.0 │
│ B ┆ 25 ┆ 5.0 │
│ B ┆ 30 ┆ 6.0 │
└─────┴─────┴─────┘