I'm looking for a way to apply a user-defined function that takes a dictionary, rather than a tuple, of arguments as input when using pl.DataFrame.map_rows.
When I try something like
df.map_rows(lambda x: udf({k:v for k, v in zip(df.columns, x)}))
I get a RuntimeError: Already mutably borrowed
The documentation says:
The frame-level map_rows cannot track column names (as the UDF is a black-box that may arbitrarily drop, rearrange, transform, or add new columns); if you want to apply a UDF such that column names are preserved, you should use the expression-level map_elements syntax instead.
But how does this prevent polars from passing a dict instead of a tuple to the UDF, just like calling df.row(i, named=True) does? Why can't the row be named?
I know I can iterate through df.rows(), do my user-defined processing, and then convert back to a pl.DataFrame, but I would have liked a way to do this without leaving the polars API.
I don't know enough about the underlying Rust internals to explain the error, but capturing df.columns in a variable before calling map_rows, instead of reading it inside the lambda, seems to work.
cols = df.columns
df.map_rows(lambda x: udf({k:v for k, v in zip(cols, x)}))
Moreover, you can simplify the creation of the dictionary by passing the zip directly to the dict() constructor.
cols = df.columns
df.map_rows(lambda x: udf(dict(zip(cols, x))))