
Python polars: pass named row to pl.DataFrame.map_rows

I'm looking for a way to apply a user-defined function that takes a dictionary of arguments, rather than a tuple, as input when using pl.DataFrame.map_rows.

Trying something like

df.map_rows(lambda x: udf({k:v for k, v in zip(df.columns, x)}))

I'm getting a RuntimeError: Already mutably borrowed

The documentation says:

The frame-level map_rows cannot track column names (as the UDF is a black-box that may arbitrarily drop, rearrange, transform, or add new columns); if you want to apply a UDF such that column names are preserved, you should use the expression-level map_elements syntax instead.

But how does this prevent polars from passing a dict instead of a tuple to the UDF, just like calling df.row(i, named=True)? Why can't the struct be named?

I know I can iterate through df.rows() and do my user-defined stuff, then convert back to pl.DataFrame, but I would have liked a way to do this without leaving the polars API.

paulduf asked Jan 26 '26 21:01


1 Answer

I don't know enough about the underlying Rust internals to explain the error, but capturing df.columns in a variable before calling map_rows seems to work.

cols = df.columns
df.map_rows(lambda x: udf({k:v for k, v in zip(cols, x)}))

Moreover, you can simplify the creation of the dictionary by using the dict() constructor.

cols = df.columns
df.map_rows(lambda x: udf(dict(zip(cols, x))))
Hericks answered Jan 28 '26 09:01