
How to do if-else inside list eval expression?

I want to apply an if-else transformation to the elements of a List series. I use pl.when inside list.eval, but it emits a warning.

I have a DataFrame containing a List series whose rows have different lengths:

In [2]: df = pl.DataFrame({"Tokens": [["a", "b", "c"], ["a"], ["unknown"]]})

In [3]: df
Out[3]:
shape: (3, 1)
┌─────────────────┐
│ Tokens          │
│ ---             │
│ list[str]       │
╞═════════════════╡
│ ["a", "b", "c"] │
│ ["a"]           │
│ ["unknown"]     │
└─────────────────┘

Now I want to apply an if-else transformation to each element of the List series. More specifically: lambda token: -1 if token == 'unknown' else hash(token)
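For reference, the intended per-element behaviour can be sketched in plain Python. The helper name encode_token is hypothetical, and Python's built-in hash is used as in the lambda above; it does not produce the same values as Polars' hash.

```python
def encode_token(token: str) -> int:
    """Return -1 for the sentinel token, otherwise Python's built-in hash."""
    return -1 if token == "unknown" else hash(token)


# Same data as the Tokens column, as plain nested lists.
rows = [["a", "b", "c"], ["a"], ["unknown"]]

# Apply the transformation element by element; list lengths are preserved.
encoded = [[encode_token(t) for t in row] for row in rows]
print(encoded)
```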

I tried using pl.when inside a list.eval expression. It works, but it emits this warning: The predicate '[(col("")) == (Utf8(__ANY__))]' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the group_by operation would. This behavior is experimental and may be subject to change

In [12]: df.with_columns(pl.col("Tokens").list.eval(pl.when(pl.element() == 'unknown').then(pl.lit(0, dtype=pl.UInt64)).otherwise(pl.element().hash())))
The predicate '[(col("")) == (Utf8(unknown))]' in 'when->then->otherwise' is not a valid aggregation and might produce a different number of rows than the groupby operation would. This behavior is experimental and may be subject to change
Out[12]:
shape: (3, 1)
┌───────────────────────────────────┐
│ Tokens                            │
│ ---                               │
│ list[u64]                         │
╞═══════════════════════════════════╡
│ [1588745937650624681, 1558575890… │
│ [1588745937650624681]             │
│ [0]                               │
└───────────────────────────────────┘

What is the proper way to do this?

Zeyan Li asked Oct 11 '25 23:10



1 Answer

Conditionals in polars

Per the docs, use the pl.when(...).then(...).otherwise(...) syntax.

(See also: Column assignment based on predicate.)

import polars as pl
  
print(f"Polars version: {pl.__version__}\n") # NOTE: See polars docs.*

df = pl.DataFrame({"Tokens": [["a", "b", "c"], ["a"], ["unknown"]]})
print(df)

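# Per-element expression: -1 for "unknown", otherwise the element's hash.
# pl.lit(-1) is a signed integer while hash() returns UInt64, so the two
# branches resolve to Float64 (see the list[f64] column in the output).
transform_expr = (
    pl.when(pl.element() == "unknown")
    .then(pl.lit(-1))
    .otherwise(pl.element().hash())
)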

df = df.with_columns(
    pl.col("Tokens").list.eval(transform_expr).alias("Tokens")
)

print(df)

gives:

Polars version: 0.20.13

shape: (3, 1)
┌─────────────────┐
│ Tokens          │
│ ---             │
│ list[str]       │
╞═════════════════╡
│ ["a", "b", "c"] │
│ ["a"]           │
│ ["unknown"]     │
└─────────────────┘
shape: (3, 1)
┌───────────────────────────────────┐
│ Tokens                            │
│ ---                               │
│ list[f64]                         │
╞═══════════════════════════════════╡
│ [8.1448e17, 6.3145e15, 1.8296e19… │
│ [8.1448e17]                       │
│ [-1.0]                            │
└───────────────────────────────────┘

* polars.Expr.hash: the returned hash values are not guaranteed to be stable across Polars versions; they are stable only within the same version.

John Collins answered Oct 14 '25 11:10
