
Get length of lists inside a struct using Polars expressions

In Python Polars, I am trying to extract the length of the lists inside a struct to re-use it in an expression.

For example, I have the code below:

import polars as pl


df = pl.DataFrame(
    {
        "x": [0, 4],
        "y": [
            {"low": [-1, 0, 1], "up": [1, 2, 3]},
            {"low": [-2, -1, 0], "up": [0, 1, 2]},
        ],
    }
)

df.with_columns(
    check=pl.concat_list([pl.all_horizontal(
        [
            pl.col("x").ge(pl.col("y").struct["low"].list.get(i)),
            pl.col("x").le(pl.col("y").struct["up"].list.get(i)),
        ]
    ) for i in range(3)]).list.max()
)

shape: (2, 3)
┌─────┬─────────────────────────┬───────┐
│ x   ┆ y                       ┆ check │
│ --- ┆ ---                     ┆ ---   │
│ i64 ┆ struct[2]               ┆ bool  │
╞═════╪═════════════════════════╪═══════╡
│ 0   ┆ {[-1, 0, 1],[1, 2, 3]}  ┆ true  │
│ 4   ┆ {[-2, -1, 0],[0, 1, 2]} ┆ false │
└─────┴─────────────────────────┴───────┘

I would like to infer the length of the lists at run time (i.e. without hardcoding the 3), as it can change from call to call.

The challenge I am facing is that I need to keep everything in the same expression context. I tried the following, but it does not work because I cannot extract the value returned by one of the expressions:

df.with_columns(
    check=pl.concat_list([pl.all_horizontal(
        [
            pl.col("x").ge(pl.col("y").struct["low"].list.get(i)),
            pl.col("x").le(pl.col("y").struct["up"].list.get(i)),
        ]
    ) for i in range(pl.col("y").struct["low"].list.len())]).list.max()
)

asked Dec 08 '25 16:12 by yz_jc


1 Answer

Unfortunately, I don't see a way to use an expression for the list length here. Also, direct comparisons of list columns are not yet natively supported.

Still, some on-the-fly exploding and imploding of the list columns could be used to achieve the desired result without relying on knowing the list lengths upfront.

(
    df
    .with_columns(
        ge_low=(pl.col("x") >= pl.col("y").struct["low"].explode()).implode().over(pl.int_range(pl.len())),
        le_up=(pl.col("x") <= pl.col("y").struct["up"].explode()).implode().over(pl.int_range(pl.len())),
    )
    .with_columns(
        check=(pl.col("ge_low").explode() & pl.col("le_up").explode()).implode().over(pl.int_range(pl.len()))
    )
)
shape: (2, 5)
┌─────┬─────────────────────────┬─────────────────────┬───────────────────────┬───────────────────────┐
│ x   ┆ y                       ┆ ge_low              ┆ le_up                 ┆ check                 │
│ --- ┆ ---                     ┆ ---                 ┆ ---                   ┆ ---                   │
│ i64 ┆ struct[2]               ┆ list[bool]          ┆ list[bool]            ┆ list[bool]            │
╞═════╪═════════════════════════╪═════════════════════╪═══════════════════════╪═══════════════════════╡
│ 0   ┆ {[-1, 0, 1],[1, 2, 3]}  ┆ [true, true, false] ┆ [true, true, true]    ┆ [true, true, false]   │
│ 4   ┆ {[-2, -1, 0],[0, 1, 2]} ┆ [true, true, true]  ┆ [false, false, false] ┆ [false, false, false] │
└─────┴─────────────────────────┴─────────────────────┴───────────────────────┴───────────────────────┘

answered Dec 11 '25 06:12 by Hericks


