Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I iterate over all columns using pl.all() in Polars?

I've written a custom function in Polars to generate a horizontal forward/backward fill list of expressions. The function accepts an iterable of expressions (or column names) to determine the order of filling. I want to to use all columns via pl.all() as default. The problem is that pl.all() returns a single expression rather than an iterable, so trying to reverse or iterate over it leads to a TypeError.

Is there a way to convert between single expressions and iterables of expressions? Any suggestions or workarounds are greatly appreciated!

Here is the function:

from typing import Iterable
from polars._typing import IntoExpr
import polars as pl

def fill_horizontal(exprs: Iterable[IntoExpr], forward: bool = True) -> list[pl.Expr]:
    """Generate a horizontal forward/backward fill list of expressions."""
    # exprs = exprs or pl.all()  # use all columns as default
    cols = [col for col in reversed(exprs)] if forward else exprs
    return [pl.coalesce(cols[i:]) for i in range(0, len(cols) - 1)]

Here is an example:

df = pl.DataFrame({
    "col1": [1, None, 2],
    "col2": [1, 2, None],
    "col3": [None, None, 3]})
print(df)
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ ---  ┆ ---  ┆ ---  │
# │ i64  ┆ i64  ┆ i64  │
# ╞══════╪══════╪══════╡
# │ 1    ┆ 1    ┆ null │
# │ null ┆ 2    ┆ null │
# │ 2    ┆ null ┆ 3    │
# └──────┴──────┴──────┘
print('forward_fill')
print(df.with_columns(fill_horizontal(df.columns, forward=True)))
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ ---  ┆ ---  ┆ ---  │
# │ i64  ┆ i64  ┆ i64  │
# ╞══════╪══════╪══════╡
# │ 1    ┆ 1    ┆ 1    │
# │ null ┆ 2    ┆ 2    │
# │ 2    ┆ 2    ┆ 3    │
# └──────┴──────┴──────┘
print('backward_fill')
print(df.with_columns(fill_horizontal(df.columns, forward=False)))
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ ---  ┆ ---  ┆ ---  │
# │ i64  ┆ i64  ┆ i64  │
# ╞══════╪══════╪══════╡
# │ 1    ┆ 1    ┆ null │
# │ 2    ┆ 2    ┆ null │
# │ 2    ┆ 3    ┆ 3    │
# └──────┴──────┴──────┘

Edit: Merging @Henry Harbeck's answer and @jqurious's comment seems to be not perfect but a sufficient solution as of now.

def fill_horizontal(
        exprs: Iterable[IntoExpr] | None = None,
        *,
        forward: bool = True,
        ncols: int = 1000) -> pl.Expr:
    """Generate a horizontal forward/backward fill expression."""
    if exprs is None:
        # if forward is false, ncols has to be defined with the present number of cols or more
        cols = pl.all() if forward else pl.nth(range(ncols, -1, -1))
    else:
        cols = exprs if forward else reversed(exprs)
    return pl.cum_reduce(lambda s1, s2: pl.coalesce(s2, s1), cols).struct.unnest()
like image 475
Olibarer Avatar asked Oct 25 '25 06:10

Olibarer


1 Answers

Check out cum_reduce, which does a cumulative horizontal reduction. This is pretty much what you are after and saves you having to do any Python looping.

Unfortunately, it reduces from left to right only. I've made this feature request to ask for right to left reductions, which should fully enable your use-case.

Here's a tweaked version of your function that works in a cases except pl.all() and forward=False

def fill_horizontal(
    exprs: Iterable[IntoExpr] | None = None,
    *,
    forward: bool = True
) -> pl.Expr:
    """Generate a horizontal forward/backward fill list of expressions."""
    exprs = exprs or [pl.all()]  # use all columns as default
    # Doesn't do anything for pl.all() - columns still remain in their original order 
    cols = exprs if forward else reversed(exprs)
    return pl.cum_reduce(lambda s1, s2: pl.coalesce(s2, s1), cols).struct.unnest()

df.with_columns(fill_horizontal())
# shape: (3, 3)
# ┌──────┬──────┬──────┐
# │ col1 ┆ col2 ┆ col3 │
# │ ---  ┆ ---  ┆ ---  │
# │ i64  ┆ i64  ┆ i64  │
# ╞══════╪══════╪══════╡
# │ 1    ┆ 1    ┆ 1    │
# │ null ┆ 2    ┆ 2    │
# │ 2    ┆ 2    ┆ 3    │
# └──────┴──────┴──────┘

# Doesn't work :(
df.with_columns(fill_horizontal(forward=False))

# Works as a backward fill
df.with_columns(fill_horizontal(df.columns, forward=False))

Other options I can think of are:

  • make this a DataFrame / LazyFrame level function. You can pipe the frame in and access the schema directly. You can then reversthe columns without needing to expose this to the caller. This may block some optimisations / lazy evaluation
  • make a feature request to reverse the column order of multi-column expressions such as pl.all() and pl.col("a", "b")
  • upvote the feature request I've linked, and force the caller of your function to use df.columns until it hopefully gets implemented
like image 81
Henry Harbeck Avatar answered Oct 26 '25 19:10

Henry Harbeck



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!