 

Get list of column names with values > 0 for every row in polars

I want to add a column result to a polars DataFrame that contains, for each row, a list of the names of the columns whose value in that row is greater than zero.

So given this:

import polars as pl

df = pl.DataFrame({"apple": [1, 0, 2, 0], "banana": [1, 0, 0, 1]})
cols = ["apple", "banana"]

How do I get:

shape: (4, 3)
┌───────┬────────┬─────────────────────┐
│ apple ┆ banana ┆ result              │
│ ---   ┆ ---    ┆ ---                 │
│ i64   ┆ i64    ┆ list[str]           │
╞═══════╪════════╪═════════════════════╡
│ 1     ┆ 1      ┆ ["apple", "banana"] │
│ 0     ┆ 0      ┆ []                  │
│ 2     ┆ 0      ┆ ["apple"]           │
│ 0     ┆ 1      ┆ ["banana"]          │
└───────┴────────┴─────────────────────┘

All I have so far is the truth values:

df.with_columns(pl.concat_list(pl.col(cols).gt(0)).alias("result"))

shape: (4, 3)
┌───────┬────────┬────────────────┐
│ apple ┆ banana ┆ result         │
│ ---   ┆ ---    ┆ ---            │
│ i64   ┆ i64    ┆ list[bool]     │
╞═══════╪════════╪════════════════╡
│ 1     ┆ 1      ┆ [true, true]   │
│ 0     ┆ 0      ┆ [false, false] │
│ 2     ┆ 0      ┆ [true, false]  │
│ 0     ┆ 1      ┆ [false, true]  │
└───────┴────────┴────────────────┘
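The boolean lists above are most of the way there: each row only needs its true flags swapped for the matching column names. As a plain-Python sketch of that last step (no polars involved; mask_rows is a hypothetical name holding the boolean lists from the output above):

```python
# Column order must match the order used in pl.concat_list(pl.col(cols).gt(0))
cols = ["apple", "banana"]

# Hypothetical: the per-row boolean lists shown in the "result" column above
mask_rows = [[True, True], [False, False], [True, False], [False, True]]

# Keep a column name wherever its flag is True
names_per_row = [
    [name for name, flag in zip(cols, flags) if flag]
    for flags in mask_rows
]
print(names_per_row)
# [['apple', 'banana'], [], ['apple'], ['banana']]
```

Both answers below do this same mapping, but inside polars expressions instead of a Python loop.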
asked Oct 29 '25 14:10 by spettekaka


2 Answers

Here's one way: inside concat_list, use pl.when with pl.lit so that each entry is either the literal column name (when the value is > 0) or null, then remove the nulls with list.drop_nulls:

df.with_columns(
    result=pl.concat_list(
        pl.when(pl.col(col) > 0).then(pl.lit(col)) for col in df.columns
    ).list.drop_nulls()
)
shape: (4, 3)
┌───────┬────────┬─────────────────────┐
│ apple ┆ banana ┆ result              │
│ ---   ┆ ---    ┆ ---                 │
│ i64   ┆ i64    ┆ list[str]           │
╞═══════╪════════╪═════════════════════╡
│ 1     ┆ 1      ┆ ["apple", "banana"] │
│ 0     ┆ 0      ┆ []                  │
│ 2     ┆ 0      ┆ ["apple"]           │
│ 0     ┆ 1      ┆ ["banana"]          │
└───────┴────────┴─────────────────────┘
answered Oct 31 '25 03:10 by Wayoshi


Here's another way, using unpivot to move the column names into the data:

cols = ["apple", "banana"]
(
    df
    .with_row_index('i')
    .unpivot(index='i')
    .group_by('i', maintain_order=True)
    .agg(
        # this dict restores the original columns from before unpivot
        **{x: pl.col('value').filter(pl.col('variable') == x).first()
           for x in df.columns},
        # the is_in filter is only needed if there are columns you don't
        # want in the result; if you want all the columns, as in the
        # example, it isn't necessary
        result=pl.col('variable').filter(
            (pl.col('value') > 0) & pl.col('variable').is_in(cols)
        ),
    )
    .drop('i')
)

Original answer:

df.with_row_index('i').join(
    df
        .with_row_index('i')
        .unpivot(index='i')
        .filter(pl.col('value')>0)
        .group_by('i')
        .agg(result=pl.col('variable')),
    on='i', how='left'
).drop('i').with_columns(pl.col('result').fill_null([]))


shape: (4, 3)
┌───────┬────────┬─────────────────────┐
│ apple ┆ banana ┆ result              │
│ ---   ┆ ---    ┆ ---                 │
│ i64   ┆ i64    ┆ list[str]           │
╞═══════╪════════╪═════════════════════╡
│ 1     ┆ 1      ┆ ["apple", "banana"] │
│ 0     ┆ 0      ┆ []                  │
│ 2     ┆ 0      ┆ ["apple"]           │
│ 0     ┆ 1      ┆ ["banana"]          │
└───────┴────────┴─────────────────────┘

I've seen in other scripts that pl.element() inside .list.eval can be surprisingly slow, so this might be worth a shot, even though I'll concede it isn't particularly nice to look at.

Performance:

Setup:

import numpy as np
import polars as pl

# 10 million rows of random integers in [0, 5)
n = 10_000_000
df = pl.DataFrame({'apple': np.random.randint(0, 5, n),
                   'banana': np.random.randint(0, 5, n)})

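My original method (unpivot, filter, group_by, then join) took 21.4 s ± 1.36 s per loop

My new method (unpivot with group_by restoring the columns) took 14.1 s ± 296 ms per loop

@Wayoshi's method (concat_list with when/then) took 20.9 s ± 606 ms per loop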

answered Oct 31 '25 04:10 by Dean MacGregor


