
How do I write a query like (A or B) and C in Polars?

I expect to keep the rows where either a or b is 0.0 (not NaN) and c is always 0.0. The Polars documentation says to use | for "or" and & for "and". I believe I have the logic right: ((a is not NaN) or (b is not NaN)) and (c is not NaN)

However, the output is wrong.

import polars as pl
import numpy as np

df = pl.DataFrame(
    data={
        "a": [0.0, 0.0,    0.0,    0.0,    np.nan, np.nan, np.nan],
        "b": [0.0, 0.0,    np.nan, np.nan, 0.0,    0.0,    np.nan],
        "c": [0.0, np.nan, 0.0,    np.nan, 0.0,    np.nan, np.nan]
    }
)

df.with_columns(
    ((pl.col('a').is_not_nan() | pl.col('b').is_not_nan())
     & pl.col('c').is_not_nan()).alias('Keep'))
df_actual = df.filter(pl.col("Keep") is True)

print("df\n", df)
print("df_expect\n", df_expect)
print("df_actual\n", df_actual)

df

 shape: (7, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 ┆ NaN │
│ 0.0 ┆ NaN ┆ 0.0 │
│ 0.0 ┆ NaN ┆ NaN │
│ NaN ┆ 0.0 ┆ 0.0 │
│ NaN ┆ 0.0 ┆ NaN │
│ NaN ┆ NaN ┆ NaN │
└─────┴─────┴─────┘

df_expect

 shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ NaN ┆ 0.0 │
│ NaN ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ 0.0 ┆ 0.0 │
└─────┴─────┴─────┘

df_actual

 shape: (0, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
└─────┴─────┴─────┘
Steve Maguire asked Nov 15 '25 22:11


1 Answer

The logic looks fine.

One issue is that Polars operations are not "in-place" (apart from some niche methods).

.with_columns() returns a new frame, which you are not using.

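Another issue is using Python's is operator with Expr objects. is tests object identity, so it evaluates immediately to a plain Python bool instead of building a Polars expression.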

>>> type(pl.col("Keep"))
polars.expr.expr.Expr
>>> pl.col("Keep") is True
False

You end up running .filter(False), hence the result of 0 rows.

If you add the column:

df_actual = df.with_columns(
    ((pl.col("a").is_not_nan() | pl.col("b").is_not_nan())
     & pl.col("c").is_not_nan()).alias("Keep")
)

You can then pass the column name (or pl.col("Keep")) directly to .filter().

df_actual = df_actual.filter("Keep")

You could also chain the calls e.g. df.with_columns().filter()

Or you can pass the predicate to .filter() directly, without adding a column.

df_actual = df.filter(
    (pl.col("a").is_not_nan() | pl.col("b").is_not_nan())
    & pl.col("c").is_not_nan()
)

shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╡
│ 0.0 ┆ 0.0 ┆ 0.0 │
│ 0.0 ┆ NaN ┆ 0.0 │
│ NaN ┆ 0.0 ┆ 0.0 │
└─────┴─────┴─────┘
jqurious answered Nov 17 '25 16:11


