
How to use Polars.filter() with null data [duplicate]

I have a dataframe and need to keep all rows where received is not equal to "qty".

import polars as pl

df = pl.DataFrame({
    'doc_n': ['1111', '2222', '3333'],
    'received': ['qty', '6.0', None],
})

However, after applying the following filter

df = df.filter(pl.col('received') != 'qty')

only the row ['6.0', '2222'] remains. In particular, Polars also filtered out the row with the null value.

How can I apply the filter while keeping null values? The expected outcome has 2 rows (['6.0', '2222'] and [None, '3333']).

Masik asked Nov 14 '25 10:11



1 Answer

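To understand this behavior, see this related GitHub issue and the comment on it. In short, null values propagate through comparison operators: comparing null with 'qty' yields null rather than true, and filter() keeps only the rows where the predicate is true.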

>>> import polars as pl
>>> pl.Series([1, 2, None]) != pl.Series([3, 4, 5])
shape: (3,)
Series: '' [bool]
[
    true
    true
    null <-- HERE
]

To get the expected outcome, you can use ne_missing, which treats null as a regular value in the comparison instead of propagating it:

>>> df.filter(pl.col('received').ne_missing('qty'))
shape: (2, 2)
┌───────┬──────────┐
│ doc_n ┆ received │
│ ---   ┆ ---      │
│ str   ┆ str      │
╞═══════╪══════════╡
│ 2222  ┆ 6.0      │
│ 3333  ┆ null     │
└───────┴──────────┘
Abdul Niyas P M answered Nov 17 '25 09:11



