I have a DataFrame and need to keep all rows where received is
not equal to "qty".
import polars as pl

df = pl.DataFrame({
    'doc_n': ['1111', '2222', '3333'],
    'received': ['qty', '6.0', None],
})
However, after applying the following filter:
df = df.filter(pl.col('received') != 'qty')
only the row ['6.0', '2222'] remains. In particular, Polars also filtered out the row with the null value.
How can I apply the filter while keeping null values? The expected result has 2 rows (['6.0', '2222'] and [None, '3333']).
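To understand this behavior, see this related GitHub issue and the comment.
In short, null values propagate through comparison operators: comparing anything with null yields null rather than true or false, and filter keeps only the rows where the predicate is true.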
>>> import polars as pl
>>> pl.Series([1, 2, None]) != pl.Series([3, 4, 5])
shape: (3,)
Series: '' [bool]
[
true
true
null <-- HERE
]
To get the expected outcome, you can use ne_missing, which treats null as an ordinary value in the comparison instead of propagating it:
>>> df.filter(pl.col('received').ne_missing('qty'))
shape: (2, 2)
┌───────┬──────────┐
│ doc_n ┆ received │
│ --- ┆ --- │
│ str ┆ str │
╞═══════╪══════════╡
│ 2222 ┆ 6.0 │
│ 3333 ┆ null │
└───────┴──────────┘