In Pandas and NumPy, there are vectorized functions like np.isnan, np.isinf, and pd.isnull that check whether the elements of an array, Series, or DataFrame are missing, null, or otherwise invalid.
They also work on scalars: pd.isnull(None) simply returns True rather than pd.Series([True]), which is convenient.
But let's say I want to know whether an arbitrary object is itself one of these null values. None of these functions can tell me that, because they will happily vectorize over a variety of data structures and return an array or Series instead of a single boolean. Using them carelessly in an if statement leads to the dreaded "The truth value of a Series is ambiguous" error.
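To make the problem concrete, here is a small sketch of the behavior described above. The variable names are only illustrative.

```python
import pandas as pd

# A scalar input gives back a single boolean.
scalar_result = pd.isnull(None)

# A Series input gives back a boolean Series, one entry per element.
series_result = pd.isnull(pd.Series([None, 1]))

# Using that Series where a single boolean is expected raises ValueError.
try:
    if series_result:
        pass
    raised = False
    message = ""
except ValueError as exc:
    raised = True
    message = str(exc)

print(scalar_result)
print(series_result.tolist())
print(raised, message)
```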
What I want is a function like this:
assert not is_scalar_null(3)
assert not is_scalar_null([1,2])
assert not is_scalar_null([None, 1])
assert not is_scalar_null(pd.Series([None, 1]))
assert not is_scalar_null(pd.Series([None, None]))
assert is_scalar_null(None)
assert is_scalar_null(np.nan)
Internally, the Pandas function pandas._libs.missing.checknull will do the right thing:
import pandas._libs.missing as libmissing
libmissing.checknull(pd.Series([1,2])) # correctly returns False
But it's generally bad practice to use it; according to Python naming convention, the leading underscore marks _libs as private. I'm also not sure about the NumPy equivalents.
Is there an "acceptable" and official way to use the same null-checking logic as NumPy and Pandas, but strictly for scalars?
All you have to do is wrap pd.isnull so that, when it receives an iterable, it checks the elements one by one and reduces the result to a single value. This way you always get a scalar boolean as output.
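from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

import numpy as np
import pandas as pd

def is_scalar_null(value):
    # Despite its name, this returns True when NO null is found.
    if isinstance(value, Iterable):
        # Reduce the element-wise checks to a single boolean.
        return all(not pd.isnull(v) for v in value)
    return not pd.isnull(value)

assert is_scalar_null(3)
assert is_scalar_null([1, 2])
assert is_scalar_null(pd.Series([1]))
assert not is_scalar_null(None)
assert not is_scalar_null(np.nan)
assert not is_scalar_null([np.nan, 1])
assert not is_scalar_null(pd.Series([np.nan, 1]))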
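Note that the wrapper above reports whether a value is free of nulls, which is not quite what the question's assertions ask for. If the goal is a check that is True only for a scalar null and False for any container, one possible sketch combines two public Pandas functions, pd.api.types.is_scalar and pd.isna. The helper name is_strict_scalar_null is hypothetical, and this is an assumption about what "strictly for scalars" should mean rather than an official recipe.

```python
import numpy as np
import pandas as pd

def is_strict_scalar_null(value):
    """Return True only if value is a scalar AND is null (hypothetical helper)."""
    # is_scalar is False for lists, arrays, Series and DataFrames, so
    # pd.isna is only ever evaluated on a scalar and returns a plain boolean.
    return bool(pd.api.types.is_scalar(value) and pd.isna(value))

# The assertions from the question.
assert not is_strict_scalar_null(3)
assert not is_strict_scalar_null([1, 2])
assert not is_strict_scalar_null([None, 1])
assert not is_strict_scalar_null(pd.Series([None, 1]))
assert not is_strict_scalar_null(pd.Series([None, None]))
assert is_strict_scalar_null(None)
assert is_strict_scalar_null(np.nan)
```

Because the scalar test short-circuits, containers never reach pd.isna, so the "truth value is ambiguous" error cannot occur here.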
You could also patch pd.isnull itself with the iterable-aware wrapper, although I would not recommend doing so.
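from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

import numpy as np
import pandas as pd

# Keep a reference to the original so the wrapper does not call itself.
orig_pd_is_null = pd.isnull

def is_scalar_null(value):
    # Despite its name, this returns True when NO null is found.
    if isinstance(value, Iterable):
        return all(not orig_pd_is_null(v) for v in value)
    return not orig_pd_is_null(value)

# Monkey-patch: every later call to pd.isnull now goes through the wrapper.
pd.isnull = is_scalar_null

assert pd.isnull(3)
assert pd.isnull([1, 2])
assert pd.isnull(pd.Series([1]))
assert not pd.isnull(None)
assert not pd.isnull(np.nan)
assert not pd.isnull([np.nan, 1])
assert not pd.isnull(pd.Series([np.nan, 1]))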
This approach will probably break on nested iterables, but that can be fixed by making is_scalar_null recursive.
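A minimal sketch of the recursive variant might look like the following. The name contains_no_null is hypothetical, chosen to match what the function actually reports. Treating str and bytes as scalars is an added assumption: a one-character string iterates to itself, so recursing into strings would never terminate.

```python
from collections.abc import Iterable

import numpy as np
import pandas as pd

def contains_no_null(value):
    """Return True if value, and everything nested inside it, is non-null."""
    # Assumption: strings and bytes count as scalars, not as containers.
    if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
        # Recurse so that lists of lists, lists of Series, etc. are covered.
        return all(contains_no_null(v) for v in value)
    return not pd.isnull(value)

assert contains_no_null([[1, 2], [3, [4]]])
assert contains_no_null("abc")
assert not contains_no_null([[1, [2, None]], [3]])
assert not contains_no_null([pd.Series([np.nan, 1]), [2]])
```

Keep in mind that iterating a dict or a DataFrame yields its keys or column labels, so this sketch checks those labels rather than the stored values.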