Scalar-valued isnull()/isnan()/isinf()

Question

In Pandas and NumPy, there are vectorized functions like np.isnan, np.isinf, and pd.isnull that check whether the elements of an array, series, or dataframe are one of the various kinds of missing/null/invalid values.

They do work on scalars. pd.isnull(None) simply returns True rather than pd.Series([True]), which is convenient.

But let's say I want to know whether a single object, taken as a whole, is one of these null values. You can't do that with any of these functions, because they will happily vectorize over a variety of data structures. Using them carelessly in a condition will sooner or later lead to the dreaded "The truth value of a Series is ambiguous" error.
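To make the failure mode concrete, here is a minimal sketch, assuming only that pandas is installed. The helper name truth_value_error is made up for this illustration; it just reports the error raised when the result of pd.isnull is used as a condition.

```python
import pandas as pd

def truth_value_error(obj):
    """Return the error message raised when pd.isnull(obj) is used
    as a condition, or None if no error is raised."""
    try:
        if pd.isnull(obj):
            pass
    except ValueError as exc:
        return str(exc)
    return None

# Scalar input: pd.isnull returns a single boolean, so no error.
print(truth_value_error(None))

# Series input: pd.isnull returns a boolean Series, which has no
# single truth value, so the `if` raises ValueError.
print(truth_value_error(pd.Series([None, 1])))
```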

What I want is a function like this:

assert not is_scalar_null(3)
assert not is_scalar_null([1,2])
assert not is_scalar_null([None, 1])
assert not is_scalar_null(pd.Series([None, 1]))
assert not is_scalar_null(pd.Series([None, None]))
assert is_scalar_null(None)
assert is_scalar_null(np.nan)

Internally, the Pandas function pandas._libs.missing.checknull will do the right thing:

import pandas as pd
import pandas._libs.missing as libmissing  # private module
libmissing.checknull(pd.Series([1,2]))  # correctly returns False

But it's generally bad practice to use it: by Python naming convention, the leading underscore marks _libs as private. I'm also not sure about the NumPy equivalents.

Is there an "acceptable", official way to use the same null-checking logic as NumPy and Pandas, but strictly for scalars?

DeepSpace · Accepted Answer

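Wrap pd.isnull so that, when it receives an iterable, it checks the elements one by one and reduces the results to a single value. This way you always get a scalar boolean as output. Note that, as written, the wrapper returns True when nothing is null, which is the opposite sense of the asserts in the question.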

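from collections.abc import Iterable  # the `collections.Iterable` alias was removed in Python 3.10

import numpy as np
import pandas as pd

def is_scalar_null(value):
    # Despite the name, this returns True when no null value is found.
    if isinstance(value, Iterable):
        # Reduce the element-wise checks to a single boolean.
        return all(not pd.isnull(v) for v in value)
    return not pd.isnull(value)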

assert is_scalar_null(3)
assert is_scalar_null([1, 2])
assert is_scalar_null(pd.Series([1]))
assert not is_scalar_null(None)
assert not is_scalar_null(np.nan)
assert not is_scalar_null([np.nan, 1])
assert not is_scalar_null(pd.Series([np.nan, 1]))
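For comparison, here is a minimal sketch that follows the asserts in the question instead (True only for a single null scalar, False for any container). It assumes the public helper pandas.api.types.is_scalar is an acceptable building block; the name is_scalar_null_strict is made up for this sketch and is not part of pandas.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_scalar

def is_scalar_null_strict(value):
    """True only when `value` is a single scalar that pandas treats as missing."""
    # is_scalar is False for lists, arrays, Series and DataFrames, so
    # pd.isna is only ever evaluated on a scalar and yields one boolean.
    return bool(is_scalar(value) and pd.isna(value))

# The asserts from the question.
assert not is_scalar_null_strict(3)
assert not is_scalar_null_strict([1, 2])
assert not is_scalar_null_strict([None, 1])
assert not is_scalar_null_strict(pd.Series([None, 1]))
assert not is_scalar_null_strict(pd.Series([None, None]))
assert is_scalar_null_strict(None)
assert is_scalar_null_strict(np.nan)
```

Because the container check happens first, pd.isna never gets a chance to vectorize, so the ambiguous-truth-value error cannot occur here.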

You could then patch the actual pd.isnull with is_scalar_null, though I can't say I recommend doing so.

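from collections.abc import Iterable

import numpy as np
import pandas as pd

# Keep a reference to the original so the wrapper can still call it.
orig_pd_is_null = pd.isnull

def is_scalar_null(value):
    if isinstance(value, Iterable):
        return all(not orig_pd_is_null(v) for v in value)
    return not orig_pd_is_null(value)

# From here on, pd.isnull refers to the wrapper.
pd.isnull = is_scalar_null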

assert pd.isnull(3)
assert pd.isnull([1, 2])
assert pd.isnull(pd.Series([1]))
assert not pd.isnull(None)
assert not pd.isnull(np.nan)
assert not pd.isnull([np.nan, 1])
assert not pd.isnull(pd.Series([np.nan, 1]))

This approach breaks on nested iterables: pd.isnull returns an array for an inner list, and `not` cannot reduce a multi-element array to one boolean. That can be fixed by using recursion in is_scalar_null.
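A minimal sketch of that recursive variant, keeping the same sense as the wrapper above (True when nothing is null). The name has_no_nulls is made up for this sketch, and treating strings and bytes as scalars is an assumption added here to keep the recursion from looping forever on single characters.

```python
from collections.abc import Iterable

import numpy as np
import pandas as pd

def has_no_nulls(value):
    """Return True if neither `value` nor anything nested inside it is null."""
    # Strings are iterable too, but iterating them yields more strings,
    # so treat them (and bytes) as scalars to avoid infinite recursion.
    if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
        return all(has_no_nulls(v) for v in value)
    return not pd.isnull(value)

assert has_no_nulls([[1, 2], [3, [4]]])
assert not has_no_nulls([[np.nan, 1], 2])
assert not has_no_nulls(None)
```

This is only a sketch: zero-dimensional NumPy arrays, for instance, pass the Iterable check but cannot actually be iterated, so they would need separate handling.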

Tags: python, pandas, missing-data, numpy

shadowtalker