
Comparing sets that contain nan in python

Tags: python, nan, set

I'm trying to compare two sets in Python that contain nan, but I'm struggling to do so because {float('nan')} != {float('nan')}. For example:

s1 = {float('nan'), 1}
s2 = {float('nan'), 1, 2}

assert set.issubset(s1, s2)

This raises an AssertionError. How can I handle this?

user1507844 asked May 06 '26 01:05


2 Answers

One approach: membership tests check identity before equality (the Python docs on membership tests describe this), so it works if both sets contain the same nan object:

>>> nan = float("nan")
>>> s1 = {nan, 1}
>>> s2 = {nan, 1, 2}
>>> set.issubset(s1, s2)
True

even though

>>> s1 = {float("nan"), 1}
>>> s2 = {float("nan"), 1, 2}
>>> set.issubset(s1, s2)
False
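To make the identity-before-equality rule concrete, here is a minimal sketch. The `contains` helper is a hypothetical stand-in that mirrors the rule the docs give for membership in built-in containers (`x is e or x == e`); it is not how sets are implemented internally, since real sets also hash their elements.

```python
def contains(container, x):
    # Hypothetical helper: identity is checked first, then equality.
    return any(x is e or x == e for e in container)

nan_a = float("nan")
nan_b = float("nan")  # a second, distinct nan object

print(nan_a == nan_a)            # False: nan is never equal to itself
print(contains([nan_a], nan_a))  # True: same object, found by identity
print(contains([nan_a], nan_b))  # False: different object, and nan != nan
print(nan_a in {nan_a})          # True
print(nan_b in {nan_a})          # False
```

This is why the two sets built from a single nan object compare as expected, while the ones built from separate float("nan") calls do not.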

Working with nans is awkward enough that I'd try to avoid putting them in sets and switch to a different canonical form. But you could always just make sure it's the same one:

>>> import math
>>> def one_nan(x, nan=float("nan")):
...     # The default is created once, so every nan maps to this one object.
...     return nan if math.isnan(x) else x
... 
>>> set.issubset(set(map(one_nan, s1)), set(map(one_nan, s2)))
True

or any of a thousand variants on the same idea. (I sometimes use x != x as a shortcut for nan-detection, but it's probably a good idea to be explicit here.)
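As one possible reading of "a different canonical form", the sketch below replaces every nan with a single sentinel object before building the set. The names `NAN` and `canonical` are made up for illustration, and the isinstance check assumes the nans are plain Python floats.

```python
import math

NAN = object()  # hypothetical sentinel standing in for every nan

def canonical(values):
    """Build a set in which every float nan is replaced by the NAN sentinel."""
    return {NAN if isinstance(x, float) and math.isnan(x) else x for x in values}

s1 = canonical([float("nan"), 1])
s2 = canonical([float("nan"), 1, 2])

print(s1 <= s2)  # True: both sets now share the same sentinel
print(s2 <= s1)  # False: 2 is missing from s1
```

Because the sentinel is an ordinary object that equals itself, the usual set operations behave normally afterwards.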

DSM answered May 09 '26 02:05


You could also write a simple function for this. Note that float('nan') == float('nan') is False, and a nan is not even equal to itself, so to check whether an element is nan we just have to compare it with itself.

def is_subset(s1, s2):
    # nan is the only value that is not equal to itself, so this drops nans.
    no_nan_set = lambda s: {x for x in s if x == x}
    s1_nan, s2_nan = no_nan_set(s1), no_nan_set(s2)
    if s1_nan != s1 and s2_nan != s2:
        # Both sets contain nan: compare the remaining elements.
        return s1_nan.issubset(s2_nan)
    elif s1_nan == s1:
        # s1 has no nan, so any nan in s2 is irrelevant.
        return s1.issubset(s2)
    else:
        # s1 contains nan but s2 does not.
        return False

You can simplify the if-elif-else block:

def is_subset(s1, s2):
    no_nan_set = lambda s: {x for x in s if x == x}
    s1_nan, s2_nan = no_nan_set(s1), no_nan_set(s2)
    # A nan in s1 needs a nan in s2; the non-nan elements must be a subset.
    return (s1_nan == s1 or s2_nan != s2) and s1_nan.issubset(s2_nan)

Note that this works correctly if either of your sets has two or more nans (possible because float('nan') != float('nan')), and likewise if the ids of the nans are different. Lastly, it works even if one or both of your sets contain no nans.
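The cases mentioned above can be exercised with a small standalone sketch. The helper names `has_nan` and `nan_subset` are hypothetical and are not part of the answer's code; the sketch assumes all nans should be treated as one value, so a nan in the first set only requires some nan in the second.

```python
def has_nan(s):
    # nan is the only value for which x != x holds.
    return any(x != x for x in s)

def nan_subset(s1, s2):
    # A nan in s1 can only be matched by a nan in s2.
    if has_nan(s1) and not has_nan(s2):
        return False
    # Compare the remaining, well-behaved elements as usual.
    return {x for x in s1 if x == x} <= {x for x in s2 if x == x}

nan = float("nan")

print(nan_subset({nan, 1}, {float("nan"), 1, 2}))                # True: different nan objects
print(nan_subset({nan, float("nan"), 1}, {float("nan"), 1, 2}))  # True: two nans in the first set
print(nan_subset({1}, {nan, 1, 2}))                              # True: nan only in the second set
print(nan_subset({nan, 1}, {1, 2}))                              # False: nan only in the first set
print(nan_subset({1}, {1, 2}))                                   # True: no nans at all
```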

Anshul Goyal answered May 09 '26 01:05


