
Find missing values in NumPy array of dtype obj

I'm being driven crazy by a NumPy array of dtype obj with a missing value (in the example below, it is the penultimate value).

>> a
array([0, 3, 'Braund, Mr. Owen Harris', 'male', 22.0, 1, 0, 'A/5 21171',
       7.25, nan, 'S'], dtype=object)

I want to find this missing value programmatically with a function that returns a boolean vector with True in the positions that correspond to missing values in the array (as in the example below).

>> some_function(a)
array([False, False, False, False, False, False, False, False, False, True, False],
      dtype=bool)

I tried isnan to no avail.

>> isnan(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not
be safely coerced to any supported types according to the casting rule ''safe''

I also attempted performing the operation explicitly over every element of the array with apply_along_axis, but the same error is returned.

>> apply_along_axis(isnan, 0, a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not
be safely coerced to any supported types according to the casting rule ''safe''

Can anyone explain to me (1) what I'm doing wrong and (2) what I can do to solve this problem? From the error, I gather that it has to do with one of the elements not being in an appropriate type. What is the easiest way to get around this issue?

Gyan Veda asked Sep 20 '25 07:09


1 Answer

Another workaround is:

In [148]: [item != item for item in a]
Out[148]: [False, False, False, False, False, False, False, False, False, True, False]

since NaNs are not equal to themselves. Note, however, that it is possible to define custom objects which, like NaN, are not equal to themselves:

class Foo(object):
    # Always report inequality, even when compared with itself
    def __ne__(self, obj):
        return True
foo = Foo()
assert foo != foo

so using item != item does not necessarily mean item is a NaN.
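If that ambiguity matters, one option is to test explicitly for float NaNs. The following is only a sketch: is_missing is a hypothetical helper name, and it assumes that "missing" means a Python float (or np.float64) NaN and nothing else, so None or an np.float32 NaN would not be flagged.

```python
import math

import numpy as np


def is_missing(arr):
    """Return a boolean mask that is True where an element is a float NaN.

    Hypothetical helper: only float instances are tested, so strings and
    custom objects that compare unequal to themselves are never flagged.
    """
    return np.array(
        [isinstance(item, float) and math.isnan(item) for item in arr],
        dtype=bool,
    )


# The array from the question
a = np.array([0, 3, 'Braund, Mr. Owen Harris', 'male', 22.0, 1, 0,
              'A/5 21171', 7.25, np.nan, 'S'], dtype=object)
mask = is_missing(a)
```

Unlike the list comprehension above, this returns a NumPy boolean array, so it can be used directly for indexing, e.g. a[mask].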


Note that it is generally a good idea to avoid NumPy arrays of dtype object if possible.

  • They are not particularly quick: operations on their contents generally devolve into Python calls on the underlying Python objects. A normal Python list often has better performance.
  • Unlike numeric arrays, which can be more space efficient than a Python list of numbers, object arrays are not particularly space efficient since every item is a reference to a Python object.
  • They are also not particularly convenient, since many NumPy operations do not work on arrays of dtype object. isnan is one such example.
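
To illustrate the last point, here is a small sketch showing that isnan rejects an object array but works once the same values are held in a float array. The variable names are made up for the example, and it assumes the values in question are all numeric, which is not true of the full array in the question.

```python
import numpy as np

# Numeric values stored in an object array
obj = np.array([22.0, 7.25, np.nan], dtype=object)

try:
    np.isnan(obj)
    raised = False
except TypeError:
    # isnan has no loop for dtype object
    raised = True

# Converting to a float dtype makes isnan usable
as_float = obj.astype(float)
mask = np.isnan(as_float)
```

For mixed data like the array in the question, this conversion only applies to the numeric columns, so keeping those in their own float array is one way to sidestep the problem.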
unutbu answered Sep 21 '25 22:09