What I want to achieve
I want to get the unique rows of a 2-D NumPy array that contains nan.
More generally, I would like to obtain the unique values along a given axis of an n-d numpy.ndarray.
A reproducible example
import numpy as np
example = np.array([[0, np.nan],
[np.nan, 1],
[0, np.nan],
[np.nan, np.nan],
[np.nan, 1],
[np.nan, np.nan]])
The result I want is:
array([[ 0., nan],
[nan, 1.],
[nan, nan]])
What I have tried
I have tried using np.unique, but it does not work:
np.unique(example, axis=0)
The result is:
array([[ 0., nan],
[ 0., nan],
[nan, 1.],
[nan, 1.],
[nan, nan],
[nan, nan]])
So it turns out that np.nan == np.nan is False, which means no two nan entries are ever considered equal :/
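To make the failure concrete, here is a minimal sketch of how nan behaves under element-wise comparison. The names row_a and row_b are illustrative only and do not come from the question; np.isnan is the standard NumPy way to detect nan, since == cannot.

```python
import numpy as np

# nan never compares equal to anything, including itself
print(np.nan == np.nan)  # False

row_a = np.array([0.0, np.nan])
row_b = np.array([0.0, np.nan])

# Element-wise == therefore treats these two identical-looking rows as different
print((row_a == row_b).all())  # False

# np.isnan is the reliable way to locate nan entries
print(np.isnan(row_a))  # [False  True]
```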
I have thought of using np.allclose, which has an equal_nan option, but re-implementing unique on top of it would not be efficient.
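A sketch of what that allclose route might look like shows why it is slow. naive_unique_rows is a hypothetical helper, and the choice of rtol=0, atol=0 is an assumption added here so that distinct but nearby values are not merged (the default tolerances would merge them). Each row is compared against every row already kept, so the cost grows quadratically with the number of rows, all in Python-level calls.

```python
import numpy as np

example = np.array([[0, np.nan],
                    [np.nan, 1],
                    [0, np.nan],
                    [np.nan, np.nan],
                    [np.nan, 1],
                    [np.nan, np.nan]])

# By default nan is never "close" to nan ...
print(np.allclose(example[0], example[2]))                  # False
# ... but equal_nan=True makes matching nan positions count as equal
print(np.allclose(example[0], example[2], equal_nan=True))  # True


def naive_unique_rows(a):
    """Hypothetical helper: keep a row unless an equal row was already kept.

    rtol=0 and atol=0 make allclose an exact comparison. Every row is
    checked against every kept row, so this is O(n**2) and slow for large n.
    """
    kept = []
    for row in a:
        if not any(np.allclose(row, k, rtol=0, atol=0, equal_nan=True) for k in kept):
            kept.append(row)
    return np.array(kept)


print(naive_unique_rows(example))
```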
NB: I want to use this at large scale, so it needs to be fast.
Does such a function exist, or do I have to write it myself? Any advice would be helpful.
Answer
Replace nan with any value that is certainly not in the data, and np.unique will just work:
import numpy as np
example = np.array([[0, np.nan],
[np.nan, 1],
[0, np.nan],
[np.nan, np.nan],
[np.nan, 1],
[np.nan, np.nan]])
# substitute nan with inf
example[np.isnan(example)] = np.inf
u = np.unique(example, axis=0)
# substitute inf with nan
u[u == np.inf] = np.nan
print(u)
# [[ 0. nan]
# [ nan 1.]
# [ nan nan]]
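Since the question also asks about the n-d case along an arbitrary axis, the same substitution trick can be wrapped in a small helper. This is a sketch, not part of the original answer: unique_with_nan and its sentinel parameter are hypothetical names, it assumes a floating-point array, and it refuses to run if the sentinel is already present. Unlike the code above, it uses np.where to build a new array, so the caller's data is not modified.

```python
import numpy as np


def unique_with_nan(a, axis=0, sentinel=np.inf):
    """Unique slices of `a` along `axis`, treating every nan as equal.

    Hypothetical helper wrapping the nan -> sentinel -> nan substitution.
    Assumes `a` is a float array and that `sentinel` is not already in it.
    """
    a = np.asarray(a)
    if np.any(a == sentinel):
        raise ValueError("sentinel value already occurs in the data")
    # np.where returns a new array, so the input is left untouched
    filled = np.where(np.isnan(a), sentinel, a)
    u = np.unique(filled, axis=axis)
    # Turn the sentinel back into nan in the result
    u[u == sentinel] = np.nan
    return u


example = np.array([[0, np.nan],
                    [np.nan, 1],
                    [0, np.nan],
                    [np.nan, np.nan],
                    [np.nan, 1],
                    [np.nan, np.nan]])

print(unique_with_nan(example, axis=0))    # unique rows
print(unique_with_nan(example.T, axis=1))  # unique columns of the transpose
```

Because np.unique sorts its output and inf sorts after every finite value, rows containing nan come out last, matching the order shown above.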
In the example I used inf, but any other value is fine, as long as it cannot occur in the data.
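If no value can be guaranteed absent from the data, one alternative (a different technique, not from the original answer) avoids sentinels altogether: build a comparison key from the nan mask followed by the data with nan zeroed out. Two rows are equal under "nan equals nan" exactly when both their masks and their zero-filled values agree, so np.unique on the key with return_index=True finds one representative per group. This is a sketch for 2-D float input; unique_rows_nan_safe is a hypothetical name.

```python
import numpy as np


def unique_rows_nan_safe(a):
    """Unique rows of a 2-D float array, treating nan as equal to nan.

    Hypothetical helper: no sentinel is needed because the key stores
    the nan mask separately from the (nan-free) values.
    """
    a = np.asarray(a, dtype=float)
    mask = np.isnan(a)
    # Zero out nan so the value half of the key compares normally
    values = np.where(mask, 0.0, a)
    # Key columns: the mask (as 0.0/1.0) followed by the nan-free values
    key = np.concatenate([mask.astype(float), values], axis=1)
    # return_index gives the index of the first occurrence of each distinct key
    _, first = np.unique(key, axis=0, return_index=True)
    # Return the original rows (nan included) in first-occurrence order
    return a[np.sort(first)]


example = np.array([[0, np.nan],
                    [np.nan, 1],
                    [0, np.nan],
                    [np.nan, np.nan],
                    [np.nan, 1],
                    [np.nan, np.nan]])
print(unique_rows_nan_safe(example))

# inf is ordinary data here, since nothing is used as a placeholder
data_with_inf = np.array([[np.inf, np.nan],
                          [np.inf, np.nan],
                          [np.inf, 1.0]])
print(unique_rows_nan_safe(data_with_inf))  # two rows remain
```

The key doubles the number of columns, so it costs extra memory, but the work is still a single vectorized np.unique call. The result is in first-occurrence order rather than sorted order; drop the np.sort and index by first directly if sorted order is preferred.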