I am currently playing with the Kaggle Titanic dataset (train.csv).
The Embarked column has nan values, but when I try to filter on them using the following code, I get an empty DataFrame:
import pandas as pd
df = pd.read_csv(<file_loc>, header=0)
df[df.Embarked == 'nan']
I also tried importing numpy.nan and using it in place of the string 'nan' above, but that doesn't work either.
What I am trying to find is all the cells that are not 'S', 'C', or 'Q'.
I also realised later, using type(df.Embarked.unique()[-1]), that the nan is a float. Could someone help me understand how to identify those nan cells?
In pandas, NaN is used to represent missing values. Two methods are useful here:
.isna()
Detect missing values.
.fillna(value)
Fill NA/NaN values.
Some examples on a series called col:
>>> col
0 1.0
1 NaN
2 2.0
dtype: float64
>>> col[col.isna()]
1 NaN
dtype: float64
>>> col.index[col.isna()]
Int64Index([1], dtype='int64')
>>> col.fillna(-1)
0 1.0
1 -1.0
2 2.0
dtype: float64
Note that you can’t test for nan with an equality comparison because, by definition, it’s not equal to anything, not even itself:
>>> np.nan == np.nan
False
This is likely the property that is used to identify nan under the hood:
>>> col != col
0 False
1 True
2 False
dtype: bool
But it’s better (more readable) to use the pandas functions than to test for inequality yourself.
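Applied to the question, a minimal sketch might look like the following. The small DataFrame is a made-up stand-in for train.csv (only the Embarked column name comes from the question), and the variable names are purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data; the values are invented.
df = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Embarked": ["S", np.nan, "C", "Q"],
})

# Equality tests never match a missing value, so both results are empty.
string_match = df[df.Embarked == "nan"]
nan_match = df[df.Embarked == np.nan]

# isna() flags the missing cells directly.
missing = df[df.Embarked.isna()]

# Rows whose Embarked is not one of the known ports; NaN is not in the
# list, so the missing row is selected here as well.
not_known_port = df[~df.Embarked.isin(["S", "C", "Q"])]

print(len(string_match), len(nan_match))
print(missing)
print(not_known_port)
```

The isin() variant is closer to the "not 'S', 'C', 'Q'" wording in the question, since it would also pick up any unexpected non-missing value, while isna() selects only the missing cells.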