Output = df[df['TELF1'].isnull() | df['STCEG'].isnull() | df['STCE1'].isnull()]
This is my code. Here I check whether a column contains a NaN value and, if so, select that row. But I have over 10 columns to check this way, which would make my code huge. Is there a shorter or more Pythonic way to do it?
df.dropna(subset=['STRAS','ORT01','LAND1','PSTLZ','STCD1','STCD2','STCEG','TELF1','BANKS','BANKL','BANKN','E-MailAddress'])
Is there any way to get the opposite of the command above? That would give me the same output I was trying for in my first attempt, which was getting very long.
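Yes: one direct way to invert dropna is to keep exactly the rows it would discard, by checking index membership. A minimal sketch, assuming df is the DataFrame from the question:
cols = ['STRAS', 'ORT01', 'LAND1', 'PSTLZ', 'STCD1', 'STCD2',
        'STCEG', 'TELF1', 'BANKS', 'BANKL', 'BANKN', 'E-MailAddress']
# Rows that dropna() would keep (no NaN in any of the listed columns)
complete = df.dropna(subset=cols)
# The complement: rows with at least one NaN in those columns
output = df[~df.index.isin(complete.index)]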
Using loc with a simple boolean filter should work:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
subset = ['C', 'D']
df.at[0, 'C'] = None
df.at[4, 'D'] = None
>>> df
A B C D
0 0.985707 0.806581 NaN 0.373860
1 0.232316 0.321614 0.606824 0.439349
2 0.956236 0.169002 0.989045 0.118812
3 0.329509 0.644687 0.034827 0.637731
4 0.980271 0.001098 0.918052 NaN
>>> df.loc[df[subset].isnull().any(axis=1), :]
A B C D
0 0.985707 0.806581 NaN 0.37386
4 0.980271 0.001098 0.918052 NaN
df[subset].isnull() returns a boolean DataFrame marking, element by element, whether each value in the subset columns is NaN.
>>> df[subset].isnull()
C D
0 True False
1 False False
2 False False
3 False False
4 False True
.any(axis=1) then returns True for each row containing at least one True value (axis=1 reduces across the columns within each row; the default axis=0 would reduce down each column instead).
>>> df[subset].isnull().any(axis=1)
0 True
1 False
2 False
3 False
4 True
dtype: bool
Finally, loc takes (rows, columns) and selects the rows where the boolean mask is True. The : in the column position means select everything, so all columns are returned for rows 0 and 4.
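Putting it together for the columns in the question, the whole check collapses to a few lines (again a sketch, assuming the same df):
cols = ['STRAS', 'ORT01', 'LAND1', 'PSTLZ', 'STCD1', 'STCD2',
        'STCEG', 'TELF1', 'BANKS', 'BANKL', 'BANKN', 'E-MailAddress']
# Keep rows where at least one listed column is NaN --
# the exact opposite of df.dropna(subset=cols)
output = df.loc[df[cols].isnull().any(axis=1)]
Because the mask already selects whole rows, the explicit : column slice is optional; df[df[cols].isnull().any(axis=1)] gives the same result.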