Output = df[df['TELF1'].isnull() | df['STCEG'].isnull() | df['STCE1'].isnull()]
This is my code. Here I check whether a column contains a NaN value and, if so, select that row. But I have over 10 columns to check this way, which would make my code huge. Is there a shorter or more Pythonic way to do it?
df.dropna(subset=['STRAS','ORT01','LAND1','PSTLZ','STCD1','STCD2','STCEG','TELF1','BANKS','BANKL','BANKN','E-MailAddress'])
Is there any way to get the opposite of the command above? That would give me the same output I was trying for in my first attempt, which was getting very long.
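Yes: one direct way to invert dropna is to keep exactly the rows it would discard, by checking index membership. A minimal sketch, assuming df is the DataFrame from the question:
cols = ['STRAS', 'ORT01', 'LAND1', 'PSTLZ', 'STCD1', 'STCD2',
        'STCEG', 'TELF1', 'BANKS', 'BANKL', 'BANKN', 'E-MailAddress']
# Rows that dropna() would keep (no NaN in any of the listed columns)
complete = df.dropna(subset=cols)
# The complement: rows with at least one NaN in those columns
output = df[~df.index.isin(complete.index)]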
Using loc with a simple boolean filter should work:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((5, 4)), columns=list('ABCD'))
subset = ['C', 'D']
df.at[0, 'C'] = None
df.at[4, 'D'] = None
>>> df
A B C D
0 0.985707 0.806581 NaN 0.373860
1 0.232316 0.321614 0.606824 0.439349
2 0.956236 0.169002 0.989045 0.118812
3 0.329509 0.644687 0.034827 0.637731
4 0.980271 0.001098 0.918052 NaN
>>> df.loc[df[subset].isnull().any(axis=1), :]
A B C D
0 0.985707 0.806581 NaN 0.37386
4 0.980271 0.001098 0.918052 NaN
df[subset].isnull() returns a boolean DataFrame marking, element by element, whether each value in the subset columns is NaN.
>>> df[subset].isnull()
C D
0 True False
1 False False
2 False False
3 False False
4 False True
.any(axis=1) then returns True for each row containing at least one True value (axis=1 reduces across the columns within each row; the default axis=0 would reduce down each column instead).
>>> df[subset].isnull().any(axis=1)
0 True
1 False
2 False
3 False
4 True
dtype: bool
Finally, loc takes (rows, columns) and selects the rows where the boolean mask is True. The : in the column position means select everything, so all columns are returned for rows 0 and 4.
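Putting it together for the columns in the question, the whole check collapses to a few lines (again a sketch, assuming the same df):
cols = ['STRAS', 'ORT01', 'LAND1', 'PSTLZ', 'STCD1', 'STCD2',
        'STCEG', 'TELF1', 'BANKS', 'BANKL', 'BANKN', 'E-MailAddress']
# Keep rows where at least one listed column is NaN --
# the exact opposite of df.dropna(subset=cols)
output = df.loc[df[cols].isnull().any(axis=1)]
Because the mask already selects whole rows, the explicit : column slice is optional; df[df[cols].isnull().any(axis=1)] gives the same result.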