I am trying to clean a dataset with pandas/python and get rid of all the features that have 100 or more empty values (inclusive). I am using the following command:
train.isnull().sum()>=100
which gets me:
Id False
Feature 1 False
Feature 2 False
Feature 3 True
Feature 4 False
Feature 5 True
I would like to return a new dataframe without features 3 and 5.
Thank you.
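The `>= 100` mask from the question can also be used directly: take the column names where it is True and pass them to `DataFrame.drop`. A minimal sketch, assuming a small hypothetical stand-in for `train` (150 rows, with "Feature 3" and "Feature 5" mostly empty); the names `too_sparse`, `cols_to_drop` and `cleaned` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `train` frame (150 rows).
n = 150
train = pd.DataFrame({
    "Id": range(n),
    "Feature 1": np.arange(n, dtype=float),
    "Feature 2": np.ones(n),
    "Feature 3": [np.nan] * 120 + [1.0] * 30,  # 120 missing values
    "Feature 4": [np.nan] * 10 + [2.0] * 140,  # 10 missing values
    "Feature 5": [np.nan] * n,                 # 150 missing values
})

# Boolean Series indexed by column name: True where a column has >= 100 nulls.
too_sparse = train.isnull().sum() >= 100

# Column names to discard; drop() returns a new DataFrame and leaves `train` intact.
cols_to_drop = train.columns[too_sparse]
cleaned = train.drop(columns=cols_to_drop)

print(list(cols_to_drop))     # ['Feature 3', 'Feature 5']
print(list(cleaned.columns))  # ['Id', 'Feature 1', 'Feature 2', 'Feature 4']
```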
In your case, keep only the columns with fewer than 100 missing values:
train[train.columns[train.isnull().sum()<100]]
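Two equivalent ways to write the same filter, sketched on hypothetical data (column "b" has exactly 100 nulls, "c" has 99): passing the boolean mask to `.loc`, or using `DataFrame.dropna` with `axis=1` and `thresh`, which keeps a column only if it has at least `thresh` non-missing values. Setting `thresh=len(train) - 99` therefore keeps columns with at most 99 missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 120 rows; "b" has 100 NaNs (should go), "c" has 99 (should stay).
n = 120
train = pd.DataFrame({
    "a": np.ones(n),
    "b": [np.nan] * 100 + [1.0] * 20,
    "c": [np.nan] * 99 + [1.0] * 21,
})

# 1) Same mask as the answer, applied via .loc (all rows, masked columns).
kept_loc = train.loc[:, train.isnull().sum() < 100]

# 2) dropna: keep a column only if it has >= len(train) - 99 non-null values,
#    i.e. at most 99 null values.
kept_dropna = train.dropna(axis=1, thresh=len(train) - 99)

print(list(kept_loc.columns))     # ['a', 'c']
print(list(kept_dropna.columns))  # ['a', 'c']
```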
Full example:
import pandas as pd
df = pd.DataFrame([[1, None, 2], [3, 4, None], [7, 8, 9]], columns=['A', 'B', 'C'])
print(df)
You'll get:
   A    B    C
0  1  NaN  2.0
1  3  4.0  NaN
2  7  8.0  9.0
then running:
df.isnull().sum()
which returns the null count per column:
A    0
B    1
C    1
then select the columns with fewer than 100 nulls:
df.columns[df.isnull().sum()<100]
and use them to filter the DataFrame:
df[df.columns[df.isnull().sum() < 100]]
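With this three-row toy frame no column comes close to 100 nulls, so the threshold of 100 keeps all three columns. A minimal sketch that wraps the same expression in a function with the threshold as a parameter makes the effect visible; the helper name `drop_sparse_columns` and parameter `max_missing` are hypothetical.

```python
import pandas as pd

def drop_sparse_columns(df: pd.DataFrame, max_missing: int) -> pd.DataFrame:
    """Return a new DataFrame keeping only columns with fewer than max_missing nulls."""
    return df[df.columns[df.isnull().sum() < max_missing]]

df = pd.DataFrame([[1, None, 2], [3, 4, None], [7, 8, 9]], columns=['A', 'B', 'C'])

print(drop_sparse_columns(df, 100).columns.tolist())  # ['A', 'B', 'C']
print(drop_sparse_columns(df, 1).columns.tolist())    # ['A']
```

With `max_missing=1`, columns B and C (one null each) are dropped and only A remains.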