sklearn imputer drop column with missing values

Question

I am currently learning about sklearn's imputers, and I found that there is one strategy they don't implement.

I would like to build a pipeline that either deletes the columns with any missing values or deletes all the rows with missing values.

Why do I want this?

Because I would like to do a grid search and find the effect of each imputation method on my RMSE or classification score.

Is there a way I can do this with an sklearn pipeline? Or should I create my own imputer?

If this has been asked before, feel free to suggest closing the question and pointing me to the correct resource.

For more context, I have 21 features and 1000 data points. Only one column has missing values, and those missing values make up 50% of the values in that column. I just want to explore the effect of the missing-value imputation method on my classifier's accuracy and F1 score.

user4718221 · Accepted Answer

I would suggest using the autoimpute library. It's probably the best tool currently available for dealing with datasets that have missing values.

It has a function that does exactly what you asked for: it deletes rows with any missing values.

from autoimpute.imputations import MiceImputer, SingleImputer, listwise_delete

listwise_delete(df, inplace=True, verbose=False)
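For comparison, the same row-wise deletion, and the column-wise variant the question also asks about, can be sketched with plain pandas. This is only an illustration on a small made-up frame; the column names and values are assumptions, not data from the question.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical data): only column "b" has missing values.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, np.nan, 30.0, np.nan],
    "c": [0, 1, 0, 1],
})

# Listwise deletion: drop every row that contains at least one NaN.
rows_dropped = df.dropna(axis=0, how="any")

# Column deletion: drop every column that contains at least one NaN.
cols_dropped = df.dropna(axis=1, how="any")

print(rows_dropped.shape)
print(cols_dropped.columns.tolist())
```

Both calls return new frames and leave df unchanged.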

In general, sklearn's imputer is very limited in its usefulness, and autoimpute is able to fill a lot of gaps. More specifically, it lets you:

Explicitly set columns that you would like to treat as variables in calculating the imputed values
Set different imputation algorithms for every column or a set of columns

si_dict_col = SingleImputer(
    strategy={"gender": "categorical", "salary": "pmm", "weight": "pmm"},
    predictors={"gender": ["salary", "weight", "looks"], "salary": ["weight", "gender"]})

There are built-in methods to visualize the results of different imputation methods

from autoimpute.visuals import plot_imp_scatter

plot_imp_scatter(data_het_miss, "x", "y", "least squares")

It also follows sklearn's patterns and can be substituted for sklearn's own imputers in a pipeline.
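If you would rather stay within sklearn for the column-dropping case, one possible approach is a small custom transformer that is swapped against the built-in imputers in a grid search. The sketch below is not part of sklearn or autoimpute: DropMissingColumns is a hypothetical class, and the synthetic data only loosely mimics the setup in the question (one column with roughly 50% missing values). Dropping rows is harder to fit into a Pipeline, because a transformer's transform only returns X and cannot remove the matching entries of y, so that case is usually handled before the pipeline.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class DropMissingColumns(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: drop columns that had any NaN during fit."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Remember which columns were fully observed in the training data.
        self.keep_mask_ = ~np.isnan(X).any(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        # Note: a kept column could still contain NaN in unseen data.
        return X[:, self.keep_mask_]


# Synthetic data (assumption): 5 features, column 0 is about 50% missing.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 1] + X[:, 2] > 0).astype(int)
X[rng.random(200) < 0.5, 0] = np.nan

pipe = Pipeline([
    ("missing", SimpleImputer()),  # placeholder, replaced by the grid below
    ("clf", LogisticRegression()),
])

# Treat the whole "missing" step as a hyperparameter.
param_grid = {
    "missing": [
        DropMissingColumns(),
        SimpleImputer(strategy="mean"),
        SimpleImputer(strategy="median"),
    ],
}

search = GridSearchCV(pipe, param_grid, scoring="f1", cv=5)
search.fit(X, y)

for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(type(params["missing"]).__name__, round(score, 3))
```

Because the step itself is the grid parameter, each missing-value strategy is fitted inside every cross-validation fold, which keeps the comparison between strategies fair.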

Tags: python, imputation, scikit-learn

Espoir Murhabazi