Know feature names after imputation

Question

I run a scikit-learn classifier on a pandas DataFrame (X). Since some data is missing, I use scikit-learn's Imputer like this:

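from sklearn.preprocessing import Imputer  # scikit-learn < 0.22 only; later versions use sklearn.impute.SimpleImputer
imp = Imputer(strategy='mean', axis=0)
X = imp.fit_transform(X)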

After doing that, however, my number of features has decreased, presumably because the imputer simply drops the columns that are entirely empty (every value missing).
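A minimal sketch that checks this hypothesis, assuming a current scikit-learn where Imputer has been replaced by sklearn.impute.SimpleImputer; the toy DataFrame and its column names are made up for illustration. A column with no observed values has no mean, so its fitted statistic is NaN and the column is left out of the output:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer  # assumed modern replacement for Imputer

# Hypothetical toy data: column "b" contains no observed values at all
X = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, 5.0, np.nan],
})

imp = SimpleImputer(strategy="mean")
X_imp = imp.fit_transform(X)

print(X.shape)          # (3, 3)
print(X_imp.shape)      # (3, 2): the all-NaN column "b" was dropped
print(imp.statistics_)  # per-column means; the entry for "b" is nan
```

Recent scikit-learn releases may also emit a warning naming the skipped column.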

That's fine, except that the imputer transforms my DataFrame into a NumPy ndarray, so I lose the column/feature names. I need them later on to identify the important features (with clf.feature_importances_).

How can I know the names of the features in clf.feature_importances_, if some of the columns of my initial dataframe have been dropped by the imputer?

Ibraim Ganiev · Accepted Answer

You can do this:

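import numpy as np

# statistics_ holds the fitted mean of each original column (NaN where the column was all missing)
invalid_mask = np.isnan(imp.statistics_)
valid_mask = np.logical_not(invalid_mask)
valid_idx, = np.where(valid_mask)  # original positions of the columns the imputer kept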

Now you have the original indices of the valid columns, i.e. the positions these columns had in the original matrix X. Use these indices to look up the feature names in the list of column names of the original X.

Tags: python, pandas, scikit-learn

Alexis Eggermont