
Know feature names after imputation

I run a scikit-learn classifier on a pandas DataFrame (X). Since some data is missing, I use scikit-learn's imputer like this:

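from sklearn.preprocessing import Imputer  # scikit-learn < 0.22; later versions provide sklearn.impute.SimpleImputer instead

imp = Imputer(strategy='mean', axis=0)
X = imp.fit_transform(X)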

After doing that, however, my number of features decreases, presumably because the imputer just gets rid of the empty columns.

That's fine, except that the imputer transforms my dataframe into a numpy ndarray, and thus I lose the column/feature names. I need them later on to identify the important features (with clf.feature_importances_).

How can I know the names of the features in clf.feature_importances_, if some of the columns of my initial dataframe have been dropped by the imputer?

Alexis Eggermont asked Sep 08 '25 11:09



1 Answer

You can do this:

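import numpy as np

# statistics_ holds the fill value fitted for each column; it is NaN for columns with no observed values
invalid_mask = np.isnan(imp.statistics_)
valid_mask = np.logical_not(invalid_mask)
valid_idx, = np.where(valid_mask)  # positions, in the original X, of the columns that are kept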

Now you have the old indexes (the positions these columns had in the original matrix X) of the valid columns. You can use these indexes to look up the feature names in the list of feature names of the old X.
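To make the lookup concrete, here is a minimal end-to-end sketch. It rests on some assumptions that are not in the question: the toy data and the names `feature_names` and `kept_names` are made up for illustration, and it uses `SimpleImputer` from `sklearn.impute` (the replacement for `Imputer` in newer scikit-learn releases) with its default settings, under which columns with no observed values are dropped.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy frame (hypothetical data): column "b" has no observed values at all.
X = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, 5.0, np.nan],
})
feature_names = X.columns  # save the names before X becomes an ndarray

imp = SimpleImputer(strategy="mean")
X_imputed = imp.fit_transform(X)

# Columns whose fitted statistic is NaN are the ones the imputer dropped.
valid_mask = np.logical_not(np.isnan(imp.statistics_))
kept_names = list(feature_names[valid_mask])

# Rebuild a labelled frame; column i of the array corresponds to kept_names[i].
X_imputed = pd.DataFrame(X_imputed, columns=kept_names, index=X.index)
print(kept_names)
```

Since the classifier is trained on the imputed array, `kept_names[i]` should then line up with `clf.feature_importances_[i]`, so something like `zip(kept_names, clf.feature_importances_)` pairs each importance with its feature name.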

Ibraim Ganiev answered Sep 10 '25 23:09
