iw ould like to get a dataframe of important features. With the code below i have got the shap_values and i am not sure, what do the values mean. In my df are 142 features and 67 experiments, but got an array with ca. 2500 values.
explainer = shap.TreeExplainer(rf)
shap_values = explainer.shap_values(X_test)
shap.summary_plot(shap_values, X_test, plot_type="bar")

I have tried to store them in a df:
rf_resultX = pd.DataFrame(shap_values, columns = ['shap_values'])
but got: ValueError: Shape of passed values is (18, 142), indices imply (18, 1)
142 - the number of the features. 18 - i have no idea.
I believe it works as follows:
Does anybody have an experience, how to interpret shap_values? At first i thought, that the number of values are the number of features x number of rows.
Combining the other two answers like this worked for me.
feature_names = X_train.columns
rf_resultX = pd.DataFrame(shap_values, columns = feature_names)
vals = np.abs(rf_resultX.values).mean(0)
shap_importance = pd.DataFrame(list(zip(feature_names, vals)),
columns=['col_name','feature_importance_vals'])
shap_importance.sort_values(by=['feature_importance_vals'],
ascending=False, inplace=True)
shap_importance.head()
From https://github.com/slundberg/shap/issues/632
vals = np.abs(shap_values.values).mean(0) feature_names = train_x.columns() feature_importance = pd.DataFrame(list(zip(feature_names, vals)), columns=['col_name','feature_importance_vals']) feature_importance.sort_values(by=['feature_importance_vals'], ascending=False, inplace=True) feature_importance.head()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With