I don't understand why I have a feature named <BIAS> among the contributing features.
I read the docs and found:
"In each column there are features and their weights. Intercept (bias) feature is shown as <BIAS> in the same table"
But I don't understand what the intercept (bias) means here.
Thank you for your help :)
This is related to the way ELI5 computes the weights.
XGBoost outputs scores only for leaves (you can see this via booster.dump_model(…, with_stats=True)), so the XGBoost explainer in ELI5 starts by reconstructing pseudo leaf scores for every node across all the trees. A pseudo leaf score is basically the leaf score you would expect, on average, if the tree stopped at that node: the average of all leaves below the node, weighted by their cover in the training set.
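To make the cover-weighted average concrete, here is a minimal sketch on a hand-made toy tree. This is not ELI5's actual code: the nested-dict layout, the key names (leaf, cover, children, pseudo_leaf) and all the numbers are made up for illustration.

```python
def pseudo_score(node):
    """Return (score, cover) for a node; annotate internal nodes in place."""
    if "leaf" in node:
        # Real leaf: XGBoost already provides its score and cover.
        return node["leaf"], node["cover"]
    weighted_sum, cover = 0.0, 0.0
    for child in node["children"]:
        score, child_cover = pseudo_score(child)
        weighted_sum += score * child_cover
        cover += child_cover
    # Cover-weighted average of everything below this node.
    node["pseudo_leaf"] = weighted_sum / cover
    node["cover"] = cover
    return node["pseudo_leaf"], cover


# Hypothetical tree: one leaf and one internal node under the root.
toy_tree = {
    "children": [
        {"leaf": 0.4, "cover": 30.0},
        {
            "children": [
                {"leaf": -0.2, "cover": 50.0},
                {"leaf": 0.1, "cover": 20.0},
            ]
        },
    ]
}

root_score, root_cover = pseudo_score(toy_tree)
inner_score = toy_tree["children"][1]["pseudo_leaf"]
print(f"inner node: {inner_score:.4f}, root: {root_score:.4f}")
```

Averaging the children's pseudo scores by cover gives the same result as averaging all the leaves below the node directly, which is why the recursion can work bottom-up.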
This algorithm also applies to the root nodes of the trees, which are assigned pseudo leaf scores in the same way. At the root, this score is the average score you would end up with after going through the tree. Summed across all the trees, the root node scores give the average score you would get from going through all the trees (the value a sigmoid is then applied to in order to turn it into a probability). This is what ELI5 puts into <BIAS>.
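As a rough illustration of that last step, the sketch below sums made-up root node scores for three hypothetical trees and applies a sigmoid. The numbers and variable names are assumptions, and any base_score offset is left out to keep the arithmetic simple.

```python
import math


def sigmoid(x):
    """Map a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical pseudo leaf scores of the root node of each tree.
root_scores = [0.04, -0.10, 0.02]

# The sum over all trees plays the role of <BIAS>.
bias = sum(root_scores)

# Average prediction expressed as a probability.
baseline_probability = sigmoid(bias)
print(f"bias: {bias:.4f}, baseline probability: {baseline_probability:.4f}")
```

With a slightly negative bias like this one, the baseline probability lands just under 0.5; the per-feature contributions shown next to <BIAS> then move an individual prediction up or down from that baseline.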
So you can understand <BIAS> as the expected average score output by the model, based on the distribution of the training set.
The <BIAS> will change if you modify your base_score parameter (for instance in the case of an imbalanced binary classification, you may change the default 0.5 to something closer to your target rate, and the <BIAS> should get closer to 0).
EDIT: maybe it's clearer with the visual explanation from this blog (baseline is equivalent to <BIAS>) https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211