How does the RandomForestClassifier of sklearn handle a multilabel problem (under the hood)?
For example, does it break the problem into distinct one-label problems?
Just to be clear, I have not really tested it yet, but I see y : array-like, shape = [n_samples] or [n_samples, n_outputs] in the .fit() function of the RandomForestClassifier.
Let me cite scikit-learn. The user guide on random forests:
Like decision trees, forests of trees also extend to multi-output problems (if Y is an array of size [n_samples, n_outputs]).
The section multi-output problems of the user guide of decision trees:
… to support multi-output problems. This requires the following changes:
- Store n output values in leaves, instead of 1;
- Use splitting criteria that compute the average reduction across all n outputs.
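As a rough illustration of the quoted passage, here is a minimal sketch that fits a single forest on a whole label matrix instead of one model per label. The toy data from make_multilabel_classification and all sizes are my own assumptions, chosen only to keep the example small.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier

# Toy multilabel data (an assumption for illustration): Y is a 0/1 indicator
# matrix of shape [n_samples, n_outputs] with 3 labels.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)

# One forest is fit on the full label matrix.
forest = RandomForestClassifier(n_estimators=20, random_state=0)
forest.fit(X, Y)

pred = forest.predict(X[:5])          # one column of predictions per output
proba = forest.predict_proba(X[:5])   # a list with one array per output

# Each tree stores values for every output in its nodes:
# tree_.value has shape (n_nodes, n_outputs, max_n_classes).
value_shape = forest.estimators_[0].tree_.value.shape

print(Y.shape, pred.shape, len(proba), forest.n_outputs_, value_shape)
```

The middle dimension of tree_.value matching n_outputs_ is what "store n output values in leaves" looks like in practice: each tree handles all labels jointly.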
And I hope this will answer your question. If not, you can look at the section's reference:
I was a bit confused when I started using trees. If you refer to the sklearn doc:
https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier
If you scroll down the methods to predict_proba, you can see: "The predicted class probability is the fraction of samples of the same class in a leaf."
So in predict, the predicted class is the mode of the classes in that leaf. This can change if you use weighted classes:
"class_weight : dict, list of dicts, “balanced” or None, default=None Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one."
Hope this helps! :)
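To make the two points about predict_proba and class_weight concrete, here is a small sketch on a hand-made dataset. The data and the weights {0: 1, 1: 5} are assumptions picked so that the numbers are easy to check by hand.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hand-made data: the leaf reached by x = 0 holds three samples of class 0
# and one sample of class 1; the two x = 1 samples end up in another leaf.
X = np.array([[0], [0], [0], [0], [1], [1]])
y = np.array([0, 0, 0, 1, 1, 1])
query = np.array([[0]])

# Unweighted tree: the probability is the plain class fraction in the leaf.
plain = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
plain_proba = plain.predict_proba(query)[0]   # [0.75, 0.25]
plain_pred = plain.predict(query)[0]          # 0, the most frequent class

# Weighted tree: class 1 samples count five times, so the same leaf now
# holds weight 3 for class 0 and weight 5 for class 1.
weighted = DecisionTreeClassifier(
    max_depth=1, class_weight={0: 1, 1: 5}, random_state=0
).fit(X, y)
weighted_proba = weighted.predict_proba(query)[0]   # [0.375, 0.625]
weighted_pred = weighted.predict(query)[0]          # 1

print(plain_proba, plain_pred, weighted_proba, weighted_pred)
```

With weights, the "mode" is taken over weighted counts, which is why the prediction for the same leaf flips from class 0 to class 1.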