It's not clear to me why some online resources instantiate a multi-target Random Forest regression as either
model = MultiOutputRegressor(RandomForestRegressor())
versus:
model = RandomForestRegressor()
when both seemingly generate multiple regressed outputs. Can anyone clarify?
The internal models are different, but they are both multioutput regressors.
MultiOutputRegressor
fits one separate random forest per target, so every tree inside a given forest predicts just one of your outputs.
Without the wrapper, RandomForestRegressor
fits trees that target all the outputs at once: the split criterion is based on the average impurity reduction across the outputs. See the User Guide.
The latter may be better computationally, since fewer trees are being built. It can also make use of the fact that the several outputs for a given input may well be correlated. That's all discussed in the user guide as well.
Some conjecture on my part: On the other hand, if the several outputs for a given input are not correlated, internal splits that are good for one output may be lousy for the other outputs, so simply averaging them might not work as well. I think in that case increasing the tree complexity can alleviate the issue (but will also take more computation).
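To make the difference concrete, here's a small sketch on toy data (the dataset shapes and hyperparameters are just illustrative assumptions). Both variants predict an array with one column per target, but the wrapper fits one forest per target while the bare estimator fits a single multi-output forest:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Toy multi-output regression problem: 3 targets per sample.
X, y = make_regression(n_samples=200, n_features=10, n_targets=3, random_state=0)

# One forest per target: 3 independent RandomForestRegressor instances get fit.
wrapped = MultiOutputRegressor(RandomForestRegressor(n_estimators=50, random_state=0))
wrapped.fit(X, y)

# One forest whose trees split on the averaged impurity across all 3 targets.
native = RandomForestRegressor(n_estimators=50, random_state=0)
native.fit(X, y)

# Both produce (n_samples, n_targets)-shaped predictions.
print(wrapped.predict(X[:5]).shape)  # (5, 3)
print(native.predict(X[:5]).shape)   # (5, 3)

# The wrapper exposes its per-target forests separately.
print(len(wrapped.estimators_))      # 3
```

Note that `wrapped` builds three full forests (150 trees total here) versus one forest of 50 trees for `native`, which is the computational difference mentioned above.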