
Difference between MultiOutputRegressor(RandomForestRegressor()) versus RandomForestRegressor() when predicting multiple outputs?

It's not clear to me why some online resources instantiate a multi-target Random Forest regression as either

model = MultiOutputRegressor(RandomForestRegressor())

versus:

model = RandomForestRegressor()

when both seemingly generate multiple regressed outputs. Can anyone clarify?

jane asked Oct 18 '25 12:10


1 Answer

The internal models are different, but they are both multioutput regressors.

MultiOutputRegressor fits one separate random forest per target, so each tree inside a given forest predicts only that forest's single output.

Without the wrapper, RandomForestRegressor fits trees targeting all the outputs at once. The split criteria are based on the average impurity reduction across the outputs. See the User Guide.
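To see the structural difference described above, here is a minimal sketch that fits both variants and inspects the fitted objects. The synthetic data, the three targets, and n_estimators=10 are arbitrary choices of mine for illustration, not anything from the question.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Synthetic data (an assumption for illustration): 5 features, 3 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = np.column_stack([X[:, 0] + X[:, 1], X[:, 2] * X[:, 3], X[:, 4]])

wrapped = MultiOutputRegressor(
    RandomForestRegressor(n_estimators=10, random_state=0)
).fit(X, Y)
native = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, Y)

# Wrapper: one fitted forest per target, and each tree has a single output.
n_forests = len(wrapped.estimators_)
outputs_per_wrapped_tree = wrapped.estimators_[0].estimators_[0].n_outputs_

# Native: one forest whose trees each predict all targets at once.
n_native_trees = len(native.estimators_)
outputs_per_native_tree = native.estimators_[0].n_outputs_

print(n_forests, outputs_per_wrapped_tree)      # 3 1
print(n_native_trees, outputs_per_native_tree)  # 10 3
print(wrapped.predict(X).shape, native.predict(X).shape)  # (200, 3) (200, 3)
```

Both models return predictions of the same shape, which is why they look interchangeable from the outside.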

The latter may be better computationally, since fewer trees are being built. It can also make use of the fact that the several outputs for a given input may well be correlated. That's all discussed in the user guide as well.
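As a rough check of the "fewer trees" point, this sketch counts the trees each approach builds for the same n_estimators. The random data and the value n_estimators=10 are assumptions for illustration. Tree count is only a proxy for cost, since each multi-output tree has to evaluate impurity over all targets at every split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor

# Random data (an assumption for illustration): 4 features, 3 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Y = rng.normal(size=(100, 3))

n_estimators = 10
wrapped = MultiOutputRegressor(
    RandomForestRegressor(n_estimators=n_estimators, random_state=0)
).fit(X, Y)
native = RandomForestRegressor(n_estimators=n_estimators, random_state=0).fit(X, Y)

# The wrapper builds n_estimators trees for every target; the native forest
# builds n_estimators trees in total.
wrapped_tree_count = sum(len(forest.estimators_) for forest in wrapped.estimators_)
native_tree_count = len(native.estimators_)

print(wrapped_tree_count, native_tree_count)  # 30 10
```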

Some conjecture on my part: On the other hand, if the several outputs for a given input are not correlated, internal splits that are good for one output may be lousy for the other outputs, so simply averaging the impurity reduction across them might not work as well. I think in that case increasing the tree complexity can alleviate the issue (but will also take more computation).
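Since this last part is conjecture, the practical route is to test it on your own data. Below is a minimal sketch of such a comparison, assuming two targets that depend on disjoint features. The data-generating functions, the depth settings, and the 3-fold cross-validation are all hypothetical choices, and the results on real data may well differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.multioutput import MultiOutputRegressor

# Hypothetical data: two targets driven by disjoint, independent features,
# so the targets are essentially uncorrelated.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
Y = np.column_stack([
    np.sin(X[:, 0]) + X[:, 1] ** 2,
    X[:, 2] * X[:, 3] + np.abs(X[:, 4]),
]) + 0.1 * rng.normal(size=(300, 2))

candidates = {
    "native, max_depth=3": RandomForestRegressor(
        n_estimators=20, max_depth=3, random_state=0),
    "native, unlimited depth": RandomForestRegressor(
        n_estimators=20, max_depth=None, random_state=0),
    "wrapped, max_depth=3": MultiOutputRegressor(
        RandomForestRegressor(n_estimators=20, max_depth=3, random_state=0)),
}

# Mean cross-validated R^2, averaged uniformly over the two targets.
scores = {
    name: cross_val_score(model, X, Y, cv=3, scoring="r2").mean()
    for name, model in candidates.items()
}
for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

If the shallow native forest lags the shallow wrapped one while the deeper native forest closes the gap, that would be consistent with the conjecture, but a single synthetic run is not proof either way.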

Ben Reiniger answered Oct 20 '25 02:10


