How can I use the R randomForest package with observation weights? I know that there is no such option in this package. I have 2 questions:
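Is there any way to solve this problem with the randomForest package itself? At the moment I'm drawing a sample of rows from the data, using the weights as sampling probabilities, so that I can at least simulate weighting:
m <- nrow(data)
# draw row indices (not columns) with probability proportional to the weights
idx <- sample(m, m, replace = TRUE, prob = weights)
data_w <- data[idx, ]
It works, but are there other (better) solutions?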
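Since randomForest has no case-weight argument for regression, one way to extend the resampling idea above is to draw a fresh weighted resample for each of several small forests and average their predictions, so that different trees see different weighted samples. This is a minimal sketch, not a feature of the package: the helper names weighted_resample_forest and predict_weighted_forest, and the simulated data and weights, are hypothetical. Each sub-forest still applies randomForest's own bootstrap on top of the weighted resample.

```r
library(randomForest)

# Hypothetical helper: fit n_forests small forests, each on its own weighted
# bootstrap resample of the rows. Rows with larger weights appear more often
# in the resamples, so they have more influence on the fitted trees.
weighted_resample_forest <- function(formula, data, weights,
                                     n_forests = 10, ntree_each = 50) {
  m <- nrow(data)
  lapply(seq_len(n_forests), function(i) {
    idx  <- sample(m, m, replace = TRUE, prob = weights)  # weighted row indices
    boot <- data[idx, ]
    randomForest(formula, data = boot, ntree = ntree_each)
  })
}

# Hypothetical helper: average the regression predictions of all sub-forests.
predict_weighted_forest <- function(forests, newdata) {
  Reduce("+", lapply(forests, predict, newdata = newdata)) / length(forests)
}

# Usage on simulated data with made-up weights
set.seed(1)
n <- 200
train <- data.frame(x1 = runif(n), x2 = runif(n))
train$y <- 2 * train$x1 + rnorm(n, sd = 0.1)
w <- runif(n)

fits <- weighted_resample_forest(y ~ x1 + x2, train, w,
                                 n_forests = 5, ntree_each = 20)
pred <- predict_weighted_forest(fits, train[1:10, ])
```

Using more, smaller sub-forests (down to ntree_each = 1) brings this closer to one weighted bootstrap per tree, at the cost of more calls to randomForest().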
Are there any alternatives to the randomForest package? I found the party package (cforest), but it's terrible in terms of memory management (or I cannot use it the way I use the randomForest package). I have around 200k observations and 30-40 variables.
EDIT:
Sorry for not giving more details. I'm using the randomForest package for a regression problem (not classification). The data are a time series, and every observation has its own weight. Later on, these weights are used to evaluate model performance on the test observations. The y variable is continuous.
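Because the weights are used to score the model on the test observations, a weighted error metric is one natural choice. The sketch below assumes a weighted mean squared error; the metric, the helper name weighted_mse, and the numbers are hypothetical, not taken from the question.

```r
# Hypothetical helper: weighted mean squared error, so observations with larger
# weights count more when scoring predictions on the test set.
weighted_mse <- function(actual, predicted, weights) {
  stopifnot(length(actual) == length(predicted),
            length(actual) == length(weights))
  weighted.mean((actual - predicted)^2, w = weights)
}

# Example with made-up numbers
actual    <- c(1.0, 2.0, 3.0)
predicted <- c(1.5, 2.0, 2.0)
w         <- c(1, 1, 2)
weighted_mse(actual, predicted, w)  # (0.25*1 + 0*1 + 1*2) / 4 = 0.5625
```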
I was looking for the same option in randomForest as you, Pawel, and I found that the R package "ranger" supports it in its function "ranger" (through the parameter "case.weights").
The package was released in June 2016, so it is very young.
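A minimal sketch of the ranger approach for a continuous y, on simulated data with made-up weights. case.weights takes one weight per training row; according to the ranger documentation, rows with larger weights are selected with higher probability into each tree's bootstrap (or subsample). The data, the weights, and the values of num.trees and seed are illustrative assumptions.

```r
library(ranger)

set.seed(42)
n <- 500
train <- data.frame(x1 = runif(n), x2 = runif(n))
train$y <- 3 * train$x1 - train$x2 + rnorm(n, sd = 0.2)
w <- runif(n, 0.1, 1)  # made-up observation weights, one per row

# case.weights biases each tree's bootstrap sample toward high-weight rows
fit <- ranger(y ~ x1 + x2, data = train, num.trees = 200,
              case.weights = w, seed = 42)

fit$prediction.error  # out-of-bag prediction error (MSE for regression)

test <- data.frame(x1 = runif(5), x2 = runif(5))
pred <- predict(fit, data = test)$predictions
```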
Best,
randomForest does have a "classwt" parameter that should allow you to account for differential sampling probabilities or even for differential costs. Admittedly, it is ignored for regression. Perhaps you should explain why you need weighting and what sort of y variable you are using.
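For reference, this is how classwt is passed on a classification problem, using the built-in iris data with weights chosen arbitrarily for illustration. The weights are matched to the classes in the order of levels(y).

```r
library(randomForest)

levels(iris$Species)  # "setosa" "versicolor" "virginica"

# Give the third class (virginica) twice the prior weight of the other two
set.seed(1)
fit_cls <- randomForest(Species ~ ., data = iris, ntree = 100,
                        classwt = c(1, 1, 2))
fit_cls$type  # "classification"
```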