My data has 500,000 observations and 7 variables. I split the data 80/20 into training and test sets and used caret to train the model. Code is below. Training took so long that I eventually had to stop it. Is there anything wrong with my model, or does it usually take this long on data of this size? Any suggestions?
library(caret)
set.seed(130000000)
# Note: when x and y are supplied directly, train() takes no data
# argument; data = train has been dropped because it is passed on
# through "..." and is not used.
classifier_rf <- train(y = train$active,
                       x = train[3:5],
                       method = 'rf',
                       trControl = trainControl(method = 'repeatedcv',
                                                number = 10,
                                                repeats = 10))
Your best bet is probably to try parallelizing the process: caret can run the resampling loop in parallel through any registered foreach backend, such as the one provided by the doParallel package.
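As a minimal sketch of that idea: register a parallel backend before calling train(), and caret will distribute the cross-validation folds across workers. The small simulated data frame here is a hypothetical stand-in for your train data (your real column names and the worker count are assumptions; adjust to your machine).

```r
library(caret)
library(doParallel)

# Hypothetical stand-in for the question's 'train' data frame
set.seed(1)
train <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200),
                    active = factor(sample(c("yes", "no"), 200, replace = TRUE)))

# Start a cluster of worker processes and register it as the
# foreach backend that caret's resampling loop will use
cl <- makePSOCKcluster(2)   # adjust to the number of cores you have
registerDoParallel(cl)

classifier_rf <- train(y = train$active,
                       x = train[, c("x1", "x2", "x3")],
                       method = 'rf',
                       trControl = trainControl(method = 'cv',
                                                number = 5,
                                                allowParallel = TRUE))
stopCluster(cl)
print(classifier_rf$results)
```

allowParallel is TRUE by default, so registering the backend is the only change you strictly need; I cut the example down to 5-fold CV without repeats just so it finishes quickly.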
From my understanding, caret still calls the randomForest function underneath, plus the cross-validation/grid-search layer on top, so it will take a while. With method = 'repeatedcv', number = 10 and repeats = 10, you are fitting the forest 100 times on resamples, plus once more on the full training set. For random forest models specifically, I usually just use the ranger package, and it's so much faster. You can find its manual on CRAN.
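A minimal sketch of fitting the same kind of model with ranger directly (again using a small simulated stand-in for your train data; column names and thread count are assumptions):

```r
library(ranger)

# Hypothetical stand-in for the question's 'train' data frame
set.seed(1)
train <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200),
                    active = factor(sample(c("yes", "no"), 200, replace = TRUE)))

# ranger fits a random forest directly and is multithreaded,
# which is where most of the speedup over randomForest comes from
rf_fit <- ranger(dependent.variable.name = "active",
                 data = train,
                 num.trees = 500,
                 num.threads = 2)

print(rf_fit$prediction.error)   # out-of-bag error estimate
```

If you want to keep the caret workflow (tuning grid, repeated CV), you can also pass method = 'ranger' to train() instead of method = 'rf' and combine it with the parallel backend above.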