 

How to decide 'minsplit' when using rpart to build a decision tree?

How can we specify the parameter 'minsplit=' when using the 'rpart' package to build a decision tree?

rpart(myFormula, data=train, control=rpart.control(minsplit=10))
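A self-contained version of the call above, as a minimal sketch: the built-in iris data and a random 100-row sample stand in for the question's train and myFormula (both are assumptions, since the original data are not shown). The fitted object keeps its control settings, so you can check which minsplit and minbucket were actually used.

```r
library(rpart)

# Hypothetical stand-in for the question's `train`: a random 100-row sample of iris
set.seed(123)
train <- iris[sample(nrow(iris), 100), ]

# Hypothetical stand-in for `myFormula`: predict Species from all other columns
myFormula <- Species ~ .

fit <- rpart(myFormula, data = train, method = "class",
             control = rpart.control(minsplit = 10))

# minbucket was not given, so rpart.control() derives it as round(minsplit / 3)
fit$control$minsplit
fit$control$minbucket
```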

jstmj2002 asked Sep 02 '25 15:09



1 Answer

minsplit: the minimum number of observations that must exist in a node in order for a split to be attempted (https://stat.ethz.ch/R-manual/R-devel/library/rpart/html/rpart.control.html).
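To see this definition in action, here is a minimal sketch on the built-in iris data (an illustrative choice, not from the answer). With minsplit = 30, every node that rpart actually split should contain at least 30 observations. fit$frame has one row per node, with the split variable in var ("<leaf>" for terminal nodes) and the node size in n.

```r
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(minsplit = 30))

# One row per node; internal (split) nodes name a variable in `var`
nodes <- fit$frame
split_sizes <- nodes$n[nodes$var != "<leaf>"]

# Sizes of the nodes that were split: each should be at least minsplit = 30
print(split_sizes)
```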

You can override the default minsplit by passing your own value to rpart.control(). Be aware, though, that a small minsplit can produce an overfitted decision tree. For example, if you have too few data points for rpart to grow a tree with its default parameters, you can lower minsplit and minbucket so that a tree is created.
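A sketch of the small-data case, assuming a hypothetical 15-row subset of iris as the "very few data points". With the defaults, the root node has fewer than minsplit = 20 observations, so no split is attempted and the tree is just the root; lowering minsplit and minbucket lets rpart grow a tree.

```r
library(rpart)

# Hypothetical tiny data set: 5 rows of each species (15 rows in total)
small <- iris[c(1:5, 51:55, 101:105), ]

# Defaults: minsplit = 20 exceeds the 15 rows, so the root is never split
default_fit <- rpart(Species ~ ., data = small, method = "class")

# Lowered limits: nodes with 4+ observations may be split, leaves need 2+
tuned_fit <- rpart(Species ~ ., data = small, method = "class",
                   control = rpart.control(minsplit = 4, minbucket = 2))

nrow(default_fit$frame)  # number of nodes with the defaults
nrow(tuned_fit$frame)    # number of nodes with the lowered limits
```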

There is no universal value; decide on it after looking at your data set, in particular how many observations you have.
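One possible way to make that decision more systematic (a technique not described in this answer) is to compare rpart's own cross-validated error for a few candidate values. The sketch below uses the kyphosis data shipped with rpart and a hypothetical set of candidates. The xerror column of cptable is the cross-validated error relative to the root node, so lower is better; the seed is reset before each fit so every candidate is scored on the same folds.

```r
library(rpart)

# Hypothetical candidate values for minsplit
candidates <- c(5, 10, 20, 40)

min_xerror <- sapply(candidates, function(ms) {
  set.seed(42)  # same cross-validation folds for every candidate
  fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
               method = "class", control = rpart.control(minsplit = ms))
  # Smallest cross-validated relative error over the pruning sequence
  min(fit$cptable[, "xerror"])
})

results <- data.frame(minsplit = candidates, min_xerror = min_xerror)
print(results)

# Candidate with the lowest cross-validated error
best_minsplit <- results$minsplit[which.min(results$min_xerror)]
```

With only 81 rows in kyphosis the differences can be small and noisy, so treat the result as a guide rather than a definitive choice.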

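rpart's default values: minsplit = 20, minbucket = round(minsplit/3) (and cp = 0.01). The call below sets minsplit = 1, minbucket = 1 and cp = 0, which removes the node-size and complexity limits so the tree keeps splitting as far as the data allow; this is exactly the overfitting risk mentioned above:

tree <- rpart(outcome ~ ., method = "class", data = data,
              control = rpart.control(minsplit = 1, minbucket = 1, cp = 0))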
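If you do grow a deliberately large tree like this, a common follow-up (not part of the original answer) is to prune it back with rpart's prune(), choosing cp from the cross-validated error in cptable. A minimal sketch, assuming the kyphosis data shipped with rpart in place of the answer's data and outcome:

```r
library(rpart)

set.seed(1)  # rpart's internal cross-validation assigns folds at random
full <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
              control = rpart.control(minsplit = 1, minbucket = 1, cp = 0))

# Complexity parameter of the subtree with the lowest cross-validated error
cp_table <- full$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Snip off splits whose complexity value falls below best_cp
pruned <- prune(full, cp = best_cp)

# Number of nodes before and after pruning
c(full = nrow(full$frame), pruned = nrow(pruned$frame))
```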

navo answered Sep 05 '25 03:09
