
Make overfitting tree with maximum depth using ctree

Tags:

r

party

ctree

When fitting a ctree model from partykit, I understand that it chooses default settings to prevent overfitting from overgrown trees. These defaults sometimes result in an overly simple tree. To use a post-pruning technique, I want to grow an overfitted tree, potentially full-grown, using ctree and then work on the pruning later. I have tried many different things, but my code keeps throwing an error.

This Stack Overflow answer on using all variables to make the tree is not what I want. I don't necessarily want all variables; I want the maximum depth, so that the tree is as overgrown as possible.

Basically, how can I make the tree grow as deep as possible?

See code and output below:

treemodel <- ctree(Species ~ ., iris)
plot(treemodel)

I have read the help pages and documentation for the package but don't see many options to customize this. The most promising one is the control parameter, but the documentation isn't very detailed. After searching other forums, I gave the following a try:

treemodel <- ctree(Species ~ ., iris, control=mincriterion)

I also tried:

treemodel <- ctree(Species ~ ., iris, control="mincriterion")

But both attempts throw an error. The error:

Error in if (sum(weights) < ctrl$minsplit) return(partynode(as.integer(id))) : argument is of length zero

I am using partykit 1.1-1 and R on macOS.

jardim asked Oct 21 '25 04:10


1 Answer

ctree from partykit accepts a ctree_control parameter through the control argument that you can use to control aspects of the tree fit.

Passing control=mincriterion or control="mincriterion" is not correct, hence the error. control expects a list of control parameters, as built by ctree_control(), not a bare name or a character value.
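As a minimal sketch of what control is meant to receive, the snippet below builds the settings with ctree_control() and inspects them, using the iris data from the question. The variable name ctrl is only illustrative.

```r
library(partykit)

# ctree_control() returns a list of settings. ctree() reads fields such as
# minsplit from that list, so a bare name or a string cannot stand in for it.
ctrl <- ctree_control(mincriterion = 0.8)

is.list(ctrl)    # the control object is a list
ctrl$minsplit    # the field the error message in the question refers to
ctrl$minbucket

# Pass the whole object through the control argument.
treemodel <- ctree(Species ~ ., data = iris, control = ctrl)
```

The error in the question mentions ctrl$minsplit, which suggests that field could not be read from whatever was passed in place of this list.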

In particular, you want to pass into ctree_control the following:

  • mincriterion: Acts as a "regulator" for the depth of the tree; smaller values result in larger trees. When mincriterion is 0.8, the p-value must be smaller than 0.2 for a node to split.
  • minsplit and minbucket: Set these to 0 so the minimum node-size requirements are always met and splitting is never stopped by them.

From the package authors themselves:

A split is implemented when the criterion exceeds the value given by mincriterion as specified in ctree_control. For example, when mincriterion = 0.95, the p-value must be smaller than 0.05 in order to split this node. This statistical approach ensures that the right-sized tree is grown without additional (post-)pruning or cross-validation
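To see the effect of mincriterion on tree size, here is a rough sketch that refits the iris tree from the question at a few thresholds and counts terminal nodes. The threshold values are arbitrary choices for illustration, and minsplit and minbucket are left at their defaults.

```r
library(partykit)

# Thresholds from the strict default (0.95) down to a very permissive one.
thresholds <- c(0.95, 0.5, 0.05)

# width() returns the number of terminal nodes of a fitted tree.
n_leaves <- sapply(thresholds, function(mc) {
  fit <- ctree(Species ~ ., data = iris,
               control = ctree_control(mincriterion = mc))
  width(fit)
})

print(data.frame(mincriterion = thresholds, terminal_nodes = n_leaves))
```

Lowering the threshold can only admit additional splits, so the node count should not shrink as mincriterion decreases. How much it grows depends on the data and on the node-size limits.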

So with that, the final code using control=ctree_control():

diab_model <- ctree(diabetes ~ ., diab_train, control = ctree_control(mincriterion=0.005, minsplit=0, minbucket=0))
plot(diab_model)

The first line creates the decision tree while overriding the defaults, and the second line plots the ctree object. Here, diabetes and diab_train stand in for your own response variable and data. With these settings you get a much deeper, essentially fully grown tree. Experiment with the values of mincriterion, minsplit, and minbucket; they can also be treated as hyperparameters. Here's the output of plot(diab_model):

[Image: plot of diab_model]
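Applied to the iris data from the question, the same idea might look like the sketch below. One assumption to flag: it uses minsplit = 2 and minbucket = 1, the smallest positive node sizes, as a conservative stand-in for the zeros used above. The names default_tree and deep_tree are illustrative.

```r
library(partykit)

# Baseline: the default settings from the question.
default_tree <- ctree(Species ~ ., data = iris)

# Assumption: minsplit = 2 and minbucket = 1 are used here instead of the
# 0 values above; mincriterion = 0.005 is taken from the code above.
deep_tree <- ctree(Species ~ ., data = iris,
                   control = ctree_control(mincriterion = 0.005,
                                           minsplit = 2L, minbucket = 1L))

# depth() and width() report the tree depth and the number of terminal nodes.
c(default = depth(default_tree), deep = depth(deep_tree))
c(default = width(default_tree), deep = width(deep_tree))

plot(deep_tree)
```

Comparing depth and width before and after is a quick way to confirm that the relaxed settings actually changed the tree before moving on to pruning.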

onlyphantom answered Oct 22 '25 18:10


