I am working on a classification task on a heart disease dataset using C5.0 in R. In the most common setup the data is divided into 80% for training and 20% for testing. I want to use k-fold cross-validation (k=10), but I am confused about one point: with 10-fold cross-validation the whole data set is divided into 10 subsets, and in each round 9 subsets are used for training and the remaining one for testing.
Is it possible to divide the data into 80% for training and 20% for testing and then apply k-fold cross-validation to the training data only? Or do I have to apply k-fold cross-validation to the whole data set?
One option would be k=5. In that case every round trains on 80% of the data and tests on the other 20%, so the cross-validation already gives you the 80/20 proportion and you don't need a separate 80/20 split on top of it.
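In this setup k-fold cross-validation is run on the whole data set. With k=5 the data is split into 5 folds, each fold is used once as the test set while the other 4 are used for training, and the 5 results are then compared or averaged. Holding out 20% first and cross-validating only the remaining 80% is also possible; that is mainly useful when you use the cross-validation to tune the model and want a final test set that was never involved in that tuning.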
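To make the two layouts concrete, here is a minimal sketch of how the row indices could be split. It is written in plain Python only to show the bookkeeping, not as the way to do it with C5.0 in R. The helper `k_fold_indices`, the data set size of 300 rows, and the seeds are all assumptions made for illustration. Option A is 5-fold cross-validation on the whole data set; Option B holds out 20% first and runs 10-fold cross-validation on the remaining 80%, a common alternative.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation over n rows.

    Hypothetical helper for illustration only.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every fold except fold i goes into the training set.
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx


n_rows = 300  # assumed data set size, not taken from the question

# Option A: 5-fold CV on the whole data set.
# Each of the 5 rounds trains on 80% of the rows and tests on the other 20%.
splits_whole = list(k_fold_indices(n_rows, k=5))

# Option B: hold out 20% as a final test set,
# then run 10-fold CV on the remaining 80% only.
shuffled = list(range(n_rows))
random.Random(1).shuffle(shuffled)
cut = int(0.8 * n_rows)
train_rows, holdout_rows = shuffled[:cut], shuffled[cut:]

# k_fold_indices returns positions within train_rows; map them back to row ids.
splits_train = [
    ([train_rows[j] for j in tr], [train_rows[j] for j in te])
    for tr, te in k_fold_indices(len(train_rows), k=10, seed=2)
]

for name, splits in (("whole data, k=5", splits_whole),
                     ("80% train only, k=10", splits_train)):
    tr, te = splits[0]
    print(name, "-> rounds:", len(splits), "train:", len(tr), "test:", len(te))
```

In Option A every row is used for testing exactly once across the 5 rounds. In Option B the held-out rows never appear in any cross-validation round, so they can be used for one final evaluation after the model has been chosen.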