Consider the following grid search:
grid = GridSearchCV(clf, parameters, n_jobs=-1, iid=True, cv=5)
grid_fit = grid.fit(X_train1, y_train1)
According to scikit-learn's documentation, grid_fit.best_score_
returns "the mean cross-validated score of the best_estimator".
To me that would mean that the average of:
cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5)
should be exactly the same as:
grid_fit.best_score_.
However, I am getting a 10% difference between the two numbers. What am I missing?
I am running the grid search on proprietary data, so I am hoping somebody has run into something similar in the past and can guide me without a fully reproducible example. I will try to reproduce this with the Iris dataset if it's not clear enough...
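For reference, here is a stripped-down version of what I am doing, rebuilt on Iris. The classifier and the grid values are placeholders, not my real model, and I dropped iid since it has been removed from newer scikit-learn versions:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Placeholder data and model -- my real data is proprietary
X_train1, y_train1 = load_iris(return_X_y=True)
clf = SVC()
parameters = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}

grid = GridSearchCV(clf, parameters, n_jobs=-1, cv=5)
grid_fit = grid.fit(X_train1, y_train1)

# The two numbers I expected to be identical
scores = cross_val_score(grid_fit.best_estimator_, X_train1, y_train1, cv=5)
print("best_score_          :", grid_fit.best_score_)
print("mean cross_val_score :", np.mean(scores))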
So, grid search is basically a brute-force strategy: the model is fitted and cross-validated with every combination of hyperparameters in the grid, and the best combination is kept. With cross_val_score you do not perform any search at all (no grid of candidate parameters is tried); you simply get the cross-validation scores of the single estimator you pass in.
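To make that concrete, here is a small illustration (the classifier and grid values are arbitrary examples, not taken from your setup): GridSearchCV cross-validates every combination in the grid and keeps the best one, while cross_val_score just cross-validates the one estimator you hand it.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 3 x 2 = 6 candidate models, each evaluated with 5-fold CV
param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)

# One fixed model, evaluated with 5-fold CV -- no search involved
scores = cross_val_score(DecisionTreeClassifier(max_depth=4, random_state=0), X, y, cv=5)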
The “best” parameters that GridSearchCV identifies are technically the best that could be produced, but only among the combinations that you included in your parameter grid.
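For example (a hypothetical grid, just to show the point): if the truly optimal C were 3, the search could never report it, because 3 is not among the candidates.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

parameters = {"C": [0.1, 1, 10]}   # only these values are ever tried
grid = GridSearchCV(SVC(), parameters, cv=5)
grid.fit(X, y)
print(grid.best_params_)           # always one of the combinations above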
Does GridSearchCV use cross-validation? Yes, GridSearchCV does perform cross-validation internally. If I understand the notion correctly, you want to hide a portion of your data set from the model so that it can be tested on it later: you train your models on the training data and then evaluate them on the held-out testing data.
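A sketch of the usual pattern (the split sizes and grid are just an example): the search tunes with internal cross-validation on the training portion only, and the held-out test set is scored once at the end.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=5)
grid.fit(X_train, y_train)         # cross-validation happens inside the training set only

print(grid.score(X_test, y_test))  # single evaluation on data the search never saw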
When an integer is passed to the cv parameter, as in GridSearchCV(..., cv=5), and the estimator is a classifier, StratifiedKFold is used for the cross-validation splitting. The folds that your later cross_val_score call builds are not guaranteed to be the same folds GridSearchCV used (shuffling, a different random_state, or any reordering of the data will change them), and scoring the model on different folds can easily shift the mean accuracy and therefore the best score.
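One way to check whether different folds are the cause (a diagnostic sketch, not a guaranteed fix for your data) is to pass the same explicit splitter object to both calls, so both evaluations use identical folds:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # fixed, reproducible folds

grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=cv)
grid.fit(X, y)

scores = cross_val_score(grid.best_estimator_, X, y, cv=cv)
print(grid.best_score_, np.mean(scores))  # with identical folds these should be very close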