When I try to run a RandomForestClassifier with Pipeline and param_grid:
pipeline = Pipeline([("scaler" , StandardScaler()),
("rf",RandomForestClassifier())])
from sklearn.model_selection import GridSearchCV
param_grid = {
'max_depth': [4, 5, 10],
'max_features': [2, 3],
'min_samples_leaf': [3, 4, 5],
'n_estimators': [100, 200, 300]
}
# initialize
grid_pipeline = GridSearchCV(pipeline,param_grid,n_jobs=-1, verbose=1, cv=3, scoring='f1')
# fit
grid_pipeline.fit(X_train,y_train)
grid_pipeline.best_params_
I get the following error:
ValueError: Invalid parameter max_depth for estimator Pipeline(memory=None,
steps=[('scaler',
StandardScaler(copy=True, with_mean=True, with_std=True)),
('rf',
RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
class_weight=None, criterion='gini',
max_depth=None, max_features='auto',
max_leaf_nodes=None, max_samples=None,
min_impurity_decrease=0.0,
min_impurity_split=None,
min_samples_leaf=1, min_samples_split=2,
min_weight_fraction_leaf=0.0,
n_estimators=100, n_jobs=None,
oob_score=False, random_state=None,
verbose=0, warm_start=False))],
verbose=False). Check the list of available parameters with `estimator.get_params().keys()`.
Although I have reviewed the scikit learn documentation and several posts, I can't find the error in my code.
When you use a pipeline with GridSearchCV() you must include names in parameter keys. Just separate names from parameter names with a double underscore. In your case:
param_grid = {
'rf__max_depth': [4, 5, 10],
'rf__max_features': [2, 3],
'rf__min_samples_leaf': [3, 4, 5],
'rf__n_estimators': [100, 200, 300]
}
Example from sklearn documentation: https://scikit-learn.org/stable/tutorial/statistical_inference/putting_together.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With