How to use grid search for an SVM?

Question

I think machine learning is interesting, and I am studying the scikit-learn documentation for fun. Below I have done some data cleaning, and I want to use grid search to find the best values for the parameters.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn import metrics
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score


cats = ['sci.space','rec.autos','rec.motorcycles']
newsgroups_train = fetch_20newsgroups(subset='train',remove=('headers', 'footers', 'quotes'), categories = cats)
newsgroups_test = fetch_20newsgroups(subset='test',remove=('headers', 'footers', 'quotes'), categories = cats)

vectorizer = TfidfVectorizer( stop_words = "english")


vectors = vectorizer.fit_transform(newsgroups_train.data)
vectors_test = vectorizer.transform(newsgroups_test.data)

# gamma has no effect when kernel='linear'; only C matters here
clf = SVC(C=0.4, gamma=1, kernel='linear')

clf.fit(vectors, newsgroups_train.target)
# vectors_test was already computed above, so it is not transformed again
pred = clf.predict(vectors_test)
print(accuracy_score(newsgroups_test.target, pred))

The accuracy is: 0.849

I have heard that grid search can be used to find the optimal parameter values, but I can't understand how to perform it. Can you please elaborate? This is what I tried, but it is not correct. I would like to learn the correct way, along with some explanation. Thanks.

Cs = np.array([0.001, 0.01, 0.1, 1, 10])
gammas = np.array([0.001, 0.01, 0.1, 1])
model = SVC()
grid = GridSearchCV(estimator=model, param_grid=dict(Cs=alphas,gamma=gammas))
grid.fit(newsgroups_train.data, newsgroups_train.target)
print(grid)
# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_.alpha)

EDIT based on the answer received:

parameters = {'C': [1, 10], 
          'gamma': [0.001, 0.01, 1]}
model = SVC()
grid = GridSearchCV(estimator=model, param_grid=parameters)
grid.fit(vectors, newsgroups_train.target)
print(grid)
# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_)

It returns:

GridSearchCV(cv='warn', error_score='raise-deprecating',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False),
       fit_params=None, iid='warn', n_jobs=None,
       param_grid={'C': [1, 10], 'gamma': [0.001, 0.01, 1]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
       scoring=None, verbose=0)
0.8532212885154061
SVC(C=10, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

I need clarification on these points:

1) What is actually displayed in the results?
2) Does it treat C as a range from 1 to 10, or only as the two values 1 and 10?
3) Can you suggest anything to improve the accuracy further?
4) I noticed that the Tfidf made the accuracy worse, even though it cleaned the data of words that don't have any value.

db702 · Accepted Answer

You want to pass a dictionary of parameters where the keys are the parameter names as defined in the model's documentation (1). Each value should be a list of the values you would like to try.

The grid search will then try every possible combination of those parameters. There are some good examples in the documentation (2).

(1) https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
(2) https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
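To see which combinations a grid actually contains, one option is scikit-learn's ParameterGrid helper, which expands a parameter dictionary the same way the grid search does. The sketch below reuses the dictionary from this answer; it suggests that the lists are treated as discrete candidate values (C is either 1 or 10), not as a range.

```python
from sklearn.model_selection import ParameterGrid

# Same dictionary as in the answer: keys are SVC parameter names,
# values are lists of candidate values to try.
parameters = {'C': [1, 10], 'gamma': [0.001, 0.01, 1]}

# ParameterGrid expands the dictionary into every combination of the
# listed values. Only the listed values are used; nothing in between.
combinations = list(ParameterGrid(parameters))

for combo in combinations:
    print(combo)

# 2 values of C times 3 values of gamma
print(len(combinations))
```

To cover a wider span of C, you would list more values explicitly, for example [0.1, 1, 10, 100].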

For your script, you also want to make sure that you are feeding the grid search the correct training data: in this case the TF-IDF matrix 'vectors', not the raw text in 'newsgroups_train.data'.
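SVC needs numeric features, which is why the grid search has to be fit on the TF-IDF matrix and not on raw strings. An alternative, sketched below and not part of the original answer, is to wrap the vectorizer and the SVC in a Pipeline so the grid search can take raw text directly. The tiny corpus and the step names "tfidf" and "svc" are made up for illustration; in your script you would pass newsgroups_train.data and newsgroups_train.target instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for newsgroups_train.data / .target.
texts = [
    "the rocket reached orbit", "nasa launched a satellite",
    "the shuttle docked with the station", "astronauts walked on the moon",
    "the engine needs new oil", "my car has a flat tire",
    "the bike has a loud exhaust", "he changed the brake pads",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# The pipeline vectorizes the text and then fits the SVC, so the
# vectorizer is refit on the training part of every CV split.
pipe = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("svc", SVC()),
])

# Pipeline parameters are addressed as <step name>__<parameter name>.
param_grid = {"svc__C": [1, 10], "svc__gamma": [0.001, 0.01, 1]}

# cv=2 only because the toy corpus is tiny.
grid = GridSearchCV(pipe, param_grid=param_grid, cv=2)
grid.fit(texts, labels)

print(grid.best_params_)
```

With this layout, vectorizer settings can go into the same grid under the "tfidf__" prefix.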

See below:

parameters = {'C': [1, 10], 
          'gamma': [0.001, 0.01, 1]}
model = SVC()
grid = GridSearchCV(estimator=model, param_grid=parameters)
grid.fit(vectors, newsgroups_train.target)
print(grid)
# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_)
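As a rough guide to reading that output: print(grid) only echoes the search configuration, best_score_ is the mean cross-validated score of the best combination, and best_estimator_ is the model refit on the full training data with that combination. The sketch below shows how the per-combination scores can be listed through best_params_ and cv_results_. It uses the bundled iris dataset purely as a stand-in so that it runs without downloading anything; that dataset and the cv=5 setting are assumptions, not part of the original script.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-in data (assumption); in the original script this would be
# vectors and newsgroups_train.target.
X, y = load_iris(return_X_y=True)

parameters = {'C': [1, 10], 'gamma': [0.001, 0.01, 1]}
grid = GridSearchCV(estimator=SVC(), param_grid=parameters, cv=5)
grid.fit(X, y)

# The winning combination and its mean cross-validated accuracy.
print(grid.best_params_)
print(grid.best_score_)

# One row per combination: the parameters and their mean CV score.
results = grid.cv_results_
for params, mean_score in zip(results['params'], results['mean_test_score']):
    print(params, round(mean_score, 3))
```

Note that best_score_ comes from cross-validation on the training data, so it is not directly comparable to an accuracy measured on the separate test set.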

Please accept the answer if it works. Good luck!

Tags: python, machine-learning, svm, scikit-learn

user11911849
