
Identifying a sklearn-model's classes

The documentation on SVMs implies that an attribute called classes_ exists, which allegedly reveals how the model represents classes internally.

I would like to get that information in order to interpret the output of functions like predict_proba, which returns class probabilities for a number of samples. Hopefully, given some illustrative values, knowing that:

model.classes_ 
>>> [1, 2, 4]

means that I can assume that this holds:

model.predict_proba([[1.2312, 0.23512, 6.01234], [3.7655, 8.2353, 0.86323]]) 
>>> [[0.032, 0.143, 0.825], [0.325, 0.143, 0.532]]

Probabilities should translate to the same order as the classes, i.e. for the first set of features I can assume:

probability of class 1: 0.032
probability of class 2: 0.143
probability of class 4: 0.825

But accessing classes_ on my SVM raises an error. Is there a good way to get that information? I can't imagine that it is no longer accessible after the model is trained.


edit: The way I build my model is more or less like this:

from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in scikit-learn < 0.18
from sklearn.pipeline import Pipeline, FeatureUnion


pipeline = Pipeline([
   ('features', FeatureUnion(transformer_list=[ ... ])),
   ('svm', SVC(probability=True))
])
parameters = { ... }
grid_search = GridSearchCV(
    pipeline,
    parameters
)

grid_search.fit(get_data(), get_labels())
clf = [elem for elem in grid_search.estimator.steps if elem[0] == 'svm'][0][1]

print(clf)
>> SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=True, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
print(clf.classes_)
>> Traceback (most recent call last):
  File "path/to/script.py", line 284, in <module>
  File "path/to/script.py", line 181, in re_train
    print(clf.classes_)
AttributeError: 'SVC' object has no attribute 'classes_'
Arne asked Sep 08 '25 01:09


2 Answers

The grid_search.estimator that you are looking at is the unfitted pipeline. The classes_ attribute only exists after fitting, as the classifier needs to have seen y.
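
To illustrate the difference, here is a minimal sketch. The toy data from make_classification, the StandardScaler step (standing in for the FeatureUnion) and the small parameter grid are assumptions made for the example, not the asker's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data standing in for get_data() / get_labels() (hypothetical stand-in).
X, y = make_classification(n_samples=60, n_features=4, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),  # stand-in for the FeatureUnion step
    ("svm", SVC()),
])
grid_search = GridSearchCV(pipeline, {"svm__C": [0.1, 1.0]}, cv=3)
grid_search.fit(X, y)

# grid_search.estimator is the template that was passed in; GridSearchCV
# fits clones of it, so the template itself stays unfitted.
template_svm = grid_search.estimator.named_steps["svm"]
template_is_fitted = hasattr(template_svm, "classes_")

# best_estimator_ is refit on the full data (refit=True is the default).
fitted_svm = grid_search.best_estimator_.named_steps["svm"]
fitted_is_fitted = hasattr(fitted_svm, "classes_")

print(template_is_fitted, fitted_is_fitted)  # False True
```
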

What you want is the estimator that was trained using the best parameter settings, which is grid_search.best_estimator_.

The following will work:

clf = grid_search.best_estimator_.named_steps['svm']
print(clf.classes_)

[and classes_ does exactly what you think it does].
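
As a rough sketch of that correspondence: the toy data below, with labels 1, 2 and 4, is made up for illustration. Column j of predict_proba lines up with classes_[j], and classes_ holds the sorted unique labels regardless of the order in which they appear in y:

```python
import numpy as np
from sklearn.svm import SVC

# Illustrative toy data: three well-separated clusters (hypothetical values).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(20, 2)) for c in (0.0, 3.0, 6.0)])
y = np.repeat([4, 1, 2], 20)  # labels deliberately not given in sorted order

clf = SVC(probability=True, random_state=0).fit(X, y)

proba = clf.predict_proba([[0.1, -0.2]])

# Pair each column of the probability row with the label it belongs to.
by_class = dict(zip(clf.classes_, proba[0]))

print(clf.classes_)
for label, p in by_class.items():
    print(f"probability of class {label}: {p:.3f}")
```

Zipping classes_ with a row of predict_proba avoids hard-coding the column order.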

Andreas Mueller answered Sep 09 '25 14:09


There is a classes_ attribute in sklearn, so the error probably means you were accessing it on the wrong model. In the example below, classes_ is available once the model has been fitted:

>>> import numpy as np
>>> from sklearn.svm import SVC
>>> X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
>>> y = np.array([1, 1, 2, 2])
>>> clf = SVC(probability=True)
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=True, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
>>> print(clf.classes_)
[1 2]
>>> print(clf.predict([[-0.8, -1]]))
[1]
>>> print(clf.predict_proba([[-0.8, -1]]))
[[ 0.92419129  0.07580871]]
chappers answered Sep 09 '25 15:09