When I train a scikit-learn v0.15 SGDClassifier with these options: SGDClassifier(loss='log', class_weight=None, penalty='l2'), training completes without error.
But when I train the same classifier with class_weight='auto', I get this error:
  return self.model.fit(X, y)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 485, in fit
    sample_weight=sample_weight)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 389, in _fit
    classes, sample_weight, coef_init, intercept_init)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/linear_model/stochastic_gradient.py", line 336, in _partial_fit
    y_ind)
  File "/home/rose/.local/lib/python2.7/site-packages/scikit_learn-0.15.0b1-py2.7-linux-x86_64.egg/sklearn/utils/class_weight.py", line 43, in compute_class_weight
    raise ValueError("classes should have valid labels that are in y")
ValueError: classes should have valid labels that are in y
What could cause this error?
For reference, here's the documentation on class_weight:
Preset for the class_weight fit parameter. Weights associated with classes. If not given, all classes are supposed to have weight one. The “auto” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies.
For context, the SGDClassifier documentation says: "This estimator implements regularized linear models with stochastic gradient descent (SGD) learning: the gradient of the loss is estimated each sample at a time and the model is updated along the way with a decreasing strength schedule (aka learning rate)."
SGDClassifier is a linear classifier (SVM, logistic regression, and others) optimized by SGD. These are two different concepts: SGD is an optimization method, while logistic regression or a linear support vector machine is the machine learning model being fit.
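To make the distinction concrete, here is a minimal sketch (the toy data below is made up for illustration): the loss parameter selects the model, while SGD does the optimization either way.

from sklearn.linear_model import SGDClassifier

# Toy data: two features, two classes
X = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.1, 0.9]]
y = [0, 1, 1, 0]

# Same optimizer (SGD), two different models:
logreg = SGDClassifier(loss='log')    # logistic regression trained with SGD
svm = SGDClassifier(loss='hinge')     # linear SVM trained with SGD

logreg.fit(X, y)
svm.fit(X, y)
print(logreg.predict([[0.8, 0.2]]))   # likely class 1: the point resembles class 1
print(svm.predict([[0.8, 0.2]]))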
Later scikit-learn versions (where the 'auto' mode has been renamed 'balanced') spell the weighting out: if 'balanced', class weights will be given by n_samples / (n_classes * np.bincount(y)). If a dictionary is given, keys are classes and values are the corresponding class weights. If None is given, the class weights will be uniform.
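As a quick illustration of that formula (the label array below is made up), the minority class ends up with the larger weight:

import numpy as np

# Imbalanced toy labels: 8 samples of class 0, 2 samples of class 1
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

n_samples = float(len(y))      # 10; float keeps Python 2.7 division exact
n_classes = len(np.unique(y))  # 2

# n_samples / (n_classes * np.bincount(y))
weights = n_samples / (n_classes * np.bincount(y))
print(weights)                 # [ 0.625  2.5 ] -- the rare class is weighted up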
I think this may be a bug in scikit-learn. As a workaround, try the following:
from sklearn.preprocessing import LabelEncoder

# Re-encode the labels as consecutive integers 0..n_classes-1 before fitting;
# this sidesteps the failing label check in compute_class_weight.
le = LabelEncoder()
y_encoded = le.fit_transform(y)
self.model.fit(X, y_encoded)

# Translate the integer predictions back to the original labels.
pred = le.inverse_transform(self.model.predict(X))
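For what it's worth, the fitted encoder keeps the label mapping in its classes_ attribute, so you can check which integer each original class was mapped to:

print(le.classes_)   # e.g. array(['neg', 'pos']); each label's encoded value is its index here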