I am trying to use the ExtraTreesClassifier in scikit-learn on my data. I have two numpy arrays X and y. X is of dimension (10000,51) and y is (10000,). To make sure they are in numpy array format, I use
import numpy
X = numpy.array(X, dtype=numpy.float32)
print(numpy.asarray(X, dtype=numpy.float32) is X)
y = numpy.array(y, dtype=numpy.float32)
print(numpy.asarray(y, dtype=numpy.float32) is y)
and I get True for both. Then I define my model as:
clf = ExtraTreesClassifier(n_estimators=10, max_depth=None, min_samples_split=1, random_state=0, n_jobs=-1)
And when I want to train my model using
clf = clf.fit(X, y)
I get the following error:
File "CFD_scikit_learn.py", line 169, in <module>
clf = Xtra_Trees(my_var)
File "CFD_scikit_learn.py", line 140, in Xtra_Trees
clf = clf.fit(X, y)
File "/user/leuven/308/vsc30879/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 235, in fit
y, expanded_class_weight = self._validate_y_class_weight(y)
File "/user/leuven/308/vsc30879/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 421, in _validate_y_class_weight
check_classification_targets(y)
File "/user/leuven/308/vsc30879/.local/lib/python2.7/site-packages/sklearn/utils/multiclass.py", line 173, in check_classification_targets
raise ValueError("Unknown label type: %r" % y)
ValueError: Unknown label type: array([[ 2.09895 ],
[ 1.658568],
[ 1.242831],
...,
[ 1.743349],
[ 1.765763],
[ 1.824112]])
If anybody knows how to solve this problem, I would be grateful if you let me know.
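Classifiers need discrete class labels (typically integers). Your y contains continuous floats, which is why the target check raises "Unknown label type".
You either need to turn them into discrete classes (e.g. bin them), or use a regression-type model.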
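If the targets really are continuous, the regression route could look roughly like the sketch below. It uses ExtraTreesRegressor, the regression counterpart of the classifier in the question, on made-up data (the arrays here are stand-ins, not the asker's data), and calls scikit-learn's type_of_target helper to show how the library sees a float target.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.utils.multiclass import type_of_target

# Made-up stand-in data: 200 samples, 5 features, continuous float target.
rng = np.random.RandomState(0)
X = rng.rand(200, 5).astype(np.float32)
y = (2.0 * X[:, 0] + 0.1 * rng.rand(200)).astype(np.float32)

# scikit-learn reports float targets like these as continuous,
# which is the kind of target a classifier rejects.
target_kind = type_of_target(y)
print(target_kind)

# A regressor accepts continuous targets directly.
reg = ExtraTreesRegressor(n_estimators=10, random_state=0)
reg.fit(X, y)
pred = reg.predict(X[:3])
print(pred)
```

Whether regression is appropriate depends on what the values in y mean; if they are measurements rather than category codes, this is probably the more natural choice.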
If you think you can bin the floats into sensible classes, numpy.digitize might help. Or you could binarize them against a threshold.
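A minimal sketch of both options, using the six values visible in the error message as sample data. The bin edges and the threshold are arbitrary choices for illustration only; sensible values depend on what the numbers represent.

```python
import numpy as np

# Sample targets taken from the values shown in the traceback.
y = np.array([2.09895, 1.658568, 1.242831,
              1.743349, 1.765763, 1.824112], dtype=np.float32)

# Hypothetical bin edges: class 0 is y < 1.5, class 1 is 1.5 <= y < 2.0,
# class 2 is y >= 2.0.
edges = np.array([1.5, 2.0])
y_binned = np.digitize(y, edges)
print(y_binned)  # [2 1 0 1 1 1]

# Binarizing against a hypothetical threshold of 1.75.
y_binary = (y > 1.75).astype(int)
print(y_binary)  # [1 0 0 0 1 1]
```

Either integer array can then be passed as y to clf.fit(X, y). If y has shape (n, 1), as the array in the error message does, flattening it first with y.ravel() gives the 1-D shape classifiers expect.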