 

Scikit-Learn: Loading & Dumping multi-label SVM light format

In scikit-learn, two functions are provided to load and dump files in SVMlight format:

sklearn.datasets.load_svmlight_file and sklearn.datasets.dump_svmlight_file

The documentation states (and the function does support) that load_svmlight_file can load multi-label data, i.e. data where each line's target is a comma-separated list of labels instead of a single label. However, dump_svmlight_file doesn't seem to support this.
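To make the multi-label convention concrete, here is a small pure-Python sketch of how one such line splits into labels and features. The function name parse_multilabel_line is hypothetical and this is not scikit-learn's own parser; it assumes labels are separated by commas with no spaces, ignores `#` comments, and does not handle `qid:` fields.

```python
def parse_multilabel_line(line):
    """Split one multi-label SVMlight line into (labels, features).

    Hypothetical helper for illustration only. Assumes the target field is a
    comma-separated list of labels (no spaces), followed by space-separated
    index:value pairs. Text after '#' is treated as a comment and dropped.
    qid fields are not supported.
    """
    line = line.split("#", 1)[0].strip()
    target, *pairs = line.split()
    # Multi-label target: "1,3" -> (1.0, 3.0)
    labels = tuple(float(tok) for tok in target.split(","))
    features = {}
    for pair in pairs:
        idx, val = pair.split(":")
        features[int(idx)] = float(val)
    return labels, features


labels, features = parse_multilabel_line("1,3 1:0.5 4:2.0")
print(labels)    # (1.0, 3.0)
print(features)  # {1: 0.5, 4: 2.0}
```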

Am I reading things wrong, or does dump_svmlight_file for some reason just not support this? It's not even possible to 'trick it' by passing a y vector with string-based target values, because the file writer requires a float for each target value. The dump code can be found at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/svmlight_format.py, lines 230 to 262.

asked Mar 22 '26 03:03 by Alathon



1 Answer

You're right, dump_svmlight_file does not at present support multi-label tasks. That's an omission; you can file a bug report for it, although a good patch (pull request) would lead to quicker action.

(Signed, one of the authors of that module.)
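Until such a patch lands, one workaround is to write the multi-label file yourself. The sketch below is a minimal example under stated assumptions: dump_multilabel_svmlight is a hypothetical helper (not part of scikit-learn), X is a dense list of rows, y is a list of label collections, and zero entries are skipped because the format is sparse. Newer scikit-learn releases also accept multilabel=True in dump_svmlight_file, so it is worth checking your installed version first.

```python
import io


def dump_multilabel_svmlight(X, y, f, zero_based=True):
    """Write dense rows X with label collections y in multi-label SVMlight format.

    Hypothetical helper, not scikit-learn's API. Each output line is
    "label1,label2 idx:val idx:val"; zero-valued features are omitted.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    offset = 0 if zero_based else 1
    for row, labels in zip(X, y):
        # Comma-separated labels with no spaces, the multi-label convention
        target = ",".join("%.16g" % lab for lab in sorted(labels))
        # Sparse index:value pairs, skipping zeros
        pairs = " ".join(
            "%d:%.16g" % (j + offset, v) for j, v in enumerate(row) if v != 0
        )
        # rstrip drops the trailing space when a row has no non-zero features
        f.write((target + " " + pairs).rstrip() + "\n")


# Usage: write two samples to an in-memory buffer
buf = io.StringIO()
dump_multilabel_svmlight([[0.5, 0.0, 2.0], [0.0, 1.0, 0.0]], [{1, 3}, {2}], buf)
print(buf.getvalue())
# 1,3 0:0.5 2:2
# 2 1:1
```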

answered Mar 24 '26 17:03 by Fred Foo
