scikit-learn provides two functions for loading and dumping files in SVM^light format:
sklearn.datasets.load_svmlight_file and sklearn.datasets.dump_svmlight_file.
According to the documentation (and in practice), load_svmlight_file can load multi-label data, that is, data in which each sample's target is a comma-separated list of categories rather than a single category. However, dump_svmlight_file doesn't seem to support this.
Am I reading things wrong, or does dump_svmlight_file for some reason just not support this? It's not even possible to "trick it" by passing a y-vector with string-based target values, because the file writer requires a float for each value. The dump code can be found at https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/svmlight_format.py from line 230 to 262.
You're right, dump_svmlight_file does not at present support multi-label tasks. That's an omission; you can file a bug report for it, although a good patch (pull request) would lead to quicker action.
(Signed, one of the authors of that module.)
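In the meantime, one possible workaround is to write the file by hand. The following is only a minimal sketch, not scikit-learn code: dump_svmlight_multilabel is a hypothetical helper, it takes dense rows (plain lists of numbers) rather than sparse matrices, and it assumes the layout described in the question, i.e. comma-separated labels followed by index:value pairs for the nonzero features. The zero_based argument is modelled on the one dump_svmlight_file has.

```python
import io


def dump_svmlight_multilabel(rows, labels, f, zero_based=True):
    """Write dense rows and per-row label collections in SVM^light style.

    Hypothetical helper, not part of scikit-learn.
    rows   : sequence of equal-length sequences of numbers
    labels : sequence of iterables of numeric labels, one per row
    f      : text-mode file-like object with a write() method
    """
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    offset = 0 if zero_based else 1
    for row, row_labels in zip(rows, labels):
        # Labels are joined with commas, e.g. "1,3".
        label_str = ",".join("%g" % lab for lab in sorted(row_labels))
        # Only nonzero features are written, as "index:value" pairs.
        feats = " ".join(
            "%d:%.16g" % (j + offset, v) for j, v in enumerate(row) if v != 0
        )
        f.write("%s %s\n" % (label_str, feats))


# Usage: two samples, the first carrying two labels.
buf = io.StringIO()
dump_svmlight_multilabel(
    [[0.5, 0, 0, 1.0], [0, 2.0, 0, 0]],
    [(1, 3), (2,)],
    buf,
    zero_based=False,
)
print(buf.getvalue(), end="")
```

A file written this way should be readable with load_svmlight_file(..., multilabel=True), though the sketch does not handle query ids, comments, or rows without labels.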