I'm facing the following classification problem: I have many different (large) 2D matrices with mostly zero entries (so it probably makes sense to store them as sparse matrices), which need to be classified.
I wanted to test sklearn's various classifiers, but according to the documentation they only seem to accept np.ndarray objects as X_train data.
I wanted to do the following (minimal example):
data=np.ndarray(2);
data[0] = sparse.csr_matrix((x_d, (x_r, x_c)), shape=(x_size, y_size))
But this gives me the following error:
ValueError: setting an array element with a sequence.
Any idea how to deal with this? I haven't really found any input on classifying a number of sparse matrices.
The full error traceback should hint at what the problem is:
TypeError: float() argument must be a string or a number, not 'csr_matrix'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/hayesall/teaching/sparse_matrix.py", line 11, in <module>
data[0] = X
ValueError: setting an array element with a sequence.
The lines:
data = np.ndarray(2)
data[0] = sparse.csr_matrix((x_d, (x_r, x_c)), shape=(x_size, y_size))
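... initialize a float array with uninitialized (arbitrary) memory contents (which is usually not recommended) and then try to assign a sparse matrix to its first element. A float slot cannot hold a csr_matrix, so NumPy raises a TypeError followed by a ValueError. You can see the arbitrary contents here: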
>>> import numpy as np
>>> np.ndarray(10)
array([6.92548229e-310, 6.92548229e-310, 3.60419659e-306, 1.60105975e+002,
       1.74568648e-309, 1.05751040e+007, 5.79612503e+017, 7.93549630e-301,
       4.17034227e+021, 2.03099725e+026])
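If the goal is only to keep several sparse matrices in one container, a plain Python list or an object-dtype array avoids the error. This is a minimal sketch; the values for x_d, x_r, x_c, x_size and y_size are made up here because the question doesn't give them:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the variables used in the question.
x_d = np.array([1.0, 2.0, 3.0])   # nonzero values
x_r = np.array([0, 1, 2])         # row indices
x_c = np.array([0, 2, 1])         # column indices
x_size, y_size = 3, 4

m = sparse.csr_matrix((x_d, (x_r, x_c)), shape=(x_size, y_size))

# Option 1: a plain list of sparse matrices.
matrices = [m, m.copy()]

# Option 2: an object-dtype array, whose elements can be arbitrary
# Python objects (np.ndarray(2) defaults to float64, which cannot).
data = np.empty(2, dtype=object)
data[0] = m
data[1] = m.copy()

print(type(data[0]), data[0].shape)
```

Note that sklearn estimators generally expect X to be a single 2D array or sparse matrix, so neither container is a drop-in replacement for X_train.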
Many sklearn classifiers accept scipy sparse matrices as input directly. For example, this fits a multilayer perceptron to a sparse XOR classification problem:
from sklearn.neural_network import MLPClassifier
import numpy as np
from scipy.sparse import csr_matrix

# Row and column indices of the nonzero entries (COO-style input)
i_pointer = np.array([1, 3, 2, 3])
j_pointer = np.array([0, 0, 1, 1])
values = np.array([1.0, 1.0, 1.0, 1.0])

# 4 samples x 2 features; the rows are [0, 0], [1, 0], [0, 1], [1, 1]
X = csr_matrix((values, (i_pointer, j_pointer)), shape=(4, 2))
y = np.array([0.0, 1.0, 1.0, 0.0])  # XOR of the two features
print(X.todense())

clf = MLPClassifier()
clf.fit(X=X, y=y)
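If you want to learn/predict using multiple sparse matrices, you'd probably need to combine them with scipy.sparse.vstack or scipy.sparse.hstack so that X ends up as a single matrix with one row per sample. I can't say which without knowing a lot more about the data, though.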
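As a rough sketch of the stacking idea, assume each 2D matrix is one sample to be classified (this is an assumption; the question doesn't say). Each matrix can then be flattened into a single sparse row, and the rows stacked with vstack. The random matrices, their shape, the labels, and the choice of LogisticRegression are all hypothetical and only there to make the example run:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression

# Hypothetical data: six 20x30 sparse matrices, each one a single sample.
n_rows, n_cols = 20, 30
matrices = [
    sparse.random(n_rows, n_cols, density=0.05, format="csr", random_state=seed)
    for seed in range(6)
]
labels = np.array([0, 1, 0, 1, 0, 1])  # made-up class labels

# Flatten each matrix into a 1 x (n_rows * n_cols) sparse row ...
rows = [m.reshape((1, n_rows * n_cols)) for m in matrices]

# ... and stack the rows so that X has one row per sample.
X = sparse.vstack(rows, format="csr")
print(X.shape)

# Any estimator that accepts sparse input could be used here.
clf = LogisticRegression()
clf.fit(X, labels)
predictions = clf.predict(X)
```

Flattening discards the 2D layout of each matrix, so whether this is appropriate depends on what the rows and columns mean in the data.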