Use sklearn.neural_network.MLPClassifier with ndarray of csr_matrices

Question

I am facing the following classification problem: I have many large 2-D matrices, most of whose entries are zero (so sparse matrices seem like a natural fit), and each matrix needs to be classified.

I wanted to try several of sklearn's classifiers, but according to the documentation they only seem to accept np.ndarray as X_train data.

I wanted to do the following (minimal example):

data = np.ndarray(2)
data[0] = sparse.csr_matrix((x_d, (x_r, x_c)), shape=(x_size, y_size))

But this gives me the following error:

ValueError: setting an array element with a sequence.

Any idea how to deal with this? I haven't really found any input on classifying a number of sparse matrices.

Alexander L. Hayes · Accepted Answer

The full error traceback should hint at what the problem is:

TypeError: float() argument must be a string or a number, not 'csr_matrix'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/hayesall/teaching/sparse_matrix.py", line 11, in <module>
    data[0] = X
ValueError: setting an array element with a sequence.

The lines:

data = np.ndarray(2)
data[0] = sparse.csr_matrix((x_d, (x_r, x_c)), shape=(x_size, y_size))

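... first allocate a length-2 float array without initializing it, so it holds whatever happened to be in memory (calling np.ndarray directly is usually not recommended for this reason). The second line then tries to store a sparse matrix in the first slot of that float array. NumPy cannot convert a csr_matrix to a float, which raises the TypeError and then the ValueError. The uninitialized contents are easy to see: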

>>> import numpy as np
>>> np.ndarray(10)
array([6.92548229e-310, 6.92548229e-310, 3.60419659e-306, 1.60105975e+002,
       1.74568648e-309, 1.05751040e+007, 5.79612503e+017, 7.93549630e-301,
       4.17034227e+021, 2.03099725e+026])
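
If the goal is only to make the assignment itself work, one option is an object-dtype array. This is a sketch, and the toy values below are made up to stand in for x_d, x_r and x_c from the question:

```python
import numpy as np
from scipy import sparse

# Hypothetical toy values standing in for the question's x_d, x_r, x_c.
x_d = np.array([1.0, 2.0])
x_r = np.array([0, 2])
x_c = np.array([1, 0])

# dtype=object lets each slot hold an arbitrary Python object,
# so NumPy no longer tries to convert the matrix to a float.
data = np.empty(2, dtype=object)
data[0] = sparse.csr_matrix((x_d, (x_r, x_c)), shape=(3, 2))
data[1] = sparse.csr_matrix((3, 2))  # an all-zero 3x2 matrix

print(data.dtype, data[0].shape)
```

An array like this is only a container of Python objects, so it does not by itself give a classifier a usable feature matrix.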

Many sklearn classifiers, MLPClassifier included, accept a scipy sparse matrix directly as X. For example, this fits a multilayer perceptron to a sparse XOR classification problem:

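from sklearn.neural_network import MLPClassifier
import numpy as np
from scipy.sparse import csr_matrix

# Row and column indices of the nonzero entries, plus their values.
i_pointer = np.array([1, 3, 2, 3])
j_pointer = np.array([0, 0, 1, 1])
values = np.array([1.0, 1.0, 1.0, 1.0])

# 4 samples x 2 features; the rows are [0, 0], [1, 0], [0, 1], [1, 1].
X = csr_matrix((values, (i_pointer, j_pointer)), shape=(4, 2))
# XOR labels, one per row of X.
y = np.array([0.0, 1.0, 1.0, 0.0])

print(X.toarray())

# The sparse matrix is passed to fit() as-is; no conversion to dense is needed.
clf = MLPClassifier()
clf.fit(X=X, y=y)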

If you want to learn/predict using multiple sparse matrices, you'd probably need to combine them into a single matrix with scipy.sparse.vstack or scipy.sparse.hstack. I can't say which without knowing a lot more about the data, though.
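
As a rough sketch of the vstack route, assuming each 2-D matrix is one sample and that flattening it into a single row is acceptable for your data (neither is confirmed by the question), something like this could work. The sizes, the randomly generated matrices and the alternating labels are all made up for illustration:

```python
import numpy as np
from scipy import sparse
from sklearn.neural_network import MLPClassifier

# Hypothetical sizes: 20 samples, each an 8x6 sparse matrix.
n_samples, n_rows, n_cols = 20, 8, 6

# Made-up data: one random sparse matrix per sample, alternating 0/1 labels.
matrices = [
    sparse.random(n_rows, n_cols, density=0.2, format="csr", random_state=i)
    for i in range(n_samples)
]
labels = np.arange(n_samples) % 2

# Flatten every matrix into a single 1 x (n_rows * n_cols) row,
# then stack the rows so that X has one row per sample.
rows = [m.reshape(1, n_rows * n_cols) for m in matrices]
X = sparse.vstack(rows, format="csr")

clf = MLPClassifier(hidden_layer_sizes=(16,), random_state=0)
clf.fit(X, labels)
pred = clf.predict(X)

print(X.shape, pred.shape)
```

X stays sparse the whole way through, so the zero entries are never materialized. Flattening discards the 2-D layout of each matrix, which may or may not matter for your problem.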

Tags: python, machine-learning, numpy, scikit-learn

petwri
