 

(Python Scipy) How to flatten a csr_matrix and append it to another csr_matrix?

I am representing each XML document as a feature matrix in csr_matrix format. With around 3000 XML documents, I now have a list of csr_matrices. I want to flatten each of these matrices into a feature vector, then combine all of these feature vectors into a single csr_matrix that represents all the XML documents, where each row is a document and each column is a feature.

One way to achieve this is with the following code:

from scipy.sparse import csr_matrix
X = csr_matrix([a.toarray().ravel().tolist() for a in ls])

where ls is the list of csr_matrices. However, this is highly inefficient: every matrix is converted to a dense array and then to a Python list, so with 3000 documents it simply crashes!
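To see why the dense route is expensive, here is a small sketch comparing the memory held by a CSR matrix with what a dense float64 copy of it needs. The 100 x 200 shape and 1% density are assumptions chosen for illustration, not figures from the question, and `csr_bytes`/`dense_bytes` are hypothetical helper names. A dense copy of such a matrix takes 100 * 200 * 8 = 160,000 bytes, so 3000 of them would take 480,000,000 bytes, before `.tolist()` turns every element into a separate Python float object, which costs more still.

```python
from scipy import sparse

def csr_bytes(mat):
    """Bytes held by the three arrays that back a CSR matrix."""
    return mat.data.nbytes + mat.indices.nbytes + mat.indptr.nbytes

def dense_bytes(mat):
    """Bytes a dense float64 copy of the same matrix would need."""
    return mat.shape[0] * mat.shape[1] * 8

# hypothetical per-document feature matrix: 100 x 200 with 1% non-zeros
doc = sparse.rand(100, 200, density=0.01, format="csr")
print("CSR bytes:  ", csr_bytes(doc))
print("dense bytes:", dense_bytes(doc))
```

The exact CSR figure depends on the index dtype SciPy picks, but it scales with the number of non-zeros rather than with the full shape.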

In other words, my question is: how can I flatten each csr_matrix in the list 'ls' without turning it into a dense array, and how can I append the flattened matrices to another csr_matrix?

Please note that I am using Python with SciPy.

Thanks in advance!

asked Oct 19 '25 10:10 by IssamLaradji


1 Answer

Why use a csr_matrix for each XML document? It may be better to use lil_matrix, since lil_matrix supports the reshape method. Here is an example:

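import numpy as np
from scipy import sparse

N, M, K = 100, 200, 300
# K random N x M CSR matrices stand in for the per-document feature matrices
matrixs = [sparse.rand(N, M, format="csr") for i in range(K)]
# flatten each matrix into a single 1 x N*M row (row-major) without densifying it
matrixs2 = [m.tolil().reshape((1, N*M)) for m in matrixs]
# stack the rows: one row per document, one column per feature
m1 = sparse.vstack(matrixs2).tocsr()

# test with dense array
#m2 = np.vstack([m.toarray().reshape(-1) for m in matrixs])
#np.allclose(m1.toarray(), m2)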
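An alternative that skips the per-matrix LIL conversion and the vstack call is to build the combined matrix in one step from COO coordinates. This is a different technique from the reshape approach above, sketched here under the assumption that the list is non-empty and every matrix in it has the same shape. Entry (i, j) of an N x M matrix becomes column i*M + j of its flattened row, which is the same row-major order that `reshape((1, N*M))` produces. `flatten_and_stack` is a hypothetical helper name, not a SciPy function.

```python
import numpy as np
from scipy import sparse

def flatten_and_stack(mats):
    """Flatten each N x M sparse matrix into one row (row-major order) and
    stack the rows into a single CSR matrix, without densifying anything.
    Assumes `mats` is non-empty and every matrix has the same shape."""
    n, m = mats[0].shape
    rows, cols, vals = [], [], []
    for doc_id, mat in enumerate(mats):
        coo = mat.tocoo()
        # every stored entry of document doc_id goes into output row doc_id
        rows.append(np.full(coo.nnz, doc_id, dtype=np.int64))
        # entry (i, j) maps to column i*m + j; int64 keeps i*m from overflowing
        cols.append(coo.row.astype(np.int64) * m + coo.col)
        vals.append(coo.data)
    return sparse.csr_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=(len(mats), n * m),
    )

# two made-up 2 x 2 feature matrices
a = sparse.csr_matrix(np.array([[0, 1], [2, 0]]))
b = sparse.csr_matrix(np.array([[3, 0], [0, 4]]))
X = flatten_and_stack([a, b])
print(X.toarray())
# [[0 1 2 0]
#  [3 0 0 4]]
```

With the list from the question, this would be called as `X = flatten_and_stack(ls)`.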
answered Oct 21 '25 23:10 by HYRY