Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficient incremental sparse matrix in python / scipy / numpy

Is there a way in Python to have an efficient incremental update of sparse matrix?

 H = lil_matrix((n,m))
 for (i,j) in zip(A,B):
   h(i,j) += compute_something  

It seems that such a way to build a sparse matrix is quite slow (lil_matrix is the fastest sparse matrix type for that).

Is there a way (like using dict of dict or other kind of approaches) to efficiently build the sparse matrix H?

like image 267
francois rousseau Avatar asked Oct 20 '25 16:10

francois rousseau


2 Answers

In https://stackoverflow.com/a/27771335/901925 I explore incremental matrix assignment.

lol and dok are the recommended formats if you want to change values. csr will give you an efficiency warning, and coo does not allow indexing.

But I also found that dok indexing is slow compared to regular dictionary indexing. So for many changes it is better to build a plain dictionary (with the same tuple indexing), and build the dok matrix from that.

But if you can calculate the H data values with a fast numpy vector operation, as opposed to iteration, it is best to do so, and construct the sparse matrix from that (e.g. coo format). In fact even with iteration this would be faster:

 h = np.zeros(A.shape)
 for k, (i,j) in enumerate(zip(A,B)):
    h[k] = compute_something 
 H = sparse.coo_matrix((h, (A, B)), shape=(n,m))

e.g.

In [780]: A=np.array([0,1,1,2]); B=np.array([0,2,2,1])
In [781]: h=np.zeros(A.shape)
In [782]: for k, (i,j) in enumerate(zip(A,B)):
    h[k] = i+j+k
   .....:     
In [783]: h
Out[783]: array([ 0.,  4.,  5.,  6.])
In [784]: M=sparse.coo_matrix((h,(A,B)),shape=(4,4))
In [785]: M
Out[785]: 
<4x4 sparse matrix of type '<class 'numpy.float64'>'
    with 4 stored elements in COOrdinate format>
In [786]: M.A
Out[786]: 
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  9.,  0.],
       [ 0.,  6.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])

Note that the (1,2) value is the sum 4+5. That's part of the coo to csr conversion.

In this case I could have calculated h with:

In [791]: A+B+np.arange(A.shape[0])
Out[791]: array([0, 4, 5, 6])

so there's no need for iteration.

like image 117
hpaulj Avatar answered Oct 22 '25 04:10

hpaulj


Nope, do not use csr_matrix or csc_matrix, as they are going to be even more slower than lil_matrix, if you construct them incrementally. The Dictionary of Key based sparse matrix is exactly what you are looking for

from scipy.sparse import dok_matrix
S = dok_matrix((5, 5), dtype=np.float32)
for i in range(5):
    for j in range(5):
        S[i,j] = i+j    # Update elements
like image 33
romeric Avatar answered Oct 22 '25 04:10

romeric



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!