How to get columns from a big sparse CSC matrix

Question

I have a sparse matrix X

<1000000x153047 sparse matrix of type '<class 'numpy.float64'>'
with 5082518 stored elements in Compressed Sparse Column format>

and I have an array

columns_to_use

It consists of 10,000 column indices of matrix X. I want to keep only these columns and drop the rest. I tried this code:

X_new = X[:, columns_to_use]

This works well for a small X (10,000 rows), but with 100,000 rows or more I get a memory error. How can I select specific columns without running out of memory?

malugina · Accepted Answer

I ended up with this solution:

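from scipy.sparse import hstack

# Slice the wanted columns one at a time, then stack them side by side.
cols = []
for i in columns_to_use:
    cols.append(X[:, i])
X_new = hstack(cols)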

It runs fast enough and without any errors, and it's simple.
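For anyone who wants to try this on a small scale first, here is a minimal self-contained sketch of the same idea. The helper name `select_columns` and the toy 50 x 20 matrix are made up for illustration and are not from my real data. I also use the slice `X[:, i:i + 1]` instead of `X[:, i]` so that each block is explicitly two-dimensional, and pass `format="csc"` so the result stays in CSC format; both are my assumptions about what is convenient, not requirements.

```python
import numpy as np
from scipy.sparse import csc_matrix, hstack


def select_columns(X, columns_to_use):
    """Hypothetical helper: stack the requested columns of a sparse matrix."""
    # Each slice is an (n_rows, 1) sparse block; hstack joins them left to right.
    cols = [X[:, int(i):int(i) + 1] for i in columns_to_use]
    return hstack(cols, format="csc")


# Small toy stand-in for the real matrix (assumed shape, for illustration only).
rng = np.random.default_rng(0)
dense = rng.random((50, 20))
dense[dense < 0.9] = 0.0  # zero out most entries so the matrix is sparse
X = csc_matrix(dense)
columns_to_use = np.array([3, 7, 7, 12, 0])  # repeats and any order are allowed

X_new = select_columns(X, columns_to_use)
print(X_new.shape)  # (50, 5)
```

On the toy matrix the result can be checked against dense indexing, `dense[:, columns_to_use]`, which is not an option at the real size.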
