I'm trying to use low-rank approximation for latent semantic indexing. I thought that a low-rank approximation reduces the dimensions of the matrix, but that contradicts the results I get.
Assume my dictionary has 40 000 words and 2000 documents, so my term-by-document matrix is 40 000 x 2000. According to Wikipedia, I have to compute the SVD of the matrix and then keep only the k largest singular values, i.e. approximate it as A_k = U_k S_k V_k^T.
This is the code I use for the SVD and the low-rank approximation (the matrix is sparse):
import scipy.sparse.linalg  # 'import scipy' alone does not expose scipy.sparse.linalg
import numpy as np
u, s, vt = scipy.sparse.linalg.svds(search_matrix, k=20)
search_matrix = u @ np.diag(s) @ vt
print('u: ', u.shape) # (40000, 20)
print('s: ', s.shape) # (20, )
print('vt: ', vt.shape) # (20, 2000)
The resulting matrix is (40 000 x 20) * (20 x 20) * (20 x 2000) = 40 000 x 2000, which is exactly what I started with.
So... how exactly does the low-rank approximation reduce the dimensions of the matrix?
Also, I will be doing queries on this approximated matrix to find the correlation between a user vector and each document (a naive search engine). The user vector starts out as 40 000 x 1 (bag of words). According to the same Wikipedia page, I should map the query into the low-dimensional space as q_k = S_k^-1 U_k^T q.
The code:
user_vec = np.diag((1 / s)) @ u.T @ user_vec
And it produces a 20 x 1 vector, which is what I expected ((20 x 20) * (20 x 40 000) * (40 000 x 1) = 20 x 1). But now its dimensions no longer match the search_matrix I want to multiply it with.
So... what am I doing wrong, and why?
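For completeness, here is a minimal, self-contained sketch of what I am doing. A random sparse matrix stands in for my real term-by-document matrix, scaled down from 40 000 x 2000 to 4 000 x 200 so it runs quickly; k = 20 as above.
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Random stand-ins for the real data (scaled down from 40 000 x 2000).
search_matrix = sparse_random(4000, 200, density=0.01, format='csr', random_state=0)
user_vec = np.random.default_rng(0).random((4000, 1))  # bag-of-words query

u, s, vt = svds(search_matrix, k=20)
approx = u @ np.diag(s) @ vt                   # dense, same shape as search_matrix
projected = np.diag(1 / s) @ (u.T @ user_vec)  # query mapped into the k-dim space

print(approx.shape)     # (4000, 200) -- no smaller than the original
print(projected.shape)  # (20, 1)     -- does not line up with approx for multiplication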
Sources: the Wikipedia article on latent semantic indexing.
About low-rank approximation:
The goal is to have a matrix that you can store with less memory and compute with faster, but that still behaves like the original matrix (in particular, it keeps the same dimensions).
That is why you keep the product of matrices instead of multiplying it out: the factored form has low rank, but the dimensions of the matrix do not change. Storing u, s and vt separately takes 40 000 * 20 + 20 + 20 * 2000 ≈ 840 000 numbers instead of the 80 000 000 in the full matrix; the moment you reconstruct search_matrix = u @ np.diag(s) @ vt, you give that advantage back.
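Here is a sketch of what I mean, reusing the u, s, vt from the question (the two helper names are just for illustration). The ranking part assumes the usual LSI setup: projecting the documents of the rank-k approximation with the same np.diag(1 / s) @ u.T map gives exactly the columns of vt, so the projected 20 x 1 query should be compared with those columns rather than with the reconstructed 40 000 x 2000 matrix.
import numpy as np

def lowrank_matvec(u, s, vt, x):
    # Illustrative helper: compute (u @ np.diag(s) @ vt) @ x without ever
    # forming the full 40 000 x 2000 product; work right-to-left through the factors.
    return u @ (s[:, None] * (vt @ x))        # x: (2000, 1) -> result: (40000, 1)

def rank_documents(u, s, vt, user_vec):
    # Illustrative helper: map the 40 000 x 1 bag-of-words query into the k-dim space...
    q_k = np.diag(1 / s) @ (u.T @ user_vec)   # (20, 1)
    # ...and compare it with the documents in that same space: each column of vt
    # is one document. Cosine similarity, assuming no all-zero columns.
    q_k = q_k / np.linalg.norm(q_k)
    docs = vt / np.linalg.norm(vt, axis=0)
    return (q_k.T @ docs).ravel()             # one similarity score per document, shape (2000,)
Everything after the SVD then happens with arrays of shape (40 000, 20), (20,) and (20, 2000), which is where the reduction actually shows up.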