Efficient algorithm for embedding vector search

Background: I have a machine learning model that, given an object, returns an embedding vector of dimension d. The model is trained so that semantically similar objects produce embedding vectors that are close together. Verification is relatively simple: I can compute something like the cosine similarity of the two vectors. Recognition is a bit more complicated: I can either loop through all the anchor documents and compare cosine similarities, or use something like (online) kNN.

Problem: I have a list of N embedding vectors, each of dimension d, containing floating-point data.

What would be an efficient data structure and algorithm that can do the following:

  1. Add a new vector with a unique ID to the list efficiently (at most logarithmic complexity).
  2. Given a query vector, retrieve the top k vectors from the list with the smallest Manhattan distance (L1 norm) to the query, efficiently (ideally at most logarithmic complexity).

Example:

[
 [1., 2., 3.],
 [5., 6., 8.],
 [-11., 2., 31.]
]

k = 2, query = [1.5, 2.5, 3.2]

results:

[
 [1., 2., 3.],
 [5., 6., 8.],
]
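For reference, the expected results above can be reproduced with a brute-force top-k L1 search in NumPy. This is the O(N·d) linear scan the question is trying to beat, not an efficient solution; `vectors`, `query`, and `k` are taken from the example:

```python
import numpy as np

# Anchor vectors from the example (N = 3, d = 3)
vectors = np.array([
    [1., 2., 3.],
    [5., 6., 8.],
    [-11., 2., 31.],
])

query = np.array([1.5, 2.5, 3.2])
k = 2

# Manhattan (L1) distance from the query to every vector: O(N * d)
dists = np.abs(vectors - query).sum(axis=1)

# Indices of the k smallest distances, in ascending order of distance
top_k = np.argsort(dists)[:k]

print(vectors[top_k])  # rows [1., 2., 3.] and [5., 6., 8.]
```

Here the distances work out to 1.2, 11.8, and 40.8, so the first two rows are returned.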
Zabir Al Nazi asked Sep 05 '25 03:09

1 Answer

I think Faiss is exactly what you are looking for. The GitHub page is here; if you are interested in the implementation details (these are pretty technical), see here, and the tutorial is here.

Phoenix answered Sep 07 '25 17:09