
Search engine with TF-IDF in Python

Here is my code:

    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = [
        "this is first document",
        "this is second document",
        "this is third",
        "which document is first",
    ]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(corpus)
    X.toarray()

Now this is what I want to do:

When I search for "document", it should return documents (sentences) [1, 2, 4].

When I search for "first document", it should return document [1].

When I search for "second", it should return document [2].

I want to do this with TF-IDF (I can't use plain keyword search).

How can I do that?

jony asked Mar 23 '26 19:03


1 Answer

First of all, ask yourself: what does the TfidfVectorizer do? It transforms your documents into vectors. To search, transform your query into a vector as well, using the same fitted vectorizer (call transform, not fit_transform, so the query shares the vocabulary and weights learned from the corpus). Then compute the cosine similarity between the query vector and each document vector in your database. The document with the highest cosine similarity is the most relevant one (at least according to the vector space model), and a document with a similarity of zero shares no terms with the query. An example implementation is available here: https://towardsdatascience.com/tf-idf-for-document-ranking-from-scratch-in-python-on-real-world-dataset-796d339a4089
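As a minimal sketch of that approach applied to the corpus from the question: the `search` helper and its `top_k` parameter are hypothetical names made up for illustration, and dropping documents with a score of zero is an assumption about what "matching" should mean here. It uses scikit-learn's `cosine_similarity` from `sklearn.metrics.pairwise`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "this is first document",
    "this is second document",
    "this is third",
    "which document is first",
]

# Fit once on the corpus; the same fitted vectorizer is reused for queries.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)


def search(query, top_k=None):
    """Hypothetical helper: return 1-based document numbers ranked by
    cosine similarity to the query, skipping documents that score zero."""
    # transform (not fit_transform) keeps the vocabulary learned from the corpus
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, X).ravel()
    ranked = scores.argsort()[::-1]  # highest similarity first
    hits = [int(i) + 1 for i in ranked if scores[i] > 0]
    return hits[:top_k] if top_k else hits


print(search("document"))
print(search("first document", top_k=1))
print(search("second"))
```

With this corpus, "document" matches documents 1, 2, and 4, and "second" matches only document 2. For "first document", document 1 ranks highest, but documents 4 and 2 also get non-zero scores because TF-IDF ignores word order, so getting exactly [1] needs a cutoff such as `top_k=1` or a similarity threshold.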

teoML answered Mar 26 '26 07:03