 

LangChain Chroma - load data from Vector Database

I have written LangChain code that uses Chroma as a vector store for data scraped from a website URL. It currently fetches the data from the URL, stores it in the project folder, and uses that data to respond to a user prompt. I figured out how to make the data persist after the run, but I can't figure out how to load it again for future prompts. The goal is that when user input is received, the program uses the OpenAI LLM to generate a response from the existing database files, instead of creating and writing those files on every run. How can this be done?

I tried the following, since it seemed like the ideal solution:

vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", vectorstore=vectordb)

But from_chain_type() doesn't accept a vectorstore argument, so this doesn't work.

max choate asked Sep 07 '25 09:09



2 Answers

You need to define a retriever and pass it to the chain. The chain will then use your previously persisted DB to answer queries.

vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)

retriever = vectordb.as_retriever()

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)
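To tie this back to the goal in the question (answering a user prompt from the existing database files), here is a minimal sketch of the whole flow. It assumes the classic `langchain` import paths and the `run` method on the chain; newer releases move these classes into separate packages and may prefer a different call style, so adjust for your installed version. The helper names `build_qa_chain` and `ask`, the `k` value, and the `"./db"` path are hypothetical, chosen for illustration only.

```python
def build_qa_chain(persist_directory: str):
    """Rebuild a RetrievalQA chain on top of an already persisted Chroma DB."""
    # Imports are deferred so this module can be imported without LangChain.
    # Assumption: classic `langchain` module layout; paths differ in newer releases.
    from langchain.chains import RetrievalQA
    from langchain.embeddings import OpenAIEmbeddings
    from langchain.llms import OpenAI
    from langchain.vectorstores import Chroma

    # Use the same embedding model that was used when the DB was created.
    embeddings = OpenAIEmbeddings()
    vectordb = Chroma(
        persist_directory=persist_directory,
        embedding_function=embeddings,
    )
    # The chain takes a retriever, not the vector store itself.
    retriever = vectordb.as_retriever(search_kwargs={"k": 4})
    return RetrievalQA.from_chain_type(
        llm=OpenAI(temperature=0),
        chain_type="stuff",
        retriever=retriever,
    )


def ask(chain, question: str) -> str:
    """Send one user prompt to the chain, rejecting empty input."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    # Assumption: the chain exposes `run`, as classic LangChain chains do.
    return chain.run(question)


# Hypothetical usage (needs LangChain, Chroma, and OPENAI_API_KEY set):
#   qa = build_qa_chain("./db")
#   print(ask(qa, "What does the page say about pricing?"))
```

Nothing is re-embedded or rewritten on startup here: the chain only reads from the files already in the persist directory.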

Andrew answered Sep 10 '25 03:09



All the answers I have seen are missing one crucial step: calling persist() on the DB. As a complete solution, you need to perform the following steps.

The first time, create the DB and persist it using the lines below.

vectordb = Chroma.from_documents(data, embedding=embeddings, persist_directory=persist_directory)
vectordb.persist()

On later runs, the DB can then be loaded using the line below.

vectordb = Chroma(persist_directory=persist_directory, embedding_function=embeddings)
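The two steps above can be combined into one "load if present, otherwise create" helper, so the program only builds the database on the first run. This is a sketch under assumptions: the helper names `has_persisted_db` and `load_or_create_vectordb` and the `load_documents` callback are hypothetical, and checking for a non-empty directory is only a simple heuristic for "a DB was persisted here". The `persist()` call is guarded because some newer Chroma integrations persist automatically and may not expose that method.

```python
import os


def has_persisted_db(persist_directory: str) -> bool:
    """Heuristic: a persisted DB exists if the directory exists and is non-empty."""
    return os.path.isdir(persist_directory) and bool(os.listdir(persist_directory))


def load_or_create_vectordb(persist_directory, embeddings, load_documents):
    """Load the Chroma DB from disk, or build and persist it on the first run.

    `load_documents` is a hypothetical zero-argument callable that returns the
    documents to index (for example, the pages fetched from the website URL).
    It is only called when no persisted DB is found.
    """
    # Deferred import; assumption: classic `langchain` module layout.
    from langchain.vectorstores import Chroma

    if has_persisted_db(persist_directory):
        # Later runs: reuse the existing files, no re-embedding.
        return Chroma(
            persist_directory=persist_directory,
            embedding_function=embeddings,
        )

    # First run: embed the documents and write them to disk.
    vectordb = Chroma.from_documents(
        load_documents(),
        embedding=embeddings,
        persist_directory=persist_directory,
    )
    # Guarded because not every version exposes persist().
    if hasattr(vectordb, "persist"):
        vectordb.persist()
    return vectordb
```

Passing the document loader as a callable means the website is not fetched at all when the database already exists.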
Gautam Chauhan answered Sep 10 '25 03:09
