
Local RAG Chatbot with Rerank

An internal Retrieval-Augmented Generation (RAG) assistant that answers your questions using a local embedding model and LLM pulled via Ollama, plus a rerank model downloaded from Hugging Face and served with vLLM.

This version ships with a knowledge base of community-sourced cooking tips, but you can easily customize it with your own content.

Features

  • Fully local RAG setup (no cloud APIs required)
  • Uses Ollama-compatible models for both embedding and generation
  • Incorporates a Hugging Face rerank model served by vLLM
  • Built with Python, using the LangChain and FAISS libraries
  • Combines embedding and rerank models to improve knowledge-retrieval accuracy
  • Interactive terminal interface
  • Falls back to general knowledge if the answer isn't found in local data

Requirements

  • Python 3.11+
  • Ollama installed and running
  • An LLM and an embedding model pulled from Ollama (after pulling, update lines 21 and 25 of the code accordingly)
  • vLLM installed and running (update line 31 after vLLM is set up)
  • huggingface_hub installed
  • The rerank model 'bge-reranker-v2-m3' downloaded from huggingface.co with the huggingface-cli command (an example follows this list)
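
For example, the rerank model can be fetched like this (assuming a recent huggingface_hub; pass --local-dir if you want to control where the files land):

huggingface-cli download BAAI/bge-reranker-v2-m3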

Install Python library dependencies:

pip install langchain langchain-community langchain-ollama faiss-cpu requests

(The json module ships with the Python standard library and does not need to be installed.) Make sure the knowledge base file is in the same directory as "internal-rag-cookbot.py". Mine is named "cooking-tips-comments.txt"; both the name and the contents of the file can be changed.

How It Works

This project uses Retrieval-Augmented Generation (RAG), which combines an embedding vector database created from your content and a reranker that scores content relevance with a language model, to provide more accurate and contextual answers.

Embedding the Content

  • The text file (cooking-tips-comments.txt) is our knowledge base file.

  • The contents of the file are converted into numerical vectors using a local embedding model (the provided code uses 'snowflake-arctic-embed:335m' from Ollama).

  • These vectors capture the semantic meaning of the text — two similar tips will have similar embeddings.
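
The embedding step looks roughly like the sketch below, which assumes LangChain's Ollama integration; the chunk sizes are illustrative, not the project's exact values.

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_ollama import OllamaEmbeddings

# Load the knowledge base and split it into chunks for embedding.
docs = TextLoader("cooking-tips-comments.txt").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)

# Each chunk (and later, each question) is mapped to a fixed-length vector.
embeddings = OllamaEmbeddings(model="snowflake-arctic-embed:335m")
vector = embeddings.embed_query("brown the butter before mixing the dough")
print(len(vector))  # the dimensionality of the embedding space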

Storing in a Vector Database

  • The vectors are stored in FAISS, a fast, in-memory vector store that supports efficient similarity search.

  • This lets the system quickly find the most relevant chunks of text when a new question is asked.
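
Continuing the sketch above, building the in-memory FAISS index from the embedded chunks looks like this:

from langchain_community.vectorstores import FAISS

# FAISS embeds every chunk and keeps the resulting vectors in memory.
vectorstore = FAISS.from_documents(chunks, embeddings)

# Sanity check: find the stored chunk nearest to a sample query.
for doc in vectorstore.similarity_search("chocolate chip cookies", k=1):
    print(doc.page_content[:80])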

Retrieving Context from Embedding

  • When you ask a question, it is embedded into a vector on the spot, so it can be compared against the vectors in the knowledge base.

  • FAISS compares this vector to the ones in the vector database and returns the top matching text chunks. The specific phrase we use here is "top-k".

  • top-k is the number of top matching entries we retrieve from the index using a fast but rough similarity search. If k is too small, we might miss relevant information; if it's too large, we pull in too much unrelated content. The provided code uses k = 10, a good fit for our data size; when using a much larger knowledge base, a bigger k is preferred (some knowledge bases are so large that they use k = 100). A retrieval sketch follows this list.
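
A minimal sketch of the retrieval step, continuing from the vectorstore built above; k = 10 matches the value in the provided code.

# A fast, rough pass: 10 candidate chunks ordered by vector similarity.
retriever = vectorstore.as_retriever(search_kwargs={"k": 10})
question = "i am trying to make chocolate chip cookies"
candidates = retriever.invoke(question)
print(len(candidates))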

Rerank Top-K Content

  • These chunks are sent to a rerank API using 'bge-reranker-v2-m3' to reorder them by relevance.

  • Only the top result is used as context for generation.
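
A minimal sketch of the rerank call, assuming vLLM is serving bge-reranker-v2-m3 on localhost:8000 with its Jina/Cohere-style /v1/rerank endpoint; the URL and model name are placeholders that must match your own setup (line 31 of the code).

import requests

def rerank(query, documents):
    resp = requests.post(
        "http://localhost:8000/v1/rerank",
        json={"model": "BAAI/bge-reranker-v2-m3",
              "query": query,
              "documents": documents},
    )
    resp.raise_for_status()
    # Each result carries the original index and a relevance_score.
    results = sorted(resp.json()["results"],
                     key=lambda r: r["relevance_score"], reverse=True)
    return [documents[r["index"]] for r in results]

chunk_texts = [d.page_content for d in candidates]  # candidates from the retrieval sketch
best_chunk = rerank(question, chunk_texts)[0]  # only the top result becomes context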

Generate The Answer

  • The context is passed to the LLM (in the provided code we used 'gemma3:12b') along with our question.

  • The LLM uses this context to generate a more accurate, grounded, and helpful response.
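
A minimal sketch of the final step, assuming LangChain's ChatOllama wrapper and the best_chunk/question values from the previous sketches; the prompt wording here is illustrative, not the project's exact template.

from langchain_ollama import ChatOllama

llm = ChatOllama(model="gemma3:12b")
prompt = (
    "Answer the question using the context below. "
    "If the context is not relevant, answer from general knowledge.\n\n"
    f"Context: {best_chunk}\n\nQuestion: {question}"
)

# The model grounds its answer in the reranked chunk.
print(llm.invoke(prompt).content)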

Usage

Run the following command from the directory containing the Python file.

python internal-rag-cookbot.py

Type your question when the prompt "Your question: " appears.

Example use of the code:

<some-user-directory>:~$ python internal-rag-cookbot.py
    Internal RAG Q&A Bot    
Ask questions about cooking hacks and kitchen tips.
This assistant is powered by a local language model and a custom knowledge base built from community-sourced cooking advice.
Type 'exit' to quit.

Your question:

Here we enter our question:

Your question: i am trying to make chocolate chip cookies

The response is:

Top relevant chunk:
 Not mine, but my wife browns the butter before she adds it to chocolate chip cookie dough and they're the best freakin' cookies I've ever eaten! ...


Answer: That's great! My wife browns the butter before adding it to her chocolate chip cookie dough, and it makes a huge difference – they're amazing! You should try it! 

Your question: 

Now we can type "exit" to close this chatbot:

Your question: exit
Goodbye!
