Project Report

on

RAG-Powered Local Document Chatbot


Submitted for Internship/ Industrial training requirements

of

[Link] and [Link] (Dual Degree)

in

Mathematics And Data Science

Submitted by

Nishant Bharati

Scholar No. 2340401103

Project work completed


as part of the

IBM SkillBuild AI Agent Architect Program

Department of Mathematics, Bioinformatics and Computer Applications

Maulana Azad National Institute of Technology Bhopal-462003 (India)


TABLE OF CONTENTS

ABSTRACT

CONCEPT DEFINITIONS (Glossary)

SECTION 1: INTRODUCTION
➢ 1.1. The Problem: Information Overload & Privacy
➢ 1.2. The Solution: Local-First RAG
➢ 1.3. Project Objectives

SECTION 2: SYSTEM ARCHITECTURE & TECHNOLOGY STACK
➢ 2.1. System Architecture
➢ 2.2. Technology Stack

SECTION 3: IMPLEMENTATION
➢ 3.1. Project Structure
➢ 3.2. Core Functionality 1: Persistent Workspaces (SQLite)
➢ 3.3. Core Functionality 2: The RAG Indexing Pipeline
➢ 3.4. Core Functionality 3: The "Live" Persona Chat Logic

SECTION 4: RESULTS & CONCLUSION
➢ 4.1. Results
➢ 4.2. Conclusion
➢ 4.3. Future Work


ABSTRACT

The exponential growth of digital information, siloed in diverse formats
like PDFs, DOCX files, and web pages, presents a significant challenge
for efficient information retrieval. Traditional search methods are often
insufficient, and Large Language Models (LLMs) are constrained by
limited context windows.

This project implements a "Chat with your Documents" application to
solve this problem. It is a comprehensive, multimodal conversational AI
built on a Retrieval-Augmented Generation (RAG) architecture. The
system ingests user-provided documents, processes them into a
searchable vector index, and stores this index in a persistent,
project-based workspace managed by an SQLite database.

The application is built using Python, with Streamlit for the web interface
and LangChain for orchestrating the RAG pipeline. To ensure complete
privacy, cost-free operation, and offline capability, the entire stack runs
locally. Text embedding is handled by a Hugging Face model
(all-MiniLM-L6-v2), and the core conversational reasoning is powered by a
locally-hosted LLM (e.g., Llama 3) served via Ollama.

Key features include a project-based system for saving and loading
knowledge bases, "live" persona selection for a personalized user
experience, multimodal input (text, images, and audio), and robust
export options (PDF, DOCX, PPTX). The result is a scalable,
responsive, and fully private tool that transforms static documents into
an interactive, conversational knowledge base.
CONCEPT DEFINITIONS (Glossary)
AI: Artificial Intelligence.

LLM: Large Language Model. The "brain" of the chatbot (e.g., Llama 3, Mistral).

Ollama: A tool for running, serving, and managing open-source LLMs (like Llama 3)
locally on a user's machine.

RAG: Retrieval-Augmented Generation. The process of retrieving relevant data
before asking an LLM to generate an answer.

Embeddings: Numerical representations (vectors) of text, images, or audio, allowing
them to be searched based on meaning.

Vector Store: A specialized database (e.g., FAISS) designed to store and search
embeddings efficiently.

LangChain: A Python framework used to "chain" together components (LLM, vector
store, prompts) to build a complete application.

FAISS: Facebook AI Similarity Search. A high-performance library for similarity
search over vectors, used here as the local vector store.

Hugging Face: A platform providing open-source ML models. We use it for the local
embedding model.

Streamlit: A Python library used to build and deploy the web-based user interface
(UI) for the app.

SQLite: A lightweight, serverless, disk-based database used to store project
metadata.

Persona: A "system instruction" that defines the AI's tone, style, and goal (e.g.,
"Tutor," "Exam Ready").

Multimodal: The ability to process and understand multiple types of information,
such as text, images, and audio.
SECTION 1: INTRODUCTION
1.1. The Problem: Information Overload & Privacy

In the modern digital landscape, individuals and organizations face a constant deluge of
information. This data is often unstructured and locked in various formats. Manually
extracting specific information is tedious and prone to error.
Furthermore, while cloud-based Large Language Models (LLMs) are powerful, they present
two critical issues:
● Context & Cost: They have limited "context windows" and cannot read thousands of
pages at once. Sending large documents via API is slow and can be expensive.
● Privacy & Security: Using a commercial API requires sending private, sensitive, or
proprietary data to a third-party server, which is not acceptable for many academic,
corporate, or personal use cases.

1.2. The Solution: Local-First RAG


This project solves these problems using a 100% local-first Retrieval-Augmented Generation
(RAG) architecture. The application runs entirely on the user's machine, ensuring that no
data ever leaves their computer.

Indexing: The application reads all documents, splits them into small chunks, and converts
each chunk into a numerical "embedding" using a local Hugging Face model. These
embeddings are stored in a local vector database (FAISS).

Retrieval: When the user asks a question, the app converts the question into an embedding
and uses it to search the local database, retrieving only the top-K most relevant chunks of
text.

Generation: The app feeds this small, relevant context to a local LLM (e.g., Llama 3) served
by Ollama.

This local-first method allows the app to "chat" with thousands of pages of documents at
no API cost, without network round-trips, and without any data leaving the machine, even
while offline.
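
The retrieval and generation steps described above can be illustrated with a minimal sketch. It is not the project's code: the embed function here is a toy bag-of-words stand-in for the Hugging Face embedding model, the chunk list stands in for the FAISS index, and all function names are hypothetical. It only shows the shape of the flow: embed the question, rank chunks by similarity, keep the top K, and place them in the prompt.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieval step: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Generation step input: only the retrieved context plus the question."""
    context = "\n".join(context_chunks)
    return f"Context: {context}\nQuestion: {question}"


# Hypothetical chunks standing in for an indexed document collection.
chunks = [
    "FAISS stores embeddings for similarity search.",
    "Streamlit renders the web interface.",
    "Ollama serves a local LLM such as Llama 3.",
]
question = "Which tool serves the local LLM?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top)
```

In the real pipeline the prompt would be passed to the local LLM; a learned embedding model also matches on meaning rather than on shared words, which this toy version cannot do.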

1.3. Project Objectives

The primary objective is to build a robust, private, and offline-capable conversational AI that
serves as a personal knowledge assistant.

The key goals are:

Implement a 100% local RAG pipeline, using Ollama for chat and Hugging Face for
embeddings, to ensure user privacy and eliminate all API costs and rate limits.

Develop a persistent, project-based system using SQLite, allowing users to create, save,
and reload their document collections.

Incorporate "live" persona switching to allow users to dynamically change the AI's output
style (e.g., from "Tutor" to "Exam Ready").

Support multimodal RAG by integrating local vision models (e.g., llava) via Ollama.

Provide robust export options (PDF, DOCX, PPTX) to integrate the AI's answers into a
user's workflow.
SECTION 2: SYSTEM ARCHITECTURE & TECHNOLOGY STACK

2.1. System Architecture

2.2. Technology Stack
SECTION 3: IMPLEMENTATION
3.1. Project Structure

The application is a self-contained system. All data and models (except for the initial
Ollama setup) are stored within the project directory, ensuring portability and privacy.

/your-project-folder/
├── [Link]                # The main Streamlit application
├── [Link]                # All Python dependencies
├── [Link]                # SQLite database for project management
└── projects/             # Directory for all saved project data
    ├── my_project_1/
    │   ├── faiss_index/
    │   │   ├── [Link]    # The RAG vector index
    │   │   └── [Link]    # The FAISS index metadata
    │   └── an_image.png  # A copy of an uploaded image
    └── my_project_2/
        └── ...

3.2. Core Functionality 1: Persistent Workspaces (SQLite)


Persistence is achieved via a [Link] SQLite file. This file contains a single table,
projects, which acts as the master record.

projects table schema:

name (TEXT): The unique project name (e.g., "AI Course Notes").

index_path (TEXT): The file path to its faiss_index folder (e.g.,
projects/ai_course_notes/faiss_index).

image_paths (TEXT): A JSON string list of paths to any associated images.

persona (TEXT): The string name of the persona saved with the project (e.g.,
"Tutor").

The "Load Project" button queries this database, loads the FAISS index from the
index_path into memory (st.session_state.retriever), and sets the app's state.
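
The schema above could be created and queried roughly as follows. This is a sketch under stated assumptions, not the application's actual code: the function names are hypothetical, the PRIMARY KEY and NOT NULL constraints are one way to enforce the "unique project name" requirement, and an in-memory database is used so the example runs without touching disk.

```python
import json
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    """Create the single projects table if it does not exist yet."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS projects (
            name        TEXT PRIMARY KEY,   -- assumption: uniqueness via primary key
            index_path  TEXT NOT NULL,
            image_paths TEXT NOT NULL DEFAULT '[]',
            persona     TEXT
        )
        """
    )


def save_project(conn, name, index_path, image_paths, persona) -> None:
    """Insert or overwrite a project row; image paths are stored as a JSON string."""
    conn.execute(
        "INSERT OR REPLACE INTO projects (name, index_path, image_paths, persona) "
        "VALUES (?, ?, ?, ?)",
        (name, index_path, json.dumps(image_paths), persona),
    )
    conn.commit()


def load_project(conn, name):
    """Return the project as a dict, or None if no project has that name."""
    row = conn.execute(
        "SELECT name, index_path, image_paths, persona FROM projects WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return None
    return {
        "name": row[0],
        "index_path": row[1],
        "image_paths": json.loads(row[2]),
        "persona": row[3],
    }


conn = sqlite3.connect(":memory:")  # the app would open its on-disk database file instead
init_db(conn)
save_project(
    conn,
    "AI Course Notes",
    "projects/ai_course_notes/faiss_index",
    ["projects/ai_course_notes/an_image.png"],  # hypothetical image path
    "Tutor",
)
project = load_project(conn, "AI Course Notes")
```

After a lookup like this, the "Load Project" handler would pass project["index_path"] to the FAISS loader and restore project["persona"] in the sidebar.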
3.3. Core Functionality 2: The RAG Indexing Pipeline


The build_vector_index function is the core of the RAG pipeline.

Text Extraction: It first loops through all uploaded files and the URL, using
pdfplumber, docx, and BeautifulSoup to extract raw text.

Splitting: It uses LangChain's RecursiveCharacterTextSplitter with chunk_size=1000
and chunk_overlap=200.

Embedding: It initializes the local
HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2") model. This runs on the user's
CPU/GPU, ensuring data never leaves the machine.

Indexing: It calls FAISS.from_documents(all_chunks, embeddings) to create the
in-memory vector store.

Saving: Finally, it saves the index to disk using vector_index.save_local(index_path).
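
To make the chunk_size and chunk_overlap parameters concrete, the sketch below splits text with a plain fixed-size sliding window. This is a deliberate simplification and not how RecursiveCharacterTextSplitter works internally: the LangChain splitter prefers to break on separators such as paragraph and line breaks, so its chunk boundaries will differ. The function name split_text is hypothetical.

```python
def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new chunk starts 800 characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# A 2,500-character dummy document.
text = "".join(str(i % 10) for i in range(2500))
chunks = split_text(text)
lengths = [len(c) for c in chunks]
```

The overlap means the last 200 characters of one chunk are repeated at the start of the next, so a sentence that straddles a boundary still appears whole in at least one chunk.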

3.4. Core Functionality 3: The "Live" Persona & Chat Logic


To solve the "stale persona" problem (where changing the persona in the dropdown
had no effect), the application does not store the full, pre-built LangChain RAG
chain. Instead, it stores the building blocks: st.session_state.llm and
st.session_state.retriever.

The RAG chain is rebuilt on-the-fly with every single query. This allows the persona
to be "live":

Python

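# Import paths are assumptions matching LangChain 0.1-0.3; adjust for your version.
import streamlit as st
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# This logic runs on EVERY user question:
if prompt_parts:
    ...
    # 1. Get the CURRENTLY selected persona from the sidebar
    current_persona_prompt = personae[st.session_state.system_instruction]

    # 2. Build a new prompt template with this "live" persona.
    #    Only the first literal is an f-string, so {context} and {input}
    #    stay as template placeholders for LangChain to fill in.
    prompt_template = ChatPromptTemplate.from_template(
        f"{current_persona_prompt}\n\n"
        "Context: {context}\nQuestion: {input}"
    )

    # 3. Build a new, fresh RAG chain
    document_chain = create_stuff_documents_chain(
        st.session_state.llm, prompt_template
    )
    retrieval_chain = create_retrieval_chain(
        st.session_state.retriever, document_chain
    )

    # 4. Run the chain and stream the response
    response_stream = retrieval_chain.stream({"input": text_for_rag})
    ...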
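
The effect of rebuilding on every query can be shown without LangChain or Streamlit. In the sketch below, ChatState is a hypothetical stand-in for st.session_state and the persona texts are invented for illustration; the point is only that the prompt is derived from the currently selected persona at call time, so a change in the dropdown affects the very next question.

```python
# Hypothetical persona texts; the real application's wording is not shown in the report.
PERSONAE = {
    "Tutor": "You are a patient tutor. Explain simply and at length.",
    "Exam Ready": "Answer in short bullet points suitable for revision.",
}


class ChatState:
    """Minimal stand-in for st.session_state: holds only the selected persona name."""

    def __init__(self, system_instruction: str) -> None:
        self.system_instruction = system_instruction


def build_prompt(state: ChatState, context: str, question: str) -> str:
    """Rebuild the prompt from the currently selected persona on every call."""
    persona_prompt = PERSONAE[state.system_instruction]
    return f"{persona_prompt}\n\nContext: {context}\nQuestion: {question}"


state = ChatState("Tutor")
first = build_prompt(state, "FAISS is a vector index.", "What is FAISS?")

state.system_instruction = "Exam Ready"  # the user changes the dropdown
second = build_prompt(state, "FAISS is a vector index.", "What is FAISS?")
```

Had the prompt been built once and cached, second would still carry the "Tutor" instruction, which is the "stale persona" behavior described above.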
SECTION 4: RESULTS & CONCLUSION

4.1. Results

The project was evaluated on its qualitative performance, scalability, and privacy.

Privacy & Cost (Local-First): The primary goal was achieved. Once the models have
been downloaded, the app runs fully offline, with zero API calls for embeddings or
chat. For a single user, usage is limited only by local hardware, with no API costs
or rate limits, and document data never leaves the machine.
Accuracy (RAG Tuning): Initial tests with large chunks (chunk_size=4000) yielded
short answers. By tuning the parameters to chunk_size=1000 and retrieving more
chunks (k=8), the "Detailed Explainer" persona successfully generated long,
comprehensive, and accurate answers, as intended.
Flexibility (Live Personas): The "on-the-fly" chain construction was a success. The
user can switch from "Exam Ready" (getting short, bulleted answers) to "Tutor"
(getting long, simple explanations) for the very next question, all while using the
same underlying data.
Usability (UI/UX): The WhatsApp-style dark mode, streaming responses, "Read
Aloud," and audio inputs provide a modern, professional user experience. The
persistent project system and export features successfully transition the app from a
simple demo to a genuine productivity tool.
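
As a rough check on the RAG tuning result above, the tuned settings bound how much retrieved text reaches the LLM per question. The sketch below assumes chunk_size is measured in characters and uses a common rule of thumb of about 4 characters per token for English text; both helper names and the heuristic are assumptions, and actual token counts depend on the model's tokenizer.

```python
def max_context_chars(chunk_size: int, k: int) -> int:
    """Upper bound on retrieved context: k chunks of at most chunk_size characters."""
    return chunk_size * k


def approx_tokens(chars: int, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the 4 chars/token ratio is a heuristic, not exact."""
    return round(chars / chars_per_token)


tuned_chars = max_context_chars(chunk_size=1000, k=8)
tuned_tokens = approx_tokens(tuned_chars)
```

Under these assumptions the tuned configuration passes at most 8,000 characters, on the order of 2,000 tokens, of context per query, spread across eight distinct passages.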

4.2. Conclusion
This project successfully demonstrates the power and feasibility of a 100% local-first
RAG architecture. By combining the open-source power of Ollama, LangChain,
Hugging Face, and FAISS, we have created a scalable, persistent, and multimodal
conversational AI that runs entirely on a personal computer.
The application overcomes the core limitations of cloud-based LLMs—namely cost,
rate-limiting, and privacy. It provides a user-friendly interface for anyone to build and
chat with their own specialized, private knowledge base.

4.3. Future Work


While the project is a comprehensive success, the following steps could enhance it
further:
Advanced Multimodal RAG: Currently, images are stored and passed to the LLM.
A more advanced system would use a vision embedding model (like CLIP) to make
the images themselves searchable, allowing a user to ask, "Show me all the
diagrams in my documents."
RAG-on-Audio: The current audio input is transcribed. A future version could
"index" the audio itself, allowing a user to search through hours of lecture recordings.
Agentic Behavior: The system could be expanded using LangGraph to perform
multi-step tasks, such as reading a document, searching the web for new
information, and then combining both to write a summary.