Project Report

RAG-Powered Local Document Chatbot

Submitted for Internship/Industrial Training requirements

[Link] and [Link] (Dual Degree)

Mathematics and Data Science

Submitted by

Nishant Bharati

Scholar No. 2340401103

Project work completed as part of the IBM SkillsBuild AI Agent Architect Program

Department of Mathematics, Bioinformatics and Computer Applications

Maulana Azad National Institute of Technology Bhopal-462003 (India)

TABLE OF CONTENTS

ABSTRACT
CONCEPT DEFINITIONS (Glossary)
SECTION 1: INTRODUCTION
➢ 1.1. The Problem: Information Overload & Privacy
➢ 1.2. The Solution: Local-First RAG
➢ 1.3. Project Objectives
SECTION 2: SYSTEM ARCHITECTURE & TECHNOLOGY STACK
➢ 2.1. System Architecture
➢ 2.2. Technology Stack
SECTION 3: IMPLEMENTATION
➢ 3.1. Project Structure
➢ 3.2. Core Functionality 1: Persistent Workspaces (SQLite)
➢ 3.3. Core Functionality 2: The RAG Indexing Pipeline
➢ 3.4. Core Functionality 3: The "Live" Persona & Chat Logic
SECTION 4: RESULTS & CONCLUSION
➢ 4.1. Results
➢ 4.2. Conclusion
➢ 4.3. Future Work

ABSTRACT

The exponential growth of digital information, siloed in diverse formats such as PDFs, DOCX files, and web pages, presents a significant challenge for efficient information retrieval. Traditional search methods are often insufficient, and Large Language Models (LLMs) are constrained by limited context windows.

This project implements a "Chat with your Documents" application to solve this problem. It is a comprehensive, multimodal conversational AI built on a Retrieval-Augmented Generation (RAG) architecture. The system ingests user-provided documents, processes them into a searchable vector index, and stores this index in a persistent, project-based workspace managed by an SQLite database.

The application is built in Python, with Streamlit for the web interface and LangChain for orchestrating the RAG pipeline. To ensure privacy, cost-free operation, and offline capability, the entire stack runs locally. Text embedding is handled by a Hugging Face model (all-MiniLM-L6-v2), and the core conversational reasoning is powered by a locally hosted LLM (e.g., Llama 3) served via Ollama.

Key features include a project-based system for saving and loading knowledge bases, "live" persona selection for a personalized user experience, multimodal input (text, images, and audio), and export options (PDF, DOCX, PPTX). The result is a responsive, fully private tool that transforms static documents into an interactive, conversational knowledge base.

CONCEPT DEFINITIONS (Glossary)

AI: Artificial Intelligence.

LLM: Large Language Model. The "brain" of the chatbot (e.g., Llama 3, Mistral).

Ollama: A tool for running, serving, and managing open-source LLMs (like Llama 3) locally on a user's machine.

RAG: Retrieval-Augmented Generation. The process of retrieving relevant data before asking an LLM to generate an answer.

Embeddings: Numerical representations (vectors) of text, images, or audio that allow them to be searched by meaning.

Vector Store: A specialized database (e.g., FAISS) designed to store and search embeddings efficiently.

LangChain: A Python framework used to "chain" together components (LLM, vector store, prompts) into a complete application.

FAISS: Facebook AI Similarity Search. A high-performance library for similarity search over dense vectors, used here as a local vector store.

Hugging Face: A platform providing open-source ML models. This project uses it for the local embedding model.

Streamlit: A Python library used to build and deploy the web-based user interface (UI) for the app.

SQLite: A lightweight, serverless, disk-based database used to store project metadata.

Persona: A "system instruction" that defines the AI's tone, style, and goal (e.g., "Tutor," "Exam Ready").

Multimodal: The ability to process and understand multiple types of information, such as text, images, and audio.

SECTION 1: INTRODUCTION
1.1. The Problem: Information Overload & Privacy

In the modern digital landscape, individuals and organizations face a constant deluge of information. This data is often unstructured and locked in formats such as PDF, DOCX, and web pages. Manually extracting specific information is tedious and error-prone.

Furthermore, while cloud-based Large Language Models (LLMs) are powerful, they present two critical issues:
● Context & Cost: They have limited "context windows" and cannot read thousands of
pages at once. Sending large documents via API is slow and can be expensive.
● Privacy & Security: Using a commercial API requires sending private, sensitive, or
proprietary data to a third-party server, which is not acceptable for many academic,
corporate, or personal use cases.

1.2. The Solution: Local-First RAG

This project solves these problems using a 100% local-first Retrieval-Augmented Generation
(RAG) architecture. The application runs entirely on the user's machine, ensuring that no
data ever leaves their computer.

Indexing: The application reads all documents, splits them into small chunks, and converts
each chunk into a numerical "embedding" using a local Hugging Face model. These
embeddings are stored in a local vector database (FAISS).

Retrieval: When the user asks a question, the app converts the question into an embedding
and uses it to search the local database, retrieving only the top-K most relevant chunks of
text.

Generation: The app feeds this small, relevant context to a local LLM (e.g., Llama 3) served
by Ollama.
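The Retrieval step can be illustrated with a minimal, dependency-free sketch of top-k search by cosine similarity. The function names and the toy 3-dimensional vectors are hypothetical stand-ins: real all-MiniLM-L6-v2 embeddings have 384 dimensions, and FAISS performs this search far more efficiently. LangChain's FAISS wrapper ranks by L2 distance by default, which yields the same ordering as cosine similarity when the vectors are normalized to unit length.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec: list[float], chunk_vecs: list[list[float]],
                 chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    scored = [(cosine_similarity(query_vec, vec), chunk) for vec, chunk in zip(chunk_vecs, chunks)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [chunk for _, chunk in scored[:k]]

# Toy "embeddings" (hypothetical values; a real model would produce these).
chunks = ["FAISS stores vectors", "Streamlit builds the UI", "Ollama serves the LLM"]
chunk_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
query_vec = [1.0, 0.0, 0.2]

print(top_k_chunks(query_vec, chunk_vecs, chunks, k=2))
# ['FAISS stores vectors', 'Ollama serves the LLM']
```

Only the chunks returned here would be placed into the LLM prompt, which is what keeps the context small regardless of how many pages were indexed.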

This local-first method allows the app to "chat" with thousands of pages of documents without API costs, without network round-trips to a remote service, and without sending document data to any third party, even while offline.

1.3. Project Objectives

The primary objective is to build a robust, private, and offline-capable conversational AI that
serves as a personal knowledge assistant.

The key goals are:

● Implement a 100% local RAG pipeline, using Ollama for chat and Hugging Face for embeddings, to ensure user privacy and eliminate all API costs and rate limits.
● Develop a persistent, project-based system using SQLite that lets users create, save, and reload their document collections.
● Incorporate "live" persona switching so that users can change the AI's output style on the fly (e.g., from "Tutor" to "Exam Ready").
● Support multimodal input by integrating local vision models (e.g., llava) via Ollama.
● Provide export options (PDF, DOCX, PPTX) to integrate the AI's answers into a user's workflow.

SECTION 2: SYSTEM ARCHITECTURE & TECHNOLOGY STACK

2.1. System Architecture

2.2. Technology Stack

SECTION 3: IMPLEMENTATION
3.1. Project Structure

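The application is a self-contained system. All project data (the SQLite database, the FAISS indexes, and copies of uploaded images) is stored within the project directory, ensuring portability and privacy. Model weights are managed outside it: by Ollama for the LLM and, unless a cache folder is configured, by the Hugging Face cache for the embedding model.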

/your-project-folder/
│
├── [Link] # The main Streamlit application
├── [Link] # All Python dependencies
├── [Link] # SQLite database for project management
│
└── projects/ # Directory for all saved project data
│
├── my_project_1/
│ ├── faiss_index/
│ │ ├── [Link] # The RAG vector index
│ │ └── [Link] # The FAISS index metadata
│ │
│ └── an_image.png # A copy of an uploaded image
│
└── my_project_2/
└── ...
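
The mapping from a project's display name to its folder (for example, "AI Course Notes" to projects/ai_course_notes/faiss_index, as in the schema example in Section 3.2) can be sketched as follows. The slug rule used here (lowercase, with runs of non-alphanumeric characters replaced by underscores) is an assumption inferred from that single example, and the helper names are hypothetical.

```python
import re
from pathlib import Path

PROJECTS_ROOT = Path("projects")  # top-level folder shown in the tree above

def project_slug(name: str) -> str:
    """Turn a display name like 'AI Course Notes' into a folder name like 'ai_course_notes'."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    if not slug:
        raise ValueError(f"cannot derive a folder name from {name!r}")
    return slug

def project_index_path(name: str) -> Path:
    """Folder where a project's FAISS index would be saved."""
    return PROJECTS_ROOT / project_slug(name) / "faiss_index"

print(project_index_path("AI Course Notes").as_posix())  # projects/ai_course_notes/faiss_index
```

Deriving the folder from the name keeps each project's files isolated, while the SQLite table remains the single record of which folders belong to which project.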

3.2. Core Functionality 1: Persistent Workspaces (SQLite)

Persistence is achieved via a [Link] SQLite file. This file contains a single table,
projects, which acts as the master record.

projects table schema:

name (TEXT): The unique project name (e.g., "AI Course Notes").

index_path (TEXT): The file path to its faiss_index folder (e.g., projects/ai_course_notes/faiss_index).

image_paths (TEXT): A JSON-encoded list of paths to any associated images.

persona (TEXT): The name of the persona saved with the project (e.g., "Tutor").

The "Load Project" button queries this database, loads the FAISS index stored at index_path into memory as a retriever (st.session_state.retriever), and sets the app's state (for example, the persona saved with the project).
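
The projects table can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the app's actual code: the function names are hypothetical, making name the PRIMARY KEY is an assumption that enforces the "unique project name" rule, and INSERT OR REPLACE is an assumed choice for overwriting a project saved under an existing name.

```python
import json
import sqlite3

def init_db(conn: sqlite3.Connection) -> None:
    """Create the projects table described above if it does not exist yet."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS projects (
               name        TEXT PRIMARY KEY,  -- unique project name
               index_path  TEXT NOT NULL,     -- folder holding the FAISS index
               image_paths TEXT NOT NULL,     -- JSON-encoded list of image paths
               persona     TEXT NOT NULL      -- persona saved with the project
           )"""
    )

def save_project(conn: sqlite3.Connection, name: str, index_path: str,
                 image_paths: list[str], persona: str) -> None:
    """Insert a project, or overwrite it if a project with this name already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO projects (name, index_path, image_paths, persona) VALUES (?, ?, ?, ?)",
        (name, index_path, json.dumps(image_paths), persona),
    )
    conn.commit()

def load_project(conn: sqlite3.Connection, name: str):
    """Return the stored record as a dict, or None if no such project exists."""
    row = conn.execute(
        "SELECT index_path, image_paths, persona FROM projects WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    index_path, image_paths, persona = row
    return {"index_path": index_path, "image_paths": json.loads(image_paths), "persona": persona}

conn = sqlite3.connect(":memory:")  # a real app would pass the database file path instead
init_db(conn)
save_project(conn, "AI Course Notes", "projects/ai_course_notes/faiss_index",
             ["projects/ai_course_notes/an_image.png"], "Tutor")
print(load_project(conn, "AI Course Notes")["persona"])  # Tutor
```

Parameterized queries (the ? placeholders) keep user-typed project names from being interpreted as SQL.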

3.3. Core Functionality 2: The RAG Indexing Pipeline

The build_vector_index function is the core of the RAG pipeline.

Text Extraction: It first loops through all uploaded files and the supplied URL, using pdfplumber (PDF), docx (python-docx, for DOCX files), and BeautifulSoup (web pages) to extract raw text.

Splitting: It uses LangChain's RecursiveCharacterTextSplitter with chunk_size=1000 and chunk_overlap=200 (both measured in characters by default).

Embedding: It initializes the local HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2") model. This runs on the user's CPU/GPU, ensuring that document text never leaves the machine.

Indexing: It calls FAISS.from_documents(all_chunks, embeddings) to create the in-memory vector store.

Saving: Finally, it saves the index to disk using vector_index.save_local(index_path).
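
To make the splitting parameters concrete, here is a simplified fixed-window splitter. It is deliberately not LangChain's algorithm: RecursiveCharacterTextSplitter first tries to break on paragraphs, lines, and words before falling back to characters, so its boundaries differ. The function name is hypothetical; the sketch only shows what chunk_size=1000 and chunk_overlap=200 mean.

```python
def split_fixed_windows(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # with the defaults, 1000 - 200 = 800
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks

sample = "".join(chr(ord("a") + i % 26) for i in range(2600))  # 2,600 characters of filler
pieces = split_fixed_windows(sample)
print([len(p) for p in pieces])  # [1000, 1000, 1000]
```

The 200-character overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, so it can still be retrieved.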

3.4. Core Functionality 3: The "Live" Persona & Chat Logic

To solve the "stale persona" problem, where changing the persona in the dropdown had no effect because the persona prompt was fixed at the moment the chain was built, the application does not store a full, pre-built LangChain RAG chain. Instead, it stores only the building blocks: st.session_state.llm and st.session_state.retriever.

The RAG chain is rebuilt on the fly for every query, so the persona is always read from the current sidebar selection. This makes the persona "live":

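```python
import streamlit as st
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# This logic runs on EVERY user question.
# personae (persona name -> instruction text), prompt_parts, and text_for_rag
# are defined elsewhere in the app and are elided here.
if prompt_parts:
    ...
    # 1. Get the CURRENTLY selected persona from the sidebar
    current_persona_prompt = personae[st.session_state.system_instruction]

    # 2. Build a new prompt template with this "live" persona.
    # Only the first string is an f-string, so {context} and {input} remain
    # template variables for LangChain (persona text must not contain { or }).
    prompt_template = ChatPromptTemplate.from_template(
        f"{current_persona_prompt}\n\n"
        "Context: {context}\nQuestion: {input}"
    )

    # 3. Build a new, fresh RAG chain from the stored building blocks
    document_chain = create_stuff_documents_chain(st.session_state.llm, prompt_template)
    retrieval_chain = create_retrieval_chain(st.session_state.retriever, document_chain)

    # 4. Run the chain; streamed chunks are dicts, and the generated
    # text arrives under the "answer" key
    response_stream = retrieval_chain.stream({"input": text_for_rag})
    ...
```
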
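The difference between the stale and live designs does not depend on LangChain, so it can be shown in plain Python. In this hypothetical sketch, Session stands in for st.session_state and the persona texts are invented placeholders: the "stale" builder copies the persona text once when it is created, while the "live" builder looks it up again on every call.

```python
# Hypothetical persona texts; the app's real personae dictionary is not shown in this report.
PERSONAE = {
    "Tutor": "Explain simply, step by step.",
    "Exam Ready": "Answer in short bullet points.",
}
TEMPLATE = "{persona}\n\nContext: {context}\nQuestion: {question}"

class Session:
    """Stands in for st.session_state: holds the currently selected persona name."""
    def __init__(self, persona: str) -> None:
        self.persona = persona

def build_prompt_once(session: Session):
    """Stale pattern: the persona text is captured when this builder is created."""
    persona_text = PERSONAE[session.persona]
    return lambda context, question: TEMPLATE.format(
        persona=persona_text, context=context, question=question)

def build_prompt_live(session: Session, context: str, question: str) -> str:
    """Live pattern: the persona is looked up again for every question."""
    return TEMPLATE.format(persona=PERSONAE[session.persona], context=context, question=question)

session = Session("Tutor")
stale = build_prompt_once(session)
session.persona = "Exam Ready"  # the user changes the dropdown
print(stale("ctx", "q").splitlines()[0])                        # Explain simply, step by step.
print(build_prompt_live(session, "ctx", "q").splitlines()[0])  # Answer in short bullet points.
```

Rebuilding is cheap because only the prompt and the thin chain wrappers are recreated; the LLM client and the loaded FAISS retriever are reused from session state.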
SECTION 4: RESULTS & CONCLUSION

4.1. Results

The project was evaluated on its qualitative performance, scalability, and privacy.

Privacy & Cost (Local-First): The primary goal was achieved. Once the models are downloaded, the app runs offline (apart from fetching any web page the user supplies as a source), with zero API calls for embeddings or chat. For a single user this removes API costs and rate limits, leaving throughput bounded only by local hardware, and document contents never leave the user's machine.

Accuracy (RAG Tuning): Initial tests with large chunks (chunk_size=4000) yielded short answers. After tuning the parameters to chunk_size=1000 and retrieving more chunks (k=8), the "Detailed Explainer" persona generated long, comprehensive answers that were judged accurate in qualitative testing, as intended.

Flexibility (Live Personas): The on-the-fly chain construction worked as intended. The user can switch from "Exam Ready" (short, bulleted answers) to "Tutor" (long, simple explanations) for the very next question, all while using the same underlying data.

Usability (UI/UX): The WhatsApp-style dark mode, streaming responses, "Read Aloud," and audio inputs provide a modern, professional user experience. The persistent project system and export features turn the app from a simple demo into a genuine productivity tool.
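
As a rough check on the tuned settings, the retrieved context is bounded by about k × chunk_size characters. The 4-characters-per-token ratio below is a common rule of thumb for English text, not a value measured for this app, and the 8,192-token figure is the context window of the original Llama 3 models.

```latex
\[
\text{retrieved context} \lesssim k \times \texttt{chunk\_size}
  = 8 \times 1000 = 8000 \text{ characters}
  \approx \frac{8000}{4} = 2000 \text{ tokens}
\]
```

Under these assumptions, the eight retrieved chunks use roughly a quarter of an 8,192-token window, leaving room for the persona instruction, the question, and a long generated answer.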

4.2. Conclusion

This project demonstrates the power and feasibility of a 100% local-first RAG architecture. By combining open-source tools (Ollama, LangChain, Hugging Face, and FAISS), we have created a persistent, multimodal conversational AI that runs entirely on a personal computer.

The application overcomes the core limitations of cloud-based LLMs, namely cost, rate limiting, and privacy. It provides a user-friendly interface for anyone to build and chat with their own specialized, private knowledge base.

4.3. Future Work

While the project meets its objectives, the following steps could enhance it further:

Advanced Multimodal RAG: Currently, images are stored and passed to the LLM. A more advanced system would use a vision embedding model (such as CLIP) to make the images themselves searchable, allowing a user to ask, "Show me all the diagrams in my documents."

RAG-on-Audio: Currently, audio input is transcribed. A future version could index the audio itself, allowing a user to search through hours of lecture recordings.

Agentic Behavior: The system could be expanded using LangGraph to perform multi-step tasks, such as reading a document, searching the web for new information, and then combining both to write a summary.