"A professional-grade medical assistant that grounds every answer in verified documents β not hallucinations."
- π©Ί What is MediQuery.ai?
- πΈ UI Showcase
- π Live Project Dashboard
- β¨ Key Features
- π§ RAG Pipeline
- π οΈ Tech Stack
- π Project Structure
- π¬ Experimental Phase
- π§ͺ Sample Queries β Zero Hallucination Proof
- π¦ Getting Started
- 10.1 π§ Prerequisites
- 10.2 β¬οΈ Install & Configure
- 10.3 ποΈ Build Vector Index
- 10.4 π₯οΈ Run Locally
- π³ Docker Quick Start
- ποΈ Enterprise Infrastructure Showcase
- β‘ Performance
- πΊοΈ Roadmap
- π€ Contributing
- π Changelog
- π€ Author
- β Show Your Support
MediQuery.ai is a production-grade Retrieval-Augmented Generation (RAG) medical assistant. Unlike standard LLMs that rely on pre-trained data alone, MediQuery.ai grounds every response in your indexed medical documents β delivering accurate, traceable, and hallucination-resistant healthcare insights.
π The core guarantee: If the answer isn't in the indexed documents, the model says so β no fabrication.
| π | Version | π¦ Highlight |
|---|---|---|
| π | v2.0 |
Flask UI with dark/light mode, full AWS EC2+ECR+GitHub Actions pipeline |
| π | v1.5 |
Groq Llama 3.3 70B integration, Pinecone semantic search |
| π | v1.0 |
Initial RAG chatbot β LangChain + HuggingFace embeddings |
β‘ Thinking Indicator animates while Groq LPUβ’ processes Β· Glassmorphism panels with cyan neon glow Β· Dark/Light toggle top-right
π©Ί Source Attribution visible per response Β· Same RAG accuracy Β· Optimised for daytime clinical use
| π₯οΈ Feature | π Dark Mode | βοΈ Light Mode |
|---|---|---|
| β‘ Thinking Indicator | β Neon pulse animation | β Subtle spinner |
| π Source Attribution | β Cyan-highlighted | β Grey-highlighted |
| πͺ Glassmorphism UI | β Full depth blur | β Light frosted |
| π± Mobile Responsive | β | β |
| π Theme Toggle | β One-click switch | β One-click switch |
| π Service | π‘ Status | π Description |
|---|---|---|
| βοΈ CI/CD Pipeline | GitHub Actions β ECR β EC2 auto-deploy | |
| π Production App | mediquery-ai.streamlit.app β primary serverless host | |
| ποΈ AWS EC2 | Enterprise scalability showcase (Docker + ECR) | |
| ποΈ Vector DB | Index: medical-chatbot |
|
| π§ Inference Engine | Real-time neural inference via Groq LPUβ’ |
| π‘οΈ | Verifiable Accuracy | Responses grounded strictly in indexed medical PDFs β hallucinations eliminated by design |
| β‘ | Ultra-Low Latency | Groq LPUβ’ Inference Engine delivers near-instantaneous responses on Llama 3.3 70B |
| π | Semantic Search | Pinecone real-time similarity search over all-MiniLM-L6-v2 vector embeddings |
| π | Dark / Light Mode UI | Clean Flask frontend with glassmorphism, dark/light toggle, and real-time thinking indicators |
| π | Full CI/CD Pipeline | GitHub Actions β Docker build β AWS ECR push β EC2 auto-deploy on every git push |
| π³ | Docker Native | Single docker run to launch the full stack β no conda, no local setup required |
| π | Custom Knowledge Base | Drop any medical PDF into data/ and re-run store_index.py to update the vector index |
| π | Secret Management | All API keys managed via .env locally and GitHub Secrets in CI/CD β never hardcoded |
MediQuery.ai follows a strict 5-stage RAG pipeline:
| Stage | π§ Component | π What Happens |
|---|---|---|
| 1οΈβ£ Ingestion | store_index.py |
Medical PDFs loaded, split into semantic chunks |
| 2οΈβ£ Embedding | all-MiniLM-L6-v2 |
Chunks converted to high-dimensional vectors |
| 3οΈβ£ Indexing | Pinecone | Vectors stored in medical-chatbot index |
| 4οΈβ£ Retrieval | LangChain Retriever | User query β top-k similar chunks fetched |
| 5οΈβ£ Generation | Groq Llama 3.3 70B | Answer synthesised strictly from retrieved context |
graph TD
U[π€ USER QUERY] -->|HTTP POST| FL[π Flask App β app.py]
subgraph Ingestion ["π INGESTION β store_index.py"]
PDF[π Medical PDFs<br/>data/]
CHUNK[βοΈ Text Splitter<br/>Semantic Chunks]
EMBED[π’ HuggingFace Embeddings<br/>all-MiniLM-L6-v2]
end
PDF --> CHUNK --> EMBED
subgraph VectorStore ["ποΈ VECTOR STORE"]
PC[π Pinecone Index<br/>medical-chatbot]
end
EMBED -->|Index vectors| PC
subgraph RAG ["π§ RAG PIPELINE β app.py"]
RET[π LangChain Retriever<br/>Top-k similarity search]
CTX[π Retrieved Context<br/>Relevant chunks]
GEN[β‘ Groq Llama 3.3 70B<br/>Answer generation]
end
FL -->|Embed query| PC
PC -->|Top-k vectors| RET
RET --> CTX
CTX --> GEN
GEN -->|Grounded answer| FL
FL -->|JSON response| U
subgraph DevOps ["βοΈ CI/CD β GitHub Actions"]
GH[π git push]
ECR[π¦ AWS ECR<br/>Docker image]
EC2[π₯οΈ AWS EC2<br/>Docker run :8080]
end
GH --> ECR --> EC2
classDef user fill:#0a1a2e,stroke:#0ea5e9,stroke-width:2px,color:#fff;
classDef app fill:#0f172a,stroke:#06b6d4,stroke-width:2px,color:#fff;
classDef ingest fill:#0a2e0a,stroke:#10b981,stroke-width:2px,color:#fff;
classDef store fill:#0a1a2e,stroke:#008080,stroke-width:2px,color:#fff;
classDef rag fill:#1e1b0a,stroke:#FF6B35,stroke-width:2px,color:#fff;
classDef devops fill:#2e1a0a,stroke:#FF9900,stroke-width:2px,color:#fff;
class U user;
class FL app;
class PDF,CHUNK,EMBED ingest;
class PC store;
class RET,CTX,GEN rag;
class GH,ECR,EC2 devops;
sequenceDiagram
autonumber
participant U as π€ User
participant FL as π Flask
participant PC as ποΈ Pinecone
participant GR as β‘ Groq LPU
Note over U,FL: π¬ Query Phase
U->>FL: POST /get { "msg": "What is hypertension?" }
FL->>FL: Embed query via all-MiniLM-L6-v2
Note over FL,PC: π Retrieval Phase
FL->>PC: similarity_search(query_vector, top_k=3)
PC-->>FL: Top-3 relevant medical chunks
Note over FL,GR: π§ Generation Phase
FL->>GR: prompt = system + context + user_query
GR-->>FL: Grounded answer (Llama 3.3 70B)
Note over FL,U: π€ Response Phase
FL-->>U: JSON { "answer": "Hypertension is..." }
| βοΈ Capability | π¬ Implementation | π Result |
|---|---|---|
| π‘οΈ Hallucination Guard | RAG β answers from docs only | Verifiable, traceable responses |
| β‘ Inference Speed | Groq LPUβ’ hardware | Near-zero token latency |
| π Semantic Search | Pinecone ANN index | Sub-100ms top-k retrieval |
| π Secret Safety | .env + GitHub Secrets |
Zero hardcoded credentials |
| π Auto-Deploy | GitHub Actions β ECR β EC2 | One push, live in minutes |
π©Ί MediQuery.ai/
β
βββ π app.py # Flask entry point β routes & RAG logic
βββ ποΈ store_index.py # Data ingestion & Pinecone indexing script
β
βββ π§ src/
β βββ π§ helper.py # Embedding logic & utility functions
β βββ π prompt.py # System & RAG prompt templates
β
βββ π data/ # Source medical PDFs (drop new PDFs here)
β βββ π Medical_book.pdf
β
βββ πΌοΈ assets/ # UI screenshots & demo images
β
βββ π¨ static/ # CSS, JS, images
β βββ π dark.css # Dark mode stylesheet
β βββ π chat.js # AJAX real-time chat logic
β
βββ πΌοΈ templates/
β βββ π chat.html # Main chat UI template
β
βββ π¬ research/ # Jupyter notebooks for experimentation
β
βββ π³ Dockerfile # Container build (python:3.10-slim + HEALTHCHECK)
βββ βοΈ .github/workflows/cicd.yaml # GitHub Actions CI/CD pipeline
βββ π¦ requirements.txt # Python dependencies
βββ π§ setup.py # Project packaging config
βββ π .env.example # Environment variable template
The research/ folder contains trials.ipynb β the engineering workbench used before settling on the final pipeline parameters. This was not a tutorial copy; it was active optimisation.
| π§ͺ Variable | Values Tested | β Final Choice | π Why |
|---|---|---|---|
| Chunk Size | 500, 750, 1000 tokens | 500 |
Better semantic precision; 1000 caused context bleed across topics |
| Chunk Overlap | 0, 50, 100 tokens | 50 |
Prevents answer truncation at chunk boundaries |
| Top-K Retrieval | 2, 3, 5 | 3 |
2 missed edge cases; 5 added noise to prompt context |
| Embedding Model | all-MiniLM-L6-v2, mpnet-base-v2 |
all-MiniLM-L6-v2 |
5Γ faster with comparable accuracy on medical text |
| LLM Temperature | 0.0, 0.3, 0.7 | 0.0 |
Deterministic answers critical for medical use case |
π‘ All experiments are reproducible in
research/trials.ipynbβ open it to see the raw token latency and retrieval precision comparisons.
The table below demonstrates the RAG pipeline in action β showing how retrieved context from Medical_book.pdf directly shapes the grounded answer, with no fabrication.
| π¬ User Question | π Retrieved Context (from Medical_book.pdf) |
π€ Grounded Response |
|---|---|---|
| "What is hypertension and how is it classified?" | "Hypertension is defined as systolic BP β₯ 140 mmHg or diastolic BP β₯ 90 mmHg. Stage 1: 140β159/90β99. Stage 2: β₯ 160/100..." | "Hypertension is high blood pressure classified into Stage 1 (140β159/90β99 mmHg) and Stage 2 (β₯160/100 mmHg) based on systolic and diastolic readings." |
| "What are the symptoms of Type 2 diabetes?" | "Common symptoms include polyuria, polydipsia, polyphagia, fatigue, blurred vision, and slow wound healing..." | "Type 2 diabetes presents with increased urination, excessive thirst, increased hunger, fatigue, blurred vision, and poor wound healing." |
| "What is the mechanism of action of aspirin?" | "Aspirin irreversibly inhibits cyclooxygenase (COX-1 and COX-2), blocking thromboxane A2 synthesis and reducing platelet aggregation..." | "Aspirin works by permanently blocking COX-1 and COX-2 enzymes, which prevents thromboxane A2 production and reduces the blood's ability to clot." |
| "Who invented the telescope?" | (No relevant chunk found in medical index) | "I cannot find information about this in the indexed medical documents. Please ask a medically relevant question." |
π‘οΈ Row 4 is the most important: when the answer doesn't exist in the documents, the model says so β this is the hallucination guard in practice.
| π οΈ Tool | π Version | π Link |
|---|---|---|
β₯ 3.10 |
python.org | |
| any | anaconda.com | |
| ποΈ Pinecone account | free tier | pinecone.io |
| β‘ Groq API key | free tier | console.groq.com |
π₯ Step 1 β Clone
git clone https://github.com/salonyranjan/MediQuery.ai.git
cd MediQuery.aiπ Step 2 β Create environment
# With Conda (recommended)
conda create -n medibot python=3.10 -y
conda activate medibot
# Or with venv
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activateπ¦ Step 3 β Install dependencies
pip install -r requirements.txt
# Required for create_retrieval_chain in newer LangChain versions
pip install langchain-classic
# Installs src/ as a local editable package via setup.py
# This lets app.py import from src/helper.py and src/prompt.py without path hacks
pip install -e .π Step 4 β Configure secrets
cp .env.example .envEdit .env:
PINECONE_API_KEY=your_pinecone_api_key
GROQ_API_KEY=your_groq_api_keyπ Security Note: The project uses
.gitignoreto protect API keys (*.env), exclude virtual environments (venv_medical/,.venv/), and keep generated artifacts out of version control. This is a security-first practice β never hardcode credentials, never commit yourvenv/or.env. If you accidentally track them, rungit rm --cached .envto untrack without deleting.
Place your medical PDFs in the data/ folder, then run:
python store_index.pyβ This embeds your PDFs with
all-MiniLM-L6-v2and pushes vectors to Pinecone. Run once per new PDF batch.
python app.pyπ Opens at http://localhost:8080
No conda, no venv β single command:
# Build
docker build -t mediquery .
# Run with secrets injected at runtime
docker run -d -p 8080:8080 \
-e PINECONE_API_KEY="your_pinecone_key" \
-e GROQ_API_KEY="your_groq_key" \
--name mediquery_app \
mediqueryπ Opens at http://localhost:8080
Recommended Dockerfile (slim + health-checked):
# slim base β ~200 MB vs ~900 MB for full python:3.10
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
# Health check β Docker/AWS monitors if Flask is responding
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
CMD curl -f http://localhost:8080/ || exit 1
CMD ["python", "app.py"]πΌ Recruiter note: While the app runs serverlessly on Streamlit Cloud for cost efficiency, this section demonstrates the full production-grade AWS infrastructure that can be activated for enterprise scale β showing Docker, ECR, EC2, and automated CI/CD are all in place.
Step 1 β IAM user for deployment
Create an IAM user with:
AmazonEC2ContainerRegistryFullAccessAmazonEC2FullAccess
Save the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Step 2 β Create ECR repository
<account-id>.dkr.ecr.<region>.amazonaws.com/medicalbot
# Example: 577435557871.dkr.ecr.eu-north-1.amazonaws.com/medical_chatbot
Step 3 β Launch EC2 (Ubuntu) + install Docker
sudo apt-get update -y && sudo apt-get upgrade -y
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu && newgrp docker
β οΈ Open port 8080 in your EC2 Security Group inbound rules.
Step 4 β Register EC2 as self-hosted GitHub runner
Go to: GitHub repo β Settings β Actions β Runners β New self-hosted runner β follow the Linux install commands on your EC2 instance.
Step 5 β Add GitHub Secrets
Go to: Settings β Secrets and variables β Actions β add:
| π Secret | π Value |
|---|---|
AWS_ACCESS_KEY_ID |
From IAM step |
AWS_SECRET_ACCESS_KEY |
From IAM step |
AWS_DEFAULT_REGION |
e.g. eu-north-1 |
ECR_REPO |
Your ECR URI |
PINECONE_API_KEY |
Your Pinecone key |
GROQ_API_KEY |
Your Groq key |
Step 6 β Push to trigger pipeline
git push origin mainOn every push, GitHub Actions will:
git push β Build Docker image β Push to ECR β docker pull on EC2 β docker run :8080
| π Metric | π― Value | π Notes |
|---|---|---|
| β‘ Groq Inference Latency | ~500ms |
Llama 3.3 70B via Groq LPUβ’ hardware |
| π Token Throughput | ~2,000 tok/s |
Groq LPUβ’ β orders of magnitude faster than GPU inference |
| π Pinecone Retrieval | < 100ms |
Top-k ANN similarity search |
| π¬ End-to-End Latency | < 1s |
Query β embed β retrieve β generate β response |
| ποΈ CI/CD Deploy | < 5 min |
GitHub Actions β ECR β EC2 full pipeline |
| π³ Docker Image Size | ~200 MB |
python:3.10-slim base |
| π Index Capacity | unlimited |
Add any number of PDFs to data/ |
| Status | π Feature | π― Priority |
|---|---|---|
| β | RAG pipeline β LangChain + Pinecone + Groq | π΄ Core |
| β | Flask UI with dark/light mode | π΄ Core |
| β | Docker + AWS EC2+ECR deployment | π΄ Core |
| β | GitHub Actions CI/CD auto-deploy | π΄ Core |
| π | Multi-document support β index multiple PDFs simultaneously | π‘ High |
| π | Source citation β show which document/page the answer came from | π‘ High |
| π | Conversation memory β multi-turn context window | π‘ High |
| π | User auth β personal indexed document libraries | π’ Planned |
| π | Streamlit variant β parallel serverless deployment | π’ Planned |
| π | Fine-tuned embeddings β domain-specific medical embedding model | π’ Planned |
| π‘ | Voice interface β STT/TTS for accessibility | π΅ Idea |
# 1. Fork on GitHub
# 2. Create your branch
git checkout -b feature/your-feature
# 3. Commit with conventional format
git commit -m "feat: add your feature"
# Prefixes: fix: | docs: | style: | refactor: | test: | chore:
# 4. Push & open a PR
git push origin feature/your-featurePriority areas:
| π₯ Area | π What's Needed |
|---|---|
| π Source Citations | Return document name + page number per answer |
| π§ Memory | LangChain ConversationBufferMemory integration |
| π§ͺ Tests | Pytest for RAG pipeline stages and Flask routes |
| π¨ UI | More theme variants, mobile responsiveness |
| Version | Highlights |
|---|---|
π v2.0.0 |
Flask UI + dark/light mode Β· full AWS EC2+ECR+GitHub Actions CI/CD |
v1.5.0 |
Groq Llama 3.3 70B Β· Pinecone semantic search Β· Docker support |
v1.0.0 |
π Initial RAG chatbot β LangChain + HuggingFace embeddings |
|
π€ ML Engineer Β Β·Β π§βπ» Full-Stack Dev Β Β·Β βοΈ Cloud & DevOps "Building intelligent systems that are as trustworthy as they are fast." |
If MediQuery.ai impressed you, helped your research, or gave you ideas for your own RAG system β show it some love! π©Ί
π‘ Pro Tip: Go to GitHub repo Settings β Social Preview and upload the dark-mode screenshot. When you share on LinkedIn, your Cyber-Neon UI shows instead of a generic GitHub card β instant recruiter attention.
Developed with π©Ί by Salony Ranjan Β Β·Β Β© 2026 MediQuery.ai Β· MIT
