Skip to content

salonyranjan/MediQuery.ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

25 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation


Typing SVG







"A professional-grade medical assistant that grounds every answer in verified documents β€” not hallucinations."


Β  Β  Β 


πŸ“‹ Table of Contents

  1. 🩺 What is MediQuery.ai?
  2. πŸ“Έ UI Showcase
  3. πŸ“Š Live Project Dashboard
  4. ✨ Key Features
  5. 🧠 RAG Pipeline
  6. πŸ› οΈ Tech Stack
  7. πŸ“‚ Project Structure
  8. πŸ”¬ Experimental Phase
  9. πŸ§ͺ Sample Queries β€” Zero Hallucination Proof
  10. πŸ“¦ Getting Started
  11. 🐳 Docker Quick Start
  12. πŸ—οΈ Enterprise Infrastructure Showcase
  13. ⚑ Performance
  14. πŸ—ΊοΈ Roadmap
  15. 🀝 Contributing
  16. πŸ“„ Changelog
  17. πŸ‘€ Author
  18. ⭐ Show Your Support

1. 🩺 What is MediQuery.ai?

MediQuery.ai is a production-grade Retrieval-Augmented Generation (RAG) medical assistant. Unlike standard LLMs that rely on pre-trained data alone, MediQuery.ai grounds every response in your indexed medical documents β€” delivering accurate, traceable, and hallucination-resistant healthcare insights.

πŸ”‘ The core guarantee: If the answer isn't in the indexed documents, the model says so β€” no fabrication.

πŸ”– Version πŸ“¦ Highlight
πŸ†• v2.0 Flask UI with dark/light mode, full AWS EC2+ECR+GitHub Actions pipeline
πŸ”„ v1.5 Groq Llama 3.3 70B integration, Pinecone semantic search
πŸŽ‰ v1.0 Initial RAG chatbot β€” LangChain + HuggingFace embeddings

2. πŸ“Έ UI Showcase

UI tagline

πŸŒ™ Dark Mode β€” Default Cyber-Neon Experience

MediQuery AI Dark Mode

⚑ Thinking Indicator animates while Groq LPUβ„’ processes Β· Glassmorphism panels with cyan neon glow Β· Dark/Light toggle top-right


β˜€οΈ Light Mode β€” Clean Clinical Interface

MediQuery AI Light Mode

🩺 Source Attribution visible per response · Same RAG accuracy · Optimised for daytime clinical use


πŸ–₯️ Feature πŸŒ™ Dark Mode β˜€οΈ Light Mode
⚑ Thinking Indicator βœ… Neon pulse animation βœ… Subtle spinner
πŸ“„ Source Attribution βœ… Cyan-highlighted βœ… Grey-highlighted
πŸͺŸ Glassmorphism UI βœ… Full depth blur βœ… Light frosted
πŸ“± Mobile Responsive βœ… βœ…
πŸŒ“ Theme Toggle βœ… One-click switch βœ… One-click switch

3. πŸ“Š Live Project Dashboard

πŸ”Œ Service πŸ“‘ Status πŸ“ Description
βš™οΈ CI/CD Pipeline Build GitHub Actions β†’ ECR β†’ EC2 auto-deploy
🌐 Production App Online mediquery-ai.streamlit.app β€” primary serverless host
πŸ—οΈ AWS EC2 AWS Enterprise scalability showcase (Docker + ECR)
πŸ—„οΈ Vector DB Pinecone Index: medical-chatbot
🧠 Inference Engine Groq Real-time neural inference via Groq LPUβ„’

4. ✨ Key Features

πŸ›‘οΈVerifiable AccuracyResponses grounded strictly in indexed medical PDFs β€” hallucinations eliminated by design
⚑Ultra-Low LatencyGroq LPUβ„’ Inference Engine delivers near-instantaneous responses on Llama 3.3 70B
πŸ”Semantic SearchPinecone real-time similarity search over all-MiniLM-L6-v2 vector embeddings
πŸŒ™Dark / Light Mode UIClean Flask frontend with glassmorphism, dark/light toggle, and real-time thinking indicators
πŸ”„Full CI/CD PipelineGitHub Actions β†’ Docker build β†’ AWS ECR push β†’ EC2 auto-deploy on every git push
🐳Docker NativeSingle docker run to launch the full stack β€” no conda, no local setup required
πŸ“„Custom Knowledge BaseDrop any medical PDF into data/ and re-run store_index.py to update the vector index
πŸ”Secret ManagementAll API keys managed via .env locally and GitHub Secrets in CI/CD β€” never hardcoded

5. 🧠 RAG Pipeline

5.1 πŸ”„ Pipeline Flow

MediQuery.ai follows a strict 5-stage RAG pipeline:

Stage πŸ”§ Component πŸ“ What Happens
1️⃣ Ingestion store_index.py Medical PDFs loaded, split into semantic chunks
2️⃣ Embedding all-MiniLM-L6-v2 Chunks converted to high-dimensional vectors
3️⃣ Indexing Pinecone Vectors stored in medical-chatbot index
4️⃣ Retrieval LangChain Retriever User query β†’ top-k similar chunks fetched
5️⃣ Generation Groq Llama 3.3 70B Answer synthesised strictly from retrieved context

5.2 πŸ“ Architecture Diagram

graph TD
    U[πŸ‘€ USER QUERY] -->|HTTP POST| FL[🌐 Flask App β€” app.py]

    subgraph Ingestion ["πŸ“„ INGESTION β€” store_index.py"]
        PDF[πŸ“‹ Medical PDFs<br/>data/]
        CHUNK[βœ‚οΈ Text Splitter<br/>Semantic Chunks]
        EMBED[πŸ”’ HuggingFace Embeddings<br/>all-MiniLM-L6-v2]
    end

    PDF --> CHUNK --> EMBED

    subgraph VectorStore ["πŸ—„οΈ VECTOR STORE"]
        PC[πŸ“Œ Pinecone Index<br/>medical-chatbot]
    end

    EMBED -->|Index vectors| PC

    subgraph RAG ["🧠 RAG PIPELINE β€” app.py"]
        RET[πŸ” LangChain Retriever<br/>Top-k similarity search]
        CTX[πŸ“„ Retrieved Context<br/>Relevant chunks]
        GEN[⚑ Groq Llama 3.3 70B<br/>Answer generation]
    end

    FL -->|Embed query| PC
    PC -->|Top-k vectors| RET
    RET --> CTX
    CTX --> GEN
    GEN -->|Grounded answer| FL
    FL -->|JSON response| U

    subgraph DevOps ["☁️ CI/CD β€” GitHub Actions"]
        GH[πŸ”€ git push]
        ECR[πŸ“¦ AWS ECR<br/>Docker image]
        EC2[πŸ–₯️ AWS EC2<br/>Docker run :8080]
    end

    GH --> ECR --> EC2

    classDef user fill:#0a1a2e,stroke:#0ea5e9,stroke-width:2px,color:#fff;
    classDef app fill:#0f172a,stroke:#06b6d4,stroke-width:2px,color:#fff;
    classDef ingest fill:#0a2e0a,stroke:#10b981,stroke-width:2px,color:#fff;
    classDef store fill:#0a1a2e,stroke:#008080,stroke-width:2px,color:#fff;
    classDef rag fill:#1e1b0a,stroke:#FF6B35,stroke-width:2px,color:#fff;
    classDef devops fill:#2e1a0a,stroke:#FF9900,stroke-width:2px,color:#fff;

    class U user;
    class FL app;
    class PDF,CHUNK,EMBED ingest;
    class PC store;
    class RET,CTX,GEN rag;
    class GH,ECR,EC2 devops;
Loading

5.3 ⚑ Sequence Diagram

sequenceDiagram
    autonumber
    participant U  as πŸ‘€ User
    participant FL as 🌐 Flask
    participant PC as πŸ—„οΈ Pinecone
    participant GR as ⚑ Groq LPU

    Note over U,FL: πŸ’¬ Query Phase
    U->>FL: POST /get { "msg": "What is hypertension?" }
    FL->>FL: Embed query via all-MiniLM-L6-v2

    Note over FL,PC: πŸ” Retrieval Phase
    FL->>PC: similarity_search(query_vector, top_k=3)
    PC-->>FL: Top-3 relevant medical chunks

    Note over FL,GR: 🧠 Generation Phase
    FL->>GR: prompt = system + context + user_query
    GR-->>FL: Grounded answer (Llama 3.3 70B)

    Note over FL,U: πŸ“€ Response Phase
    FL-->>U: JSON { "answer": "Hypertension is..." }
Loading

6. πŸ› οΈ Tech Stack

🧠 AI / ML Layer

🌐 Backend & Frontend

☁️ DevOps & Cloud

βš™οΈ Capability πŸ”¬ Implementation πŸ† Result
πŸ›‘οΈ Hallucination Guard RAG β€” answers from docs only Verifiable, traceable responses
⚑ Inference Speed Groq LPUβ„’ hardware Near-zero token latency
πŸ” Semantic Search Pinecone ANN index Sub-100ms top-k retrieval
πŸ” Secret Safety .env + GitHub Secrets Zero hardcoded credentials
πŸ”„ Auto-Deploy GitHub Actions β†’ ECR β†’ EC2 One push, live in minutes

7. πŸ“‚ Project Structure

🩺 MediQuery.ai/
β”‚
β”œβ”€β”€ 🌐 app.py                        # Flask entry point β€” routes & RAG logic
β”œβ”€β”€ πŸ—„οΈ store_index.py                # Data ingestion & Pinecone indexing script
β”‚
β”œβ”€β”€ 🧠 src/
β”‚   β”œβ”€β”€ πŸ”§ helper.py                 # Embedding logic & utility functions
β”‚   └── πŸ“ prompt.py                 # System & RAG prompt templates
β”‚
β”œβ”€β”€ πŸ“„ data/                         # Source medical PDFs (drop new PDFs here)
β”‚   └── πŸ“‹ Medical_book.pdf
β”‚
β”œβ”€β”€ πŸ–ΌοΈ assets/                       # UI screenshots & demo images
β”‚
β”œβ”€β”€ 🎨 static/                       # CSS, JS, images
β”‚   β”œβ”€β”€ πŸŒ™ dark.css                  # Dark mode stylesheet
β”‚   └── πŸ“œ chat.js                   # AJAX real-time chat logic
β”‚
β”œβ”€β”€ πŸ–ΌοΈ templates/
β”‚   └── 🌐 chat.html                 # Main chat UI template
β”‚
β”œβ”€β”€ πŸ”¬ research/                     # Jupyter notebooks for experimentation
β”‚
β”œβ”€β”€ 🐳 Dockerfile                    # Container build (python:3.10-slim + HEALTHCHECK)
β”œβ”€β”€ βš™οΈ .github/workflows/cicd.yaml   # GitHub Actions CI/CD pipeline
β”œβ”€β”€ πŸ“¦ requirements.txt              # Python dependencies
β”œβ”€β”€ πŸ”§ setup.py                      # Project packaging config
└── πŸ”’ .env.example                  # Environment variable template

8. πŸ”¬ Experimental Phase

The research/ folder contains trials.ipynb β€” the engineering workbench used before settling on the final pipeline parameters. This was not a tutorial copy; it was active optimisation.

πŸ§ͺ Variable Values Tested βœ… Final Choice πŸ“ Why
Chunk Size 500, 750, 1000 tokens 500 Better semantic precision; 1000 caused context bleed across topics
Chunk Overlap 0, 50, 100 tokens 50 Prevents answer truncation at chunk boundaries
Top-K Retrieval 2, 3, 5 3 2 missed edge cases; 5 added noise to prompt context
Embedding Model all-MiniLM-L6-v2, mpnet-base-v2 all-MiniLM-L6-v2 5Γ— faster with comparable accuracy on medical text
LLM Temperature 0.0, 0.3, 0.7 0.0 Deterministic answers critical for medical use case

πŸ’‘ All experiments are reproducible in research/trials.ipynb β€” open it to see the raw token latency and retrieval precision comparisons.


9. πŸ§ͺ Sample Queries β€” Zero Hallucination Proof

The table below demonstrates the RAG pipeline in action β€” showing how retrieved context from Medical_book.pdf directly shapes the grounded answer, with no fabrication.

πŸ’¬ User Question πŸ“„ Retrieved Context (from Medical_book.pdf) πŸ€– Grounded Response
"What is hypertension and how is it classified?" "Hypertension is defined as systolic BP β‰₯ 140 mmHg or diastolic BP β‰₯ 90 mmHg. Stage 1: 140–159/90–99. Stage 2: β‰₯ 160/100..." "Hypertension is high blood pressure classified into Stage 1 (140–159/90–99 mmHg) and Stage 2 (β‰₯160/100 mmHg) based on systolic and diastolic readings."
"What are the symptoms of Type 2 diabetes?" "Common symptoms include polyuria, polydipsia, polyphagia, fatigue, blurred vision, and slow wound healing..." "Type 2 diabetes presents with increased urination, excessive thirst, increased hunger, fatigue, blurred vision, and poor wound healing."
"What is the mechanism of action of aspirin?" "Aspirin irreversibly inhibits cyclooxygenase (COX-1 and COX-2), blocking thromboxane A2 synthesis and reducing platelet aggregation..." "Aspirin works by permanently blocking COX-1 and COX-2 enzymes, which prevents thromboxane A2 production and reduces the blood's ability to clot."
"Who invented the telescope?" (No relevant chunk found in medical index) "I cannot find information about this in the indexed medical documents. Please ask a medically relevant question."

πŸ›‘οΈ Row 4 is the most important: when the answer doesn't exist in the documents, the model says so β€” this is the hallucination guard in practice.


10. πŸ“¦ Getting Started

10.1 πŸ”§ Prerequisites

πŸ› οΈ Tool πŸ“Œ Version πŸ”— Link
Python β‰₯ 3.10 python.org
Conda any anaconda.com
πŸ—„οΈ Pinecone account free tier pinecone.io
⚑ Groq API key free tier console.groq.com

10.2 ⬇️ Install & Configure

πŸ“₯ Step 1 β€” Clone

git clone https://github.com/salonyranjan/MediQuery.ai.git
cd MediQuery.ai

🐍 Step 2 β€” Create environment

# With Conda (recommended)
conda create -n medibot python=3.10 -y
conda activate medibot

# Or with venv
python -m venv .venv && source .venv/bin/activate  # Windows: .venv\Scripts\activate

πŸ“¦ Step 3 β€” Install dependencies

pip install -r requirements.txt

# Required for create_retrieval_chain in newer LangChain versions
pip install langchain-classic

# Installs src/ as a local editable package via setup.py
# This lets app.py import from src/helper.py and src/prompt.py without path hacks
pip install -e .

πŸ” Step 4 β€” Configure secrets

cp .env.example .env

Edit .env:

PINECONE_API_KEY=your_pinecone_api_key
GROQ_API_KEY=your_groq_api_key

πŸ” Security Note: The project uses .gitignore to protect API keys (*.env), exclude virtual environments (venv_medical/, .venv/), and keep generated artifacts out of version control. This is a security-first practice β€” never hardcode credentials, never commit your venv/ or .env. If you accidentally track them, run git rm --cached .env to untrack without deleting.

10.3 πŸ—„οΈ Build Vector Index

Place your medical PDFs in the data/ folder, then run:

python store_index.py

βœ… This embeds your PDFs with all-MiniLM-L6-v2 and pushes vectors to Pinecone. Run once per new PDF batch.

10.4 πŸ–₯️ Run Locally

python app.py

🌐 Opens at http://localhost:8080


11. 🐳 Docker Quick Start

No conda, no venv β€” single command:

# Build
docker build -t mediquery .

# Run with secrets injected at runtime
docker run -d -p 8080:8080 \
  -e PINECONE_API_KEY="your_pinecone_key" \
  -e GROQ_API_KEY="your_groq_key" \
  --name mediquery_app \
  mediquery

🌐 Opens at http://localhost:8080

Recommended Dockerfile (slim + health-checked):

# slim base β€” ~200 MB vs ~900 MB for full python:3.10
FROM python:3.10-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

EXPOSE 8080

# Health check β€” Docker/AWS monitors if Flask is responding
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
  CMD curl -f http://localhost:8080/ || exit 1

CMD ["python", "app.py"]

12. πŸ—οΈ Enterprise Infrastructure Showcase

πŸ’Ό Recruiter note: While the app runs serverlessly on Streamlit Cloud for cost efficiency, this section demonstrates the full production-grade AWS infrastructure that can be activated for enterprise scale β€” showing Docker, ECR, EC2, and automated CI/CD are all in place.

12.1 πŸ—οΈ Infrastructure Setup

Step 1 β€” IAM user for deployment

Create an IAM user with:

  • AmazonEC2ContainerRegistryFullAccess
  • AmazonEC2FullAccess

Save the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

Step 2 β€” Create ECR repository

<account-id>.dkr.ecr.<region>.amazonaws.com/medicalbot
# Example: 577435557871.dkr.ecr.eu-north-1.amazonaws.com/medical_chatbot

Step 3 β€” Launch EC2 (Ubuntu) + install Docker

sudo apt-get update -y && sudo apt-get upgrade -y
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu && newgrp docker

⚠️ Open port 8080 in your EC2 Security Group inbound rules.

Step 4 β€” Register EC2 as self-hosted GitHub runner

Go to: GitHub repo β†’ Settings β†’ Actions β†’ Runners β†’ New self-hosted runner β†’ follow the Linux install commands on your EC2 instance.

12.2 βš™οΈ GitHub Actions CI/CD

Step 5 β€” Add GitHub Secrets

Go to: Settings β†’ Secrets and variables β†’ Actions β†’ add:

πŸ”‘ Secret πŸ“ Value
AWS_ACCESS_KEY_ID From IAM step
AWS_SECRET_ACCESS_KEY From IAM step
AWS_DEFAULT_REGION e.g. eu-north-1
ECR_REPO Your ECR URI
PINECONE_API_KEY Your Pinecone key
GROQ_API_KEY Your Groq key

Step 6 β€” Push to trigger pipeline

git push origin main

On every push, GitHub Actions will:

git push β†’ Build Docker image β†’ Push to ECR β†’ docker pull on EC2 β†’ docker run :8080

13. ⚑ Performance

πŸ“Š Metric 🎯 Value πŸ“ Notes
⚑ Groq Inference Latency ~500ms Llama 3.3 70B via Groq LPUβ„’ hardware
πŸš€ Token Throughput ~2,000 tok/s Groq LPUβ„’ β€” orders of magnitude faster than GPU inference
πŸ” Pinecone Retrieval < 100ms Top-k ANN similarity search
πŸ’¬ End-to-End Latency < 1s Query β†’ embed β†’ retrieve β†’ generate β†’ response
πŸ—οΈ CI/CD Deploy < 5 min GitHub Actions β†’ ECR β†’ EC2 full pipeline
🐳 Docker Image Size ~200 MB python:3.10-slim base
πŸ“„ Index Capacity unlimited Add any number of PDFs to data/

14. πŸ—ΊοΈ Roadmap

Status πŸš€ Feature 🎯 Priority
βœ… RAG pipeline β€” LangChain + Pinecone + Groq πŸ”΄ Core
βœ… Flask UI with dark/light mode πŸ”΄ Core
βœ… Docker + AWS EC2+ECR deployment πŸ”΄ Core
βœ… GitHub Actions CI/CD auto-deploy πŸ”΄ Core
πŸ”„ Multi-document support β€” index multiple PDFs simultaneously 🟑 High
πŸ”„ Source citation β€” show which document/page the answer came from 🟑 High
πŸ”„ Conversation memory β€” multi-turn context window 🟑 High
πŸ“… User auth β€” personal indexed document libraries 🟒 Planned
πŸ“… Streamlit variant β€” parallel serverless deployment 🟒 Planned
πŸ“… Fine-tuned embeddings β€” domain-specific medical embedding model 🟒 Planned
πŸ’‘ Voice interface β€” STT/TTS for accessibility πŸ”΅ Idea

πŸ’¬ Open a feature request β†’


15. 🀝 Contributing

# 1. Fork on GitHub
# 2. Create your branch
git checkout -b feature/your-feature

# 3. Commit with conventional format
git commit -m "feat: add your feature"
# Prefixes: fix: | docs: | style: | refactor: | test: | chore:

# 4. Push & open a PR
git push origin feature/your-feature

Priority areas:

πŸ”₯ Area πŸ“ What's Needed
πŸ“„ Source Citations Return document name + page number per answer
🧠 Memory LangChain ConversationBufferMemory integration
πŸ§ͺ Tests Pytest for RAG pipeline stages and Flask routes
🎨 UI More theme variants, mobile responsiveness

16. πŸ“„ Changelog

Version Highlights
πŸ†• v2.0.0 Flask UI + dark/light mode Β· full AWS EC2+ECR+GitHub Actions CI/CD
v1.5.0 Groq Llama 3.3 70B Β· Pinecone semantic search Β· Docker support
v1.0.0 πŸŽ‰ Initial RAG chatbot β€” LangChain + HuggingFace embeddings

17. πŸ‘€ Author

Salony Ranjan

✦ Salony Ranjan

πŸ€– ML Engineer Β Β·Β  πŸ§‘β€πŸ’» Full-Stack Dev Β Β·Β  ☁️ Cloud & DevOps

"Building intelligent systems that are as trustworthy as they are fast."


Β  Β  Β 

18. ⭐ Show Your Support

If MediQuery.ai impressed you, helped your research, or gave you ideas for your own RAG system β€” show it some love! 🩺

πŸ’‘ Pro Tip: Go to GitHub repo Settings β†’ Social Preview and upload the dark-mode screenshot. When you share on LinkedIn, your Cyber-Neon UI shows instead of a generic GitHub card β€” instant recruiter attention.

Β  Β  Β 




Developed with 🩺 by Salony Ranjan  ·  © 2026 MediQuery.ai · MIT

About

MediQuery.ai β€” A Production-Grade Medical RAG Pipeline. Built with LangChain, Groq (Llama-3), and Pinecone. Fully containerized with Docker and deployed on AWS (EC2/ECR) using GitHub Actions CI/CD. πŸ©ΊπŸ€–

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages