Hospital Agentic AI Assistant: Home Assignment
Overview
This home assignment consists of two tasks designed to evaluate your skills in building
agentic AI systems, database management, natural language processing, and model
fine-tuning. The assignment simulates developing a customer-facing AI assistant for a
hospital to handle doctor availability, patient queries, and bookings.
• TASK 1: Implement the core agentic AI assistant with database integration and
query handling. Deadline: Monday, January 5, 2026, 9:00 AM (local time).
Submit the full implementation (code, database schema, and a demo script or run
instructions).
• TASK 2: Enhance the system with a fine-tuned Transformer model for query
classification. Due to resource constraints (e.g., limited GPU/CPU), submit a
detailed plan by Tuesday, January 6, 2026, 9:00 AM. The final results will be
evaluated leniently given your hardware limitations; focus on clear
communication of your approach, challenges, and outcomes. If full fine-tuning
isn't feasible, demonstrate partial progress (e.g., on a subset of data) with
metrics.
Submission Guidelines:
• Use GitHub (or similar) for code submission. Include a README with setup
instructions, dependencies, and sample interactions.
• For the TASK 2 plan: a 2-3 page document (PDF/Word) outlining your methodology,
dataset, evaluation setup, and preliminary results.
• Technologies: Python preferred. Use libraries like LangChain/crewAI for agents,
SQLite/PostgreSQL for DB, Hugging Face Transformers for TASK 2.
• Hints: Test on local hardware. For agents, break into specialized roles (e.g., DB
Agent, Query Parser Agent, Booking Agent). Document edge cases (e.g., no
available slots).
TASK 1: Core Agentic AI Assistant Implementation
Problem Statement
Design and implement a customer-facing Agentic AI assistant (leveraging multiple AI
agents) for a hospital. The system should handle doctor availability tracking, user
authentication, and natural language queries for consultations and bookings. The
assistant must be interactive (e.g., via console, Streamlit/Gradio UI, or API endpoints)
and use SQL for data persistence.
Requirements
1. Doctor Availability Database in SQL
a. Create and initialize a relational database (e.g., SQLite for simplicity).
b. Generate a structured dataset of at least 20-30 doctors across 5+
departments.
c. Key table schema (doctors_availability):
Column       Type     Description
doctor_id    INTEGER  Unique ID (primary key, auto-increment)
name         TEXT     Doctor's full name
department   TEXT     e.g., "Cardiology"
day_of_week  TEXT     e.g., "Monday", "Tuesday"
time_slot    TEXT     e.g., "09:00-10:00"
is_booked    BOOLEAN  True if slot is booked, False otherwise
d. Hint: Use SQL scripts or Python (e.g., SQLAlchemy) to populate data.
Ensure slots reset weekly or handle overlaps logically. Add indexes for
fast queries on department and day_of_week.
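One way to satisfy requirement 1 is sketched below, assuming SQLite through Python's built-in sqlite3 module rather than SQLAlchemy. The doctor names, the five departments, and the schedule (each doctor works two days with five one-hour slots) are hypothetical seed data. Because the specified table stores one row per slot, doctor_id effectively identifies a slot row. The UNIQUE constraint guards against duplicate, overlapping slots, and reset_week is one simple reading of the weekly-reset hint.

```python
import random
import sqlite3

# Hypothetical seed data: departments, surnames, days and slots are illustrative.
DEPARTMENTS = ["Cardiology", "Neurology", "Orthopedics", "Pediatrics", "Dermatology"]
SURNAMES = ["Smith", "Patel", "Garcia", "Chen", "Okafor", "Novak", "Haddad", "Kim",
            "Rossi", "Silva", "Nguyen", "Muller", "Cohen", "Ivanova", "Tanaka",
            "Mensah", "Larsen", "Dubois", "Khan", "Moreau", "Reyes", "Fischer",
            "Ali", "Sato", "Brown"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
SLOTS = ["09:00-10:00", "10:00-11:00", "11:00-12:00", "14:00-15:00", "15:00-16:00"]

SCHEMA = """
CREATE TABLE IF NOT EXISTS doctors_availability (
    doctor_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name        TEXT    NOT NULL,
    department  TEXT    NOT NULL,
    day_of_week TEXT    NOT NULL,
    time_slot   TEXT    NOT NULL,
    is_booked   BOOLEAN NOT NULL DEFAULT 0,
    UNIQUE (name, day_of_week, time_slot)  -- prevents duplicate/overlapping slots
);
CREATE INDEX IF NOT EXISTS idx_avail_department ON doctors_availability (department);
CREATE INDEX IF NOT EXISTS idx_avail_day ON doctors_availability (day_of_week);
"""


def init_db(path: str = ":memory:", seed: int = 42) -> sqlite3.Connection:
    """Create the table and indexes, then seed 25 doctors x 2 days x 5 slots."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    rng = random.Random(seed)  # fixed seed -> reproducible sample dump
    rows = []
    for i, surname in enumerate(SURNAMES):
        department = DEPARTMENTS[i % len(DEPARTMENTS)]  # round-robin: 5 doctors each
        for day in rng.sample(DAYS, 2):  # each doctor works two distinct days
            rows.extend((f"Dr. {surname}", department, day, slot) for slot in SLOTS)
    with conn:  # single transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO doctors_availability "
            "(name, department, day_of_week, time_slot) VALUES (?, ?, ?, ?)",
            rows,
        )
    return conn


def reset_week(conn: sqlite3.Connection) -> None:
    """Weekly reset: mark every slot as free again."""
    with conn:
        conn.execute("UPDATE doctors_availability SET is_booked = 0")


def free_slots(conn: sqlite3.Connection, department: str, day: str) -> list:
    """List unbooked slots; the department/day indexes can serve this filter."""
    return conn.execute(
        "SELECT name, time_slot FROM doctors_availability "
        "WHERE department = ? AND day_of_week = ? AND is_booked = 0 "
        "ORDER BY time_slot, name",
        (department, day),
    ).fetchall()
```

For the sample database dump deliverable, the standard library's Connection.iterdump() yields SQL statements that can be joined and written to a .sql file.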
2. User Login and Profile Management
a. Implement a simple login mechanism:
i. Option A: One fixed test user (e.g., username: "patient1",
password: "pass123").
ii. Option B: Allow new users per session (generate a temporary ID;
no full registration UI needed).
b. Upon login, prompt the user to share personal details and store them in a
patients table:
Column         Type     Description
patient_id     INTEGER  Unique ID (primary key)
name           TEXT     Patient's name
age            INTEGER  Patient's age
medical_issue  TEXT     Description of symptoms/issue
c. Hint: Use session state (e.g., in-memory or file-based) to track logged-in
users. Hash passwords if using real auth (bcrypt via passlib). Link patient
details to bookings later.
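A minimal sketch of Option A with an in-memory session store follows. It swaps in the standard library's PBKDF2 (hashlib.pbkdf2_hmac) for the bcrypt/passlib hashing suggested above so that it runs without extra dependencies; the names USERS, SESSIONS, login and save_profile are hypothetical. The session keeps the new patient_id so that later bookings can be linked to the patient.

```python
import hashlib
import hmac
import os
import secrets
import sqlite3
from typing import Optional, Tuple

PATIENTS_SCHEMA = """
CREATE TABLE IF NOT EXISTS patients (
    patient_id    INTEGER PRIMARY KEY,
    name          TEXT    NOT NULL,
    age           INTEGER CHECK (age BETWEEN 0 AND 130),
    medical_issue TEXT
)
"""


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Salted PBKDF2-HMAC-SHA256 (standard-library stand-in for bcrypt)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)  # constant-time comparison


# Option A: one fixed test user; only the salted hash is kept in memory.
USERS = {"patient1": hash_password("pass123")}
# In-memory session state: session token -> data for the logged-in user.
SESSIONS: dict = {}


def login(username: str, password: str) -> Optional[str]:
    """Return a new session token on success, None on bad credentials."""
    record = USERS.get(username)
    if record is None or not verify_password(password, *record):
        return None
    token = secrets.token_hex(16)
    SESSIONS[token] = {"username": username, "patient_id": None}
    return token


def save_profile(conn: sqlite3.Connection, token: str, name: str,
                 age: int, medical_issue: str) -> int:
    """Store the patient's details and link the new patient_id to the session."""
    session = SESSIONS.get(token)
    if session is None:
        raise PermissionError("not logged in")
    if not 0 <= age <= 130:
        raise ValueError("age out of range")
    with conn:
        cur = conn.execute(
            "INSERT INTO patients (name, age, medical_issue) VALUES (?, ?, ?)",
            (name, age, medical_issue),
        )
    session["patient_id"] = cur.lastrowid  # used later to link bookings
    return cur.lastrowid
```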
3. Query Handling and Booking
a. The AI assistant must parse natural language queries from logged-in
users and respond accordingly. Supported query types:
i. Diagnosis/Symptoms: User describes issue (e.g., "I have chest
pain"); agent suggests relevant department (integrate with TASK 2
output if implemented early).
ii. Availability Check: e.g., "Show available doctors in Neurology on
Wednesday"; query DB and list free slots.
iii. Booking Request: e.g., "Book a slot with Dr. Smith on Friday at 2
PM"; update is_booked to True if available, else suggest
alternatives.
b. Use multiple agents (e.g., Intent Classifier Agent → DB Query Agent →
Response Generator Agent).
c. Hint: For NLP parsing, use regex/simple rules or lightweight libraries like
spaCy/Rasa NLU. Ensure bookings are atomic (use transactions to avoid
race conditions). Provide confirmation emails/simulated notifications.
d. Use an LLM of your choice.
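The pipeline in item b can be sketched with three plain functions standing in for the agents, assuming the doctors_availability table from requirement 1 and regex rules for parsing; an LLM or the TASK 2 classifier could replace intent_agent and response_agent. The keyword rules, the "2 PM" to "14:00-15:00" slot conversion, and all function names are illustrative assumptions. Bookings use a single conditional UPDATE inside a transaction, so the availability check and the write cannot be interleaved by another request; when the slot is taken, the agent suggests the doctor's other free slots that day.

```python
import re
import sqlite3

DEPARTMENTS = ["Cardiology", "Neurology", "Orthopedics", "Pediatrics",
               "Dermatology", "General Medicine"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# Hypothetical keyword -> department rules; a TASK 2 classifier or LLM could replace them.
SYMPTOM_RULES = {"chest": "Cardiology", "heart": "Cardiology", "headache": "Neurology",
                 "dizz": "Neurology", "knee": "Orthopedics", "fracture": "Orthopedics",
                 "child": "Pediatrics", "rash": "Dermatology", "skin": "Dermatology"}
NEEDED = {"availability": ("department", "day"), "book": ("doctor", "day", "slot")}


def to_slot(hour: int, meridiem: str) -> str:
    """Map 2 + 'pm' to '14:00-15:00' (assumes one-hour slots starting on the hour)."""
    h = hour % 12 + (12 if meridiem == "pm" else 0)
    return f"{h:02d}:00-{h + 1:02d}:00"


def intent_agent(text: str) -> dict:
    """Intent Classifier Agent: regex rules returning an intent plus extracted fields."""
    t = text.lower()
    day = next((d for d in DAYS if d.lower() in t), None)
    if re.search(r"\bbook\b", t):
        doctor = re.search(r"\bdr\.?\s+([a-z]+)", t)
        time = re.search(r"\b(\d{1,2})(?::00)?\s*(am|pm)\b", t)
        return {"intent": "book", "day": day,
                "doctor": f"Dr. {doctor.group(1).title()}" if doctor else None,
                "slot": to_slot(int(time.group(1)), time.group(2)) if time else None}
    if re.search(r"\b(available|availability|free)\b", t):
        department = next((d for d in DEPARTMENTS if d.lower() in t), None)
        return {"intent": "availability", "department": department, "day": day}
    department = next((d for k, d in SYMPTOM_RULES.items() if k in t), "General Medicine")
    return {"intent": "symptoms", "department": department}


def book_slot(conn: sqlite3.Connection, doctor: str, day: str, slot: str) -> bool:
    """Atomic booking: check-and-set happens in one conditional UPDATE."""
    with conn:  # transaction; SQLite serializes writers, so only one request can win
        cur = conn.execute(
            "UPDATE doctors_availability SET is_booked = 1 WHERE name = ? "
            "AND day_of_week = ? AND time_slot = ? AND is_booked = 0",
            (doctor, day, slot))
    return cur.rowcount == 1


def db_agent(conn: sqlite3.Connection, req: dict) -> dict:
    """DB Query Agent: validates required fields, then reads or updates the DB."""
    missing = [k for k in NEEDED.get(req["intent"], ()) if not req.get(k)]
    if missing:
        return {"intent": "clarify", "missing": missing}
    if req["intent"] == "availability":
        rows = conn.execute(
            "SELECT name, time_slot FROM doctors_availability WHERE department = ? "
            "AND day_of_week = ? AND is_booked = 0 ORDER BY time_slot",
            (req["department"], req["day"])).fetchall()
        return {**req, "rows": rows}
    if req["intent"] == "book":
        booked = book_slot(conn, req["doctor"], req["day"], req["slot"])
        alternatives = [] if booked else [r[0] for r in conn.execute(
            "SELECT time_slot FROM doctors_availability WHERE name = ? "
            "AND day_of_week = ? AND is_booked = 0 ORDER BY time_slot",
            (req["doctor"], req["day"]))]
        return {**req, "booked": booked, "alternatives": alternatives}
    return req


def response_agent(res: dict) -> str:
    """Response Generator Agent: fixed templates here; an LLM could phrase these."""
    if res["intent"] == "clarify":
        return "Please specify: " + ", ".join(res["missing"]) + "."
    if res["intent"] == "symptoms":
        return f"Your symptoms suggest {res['department']}."
    if res["intent"] == "availability":
        if not res["rows"]:
            return f"No free slots in {res['department']} on {res['day']}."
        return "\n".join(f"{name}: {slot}" for name, slot in res["rows"])
    if res["booked"]:
        return f"Confirmed: {res['doctor']}, {res['day']} {res['slot']} (simulated email sent)."
    options = ", ".join(res["alternatives"]) or "none that day"
    return f"That slot is taken or does not exist. Free slots with {res['doctor']}: {options}."


def handle(conn: sqlite3.Connection, text: str) -> str:
    """Pipeline: Intent Classifier -> DB Query -> Response Generator."""
    return response_agent(db_agent(conn, intent_agent(text)))
```

Because each function takes and returns plain dictionaries, any stage could later be replaced by an LLM-backed agent (for example, wrapped as a LangChain or crewAI tool) without changing the SQL.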
Deliverables
• Full source code (Python/SQL).
• Sample database dump (e.g., .sql file).
• Demo script or UI to showcase 3-5 end-to-end interactions.
• Brief report (1 page): Architecture diagram (agents' roles), challenges, and how it
scales.
Evaluation Criteria: Functionality (80%), Code quality/readability (10%), Edge case
handling (10%).
TASK 2: Fine-Tuning for Query Classification
Problem Statement
Enhance the TASK 1 agent by integrating a fine-tuned Transformer model to
automatically classify patient symptom queries into the appropriate medical
department. This will route users to the right specialists efficiently. Focus on resource
efficiency: use a small model and a limited dataset.
Requirements
1. Model Selection
a. Load a pretrained Transformer from Hugging Face Hub (≤500M
parameters for feasibility on consumer hardware).
b. Recommended: GPT-2 (small), or a LLaMA or Gemma variant small enough to stay within this limit.
c. Hint: Use transformers library. Install via pip if needed (assume local
env). Test on CPU if no GPU.
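To check a candidate against the ≤500M budget, the parameter count of a GPT-2-style model can be computed from its configuration. The sketch assumes the standard GPT-2 layout (tied input/output embeddings, learned position embeddings, 4x MLP expansion) and a bias-free linear classification head; the function names are hypothetical. It reproduces the roughly 124M (small) and 355M (medium) totals usually quoted for GPT-2.

```python
def gpt2_param_count(vocab: int = 50257, n_ctx: int = 1024, d: int = 768,
                     n_layer: int = 12, num_labels: int = 0) -> int:
    """Parameters of a GPT-2-style decoder with tied input/output embeddings."""
    embeddings = vocab * d + n_ctx * d  # token + learned position embeddings
    per_layer = (
        2 * d                    # ln_1 (weight + bias)
        + d * 3 * d + 3 * d      # attn.c_attn: fused Q, K, V projection
        + d * d + d              # attn.c_proj
        + 2 * d                  # ln_2
        + d * 4 * d + 4 * d      # mlp.c_fc (4x expansion)
        + 4 * d * d + d          # mlp.c_proj
    )
    final_ln = 2 * d
    head = d * num_labels        # classification head, assumed bias-free
    return embeddings + n_layer * per_layer + final_ln + head


def weights_gib(n_params: int, bytes_per_param: int = 4) -> float:
    """Approximate weight memory: 4 bytes/param in fp32, 2 in fp16/bf16."""
    return n_params * bytes_per_param / 2**30
```

With these numbers, fp32 weights of GPT-2 small take about 0.46 GiB. Full fine-tuning with Adam needs roughly four times that for weights, gradients, and optimizer states before counting activations, which is where LoRA and smaller batches help.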
2. Pre-Fine-Tuning Evaluation
a. Test the base model on a held-out validation set (10-20% of dataset).
b. Metrics: Accuracy, Precision, Recall, F1-score (macro-averaged for multi-
class).
c. Hint: Use [Link] for evaluation. Report baseline performance; expect roughly
chance-level scores for the untrained model (about 1/K accuracy for K balanced classes).
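The metrics in item b can be computed without extra dependencies, as sketched below (macro_report is a hypothetical name). Macro averaging computes precision, recall, and F1 per class and then takes the unweighted mean, which matches the average="macro" option of scikit-learn's metric functions. For K balanced classes, a chance-level baseline scores about 1/K accuracy, e.g. 12.5% with the eight example departments.

```python
from collections import Counter


def macro_report(y_true: list, y_pred: list) -> dict:
    """Accuracy plus macro-averaged precision, recall and F1 (every class weighted equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but wrongly
            fn[t] += 1  # true class t was missed

    def div(a: float, b: float) -> float:
        return a / b if b else 0.0  # treat 0/0 as 0, a common convention

    precision = [div(tp[c], tp[c] + fp[c]) for c in labels]
    recall = [div(tp[c], tp[c] + fn[c]) for c in labels]
    f1 = [div(2 * p * r, p + r) for p, r in zip(precision, recall)]
    n = len(labels)
    return {
        "accuracy": div(sum(tp.values()), len(y_true)),
        "precision_macro": sum(precision) / n,
        "recall_macro": sum(recall) / n,
        "f1_macro": sum(f1) / n,
    }
```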
3. Dataset Preparation
a. Create/use a labeled dataset for department classification.
b. Minimum 5 departments (use or expand this example set): Cardiology,
Neurology, Orthopedics, Pediatrics, Dermatology, Oncology, ENT,
General Medicine.
c. Format (CSV/JSON):
query_text                           department_label
"I have a headache and dizziness"    Neurology
"My child has a fever"               Pediatrics
d. Size: approx 1,000 samples total (e.g., 80-100 per department).
e. Sources: Hugging Face Datasets, or synthetically generated data.
f. Split: 80% train, 10% val, 10% test.
g. Hint: For synthetic data, use templates like "Patient reports [symptom] in
[body part]" mapped to departments. Ensure balance across classes.
Augment with paraphrasing if needed (via nlpaug library).
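Following the template hint in item g, the sketch below generates a balanced synthetic dataset, splits it 80/10/10 per class (stratified, so every department appears in each split), and serializes it in the CSV format of item c. The symptom phrases and templates are hypothetical placeholders covering five departments. With four phrases and five templates, each department yields only 20 unique queries, so the lists would need to grow (or be augmented by paraphrasing) to reach 80-100 per class. Sampling only unique template/phrase combinations keeps identical queries from landing in both train and test.

```python
import csv
import io
import itertools
import random

# Hypothetical symptom phrases (4 per department); extend them for ~80-100 queries per class.
SYMPTOMS = {
    "Cardiology": ["chest pain", "heart palpitations",
                   "shortness of breath when climbing stairs", "an irregular heartbeat"],
    "Neurology": ["a headache and dizziness", "numbness in my left arm",
                  "frequent migraines", "sudden memory lapses"],
    "Orthopedics": ["knee pain after running", "a swollen ankle", "lower back pain",
                    "a stiff shoulder"],
    "Pediatrics": ["my child's high fever", "my toddler's persistent cough",
                   "my baby's diaper rash", "my son's ear infection"],
    "Dermatology": ["an itchy rash on my arms", "a mole that changed shape", "severe acne",
                    "dry, flaky skin"],
}
TEMPLATES = ["I'm worried about {s}.", "Symptoms: {s}.", "Looking for help with {s}.",
             "Can a doctor see me about {s}?", "What should I do about {s}?"]


def generate(n_per_class: int, seed: int = 0) -> list:
    """Return n_per_class unique (query_text, department_label) pairs per department."""
    rng = random.Random(seed)
    rows = []
    for department, phrases in SYMPTOMS.items():
        queries = [t.format(s=p) for t, p in itertools.product(TEMPLATES, phrases)]
        if len(queries) < n_per_class:
            raise ValueError(f"only {len(queries)} unique queries for {department}")
        rng.shuffle(queries)
        rows += [(q, department) for q in queries[:n_per_class]]  # balanced classes
    return rows


def stratified_split(rows: list, train: float = 0.8, val: float = 0.1, seed: int = 0) -> dict:
    """Split each class separately so train/val/test keep the same class balance."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    splits = {"train": [], "val": [], "test": []}
    for items in by_label.values():
        rng.shuffle(items)
        n_train, n_val = round(train * len(items)), round(val * len(items))
        splits["train"] += items[:n_train]
        splits["val"] += items[n_train:n_train + n_val]
        splits["test"] += items[n_train + n_val:]  # remainder, about 10%
    return splits


def to_csv(rows: list) -> str:
    """Serialize rows in the query_text,department_label format from item c."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["query_text", "department_label"])
    writer.writerows(rows)
    return buf.getvalue()
```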
4. Fine-Tuning
a. Fine-tune for 3-5 epochs (use LoRA/PEFT for efficiency if params are
tight).
b. Hyperparams: Learning rate 2e-5, batch size 8-16 (adjust for RAM).
c. Hint: Use Trainer API from Hugging Face. Monitor for overfitting with early
stopping. If hardware limits, subsample data or use Google Colab (free
tier).
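The helpers below are back-of-the-envelope sketches for items a-c; all three function names are hypothetical. LoRA freezes a weight matrix of shape d_in x d_out and trains two small matrices of shapes d_in x r and r x d_out whose product is the update, adding r(d_in + d_out) trainable parameters per adapted matrix. For GPT-2 small with rank 8 on the fused attention projection c_attn (768 x 2304) in all 12 layers, that is 294,912 parameters, roughly 0.24% of the model; the classification head adds 768 x K more if it is also trained. When memory forces a smaller per-device batch, gradient accumulation keeps the effective batch (per-device batch x accumulation steps) in the 8-16 range; in the Trainer API these correspond to the per_device_train_batch_size and gradient_accumulation_steps arguments. should_stop is a simplified version of patience-based early stopping; Hugging Face provides EarlyStoppingCallback (early_stopping_patience) for the same purpose.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int, n_matrices: int) -> int:
    """LoRA trains a d_in x rank and a rank x d_out matrix per adapted weight."""
    return n_matrices * rank * (d_in + d_out)


def grad_accum_steps(target_batch: int, per_device_batch: int) -> int:
    """Smallest number of accumulation steps giving an effective batch >= target_batch."""
    return -(-target_batch // per_device_batch)  # ceiling division


def should_stop(val_scores: list, patience: int = 2) -> bool:
    """Stop once the best validation score is `patience` evaluations old."""
    if len(val_scores) <= patience:
        return False
    best = max(range(len(val_scores)), key=val_scores.__getitem__)  # first best index
    return len(val_scores) - 1 - best >= patience
```

For example, if batch size 16 runs out of memory, a per-device batch of 4 with grad_accum_steps(16, 4) == 4 accumulation steps keeps the same effective batch size.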
5. Post-Fine-Tuning Evaluation
a. Re-evaluate on the same validation/test set.
b. Demonstrate measurable improvement (e.g., a macro-F1 gain of more than 20%).
c. Integrate output into TASK 1 (e.g., classification feeds into availability
query).
d. Hint: Visualize metrics with confusion matrix (matplotlib/seaborn).
Discuss limitations (e.g., model hallucinations on rare symptoms).
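For items b and d, the sketch below builds a confusion matrix in plain Python (rows are true departments, columns are predicted ones). It reports the F1 change in two forms, since a "20% gain" could be read as absolute percentage points or as a relative improvement. The function names are hypothetical. The matrix can be passed to seaborn.heatmap(matrix, annot=True, xticklabels=labels, yticklabels=labels) for the visualization, and the largest off-diagonal cells point to the department pairs worth discussing under limitations.

```python
def confusion_matrix(y_true: list, y_pred: list, labels: list) -> list:
    """matrix[i][j] = number of queries of true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def most_confused(matrix: list, labels: list, top: int = 3) -> list:
    """Largest off-diagonal cells as (count, true_label, predicted_label)."""
    pairs = [(matrix[i][j], labels[i], labels[j])
             for i in range(len(labels)) for j in range(len(labels))
             if i != j and matrix[i][j]]
    return sorted(pairs, reverse=True)[:top]


def f1_gain(before: float, after: float) -> tuple:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = (after - before) * 100
    relative = (after - before) / before * 100 if before else float("inf")
    return absolute, relative
```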
Deliverables
• Tuesday Plan (due Jan 6): Outline the dataset, model choice, fine-tuning script
skeleton, expected metrics, and hardware mitigation (e.g., "If OOM error, reduce
batch to 4").
• Final Submission: Trained model (HF Hub upload if possible), evaluation
notebook, integration code snippet.