Module IV: Adaptation, Evaluation, Safety and
Deployment
1. LoRA (Low-Rank Adaptation)
What it is:
LoRA is a method to fine-tune large language models efficiently without retraining all the
parameters.
Instead of changing the whole network, LoRA freezes the original weights and adds small
trainable low-rank matrices that adapt the model for a new task.
Example:
Imagine you have a large model trained on general English.
If you want it to write medical reports, instead of retraining everything, LoRA adds small
“update blocks” that learn medical patterns — saving memory and time.
Why it matters:
• Reduces computation and storage needs
• Allows quick domain adaptation
• Widely used for fine-tuning open-source LLMs and for Stable Diffusion custom styles
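The "update blocks" idea can be sketched with NumPy. This is a minimal illustration, not a training setup: the layer size, the rank, and the names `W`, `A`, `B`, and `lora_forward` are assumptions made for the example. The frozen weight `W` stays untouched, and only the two small matrices `A` and `B` would be trained.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, rank = 512, 512, 8   # hypothetical layer size and LoRA rank

# Frozen pre-trained weight: never updated during fine-tuning.
W = rng.standard_normal((d_out, d_in))

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially behaves exactly like the original one.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def lora_forward(x, scale=1.0):
    """Compute x @ (W + scale * B @ A).T without building the full update."""
    return x @ W.T + scale * (x @ A.T) @ B.T

x = rng.standard_normal((4, d_in))
full_params = W.size
lora_params = A.size + B.size
print(f"full layer: {full_params} parameters, LoRA update: {lora_params} parameters")
```

With these assumed sizes the trainable update holds 8,192 values against 262,144 in the frozen layer, which is where the memory and time savings come from.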
2. Adapters
What it is:
Adapters are small neural layers inserted between the layers of a pre-trained model.
When fine-tuning, only these adapters are trained — not the whole model.
Example:
Think of adapters like “plug-ins” in a big system.
You can train one adapter for sentiment analysis and another for translation — all using the
same base model.
Advantages:
• You can switch adapters for different tasks
• Keeps the main model frozen → faster updates
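The "plug-in" behaviour can be illustrated with a small NumPy sketch. It assumes the common bottleneck design (project down, apply a non-linearity, project up, add a residual connection); the sizes and the names `make_adapter`, `apply_adapter`, and `adapters` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, bottleneck = 64, 8   # hypothetical sizes

def make_adapter():
    """One adapter = a down-projection and an up-projection (both trainable)."""
    return {
        "down": rng.standard_normal((hidden, bottleneck)) * 0.01,
        "up": np.zeros((bottleneck, hidden)),  # zero init: adapter starts as identity
    }

def apply_adapter(h, adapter):
    """Bottleneck adapter with a residual connection around it."""
    z = np.maximum(h @ adapter["down"], 0.0)   # ReLU non-linearity
    return h + z @ adapter["up"]

# One frozen base model, several small task-specific adapters.
adapters = {"sentiment": make_adapter(), "translation": make_adapter()}

h = rng.standard_normal((2, hidden))   # stand-in for a frozen layer's output
out = apply_adapter(h, adapters["sentiment"])   # switch tasks by switching the key
```

Switching tasks only changes which small dictionary entry is used; the base model's weights are shared and never modified.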
3. PEFT (Parameter-Efficient Fine-Tuning)
What it is:
A general term for techniques like LoRA, Adapters, and others that reduce the number of
trainable parameters during fine-tuning.
Example:
If a model has 1 billion parameters, PEFT might only train 10 million — achieving almost
the same performance with less cost.
Why used:
• Cheaper fine-tuning
• Suitable for limited GPU systems
• Ideal for classroom or research-level custom LLMs
4. Alignment: RLHF (Reinforcement Learning from
Human Feedback)
What it is:
A process that trains models to behave in ways that match human preferences and values.
Humans rate model responses → feedback helps train it to prefer helpful and safe answers.
Steps:
1. Model generates outputs
2. Humans rate good vs bad responses
3. A “reward model” learns from this
4. The main model is updated using reinforcement learning
Example:
Used in ChatGPT, where human feedback steers the model toward polite, relevant, and more
factual answers. It reduces bad answers but does not guarantee correctness.
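Step 3 is often implemented with a pairwise loss: the reward model should score the answer humans preferred higher than the one they rejected. The sketch below assumes that common formulation; `reward_model_loss` and the example scores are illustrative, not taken from any specific system.

```python
import math

def reward_model_loss(r_chosen, r_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    r_chosen and r_rejected are the scalar scores the reward model gives
    to the human-preferred and the rejected response.
    """
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))

# The loss is small when the preferred answer already scores higher...
good_ranking = reward_model_loss(r_chosen=2.0, r_rejected=0.0)
# ...and large when the reward model ranks the pair the wrong way round.
bad_ranking = reward_model_loss(r_chosen=0.0, r_rejected=2.0)
print(good_ranking < bad_ranking)
```

In step 4 the trained reward model then supplies the reward signal for the reinforcement learning update of the main model.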
5. DPO (Direct Preference Optimization)
What it is:
A newer and simpler alternative to RLHF.
Instead of reinforcement learning, it directly optimizes the model to prefer human-
preferred answers using pairwise comparisons.
Example:
If humans prefer Answer A over Answer B, DPO teaches the model that “A is better.”
It’s like a simpler “preference training” system.
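A minimal sketch of the DPO objective for a single preference pair follows. It assumes the usual formulation, in which the model being trained (the policy) is compared against a frozen reference copy; the function name, the argument names, and the default `beta` are illustrative.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one (preferred, rejected) answer pair.

    Each argument is the log-probability of a whole answer under either
    the policy being trained or the frozen reference model.
    """
    chosen_gain = policy_chosen - ref_chosen        # how much more the policy likes A
    rejected_gain = policy_rejected - ref_rejected  # how much more the policy likes B
    logits = beta * (chosen_gain - rejected_gain)
    return math.log1p(math.exp(-logits))            # equals -log(sigmoid(logits))

# Policy identical to the reference: no preference has been learned yet.
start = dpo_loss(-10.0, -10.0, -10.0, -10.0)
# Policy now assigns more probability to A and less to B: the loss drops.
after = dpo_loss(-8.0, -12.0, -10.0, -10.0)
print(after < start)
```

No reward model and no reinforcement learning loop appear here, which is the simplification DPO offers over RLHF.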
6. Safety Filters
What it is:
Rules and models that block harmful or unsafe outputs before showing results to the user.
Example:
• Filtering hate speech or violent text
• Preventing disallowed image generation in Stable Diffusion
• Flagging sensitive or private information
These ensure AI safety and responsible deployment.
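The "rules" part of a safety filter can be sketched in a few lines of Python. This is a toy example under stated assumptions: `BLOCKED_TERMS` is a placeholder list, the e-mail pattern is a rough approximation, and production filters usually combine such rules with trained classifiers.

```python
import re

BLOCKED_TERMS = {"badword1", "badword2"}            # placeholder blocklist
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def check_output(text):
    """Return (allowed, reasons) for a candidate model output."""
    reasons = []
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    if words & BLOCKED_TERMS:
        reasons.append("blocked term")
    if EMAIL_PATTERN.search(text):
        reasons.append("possible private information (email)")
    return (not reasons, reasons)

print(check_output("The weather is nice today."))
print(check_output("You can reach her at jane.doe@example.com"))
```

The filter runs on the model's output before it is shown, so a flagged response can be blocked, rewritten, or sent for review.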
7. Red-Teaming
What it is:
Testing an AI system by trying to “break it” — finding weaknesses, unsafe behaviors, or
bias.
Red team testing in AI refers to a systematic approach to identifying vulnerabilities, risks, and
potential harms in AI systems, particularly large language models (LLMs). This process involves
adversarial probing and testing to uncover weaknesses, ensuring the AI system is robust, secure,
and aligned with responsible AI principles.
Red teaming is essential for identifying harmful outputs such as hate speech, incitement to
violence, or other unintended consequences. It complements systematic measurement and
mitigation strategies by exposing risks and enabling the development of effective safeguards.
Key Steps in Red Team Testing
1. Planning the Team: Assemble a diverse group of testers with expertise in AI, security,
and the specific domain of the application. Include individuals with both adversarial
mindsets and ordinary user perspectives to uncover a wide range of risks.
2. Defining the Scope: Test at multiple layers, including the base LLM and the
application interface. Evaluate the system both before and after implementing
mitigation strategies to assess their effectiveness.
3. Testing Methodology: Begin with open-ended testing to explore a broad range of
potential harms. Document findings and iteratively refine the testing process by
focusing on identified risks and their mitigations.
4. Data Collection: Establish a structured approach for recording inputs, outputs, and
findings. Use shared tools like spreadsheets to facilitate collaboration and avoid
duplication.
5. Active Monitoring: During testing, provide support to testers, monitor progress, and
address any access or instruction issues promptly.
6. Reporting Results: Share concise reports with stakeholders, highlighting key issues,
raw data, and plans for future testing. Clearly differentiate between identifying risks
and measuring their prevalence.
Example:
Before release, companies have internal teams who try to get the AI to:
• Reveal private data
• Say offensive things
• Give unsafe advice
Goal → find & fix these problems early.
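The structured data collection described in step 4 could look like the sketch below. The `Finding` fields and the CSV layout are assumptions chosen for illustration; a real red team would agree on its own schema.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields

@dataclass
class Finding:
    tester: str
    prompt: str
    output: str
    harm_category: str
    mitigated: bool

def write_findings(findings, handle):
    """Write findings as CSV rows so testers share one consistent format."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(Finding)])
    writer.writeheader()
    for finding in findings:
        writer.writerow(asdict(finding))

findings = [
    Finding("tester_a", "example adversarial prompt", "example model output",
            "unsafe advice", False),
]
buffer = io.StringIO()          # stands in for a shared spreadsheet file
write_findings(findings, buffer)
print(buffer.getvalue())
```

Recording every input and output in the same columns makes it easier to spot duplicates and to re-test the same prompts after a mitigation is added.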
8. Hallucinations
What it is:
When an AI confidently generates false or made-up information.
Example:
You ask: “Who invented solar power in 1990?”
AI replies: “Dr. Alan Peters invented it.” — (no such person exists!)
Hallucinations happen because models generate statistically likely text rather than
retrieving verified facts.
Solution:
• Use factual retrieval systems (like RAG)
• Improve data quality
• Add fact-checking layers
9. Toxicity
What it is:
AI producing harmful, offensive, or biased text.
May reflect biases in the data (e.g., gender, race, or religion).
Example:
If a model trained on internet comments generates rude or discriminatory text — that’s
toxicity.
Mitigation:
• Apply safety filters
• Use balanced, curated datasets
• Conduct bias audits
10. Bias
What it is:
When an AI system systematically favors or disadvantages certain groups, usually without
intent, because of biased training data.
Example:
If most resumes in training data belong to men, an AI hiring model might favor male
candidates.
Prevention:
• Diverse datasets
• Bias detection tools
• Transparent evaluation
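One simple bias check for the hiring example is to compare selection rates between groups. The sketch below uses made-up records and hypothetical helper names (`selection_rates`, `parity_gap`); it shows one possible metric, not a complete bias audit.

```python
from collections import defaultdict

def selection_rates(records):
    """records: iterable of (group, selected) pairs; returns the rate per group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, was_selected in records:
        totals[group] += 1
        selected[group] += int(was_selected)
    return {group: selected[group] / totals[group] for group in totals}

def parity_gap(rates):
    """Difference between the highest and lowest group selection rate."""
    return max(rates.values()) - min(rates.values())

# Made-up model decisions for eight candidates.
records = [("men", True), ("men", True), ("men", False), ("men", True),
           ("women", True), ("women", False), ("women", False), ("women", False)]
rates = selection_rates(records)
print(rates, parity_gap(rates))
```

A large gap does not prove discrimination on its own, but it signals that the model and its training data deserve a closer look.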
11. Watermarking
What it is:
A hidden signal embedded in AI-generated content (text or images) to identify it as AI-
generated.
Example:
OpenAI and Google are exploring watermarking so that AI-generated text and images, including
fake news or AI-generated art, can be identified as machine-made.
Purpose:
• Detect misuse
• Verify authenticity
• Support digital ethics
12. Provenance
What it is:
Tracking the origin and creation history of AI-generated content — like a “chain of
custody.”
Example:
An image with metadata showing it was created by Stable Diffusion on a specific date using
a certain prompt.
This helps ensure transparency and accountability in generative AI outputs.
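A provenance record can be sketched as metadata plus a hash of the content. The field names and values below are illustrative assumptions; real systems typically use signed manifests (for example, the C2PA standard) rather than a plain dictionary.

```python
import hashlib
import json

def provenance_record(content: bytes, generator: str, created: str, prompt: str):
    """Build a metadata record tied to the exact bytes of the content."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "generator": generator,
        "created": created,
        "prompt": prompt,
    }

def matches(content: bytes, record) -> bool:
    """Check that the content has not changed since the record was made."""
    return hashlib.sha256(content).hexdigest() == record["sha256"]

image_bytes = b"placeholder bytes standing in for an image file"
record = provenance_record(image_bytes, "Stable Diffusion", "2024-01-01",
                           "a watercolor lighthouse")
print(json.dumps(record, indent=2))
print(matches(image_bytes, record), matches(image_bytes + b"edited", record))
```

Because the hash changes when the content changes, the record only vouches for the exact file it was created from.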
Concept | Meaning (Simple) | Real-Life Example
LoRA | Small efficient model fine-tuning | Custom chatbot for education
Adapters | Plug-in modules for specific tasks | Switching models for translation/sentiment
PEFT | Fine-tuning few parameters | Save GPU cost
RLHF | Learn from human feedback | ChatGPT’s polite responses
DPO | Simple alignment training | Directly train on preferred answers
Safety Filters | Block unsafe outputs | Filter hate speech
Red-Teaming | Test AI for weaknesses | Internal safety audits
Hallucination | False but confident info | Made-up facts
Toxicity | Harmful content | Offensive comments
Bias | Unfair results | Gender or racial bias
Watermarking | Tag AI-generated content | Detect fake media
Provenance | Trace origin of content | Verify AI-generated images
A. System Aspects
These are the technical challenges and optimizations used when deploying large AI models
(like ChatGPT, GPT-4, Stable Diffusion, etc.) so they can serve millions of users efficiently.
Serving Large Models
What it means:
“Serving” means deploying and running a trained AI model so that users can access it
through APIs or apps.
Challenges:
• Large models (billions of parameters) need powerful GPUs or TPUs.
• Must handle many users at once without lag.
• Need systems for scaling, load balancing, and monitoring.
Example:
OpenAI serves ChatGPT through cloud servers optimized for large-scale inference. When
you type a prompt, your request goes to a cluster of GPUs that generates the answer in
real time.
Key takeaway:
Efficient serving = smooth user experience.
Quantization
What it means:
Quantization reduces the precision of numbers (weights and activations) used by a model —
e.g., from 32-bit floating-point (FP32) to 8-bit integers (INT8).
Why:
To reduce memory usage and increase speed, while keeping accuracy almost the same.
Example:
If GPT-3 (175B parameters) is quantized from FP32 (4 bytes per weight) to INT8 (1 byte per
weight), its weight storage drops from about 700 GB to about 175 GB, making it cheaper and
often faster to run.
Real analogy:
Like compressing a high-quality photo — smaller size, similar clarity.
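The idea can be sketched with NumPy. The example assumes simple symmetric per-tensor quantization, one of several possible schemes, and the helper names are illustrative; the last lines redo the storage arithmetic for a 175B-parameter model.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1000).astype(np.float32)   # stand-in for model weights
q, scale = quantize_int8(w)
max_error = np.max(np.abs(w - dequantize(q, scale)))
print(f"bytes: {w.nbytes} -> {q.nbytes}, max rounding error: {max_error:.4f}")

# Storage arithmetic for weights only (activations and overhead are ignored).
params = 175e9
fp32_gb = params * 4 / 1e9   # 4 bytes per FP32 weight
int8_gb = params * 1 / 1e9   # 1 byte per INT8 weight
print(fp32_gb, int8_gb)
```

Each weight is off by at most about half a quantization step, which is why accuracy usually changes only slightly.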
Distillation
What it means:
Model distillation is the process of transferring knowledge from a large, complex model
(called the teacher) to a smaller, simpler model (called the student).
Why:
• To make models faster and lighter for edge devices (phones, browsers).
• Maintain similar performance with less computation.
Example:
• DistilBERT → a smaller, faster version of BERT.
• Used in chatbots and mobile apps where full BERT is too heavy.
Analogy:
Like a professor teaching the main ideas to a student — smaller but still knowledgeable.
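A common way to "teach" the student is to have it match the teacher's softened output probabilities. The sketch below assumes that soft-target approach with a temperature parameter; the logits are made-up numbers and the function names are illustrative.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer result."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's outputs."""
    p = softmax(teacher_logits, temperature)   # teacher distribution
    q = softmax(student_logits, temperature)   # student distribution
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]
print(distillation_loss(close_student, teacher) < distillation_loss(far_student, teacher))
```

Training the student to minimize this loss pushes its predictions toward the teacher's, including the teacher's relative confidence in the wrong classes.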
Latency and Throughput
Latency = How long it takes to get one response.
Throughput = How many requests can be handled per second.
Example:
• Low latency: ChatGPT replies instantly → good user experience.
• High throughput: Model handles 1000 users simultaneously → efficient server usage.
Optimization methods:
• Use GPUs, batching, and caching.
• Compress and optimize models.
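Batching shows why latency and throughput pull in different directions. The toy cost model below is an assumption made for illustration (a fixed overhead per batch plus a small cost per request); real numbers depend on the hardware and the model.

```python
def batch_stats(batch_size, overhead_s=0.05, per_request_s=0.01):
    """Return (latency in seconds, throughput in requests/second) for one batch.

    Assumes every request in the batch waits until the whole batch finishes.
    """
    batch_time = overhead_s + per_request_s * batch_size
    return batch_time, batch_size / batch_time

for size in (1, 8, 32):
    latency, throughput = batch_stats(size)
    print(f"batch={size:2d}  latency={latency:.2f}s  throughput={throughput:.1f} req/s")
```

Larger batches raise throughput because the fixed overhead is shared, but every user in the batch waits longer, so serving systems pick a batch size that balances the two.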
Cost and Carbon Accounting
What it means:
Training and running large models are expensive and energy-intensive.
• Cost → GPUs, storage, cloud infrastructure, staff.
• Carbon accounting → Measuring and reducing environmental impact (energy use,
CO₂ emissions).
Example:
• Training GPT-3 is estimated to have cost millions of dollars and to have emitted hundreds
of tonnes of CO₂.
• Companies now use green data centers and energy-efficient chips.
Responsible AI practice:
Balance performance with sustainability — design eco-friendly AI systems.
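A first-order carbon estimate multiplies energy use by the carbon intensity of the electricity. All numbers in the sketch below are hypothetical inputs, and `training_emissions_kg` is an illustrative helper; PUE (power usage effectiveness) accounts for data-center overhead such as cooling.

```python
def training_emissions_kg(num_gpus, gpu_power_kw, hours, pue, kg_co2_per_kwh):
    """Return (energy in kWh, emissions in kg CO2) for a training run."""
    energy_kwh = num_gpus * gpu_power_kw * hours * pue
    return energy_kwh, energy_kwh * kg_co2_per_kwh

# Hypothetical run: 8 GPUs at 0.3 kW each for 100 hours,
# PUE of 1.2, grid intensity of 0.4 kg CO2 per kWh.
energy, co2 = training_emissions_kg(8, 0.3, 100, 1.2, 0.4)
print(f"{energy:.0f} kWh, {co2:.1f} kg CO2")
```

The same formula shows the two main levers: use less energy (efficient chips, smaller models) or use cleaner electricity (a lower carbon intensity).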
B. Legal and Ethical Issues
These cover rules, responsibilities, and ethics of developing and deploying generative AI.
Copyright
What it means:
AI-generated content may use or resemble copyrighted works, raising legal questions about
ownership.
Example:
If an AI generates a song similar to a Beatles track, who owns it — the user, AI developer, or
original artist?
Key principles:
• Using copyrighted data for training may need permission or licenses.
• Generated content must not copy existing works.
Data Licensing
What it means:
Ensuring that data used to train models is legally obtained and used according to its license.
Example:
• “CC-BY” (Creative Commons Attribution) → can use but must give credit.
• Proprietary datasets → require paid access.
Ethical rule:
Always know where your data comes from and respect creators’ rights.
Responsible GenAI Governance
What it means:
Setting up rules, guidelines, and policies for responsible development, deployment, and
monitoring of AI.
Includes:
• Transparency (how models work)
• Fairness (no discrimination)
• Accountability (who is responsible for outputs)
• Privacy (protecting user data)
Example:
Organizations have AI ethics boards that review projects for potential harm before launch.
1. Adaptation
Meaning:
Adapting a pre-trained AI model (like ChatGPT, BERT, or Stable Diffusion) to a specific task or
dataset.
Instead of training from scratch, we slightly modify or fine-tune it to perform better for our purpose.
Example:
• GPT can write essays, but we can fine-tune it to answer only medical or engineering
questions.
• Stable Diffusion can generate any image, but we can adapt it to make architectural
designs only.
2. Evaluation
Meaning:
Once adapted, we must check how good and safe the AI model is.
➤ Evaluation Focus:
1. Performance:
• Check accuracy, precision, recall, F1 score (for classification-style text tasks)
• Check image quality metrics like FID (for diffusion models)
2. Safety and fairness:
• Test for bias, toxicity, misinformation, etc.
3. Human feedback:
• Let users or experts review model answers for quality.
➤ Example:
After fine-tuning ChatGPT for education, test it by:
• Giving it 100 student questions → check correctness
• Ensure it doesn’t generate harmful or biased content.
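The performance metrics named above can be computed directly from correct and predicted labels. The labels below are made-up, and the function is a minimal sketch for the binary case only.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 1]   # correct labels
y_pred = [1, 0, 1, 1, 0, 1]   # model predictions
print(precision_recall_f1(y_true, y_pred))
```

Precision asks how many predicted positives were right, recall asks how many real positives were found, and F1 combines the two into one score.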
3. Safety
Meaning:
Ensuring AI behaves ethically, doesn’t harm users, and gives trustworthy results.
4. Deployment
Meaning: Deployment is making the AI available for real use — like launching ChatGPT, DALL·E, or
Copilot for users.
5. Legal & Ethical Aspects (during Deployment)
• Follow copyright and data privacy laws.
• Avoid deepfakes, plagiarism, and biased outputs.
• Ensure AI models are transparent and explainable.
• Monitor continuously after deployment for any misuse.
RAG (Retrieval-Augmented Generation)
• When a user submits a query, the RAG system first searches its database for relevant
information.
• This retrieved data is then combined with the original query and fed into the LLM.
• Finally, the model generates a response using both its pre-trained knowledge and the
context provided by the retrieved information.
This approach enables the LLM to produce more accurate and relevant outputs.
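The retrieve-then-generate flow above can be sketched without any external library. This toy version assumes word overlap as the search method and stops at building the prompt; real systems usually retrieve with vector embeddings and then send the prompt to an LLM. `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import re

DOCUMENTS = [
    "LoRA adds small trainable low-rank matrices to a frozen model.",
    "Quantization stores model weights with fewer bits.",
    "Red-teaming probes an AI system for unsafe behavior.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=1):
    """Rank documents by how many words they share with the query."""
    query_words = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_words & tokenize(doc)),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(query, documents):
    """Combine the retrieved context with the original query."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

prompt = build_prompt("What does quantization do to model weights?", DOCUMENTS)
print(prompt)
```

The LLM then answers with the retrieved passage in front of it, which is how RAG reduces made-up answers.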
Fine-Tuning
• Fine-tuning allows the model to become more proficient in handling particular types of
queries or generating domain-specific content.
• Fine-tuned models may lose some of their general capabilities as they become more
specialized.
Tool | Purpose
OpenAI API | GPT-3, GPT-4 access via API
Hugging Face Transformers | Open-source models like BERT, LLaMA
Stable Diffusion | Open-source image generation
Midjourney | Text → Image AI art tool
DALL-E | OpenAI image model
GitHub Copilot | AI code suggestion tool
ChatGPT | General LLM assistant