Understanding Bias in Large Language Models

The document is an assignment on Large Language Models (LLMs) consisting of 10 questions, each worth 1 mark, focusing on bias, metrics, and training data selection in LLMs. Key concepts include the characterization of bias, the Stereotype Score, and the impact of high-resource versus low-resource languages on model training. The assignment also covers bias mitigation strategies and the ethical considerations of responsible LLM development.

Uploaded by

Harsh Vardhan Choudhary


Introduction to Large Language Models

Assignment 12

Number of questions: 10    Total marks: 10 × 1 = 10

_________________________________________________________________________

QUESTION 1: [1 mark]

Which statements correctly characterize “bias” in the context of LLMs?

1. Bias can generate objectionable or stereotypical views in model outputs.

2. Bias is always intentionally introduced by malicious data curators.
3. Bias can cause harmful real-world impacts such as reinforcing discrimination.
4. Bias only affects low-resource languages; high-resource languages are unaffected.

a. 1 and 2

b. 1 and 3

c. 2 and 4

d. 1, 3, and 4

Correct Answer: b

Explanation:

• (1) True: Model outputs can reflect harmful stereotypes if the training data or modelling procedures contain biases.
• (3) True: Biased outputs may perpetuate discrimination or unfair treatment in real-world contexts.

• Statements (2) and (4) are incorrect:

o (2) False: Bias in data is often unintentional, reflecting existing societal or historical imbalances.
o (4) False: Bias can affect any language; high-resource languages are not inherently immune.

_______________________________________________________________________

QUESTION 2: [1 mark]

The Stereotype Score (ss) refers to:

a. The frequency with which a language model rejects biased associations.

b. The measure of how often a model’s predictions are meaningless as opposed to
meaningful.
c. A ratio of positive sentiment to negative sentiment in model outputs.
d. The proportion of examples in which a model chooses a stereotypical association
over an anti-stereotypical one.
Correct Answer: d

Explanation:

• The Stereotype Score (ss) measures how frequently the model prefers a stereotypical continuation or association over an anti-stereotypical one.

• It is the proportion (often reported as a percentage) of test items for which the model’s preferred output aligns with the stereotype.
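The proportion described above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation of any benchmark: the function name, the label strings, and the example data are all assumptions, and items where the model preferred an unrelated option are assumed to have been filtered out already.

```python
def stereotype_score(choices):
    """Percentage of test items where the model preferred the stereotypical option.

    `choices` holds one label per test item, either "stereotype" or
    "anti-stereotype" (hypothetical label names chosen for this sketch).
    """
    if not choices:
        raise ValueError("need at least one test item")
    stereo = sum(1 for c in choices if c == "stereotype")
    return 100.0 * stereo / len(choices)


# Hypothetical model preferences on four test items.
example = ["stereotype", "anti-stereotype", "stereotype", "stereotype"]
print(stereotype_score(example))  # 75.0
```

In StereoSet, a score near 50 is treated as the ideal, since it means the model shows no systematic preference in either direction.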

_________________________________________________________________________

QUESTION 3: [1 mark]

Which of the following are prominent sources of bias in LLMs?

1. Improper selection of training data leading to skewed distributions.

2. Reliance on older datasets causing “temporal bias.”
3. Overemphasis on low-resource languages causing “linguistic inversion.”
4. Unequal focus on high-resource languages resulting in “cultural bias.”

a. 1 and 2 only

b. 2 and 3 only

c. 1, 2, and 4

d. 1, 3, and 4

Correct Answer: c

Explanation:

1. Improper selection of training data (true) can lead to some groups or topics being over-represented, causing bias.
2. Reliance on older datasets (true) can introduce out-of-date or “temporal bias” that doesn’t reflect current social norms or language usage.
3. “Overemphasis on low-resource languages” (false) is not commonly described as a source of bias, nor as “linguistic inversion”; typically the problem is the opposite: under-representation of low-resource languages.
4. Unequal focus on high-resource languages (true) can lead to cultural biases and to poor performance on, or misrepresentation of, underrepresented cultures.

_________________________________________________________________________

QUESTION 4: [1 mark]

In the context of bias mitigation based on adversarial triggers, which best describes the goal of prepending specially chosen tokens to prompts?

a. To directly fine-tune the model parameters to remove bias

b. To override all prior knowledge in a model, effectively “resetting” it
c. To exploit the model’s distributional patterns, thereby neutralizing or flipping biased
associations in generated text
d. To randomly shuffle the tokens so that the model becomes more robust

Correct Answer: c

Explanation:

• Adversarial triggers are carefully crafted token sequences that, when prepended to
the prompt, steer the model’s output in a certain direction (e.g., reducing bias or
toxicity). They work within the model’s learned distribution rather than overriding its
knowledge.
• They do not retrain the model; they exploit patterns in the existing parameters to
mitigate biased outcomes.

________________________________________________________________________

QUESTION 5: [1 mark]

Which of the following best describes the “regard” metric?

a. It is a measure of how well a model can explain its internal decision process.
b. It is a measurement of a model’s perplexity on demographically sensitive text.
c. It is the proportion of times a model self-corrects discriminatory language.
d. It is a classification label reflecting the attitude towards a demographic group in the
generated text.

Correct Answer: d

Explanation:

• Regard is typically measured by classifying the tone of text toward a demographic group (e.g., “positive,” “negative,” or “neutral” regard).
• It’s used to assess whether certain demographics consistently receive negative or disrespectful language.
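As a rough sketch of how regard labels might be compared across groups, the snippet below aggregates per-group label shares. It assumes each generated text has already been labelled by some regard classifier (not shown); the group names, label strings, and function name are hypothetical.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")


def regard_distribution(labelled):
    """Return, per demographic group, the share of each regard label.

    `labelled` is an iterable of (group, label) pairs, where the label is
    assumed to come from a separate regard classifier.
    """
    counts = defaultdict(Counter)
    for group, label in labelled:
        counts[group][label] += 1
    return {
        group: {lab: c[lab] / sum(c.values()) for lab in LABELS}
        for group, c in counts.items()
    }


# Hypothetical classifier outputs for texts generated about two groups.
data = [
    ("group_a", "positive"), ("group_a", "neutral"),
    ("group_b", "negative"), ("group_b", "negative"),
    ("group_b", "positive"), ("group_b", "neutral"),
]
for group, shares in regard_distribution(data).items():
    print(group, shares)
```

A group whose negative share is consistently higher than that of other groups is the pattern this metric is meant to surface.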

_________________________________________________________________________

QUESTION 6: [1 mark]

Which of the following steps compose the approach for improving response safety via in-context learning?

a. Retrieving safety demonstrations similar to the user query.

b. Fine-tuning the model with additional labeled data after generation.
c. Providing retrieved demonstrations as examples in the prompt to guide the model’s
response generation.
d. Sampling multiple outputs from LLMs and choosing the majority opinion.

Correct Answers: a, c

Explanation:

• One strategy for safe generation with large language models is to retrieve “safety demonstrations” similar to the user query from a database of safe examples, then include them in the prompt to show the LLM how to respond safely.

• Fine-tuning (b) changes the model’s parameters and is a different technique, not part of the described in-context learning approach.

• Sampling multiple outputs and taking a majority vote (d) is also not part of the approach described under “improving response safety via in-context learning.”
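The two steps (retrieve similar demonstrations, then place them in the prompt) can be sketched as follows. This is an illustrative toy under stated assumptions: word-overlap (Jaccard) similarity stands in for whatever retriever a real system would use, and the demonstration store, prompt layout, and function names are all hypothetical.

```python
def tokenize(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, demos, k=2):
    """Step (a): pick the k demonstrations most similar to the user query."""
    q = tokenize(query)
    ranked = sorted(demos, key=lambda d: jaccard(q, tokenize(d["prompt"])), reverse=True)
    return ranked[:k]


def build_prompt(query, demos, k=2):
    """Step (c): place the retrieved demonstrations before the user query."""
    parts = [f"User: {d['prompt']}\nAssistant: {d['response']}" for d in retrieve(query, demos, k)]
    parts.append(f"User: {query}\nAssistant:")
    return "\n\n".join(parts)


# Hypothetical store of safe demonstrations.
DEMOS = [
    {"prompt": "how do I pick a lock",
     "response": "I can't help with breaking into property, but a locksmith can help if you are locked out."},
    {"prompt": "tell me a joke about my coworker's accent",
     "response": "I'd rather not mock someone's accent. Here is a workplace joke instead."},
    {"prompt": "what is the capital of France", "response": "Paris."},
]

print(build_prompt("how can I pick a lock quickly", DEMOS, k=1))
```

No model parameters change here; the demonstrations only shape the context the model conditions on, which is what distinguishes this approach from fine-tuning.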
_________________________________________________________________________

QUESTION 7: [1 mark]

Which statement(s) is/are correct about how high-resource (HRL) vs. low-resource
languages (LRL) affect model training?

a. LRLs typically have higher performance metrics due to smaller population sizes.
b. HRLs get more data, so the model might overfit to HRL cultural perspectives.
c. LRLs are often under-represented, leading to potential underestimation of their
cultural nuances.
d. The dominance of HRLs can cause a reinforcing cycle that perpetuates imbalance.

Correct Answers: b, c, d

Explanation:

• (b) True: If the model sees far more data in certain HRLs, it might be overly biased or “overfit” to those languages’ norms and perspectives.
• (c) True: LRLs often lack extensive corpora, so the model learns fewer details about these languages, risking lower performance and cultural misrepresentations.
• (d) True: The more the training data favours HRLs, the better the model performs for those languages, which attracts further use and further data for them, thus perpetuating the imbalance.
• (a) is not correct: LRLs typically have lower performance metrics due to insufficient training data, not higher.

_________________________________________________________________________

QUESTION 8: [1 mark]

The “Responsible LLM” concept is stated to address:

a. Only the bias in LLMs
b. A set of concerns including explainability, fairness, robustness, and security
c. Balancing training costs with carbon footprint
d. Implementation of purely rule-based safety filters

Correct Answer: b

Explanation:

• Responsible LLM research focuses on a broad range of ethical, social, and technical concerns:
o Fairness & bias mitigation
o Explainability & transparency
o Robustness to adversarial inputs
o Security & safe deployment

_________________________________________________________________________

QUESTION 9: [1 mark]

Within the StereoSet framework, the icat metric specifically refers to:

a. The ratio of anti-stereotypical associations to neutral associations

b. The percentage of times a model refuses to generate content deemed hateful
c. A measure of domain coverage across different demographic groups
d. A balanced metric capturing both a model’s language modelling ability and the
tendency to avoid stereotypical bias

Correct Answer: d

Explanation:

• In the StereoSet framework, icat (the Idealized CAT score, where CAT is the Context Association Test) is designed to measure how well the model balances good language modelling with reduced stereotyping.
• It’s a combined metric: it rewards a high language modelling score (lms) while penalizing a stereotype score (ss) that deviates from the unbiased value of 50.
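For reference, StereoSet defines the combined score as icat = lms × min(ss, 100 − ss) / 50, with lms and ss both expressed as percentages. A minimal sketch of that formula follows; the function name and the example values are illustrative assumptions, not results from any model.

```python
def icat(lms, ss):
    """Idealized CAT score from a language modelling score and a stereotype score.

    Both inputs are percentages in [0, 100]. The result is 100 for a model
    with perfect language modelling (lms = 100) and no bias (ss = 50), and 0
    for a model that always prefers one side (ss = 0 or ss = 100).
    """
    if not (0 <= lms <= 100 and 0 <= ss <= 100):
        raise ValueError("lms and ss must be percentages in [0, 100]")
    return lms * min(ss, 100 - ss) / 50


print(icat(100, 50))  # 100.0
print(icat(90, 60))   # 72.0
print(icat(80, 100))  # 0.0
```

The min(ss, 100 − ss) term makes the penalty symmetric, so a model that always prefers anti-stereotypes scores no better than one that always prefers stereotypes.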

_________________________________________________________________________

QUESTION 10: [1 mark]

Bias due to improper selection of training data typically arises in LLMs when:

a. Data are selected exclusively from curated, balanced sources with equal
representation
b. The language model sees only real-time social media feeds without any historical
texts
c. The training corpus over-represents some topics or groups, creating a skewed
distribution
d. All data are automatically filtered to remove any demographic markers

Correct Answer: c

Explanation:

• Improper data selection leads to over-representation of certain domains, topics, or demographic groups, causing the learned model to be skewed.

• Balanced data curation and filtering, as in options (a) and (d), are methods intended to reduce bias. If data come only from certain communities or perspectives, the model lacks balanced coverage, and biases surface.
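One simple way to spot this kind of skew is to count how often each topic or group appears in a corpus. The sketch below assumes every document already carries a single topic label; the labels, function names, and the max/min ratio used as an imbalance indicator are choices made for this illustration only.

```python
from collections import Counter


def representation(labels):
    """Share of the corpus taken up by each topic or group label."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("need at least one label")
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(labels):
    """Ratio of the most to the least represented label (1.0 means balanced)."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("need at least one label")
    return max(counts.values()) / min(counts.values())


# Hypothetical topic labels for a ten-document corpus.
corpus_labels = ["sports"] * 6 + ["health"] * 3 + ["farming"] * 1
print(representation(corpus_labels))
print(imbalance_ratio(corpus_labels))  # 6.0
```

A ratio far above 1.0 signals that some topics dominate; note that labels absent from the corpus entirely do not appear in the counts, so missing groups have to be checked against an expected list.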

_________________________________________________________________________

Common questions

In which ways can high-resource languages (HRLs) create imbalance in training LLMs?

HRLs receive more data, potentially causing models to overfit to these languages' norms and cultural perspectives. This results in less focus and representation for low-resource languages (LRLs), thereby perpetuating a reinforcing cycle that leads to imbalance by continually privileging HRLs.

What strategy best describes the use of adversarial triggers in bias mitigation for LLMs?

Adversarial triggers are specially chosen token sequences prepended to prompts which exploit the model's distributional patterns without modifying its parameters. This method aims to neutralize or reverse biased associations in generated text, utilizing the model's existing knowledge to mitigate biased outcomes.

What are some potential real-world impacts of bias in large language models (LLMs)?

Bias in LLMs can perpetuate discrimination or result in unfair treatment in real-world contexts by reinforcing harmful stereotypes present in the training data, which can lead to objectionable or stereotypical views in model outputs.

Explain what the 'regard' metric evaluates in the context of language model outputs.

The 'regard' metric evaluates the tone or attitude reflected in language model outputs towards demographic groups, classifying it as positive, negative, or neutral. It's used to determine if demographic groups consistently face negative or disrespectful language, thus assessing potential bias.

What does the 'Responsible LLM' concept aim to address beyond just bias?

The 'Responsible LLM' concept aims to address a broad set of concerns beyond bias, including explainability, fairness, robustness to adversarial inputs, and security in LLM deployment. This ensures more ethically aligned and socially responsible AI systems.

What is a potential risk in selecting improper training data for LLMs?

The risk of selecting improper training data includes over-representing certain topics or groups, which results in skewed distribution and learned biases in the model. This can lead to unbalanced output, misrepresentations, and perpetuation of stereotypes or potentially harmful outputs in real-world applications.

How does the 'Stereotype Score' (ss) in large language models function as a metric?

The 'Stereotype Score' (ss) measures the frequency with which a model selects stereotypical associations over anti-stereotypical ones. It evaluates the proportion of test items for which a model's output aligns with stereotypes, helping assess bias levels in LLM outputs.

How does the 'icat' metric within the StereoSet framework approach bias measurement in LLMs?

The 'icat' metric balances a model's contextual accuracy in standard language modelling tasks with its tendency to avoid stereotypical biases. It combines evaluation of both the model’s general performance and its proclivity to reduce stereotyping, thus providing a nuanced assessment of bias in LLMs.

Discuss prominent sources of bias in the development of large language models.

Prominent sources of bias in LLMs include improper selection of training data leading to skewed distributions, reliance on older datasets introducing temporal bias, and unequal focus on high-resource languages resulting in cultural bias. These factors contribute to imbalances and misrepresentations in model outputs.


