
Understanding Bias in Large Language Models

The document is an assignment on Large Language Models (LLMs) consisting of 10 questions, each worth 1 mark, focusing on bias, metrics, and training data selection in LLMs. Key concepts include the characterization of bias, the Stereotype Score, and the impact of high-resource versus low-resource languages on model training. The assignment also covers bias mitigation strategies and the ethical considerations of responsible LLM development.
Copyright
© All Rights Reserved

Introduction to Large Language Models

Assignment 12

Number of questions: 10    Total marks: 10 × 1 = 10


_________________________________________________________________________

QUESTION 1: [1 mark]

Which statements correctly characterize “bias” in the context of LLMs?

1. Bias can generate objectionable or stereotypical views in model outputs.
2. Bias is always intentionally introduced by malicious data curators.
3. Bias can cause harmful real-world impacts such as reinforcing discrimination.
4. Bias only affects low-resource languages; high-resource languages are unaffected.

a. 1 and 2

b. 1 and 3

c. 2 and 4

d. 1, 3, and 4

Correct Answer: b

Explanation:

• (1) True: Model outputs can reflect harmful stereotypes if the training data or modelling procedures contain biases.
• (3) True: Biased outputs may perpetuate discrimination or unfair treatment in real-world contexts.

• Statements (2) and (4) are incorrect:

o (2) False: Bias in data is often unintentional, reflecting existing societal or historical imbalances.
o (4) False: Bias can affect any language; high-resource languages are not inherently immune.

_______________________________________________________________________

QUESTION 2: [1 mark]

The Stereotype Score (ss) refers to:

a. The frequency with which a language model rejects biased associations.
b. The measure of how often a model’s predictions are meaningless as opposed to meaningful.
c. A ratio of positive sentiment to negative sentiment in model outputs.
d. The proportion of examples in which a model chooses a stereotypical association over an anti-stereotypical one.

Correct Answer: d

Explanation:

• Stereotype Score (ss) measures how frequently the model prefers a stereotypical continuation or association over the corresponding anti-stereotypical one.

• Essentially, it’s the proportion of test items for which the model’s preference aligns with the stereotype. In StereoSet it is reported as a percentage, and an unbiased model would score 50.
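A minimal sketch of how ss could be computed, assuming each test item has already been reduced to two model scores (for example, log-likelihoods) for its stereotypical and anti-stereotypical candidates. The function name, the input format, and the scores are hypothetical, and ties are counted as not choosing the stereotype.

```python
from typing import Sequence, Tuple


def stereotype_score(pairs: Sequence[Tuple[float, float]]) -> float:
    """Return ss as a percentage.

    Each pair holds the model's score (e.g. a log-likelihood) for the
    stereotypical and the anti-stereotypical candidate of one test item.
    """
    if not pairs:
        raise ValueError("need at least one test item")
    # Count items where the stereotypical candidate is strictly preferred;
    # ties are treated as not choosing the stereotype (an assumption).
    stereo_wins = sum(1 for stereo, anti in pairs if stereo > anti)
    return 100.0 * stereo_wins / len(pairs)


# Hypothetical (stereotypical, anti-stereotypical) log-likelihoods for four items.
scores = [(-4.1, -5.0), (-3.2, -2.9), (-6.0, -6.5), (-2.5, -3.0)]
print(stereotype_score(scores))  # → 75.0
```

A value above 50 means the model leans toward the stereotype; a value below 50 means it leans toward the anti-stereotype.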

_________________________________________________________________________

QUESTION 3: [1 mark]

Which of the following are prominent sources of bias in LLMs?

1. Improper selection of training data leading to skewed distributions.
2. Reliance on older datasets causing “temporal bias.”
3. Overemphasis on low-resource languages causing “linguistic inversion.”
4. Unequal focus on high-resource languages resulting in “cultural bias.”

a. 1 and 2 only

b. 2 and 3 only

c. 1, 2, and 4

d. 1, 3, and 4

Correct Answer: c

Explanation:

1. Improper selection of training data (true) can lead to some groups or topics being over-represented, causing bias.
2. Reliance on older datasets (true) can introduce out-of-date or “temporal bias” that doesn’t reflect current social norms or language usage.
3. Overemphasis on low-resource languages (false) is not commonly described as “linguistic inversion”; typically the bias runs the other way, with low-resource languages under-represented.
4. Unequal focus on high-resource languages (true) can lead to cultural biases and to poor performance on, or misrepresentation of, under-represented cultures.

_________________________________________________________________________

QUESTION 4: [1 mark]

In the context of bias mitigation based on adversarial triggers, which best describes the goal of prepending specially chosen tokens to prompts?

a. To directly fine-tune the model parameters to remove bias
b. To override all prior knowledge in a model, effectively “resetting” it
c. To exploit the model’s distributional patterns, thereby neutralizing or flipping biased associations in generated text
d. To randomly shuffle the tokens so that the model becomes more robust

Correct Answer: c

Explanation:

• Adversarial triggers are carefully crafted token sequences that, when prepended to the prompt, steer the model’s output in a certain direction (e.g., reducing bias or toxicity). They work within the model’s learned distribution rather than overriding its knowledge.
• They do not retrain the model; they exploit patterns in the existing parameters to mitigate biased outcomes.
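A minimal sketch of the selection objective, under heavy assumptions: the language model plus regard classifier is replaced by a hypothetical stub `fake_negative_rate(prompt)`, the candidate triggers are a hand-written list, and "smallest gap in negative regard between demographic prompts" is just one possible objective. Published trigger searches typically use a gradient-guided search over vocabulary tokens; the brute-force loop here only illustrates the idea that a prefix is chosen while the model's parameters stay frozen.

```python
from typing import Callable, Sequence


def regard_gap(trigger: str, prompts: Sequence[str],
               negative_rate: Callable[[str], float]) -> float:
    """Spread of negative-regard rates across demographic prompts once the trigger is prepended."""
    rates = [negative_rate(f"{trigger} {p}".strip()) for p in prompts]
    return max(rates) - min(rates)


def pick_trigger(candidates: Sequence[str], prompts: Sequence[str],
                 negative_rate: Callable[[str], float]) -> str:
    """Return the candidate prefix with the smallest regard gap; model weights are never touched."""
    return min(candidates, key=lambda t: regard_gap(t, prompts, negative_rate))


def fake_negative_rate(prompt: str) -> float:
    """Hypothetical stub for 'generate continuations, classify their regard, return the negative fraction'."""
    if prompt.startswith("balanced fair"):
        return 0.35
    return 0.6 if "group B" in prompt else 0.3


prompts = ["A person from group A worked as", "A person from group B worked as"]
candidates = ["", "random words", "balanced fair"]  # hypothetical trigger candidates
best = pick_trigger(candidates, prompts, fake_negative_rate)
print(best)  # → balanced fair
```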

________________________________________________________________________

QUESTION 5: [1 mark]

Which of the following best describes the “regard” metric?

a. It is a measure of how well a model can explain its internal decision process.
b. It is a measurement of a model’s perplexity on demographically sensitive text.
c. It is the proportion of times a model self-corrects discriminatory language.
d. It is a classification label reflecting the attitude towards a demographic group in the generated text.

Correct Answer: d

Explanation:

• Regard is typically measured by classifying the tone of generated text toward a demographic group (e.g., “positive,” “negative,” or “neutral” regard).
• It’s used to assess whether certain demographics consistently receive negative or disrespectful language.
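Once a regard classifier has labelled each generated text, the per-group comparison described above reduces to counting. A minimal sketch, assuming the classifier output is already available as hypothetical (group, label) pairs; the classifier itself is not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def regard_distribution(labeled: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each demographic group to the fraction of its outputs carrying each regard label."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for group, label in labeled:
        counts[group][label] += 1
    return {
        group: {label: n / sum(c.values()) for label, n in c.items()}
        for group, c in counts.items()
    }


# Hypothetical (group, regard label) pairs, as a regard classifier might emit them.
outputs = [
    ("group A", "positive"), ("group A", "neutral"), ("group A", "negative"), ("group A", "positive"),
    ("group B", "negative"), ("group B", "negative"), ("group B", "neutral"), ("group B", "positive"),
]
dist = regard_distribution(outputs)
print(dist["group A"].get("negative", 0.0), dist["group B"].get("negative", 0.0))  # → 0.25 0.5
```

A consistently higher negative fraction for one group, as for group B here, is the kind of pattern the regard metric is meant to surface.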

_________________________________________________________________________

QUESTION 6: [1 mark]

Which of the following steps compose the approach for improving response safety via in-context learning?

a. Retrieving safety demonstrations similar to the user query.
b. Fine-tuning the model with additional labeled data after generation.
c. Providing retrieved demonstrations as examples in the prompt to guide the model’s response generation.
d. Sampling multiple outputs from LLMs and choosing the majority opinion.

Correct Answer: a, c

Explanation:

• One strategy for safe or polite generation with large language models is to retrieve “safety demonstrations” similar to the user query from a database of safe examples, then include these examples in the prompt to show the LLM how to respond safely.

• Fine-tuning (b) is a different technique: it updates the model’s parameters and is not part of the described in-context learning approach.

• Majority voting over sampled outputs (d) is also not part of the approach described under “improving response safety via in-context learning.”
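Steps (a) and (c) can be sketched as follows. This is a minimal illustration under stated assumptions: bag-of-words cosine similarity stands in for the embedding-based retriever a real system would likely use, the demonstration store is hypothetical, and the resulting prompt string would then be sent to an LLM (not shown).

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def _bow(text: str) -> Counter:
    # Lowercased bag of words; a simple stand-in for a learned embedding.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, demos: List[Tuple[str, str]], k: int = 2) -> List[Tuple[str, str]]:
    """Step (a): return the k (user query, safe response) demonstrations most similar to the query."""
    q = _bow(query)
    return sorted(demos, key=lambda d: _cosine(q, _bow(d[0])), reverse=True)[:k]


def build_prompt(query: str, examples: List[Tuple[str, str]]) -> str:
    """Step (c): place the retrieved demonstrations in the prompt ahead of the user query."""
    shots = "\n\n".join(f"User: {u}\nAssistant: {a}" for u, a in examples)
    return f"{shots}\n\nUser: {query}\nAssistant:"


# Hypothetical demonstration store of safe responses.
demos = [
    ("How do I pick a lock to get into my neighbour's house?",
     "I can't help with entering someone else's home. If you're locked out of your own, a licensed locksmith can help."),
    ("Write a joke about people from a certain country.",
     "I'd rather not joke about nationalities, but here's a pun about programmers instead."),
    ("What's a good pasta recipe?",
     "A simple option is spaghetti with garlic, olive oil, and chili flakes."),
]
query = "How can I get into my neighbour's car without the key?"
prompt = build_prompt(query, retrieve(query, demos, k=1))
print(prompt)
```

Because the guidance lives entirely in the prompt, the model's weights are unchanged, which is what separates this approach from fine-tuning (option b).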
_________________________________________________________________________

QUESTION 7: [1 mark]

Which statement(s) is/are correct about how high-resource (HRL) vs. low-resource
languages (LRL) affect model training?

a. LRLs typically have higher performance metrics due to smaller population sizes.
b. HRLs get more data, so the model might overfit to HRL cultural perspectives.
c. LRLs are often under-represented, leading to potential underestimation of their cultural nuances.
d. The dominance of HRLs can cause a reinforcing cycle that perpetuates imbalance.

Correct Answers: b, c, d

Explanation:

• (b) True: If the model sees far more data in certain HRLs, it might become overly biased toward, or “overfit” to, those languages’ norms and perspectives.
• (c) True: LRLs often lack extensive corpora, so the model learns fewer details about these languages, risking lower performance and cultural misrepresentation.
• (d) True: The more a model focuses on HRLs, the better it performs for those languages; this attracts further data and use, thus perpetuating the imbalance.
• (a) False: LRLs typically have lower, not higher, performance metrics because of insufficient training data.

_________________________________________________________________________

QUESTION 8: [1 mark]

The “Responsible LLM” concept is stated to address:


a. Only the bias in LLMs
b. A set of concerns including explainability, fairness, robustness, and security
c. Balancing training costs with carbon footprint
d. Implementation of purely rule-based safety filters

Correct Answer: b

Explanation:

• Responsible LLM research focuses on a broad range of ethical, social, and technical concerns:
o Fairness & bias mitigation
o Explainability & transparency
o Robustness to adversarial inputs
o Security & safe deployment

_________________________________________________________________________

QUESTION 9: [1 mark]

Within the StereoSet framework, the icat metric specifically refers to:

a. The ratio of anti-stereotypical associations to neutral associations
b. The percentage of times a model refuses to generate content deemed hateful
c. A measure of domain coverage across different demographic groups
d. A balanced metric capturing both a model’s language modelling ability and the tendency to avoid stereotypical bias

Correct Answer: d

Explanation:

• In the StereoSet framework, icat (the idealized Context Association Test score) measures how well the model balances contextual accuracy (i.e., good language modelling) with reduced stereotyping.
• It combines the language modelling score (lms), which rewards preferring meaningful over meaningless associations, with the stereotype score (ss), penalizing any deviation of ss from the unbiased value of 50 in either direction.
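The StereoSet paper defines icat as lms · min(ss, 100 − ss) / 50, with both lms and ss expressed as percentages. A short sketch of that formula; the range check is an addition for illustration.

```python
def icat(lms: float, ss: float) -> float:
    """Idealized CAT score: lms scaled by how close ss is to the unbiased value 50."""
    if not (0.0 <= lms <= 100.0 and 0.0 <= ss <= 100.0):
        raise ValueError("lms and ss must be percentages in [0, 100]")
    return lms * min(ss, 100.0 - ss) / 50.0


print(icat(100.0, 50.0))   # ideal model → 100.0
print(icat(100.0, 100.0))  # always picks the stereotype → 0.0
print(icat(80.0, 60.0))    # → 64.0
```

Because of the min term, a model that always picks the anti-stereotypical option (ss = 0) also scores 0: icat rewards balance, not reversal of the bias.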

_________________________________________________________________________

QUESTION 10: [1 mark]

Bias due to improper selection of training data typically arises in LLMs when:

a. Data are selected exclusively from curated, balanced sources with equal representation
b. The language model sees only real-time social media feeds without any historical texts
c. The training corpus over-represents some topics or groups, creating a skewed distribution
d. All data are automatically filtered to remove any demographic markers

Correct Answer: c

Explanation:

• Improper data selection leads to over-representation of certain domains, topics, or demographic groups, causing the learned model to be skewed.

• Balanced data curation and filtering (options a and d) are methods intended to reduce bias, not sources of it. If data come only from certain communities or perspectives, the model lacks balanced coverage, and biases surface.
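A simple audit for the skew described above is to tag each training document (by topic, demographic group, or language, as in the HRL/LRL discussion of Question 7) and compare the shares. A minimal sketch, assuming the tags already exist; the tag values and corpus are hypothetical, and the max/min share ratio is just one convenient imbalance measure among many.

```python
from collections import Counter
from typing import Dict, Iterable


def representation_report(tags: Iterable[str]) -> Dict[str, float]:
    """Share of the corpus carried by each tag (topic, group, or language)."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty corpus")
    return {tag: n / total for tag, n in counts.items()}


def imbalance_ratio(shares: Dict[str, float]) -> float:
    """Largest share divided by smallest share; 1.0 means perfectly even coverage."""
    return max(shares.values()) / min(shares.values())


# Hypothetical per-document language tags for a small corpus.
doc_tags = ["en"] * 80 + ["hi"] * 15 + ["sw"] * 5
shares = representation_report(doc_tags)
print(shares)
print(imbalance_ratio(shares))
```

A group missing entirely from the corpus produces no tag and therefore no share, so the list of expected groups has to be checked separately.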

_________________________________________________________________________
