
Basic Statistics and Probability Notes

The document provides an introduction to statistical methods, covering basic statistics and probability concepts. It includes definitions of statistics, types of data, levels of measurement, measures of central tendency, and variability, as well as foundational probability concepts such as random experiments, sample spaces, events, and probability rules. Key topics include the classification of variables, measures of central tendency and variability, and the distinction between independent and dependent events.

Uploaded by

Rohit Sharma
Copyright
© All Rights Reserved

📘 INTRODUCTION TO STATISTICAL METHODS (ISM)

MODULE–1 : BASIC STATISTICS (EXAM READY NOTES)

What is Statistics?
Definition
Statistics is a branch of applied mathematics that deals with:
• Collection of data
• Organization & presentation of data
• Analysis of data
• Interpretation & inference
• Decision making based on data
Statistics = Data-driven decision making science
Backbone of:
• Data Science
• Artificial Intelligence
• Machine Learning

Types of Data
(A) Numerical / Quantitative Data
• Data expressed in numbers
1. Discrete Data
• Countable
• Finite or countably infinite values
Examples
• Number of children
• Defects per hour
2. Continuous Data
• Measured
• Infinite possible values
Examples
• Height
• Weight
• Voltage

(B) Categorical / Qualitative Data


• Represents categories or labels
Examples
• Gender
• Marital status
• Eye color
• Political party

Levels of Measurement (Very Important)


1. Nominal Scale
• Categories only
• No ordering
Examples
• Gender
• Religion
• Political party
Lowest level of measurement

2. Ordinal Scale
• Categories + ranking
• Differences not meaningful
Examples
• Grades (A, B, C)
• Satisfaction (Low, Medium, High)
• Faculty rank

3. Interval Scale
• Ordered scale
• Differences meaningful
• No true zero
Examples
• Temperature (°C, °F)
• Calendar dates (e.g., years)
Cannot say “twice as much”

4. Ratio Scale
• Ordered
• Differences meaningful
• True zero exists
Examples
• Weight
• Age
• Salary
• Length
Highest level of measurement
Ratios are meaningful

Variables – Classification
Quantitative Variables
• Discrete
• Continuous
Qualitative Variables
• Nominal
• Ordinal

Measures of Central Tendency


A single value that represents the center of data.
Types:
1. Mean
2. Median
3. Mode

Mean (Arithmetic Average)


Formula (Ungrouped Data)
$$\bar{X} = \frac{\sum X}{N}$$

Formula (Grouped Data)


$$\bar{X} = \frac{\sum fX}{N}, \quad N = \sum f$$

Properties of Mean
✔ Most stable measure
✔ Uses all observations
✘ Affected by extreme values
✔ Sum of deviations from mean = 0
✘ May not be actual data value

When to Use Mean?


• Quantitative data
• No extreme outliers
• Interval / Ratio scale
• When SD, variance required

Mode
Definition
• Value with highest frequency
Properties
✔ Can be used for qualitative & quantitative data
✔ Not affected by outliers
✘ May not be unique

When to Use Mode?


• Nominal data
• Most frequent category required

Median
Definition
• Middle value when data is arranged
Rules
• Odd n → middle value
• Even n → average of two middle values

Properties
✔ Not affected by outliers
✔ Positional measure
✔ 50% data on each side

When to Use Median?


• Ordinal data
• Highly skewed distributions
• Presence of outliers
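
The three measures above can be compared on a small dataset. The following is a minimal sketch using Python's standard `statistics` module; the salary figures are hypothetical and chosen only to show how one extreme value pulls the mean but not the median or mode.

```python
import statistics

# Hypothetical data: monthly salaries (in thousands) with one extreme value.
salaries = [30, 32, 32, 35, 38, 40, 250]

mean_value = statistics.mean(salaries)      # uses every observation
median_value = statistics.median(salaries)  # middle value after sorting
mode_value = statistics.mode(salaries)      # most frequent value

print(f"mean={mean_value:.2f}, median={median_value}, mode={mode_value}")
```

Here the single value 250 drags the mean well above the median, which is why the median is preferred for skewed data or data with outliers.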

Data Distribution & Skewness


Symmetric Distribution
𝑀𝑒𝑎𝑛 = 𝑀𝑒𝑑𝑖𝑎𝑛 = 𝑀𝑜𝑑𝑒

Positively Skewed (Right Skew)


𝑀𝑒𝑎𝑛 > 𝑀𝑒𝑑𝑖𝑎𝑛 > 𝑀𝑜𝑑𝑒

• Long right tail

Negatively Skewed (Left Skew)


𝑀𝑒𝑎𝑛 < 𝑀𝑒𝑑𝑖𝑎𝑛 < 𝑀𝑜𝑑𝑒

• Long left tail

Empirical Relation (Moderate Skew)


$$\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}$$

Mean preferred → symmetric data


Median preferred → heavily skewed data

Why Mean Alone is Not Enough?


Different datasets can have:
• Same Mean
• Same Median
• Same Mode
Yet the spread of the data can still differ.
Hence we need Measures of Variability

Measures of Variability
Describe spread / dispersion of data.
Types:
1. Range
2. Variance
3. Standard Deviation

Range
Formula
𝑅𝑎𝑛𝑔𝑒 = 𝑋𝑚𝑎𝑥 − 𝑋𝑚𝑖𝑛

Properties
✔ Easy to compute
✘ Uses only two values
✘ Highly unreliable (sensitive to extreme values)

Variance
Definition
Average of squared deviations from mean
Population Variance
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Sample Variance
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Dividing by (n−1) instead of n makes $s^2$ an unbiased estimator of $\sigma^2$

Standard Deviation
Definition
Square root of variance
Measures average distance from mean
Formula
$$\sigma = \sqrt{\sigma^2}, \quad s = \sqrt{s^2}$$

Properties
✔ Most important variability measure
✔ Uses all data
✔ Same unit as original data
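
The difference between dividing by N and by n−1 can be seen numerically. This is a small sketch with hypothetical observations, using the standard `statistics` module (`pvariance`/`pstdev` for the population formulas, `variance`/`stdev` for the sample formulas).

```python
import statistics

data = [4, 8, 6, 5, 7]  # hypothetical observations; mean is 6

pop_var = statistics.pvariance(data)   # sum of squared deviations / N
samp_var = statistics.variance(data)   # sum of squared deviations / (n - 1)
pop_sd = statistics.pstdev(data)       # square root of population variance
samp_sd = statistics.stdev(data)       # square root of sample variance

print(pop_var, samp_var, pop_sd, samp_sd)
```

The squared deviations sum to 10, so the population variance is 10/5 and the sample variance is 10/4.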

Five Number Summary


Includes:
1. Minimum
2. Q1 (25%)
3. Median (Q2)
4. Q3 (75%)
5. Maximum

Interquartile Range (IQR)


Formula
𝐼𝑄𝑅 = 𝑄3 − 𝑄1
Properties
✔ Measures middle 50% spread
✔ Not affected by outliers

Outlier Detection
Limits
$$\text{Lower} = Q_1 - 1.5\,(IQR)$$
$$\text{Upper} = Q_3 + 1.5\,(IQR)$$

• Values outside → potential outliers
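
As an illustration of the five number summary and the 1.5 × IQR limits, here is a hedged sketch on hypothetical data. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, so hand-computed quartiles may differ slightly under another convention.

```python
import statistics

data = [2, 4, 5, 7, 8, 9, 11, 12, 40]  # hypothetical observations

# Quartiles under the "inclusive" convention (an assumption of this sketch).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_limit or x > upper_limit]

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary, iqr, (lower_limit, upper_limit), outliers)
```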

Box & Whisker Plot


• Visualizes:
o Median
o Quartiles
o Spread
o Outliers
Used for comparison & skewness detection

📘 INTRODUCTION TO STATISTICAL METHODS (ISM)
MODULE–1 (Session 2): BASIC PROBABILITY

Random Experiment
Definition
A random experiment is an experiment whose outcome cannot be predicted with certainty before performing it.
Examples
• Tossing a coin → Head / Tail
• Throwing a die → 1 to 6
• Waiting time at bus stop
• Transmitting a signal through a channel
Outcome is uncertain, but set of outcomes is known

Sample Space (S)


Definition
The sample space is the set of all possible outcomes of a random experiment.
Examples
• Toss a coin:
𝑆 = {𝐻, 𝑇}

• Throw a die:
𝑆 = {1,2,3,4,5,6}

• Quality test of IC:


𝑆 = {𝐴𝑐𝑐𝑒𝑝𝑡𝑒𝑑, 𝑅𝑒𝑗𝑒𝑐𝑡𝑒𝑑}

Event
Definition
An event is a subset of the sample space.
• Includes:
o Empty set (∅)
o Entire sample space (S)
If outcome ∈ A → event occurs

Complement of an Event
Definition
The complement of event A (denoted by $A^c$) consists of all outcomes not in A.
$$P(A^c) = 1 - P(A)$$

Very useful in numericals (shortcut method)

Union of Two Events


Definition
𝐴 ∪ 𝐵 = {outcomes in A or B or both}

Probability Rule
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)

If mutually exclusive:
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)

Intersection of Two Events


Definition
𝐴 ∩ 𝐵 = {outcomes common to both A and B}

Represents simultaneous occurrence

Mutually Exclusive Events


Definition
Two events A and B are mutually exclusive if:
𝐴∩𝐵 =∅

If one occurs, the other cannot occur.


Property
𝑃(𝐴 ∩ 𝐵) = 0

Independent vs Dependent Events (Very Important)


Independent Events
• Occurrence of one does not affect the other
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵)
Dependent Events
• Occurrence of one affects the probability of the other

Comparison (Direct Exam Question)


Basis | Mutually Exclusive | Independent
Can occur together | No | Yes
P(A∩B) | 0 | P(A)P(B) (nonzero when P(A), P(B) > 0)
Formula | P(A∩B) = 0 | P(A∩B) = P(A)P(B)
Two events cannot be both mutually exclusive and independent unless at least one of them has probability zero.

Definition of Probability
(A) Classical Probability
$$P(A) = \frac{\text{Favorable outcomes}}{\text{Total outcomes}}$$

All outcomes must be equally likely

(B) Empirical Probability


$$P(A) = \frac{\text{Number of times event occurs}}{\text{Total trials}}$$

Justified by the Law of Large Numbers: the relative frequency approaches P(A) as the number of trials grows

(C) Axiomatic Probability


If S is sample space and E is an event:
1. 𝑃(𝑆) = 1
2. 0 ≤ 𝑃(𝐸) ≤ 1
3. If $E_1 \cap E_2 = \emptyset$:
$$P(E_1 \cup E_2) = P(E_1) + P(E_2)$$

Probability Scale
• 0 → Impossible event
• 0.5 → Event is as likely to occur as not
• 1 → Certain event

Addition Rule of Probability


General Rule
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)

For Mutually Exclusive Events


𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
Solved Numerical Patterns (VERY IMPORTANT)
Example: Two Dice
• Total outcomes = 36
P(sum > 8)
$$= P(9) + P(10) + P(11) + P(12) = \frac{4 + 3 + 2 + 1}{36} = \frac{10}{36}$$

P(neither 7 nor 11)


$$= 1 - [P(7) + P(11)] = 1 - \frac{6 + 2}{36} = \frac{28}{36}$$

Always use complement if condition says “neither / at least one / not”
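
The two-dice results above can be verified by listing all 36 outcomes. This is a minimal sketch using exact fractions; the helper name `prob` is only illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die1, die2) pairs.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

p_sum_gt_8 = prob(lambda o: sum(o) > 8)
# Complement rule: P(neither 7 nor 11) = 1 - P(7 or 11)
p_neither_7_nor_11 = 1 - prob(lambda o: sum(o) in (7, 11))

print(p_sum_gt_8, p_neither_7_nor_11)
```

`Fraction` reduces automatically, so 10/36 prints as 5/18 and 28/36 as 7/9.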

At Least One Event


𝑃(at least one) = 𝑃(𝐴 ∪ 𝐵)

Shortcut:
𝑃(at least one) = 1 − 𝑃(none)

Maximum Probability Product Question


If A and B are mutually exclusive and:
𝐴∪𝐵 =𝑆

Then:
𝑃(𝐴) + 𝑃(𝐵) = 1

Product is maximum when:


$$P(A) = P(B) = \frac{1}{2} \;\Rightarrow\; \max[P(A)P(B)] = \frac{1}{4}$$

Independence Check


To check if A and B are independent:
$$P(A \cap B) \stackrel{?}{=} P(A)\,P(B)$$

If not equal → events are dependent.
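
The check above can be carried out directly on a small sample space. The sketch below uses one fair die; the events A, B and C are illustrative choices, not taken from the notes.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one fair die
A = {2, 4, 6}                      # even number (illustrative event)
B = {1, 2, 3, 4}                   # at most 4 (illustrative event)
C = {1, 2, 3}                      # at most 3 (illustrative event)

def prob(event):
    """Classical probability on the equally likely sample space."""
    return Fraction(len(event), len(sample_space))

def are_independent(a, b):
    """True when P(A ∩ B) equals P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)

print(are_independent(A, B))  # P(A∩B) = 1/3 and P(A)P(B) = 1/2 * 2/3
print(are_independent(A, C))  # P(A∩C) = 1/6 but P(A)P(C) = 1/4
```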

Combination-Based Probability
Example:
• Selecting 2 laptops out of 6 computers
Total outcomes:
$$\binom{6}{2} = 15$$
Probability = Favorable / Total combinations

MOST IMPORTANT EXAM / QUIZ QUESTIONS (Session-2)


✔ Define random experiment, sample space, event
✔ Union, intersection, complement with formula
✔ Mutually exclusive vs independent events
✔ State axioms of probability
✔ Classical vs empirical probability
✔ Two-dice numericals
✔ “At least one / neither / exactly one” problems
✔ Maximum value probability problems
✔ Independence checking


INTRODUCTION TO STATISTICAL METHODS (ISM)


MODULE–2 : CONDITIONAL PROBABILITY & BAYES’ THEOREM
(Session 3 – Exam Ready Notes)

Conditional Probability
Concept
Sometimes partial information is available (event A has already occurred).
Then probability of B changes.
Definition
For two events A and B:
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) \neq 0$$

Similarly,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$$

“𝐵 ∣ 𝐴” → B given A (NOT B divided by A)

Multiplication Rule of Probability


For Two Events
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) 𝑃(𝐵 ∣ 𝐴)

For Three Events


𝑃(𝐴 ∩ 𝐵 ∩ 𝐶) = 𝑃(𝐴) 𝑃(𝐵 ∣ 𝐴) 𝑃(𝐶 ∣ 𝐴 ∩ 𝐵)

General Case
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})$$

Important Conditional Probability Results
• If B has no effect on A:
𝑃(𝐴 ∣ 𝐵) = 𝑃(𝐴)

• Then events are Independent:


𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴)𝑃(𝐵)

Independence of Events
Definition
Two events A and B are independent iff:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) 𝑃(𝐵)

Equivalent Conditions
• 𝑃(𝐴 ∣ 𝐵) = 𝑃(𝐴)
• 𝑃(𝐵 ∣ 𝐴) = 𝑃(𝐵)
Independence ≠ Mutually Exclusive
Mutually exclusive events cannot be independent unless at least one of them has probability 0

Solved Pattern – Conditional Probability (Exam Type)


Example: Loan Default Table
Let
A = “person will not default”
B = “person is middle-aged”
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Same data → different answers depending on conditioning.

Conditional Probability with Subset


If 𝐵 ⊂ 𝐴:
𝐴∩𝐵 = 𝐵

So,
$$P(B \mid A) = \frac{P(B)}{P(A)}$$

Very common trap question

Complement in Conditional Probability


$$P(B^c \mid A) = 1 - P(B \mid A)$$
$$P(A^c \mid B) = 1 - P(A \mid B)$$

Law of Total Probability
Statement
Let $A_1, A_2, \ldots, A_k$ be:
• Mutually exclusive
• Exhaustive (their union = S)
Then for any event B:
$$P(B) = \sum_{i=1}^{k} P(B \mid A_i)\,P(A_i)$$

Used when event B can occur via multiple disjoint causes

Tree Diagram Interpretation


• First level → Prior probabilities $P(A_i)$
• Second level → Conditional probabilities $P(B \mid A_i)$
• Multiply along branches
• Add at the end
Very helpful in Bayes’ theorem numericals

Bayes’ Theorem (MOST IMPORTANT)


Formula
$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{j=1}^{k} P(B \mid A_j)\,P(A_j)}$$

Meaning
Posterior = Likelihood × Prior / Evidence
Used to reverse conditioning

Typical Bayes’ Theorem Questions


(A) Binary Channel / Medical Test
• Given:
o Transmission / disease probability
o Correct / incorrect detection
• Asked:
o Given output, what was input?

(B) Device / Forest / Advertisement Problems


$$P(B) = \sum_i P(B \mid A_i)\,P(A_i)$$

Then apply Bayes’ theorem.

Example Pattern (Exam Favorite)


Binary Channel
• P(1 transmitted) = 0.4
• P(1 received | 1 sent) = 0.95
• P(1 received | 0 sent) = 0.10
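Step 1: Total Probability
$$P(1\ \text{received}) = 0.95(0.4) + 0.10(0.6) = 0.38 + 0.06 = 0.44$$

Step 2: Bayes
$$P(1\ \text{sent} \mid 1\ \text{received}) = \frac{0.95(0.4)}{0.44} = \frac{0.38}{0.44} \approx 0.864$$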
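
The two steps of the binary channel example can be written out as a short calculation. This sketch uses only the three probabilities given above; the variable names are illustrative.

```python
# Given values from the binary channel example.
p_sent1 = 0.4
p_sent0 = 1 - p_sent1
p_recv1_given_sent1 = 0.95
p_recv1_given_sent0 = 0.10

# Step 1: law of total probability for "1 received".
p_recv1 = p_recv1_given_sent1 * p_sent1 + p_recv1_given_sent0 * p_sent0

# Step 2: Bayes' theorem reverses the conditioning.
p_sent1_given_recv1 = p_recv1_given_sent1 * p_sent1 / p_recv1

print(round(p_recv1, 4), round(p_sent1_given_recv1, 4))
```

Although a 1 is sent only 40% of the time, a received 1 was actually a transmitted 1 in roughly 86% of cases.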

“At Least One” & Complement Rule


𝑃(at least one) = 1 − 𝑃(none)

Always check if complement is easier

Maximum Probability Product Question


If A and B are mutually exclusive and:
𝐴∪𝐵 =𝑆

Then:
𝑃(𝐴) + 𝑃(𝐵) = 1

Product $P(A)P(B)$ is maximum when equal:


$$P(A) = P(B) = \frac{1}{2}, \quad \max[P(A)P(B)] = \frac{1}{4}$$

📘 INTRODUCTION TO STATISTICAL METHODS (ISM)


MODULE–2 : BAYES’ THEOREM & NAÏVE BAYES CLASSIFIER
(Session 4 – Exam Ready Notes)

Bayes’ Theorem (Formal Statement)


Let
• $E_1, E_2, \ldots, E_n$ be mutually exclusive and exhaustive events
• $P(E_i) > 0$
• $A$ be any event such that $P(A) > 0$
Then,
$$P(E_i \mid A) = \frac{P(E_i)\,P(A \mid E_i)}{\sum_{j=1}^{n} P(E_j)\,P(A \mid E_j)}$$

Used when cause is unknown but effect is known

Interpretation of Bayes’ Theorem


Term | Meaning
$P(E_i)$ | Prior probability
$P(A \mid E_i)$ | Likelihood
$P(E_i \mid A)$ | Posterior probability
Denominator | Normalization (Total Probability)
Posterior = Prior × Likelihood / Evidence

Proof of Bayes’ Theorem (Exam-Friendly Logic)


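1. Since the $E_i$ are exhaustive, $A \subseteq \bigcup_i E_i = S$
2. Hence $A = \bigcup_i (A \cap E_i)$, and the sets $A \cap E_i$ are mutually exclusive
3. Using addition rule (mutually exclusive):
$$P(A) = \sum_i P(A \cap E_i)$$

4. Using multiplication rule:


$$P(A \cap E_i) = P(E_i)\,P(A \mid E_i)$$

5. Substitute in conditional probability definition:
$$P(E_i \mid A) = \frac{P(A \cap E_i)}{P(A)} = \frac{P(E_i)\,P(A \mid E_i)}{\sum_j P(E_j)\,P(A \mid E_j)}$$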


Mention “Rule of Total Probability” → scoring point

Law of Total Probability


If $E_1, E_2, \ldots, E_n$ are mutually exclusive & exhaustive:
$$P(A) = \sum_{i=1}^{n} P(E_i)\,P(A \mid E_i)$$

Used before applying Bayes’ theorem

Typical Bayes’ Theorem Problems


(A) Medical Diagnosis
• Disease vs No disease
• Symptom observed → find disease probability
(B) Spam Detection
• Spam / Not spam
• Email detected as spam → find true class
(C) Manager / Election / Product launch
• Multiple causes → single effect

Example Pattern (Exam Style)


Given:
• 𝑃(𝑋) = 4/9, 𝑃(𝑌) = 2/9, 𝑃(𝑍) = 1/3
• 𝑃(𝐴 ∣ 𝑋) = 3/10, 𝑃(𝐴 ∣ 𝑌) = 1/2, 𝑃(𝐴 ∣ 𝑍) = 4/5
(i) Find P(A)
$$P(A) = P(X)\,P(A \mid X) + P(Y)\,P(A \mid Y) + P(Z)\,P(A \mid Z)$$

(ii) Find P(X ∣ A)


$$P(X \mid A) = \frac{P(X)\,P(A \mid X)}{P(A)}$$

This exact structure is repeated in exam numericals
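
The numbers in this example pattern can be worked through with exact fractions. The following sketch takes the priors and conditional probabilities exactly as given above; the dictionary names are illustrative.

```python
from fractions import Fraction

# Given: priors P(X), P(Y), P(Z) and likelihoods P(A | X), P(A | Y), P(A | Z).
priors = {"X": Fraction(4, 9), "Y": Fraction(2, 9), "Z": Fraction(1, 3)}
likelihoods = {"X": Fraction(3, 10), "Y": Fraction(1, 2), "Z": Fraction(4, 5)}

# (i) Law of total probability.
p_a = sum(priors[e] * likelihoods[e] for e in priors)

# (ii) Bayes' theorem for every cause; the posteriors must sum to 1.
posteriors = {e: priors[e] * likelihoods[e] / p_a for e in priors}

print(p_a, posteriors["X"])
```

This gives P(A) = 23/45 and P(X ∣ A) = 6/23.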


Bayes’ Theorem for Hypothesis
$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

Where:
• 𝐻: hypothesis
• 𝐸: evidence
Foundation of Machine Learning probabilistic models

MAP vs ML Hypothesis
Maximum A Posteriori (MAP)
$$h_{MAP} = \arg\max_{h} P(D \mid h)\,P(h)$$

Maximum Likelihood (ML)


If all priors equal:
$$h_{ML} = \arg\max_{h} P(D \mid h)$$

MAP considers prior, ML ignores it

Bayesian Learning – Key Features


✔ Combines prior knowledge + data
✔ Produces probabilistic predictions
✔ Uses all hypotheses, weighted by probability
✔ Robust for uncertainty

Naïve Bayes Classifier (VERY IMPORTANT)


Core Formula
$$P(C \mid X) = \frac{P(X \mid C)\,P(C)}{P(X)}$$

Where:
• $C$ = class (Spam / Normal)
• $X = (x_1, x_2, \ldots, x_n)$ = features

Conditional Independence Assumption


Naïve Bayes assumes:
$$P(X_1, X_2, \ldots, X_n \mid C) = \prod_{i=1}^{n} P(X_i \mid C)$$

This is the “naïve” assumption

Naïve Bayes Classification Rule


$$\hat{C} = \arg\max_{C} P(C) \prod_{i} P(X_i \mid C)$$

Denominator ignored (same for all classes)

Naïve Bayes – Step-by-Step (Must Write)


1. Collect raw data
2. Convert to frequency table
3. Compute:
o Prior 𝑃(𝐶)
o Likelihood 𝑃(𝑋𝑖 ∣ 𝐶)
4. Apply Naïve Bayes formula
5. Compare probabilities
6. Choose class with maximum posterior

Naïve Bayes for Text Classification


• Bag-of-Words model
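• Each word treated as independent feature
$$P(c \mid d) \propto P(c) \prod_{t \in d} P(t \mid c)$$

Where:
$$P(t \mid c) = \frac{N_{tc} + 1}{N_c + V}$$

$N_{tc}$ = number of occurrences of term t in training documents of class c
$N_c$ = total number of term occurrences in class c
$V$ = vocabulary size
+1 → Laplace smoothing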

Zero Frequency Problem


If:
$$P(X_i \mid C) = 0 \;\Rightarrow\; \text{entire product becomes } 0$$

Classifier fails

Laplace Smoothing (Solution)


$$P(X_i \mid C) = \frac{N_{ic} + 1}{N_c + V}$$

✔ Avoids zero probability


✔ Essential in text classification
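
To make the classification rule and Laplace smoothing concrete, here is a minimal sketch of a bag-of-words Naïve Bayes classifier. The four training messages are hypothetical, the function names are illustrative, and log-probabilities are used in place of the raw product (an implementation choice to avoid underflow, not something stated in the notes). Words outside the training vocabulary are simply skipped.

```python
import math
from collections import Counter

# Hypothetical toy training set: (tokens, label).
training = [
    (["win", "money", "now"], "spam"),
    (["win", "prize"], "spam"),
    (["meeting", "at", "noon"], "normal"),
    (["project", "meeting", "now"], "normal"),
]

vocabulary = {word for tokens, _ in training for word in tokens}  # size V
doc_counts = Counter(label for _, label in training)
word_counts = {label: Counter() for label in doc_counts}
for tokens, label in training:
    word_counts[label].update(tokens)

def log_posterior(tokens, label):
    """log P(c) + sum of log P(t | c), with add-one (Laplace) smoothing."""
    total_words = sum(word_counts[label].values())        # N_c
    score = math.log(doc_counts[label] / len(training))   # prior P(c)
    for token in tokens:
        if token not in vocabulary:
            continue  # unseen words are skipped in this sketch
        count = word_counts[label][token]                  # N_tc
        score += math.log((count + 1) / (total_words + len(vocabulary)))
    return score

def classify(tokens):
    """Choose the class with the maximum (log) posterior score."""
    return max(doc_counts, key=lambda label: log_posterior(tokens, label))

print(classify(["win", "money"]), classify(["meeting", "noon"]))
```

The word "money" never appears in a "normal" message, yet its smoothed probability is 1/(6 + 8) rather than 0, so the score stays finite.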

Worked Naïve Bayes Pattern (Exam Favourite)


• Given dataset (Play Tennis / Email words)
• Find:
$P(\text{Yes} \mid X)$ and $P(\text{No} \mid X)$
• Compare
• Decide class
Show multiplication of probabilities clearly

Advantages of Naïve Bayes


✔ Simple & fast
✔ Works well on large datasets
✔ Excellent for text classification
✔ Low computation

Limitations
✘ Independence assumption unrealistic
✘ Zero frequency issue (needs smoothing)
✘ Not good for correlated features

📘 INTRODUCTION TO STATISTICAL METHODS (ISM)
MODULE–3 : RANDOM VARIABLES & PROBABILITY DISTRIBUTIONS
(Session 5 – Exam Ready Notes)

Random Variable (RV)


Definition
A random variable is a function that assigns a real number to each outcome in the sample space of a random experiment.
• Domain → Sample space
• Range → Real numbers
One and only one numerical value is assigned to each outcome.

Types of Random Variables


(A) Discrete Random Variable
• Takes countable values
• Values may be finite or countably infinite
Examples
• Number of heads in coin toss
• Number of customers arriving
• Number of defects

(B) Continuous Random Variable


• Takes any value in an interval
• Uncountable
Examples
• Height
• Weight
• Time
• Temperature

Classification of Random Variables


Random Variable | Distribution Function
Discrete | PMF + CDF
Continuous | PDF + CDF

Probability Distribution (Discrete RV)


Probability Mass Function (PMF)
𝑓(𝑥) = 𝑃(𝑋 = 𝑥)

Conditions for Valid PMF


1. 𝑓(𝑥) ≥ 0
2. ∑𝑓(𝑥) = 1
Can be represented using table, graph, or formula.

Discrete Uniform Distribution


Definition
All values are equally likely.
$$f(x) = \frac{1}{n}$$

Example:
• Dice throw
$$P(X = x) = \frac{1}{6}, \quad x = 1, 2, 3, 4, 5, 6$$

Mathematical Expectation (Mean)


Definition
Expected value is the weighted average of all possible values.
For Discrete RV
𝐸(𝑋) = ∑𝑥𝑓(𝑥)

Properties of Expectation
• 𝐸(𝑎𝑋) = 𝑎𝐸(𝑋)
• 𝐸(𝑋 + 𝑏) = 𝐸(𝑋) + 𝑏
• 𝐸(𝑎𝑋 + 𝑏) = 𝑎𝐸(𝑋) + 𝑏
• 𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌)
• $E(XY) = E(X)E(Y)$ if X and Y are independent (the converse need not hold)

Fair Game Concept


A game is fair if:
𝐸(𝑋) = 0

Expected gain = Expected loss


(Used heavily in exam numericals)
Variance of Discrete Random Variable
Definition
Measures spread around the mean.
$$\mathrm{Var}(X) = \sigma^2 = E[(X - \mu)^2]$$

Alternative Formula (VERY IMPORTANT)


$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$

Standard Deviation
$$\sigma = \sqrt{\mathrm{Var}(X)}$$

Rules of Variance
• $\mathrm{Var}(b) = 0$ (constant)
• $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
• $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$
• If X, Y independent:
$$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$$
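
The expectation and variance formulas can be checked on the fair-die PMF from the discrete uniform example. This is a small sketch with exact fractions; the helper `expectation` and the constants a = 2, b = 3 are illustrative.

```python
from fractions import Fraction

# PMF of a fair die: f(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) f(x) over all x."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(pmf)
variance = expectation(pmf, lambda x: x**2) - mean**2  # E(X^2) - [E(X)]^2

# Check Var(aX + b) = a^2 Var(X) with illustrative a = 2, b = 3.
a, b = 2, 3
mean_y = expectation(pmf, lambda x: a * x + b)
variance_y = expectation(pmf, lambda x: (a * x + b) ** 2) - mean_y**2

print(mean, variance, mean_y, variance_y)
```

For the die, E(X) = 7/2 and Var(X) = 35/12; the shift b changes the mean but leaves the variance untouched.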

Cumulative Distribution Function (CDF)


Definition (Discrete Case)
𝐹(𝑥) = 𝑃(𝑋 ≤ 𝑥)

Properties
• Non-decreasing
• 0 ≤ 𝐹(𝑥) ≤ 1
• 𝐹(∞) = 1
Built by cumulative sum of PMF values.

Continuous Random Variable


Key Points
• Probability at a single point = 0
𝑃(𝑋 = 𝑥) = 0

• Probability over interval:


$$P(a < X < b) = \int_a^b f(x)\,dx$$

Probability Density Function (PDF)


Conditions
1. $f(x) \geq 0$
2. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Area under curve = probability

Expectation (Continuous RV)


$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$$

$$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$$

Joint Probability Distribution


Discrete Case
𝑓(𝑥, 𝑦) = 𝑃(𝑋 = 𝑥, 𝑌 = 𝑦)

Conditions
• $f(x, y) \geq 0$
• $\sum_x \sum_y f(x, y) = 1$

Marginal Probability
Obtained by summing rows or columns.
$$f_X(x) = \sum_y f(x, y)$$

$$f_Y(y) = \sum_x f(x, y)$$

Called marginal because values appear in table margins.

Independence of Random Variables


X and Y are independent if, for all x and y:
$$f(x, y) = f_X(x)\,f_Y(y)$$

Very common true/false / justification question
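
A justification question of this kind amounts to computing both marginals and comparing every cell of the joint table with their product. The sketch below uses two hypothetical joint PMFs; the function names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF f(x, y) stored as {(x, y): probability}.
joint_a = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8),
    (1, 0): Fraction(3, 8), (1, 1): Fraction(3, 8),
}
joint_b = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

def marginals(joint):
    """Sum the joint table over y to get f_X, and over x to get f_Y."""
    fx, fy = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint.items():
        fx[x] += p
        fy[y] += p
    return dict(fx), dict(fy)

def is_independent(joint):
    """True when f(x, y) = f_X(x) f_Y(y) for every pair (x, y)."""
    fx, fy = marginals(joint)
    return all(joint.get((x, y), 0) == fx[x] * fy[y] for x in fx for y in fy)

print(marginals(joint_a), is_independent(joint_a), is_independent(joint_b))
```

A single cell that fails the product test is enough to conclude that X and Y are dependent.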

Conditional Probability Distribution


Discrete Case
$$P(X = x \mid Y = y) = \frac{f(x, y)}{f_Y(y)}$$

Continuous Case
$$f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$$

📘 INTRODUCTION TO STATISTICAL METHODS (ISM)


MODULE–4 : IMPORTANT PROBABILITY DISTRIBUTIONS
(Session 6 – COMPLETE Exam Notes)
Bernoulli Distribution
Bernoulli Trial
A random experiment with only two outcomes:
• Success (1) with probability p
• Failure (0) with probability q = 1 − p

Definition
A random variable X follows Bernoulli distribution if:
$$P(X = x) = \begin{cases} p, & x = 1 \\ q = 1 - p, & x = 0 \end{cases}$$

Mean & Variance


𝐸(𝑋) = 𝑝
𝑉𝑎𝑟(𝑋) = 𝑝𝑞

Key Points
✔ Single trial
✔ Special case of Binomial distribution (n=1)
✔ Used for yes/no, pass/fail, hit/miss problems

Binomial Distribution (VERY IMPORTANT)


Conditions for Binomial Experiment
1. Fixed number of trials n
2. Trials are independent
3. Probability of success p is constant
4. Each trial has only two outcomes

Probability Mass Function (PMF)


$$P(X = x) = \binom{n}{x} p^x q^{n-x}$$

Where:
• 𝑥 = 0,1,2, … , 𝑛
• 𝑞 =1−𝑝

Mean & Variance


𝐸(𝑋) = 𝑛𝑝
𝑉𝑎𝑟(𝑋) = 𝑛𝑝𝑞

Important Exam Language Mapping


Phrase | Meaning
Exactly | =
At most | ≤
At least | ≥
Less than | <
More than | >
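
The binomial PMF, its mean and variance, and the phrase mapping above can be checked numerically. This is a minimal sketch assuming a hypothetical experiment of 10 fair coin tosses; `binomial_pmf` is an illustrative helper built on `math.comb`.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.5  # hypothetical: 10 tosses of a fair coin
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))                   # compare with n*p
variance = sum(x**2 * px for x, px in enumerate(pmf)) - mean**2  # compare with n*p*q

p_exactly_3 = pmf[3]            # "exactly 3"  -> P(X = 3)
p_at_most_2 = sum(pmf[:3])      # "at most 2"  -> P(X <= 2)
p_at_least_3 = 1 - p_at_most_2  # "at least 3" -> 1 - P(X <= 2)

print(mean, variance, p_exactly_3, p_at_most_2, p_at_least_3)
```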

Typical Binomial Questions


✔ Coins, dice
✔ MCQ guessing
✔ Defective items
✔ Success/failure in repeated trials

Poisson Distribution (VERY FREQUENT)


When to Use
✔ Rare events
✔ Large 𝑛, small 𝑝
✔ Mean $\lambda = np$ finite
Binomial can be approximated by Poisson if (rule of thumb):
$$n \geq 100, \quad p \leq 0.1, \quad \lambda = np \leq 10$$

Definition (PMF)
$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$

Mean & Variance (VERY IMPORTANT)


𝐸(𝑋) = 𝜆
𝑉𝑎𝑟(𝑋) = 𝜆

Mean = Variance (write this line in exam)

Typical Applications
✔ Calls arriving
✔ Customers arriving
✔ Defects per unit
✔ Accidents, errors

Common Probability Patterns


• 𝑃(𝑋 ≥ 𝑘) = 1 − 𝑃(𝑋 < 𝑘)
• 𝑃(𝑋 < 𝑘) = 𝑃(0) + 𝑃(1) + ⋯ + 𝑃(𝑘 − 1)
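
The complement pattern above translates directly into code. This is a short sketch; the rate λ = 2 is a hypothetical value (for example, an average of 2 calls per minute), and the function names are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_at_least(k, lam):
    """P(X >= k) = 1 - [P(0) + P(1) + ... + P(k - 1)]."""
    return 1 - sum(poisson_pmf(x, lam) for x in range(k))

lam = 2  # hypothetical average rate
print(poisson_pmf(0, lam), poisson_at_least(1, lam), poisson_at_least(3, lam))
```

For "at least one", only P(0) has to be computed, which is why the complement is usually the faster route.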

Normal Distribution (CORE CONCEPT)


Definition
A continuous random variable X follows normal distribution with parameters μ (mean) and σ² (variance) if:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

Properties
✔ Bell-shaped curve
✔ Symmetric about mean
✔ Mean = Median = Mode
✔ Total area = 1
✔ Defined for −∞ < 𝑥 < ∞

Empirical Rule (68-95-99.7)


• 68% data → 𝜇 ± 𝜎
• 95% data → 𝜇 ± 2𝜎
• 99.7% data → 𝜇 ± 3𝜎

Standard Normal Distribution


Transformation
Any normal variable:
$$X \sim N(\mu, \sigma^2)$$

Converted to standard normal:


$$Z = \frac{X - \mu}{\sigma}$$

Where:
$$Z \sim N(0, 1)$$

Why Use Z-Transformation?


✔ Makes probability calculation easy
✔ Allows use of Z-tables

Standard Normal CDF


𝑃(𝑍 ≤ 𝑧) = 𝐹(𝑧)

Use symmetry:
𝐹(−𝑧) = 1 − 𝐹(𝑧)
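
Instead of a printed Z-table, the standard normal CDF can be evaluated with `statistics.NormalDist` from the Python standard library. The sketch below checks the empirical rule, the symmetry property, and the Z-transformation; the distribution N(70, 10²) and the value 85 are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within_k_sigma(k):
    """P(-k <= Z <= k), the area within k standard deviations of the mean."""
    return Z.cdf(k) - Z.cdf(-k)

# Symmetry: F(-z) = 1 - F(z)
left_tail = Z.cdf(-1.2)
right_complement = 1 - Z.cdf(1.2)

# Z-transformation on a hypothetical X ~ N(70, 10^2): P(X <= 85)
mu, sigma, x = 70, 10, 85
p_direct = NormalDist(mu, sigma).cdf(x)
p_via_z = Z.cdf((x - mu) / sigma)

print(within_k_sigma(1), within_k_sigma(2), within_k_sigma(3), p_via_z)
```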

Normal Approximation to Binomial


Conditions
$$np \geq 15 \quad \text{and} \quad nq \geq 15$$

Then, approximately:
$$X \sim N(np,\; npq)$$

Continuity Correction (VERY IMPORTANT)
Binomial Probability | Normal Approximation
P(X ≤ k) | P(X ≤ k + 0.5)
P(X < k) | P(X < k − 0.5)
P(X ≥ k) | P(X ≥ k − 0.5)
P(X > k) | P(X > k + 0.5)
Missing this = marks cut
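
The effect of the correction can be seen by comparing the exact binomial probability with the normal approximation, with and without the 0.5 adjustment. This is a sketch under assumed values n = 50, p = 0.4, k = 22 (so np = 20 and nq = 30 satisfy the conditions above).

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 50, 0.4, 22  # hypothetical values; np = 20 and nq = 30
q = 1 - p

# Exact binomial P(X <= k).
exact = sum(comb(n, x) * p**x * q ** (n - x) for x in range(k + 1))

# Normal approximation N(np, npq).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
with_correction = approx_dist.cdf(k + 0.5)   # P(X <= k) -> P(X <= k + 0.5)
without_correction = approx_dist.cdf(k)      # ignores the correction

print(round(exact, 4), round(with_correction, 4), round(without_correction, 4))
```

In this setup the corrected value lands much closer to the exact probability than the uncorrected one.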

Chi-Square Distribution (Intro)


Definition
If:
$$Z_1, Z_2, \ldots, Z_k \sim N(0, 1) \quad \text{(independent)}$$

Then:
$$Y = Z_1^2 + Z_2^2 + \cdots + Z_k^2 \sim \chi^2(k)$$

Properties
✔ Values from 0 to ∞
✔ Depends on degrees of freedom (k)
✔ Used in variance & goodness-of-fit

t-Distribution (Intro)
When Used
✔ Small sample size
✔ Population standard deviation unknown

Definition
$$t = \frac{Z}{\sqrt{\chi^2 / k}}$$

where $Z \sim N(0, 1)$ and $\chi^2 \sim \chi^2(k)$ are independent.
Follows t-distribution with k degrees of freedom


✔ Bell-shaped
✔ Heavier tails than normal

F-Distribution (Intro)
When Used
✔ Ratio of two variances
✔ Comparison of variability of two populations

Properties
✔ Defined only for positive values
✔ Depends on two degrees of freedom ($\nu_1, \nu_2$)
✔ Used in ANOVA

📘 INTRODUCTION TO STATISTICAL METHODS (ISM)
MODULE–5 : SAMPLING DISTRIBUTION & ESTIMATION
(Session 7 – COMPLETE Exam Notes)

Population and Sample


Population
• Entire set of elements of interest
• Can be finite or infinite
• Survey on population = Census
Sample
• Subset of population
• Survey on sample = Sample Survey
Statistics uses sample data to infer about population → Inferential Statistics

Parameter vs Statistic
Parameter
• Population characteristic
• Generally unknown
• Examples:
o Population mean → $\mu$
o Population variance → $\sigma^2$
o Population proportion → $P$

Statistic
• Function of sample observations
• Used to estimate parameter
• Examples:
o Sample mean → $\bar{x}$
o Sample variance → $s^2$
o Sample proportion → $\hat{p}$
A statistic used to estimate a parameter is called an Estimator
Numerical value obtained = Estimate

Sampling – Why Needed?


• Population may be:
o Too large
o Infinite
o Costly & time-consuming
• Sampling:
o Saves money
o Saves time
o Gives reliable estimates if done properly
Observed Data Decomposition
Observed Data = Truth + Bias + Random Error

Good sampling design → minimizes bias & error

Methods of Sampling
(A) Probability Sampling
(All units have known probability of selection)
1. Simple Random Sampling
o Homogeneous population
o Each unit has equal chance
2. Systematic Sampling
o Arrange population in order
o Select every k-th unit, where
$$k = \frac{N}{n}$$

3. Stratified Random Sampling


o Population is heterogeneous
o Divide into homogeneous strata
o Sample from each stratum proportionally
PPPS:
$$\text{Sample from stratum} = \frac{\text{Stratum size} \times \text{Total sample size}}{\text{Population size}}$$

(B) Non-Probability Sampling


• Convenience sampling
• Judgement sampling
• Quota sampling
• Snowball sampling
Higher sampling error

Sampling Error & Sampling Variation


Sampling Error
• Occurs when sample is not representative
• Due to improper sampling method
Sampling Variation
• Estimates vary from sample to sample
• Decreases as sample size increases
Larger sample → less variability

Sampling Distribution
Definition
The probability distribution of a statistic is called its sampling distribution.
Depends on:
• Population distribution
• Sample size
• Sampling method

Sampling Distribution of Sample Mean


Key Idea
• Take many samples of size 𝑛
• Compute sample mean for each
• Distribution of these means = sampling distribution of 𝑥ˉ

Properties
$$\mu_{\bar{x}} = \mu$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

$\sigma_{\bar{x}}$ is called Standard Error of Mean

Central Limit Theorem (CLT) (MOST IMPORTANT)


Statement
If samples of size $n$ are drawn randomly from a population with mean $\mu$ and standard deviation $\sigma$:
• For large samples ($n \geq 30$):
o Sampling distribution of $\bar{x}$ is approximately normal, regardless of population shape
• If population is normal:
o Sampling distribution of $\bar{x}$ is normal for any n

CLT Results
$$\mu_{\bar{x}} = \mu$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

CLT justifies use of Z-distribution

Z-Score for Sample Mean


Formula
$$Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

Used to find probability related to sample mean

Exam Pattern (VERY COMMON)


1. Check $n \geq 30$ → Apply CLT
2. Compute standard error
3. Convert to Z
4. Use Z-table
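
The four steps above can be sketched as a short calculation. The population values (μ = 50, σ = 12), the sample size n = 36, and the threshold 53 are hypothetical; `NormalDist().cdf` stands in for the Z-table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 12, 36  # hypothetical population mean, sd, and sample size
threshold = 53             # question: P(sample mean > 53)?

assert n >= 30                             # step 1: CLT applies
standard_error = sigma / sqrt(n)           # step 2: sigma / sqrt(n)
z = (threshold - mu) / standard_error      # step 3: convert to Z
p_mean_exceeds = 1 - NormalDist().cdf(z)   # step 4: table lookup

print(standard_error, z, round(p_mean_exceeds, 4))
```

With these numbers the standard error is 2, Z is 1.5, and the probability is about 0.0668.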

Finite Population Correction Factor


When sampling from finite population:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

Rule of Thumb
• If $\frac{n}{N} < 0.05$ → Ignore correction factor

Sampling Distribution of Sample Proportion


Sample Proportion
$$\hat{p} = \frac{x}{n}$$

Where:
• $x$ = number of successes
• $n$ = sample size

CLT for Proportion


Applies if:
$$np > 15 \quad \text{and} \quad nq > 15$$

Mean & Standard Error


$$\mu_{\hat{p}} = p$$
$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$$

For finite population:


$$\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N - n}{N - 1}}$$

Z-Score for Proportion


$$Z = \frac{\hat{p} - p}{\sqrt{pq / n}}$$

Statistical Inference
Three forms:
1. Point Estimation
2. Interval Estimation
3. Hypothesis Testing (later module)

Point Estimation
• Single value used to estimate parameter
• Examples:
o $\bar{x}$ estimates $\mu$
o $s^2$ estimates $\sigma^2$
Varies from sample to sample

Confidence Interval
Meaning
A range of values, computed from the sample, that is expected to contain the true population parameter with a stated level of confidence.

Confidence Level
100(1 − 𝛼)%

Common values:
• 90% → 𝛼 = 0.10
• 95% → 𝛼 = 0.05
• 99% → 𝛼 = 0.01

Confidence Interval for Mean (σ known)


$$\bar{x} \pm Z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$$

Use Z-values:
• 95% → 1.96
• 99% → 2.58
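
The interval formula can be sketched as a small function. The sample values (mean 100, σ = 15, n = 36) are hypothetical, the function name is illustrative, and `NormalDist().inv_cdf` is used to obtain the critical value instead of a table, so it returns 1.95996... rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value Z_{alpha/2}
    margin = z * sigma / sqrt(n)             # margin of error
    return sample_mean - margin, sample_mean + margin

low, high = mean_confidence_interval(100, 15, 36)        # 95% interval
low99, high99 = mean_confidence_interval(100, 15, 36, confidence=0.99)

print(round(low, 2), round(high, 2), round(low99, 2), round(high99, 2))
```

Raising the confidence level from 95% to 99% widens the interval, since the critical value grows from about 1.96 to about 2.58.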

Confidence Interval for Difference of Two Means


$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Sample Size Determination


Formula
$$n = \left( \frac{Z_{\alpha/2}\, \sigma}{E} \right)^2$$

Where:
• 𝐸= margin of error
Smaller error → larger sample size
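
The sample size formula is usually rounded up to the next whole number, since a fractional observation is not possible. The sketch below assumes that rounding convention; σ = 15 and the margins of error 3 and 1.5 are hypothetical values, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """Smallest n with Z_{alpha/2} * sigma / sqrt(n) <= margin_of_error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin_of_error) ** 2)

n_wide = required_sample_size(sigma=15, margin_of_error=3)
n_narrow = required_sample_size(sigma=15, margin_of_error=1.5)

print(n_wide, n_narrow)
```

Halving the margin of error roughly quadruples the required sample size, because E appears squared in the denominator.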
