Basic Statistics and Probability Notes
What is Statistics?
Definition
Statistics is a branch of applied mathematics that deals with:
• Collection of data
• Organization & presentation of data
• Analysis of data
• Interpretation & inference
• Decision making based on data
Statistics = the science of data-driven decision making
Backbone of:
• Data Science
• Artificial Intelligence
• Machine Learning
Types of Data
(A) Numerical / Quantitative Data
• Data expressed in numbers
1. Discrete Data
• Countable
• Finite or countably infinite set of values
Examples
• Number of children
• Defects per hour
2. Continuous Data
• Measured
• Infinitely many possible values within any interval
Examples
• Height
• Weight
• Voltage
2. Ordinal Scale
• Categories + ranking
• Differences not meaningful
Examples
• Grades (A, B, C)
• Satisfaction (Low, Medium, High)
• Faculty rank
3. Interval Scale
• Ordered scale
• Differences meaningful
• No true zero
Examples
• Temperature (°C, °F)
• Months of the year
Ratios are not meaningful: 20 °C is not “twice as hot” as 10 °C
4. Ratio Scale
• Ordered
• Differences meaningful
• True zero exists
Examples
• Weight
• Age
• Salary
• Length
Highest level of measurement
Ratios are meaningful
Variables – Classification
Quantitative Variables
• Discrete
• Continuous
Qualitative Variables
• Nominal
• Ordinal
Properties of Mean
✔ Most stable measure from sample to sample
✔ Uses all observations
✘ Affected by extreme values (outliers)
✔ Sum of deviations from mean = 0
✘ May not be an actual data value
Mode
Definition
• Value with highest frequency
Properties
✔ Can be used for qualitative & quantitative data
✔ Not affected by outliers
✘ May not be unique (data can have two or more modes)
Median
Definition
• Middle value when the data are arranged in ascending or descending order
Rules
• Odd n → middle value
• Even n → average of two middle values
Properties
✔ Not affected by outliers
✔ Positional measure
✔ 50% of the data lies on each side
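The properties of the mean, mode and median listed above can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the salary figures are hypothetical and chosen so that one extreme value pulls the mean but leaves the median and mode untouched.

```python
import statistics

# Hypothetical salaries (in thousands); the last value is an extreme outlier.
salaries = [30, 32, 32, 35, 38, 40, 400]

mean_value = statistics.mean(salaries)      # uses every observation
median_value = statistics.median(salaries)  # middle value of the sorted data
mode_value = statistics.mode(salaries)      # most frequent value

# Deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - mean_value for x in salaries)

print("mean:", mean_value)
print("median:", median_value)
print("mode:", mode_value)
print("sum of deviations:", deviation_sum)
```

Here the mean is far above the median because of the single outlier, which is why the median is preferred for skewed data.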
Measures of Variability
Describe spread / dispersion of data.
Types:
1. Range
2. Variance
3. Standard Deviation
Range
Formula
$\text{Range} = X_{\max} - X_{\min}$
Properties
✔ Easy to compute
✘ Uses only the two extreme values
✘ Unreliable: a single outlier changes it completely
Variance
Definition
Average of squared deviations from mean
Population Variance
$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample Variance
$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Standard Deviation
Definition
Square root of variance
Measures the typical distance of observations from the mean
Formula
$\sigma = \sqrt{\sigma^2}, \quad s = \sqrt{s^2}$
Properties
✔ Most important variability measure
✔ Uses all data
✔ Same unit as original data
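As a quick check of the range, variance and standard deviation formulas, here is a minimal sketch with the standard `statistics` module. The data values are hypothetical; `pvariance` divides by N (population) and `variance` divides by n − 1 (sample).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

data_range = max(data) - min(data)       # Range = X_max - X_min
pop_var = statistics.pvariance(data)     # sum of squared deviations / N
sample_var = statistics.variance(data)   # sum of squared deviations / (n - 1)
pop_sd = math.sqrt(pop_var)              # same unit as the data
sample_sd = math.sqrt(sample_var)

print(data_range, pop_var, sample_var, pop_sd, sample_sd)
```

The sample variance is always slightly larger than the population variance computed on the same numbers, because the divisor is n − 1 instead of N.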
Outlier Detection
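Limits (IQR = Q3 − Q1, the interquartile range)
$\text{Lower} = Q_1 - 1.5\,(\text{IQR})$
$\text{Upper} = Q_3 + 1.5\,(\text{IQR})$
Values below Lower or above Upper are flagged as outliers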
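The limits above can be applied as in the following sketch. The data are hypothetical, and the quartile convention is an assumption: textbooks compute Q1 and Q3 in slightly different ways, and this example uses the "inclusive" method of `statistics.quantiles`.

```python
import statistics

data = [10, 12, 13, 14, 15, 16, 18, 19, 50]  # hypothetical; 50 looks suspicious

# Quartile conventions differ between textbooks; "inclusive" is one common choice.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print("Q1, Q3, IQR:", q1, q3, iqr)
print("limits:", lower, upper)
print("outliers:", outliers)
```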
Random Experiment
Definition
A random experiment is an experiment whose outcome cannot be predicted with certainty before performing it.
Examples
• Tossing a coin → Head / Tail
• Throwing a die → 1 to 6
• Waiting time at bus stop
• Transmitting a signal through a channel
The outcome is uncertain, but the set of all possible outcomes (the sample space S) is known
• Throw a die:
𝑆 = {1,2,3,4,5,6}
Event
Definition
An event is a subset of the sample space.
• Includes:
o Empty set (∅)
o Entire sample space (S)
If outcome ∈ A → event occurs
Complement of an Event
Definition
The complement of event A (denoted by $A^c$) consists of all outcomes not in A.
$P(A^c) = 1 - P(A)$
Probability Rule
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)
If mutually exclusive:
𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
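The complement and addition rules can be verified by direct counting on the die sample space. This is a minimal sketch that assumes a fair die (equally likely outcomes); the events A and B are chosen only for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space of a fair die

def prob(event):
    """Probability by counting, assuming equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # even number
B = {4, 5, 6}  # number greater than 3

union_direct = prob(A | B)
union_by_rule = prob(A) + prob(B) - prob(A & B)
complement_direct = prob(S - A)
complement_by_rule = 1 - prob(A)

print(union_direct, union_by_rule)
print(complement_direct, complement_by_rule)
```

Subtracting P(A ∩ B) removes the outcomes 4 and 6, which would otherwise be counted twice.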
Definition of Probability
(A) Classical Probability
$P(A) = \frac{\text{Favorable outcomes}}{\text{Total outcomes}}$
Probability Scale
• 0 → Impossible event
• 0.5 → Event is as likely to occur as not
• 1 → Certain event
Shortcut:
𝑃(at least one) = 1 − 𝑃(none)
If A and B are complementary (mutually exclusive and exhaustive), then:
𝑃(𝐴) + 𝑃(𝐵) = 1
Combination-Based Probability
Example:
• Selecting 2 laptops out of 6 computers
Total outcomes:
$\binom{6}{2} = 15$
Probability = Favorable / Total combinations
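A short sketch of this counting approach with `math.comb`. The total of 15 comes from the example above; the split of the 6 computers into 4 laptops and 2 desktops is a hypothetical assumption added only to have a favorable count.

```python
from fractions import Fraction
from math import comb

total = comb(6, 2)             # ways to choose 2 of the 6 computers

laptops = 4                    # hypothetical: 4 of the 6 are laptops
favorable = comb(laptops, 2)   # both chosen machines are laptops

p_both_laptops = Fraction(favorable, total)
print(total, favorable, p_both_laptops)
```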
📘 INTRODUCTION TO STATISTICAL METHODS (ISM)
MODULE–1 (Session 2): BASIC PROBABILITY
Dependent Events
• Occurrence of one affects the probability of the other
Classical probability assumes that all outcomes are equally likely
Conditional Probability
Concept
Sometimes partial information is available (event A has already occurred).
Then the probability of B may change.
Definition
For two events A and B:
$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) \neq 0$
Similarly,
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$
General Case
$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\, P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})$
Important Conditional Probability Results
• If B has no effect on A:
𝑃(𝐴 ∣ 𝐵) = 𝑃(𝐴)
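To make the definition and the general multiplication rule concrete, the sketch below enumerates two fair dice and then multiplies conditional probabilities for a card-drawing example. Both scenarios are hypothetical illustrations, not taken from the notes.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Sample space for two fair dice: 36 equally likely ordered pairs.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for a predicate over outcomes, assuming equally likely outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def A(o):
    return o[0] % 2 == 0       # first die is even

def B(o):
    return o[0] + o[1] == 8    # the sum equals 8

# P(B | A) = P(A and B) / P(A)
p_b_given_a = prob(lambda o: A(o) and B(o)) / prob(A)

# General case: three aces in a row from a 52-card deck, without replacement.
# P(A1) * P(A2 | A1) * P(A3 | A1 and A2)
p_three_aces = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

print(prob(B), p_b_given_a, p_three_aces)
```

Knowing that the first die is even raises the probability of a sum of 8 from 5/36 to 1/6, so here the partial information does change the probability.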
Independence of Events
Definition
Two events A and B are independent iff:
𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴) 𝑃(𝐵)
Equivalent Conditions
• 𝑃(𝐴 ∣ 𝐵) = 𝑃(𝐴)
• 𝑃(𝐵 ∣ 𝐴) = 𝑃(𝐵)
Independence ≠ Mutually Exclusive
Mutually exclusive events with nonzero probabilities cannot be independent, because P(A ∩ B) = 0 while P(A) P(B) > 0
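The product condition can be tested by counting. This sketch assumes a fair die; the three events are hypothetical and picked so that one pair is independent and the other is mutually exclusive.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # fair die

def prob(event):
    return Fraction(len(event), len(S))

def independent(a, b):
    """True when P(A and B) equals P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
at_most_4 = {1, 2, 3, 4}
odd = {1, 3, 5}

print(independent(even, at_most_4))  # overlap {2, 4}: 1/3 = 1/2 * 2/3
print(independent(even, odd))        # no overlap: 0 != 1/2 * 1/2
```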
So,
$P(B \mid A) = \frac{P(A \mid B)\, P(B)}{P(A)}$
Meaning
Posterior = Likelihood × Prior / Evidence
Used to reverse conditioning
Step 2: Bayes
𝑃(1 sent ∣ 1 received)
Then:
𝑃(𝐴) + 𝑃(𝐵) = 1
$P(A) = \sum_{i=1}^{n} P(E_i)\, P(A \mid E_i)$
Where:
• 𝐻: hypothesis
• 𝐸: evidence
Foundation of Machine Learning probabilistic models
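A minimal sketch of Bayes' rule for a binary channel, in the spirit of the "1 sent | 1 received" step mentioned above. All probabilities here are hypothetical assumptions; the evidence term is computed with the total probability formula.

```python
from fractions import Fraction

# Hypothetical channel model (not from the notes).
prior = {1: Fraction(6, 10), 0: Fraction(4, 10)}                # P(bit sent)
p_recv1_given_sent = {1: Fraction(9, 10), 0: Fraction(2, 10)}   # P(1 received | bit sent)

# Evidence via total probability: P(1 received) = sum of P(E_i) * P(A | E_i)
evidence = sum(prior[b] * p_recv1_given_sent[b] for b in prior)

# Posterior = Likelihood x Prior / Evidence
posterior = p_recv1_given_sent[1] * prior[1] / evidence

print("P(1 received) =", evidence)
print("P(1 sent | 1 received) =", posterior)
```

The conditioning is reversed: the model gives P(received | sent), and Bayes' rule returns P(sent | received).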
MAP vs ML Hypothesis
Maximum A Posteriori (MAP)
$h_{MAP} = \arg\max_{h} P(D \mid h)\, P(h)$
Where:
• $C$ = class (Spam / Normal)
• $X = (x_1, x_2, \dots, x_n)$ = features
$P(X_1, X_2, \dots, X_n \mid C) = \prod_{i=1}^{n} P(X_i \mid C)$
Where:
$P(t \mid c) = \frac{N_{tc} + 1}{N_c + V}$
𝑉= vocabulary size
+1 → Laplace smoothing
Classifier fails
Limitations
Independence assumption unrealistic
Zero frequency issue (needs smoothing)
Not good for correlated features
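The pieces above (class prior, independence assumption, Laplace smoothing) can be put together in a very small classifier. This is only a sketch under stated assumptions: the training messages are hypothetical, N_tc is read as the count of term t in class c and N_c as the total number of terms in class c (the usual reading, though the notes do not spell it out), and log probabilities are used to avoid underflow.

```python
import math
from collections import Counter

# Tiny hypothetical training set: (tokens, label).
training = [
    (["win", "money", "now"], "spam"),
    (["win", "prize"], "spam"),
    (["meeting", "at", "noon"], "normal"),
    (["lunch", "at", "noon"], "normal"),
]

vocabulary = {t for tokens, _ in training for t in tokens}
V = len(vocabulary)                     # vocabulary size

term_counts = {}                        # term_counts[c][t] plays the role of N_tc
class_docs = Counter()                  # number of training messages per class
for tokens, label in training:
    term_counts.setdefault(label, Counter()).update(tokens)
    class_docs[label] += 1

def p_term(term, label):
    """Laplace-smoothed P(t | c) = (N_tc + 1) / (N_c + V)."""
    n_tc = term_counts[label][term]     # 0 for unseen terms
    n_c = sum(term_counts[label].values())
    return (n_tc + 1) / (n_c + V)

def classify(tokens):
    """Pick the class maximizing log P(c) + sum of log P(t | c)."""
    scores = {}
    for label in class_docs:
        log_prior = math.log(class_docs[label] / len(training))
        log_likelihood = sum(math.log(p_term(t, label)) for t in tokens)
        scores[label] = log_prior + log_likelihood
    return max(scores, key=scores.get)

print(classify(["win", "money"]))
print(classify(["lunch", "at", "noon"]))
```

Without the +1, the word "win" would have probability 0 under the Normal class and the whole product would collapse to 0, which is the zero frequency issue listed above.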
📘 INTRODUCTION TO STATISTICAL METHODS (ISM)
MODULE–3 : RANDOM VARIABLES & PROBABILITY DISTRIBUTIONS
(Session 5 – Exam Ready Notes)
Example:
• Dice throw
$P(X = x) = \frac{1}{6}, \quad x = 1, 2, 3, 4, 5, 6$
Properties of Expectation
• 𝐸(𝑎𝑋) = 𝑎𝐸(𝑋)
• 𝐸(𝑋 + 𝑏) = 𝐸(𝑋) + 𝑏
• 𝐸(𝑎𝑋 + 𝑏) = 𝑎𝐸(𝑋) + 𝑏
• 𝐸(𝑋 + 𝑌) = 𝐸(𝑋) + 𝐸(𝑌)
• $E(XY) = E(X)\,E(Y)$ if X and Y are independent (the converse does not hold in general)
Standard Deviation
$\sigma = \sqrt{\mathrm{Var}(X)}$
Rules of Variance
• $\mathrm{Var}(b) = 0$ (constant)
• $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
• $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$
• If X, Y independent:
$\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y)$
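The expectation and variance rules can be confirmed exactly for the dice example, using fractions to avoid rounding. The constants a = 3 and b = 2 are arbitrary choices for illustration.

```python
from fractions import Fraction

# PMF of a fair die: P(X = x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(pmf, g=lambda x: x):
    """E[g(X)] = sum of g(x) * P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf, g=lambda x: x):
    """Var[g(X)] = E[(g(X) - E[g(X)])^2]."""
    mu = expectation(pmf, g)
    return expectation(pmf, lambda x: (g(x) - mu) ** 2)

a, b = 3, 2  # arbitrary constants

e_x = expectation(pmf)
var_x = variance(pmf)
e_linear = expectation(pmf, lambda x: a * x + b)
var_linear = variance(pmf, lambda x: a * x + b)

print(e_x, var_x)            # E(X) and Var(X)
print(e_linear, var_linear)  # E(aX + b) and Var(aX + b)
```

Adding b shifts the mean but leaves the variance unchanged, while multiplying by a scales the variance by a squared.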
Properties
• Non-decreasing
• 0 ≤ 𝐹(𝑥) ≤ 1
• $F(-\infty) = 0$ and $F(\infty) = 1$
Built by cumulative sum of PMF values.
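The cumulative-sum construction can be sketched with `itertools.accumulate`, again using the fair die PMF as the example distribution.

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6

# F(x) = P(X <= x): running total of the PMF values.
cdf = dict(zip(values, accumulate(pmf)))

for x in values:
    print(x, cdf[x])
```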
Conditions
• 𝑓(𝑥, 𝑦) ≥ 0
• $\sum_x \sum_y f(x, y) = 1$
Marginal Probability
Obtained by summing rows or columns.
$p_X(x) = \sum_y f(x, y)$
$p_Y(y) = \sum_x f(x, y)$
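Summing rows and columns of a joint table can be sketched as follows. The joint PMF values are hypothetical and chosen so that they are non-negative and add up to 1.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF f(x, y) for X, Y in {0, 1}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(2, 8),
    (1, 0): Fraction(3, 8), (1, 1): Fraction(2, 8),
}

total = sum(joint.values())   # must equal 1 for a valid joint PMF

p_x = defaultdict(Fraction)   # marginal of X: sum over y
p_y = defaultdict(Fraction)   # marginal of Y: sum over x
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

print(total, dict(p_x), dict(p_y))
```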
Continuous Case
$f_{X \mid Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)}$
Definition
A random variable X follows Bernoulli distribution if:
$P(X = x) = \begin{cases} p, & x = 1 \\ q = 1 - p, & x = 0 \end{cases}$
Key Points
✔ Single trial
✔ Special case of Binomial distribution (n=1)
✔ Used for yes/no, pass/fail, hit/miss problems
Where:
• 𝑥 = 0,1,2, … , 𝑛
• 𝑞 =1−𝑝
Definition (PMF)
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$
Typical Applications
✔ Calls arriving
✔ Customers arriving
✔ Defects per unit
✔ Accidents, errors
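The Poisson PMF above translates directly into code. The rate λ = 3 (say, calls per minute) is a hypothetical value used only for illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3  # hypothetical average number of calls per minute

p_zero = poisson_pmf(0, lam)
p_at_least_one = 1 - p_zero   # shortcut: 1 - P(none)
total = sum(poisson_pmf(x, lam) for x in range(60))

print(p_zero, p_at_least_one, total)
```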
Properties
✔ Bell-shaped curve
✔ Symmetric about mean
✔ Mean = Median = Mode
✔ Total area = 1
✔ Defined for −∞ < 𝑥 < ∞
Where:
𝑍 ∼ 𝑁(0,1)
Use symmetry:
𝐹(−𝑧) = 1 − 𝐹(𝑧)
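The symmetry rule is what lets a table of positive z-values cover negative ones. A small numerical check with `statistics.NormalDist`, where z = 1.96 is chosen only as a familiar example:

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard normal, Z ~ N(0, 1)
z = 1.96

left_tail = Z.cdf(-z)           # F(-z) computed directly
via_symmetry = 1 - Z.cdf(z)     # F(-z) = 1 - F(z)

print(left_tail, via_symmetry)
```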
Then:
$X \approx N(np,\ npq)$, that is, approximately normal with mean $np$ and variance $npq$
Continuity Correction (VERY IMPORTANT)
Binomial Probability      Normal Approximation
P(X ≤ k)                  P(X ≤ k + 0.5)
P(X < k)                  P(X < k − 0.5)
P(X ≥ k)                  P(X ≥ k − 0.5)
P(X > k)                  P(X > k + 0.5)
Missing this = marks cut
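To see why the correction matters, the sketch below compares an exact binomial probability with the normal approximation, with and without the ±0.5 adjustment. The values n = 40, p = 0.5 and k = 22 are hypothetical, and the exact value uses the standard binomial PMF.

```python
import math
from statistics import NormalDist

n, p = 40, 0.5          # hypothetical binomial parameters
q = 1 - p
k = 22

# Exact P(X <= k) from the binomial PMF.
exact = sum(math.comb(n, x) * p ** x * q ** (n - x) for x in range(k + 1))

# Normal approximation with mean np and variance npq.
approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * q))
with_correction = approx.cdf(k + 0.5)   # P(X <= k) -> P(X <= k + 0.5)
without_correction = approx.cdf(k)

print("exact:             ", exact)
print("with correction:   ", with_correction)
print("without correction:", without_correction)
```

In this example the corrected value lands much closer to the exact binomial probability than the uncorrected one.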
If $Z_1, Z_2, \dots, Z_k$ are independent standard normal variables, then:
$Y = Z_1^2 + Z_2^2 + \cdots + Z_k^2 \sim \chi^2(k)$
Properties
✔ Values from 0 to ∞
✔ Depends on degrees of freedom (k)
✔ Used in variance & goodness-of-fit
t-Distribution (Intro)
When Used
✔ Small sample size
✔ Population standard deviation unknown
Definition
$t = \frac{Z}{\sqrt{\chi^2 / k}}$
F-Distribution (Intro)
When Used
✔ Ratio of two variances
✔ Comparison of variability of two populations
Properties
✔ Defined only for positive values
✔ Depends on two degrees of freedom ($\nu_1$, $\nu_2$)
✔ Used in ANOVA
📘 INTRODUCTION TO STATISTICAL METHODS (ISM)
MODULE–5 : SAMPLING DISTRIBUTION & ESTIMATION
(Session 7 – COMPLETE Exam Notes)
Parameter vs Statistic
Parameter
• Population characteristic
• Generally unknown
• Examples:
o Population mean → $\mu$
o Population variance → $\sigma^2$
o Population proportion → $P$
Statistic
• Function of sample observations
• Used to estimate parameter
• Examples:
o Sample mean → $\bar{x}$
o Sample variance → $s^2$
o Sample proportion → $\hat{p}$
A statistic used to estimate a parameter is called an Estimator
Numerical value obtained = Estimate
Methods of Sampling
(A) Probability Sampling
(All units have known probability of selection)
1. Simple Random Sampling
o Homogeneous population
o Each unit has equal chance
2. Systematic Sampling
o Arrange population in order
o Select every 𝑘th unit
$k = \frac{N}{n}$
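A minimal sketch of systematic sampling with interval k = N / n. The function name `systematic_sample`, the random start within the first interval, and the rounding down of k when N is not a multiple of n are assumptions of this example rather than details given in the notes.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every k-th unit after a random start, with k = N // n."""
    N = len(population)
    k = N // n                     # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start within the first interval
    return [population[start + i * k] for i in range(n)]

units = list(range(1, 101))        # hypothetical ordered population, N = 100
sample = systematic_sample(units, n=10, seed=1)
print(sample)
```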
Sampling Distribution
Definition
The probability distribution of a statistic is called its sampling distribution.
Depends on:
• Population distribution
• Sample size
• Sampling method
Properties
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
CLT Results
$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
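These two results can be verified exactly on a tiny population by listing every possible sample. The sketch assumes sampling with replacement from the hypothetical population {1, ..., 6} with n = 2, and checks the variance form of the second result, σ²/n.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]   # hypothetical population
n = 2                             # sample size

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    """Population-style variance: average squared deviation from the mean."""
    mu = mean(values)
    return mean([(v - mu) ** 2 for v in values])

# Every ordered sample of size n drawn with replacement (36 samples).
sample_means = [mean(s) for s in product(population, repeat=n)]

print("population mean:", mean(population))
print("mean of sample means:", mean(sample_means))
print("population variance / n:", variance(population) / n)
print("variance of sample means:", variance(sample_means))
```

Taking square roots of the last two lines gives the standard error relation σ / √n.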
Rule of Thumb
• If $\frac{n}{N} < 0.05$ → ignore the finite population correction factor
Where:
• 𝑥= number of successes
• 𝑛= sample size
Statistical Inference
Three forms:
1. Point Estimation
2. Interval Estimation
3. Hypothesis Testing (later module)
Point Estimation
• Single value used to estimate parameter
• Examples:
o $\bar{x}$ estimates $\mu$
o $s^2$ estimates $\sigma^2$
Varies from sample to sample
Confidence Interval
Meaning
A range of values, computed from the sample, that is expected to contain the true population parameter with a stated level of confidence.
Confidence Level
100(1 − 𝛼)%
Common values:
• 90% → 𝛼 = 0.10
• 95% → 𝛼 = 0.05
• 99% → 𝛼 = 0.01
Use Z-values:
• 90% → 1.645
• 95% → 1.96
• 99% → 2.58
Where:
• 𝐸= margin of error
Smaller error → larger sample size
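The Z-values and the margin of error can be combined as in the sketch below. It assumes the standard known-σ interval x̄ ± z·σ/√n and the usual sample size formula n = (z·σ/E)², neither of which survives in the notes above; the function names and the numbers (x̄ = 50, σ = 10, n = 100) are hypothetical.

```python
import math

Z_VALUES = {95: 1.96, 99: 2.58}   # Z-values quoted in the notes

def confidence_interval(x_bar, sigma, n, level=95):
    """Interval x_bar +/- z * sigma / sqrt(n), assuming sigma is known."""
    margin = Z_VALUES[level] * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

def required_sample_size(sigma, E, level=95):
    """Smallest n with margin of error at most E, assuming n = (z * sigma / E)^2."""
    return math.ceil((Z_VALUES[level] * sigma / E) ** 2)

low, high = confidence_interval(x_bar=50, sigma=10, n=100)
print("95% CI:", low, high)
print("n for E = 2:", required_sample_size(sigma=10, E=2))
print("n for E = 1:", required_sample_size(sigma=10, E=1))
```

Halving the margin of error roughly quadruples the required sample size, which is the point of the last line of the notes.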