Statistics Lecture 1 Overview
Sep 4, 2025
Lecture 1: Introduction
● MICRO data - on an individual level
○ TIME SERIES
○ PANEL DATA
○ REPEATED CROSS-SECTION
- Statistical units
TYPES OF VARIABLE
Lecture 1 Notes
SCOPE & CONTENT OF THE COURSE
Lecture 2
STATISTICAL VARIABLES
GRAPHS
● Pie chart:
○ whole circle = total
○ slice = share
● Bar chart:
○ height of bar = frequency
HISTOGRAMS
SYNTHETIC MEASURES
● Categories:
○ Central tendency (mode, median, mean)
○ Location
○ Dispersion
CENTRAL TENDENCY
MODE
MEDIAN
● Definition: The median is the central value of an ordered dataset.
○ It splits the dataset into two halves:
■ 50% of units below the median
■ 50% of units above the median
1. If n is odd:
a. value with position \( (n+1)/2 \)
\[ \text{Median} = x_{(n+1)/2} \]
2. If n is even:
a. Two middle observations
b. If numerical → average of the two
\[ \text{Median} = \frac{x_{n/2} + x_{n/2+1}}{2} \]
● Ordered: [1, 1, 2, 3, 3, 5]
● n=6 (even) → Median = (2+3)/2=2.5
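A minimal Python sketch of the two median rules above. The function name `median_by_position` is illustrative and not part of the lecture material.

```python
def median_by_position(values):
    """Median via the position rules: middle value if n is odd,
    average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # 1-based position (n+1)/2 is 0-based index n//2
        return ordered[mid]
    # 1-based positions n/2 and n/2 + 1 are 0-based indices mid-1 and mid
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_by_position([1, 1, 2, 3, 3, 5]))  # the even-n example from the notes
print(median_by_position([1, 2, 9]))           # an odd-n case
```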
Cumulative Frequencies
How to compute the median using cumulative frequencies, both for discrete and
continuous data.
● Definition:
The cumulative relative frequency for each value is the proportion of
observations with values smaller than or equal to that value.
● Applicable to:
○ Ordinal variables (can be ranked)
○ Numerical variables
● Purpose: Helps locate medians, percentiles, and quantiles.
● If the cumulative frequency surpasses 0.5 → median is that value (or interval).
● If it equals 0.5 exactly → median is that value, interval, or midpoint.
● In histograms → median divides the area into two equal halves.
MEAN
Raw data
formula:
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]
Deviation
\( d_i = x_i - \bar{x} \)
Properties of the Mean
1. Zero sum of deviations:
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
2. Minimization:
○ Minimizes the sum of squared deviations
\[ (x_1 - a)^2 + (x_2 - a)^2 + \dots + (x_n - a)^2 \quad \text{is smallest when } a = \bar{x} \]
When data are summarized in a frequency distribution table, the mean \( \bar{x} \) is computed from the distinct values and their frequencies.
● When values are repeated with certain frequencies, instead of summing every
observation individually, we use the weighted mean formula:
\[ \bar{x} = \frac{\sum_{i=1}^{k} x_i \cdot f_i}{n} \]
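Where \( x_i \) is the i-th distinct value, \( f_i \) its frequency, \( p_i = f_i / n \) its relative frequency, and \( n = f_1 + \dots + f_k \).
Value (x_i)   Frequency (f_i)   Relative frequency (p_i)
0             60                0.30
1             40                0.20
2             60                0.30
3             20                0.10
4             20                0.10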
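The weighted mean can be checked numerically on the table above. This is a small Python sketch. The variable names are illustrative, and the two routes (absolute and relative frequencies) should agree.

```python
values = [0, 1, 2, 3, 4]
freqs = [60, 40, 60, 20, 20]          # absolute frequencies from the table

n = sum(freqs)                         # total number of observations

# Route 1: sum of value * frequency, divided by n
mean_from_counts = sum(x * f for x, f in zip(values, freqs)) / n

# Route 2: sum of value * relative frequency
rel_freqs = [f / n for f in freqs]
mean_from_shares = sum(x * p for x, p in zip(values, rel_freqs))

print(n, mean_from_counts, round(mean_from_shares, 10))
```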
Key Idea
OUTLIERS
○ Mean: \( \bar{x} = (\sum x_i)/n \), with \( \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \)
📖 formulas
1. Mean (Arithmetic Average) – Raw Data
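\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]
2. Sum of deviations
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
3. Minimization
\[ \min_a \sum_{i=1}^{n} (x_i - a)^2 \ \text{occurs when } a = \bar{x} \]
4. Mean from relative frequencies
\[ \bar{x} = \sum_{i=1}^{k} x_i \cdot p_i \]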
5. Histogram Density
\[ \text{Density} = \frac{f}{L} \]
6. Example – Sports Practice Data
Frequencies:
\( \bar{x} = 0 + 0.20 + 0.60 + 0.30 + 0.40 = 1.5 \)
● If n odd:
\[ \text{Median} = x_{(n+1)/2} \]
● If n even:
\[ \text{Median} = \frac{x_{n/2} + x_{n/2+1}}{2} \]
● Lower quartile -> First value where the cumulative frequency reaches or
exceeds 0.25
● Upper quartile -> First value where the cumulative frequency reaches or
exceeds 0.75
PERCENTILES
p-th percentile: value such that p% of observations are below it
Value   Relative frequency   Cumulative relative frequency
0       0.30                 0.30
1       0.20                 0.50
2       0.30                 0.80
3       0.10                 0.90
4       0.10                 1
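The cumulative-frequency rules for quartiles and the median can be sketched in Python on this table. The absolute frequencies (60, 40, 60, 20, 20) are taken from the frequency table earlier in the notes. The function names are illustrative, and exact fractions are used so that the "equals 0.5 exactly" case is detected reliably.

```python
from fractions import Fraction

values = [0, 1, 2, 3, 4]
counts = [60, 40, 60, 20, 20]   # absolute frequencies, n = 200


def cumulative_relative(counts):
    """Cumulative relative frequencies as exact fractions."""
    n = sum(counts)
    running, out = 0, []
    for c in counts:
        running += c
        out.append(Fraction(running, n))
    return out


def first_reaching(values, counts, level):
    """First value whose cumulative relative frequency reaches or exceeds `level`."""
    for v, cum in zip(values, cumulative_relative(counts)):
        if cum >= level:
            return v
    raise ValueError("level must be between 0 and 1")


def median_from_table(values, counts):
    """Median rule: value where the cumulative frequency surpasses 0.5,
    or the midpoint with the next value if it equals 0.5 exactly."""
    half = Fraction(1, 2)
    for i, cum in enumerate(cumulative_relative(counts)):
        if cum == half:
            return (values[i] + values[i + 1]) / 2
        if cum > half:
            return values[i]


q1 = first_reaching(values, counts, Fraction(1, 4))
q3 = first_reaching(values, counts, Fraction(3, 4))
print(q1, median_from_table(values, counts), q3)
```

Here the cumulative frequency equals 0.5 exactly at the value 1, so the midpoint rule applies.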
FIVE NUMBERS SUMMARY
1. Minimum
2. Lower quartile
3. Median
4. Upper quartile
5. Maximum
\( IR = Q_3 - Q_1 \)
\( \text{outlier} < Q_1 - 1.5\,IR \)
\( \text{outlier} > Q_3 + 1.5\,IR \)
OUTLIER DETECTION FROM BOXPLOT
● Outlier -> any value falling below the lower quartile or above the upper
quartile by more than 1.5 IR
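The 1.5 IR rule can be sketched as follows. The quartiles are passed in as inputs, because different software computes them slightly differently. The data and the quartile values below are made up for illustration, and the function names are hypothetical.

```python
def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5*IR and Q3 + 1.5*IR."""
    ir = q3 - q1
    return q1 - 1.5 * ir, q3 + 1.5 * ir


def find_outliers(data, q1, q3):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Hypothetical sample with assumed quartiles Q1 = 4 and Q3 = 8
sample = [2, 4, 5, 6, 7, 8, 30]
print(fences(4, 8))
print(find_outliers(sample, 4, 8))
```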
WHISKERS – NO OUTLIERS
No outliers:
Range
● max – min
● Affected by outliers
● Q3 – Q1 = IQR
● variability of central 50%
Variance (s²)
\[ s^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2}{n-1} \]
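As a quick check of the variance formula, here is a Python sketch that applies it by hand and compares the result with the standard library. The dataset is the small one used in the median example. The variable names are illustrative.

```python
import statistics

data = [1, 1, 2, 3, 3, 5]

n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean, divided by n - 1
s2 = sum((x - mean) ** 2 for x in data) / (n - 1)
s = s2 ** 0.5   # sample standard deviation

print(mean, s2, s)
print(statistics.variance(data))   # library value, also uses n - 1
```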
SHAPE OF DISTRIBUTIONS
SYMMETRIC DISTRIBUTION
● Bell-shaped distribution
○ equal spacing of quartiles
● U-shaped distribution
○ median–Q1 > Q1–min
CONCENTRATION ANALYSIS
How equally or unequally is a variable (like income, wealth, sales, etc.) distributed among
individuals or units? Can one unit hold a huge share of the total, while others have little?
Used for numerical positive variables with transferable characteristics (e.g., income)
● Is the variable equally distributed among units?
● Do a few units hold most of the amount?
EXAMPLE:
● Total = 160
● Mean income = 40
Extreme Scenarios:
Perfect equality:
Maximum concentration:
CONCENTRATION CURVE
NOTATION
Compute coordinates:
\( F_0 = Q_0 = 0, \quad F_n = Q_n = 1 \)
Example
Interpretation:
Properties:
● Non-decreasing → monotone
● Convex → \( F_i \ge Q_i \)
CONCENTRATION INDEXES
\[ R = \frac{(F_1 - Q_1) + (F_2 - Q_2) + \dots + (F_{n-1} - Q_{n-1})}{F_1 + F_2 + \dots + F_{n-1}} \]
● Perfect equality → \( F_i = Q_i \) → \( R = 0 \)
● Maximum concentration → all \( Q_i = 0 \) for \( i < n \), \( Q_n = 1 \) → \( R = 1 \)
● Range: [0,1]
● higher = more concentrated
● Perfect equality → 𝑃 = 0
● Maximum concentration → 𝑃 = 1
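The concentration ratio R can be sketched in Python from the coordinates \( F_i = i/n \) and \( Q_i \) (cumulative share of the total). The incomes below are hypothetical: they only reuse the total of 160 over 4 units from the example. The function name `gini_ratio` is illustrative.

```python
from fractions import Fraction


def gini_ratio(amounts):
    """R = sum(F_i - Q_i) / sum(F_i) for i = 1..n-1, on amounts sorted ascending.
    Assumes at least two units, integer amounts and a positive total."""
    xs = sorted(amounts)
    n, total = len(xs), sum(xs)
    running = 0
    numerator = denominator = Fraction(0)
    for i, x in enumerate(xs[:-1], start=1):
        running += x
        f_i = Fraction(i, n)            # cumulative share of units
        q_i = Fraction(running, total)  # cumulative share of the amount
        numerator += f_i - q_i
        denominator += f_i
    return numerator / denominator


print(gini_ratio([40, 40, 40, 40]))   # perfect equality
print(gini_ratio([0, 0, 0, 160]))     # maximum concentration
print(gini_ratio([10, 20, 50, 80]))   # hypothetical intermediate case
```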
INTERPRETATION:
Example:
🌸 cheat sheet 4
CHEAT SHEET (QUICK REFERENCE)
● Shape:
○ Symmetric → mean = median
○ Right-skewed → mean > median
○ Left-skewed → mean < median
● Concentration:
○ Perfect equality: all equal
○ Max concentration: one unit holds all
● Concentration curve:
○ \( F_i = i/n \), \( Q_i = \) (cumulative x) / (total x)
○ Perfect equality → diagonal line
○ Max concentration → flat at 0 until last unit
● Indexes:
○ Gini (R): [0,1]; higher = more concentrated
○ Pietra (P): [0,1]; higher = more concentrated
🍥 Lecture 5
Sep 12, 2025
Lecture 5
AIM OF BIVARIATE DESCRIPTIONS
Bivariate statistics = techniques to study the joint behavior of two variables at sample
level
ASYMMETRIC PERSPECTIVE
Investigating bivariate association → assessing dependency
Roles of variables
TECHNIQUES
The technique depends on the types of the two variables
CASES
Cross-tab of example:
Marginal Distributions
Relative Frequencies
Example
Association:
Means Comparison
Bivariate Association
Scatterplot
Linear Association
Covariance
Formula:
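\[ Cov(X, Y) = \frac{(x_1-\bar{x})(y_1-\bar{y}) + (x_2-\bar{x})(y_2-\bar{y}) + \dots + (x_n-\bar{x})(y_n-\bar{y})}{n-1} \]
● \( x_1 \) → value of X observed on unit 1
● \( y_1 \) → value of Y observed on unit 1
● \( \bar{x} \) → mean of X
● \( \bar{y} \) → mean of Y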
Interpretation:
Formula:
\[ r = \frac{Cov(X, Y)}{s_X \cdot s_Y} \]
● \( s_X \) → standard deviation of X
● \( s_Y \) → standard deviation of Y
Range: \( -1 \le r \le 1 \)
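The covariance and correlation formulas can be sketched in Python as follows. The five (x, y) pairs are made up for illustration and `sample_cov` is a hypothetical helper name. `statistics.stdev` uses the same n − 1 denominator.

```python
import statistics

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sample_cov(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


cov = sample_cov(x, y)
r = cov / (statistics.stdev(x) * statistics.stdev(y))

print(cov, round(r, 4))
```

A positive r indicates a positive linear association in this made-up sample. The value always stays between -1 and 1.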
Measures:
Interpretation:
Random Variables
Random Experiments
Random Variable
EXAMPLE:
Types:
Conditions:
● \( 0 \le p_i \le 1 \)
● \( p_1 + p_2 + \dots + p_k = 1 \)
● Values: {0,1,2}
● Probabilities: 0.25, 0.50, 0.25
Example:
\( E(X) = 0 \cdot 0.25 + 1 \cdot 0.5 + 2 \cdot 0.25 = 1 \)
Variance and Standard Deviation of a Random Variable
Variance
\[ Var(X) = \sigma^2 = (x_1-\mu)^2 p_1 + (x_2-\mu)^2 p_2 + \dots + (x_k-\mu)^2 p_k \]
Standard deviation
\[ \sigma = \sqrt{Var(X)} = \sqrt{\sigma^2} \]
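For the discrete example with values {0, 1, 2} and probabilities 0.25, 0.50, 0.25, the expected value, variance and standard deviation can be computed directly. This is a minimal Python sketch and the variable names are illustrative.

```python
from math import sqrt

values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

# Conditions on a probability distribution
assert all(0 <= p <= 1 for p in probs)
assert abs(sum(probs) - 1) < 1e-12

mu = sum(x * p for x, p in zip(values, probs))               # E(X)
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # Var(X)
sd = sqrt(var)                                               # SD(X)

print(mu, var, round(sd, 4))
```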
Quantile of order k
𝑃𝑟( 𝑋 ≤ 𝑞 ) = 𝑘
🌸 cheat sheet 7
📑 Cheat Sheet (Quick Reference)
● Probability = quantification of uncertainty; used in decisions & inference.
● Random variable (X) = numerical outcome of random experiment.
● Discrete random variable:
○ pmf: \( Pr(X = x_i) = p_i \), \( \sum p_i = 1 \)
○ Mean: \( E(X) = \sum x_i p_i \)
○ Variance: \( Var(X) = \sum (x_i - \mu)^2 p_i \)
○ SD: \( \sigma = \sqrt{Var(X)} \)
● Continuous random variable:
○ pdf: non-negative, total area = 1
○ Probabilities = areas under pdf
○ Quantiles: Pr(X≤q)=k
● Median = value with 50% probability below it.
🍥 Lecture 8
Lecture 8
Linear Transformation
for a random variable X, constants a, b, define:
𝑌 = 𝑎 + 𝑏𝑋 𝑌 → linear transformation of 𝑋
Example
\[ Var(P) = Var(10000 + 1.5S) = Var(1.5S) = 1.5^2 \, Var(S) \]
\[ Var(P) = 1.5^2 \cdot SD^2(S) = 1.5^2 \cdot 8000^2 = 12000^2 \]
Standardization
For a random variable X with \( E(X) = \mu \) and \( Var(X) = \sigma^2 \)
Standardized variable
\[ Z = \frac{X - \mu}{\sigma} \]
Expected value & Variance of Z
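\[ Z = \frac{X - \mu}{\sigma} = \frac{1}{\sigma} X - \frac{\mu}{\sigma} \]
● Expected Value:
\[ E(Z) = \frac{1}{\sigma} E(X) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0 \]
● Variance:
\[ Var(Z) = \frac{1}{\sigma^2} Var(X) = \frac{\sigma^2}{\sigma^2} = 1 \]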
Bernoulli Distribution
Experiment with two outcomes → success / failure
● 𝑝 → success probability
● 𝑋 → nr of times we observe success
● Value 1 → probability 𝑝
● Value 0 → probability 1 − 𝑝
Notation: 𝑋 ∼ 𝐵𝑒𝑟(𝑝)
Examples:
● Number of heads in one toss of a coin → Bernoulli random variable with success probability p = 0.5
● Number of sixes observed in one roll of a die → Bernoulli random variable with success probability p = 1/6
Probability distribution
𝑋 ∼ 𝐵𝑒𝑟(𝑝)
Distribution parametrized by 𝑝
\[ Var(X) = (0 - p)^2 (1 - p) + (1 - p)^2 p = p(1 - p) \]
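Normal distribution density:
\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
● \( \mu \) real number
● \( \sigma^2 \) positive real number
\[ X \sim N(\mu, \sigma^2) \;\rightarrow\; Z = \frac{X - \mu}{\sigma} \]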
Commands for normal distribution
how to compute probabilities using R functions
Cumulative probability
For \( X \sim N(\mu, \sigma^2) \), pnorm(x, μ, σ) returns \( Pr(X \le x) \)
Example:
\( X \sim N(3, 5^2) \): X is normally distributed with \( E(X) = 3 \), \( SD(X) = 5 \) (\( \sigma = 5 \))
Interval probability:
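The notes use R's pnorm. As a cross-check, the same cumulative and interval probabilities can be computed in Python with the standard library's `statistics.NormalDist`. The interval bounds below (−2 and 8, one standard deviation either side of the mean) are chosen purely for illustration.

```python
from statistics import NormalDist

X = NormalDist(mu=3, sigma=5)     # X ~ N(3, 5^2)

# Cumulative probability Pr(X <= x), the analogue of pnorm(x, 3, 5)
p_below_mean = X.cdf(3)

# Interval probability Pr(a < X <= b) = Pr(X <= b) - Pr(X <= a)
a, b = -2, 8                       # hypothetical bounds
p_interval = X.cdf(b) - X.cdf(a)

print(p_below_mean, round(p_interval, 4))
```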
Quantile
For \( X \sim N(\mu, \sigma^2) \), qnorm(k, mean = μ, sd = σ) returns the value q such that \( Pr(X \le q) = k \)
Example:
We want to find the value \( q_{0.8} \) (a quantile or cut-off point) such that \( P(X \le q_{0.8}) = 0.8 \).
That is the 80th percentile: the value below which 80% of all the possible X values lie.
● The function qnorm(p, mean, sd) gives the quantile \( q_p \) for a normal distribution with mean = μ and standard deviation = σ.
● In mathematical terms, it finds the x-value that satisfies:
\[ P(X \le q_p) = p \]
So, when we call qnorm(0.8, mean = 3, sd = 5), R is finding the x-value \( q_{0.8} \) such that:
\[ P(X \le q_{0.8}) = 0.8 \]
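Finding \( q_{0.8} \) for \( X \sim N(3, 5^2) \) is equivalent to standardizing X and working with the standard normal Z.
From standard normal tables (or qnorm(0.8) for the standard case):
\( z_{0.8} = 0.8416 \)
That means:
\( P(Z \le 0.8416) = 0.8 \)
Transforming back: \( q_{0.8} = \mu + \sigma z_{0.8} = 3 + 5 \cdot 0.8416 = 7.208 \)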
Using R:
qnorm(0.8, mean = 3, sd = 5)
and R returns:
[1] 7.208106
● If X represents, for example, the number of hours watched, a value of 7.21 (approx)
is such that:
○ 80% of people (or observations) watch ≤ 7.21 hours.
○ 20% watch more than 7.21 hours.
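The quantile in this example can be reproduced outside R. Here is a Python sketch using `statistics.NormalDist.inv_cdf`, which plays the role of qnorm. It also checks the standardization route \( q = \mu + \sigma z \).

```python
from statistics import NormalDist

mu, sigma = 3, 5

# Direct quantile, the analogue of qnorm(0.8, mean = 3, sd = 5)
q_80 = NormalDist(mu, sigma).inv_cdf(0.8)

# Same value via the standard normal quantile z_0.8
z_80 = NormalDist().inv_cdf(0.8)
q_80_via_z = mu + sigma * z_80

print(round(z_80, 4), round(q_80, 4), round(q_80_via_z, 4))
```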
Standardize X: \( Z = \frac{X - 3}{5} \)
Final Answer: \( q_{0.8} \approx 7.21 \)
Random Vector
Goal: study joint probabilistic behavior of more variables
Example
● 𝑋 →
○ 1 → respondent is a student
○ 0 → otherwise
● 𝑌 →
○ 1 → unsatisfied
○ 2 → indifferent
○ 3 → satisfied
X Y
1 2 3
Marginals
𝑃𝑟(𝑋 = 0, 𝑌 = 1) = 0. 2 𝑃𝑟(𝑋 = 1, 𝑌 = 3) = 0. 1
Independence condition
Definition:
X and Y are independent if knowing one gives no information about the other.
We can check:
\( P(X = 0, Y = 1) = 0.1 \)
\( P_X(0) \cdot P_Y(1) = 0.4 \times 0.2 = 0.08 \)
Since \( 0.1 \ne 0.08 \), X and Y are not independent.
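The independence check can be sketched in Python on a full joint table. The table below is hypothetical: it is built only to be consistent with the three numbers quoted in the check (joint probability 0.1, marginals 0.4 and 0.2), and the remaining cells are invented. Exact fractions avoid rounding issues when comparing products.

```python
from fractions import Fraction as F

# Hypothetical joint pmf: keys are (x, y), values are Pr(X = x, Y = y)
joint = {
    (0, 1): F(1, 10), (0, 2): F(1, 10), (0, 3): F(2, 10),
    (1, 1): F(1, 10), (1, 2): F(4, 10), (1, 3): F(1, 10),
}


def marginal(joint, axis):
    """Marginal pmf of X (axis=0) or Y (axis=1), obtained by summing the joint pmf."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0) + p
    return out


def is_independent(joint, p_x, p_y):
    """Independence holds only if every cell equals the product of its marginals."""
    return all(p == p_x[x] * p_y[y] for (x, y), p in joint.items())


p_x, p_y = marginal(joint, 0), marginal(joint, 1)
print(joint[(0, 1)], p_x[0] * p_y[1], is_independent(joint, p_x, p_y))
```

A single cell where the joint probability differs from the product of the marginals is enough to rule out independence.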
Covariance
Formula:
Interpretation:
\( -1 \le \rho(X, Y) \le 1 \)
Where:
● \( \sigma_X = \sqrt{Var(X)} \)
● \( \sigma_Y = \sqrt{Var(Y)} \)
𝑋 and 𝑌 independent → ρ( 𝑋 , 𝑌 ) = 0
● ρ( 𝑋 , 𝑌 ) = 0 → NO linear association
● ρ( 𝑋 , 𝑌 ) > 0 → positive linear association
● ρ( 𝑋 , 𝑌 ) < 0 → negative linear association
● \( X \) →
○ \( E(X) = \mu_X \)
○ \( Var(X) = \sigma_X^2 \)
● \( Y \) →
○ \( E(Y) = \mu_Y \)
○ \( Var(Y) = \sigma_Y^2 \)
Sum of two Random Variables
● X and Y correlated:
\[ Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) = \sigma_X^2 + \sigma_Y^2 + 2Cov(X, Y) \]
● X and Y uncorrelated:
\[ Var(X + Y) = Var(X) + Var(Y) = \sigma_X^2 + \sigma_Y^2 \]
● X and Y correlated:
\[ Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y) = \sigma_X^2 + \sigma_Y^2 - 2Cov(X, Y) \]
● X and Y uncorrelated:
\[ Var(X - Y) = Var(X) + Var(Y) = \sigma_X^2 + \sigma_Y^2 \]
● Independent
● Identically distributed:
○ Same \( E(X) = \mu \)
○ Same \( Var(X) = \sigma^2 \)
Sum of I.I.D Random Variables
\[ \bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} \]
● Expected value:
\[ E(\bar{X}) = E\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{1}{n} E(X_1 + X_2 + \dots + X_n) = \frac{1}{n} n\mu = \mu \]
● Variance:
\[ Var(\bar{X}) = Var\left(\frac{X_1 + X_2 + \dots + X_n}{n}\right) = \frac{1}{n^2} Var(X_1 + X_2 + \dots + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \]
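The results \( E(\bar{X}) = \mu \) and \( Var(\bar{X}) = \sigma^2/n \) can be verified exactly for a small case by enumerating every possible sample. This is a Python sketch, assuming i.i.d. draws from the discrete distribution with values {0, 1, 2} and probabilities 1/4, 1/2, 1/4 used earlier in the notes (μ = 1, σ² = 1/2). The function name is illustrative.

```python
from fractions import Fraction
from itertools import product

values = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]


def sample_mean_moments(n):
    """Exact E and Var of the sample mean of n i.i.d. draws, by full enumeration."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for combo in product(range(len(values)), repeat=n):
        p = Fraction(1)
        total = 0
        for idx in combo:
            p *= probs[idx]          # independence: probabilities multiply
            total += values[idx]
        xbar = Fraction(total, n)
        mean += xbar * p
        second_moment += xbar ** 2 * p
    return mean, second_moment - mean ** 2


for n in (1, 2, 3):
    print(n, sample_mean_moments(n))
```

The variance halves from n = 1 to n = 2, matching σ²/n.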
🌸 cheat sheet 8
📑 Cheat Sheet — Lecture 8
● Linear transform: \( Y = a + bX \Rightarrow E(Y) = a + bE(X),\ Var(Y) = b^2 Var(X) \).
● Standardization: \( Z = \frac{X - \mu}{\sigma} \Rightarrow E(Z) = 0,\ Var(Z) = 1 \).
● Bernoulli \( X \sim Ber(p) \): \( Pr(X=1) = p,\ Pr(X=0) = 1-p;\ E(X) = p,\ Var(X) = p(1-p) \).
● Normal \( N(\mu, \sigma^2) \): density \( f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)} \); standardization \( Z = \frac{X-\mu}{\sigma} \sim N(0,1) \);
probs with pnorm, quantiles with qnorm.
● Joint discrete (X,Y): joint pmf \( p(x_i, y_j) \); independence iff \( p(x_i, y_j) = Pr_X(x_i) Pr_Y(y_j) \); correlation \( \rho = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} \).
● Sum/Difference: \( Var(X \pm Y) = \sigma_X^2 + \sigma_Y^2 \pm 2Cov(X,Y) \), \( E(X \pm Y) = \mu_X \pm \mu_Y \).
If not correlated: \( Var(X \pm Y) = \sigma_X^2 + \sigma_Y^2 \).
● i.i.d.: \( \sum X_i \) has \( E = n\mu \), \( Var = n\sigma^2 \); \( \bar{X} \) has \( E = \mu \), \( Var = \sigma^2/n \).
🍥 Lecture 8 pt 2
Lecture 8.2
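=> \( E(X_i) = \mu \)
=> \( Var(X_i) = \sigma^2 \)
\[ \bar{X} = \frac{X_1 + \dots + X_n}{n} = \frac{T}{n} \]
\[ E(\bar{X}) = E\left(\frac{T}{n}\right) = \frac{1}{n} E(T) = \frac{1}{n} n\mu = \mu \]
\[ Var(\bar{X}) = Var\left(\frac{T}{n}\right) = \frac{1}{n^2} Var(T) = \frac{1}{n^2} n\sigma^2 = \frac{\sigma^2}{n} \]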
1. 𝑦 = 𝑎 + 6𝑥 = 3𝑥
When we take many i.i.d. random variables with any distribution (not necessarily normal),
the distribution of their sum or average becomes approximately normal as the sample
size n grows large.
In simple terms
Even if the original data are not normally distributed, the sample mean (average of many
independent observations) will follow a bell-shaped (normal) distribution if the sample
size 𝑛 is big enough.
Mathematical Formulation
𝑆𝑛 = 𝑋1 + 𝑋2 + ... + 𝑋𝑛
That means:
● The mean (center) of \( S_n \) is \( n\mu \)
● The variance (spread) of \( S_n \) is \( n\sigma^2 \)
Sample mean:
That means:
We can rewrite the theorem in standardized form, to use the standard normal
distribution 𝑁(0, 1):
That means: if we convert the sample mean into a z-score, it will follow a normal
distribution when 𝑛 is large.
It lets us use normal probability methods even when data are not normal, as long as 𝑛
is large.
How the shape changes with n
Sum of i.i.d. variables: \( S_n \approx N(n\mu, n\sigma^2) \) → total becomes approximately normal
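\( X_i \) → 0 Failure (probability \( 1 - p \))
→ 1 Success (probability \( p \))
\( P(X_i = 0) = 1 - p \), \( P(X_i = 1) = p \)
\( E(X_i) = p \), \( Var(X_i) = p(1 - p) \)
\( T = X_1 + \dots + X_n \) = total number of successes in our n trials
\( E(T) = nE(X_i) = np \)
\( E(P) = E(\bar{X}) = p \)
\[ Var(P) = Var(\bar{X}) = \frac{Var(X_i)}{n} = \frac{p(1-p)}{n} \]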
CENTRAL LIMIT TH.
For a large n:
\[ P = \bar{X} \approx N\left(p, \frac{p(1-p)}{n}\right) \]
Example 1 — Sum of Bernoulli
\( T = X_1 + \dots + X_{150} \)
\( \Rightarrow T \approx N(60, 36) \) → \( SD = \sqrt{36} = 6 \)
Using CLT, we can now find probabilities using the normal distribution:
Even though the original distribution (Bernoulli) is not normal, 𝑇 behaves approximately
normal for 𝑛 = 150
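A Python sketch of this normal approximation follows. The success probability p = 0.4 is an assumption inferred from the stated mean (60 = 150 · p), and the threshold 66 is a hypothetical question chosen for illustration, not the one asked in the lecture.

```python
from math import sqrt
from statistics import NormalDist

n, p = 150, 0.4                    # p inferred from E(T) = 60

mean_T = n * p                     # E(T) = np
var_T = n * p * (1 - p)            # Var(T) = np(1 - p)
T_approx = NormalDist(mean_T, sqrt(var_T))   # CLT: T is approximately N(60, 36)

# Hypothetical question: Pr(T <= 66), i.e. one standard deviation above the mean
prob = T_approx.cdf(66)

print(round(mean_T, 6), round(var_T, 6), round(prob, 4))
```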
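\( p = 0.4 \), \( n = 200 \), \( P = \frac{T}{n} \)
\( E(P) = p = 0.4 \), \( Var(P) = \frac{p(1-p)}{n} = 0.0012 \)
So:
\[ Pr(P < 0.35) \approx Pr\left(Z < \frac{0.35 - 0.4}{\sqrt{0.0012}}\right) = Pr(Z < -1.443) \approx 0.0745 \]
This means about 7.45% of samples will have a proportion below 0.35.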
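→ 1 self-pomp (probability 0.7)
\( i = 1, 2, \dots, 80 \)
a) \( X_i \sim Ber(0.7) \)
\( E(X_i) = 0.7 \), \( Var(X_i) = 0.7 \cdot 0.3 \)
\( T = X_1 + \dots + X_{80} \)
Required probability: \( Pr(P \ge 0.6) \)
\( E(P) = 0.7 \), \( Var(P) = \frac{0.7 \cdot 0.3}{80} = 0.0026 \)
\( P \approx N(0.7, 0.0026) \)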
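Both sample-proportion examples (p = 0.4 with n = 200, and p = 0.7 with n = 80) follow the same recipe, sketched here in Python. The helper name `proportion_dist` is hypothetical. It simply builds the approximating normal \( N(p, p(1-p)/n) \).

```python
from math import sqrt
from statistics import NormalDist


def proportion_dist(p, n):
    """CLT approximation for the sample proportion: N(p, p(1-p)/n)."""
    return NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n))


# Pr(P < 0.35) when p = 0.4, n = 200
prob_below = proportion_dist(0.4, 200).cdf(0.35)

# Pr(P >= 0.6) when p = 0.7, n = 80, using the complement rule
prob_at_least = 1 - proportion_dist(0.7, 80).cdf(0.6)

print(round(prob_below, 4), round(prob_at_least, 4))
```

The first value should agree with the 7.45% stated in the notes.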
🍥 Lecture 9
Oct 2, 2025
Lecture 9
Population & Sample
Population:
● set of all statistical units on which a characteristic may be measured.
Sample:
● any subset of the population.
Description vs Prediction:
● analysis should not stop at describing the sample
● aim is to make predictions about the population.
Inferential statistics:
● techniques to draw conclusions on the entire population from a sample.
Good sample:
● representative and obtained via random sampling.
Probabilistic framework
● express inferential problems in probability terms to quantify and control the
probability of errors.
Example
Inferential Problems
Focus → inference on the population mean (μ)
Main tasks
● Point estimation
● Confidence interval estimation
● Hypothesis testing
Statistics
Sample statistic → a function of the random sample that summarizes information
→ a random variable
I. Expected value → \( \mu \)
\( E(\bar{X}) = \mu \)
II. Variance → \( \sigma^2 / n \)
\( Var(\bar{X}) = \frac{\sigma^2}{n} \)
○ It is the standard deviation of all possible sample means over all possible
samples
III. From CLT → for large n, the distribution is approximately normal (Gaussian).
Point Estimation Of The Population Variance And Standard
Deviation
Sample variance
● Sample variance \( S^2 \) → estimator of the population variance \( \sigma^2 \)
● Observed value \( s^2 \) → estimate of \( \sigma^2 \)
Unbiased estimator:
● \( E(S^2) = \sigma^2 \)
● Sample variance \( S^2 \) → unbiased estimator of the population variance \( \sigma^2 \)
Point estimation means using a single number calculated from a sample to estimate an
unknown population parameter.
The formula is called the point estimator, and the resulting number is the point
estimate.
Standard Error
The standard deviation of the sampling distribution of \( \bar{X} \) is called the Standard Error (SE) of the mean.
It measures how much sample means tend to vary from one sample to another.
Formula: \( SE(\bar{X}) = \sigma / \sqrt{n} \)
Interpretation:
● It represents the standard deviation of all possible sample means over all
possible random samples of the same size.
● It is the expected variability of \( \bar{X} \) around the true mean μ.
This slide explains what to do when σ (the population standard deviation) is unknown,
which is almost always the case in real life.
Given:
● 𝑠 = 2. 5783
● 𝑛 = 709
Then:
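With σ unknown, the standard error is estimated by replacing σ with the sample standard deviation s. A minimal Python sketch with the values given above follows. The variable name `se_hat` is illustrative.

```python
from math import sqrt

s = 2.5783     # observed sample standard deviation
n = 709        # sample size

# Estimated standard error of the mean: s / sqrt(n)
se_hat = s / sqrt(n)

print(round(se_hat, 4))
```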
Interpretation
( 1 − α ) · 100%
where:
Typical choices:
Interpretation
Imagine we repeat the sampling process many times (same population, same sample size).
Each time, we compute a confidence interval for the mean (μ).
Then:
● About 95% of these intervals (if the confidence level is 95%) will contain the true µ
● About 5% will not.
Normal Population Known Variance
\( X_1, \dots, X_n \) i.i.d. → Normal, \( \mu \) unknown, \( \sigma^2 \) known
⇒ \( \bar{X} \) normal with mean → \( \mu \), known variance → \( \sigma^2 / n \)
The confidence interval of level (1 − α)100% for the unknown population mean µ is:
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
Frequentist interpretation
Draw repeatedly samples of size 𝑛 from the population → compute for each the
confidence interval of level (1 − α)100% → about (1 − α)100% of the CIs will contain
the unknown parameter µ
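● \( z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \) → margin of error
● \( 2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \) → length of the confidence interval
→ measure of accuracy
For a given confidence level and given \( \sigma^2 \), the greater n, the shorter the confidence interval.
The greater the confidence level \( (1 - \alpha) \) → the lower \( \alpha \) → the greater \( z_{\alpha/2} \) → the longer the confidence interval.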
Example
\( \sigma = 30 \), \( \bar{x} = 150 \)
The confidence interval for the mean score is:
→ we are 95% confident that the population mean score \( \mu \) is between 144.12 and 155.88
→ the remaining 5% covers the cases where the mean score is smaller than 144.12 or higher than 155.88
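The interval in this example can be reproduced with a short Python sketch. The sample size is not stated in the notes: n = 100 is an assumption inferred from the reported bounds (a margin of 5.88 with σ = 30 at the 95% level implies √n ≈ 10). `NormalDist().inv_cdf` plays the role of qnorm.

```python
from math import sqrt
from statistics import NormalDist

sigma = 30        # known population standard deviation
x_bar = 150       # observed sample mean
n = 100           # ASSUMPTION: inferred from the stated interval, not given in the notes
alpha = 0.05      # 95% confidence level

z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{alpha/2}, analogue of qnorm(1 - alpha/2)
margin = z * sigma / sqrt(n)               # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(round(z, 3), round(lower, 2), round(upper, 2))
```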
\( \bar{X} \to \mu \), \( S \to \sigma \), \( \frac{S}{\sqrt{n}} \to \frac{\sigma}{\sqrt{n}} \)
When \( \sigma^2 \) is unknown, estimate it with \( S^2 \), and use Student’s t distribution:
● t-quantile in R:
\( t_{n-1,\,\alpha/2} = \texttt{qt}(1-\alpha/2,\ n-1) \).
● Known \( \sigma \):
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \qquad z_{\alpha/2} = \texttt{qnorm}(1-\alpha/2). \]
● Unknown \( \sigma \):
\[ \bar{x} \pm t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}, \qquad t_{n-1,\,\alpha/2} = \texttt{qt}(1-\alpha/2,\ n-1). \]
● Margin of error: \( z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \) (or \( t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}} \)).
● Length: \( 2\times \) Margin. Higher confidence ⇒ larger quantile ⇒ longer interval.
(pp. 40, 45–47, 53–55)
Worked examples
R helper
These are random variables because they depend on the sample (which changes every
time you sample again).
● Example:
○ The formula \( \bar{X} \) is the point estimator of \( \mu \).
○ The calculated number \( \bar{x} = 2.97 \) is the point estimate.
In symbols:
\[ \text{Point estimate of } \mu = \bar{x}. \]
So:
🔹 6. Summary Table
Concept | Symbol / Formula | Description
Point Estimate | Numerical result from applying the estimator (e.g., \( \bar{x} = 2.97 \)) | Specific number used as the best guess of the parameter
✅ In short:
Point estimation means using a single number calculated from a sample to
estimate an unknown population parameter.
The formula is called the point estimator, and the resulting number is the
point estimate.
Example:
The average number of TV hours in a sample is 2.97 →
this is the point estimate for the population mean \( \mu \) (the true average TV hours in the entire population).
📘 Slide 23 – Title: “Estimators of the population variance
and standard deviation”
This slide explains how we estimate the population variance (σ²) and population
standard deviation (σ) when we only have sample data.
🔹 1. The Goal
In practice, we rarely know the true population parameters (σ² and σ).
Therefore, we must use sample statistics as estimators of those parameters.
🔹 2. The Estimators
Sample variance (S²)
Mathematical operation:
\[ S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1} \]
Sample standard deviation (S)
Mathematical operation:
\[ S = \sqrt{S^2} \]
\[ E(S^2) = \sigma^2 \]
\[ s^2 = \text{observed sample variance}, \quad s = \text{observed sample standard deviation}. \]
● \( n = 709 \)
● \( s^2 = 6.6474 \)
● \( s = 2.5783 \)
Interpretation:
● The sample variance (s²) = 6.6474 is our best estimate of the population variance
(σ²).
● The sample standard deviation (s) = 2.5783 is our best estimate of the population
standard deviation (σ).
Thus:
\[ \hat{\sigma}^2 = 6.6474, \quad \hat{\sigma} = 2.5783. \]
🔹 6. Summary of Slide 23
Quantity Formula Purpose Notes
📘 1. The context
In R, you started with a contingency table:
          salaried   self-employed
private   76         56
public    64         24
prop.table(tab1, 2)
prop.table(tab1, 1)
🔹 3. When margin = 2
prop.table(tab1, 2)
Interpretation:
● Among salaried workers → 54.3% use private, 45.7% use public transport.
● Among self-employed workers → 70% use private, 30% use public transport.
🔹 4. When margin = 1
prop.table(tab1, 1)
“Among people who use private transport (and among those who use public
transport), what proportion are salaried vs self-employed?”
Interpretation:
● Among those who use private transport, 57.6% are salaried and 42.4% are self-employed.
● Among those who use public transport, 72.7% are salaried and 27.3% are self-employed.
prop.table(tab1)
It divides by the total sum of all cells, giving overall proportions (out of all 220 respondents).
The total of all cells = 1.
✅ Summary Table
Command | Divides by | Shows proportions | Use when you want to know…
prop.table(tab1, 2) | Column totals | Within each worker type, how many use private vs public transport | “Given working status, what is the transport used?”
✅ In short
● 2 → column proportions → “within each worker group”
● 1 → row proportions → “within each transport group”
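To make the margin convention concrete, here is a Python sketch that mimics what R's prop.table does on the 2×2 table from this section (rows: transport, columns: working status). The helper `prop_table` is hypothetical and written only for illustration. It follows R's convention that margin 1 divides by row totals, margin 2 by column totals, and no margin by the grand total.

```python
table = {
    "private": {"salaried": 76, "self-employed": 56},
    "public":  {"salaried": 64, "self-employed": 24},
}


def prop_table(table, margin=None):
    """Proportions by row totals (margin=1), column totals (margin=2),
    or the grand total (margin=None)."""
    row_tot = {r: sum(cells.values()) for r, cells in table.items()}
    col_tot = {}
    for cells in table.values():
        for c, v in cells.items():
            col_tot[c] = col_tot.get(c, 0) + v
    grand = sum(row_tot.values())

    def denom(r, c):
        if margin == 1:
            return row_tot[r]
        if margin == 2:
            return col_tot[c]
        return grand

    return {r: {c: v / denom(r, c) for c, v in cells.items()}
            for r, cells in table.items()}


by_col = prop_table(table, 2)    # within each worker group
by_row = prop_table(table, 1)    # within each transport group
overall = prop_table(table)      # share of all respondents

for name, result in (("column", by_col), ("row", by_row), ("overall", overall)):
    print(name, {r: {c: round(v, 3) for c, v in cells.items()} for r, cells in result.items()})
```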
🍥 Lecture 13
NORMAL POPULATION KNOWN VARIANCE - intuition on the test’s derivation
Ex: Is there enough empirical evidence to presume that the mean weight of all produced
cereals packages is greater than 16?
Xi = weight of a cereal package drawn at random
Test
Against
P - VALUE
🍥 Lecture 14