
Statistics Lecture 1 Overview

Lecture 1 introduces key statistical concepts, including types of data (micro, macro, cross-sectional, longitudinal), the difference between population and sample, and the definitions of parameters and statistics. It emphasizes the importance of understanding variable types (categorical vs. numerical) and lays out the course goals and modules. Lecture 2 expands on statistical variables, frequency distribution, graphs, and measures of central tendency, including mode, median, and mean.

Uploaded by

sbbhjkrmmn
Copyright
© All Rights Reserved

🍥 Lecture 1

Sep 4, 2025

Lecture 1: Introduction
● MICRO data - observed on individual units
● MACRO data - observed at an aggregate level
● CROSS-SECTIONAL data - collected only once, at a fixed point in time
● LONGITUDINAL data - collected at different points in time:
○ TIME SERIES
○ PANEL DATA
○ REPEATED CROSS-SECTION
★ Primary data - collected by the researchers themselves for their study
★ Secondary data - collected by others and made available to us

POPULATION VERSUS SAMPLE

- Statistical units
PARAMETER - numerical summary at population level (e.g., average number of books read in the last year by all people aged 15 to 60 living in Milano)
- It is usually unknown: we never observe the whole population
STATISTIC - numerical summary at sample level

TYPES OF VARIABLE

●​ categorical - not numbers


○ nominal - no ranking (ex: "What course do you study?" → the list of courses is a categorical nominal variable)
○ ordinal - ranking (ex: "What is your highest level of education?" → secondary school, bachelor, master, PhD)
●​ Numerical

Lecture 1 Notes
SCOPE & CONTENT OF THE COURSE

●​ Statistics in everyday life: Numbers dominate society; statistics appear everywhere.


● Statistics = the science of:
1.​ Designing studies
2.​ Analysing collected data
3.​ Translating data into knowledge for decisions & predictions
●​ Course goal: learn a method = problem-solving via knowledgeable use of
statistics:
1.​ Understand theory
2.​ Interpret results correctly
3.​ Apply techniques using R
●​ Learning materials: lecture notes, slides, Blackboard tests
●​ Software: R & RStudio (free to download)
●​ Modules:
1.​ Describing data
2.​ Elements of Probability
3.​ Inferential Statistics – Estimation
4.​ Inferential Statistics – Hypothesis Testing
5.​ Linear Regression

TERMINOLOGY & NOTATION


●​ Population (size N): all statistical units of interest
●​ Sample (size n): subset of population
●​ Inferential process: draw conclusions about population from sample
●​ Random sampling:
○​ Units drawn one at a time
○​ Equal probability for each unit
○ Every possible sample of the same size has equal probability
●​ Parameter: numerical summary at population level
●​ Statistic: numerical summary at sample level
○​ Example:
■​ Parameter = average books read by all people 15–60 in Milan
■​ Statistic = average books read by random sample of people 15–60 in
Milan
●​ Descriptive statistics: describe data using sample statistics
●​ Inferential statistics: learn about population parameters via sample statistics
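To make the parameter/statistic distinction concrete, here is a minimal R sketch. It assumes a small simulated population, invented for illustration rather than taken from the course. `sample()` draws a simple random sample without replacement, the population mean plays the role of the (normally unknown) parameter, and the sample mean is the statistic.

```r
set.seed(1)                                   # make the random draw reproducible

# Hypothetical population (simulated, not course data):
# books read last year by N = 1000 people
population <- rpois(1000, lambda = 4)
N <- length(population)

parameter <- mean(population)                 # population-level summary (unknown in practice)

n <- 50
units <- sample(1:N, size = n)                # simple random sample: n distinct units, each equally likely
sample_values <- population[units]
statistic <- mean(sample_values)              # sample-level summary

cat("Parameter (population mean):", parameter, "\n")
cat("Statistic (sample mean):", statistic, "\n")
```

Re-running with a different seed gives a different sample and therefore a different statistic, while the parameter stays fixed. This is the uncertainty that inferential statistics has to handle.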

📑 CHEAT SHEET (QUICK REFERENCE)


● Statistics = the science of designing studies, analysing data, and translating data into knowledge.
●​ Population (N) vs. Sample (n).
●​ Parameter = population-level measure.
●​ Statistic = sample-level measure.
●​ Random sample = each unit/sample has equal chance.
●​ Descriptive statistics = describe sample data.
●​ Inferential statistics = infer about population from sample.
●​ Exams:
○​ Test (8 MCQs, 16 pts, no penalties).
○​ Written exam (3 open questions, 15 pts, R required).
○​ Options: 2 partials (average) or 1 general exam (sum).
●​ Modules: Describing data → Probability → Estimation → Hypothesis testing →
Regression.
●​ Software: R & RStudio required.
🍥 Lecture 2
Sep 9, 2025

Lecture 2
STATISTICAL VARIABLES

●​ Statistical variable = result of measurement process; each question → variable


●​ Types of variables:
○​ Categorical (non-numerical)
■​ Nominal = no ranking (e.g., music type, program attended)
■​ Ordinal = ranked (e.g., education level, agreement scale)
○​ Numerical
■​ Discrete = counting process (e.g., # books read, # siblings)
■​ Continuous = measurable, not countable (e.g., height, commuting
time, house price)
● Type is not fixed a priori: the same characteristic can be categorical or numerical depending on how it is measured (e.g., age as a number vs. grouped into intervals)
●​ Statistical tool selection depends on type of variable

FREQUENCY DISTRIBUTION TABLES

●​ Best when few distinct values


●​ First column = distinct values
● Next columns =
○​ Absolute frequency (counts)
○​ Relative frequency (proportion)
●​ Examples:
○​ Education levels → high school most frequent (50.9%)
○​ Working status → full-time 47.7%, retired 18.2%, unemployed 4.8%
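As an illustration, here is a minimal R sketch of a frequency distribution table. It assumes the education-level counts from the Degree example later in these notes (n = 709) and rebuilds the raw categorical variable from them. `table()` gives the absolute frequencies and `prop.table()` the relative ones.

```r
# Education levels and their counts (from the Degree example in these notes, n = 709)
levels_deg <- c("Less than high school", "High school", "Junior college",
                "Bachelor", "Graduate")
counts <- c(82, 361, 50, 141, 75)

# Rebuild the raw categorical variable, keeping the natural order of the levels
degree <- factor(rep(levels_deg, times = counts), levels = levels_deg)

abs_freq <- table(degree)          # absolute frequencies (counts)
rel_freq <- prop.table(abs_freq)   # relative frequencies (count / n)

print(cbind(absolute = abs_freq, relative = round(rel_freq, 3)))
```

Passing `levels =` to `factor()` keeps the ordinal ranking. Without it, R would sort the levels alphabetically.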

GRAPHS

●​ Pie chart:
○​ whole circle = total
○​ slice = share
●​ Bar chart:
○​ height of bar = frequency

HISTOGRAMS

●​ Used for numerical variables with many distinct values


●​ Steps:
○​ Group into intervals (classes)
○​ Count absolute/relative frequency
●​ Intervals:
○​ Equal length → how many?
○​ Unequal length → decide bounds
●​ Histogram rules:
○​ Horizontal axis = intervals
○​ Bar’s area = relative frequency
○​ Vertical axis = density = (relative frequency ÷ interval length)
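The histogram rules above can be checked with a short R sketch. The commuting times and class bounds are hypothetical, chosen so the classes have unequal lengths. Dividing each relative frequency by the class length gives the bar heights (densities), and the bar areas then add up to 1.

```r
# Hypothetical commuting times in minutes (invented for illustration)
commute <- c(5, 12, 18, 22, 25, 31, 35, 38, 44, 52, 58, 75)

breaks <- c(0, 20, 30, 45, 90)                  # classes of unequal length
classes <- cut(commute, breaks = breaks)         # intervals (0,20], (20,30], (30,45], (45,90]
rel_freq <- as.numeric(table(classes)) / length(commute)
widths <- diff(breaks)                           # interval lengths
density <- rel_freq / widths                     # bar height = relative frequency / length

print(data.frame(class = levels(classes), rel_freq = round(rel_freq, 3),
                 density = round(density, 4)))
sum(density * widths)                            # total area of the bars = 1

# hist() computes the same densities; plot = FALSE returns them without drawing
h <- hist(commute, breaks = breaks, plot = FALSE)
```

Comparing `h$density` with `density` is a quick way to confirm that R's histogram uses the same "area = relative frequency" convention.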

SYNTHETIC MEASURES

●​ Categories:
○​ Central tendency (mode, median, mean)
○​ Location
○​ Dispersion

CENTRAL TENDENCY

●​ Purpose: condense data into one typical value


●​ Mode
●​ Median
●​ Mean (arithmetic average)

MODE

●​ Most frequent value (highest frequency)


● The only measure of central tendency that can be used for nominal categorical variables (it also applies to ordinal and numerical variables).

MEDIAN
●​ Definition: The median is the central value of an ordered dataset.
○​ It splits the dataset into two halves:
■​ 50% of units below the median
■​ 50% of units above the median

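Median from Raw Data
(here $x_{(i)}$ denotes the value in position i of the ordered data)

1. If n is odd:
a. the median is the value in position $\frac{n+1}{2}$:
$$\text{Median} = x_{\left(\frac{n+1}{2}\right)}$$

2. If n is even:
a. there are two middle observations, in positions $\frac{n}{2}$ and $\frac{n}{2}+1$
b. if the variable is numerical, the median is their average:
$$\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$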

Example: Data = [1, 3, 1, 2, 5, 3]

●​ Ordered: [1, 1, 2, 3, 3, 5]
●​ n=6 (even) → Median = (2+3)/2=2.5
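The odd/even rule can be sketched in R, the course software. The code applies the rule by hand to the example data above and compares the result with base R's `median()`.

```r
x <- c(1, 3, 1, 2, 5, 3)
x_sorted <- sort(x)                           # 1 1 2 3 3 5
n <- length(x)

if (n %% 2 == 1) {
  med <- x_sorted[(n + 1) / 2]                # odd n: value in position (n + 1) / 2
} else {
  med <- (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2   # even n: average of the two middle values
}

med          # 2.5
median(x)    # base R gives the same result: 2.5
```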

Cumulative Frequencies

How to compute the median using cumulative frequencies, both for discrete and
continuous data.

●​ Definition:​
The cumulative relative frequency for each value is the proportion of
observations with values smaller than or equal to that value.
●​ Applicable to:
○​ Ordinal variables (can be ranked)
○​ Numerical variables
●​ Purpose: Helps locate medians, percentiles, and quantiles.

Example: Degree (Education Level)

Level                   Absolute frequency   Relative frequency   Cumulative frequency
Less than high school   82                   82/709 = 0.116       0.116
High school             361                  361/709 = 0.509      0.116 + 0.509 = 0.625
Junior college          50                   50/709 = 0.071       0.625 + 0.071 = 0.696
Bachelor                141                  141/709 = 0.199      0.696 + 0.199 = 0.895
Graduate                75                   75/709 = 0.106       0.895 + 0.106 ≈ 1
Total                   709                  1
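Here is a minimal R sketch of the same table, using the counts above. `cumsum()` accumulates the relative frequencies, and the median level is read off as the first level whose cumulative frequency exceeds 0.5 (Case 1 of the rule below, since no level hits exactly 0.5 here). R works from unrounded proportions, so the third decimal can differ slightly from the hand-summed column (e.g., 0.695 instead of 0.696).

```r
levels_deg <- c("Less than high school", "High school", "Junior college",
                "Bachelor", "Graduate")
abs_freq <- c(82, 361, 50, 141, 75)           # counts from the table above

rel_freq <- abs_freq / sum(abs_freq)          # relative frequencies
cum_freq <- cumsum(rel_freq)                  # cumulative relative frequencies

print(data.frame(level = levels_deg,
                 relative = round(rel_freq, 3),
                 cumulative = round(cum_freq, 3)))

# Median of an ordinal variable: first level whose cumulative frequency exceeds 0.5
median_level <- levels_deg[which(cum_freq > 0.5)[1]]
median_level                                  # "High school"
```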


Median from Frequency Distribution

● Case 1: no cumulative frequency equals exactly 0.5
○ The median is the first value at which the cumulative frequency exceeds 0.5.

● Case 2: the cumulative frequency equals exactly 0.5 at some value
○ The median can be:
■ that value,
■ any value in the interval with that value as lower bound,
■ or the average of the interval bounds (for numerical variables).

● Example (sports practice, times per week): the cumulative frequency equals exactly 0.5 at value 1 (0.30 + 0.20 = 0.50)
→ Median = 1 (times per week).

Median from a Histogram

●​ For continuous/grouped data shown in histograms:


○​ Find the interval where the cumulative frequency first exceeds 0.5.
○​ Interpolate within that interval if needed.
●​ Graphically:
○​ The median is the vertical line splitting the histogram’s total area into two
equal halves (50% each).

●​ If the cumulative frequency surpasses 0.5 → median is that value (or interval).
●​ If it equals 0.5 exactly → median is that value, interval, or midpoint.
●​ In histograms → median divides the area into two equal halves.
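The notes say to interpolate within the median class "if needed" but give no formula. The R sketch below uses one common choice, linear interpolation, which assumes the values are spread uniformly inside the class. The class bounds and relative frequencies are hypothetical.

```r
# Hypothetical grouped data (invented for illustration): class bounds and relative frequencies
breaks   <- c(0, 20, 30, 45, 90)
rel_freq <- c(0.25, 0.20, 0.35, 0.20)
cum_freq <- cumsum(rel_freq)                  # 0.25 0.45 0.80 1.00

j <- which(cum_freq > 0.5)[1]                 # median class: first class whose cumulative frequency exceeds 0.5
cum_before <- if (j == 1) 0 else cum_freq[j - 1]
width <- breaks[j + 1] - breaks[j]

# Linear interpolation: move into the class until the area to the left reaches 0.5
median_grouped <- breaks[j] + (0.5 - cum_before) / rel_freq[j] * width
median_grouped                                # about 32.14, inside (30, 45]
```

The result is the vertical line that splits the histogram's area into two halves: 0.45 of the area lies before 30, and the remaining 0.05 is taken from the (30, 45] bar.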

MEAN

●​ Mean (Arithmetic average) = sum of all values ÷ number of observations


●​ Applicable only to numerical variables

Raw data formula:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

Deviation

● Deviation of observation $x_i$ from the mean:
$$d_i = x_i - \bar{x}$$
Properties of the Mean

1. Balancing Point:
a. The deviations sum to zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

2. Minimization:
○ Among all constants a, the sum of squared deviations
$$(x_1 - a)^2 + (x_2 - a)^2 + \cdots + (x_n - a)^2$$
is smallest when $a = \bar{x}$.
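Both properties can be verified numerically with a short R sketch on the small data set from the median example. The hypothetical helper `ssd()` computes the sum of squared deviations from any constant `a`, and base R's `optimize()` searches for the value of `a` that minimizes it.

```r
x <- c(1, 3, 1, 2, 5, 3)           # data from the median example above
x_bar <- mean(x)                   # 2.5

deviations <- x - x_bar
sum(deviations)                    # balancing point: 0

# Hypothetical helper: sum of squared deviations from an arbitrary constant a
ssd <- function(a) sum((x - a)^2)
ssd(x_bar)                         # 11.5
ssd(2)                             # 13, larger than at the mean

# Numerical search for the minimizing a over the range of the data
best_a <- optimize(ssd, interval = range(x))$minimum
best_a                             # approximately 2.5
```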

Mean from Frequency Distribution

The slide explains how to compute the mean $\bar{x}$ when the data are summarized in a frequency distribution table.

●​ When values are repeated with certain frequencies, instead of summing every
observation individually, we use the weighted mean formula:

$$\bar{x} = \frac{\sum_{i=1}^{k} x_i \, f_i}{n}$$

Where:

●​ 𝑥𝑖 = distinct value of the variable

●​ 𝑓𝑖 = absolute frequency (number of times xi appears)

●​ 𝑛 = total sample size

Alternatively, using relative frequencies ($p_i = f_i / n$):
$$\bar{x} = \sum_{i=1}^{k} x_i \, p_i$$

Value 𝑥𝑖 Absolute frequency 𝑓𝑖 Relative frequency 𝑝𝑖

0 60 0.30

1 40 0.20

2 60 0.30

3 20 0.10

4 20 0.10

Total 200 1.00

Step 1: Formula with absolute frequencies
$$\bar{x} = \frac{(0 \cdot 60) + (1 \cdot 40) + (2 \cdot 60) + (3 \cdot 20) + (4 \cdot 20)}{200} = \frac{300}{200} = 1.5$$

Step 2: Formula with relative frequencies
$$\bar{x} = (0 \cdot 0.30) + (1 \cdot 0.20) + (2 \cdot 0.30) + (3 \cdot 0.10) + (4 \cdot 0.10) = 1.5$$
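The two steps can be reproduced with a minimal R sketch using the frequency table above. It computes the weighted mean with absolute frequencies, with relative frequencies, and with base R's `weighted.mean()`.

```r
values   <- 0:4
abs_freq <- c(60, 40, 60, 20, 20)       # f_i, n = 200
n        <- sum(abs_freq)
rel_freq <- abs_freq / n                # p_i = f_i / n

mean_abs <- sum(values * abs_freq) / n  # sum(x_i * f_i) / n
mean_rel <- sum(values * rel_freq)      # sum(x_i * p_i)
mean_wtd <- weighted.mean(values, w = abs_freq)  # base R weighted mean

c(mean_abs, mean_rel, mean_wtd)         # all equal 1.5
```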

Key Idea

●​ The mean from a frequency distribution is a weighted average.


●​ Weights are either the absolute frequencies (fi) or the relative frequencies (pi).
●​ This avoids listing all individual data values when summarizing grouped data.
MEAN VS. MEDIAN

●​ Mean: considers all values


●​ Median: depends on ranking/frequencies only
●​ Outliers:
○​ Median = unaffected
○​ Mean = affected (pulled toward outlier)
●​ Example (siblings distribution):
○​ Median = 3
○​ Mean = 3.54 → influenced by few with many siblings

OUTLIERS

● Definition: unexpectedly high or low values with low frequency
● Strong effect on the mean, weak or no effect on the median
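A quick R sketch shows the different sensitivity of mean and median. It uses hypothetical data, invented for illustration, and adds a single extreme value to it.

```r
# Hypothetical data, invented for illustration
x <- c(1, 2, 2, 3, 3, 3, 4)
x_out <- c(x, 40)                              # same data plus one extreme value

c(mean = mean(x), median = median(x))          # mean about 2.57, median 3
c(mean = mean(x_out), median = median(x_out))  # mean 7.25, median still 3
```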
🌸 cheat sheet 2
📑 CHEAT SHEET (QUICK REFERENCE)
●​ Variable types:
○​ Categorical → Nominal (no rank), Ordinal (ranked)
○​ Numerical → Discrete (counts), Continuous (measurements)
●​ Frequency distribution:
○​ Absolute frequency = count
○​ Relative frequency = proportion
●​ Graphs: Pie chart (share), Bar chart (frequency), Histogram (distribution of
continuous/discrete data)
●​ Histogram density = relative frequency ÷ interval length
●​ Central tendency measures:
○​ Mode = most frequent value (nominal OK)
○​ Median = central ranked value (ordinal/numerical)
○​ Mean = arithmetic average (numerical only)
● Formulas:
○ Mean: $\bar{x} = \sum x_i / n$
○ Weighted mean: $\bar{x} = \sum (\text{value} \times \text{relative frequency}) = \sum x_i p_i$
○ Deviation: $x_i - \bar{x}$
● Outliers (rare, extreme values): affect the mean, not the median
● Properties of the mean:
○ Balancing point: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
○ Minimizes $\sum (x_i - a)^2$, with the minimum at $a = \bar{x}$
● Mean vs Median:
○ Mean → sensitive to outliers
○ Median → robust to outliers
📖 formulas
1. Mean (Arithmetic Average) – Raw Data
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
● where $x_1, x_2, \ldots, x_n$ are the observed values of the sample.

2. Property of the Mean (Balancing Point)
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

3. Property of the Mean (Minimization)
$$\min_{a} \sum_{i=1}^{n} (x_i - a)^2 \quad \text{occurs when } a = \bar{x}$$

4. Weighted Mean (from Frequency Distribution)

If values $x_i$ have relative frequencies $p_i$:
$$\bar{x} = \sum_{i=1}^{k} x_i \, p_i$$

5. Histogram Density

For an interval of length $L$ with relative frequency $f$:
$$\text{Density} = \frac{f}{L}$$
6. Example – Sports Practice Data

Frequencies:
● 0 (30%), 1 (20%), 2 (30%), 3 (10%), 4 (10%).

$$\bar{x} = (0 \cdot 0.30) + (1 \cdot 0.20) + (2 \cdot 0.30) + (3 \cdot 0.10) + (4 \cdot 0.10)$$
$$\bar{x} = 0 + 0.20 + 0.60 + 0.30 + 0.40 = 1.5$$

7. Median (Raw Data)

● If n odd:
$$\text{Median} = x_{\left(\frac{n+1}{2}\right)}$$

● If n even:
$$\text{Median} = \frac{x_{\left(\frac{n}{2}\right)} + x_{\left(\frac{n}{2}+1\right)}}{2}$$

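8. Median (Frequency Distribution – Cumulative Relative Frequency)

● Case 1: if no cumulative frequency equals 0.5 →
$$\text{Median} = \text{first value where the cumulative frequency} > 0.5$$

● Case 2: if the cumulative frequency equals 0.5 at value $x_j$, the median can be $x_j$ or, for numerical variables, the midpoint of the interval with $x_j$ as lower bound:
$$\text{Median} = \frac{\text{lower bound of interval} + \text{upper bound of interval}}{2}$$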
🍥 Lecture 3
Sep 11, 2025

Lecture 3: Measures of location


● Measures of location identify the position of a value in the sequence of ordered observations → ordinal and numerical variables
● Rank the observations and divide them into quarters:
- Lower quartile -> 𝑄1
- Second quartile -> median
- Upper quartile -> 𝑄3

QUARTILES FROM A FREQUENCY DISTRIBUTION

●​ Lower quartile -> First value where the cumulative frequency reaches or
exceeds 0.25

●​ Upper quartile -> First value where the cumulative frequency reaches or
exceeds 0.75

PERCENTILES
● $p$-th percentile → value such that p% of the observations are at or below it

value   relative freq   cumulative freq
0       0.30            0.30
1       0.20            0.50
2       0.30            0.80
3       0.10            0.90
4       0.10            1.00
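The "first value where the cumulative frequency reaches or exceeds k" rule can be sketched in R. The code assumes the absolute counts 60, 40, 60, 20, 20 (n = 200) from the mean-from-frequency example, which have exactly these relative frequencies. `quantile_from_freq()` is a hypothetical helper, not a base R function. It compares integer counts with k·n to avoid floating-point ties at exactly 0.5.

```r
values   <- 0:4
abs_freq <- c(60, 40, 60, 20, 20)         # counts with the relative frequencies above, n = 200
n        <- sum(abs_freq)
cum_abs  <- cumsum(abs_freq)              # 60 100 160 180 200
cum_abs / n                               # cumulative relative frequencies: 0.3 0.5 0.8 0.9 1.0

# Hypothetical helper: first value whose cumulative frequency reaches or exceeds k
quantile_from_freq <- function(k) values[which(cum_abs >= k * n)[1]]

Q1  <- quantile_from_freq(0.25)   # 0
Me  <- quantile_from_freq(0.50)   # 1 (cumulative frequency hits 0.5 exactly at value 1)
Q3  <- quantile_from_freq(0.75)   # 2
P90 <- quantile_from_freq(0.90)   # 3
c(Q1 = Q1, Median = Me, Q3 = Q3, P90 = P90)
```

R's built-in `quantile()` on raw data uses interpolation rules (its `type` argument) that need not match this hand rule, so the helper is written out explicitly.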
FIVE NUMBERS SUMMARY

1.​ Minimum
2.​ Lower quartile
3.​ Median
4.​ Upper quartile
5.​ Maximum

Height of the box ($Q_3 - Q_1$) = INTERQUARTILE RANGE (IR)

$$IR = Q_3 - Q_1$$
An observation x is an OUTLIER if
$$x < Q_1 - 1.5\,IR \quad \text{or} \quad x > Q_3 + 1.5\,IR$$
OUTLIER DETECTION FROM BOXPLOT

●​ Interquartile range IR -> upper quartile – lower quartile

●​ Outlier -> any value falling below the lower quartile or above the upper
quartile by more than 1.5 IR

WHISKERS – NO OUTLIERS

No outliers (every observation lies between $Q_1 - 1.5\,IR$ and $Q_3 + 1.5\,IR$):
● lower whisker connects the lower quartile to the minimum value
● upper whisker connects the upper quartile to the maximum value
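The 1.5·IR rule can be applied with a minimal R sketch. The data are hypothetical, and the quartiles come from R's `quantile()`, whose default interpolation (type 7) may give slightly different quartiles than a hand computation. `boxplot()` itself uses Tukey's hinges from `fivenum()`.

```r
# Hypothetical data, invented for illustration
x <- c(2, 4, 4, 5, 6, 7, 7, 8, 9, 30)

q <- quantile(x, probs = c(0.25, 0.75))   # R's default (type 7) quartiles
Q1 <- unname(q[1])                        # 4.25
Q3 <- unname(q[2])                        # 7.75
IR <- Q3 - Q1                             # interquartile range: 3.5

lower_fence <- Q1 - 1.5 * IR              # -1
upper_fence <- Q3 + 1.5 * IR              # 13
outliers <- x[x < lower_fence | x > upper_fence]
outliers                                  # 30

# Whisker ends: most extreme observations still inside the fences
range(x[x >= lower_fence & x <= upper_fence])   # 2 9
```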
MEASURES OF VARIABILITY
●​ Applied to numerical variables

Range

●​ max – min
●​ Affected by outliers

IQR (Interquartile range)

●​ Q3 – Q1 = IQR
●​ variability of central 50%

Variance and Standard Deviation

Deviation = value − mean

Variance ($s^2$)
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$

→ variability around the mean

Standard deviation: $s = \sqrt{s^2}$ (square root of the variance)
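As a check on the formula, here is a short R sketch that computes the variance with the n − 1 denominator by hand and compares it with base R's `var()` and `sd()`, which use the same denominator. It reuses the small data set from the median example.

```r
x <- c(1, 3, 1, 2, 5, 3)        # same small data set used for the median
n <- length(x)

s2_manual <- sum((x - mean(x))^2) / (n - 1)   # 11.5 / 5 = 2.3
s_manual  <- sqrt(s2_manual)

c(var = var(x), s2_manual = s2_manual)        # var() also divides by n - 1
c(sd = sd(x), s_manual = s_manual)            # about 1.517
c(range = diff(range(x)), IQR = IQR(x))       # other spread measures (IQR uses R's default quartiles)
```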

COEFFICIENT OF VARIATION (CV)


● Formula:
$$CV = \frac{s}{|\bar{x}|}$$

● standard deviation relative to the size of the mean; often multiplied by 100 and read as a percentage of the mean


Variability Comparison

●​ Variance / Standard Deviation → variables with the same mean


●​ Coefficient of Variation → variables with different means or unit of measurement
Example:
Sample of 100 students
SAT scores → mean 550, standard deviation 100
ACT scores → mean 18, standard deviation 6
SAT scores → CV = 100/550 ≈ 0.18
ACT scores → CV = 6/18 ≈ 0.33
→ ACT scores are more variable relative to their mean
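The SAT/ACT comparison can be reproduced in R. `cv()` and `cv_raw()` are hypothetical helpers written for this sketch, not base R functions. The first works from summary statistics, the second from raw data.

```r
# Hypothetical helper: coefficient of variation from summary statistics
cv <- function(s, mean) s / abs(mean)

cv_sat <- cv(100, 550)   # about 0.18
cv_act <- cv(6, 18)      # about 0.33
round(c(SAT = cv_sat, ACT = cv_act), 2)

# Hypothetical helper: the same idea from raw data
cv_raw <- function(x) sd(x) / abs(mean(x))
cv_raw(c(2, 4, 6))       # sd 2 / mean 4 = 0.5
```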
🌸 cheat sheet 3
📑 CHEAT SHEET (QUICK REFERENCE)
●​ Quartiles: Q1 = 25%, Q2 = 50% (median), Q3 = 75%
●​ Five-number summary: min, Q1, median, Q3, max
●​ Boxplot:
○​ Whiskers = min/max (no outliers) OR within 1.5×IQR
○​ Outliers = beyond 1.5×IQR
●​ Variability measures:
○​ Range = max – min
○​ IQR = Q3 – Q1
○​ Variance (s²) = average squared deviations (n–1 denominator)
○​ SD (s) = √variance
●​ Coefficient of Variation (CV): s/∣xˉ∣ → compares across scales
●​ Interpretation:
○​ Range & variance affected by outliers
○​ Median & IQR robust to outliers
🍥 Lecture 4
Sep 12, 2025

Lecture 4: Shape of Distributions


When we look at a numerical variable (like income, grades, height, etc.), we want to know
the shape of its distribution.

SHAPE OF DISTRIBUTIONS

●​ Applies to numerical variables


●​ Tools: Histogram or Boxplot

Symmetric distribution

●​ left and right sides mirror each other


●​ MEAN = MEDIAN

SYMMETRIC DISTRIBUTION

●​ Bell-shaped distribution
○​ equal spacing of quartiles
●​ U-shaped distribution
○​ median–Q1 > Q1–min

ASYMMETRIC DISTRIBUTION (SKEWED)

●​ Right-skewed (positively skewed):

○​ Mean > Median.


●​ Left-skewed (negatively skewed):

○​ Mean < Median.

MEAN AND MEDIAN:

●​ Symmetric distribution → mean = median


●​ Skewed-right distribution → mean > median
●​ Skewed-left distribution → mean < median

CONCENTRATION ANALYSIS
How equally or unequally is a variable (like income, wealth, sales, etc.) distributed among
individuals or units? Can one unit hold a huge share of the total, while others have little?

Used for numerical positive variables with transferable characteristics (e.g., income)
●​ Is the variable equally distributed among units?
●​ Do a few units hold most of the amount?

EXAMPLE:

4 households’ incomes: 40, 20, 30, 70

●​ Total = 160
●​ Mean income = 40

Extreme Scenarios:

Perfect equality:

● all units hold the same amount
● each unit's income = mean income = 40 (160/4)

Maximum concentration:

●​ one unit has all (160), others zero

Goal: compare observed distribution to these extremes

How to Measure Concentration:

1.​ Concentration curve


2.​ Concentration index

CONCENTRATION CURVE

A graphical tool to see inequality.

NOTATION

Variable $X$ → values $x_1, x_2, \ldots, x_n$ ranked in ascending order

Compute coordinates:

For each unit $i = 1, \ldots, n$:
$$F_i = \frac{i}{n} \qquad Q_i = \frac{x_1 + x_2 + \cdots + x_i}{x_1 + x_2 + \cdots + x_n}$$
$$F_0 = Q_0 = 0 \qquad F_n = Q_n = 1$$

Curve joining the n+1 points $(F_i, Q_i)$

● $Q_i$ → proportion of the total amount of the variable held by the bottom $F_i$ proportion of units
● $(Q_i \cdot 100)\%$ → percentage of the total amount of the variable held by the bottom $(F_i \cdot 100)\%$ of the sample

Example (incomes ranked in ascending order: 20, 30, 40, 70; total 160)

● Point (0.25, 0.125) →
○ 12.5% of the total income is held by the bottom 25% (poorest)
○ 87.5% of the total income is held by the top 75% (richest)
● Point (0.75, 0.5625) →
○ bottom 75% holds 56.25% of the total income
○ top 25% holds 43.75% of the total income
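The coordinates of the concentration curve can be computed with a minimal R sketch on the four household incomes above. The variable names `F_i` and `Q_i` are chosen for this sketch (plain `F` is avoided because in R it is an alias for `FALSE`).

```r
income <- c(40, 20, 30, 70)          # household incomes from the example above
x <- sort(income)                    # rank in ascending order: 20 30 40 70
n <- length(x)

F_i <- c(0, seq_len(n) / n)          # F_0 = 0, F_i = i / n
Q_i <- c(0, cumsum(x) / sum(x))      # Q_0 = 0, Q_i = share held by the bottom i units

print(data.frame(F = F_i, Q = Q_i))
# To draw it: plot(F_i, Q_i, type = "b"); abline(0, 1)   # curve vs. the equality line
```

The rows (0.25, 0.125) and (0.75, 0.5625) are the two points interpreted above.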

COORDINATES: Extreme situation


●​ Perfect equality: line Q = F
●​ Perfect concentration: curve flat at 0 until last unit

Interpretation:

●​ If the curve is close to the diagonal line → distribution is equal.


●​ If the curve is close to the x-axis → distribution is very unequal.
●​ Closer to diagonal → lower concentration
●​ Closer to x-axis → higher concentration

Properties:

● Non-decreasing (monotone)
● Convex, and never above the diagonal: $F_i \ge Q_i$

CONCENTRATION INDEXES

GINI’S INDEX (R):

$$R = \frac{(F_1 - Q_1) + (F_2 - Q_2) + \cdots + (F_{n-1} - Q_{n-1})}{F_1 + F_2 + \cdots + F_{n-1}}$$
● Perfect equality → $F_i = Q_i$ for all i → $R = 0$
● Maximum concentration → $Q_i = 0$ for $i < n$, $Q_n = 1$ → $R = 1$
● Range: [0, 1]
● higher = more concentrated
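Gini's index R can be computed directly from the formula above. `gini_R()` is a hypothetical helper that implements exactly this ratio (it is not a base R function and needs at least two units). The sketch applies it to the household example and to the two extreme scenarios.

```r
# Hypothetical helper implementing Gini's R as defined above (requires n >= 2)
gini_R <- function(x) {
  x <- sort(x)                          # ascending order
  n <- length(x)
  F_i <- (1:(n - 1)) / n                # F_1 ... F_{n-1}
  Q_i <- cumsum(x)[1:(n - 1)] / sum(x)  # Q_1 ... Q_{n-1}
  sum(F_i - Q_i) / sum(F_i)
}

gini_R(c(40, 20, 30, 70))   # household example: 0.5 / 1.5, about 0.333
gini_R(c(40, 40, 40, 40))   # perfect equality: 0
gini_R(c(0, 0, 0, 160))     # maximum concentration: 1
```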

PIETRA’S INDEX (P):

●​ Perfect equality → 𝑃 = 0
●​ Maximum concentration → 𝑃 = 1

INTERPRETATION:

●​ Values closer to 0 → the lower the concentration (more equal)


●​ Values closer to 1 → the greater the concentration (more concentrated)

🌸 cheat sheet 4
CHEAT SHEET (QUICK REFERENCE)
●​ Shape:
○​ Symmetric → mean = median
○​ Right-skewed → mean > median
○​ Left-skewed → mean < median
●​ Concentration:
○​ Perfect equality: all equal
○​ Max concentration: one unit holds all
●​ Concentration curve:
○ $F_i = i/n$, $Q_i$ = cumulative x ÷ total x
○​ Perfect equality → diagonal line
○​ Max concentration → flat at 0 until last unit
●​ Indexes:
○​ Gini (R): [0,1]; higher = more concentrated
○​ Pietra (P): [0,1]; higher = more concentrated
🍥 Lecture 5
Sep 12, 2025

Lecture 5
AIM OF BIVARIATE DESCRIPTIONS

Bivariate statistics = techniques to study the joint behavior of two variables at sample
level

ASYMMETRIC PERSPECTIVE
Investigating bivariate association → assessing dependence

●​ Does one variable depend on the other?


●​ Can one variable’s variability be explained by another?

Roles of variables

●​ Dependent variable → measures what we want to explain


●​ Independent variable → measures proposed explanation

TECHNIQUES
Depend on the variables' types
CASES

●​ Both variables (dependent & independent) → categorical


●​ Dependent → numerical; independent → categorical
●​ Both variables (dependent & independent) → numerical
CONTINGENCY TABLES

Cross-table (Cross - Tab)

●​ Rows = values of one variable


●​ Columns = values of another variable
●​ Cells = joint counts or joint proportions
●​ Marginal distributions = univariate distributions (row totals, column totals)

Example: Streaming Activity Vs Subscription

●​ Streaming (dependent variable): Light, Moderate, Heavy


●​ Subscription (independent variable): Free, Standard, Premium

Univariate descriptives show distributions separately, but cannot detect association

Cross-tab of example, with its marginal distributions (table not reproduced in these notes)

Relative Frequencies

●​ Each cell can be expressed as a relative frequency of total sample


●​ Example:
○​ Free & Light = 0.21
○​ Premium & Heavy = 0.13
●​ Marginal totals represent univariate distributions

CONDITIONAL FREQUENCY DISTRIBUTIONS


●​ Independent variable splits the sample into groups (sub-samples)
●​ Within each group, compute relative frequencies of the dependent variable =
conditional frequency distribution
●​ Compare conditional distributions:
○​ Association assessment → comparison
■​ Similar cond freq distr → no association
■​ Different cond freq distr → association

Example

●​ Independent = Subscription (Free, Standard, Premium)


●​ Dependent = Streaming activity
●​ Compute 3 separate conditional frequency distributions:
○​ Streaming | Free
○​ Streaming | Standard
○​ Streaming | Premium
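Here is a minimal R sketch of a cross-tab and its conditional distributions. The counts are hypothetical, invented for illustration; they are not the lecture's Streaming/Subscription data. `addmargins()` adds the marginal totals, `prop.table(tab)` gives joint relative frequencies, and `prop.table(tab, margin = 2)` gives the conditional distribution of Streaming within each Subscription group (each column sums to 1).

```r
# Hypothetical counts (invented for illustration; not the lecture's data)
tab <- as.table(matrix(
  c(30, 15,  5,     # Free:     Light, Moderate, Heavy
    10, 25, 15,     # Standard
     5, 15, 30),    # Premium
  nrow = 3,
  dimnames = list(Streaming    = c("Light", "Moderate", "Heavy"),
                  Subscription = c("Free", "Standard", "Premium"))))

addmargins(tab)                          # joint counts with marginal (row/column) totals
round(prop.table(tab), 2)                # joint relative frequencies: cell / grand total

# Conditional distribution of Streaming within each Subscription group (columns sum to 1)
cond <- prop.table(tab, margin = 2)
round(cond, 2)
```

If the columns of `cond` were roughly equal, there would be no association. Here they differ clearly. `barplot(cond)` draws the stacked bar plot of these conditional distributions, and `barplot(cond, beside = TRUE)` the side-by-side version.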
🌸 cheat sheet 5
📑 CHEAT SHEET (QUICK REFERENCE)
●​ Bivariate statistics: study joint behavior of 2 variables, assess association
●​ Dependent variable: explained; Independent variable: explanation
●​ Cases:
○​ Categorical vs categorical
○​ Numerical vs categorical
○​ Numerical vs numerical
●​ Contingency table (cross-tab):
○​ Rows = categories of one variable
○​ Columns = categories of another
○​ Cells = joint counts/proportions
○​ Marginals = univariate totals
●​ Relative frequencies: cell count ÷ total
●​ Conditional frequency distribution: relative distribution of dependent variable
within each group of independent variable
●​ Association test:
○​ Conditional distributions similar → no association
○​ Conditional distributions different → association present
🍥 Lecture 6
Lecture 6
Example (lecture 5)
● Relative frequency distribution of Streaming separately for each of the three groups (conditional frequency distribution, CFD):
○​ CFD of Streaming given Subscription = Free

○​ CFD of Streaming given Subscription = Standard

○​ CFD of Streaming given Subscription = Premium


Streaming vs Subscription example:

○​ Free users: mostly Light (0.56)


○​ Standard users: mostly Moderate (0.55)
○​ Premium users: mostly Heavy (0.54)

→ Streaming depends on Subscription

Association:

● Association between Streaming and Subscription → Streaming depends on Subscription

Charts of Conditional Distributions

Stacked bar plot

→ shows conditional distribution stacked in each group


Side-by-side bar plot

→ compares conditional frequencies across groups

Means Comparison

●​ Dependent variable: numerical (e.g., streaming minutes/day)


●​ Independent variable: categorical (e.g., device used)
●​ Procedure:
○​ Split sample by groups of IV
○​ Compute group means of DV
○​ Compare means
○​ If means close → no association; if different → possible association
●​ Example: Device used for streaming
○​ Phone: mean = 143.7 min, SD = 38.6
○​ Laptop: mean = 154.8 min, SD = 36.4
○​ SmartTV: mean = 174.7 min, SD = 36.5
○​ → Suggests association between device and streaming time
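The means-comparison procedure can be sketched in R with `tapply()`, which splits the dependent variable by the groups of the independent variable and applies a function within each group. The minutes and devices are hypothetical, invented for illustration, not the lecture's data.

```r
# Hypothetical data (invented for illustration): minutes streamed per day and device used
minutes <- c(120, 150, 160, 140, 170, 155, 180, 200, 165)
device  <- factor(c("Phone", "Phone", "Phone",
                    "Laptop", "Laptop", "Laptop",
                    "SmartTV", "SmartTV", "SmartTV"),
                  levels = c("Phone", "Laptop", "SmartTV"))

group_means <- tapply(minutes, device, mean)   # mean of the DV within each IV group
group_sds   <- tapply(minutes, device, sd)     # spread within each group

round(cbind(mean = group_means, sd = group_sds), 1)
```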
Association Between Numerical Variables

Bivariate Association

Scatterplot

○​ X-axis → independent variable


○​ Y-axis → dependent variable
●​ Example: Internet penetration vs Facebook penetration
○​ Higher internet use → higher Facebook use
○​ Trend shows positive relationship

Linear Association

●​ Positive: high values of one variable with high values of other


●​ Negative: high values of one with low values of other

Measures of linear association

Covariance

Formula:
$$\mathrm{Cov}(X, Y) = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \cdots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$$
● $x_1$ → value of X observed on unit 1
● $y_1$ → value of Y observed on unit 1
● $\bar{x}$ → mean of X
● $\bar{y}$ → mean of Y

Interpretation:

●​ Positive covariance → positive linear association


●​ Negative covariance → negative linear association
●​ Covariance = 0 → no linear association

Example: Internet vs Facebook penetration → Cov = 219.9 (positive)

● Quadrants are defined by the lines $x = \bar{x}$ and $y = \bar{y}$
● Points in quadrants I and III → positive contribution to the covariance
● Points in quadrants II and IV → negative contribution to the covariance

● Majority of points in quadrants I and III → positive linear association
● Overall trend → line with positive slope

● Majority of points in quadrants II and IV → negative linear association
● Overall trend → line with negative slope
Pearson’s Correlation Index (r)

Formula:
$$r = \frac{\mathrm{Cov}(X, Y)}{s_X \cdot s_Y}$$

● $s_X$ → standard deviation of X
● $s_Y$ → standard deviation of Y

Range: $-1 \le r \le 1$

Measures:

● Direction of the linear association
● Strength of the linear association

Interpretation:

● r = 0 → no linear correlation (no linear association)
● r > 0 → positive linear association
● r < 0 → negative linear association
● The closer |r| is to 1 → the stronger the association
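Both measures can be computed by hand in R and checked against base R's `cov()` and `cor()`, which also use the n − 1 denominator. The paired observations are hypothetical, invented for illustration.

```r
# Hypothetical paired observations (invented for illustration)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)

cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)   # 6 / 4 = 1.5
r_manual   <- cov_manual / (sd(x) * sd(y))                   # Cov / (s_X * s_Y)

c(cov = cov(x, y), cov_manual = cov_manual)
c(r = cor(x, y), r_manual = r_manual)                        # about 0.775
```

Unlike the covariance, r does not depend on the units of X and Y, which is why it can be read on the fixed scale from −1 to 1.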
🌸 cheat sheet 6
📑 Cheat Sheet (Quick Reference)
●​ Conditional frequency distributions:
○​ Compare groups → Similar = no association; Different = association
●​ Means comparison (numerical DV vs categorical IV):
○​ Compare group means → similar = no association; different = association
●​ Scatterplot: visual relationship (X = IV, Y = DV)
●​ Covariance:
○ + = positive trend
○​ – = negative trend
○​ 0 = no linear trend
●​ Pearson’s correlation (r):
○​ Between -1 and +1
○​ Measures direction & strength of linear association
○​ Example: r = 0.614 → moderate-high positive
🍥 Lecture 7
Lecture 7: Probability & Random Variables
The Need for Probability
●​ Probability → quantifies uncertainty
●​ Inferential process → conclusions affected by uncertainty
●​ Mathematical → probabilistic framework

Random Variables
Random Experiments

●​ Experiment = process with uncertain outcome


●​ Examples:
○​ Roll a die → outcomes {1,2,3,4,5,6}
○​ Draw a household → measure income
○​ Draw a firm → measure employees
○​ Draw an adult → measure TV hours or museum visits

Random Variable

● Random variable (X) → function that associates a number with each outcome of an experiment


○​ Numerical measurement of an outcome of an experiment

EXAMPLE:

Toss two coins. Number of observed heads


●​ Variable → 0, 1, 2 possible values
●​ Random → unknown value before the experiment

Types:

○​ Discrete: finite/countable outcomes (e.g., coin tosses)


○​ Continuous: uncountable outcomes over an interval
Discrete Random Variables

X discrete with a finite number of values → $x_1, x_2, \ldots, x_k$

Probability Mass Function (pmf)

The pmf, for each value $x_i$, returns:
$$\Pr(X = x_i) = p_i \qquad \text{for } i = 1, 2, \ldots, k$$

Conditions:
● $0 \le p_i \le 1$
● $p_1 + p_2 + \cdots + p_k = 1$

Example (two coins, X = # of heads):

●​ Values: {0,1,2}
●​ Probabilities: 0.25, 0.50, 0.25

Expected Value (Mean):

X discrete with a finite number of values $x_1, x_2, \ldots, x_k$ and $\Pr(X = x_i) = p_i$:
$$E(X) = \mu = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k$$

Example:

No. of heads in the toss of 2 coins

Expected value of the number of heads:

$$E(X) = 0 \cdot 0.25 + 1 \cdot 0.5 + 2 \cdot 0.25 = 1$$
Variance and Standard Deviation of a Random Variable

Variance
$$\mathrm{Var}(X) = \sigma^2 = (x_1 - \mu)^2 p_1 + (x_2 - \mu)^2 p_2 + \cdots + (x_k - \mu)^2 p_k$$

Standard deviation
$$\sigma = \sqrt{\mathrm{Var}(X)} = \sqrt{\sigma^2}$$
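For the two-coin example, the pmf conditions, expected value, variance and standard deviation can be computed in a few lines of R. This is a sketch that simply codes the formulas above.

```r
x <- 0:2                        # number of heads in two coin tosses
p <- c(0.25, 0.50, 0.25)        # pmf: Pr(X = x_i)
stopifnot(all(p >= 0 & p <= 1), isTRUE(all.equal(sum(p), 1)))   # pmf conditions

mu     <- sum(x * p)            # E(X) = 1
sigma2 <- sum((x - mu)^2 * p)   # Var(X) = 0.5
sigma  <- sqrt(sigma2)          # SD, about 0.707

c(mean = mu, var = sigma2, sd = sigma)
```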

Median, quartiles, percentiles

●​ defined same as descriptive statistics


●​ e.g., median → value such that there is 50% probability of observing a smaller value

Continuous Random Variables


●​ Can take any value in an interval
●​ Described by probability density function (pdf):
○​ Must be non-negative
○​ Total area under curve = 1
●​ Probability of interval: area under curve between bounds

Expected value, variance, quantiles: interpreted as in descriptive statistics

Quantile of order k

●​ Let k, 0 < k < 1


●​ Quantile of order k of the random variable X → value q such that:
○​ There is a probability equal to k
○​ To observe a value ≤ q

𝑃𝑟( 𝑋 ≤ 𝑞 ) = 𝑘
🌸 cheat sheet 7
📑 Cheat Sheet (Quick Reference)
●​ Probability = quantification of uncertainty; used in decisions & inference.
●​ Random variable (X) = numerical outcome of random experiment.
●​ Discrete random variable:
○ pmf: $\Pr(X = x_i) = p_i$, $\sum p_i = 1$
○ Mean: $E(X) = \sum x_i p_i$
○ Variance: $\mathrm{Var}(X) = \sum (x_i - \mu)^2 p_i$
○ SD: $\sigma = \sqrt{\mathrm{Var}(X)}$
●​ Continuous random variable:
○​ pdf: non-negative, total area = 1
○​ Probabilities = areas under pdf
○​ Quantiles: Pr(X≤q)=k
●​ Median = value with 50% probability below it.
🍥 Lecture 8
Lecture 8
Linear Transformation
For a random variable X and constants a, b, define:

$$Y = a + bX \qquad (Y \text{ is a linear transformation of } X)$$

Expected Value & Variance of Y

$$E(Y) = E(a + bX) = a + b\,E(X)$$
$$\mathrm{Var}(Y) = \mathrm{Var}(a + bX) = b^2\,\mathrm{Var}(X)$$

Example

Payment $P = 10000 + 1.5S$, where S is the number of units sold,

$E(S) = 30000$, $SD(S) = 8000$

$$E(P) = E(10000 + 1.5S) = 10000 + 1.5\,E(S) = 10000 + 1.5 \cdot 30000 = 55000$$

$$\mathrm{Var}(P) = \mathrm{Var}(10000 + 1.5S) = \mathrm{Var}(1.5S) = 1.5^2\,\mathrm{Var}(S)$$

$$\mathrm{Var}(P) = 1.5^2 \cdot SD^2(S) = 1.5^2 \cdot 8000^2 = 144{,}000{,}000 \qquad SD(P) = 1.5 \cdot 8000 = 12000$$
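The payment example can be reproduced in R. The first part applies the rules above directly. The second part is an optional simulation check, which assumes (only for illustration, since the notes give no distribution for S) that S is normal with the stated mean and SD.

```r
a <- 10000; b <- 1.5
E_S  <- 30000
SD_S <- 8000

E_P   <- a + b * E_S          # E(a + bS) = a + b E(S) = 55000
Var_P <- b^2 * SD_S^2         # Var(a + bS) = b^2 Var(S) = 144,000,000
SD_P  <- sqrt(Var_P)          # |b| * SD(S) = 12000

c(E_P = E_P, Var_P = Var_P, SD_P = SD_P)

# Simulation check, assuming (only for illustration) that S is normal
set.seed(123)
S <- rnorm(1e5, mean = E_S, sd = SD_S)
P <- a + b * S
c(mean(P), sd(P))             # close to 55000 and 12000
```

The constant a shifts the mean but drops out of the variance, which is why SD(P) is simply 1.5 · 8000.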

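Standardization

For a random variable X with $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$:

Standardized variable
$$Z = \frac{X - \mu}{\sigma}$$

Expected value & Variance of Z
$$Z = \frac{X - \mu}{\sigma} = \frac{1}{\sigma}X - \frac{\mu}{\sigma}$$

● Expected Value:
$$E(Z) = \frac{1}{\sigma}E(X) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0$$

● Variance:
$$\mathrm{Var}(Z) = \frac{1}{\sigma^2}\mathrm{Var}(X) = \frac{\sigma^2}{\sigma^2} = 1$$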

Bernoulli Distribution
Experiment with two outcomes → success / failure

●​ 𝑝 → success probability
● X → number of successes in a single trial (1 = success, 0 = failure)
●​ Value 1 → probability 𝑝
●​ Value 0 → probability 1 − 𝑝

Notation: ​ ​ ​ 𝑋 ∼ 𝐵𝑒𝑟(𝑝)

Examples:

●​ Number of heads in one toss of a coin → Bernoulli random variable with success
probability p=0.5
● Number of 6s observed when rolling a die once → Bernoulli random variable with success probability p = 1/6

Probability distribution

$X \sim Ber(p)$: a distribution parametrized by $p$

$$\Pr(X = 1) = p \qquad \Pr(X = 0) = 1 - p$$

Expected Value & Variance

$$E(X) = 0 \cdot (1 - p) + 1 \cdot p = p$$
$$\mathrm{Var}(X) = (0 - p)^2 (1 - p) + (1 - p)^2 p = p(1 - p)$$
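For the die example (p = 1/6), the Bernoulli mean and variance can be checked in R. The sketch codes the formulas above, compares the pmf with `dbinom()` with `size = 1` (a Bernoulli variable is a binomial with a single trial), and adds a simulation check with `rbinom()` whose results depend on the seed.

```r
p <- 1/6                          # probability of rolling a 6
x <- c(0, 1)                      # failure, success
pmf <- c(1 - p, p)                # Pr(X = 0), Pr(X = 1)

mu     <- sum(x * pmf)            # equals p
sigma2 <- sum((x - mu)^2 * pmf)   # equals p * (1 - p) = 5/36

# dbinom with size = 1 gives the Bernoulli pmf
dbinom(x, size = 1, prob = p)

# Simulation check (random; results vary slightly with the seed)
set.seed(42)
draws <- rbinom(1e5, size = 1, prob = p)
c(mean(draws), var(draws))        # close to 1/6 and 5/36
```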

Normal (Gaussian) Distribution


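Density
$$X \sim N(\mu, \sigma^2) \qquad f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

● $\mu$ real number
● $\sigma^2$ positive real number
● $\mu$ → expected value and median (mean = median = mode)
● $\sigma^2$ → variance

Standard normal distribution
$$\mu = 0, \quad \sigma^2 = 1 \qquad Z \sim N(0, 1)$$

Standardization of a normal random variable
$$X \sim N(\mu, \sigma^2) \;\rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$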
Commands for normal distribution
How to compute probabilities using R functions:

Cumulative probability
For $X \sim N(\mu, \sigma^2)$

    pnorm(x, mean = µ, sd = σ)

returns

    $\Pr(X \le x)$

Note that the third argument of pnorm() is the standard deviation σ, not the variance σ².

Example: cumulative probabilities

$X \sim N(3, 5^2)$: normal distribution with $E(X) = 3$ and $SD(X) = 5$ ($\sigma = 5$)

$\Pr(X \le 4.5) = \Pr(X < 4.5)$ = pnorm(4.5, mean = 3, sd = 5) = 0.6179114

$\Pr(X > 6) = 1 - \Pr(X \le 6) = 1 - \Pr(X < 6)$ = 1 − pnorm(6, 3, 5) = 1 − 0.7257469 = 0.2742531

Interval probability:

$\Pr(2.5 \le X \le 5) = \Pr(X \le 5) - \Pr(X \le 2.5)$ = pnorm(5, 3, 5) − pnorm(2.5, 3, 5) = 0.195
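The three probabilities can be computed in one short R script. It also shows two equivalent routes: `lower.tail = FALSE` (an existing `pnorm()` argument) for upper-tail probabilities, and standardizing first and then using the standard normal.

```r
mu <- 3; sigma <- 5                      # X ~ N(3, 5^2); pnorm() takes sd, not variance

p_le       <- pnorm(4.5, mean = mu, sd = sigma)                   # Pr(X <= 4.5), about 0.6179
p_gt       <- 1 - pnorm(6, mean = mu, sd = sigma)                 # Pr(X > 6), about 0.2743
p_gt_upper <- pnorm(6, mean = mu, sd = sigma, lower.tail = FALSE) # same, upper tail directly
p_int      <- pnorm(5, mu, sigma) - pnorm(2.5, mu, sigma)         # Pr(2.5 <= X <= 5), about 0.195

# Standardization gives the same answer with the standard normal
p_std <- pnorm((4.5 - mu) / sigma)                                # Pr(Z <= 0.3)

round(c(p_le, p_gt, p_gt_upper, p_int, p_std), 4)
```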


Quantile

qnorm(k, mean = μ, sd = σ)

For $X \sim N(\mu, \sigma^2)$, qnorm(k, µ, σ) returns the quantile of order $k$, i.e., the value $q$ such that
$$\Pr(X \le q) = k$$

Example:

We are working with a normally distributed variable $X \sim N(3, 5^2)$.

We want to find the value $q_{0.8}$ (a quantile or cut-off point) such that $P(X \le q_{0.8}) = 0.8$.

That means we are looking for the 80th percentile: the value below which 80% of all the possible X values lie.

Step 1 — What qnorm() does

● The function qnorm(p, mean, sd) gives the quantile $q_p$ of a normal distribution with mean = μ and standard deviation = σ.
● In mathematical terms, it finds the x-value that satisfies:
$$P(X \le q_p) = p$$

So, when we call qnorm(0.8, mean = 3, sd = 5), R finds the x-value $q_{0.8}$ such that:
$$P(X \le q_{0.8}) = 0.8$$

Step 2 — Link between standard and general normal

To find \(q_{0.8}\) for \(X \sim N(3, 5^2)\), R internally performs standardization.

It uses the property of the normal distribution:

\[ Z = \frac{X - \mu}{\sigma} \quad \text{and hence} \quad X = \mu + \sigma Z \]

where \(Z \sim N(0, 1)\) is the standard normal distribution (µ = 0 and σ = 1).


Step 3 — Find the z-value for probability 0.8

We want the value \(z_{0.8}\) such that \(P(Z \le z_{0.8}) = 0.8\).

From standard normal tables (or qnorm(0.8) for the standard case):

\[ z_{0.8} = 0.8416 \]

That means:

\[ P(Z \le 0.8416) = 0.8 \]

Using R: qnorm(0.8) returns 0.8416212.

Step 4 — Convert the z-value back to the X scale

Now we transform \(z_{0.8}\) back into x using the formula:

\[ q_{0.8} = \mu + \sigma \cdot z_{0.8} \]

Substitute the numbers:

\[ q_{0.8} = 3 + 5 \cdot 0.8416 = 3 + 4.208 = 7.208 \]
Step 5 — Using R (as shown on the slide)

R does all these steps internally.

You just write:

qnorm(0.8, mean = 3, sd = 5)

and R returns:

[1] 7.208106

This is the 80th percentile, i.e.:

\[ P(X \le 7.208) = 0.8 \]

Step 6 — Interpretation (what the result means)

●​ If X represents, for example, the number of hours watched, a value of 7.21 (approx)
is such that:
○​ 80% of people (or observations) watch ≤ 7.21 hours.
○​ 20% watch more than 7.21 hours.

Summary of the mathematical process

Step | Formula | Explanation
1 | \(P(X \le q_{0.8}) = 0.8\) | Definition of quantile
2 | \(Z = \frac{X - 3}{5}\) | Standardize X
3 | \(P(Z \le 0.8416) = 0.8 \Rightarrow z_{0.8} = 0.8416\) | 80th percentile of the standard normal
4 | \(q_{0.8} = 3 + 5 \cdot 0.8416 = 7.208\) | Convert back to the original scale
5 | R command: qnorm(0.8, mean = 3, sd = 5) | Gives 7.208 directly
6 | Interpretation | 80% of values lie below 7.208

Final Answer

\[ q_{0.8} = 7.208 \quad \text{and} \quad P(X \le 7.208) = 0.8 \]
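The six steps can be checked with a short sketch. It assumes Python's `statistics.NormalDist`, whose `inv_cdf` plays the role of R's `qnorm`. The sketch finds \(q_{0.8}\) in two ways: through the standard normal quantile (Steps 3 and 4), and directly (Step 5). It then confirms that \(Pr(X \le q_{0.8})\) returns 0.8.

```python
from statistics import NormalDist

mu, sigma, k = 3, 5, 0.8

z_k = NormalDist().inv_cdf(k)                  # Step 3: quantile of N(0, 1), R: qnorm(0.8)
q_via_z = mu + sigma * z_k                     # Step 4: back to the X scale
q_direct = NormalDist(mu, sigma).inv_cdf(k)    # Step 5: R: qnorm(0.8, mean = 3, sd = 5)

print(round(z_k, 4), round(q_via_z, 6), round(q_direct, 6))
print(NormalDist(mu, sigma).cdf(q_direct))     # should give back k = 0.8
```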

Random Vector
Goal: study the joint probabilistic behavior of several variables.

We focus on the case of two discrete random variables.

Discrete Bivariate Vector

\(X\) → discrete random variable taking values \(x_1, x_2, \ldots, x_k\)

\(Y\) → discrete random variable taking values \(y_1, y_2, \ldots, y_h\)

Marginal probability functions

\[ Pr_X(X = x_i) \quad \text{for } i = 1, 2, \ldots, k \]

\[ Pr_Y(Y = y_j) \quad \text{for } j = 1, 2, \ldots, h \]

Joint probability function

→ describes the joint probabilistic behaviour

For each pair \((x_i, y_j)\) it returns:

\[ Pr(X = x_i \text{ and } Y = y_j) = Pr(X = x_i, Y = y_j) = p(x_i, y_j) \]

It can be represented with a cross-tab (contingency table).

Example

●​ 𝑋 →
○​ 1 → respondent is a student
○​ 0 → otherwise
●​ 𝑌 →
○​ 1 → unsatisfied
○​ 2 → indifferent
○ 3 → satisfied

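Joint Probability Distribution

X \ Y | 1 | 2 | 3
0 | 0.2 | 0.05 | 0.15
1 | 0.15 | 0.35 | 0.1

Reading joint probabilities from the table:

\[ Pr(X = 0, Y = 1) = 0.2, \qquad Pr(X = 1, Y = 3) = 0.1 \]

Marginals (row sums and column sums):

\[ Pr(X = 0) = 0.4, \quad Pr(X = 1) = 0.6; \qquad Pr(Y = 1) = 0.35, \quad Pr(Y = 2) = 0.4, \quad Pr(Y = 3) = 0.25 \]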
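Independence condition

Definition:
X and Y are independent if knowing the value of one gives no information about the other.

Two discrete random variables are statistically independent if and only if

\[ Pr(X = x_i, Y = y_j) = Pr(X = x_i)\,Pr(Y = y_j) \]

for each \(i = 1, 2, \ldots, k\) and each \(j = 1, 2, \ldots, h\).

Example with another joint distribution:

X \ Y | 1 | 2 | 3
0 | 0.1 | 0.2 | 0.1
1 | 0.1 | 0.2 | 0.3

We can check the cell (X = 0, Y = 1), where \(P_X(0) = 0.1 + 0.2 + 0.1 = 0.4\) and \(P_Y(1) = 0.1 + 0.1 = 0.2\):

\[ P(X = 0, Y = 1) = 0.1 \]

\[ P_X(0) \cdot P_Y(1) = 0.4 \times 0.2 = 0.08 \]

Since 0.1 ≠ 0.08, X and Y are not independent. A single cell that fails the condition is enough.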
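The marginal and independence computations can be automated for any small joint table. This sketch (helper names `marginals` and `is_independent` are hypothetical) stores a joint pmf as a dictionary keyed by (x, y) pairs. It computes the marginals as row and column sums and checks the product condition cell by cell, using a small tolerance for floating-point error. It is applied to both tables of this section.

```python
def marginals(joint):
    """Row sums (marginal of X) and column sums (marginal of Y) of a discrete joint pmf."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def is_independent(joint, tol=1e-9):
    """Check Pr(X=x, Y=y) == Pr(X=x) * Pr(Y=y) in every cell; also return the first failing cell."""
    px, py = marginals(joint)
    for x in px:
        for y in py:
            if abs(joint.get((x, y), 0.0) - px[x] * py[y]) > tol:
                return False, (x, y)
    return True, None

# Student / satisfaction table, and the second table used in the independence example
satisfaction = {(0, 1): 0.20, (0, 2): 0.05, (0, 3): 0.15,
                (1, 1): 0.15, (1, 2): 0.35, (1, 3): 0.10}
second = {(0, 1): 0.1, (0, 2): 0.2, (0, 3): 0.1,
          (1, 1): 0.1, (1, 2): 0.2, (1, 3): 0.3}

print(marginals(satisfaction))
print(is_independent(satisfaction))
print(is_independent(second))
```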

Linear association measures

Covariance and Pearson's correlation coefficient: the definitions and interpretation are the same as in descriptive statistics.

Covariance
Formula:

\[ Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i=1}^{k} \sum_{j=1}^{h} (x_i - \mu_X)(y_j - \mu_Y)\, p(x_i, y_j) = E(XY) - \mu_X \mu_Y \]

Interpretation:

● Cov > 0 → when X increases, Y tends to increase (positive relationship)
● Cov < 0 → when X increases, Y tends to decrease (negative relationship)
● Cov = 0 → no linear relationship

Pearson's Correlation Index

\[ \rho(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}, \qquad -1 \le \rho(X, Y) \le 1 \]

Where:

● \(\sigma_X = \sqrt{Var(X)}\)
● \(\sigma_Y = \sqrt{Var(Y)}\)

X and Y independent → ρ(X, Y) = 0. The converse does not hold in general: ρ = 0 rules out only a linear association.
● ρ(X, Y) = 0 → NO linear association
● ρ(X, Y) > 0 → positive linear association
● ρ(X, Y) < 0 → negative linear association
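As an illustration (not taken from the lecture), the covariance and correlation of the student / satisfaction joint table can be computed straight from the definitions. The result is a very small positive covariance (0.01), with ρ ≈ 0.027, so there is essentially no linear association. The helper `expect` is hypothetical.

```python
import math

# Joint pmf of the student (X) / satisfaction (Y) example
joint = {
    (0, 1): 0.20, (0, 2): 0.05, (0, 3): 0.15,
    (1, 1): 0.15, (1, 2): 0.35, (1, 3): 0.10,
}

def expect(g, joint):
    """E[g(X, Y)] for a discrete joint pmf."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y), joint)     # Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)]
var_x = expect(lambda x, y: (x - mu_x) ** 2, joint)
var_y = expect(lambda x, y: (y - mu_y) ** 2, joint)
rho = cov / (math.sqrt(var_x) * math.sqrt(var_y))            # sigma_X = sqrt(Var(X)), sigma_Y = sqrt(Var(Y))

print(round(cov, 4), round(rho, 4))
```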

Sum and Difference of Two Random Variables

Bivariate Vector

● X →
○ \(E(X) = \mu_X\)
○ \(Var(X) = \sigma_X^2\)
● Y →
○ \(E(Y) = \mu_Y\)
○ \(Var(Y) = \sigma_Y^2\)
Sum of two Random Variables

● General case (X and Y possibly correlated):

\[ E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y \]

\[ Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) = \sigma_X^2 + \sigma_Y^2 + 2Cov(X, Y) \]

● X and Y not correlated (\(Cov(X, Y) = 0\)):

\[ E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y \]

\[ Var(X + Y) = Var(X) + Var(Y) = \sigma_X^2 + \sigma_Y^2 \]

Difference of two Random Variables

● General case (X and Y possibly correlated):

\[ E(X - Y) = E(X) - E(Y) = \mu_X - \mu_Y \]

\[ Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y) = \sigma_X^2 + \sigma_Y^2 - 2Cov(X, Y) \]

● X and Y not correlated (\(Cov(X, Y) = 0\)):

\[ E(X - Y) = E(X) - E(Y) = \mu_X - \mu_Y \]

\[ Var(X - Y) = Var(X) + Var(Y) = \sigma_X^2 + \sigma_Y^2 \]

Note that the variances add even for a difference.
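To see these rules at work, the sketch below uses the student / satisfaction joint table as an illustrative input. It builds the distributions of X + Y and X − Y directly from the joint pmf and compares their variances with \(\sigma_X^2 + \sigma_Y^2 \pm 2Cov(X, Y)\). With Var(X) = 0.24, Var(Y) = 0.59 and Cov = 0.01, this gives 0.85 for the sum and 0.81 for the difference. The helper `moments` is hypothetical.

```python
# Joint pmf of the student (X) / satisfaction (Y) example
joint = {
    (0, 1): 0.20, (0, 2): 0.05, (0, 3): 0.15,
    (1, 1): 0.15, (1, 2): 0.35, (1, 3): 0.10,
}

def moments(pairs):
    """Mean and variance of a discrete variable given (value, probability) pairs."""
    mean = sum(v * p for v, p in pairs)
    var = sum((v - mean) ** 2 * p for v, p in pairs)
    return mean, var

mu_x, var_x = moments([(x, p) for (x, _), p in joint.items()])
mu_y, var_y = moments([(y, p) for (_, y), p in joint.items()])
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

results = {}
for sign, label in ((1, "X + Y"), (-1, "X - Y")):
    # Distribution of X + Y (sign = 1) or X - Y (sign = -1), taken directly from the joint pmf
    mean_direct, var_direct = moments([(x + sign * y, p) for (x, y), p in joint.items()])
    var_formula = var_x + var_y + 2 * sign * cov      # sigma_X^2 + sigma_Y^2 +/- 2 Cov(X, Y)
    results[label] = (mean_direct, var_direct, var_formula)
    print(label, round(mean_direct, 4), round(var_direct, 4), round(var_formula, 4))
```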

I.I.D. Random Variables

n random variables \(X_1, X_2, \ldots, X_n\):

● Independent
● Identically distributed:
○ Same \(E(X_i) = \mu\)
○ Same \(Var(X_i) = \sigma^2\)
Sum of I.I.D Random Variables

\[ E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = n\mu \]

\[ Var(X_1 + X_2 + \cdots + X_n) = Var(X_1) + Var(X_2) + \cdots + Var(X_n) = n\sigma^2 \]

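Arithmetic average of I.I.D Random Variables

\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

● Expected value:

\[ E(\bar{X}) = E\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n} E(X_1 + X_2 + \cdots + X_n) = \frac{1}{n} n\mu = \mu \]

● Variance:

\[ Var(\bar{X}) = Var\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) = \frac{1}{n^2} Var(X_1 + X_2 + \cdots + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n} \]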
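A small simulation illustrates \(E(\bar{X}) = \mu\) and \(Var(\bar{X}) = \sigma^2/n\). The distribution (Exponential with rate 1, so µ = 1 and σ² = 1), the sample size n = 25 and the number of replications are arbitrary illustrative choices, not from the lecture. The mean of the simulated sample means should be close to 1, and their variance close to 1/25 = 0.04.

```python
import random
import statistics

random.seed(1)          # fixed seed so the run is reproducible
n, reps = 25, 20_000    # sample size and number of simulated samples (arbitrary choices)
mu, sigma2 = 1.0, 1.0   # Exponential(rate = 1) has mean 1 and variance 1

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

print("mean of sample means:    ", statistics.fmean(sample_means), "(theory:", mu, ")")
print("variance of sample means:", statistics.pvariance(sample_means), "(theory:", sigma2 / n, ")")
```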
🌸 cheat sheet 8
📑 Cheat Sheet — Lecture 8
● Linear transform: \(Y = a + bX \Rightarrow E(Y) = a + bE(X),\; Var(Y) = b^2 Var(X)\).
● Standardization: \(Z = \frac{X - \mu}{\sigma} \Rightarrow E(Z) = 0,\; Var(Z) = 1\).
● Bernoulli \(X \sim Ber(p)\): \(Pr(X = 1) = p,\; Pr(X = 0) = 1 - p;\; E(X) = p,\; Var(X) = p(1 - p)\).
● Normal \(N(\mu, \sigma^2)\): density \(f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2 / (2\sigma^2)}\); standardization \(Z = \frac{X - \mu}{\sigma} \sim N(0, 1)\); probabilities with pnorm, quantiles with qnorm.
● Joint discrete (X, Y): joint pmf \(p(x_i, y_j)\); independence iff \(p(x_i, y_j) = Pr_X(x_i)\,Pr_Y(y_j)\) for all i, j; correlation \(\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}\).
● Sum/Difference: \(E(X \pm Y) = \mu_X \pm \mu_Y\), \(Var(X \pm Y) = \sigma_X^2 + \sigma_Y^2 \pm 2\,Cov(X, Y)\).
If not correlated: \(Var(X \pm Y) = \sigma_X^2 + \sigma_Y^2\).
● i.i.d.: \(\sum X_i\) has \(E = n\mu,\; Var = n\sigma^2\); \(\bar{X}\) has \(E = \mu,\; Var = \sigma^2 / n\).
🍥 Lecture 8 pt 2
Lecture 8.2

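I.I.D Random Variables

\((X_1, X_2, \ldots, X_n)\):
- independent
- identically distributed

⇒ \(E(X_i) = \mu\)

⇒ \(Var(X_i) = \sigma^2\)

Sum Of I.I.D Random Variables

\[ T = X_1 + X_2 + \cdots + X_n \]

\[ E(T) = E(X_1 + \cdots + X_n) = E(X_1) + \cdots + E(X_n) = n\mu \]

\[ Var(T) = Var(X_1 + \cdots + X_n) = Var(X_1) + Var(X_2) + \cdots + Var(X_n) = n\sigma^2 \]

The variance of the sum splits into the sum of the variances because the \(X_i\) are independent (no covariance terms). Each term equals σ² because the \(X_i\) are identically distributed.

Arithmetic Average Of I.I.D Random Variables

\((X_1, X_2, \ldots, X_n)\) i.i.d.

\[ \bar{X} = \frac{X_1 + \cdots + X_n}{n} = \frac{T}{n} \]

\[ E(\bar{X}) = E\left(\frac{T}{n}\right) = \frac{1}{n} E(T) = \frac{1}{n} n\mu = \mu \]

\[ Var(\bar{X}) = Var\left(\frac{T}{n}\right) = \frac{1}{n^2} Var(T) = \frac{1}{n^2} n\sigma^2 = \frac{\sigma^2}{n} \]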
1. Example: \(y = a + bx\) with \(a = 0\) and \(b = 3\), i.e. \(y = 3x\):

\[ E(y) = E(3x) = 3E(x) \]

\[ Var(y) = Var(3x) = 9Var(x) \]

Central Limit Theorem

When we take many i.i.d. random variables with any distribution (not necessarily normal) and finite variance, the distribution of their sum or average becomes approximately normal as the sample size n grows large.

In simple terms

Even if the original data are not normally distributed, the sample mean (average of many
independent observations) will follow a bell-shaped (normal) distribution if the sample
size 𝑛 is big enough.

The idea behind it

When we add random variables together:

●​ Small random ups and downs tend to cancel out,


● The sum or average of many variables behaves in a predictable, smooth, and symmetric way.

Mathematical Formulation

Let \(X_1, X_2, \ldots, X_n\) be i.i.d. random variables with:

\[ E(X_i) = \mu, \qquad Var(X_i) = \sigma^2 \]

Sum of the variables:

\[ S_n = X_1 + X_2 + \cdots + X_n \]

Then for large n:

\[ S_n \approx N(n\mu, n\sigma^2) \]

That means:
● The mean (center) of \(S_n\) is \(n\mu\)
● The variance (spread) of \(S_n\) is \(n\sigma^2\)

Sample mean:

\[ \bar{X} = \frac{S_n}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]

Then for large n:

\[ \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right) \]

That means:

● The mean (center) of \(\bar{X}\) is µ
● The variance (spread) of \(\bar{X}\) is smaller: \(\sigma^2 / n\)

→ The more observations, the less variability in the sample mean.

Standardized version (Z form)

We can rewrite the theorem in standardized form, so that we can use the standard normal distribution N(0, 1):

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx N(0, 1) \]

That means: if we convert the sample mean into a z-score, it will follow an approximately standard normal distribution when n is large.

What makes CLT important

It lets us use normal probability methods even when data are not normal, as long as 𝑛
is large.
How the shape changes with n

n Shape of distribution of the mean

small n (e.g., 5) can be skewed or irregular

moderate n (e.g., 30) smoother, more symmetric

large n (e.g., 100+) very close to normal
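The table can be illustrated by simulation. The sketch below starts from a clearly skewed distribution, Exponential with rate 1 (an illustrative assumption). It computes the skewness of many simulated sample means for n = 5, 30 and 100. For this distribution, the skewness of the mean is exactly \(2/\sqrt{n}\), so the distribution of the mean should become visibly more symmetric as n grows. The helper `skewness` is hypothetical.

```python
import random
import statistics

def skewness(data):
    """Sample skewness: average cubed z-score (0 for a symmetric distribution)."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in data)

random.seed(2)
reps = 20_000                     # number of simulated samples per n (arbitrary choice)
skew_by_n = {}
for n in (5, 30, 100):
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
    skew_by_n[n] = skewness(means)
    print(n, round(skew_by_n[n], 3), "theory:", round(2 / n ** 0.5, 3))
```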

Summary of Key CLT Formulas

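Concept | Formula | Meaning
Sum of i.i.d. variables | \(S_n \approx N(n\mu, n\sigma^2)\) | The total becomes approximately normal
Mean of i.i.d. variables | \(\bar{X} \approx N(\mu, \sigma^2 / n)\) | The average becomes approximately normal
Standardized form | \(Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx N(0, 1)\) | Allows use of standard normal tables
Bernoulli sum | \(T \approx N(np,\, np(1 - p))\) | Approximates the binomial distribution
Sample proportion | \(P \approx N\left(p, \frac{p(1 - p)}{n}\right)\) | The sample proportion also follows the CLT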
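Sum Of Bernoulli Random Variables

\[ X_i \sim Ber(p) \;\Rightarrow\; E(X_i) = p \quad \& \quad Var(X_i) = p(1 - p) \]

\(X_i\) → 0 Failure, with probability \(1 - p\)
→ 1 Success, with probability \(p\)

\[ P(X_i = 0) = 1 - p, \qquad P(X_i = 1) = p \]

\[ E(X_i) = p, \qquad Var(X_i) = p(1 - p) \]

\(T = X_1 + \cdots + X_n\) = total number of successes in the n trials

\[ E(T) = nE(X_i) = np \]

\[ Var(T) = nVar(X_i) = np(1 - p) \]

By the CLT, for large n:

\[ T \approx N(np, np(1 - p)) \]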

Arithmetic Average Of Bernoulli Random Variables

Consider a vector \((X_1, X_2, \ldots, X_n)\) i.i.d. Ber(p).

\[ \bar{X} = \frac{T}{n} = P \]

→ Sample Proportion (proportion of successes in the n trials)

\[ E(P) = E(\bar{X}) = p \]

\[ Var(P) = Var(\bar{X}) = \frac{Var(X_i)}{n} = \frac{p(1 - p)}{n} \]

CENTRAL LIMIT TH.

For a large n:

\[ P = \bar{X} \approx N\left(p, \frac{p(1 - p)}{n}\right) \]
Example 1 — Sum of Bernoulli

Each student is admitted with probability p = 0.4.

\[ n = 150, \qquad E(X_i) = p = 0.4, \qquad Var(X_i) = p(1 - p) = 0.24 \]

\[ T = X_1 + \cdots + X_{150} \]

\[ E(T) = np = 60, \qquad Var(T) = np(1 - p) = 36 \]

\[ \Rightarrow T \approx N(60, 36) \quad \rightarrow \quad SD = \sqrt{36} = 6 \]

Using the CLT, we can now find probabilities for T with the normal distribution, e.g. pnorm(t, 60, 6) for \(Pr(T \le t)\).

Even though the original distribution (Bernoulli) is not normal, T is approximately normal for n = 150.
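How good is the approximation \(T \approx N(60, 36)\)? This sketch compares it with the exact binomial distribution of T, computed with `math.comb`. The threshold t = 65 is an illustrative choice, not from the lecture. The exact pmf also confirms E(T) = 60 and Var(T) = 36. Like the notes, the normal side uses no continuity correction.

```python
import math
from statistics import NormalDist

n, p = 150, 0.4

# Exact distribution of T = X1 + ... + X150: the binomial pmf
pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
mean_T = sum(k * pk for k, pk in enumerate(pmf))
var_T = sum((k - mean_T) ** 2 * pk for k, pk in enumerate(pmf))

# CLT approximation T ~ N(np, np(1-p)) = N(60, 36), i.e. SD = 6
approx = NormalDist(n * p, math.sqrt(n * p * (1 - p)))

t = 65   # illustrative threshold (hypothetical, not from the lecture)
exact = sum(pmf[: t + 1])
print(round(mean_T, 6), round(var_T, 6))
print("exact Pr(T <= 65):", round(exact, 4), " normal approx:", round(approx.cdf(t), 4))
```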

Example 2 — Sample Proportion

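\[ p = 0.4, \qquad n = 200, \qquad P = \frac{T}{n} \]

\[ E(P) = p = 0.4, \qquad Var(P) = \frac{p(1 - p)}{n} = \frac{0.24}{200} = 0.0012 \]

So:

\[ P \approx N(0.4, 0.0012) \]

To find \(Pr(P < 0.35)\):

\[ Pr(P < 0.35) = \texttt{pnorm(0.35, 0.4, sqrt(0.0012))} \approx 0.0745 \]

This means about 7.45% of samples will have a proportion below 0.35.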
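The same calculation in a few lines, assuming Python's `statistics.NormalDist` in place of `pnorm`. Because NormalDist is parametrized by the standard deviation, the variance 0.0012 must go through a square root first, exactly as in `sqrt(0.0012)` in the R call.

```python
import math
from statistics import NormalDist

p, n = 0.4, 200
var_P = p * (1 - p) / n                  # 0.0012
P = NormalDist(p, math.sqrt(var_P))      # CLT: P ~ N(p, p(1-p)/n); NormalDist takes the SD
prob = P.cdf(0.35)                       # R: pnorm(0.35, 0.4, sqrt(0.0012))
print(round(var_P, 4), round(prob, 4))
```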

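Example 1 (same as the one in the CLT slides, with the in-class solution):

n = 80 drivers. For driver i = 1, 2, ..., 80:

\[ X_i = \begin{cases} 1 & \text{self-pump, with probability } 0.7 \\ 0 & \text{otherwise, with probability } 0.3 \end{cases} \]

a) \(X_i \sim Ber(0.7)\)

\[ E(X_i) = 0.7, \qquad Var(X_i) = 0.7 \cdot 0.3 = 0.21 \]

\[ T = X_1 + \cdots + X_{80} \]

\[ E(T) = 80E(X_i) = 80 \cdot 0.7 = 56 \]

\[ Var(T) = 80Var(X_i) = 80 \cdot 0.21 = 16.8 \]

\[ T \approx N(56, 16.8) \]

b) 60% of 80 → 48, with \(T \approx N(56, 16.8)\):

\[ P(T \ge 48) = 1 - P(T \le 48) = 1 - \texttt{pnorm(48, 56, sqrt(16.8))} = 0.9745 \]

Equivalently, in terms of the sample proportion P = T/80, we want \(P(P \ge 0.6)\):

\[ E(P) = 0.7, \qquad Var(P) = \frac{0.7 \cdot 0.3}{80} = 0.002625 \approx 0.0026 \]

\[ P \approx N(0.7, 0.0026) \]

Since \(P \ge 0.6 \iff T \ge 48\), this gives the same probability, 0.9745.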
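Both routes in part b) can be checked numerically. In this sketch, with `statistics.NormalDist` standing in for `pnorm`, the count-based and proportion-based normal approximations give the same probability, because they describe the same event. The exact binomial value is added only for comparison; it is not part of the lecture solution.

```python
import math
from statistics import NormalDist

n, p = 80, 0.7
T = NormalDist(n * p, math.sqrt(n * p * (1 - p)))      # N(56, 16.8)
P = NormalDist(p, math.sqrt(p * (1 - p) / n))          # N(0.7, 0.002625)

prob_T = 1 - T.cdf(48)      # Pr(T >= 48), R: 1 - pnorm(48, 56, sqrt(16.8))
prob_P = 1 - P.cdf(0.6)     # Pr(P >= 0.6): the same event, so the same value
exact = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(48, n + 1))
print(round(prob_T, 4), round(prob_P, 4), round(exact, 4))
```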
🍥 Lecture 9
Oct 2, 2025

Lecture 9
Population & Sample
Population:
●​ set of all statistical units on which a characteristic may be measured.
Sample:
●​ any subset of the population.
Description vs Prediction:
●​ analysis should not stop at describing the sample
●​ aim is to make predictions about the population.
Inferential statistics:
●​ techniques to draw conclusions on the entire population from a sample.
Good sample:
●​ representative and obtained via random sampling.

Probabilistic framework
●​ express inferential problems in probability terms to quantify and control the
probability of errors.

Experiment: draw units at random and measure a characteristic on them.

Random variable (X) → mathematical description of the outcome of each draw.

Example

● \(X\) → hours per day spent watching TV by a randomly drawn respondent
● µ → mean (expected value) of X
○ Written as the operation → \(\mu = E[X]\)
● σ² → variance of X
○ Written as → \(\sigma^2 = Var(X)\)
● Population mean µ → arithmetic average of the number of hours among all adults (a parameter: an unknown number).
● Population variance σ² → the variance of the number of hours among all adults (a parameter: unknown).

Random Sample (I.I.D.)

● Draw n units at random → obtain n random variables \((X_1, \ldots, X_n)\).
● Random sample: \((X_1, \ldots, X_n)\) are independent and identically distributed (i.i.d.)
○ with the same mean µ and the same variance σ².
○ Sample realization → the concrete observed values \((x_1, \ldots, x_n)\).

Inferential Problems
Focus → inference on the population mean (μ)

Main tasks
●​ Point estimation
●​ Confidence interval estimation
●​ Hypothesis testing

Statistics
Sample statistic → a function of the random sample that summarizes its information
→ being a function of random variables, it is itself a random variable

Point Estimation Problem

Provide a guess or approximation of the unknown population mean.

● estimator → a sample statistic used to provide a guess of µ (the population mean)
● estimate → the estimator's observed value on the sample
Point Estimation of the Population Mean

Sample mean \(\bar{X} = \frac{X_1 + \cdots + X_n}{n}\) → estimator of the population mean µ

Observed value \(\bar{x}\) → estimate of µ


🍥 Lecture 10
Lecture 10
Properties of the Sample Mean
For an i.i.d. sample, the sample mean has:

I. Mean (expected value) → µ

\[ E(\bar{X}) = \mu \]

Interpretation → the average of all possible values of \(\bar{X}\) (over all possible samples) equals µ (the population mean)

⇒ sample mean \(\bar{X}\) → unbiased estimator (no systematic error).

II. Variance → \(\sigma^2 / n\)

\[ Var(\bar{X}) = \frac{\sigma^2}{n} \]

● The larger the sample size, the closer \(\bar{X}\) tends to be to the population mean
● In any single sample, the sample mean may overestimate or underestimate the population mean
● The spread of possible \(\bar{X}\) values decreases as n grows
⇒ sample mean \(\bar{X}\) → consistent estimator
● Standard deviation of the sample mean, \(\sigma / \sqrt{n}\) → standard error of the mean
○ It is the standard deviation of all possible sample means over all possible samples

III. From the CLT → for large n, the distribution of \(\bar{X}\) is approximately normal (Gaussian).
Point Estimation Of The Population Variance And Standard Deviation
Sample variance

\[ S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \]

● Sample variance S² → estimator of the population variance σ²
● Observed value s² → estimate of σ²

Unbiased estimator:

● \(E(S^2) = \sigma^2\)
● Sample variance S² → unbiased estimator of the population variance σ²

Sample standard deviation

● We use the sample standard deviation \(S = \sqrt{S^2}\) as an estimator of the population standard deviation σ
● The observed value of the sample standard deviation, s → estimate of the population standard deviation σ

Point Estimation Is Not Enough

Point estimation means using a single number calculated from a sample to estimate an unknown population parameter.

The formula is called the point estimator, and the resulting number is the point estimate.

Properties of a “good” point estimator:

1. Unbiased → its expected value equals the true parameter, e.g. \(E(\bar{X}) = \mu\)
2. Consistent → as the sample size n increases, the estimator gets closer to the true value
3. Efficient → among all unbiased estimators, it has the smallest variance

A single point estimate (like \(\bar{x}\)) is not enough. We must also evaluate how precise or accurate that estimate is.

Why accuracy matters

● Different random samples → different sample means (\(\bar{x}\)).
● Hence, even though \(\bar{x}\) estimates µ, it varies across samples.
●​ We need a way to quantify that variability — that’s where the Standard Error
(SE) comes in.

Standard Error

The standard deviation of the sampling distribution of \(\bar{X}\) is called the Standard Error (SE) of the mean.
It measures how much sample means tend to vary from one sample to another.

Formula:

\[ SE = SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]

● σ → population standard deviation
● n → sample size

Interpretation:

●​ Smaller SE → higher accuracy (sample mean is more stable)

●​ Larger SE → lower accuracy (sample mean varies more)

Estimation Of The Standard Error Of The Mean

For a sample of size n:

\[ Var(\bar{X}) = \frac{\sigma^2}{n} \]

Taking the square root:

\[ SD(\bar{X}) = \frac{\sigma}{\sqrt{n}} \]

This \(SD(\bar{X})\) is the Standard Error (SE) of the sample mean.


Meaning of SE

●​ It represents the standard deviation of all possible sample means over all
possible random samples of the same size.
● It measures the typical variability of \(\bar{X}\) around the true mean µ.

Estimating the SE of the Mean

This slide explains what to do when σ (the population standard deviation) is unknown, which is almost always the case in real life.

Since σ is unknown, we estimate it with the sample standard deviation S:

\[ \widehat{SE} = \frac{S}{\sqrt{n}} \qquad \text{(observed value: } s / \sqrt{n}\text{)} \]

This estimated SE is used to measure the accuracy of the point estimate \(\bar{x}\).

Example from Lecture (TV Hours)

Given:

● s = 2.5783
● n = 709

Then:

\[ \widehat{SE} = \frac{s}{\sqrt{n}} = \frac{2.5783}{\sqrt{709}} = \frac{2.5783}{26.627} \approx 0.0968 \]

Interpretation

● The sample mean (\(\bar{x} = 2.97\)) is expected to vary by about 0.0968 hours across repeated samples.
● This tells us that the estimate of the population mean is quite precise, because the SE is small.
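The arithmetic of the estimated standard error, as a one-line sketch in Python (for illustration only; the lecture values s = 2.5783 and n = 709 are the inputs):

```python
import math

s, n = 2.5783, 709          # sample SD and sample size from the lecture example
se_hat = s / math.sqrt(n)   # estimated standard error of the mean, s / sqrt(n)
print(round(se_hat, 4))
```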
Confidence intervals (CI) for μ
Confidence level

( 1 − α ) · 100%

where:

● α = probability of being wrong (the “risk”, or significance level),
● 1 − α = probability, over repeated samples, that the interval contains the true parameter.

Typical choices:

● α = 0.10 → 90% confidence interval
● α = 0.05 → 95% confidence interval
● α = 0.01 → 99% confidence interval

A confidence interval provides:

●​ A range of values (an interval),


●​ Constructed from the sample data,
●​ That is expected, with a chosen level of confidence, to contain the true population
parameter.

Interpretation

A (1 − α) · 100% confidence interval for a population parameter:

●​ interval of plausible values for that parameter


●​ constructed in such a way that we are (1 − α) · 100% confident it contains the
true value of the parameter

The Logic Behind It

Imagine we repeat the sampling process many times (same population, same sample size).​
Each time, we compute a confidence interval for the mean (μ).

Then:

●​ About 95% of these intervals (if the confidence level is 95%) will contain the true µ
●​ About 5% will not.
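Normal Population Known Variance

\(X_1, \ldots, X_n\) i.i.d. Normal, µ unknown, σ² known

\[ \Rightarrow \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{(normal, with mean } \mu \text{ and known variance } \sigma^2/n\text{)} \]

Standardization of the sample mean

\[ Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1) \]

→ has a standard normal distribution

Notation For Quantile Of A Standard Normal

\(z_\alpha\) → quantile of order \(1 - \alpha\) of a standard normal distribution (so \(z_{\alpha/2} = \texttt{qnorm}(1 - \alpha/2)\))

\[ Pr\left(\bar{X} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha \]

→ the random interval contains µ with probability 1 − α

Confidence Interval (1 − α)100% for µ

The confidence interval of level (1 − α)100% for the unknown population mean µ is:

\[ \left[\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right] \]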

Frequentist interpretation

Repeatedly draw samples of size n from the population → compute for each the confidence interval of level (1 − α)100% → about (1 − α)100% of the CIs will contain the unknown parameter µ
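The frequentist interpretation can be checked by simulation. This sketch draws many samples from a normal population with a known σ, builds the known-σ interval for each sample, and counts how often the interval contains µ. The population values (µ = 150, σ = 30), the sample size n = 100 and the number of repetitions are illustrative assumptions. For a 95% level, the share should come out close to 0.95. `ci_known_sigma` is a hypothetical helper.

```python
import math
import random
from statistics import NormalDist, fmean

def ci_known_sigma(xbar, sigma, n, alpha):
    """(1 - alpha)100% CI for mu when the population is normal and sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2} = qnorm(1 - alpha/2)
    half = z * sigma / math.sqrt(n)              # margin of error
    return xbar - half, xbar + half

random.seed(3)
mu, sigma, n, alpha, reps = 150, 30, 100, 0.05, 4000   # illustrative population and settings
hits = 0
for _ in range(reps):
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    low, high = ci_known_sigma(xbar, sigma, n, alpha)
    hits += low <= mu <= high                    # True counts as 1
coverage = hits / reps
print("share of intervals containing mu:", coverage)
```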

Length and Margin of Error

● \(z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) → margin of error
● \(2 z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) → length of the confidence interval
→ measure of accuracy

For a given confidence level and a given σ², the greater n:

● the lower the standard error
● the lower the margin of error
● the shorter the interval

The greater the confidence level (1 − α) → the lower α → the greater \(z_{\alpha/2}\) → the longer the confidence interval
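The two monotonic relationships above can be seen in a small table of margins of error. This is a sketch: σ = 30 and the sample sizes are illustrative, and `margin_of_error` is a hypothetical helper. Within a row (fixed confidence) the margin shrinks as n grows. Down a column (fixed n) it grows with the confidence level.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, conf):
    """z_{alpha/2} * sigma / sqrt(n) for confidence level conf = 1 - alpha."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)

sigma = 30   # illustrative value
for conf in (0.90, 0.95, 0.99):
    row = [round(margin_of_error(sigma, n, conf), 2) for n in (25, 100, 400)]
    print(conf, row)    # interval length = 2 * margin
```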

Example

\[ n = 100, \qquad \sigma = 30, \qquad \bar{x} = 150 \]

95% confidence interval for the population mean score µ:

\[ 1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \frac{\alpha}{2} = 0.025 \]

\[ z_{\alpha/2} = \texttt{qnorm}(1 - \alpha/2) = \texttt{qnorm}(0.975) = 1.96 \]

Standard error: \(\sigma / \sqrt{n} = 30 / \sqrt{100} = 3\).

The confidence interval for the mean score is:

\[ [150 - 3 \cdot 1.96,\; 150 + 3 \cdot 1.96] = [144.12,\; 155.88] \]

→ we are 95% confident that the population mean score µ is between 144.12 and 155.88
→ the procedure that produced this interval misses µ in about 5% of repeated samples
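The worked example can be reproduced step by step. This sketch assumes Python's `statistics.NormalDist().inv_cdf` in place of `qnorm` and uses the lecture's values n = 100, σ = 30 and \(\bar{x} = 150\).

```python
import math
from statistics import NormalDist

xbar, sigma, n, alpha = 150, 30, 100, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)     # qnorm(0.975), about 1.96
se = sigma / math.sqrt(n)                    # 30 / 10 = 3
ci = (xbar - z * se, xbar + z * se)
print(round(z, 2), se, [round(b, 2) for b in ci])
```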

Normal Population Unknown Variance

\(X_1, \ldots, X_n\) i.i.d. Normal, µ unknown, σ² unknown

\[ \bar{X} \to \mu, \qquad S \to \sigma, \qquad \frac{S}{\sqrt{n}} \to \frac{\sigma}{\sqrt{n}} \]

(each sample quantity on the left estimates the population quantity on the right)

When σ² is unknown, estimate it with S², and use Student's t distribution:

\[ T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1} \]

(Figure on p. 52 compares t densities with the standard normal.)

● t-quantile in R:

\[ t_{n-1,\,\alpha/2} = \texttt{qt}(1 - \alpha/2,\; n - 1) \]

● CI for µ with unknown variance:

\[ \left(\bar{x} - t_{n-1,\,\alpha/2} \frac{s}{\sqrt{n}},\; \bar{x} + t_{n-1,\,\alpha/2} \frac{s}{\sqrt{n}}\right) \]

● Worked example (TV time, n = 30):

\[ \bar{x} = 2.11, \quad s = 1.364, \quad \alpha = 0.10, \quad t_{29,\,0.05} = \texttt{qt}(0.95, 29) = 1.699 \]

\[ \frac{s}{\sqrt{n}} = \frac{1.364}{\sqrt{30}} = 0.249, \qquad \text{CI} = [2.11 - 1.699 \cdot 0.249,\; 2.11 + 1.699 \cdot 0.249] = [1.687,\; 2.533] \]
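The arithmetic of the t interval as a sketch. Python's standard library has no Student t quantile function, so the sketch hard-codes the value qt(0.95, 29) = 1.699 quoted above instead of computing it. The point is only to check the interval endpoints.

```python
import math

xbar, s, n = 2.11, 1.364, 30
t_q = 1.699                 # qt(0.95, 29) as quoted in the lecture (hard-coded: no t quantile in stdlib)
se_hat = s / math.sqrt(n)   # about 0.249
ci = (xbar - t_q * se_hat, xbar + t_q * se_hat)
print(round(se_hat, 3), [round(b, 3) for b in ci])
```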

● R shortcut for a CI on a mean:

	t.test(variable_name, conf.level = 1 - α)

(defaults to 95% if conf.level is omitted)

What to report with point estimates


●​ Point estimate alone is not enough; report an accuracy measure (SE or CI). (p. 24)
●​ SE gives a first accuracy gauge; CI pairs the estimate with an interval and an
explicit confidence level. (pp. 26, 29–32)

Cheat Sheet — Lecture 10


Core estimators

● Sample mean: \(\displaystyle \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\).
● Sample variance: \(\displaystyle S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1}\).
● Sample SD: \(\displaystyle S = \sqrt{S^2}\).

Key properties (i.i.d.)

● \(E(\bar{X}) = \mu\) (unbiased).
● \(Var(\bar{X}) = \frac{\sigma^2}{n}\) (precision increases with n).
● Large n: \(\bar{X}\) approximately normal.
● SE of mean: \(SE = \frac{\sigma}{\sqrt{n}}\), estimate \(\widehat{SE} = \frac{s}{\sqrt{n}}\).

CI for µ (Normal population)

● Known σ:
\[ \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}, \qquad z_{\alpha/2} = \texttt{qnorm}(1 - \alpha/2) \]
● Unknown σ:
\[ \bar{x} \pm t_{n-1,\,\alpha/2} \frac{s}{\sqrt{n}}, \qquad t_{n-1,\,\alpha/2} = \texttt{qt}(1 - \alpha/2,\; n - 1) \]
● Margin of error: \(z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) (or \(t_{n-1,\,\alpha/2} \frac{s}{\sqrt{n}}\)).
● Length: 2 × margin. Higher confidence ⇒ larger quantile ⇒ longer interval. (pp. 40, 45–47, 53–55)

Worked examples

● Known σ (aptitude test): n = 100, σ = 30, \(\bar{x} = 150\) → 95% CI [144.12, 155.88]. (pp. 42–44)
● Unknown σ (TV hours): n = 30, s = 1.364, \(\bar{x} = 2.11\) → 90% CI [1.687, 2.533]. (pp. 56–58)

R helper

● t.test(variable_name, conf.level = 1 - α) → reports the CI on the mean in its conf.int component (default 95%)
Tab 20
📘 Point Estimation — Definition
A point estimate is a single numerical value (a “point”) used to estimate an unknown population parameter.
population parameter.

🔹 1. Population Parameters vs. Sample Statistics


In statistics, we distinguish between:

● Parameters: numerical values that describe a population (fixed but unknown)
○ Example: population mean \(\mu\), population variance \(\sigma^2\)
● Statistics: numerical values that describe a sample (known and computable)
○ Example: sample mean \(\bar{x}\), sample variance \(s^2\)

Because we rarely have access to an entire population, we use sample statistics to estimate population parameters.

🔹 2. What a “Point Estimator” Is


A point estimator is a statistical formula that provides an estimate for a population parameter.

● For the mean, the estimator is:
\[ \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} \]
● For the variance, the estimator is:
\[ S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1} \]

These are random variables because they depend on the sample, which changes every time you sample again.

🔹 3. What a “Point Estimate” Is


Once you apply the estimator to your actual sample data, you get a numerical value: this is the point estimate.

● Example:
○ The formula \(\bar{X}\) is the point estimator of \(\mu\).
○ The calculated number \(\bar{x} = 2.97\) is the point estimate.

In symbols:
\[ \text{Point estimate of } \mu = \bar{x}. \]

🔹 4. Examples from the Lecture

Population parameter | Estimator (formula) | Example from lecture | Result (estimate)
Population mean \(\mu\) | \(\bar{X} = \frac{\sum X_i}{n}\) | Mean TV hours | \(\bar{x} = 2.97\)
Population variance \(\sigma^2\) | \(S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}\) | TV hours variance | \(s^2 = 6.6474\)
Population SD \(\sigma\) | \(S = \sqrt{S^2}\) | TV hours SD | \(s = 2.5783\)

So:

● \(\bar{X}\) is the point estimator of \(\mu\),
● \(\bar{x} = 2.97\) is the point estimate.

🔹 5. Properties of a “Good” Point Estimator


A point estimator should ideally be:

1. Unbiased → its expected value equals the true parameter
\[ E(\bar{X}) = \mu \]
2. Consistent → as the sample size n increases, the estimator gets closer to the true value
3. Efficient → among all unbiased estimators, it has the smallest variance

🔹 6. Summary Table
Concept | Symbol / Formula | Description
Point Estimator | Function of sample data (e.g., \(\bar{X}\), \(S^2\)) | Formula that estimates a population parameter
Point Estimate | Numerical result from applying the estimator (e.g., \(\bar{x} = 2.97\)) | Specific number used as the best guess of the parameter
Parameter | \(\mu, \sigma^2, \sigma\) | True (unknown) value describing the population
Goal | Estimate \(\mu, \sigma^2, \sigma\) with \(\bar{x}, s^2, s\) | Use sample data to infer population characteristics

✅ In short:
Point estimation means using a single number calculated from a sample to
estimate an unknown population parameter.​
The formula is called the point estimator, and the resulting number is the
point estimate.

Example:
The average number of TV hours in a sample is 2.97 →
this is the point estimate for the population mean \(\mu\) (the true average TV hours in the entire population).
📘 Slide 23 – Title: “Estimators of the population variance and standard deviation”
This slide explains how we estimate the population variance (σ²) and population
standard deviation (σ) when we only have sample data.

🔹 1. The Goal
In practice, we rarely know the true population parameters (σ² and σ).​
Therefore, we must use sample statistics as estimators of those parameters.

🔹 2. The Estimators
Sample variance (S²)

Mathematical operation:
\[ S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1} \]

● \(X_i\): value of the i-th observation
● \(\bar{X}\): sample mean
● \(n\): sample size
● The denominator n − 1 is used instead of n to make S² unbiased; this correction is called Bessel's correction.

Sample standard deviation (S)

Mathematical operation:
\[ S = \sqrt{S^2} \]

So once you calculate S², take the square root to get S.

🔹 3. Why we use n − 1 (Unbiasedness Property)

If we used n in the denominator, the sample variance would underestimate the true variance on average: its expected value would be \(\frac{n-1}{n}\sigma^2 < \sigma^2\).
Using n − 1 corrects for this bias, making S² an unbiased estimator of σ²:

\[ E(S^2) = \sigma^2 \]

This is called the unbiasedness property of the sample variance.
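A simulation makes the bias visible. This sketch uses illustrative settings (normal population with σ = 2, so σ² = 4, and very small samples of n = 5). It draws many samples and averages the sum of squared deviations divided by n − 1 and by n. The first average should be close to σ² = 4. The second should be close to \(\frac{n-1}{n}\sigma^2 = 3.2\).

```python
import random
from statistics import fmean

random.seed(4)
mu, sigma, n, reps = 0.0, 2.0, 5, 40_000      # small n makes the bias easy to see

s2_unbiased, s2_biased = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = fmean(x)
    ss = sum((xi - xbar) ** 2 for xi in x)    # sum of squared deviations from the sample mean
    s2_unbiased.append(ss / (n - 1))          # S^2 with the (n - 1) denominator
    s2_biased.append(ss / n)                  # the same sum divided by n

print("sigma^2 =", sigma**2)
print("average S^2 with n-1:", round(fmean(s2_unbiased), 3))
print("average with n:      ", round(fmean(s2_biased), 3), "(theory:", (n - 1) / n * sigma**2, ")")
```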

🔹 4. The Estimates (Observed Values)


When we compute S² and S from real data, their observed values are denoted by lowercase letters:

\[ s^2 = \text{observed sample variance}, \quad s = \text{observed sample standard deviation}. \]

These are point estimates of the population parameters σ² and σ.

🔹 5. Example (from the lecture context – TV hours dataset)


From the slide example:

● n = 709
● s² = 6.6474
● s = 2.5783

Interpretation:

● The sample variance s² = 6.6474 is our best estimate of the population variance σ².
● The sample standard deviation s = 2.5783 is our best estimate of the population standard deviation σ.

Thus:
\[ \hat{\sigma}^2 = 6.6474, \quad \hat{\sigma} = 2.5783. \]
🔹 6. Summary of Slide 23
Quantity | Formula | Purpose | Notes
Sample variance (S²) | \(S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}\) | Estimates population variance σ² | Uses n − 1 to be unbiased
Sample standard deviation (S) | \(S = \sqrt{S^2}\) | Estimates population standard deviation σ | Square root of variance
Unbiasedness | \(E(S^2) = \sigma^2\) | Ensures fair estimation of population spread | Important theoretical property
Point estimates | \(s^2, s\) | Numerical values computed from the data | Best single-number estimates

✅ Key takeaway from Slide 23:

The sample variance (S²) is an unbiased estimator of the population variance (σ²). The sample standard deviation (S) is the corresponding estimator of the population standard deviation (σ). S itself is slightly biased (on average it underestimates σ), but it is consistent. Both summarize how spread out the sample data are around the mean and serve as building blocks for later inferential procedures (like the standard error and confidence intervals).
R functions
prop.table()

📘 1. The context
In R, you started with a contingency table:

tab1 = table(work$transport, work$worker)

which looks like this:

transport salaried self-employed

private 76 42

public 64 18

Then, you used:

prop.table(tab1, 2)
prop.table(tab1, 1)

🔹 2. Meaning of prop.table(x, margin)

The prop.table() function converts the counts in a contingency table into relative frequencies (proportions).
frequencies (proportions).

●​ margin = 1 → proportions by row


●​ margin = 2 → proportions by column

🔹 3. When margin = 2
prop.table(tab1, 2)

→ divides each cell by the column total.

So, it answers the question:


“Among salaried workers and among self-employed workers, what proportion
use public vs private transport?”

transport salaried self-employed

private 0.5428571 0.7000000

public 0.4571429 0.3000000

Interpretation:

●​ Among salaried workers → 54.3% use private, 45.7% use public transport.
●​ Among self-employed workers → 70% use private, 30% use public transport.

So each column sums to 1.

🔹 4. When margin = 1
prop.table(tab1, 1)

→ divides each cell by the row total.

That means it gives conditional proportions within each transport category.

So, it answers the question:

“Among people who use private transport (and among those who use public
transport), what proportion are salaried vs self-employed?”

transport salaried self-employed

private 0.6440678 0.3559322

public 0.7804878 0.2195122

Interpretation:

●​ Among those who use private transport, 64.4% are salaried and 35.6% are
self-employed.
●​ Among those who use public transport, 78% are salaried and 22% are
self-employed.

So each row sums to 1.


🔹 5. Without specifying a margin
If you just run:

prop.table(tab1)

it divides each cell by the total of all cells, giving overall proportions (out of all 200 respondents).
The total of all cells = 1.
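The three divisions can be mirrored outside R. This is a minimal Python sketch of what the text describes for `prop.table`, not R's implementation. It uses the counts 76, 42 (private) and 64, 18 (public). These are the counts that reproduce all the proportions printed in this section and the grand total of 200. Row proportions divide by row totals, column proportions by column totals, and overall proportions by the grand total.

```python
# Counts: rows = transport, columns = worker type (the counts consistent with the printed proportions)
counts = {
    "private": {"salaried": 76, "self-employed": 42},
    "public":  {"salaried": 64, "self-employed": 18},
}
cols = ["salaried", "self-employed"]

row_tot = {r: sum(c.values()) for r, c in counts.items()}
col_tot = {k: sum(counts[r][k] for r in counts) for k in cols}
grand = sum(row_tot.values())

by_row = {r: {k: counts[r][k] / row_tot[r] for k in cols} for r in counts}   # like margin = 1
by_col = {r: {k: counts[r][k] / col_tot[k] for k in cols} for r in counts}   # like margin = 2
overall = {r: {k: counts[r][k] / grand for k in cols} for r in counts}       # no margin

for name, tab in (("margin = 1", by_row), ("margin = 2", by_col), ("no margin", overall)):
    print(name)
    for r in counts:
        print(f"  {r:<8}", "  ".join(f"{tab[r][k]:.7f}" for k in cols))
```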

✅ Summary Table
Command | Divides by | Shows proportions | Use when you want to know…
prop.table(tab1, 1) | Row totals | Within each transport type, how many are salaried vs self-employed | “Given transport, what is the working status?”
prop.table(tab1, 2) | Column totals | Within each worker type, how many use private vs public transport | “Given working status, what is the transport used?”
prop.table(tab1) | Grand total | Across all observations | Overall joint distribution

✅ In short
●​ 2 → column proportions → “within each worker group”
●​ 1 → row proportions → “within each transport group”
🍥 Lecture 13
NORMAL POPULATION KNOWN VARIANCE - intuition on the test’s derivation

Ex: Is there enough empirical evidence to presume that the mean weight of all produced cereal packages is greater than 16?
Xi = weight of a cereal package drawn at random

Test

Against
P - VALUE
🍥 Lecture 14
