
Statistics Lecture 1 Overview

Lecture 1 introduces key statistical concepts, including types of data (micro, macro, cross-sectional, longitudinal), the difference between population and sample, and the definitions of parameters and statistics. It emphasizes the importance of understanding variable types (categorical vs. numerical) and lays out the course goals and modules. Lecture 2 expands on statistical variables, frequency distribution, graphs, and measures of central tendency, including mode, median, and mean.

Uploaded by

sbbhjkrmmn


🍥 Lecture 1

Sep 4, 2025

Lecture 1: Introduction
● MICRO data - on an individual level

● MACRO data - aggregate level

● CROSS-SECTIONAL data - collected only once, at a fixed point in time

● LONGITUDINAL data - collected at different points in time:

○ TIME SERIES

○ PANEL DATA

○ REPEATED CROSS-SECTION

★ Primary data - collected directly by the researchers for their own study

★ Secondary data - collected by someone else and made available to us

POPULATION VERSUS SAMPLE

- Statistical units

PARAMETER - numerical summary at population level (e.g., average number of books read in the last year by all people aged 15 to 60 living in Milan)
- In practice it is unknown: we never observe the whole population
STATISTIC - numerical summary at sample level

TYPES OF VARIABLE

● categorical - not numbers

○ nominal - no ranking (ex: "what course do you study?" → the courses listed are the categories of a nominal variable)
○ ordinal - ranking (ex: "what is your highest level of education?" → secondary school, bachelor, master, PhD)
● Numerical

Lecture 1 Notes
SCOPE & CONTENT OF THE COURSE

● Statistics in everyday life: Numbers dominate society; statistics appear everywhere.

● Statistics = science of:
1. Designing studies
2. Analysing collected data
3. Translating data into knowledge for decisions & predictions
● Course goal: learn a method = problem-solving via knowledgeable use of
statistics:
1. Understand theory
2. Interpret results correctly
3. Apply techniques using R
● Learning materials: lecture notes, slides, Blackboard tests
● Software: R & RStudio (free to download)
● Modules:
1. Describing data
2. Elements of Probability
3. Inferential Statistics – Estimation
4. Inferential Statistics – Hypothesis Testing
5. Linear Regression

TERMINOLOGY & NOTATION

● Population (size N): all statistical units of interest
● Sample (size n): subset of population
● Inferential process: draw conclusions about population from sample
● Random sampling:
○ Units drawn one at a time
○ Equal probability for each unit
○ Equal probability for samples of same size
● Parameter: numerical summary at population level
● Statistic: numerical summary at sample level
○ Example:
■ Parameter = average books read by all people 15–60 in Milan
■ Statistic = average books read by random sample of people 15–60 in
Milan
● Descriptive statistics: describe data using sample statistics
● Inferential statistics: learn about population parameters via sample statistics
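To make the parameter/statistic distinction concrete, here is a small Python sketch (the course itself uses R). It simulates a hypothetical population of book counts, so the "parameter" can be computed here only because the data are artificial; the population size, value range and sample size are assumptions chosen for illustration. `random.sample` draws units without replacement, each with the same chance of selection, in line with the random-sampling rules above.

```python
import random
import statistics

# Hypothetical population: books read last year by N = 10,000 people.
# Simulated for illustration only; these are not course data.
random.seed(1)
population = [random.randint(0, 12) for _ in range(10_000)]

# Parameter: summary at population level (normally unknown in practice)
parameter = statistics.mean(population)

# Simple random sample of n = 200 units, drawn without replacement:
# every unit (and every subset of size n) has the same chance of selection.
sample = random.sample(population, k=200)

# Statistic: the same summary computed on the sample, used to estimate the parameter
statistic = statistics.mean(sample)

print(f"parameter (population mean) = {parameter:.3f}")
print(f"statistic (sample mean)     = {statistic:.3f}")
```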

📑 CHEAT SHEET (QUICK REFERENCE)

● Statistics = science of designing studies, analysing data, and translating data into knowledge.
● Population (N) vs. Sample (n).
● Parameter = population-level measure.
● Statistic = sample-level measure.
● Random sample = each unit/sample has equal chance.
● Descriptive statistics = describe sample data.
● Inferential statistics = infer about population from sample.
● Exams:
○ Test (8 MCQs, 16 pts, no penalties).
○ Written exam (3 open questions, 15 pts, R required).
○ Options: 2 partials (average) or 1 general exam (sum).
● Modules: Describing data → Probability → Estimation → Hypothesis testing →
Regression.
● Software: R & RStudio required.
🍥 Lecture 2
Sep 9, 2025

Lecture 2
STATISTICAL VARIABLES

● Statistical variable = result of measurement process; each question → variable

● Types of variables:
○ Categorical (non-numerical)
■ Nominal = no ranking (e.g., music type, program attended)
■ Ordinal = ranked (e.g., education level, agreement scale)
○ Numerical
■ Discrete = counting process (e.g., # books read, # siblings)
■ Continuous = measurable, not countable (e.g., height, commuting
time, house price)
● Not a priori: same characteristic can be categorical or numerical depending on
measurement (e.g., age as number vs. grouped intervals)
● Statistical tool selection depends on type of variable

FREQUENCY DISTRIBUTION TABLES

● Best when few distinct values

● First column = distinct values
● Second column =
○ Absolute frequency (counts)
○ Relative frequency (proportion)
● Examples:
○ Education levels → high school most frequent (50.9%)
○ Working status → full-time 47.7%, retired 18.2%, unemployed 4.8%
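A minimal Python sketch of building a frequency distribution table from raw answers; the answers list is hypothetical, not the survey data quoted above. `collections.Counter` gives the absolute frequencies, and dividing by n gives the relative ones.

```python
from collections import Counter

# Hypothetical answers to "What is your working status?" (illustrative data only)
answers = ["full-time", "retired", "full-time", "part-time", "unemployed",
           "full-time", "retired", "part-time", "full-time", "full-time"]

n = len(answers)
absolute = Counter(answers)            # absolute frequencies (counts)

print(f"{'value':<12}{'absolute':>9}{'relative':>10}")
for value, count in absolute.most_common():
    relative = count / n               # relative frequency (proportion)
    print(f"{value:<12}{count:>9}{relative:>10.2f}")
```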

GRAPHS

● Pie chart:
○ whole circle = total
○ slice = share
● Bar chart:
○ height of bar = frequency

HISTOGRAMS

● Used for numerical variables with many distinct values

● Steps:
○ Group into intervals (classes)
○ Count absolute/relative frequency
● Intervals:
○ Equal length → how many?
○ Unequal length → decide bounds
● Histogram rules:
○ Horizontal axis = intervals
○ Bar’s area = relative frequency
○ Vertical axis = density = (relative frequency ÷ interval length)
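The density rule can be checked with a short Python sketch. The commuting times and the unequal class bounds are hypothetical, and treating intervals as (lower, upper] is an assumed convention; the point is that bar heights are relative frequency ÷ interval length, so the bar areas add up to 1.

```python
# Hypothetical commuting times (minutes), grouped into unequal intervals
times = [5, 8, 12, 14, 18, 22, 25, 27, 33, 41, 47, 58]
bounds = [(0, 10), (10, 20), (20, 30), (30, 60)]   # (lower, upper] intervals

n = len(times)
densities = []
for lower, upper in bounds:
    count = sum(lower < t <= upper for t in times)   # absolute frequency
    rel_freq = count / n                              # relative frequency
    density = rel_freq / (upper - lower)              # bar height
    densities.append(density)
    print(f"({lower}, {upper}]  rel.freq = {rel_freq:.3f}  density = {density:.4f}")

# Bar areas (density x width) add up to the total relative frequency, i.e. 1
total_area = sum(d * (u - l) for (l, u), d in zip(bounds, densities))
```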

SYNTHETIC MEASURES

● Categories:
○ Central tendency (mode, median, mean)
○ Location
○ Dispersion

CENTRAL TENDENCY

● Purpose: condense data into one typical value

● Mode
● Median
● Mean (arithmetic average)

MODE

● Most frequent value (highest frequency)

● Can be computed for any type of variable; it is the only measure of central tendency available for nominal categorical variables.

MEDIAN
● Definition: The median is the central value of an ordered dataset.
○ It splits the dataset into two halves:
■ 50% of units below the median
■ 50% of units above the median

Median from Raw Data

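1. If n is odd:
a. the median is the value in position (n+1)/2 of the ordered data

$$\text{Median} = x_{\frac{n+1}{2}}$$

2. If n is even:
a. Two middle observations, in positions n/2 and n/2 + 1
b. If numerical → average of the two

$$\text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$$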

Example: Data = [1, 3, 1, 2, 5, 3]

● Ordered: [1, 1, 2, 3, 3, 5]
● n=6 (even) → Median = (2+3)/2=2.5
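The odd/even rule translates directly into code. This Python sketch (`median` is a helper written here for illustration, not a library function) reproduces the example above.

```python
def median(values):
    """Median of raw numerical data, following the odd/even rule."""
    x = sorted(values)                         # order the observations
    n = len(x)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        return x[(n + 1) // 2 - 1]             # position (n+1)/2, 1-based
    return (x[n // 2 - 1] + x[n // 2]) / 2     # average of positions n/2 and n/2+1

print(median([1, 3, 1, 2, 5, 3]))  # 2.5
```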

Cumulative Frequencies

How to compute the median using cumulative frequencies, both for discrete and
continuous data.

● Definition:
The cumulative relative frequency for each value is the proportion of
observations with values smaller than or equal to that value.
● Applicable to:
○ Ordinal variables (can be ranked)
○ Numerical variables
● Purpose: Helps locate medians, percentiles, and quantiles.

Example: Degree (Education Level)

Level                    Absolute freq.   Relative freq.      Cumulative freq.
Less than high school    82               82/709 = 0.116      0.116
High school              361              361/709 = 0.509     0.116 + 0.509 = 0.625
Junior college           50               50/709 = 0.071      0.625 + 0.071 = 0.696
Bachelor                 141              141/709 = 0.199     0.696 + 0.199 = 0.895
Graduate                 75               75/709 = 0.106      0.895 + 0.106 = 1
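A Python sketch that rebuilds the cumulative column of the Degree table and applies the rule "the median is the first category whose cumulative frequency exceeds 0.5". The category list must follow the ranking of the ordinal variable; the code assumes it is already in that order.

```python
from itertools import accumulate

# Degree distribution from the table (categories in their natural order)
levels = ["Less than high school", "High school", "Junior college",
          "Bachelor", "Graduate"]
counts = [82, 361, 50, 141, 75]

n = sum(counts)                                   # 709
relative = [c / n for c in counts]
cumulative = list(accumulate(relative))

for level, p, cp in zip(levels, relative, cumulative):
    print(f"{level:<22} {p:.3f} {cp:.3f}")

# Median of an ordinal variable: first category whose cumulative frequency exceeds 0.5
median_level = next(level for level, cp in zip(levels, cumulative) if cp > 0.5)
print("Median:", median_level)                    # High school
```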

Median from Frequency Distribution

● Case 1: no cumulative frequency is exactly 0.5

○ The median is the first value at which the cumulative frequency exceeds
0.5.

● Case 2: the cumulative frequency equals exactly 0.5 at some value

○ The median can be:
■ That value,
■ Any value in the interval between that value and the next observed value,
■ Or, for numerical variables, the average of these two interval bounds.
● Example (sports practice, times per week): the cumulative frequency is exactly 0.5 at value 1.

→ Median = 1 (times per week); by Case 2, any value between 1 and 2, or the midpoint 1.5, is also acceptable.

Median from a Histogram

● For continuous/grouped data shown in histograms:

○ Find the interval where the cumulative frequency first exceeds 0.5.
○ Interpolate within that interval if needed.
● Graphically:
○ The median is the vertical line splitting the histogram’s total area into two
equal halves (50% each).

● If the cumulative frequency surpasses 0.5 → median is that value (or interval).
● If it equals 0.5 exactly → median is that value, interval, or midpoint.
● In histograms → median divides the area into two equal halves.
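Interpolating inside the median interval can be sketched as follows, assuming (as a histogram implicitly does) that observations are spread uniformly within each interval. The intervals and relative frequencies are hypothetical, and `grouped_median` is a helper defined here for illustration.

```python
# Hypothetical grouped data: class intervals and their relative frequencies
intervals = [(0, 10), (10, 20), (20, 40), (40, 60)]
rel_freq = [0.25, 0.125, 0.5, 0.125]

def grouped_median(intervals, rel_freq):
    """Median by linear interpolation, assuming values are spread uniformly
    inside each interval (the usual assumption behind a histogram)."""
    cum_before = 0.0
    for (lower, upper), p in zip(intervals, rel_freq):
        if cum_before + p >= 0.5:                # median interval found
            share_needed = 0.5 - cum_before      # area still missing to reach 50%
            return lower + share_needed / p * (upper - lower)
        cum_before += p
    raise ValueError("relative frequencies must sum to 1")

print(grouped_median(intervals, rel_freq))  # 25.0
```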

MEAN

● Mean (Arithmetic average) = sum of all values ÷ number of observations

● Applicable only to numerical variables

Raw data

Formula:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

Deviation

● Deviation of observation $x_i$ from the mean:

$$d_i = x_i - \bar{x}$$
Properties of the Mean

1. Balancing Point:

a. The sum of the deviations is 0:

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

2. Minimization:
○ The mean minimizes the sum of squared deviations: the quantity below is smallest when $a = \bar{x}$

$$(x_1 - a)^2 + (x_2 - a)^2 + \dots + (x_n - a)^2$$
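Both properties can be verified numerically on a small hypothetical dataset. The grid search over `a` only illustrates the minimization property; it is not how the mean would be computed in practice.

```python
# Hypothetical data used only to check the two properties numerically
x = [2, 4, 4, 5, 10]
mean = sum(x) / len(x)                        # 5.0

# 1. Balancing point: deviations from the mean sum to zero
deviations = [xi - mean for xi in x]
print(sum(deviations))                        # 0.0

# 2. Minimization: the sum of squared deviations is smallest at a = mean
def sum_sq(a):
    return sum((xi - a) ** 2 for xi in x)

candidates = [a / 10 for a in range(0, 101)]  # a = 0.0, 0.1, ..., 10.0
best = min(candidates, key=sum_sq)
print(best, sum_sq(best))                     # 5.0 36.0
```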

Mean from Frequency Distribution

The slide explains how to compute the mean $\bar{x}$ when the data are summarized in a frequency
distribution table.

● When values are repeated with certain frequencies, instead of summing every
observation individually, we use the weighted mean formula:

$$\bar{x} = \frac{\sum_{i=1}^{k} x_i \cdot f_i}{n}$$

Where:

● 𝑥𝑖 = distinct value of the variable

● 𝑓𝑖 = absolute frequency (number of times xi appears)

● 𝑛 = total sample size

Alternatively, using relative frequencies ($p_i = f_i / n$):

$$\bar{x} = \sum_{i=1}^{k} x_i \cdot p_i$$

Value $x_i$   Absolute frequency $f_i$   Relative frequency $p_i$
0             60                         0.30
1             40                         0.20
2             60                         0.30
3             20                         0.10
4             20                         0.10
Total         200                        1.00

Step 1: Formula with absolute frequencies

$$\bar{x} = \frac{(0 \cdot 60) + (1 \cdot 40) + (2 \cdot 60) + (3 \cdot 20) + (4 \cdot 20)}{200} = \frac{300}{200} = 1.5$$

Step 2: Formula with relative frequencies

$$\bar{x} = (0 \cdot 0.30) + (1 \cdot 0.20) + (2 \cdot 0.30) + (3 \cdot 0.10) + (4 \cdot 0.10) = 1.5$$

Key Idea

● The mean from a frequency distribution is a weighted average.

● Weights are either the absolute frequencies (fi) or the relative frequencies (pi).
● This avoids listing all individual data values when summarizing grouped data.
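The sports-practice computation, written as a Python sketch: both weightings give the same mean. The second result may differ from 1.5 in the last floating-point digit, because decimals such as 0.30 are not exact in binary.

```python
values = [0, 1, 2, 3, 4]
abs_freq = [60, 40, 60, 20, 20]             # f_i
n = sum(abs_freq)                            # 200
rel_freq = [f / n for f in abs_freq]         # p_i = f_i / n

# Weighted mean with absolute frequencies (divide by n) ...
mean_abs = sum(x * f for x, f in zip(values, abs_freq)) / n
# ... and with relative frequencies (weights already sum to 1)
mean_rel = sum(x * p for x, p in zip(values, rel_freq))
print(mean_abs, mean_rel)
```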
MEAN VS. MEDIAN

● Mean: considers all values

● Median: depends on ranking/frequencies only
● Outliers:
○ Median = unaffected
○ Mean = affected (pulled toward outlier)
● Example (siblings distribution):
○ Median = 3
○ Mean = 3.54 → influenced by few with many siblings

OUTLIERS

● Definition: unexpectedly high or low values with low frequency

● Strong effect on the mean, weak or no effect on the median
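A quick Python illustration of outlier sensitivity with hypothetical sibling counts (not the lecture's data): replacing the largest value with an extreme one moves the mean but leaves the median unchanged.

```python
import statistics

# Hypothetical numbers of siblings for 9 respondents (illustrative only)
siblings = [1, 2, 2, 3, 3, 3, 4, 4, 5]
with_outlier = siblings[:-1] + [15]      # replace the largest value with an outlier

for label, data in [("original", siblings), ("with outlier", with_outlier)]:
    print(label, "mean =", round(statistics.mean(data), 2),
          "median =", statistics.median(data))
```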
🌸 cheat sheet 2
📑 CHEAT SHEET (QUICK REFERENCE)
● Variable types:
○ Categorical → Nominal (no rank), Ordinal (ranked)
○ Numerical → Discrete (counts), Continuous (measurements)
● Frequency distribution:
○ Absolute frequency = count
○ Relative frequency = proportion
● Graphs: Pie chart (share), Bar chart (frequency), Histogram (distribution of
continuous/discrete data)
● Histogram density = relative frequency ÷ interval length
● Central tendency measures:
○ Mode = most frequent value (nominal OK)
○ Median = central ranked value (ordinal/numerical)
○ Mean = arithmetic average (numerical only)
● Formulas:
○ Mean: $\bar{x} = \frac{\sum x_i}{n}$
○ Weighted mean: $\bar{x} = \sum (\text{value} \times \text{relative frequency}) = \sum x_i p_i$
○ Deviation: $x_i - \bar{x}$
● Properties of mean:
○ Balancing point: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
○ Minimizes $\sum (x_i - a)^2$ at $a = \bar{x}$
● Mean vs Median:
○ Mean → sensitive to outliers
○ Median → robust to outliers
● Outliers: rare, extreme values; they affect the mean, not the median
📖 formulas
1. Mean (Arithmetic Average) – Raw Data

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$$

● where 𝑥1, 𝑥2, …, 𝑥𝑛 are the observed values of the sample.

2. Property of the Mean (Balancing Point)

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$

3. Property of the Mean (Minimization)

$$\min_{a} \sum_{i=1}^{n} (x_i - a)^2 \ \text{occurs when} \ a = \bar{x}$$

4. Weighted Mean (from Frequency Distribution)

If values 𝑥𝑖 have relative frequencies 𝑝𝑖:

$$\bar{x} = \sum_{i=1}^{k} x_i \cdot p_i$$

5. Histogram Density

For an interval of length 𝐿 with relative frequency 𝑓:

$$\text{Density} = \frac{f}{L}$$
6. Example – Sports Practice Data

Frequencies:

● 0 (30%), 1 (20%), 2 (30%), 3 (10%), 4 (10%).

$$\bar{x} = (0 \cdot 0.30) + (1 \cdot 0.20) + (2 \cdot 0.30) + (3 \cdot 0.10) + (4 \cdot 0.10)$$

$$\bar{x} = 0 + 0.20 + 0.60 + 0.30 + 0.40 = 1.5$$

7. Median (Raw Data)

● If n odd:

$$\text{Median} = x_{\frac{n+1}{2}}$$

● If n even:

$$\text{Median} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$$

8. Median (Frequency Distribution – Cumulative Relative Frequency)

● Case 1: if no cumulative frequency equals 0.5 →

$$\text{Median} = \text{first value where the cumulative frequency} > 0.5$$

● Case 2: if the cumulative frequency equals 0.5 at value $x_j$:

$$\text{Median} = \frac{\text{lower bound of interval} + \text{upper bound of interval}}{2}$$
🍥 Lecture 3
Sep 11, 2025

Lecture 3: Measures of location

● Measures of location identify where a value lies in the sequence of ordered observations → used for ordinal and numerical variables
● Rank observations and divide in quarters:
- Lower quartile -> 𝑄1
- Second quartile -> median
- Upper quartile -> 𝑄3

QUARTILES FROM A FREQUENCY DISTRIBUTION

● Lower quartile -> First value where the cumulative frequency reaches or
exceeds 0.25

● Upper quartile -> First value where the cumulative frequency reaches or
exceeds 0.75

PERCENTILES
$p^{th}$ percentile: value such that p% of the observations are below it

value relative freq cumulative freq

0 0.30 0.30

1 0.20 0.50

2 0.30 0.80

3 0.10 0.90

4 0.10 1
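Applying the quartile rule ("first value whose cumulative frequency reaches or exceeds 0.25 / 0.75") to the table above, as a Python sketch. The small tolerance is an implementation detail added here to guard against floating-point sums; for this table Q1 = 0 and Q3 = 2, so IR = 2.

```python
from itertools import accumulate

values = [0, 1, 2, 3, 4]
rel_freq = [0.30, 0.20, 0.30, 0.10, 0.10]
cum_freq = list(accumulate(rel_freq))       # 0.30, 0.50, 0.80, 0.90, 1.00

def quantile_from_cumfreq(k, values, cum_freq, tol=1e-9):
    """First value whose cumulative frequency reaches or exceeds k.
    The tolerance guards against floating-point sums like 0.30 + 0.20."""
    for v, c in zip(values, cum_freq):
        if c >= k - tol:
            return v
    return values[-1]

q1 = quantile_from_cumfreq(0.25, values, cum_freq)   # 0
q3 = quantile_from_cumfreq(0.75, values, cum_freq)   # 2
print(q1, q3, q3 - q1)
```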
FIVE NUMBERS SUMMARY

1. Minimum
2. Lower quartile
3. Median
4. Upper quartile
5. Maximum

Height of the box = $Q_3 - Q_1$ = INTERQUARTILE RANGE (IR)

$$IR = Q_3 - Q_1$$
OUTLIER: any value $x < Q_1 - 1.5\,IR$ or $x > Q_3 + 1.5\,IR$
OUTLIER DETECTION FROM BOXPLOT

● Interquartile range IR -> upper quartile – lower quartile

● Outlier -> any value falling below the lower quartile or above the upper
quartile by more than 1.5 IR
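A sketch of the 1.5 IR outlier rule on raw data. The quartile convention used here (the value at position ⌈k·n⌉ of the ordered data, i.e. the first value whose cumulative relative frequency reaches k) is an assumption matching the frequency-distribution rule in these notes; R's default `quantile()` interpolates and may give slightly different quartiles, and therefore different fences. The data are hypothetical.

```python
import math

def quartile(sorted_x, k):
    """Value at position ceil(k * n) (1-based) of the ordered data:
    the first value whose cumulative relative frequency reaches k."""
    n = len(sorted_x)
    return sorted_x[math.ceil(k * n) - 1]

data = sorted([12, 15, 15, 17, 18, 19, 20, 21, 22, 24, 48])   # hypothetical
q1, q3 = quartile(data, 0.25), quartile(data, 0.75)
ir = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * ir, q3 + 1.5 * ir
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, ir, outliers)   # 15 22 7 [48]
```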

WHISKERS – NO OUTLIERS

No outliers (all values lie between $Q_1 - 1.5\,IR$ and $Q_3 + 1.5\,IR$):

● lower whisker connects the lower quartile to the minimum value

● upper whisker connects the upper quartile to the maximum value
MEASURES OF VARIABILITY
● Applied to numerical variables

Range

● max – min
● Affected by outliers

IQR (Interquartile range)

● Q3 – Q1 = IQR
● variability of central 50%

Variance and Standard Deviation

Deviation = value - mean

Variance ($s^2$)

$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}$$

→ variability around the mean

Standard deviation: $s = \sqrt{\text{variance}} = \sqrt{s^2}$
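The variance formula with the n − 1 denominator, as a Python sketch on hypothetical data. Python's `statistics.variance` and `statistics.stdev` also use n − 1, so they serve as a cross-check.

```python
import math
import statistics

x = [4, 8, 6, 5, 7]                                       # hypothetical data
mean = sum(x) / len(x)                                    # 6.0
s2 = sum((xi - mean) ** 2 for xi in x) / (len(x) - 1)     # n - 1 denominator
s = math.sqrt(s2)

# statistics.variance / statistics.stdev also use the n - 1 denominator
assert math.isclose(s2, statistics.variance(x))
assert math.isclose(s, statistics.stdev(x))
print(s2, round(s, 4))   # 2.5 1.5811
```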

COEFFICIENT OF VARIATION (CV)

● Formula:

$$CV = \frac{s}{|\bar{x}|}$$

● standard deviation relative to the mean (multiply by 100 to express it as a percentage of the mean)

Variability Comparison

● Variance / Standard Deviation → variables with the same mean

● Coefficient of Variation → variables with different means or unit of measurement
Example:
Sample of 100 students
SAT scores → mean 550, standard deviation 100
ACT scores → mean 18, standard deviation 6
SAT scores → CV = 100/550 = 0.18
ACT scores → CV = 6/18 = 0.33
→ ACT scores are more variable relative to their mean
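The SAT/ACT comparison as a tiny Python sketch; `coefficient_of_variation` is a helper written for illustration.

```python
def coefficient_of_variation(sd, mean):
    """CV = s / |mean|; undefined when the mean is 0."""
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return sd / abs(mean)

cv_sat = coefficient_of_variation(100, 550)
cv_act = coefficient_of_variation(6, 18)
print(round(cv_sat, 2), round(cv_act, 2))   # 0.18 0.33
```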
🌸 cheat sheet 3
📑 CHEAT SHEET (QUICK REFERENCE)
● Quartiles: Q1 = 25%, Q2 = 50% (median), Q3 = 75%
● Five-number summary: min, Q1, median, Q3, max
● Boxplot:
○ Whiskers = min/max (no outliers) OR within 1.5×IQR
○ Outliers = beyond 1.5×IQR
● Variability measures:
○ Range = max – min
○ IQR = Q3 – Q1
○ Variance (s²) = average squared deviations (n–1 denominator)
○ SD (s) = √variance
● Coefficient of Variation (CV): $s / |\bar{x}|$ → compares variability across scales
● Interpretation:
○ Range & variance affected by outliers
○ Median & IQR robust to outliers
🍥 Lecture 4
Sep 12, 2025

Lecture 4: Shape of Distributions

When we look at a numerical variable (like income, grades, height, etc.), we want to know
the shape of its distribution.

SHAPE OF DISTRIBUTIONS

● Applies to numerical variables

● Tools: Histogram or Boxplot

Symmetric distribution

● left and right sides mirror each other

● MEAN = MEDIAN

SYMMETRIC DISTRIBUTION

● Bell-shaped distribution
○ equal spacing of quartiles
● U-shaped distribution
○ median–Q1 > Q1–min

ASYMMETRIC DISTRIBUTION (SKEWED)

● Right-skewed (positively skewed):

○ Mean > Median.

● Left-skewed (negatively skewed):

○ Mean < Median.

MEAN AND MEDIAN:

● Symmetric distribution → mean = median

● Skewed-right distribution → mean > median
● Skewed-left distribution → mean < median

CONCENTRATION ANALYSIS
How equally or unequally is a variable (like income, wealth, sales, etc.) distributed among
individuals or units? Can one unit hold a huge share of the total, while others have little?

Used for numerical positive variables with transferable characteristics (e.g., income)
● Is the variable equally distributed among units?
● Do a few units hold most of the amount?

EXAMPLE:

4 households’ incomes: 40, 20, 30, 70

● Total = 160
● Mean income = 40

Extreme Scenarios:

Perfect equality:

● all units same amount

● Mean income 40 (160/4)

Maximum concentration:

● one unit has all (160), others zero

Goal: compare observed distribution to these extremes

How To Measure Concentration:

1. Concentration curve

2. Concentration index

CONCENTRATION CURVE

A graphical tool to see inequality.

NOTATION

Variable $X$ → values $x_1, x_2, \dots, x_n$ ranked in ascending order

Compute coordinates:

For each unit $i = 1, \dots, n$

$$F_i = \frac{i}{n} \qquad Q_i = \frac{x_1 + x_2 + \dots + x_i}{x_1 + x_2 + \dots + x_n}$$

$$F_0 = Q_0 = 0 \qquad F_n = Q_n = 1$$

Curve joining the n+1 points $(F_i, Q_i)$, $i = 0, 1, \dots, n$

● $Q_i$ → proportion of the total variable's amount held by the bottom $F_i$ proportion of units
● $(Q_i \cdot 100)\%$ → percentage of the total variable's amount held by the bottom $(F_i \cdot 100)\%$ of the sample

Example

● Point (0.25, 0.125) →

○ 12.5% of the total income held by the bottom 25% (poorest)
○ 87.5% of the total income is held by the top 75% (richest)
● Point (0.75, 0.5625) →
○ bottom 75% holds 56.25% of the total income
○ top 25% holds 43.75% of the total income
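The coordinates of the concentration curve for the four-household example (incomes 40, 20, 30, 70), computed with a Python sketch. Sorting in ascending order before accumulating is essential; the output reproduces the points (0.25, 0.125) and (0.75, 0.5625) discussed above.

```python
from itertools import accumulate

incomes = sorted([40, 20, 30, 70])            # ascending order: 20, 30, 40, 70
n, total = len(incomes), sum(incomes)         # n = 4, total = 160

F = [i / n for i in range(n + 1)]                       # F_0 = 0, ..., F_n = 1
Q = [0.0] + [c / total for c in accumulate(incomes)]    # Q_0 = 0, ..., Q_n = 1

for f, q in zip(F, Q):
    print(f"({f:.2f}, {q:.4f})")
```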

COORDINATES: Extreme situation

● Perfect equality: line Q = F
● Perfect concentration: curve flat at 0 until last unit

Interpretation:

● If the curve is close to the diagonal line → distribution is equal.

● If the curve is close to the x-axis → distribution is very unequal.
● Closer to diagonal → lower concentration
● Closer to x-axis → higher concentration

Properties:

● Non-decreasing → monotone
● Convex, lying on or below the diagonal → $Q_i \le F_i$

CONCENTRATION INDEXES

GINI’S INDEX (R):

$$R = \frac{(F_1 - Q_1) + (F_2 - Q_2) + \dots + (F_{n-1} - Q_{n-1})}{F_1 + F_2 + \dots + F_{n-1}}$$
● Perfect equality → $F_i = Q_i$ for all $i$ → $R = 0$
● Maximum concentration → all $Q_i = 0$ for $i < n$, $Q_n = 1$ → $R = 1$
● Range: [0,1]
● higher = more concentrated
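Gini's R for the same four households, following the formula above (the sums run over i = 1, …, n − 1). `gini_R` is a helper written here for illustration; it assumes at least two units and a positive total. The two extra calls check the extreme scenarios.

```python
from itertools import accumulate

def gini_R(values):
    """Concentration index R = sum_{i<n}(F_i - Q_i) / sum_{i<n} F_i."""
    x = sorted(values)
    n, total = len(x), sum(x)
    F = [i / n for i in range(1, n)]                       # F_1 .. F_{n-1}
    Q = [c / total for c in list(accumulate(x))[:-1]]      # Q_1 .. Q_{n-1}
    return sum(f - q for f, q in zip(F, Q)) / sum(F)

print(round(gini_R([40, 20, 30, 70]), 4))   # 0.3333
print(gini_R([40, 40, 40, 40]))             # 0.0 (perfect equality)
print(gini_R([0, 0, 0, 160]))               # 1.0 (maximum concentration)
```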

PIETRA’S INDEX (P):

● Perfect equality → 𝑃 = 0
● Maximum concentration → 𝑃 = 1

INTERPRETATION:

● Values closer to 0 → lower concentration (more equal)

● Values closer to 1 → greater concentration (more concentrated)

Example:
🌸 cheat sheet 4
CHEAT SHEET (QUICK REFERENCE)
● Shape:
○ Symmetric → mean = median
○ Right-skewed → mean > median
○ Left-skewed → mean < median
● Concentration:
○ Perfect equality: all equal
○ Max concentration: one unit holds all
● Concentration curve:
○ $F_i = i/n$, $Q_i$ = cumulative x / total x
○ Perfect equality → diagonal line
○ Max concentration → flat at 0 until last unit
● Indexes:
○ Gini (R): [0,1]; higher = more concentrated
○ Pietra (P): [0,1]; higher = more concentrated
🍥 Lecture 5
Sep 12, 2025

Lecture 5
AIM OF BIVARIATE DESCRIPTIONS

Bivariate statistics = techniques to study the joint behavior of two variables at sample
level

ASYMMETRIC PERSPECTIVE
Investigating bivariate association → assessing dependence

● Does one variable depend on the other?

● Can one variable’s variability be explained by another?

Roles of variables

● Dependent variable → measures what we want to explain

● Independent variable → measures proposed explanation

TECHNIQUES
Depends on the variables' types
CASES

● Both variables (dependent & independent) → categorical

● Dependent → numerical; independent → categorical
● Both variables (dependent & independent) → numerical
CONTINGENCY TABLES

Cross-table (Cross - Tab)

● Rows = values of one variable

● Columns = values of another variable
● Cells = joint counts or joint proportions
● Marginal distributions = univariate distributions (row totals, column totals)

Example: Streaming Activity Vs Subscription

● Streaming (dependent variable): Light, Moderate, Heavy

● Subscription (independent variable): Free, Standard, Premium

Univariate descriptives show distributions separately, but cannot detect association

[Figure: cross-tab of the example with its marginal distributions]
Relative Frequencies

● Each cell can be expressed as a relative frequency of total sample

● Example:
○ Free & Light = 0.21
○ Premium & Heavy = 0.13
● Marginal totals represent univariate distributions

CONDITIONAL FREQUENCY DISTRIBUTIONS

● Independent variable splits the sample into groups (sub-samples)
● Within each group, compute relative frequencies of the dependent variable =
conditional frequency distribution
● Compare conditional distributions:
○ Association assessment → comparison
■ Similar cond freq distr → no association
■ Different cond freq distr → association

Example

● Independent = Subscription (Free, Standard, Premium)

● Dependent = Streaming activity
● Compute 3 separate conditional frequency distributions:
○ Streaming | Free
○ Streaming | Standard
○ Streaming | Premium
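A Python sketch of conditional frequency distributions. The counts are hypothetical, invented only to show the mechanics; they are not the lecture's figures. Joint relative frequencies divide each cell by the grand total, while conditional ones divide by the row total of each Subscription group, so every conditional distribution sums to 1.

```python
# Hypothetical cross-tab of counts:
# rows = Subscription (independent), columns = Streaming (dependent)
streaming_levels = ["Light", "Moderate", "Heavy"]
crosstab = {
    "Free":     [50, 30, 10],
    "Standard": [20, 55, 25],
    "Premium":  [10, 30, 50],
}

n = sum(sum(row) for row in crosstab.values())

# Joint relative frequencies: cell count / total sample size
joint = {g: [c / n for c in row] for g, row in crosstab.items()}

# Conditional distribution of Streaming within each Subscription group:
# cell count / row total
conditional = {g: [c / sum(row) for c in row] for g, row in crosstab.items()}

for group, dist in conditional.items():
    shares = ", ".join(f"{lvl} {p:.2f}" for lvl, p in zip(streaming_levels, dist))
    print(f"Streaming | {group}: {shares}")
```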
🌸 cheat sheet 5
📑 CHEAT SHEET (QUICK REFERENCE)
● Bivariate statistics: study joint behavior of 2 variables, assess association
● Dependent variable: explained; Independent variable: explanation
● Cases:
○ Categorical vs categorical
○ Numerical vs categorical
○ Numerical vs numerical
● Contingency table (cross-tab):
○ Rows = categories of one variable
○ Columns = categories of another
○ Cells = joint counts/proportions
○ Marginals = univariate totals
● Relative frequencies: cell count ÷ total
● Conditional frequency distribution: relative distribution of dependent variable
within each group of independent variable
● Association test:
○ Conditional distributions similar → no association
○ Conditional distributions different → association present
🍥 Lecture 6
Lecture 6
Example (lecture 5)
● Relative frequency distribution of Streaming separately for each of the three
groups:
○ CFD of Streaming given Subscription = Free

○ CFD of Streaming given Subscription = Standard

○ CFD of Streaming given Subscription = Premium

Streaming vs Subscription example:

○ Free users: mostly Light (0.56)

○ Standard users: mostly Moderate (0.55)
○ Premium users: mostly Heavy (0.54)

→ Streaming depends on Subscription

Association:

● Association between Streaming and Subscription → Streaming depends on Subscription

Charts of Conditional Distributions

Stacked bar plot

→ shows conditional distribution stacked in each group

Side-by-side bar plot

→ compares conditional frequencies across groups

Means Comparison

● Dependent variable: numerical (e.g., streaming minutes/day)

● Independent variable: categorical (e.g., device used)
● Procedure:
○ Split sample by groups of IV
○ Compute group means of DV
○ Compare means
○ If means close → no association; if different → possible association
● Example: Device used for streaming
○ Phone: mean = 143.7 min, SD = 38.6
○ Laptop: mean = 154.8 min, SD = 36.4
○ SmartTV: mean = 174.7 min, SD = 36.5
○ → Suggests association between device and streaming time
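The means-comparison procedure as a Python sketch, with hypothetical observations (they are not the data behind the Phone/Laptop/SmartTV figures above): split the sample by the categorical variable, then compute the mean and SD of the numerical variable in each group.

```python
import statistics
from collections import defaultdict

# Hypothetical (device, minutes/day) observations, for illustration only
obs = [("Phone", 120), ("Phone", 150), ("Phone", 160),
       ("Laptop", 140), ("Laptop", 165), ("Laptop", 158),
       ("SmartTV", 170), ("SmartTV", 190), ("SmartTV", 165)]

groups = defaultdict(list)
for device, minutes in obs:          # split the sample by the categorical IV
    groups[device].append(minutes)

for device, minutes in groups.items():
    print(f"{device:<8} mean = {statistics.mean(minutes):6.1f}  "
          f"sd = {statistics.stdev(minutes):5.1f}")
```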
Association Between Numerical Variables

Bivariate Association

Scatterplot

○ X-axis → independent variable

○ Y-axis → dependent variable
● Example: Internet penetration vs Facebook penetration
○ Higher internet use → higher Facebook use
○ Trend shows positive relationship

Linear Association

● Positive: high values of one variable with high values of other

● Negative: high values of one with low values of other

Measures of linear association

Covariance

Formula:

$$Cov(X, Y) = \frac{(x_1 - \bar{x})(y_1 - \bar{y}) + (x_2 - \bar{x})(y_2 - \bar{y}) + \dots + (x_n - \bar{x})(y_n - \bar{y})}{n - 1}$$
● $x_1$ → value of X observed on unit 1
● $y_1$ → value of Y observed on unit 1

● $\bar{x}$ → mean of X
● $\bar{y}$ → mean of Y

Interpretation:

● Positive covariance → positive linear association

● Negative covariance → negative linear association
● Covariance = 0 → no linear association

Example: Internet vs Facebook penetration → Cov = 219.9 (positive)

● Quadrants are formed by the lines $x = \bar{x}$ and $y = \bar{y}$
● Points in quadrants I and III → positive contribution to the covariance
● Points in quadrants II and IV → negative contribution to the covariance

● Majority of points in quadrants I and III → positive linear association

● Overall trend → line with positive slope

● Majority of points in quadrants II and IV → negative linear association

● Overall trend → line with negative slope
Pearson’s Correlation Index (r)

Formula:

$$r = \frac{Cov(X, Y)}{s_X \cdot s_Y}$$

● $s_X$ → standard deviation of X

● $s_Y$ → standard deviation of Y

Range: $-1 \le r \le 1$

Measures:

● Direction of the linear association

● Strength of the linear association

Interpretation:

● r = 0 → no linear correlation (no linear association)
● r > 0 → positive linear association
● r < 0 → negative linear association
● The closer |r| is to 1 → the stronger the association
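Covariance and Pearson's r computed directly from their formulas, as a Python sketch on hypothetical data (not the Internet/Facebook data). The standard deviations also use n − 1, so the denominators are consistent in r.

```python
import math

def covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """r = Cov(X, Y) / (s_X * s_Y); Cov(X, X) is the variance s_X^2."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(covariance(x, y), round(pearson_r(x, y), 3))
```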
🌸 cheat sheet 6
📑 Cheat Sheet (Quick Reference)
● Conditional frequency distributions:
○ Compare groups → Similar = no association; Different = association
● Means comparison (numerical DV vs categorical IV):
○ Compare group means → similar = no association; different = association
● Scatterplot: visual relationship (X = IV, Y = DV)
● Covariance:
○ + = positive trend
○ – = negative trend
○ 0 = no linear trend
● Pearson’s correlation (r):
○ Between -1 and +1
○ Measures direction & strength of linear association
○ Example: r = 0.614 → moderate-high positive
🍥 Lecture 7
Lecture 7: Probability & Random Variables
The Need for Probability
● Probability → quantifies uncertainty
● Inferential process → conclusions affected by uncertainty
● Mathematical → probabilistic framework

Random Variables
Random Experiments

● Experiment = process with uncertain outcome

● Examples:
○ Roll a die → outcomes {1,2,3,4,5,6}
○ Draw a household → measure income
○ Draw a firm → measure employees
○ Draw an adult → measure TV hours or museum visits

Random Variable

● Random variable (X) → function that associates a number with each outcome of the experiment

○ Numerical measurement of an outcome of an experiment

EXAMPLE:

Toss two coins. Number of observed heads

● Variable → 0, 1, 2 possible values
● Random → unknown value before the experiment

Types:

○ Discrete: finite/countable outcomes (e.g., coin tosses)

○ Continuous: uncountable outcomes over an interval
Discrete Random Variables

X discrete with a finite number of values → $x_1, x_2, \dots, x_k$

Probability Mass Function (pmf)

The pmf, for each value $x_i$, returns:

$$Pr(X = x_i) = p_i \quad \text{for } i = 1, 2, \dots, k$$

Conditions:

● $0 \le p_i \le 1$
● $p_1 + p_2 + \dots + p_k = 1$

Example (two coins, X = # of heads):

● Values: {0,1,2}
● Probabilities: 0.25, 0.50, 0.25

Expected Value (Mean):

X discrete with a finite number of values → $x_1, x_2, \dots, x_k$

$$Pr(X = x_i) = p_i$$

$$E(X) = \mu = x_1 p_1 + x_2 p_2 + \dots + x_k p_k$$

Example:

No. of heads in the toss of 2 coins

Expected value of the number of heads:

$$E(X) = 0 \cdot 0.25 + 1 \cdot 0.5 + 2 \cdot 0.25 = 1$$
Variance and Standard Deviation of a Random Variable

Variance

$$Var(X) = \sigma^2 = (x_1 - \mu)^2 p_1 + (x_2 - \mu)^2 p_2 + \dots + (x_k - \mu)^2 p_k$$

Standard deviation

$$\sigma = \sqrt{Var(X)} = \sqrt{\sigma^2}$$
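The two-coin example checked in Python: verify the pmf conditions, then compute E(X), Var(X) and σ from the formulas above.

```python
import math

# pmf of X = number of heads in two tosses of a fair coin
pmf = {0: 0.25, 1: 0.50, 2: 0.25}
assert all(0 <= p <= 1 for p in pmf.values()) and math.isclose(sum(pmf.values()), 1)

mu = sum(x * p for x, p in pmf.items())                 # E(X)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())    # Var(X)
sd = math.sqrt(var)                                     # SD(X)
print(mu, var, round(sd, 4))   # 1.0 0.5 0.7071
```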

Median, quartiles, percentiles

● defined same as descriptive statistics

● e.g., median → value such that there is 50% probability of observing a smaller value

Continuous Random Variables

● Can take any value in an interval
● Described by probability density function (pdf):
○ Must be non-negative
○ Total area under curve = 1
● Probability of interval: area under curve between bounds

Expected value, variance, quantiles: interpreted as in descriptive statistics

Quantile of order k

● Let k, 0 < k < 1

● Quantile of order k of the random variable X → value q such that:
○ There is a probability equal to k
○ To observe a value ≤ q

𝑃𝑟( 𝑋 ≤ 𝑞 ) = 𝑘
🌸 cheat sheet 7
📑 Cheat Sheet (Quick Reference)
● Probability = quantification of uncertainty; used in decisions & inference.
● Random variable (X) = numerical outcome of random experiment.
● Discrete random variable:
○ pmf: $Pr(X = x_i) = p_i$, $\sum p_i = 1$
○ Mean: $E(X) = \sum x_i p_i$
○ Variance: $Var(X) = \sum (x_i - \mu)^2 p_i$
○ SD: $\sigma = \sqrt{Var(X)}$
● Continuous random variable:
○ pdf: non-negative, total area = 1
○ Probabilities = areas under pdf
○ Quantiles: Pr(X≤q)=k
● Median = value with 50% probability below it.
🍥 Lecture 8
Lecture 8
Linear Transformation
For a random variable X and constants a, b, define:

𝑌 = 𝑎 + 𝑏𝑋 𝑌 → linear transformation of 𝑋

Expected Value & Variance of Y

𝐸(𝑌) = 𝐸(𝑎 + 𝑏𝑋) = 𝑎 + 𝑏𝐸(𝑋)

$$Var(Y) = Var(a + bX) = b^2 \, Var(X)$$

Example

Payment $P = 10000 + 1.5S$, where S is the number of units sold,

$E(S) = 30000$, $SD(S) = 8000$

$$E(P) = E(10000 + 1.5S) = 10000 + 1.5\,E(S) = 10000 + 1.5 \cdot 30000 = 55000$$

$$Var(P) = Var(10000 + 1.5S) = Var(1.5S) = 1.5^2 \, Var(S)$$

$$Var(P) = 1.5^2 \cdot SD^2(S) = 1.5^2 \cdot 8000^2 = 144{,}000{,}000 \quad \Rightarrow \quad SD(P) = 1.5 \cdot 8000 = 12000$$
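A small Python helper (hypothetical, for illustration) that applies the linear-transformation rules to the payment example; note that the standard deviation scales by |b|, whereas the variance scales by b².

```python
import math

def linear_transform_moments(a, b, mean_x, sd_x):
    """E(a + bX) = a + b E(X);  Var(a + bX) = b^2 Var(X);  SD = |b| SD(X)."""
    mean_y = a + b * mean_x
    var_y = b ** 2 * sd_x ** 2
    return mean_y, var_y, math.sqrt(var_y)

# Payment P = 10000 + 1.5 S with E(S) = 30000, SD(S) = 8000
print(linear_transform_moments(10000, 1.5, 30000, 8000))
# (55000.0, 144000000.0, 12000.0)
```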

Standardization
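For a random variable X with $E(X) = \mu$ and $Var(X) = \sigma^2$

Standardized variable

$$Z = \frac{X - \mu}{\sigma}$$

Expected value & Variance of Z

$$Z = \frac{X - \mu}{\sigma} = \frac{1}{\sigma}X - \frac{\mu}{\sigma}$$

● Expected Value:

$$E(Z) = \frac{1}{\sigma}E(X) - \frac{\mu}{\sigma} = \frac{\mu}{\sigma} - \frac{\mu}{\sigma} = 0$$

● Variance:

$$Var(Z) = \frac{1}{\sigma^2}Var(X) = \frac{\sigma^2}{\sigma^2} = 1$$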

Bernoulli Distribution
Experiment with two outcomes → success / failure

● 𝑝 → success probability
● $X$ → number of successes observed (0 or 1)
● Value 1 → probability 𝑝
● Value 0 → probability 1 − 𝑝

Notation: 𝑋 ∼ 𝐵𝑒𝑟(𝑝)

Examples:

● Number of heads in one toss of a coin → Bernoulli random variable with success
probability p=0.5
● Number of sixes observed when rolling a die once → Bernoulli random variable with
success probability p = 1/6

Probability distribution

𝑋 ∼ 𝐵𝑒𝑟(𝑝)

Distribution parametrized by $p$:

$$Pr(X = 1) = p, \qquad Pr(X = 0) = 1 - p$$

Expected Value & Variance

𝐸(𝑋) = 0 · (1 − 𝑝) + 1 · 𝑝 = 𝑝

$$Var(X) = (0 - p)^2 (1 - p) + (1 - p)^2 p = p(1 - p)$$
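A Python check of the Bernoulli mean and variance using exact fractions, for the die example with p = 1/6; the result matches p and p(1 − p) = 5/36. `bernoulli_moments` is a helper written here for illustration.

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Mean and variance of X ~ Ber(p), computed directly from the pmf."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(x * px for x, px in pmf.items())
    var = sum((x - mean) ** 2 * px for x, px in pmf.items())
    return mean, var

# Number of sixes in one roll of a fair die: p = 1/6
mean, var = bernoulli_moments(Fraction(1, 6))
print(mean, var)   # 1/6 5/36
```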

Normal (Gaussian) Distribution

Density

$X \sim N(\mu, \sigma^2)$

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

● $\mu$ → real number
● $\sigma^2$ → positive real number

● $\mu$ → expected value and median (mean = median = mode)

● $\sigma^2$ → variance

Standard normal distribution

$\mu = 0$, $\sigma^2 = 1$: $Z \sim N(0, 1)$
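The density formula written out in Python and cross-checked against `statistics.NormalDist` from the standard library. Note the parametrization: the formula uses the variance σ², while `NormalDist` (like R's `dnorm`) takes the standard deviation σ.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma2):
    """Density of N(mu, sigma^2), written exactly as the formula."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# Cross-check against the standard library (NormalDist takes the SD, not the variance)
for x in (-1.0, 0.0, 2.5):
    assert math.isclose(normal_pdf(x, 3, 25), NormalDist(mu=3, sigma=5).pdf(x))

print(round(normal_pdf(0, 0, 1), 4))   # 0.3989, peak of the standard normal
```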

Standardization of a normal random variable

$$X \sim N(\mu, \sigma^2) \;\Rightarrow\; Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
Commands for normal distribution
how to compute probabilities using R functions

Cumulative probability

For $X \sim N(\mu, \sigma^2)$

pnorm(x, mean = µ, sd = σ)

returns

$$Pr(X \le x)$$

Cumulative Probability Examples:

$X \sim N(3, 5^2)$: normal distribution with $E(X) = 3$, $SD(X) = 5$ ($\sigma = 5$)

$$Pr(X \le 4.5) = Pr(X < 4.5) = \texttt{pnorm(4.5, mean = 3, sd = 5)} = 0.6179114$$

$$Pr(X > 6) = 1 - Pr(X \le 6) = 1 - \texttt{pnorm(6, 3, 5)} = 1 - 0.7257469 = 0.2742531$$

Interval probability:

$$Pr(2.5 \le X \le 5) = Pr(X \le 5) - Pr(X \le 2.5) = \texttt{pnorm(5, 3, 5)} - \texttt{pnorm(2.5, 3, 5)} = 0.195$$
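The same three probabilities in Python, using `statistics.NormalDist(mu, sigma).cdf` as a stand-in for R's `pnorm`; as in R, the second parameter is the standard deviation (5), not the variance (25).

```python
from statistics import NormalDist

X = NormalDist(mu=3, sigma=5)        # X ~ N(3, 5^2); sigma is the SD

p_le_45 = X.cdf(4.5)                 # Pr(X <= 4.5), like pnorm(4.5, 3, 5)
p_gt_6 = 1 - X.cdf(6)                # Pr(X > 6) = 1 - Pr(X <= 6)
p_between = X.cdf(5) - X.cdf(2.5)    # Pr(2.5 <= X <= 5)

print(round(p_le_45, 4), round(p_gt_6, 4), round(p_between, 4))
# 0.6179 0.2743 0.1952
```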

Quantile

qnorm( k, mean = μ, sd = σ )

For $X \sim N(\mu, \sigma^2)$, qnorm(k, µ, σ)

returns the quantile of order $k$, the value $q$ such that

$$Pr(X \le q) = k$$

Example:

We are working with a normally distributed variable $X \sim N(3, 5^2)$.

We want to find the value $q_{0.8}$ (a quantile or cut-off point) such that $Pr(X \le q_{0.8}) = 0.8$.

That means we are looking for the 80th percentile: the value below which 80% of all
the possible X values lie.

Step 1 — What qnorm() does

● The function qnorm(p, mean, sd) gives the quantile 𝑞𝑝 for a normal distribution
with mean = μ and standard deviation = σ.
● In mathematical terms, it finds the x-value that satisfies:

𝑃(𝑋 ≤ 𝑞𝑝) = 𝑝

So, when we call qnorm(0.8, mean = 3, sd = 5), R is finding the x-value $q_{0.8}$ such that:

$$P(X \le q_{0.8}) = 0.8$$

Step 2 — Link between standard and general normal

To find $q_{0.8}$ for $X \sim N(3, 5^2)$, we can reason through standardization (R's result is equivalent).

It uses the property of the normal distribution:

$$Z = \frac{X - \mu}{\sigma} \quad \text{and hence} \quad X = \mu + \sigma Z$$

where $Z \sim N(0, 1)$ (the standard normal distribution) → $\mu = 0$ & $\sigma = 1$

Step 3 — Find the z-value for probability 0.8

We want the value $z_{0.8}$ such that $P(Z \le z_{0.8}) = 0.8$

From standard normal tables (or qnorm(0.8) for the standard case):

$$z_{0.8} = 0.8416$$

That means:

$$P(Z \le 0.8416) = 0.8$$

Using R:

qnorm(0.8)
[1] 0.8416212

Step 4 — Convert the z-value back to the X scale

Now we transform $z_{0.8}$ back into $x$ using the formula:

$$q_{0.8} = \mu + \sigma \cdot z_{0.8}$$

Substitute the numbers:

$$q_{0.8} = 3 + 5 \cdot 0.8416 = 3 + 4.208 = 7.208$$
Step 5 — Using R (as shown on the slide)

R does all these steps internally.

You just write:

qnorm(0.8, mean = 3, sd = 5)

and R returns:

[1] 7.208106

This is the 80th percentile, i.e.:

$$P(X \le 7.208) = 0.8$$

Step 6 — Interpretation (what the result means)

● If X represents, for example, the number of hours watched, a value of 7.21 (approx)
is such that:
○ 80% of people (or observations) watch ≤ 7.21 hours.
○ 20% watch more than 7.21 hours.

Summary of the mathematical process

Step | Formula | Explanation
1 | $\Pr(X \le q_{0.8}) = 0.8$ | Definition of the quantile
2 | $Z = \frac{X - 3}{5}$ | Standardize X
3 | $\Pr(Z \le 0.8416) = 0.8 \Rightarrow z_{0.8} = 0.8416$ | 80th percentile of the standard normal
4 | $q_{0.8} = 3 + 5 \cdot 0.8416 = 7.208$ | Convert back to the original scale
5 | R command: qnorm(0.8, mean = 3, sd = 5) | Automatically gives 7.208
6 | Interpretation | 80% of values lie below 7.208

Final Answer

$q_{0.8} = 7.208$ and $\Pr(X \le 7.208) = 0.8$
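The standardization route (Steps 3 and 4) and the direct qnorm route can be compared numerically. Below is a small sketch that again assumes `statistics.NormalDist.inv_cdf` as the Python counterpart of R's `qnorm`.

```python
from statistics import NormalDist

mu, sigma, k = 3, 5, 0.8

# Step 3: z-value of the standard normal, the analogue of qnorm(0.8) in R
z_k = NormalDist().inv_cdf(k)

# Step 4: convert back to the X scale, q = mu + sigma * z
q_manual = mu + sigma * z_k

# Direct route, the analogue of qnorm(0.8, mean = 3, sd = 5)
q_direct = NormalDist(mu, sigma).inv_cdf(k)

print(round(z_k, 4), round(q_manual, 6), round(q_direct, 6))  # 0.8416 7.208106 7.208106

# Check the defining property Pr(X <= q) = k
print(round(NormalDist(mu, sigma).cdf(q_direct), 6))
```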

Random Vector
Goal: study the joint probabilistic behaviour of two or more random variables.

We focus on the case of two discrete random variables.

Discrete Bivariate Vector

𝑋 → discrete random variable taking values 𝑥1, 𝑥2 ..., 𝑥𝑘

𝑌 → discrete random variable taking values 𝑦1, 𝑦2, ..., 𝑦ℎ

Marginal probability functions

𝑃𝑟𝑋(𝑋 = 𝑥𝑖) 𝑓𝑜𝑟 𝑖 = 1, 2, ..., 𝑘

𝑃𝑟𝑌(𝑌 = 𝑦𝑗) 𝑓𝑜𝑟 𝑗 = 1, 2, ..., ℎ

Joint probability function

→ joint probabilistic behaviour

For each pair (𝑥𝑖, 𝑦𝑗) returns:

𝑃𝑟(𝑋 = 𝑥𝑖 𝑎𝑛𝑑 𝑌 = 𝑦𝑗) = 𝑃𝑟(𝑋 = 𝑥𝑖 , 𝑌 = 𝑦𝑗) = 𝑝(𝑥𝑖 , 𝑦𝑗)

Represent with a cross-tab.

Example

● 𝑋 →
○ 1 → respondent is a student
○ 0 → otherwise
● 𝑌 →
○ 1 → unsatisfied
○ 2 → indifferent
○ 3 → satisfied

Joint Probability Distribution

X \ Y | 1 | 2 | 3
0 | 0.20 | 0.05 | 0.15
1 | 0.15 | 0.35 | 0.10

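Reading joint probabilities from the table:

$\Pr(X = 0, Y = 1) = 0.2$, $\Pr(X = 1, Y = 3) = 0.1$

Marginals (row and column sums):

$\Pr_X(0) = 0.4$, $\Pr_X(1) = 0.6$; $\Pr_Y(1) = 0.35$, $\Pr_Y(2) = 0.40$, $\Pr_Y(3) = 0.25$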
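Marginals are just the row and column sums of the joint table. Here is a minimal sketch that stores the joint probability function as a dictionary keyed by (x, y) pairs. The data structure is an illustrative choice.

```python
# Joint probability function p(x, y) from the student / satisfaction table
joint = {
    (0, 1): 0.20, (0, 2): 0.05, (0, 3): 0.15,
    (1, 1): 0.15, (1, 2): 0.35, (1, 3): 0.10,
}

xs = sorted({x for x, _ in joint})  # values of X: 0, 1
ys = sorted({y for _, y in joint})  # values of Y: 1, 2, 3

# Marginal of X = row sums; marginal of Y = column sums
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

print({x: round(p, 2) for x, p in p_x.items()})  # {0: 0.4, 1: 0.6}
print({y: round(p, 2) for y, p in p_y.items()})  # {1: 0.35, 2: 0.4, 3: 0.25}
```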
Independence condition

Definition:
X and Y are independent if knowing one gives no information about the other.

Two discrete random variables are statistically independent if and only if

𝑃𝑟(𝑋 = 𝑥𝑖 , 𝑌 = 𝑦𝑗) = 𝑃𝑟(𝑋 = 𝑥𝑖)𝑃𝑟(𝑌 = 𝑦𝑗)

For each 𝑖 = 1, 2, ..., 𝑘 and each 𝑗 = 1, 2, ..., ℎ

X | Y=1 | Y=2 | Y=3
0 | 0.1 | 0.2 | 0.1
1 | 0.1 | 0.2 | 0.3

We can check:

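$\Pr(X = 0, Y = 1) = 0.1$

$\Pr_X(0) \cdot \Pr_Y(1) = 0.4 \times 0.2 = 0.08$, where $\Pr_X(0) = 0.1 + 0.2 + 0.1 = 0.4$ and $\Pr_Y(1) = 0.1 + 0.1 = 0.2$

Since $0.1 \ne 0.08$, X and Y are not independent (one cell that fails the condition is enough).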
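Independence must hold in every cell, so a check has to loop over all pairs $(x_i, y_j)$. The sketch below uses the same dictionary representation. `marginals` and `is_independent` are hypothetical helper names. The second table is built on purpose as the product of the first table's marginals, so it is independent by construction.

```python
from math import isclose

def marginals(joint):
    """Row (X) and column (Y) sums of a joint pmf stored as {(x, y): p}."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return p_x, p_y

def is_independent(joint, tol=1e-9):
    """True only if p(x, y) = Pr_X(x) * Pr_Y(y) holds for EVERY cell."""
    p_x, p_y = marginals(joint)
    return all(isclose(p, p_x[x] * p_y[y], abs_tol=tol) for (x, y), p in joint.items())

# Table used in the independence check above
table = {(0, 1): 0.1, (0, 2): 0.2, (0, 3): 0.1,
         (1, 1): 0.1, (1, 2): 0.2, (1, 3): 0.3}
print(is_independent(table))  # False: e.g. 0.1 != 0.4 * 0.2 = 0.08

# Same marginals, but every cell set to the product of the marginals
product = {(x, y): px * py for x, px in {0: 0.4, 1: 0.6}.items()
                           for y, py in {1: 0.2, 2: 0.4, 3: 0.4}.items()}
print(is_independent(product))  # True
```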

Linear association measures

COVARIANCE and PEARSON'S CORRELATION COEFFICIENT → definition and interpretation as in descriptive statistics

Covariance
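Formula:
$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i=1}^{k} \sum_{j=1}^{h} (x_i - \mu_X)(y_j - \mu_Y)\, p(x_i, y_j) = E(XY) - \mu_X \mu_Y$
Interpretation: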

● 𝐶𝑜𝑣 > 0 → when X increases, Y tends to increase (positive relationship)

● 𝐶𝑜𝑣 < 0 → when X increases, Y tends to decrease (negative relationship)
● 𝐶𝑜𝑣 = 0 → no linear relationship

Pearson’s Correlation Index

$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$

$-1 \le \rho(X, Y) \le 1$
Where:

● $\sigma_X = \sqrt{Var(X)}$

● $\sigma_Y = \sqrt{Var(Y)}$

𝑋 and 𝑌 independent → ρ( 𝑋 , 𝑌 ) = 0
● ρ( 𝑋 , 𝑌 ) = 0 → NO linear association
● ρ( 𝑋 , 𝑌 ) > 0 → positive linear association
● ρ( 𝑋 , 𝑌 ) < 0 → negative linear association
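The notes give no numbers for the covariance, so this sketch computes Cov(X, Y) and ρ for the student/satisfaction joint table. It assumes the satisfaction codes 1, 2, 3 can be treated as numeric values (they are really ordinal), so the result is only illustrative.

```python
from math import sqrt

# Joint pmf of the student (X) / satisfaction (Y) example; Y codes 1, 2, 3 treated as numbers
joint = {(0, 1): 0.20, (0, 2): 0.05, (0, 3): 0.15,
         (1, 1): 0.15, (1, 2): 0.35, (1, 3): 0.10}

def expect(f):
    """E[f(X, Y)] = sum over all cells of f(x, y) * p(x, y)."""
    return sum(f(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)
mu_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mu_x) ** 2)
var_y = expect(lambda x, y: (y - mu_y) ** 2)

# Covariance from the definition E[(X - mu_X)(Y - mu_Y)]
cov = expect(lambda x, y: (x - mu_x) * (y - mu_y))
rho = cov / (sqrt(var_x) * sqrt(var_y))  # sigma = sqrt(Var), not Var

print(round(cov, 4), round(rho, 4))  # 0.01 0.0266
```

A ρ this close to 0 indicates a very weak positive linear association.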

Sum and Difference of Two Random Variables

Bivariate Vector

● X →
○ $E(X) = \mu_X$
○ $Var(X) = \sigma_X^2$
● Y →
○ $E(Y) = \mu_Y$
○ $Var(Y) = \sigma_Y^2$
Sum of two Random Variables

● General case (X and Y may be correlated):

$E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y$

$Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) = \sigma_X^2 + \sigma_Y^2 + 2Cov(X, Y)$

● X and Y not correlated ($Cov(X, Y) = 0$):

$E(X + Y) = E(X) + E(Y) = \mu_X + \mu_Y$

$Var(X + Y) = Var(X) + Var(Y) = \sigma_X^2 + \sigma_Y^2$

Difference of two Random Variables

● General case (X and Y may be correlated):

$E(X - Y) = E(X) - E(Y) = \mu_X - \mu_Y$

$Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y) = \sigma_X^2 + \sigma_Y^2 - 2Cov(X, Y)$

● X and Y not correlated ($Cov(X, Y) = 0$):

$E(X - Y) = E(X) - E(Y) = \mu_X - \mu_Y$

$Var(X - Y) = Var(X) + Var(Y) = \sigma_X^2 + \sigma_Y^2$ (the variances still add; they do not subtract)
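The Var(X ± Y) formulas can be verified on a concrete joint distribution by computing the variance of X + Y and X − Y cell by cell. This sketch reuses the student/satisfaction table and assumes the Y codes can be treated as numbers.

```python
# Same joint pmf as the student / satisfaction example (Y codes treated as numbers)
joint = {(0, 1): 0.20, (0, 2): 0.05, (0, 3): 0.15,
         (1, 1): 0.15, (1, 2): 0.35, (1, 3): 0.10}

def mean_var(f):
    """Mean and variance of the random variable f(X, Y), computed cell by cell."""
    m = sum(f(x, y) * p for (x, y), p in joint.items())
    v = sum((f(x, y) - m) ** 2 * p for (x, y), p in joint.items())
    return m, v

mu_x, var_x = mean_var(lambda x, y: x)
mu_y, var_y = mean_var(lambda x, y: y)
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

# Direct computation on the distributions of X + Y and X - Y ...
m_sum, v_sum = mean_var(lambda x, y: x + y)
m_diff, v_diff = mean_var(lambda x, y: x - y)

# ... agrees with Var(X +/- Y) = Var(X) + Var(Y) +/- 2Cov(X, Y)
print(round(v_sum, 4), round(var_x + var_y + 2 * cov, 4))   # 0.85 0.85
print(round(v_diff, 4), round(var_x + var_y - 2 * cov, 4))  # 0.81 0.81
```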

I.I.D. Random Variables

n random variables $X_1, X_2, ..., X_n$:

● Independent
● Identically distributed:
○ Same $E(X_i) = \mu$
○ Same $Var(X_i) = \sigma^2$
Sum of I.I.D Random Variables

$E(X_1 + X_2 + ... + X_n) = E(X_1) + E(X_2) + ... + E(X_n) = n\mu$

$Var(X_1 + X_2 + ... + X_n) = Var(X_1) + Var(X_2) + ... + Var(X_n) = n\sigma^2$

Arithmetic average of I.I.D Random Variables

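$\bar{X} = \frac{X_1 + X_2 + ... + X_n}{n}$

● Expected value:

$E(\bar{X}) = E\left(\frac{X_1 + ... + X_n}{n}\right) = \frac{1}{n} E(X_1 + ... + X_n) = \frac{1}{n} n\mu = \mu$

● Variance:

$Var(\bar{X}) = Var\left(\frac{X_1 + ... + X_n}{n}\right) = \frac{1}{n^2} Var(X_1 + ... + X_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$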
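A quick simulation illustrates $E(\bar{X}) = \mu$ and $Var(\bar{X}) = \sigma^2/n$. The sketch rests on illustrative assumptions: Uniform(0, 1) draws (μ = 0.5, σ² = 1/12), n = 25, 20,000 repetitions and a fixed seed.

```python
import random
from statistics import fmean, pvariance

random.seed(1)  # fixed seed so the run is reproducible

n = 25          # sample size (illustrative choice)
reps = 20_000   # number of simulated samples (illustrative choice)
mu, sigma2 = 0.5, 1 / 12  # mean and variance of a Uniform(0, 1) variable

# Draw many i.i.d. samples of size n and record each sample mean
means = [fmean(random.random() for _ in range(n)) for _ in range(reps)]

print(round(fmean(means), 4))      # close to mu = 0.5
print(round(pvariance(means), 5))  # close to sigma2 / n
print(round(sigma2 / n, 5))        # 0.00333
```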
🌸 cheat sheet 8
📑 Cheat Sheet — Lecture 8
● Linear transform: $Y = a + bX \Rightarrow E(Y) = a + bE(X)$, $Var(Y) = b^2 Var(X)$.
● Standardization: $Z = \frac{X - \mu}{\sigma} \Rightarrow E(Z) = 0$, $Var(Z) = 1$.
● Bernoulli $X \sim Ber(p)$: $\Pr(X = 1) = p$, $\Pr(X = 0) = 1 - p$; $E(X) = p$, $Var(X) = p(1 - p)$.
● Normal $N(\mu, \sigma^2)$: density $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x - \mu)^2/(2\sigma^2)}$; standardization $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$; probabilities with pnorm, quantiles with qnorm.
● Joint discrete (X, Y): joint pmf $p(x_i, y_j)$; independence iff $p(x_i, y_j) = \Pr_X(x_i)\Pr_Y(y_j)$ for all i, j; correlation $\rho = \frac{Cov(X, Y)}{\sigma_X \sigma_Y}$.
● Sum/Difference: $E(X \pm Y) = \mu_X \pm \mu_Y$, $Var(X \pm Y) = \sigma_X^2 + \sigma_Y^2 \pm 2Cov(X, Y)$.
If not correlated: $Var(X \pm Y) = \sigma_X^2 + \sigma_Y^2$.
● i.i.d.: $\sum X_i$ has $E = n\mu$, $Var = n\sigma^2$; $\bar{X}$ has $E = \mu$, $Var = \sigma^2/n$.
🍥 Lecture 8 pt 2
Lecture 8.2

I.I.D Random Variables

$(X_1, X_2, ..., X_n)$:
- independent
- identically distributed

⇒ $E(X_i) = \mu$

⇒ $Var(X_i) = \sigma^2$

Sum Of I.I.D Random Variables

$T = X_1 + X_2 + ... + X_n$

$E(T) = E(X_1 + ... + X_n) = E(X_1) + ... + E(X_n) = n\mu$

$Var(T) = Var(X_1 + ... + X_n) = Var(X_1) + Var(X_2) + ... + Var(X_n) = n\sigma^2$

This holds because the variables are independent (so there are no covariance terms) and identically distributed (so every term equals μ or σ²).

Arithmetic Average Of I.I.D Random Variables

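$(X_1, X_2, ..., X_n)$ i.i.d.

$\bar{X} = \frac{X_1 + ... + X_n}{n} = \frac{T}{n}$

$E(\bar{X}) = E\left(\frac{T}{n}\right) = \frac{1}{n} E(T) = \frac{1}{n} n\mu = \mu$

$Var(\bar{X}) = Var\left(\frac{T}{n}\right) = \frac{1}{n^2} Var(T) = \frac{1}{n^2} n\sigma^2 = \frac{\sigma^2}{n}$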
1. $Y = a + bX = 3X$ (a = 0, b = 3)

$E(Y) = E(3X) = 3E(X)$

$Var(Y) = Var(3X) = 9Var(X)$

Central Limit Theorem

When we take many i.i.d. random variables with any distribution (not necessarily normal) that has a finite variance, the distribution of their sum or average becomes approximately normal as the sample size n grows large.

In simple terms

Even if the original data are not normally distributed, the sample mean (average of many
independent observations) will follow a bell-shaped (normal) distribution if the sample
size 𝑛 is big enough.

The idea behind it

When we add random variables together:

● Small random ups and downs tend to cancel out,

● The sum or average of many variables behaves in a predictable, smooth, and symmetric way.

Mathematical Formulation

Let $X_1, X_2, ..., X_n$ be i.i.d. random variables with:

$E(X_i) = \mu$, $Var(X_i) = \sigma^2$

Sum of the variables:

$S_n = X_1 + X_2 + ... + X_n$

Then for large n:

$S_n \approx N(n\mu, n\sigma^2)$

That means:
● The mean (center) of $S_n$ is $n\mu$
● The variance (spread) of $S_n$ is $n\sigma^2$

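Sample mean:

$\bar{X} = \frac{X_1 + X_2 + ... + X_n}{n}$

Then for large n:

$\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)$

That means:

● The mean (center) of $\bar{X}$ is μ
● The variance (spread) of $\bar{X}$ is smaller: $\frac{\sigma^2}{n}$

→ The more observations, the less variability in the sample mean.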

Standardized version (Z form)

We can rewrite the theorem in standardized form, to use the standard normal distribution N(0, 1):

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx N(0, 1)$

That means: if we convert the sample mean into a z-score, it approximately follows a standard normal distribution when n is large.

What makes CLT important

It lets us use normal probability methods even when data are not normal, as long as 𝑛
is large.
How the shape changes with n

n | Shape of the distribution of the mean
small n (e.g., 5) | can be skewed or irregular
moderate n (e.g., 30) | smoother, more symmetric
large n (e.g., 100+) | very close to normal
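The shape table can be reproduced by simulation. This sketch assumes a strongly skewed Exponential(1) population (μ = σ = 1), chosen only for illustration. As n grows, the skewness of the standardized mean should shrink toward 0, and the share of values within ±1.96 should settle near 0.95.

```python
import random
from statistics import fmean, pstdev

random.seed(2)  # reproducible run

def standardized_means(n, reps=10_000):
    """Z = (Xbar - mu) / (sigma / sqrt(n)) for samples from Exponential(1), where mu = sigma = 1."""
    return [(fmean(random.expovariate(1.0) for _ in range(n)) - 1.0) * n ** 0.5
            for _ in range(reps)]

def skewness(values):
    """Sample skewness: average cubed z-score (0 for a symmetric shape)."""
    m, s = fmean(values), pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

results = {}
for n in (1, 5, 30, 100):
    z = standardized_means(n)
    inside = fmean(abs(v) <= 1.96 for v in z)  # N(0, 1) would give about 0.95
    results[n] = (skewness(z), inside)
    print(n, round(results[n][0], 2), round(inside, 3))
```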

Summary of Key CLT Formulas

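Concept | Formula | Meaning
Sum of i.i.d. variables | $S_n \approx N(n\mu, n\sigma^2)$ | the total becomes approximately normal
Mean of i.i.d. variables | $\bar{X} \approx N(\mu, \sigma^2/n)$ | the average becomes approximately normal
Standardized form | $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0, 1)$ | allows use of standard normal tables
Bernoulli sum | $T \approx N(np, np(1 - p))$ | approximates the binomial distribution
Sample proportion | $P \approx N(p, p(1 - p)/n)$ | the proportion also follows the CLT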
Sum Of Bernoulli Random Variables

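$X_i \sim Ber(p) \Rightarrow E(X_i) = p$ and $Var(X_i) = p(1 - p)$

$X_i$ → 0: failure, with probability $1 - p$
→ 1: success, with probability p

$\Pr(X_i = 0) = 1 - p$, $\Pr(X_i = 1) = p$
$E(X_i) = p$, $Var(X_i) = p(1 - p)$
$T = X_1 + ... + X_n$ = total number of successes in the n trials
$E(T) = nE(X_i) = np$

$Var(T) = nVar(X_i) = np(1 - p)$

$T \approx N(np, np(1 - p))$ (for large n, by the CLT)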

Arithmetic Average Of Bernoulli Random Variables

Consider a vector $(X_1, X_2, ..., X_n)$ i.i.d. Ber(p).

$\bar{X} = \frac{T}{n} = P$ → sample proportion (proportion of successes in n trials)

$E(P) = E(\bar{X}) = p$

$Var(P) = Var(\bar{X}) = \frac{Var(X_i)}{n} = \frac{p(1 - p)}{n}$

CENTRAL LIMIT TH.

For large n:

$P = \bar{X} \approx N\left(p, \frac{p(1 - p)}{n}\right)$
Example 1 — Sum of Bernoulli

Each student is admitted with probability p = 0.4.

n = 150, $E(X_i) = p = 0.4$, $Var(X_i) = p(1 - p) = 0.24$

$T = X_1 + \dots + X_{150}$

$E(T) = np = 60$, $Var(T) = np(1 - p) = 36$

$\Rightarrow T \approx N(60, 36)$ → $SD = \sqrt{36} = 6$

Using the CLT, we can now find probabilities about T with the normal distribution, e.g. pnorm(t, 60, 6) for $\Pr(T \le t)$.

Even though the original distribution (Bernoulli) is not normal, T behaves approximately normally for n = 150.
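The moments of T follow directly from n and p. Because a sum of i.i.d. Bernoulli(p) variables is Binomial(n, p), the CLT approximation can also be compared with the exact probability. In this sketch the cut-off t = 65 is a hypothetical value, not one taken from the lecture.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 150, 0.4
mean_T = n * p              # E(T) = np = 60
var_T = n * p * (1 - p)     # Var(T) = np(1 - p) = 36
sd_T = sqrt(var_T)          # SD(T) = 6

t = 65  # hypothetical cut-off, for illustration only
approx = NormalDist(mean_T, sd_T).cdf(t)  # CLT approximation of Pr(T <= 65)

# Exact Pr(T <= 65): T is a sum of i.i.d. Bernoulli(p), i.e. Binomial(n, p)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t + 1))

print(mean_T, var_T, sd_T)  # E(T), Var(T), SD(T)
print(round(approx, 3), round(exact, 3))
```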

Example 2 — Sample Proportion

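p = 0.4, n = 200, $P = \frac{T}{n}$

$E(P) = p = 0.4$, $Var(P) = \frac{p(1 - p)}{n} = \frac{0.24}{200} = 0.0012$

So: $P \approx N(0.4, 0.0012)$, with $SD(P) = \sqrt{0.0012} \approx 0.0346$.

To find $\Pr(P < 0.35)$:

$\Pr(P < 0.35) \approx$ pnorm(0.35, 0.4, sqrt(0.0012)) ≈ 0.0745

This means about 7.45% of samples will have a proportion below 0.35.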
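The arithmetic of this example can be checked with a short sketch that assumes `statistics.NormalDist` as a stand-in for `pnorm`.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 200
var_P = p * (1 - p) / n                  # Var(P) = p(1 - p) / n = 0.0012
P = NormalDist(mu=p, sigma=sqrt(var_P))  # CLT approximation of the sample proportion

prob = P.cdf(0.35)  # Pr(P < 0.35), analogue of pnorm(0.35, 0.4, sqrt(0.0012))
print(round(var_P, 4), round(prob, 4))  # 0.0012 0.0745
```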

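Example 1 (same as the one at CLT, but in-class solution):

n = 80 drivers, i = 1, 2, ..., 80:
$X_i$ = 1 if driver i uses the self-pump (probability 0.7), 0 otherwise (probability 0.3)

a) $X_i \sim Ber(0.7)$

$E(X_i) = 0.7$, $Var(X_i) = 0.7 \cdot 0.3 = 0.21$

$T = X_1 + ... + X_{80}$

$E(T) = 80E(X_i) = 80 \cdot 0.7 = 56$

$Var(T) = 80Var(X_i) = 80 \cdot 0.7 \cdot 0.3 = 16.8$

$T \approx N(56, 16.8)$

b) 60% of 80 → 48, with $T \approx N(56, 16.8)$:

$\Pr(T \ge 48) = 1 - \Pr(T \le 48) = 1 -$ pnorm(48, 56, sqrt(16.8)) $= 0.9745$

Equivalently, in terms of the sample proportion $P = T/80$, we want $\Pr(P \ge 0.6)$:

$E(P) = 0.7$, $Var(P) = \frac{0.7 \cdot 0.3}{80} \approx 0.0026$, so $P \approx N(0.7, 0.0026)$. Because $P \ge 0.6$ is the same event as $T \ge 48$, this gives (up to rounding) the same answer.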
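Both routes give the same probability because $T \ge 48$ and $P \ge 0.6$ are the same event. This sketch uses `statistics.NormalDist` as a stand-in for `pnorm`, with the unrounded variance 0.21/80 for P.

```python
from math import sqrt
from statistics import NormalDist

n, p = 80, 0.7

# Route 1: total number of self-pump drivers, T approx N(np, np(1 - p))
T = NormalDist(n * p, sqrt(n * p * (1 - p)))
prob_T = 1 - T.cdf(48)   # analogue of 1 - pnorm(48, 56, sqrt(16.8))

# Route 2: sample proportion P = T / n, approx N(p, p(1 - p) / n)
P = NormalDist(p, sqrt(p * (1 - p) / n))
prob_P = 1 - P.cdf(0.6)  # Pr(P >= 0.6), the same event as T >= 48

print(round(prob_T, 4), round(prob_P, 4))  # 0.9745 0.9745
```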
🍥 Lecture 9
Oct 2, 2025

Lecture 9
Population & Sample
Population:
● set of all statistical units on which a characteristic may be measured.
Sample:
● any subset of the population.
Description vs Prediction:
● analysis should not stop at describing the sample
● aim is to make predictions about the population.
Inferential statistics:
● techniques to draw conclusions on the entire population from a sample.
Good sample:
● representative and obtained via random sampling.

Probabilistic framework
● express inferential problems in probability terms to quantify and control the
probability of errors.

Experiment: draw units at random and measure a characteristic on them.

Random variable (X) → mathematical description of the outcome of each draw.

Example

● 𝑋 → hours per day spent watching TV by a randomly drawn respondent

● µ → mean (expected value) of 𝑋

○ Written as the operation → $\mu = E[X]$
● $\sigma^2$ → variance of X
○ Written as → $\sigma^2 = Var(X)$
● Population mean μ → arithmetic average of the number of hours among all adults (parameter, unknown number).
● Population variance $\sigma^2$ → the variance of the number of hours among all adults (parameter, unknown).

Random Sample (I.I.D.)

● Draw n units at random → obtain n random variables $(X_1, ..., X_n)$.

● Random sample: $(X_1, ..., X_n)$ are independent and identically distributed (i.i.d.)

○ with the same mean μ and the same variance σ².

○ Sample realization → the concrete observed values $(x_1, ..., x_n)$.

Inferential Problems
Focus → inference on the population mean (μ)

Main tasks
● Point estimation
● Confidence interval estimation
● Hypothesis testing

Statistics
Sample statistic → a function of the random sample that summarizes information
→ a random variable

Point Estimation Problem

Provide a guess or approximation of the unknown population mean

● estimator → a sample statistic to provide a guess of μ (population mean)

● estimate → the estimator’s observed value on the sample
Point Estimation of the Population Mean

Sample mean $\bar{X}$ → estimator of the population mean μ

Observed value $\bar{x}$ → estimate of μ

🍥 Lecture 10
Lecture 10
Properties of the Sample Mean
For an i.i.d sample, the sample mean has:

I. Mean (expected value) → μ

$E(\bar{X}) = \mu$

Interpretation → the average of all possible values of $\bar{X}$ (over all possible samples) equals μ (population mean)

⇒ sample mean $\bar{X}$ → unbiased estimator (no systematic error).

II. Variance → $\frac{\sigma^2}{n}$

$Var(\bar{X}) = \frac{\sigma^2}{n}$

● The larger the sample size, the closer $\bar{X}$ tends to be to the population mean
● The sample mean might overestimate or underestimate the population mean
● The spread of possible $\bar{X}$ values decreases as n grows
⇒ sample mean $\bar{X}$ → consistent estimator
● $\frac{\sigma}{\sqrt{n}}$ → standard deviation of the sample mean → standard error of the mean

○ It is the standard deviation of all possible sample means over all possible samples

III. From the CLT → for large n, the distribution of $\bar{X}$ is approximately normal (Gaussian).
Point Estimation Of The Population Variance And Standard
Deviation
Sample variance

● Sample variance $S^2$ → estimator of the population variance $\sigma^2$
● Observed value $s^2$ → estimate of $\sigma^2$

Unbiased estimator:

● $E(S^2) = \sigma^2$
● Sample variance $S^2$ → unbiased estimator of the population variance $\sigma^2$

Sample standard deviation

● We use the sample standard deviation S as an estimator of the population standard deviation σ
● The observed value of the sample standard deviation s → estimate of the population standard deviation σ

Point Estimation Not Enough

Point estimation means using a single number calculated from a sample to estimate an
unknown population parameter.

The formula is called the point estimator, and the resulting number is the point
estimate.

Properties of a “good” point estimator:

1. Unbiased → its expected value equals the true parameter
𝐸(𝑋) = μ
2. Consistent → as the sample size n increases, the estimator gets closer to the true
value
3. Efficient → among all unbiased estimators, it has the smallest variance
A single point estimate (like $\bar{x}$) is not enough. We must also evaluate how precise or accurate that estimate is.

Why accuracy matters

● Different random samples → different sample means ($\bar{x}$).

● Hence, even though $\bar{x}$ estimates μ, it varies across samples.
● We need a way to quantify that variability — that’s where the Standard Error
(SE) comes in.

Standard Error

The standard deviation of the sampling distribution of $\bar{X}$ is called the Standard Error (SE) of the mean.
It measures how much sample means tend to vary from one sample to another.

Formula: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$, where:

● σ → population standard deviation

● n → sample size

Interpretation:

● Smaller SE → higher accuracy (sample mean is more stable)

● Larger SE → lower accuracy (sample mean varies more)

Estimation Of The Standard Error Of The Mean

For a sample of size n:

$Var(\bar{X}) = \frac{\sigma^2}{n}$

Taking the square root:

$SD(\bar{X}) = \frac{\sigma}{\sqrt{n}}$

This $SD(\bar{X})$ is the Standard Error (SE) of the sample mean.

Meaning of SE

● It represents the standard deviation of all possible sample means over all
possible random samples of the same size.
● It is the expected variability of $\bar{X}$ around the true mean μ.

Estimating the SE of the Mean

This slide explains what to do when σ (the population standard deviation) is unknown,
which is almost always the case in real life.

Since σ is unknown, we estimate it with the sample standard deviation S:

$\widehat{SE}(\bar{X}) = \frac{S}{\sqrt{n}}$, with observed value $\frac{s}{\sqrt{n}}$

This estimated SE is used to measure the accuracy of the point estimate $\bar{x}$.

Example from Lecture (TV Hours)

Given:

● s = 2.5783
● n = 709

Then:

$\widehat{SE} = \frac{s}{\sqrt{n}} = \frac{2.5783}{\sqrt{709}} \approx 0.0968$

Interpretation

● The sample mean ($\bar{x} = 2.97$) is expected to vary by about 0.0968 hours across repeated samples.
● This tells us that the estimate of the population mean is quite precise, because the SE is small.
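The estimated standard error is a one-line computation. The sketch also shows that, for a hypothetical sample four times as large with the same s, the SE halves, because it scales with $1/\sqrt{n}$.

```python
from math import sqrt

s, n = 2.5783, 709     # sample SD and sample size from the TV-hours example
se_hat = s / sqrt(n)   # estimated standard error of the mean, s / sqrt(n)
print(round(se_hat, 4))  # 0.0968

# Hypothetical: same s, four times the sample size -> half the SE
print(round(s / sqrt(4 * n), 4))  # 0.0484
```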
Confidence intervals (CI) for μ
Confidence level

( 1 − α ) · 100%

where:

● α = probability of being wrong (the “risk” or significance level),

● 1 − α = probability that the interval contains the true parameter.

Typical choices:

● α = 0.10 → 90% confidence interval

● α = 0.05 → 95% confidence interval
● α = 0.01 → 99% confidence interval

A confidence interval provides:

● A range of values (an interval),

● Constructed from the sample data,
● That is expected, with a chosen level of confidence, to contain the true population
parameter.

Interpretation

A (1 − α) · 100% confidence interval for a population parameter:

● interval of plausible values for that parameter

● constructed in such a way that we are (1 − α) · 100% confident it contains the
true value of the parameter

The Logic Behind It

Imagine we repeat the sampling process many times (same population, same sample size).
Each time, we compute a confidence interval for the mean (μ).

Then:

● About 95% of these intervals (if the confidence level is 95%) will contain the true µ
● About 5% will not.
Normal Population Known Variance
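$X_1, ..., X_n$ i.i.d. Normal, μ unknown, $\sigma^2$ known

⇒ $\bar{X}$ is normal with mean μ and known variance $\frac{\sigma^2}{n}$

Standardization of the sample mean

$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ → has a standard normal distribution

Notation For Quantile Of A Standard Normal

$z_\alpha$ → quantile of order $1 - \alpha$ of a standard normal distribution, i.e. $\Pr(Z \le z_\alpha) = 1 - \alpha$

Since $\Pr(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$, the random interval $\left[\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$ → contains μ with probability $1 - \alpha$

Confidence Interval (1 − α)100% for μ

The confidence interval of level $(1 - \alpha)100\%$ for the unknown population mean μ is:

$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$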

Frequentist interpretation

Draw repeatedly samples of size 𝑛 from the population → compute for each the
confidence interval of level (1 − α)100% → about (1 − α)100% of the CIs will contain
the unknown parameter µ
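The frequentist interpretation can be checked by simulation. The sketch relies on assumptions made only for the experiment: a normal population with known σ = 30 and a "true" μ = 150, samples of size n = 100 and 5,000 repetitions. The share of intervals that contain μ should be close to 0.95.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(3)  # reproducible run

mu, sigma, n = 150, 30, 100   # "true" mean is an assumption made only for the simulation
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96
margin = z * sigma / sqrt(n)             # known-sigma margin of error

def covers():
    """Draw one sample, build the (1 - alpha)100% CI, report whether it contains mu."""
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    return xbar - margin <= mu <= xbar + margin

coverage = fmean(covers() for _ in range(5_000))
print(round(coverage, 3))  # roughly 0.95
```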

Length and Margin of Error

● $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ → margin of error
● $2z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ → length of the confidence interval → measure of accuracy

For a given confidence level and a given $\sigma^2$, the greater n:

● the lower the standard error

● the lower the margin of error
● the shorter the interval

The greater the confidence level $(1 - \alpha)$ → the lower α → the greater $z_{\alpha/2}$ → the longer the confidence interval

Example

n = 100, σ = 30, $\bar{x} = 150$

95% confidence interval for the population mean score μ:

$(1 - \alpha) = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \frac{\alpha}{2} = 0.025$

$z_{\alpha/2}$ = qnorm(1 − α/2) = qnorm(0.975) = 1.96

Standard error: $\frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{100}} = 3$

The confidence interval for the mean score is:

$[150 - 3 \cdot 1.96,\; 150 + 3 \cdot 1.96] = [144.12,\; 155.88]$

→ we are 95% confident that the population mean score μ is between 144.12 and 155.88
→ in about 5% of repeated samples, an interval built this way would miss the true μ
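Here is the interval arithmetic of this example as a short sketch, with Python's `NormalDist().inv_cdf` standing in for `qnorm`.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, xbar, alpha = 100, 30, 150, 0.05

z = NormalDist().inv_cdf(1 - alpha / 2)  # analogue of qnorm(0.975), about 1.96
se = sigma / sqrt(n)                     # 30 / 10 = 3
ci = (xbar - z * se, xbar + z * se)
print([round(b, 2) for b in ci])  # [144.12, 155.88]
```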

Normal Population Unknown Variance

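$X_1, ..., X_n$ i.i.d. Normal, μ unknown, $\sigma^2$ unknown

$\bar{X}$ → estimates μ, S → estimates σ, $\frac{S}{\sqrt{n}}$ → estimates $\frac{\sigma}{\sqrt{n}}$

When $\sigma^2$ is unknown, estimate it with $S^2$ and use Student's t distribution:

$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ (Student's t with n − 1 degrees of freedom)

(Figure on p. 52 compares t densities with the standard normal.)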

● t-quantile in R:

$t_{n-1,\,\alpha/2}$ = qt(1 − α/2, n − 1)

● CI for μ with unknown variance:

$\left[\bar{x} - t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}},\; \bar{x} + t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}\right]$

● Worked example (TV time, n = 30):

$\bar{x} = 2.11$, $s = 1.364$, $\alpha = 0.10$, $t_{29,\,0.05}$ = qt(0.95, 29) = 1.699

$\frac{s}{\sqrt{n}} = \frac{1.364}{\sqrt{30}} = 0.249$, CI $= [2.11 - 1.699 \cdot 0.249,\; 2.11 + 1.699 \cdot 0.249] = [1.687,\; 2.533]$
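Here is the same computation as a sketch. Python's standard library has no Student's t quantile function, so the value 1.699 = qt(0.95, 29) is taken from the notes rather than computed.

```python
from math import sqrt

n, xbar, s = 30, 2.11, 1.364
t_quantile = 1.699  # qt(0.95, 29) as reported in the notes; not computed here

se_hat = s / sqrt(n)          # estimated standard error s / sqrt(n)
margin = t_quantile * se_hat  # margin of error
ci = (xbar - margin, xbar + margin)
print(round(se_hat, 3), [round(b, 3) for b in ci])  # 0.249 [1.687, 2.533]
```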

● R shortcut for a CI on a mean:

[Link](variable_name, [Link] = 1 − α)

(defaults to 95% if [Link] is omitted)

What to report with point estimates

● Point estimate alone is not enough; report an accuracy measure (SE or CI). (p. 24)
● SE gives a first accuracy gauge; CI pairs the estimate with an interval and an
explicit confidence level. (pp. 26, 29–32)

Cheat Sheet — Lecture 10

Core estimators

● Sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$.

● Sample variance: $S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1}$.
● Sample SD: $S = \sqrt{S^2}$.

Key properties (i.i.d.)

● $E(\bar{X}) = \mu$ (unbiased).

● $Var(\bar{X}) = \frac{\sigma^2}{n}$ (precision increases with n).
● Large n: $\bar{X}$ approximately normal.
● SE of mean: $SE = \frac{\sigma}{\sqrt{n}}$, estimate $\widehat{SE} = \frac{s}{\sqrt{n}}$.

CI for μ (Normal population)

● Known σ:
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, with $z_{\alpha/2}$ = qnorm(1 − α/2).
● Unknown σ:
$\bar{x} \pm t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}$, with $t_{n-1,\,\alpha/2}$ = qt(1 − α/2, n − 1).
● Margin of error: $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ (or $t_{n-1,\,\alpha/2}\frac{s}{\sqrt{n}}$).
● Length: 2 × margin. Higher confidence ⇒ larger quantile ⇒ longer interval. (pp. 40, 45–47, 53–55)

Worked examples

● Known σ (aptitude test): n = 100, σ = 30, $\bar{x} = 150$ → 95% CI [144.12, 155.88]. (pp. 42–44)
● Unknown σ (TV hours): n = 30, s = 1.364, $\bar{x} = 2.11$ → 90% CI [1.687, 2.533]. (pp. 56–58)

R helper

● [Link](variable_name, [Link] = 1 − α) → returns the CI on the mean (default 95%)
Tab 20
📘 Point Estimation — Definition
Point estimation uses a single numerical value (a “point”) to estimate an unknown population parameter.

🔹 1. Population Parameters vs. Sample Statistics

In statistics, we distinguish between:

● Parameters: numerical values that describe a population (fixed but unknown)

○ Example: population mean μ, population variance $\sigma^2$
● Statistics: numerical values that describe a sample (known and computable)
○ Example: sample mean $\bar{x}$, sample variance $s^2$

Because we rarely have access to an entire population, we use sample statistics to estimate population parameters.

🔹 2. What a “Point Estimator” Is

A point estimator is a statistical formula that provides an estimate for a population
parameter.

● For the mean, the estimator is:

$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}$
● For the variance, the estimator is:
$S^2 = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n - 1}$

These are random variables because they depend on the sample (which changes every
time you sample again).

🔹 3. What a “Point Estimate” Is

Once you apply the estimator to your actual sample data, you get a numerical value —
this is the point estimate.

● Example:
○ The formula $\bar{X}$ is the point estimator of μ.
○ The calculated number $\bar{x} = 2.97$ is the point estimate.

In symbols:
Point estimate of μ = $\bar{x}$.
]

🔹 4. Examples from the Lecture

Population parameter | Estimator (formula) | Example from lecture | Result (estimate)
Population mean μ | $\bar{X} = \frac{\sum X_i}{n}$ | Mean TV hours | $\bar{x} = 2.97$
Population variance $\sigma^2$ | $S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$ | TV hours variance | $s^2 = 6.6474$
Population SD σ | $S = \sqrt{S^2}$ | TV hours SD | $s = 2.5783$

So:

● $\bar{X}$ is the point estimator of μ,

● $\bar{x} = 2.97$ is the point estimate.

🔹 5. Properties of a “Good” Point Estimator

A point estimator should ideally be:

1. Unbiased → its expected value equals the true parameter

$E(\bar{X}) = \mu$
2. Consistent → as the sample size n increases, the estimator gets closer to the true
value
3. Efficient → among all unbiased estimators, it has the smallest variance

🔹 6. Summary Table
Concept | Symbol / Formula | Description
Point estimator | Function of sample data (e.g., $\bar{X}$, $S^2$) | Formula that estimates a population parameter
Point estimate | Numerical result from applying the estimator (e.g., $\bar{x} = 2.97$) | Specific number used as the best guess of the parameter
Parameter | μ, $\sigma^2$, σ | True (unknown) value describing the population
Goal | Estimate μ, $\sigma^2$, σ with $\bar{x}$, $s^2$, s | Use sample data to infer population characteristics

✅ In short:
Point estimation means using a single number calculated from a sample to
estimate an unknown population parameter.
The formula is called the point estimator, and the resulting number is the
point estimate.

Example:
The average number of TV hours in a sample is 2.97 →
this is the point estimate for the population mean μ (the true average TV hours in
the entire population).
📘 Slide 23 – Title: “Estimators of the population variance
and standard deviation”
This slide explains how we estimate the population variance (σ²) and population
standard deviation (σ) when we only have sample data.

🔹 1. The Goal
In practice, we rarely know the true population parameters (σ² and σ).
Therefore, we must use sample statistics as estimators of those parameters.

🔹 2. The Estimators
Sample variance (S²)

Mathematical operation:
$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$

● $X_i$: value of the i-th observation

● $\bar{X}$: sample mean
● n: sample size
● The denominator n − 1 is used instead of n to make S² unbiased; this correction is called the Bessel correction.

Sample standard deviation (S)

Mathematical operation:
$S = \sqrt{S^2}$

So once you calculate S², take the square root to get S.

🔹 3. Why we use n − 1 (Unbiasedness Property)

If we used n in the denominator, the sample variance would underestimate the true variance on average.
Using n − 1 corrects for this bias, making $S^2$ an unbiased estimator of σ²:

$E(S^2) = \sigma^2$

This is called the unbiasedness property of the sample variance.
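A simulation makes the n − 1 correction visible. The sketch uses illustrative assumptions: an N(0, 1) population (so σ² = 1), samples of size n = 5 and 20,000 repetitions. Dividing by n − 1 should average close to 1, dividing by n should average close to (n − 1)/n = 0.8, and S itself should average slightly below σ.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(4)  # reproducible run

n, reps = 5, 20_000  # small samples make the bias easy to see (illustrative choice)

s2_unbiased, s2_biased, s_values = [], [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]  # true variance sigma^2 = 1
    m = fmean(sample)
    ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
    s2_unbiased.append(ss / (n - 1))        # sample variance S^2 (Bessel correction)
    s2_biased.append(ss / n)                # dividing by n instead
    s_values.append(sqrt(ss / (n - 1)))     # sample standard deviation S

mean_unbiased = fmean(s2_unbiased)
mean_biased = fmean(s2_biased)
mean_s = fmean(s_values)
print(round(mean_unbiased, 3))  # close to 1
print(round(mean_biased, 3))    # close to 0.8
print(round(mean_s, 3))         # a bit below sigma = 1
```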

🔹 4. The Estimates (Observed Values)

When we compute $S^2$ and S from real data, their observed values are denoted by lowercase letters:

$s^2$ = observed sample variance, s = observed sample standard deviation.

These are point estimates of the population parameters σ² and σ.

🔹 5. Example (from the lecture context – TV hours dataset)

From the slide example:

● n = 709
● $s^2 = 6.6474$
● s = 2.5783

Interpretation:

● The sample variance (s²) = 6.6474 is our best estimate of the population variance
(σ²).
● The sample standard deviation (s) = 2.5783 is our best estimate of the population
standard deviation (σ).

Thus:
$\hat{\sigma}^2 = 6.6474$, $\hat{\sigma} = 2.5783$.
🔹 6. Summary of Slide 23
Quantity | Formula | Purpose | Notes
Sample variance (S²) | $S^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$ | Estimates population variance σ² | Uses n − 1 to be unbiased
Sample standard deviation (S) | $S = \sqrt{S^2}$ | Estimates population standard deviation σ | Square root of the variance
Unbiasedness | $E(S^2) = \sigma^2$ | Ensures fair estimation of the population spread | Important theoretical property
Point estimates | $s^2$, s | Numerical values computed from the data | Best single-number estimates

✅ Key takeaway from Slide 23:

The sample variance (S²) is an unbiased estimator of the population variance (σ²). The sample standard deviation (S) is the usual estimator of σ, although S itself is slightly biased ($E(S) \le \sigma$); the bias shrinks as n grows. Both summarize how spread out the sample data are around the mean and serve as building blocks for later inferential procedures (like the standard error and confidence intervals).
R functions
[Link]()

📘 1. The context
In R, you started with a contingency table:

tab1 = table(work$transport, work$worker)

which looks like this:

transport | salaried | self-employed
private | 76 | 42
public | 64 | 18

Then, you used:

[Link](tab1, 2)
[Link](tab1, 1)

🔹 2. Meaning of [Link](x, margin)

The [Link]() function converts the counts in a contingency table into relative
frequencies (proportions).

● margin = 1 → proportions by row

● margin = 2 → proportions by column

🔹 3. When margin = 2
[Link](tab1, 2)

→ divides each cell by the column total.

So, it answers the question:

“Among salaried workers and among self-employed workers, what proportion
use public vs private transport?”

transport | salaried | self-employed
private | 0.5428571 | 0.7000000
public | 0.4571429 | 0.3000000

Interpretation:

● Among salaried workers → 54.3% use private, 45.7% use public transport.
● Among self-employed workers → 70% use private, 30% use public transport.

So the column sums = 1.

🔹 4. When margin = 1
[Link](tab1, 1)

→ divides each cell by the row total.

That means it gives conditional proportions within each transport category.

So, it answers the question:

“Among people who use private transport (and among those who use public
transport), what proportion are salaried vs self-employed?”

transport | salaried | self-employed
private | 0.6440678 | 0.3559322
public | 0.7804878 | 0.2195122

Interpretation:

● Among those who use private transport, 64.4% are salaried and 35.6% are
self-employed.
● Among those who use public transport, 78% are salaried and 22% are
self-employed.

So the row sums = 1.

🔹 5. Without specifying a margin
If you just run:

[Link](tab1)

It divides each cell by the total of all cells, giving overall proportions (out of all 200 respondents).
All the resulting proportions sum to 1.
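The three normalisations can be mirrored outside R. This Python sketch uses the counts that reproduce the proportions printed above (76, 42, 64, 18; 200 respondents). `proportions` is a hypothetical helper, not an R function.

```python
# Counts that reproduce the proportions printed above (200 respondents in total)
counts = {
    "private": {"salaried": 76, "self-employed": 42},
    "public":  {"salaried": 64, "self-employed": 18},
}

def proportions(tab, margin=None):
    """Hypothetical helper: margin=1 -> divide by row totals, 2 -> column totals, None -> grand total."""
    rows = list(tab)
    cols = list(next(iter(tab.values())))
    if margin == 1:
        totals = {(r, c): sum(tab[r].values()) for r in rows for c in cols}
    elif margin == 2:
        totals = {(r, c): sum(tab[q][c] for q in rows) for r in rows for c in cols}
    else:
        grand = sum(sum(row.values()) for row in tab.values())
        totals = {(r, c): grand for r in rows for c in cols}
    return {r: {c: tab[r][c] / totals[(r, c)] for c in cols} for r in rows}

for m in (2, 1, None):
    result = proportions(counts, m)
    print("margin =", m)
    for r, row in result.items():
        print(" ", r, [round(v, 7) for v in row.values()])
```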

✅ Summary Table
Command | Divides by | Shows proportions | Use when you want to know…
[Link](tab1, 1) | Row totals | Within each transport type: how many are salaried vs self-employed | “Given transport, what is the working status?”
[Link](tab1, 2) | Column totals | Within each worker type: how many use private vs public transport | “Given working status, what is the transport used?”
[Link](tab1) | Grand total | Across all observations | Overall joint distribution

✅ In short
● 2 → column proportions → “within each worker group”
● 1 → row proportions → “within each transport group”
🍥 Lecture 13
NORMAL POPULATION KNOWN VARIANCE - intuition on the test’s derivation

Ex: Is there enough empirical evidence to presume that the mean weight of all produced cereal packages is greater than 16?
$X_i$ = weight of a cereal package drawn at random

Test

Against
P - VALUE
🍥 Lecture 14

No ratings yet
2025 Internal Assessment Score Sheet
10 pages
OLTC Lifecycle Management in TNB
No ratings yet
OLTC Lifecycle Management in TNB
5 pages
Weekly Sales Analysis and Insights
No ratings yet
Weekly Sales Analysis and Insights
12 pages
Data Summary Lesson Plan for 6th Grade
No ratings yet
Data Summary Lesson Plan for 6th Grade
5 pages
Understanding Measures of Relative Standing
No ratings yet
Understanding Measures of Relative Standing
7 pages
Grade 10 Mathematics ATP 2025 Overview
100% (1)
Grade 10 Mathematics ATP 2025 Overview
6 pages
Intro to Data & Statistics with R
No ratings yet
Intro to Data & Statistics with R
45 pages
Data Types and Preprocessing in AI
No ratings yet
Data Types and Preprocessing in AI
83 pages
Data Visualization Methods Explained
No ratings yet
Data Visualization Methods Explained
43 pages
LDL Cholesterol Levels: Smokers vs Non-Smokers
No ratings yet
LDL Cholesterol Levels: Smokers vs Non-Smokers
16 pages
Introduction to Quantitative Research
No ratings yet
Introduction to Quantitative Research
68 pages
CFA Level I: Rates and Returns Guide
No ratings yet
CFA Level I: Rates and Returns Guide
111 pages
Analyzing Quartiles in Data Sets
No ratings yet
Analyzing Quartiles in Data Sets
24 pages
Statistics Mean and Distribution Problems
No ratings yet
Statistics Mean and Distribution Problems
10 pages
Data Summaries and Visualizations Guide
No ratings yet
Data Summaries and Visualizations Guide
17 pages
Gender-Based Exam Score Analysis
No ratings yet
Gender-Based Exam Score Analysis
23 pages
WASSCE 2020 Core Maths Exam Questions
No ratings yet
WASSCE 2020 Core Maths Exam Questions
20 pages
Merits and Demerits of Statistical Measures
100% (1)
Merits and Demerits of Statistical Measures
10 pages
CHAPTER 6 Statistics III (Notes and Exercises)
100% (3)
CHAPTER 6 Statistics III (Notes and Exercises)
19 pages
Data Representation Techniques in Statistics
No ratings yet
Data Representation Techniques in Statistics
13 pages
Data PDF
No ratings yet
Data PDF
66 pages
Descriptive Statistics Overview
No ratings yet
Descriptive Statistics Overview
3 pages
Bayern Munich vs Liverpool Stats Analysis
No ratings yet
Bayern Munich vs Liverpool Stats Analysis
34 pages
Grade 12 Mathematical Literacy Exam 2019
No ratings yet
Grade 12 Mathematical Literacy Exam 2019
14 pages
Question Bank Basic Concepts
No ratings yet
Question Bank Basic Concepts
11 pages
Descriptive Statistics Overview for CE Exam
No ratings yet
Descriptive Statistics Overview for CE Exam
51 pages
Concepts and Techniques: Data Mining
No ratings yet
Concepts and Techniques: Data Mining
78 pages
Logic and Computer Design Fundamentals 4th Edition
100% (2)
Logic and Computer Design Fundamentals 4th Edition
72 pages

Statistics Lecture 1 Overview

Uploaded by

Statistics Lecture 1 Overview

Uploaded by


● Statistics in everyday life: numbers dominate society; statistics appear everywhere.

TERMINOLOGY & NOTATION

📑 CHEAT SHEET (QUICK REFERENCE)

● Statistical variable = result of a measurement process; each question → one variable

FREQUENCY DISTRIBUTION TABLES

● Best when the variable has few distinct values

● Used for numerical variables with many distinct values

● Purpose: condense the data into one typical value

● Mode: the most frequent value (highest frequency)

Median from Raw Data

Example: Data = [1, 3, 1, 2, 5, 3]
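A minimal sketch of the raw-data rule applied to this example: order the observations, then take the middle one (odd n) or average the two middle ones (even n). The function name `median` is our own; Python's standard library also provides `statistics.median`.

```python
def median(data: list[float]) -> float:
    """Median of a non-empty list: middle value (n odd) or mean of the two middle values (n even)."""
    if not data:
        raise ValueError("median of empty data")
    ordered = sorted(data)          # step 1: order the observations
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd n: a single middle observation
        return ordered[mid]
    # even n: average of the observations in positions n/2 and n/2 + 1
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [1, 3, 1, 2, 5, 3]
print(sorted(data))   # [1, 1, 2, 3, 3, 5]
print(median(data))   # 2.5
```

With n = 6 the two middle observations are 2 and 3, so the median is 2.5.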

Example: Degree (Education Level)

Level                   Absolute  Relative          Cumulative
Less than high school   82        82/709 = 0.116    0.116
High school             361       361/709 = 0.509   0.116 + 0.509 = 0.625
Junior college          50        50/709 = 0.071    0.625 + 0.071 = 0.696
Bachelor                141       141/709 = 0.199   0.696 + 0.199 = 0.895
Graduate                75        75/709 = 0.106    0.895 + 0.106 ≈ 1
Total                   709       1
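The table can be reproduced with a short sketch (counts copied from the Degree table; the median step assumes the cumulative-frequency rule for ordinal variables, Case 1: first category whose cumulative relative frequency exceeds 0.5).

```python
from itertools import accumulate

# Counts copied from the Degree (education level) table.
levels = ["Less than high school", "High school", "Junior college", "Bachelor", "Graduate"]
counts = [82, 361, 50, 141, 75]

n = sum(counts)                            # total sample size (709)
relative = [c / n for c in counts]         # relative frequency = count / n
cumulative = list(accumulate(relative))    # running sum of relative frequencies

for level, c, p, cp in zip(levels, counts, relative, cumulative):
    print(f"{level:<22} {c:>4}  {p:.3f}  {cp:.3f}")

# Ordinal variable: median = first category whose cumulative relative
# frequency exceeds 0.5 (no category hits exactly 0.5 here).
median_level = next(lvl for lvl, cp in zip(levels, cumulative) if cp > 0.5)
print("Median category:", median_level)
```

Because the code accumulates unrounded relative frequencies, two cumulative values print as 0.695 and 0.894 instead of 0.696 and 0.895; the table adds already-rounded numbers. The median category is High school (cumulative 0.625 is the first value above 0.5).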

● Case 1: cumulative frequency ≠ 0.5

● Case 2: cumulative frequency = 0.5

→ Median = 1 (times per week).

Median from a Histogram

● For continuous/grouped data shown in histograms:

● Mean (arithmetic average) = sum of all values ÷ number of observations: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

● Deviation of observation $x_i$: $x_i - \bar{x}$

1. Balancing point: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
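A quick check of the balancing-point property on the raw example data [1, 3, 1, 2, 5, 3]. `Fraction` is used only so the sum of deviations comes out as an exact 0 rather than a tiny floating-point residue.

```python
from fractions import Fraction


def mean(values):
    """Arithmetic mean: sum of all values divided by the number of observations."""
    return sum(values) / len(values)


data = [Fraction(v) for v in [1, 3, 1, 2, 5, 3]]   # exact arithmetic, no rounding error
x_bar = mean(data)                                 # 15/6 = 5/2
deviations = [x - x_bar for x in data]             # x_i - x_bar
print(x_bar, sum(deviations))                      # 5/2 0
```

Positive and negative deviations cancel exactly, which is why the mean acts as the "balancing point" of the data.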

Mean from Frequency Distribution

$\bar{x} = \frac{\sum_i x_i f_i}{n}$, where:

● $x_i$ = distinct value of the variable

● $f_i$ = absolute frequency (number of times $x_i$ appears)

● $n$ = total sample size

Alternatively, using relative frequencies ($p_i = f_i / n$): $\bar{x} = \sum_i x_i p_i$

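Value $x_i$   Absolute frequency $f_i$   Relative frequency $p_i$
0             60                         0.30
1             40                         0.20
2             60                         0.30
3             20                         0.10
4             20                         0.10
Total         200                        1.00

Step 1: Formula with absolute frequencies

$\bar{x} = \frac{(0 \cdot 60) + (1 \cdot 40) + (2 \cdot 60) + (3 \cdot 20) + (4 \cdot 20)}{200} = \frac{300}{200} = 1.5$

Step 2: Formula with relative frequencies

$\bar{x} = (0 \cdot 0.30) + (1 \cdot 0.20) + (2 \cdot 0.30) + (3 \cdot 0.10) + (4 \cdot 0.10) = 1.5$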

● The mean from a frequency distribution is a weighted average, with the relative frequencies as weights.
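Both formulas for the frequency-distribution mean, sketched on the table of values 0 to 4 (counts 60, 40, 60, 20, 20). Floating-point relative frequencies such as 0.1 are not exact, so the two results are compared with `math.isclose` rather than `==`.

```python
import math

values = [0, 1, 2, 3, 4]             # distinct values x_i
abs_freq = [60, 40, 60, 20, 20]      # absolute frequencies f_i

n = sum(abs_freq)                                              # 200
mean_abs = sum(x * f for x, f in zip(values, abs_freq)) / n    # Σ x_i f_i / n

rel_freq = [f / n for f in abs_freq]                           # p_i = f_i / n
mean_rel = sum(x * p for x, p in zip(values, rel_freq))        # Σ x_i p_i

assert math.isclose(mean_abs, mean_rel)   # both formulas agree
print(mean_abs)                           # 1.5
```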

● Mean: considers all values

● Outliers (definition): unexpectedly high or low values with low frequency

○ Weighted mean: $\bar{x} = \sum (\text{value} \times \text{relative frequency})$

○ Minimizes $\sum_i (x_i - a)^2$ at $a = \bar{x}$
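A numerical illustration of the minimization property, using the raw example data [1, 3, 1, 2, 5, 3]: evaluate $\sum_i (x_i - a)^2$ on a grid of candidate points $a$ (the grid 0.0, 0.1, ..., 6.0 is an arbitrary choice for illustration) and see which one wins.

```python
data = [1, 3, 1, 2, 5, 3]
x_bar = sum(data) / len(data)            # 2.5


def sum_sq_dev(a: float) -> float:
    """Sum of squared deviations of the data from an arbitrary point a."""
    return sum((x - a) ** 2 for x in data)


# Try a grid of candidate points a and keep the one with the smallest sum.
candidates = [i / 10 for i in range(0, 61)]      # 0.0, 0.1, ..., 6.0
best = min(candidates, key=sum_sq_dev)
print(x_bar, best, sum_sq_dev(x_bar))            # 2.5 2.5 11.5
```

The grid search only illustrates the property; algebraically $\sum_i (x_i - a)^2 = \sum_i (x_i - \bar{x})^2 + n(a - \bar{x})^2$, which is smallest exactly at $a = \bar{x}$.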

2. Property (Balancing Point):

● where $x_1, x_2, \dots, x_n$ are the observed values of the sample.

2. Property of the Mean (Balancing Point)

3. Property of the Mean (Minimization)

4. Weighted Mean (from Frequency Distribution)

If values $x_i$ have relative frequencies $p_i$: $\bar{x} = \sum_i x_i p_i$

For an interval of length $L$ with relative frequency $f$:
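A sketch assuming this line introduces the usual histogram convention: bar height (density) = $f / L$, so that each bar's area equals its relative frequency. The class bounds and frequencies below are hypothetical, chosen with unequal widths so the division by $L$ matters.

```python
# Assumption: histogram height = relative frequency / interval length (density).
intervals = [(0, 10), (10, 20), (20, 40)]   # hypothetical class bounds
rel_freq = [0.2, 0.3, 0.5]                  # hypothetical relative frequencies f

densities = []
for (lo, hi), f in zip(intervals, rel_freq):
    length = hi - lo                 # L = interval length
    densities.append(f / length)     # height = f / L

print(densities)

# Total area (height * length) recovers the total relative frequency, 1.
area = sum(d * (hi - lo) for d, (lo, hi) in zip(densities, intervals))
print(area)
```

Without dividing by $L$, the wide class (20, 40) would look taller than it should, since it collects observations over twice the range.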

● 0 (30%), 1 (20%), 2 (30%), 3 (10%), 4 (10%).

$\bar{x} = (0 \cdot 0.30) + (1 \cdot 0.20) + (2 \cdot 0.30) + (3 \cdot 0.10) + (4 \cdot 0.10) = 1.5$

7. Median (Raw Data)

8. Median (Frequency Distribution – Cumulative Relative Frequency)

● Case 1: if no cumulative frequency equals 0.5 →

$\text{Median} = \text{first value where cumulative frequency} > 0.5$

● Case 2: if cumulative frequency = 0.5 at value $x_j$:

$\text{Median} = \dfrac{\text{lower bound of interval} + \text{upper bound of interval}}{2}$
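A sketch of both cases, assuming Case 2 means averaging $x_j$ with the next observed value (the interval between them). `median_from_freq` is a hypothetical helper name. Counts are accumulated as exact fractions so "equals 0.5" is tested reliably.

```python
import statistics
from fractions import Fraction


def median_from_freq(values, counts):
    """Median of a discrete frequency distribution; values sorted ascending.

    Case 1: no cumulative relative frequency equals 0.5
            -> first value whose cumulative relative frequency exceeds 0.5.
    Case 2: cumulative relative frequency equals 0.5 at x_j
            -> (x_j + next observed value) / 2   (assumed reading of the notes)
    """
    n = sum(counts)
    if n <= 0:
        raise ValueError("counts must sum to a positive total")
    half = Fraction(1, 2)
    cumulative = Fraction(0)
    for j, (x, c) in enumerate(zip(values, counts)):
        cumulative += Fraction(c, n)        # exact arithmetic: no 0.4999... issues
        if cumulative == half:              # Case 2
            nxt = next(v for v, k in zip(values[j + 1:], counts[j + 1:]) if k > 0)
            return (x + nxt) / 2
        if cumulative > half:               # Case 1
            return x
    raise ValueError("cumulative frequency never exceeded 0.5")


# Case 2: values 0..4 with counts 60, 40, 60, 20, 20 (cumulative 0.30, 0.50, ...)
print(median_from_freq([0, 1, 2, 3, 4], [60, 40, 60, 20, 20]))   # 1.5
# Case 1: hypothetical counts, cumulative 0.2, 0.7, 1.0
print(median_from_freq([1, 2, 3], [2, 5, 3]))                    # 2

# Cross-check against the raw-data median of the expanded observations.
expanded = [v for v, c in zip([0, 1, 2, 3, 4], [60, 40, 60, 20, 20]) for _ in range(c)]
assert statistics.median(expanded) == 1.5
```

In Case 2 exactly half of the observations lie at or below $x_j$, so the two middle observations are $x_j$ and the next value, which is why the rule averages them.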

Lecture 3: Measures of location

● Location of a value in the sequence of ordered observations → ordinal and

QUARTILES FROM A FREQUENCY DISTRIBUTION

Value    Relative frequency    Cumulative frequency

Height of the box ($Q_3 - Q_1$) = INTERQUARTILE RANGE (IR)

● Interquartile range: IR = upper quartile − lower quartile

● The lower whisker connects the lower quartile to the minimum value

● The upper whisker connects the upper quartile to the maximum value
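A sketch of quartiles from a frequency distribution and the box-plot quantities. It assumes the quartiles follow the same cumulative-frequency rule as the median, with 0.25 and 0.75 in place of 0.5; check this against the course's own convention, since several quartile definitions exist. `quantile_from_freq` is a hypothetical name; the data are the 0 to 4 frequency table from the mean section.

```python
from fractions import Fraction


def quantile_from_freq(values, counts, p):
    """p-quantile of a discrete frequency distribution (values sorted ascending).

    Assumed rule, mirroring the median rule: first value whose cumulative
    relative frequency exceeds p; if it equals p exactly at x_j, average x_j
    with the next observed value.
    """
    n = sum(counts)
    if n <= 0 or not 0 < p < 1:
        raise ValueError("need a positive total and 0 < p < 1")
    cumulative = Fraction(0)
    for j, (x, c) in enumerate(zip(values, counts)):
        cumulative += Fraction(c, n)
        if cumulative == p:
            nxt = next(v for v, k in zip(values[j + 1:], counts[j + 1:]) if k > 0)
            return (x + nxt) / 2
        if cumulative > p:
            return x
    raise ValueError("unreachable for valid input")


values = [0, 1, 2, 3, 4]
counts = [60, 40, 60, 20, 20]        # relative frequencies 0.30, 0.20, 0.30, 0.10, 0.10

q1 = quantile_from_freq(values, counts, Fraction(1, 4))   # lower quartile
q2 = quantile_from_freq(values, counts, Fraction(1, 2))   # median
q3 = quantile_from_freq(values, counts, Fraction(3, 4))   # upper quartile
iqr = q3 - q1                                             # IR = Q3 - Q1 (box height)
print("min, Q1, median, Q3, max:", min(values), q1, q2, q3, max(values))
print("IQR:", iqr)
```

Here the box runs from $Q_1 = 0$ to $Q_3 = 2$ (IR = 2) and the whiskers reach the minimum 0 and maximum 4.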

IQR (Interquartile range)

Variance and Standard Deviation

Deviation = value - mean
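A sketch of variance and standard deviation on the raw example data [1, 3, 1, 2, 5, 3]. These notes do not say whether the course divides by $n$ or by $n - 1$, so both are shown; the standard library's `statistics.pvariance` (divisor $n$) and `statistics.variance` (divisor $n - 1$) are used as a cross-check.

```python
import math
import statistics

data = [1, 3, 1, 2, 5, 3]                    # raw example data from the median section
n = len(data)
x_bar = sum(data) / n                        # mean = 2.5
deviations = [x - x_bar for x in data]       # deviation = value - mean
ss = sum(d ** 2 for d in deviations)         # sum of squared deviations = 11.5

var_n = ss / n             # divisor n
var_n1 = ss / (n - 1)      # divisor n - 1 ("sample" variance)
sd_n, sd_n1 = math.sqrt(var_n), math.sqrt(var_n1)   # SD = square root of variance

# The standard library implements both conventions.
assert math.isclose(var_n, statistics.pvariance(data))
assert math.isclose(var_n1, statistics.variance(data))
print(round(var_n, 4), round(sd_n, 4), var_n1, round(sd_n1, 4))
```

Squaring the deviations stops positive and negative ones from cancelling (their plain sum is always 0); the square root brings the result back to the variable's original units.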

● MACRO data - aggregate level

● CROSS-SECTIONAL data - only once, at a fixed time

● LONGITUDINAL data - collected at different periods in time:

★ Primary data - the one concluded by scientists

● categorical - not numbers

● Statistics in everyday life: Numbers dominate society; statistics appear everywhere.

● Statistical variable = result of measurement process; each question → variable

● Best when few distinct values

● Used for numerical variables with many distinct values

● Purpose: condense data into one typical value

● Most frequent value (highest frequency)

● Case 1: Cumulative frequency ≠ 0.5

● Case 2: Cumulative frequency = 0.5

● For continuous/grouped data shown in histograms:

● Mean (Arithmetic average) = sum of all values ÷ number of observations

● Deviation of observation 𝑥𝑖:

1. Balancing Point:

● 𝑥𝑖 = distinct value of the variable

● 𝑓𝑖 = absolute frequency (number of times xi appears)

● 𝑛 = total sample size

● The mean from a frequency distribution is a weighted average.

● Mean: considers all values

● Definition: unexpectedly high or low values with low frequency

○ Weighted mean: xˉ=Σ(value×relativefrequency)

○ Minimizes Σ(xi–a)2 at a=xˉ

2. Property (Balancing Point):

● where 𝑥1, 𝑥2, …, 𝑥𝑛 are the observed values of the sample.

● 0 (30%), 1 (20%), 2 (30%), 3 (10%), 4 (10%).

● Case 1: if no cumulative frequency = 0.5 →

● Case 2: if cumulative frequency = 0.5 at value xj:

● Location of a value in the sequence of orderer observations -> ordinal and

● Interquartile range IR -> upper quartile – lower quartile

● lower whisker connects the lower quartile to the minimum value

● upper whisker connects the upper quartile to the maximum value

● Coefficient of variation (CV) = standard deviation expressed as a percentage of the mean

● Variance / standard deviation → use to compare variability of variables with the same mean
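A minimal CV sketch, assuming the divisor-$n$ standard deviation (`statistics.pstdev`) and a variable with positive mean. The two hypothetical variables measure the same quantities on scales that differ by a factor of 10: their standard deviations differ, but their CVs coincide, which is why CV is the tool when means differ.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = standard deviation / mean * 100, for a variable with positive mean.
    Uses the divisor-n standard deviation (statistics.pstdev): an assumption."""
    m = statistics.mean(values)
    if m <= 0:
        raise ValueError("CV is used here only for variables with a positive mean")
    return statistics.pstdev(values) / m * 100


# Hypothetical: the same measurements on two scales (b = 10 * a).
a = [10, 12, 14]
b = [100, 120, 140]
cv_a, cv_b = coefficient_of_variation(a), coefficient_of_variation(b)
print(statistics.pstdev(a), statistics.pstdev(b))   # SDs differ by a factor of 10
print(round(cv_a, 1), round(cv_b, 1))              # both about 13.6 (%)
```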

● Applies to numerical variables

● Symmetric: left and right sides mirror each other

● Right-skewed (positively skewed):

○ Mean > Median.

● Left-skewed (negatively skewed):

○ Mean < Median.

● Symmetric distribution → mean = median
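The mean-versus-median rule above, sketched as a small classifier. The three datasets are hypothetical: one with a long right tail, one with a long left tail, and one symmetric. Treat the comparison as a rule of thumb for reading a distribution's shape.

```python
import statistics


def skew_direction(values):
    """Classify shape with the mean-vs-median rule of thumb from the notes."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right-skewed (positive)"
    if mean < median:
        return "left-skewed (negative)"
    return "symmetric (mean = median)"


print(skew_direction([1, 2, 2, 3, 3, 3, 10]))    # mean 3.43 > median 3
print(skew_direction([-10, 1, 1, 2, 2, 2, 3]))   # mean 0.14 < median 2
print(skew_direction([1, 2, 3, 4, 5]))           # mean = median = 3
```

A single extreme value in the tail pulls the mean towards it, while the median barely moves; that asymmetry is what the rule detects.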

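● Minimum concentration (equal distribution): all units hold the same amount

● Maximum concentration: one unit holds everything (160), the others zero

1. Concentration curve

● $Q$ → proportion of the variable's total amount held by the bottom $F$ proportion of units

● Point (0.25, 0.125) → the bottom 25% of units hold 12.5% of the total amount

● The closer the curve is to the diagonal line, the more equal the distribution (on the diagonal it is perfectly equal).

● Values closer to 0 → lower concentration (more equal)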
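A sketch of the concentration curve points $(F_i, Q_i)$ and a concentration index. The notes do not give the index formula here; this assumes one common discrete form of Gini's concentration ratio, $R = \sum_{i=1}^{n-1}(F_i - Q_i) / \sum_{i=1}^{n-1} F_i$, which is 0 for an equal distribution and 1 when one unit holds everything. The amounts (total 160) are hypothetical; the third set is chosen so the first curve point is (0.25, 0.125).

```python
def concentration_curve(amounts):
    """Points (F_i, Q_i): F_i = share of units, Q_i = share of the total held
    by the bottom F_i of units (amounts sorted ascending). Needs total > 0."""
    x = sorted(amounts)
    n, total = len(x), sum(x)
    points = [(0.0, 0.0)]
    running = 0
    for i, v in enumerate(x, start=1):
        running += v
        points.append((i / n, running / total))
    return points


def gini_ratio(amounts):
    """Assumed discrete concentration ratio: sum(F_i - Q_i) / sum(F_i), i = 1..n-1."""
    pts = concentration_curve(amounts)[1:-1]   # drop (0, 0) and (1, 1)
    num = sum(F - Q for F, Q in pts)
    den = sum(F for F, _ in pts)
    return num / den


print(gini_ratio([40, 40, 40, 40]))        # equal distribution -> 0.0
print(gini_ratio([0, 0, 0, 160]))          # one unit holds all -> 1.0
print(concentration_curve([20, 40, 40, 60]))
print(gini_ratio([20, 40, 40, 60]))        # 0.25
```

$F_i - Q_i$ is the vertical gap between the diagonal and the curve, so the index grows as the curve bends away from the line of equality.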

● Does one variable depend on the other?

● Dependent variable → measures what we want to explain

● Both variables (dependent & independent) → categorical

● Rows = values of one variable

● Streaming (dependent variable): Light, Moderate, Heavy

● Each cell can be expressed as a relative frequency of the total sample (cell count ÷ sample size)

● Independent = Subscription (Free, Standard, Premium)

○ CFD (conditional frequency distribution) of Streaming given Subscription = Standard

○ CFD of Streaming given Subscription = Premium

○ Free users: mostly Light (0.56)

● Association between Streaming and Subscription → Streaming depends on
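A sketch of joint and conditional frequency distributions for a Subscription × Streaming contingency table. The counts are hypothetical (NOT the lecture's data) and were picked so each row totals 100, which keeps the arithmetic easy to follow.

```python
# Hypothetical counts: rows = Subscription (independent), columns = Streaming (dependent).
streaming_levels = ["Light", "Moderate", "Heavy"]
table = {
    "Free":     [50, 30, 20],
    "Standard": [30, 40, 30],
    "Premium":  [10, 30, 60],
}
n = sum(sum(row) for row in table.values())   # total sample size

# Joint relative frequencies: each cell divided by the total sample size.
joint = {sub: [c / n for c in row] for sub, row in table.items()}

# CFD of Streaming given each Subscription: divide each cell by its row total,
# so every conditional distribution sums to 1.
cfd = {sub: [c / sum(row) for c in row] for sub, row in table.items()}

for sub, dist in cfd.items():
    print(sub, dict(zip(streaming_levels, (round(p, 2) for p in dist))))

# If all CFDs were identical, Streaming would not depend on Subscription;
# here they differ, which points to an association.
identical = all(dist == cfd["Free"] for dist in cfd.values())
print("Identical CFDs:", identical)   # False
```

Comparing the rows of conditional frequencies (not raw counts) is what makes groups of different sizes comparable.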

● Dependent variable: numerical (e.g., streaming minutes/day)

○ X-axis → independent variable

● Positive: high values of one variable go with high values of the other

● Positive covariance → positive linear association

● Points in quadrants I and III → positive contribution to the covariance

● Majority of points in quadrants I and III → positive linear association

● Majority of points in quadrants II and IV → negative linear association

● Direction of the linear association

● r = 0 → no linear correlation (linear
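A sketch of covariance and Pearson's correlation on hypothetical data. The covariance uses divisor $n$ (an assumption; with $n - 1$ everywhere, $r$ is unchanged because the divisors cancel). Each product of deviations is positive in quadrants I and III around $(\bar{x}, \bar{y})$ and negative in II and IV, which is where the sign of the covariance comes from.

```python
import math


def covariance(x, y):
    """Covariance with divisor n (assumption): mean of products of deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # (x_i - mx)(y_i - my) > 0 in quadrants I and III, < 0 in II and IV
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n


def correlation(x, y):
    """Pearson r = cov(x, y) / (sd_x * sd_y); always between -1 and 1."""
    sx = math.sqrt(covariance(x, x))   # cov(x, x) is the variance of x
    sy = math.sqrt(covariance(y, y))
    return covariance(x, y) / (sx * sy)


x = [1, 2, 3, 4, 5]      # hypothetical data
y = [2, 4, 5, 4, 5]
print(covariance(x, y), round(correlation(x, y), 2))   # 1.2 0.77
```

Unlike the covariance, $r$ does not depend on the units of measurement, so it also measures the strength of the linear association, not only its direction.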

● Experiment = process with uncertain outcome

● Random variable (X) → function that associates a number with each outcome of the experiment

○ Discrete: finite or countable set of possible values (e.g., number of heads in coin tosses)
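A sketch of a discrete random variable, assuming a fair coin tossed twice so the four outcomes are equally likely. The function `num_heads` plays the role of X: it maps each outcome of the experiment to a number, and counting outcomes per value gives the probability distribution of X.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Experiment: toss a fair coin twice; the 4 outcomes are equally likely (assumption).
outcomes = list(product("HT", repeat=2))     # ('H','H'), ('H','T'), ('T','H'), ('T','T')


def num_heads(outcome):
    """Random variable X: associates the number of heads with each outcome."""
    return outcome.count("H")


counts = Counter(num_heads(o) for o in outcomes)
pmf = {k: Fraction(v, len(outcomes)) for k, v in sorted(counts.items())}
print(pmf)   # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

X takes only the values 0, 1, 2 (a finite set), which is what makes it discrete.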