Unsupervised Learning Training

Module 1: Principal Component Analysis

Nicola Gnecco

18–20 June, 2024

Ethiopian Statistical Association

Module 1: Principal component analysis
Module 2: Autoencoders and Variational Autoencoders
Module 3: Clustering
Unsupervised learning

- In supervised learning we are given features/predictors $X_1, \dots, X_p$ and a response $Y$. The goal is to predict $Y$ using $X_1, \dots, X_p$.
  - Examples: linear regression, logistic regression, etc.
- In unsupervised learning we have only a set of features $X_1, \dots, X_p$ and no response, and we develop statistical methods for this setting.
- The goal is to do exploratory data analysis and to discover interesting patterns in the measurements on $X_1, \dots, X_p$.
  - Is there an informative way to visualize the data?
  - Can we discover subgroups among the variables or among the observations? (Clustering)
  - Can we reduce the dimensionality? (PCA, autoencoders, variational autoencoders)
- In this lecture we will discuss one of the most important tools for dimension reduction: principal component analysis.
PCA: some intuition

- Let $x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, be samples of some centered $p$-dimensional random vector $X \in \mathbb{R}^p$.
- Principal component analysis (PCA) aims at finding $q \le p$ directions $w_1^*, \dots, w_q^* \in \mathbb{R}^p$, called principal components, such that the data can be well represented in the lower-dimensional subspace spanned by these directions. In particular, for each observation,
  $$x_i \approx z_{i1} w_1^* + \dots + z_{iq} w_q^*$$
  for some coefficients $z_{i1}, \dots, z_{iq} \in \mathbb{R}$.
- The directions $w_1^*, \dots, w_q^*$ are common to all observations; only the coefficients $z_{i1}, \dots, z_{iq}$ depend on $i$.
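
As a rough illustration of the approximation $x_i \approx z_{i1} w_1^* + \dots + z_{iq} w_q^*$, the following NumPy sketch computes directions and coefficients for a small random dataset. It assumes the directions are obtained from a singular value decomposition of the centered data matrix (one standard way to compute PCA, not something the slides have stated at this point); the function name `pca_approximation` and the data are hypothetical, and observations are stored as rows.

```python
import numpy as np

def pca_approximation(X, q):
    """Rank-q PCA approximation of the rows of a centered (n, p) array X.

    Returns W (p, q) with the directions as columns, Z (n, q) with the
    coefficients z_i1, ..., z_iq in row i, and the approximation Z @ W.T.
    """
    # Rows of Vt are orthonormal directions sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    W = Vt[:q].T   # directions, shared by all observations
    Z = X @ W      # coefficients, specific to each observation
    return W, Z, Z @ W.T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # hypothetical data
X = X - X.mean(axis=0)                                   # center the samples
W, Z, X_hat = pca_approximation(X, q=2)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(W.shape, Z.shape, round(float(rel_error), 3))
```

The same `W` is used for every row of `X`, while each row gets its own row of `Z`, which mirrors the split between common directions and observation-specific coefficients.
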
Example

Annotated example with $p = 200$, $n = 1\text{M}$ (one million) observations, and $q = 5$ directions, so that $x_i \approx z_{i1} w_1 + z_{i2} w_2 + \dots + z_{i5} w_5$:

- Original data: $p \times n = 200 \times 1\text{M} = 200\text{M}$ numbers.
- Compressed data: $p \times q + q \times n = 200 \times 5 + 5 \times 1\text{M} = 1{,}000 + 5\text{M}$ numbers.
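
The storage count behind this kind of example can be sketched in a few lines. The raw data needs $p \times n$ numbers, while the compressed representation needs $q$ directions in $\mathbb{R}^p$ plus $q$ coefficients per observation. The values $p = 200$, $n = 1{,}000{,}000$, $q = 5$ are taken as read from the handwritten example, and the function name `storage_counts` is hypothetical.

```python
def storage_counts(p, n, q):
    """Numbers to store for the raw data versus a q-direction approximation."""
    original = p * n              # every coordinate of every observation
    compressed = p * q + q * n    # q directions in R^p, plus q coefficients per observation
    return original, compressed

original, compressed = storage_counts(p=200, n=1_000_000, q=5)
print(original, compressed, original / compressed)
```

The directions cost only $p \times q$ numbers once, so for large $n$ the saving is driven almost entirely by the ratio $p / q$.
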
Application 1: Data compression

- One common application of PCA is image compression.
- We can try to reduce the dimensionality of complex images.
- Consider the following pictures taken from a dataset of more than 10,000 faces.

Annotation: $n = 400$ images of $92 \times 112$ pixels each, so $p = 92 \times 112 = 10{,}304$ and $x_1, \dots, x_n \in \mathbb{R}^{10{,}304}$. Original data: need to save $n \times p = 400 \times 10{,}304$ numbers.
Application 1: Data compression

- Each image is represented as a $92 \times 112$ matrix of pixels, which can be unfolded into a $10{,}304 \times 1$ vector. (The printed slide reads $20 \times 20$ and $400 \times 1$; the handwritten correction is $92 \times 112 = 10{,}304$.)
- With PCA, we can compress each image and represent it as a linear combination of only $q = 16$ basis images (also known as principal components):
  $$x_i \approx z_{i1} w_1 + z_{i2} w_2 + \dots + z_{i,16} w_{16}, \qquad w_1, \dots, w_{16} \in \mathbb{R}^{10{,}304}.$$
Application 1: Data compression

- Here are the resulting compressed (i.e., approximated) pictures for different numbers of basis images used, $q = 2, 5, 10, 30, 100, 200$.

Annotation (number of coordinates): $2 \times 6$, $5 \times 6$, $10 \times 6$, ..., $200 \times 6$.
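
To get a feel for how the approximation improves as more basis images are used, here is a small sketch on synthetic data. It does not use the face dataset from the slides: the "images" are random low-rank arrays plus noise, and the sizes, the noise level, and the function name `relative_errors` are all hypothetical. It relies on the fact that the squared error of the best rank-$q$ approximation equals the sum of the discarded squared singular values.

```python
import numpy as np

def relative_errors(X, qs):
    """Relative squared error of the rank-q PCA approximation, for each q in qs."""
    _, s, _ = np.linalg.svd(X, full_matrices=False)
    total = np.sum(s**2)
    # Error of the best rank-q approximation = sum of the discarded s_k^2.
    return {q: float(np.sum(s[q:]**2) / total) for q in qs}

rng = np.random.default_rng(1)
n, p = 60, 400                     # hypothetical: 60 small "images" with 400 pixels each
X = rng.normal(size=(n, 10)) @ rng.normal(size=(10, p))  # synthetic rank-10 signal
X += 0.1 * rng.normal(size=(n, p))                       # a little pixel noise
X -= X.mean(axis=0)                                      # center each pixel

errs = relative_errors(X, (2, 5, 10, 30))
for q, err in errs.items():
    print(f"q={q:3d}  relative squared error={err:.4f}")
```

On data like this the error drops quickly until $q$ reaches the number of underlying components and only slowly afterwards; real images typically show a more gradual decay.
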
Application 2: Data interpretation

- Consider another problem. We take the US interest rate curves (also known as yield curves) from September 2009 to May 2015, on a daily basis.
- On each of the 1480 days, we observe a snapshot of the yield curve on that day, with interest rates that span maturities from 3 months to 30 years.
- Each observation is a curve on a given day, made of 16 maturities, i.e., a $16 \times 1$ vector.
- To understand the dynamics of the interest rates, we plot all the curves (one per day) on the same graph.
Application 2: Data interpretation

[Figure: all daily yield curves plotted on the same graph. Annotation: $x_i \approx z_{i1} w_1 + z_{i2} w_2 + z_{i3} w_3$.]
Application 2: Data interpretation

- Interestingly, we can explain the curve moves as a linear combination of the first three principal components (PCs).
- In this case, these principal components have a nice interpretation.
- The first PC represents the level of the curve, i.e., all the maturities move up (or down) in parallel.
- The second PC represents the slope of the curve, i.e., long-term maturities move more (or less) than short-term maturities.
- The third PC represents the curvature of the yield curve, i.e., mid-term maturities move more (or less) than long-term and short-term maturities.
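
The level, slope, and curvature reading can be illustrated on simulated curves. The sketch below does not use the US yield curve data: it builds 1480 synthetic curves on a hypothetical grid of 16 maturities from three hand-made shapes (a parallel shift, a tilt, and a hump) plus small noise, then checks how much variance the first three PCs capture. All shapes, scales, and the noise level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
maturities = np.linspace(0.25, 30.0, 16)   # hypothetical grid, in years
t = (maturities - maturities.min()) / (maturities.max() - maturities.min())

# Hypothetical building blocks: parallel shift, tilt, and hump.
level = np.ones(16)
slope = t - t.mean()
curvature = -(t - 0.5) ** 2
curvature -= curvature.mean()

n_days = 1480
factors = rng.normal(size=(n_days, 3)) * np.array([1.0, 0.5, 0.25])
curves = factors @ np.vstack([level, slope, curvature])
curves += 0.01 * rng.normal(size=curves.shape)   # small measurement noise
curves -= curves.mean(axis=0)                    # center each maturity

_, s, Vt = np.linalg.svd(curves, full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance share of the first three PCs:", explained[:3].sum())
print("first PC (roughly constant across maturities, up to sign):", Vt[0].round(2))
```

Because the simulated curves are driven by three shapes by construction, this only shows how PCA recovers such a structure when it is present; it says nothing about the real data beyond what the slides report.
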
Application 2: Data interpretation

[Figure: the first three principal components, labeled average ($w_1$), slope ($w_2$), and curvature ($w_3$). Annotation: $x_i \approx z_{i1} w_1 + z_{i2} w_2 + z_{i3} w_3$.]
Application 2: Data interpretation
- The approximated curves look very similar to the original ones.
Principal component analysis (PCA): Motivation

- Consider a dataset $x_1, \dots, x_n$ of $n$ iid copies of $X \in \mathbb{R}^p$, where we assume that $\mathbb{E}X = 0$.
- Usually, the data does not ‘fill up’ the whole $p$-dimensional space; images, for instance, live in a lower-dimensional space.

Example: Consider a two-dimensional dataset, $x_i \in \mathbb{R}^2$, $i = 1, \dots, n$, where most of the observations lie close to a line (a one-dimensional subspace of $\mathbb{R}^2$).
Principal component analysis (PCA): Motivation

[Figure: scatter plot of the observations in the $(x_1, x_2)$ plane, lying close to a line. Annotation: $x_i \approx z_{i1} w_1$.]
Principal component analysis (PCA): Motivation

- In this situation, we can describe the observations quite well by just looking at their coefficients along the gray line.
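
A minimal sketch of this two-dimensional situation, on simulated points: the data are generated close to a hypothetical line, the dominant direction is estimated as the top eigenvector of the sample covariance matrix (an assumed computational route, not one stated on the slide), and each point is then summarized by a single coefficient along that direction.

```python
import numpy as np

rng = np.random.default_rng(3)
direction = np.array([2.0, 1.0]) / np.sqrt(5.0)   # hypothetical "gray line"
z_true = rng.normal(scale=3.0, size=200)           # position along the line
X = np.outer(z_true, direction) + 0.2 * rng.normal(size=(200, 2))
X -= X.mean(axis=0)                                # center the observations

# Eigenvectors of the sample covariance; eigh returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(X.T @ X / len(X))
w1 = eigvecs[:, -1]          # direction with the largest variance
z = X @ w1                   # one coefficient per observation
X_line = np.outer(z, w1)     # points described by the coefficient alone

print("estimated direction (up to sign):", w1)
print("share of variance along the line:", eigvals[-1] / eigvals.sum())
```

Each observation is reduced from two numbers to one (`z`), at the cost of the small off-line component that `X_line` ignores.
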
Principal component analysis (PCA): Goal

The goal is to
- Find the directions (vectors) $w$ that best ‘approximate’ the data.
- Find the coordinates $z_i$, $i = 1, \dots, n$, of each data point with respect to these directions.

Some directions clearly approximate the dataset better than others. Consider the two alternatives.

[Figure: scatter plot of the observations in the plane, shown with two alternative candidate directions.]
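
One way to make "better direction" concrete is to compare the average squared distance between the points and their approximations along each candidate. The sketch below assumes each point is approximated by its orthogonal projection onto the candidate direction and uses simulated data; the two candidate vectors and the function name `mean_squared_error` are hypothetical.

```python
import numpy as np

def mean_squared_error(X, w):
    """Average squared distance between the rows of X and their projections on span(w)."""
    w = w / np.linalg.norm(w)
    residual = X - np.outer(X @ w, w)
    return float(np.mean(np.sum(residual**2, axis=1)))

rng = np.random.default_rng(4)
X = np.outer(rng.normal(scale=3.0, size=200), [1.0, 1.0])
X += 0.3 * rng.normal(size=(200, 2))
X -= X.mean(axis=0)

along = np.array([1.0, 1.0])    # hypothetical candidate following the cloud
across = np.array([1.0, -1.0])  # hypothetical candidate orthogonal to it
print(mean_squared_error(X, along), mean_squared_error(X, across))
```

The candidate aligned with the point cloud leaves a much smaller error, which is the sense in which it approximates the dataset better.
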
How to approximate a data point

- Consider a data point (observation) $x$. Given a vector $w$ we can always write
  $$x = \underbrace{z w}_{\text{approximation}} + \underbrace{(x - z w)}_{\text{error}},$$
  where $z \in \mathbb{R}$. The goal is to choose $z$ to make the approximation as good as possible or, equivalently, to make the error as small as possible.
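
If the size of the error is measured by the squared Euclidean norm (an assumption, since the norm is not specified at this point) and $w \neq 0$, the best $z$ follows from a one-variable minimization:

```latex
\begin{aligned}
f(z) &= \lVert x - z w \rVert^2
      = \lVert x \rVert^2 - 2 z\, w^\top x + z^2\, w^\top w, \\
f'(z) &= -2\, w^\top x + 2 z\, w^\top w = 0
\quad\Longrightarrow\quad
z^* = \frac{w^\top x}{w^\top w}.
\end{aligned}
```

For a unit vector $w$ this reduces to $z^* = w^\top x$, and the resulting error $x - z^* w$ is orthogonal to $w$.
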
How to approximate a data point