Principal Component Analysis
Sayan Sikder
Outline
What is Principal Component Analysis?
New axes
Principal Components
Principle and properties
Computing the components
Covariance
• Variance and covariance are measures of the “spread” of a set of points around their center of mass (mean).
• Variance – a measure of the deviation from the mean for points in one dimension, e.g. heights.
• Covariance – a measure of how much each of the dimensions varies from the mean with respect to the others.
• Covariance is measured between 2 dimensions to see if there is a relationship between the 2 dimensions, e.g. number of hours
Calculating Covariance
Hence, if you had a 3-dimensional data set (x, y, z), then you could measure the covariance between the x and y dimensions, the y and z dimensions, and the x and z dimensions. Measuring the covariance between x and x, or y and y, or z and z would give you the variance of the x, y and z dimensions respectively.
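The statements above can be written out as formulas. This is a sketch that assumes the sample covariance with divisor N - 1 (the slides do not say whether N or N - 1 is used); the symbols N, x_i, \bar{x} and C are notation introduced here for illustration.

```latex
\operatorname{cov}(x, y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
\operatorname{cov}(x, x) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 = \operatorname{var}(x)

C =
\begin{pmatrix}
\operatorname{cov}(x, x) & \operatorname{cov}(x, y) & \operatorname{cov}(x, z) \\
\operatorname{cov}(y, x) & \operatorname{cov}(y, y) & \operatorname{cov}(y, z) \\
\operatorname{cov}(z, x) & \operatorname{cov}(z, y) & \operatorname{cov}(z, z)
\end{pmatrix}
```

Because cov(x, y) = cov(y, x), the matrix C is symmetric, and its diagonal holds the variances of the x, y and z dimensions.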
An example
Problem: Given the following data, reduce the dimension from 2 to 1 using PCA.
Data:
Features          France   Germany   Brazil   England
Goals scored      4        8         13       7
Goals conceded    11       4         5        14
No. of features, n: 2
No. of samples, N: 4
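The first computations for this data set can be sketched in Python with NumPy. This is a minimal sketch, not necessarily the procedure on the original slides: it assumes the sample covariance (divisor N - 1), and the variable names are chosen here for illustration.

```python
import numpy as np

# Rows are features, columns are samples (France, Germany, Brazil, England).
X = np.array([
    [4.0, 8.0, 13.0, 7.0],    # goals scored
    [11.0, 4.0, 5.0, 14.0],   # goals conceded
])

# Mean of each feature, kept as a column so it broadcasts over the samples.
mean = X.mean(axis=1, keepdims=True)

# Covariance matrix of the centered data (assumption: divisor N - 1).
centered = X - mean
S = centered @ centered.T / (X.shape[1] - 1)

# Eigen-decomposition of the symmetric matrix S.
# np.linalg.eigh returns eigenvalues in ascending order, so reverse them.
eigenvalues, eigenvectors = np.linalg.eigh(S)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print("mean:", mean.ravel())
print("covariance matrix:\n", S)
print("eigenvalues (largest first):", eigenvalues)
```

Under the N - 1 assumption the covariance matrix is [[14, -11], [-11, 23]], and the larger of its two eigenvalues belongs to the first principal component.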
An example (contd.)
t is the scaling factor that parameterizes all possible eigenvectors for a given eigenvalue: eigenvectors are only defined up to a multiplicative constant.
Setting t = 1 is purely for simplicity.
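The role of t can be checked numerically. The sketch below assumes the covariance matrix [[14, -11], [-11, 23]] (the goals data with divisor N - 1) and one possible parametrization of the eigenvector; the parametrization on the original slide is not recoverable, and `eigenvector` is a hypothetical helper introduced for illustration.

```python
import numpy as np

# Covariance matrix of the goals data, assuming the sample (N - 1) divisor.
S = np.array([[14.0, -11.0],
              [-11.0, 23.0]])

# Largest root of the characteristic equation lambda^2 - 37*lambda + 201 = 0
# (37 is the trace of S, 201 its determinant).
lam = (37 + np.sqrt(565)) / 2


def eigenvector(t):
    """Solution of (S - lam*I) u = 0, scaled by t (hypothetical helper)."""
    # The first row gives (14 - lam) * u1 - 11 * u2 = 0,
    # which u = t * (11, 14 - lam) satisfies for every t.
    return t * np.array([11.0, 14.0 - lam])


# Any nonzero t yields a valid eigenvector for the same eigenvalue.
for t in (1.0, -2.5, 10.0):
    u = eigenvector(t)
    assert np.allclose(S @ u, lam * u)

# Normalizing to unit length removes the dependence on the size of t.
u = eigenvector(1.0)
e1 = u / np.linalg.norm(u)
print("unit eigenvector:", e1)
```

Whatever value of t is chosen, dividing by the vector's length gives the same unit eigenvector up to sign, which is why t = 1 is a harmless choice.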
An example (contd.)
Step 4: Derive the new dataset
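Deriving the new dataset amounts to projecting each centered sample onto the first principal component. A minimal sketch, assuming the N - 1 covariance divisor and fixing the eigenvector's sign so that its first entry is positive (the sign is arbitrary, and the slides may use the opposite one):

```python
import numpy as np

# Rows are features, columns are samples (France, Germany, Brazil, England).
X = np.array([[4.0, 8.0, 13.0, 7.0],     # goals scored
              [11.0, 4.0, 5.0, 14.0]])   # goals conceded

centered = X - X.mean(axis=1, keepdims=True)
S = centered @ centered.T / (X.shape[1] - 1)   # assumption: divisor N - 1

# eigh sorts eigenvalues in ascending order, so the last column is the
# unit eigenvector of the largest eigenvalue (first principal component).
_, eigenvectors = np.linalg.eigh(S)
e1 = eigenvectors[:, -1]

# Sign convention chosen here for reproducibility: first entry positive.
if e1[0] < 0:
    e1 = -e1

# One number per sample: the 2-D data reduced to 1-D.
projected = e1 @ centered
for country, value in zip(["France", "Germany", "Brazil", "England"], projected):
    print(f"{country}: {value:.4f}")
```

With this sign convention the four samples map to approximately -4.305, 3.736, 5.693 and -5.124; flipping the eigenvector's sign flips all four values.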
Application: Image compression
• Divide the original 372x492 image into patches
• Each patch is an instance that contains 12x12 pixels on a grid
• View each patch as a 144-D vector
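The patch construction above can be sketched as follows. The original image is not available, so a random array of the same size stands in for it. The sketch assumes non-overlapping patches and a grayscale image, and the choice of k = 3 retained components is purely illustrative.

```python
import numpy as np

# Placeholder grayscale image with the dimensions from the slide
# (assumption: random pixels stand in for the real image).
rng = np.random.default_rng(0)
image = rng.random((372, 492))

patch = 12
rows = image.shape[0] // patch   # 31 patches down
cols = image.shape[1] // patch   # 41 patches across

# Cut into non-overlapping 12x12 patches; flatten each into a 144-D vector.
patches = (image.reshape(rows, patch, cols, patch)
                .swapaxes(1, 2)
                .reshape(rows * cols, patch * patch))

# PCA on the patch vectors: center, then eigen-decompose the 144x144 covariance.
mean_patch = patches.mean(axis=0)
centered = patches - mean_patch
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # ascending order

k = 3                                     # illustrative number of components
top = eigenvectors[:, ::-1][:, :k]        # k most important eigenvectors

# Compress: k coefficients per patch instead of 144 pixel values.
codes = centered @ top

# Decompress: approximate every patch from its k coefficients.
reconstructed = codes @ top.T + mean_patch
restored = (reconstructed.reshape(rows, cols, patch, patch)
                         .swapaxes(1, 2)
                         .reshape(image.shape))

print(patches.shape, codes.shape, restored.shape)
```

Storing `codes`, `top` and `mean_patch` instead of the raw patches is what yields the compression; a larger k trades storage for a more faithful reconstruction.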
Application: Image compression (contd.)
[Figure: the 3 most important eigenvectors, each displayed as a 12x12 patch]
Importance of PCA