
Principal Component Analysis

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms correlated features into fewer uncorrelated principal components while preserving maximum variance. It is used for noise removal, data visualization, and improving model performance. The PCA process involves standardizing data, computing the covariance matrix, and selecting principal components based on eigenvalues and eigenvectors.

Uploaded by

ggpshamli1989

Principal Component Analysis (PCA) is a technique used in data science and machine learning to reduce the number of features (dimensions) in a dataset while keeping as much important information (variance) as possible.

In simple terms:

PCA converts many variables into fewer new variables called Principal Components.

These components capture the maximum variance in the data.

Why PCA is Used


PCA is mainly used for:

1. Dimensionality Reduction
2. Noise Removal
3. Data Visualization (2D/3D)
4. Feature Extraction
5. Reducing Model Complexity

Example:

| Before PCA | After PCA |
| 50 features | 5 principal components |

Intuition of PCA
PCA finds new axes, each a linear combination of the original features, ordered by how much variance they capture.

• PC1 (Principal Component 1) → captures maximum variance
• PC2 → second highest variance
• PC3 → third highest variance

These axes are orthogonal (perpendicular) to each other.

Simple Example (Concept)


Suppose a dataset has two features:

• Height
• Weight

These are usually highly correlated.

Instead of using both, PCA creates a new feature:

PC1 = a weighted (linear) combination of height and weight

So instead of 2 variables → we can represent most of the information with 1 component.

Mathematical Idea (Concept)


PCA is based on:

• Covariance Matrix
• Eigenvectors
• Eigenvalues

Steps:

1. Standardize data
2. Compute covariance matrix
3. Calculate eigenvectors and eigenvalues
4. Select top components
5. Transform data
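
The five steps above can be sketched directly in NumPy. This is a minimal illustration rather than a production implementation: the function name `pca_from_scratch` and the random toy data are assumptions made for the example, not part of the original material.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Follow the five listed steps with plain NumPy (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    # 1. Standardize: zero mean and unit variance for every feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)
    # 3. Eigen-decomposition (eigh is meant for symmetric matrices)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue, largest first, and keep the top components
    order = np.argsort(eigenvalues)[::-1][:n_components]
    components = eigenvectors[:, order]
    # 5. Project the standardized data onto the kept components
    scores = X_std @ components
    explained_ratio = eigenvalues[order] / eigenvalues.sum()
    return scores, explained_ratio

# Hypothetical toy data: three features, the first two strongly correlated
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X_demo = np.column_stack([a,
                          2 * a + rng.normal(scale=0.1, size=200),
                          rng.normal(size=200)])

scores, ratio = pca_from_scratch(X_demo, n_components=2)
print(scores.shape)   # (200, 2)
print(ratio)          # share of variance captured by each kept component
```

Because the kept directions are eigenvectors of the covariance matrix, the resulting score columns are uncorrelated with each other.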

PCA Example Using Python


Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

Step 2: Load Dataset


data = load_iris()

X = data.data      # feature matrix, shape (150, 4)
y = data.target    # class labels (0, 1, 2)

Dataset features:
• sepal length
• sepal width
• petal length
• petal width

Step 3: Standardize Data


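PCA is sensitive to feature scale: features with larger numeric ranges dominate the variance, so each feature is standardized to zero mean and unit variance first.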

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Step 4: Apply PCA


Reduce 4 features → 2 components.

pca = PCA(n_components=2)

X_pca = pca.fit_transform(X_scaled)

Step 5: Check Explained Variance


print(pca.explained_variance_ratio_)

Example output (rounded):

[0.73 0.23]

Meaning:

• PC1 explains 73% of the variance
• PC2 explains 23% of the variance

Total ≈ 96% of the variance is preserved
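
The total can be computed rather than added by hand. The sketch below repeats the Iris steps in one self-contained block; the variable names `pca_2`, `ratios`, and `total_kept` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardize the four Iris features, then fit a 2-component PCA
X_scaled = StandardScaler().fit_transform(load_iris().data)
pca_2 = PCA(n_components=2).fit(X_scaled)

ratios = pca_2.explained_variance_ratio_   # share of variance per component
total_kept = ratios.sum()                  # share kept by both components together

print(np.round(ratios, 2))
print(round(float(total_kept), 2))
```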

Step 6: Visualize PCA


plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)  # color each point by species
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.show()

This converts 4D data → 2D visualization.
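
Going from 4 features to 2 discards part of the data. One way to see how much, sketched here as an illustration, is to map the 2D points back to 4D with `inverse_transform` and measure the reconstruction error. The names `X_restored`, `mse`, and `lost_fraction` are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)         # 4D -> 2D
X_restored = pca.inverse_transform(X_reduced)   # 2D -> approximate 4D

# Mean squared reconstruction error on the standardized data
mse = np.mean((X_scaled - X_restored) ** 2)
# Share of variance that the two kept components do not explain
lost_fraction = 1.0 - pca.explained_variance_ratio_.sum()

print(round(float(mse), 4))
print(round(float(lost_fraction), 4))
```

On standardized data each feature has unit variance, so the two printed numbers coincide: the reconstruction error is exactly the variance that was dropped.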


Real Example
Original Dataset:

| Height | Weight | Age | Salary |

After PCA:

| PC1 | PC2 |
| 1.4 | -0.5 |
| 0.9 | 0.2 |

Now instead of 4 variables → 2 components.

Advantages of PCA
• Reduces dimensionality
• Removes redundant (correlated) features
• Can improve model performance by reducing overfitting and training time
• Helps visualize high-dimensional data

Disadvantages of PCA
• Components are hard to interpret
• Some information is lost when components are dropped
• Captures only linear relationships between features

PCA Workflow in Data Science


1. Standardize data
2. Compute covariance matrix
3. Calculate eigenvectors/eigenvalues
4. Choose top principal components
5. Transform dataset

Summary

PCA is a dimensionality reduction technique that transforms many correlated features into a smaller number of uncorrelated principal components while preserving maximum variance.

If you'd like, I can also explain:

• PCA Interview Questions for Data Science
• Step-by-step PCA mathematical example
• Difference between PCA and LDA
• When to use PCA in Machine Learning (very important for interviews).

Step-by-step PCA mathematical example

Step-by-Step PCA Mathematical Example

Below is a simple mathematical PCA example using a small dataset so you can clearly understand every step.

We will reduce 2 features → 1 principal component.

Step 1: Original Dataset


Assume we have two variables:

• X = Study Hours
• Y = Exam Score

Dataset:

| Observation | X | Y |
| 1 | 2 | 4 |
| 2 | 4 | 2 |
| 3 | 6 | 6 |
| 4 | 8 | 8 |

Matrix form:

\[ X = \begin{bmatrix} 2 & 4 \\ 4 & 2 \\ 6 & 6 \\ 8 & 8 \end{bmatrix} \]

Step 2: Compute Mean of Each Feature


Mean of X:

\[ \bar{x} = (2+4+6+8)/4 = 5 \]

Mean of Y:

\[ \bar{y} = (4+2+6+8)/4 = 5 \]

Mean vector:

\[ (5, 5) \]

Step 3: Center the Data


Subtract the mean from each value.

| X | Y |
| -3 | -1 |
| -1 | -3 |
| 1 | 1 |
| 3 | 3 |

Matrix:

\[ X_c = \begin{bmatrix} -3 & -1 \\ -1 & -3 \\ 1 & 1 \\ 3 & 3 \end{bmatrix} \]
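
Steps 2 and 3 can be reproduced in a few lines of NumPy. This is a small sketch; the variable names `mean_vector` and `X_c` are chosen to mirror the notation above.

```python
import numpy as np

# The four observations (Study Hours, Exam Score) from Step 1
X = np.array([[2, 4],
              [4, 2],
              [6, 6],
              [8, 8]], dtype=float)

mean_vector = X.mean(axis=0)   # column-wise means
X_c = X - mean_vector          # centered data: every column now has mean 0

print(mean_vector)
print(X_c)
```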

Step 4: Compute Covariance Matrix


Covariance matrix formula:

\[ C = \frac{1}{n-1} X_c^T X_c \]

Result (with n = 4):

\[ C = \begin{bmatrix} 6.67 & 5.33 \\ 5.33 & 6.67 \end{bmatrix} \]

Meaning:

• Variance of X = (9 + 1 + 1 + 9)/3 = 6.67
• Variance of Y = (1 + 9 + 1 + 9)/3 = 6.67
• Covariance = (3 + 3 + 1 + 9)/3 = 5.33

Step 5: Find Eigenvalues and Eigenvectors

Solve:

\[ |C - \lambda I| = 0 \]

Eigenvalues:

\[ \lambda_1 = 6.67 + 5.33 = 12, \quad \lambda_2 = 6.67 - 5.33 = 1.33 \]

Eigenvectors:

\[ v_1 = \begin{bmatrix} 0.71 \\ 0.71 \end{bmatrix}, \quad v_2 = \begin{bmatrix} -0.71 \\ 0.71 \end{bmatrix} \]

Step 6: Select Principal Component

Largest eigenvalue = 12

Corresponding eigenvector:

\[ PC1 = \begin{bmatrix} 0.71 \\ 0.71 \end{bmatrix} \]

This direction captures maximum variance.
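
The hand calculation in Steps 4 to 6 can be checked numerically. The sketch below starts from the centered matrix and uses `np.linalg.eigh`, which returns eigenvalues in ascending order, so they are re-sorted. The sign flip at the end is only a display convention, since an eigenvector's sign is arbitrary.

```python
import numpy as np

# Centered data from Step 3
X_c = np.array([[-3, -1],
                [-1, -3],
                [ 1,  1],
                [ 3,  3]], dtype=float)

n = X_c.shape[0]
C = X_c.T @ X_c / (n - 1)                 # sample covariance matrix

eigenvalues, eigenvectors = np.linalg.eigh(C)
order = np.argsort(eigenvalues)[::-1]     # largest eigenvalue first
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

pc1 = eigenvectors[:, 0]                  # direction of maximum variance
if pc1[0] < 0:                            # fix the arbitrary sign for readability
    pc1 = -pc1

print(np.round(C, 2))
print(np.round(eigenvalues, 2))
print(np.round(pc1, 2))
```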

Step 7: Transform Data


Project the centered data onto PC1.
Example for first point:

\[ (-3, -1) \cdot (0.71, 0.71) = -2.84 \]

Projected values (approx):

| Observation | PC1 |
| 1 | -2.84 |
| 2 | -2.84 |
| 3 | 1.42 |
| 4 | 4.26 |
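
The projection is a single matrix-vector product. The sketch below uses the exact unit vector (1, 1)/√2 instead of the rounded 0.71, so its results differ from the table in the second decimal place.

```python
import numpy as np

X_c = np.array([[-3, -1],
                [-1, -3],
                [ 1,  1],
                [ 3,  3]], dtype=float)

pc1 = np.array([1.0, 1.0]) / np.sqrt(2)   # unit-length direction of PC1

scores = X_c @ pc1                        # one PC1 score per observation
print(np.round(scores, 2))

# The sample variance of the scores equals the eigenvalue belonging to PC1
print(round(float(scores.var(ddof=1)), 2))
```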

Now we reduced:

2 variables → 1 principal component

Final Result
Original Data:

| X | Y |
| 2 | 4 |
| 4 | 2 |
| 6 | 6 |
| 8 | 8 |

After PCA:

| PC1 |
| -2.84 |
| -2.84 |
| 1.42 |
| 4.26 |

Most information is preserved in PC1.
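
To put a number on "most information", the same four points can be passed to scikit-learn's `PCA`. This is a sketch under one assumption: no scaling is applied, to match the hand calculation, which only centers the data. `PCA` centers internally, and the sign of the scores may be flipped relative to the table.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[2, 4],
              [4, 2],
              [6, 6],
              [8, 8]], dtype=float)

pca = PCA(n_components=1)
scores = pca.fit_transform(X).ravel()      # PC1 score for each observation
ratio = pca.explained_variance_ratio_[0]   # share of total variance kept by PC1

print(np.round(scores, 2))
print(round(float(ratio), 2))
```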

Intuition Behind PCA


PCA finds the direction where data spreads the most.
• PC1 → maximum variance
• PC2 → second largest variance

These directions are perpendicular (orthogonal).

Summary of PCA Steps

1. Standardize data
2. Compute covariance matrix
3. Calculate eigenvalues
4. Find eigenvectors
5. Select largest eigenvalues
6. Project data to new components

If you want, I can also explain:

• PCA intuition visually (very easy explanation)
• PCA interview questions asked in data science
• Difference between PCA and LDA
• How to decide number of PCA components (Scree Plot & Explained Variance).

Example of PCA

Example of Principal Component Analysis (PCA)


Simple Real-World Example

Imagine a dataset of students with the following features:

| Student | Math Score | Physics Score |
| A | 90 | 88 |
| B | 85 | 84 |
| C | 60 | 65 |
| D | 95 | 92 |
| E | 70 | 72 |

Here we have two variables:

• Math Score
• Physics Score

These two variables are highly correlated because students good at math often perform well in physics.

Step 1: Understand the Problem


Instead of using two variables, PCA can combine them into one principal component that
represents overall science ability.

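So PCA transforms a pair of scores such as

| Math | Physics |
| 90 | 88 |

into a single score

| PC1 |
| 125.4 |

The value is illustrative: PCA centers the data first, so real PC1 scores are spread around zero. The new variable PC1 captures the maximum variance of both scores.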

Step 2: Idea of PCA


PCA creates a new axis that best represents the data.

Instead of:

• X-axis → Math
• Y-axis → Physics

PCA finds a new axis (PC1) where data varies the most.

Step 3: Result After PCA


Original dataset:

| Math | Physics |
| 90 | 88 |
| 85 | 84 |
| 60 | 65 |
| 95 | 92 |
| 70 | 72 |

After PCA (illustrative values):

| PC1 | PC2 |
| 125.4 | 0.3 |
| 119.0 | 0.2 |
| 88.5 | -1.1 |
| 132.6 | 0.5 |
| 100.2 | 0.1 |

PC1 explains most of the variance, so we often keep only PC1.

Thus we reduce:

2 features → 1 feature

Example Using Python


Step 1: Import Libraries
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

Step 2: Create Dataset


data = {
    'Math': [90, 85, 60, 95, 70],
    'Physics': [88, 84, 65, 92, 72]
}

df = pd.DataFrame(data)

Step 3: Standardize Data


scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)

Step 4: Apply PCA


pca = PCA(n_components=1)

principal_component = pca.fit_transform(scaled_data)

Step 5: Result
print(principal_component)

Output (approximate; the overall sign may be flipped depending on the scikit-learn version):

[[ 1.09]
 [ 0.54]
 [-2.15]
 [ 1.64]
 [-1.11]]

This single column retains most of the variance of both subjects.

Real Data Science Example


PCA is commonly used in:

Image Processing

Example:

A face image may have 10,000 pixels (features).

Using PCA:

10,000 features → 100 principal components

This reduces computation and storage.
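
A scaled-down version of this idea can be sketched with scikit-learn's bundled digits dataset. That dataset is an assumption made for this example: it holds 8×8 digit images (64 pixel features) rather than 10,000-pixel faces, and the choice of 16 components is arbitrary. Scaling is skipped because all pixels already share the same intensity range.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
X_pixels = digits.data                 # shape (1797, 64): one row per image

pca = PCA(n_components=16)             # 64 pixel features -> 16 components
X_small = pca.fit_transform(X_pixels)

kept = pca.explained_variance_ratio_.sum()
print(X_pixels.shape, "->", X_small.shape)
print(f"variance kept: {kept:.2%}")
```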


Finance

Stock dataset:

| Stock Price | Volume | Market Cap | P/E Ratio |

PCA reduces many financial variables into a few market factors.

Key Idea to Remember


PCA transforms:

\[ \text{Original Features} \rightarrow \text{Principal Components} \]

where PC1 captures the most variance.

Summary

• PCA reduces high-dimensional data
• Creates new uncorrelated variables
• Keeps maximum information
• Used in machine learning, computer vision, and finance

If you want, I can also show:

• PCA interview questions (very commonly asked)
• Difference between PCA and LDA
• Scree plot example to choose PCA components
• Real dataset PCA example (Iris dataset).

Scree plot example to choose PCA components

Scree Plot Example to Choose PCA Components

A Scree Plot is a graph used in Principal Component Analysis (PCA) to decide how many principal components to keep.

It plots:

• X-axis → Principal Components
• Y-axis → Explained Variance (or Eigenvalues)

The goal is to find the “elbow point” where adding more components does not significantly
increase explained variance.

Example Dataset
Suppose PCA produces the following explained variance ratios:

| Component | Explained Variance |
| PC1 | 0.45 |
| PC2 | 0.25 |
| PC3 | 0.15 |
| PC4 | 0.08 |
| PC5 | 0.04 |
| PC6 | 0.03 |

Total variance explained cumulatively:

| Components | Variance Explained |
| PC1 | 45% |
| PC1 + PC2 | 70% |
| PC1 + PC2 + PC3 | 85% |
| PC1 + PC2 + PC3 + PC4 | 93% |

So we might keep the first 3 components because they explain 85% of the variance.
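
The cumulative column and the resulting choice can be computed from the listed ratios. In this sketch the helper name `components_for` and the small tolerance that guards against floating-point rounding are assumptions made for the example.

```python
import numpy as np

# Explained variance ratios from the example table
ratios = np.array([0.45, 0.25, 0.15, 0.08, 0.04, 0.03])
cumulative = np.cumsum(ratios)

def components_for(threshold, cumulative, tol=1e-9):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    return int(np.searchsorted(cumulative, threshold - tol)) + 1

print(np.round(cumulative, 2))
print(components_for(0.85, cumulative))   # components needed for 85%
print(components_for(0.90, cumulative))   # components needed for 90%
```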

Scree Plot in Python


Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris

Step 2: Load Dataset


data = load_iris()
X = data.data

Step 3: Standardize Data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

Step 4: Apply PCA


pca = PCA()
pca.fit(X_scaled)

Step 5: Create Scree Plot


plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
         pca.explained_variance_ratio_,
         marker='o')

plt.xlabel("Principal Components")
plt.ylabel("Explained Variance")
plt.title("Scree Plot")
plt.show()

How to Interpret Scree Plot


Example pattern:

| Component | Variance |
| PC1 | High |
| PC2 | Moderate |
| PC3 | Small |
| PC4+ | Very small |

If the curve becomes flat after PC3, then:

Choose first 3 components

This point is called the Elbow Point.

Rule of Thumb
Common criteria used:

1. Elbow Method

Choose the components before the curve flattens.

2. Explained Variance Threshold

Keep enough components to explain 80–95% of the variance.

Example:
pca.explained_variance_ratio_.cumsum()
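
scikit-learn can also apply the variance threshold directly: passing a float between 0 and 1 as `n_components` keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The sketch below uses the Iris data and a 95% threshold, both chosen for illustration, and sets `svd_solver="full"` explicitly because the float form relies on the full solver.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# Keep as many components as needed to explain at least 95% of the variance
pca_95 = PCA(n_components=0.95, svd_solver="full")
X_kept = pca_95.fit_transform(X_scaled)

print(pca_95.n_components_)                        # number of components chosen
print(pca_95.explained_variance_ratio_.cumsum())   # cumulative share per component
```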

Example Interpretation
If cumulative variance:

| Components | Variance |
| 1 | 50% |
| 2 | 75% |
| 3 | 90% |
| 4 | 95% |

With a 90% threshold, 3 components are enough; the fourth adds only 5 percentage points.

Summary

A scree plot helps decide

\[ \text{Optimal Number of PCA Components} \]

by analyzing explained variance vs number of components.

If you want, I can also explain:

• Difference between PCA and LDA (very common interview question)
• Real PCA example using the Iris dataset with visualization
• Advanced PCA concepts (loadings, explained variance, eigenvectors)
• When NOT to use PCA in machine learning.
