Principal Component Analysis (PCA) is a technique used in data science and machine
learning to reduce the number of features (dimensions) in a dataset while keeping as much
important information (variance) as possible.
In simple terms:
PCA converts many variables into fewer new variables called principal components.
These components capture the maximum variance in the data.
Why PCA is Used
PCA is mainly used for:
1. Dimensionality Reduction
2. Noise Removal
3. Data Visualization (2D/3D)
4. Feature Extraction
5. Reducing Model Complexity
Example:
| Before PCA | After PCA |
| 50 features | 5 principal components |
Intuition of PCA
PCA finds new axes that maximize variance.
PC1 (Principal Component 1) → captures maximum variance
PC2 → second highest variance
PC3 → third highest variance
These axes are orthogonal (perpendicular) to each other.
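The two properties above (variance sorted from largest to smallest, and mutually orthogonal axes) can be checked numerically. The following is a minimal sketch on synthetic data; the dataset, its size, and the noise levels are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 200 points, 3 features,
# where the first two features are strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(200, 1)),
    2.0 * base + 0.1 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])

pca = PCA(n_components=3).fit(X)

# Rows of components_ are the principal axes. If they are orthonormal,
# their Gram matrix is (numerically) the identity.
gram = pca.components_ @ pca.components_.T
print(np.round(gram, 6))

# Variance along PC1, PC2, PC3, sorted from largest to smallest.
print(pca.explained_variance_)
```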
Simple Example (Concept)
Suppose a dataset has two features:
Height
Weight
These are usually highly correlated.
Instead of using both, PCA creates a new feature:
PC1 = combination of height and weight
So instead of 2 variables → we can represent most information with 1 component.
Mathematical Idea (Concept)
PCA is based on:
Covariance Matrix
Eigenvectors
Eigenvalues
Steps:
1. Standardize data
2. Compute covariance matrix
3. Calculate eigenvectors and eigenvalues
4. Select top components
5. Transform data
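The five steps above can be sketched directly in NumPy. This is an illustrative sketch, not how scikit-learn implements PCA internally; the helper name `pca_from_scratch` and the random demo data are hypothetical.

```python
import numpy as np

def pca_from_scratch(X, n_components):
    """Hypothetical helper that follows the five steps listed above."""
    X = np.asarray(X, dtype=float)
    # 1. Standardize: zero mean and unit variance per feature.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance matrix of the standardized data.
    C = np.cov(Z, rowvar=False)
    # 3. Eigen-decomposition. eigh is intended for symmetric matrices
    #    and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    # 4. Sort in descending order and keep the top components.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # 5. Project the standardized data onto the selected directions.
    return Z @ components, eigvals[order]

# Demo data (an assumption): 100 samples, 4 features, two of them correlated.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 4))
X_demo[:, 1] += 2.0 * X_demo[:, 0]

scores, variances = pca_from_scratch(X_demo, n_components=2)
print(scores.shape)
print(variances)
```

The variance of each column of `scores` equals the corresponding eigenvalue, which is why eigenvalues are used to rank components.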
PCA Example Using Python
Step 1: Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
Step 2: Load Dataset
data = load_iris()
X = data.data
y = data.target
Dataset features:
sepal length
sepal width
petal length
petal width
Step 3: Standardize Data
PCA is sensitive to feature scale, so standardize the features first, especially when they are measured in different units.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Step 4: Apply PCA
Reduce 4 features → 2 components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
Step 5: Check Explained Variance
print(pca.explained_variance_ratio_)
Example output:
[0.73, 0.23]
Meaning:
PC1 explains about 73% of the variance
PC2 explains about 23% of the variance
Total: about 96% of the variance is preserved
Step 6: Visualize PCA
plt.scatter(X_pca[:,0], X_pca[:,1], c=y)
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.show()
This converts 4D data → 2D visualization.
Real Example
Original dataset (columns):
| Height | Weight | Age | Salary |
After PCA (illustrative values):
| PC1 | PC2 |
| 1.4 | -0.5 |
| 0.9 | 0.2 |
Now instead of 4 variables → 2 components.
Advantages of PCA
Reduces dimensionality
Removes redundant features
Can improve model performance, though the effect depends on the dataset and the model
Helps visualize high-dimensional data
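Whether PCA helps a downstream model is an empirical question. The sketch below compares cross-validated accuracy with and without a PCA step on the Iris dataset; the choice of logistic regression, 2 components, and 5 folds are assumptions made for illustration, and the outcome may differ on other data.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Putting the scaler and PCA inside the pipeline means they are fitted
# on the training folds only, which avoids leaking test information.
with_pca = make_pipeline(
    StandardScaler(), PCA(n_components=2), LogisticRegression(max_iter=1000)
)
without_pca = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

score_with = cross_val_score(with_pca, X, y, cv=5).mean()
score_without = cross_val_score(without_pca, X, y, cv=5).mean()
print(f"with PCA:    {score_with:.3f}")
print(f"without PCA: {score_without:.3f}")
```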
Disadvantages of PCA
Components are hard to interpret
Some information loss
Works best with linear relationships
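The information loss can be measured by mapping the reduced data back to the original space and comparing it with the input. A minimal sketch on the standardized Iris data, using mean squared reconstruction error as the (assumed) measure of loss:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

errors = {}
for k in range(1, 5):
    pca = PCA(n_components=k)
    X_reduced = pca.fit_transform(X_scaled)
    # inverse_transform maps the k-dimensional scores back to 4 features.
    X_restored = pca.inverse_transform(X_reduced)
    errors[k] = float(np.mean((X_scaled - X_restored) ** 2))
    print(k, round(errors[k], 4))
```

The error shrinks as more components are kept and is essentially zero when all 4 components are retained.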
PCA Workflow in Data Science
1. Standardize data
2. Compute covariance matrix
3. Calculate eigenvectors/eigenvalues
4. Choose top principal components
5. Transform dataset
Summary
PCA is a dimensionality reduction technique that transforms many correlated features into
a smaller number of uncorrelated principal components while preserving maximum
variance.
If you'd like, I can also explain:
PCA Interview Questions for Data Science
Step-by-step PCA mathematical example
Difference between PCA and LDA
When to use PCA in Machine Learning (very important for interviews).
Step-by-step PCA mathematical example
Step-by-Step PCA Mathematical Example
Below is a simple mathematical PCA example using a small dataset so you can clearly
understand every step.
We will reduce 2 features → 1 principal component.
Step 1: Original Dataset
Assume we have two variables:
X = Study Hours
Y = Exam Score
Dataset:
| Observation | X | Y |
| 1 | 2 | 4 |
| 2 | 4 | 2 |
| 3 | 6 | 6 |
| 4 | 8 | 8 |
Matrix form:
$$X = \begin{bmatrix} 2 & 4 \\ 4 & 2 \\ 6 & 6 \\ 8 & 8 \end{bmatrix}$$
Step 2: Compute Mean of Each Feature
Mean of X:
$$\bar{x} = (2+4+6+8)/4 = 5$$
Mean of Y:
$$\bar{y} = (4+2+6+8)/4 = 5$$
Mean vector:
$$(5, 5)$$
Step 3: Center the Data
Subtract mean from each value.
| X | Y |
| -3 | -1 |
| -1 | -3 |
| 1 | 1 |
| 3 | 3 |
Matrix:
$$X_c = \begin{bmatrix} -3 & -1 \\ -1 & -3 \\ 1 & 1 \\ 3 & 3 \end{bmatrix}$$
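The mean and centering steps can be reproduced with a few lines of NumPy. This is a small sketch; the variable names are chosen here for illustration.

```python
import numpy as np

# Columns: X = study hours, Y = exam score.
data = np.array([[2, 4], [4, 2], [6, 6], [8, 8]], dtype=float)

# Column-wise mean gives the mean vector.
mean_vector = data.mean(axis=0)

# Subtracting the mean vector from every row centers the data.
centered = data - mean_vector

print(mean_vector)
print(centered)
```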
Step 4: Compute Covariance Matrix
Covariance matrix formula:
$$C = \frac{1}{n-1} X_c^T X_c$$
Result:
$$C = \begin{bmatrix} 6.67 & 5.33 \\ 5.33 & 6.67 \end{bmatrix}$$
Meaning:
Variance of X = (9 + 1 + 1 + 9) / 3 = 20/3 ≈ 6.67
Variance of Y = (1 + 9 + 1 + 9) / 3 = 20/3 ≈ 6.67
Covariance = ((-3)(-1) + (-1)(-3) + (1)(1) + (3)(3)) / 3 = 16/3 ≈ 5.33
Step 5: Find Eigenvalues and Eigenvectors
Solve:
$$|C - \lambda I| = 0$$
Eigenvalues:
$$\lambda_1 = \tfrac{20}{3} + \tfrac{16}{3} = 12, \quad \lambda_2 = \tfrac{20}{3} - \tfrac{16}{3} = \tfrac{4}{3} \approx 1.33$$
Eigenvectors:
$$v_1 = \begin{bmatrix} 0.71 \\ 0.71 \end{bmatrix}$$
$$v_2 = \begin{bmatrix} -0.71 \\ 0.71 \end{bmatrix}$$
Step 6: Select Principal Component
Largest eigenvalue = 12
Corresponding eigenvector:
$$PC1 = \begin{bmatrix} 0.71 \\ 0.71 \end{bmatrix}$$
This direction captures maximum variance.
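Steps 4 to 6 can be recomputed from the centered data as a check on the hand calculation. A minimal NumPy sketch follows; since an eigenvector is only defined up to sign, the code prints absolute values of the leading eigenvector.

```python
import numpy as np

# Centered data from Step 3.
centered = np.array([[-3, -1], [-1, -3], [1, 1], [3, 3]], dtype=float)
n = centered.shape[0]

# Sample covariance matrix: C = X_c^T X_c / (n - 1).
C = centered.T @ centered / (n - 1)

# eigh returns eigenvalues in ascending order, so reorder to descending.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(np.round(C, 2))
print(np.round(eigvals, 2))
print(np.round(np.abs(eigvecs[:, 0]), 2))  # direction of PC1, sign removed
```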
Step 7: Transform Data
Project data onto PC1.
Example for first point:
$$(-3, -1) \cdot (0.71, 0.71) = -2.84$$
Projected values (approx):
| Observation | PC1 |
| 1 | -2.84 |
| 2 | -2.84 |
| 3 | 1.42 |
| 4 | 4.26 |
Now we reduced:
2 variables → 1 principal component
Final Result
Original Data:
| X | Y |
| 2 | 4 |
| 4 | 2 |
| 6 | 6 |
| 8 | 8 |
After PCA:
| PC1 |
| -2.84 |
| -2.84 |
| 1.42 |
| 4.26 |
Most information is preserved in PC1.
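How much is "most" can be quantified as the share of total variance that lies along PC1. The sketch below projects the centered data and computes that share; it uses the exact unit vector (1/√2, 1/√2) rather than the rounded 0.71, so the projected values differ slightly in the second decimal from the hand calculation.

```python
import numpy as np

centered = np.array([[-3, -1], [-1, -3], [1, 1], [3, 3]], dtype=float)

# Unit vector along PC1 (exact form of the rounded [0.71, 0.71]).
pc1 = np.array([1.0, 1.0]) / np.sqrt(2.0)

# Projection of each observation onto PC1.
scores = centered @ pc1
print(np.round(scores, 2))

# Share of the total variance captured by PC1.
total_variance = centered.var(axis=0, ddof=1).sum()
pc1_variance = scores.var(ddof=1)
ratio = pc1_variance / total_variance
print(round(ratio, 3))
```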
Intuition Behind PCA
PCA finds the direction where data spreads the most.
PC1 → maximum variance
PC2 → second largest variance
These directions are perpendicular (orthogonal).
Summary of PCA Steps
1. Standardize data
2. Compute covariance matrix
3. Calculate eigenvalues
4. Find eigenvectors
5. Select largest eigenvalues
6. Project data to new components
If you want, I can also explain:
PCA intuition visually (very easy explanation)
PCA interview questions asked in data science
Difference between PCA and LDA
How to decide number of PCA components (Scree Plot & Explained Variance).
Example of PCA
Example of Principal Component Analysis (PCA)
Simple Real-World Example
Imagine a dataset of students with the following features:
| Student | Math Score | Physics Score |
| A | 90 | 88 |
| B | 85 | 84 |
| C | 60 | 65 |
| D | 95 | 92 |
| E | 70 | 72 |
Here we have two variables:
Math Score
Physics Score
These two variables are highly correlated because students good at math often perform well
in physics.
Step 1: Understand the Problem
Instead of using two variables, PCA can combine them into one principal component that
represents overall science ability.
So PCA transforms:
| Math | Physics |
| 90 | 88 |
into (illustrative value):
| PC1 |
| 125.4 |
The new variable PC1 captures the maximum variance of both scores.
Step 2: Idea of PCA
PCA creates a new axis that best represents the data.
Instead of:
X-axis → Math
Y-axis → Physics
PCA finds a new axis (PC1) where data varies the most.
Step 3: Result After PCA
Original dataset:
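| Math | Physics |
| 90 | 88 |
| 85 | 84 |
| 60 | 65 |
| 95 | 92 |
| 70 | 72 |
After PCA (illustrative values; an actual PCA centers the data first, so real scores are distributed around zero):
| PC1 | PC2 |
| 125.4 | 0.3 |
| 119.0 | 0.2 |
| 88.5 | -1.1 |
| 132.6 | 0.5 |
| 100.2 | 0.1 |
PC1 explains most of the variance, so we can often keep only PC1.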
Thus we reduce:
2 features → 1 feature
Example Using Python
Step 1: Import Libraries
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
Step 2: Create Dataset
data = {
    'Math': [90, 85, 60, 95, 70],
    'Physics': [88, 84, 65, 92, 72]
}
df = pd.DataFrame(data)
Step 3: Standardize Data
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
Step 4: Apply PCA
pca = PCA(n_components=1)
principal_component = pca.fit_transform(scaled_data)
Step 5: Result
print(principal_component)
Output (rounded; the overall sign may be flipped depending on the scikit-learn version):
[[ 1.09]
 [ 0.54]
 [-2.15]
 [ 1.64]
 [-1.11]]
This single column represents maximum information from both subjects.
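To see how much of the variance that single column actually captures, and how the two subjects are weighted, one can fit both components and inspect `explained_variance_ratio_` and `components_`. A minimal sketch using the same five students:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Math": [90, 85, 60, 95, 70],
    "Physics": [88, 84, 65, 92, 72],
})
scaled = StandardScaler().fit_transform(df)

# Keep both components so the variance split is visible.
pca = PCA(n_components=2).fit(scaled)

# Fraction of total variance along PC1 and PC2.
print(pca.explained_variance_ratio_)

# Each row holds the weights (loadings) of Math and Physics for one component.
print(pca.components_)
```

Because the two standardized scores are almost perfectly correlated, nearly all of the variance falls on PC1, and PC1 weights Math and Physics equally in magnitude.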
Real Data Science Example
PCA is commonly used in:
Image Processing
Example:
A face image may have 10,000 pixels (features).
Using PCA:
10,000 features → 100 principal components
This reduces computation and storage.
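The same idea can be tried at a smaller scale. The sketch below uses the 8x8 digit images bundled with scikit-learn (64 pixel features) as a stand-in for face images; the choice of 16 components is an arbitrary assumption for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
X = digits.data  # each row is a flattened 8x8 image, 64 pixel features

# Compress every image from 64 features to 16 principal components.
pca = PCA(n_components=16)
X_small = pca.fit_transform(X)

# Map the compressed representation back to pixel space (lossy).
X_back = pca.inverse_transform(X_small)

print(X.shape, "->", X_small.shape)
print(round(float(pca.explained_variance_ratio_.sum()), 3))
```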
Finance
Stock dataset:
| Stock Price | Volume | Market Cap | P/E Ratio |
PCA reduces many financial variables into few market factors.
Key Idea to Remember
PCA transforms:
$$\text{Original Features} \rightarrow \text{Principal Components}$$
where PC1 captures the most variance.
Summary
PCA reduces high-dimensional data
Creates new uncorrelated variables
Keeps maximum information
Used in machine learning, computer vision, and finance
If you want, I can also show:
PCA interview questions (very commonly asked)
Difference between PCA and LDA
Scree plot example to choose PCA components
Real dataset PCA example (Iris dataset).
Scree plot example to choose PCA components
Scree Plot Example to Choose PCA Components
A Scree Plot is a graph used in Principal Component Analysis (PCA) to decide how many
principal components to keep.
It plots:
X-axis → Principal Components
Y-axis → Explained Variance (or Eigenvalues)
The goal is to find the “elbow point” where adding more components does not significantly
increase explained variance.
Example Dataset
Suppose PCA produces the following explained variance ratios:
| Component | Explained Variance |
| PC1 | 0.45 |
| PC2 | 0.25 |
| PC3 | 0.15 |
| PC4 | 0.08 |
| PC5 | 0.04 |
| PC6 | 0.03 |
Total variance explained cumulatively:
| Components | Variance Explained |
| PC1 | 45% |
| PC1 + PC2 | 70% |
| PC1 + PC2 + PC3 | 85% |
| PC1 + PC2 + PC3 + PC4 | 93% |
So we might keep the first 3 components because they explain 85% of the variance.
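The cumulative figures and the resulting choice can be computed rather than read off by hand. A small sketch using the ratios from the table above; the helper name `components_needed` is hypothetical.

```python
import numpy as np

# Explained variance ratios for PC1..PC6 from the example above.
ratios = np.array([0.45, 0.25, 0.15, 0.08, 0.04, 0.03])
cumulative = np.cumsum(ratios)

def components_needed(cumulative, threshold):
    """Hypothetical helper: smallest k whose cumulative variance reaches threshold.

    Assumes threshold does not exceed the total of the ratios.
    """
    # The small tolerance guards against floating-point round-off in the sum.
    return int(np.argmax(cumulative >= threshold - 1e-12)) + 1

print(np.round(cumulative, 2))
print(components_needed(cumulative, 0.85))
print(components_needed(cumulative, 0.90))
```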
Scree Plot in Python
Step 1: Import Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_iris
Step 2: Load Dataset
data = load_iris()
X = data.data
Step 3: Standardize Data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
Step 4: Apply PCA
pca = PCA()
pca.fit(X_scaled)
Step 5: Create Scree Plot
plt.plot(range(1, len(pca.explained_variance_ratio_) + 1),
         pca.explained_variance_ratio_,
         marker='o')
plt.xlabel("Principal Components")
plt.ylabel("Explained Variance")
plt.title("Scree Plot")
plt.show()
How to Interpret Scree Plot
Example pattern:
| Component | Variance |
| PC1 | High |
| PC2 | Moderate |
| PC3 | Small |
| PC4+ | Very small |
If the curve becomes flat after PC3, then:
Choose the first 3 components
This point is called the Elbow Point.
Rule of Thumb
Common criteria used:
1. Elbow Method
Choose the components before the curve flattens.
2. Explained Variance Threshold
Keep enough components to explain 80–95% of the variance.
Example:
pca.explained_variance_ratio_.cumsum()
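scikit-learn can also apply the threshold rule directly: passing a float between 0 and 1 as `n_components` keeps the smallest number of components whose cumulative explained variance exceeds that fraction. A minimal sketch on the standardized Iris data, with 0.95 as an assumed threshold:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_scaled = StandardScaler().fit_transform(load_iris().data)

# A float n_components is interpreted as a variance threshold;
# svd_solver="full" is the solver that supports this mode.
pca = PCA(n_components=0.95, svd_solver="full")
X_reduced = pca.fit_transform(X_scaled)

print(pca.n_components_)  # number of components actually kept
print(pca.explained_variance_ratio_.cumsum())
```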
Example Interpretation
If the cumulative variance is:
| Components | Cumulative Variance |
| 1 | 50% |
| 2 | 75% |
| 3 | 90% |
| 4 | 95% |
With a 90% threshold, 3 components are enough; a stricter 95% threshold would require 4.
Summary
Scree plot helps decide:
$$\text{Optimal Number of PCA Components}$$
by analyzing explained variance vs number of components.
If you want, I can also explain:
Difference between PCA and LDA (very common interview question)
Real PCA example using the Iris dataset with visualization
Advanced PCA concepts (loadings, explained variance, eigenvectors)
When NOT to use PCA in machine learning.