
AI in Computer Vision: Comprehensive Notes

The document provides detailed notes on AI for Computer Vision, covering various units including image formation, feature detection, motion estimation, 3D reconstruction, and deep learning applications. Each unit includes theoretical concepts, key techniques, and practical examples, particularly focusing on methods like CNNs, SIFT, and stereo vision. Additionally, the document highlights important exam questions and code examples relevant to the topics discussed.

Uploaded by

dp9476825

AI for Computer Vision - Detailed Notes

(RGPV 7th Sem)


Unit I: Introduction to Image Formation and Processing
Unit II: Feature Detection, Matching and Segmentation
Unit III: Feature-based Alignment & Motion Estimation (2D/3D)
Unit IV: 3D Reconstruction Techniques
Unit V: Image-based Rendering and Recognition

✅ **Unit I: Introduction to Image Formation and Processing**

--- THEORY NOTES ---

• Image Formation Basics: An image is a 2D projection of a 3D scene, captured by a camera using a lens and light.

• Camera Model: Describes how light rays map from the world to the image plane.

- Pinhole Camera Model: Simplest geometry-based projection model.

- Perspective Projection: Objects farther away appear smaller.

• Radiometry & Photometry: Measurement of light energy affecting pixel intensity values.

• Image Types: Binary, Grayscale, Color (e.g., RGB or HSV color spaces).
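The pinhole model and perspective projection listed above can be sketched in a few lines. This is a minimal illustration, assuming points are already expressed in the camera frame and that the principal point sits at the image origin; the function name `project_pinhole` is hypothetical.

```python
import numpy as np

def project_pinhole(points_3d, f):
    """Project Nx3 camera-frame points onto the image plane at focal length f."""
    points_3d = np.asarray(points_3d, dtype=float)
    X, Y, Z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    # Perspective division: image coordinates shrink as depth Z grows.
    return np.stack([f * X / Z, f * Y / Z], axis=1)

# The same (X, Y) offset seen at depth 2 and at depth 4.
near = project_pinhole([[1.0, 1.0, 2.0]], f=1.0)
far = project_pinhole([[1.0, 1.0, 4.0]], f=1.0)
print(near, far)
```

Doubling the depth halves the projected coordinates, which is the "objects farther away appear smaller" effect.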


--- IMAGE PROCESSING PIPELINE ---

Image Acquisition ➝ Preprocessing ➝ Feature Extraction ➝ Analysis ➝ Recognition/Interpretation

--- IMAGE FILTERING TECHNIQUES ---

1️⃣ Spatial Domain Filters: Operate directly on pixels

• Smoothing Filters → Noise Reduction (Mean, Gaussian)

• Sharpening Filters → Highlight Edges (Laplacian, High-Boost)

2️⃣ Frequency Domain Processing: Apply Fourier Transform

• Low-Pass Filters → Remove noise

• High-Pass Filters → Detect edges
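The spatial-domain filters above can be illustrated with a small correlation routine. This is a sketch under simple assumptions: a 3×3 mean kernel for smoothing, the 4-neighbour Laplacian kernel for sharpening, and edge-replicated borders; `filter2d` is a hypothetical helper, not a library call.

```python
import numpy as np

def filter2d(image, kernel):
    """Correlate a 2D image with a small kernel using edge-replicated borders."""
    kh, kw = kernel.shape
    padded = np.pad(image.astype(float), ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

mean_kernel = np.ones((3, 3)) / 9.0
laplacian_kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

image = np.zeros((5, 5))
image[2, 2] = 9.0   # a single bright "noise" pixel

smoothed = filter2d(image, mean_kernel)        # spreads the spike over its 3x3 neighbourhood
edges = filter2d(image, laplacian_kernel)      # responds strongly at the intensity change
```

The mean filter lowers the spike from 9 to 1, while the Laplacian gives a large response exactly where the intensity jumps.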

--- EDGE DETECTION ---

• Purpose: Detect boundaries of objects

• Operators: Sobel, Prewitt, Roberts, Canny

• Canny Edge Detector Steps: Smoothing → Gradient → Non-max suppression → Hysteresis thresholding
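The gradient step shared by the Sobel operator and the Canny detector can be sketched as follows. This covers only the gradient magnitude, not non-max suppression or hysteresis, and it assumes the output is computed for interior pixels only; `sobel_magnitude` is a hypothetical name.

```python
import numpy as np

def sobel_magnitude(image):
    """Gradient magnitude from 3x3 Sobel kernels (interior pixels only)."""
    img = image.astype(float)
    # Horizontal derivative: right column minus left column, weighted 1-2-1 vertically.
    gx = (img[:-2, 2:] + 2 * img[1:-1, 2:] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[1:-1, :-2] - img[2:, :-2])
    # Vertical derivative: bottom row minus top row, weighted 1-2-1 horizontally.
    gy = (img[2:, :-2] + 2 * img[2:, 1:-1] + img[2:, 2:]
          - img[:-2, :-2] - 2 * img[:-2, 1:-1] - img[:-2, 2:])
    return np.hypot(gx, gy)

image = np.zeros((6, 6))
image[:, 3:] = 1.0              # vertical step edge between columns 2 and 3
mag = sobel_magnitude(image)
print(mag)
```

The response is non-zero only in the two columns next to the step, which is the thick edge that non-max suppression would later thin.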

--- IMAGE TRANSFORMS ---

• Fourier Transform → represents image in frequency domain

• Wavelets → Multi-resolution image analysis

--- PYRAMIDS ---

• Gaussian Pyramid: Repeated smoothing + downsampling

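• Laplacian Pyramid: Stores, at each level, the difference between a Gaussian level and the upsampled next-coarser level (band-pass detail, mostly edges)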
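Both pyramids can be sketched with NumPy alone. The example assumes a 1-2-1 binomial kernel as a stand-in for the Gaussian and nearest-neighbour upsampling; all function names are hypothetical.

```python
import numpy as np

def blur(image):
    """Separable 1-2-1 binomial blur, a small stand-in for a Gaussian."""
    k = np.array([0.25, 0.5, 0.25])
    padded = np.pad(image.astype(float), 1, mode="edge")
    rows = k[0] * padded[:, :-2] + k[1] * padded[:, 1:-1] + k[2] * padded[:, 2:]
    return k[0] * rows[:-2, :] + k[1] * rows[1:-1, :] + k[2] * rows[2:, :]

def gaussian_pyramid(image, levels):
    pyramid = [image.astype(float)]
    for _ in range(levels - 1):
        pyramid.append(blur(pyramid[-1])[::2, ::2])   # smooth, then keep every other pixel
    return pyramid

def upsample(image, shape):
    """Nearest-neighbour upsampling back to a finer level's shape."""
    up = np.repeat(np.repeat(image, 2, axis=0), 2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(gauss):
    # Each level keeps the detail lost between two neighbouring Gaussian levels.
    lap = [g - upsample(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])                             # coarsest level is stored as-is
    return lap

def reconstruct(lap):
    image = lap[-1]
    for detail in reversed(lap[:-1]):
        image = upsample(image, detail.shape) + detail
    return image

image = np.arange(64, dtype=float).reshape(8, 8)
gauss = gaussian_pyramid(image, levels=3)
lap = laplacian_pyramid(gauss)
restored = reconstruct(lap)
```

Because each Laplacian level stores exactly what the coarser level cannot represent, adding the levels back up recovers the original image.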

--- OPTIMIZATION IN VISION ---

Used in feature matching, energy minimization, segmentation etc.


--- SHORT EXAM QUESTIONS ---

Q1: Define Pinhole Camera Model.

Q2: Differentiate between spatial and frequency domain filters.

Q3: Write steps of Canny edge detection.

--- LONG EXAM QUESTIONS ---

Q1: Explain image formation process with camera model and geometry.

Q2: Explain image filtering with suitable examples and diagrams.

✅ **Unit II: Feature Detection, Matching & Segmentation**

--- THEORY NOTES ---

⭐ IMP: Feature Detection identifies key points in an image that are invariant to changes in
scale, rotation or lighting.

Common detectors: Harris Corner, SIFT, SURF, FAST, ORB

➡ Harris Corner Detector

• Detects corners as points where image intensity changes strongly in all directions

• Uses the auto-correlation (second-moment) matrix of image gradients
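A rough sketch of the Harris response follows. It assumes the common score R = det(M) − k·trace(M)² with k = 0.04, a 3×3 box window in place of the usual Gaussian weighting, and `np.gradient` for the derivatives; `harris_response` is a hypothetical name.

```python
import numpy as np

def harris_response(image, k=0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 with a 3x3 box window."""
    img = image.astype(float)
    iy, ix = np.gradient(img)            # derivatives along rows (y) and columns (x)

    def box_sum(a):
        # Sum each pixel's 3x3 neighbourhood (zero padding at the border).
        p = np.pad(a, 1, mode="constant")
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]] for i in range(3) for j in range(3))

    # Entries of the second-moment (auto-correlation) matrix M per pixel.
    sxx, syy, sxy = box_sum(ix * ix), box_sum(iy * iy), box_sum(ix * iy)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

image = np.zeros((9, 9))
image[4:, 4:] = 1.0        # bright quadrant: one corner at (4, 4), straight edges elsewhere
R = harris_response(image)
```

R is positive at the corner, negative along the straight edges, and zero in flat regions, which is how the detector separates the three cases.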

➡ SIFT (Scale Invariant Feature Transform) ⭐ IMP

• Detects scale and rotation invariant features

• Steps: Scale-space extrema detection → Keypoint localization → Orientation assignment → Descriptor generation

➡ SURF (Speeded-Up Robust Features)

• Faster than SIFT using box filters and integral images

➡ FAST & ORB


• FAST is an extremely fast corner detector

• ORB (Oriented FAST and Rotated BRIEF) combines the FAST detector with the BRIEF descriptor for real-time apps (e.g., robotics)

--- FEATURE MATCHING ---

⭐ IMP: Used to find correspondences between images

Methods: SSD (Sum of Squared Differences), NCC (Normalized Cross Correlation), Hamming
distance for ORB

➡ RANSAC (Random Sample Consensus) ⭐ Most Asked

• Removes false matches by estimating the best model through random sampling
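The random-sampling idea behind RANSAC is easiest to see on a toy problem. The sketch below fits a 2D line rather than a homography or fundamental matrix, which is a simplification for illustration; the iteration count, threshold and the name `ransac_line` are assumptions.

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.1, seed=0):
    """Fit y = m*x + c to points with outliers; returns (m, c, inlier_mask)."""
    rng = np.random.default_rng(seed)
    best_mask = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)   # minimal sample: 2 points
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue                       # vertical pair cannot be written as y = m*x + c
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        residuals = np.abs(points[:, 1] - (m * points[:, 0] + c))
        mask = residuals < threshold
        if mask.sum() > best_mask.sum():   # keep the model with the largest consensus set
            best_mask = mask
    # Refit on all inliers of the best model with least squares.
    m, c = np.polyfit(points[best_mask, 0], points[best_mask, 1], 1)
    return m, c, best_mask

x = np.arange(10, dtype=float)
inliers = np.column_stack([x, 2.0 * x + 1.0])                    # points on y = 2x + 1
outliers = np.array([[1.0, 9.0], [4.0, -3.0], [7.0, 30.0]])      # false "matches"
points = np.vstack([inliers, outliers])
m, c, mask = ransac_line(points)
```

In feature matching the same loop is used with a geometric model (for example a homography) in place of the line, and matches outside the consensus set are discarded.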

--- IMAGE SEGMENTATION ---

⭐ IMP: Process of dividing image into meaningful regions

➡ Thresholding-based Segmentation

• Otsu Method: Finds optimal threshold by maximizing variance between classes
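Otsu's criterion can be written out directly from the histogram. This is a minimal sketch for 8-bit images, assuming pixels with value ≥ t form the foreground class; `otsu_threshold` is a hypothetical name.

```python
import numpy as np

def otsu_threshold(image):
    """Return the 8-bit threshold t that maximizes between-class variance."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()      # class weights: below t / at or above t
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (levels[:t] * prob[:t]).sum() / w0     # mean intensity of each class
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t

image = np.array([[10, 12, 11, 200], [9, 210, 205, 13]], dtype=np.uint8)
t = otsu_threshold(image)
binary = image >= t
```

For this toy image the threshold falls in the gap between the dark group (9 to 13) and the bright group (200 to 210), so exactly the three bright pixels become foreground.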

➡ Region-based Segmentation

• Region growing, region splitting and merging

➡ Clustering-based Segmentation ⭐ IMP

• K-Means clustering: groups pixels based on similarity
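A minimal K-Means sketch on grayscale intensities is given below. It assumes a deterministic initialisation that spreads the centers over the intensity range (real implementations usually use random or k-means++ starts) and clusters on intensity only; `kmeans_1d` is a hypothetical name.

```python
import numpy as np

def kmeans_1d(values, k=2, n_iters=20):
    """Cluster scalar pixel intensities into k groups; returns (labels, centers)."""
    values = np.asarray(values, dtype=float)
    # Deterministic initialisation: spread the centers evenly over the intensity range.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(n_iters):
        # Assignment step: each pixel joins its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its pixels.
        for c in range(k):
            if np.any(labels == c):
                centers[c] = values[labels == c].mean()
    return labels, centers

image = np.array([[10, 12, 200], [11, 205, 210]], dtype=float)
labels, centers = kmeans_1d(image.ravel(), k=2)
segmented = labels.reshape(image.shape)   # label map with the same shape as the image
```

For color images the same loop would run on 3-component pixel vectors instead of scalars.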

➡ Graph-based Segmentation ⭐ IMP

• Graph Cut: minimizes cut cost to separate foreground & background

➡ Watershed Segmentation ⭐ Important for diagrams

• Treats the gradient magnitude of the image as a topographic surface; flooding it from local minima yields region boundaries at the watershed lines

--- SHORT EXAM QUESTIONS ---


Q1: Define feature detection (IMP)

Q2: Difference between SIFT and SURF

Q3: What is RANSAC? Why used? ⭐

--- LONG EXAM QUESTIONS (Repeated in RGPV) ---

Q1: Explain SIFT algorithm with steps ⭐⭐

Q2: Explain image segmentation techniques with examples ⭐⭐

✅ **Unit III: Feature-based Alignment & Motion Estimation (2D/3D)**

--- THEORY NOTES ---

➡ ⭐ IMP: Pose Estimation

• Determines camera location + orientation relative to the object

• Uses feature correspondences and geometric constraints

➡ Triangulation

• Uses 2D projections from multiple views to recover 3D points
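Triangulation from two views can be sketched with the linear (DLT) method. The camera matrices below are assumptions chosen for illustration: identity intrinsics, one camera at the origin and a second shifted one unit along x; `triangulate` and `project` are hypothetical helpers.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two 3x4 projection matrices."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point with a 3x4 matrix and dehomogenize."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                    # camera at the origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])    # camera centre at (1, 0, 0)
X_true = np.array([0.5, 0.2, 4.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free projections the recovered point matches the true one; with real matches the result is usually refined by minimizing reprojection error.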

➡ ⭐ IMP: Structure from Motion (SfM)

• Recovers 3D scene + camera motion from multiple images

• Used in 3D mapping, AR, drones

➡ ⭐ Most Asked: Optical Flow

• Motion estimation by pixel intensity changes between frames

• Assumption (brightness constancy): a point keeps the same intensity as it moves between frames

Popular Methods:

• Lucas-Kanade → Sparse estimation

• Horn-Schunck → Dense estimation
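The least-squares core of Lucas-Kanade can be shown without OpenCV. This sketch estimates a single flow vector for one whole patch, assuming small motion, no pyramid and no iteration; the synthetic Gaussian blob and the name `lucas_kanade_window` are illustrative assumptions.

```python
import numpy as np

def lucas_kanade_window(frame1, frame2):
    """Estimate one (u, v) flow vector for a whole patch by least squares."""
    f1, f2 = frame1.astype(float), frame2.astype(float)
    iy, ix = np.gradient(f1)          # spatial derivatives
    it = f2 - f1                      # temporal derivative
    # Brightness constancy: ix*u + iy*v + it = 0 at every pixel, solved in the least-squares sense.
    A = np.column_stack([ix.ravel(), iy.ravel()])
    b = -it.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

def blob(cx, cy, size=31, sigma=4.0):
    """Synthetic smooth test pattern: a Gaussian blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

frame1 = blob(15.0, 15.0)
frame2 = blob(15.3, 14.8)             # blob moved by (u, v) = (0.3, -0.2) pixels
u, v = lucas_kanade_window(frame1, frame2)
```

The estimate is approximate because the method linearizes the image; practical implementations apply it per feature window and over an image pyramid to handle larger motion.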


➡ Bundle Adjustment ⭐⭐ Highly Asked in Exams

• Optimization technique in SfM to minimize reprojection error

➡ ⭐ Most Repeated: Camera Calibration

• Estimation of intrinsic + extrinsic parameters of camera

➡ Layered Motion Estimation

• Separates motion into different layers for better tracking

--- ✅ OpenCV Code Examples (IMP for Viva) ---

➡ Lucas-Kanade Optical Flow

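import cv2

cap = cv2.VideoCapture(0)
lk_params = dict(winSize=(15, 15), maxLevel=2)

# Pick corners in the first frame to track (Shi-Tomasi)
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.3, minDistance=7)

while True:
    ret, frame = cap.read()
    if not ret or p0 is None or len(p0) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track the points from the previous frame into the current one
    p1, st, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None, **lk_params)
    good_new, good_old = p1[st == 1], p0[st == 1]
    for (x1, y1), (x0, y0) in zip(good_new, good_old):
        cv2.line(frame, (int(x0), int(y0)), (int(x1), int(y1)), (0, 255, 0), 2)
    cv2.imshow("Optical Flow", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
    prev_gray, p0 = gray, good_new.reshape(-1, 1, 2)

cap.release()
cv2.destroyAllWindows()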

➡ Camera Calibration (Pseudo Example)

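import cv2
import numpy as np

# Termination criteria for sub-pixel corner refinement
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# 3D chessboard corner coordinates (7x6 inner corners) on the Z = 0 plane
objp = np.zeros((6*7, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)
# objp is paired with the corners detected in each captured chessboard image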

--- SHORT EXAM QUESTIONS ---

Q1: What is optical flow? ⭐

Q2: Define structure from motion (SfM).

Q3: What is camera calibration? ⭐

--- LONG EXAM QUESTIONS (Repeated in RGPV) ---

Q1: Explain optical flow with its different methods ⭐⭐

Q2: Explain Structure from Motion with example diagrams ⭐⭐

✅ **Unit IV: 3D Reconstruction Techniques**

--- THEORY NOTES ---

➡ ⭐ Stereo Vision (IMP + Diagrams)

• Uses two or more camera views to estimate depth

• Works like human eyes → Binocular disparity enables depth estimation

• Steps: Feature Matching → Disparity Map → Depth Map Computation

📌 Depth Formula:

Depth (Z) = (f × B) / Disparity (d)

where f = focal length, B = baseline distance
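The depth formula can be applied per pixel to a disparity map. In this sketch the focal length is assumed to be in pixels and the baseline in metres, so depth comes out in metres; the parameter names and sample values are illustrative, not taken from a real rig.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Z = f * B / d per pixel; zero disparity maps to infinite depth."""
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# f = 700 px, B = 0.1 m, so f * B = 70.
disparity_map = np.array([[70.0, 35.0], [7.0, 0.0]])
depth_map = depth_from_disparity(disparity_map, focal_px=700.0, baseline_m=0.1)
print(depth_map)
```

Large disparities map to nearby points and small disparities to distant ones, so depth resolution degrades quickly with distance.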

📝 Diagram (Concept - Text Representation):

[Camera-L] ---- Object ---- [Camera-R]

Matching points shift → Disparity → Depth


➡ ⭐ Shape-from-X Methods (Mostly Asked)

• Recover 3D shape using different clues:

1️⃣ Shape-from-Shading

• Uses variations in brightness to estimate surface orientation

• Assumes: Single light source, uniform material

2️⃣ Shape-from-Silhouette

• Combined outlines from multiple views create visual hull

3️⃣ Shape-from-Motion (SfM) (Repeated in Exams)

• Uses motion between frames to build 3D structure

➡ 3D Point Cloud Representation

• Collection of 3D points representing object shape

• Used in robotics, AR, autonomous navigation

➡ Volumetric Reconstruction

• Represents shape using voxels (3D pixels)

• Example: CT scan volume model

➡ Surface Reconstruction ⭐ Viva Question

• Creates surface mesh from point cloud

• Output: Triangular mesh

--- APPLICATIONS ---

• AR/VR, Robotics, Medical Imaging, Gaming, Drone Mapping

--- SHORT EXAM QUESTIONS ---


Q1: Define disparity in stereo vision. ⭐

Q2: What is point cloud?

Q3: Define volumetric reconstruction.

--- LONG EXAM QUESTIONS (Frequently Asked in RGPV) ---

Q1: Explain stereo vision with depth estimation formula + diagram ⭐⭐

Q2: Describe different Shape-from-X methods ⭐⭐

✅ **Unit V: Deep Learning for Computer Vision**

1️⃣ Introduction to Deep Learning for Vision

Deep Learning uses neural networks with multiple layers to learn image patterns. It automatically extracts features such as edges, textures and objects, unlike traditional computer vision, where features are designed by hand.

2️⃣ Convolutional Neural Networks (CNN)

CNNs are the backbone of modern computer vision. They consist of layers that learn
hierarchical features:
• Low-level: edges, corners
• Mid-level: shapes
• High-level: full objects

Main Components of CNN:

1️⃣ Convolution Layer – extracts features using filters/kernels

2️⃣ Activation Function (ReLU) – introduces non-linearity

3️⃣ Pooling Layer – reduces spatial size (Max Pooling is common)

4️⃣ Fully Connected Layer – final classification

CNN Block Diagram


Input Image → Convolution → ReLU → Pooling → Flatten → Fully Connected → Softmax
Output
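The ReLU → Pooling → Softmax stages of the block diagram can be traced on a tiny feature map. This sketch skips the convolution and fully connected layers and feeds the pooled values straight into softmax, purely to show what each operation does; the helper names are hypothetical.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 (odd trailing rows/columns are dropped)."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % 2, :w - w % 2]
    return trimmed.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def softmax(scores):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

feature_map = np.array([[1., -2., 3., 0.],
                        [4., 5., -6., 7.],
                        [0., 1., 2., 3.],
                        [-1., -1., 8., 2.]])
activated = np.maximum(feature_map, 0.0)     # ReLU: negative responses become zero
pooled = max_pool_2x2(activated)             # 4x4 -> 2x2, keeps the strongest response per block
probs = softmax(pooled.ravel())              # flatten, then normalise to probabilities
```

Pooling halves each spatial dimension, and softmax produces a probability vector whose largest entry corresponds to the strongest pooled response.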

Important CNN Terms


• Stride – step size of filter movement
• Padding – adds border pixels around the input; "same" padding keeps output size equal to input (at stride 1)
• Feature Map – output of convolution layer
• Kernel – small filter matrix
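Stride, padding and kernel size together fix the output size through the standard relation out = ⌊(n + 2p − k) / s⌋ + 1. The short sketch below applies it to a 28×28 input with a 3×3 convolution and a 2×2 pooling layer; `conv_output_size` is a hypothetical helper.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    return (n + 2 * padding - kernel) // stride + 1

after_conv = conv_output_size(28, kernel=3)                      # 3x3 kernel, no padding
after_pool = conv_output_size(after_conv, kernel=2, stride=2)    # 2x2 max pooling
same_padding = conv_output_size(28, kernel=3, padding=1)         # padding of 1 preserves size
print(after_conv, after_pool, same_padding)   # 26 13 28
```

This is where a flattened size such as 32 * 13 * 13 comes from when 32 filters are applied to a 28×28 image and followed by 2×2 pooling.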

3️⃣ Popular CNN Architectures
✅ LeNet-5 – early CNN for handwritten digit recognition (MNIST)
✅ AlexNet – deeper CNN using ReLU & dropout
✅ VGGNet – uses 3×3 convolutions repeatedly
✅ ResNet – introduces skip connections to mitigate the vanishing gradient problem

ResNet Block Structure:

Input → Convolution Layers → Output + Input (Skip Connection) → ReLU
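The skip connection in the block structure above amounts to y = ReLU(F(x) + x). The sketch below uses two dense weight matrices as a stand-in for the convolution layers, which is a simplifying assumption; `residual_block` is a hypothetical name.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), with F a two-layer transform standing in for the conv layers."""
    f = relu(x @ w1) @ w2      # the learned residual F(x)
    return relu(f + x)         # skip connection adds the input back before the final ReLU

x = np.array([1.0, -2.0, 3.0])
zero = np.zeros((3, 3))
passthrough = residual_block(x, zero, zero)   # with F(x) = 0 the block passes ReLU(x) through
print(passthrough)
```

Because the input bypasses F, the block can fall back to a near-identity mapping, which gives gradients a direct path through very deep networks.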

4️⃣ Transfer Learning
Pre-trained models like VGG16, ResNet50 are trained on large datasets like ImageNet.
We reuse their learned features and train only final layers for new tasks.

Advantages:

• Less data required
• Faster training
• Often higher accuracy than training from scratch on a small dataset

5️⃣ CNN Code Example (PyTorch)

A simple CNN image classifier example:

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Data Preprocessing
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Dataset assumed to be MNIST (1-channel 28x28 images, consistent with 32 * 13 * 13 below)
train_data = datasets.MNIST(root='./data', train=True,
                            download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_data,
                                           batch_size=64, shuffle=True)

# CNN Model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)        # 28x28 -> 26x26, 32 feature maps
        self.pool = nn.MaxPool2d(2, 2)          # 26x26 -> 13x13
        self.fc1 = nn.Linear(32 * 13 * 13, 10)  # 10 output classes

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = self.pool(x)
        x = x.view(-1, 32*13*13)                # flatten for the fully connected layer
        x = self.fc1(x)
        return x

model = CNN()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)  # optimizer assumed to be Adam

# Training Loop
for epoch in range(1):
    for images, labels in train_loader:
        optimizer.zero_grad()
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()

print("Training Completed!")

6️⃣ Applications of Deep Learning in Vision


• Face Recognition
• Self-driving Cars
• Medical Imaging
• Object Detection (YOLO)
• Image Segmentation (U-Net)

✅ Unit-5 Completed ✅
