COMPUTER VISION
CS-6350
Prof. Sukhendu Das
Deptt. of Computer Science and Engg.,
IIT Madras, Chennai – 600036.
Email: sdas@[Link]
URL: //[Link]/~sdas
//[Link]/~vplab/computer_vision.html
JULY – 2025.
INTRODUCTION
Watch:
//[Link]/~vplab/computer_vision.html
For Updated Schedules, announcements, TPA, PDF of slides etc.
Contents to be covered
1 Introduction
2 Neighborhood and Connectivity of pixels
3 DFT, Filtering/Enhancement in spatial and spectral domains
4 3D transformations, projection and stereo reconstruction
5 Histogram based image processing & DHS
6 Concepts in Edge Detection
7 Hough Transform
8 Image segmentation
9 Pattern Recognition
10 Motion Analysis
11 Texture analysis using Gabor filters
12 Shape from Shading
13 Scale-Space - Image Pyramids
14 Feature extraction (recent trends) – detectors and descriptors
15 Bag of Words and Prob. Graphical Models
16 Object Recognition
17 Wavelet transform
18 Registration and Matching

Use slides as brief: points, concepts, links.
These are not a substitute for materials in books.
Self-study :
Complete
Few initial slides
References
1. “Digital Image Processing”; R. C. Gonzalez and R. E. Woods;
Addison Wesley; 1992+.
2. “Computer Vision: Algorithms and Applications”; Richard Szeliski;
Springer-Verlag London Limited; 2011.
3. “Multiple View Geometry”; R. Hartley and A. Zisserman;
Cambridge University Press; 2002.
4. “Pattern Recognition and Machine Learning”; Christopher M. Bishop;
Springer; 2006.
5. “Digital Image Processing and Computer Vision”; Robert J. Schalkoff;
John Wiley and Sons; 1989+.
6. “Pattern Recognition: Statistical, Structural and Neural Approaches”;
Robert J. Schalkoff; John Wiley and Sons; 1992+.
7. “3-D Computer Vision”; Y. Shirai; Springer-Verlag; 1984.
8. “Computer Vision: A Modern Approach”; D. A. Forsyth and J. Ponce;
Pearson Education; 2003+.
References (Contd..)
Journals:
• IEEE-T-PAMI (Transactions on Pattern Analysis and Machine Intelligence)
• IEEE-T-IP (Transactions on Image Processing)
• PR (Pattern Recognition)
• PRL (Pattern Recognition Letters)
• CVIU (Computer Vision and Image Understanding)
• IJCV (International Journal of Computer Vision)
Online links
1. CV online: [Link]
2. Computer Vision Homepage:
[Link]
Time Table (July-Nov ‘24)
[Figure: weekly time-table grid; recoverable labels: 12.00 – 12.50, LUNCH]
- TUTs in alternate weeks; mid-sem etc. may occasionally be held online
Typical distribution of marks for evaluation/grading
Component                       Marks
Quiz (50 mins.)                 15 - 20
End Sem exam (120-150 mins.)    35 - 40
TPA                             30 - 35
TUTs                            10 - 15
___________________
Total                           100
+/- 05 marks variation in any part;
to be finalized well before the End Sem exam.
Pre-requisites: Linear Algebra; Geometry; Statistics & Probability basics;
DSP basics; Programming; Data Structure basics; Calculus basics.
Not covered: DL (except in TPA); latest generative models; Diffusion,
Attention, Transformers, GANs, edge-computing etc.
What is CVPR ?
[Link]
[Link]
[Link]
Also, check: ICCV (13), ECCV (46), NIPS (07), PAMI (49)
[Link]
CVPR-20 - CFP
• 3D computer vision
• Action and behavior recognition
• Adversarial learning, adversarial attack and defense methods
• Biometrics, face, gesture, body pose
• Computational photography, image and video synthesis
• Datasets and evaluation
• Efficient training and inference methods for networks
• Explainable AI, fairness, accountability, privacy, transparency ethics in vision
• Image retrieval
• Low-level and physics-based vision
• Machine learning architectures and formulations
• Medical, biological and cell microscopy
• Motion and tracking
• Neural generative models, auto encoders, GANs
• Optimization and learning methods
• Recognition (object detection, categorization)
• Representation learning, deep learning
• Scene analysis and understanding
• Segmentation, grouping and shape
• Transfer, low-shot, semi- and un- supervised learning
• Video analysis and understanding
• Vision + language, vision + other modalities
• Vision applications & systems, vision for robotics & autonomous vehicles
• Visual reasoning and logical representation
CVPR-22 - CFP
• 3D from single images
• Adversarial attack and defense
• Autonomous driving
• Biometrics
• Computational imaging
• Computer vision for social good
• Computer vision theory
• Datasets and evaluation
• Deep learning architectures and techniques
• Document analysis and understanding
• Efficient and scalable vision
• Embodied vision: Active agents, simulation
• Explainable computer vision
• Humans: Face, body, pose, gesture, movement
• Image and video synthesis and generation
• Low-level vision
• Machine learning (other than deep learning)
• Medical and biological vision, cell microscopy
• Multi-modal learning
• Optimization methods (other than deep learning)
• Photogrammetry and remote sensing
• Physics-based vision and shape-from-X
• Recognition: Categorization, detection, retrieval
• Robotics
• Scene analysis and understanding
• Segmentation, grouping and shape analysis
• Self-supervised or unsupervised representation learning
• Transfer, meta, low-shot, continual, or long-tail learning
• Transparency, fairness, accountability, privacy, ethics in vision
• Video: Action and event understanding
• Video: Low-level analysis, motion, and tracking
• Vision + graphics
• Vision, language, and reasoning
• Vision applications and systems
CVPR – 2022-3
CVPR-22
3D …. - 300
Segmentation – 171
Detection – 190; Recognition – 82;
Self-supervised – 71; MTL – 06
FSL – 45; ZSL – 36
CVPR-23
Human Vision System (HVS) Vs.
Computer Vision System (CVS)
The Optics of the eye
A Computer Vision System (CVS)
[Figure: block diagram; labels: light, reflected light, image system, digitizer, computer]
[Figure: diagram with labels: Computer Vision; Images, scenes, pictures; Models, Object/Scene representation; Visualization]
Computer Vision is an area of work that combines concepts,
techniques and ideas from Digital Image Processing, Pattern
Recognition, Artificial Intelligence and Computer Graphics.
The majority of tasks in Digital Image Processing or Computer
Vision deal with understanding or deriving the scene information
or description from the input scene (digital image/s). The methods
used to solve a problem in digital image processing depend on the
application domain and the nature of the data being analyzed.
Methods for analyzing two-dimensional pictures are generally not
applicable to processing three-dimensional scenes, and vice-versa.
The choice of processing techniques, methods and 'features' to be
used for a particular application is made after some amount of trial
and error; hence experience in handling images is crucial in most
of these cases.
For example, analysis of remotely sensed or satellite imagery
involves techniques based on classification or analysis of texture
imagery. These techniques are not useful for analyzing optical images
of indoor or outdoor scenes.
[Figure: Computer Vision at the centre, linked to related fields: DIP; PR; CG; ML; ANN & DL; Fuzzy & Soft Computing; Optimization Techniques; VLSI & Architecture; Parallel and Distributed Processing]
The Developmental Pathway of
Computational Vision Technology ??
[Figure: pathway diagram; labels: Optics; Linear algebra, Subspaces; DSP; Prob. & Stat.; Computer Graphics; PR; Optimization Methods; ANN; Fuzzy & Soft computing; GPU; ML; Computational Neurosciences; DL]
Digital Image Processing is in many cases concerned with
taking one array of pixels as input and producing another
array of pixels as output, which in some way represents an
improvement to the original array.
Purpose:
1. Improvement of Pictorial Information
• improve the contrast of the image,
• remove noise,
• remove blurring caused by movement of the camera
during image acquisition,
• correct geometrical distortions caused by the lens.
2. Automatic Machine Perception (termed Computer
Vision, Pattern Recognition or Visual Perception) for
intelligent interpretation of scenes or pictures.
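The "array of pixels in, improved array of pixels out" idea can be sketched in a few lines. This is only a minimal illustration, assuming NumPy and an 8-bit gray image; the function names `stretch_contrast` and `median_filter3` are hypothetical, and the two operations (linear contrast stretching and a 3x3 median filter) are common textbook choices rather than methods prescribed by these slides.

```python
import numpy as np

def stretch_contrast(img):
    """Linearly map the intensity range of img onto [0, 255]."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # flat image: nothing to stretch
        return np.zeros(img.shape, dtype=np.uint8)
    return np.round((img - lo) * 255.0 / (hi - lo)).astype(np.uint8)

def median_filter3(img):
    """3x3 median filter; borders are handled by edge replication."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the 9 shifted copies of the image and take the per-pixel median.
    windows = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(windows), axis=0).astype(img.dtype)

# Low-contrast horizontal ramp (values 100..131) with one "salt" pixel.
ramp = np.tile(np.arange(100, 132, dtype=np.uint8), (32, 1))
noisy = ramp.copy()
noisy[16, 16] = 255

denoised = median_filter3(noisy)       # noise removal
enhanced = stretch_contrast(denoised)  # contrast improvement
```

Both functions take an array and return an array of the same shape, which is the defining pattern of this kind of processing.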
Elements of a Digital Image Processing System
[Figure: block diagram; labels: Image Digitizer, Image Processor/GPU, Digital Computer, Mass storage, Operator Console, Display device, Hard copy]
• Image processors: Consist of a set of hardware modules that
perform 4 basic functions:
– Image acquisition: frame grabber
– Storage: frame buffer
– Low-level processing: specialized hardware device designed
to perform arithmetic and logic operations on pixels in parallel
– Display: read from image memory (frame buffer) and
convert to an analog video signal
• Digitizers: Convert an image into a numerical representation
suitable for input to a digital computer
• Digital Computers: Interfaced with the image processor to
provide versatility and ease of programming.
• Storage Devices: For bulk storage, e.g. magnetic disks,
magnetic tapes, optical disks
• Display and Recording devices: Monochrome and color
television monitors, CRT, laser printers, heat-sensitive paper
devices, and ink spray systems.
Image acquisition using a CCD camera
Resolution standards: XGA - 1024*768; 10K UHD - ??
A digital Image
An image is an array of integers: f(x,y) ∈ {0, 1, …, Imax-1},
where x, y ∈ {0, 1, …, N-1}
• N is the resolution of the image and Imax is the number of
discretized brightness levels
• The larger the value of N, the greater the clarity of the picture
(higher resolution), but the more data there is to be analyzed
• If the image is gray-level (8 bits per pixel, termed raw or gray),
it requires N² bytes of storage
• If the image is color (RGB), each pixel requires 3 bytes of storage
space.
Image size                       Storage space required
(resolution)                     Raw - Gray     Color (RGB)
64*64                            4K             12K
256*256                          64K            192K
512*512                          256K           768K
2048×1536 (= 3.1 megapixels)     -              9 MB
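The storage figures in the table follow directly from width × height × bytes per pixel. The short sketch below reproduces them; the helper name `storage_bytes` is hypothetical, and K and MB are taken to mean 1024 and 1024×1024 bytes respectively, which is the reading that matches the table.

```python
def storage_bytes(width, height, bits_per_pixel=8, channels=1):
    """Uncompressed storage for an image, in bytes."""
    return width * height * channels * bits_per_pixel // 8

K = 1024
MB = 1024 * 1024

# Square N x N images: (N, gray size in K, RGB size in K).
table = [
    (n, storage_bytes(n, n) // K, storage_bytes(n, n, channels=3) // K)
    for n in (64, 256, 512)
]

# A 2048 x 1536 colour image (about 3.1 megapixels).
rgb_3mp = storage_bytes(2048, 1536, channels=3) / MB

for n, gray_k, rgb_k in table:
    print(f"{n}*{n}: gray {gray_k}K, RGB {rgb_k}K")
print(f"2048x1536 RGB: {rgb_3mp:.0f} MB")
```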
A digital image is a two-dimensional array of intensity
values, f(x, y) (a 3-D image is called range data), which
represents a 2-D intensity function discretized both in
spatial coordinates (spatial sampling) and in brightness
values (quantization).
The elements of such an array are called pixels
(picture elements).
The storage requirement for an image depends on
the spatial resolution and the number of bits necessary for
pixel quantization.
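Spatial sampling and brightness quantization can be illustrated on a synthetic scene. The sketch below is an assumption-laden toy: the continuous "scene" is a made-up function with brightness in [0, 1], and `sample_and_quantize` is a hypothetical helper, not something defined in these slides.

```python
import numpy as np

def sample_and_quantize(f, n, levels):
    """Sample f on an n x n grid over [0, 1)^2 and quantize the
    brightness (assumed to lie in [0, 1]) to integer levels 0..levels-1."""
    xs = np.arange(n) / n
    x, y = np.meshgrid(xs, xs, indexing="ij")       # spatial sampling
    values = np.clip(f(x, y), 0.0, 1.0)
    return np.minimum((values * levels).astype(np.int64), levels - 1)

def scene(x, y):
    """Hypothetical continuous intensity function."""
    return 0.5 * (1.0 + np.sin(2 * np.pi * x) * np.cos(2 * np.pi * y))

img = sample_and_quantize(scene, n=8, levels=4)     # 8 x 8 pixels, 2 bits each
bits_per_pixel = int(np.ceil(np.log2(4)))
storage_bits = img.size * bits_per_pixel            # resolution x bits per pixel
```

Increasing `n` refines the sampling grid, while increasing `levels` refines the quantization; the storage requirement grows with both.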
The processing of an image depends on the
application domain and the methodology used to solve a
problem. There exist four broad categories of tasks in
digital image processing:
(i) Reconstruction, (ii) Segmentation,
(iii) Recognition and (iv) Motion.
Segmentation deals with the process of fragmenting
the image into homogeneous meaningful parts, regions or
sub-images. Segmentation is generally based on the
analysis of the histogram of images using gray level
values as features. Other features used are edges or lines,
colors and textures.
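As one concrete instance of histogram-based segmentation, the sketch below picks a global gray-level threshold by maximizing the between-class variance (Otsu's criterion). It is only an illustration on a synthetic two-region image, assuming NumPy and 8-bit gray levels; the slides do not single out this particular method.

```python
import numpy as np

def otsu_threshold(img, levels=256):
    """Return the gray level t that maximizes the between-class variance
    when pixels <= t form one class and pixels > t the other."""
    hist = np.bincount(img.ravel(), minlength=levels).astype(np.float64)
    p = hist / hist.sum()                     # normalized histogram
    omega = np.cumsum(p)                      # probability of class "<= t"
    mu = np.cumsum(p * np.arange(levels))     # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0,
                           (mu_total * omega - mu) ** 2 / denom,
                           0.0)
    return int(np.argmax(sigma_b))

# Synthetic image: dark background (50) with a bright 12 x 12 square (200).
img = np.full((32, 32), 50, dtype=np.uint8)
img[10:22, 10:22] = 200

t = otsu_threshold(img)
mask = img > t        # segmented region: True inside the bright square
```

Real images rarely have such a clean bimodal histogram, which is why edges, colors and textures are used as additional features.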
Recognition deals with identification or
classification of objects in an image for the purpose of
interpretation or identification. Recognition is based on
models, which represent an object. A system is trained
(using HMM, GMM, ANN etc.) to learn or store the models,
based on training samples. The test data is then matched
with all such models to identify the object with a certain
measure of confidence.
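The train-then-match idea can be sketched with the simplest possible model: one Gaussian per class (a single-component stand-in for the GMM mentioned above). Everything here is illustrative; the 2-D feature vectors are synthetic, equal class priors are assumed, and the function names are hypothetical.

```python
import numpy as np

def fit_gaussian_models(features, labels):
    """Learn one diagonal-covariance Gaussian (mean, variance) per class."""
    models = {}
    for c in np.unique(labels):
        x = features[labels == c]
        models[int(c)] = (x.mean(axis=0), x.var(axis=0) + 1e-6)
    return models

def log_likelihood(x, mean, var):
    """Log density of x under a diagonal Gaussian."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def recognize(x, models):
    """Match x against every model; return (label, confidence).
    Confidence is the posterior probability under equal priors."""
    classes = sorted(models)
    ll = np.array([log_likelihood(x, *models[c]) for c in classes])
    post = np.exp(ll - ll.max())
    post /= post.sum()
    best = int(np.argmax(post))
    return classes[best], float(post[best])

# Synthetic training samples for two object classes.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
                      rng.normal([3.0, 3.0], 0.5, size=(50, 2))])
labels = np.array([0] * 50 + [1] * 50)

models = fit_gaussian_models(features, labels)
label, confidence = recognize(np.array([2.8, 3.1]), models)
```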
Reconstruction (Depth & 3D) - 3D reconstruction and
depth estimation are critical techniques in computer vision that
enable machines to interpret and model the three-dimensional
structure of scenes from 2D images or video. 3D reconstruction
involves creating a 3D model of an object or environment by
integrating multiple 2D images, often using methods like
structure-from-motion (SfM), multi-view stereo, or point cloud
generation, leveraging geometric and photometric cues to
reconstruct spatial geometry.
Depth estimation, a closely related task, focuses on
inferring the distance of objects or surfaces from a camera,
typically using monocular cues (e.g., shading, texture gradients)
or stereo vision, where disparity between images from multiple
viewpoints is used. (Src: GROK)
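For the stereo case, the relation between disparity and depth is Z = f·B/d for a rectified pair of parallel cameras, where f is the focal length in pixels, B the baseline and d the disparity in pixels. The sketch below applies it; the numeric values and the function name are assumptions chosen for illustration.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair.
    Zero (or negative) disparity is mapped to infinite depth."""
    disparity = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Hypothetical rig: 800-pixel focal length, 12 cm baseline.
depths = depth_from_disparity([64.0, 32.0, 8.0, 0.0],
                              focal_px=800.0, baseline_m=0.12)
```

Halving the disparity doubles the estimated depth, so distant points (small disparities) are the most sensitive to matching errors.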
Motion analysis (or dynamic scene analysis) involves
techniques for tracking and estimating the path of movement
of object/s from a sequence of frames (digital video). Methods
for dynamic scene analysis are based on (i) tracking,
(ii) obtaining correspondence between frames, and then
estimating (iii) the motion parameters and (iv) the structure
of moving objects. Typical methods for analysis are based on
optical flow, the iterative Kalman filter and the Newton/Euler
equations of dynamics.
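A minimal sketch of the Kalman filter used as a tracker is given below, assuming a 1-D constant-velocity motion model and position-only measurements (for example, an object's x-coordinate per frame). The noise parameters `q` and `r` and the initial covariance are arbitrary illustrative choices.

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter.
    State is [position, velocity]; only position is measured.
    Returns the filtered state after each measurement."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # motion model
    H = np.array([[1.0, 0.0]])              # measurement model
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    x = np.array([measurements[0], 0.0])    # initial state guess
    P = 10.0 * np.eye(2)                    # initial uncertainty
    estimates = []
    for z in measurements:
        # Predict the state at the time of the new frame.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct the prediction with the measurement.
        y = z - H @ x                       # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        estimates.append(x.copy())
    return np.array(estimates)

# Object moving at 2 pixels/frame, observed with noise.
rng = np.random.default_rng(0)
true_positions = 2.0 * np.arange(30)
noisy_positions = true_positions + rng.normal(0.0, 0.5, size=30)
track = kalman_track(noisy_positions)       # columns: position, velocity
```

The second column of `track` is the estimated velocity, i.e. a motion parameter recovered from the sequence of frames.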
There are generally three main categories of tasks
involved in a complete computer vision system. They are:
• Low level processing: Involves image processing tasks
in which the quality of the image is improved for the
benefit of human observers and higher level routines to
perform better.
• Intermediate level processing: Involves the processes
of feature extraction and pattern detection tasks. The
algorithms used here are chosen and tuned in a manner as
may be required to assist the final tasks of high level
vision.
• High level vision: Involves autonomous interpretation
of scenes for pattern classification, recognition and
identification of objects in the scenes as well as any other
information required for human understanding.
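The three levels can be pictured as a pipeline in which each stage feeds the next. The toy sketch below assumes NumPy; the stage functions, the gradient threshold of 10 and the decision threshold of 0.05 are all hypothetical choices made only to show the flow from pixels to features to a decision.

```python
import numpy as np

def low_level(img):
    """Low level: 3x3 mean smoothing (edge-replicated borders)."""
    padded = np.pad(img.astype(np.float64), 1, mode="edge")
    h, w = img.shape
    return sum(padded[r:r + h, c:c + w]
               for r in range(3) for c in range(3)) / 9.0

def intermediate_level(img):
    """Intermediate level: extract a feature, here the fraction of
    pixels whose gradient magnitude exceeds a fixed threshold."""
    gy, gx = np.gradient(img)
    return float(np.mean(np.hypot(gx, gy) > 10.0))

def high_level(edge_density, threshold=0.05):
    """High level: interpret the feature as a scene label."""
    return "object present" if edge_density > threshold else "empty scene"

blank = np.zeros((32, 32), dtype=np.uint8)
scene = blank.copy()
scene[8:24, 8:24] = 200                      # bright square on dark background

label_blank = high_level(intermediate_level(low_level(blank)))
label_scene = high_level(intermediate_level(low_level(scene)))
```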
A top-down approach, rather than a bottom-up
approach, is used in the design of these systems in many
applications. The methods used to solve a problem in
digital image processing depend on the application
domain and the nature of the data being analyzed.
Different fields of applications include:
• Character Recognition,
• Document processing,
• Commercial (signature & seal verification) application,
• Biometry and Forensic (authentication: recognition
and verification of persons using face, palm &
fingerprint),
• Pose and gesture identification,
• Automatic inspection of industrial products,
• Industrial process monitoring,
• Biomedical Engg. (Diagnosis and surgery),
• Military surveillance and target identification,
• Navigation and mobility (for robots and unmanned
vehicles - land, air and underwater),
• Remote sensing (using satellite imagery),
• GIS
• Safety and security (night vision),
• Traffic monitoring,
• Sports (training and incident analysis)
• VLDB (organization and retrieval)
• Entertainment and virtual reality.
TARGETED INDUSTRIAL APPLICATIONS
Intelligent Traffic Control      Vehicle Segmentation
Anti-forging Stamps              Visual Tracking Systems
Card Counting Systems            Illegal content (adult) Filter
Drive Quality Test               Scratch Detection
Camera Flame Detection           Smart Traffic Monitoring
CCTV Fog Penetration             Vehicle Categorization
Key Image Search/Index           Vehicle Wheel alignment
Security Monitoring              Number Plate Identification
Robust Shadow Detection          Referrals for Line calls
Different categories of work being done in CV, to solve problems:
• 2-D image analysis – segmentation, target detection, matching, CBIR;
• 3-D multi-camera calibration; Correspondence and stereo;
Reconstruction of 3-D Objects and surfaces;
• Pattern Recognition for Objects, scenes;
• Video and motion analysis; Video analytics; CBVR; Compression;
• Feature extraction: Canny, GHT, Snakes, DWT, Corners,
SIFT, GLOH, LESH;
• Multi-sensor data, Decision and feature fusion;
• Steganography and Watermarking;
• Image and Video-based Rendering/Synthesis;
• Multimodal AI
• Deepfake detection; Adversarial
The various sub-categories of technology in these related fields are:
Image enhancement,                    Image reconstruction,
Image restoration and filtering,      Range data processing,
Representation and description,       Stereo image processing,
Feature extraction,                   Computational geometry,
Image segmentation,                   medical, airborne, underwater,
Image matching,                       SAR, LIDAR, VHR-μscopic,
Active, Robotic vision,               Neuro-fuzzy techniques,
Image synthesis,                      DL/ML driven Vision,
Video Analytics,                      Edge computing.
Few DEMOS and ILLUSTRATIONS
Courtesy: TA/students of VPLAB - CSE-IITM
Also, few from internet
Floor Auto-Navigational aid (SPCOMP ‘22)
[Figure labels: Camera View Axis, Estimated Orientation]
[Figure panels: Heavy Version of YOLOV5; Light Version of YOLOV5]
Sl   Process           GPU (NVIDIA GeForce RTX 2080)   CPU (CORE i7, 8th Generation)
1    Yolov5 - Heavy    40 fps                          -
2    Yolov5 – Light    149 fps                         18 fps
3    Yolov7 (2022)     23 fps                          -
Results of Segmentation
[Figure panels: Input Image; Segmented map before integration; Edge map before integration; Segmented map and Edge map after integration]
Road extraction from Satellite Images
[Figure rows: SAT Images; Results; Hand-drawn]
Object Extraction From an Image
[Figure rows: RCNN; Our Unsupervised method]
Salient Object Segmentation In Images
Method 1
Unsupervised Saliency
Image IT FT CA GB IS RC HFT SF Proposed GT
Images from MSRA B 5000 image Dataset
[Link] [Link]
Oct 24, 2014
Method 2
Visual Results on PASCAL
Image SF PARAM MR wCrt Proposed GT
[Figure panels: Snake Output; GrabCut Output; SnakeCut Output]
Univ. of ZURICH
The Problem Definition
IMRN
IMT
Given a bitmap template (IMT) and a noisy bitmap image IMRN
which contains IMT (believe me):
FIND OUT the location of IMT in IMRN !
Go to the next page for more:
Problem explanation for pessimists.
IMT
IMR
• IMRN (on the previous page) is obtained by adding a large level
of “Salt and Pepper” noise to the IMR bitmap image.
• IMT is also obtained from IMR, as shown above.
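The problem setup can be reproduced with a brute-force baseline: slide the template over the noisy bitmap and keep the position with the fewest mismatching pixels. This is not claimed to be the method behind the result shown in these slides, which is not described here; the random bitmap standing in for IMR, the 40% corruption level and the function names are all assumptions for illustration.

```python
import numpy as np

def add_salt_and_pepper(bitmap, fraction, rng):
    """Replace a random fraction of pixels with random 0/1 values."""
    noisy = bitmap.copy()
    hit = rng.random(bitmap.shape) < fraction
    noisy[hit] = rng.integers(0, 2, size=int(hit.sum()), dtype=np.uint8)
    return noisy

def locate_template(image, template):
    """Exhaustive search: return ((row, col), mismatches) for the
    placement of the template with the fewest mismatching pixels."""
    ih, iw = image.shape
    th, tw = template.shape
    best, best_pos = None, None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw]
            mismatches = int(np.count_nonzero(window != template))
            if best is None or mismatches < best:
                best, best_pos = mismatches, (r, c)
    return best_pos, best

rng = np.random.default_rng(0)
imr = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)  # stand-in for IMR
imt = imr[20:36, 30:46].copy()                           # template cut from IMR
imrn = add_salt_and_pepper(imr, 0.4, rng)                # noisy version of IMR

position, mismatches = locate_template(imrn, imt)
```

A random bitmap has almost no self-similarity, so this baseline has an easier job here than it would on a structured image, where many placements of the template look alike.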
The RESULT beats the human EYE
[Figure panels: IMRN, IMR, IMT]
Published almost 3 decades ago; without GPU and DL
Thank you