Understanding Color Spaces in Computer Vision
Understanding Color Spaces in Computer Vision
[Link]
Color spaces
• A color space is a mathematical representation of continuous spectral
colors in a three-dimensional vector space, using three primary
colors, which allows color analysis and manipulation.
• Different color spaces can be derived by defining different primary
colors.
• Color spaces are developed according to output devices, physiological
or physical properties.
• A number of standard color space specifications are well developed.
However, different definitions can often be found for the same color
space. For more information about color spaces.
[Link]
Color spaces
• Color spaces can be classified as:
• Device independent
• Device dependent
• Device independent color components are the same on all output
devices.
• Device dependent color spaces will have different components for
different output devices.
• Color spaces can be also classified as user-oriented models which try
to build a bridge between the user and the hardware used to
manipulate color.
[Link]
Color spaces: RGB 0,1,0
Default color space
• The RGB camera output maps spectral power
densities to a triple of numerical components
that are the mathematical coordinates of R, G
and B.
• It is device dependent color space used for 1,0,0
capture and display devices.
• It is an additive color system. 0,0,1
• RGB space may be modelled as a cube with the
three axes corresponding to red, green and RGB cube
blue.
• The bottom corner, when red = green = blue = 0 • Easy for devices
is black, while the opposite top corner, where • But not perceptual
red = green = blue = 255 (for an 8-bit per • Where do the grays live?
channel display system), is white.
• Where is hue and saturation?
Cr
Cb
(Y=0.5,Cr=0.5)
Cb
Y=1
Cr
(Y=0.5,Cb=05)
Y CbCr
• The Y CbCr color space separates RGB into luminance and chrominance information.
• Luminance information is stored as a single component (Y), and chrominance
information is stored as two color-difference components (Cb and Cr).
• Cb represents the difference between the blue component and a reference value.
• Cr represents the difference between the red component and a reference value.
• Digital video widely uses this color space.
• Given R, G and B in the range of [0, 1], with Y in the range of [16, 235] and Cb, Cr, in
the range of [16, 240].
𝑌ሗ 16 65.841 128.553 24.966 𝑅ሗ
𝐶ሗ𝑏 = 128 + −37.797 −0.587 112 𝐺ሗ
𝐶ሗ𝑟 128 112 −93.786 −18.214 𝐵ሗ
HSV
• Hue, Saturation, Value (Intensity) HSV is a non-
linear transformation from the RGB camera
output, which is more intuitive to human vision.
• It separates color information of an image from its
intensity information.
• Color information is represented by Hue and
Saturation values, while Intensity describes the
brightness of an image pixel.
• Hue represents basic colors, and is determined by
the dominant wavelength in the spectral
distribution of light wavelengths. It is the location
of the peak in the spectral distribution.
• The Saturation is a measure of the purity of the
color, and signifies the amount of white light mixed
with the Hue. It is the height of the peak relative to
the entire spectral distribution.
3 𝑅−𝐵 𝑅+𝐺+𝐵 min 𝑅, 𝐺, 𝐵
𝐻, 𝐼 , 𝑆 = (arctan( ), ,1 − )
2𝑅 − 𝐺 − 𝐵 3 𝐼
Color spaces: HSV
Intuitive color space
H
(S=1,V=1)
S
(H=1,V=1)
V
(H=1,S=0)
HSV
brighter.
• a* is the amount of red or green tones in
the image. A large positive a* value
corresponds to red/magenta. A large a
negative a* value corresponds to green. (L=65,b=0)
f = focal length
c = center of the camera
How does a pixel get its value?
Light emitted
Lens
Sensor
Slide credit: Derek Hoiem
How does a pixel get its value?
• Major factors
• Illumination strength and direction Light emitted
• Surface geometry
• Surface material Light reflected to
camera
• Nearby surfaces
• Camera gain/exposure
Sensor
• Diffusion
λ
• Reflection
• Transparency
• Refraction
?
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffusion
λ
• Reflection
• Transparency
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffuse Reflection
λ
• Reflection
• Transparency
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffusion
λ
• Specular Reflection
• Transparency
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffusion
λ
• Reflection
• Transparency
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffusion
λ
• Reflection
• Transparency
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffusion
λ1
• Reflection
• Transparency λ2
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffusion
λ
• Reflection
• Transparency
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffusion
t=1
• Reflection
• Transparency t=n
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence
• Interreflection
A photon’s life choices
• Absorption light source
• Diffusion
λ
• Reflection
• Transparency
• Refraction
• Fluorescence
• Subsurface scattering
• Phosphorescence (Specular Interreflection)
• Interreflection
Lambertian Reflectance
• In computer vision, the complexity of light transport is mostly
ignored.
• Surfaces are often assumed to be ideal diffuse reflectors with no
dependence on viewing direction.
[Link]
Basic models of reflection
• Specular: light bounces off at the incident angle
• E.g., mirror specular reflection incoming light
Θ Θ
diffuse reflection
𝜌
absorption
(1 − 𝜌)
specular reflection
Θ Θ
Flickr, by piratejohnny
Slide credit: Derek Hoiem
Most surfaces have both specular and diffuse
components
• Specularity = spot where specular reflection dominates (typically
reflects light source)
2
Recap
absorption
• When light hits a typical surface
• Some light is absorbed (1-𝜌)
• More absorbed for low albedos
diffuse
reflection
• Some light is reflected diffusely
• Independent of viewing direction
𝑆𝑐 = 𝑒 F(ln 𝐸 𝜆𝑐 +ln(𝑆 𝜆𝑐 ))
Why 3 colors?
[Link]
Slide by Steve Seitz
Color Image R
B
Images in Python
• Images represented as a matrix
• Suppose we have a NxM RGB image called “im”
– im(0,0,0) = top-left pixel value in R-channel
– im(y, x, b) = y pixels down, x pixels to right in the bth
channel
– im(N-1, M-1, 2) = bottom-right pixel in B-channel
column
row 0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99 R
0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91
0.89 0.72 0.51
0.92
0.55
0.93
0.51
0.94
0.42
0.97
0.57
0.62
0.41
0.37
0.49
0.85
0.91
0.97
0.92
0.93 0.92 0.99 G
0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95
0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91
0.71 0.81 0.81
0.89
0.87
0.72
0.57
0.51
0.92
0.37
0.55
0.93
0.80
0.51
0.94
0.88
0.42
0.97
0.89
0.57
0.62
0.79
0.41
0.37
0.85
0.49
0.85
0.91
0.97
0.92
0.93 0.92 0.99
B
0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33
0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95
0.86 0.84 0.74 0.58 0.95
0.51 0.89
0.39 0.82
0.73 0.89
0.92 0.56
0.91 0.31
0.49 0.75
0.74 0.92 0.81 0.95 0.91
0.71 0.81 0.81 0.87 0.57 0.37 0.80 0.88 0.89 0.79 0.85
0.96 0.67 0.54 0.85 0.89
0.48 0.72
0.37 0.51
0.88 0.55
0.90 0.51
0.94 0.42
0.82 0.57
0.93 0.41 0.49 0.91 0.92
0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33
0.69 0.49 0.56 0.66 0.96
0.43 0.95
0.42 0.88
0.77 0.94
0.73 0.56
0.71 0.46
0.90 0.91
0.99 0.87 0.90 0.97 0.95
0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74
0.79 0.73 0.90 0.67 0.71
0.33 0.81
0.61 0.81
0.69 0.87
0.79 0.57
0.73 0.37
0.93 0.80
0.97 0.88 0.89 0.79 0.85
0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93
0.91 0.94 0.89 0.49 0.49
0.41 0.62
0.78 0.60
0.78 0.58
0.77 0.50
0.89 0.60
0.99 0.58
0.93 0.50 0.61 0.45 0.33
0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99
0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74
0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97
0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93
0.91 0.94 0.89 0.49 0.41 0.78 0.78 0.77 0.89 0.99 0.93
0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99
0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97
Dynamic range and camera response
• Typical scenes have a huge dynamic range
• Camera response is roughly linear in the mid
range (15 to 240) but non-linear at the extremes
• called saturation or under-saturation
• The human perception of brightness, under
common illumination conditions (neither pitch
black nor blindingly bright), follows an
approximate power function, with greater
sensitivity to relative differences between darker
tones than between lighter tones.
• If images are not gamma-encoded, they allocate
too many bits or too much bandwidth to
highlights that humans cannot differentiate, and
too few bits or too little bandwidth to shadow
values that humans are sensitive to and would
require more bits/bandwidth to maintain the
same visual quality.[
Camera Gamma Curve
𝑒𝑛𝑐𝑜𝑑𝑒𝑑 𝑏𝑟𝑖𝑔ℎ𝑡𝑛𝑒𝑠𝑠 = 𝑚𝑒𝑎𝑠𝑢𝑟𝑒𝑑 𝑏𝑟𝑖𝑔ℎ𝑡𝑛𝑒𝑠𝑠 𝛾
• A pixel with value of “100” is not necessarily twice as bright as a pixel with value “50”.
Camera Gamma Curve
• Area source
• E.g., white walls, diffuser lamps, sky
• Ambient light
• Substitute for dealing with interreflections
Energy
electromagnetic energy that covers wavelengths
from approximately 400 nm to 700 nm.
2000 C
• The visible spectrum represents only a small 700 C
c2 −c 2
E ( ) = c1. .(e
−5 .T
− 1) c1. .e
−1 −5 .T
0.92 0.93 0.94 0.97 0.62 0.37 0.85 0.97 0.93 0.92 0.99
0.95 0.89 0.82 0.89 0.56 0.31 0.75 0.92 0.81 0.95 0.91
0.89 0.72 0.51 0.55 0.51 0.42 0.57 0.41 0.49 0.91 0.92
0.96 0.95 0.88 0.94 0.56 0.46 0.91 0.87 0.90 0.97 0.95
0.71 0.81 0.81 0.87 0.57 0.37 0.80 0.88 0.89 0.79 0.85
0.49 0.62 0.60 0.58 0.50 0.60 0.58 0.50 0.61 0.45 0.33
0.86 0.84 0.74 0.58 0.51 0.39 0.73 0.92 0.91 0.49 0.74
0.96 0.67 0.54 0.85 0.48 0.37 0.88 0.90 0.94 0.82 0.93
0.69 0.49 0.56 0.66 0.43 0.42 0.77 0.73 0.71 0.90 0.99
0.79 0.73 0.90 0.67 0.33 0.61 0.69 0.79 0.73 0.93 0.97
0.91 0.94 0.89 0.49 0.41 0.78 0.78 0.77 0.89 0.99 0.93
The light of the poor pixel
• A pixel’s brightness is determined by
• Light source (strength, direction, color)
• Surface orientation
• Surface material and albedo
• Reflected light and shadows from surrounding surfaces
• Gain on the sensor
• Key idea: for nearby scene points, most factors do not change much
• The information is mainly contained in local differences of brightness
Darkness = Large Difference in Neighboring Pixels
What differences in intensity tell us about
shape?
• Corner detection
Applications
• Key points are used for:
• Image alignment
• 3D reconstruction
• Motion tracking
• Robot navigation
• Indexing and database retrieval
• Object recognition
Why extract features?
• Motivation: panorama stitching
• We have two images – how do we combine them?
Why extract features?
• Motivation: panorama stitching
• We have two images – how do we combine them?
by Diva Sian
by swashford
Harder case
‘sliced at 1’
Visualizing quadratics
and slice at 1
Visualizing quadratics
and slice at 1
decrease width in x!
Visualizing quadratics
and slice at 1
Visualizing quadratics
decrease width in y
What happens if you increase
coefficient on y?
and slice at 1
Visualizing quadratics
Corner detection is better when you need distinct, repeatable features for matching, tracking, or identifying points of interest in images with significant texture or structure.
eigenvalues
eigenvectors along diagonal
Inverse sqr of
axis of the ‘ellipse length of the
slice’ quadratic along
the axis
Visualizing quadratics
Eigenvectors Eigenvalues
Eigenvectors
Inverse sqr of the size of the axis
Eigenvector
Eigenvector
Recall:
Eigenvectors Eigenvectors
Eigenvector
Eigenvector
Eigenvalues
Eigenvectors Eigenvectors
Eigenvalues
Eigenvectors Eigenvectors
We will need this to understand the…
Horizontal edge:
v
u
E(u,v)
Vertical edge:
v
u
General case
The shape of H tells us something about the distribution
of gradients around a pixel
We can visualize H as an ellipse with axis lengths
determined by the eigenvalues of H and orientation
determined by the eigenvectors of H
max, min : eigenvalues of H
Ellipse equation: direction of the
fastest change
u
[u v] H = const direction of the
slowest change
v
(max)-1/2
(min)-1/2
Quick eigenvalue/eigenvector review
The eigenvectors of a matrix A are the vectors x that satisfy:
• The solution:
xmin
xmax
2 “Edge”
2 >> 1 “Corner”
1 and 2 are large,
1 ~ 2 ;
E increases in all
directions
1
Corner detection summary
Here’s what you do
• Compute the gradient at each point in the image
• Create the H matrix from the entries in the gradient
• Compute the eigenvalues.
• Find points with large response (min > threshold)
• Choose those points where min is a local maximum as features
Corner detection summary
Here’s what you do
• Compute the gradient at each point in the image
• Create the H matrix from the entries in the gradient
• Compute the eigenvalues.
• Find points with large response (min > threshold)
• Choose those points where min is a local maximum as features
The Harris operator
min is a variant of the “Harris operator” for feature detection
• The trace is the sum of the diagonals, i.e., trace(H) = h11 + h22
• Very similar to min but less expensive (no square root)
• Called the “Harris Corner Detector” or “Harris Operator”
• Lots of other detectors, this is one of the most popular
The Harris operator
Harris
operator
Harris corner detector
Finding corners
(a.k.a. PCA)
[Link] image gradients over
small region
array of x gradients
array of y gradients
Visualization of gradients
image
X derivative
Y derivative
What does the distribution tell you about the region?
distribution reveals edge orientation and magnitude
How do you quantify orientation and magnitude?
2. Subtract the mean from each image gradient
2. Subtract the mean from each image gradient
plot intensities
constant intensity
gradient
plot intensities
constant intensity
gradient
subtract mean
plot intensities
constant intensity
gradient
subtract mean
data is centered
plot of image gradients
(‘DC’ offset is removed)
3. Compute the covariance matrix
3. Compute the covariance matrix
=sum( .* )
array of x gradients array of y gradients
[Moravec 1980]
Some mathematical background…
Error function
Change of intensity for the shift [u,v]:
Change in
appearance for a
shift [u,v]
flat
Which error surface indicates a good image feature?
flat edge
‘line’
Which error surface indicates a good image feature?
eigenvector
4. Compute eigenvalues and eigenvectors
eigenvalue
eigenvector
eigenvector
eigenvector
1 >> 2
1
interpreting eigenvalues
2
‘horizontal’ corner
edge
2 >> 1
1 ~ 2
flat 1 >> 2
‘vertical’
edge
1
interpreting eigenvalues
2
‘horizontal’ corner
edge
2 >> 1
1 ~ 2
flat 1 >> 2
‘vertical’
edge
1
interpreting eigenvalues
2
‘horizontal’ corner
edge
2 >> 1
1 ~ 2
flat 1 >> 2
‘vertical’
edge
1
5. Use threshold on eigenvalues to detect corners
5. Use threshold on eigenvalues to detect corners
2
Think of a function to
score ‘cornerness’
flat
1
5. Use threshold on eigenvalues to detect corners
2
strong corner
Think of a function to
score ‘cornerness’
flat
1
5. Use threshold on eigenvalues to detect corners
^
(a function of )
2
corner
flat
1
5. Use threshold on eigenvalues to detect corners
^
(a function of )
2
corner
Eigenvalues need to be
bigger than one.
flat
1
5. Use threshold on eigenvalues to detect corners
^
(a function of )
2
corner
R<0 R>0
R<0
flat
1
Harris & Stephens (1988)
Nobel (1998)
Harris Detector
[Link] and [Link]. “A Combined Corner and Edge Detector.”1988.
Harris criterion
Corner response
Thresholded corner response
Non-maximal suppression
Design a program to detect corners
(hint: use image gradients)
Harris Detector: Invariance Properties
• Rotation
R R
threshold
edge!
corner!
Harris Detector: Invariance Properties
• Scaling
Corner
f f
Image 1 Image 2
characteristic scale
we need to search over characteristic scales
What happens if you apply different Laplacian filters?
peak!
2.1 4.2 6.0
maximum response
optimal scale
2.1 4.2 6.0 9.8 15.5 17.0
4.2
6.0
local maximum
9.8
How would you implement scale selection?
Implementation
• Instead of computing f for larger and larger windows, we can
implement using a fixed window size with a Gaussian pyramid
1 2 3
4 5 6 ( 1 2 3 4 5 6 7 8 9 )
7 8 9 vector of intensity values
1 2 3
4 5 6 ( 1 2 3 4 5 6 7 8 9 )
7 8 9 vector of intensity values
1 2 3
4 5 6 ( 1 2 3 4 5 6 7 8 9 )
7 8 9 vector of intensity values
1 2 3
4 5 6 ( - + + - - + )
7 8 9 vector of x derivatives
‘binary descriptor’
1 2 3
4 5 6 ( - + + - - + )
7 8 9 vector of x derivatives
colors
colors
colors
Generalization power
HOG (Histograms of Oriented
Gradients) descriptor
HOG (Histograms of Oriented Gradients)
Dalal, Triggs. Histograms of Oriented Gradients for Human Detection. CVPR, 2005
histogram of ‘unsigned’
gradients
Cell
(8x8 pixels)
gradient magnitude histogram
(one for each cell)
soft binning
Block
(2x2 cells)
128 pixels
15 x 7 x 4 x 36 =
16 cells
3780
15 blocks
64 pixels
8 cells
7 blocks
Redundant representation due to overlapping blocks
How many times is each inner cell encoded?
[Link]
SIFT (Scale-Invariant Feature
Transform)
Fast NLoG Approximation: DoG
Difference of Gaussian 𝐷𝑜𝐺 = 𝑛𝑠𝜎 − 𝑛𝜎 ≈ (𝑠 − 1)𝜎 2 𝛻 2 𝑛𝜎
𝐷𝑜𝐺 ≈ (𝑠 − 1)𝜎 2 𝛻 2 𝑛𝜎
What Is A Useful Signature Function?
• Difference-of-Gaussian = “blob” detector
K. Grauman, B. Leibe
Difference-of-Gaussian (DoG)
- =
K. Grauman, B. Leibe
DoG – Efficient Computation
• Computation in Gaussian scale pyramid
𝑠4𝜎
𝑠3𝜎
𝑠2𝜎
𝑠𝜎
Sampling with 𝜎
step =2 𝑠4𝜎
𝑠3𝜎
𝑠2𝜎
1 𝑠𝜎
Original image =2 4
𝜎
K. Grauman, B. Leibe
Find local maxima in position-scale
space of Difference-of-Gaussian
𝑠4𝜎
𝑠3𝜎
Lxx ( ) + Lyy ( ) 𝑠 2 𝜎
𝑠𝜎
List of
𝜎 (x, y, s)
K. Grauman, B. Leibe
Results: Difference-of-Gaussian
K. Grauman, B. Leibe
SIFT (Scale Invariant
Feature Transform)
SIFT describes both a detector and descriptor
2. Keypoint localization
3. Orientation assignment
4. Keypoint descriptor
1. Multi-scale extrema
detection
𝑠4𝜎
𝑠3𝜎
Second octave 𝑠2𝜎
𝑠𝜎
𝜎
𝑠4𝜎
𝑠3𝜎
First octave
𝑠2𝜎
𝑠𝜎
𝜎
Gaussian Difference of Gaussian (DoG)
Gaussian
Laplacian
Scale-space extrema
x-derivative y-derivative
1. **Optical Flow Estimation** – It has one equation (brightness constancy) but two unknowns (motion in x and y), making it underconstrained per pixel.
2. **Shape from Shading** – Multiple 3D shapes can produce the same shading in an image, leading to ambiguity in recovering surface geometry from intensity alone.
Detection process returns
Gaussian weighting
(sigma = half width)
Orientation Normalization
[Lowe, SIFT, 1999]
0 2
T. Tuytelaars, B. Leibe
SIFT descriptor
Histograms of gradient directions over spatial regions
SIFT descriptor
Histograms of gradient directions over spatial regions
SIFT descriptor
Histograms of gradient directions over spatial regions
SIFT descriptor
Histograms of gradient directions over spatial regions
Local Descriptors: SIFT Descriptor
Histogram of oriented
gradients
• Captures important texture
information
• Robust to small translations /
[Lowe, ICCV 1999]
affine deformations
K. Grauman, B. Leibe
Scale Invariant Feature Transform (SIFT)
Basic idea:
• Take 16x16 square window around detected feature
• Compute edge orientation (angle of the gradient - 90) for each pixel
• Throw out weak edges (threshold gradient magnitude)
• Create histogram of surviving edge orientations
angle histogram
sift
K. Grauman, B. Leibe
SIFT Results: Scale Invariance
SIFT Results: Rotation Invariance
SIFT Results: Robustness to Clutter
Feature matching
• Essentially comparing two arrays of data.
𝑑(𝐻1 , 𝐻2 ) = (𝐻1 𝑘 − 𝐻2 𝑘 )2
𝑘
• Normalized Correlation:
σ𝑘[ 𝐻1 𝑘 − 𝐻1 𝐻2 𝑘 − 𝐻2 ]
𝑑 𝐻1 , 𝐻2 =
σ𝑘 𝐻1 𝑘 − 𝐻1 2 σ𝑘 𝐻2 𝑘 − 𝐻2 2
1 𝑁
where 𝐻𝑖 = σ𝑘=1 𝐻𝑖 𝑘
𝑁
𝑑 𝐻1 , 𝐻2 = σ𝑘 𝑚𝑖𝑛(𝐻1 𝑘 , 𝐻2 𝑘 )
f1 f2
I1 I2
Feature distance
How to define the difference between two features f1, f2?
• Better approach: ratio distance = ||f1 - f2 || / || f1 - f2’ ||
• f2 is best SSD match to f1 in I2
• f2’ is 2nd best SSD match to f1 in I2
• gives large values for ambiguous matches
f1 f2' f2
I1 I2
Feature matching example
51 matches
Feature matching example
58 matches
Evaluating the results
How can we measure the performance of a feature matcher?
50
75
200
feature distance
True/false positives
How can we measure the performance of a feature matcher?
50
true match
75
200
false match
feature distance
0.7
true
# true positives
positive
# correctly matched features (positives)
rate
“recall”
0.7
true
# true positives
positive
# correctly matched features (positives)
rate
“recall”
• Why choose?
– Get more points with more detectors
• Interest point/keypoint/feature
descriptors
• SIFT (do read the paper)
• Optical flow
• Recover image motion at each pixel from spatio-temporal image brightness variations
G. Johansson, “Visual Perception of Biological Motion and a Model For Its Analysis",
Perception and Psychophysics 14, 201-211, 1973.
Motion and perceptual organization
• Even “impoverished” motion data can evoke a strong percept
The optical flow model estimates motion by assuming constant brightness and linking spatial and temporal intensity changes.
G. Johansson, “Visual Perception of Biological Motion and a Model For Its Analysis",
Perception and Psychophysics 14, 201-211, 1973.
Uses of motion
• Estimating 3D structure
• Segmenting objects based on motion cues
• Learning and tracking dynamical models
• Recognizing events and activities
• Improving video quality (motion stabilization)
Video Stabilization
What would the motion field of a non-rotating ball moving towards the camera look like?
Optical flow
• Definition: optical flow is the apparent motion of brightness patterns
in the image
• Ideally, optical flow would be the same as the motion field
• Have to be careful: apparent motion can be caused by lighting
changes without any actual motion
• Think of a uniform rotating sphere under fixed lighting vs. a stationary sphere
under moving illumination
When is optical Flow ≠ Motion Field?
Barber Pole
Illusion
When is optical Flow ≠ Motion Field?
Donguri Wave
Motion Illusion
When is optical Flow ≠ Motion Field?
Donguri Wave
Motion Illusion
Optical Flow
𝑡 𝑡 + 𝛿𝑡
Optical Flow
𝑡 𝑡 + 𝛿𝑡
Optical Flow:
(𝑥, 𝑦) (𝑥 + 𝛿𝑥, 𝑦 + 𝛿𝑦) 𝜹𝒙 𝜹𝒚
𝒖, 𝒗 = ( , )
Displacement: (𝜹𝒙, 𝜹𝒚) 𝜹𝒕 𝜹𝒕
Feature tracking - Challenges
I(x,y,t) I(x,y,t+1)
• Given two subsequent frames, estimate the point translation
Problem Definition
Assumptions
Brightness constancy
Small motion
Optical Flow (Problem definition)
Color Constancy
(Brightness constancy for intensity images)
Small Motion
(pixels only move a little bit)
constant
Assumption 2
Small motion
Assumption 2
Small motion
Assumption 2
Small motion
Insight:
If the time step is really small,
we can linearize the intensity function
𝐼 𝑥 + 𝛿𝑥, 𝑦 + 𝛿𝑦, 𝑡 + 𝛿𝑡 = 𝐼(𝑥, 𝑦, 𝑡)
assuming small
motion
Multivariable Taylor Series Expansion
(First order approximation, two variables)
partial derivative
assuming small
motion
fixed point
cancel terms
Multivariable Taylor Series Expansion
(First order approximation, two variables)
assuming small
motion
cancel terms
Multivariable Taylor Series Expansion
(First order approximation, two variables)
assuming small
motion
divide by
take limit
Multivariable Taylor Series Expansion
(First order approximation, two variables)
assuming small
motion
divide by
take limit
Multivariable Taylor Series Expansion
(First order approximation, two variables)
assuming small
motion
divide by
take limit
Brightness Constancy
Equation
Brightness
Constancy Equation
shorthand notation
(x-flow) (y-flow)
vector form
(1 x 2) (2 x 1)
(putting the math aside for a second…)
Image gradients
(at a point p)
(putting the math aside for a second…)
flow velocities
Image gradients
(at a point p)
(putting the math aside for a second…)
flow velocities
spatial derivative
How do you compute …
spatial derivative
Forward difference
Sobel filter
Scharr filter
…
How do you compute …
Forward difference
Sobel filter
Scharr filter
…
How do you compute …
1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
1
1
1
10 10 10 10
10 10 10 10
10 10 10 10
- 1
1
1
1
1
1
1 1
10 10 10
10 10 10
1
= 0
0
0
9
9
9
9
0
0
9
0
0
9
0
0
1 10 10 10 10 1 1 10 10 10 0 9 0 0 0
x x x
- 0 0 0 - - - - - - 0 0 0 0 0
- 0 0 0 - 0 0 0 0 0 0 0 0 0 0
- 9 0 0 - 0 9 9 9 9 0 9 9 9 9
- 9 0 0 - - 0 0 0 0 0 9 0 0 0
- 9 0 0 - 0 0 0 0 0 0 9 0 0 0
- 9 0 0 - - - - - - 0 9 0 0 0
y y
-1 y
-1 0 1 0
1
How do you compute …
known
known
brightness constancy
method of differences
small motion
Matrix form
Assumptions:
Flow is locally smooth
Neighboring pixels have same displacement
is equivalent to solving
Least squares approximation
is equivalent to solving
Square Matrix
Pseudo Inverse
Least squares approximation
is equivalent to solving
=
Where have you seen this before?
86
* From Khurram Hassan-Shafique CAP5415 Computer Vision 2003
Coarse-to-fine optical flow estimation
image J1 image
image I2
Fails in areas of
large motion
• Key ideas
• By assuming brightness constancy, truncated Taylor expansion leads to simple
and fast patch matching across frames
• Coarse-to-fine registration
Next week
• Fitting and Alignment
Fitting
Ix2 Iy 2 IxIy
2. Square of
derivatives
det M = 12
trace M = 1 + 2
3. Gaussian g(Ix2) g(Iy2) g(IxIy)
filter g(I)
Lxx ( ) + Lyy ( )
List of
(x, y, s)
K. Grauman, B. Leibe
Review: SIFT Descriptor
Histogram of oriented
gradients
• Captures important texture
information
• Robust to small translations /
[Lowe, ICCV 1999]
affine deformations
K. Grauman, B. Leibe
Review: Lucas-Kanade Tracker
I(x,y,t) I(x,y,t+1)
I ( x, y, t ) = I ( x + u, y + v, t + 1) Brightness consistency
I ( x + u, y + v, t + 1) I ( x, y , t ) + I x u + I y v + I t
Small motion
I x u + I y v + It 0
Spatial coherence
Dealing with larger movements: Iterative
Original (x,y) position
refinement
1. Initialize (x’,y’) = (x,y) It = I(x’, y’, t+1) - I(x, y, t)
2. Compute (u,v) by
image J1 image
image I2
Alignment [əˈlīnmənt]:
find the parameters of the
transformation that best align
matched points
Fitting and alignment
• Choose a parametric model to represent a set of
features
Find: Parameters
Using:
Note:
Find: Parameters
Using:
Huge E!
Problems with parameterizations
Where is the line that minimizes E?
• Cost:
– It is not feasible to check all combinations of features by fitting a
model to each possible subset
slope y-intercept
slope y-intercept
Double intercept form
x-intercept y-intercept
x-intercept y-intercept
Derivation:
(Similar slope)
Normal Form
Derivation:
plug into:
Hough transform
Hough transform
parameters
Image space
Image and parameter
space
variables variables
parameters parameters
a line
becomes a
point
parameters
Image space
Image and parameter
space
variables variables
parameters parameters
a point
becomes a
line
parameters parameters
two points
become
?
parameters parameters
two points
become
?
parameters parameters
three points
become
?
parameters parameters
three points
become
?
parameters parameters
four points
become
?
parameters parameters
four points
become
?
Algorithm:
[Link]
1 1
Increment 1 1
1 1
1 1
1 1
2
1 1
1 1
1 1
Problems with
parameterization
How big does the accumulator need to be for the parameterization ?
1 1
1 1
1 1
2
1 1
1 1
1 1
Image Space
Hough Space Sinusoid
?
(Finite Accumulator Array Size)
Hough Space
Image and parameter
space
variables parameters
parameters variables
a point
becomes?
parameters variables
a point
becomes a
wave
parameters
a line
becomes?
parameters
a line
becomes a
point
parameters
a line
becomes?
parameters
a line
becomes a
point
parameters
a line
becomes a
point
parameters
a line
becomes a
point
parameters
a line
becomes a
point
parameters
a line
becomes a
point
parameters
Wait …why is rho negative?
a line
becomes a
point
parameters
a line
becomes a
point
Recall:
Image and parameter
space
variables
parameters
a line
becomes a
point
parameters
two points
become
?
parameters
three points
become
?
parameters
four points
become
?
line
Basic shapes
(in parameter space)
line rectangle
Basic shapes
(in parameter space)
𝑀 = 14
Algorithm:
1. Sample (randomly) the number of points (s) required to fit the model
Typically s is the minimum samples to fit a model.)
2. Fit the model to the randomly chosen samples.
3. Count the number M of data points (inliers) that fit the model within a
measure of error 𝜖
4. Repeat Steps 1-3 N times
5. Choose the mode; that has the largest number of M
RANSAC:
RANdom Sample Consensus
for k = 1:N
rp = randperm(numel(x));
tx = x(rp(1:2));
ty = y(rp(1:2));
m = (ty(2)-ty(1)) ./ (tx(2)-tx(1));
b = ty(2)-m*tx(2);
nin = sum(abs(y-m*x-b)<thresh);
if nin > bestcount
bestcount = nin;
inliers = (abs(y - m*x - b) < thresh);
end
end
% total least square fitting on inliers
[m, b] = total_lsqfit(x(inliers), y(inliers));
Hough Circles
Let’s assume radius known
parameters parameters
variables variables
variables variables
variables variables
parameters parameters
variables variables
parameters parameters
variables variables
parameters parameters
variables variables
What if radius is unknown?
parameters parameters
variables variables
What if radius is unknown?
parameters parameters
variables variables
Edge Location
Edge Direction
variables variables
parameters parameters
variables variables
Pennie Hough detector Quarter Hough detector
Pennie Hough detector Quarter Hough detector
1. Image → Canny
2. Canny → Hough votes
3. Hough votes → Edges
Find peaks and post-process
Hough transform example
[Link]
Hough transform for circles
• Circle: center (a,b) and radius r Equation of circle?
( xi − a) 2 + ( yi − b) 2 = r 2
• For a fixed radius r Equation of set of
circles that all pass
through a point?
b
Intersection:
most votes for
center occur
here.
b
a
Image space Hough space
121
Kristen Grauman
Hough transform for circles
• Circle: center (a,b) and radius r
( xi − a) 2 + ( yi − b) 2 = r 2
• For an unknown radius r
b
a
Image space Hough space
122
Kristen Grauman
Hough transform for circles
• Circle: center (a,b) and radius r
( xi − a) 2 + ( yi − b) 2 = r 2
• For an unknown radius r, known gradient direction
123
Kristen Grauman
The Hough transform …
• 2D transformations.
• Classification of 2D transformations.
A (grayscale)
image is a 2D
function.
grayscale image
Filtering Warping
Filtering Warping
and a transformation:
transformation
parameters
function
p = (x,y) p’ = (x’,y’)
Transformation T is a coordinate-changing machine:
p’ = T(p)
original
S
Scale
Scaling (Stretching or Squishing)
Scaling (Stretching or Squishing)
Scale
scaling matrix S
Shear
Skew (Shear)
rotation around
the origin
Rotation
rotation around
the origin
Rotation
Polar coordinates…
x = r cos (φ)
y = r sin (φ)
x’ = r cos (φ + θ)
y’ = r sin (φ + θ)
Trigonometric Identity…
x’ = r cos(φ) cos(θ) – r sin(φ) sin(θ)
𝑟 rotation around
y’ = r sin(φ) cos(θ) + r cos(φ) sin(θ)
the origin
Substitute…
𝑟 x’ = x cos(θ) - y sin(θ)
y’ = x sin(θ) + y cos(θ)
φ
Rotation
or in matrix form:
rotation around
the origin
2D planar and linear transformations
′
𝒙 = 𝑓 𝒙; 𝑝
𝑥′ 𝑥
′ =𝑴 𝑦
𝑦
parameters 𝑝 point 𝒙
2D planar and linear transformations
Scale Flip across y
Shear Identity
2D translation
T
𝑡𝑦
Scale
𝑡𝑥
2D translation
𝑥ư = 𝑥 + 𝑡𝑥
𝑦ư = 𝑦 + 𝑡𝑦
𝑥ư = 𝑥 + 𝑡𝑥
𝑦ư = 𝑦 + 𝑡𝑦
Not possible.
Projective geometry 101
Homogeneous coordinates
heterogeneous homogeneous
coordinates coordinates
𝑥 𝑑𝑒𝑓
𝑥
𝑦 𝑦 add a 1 here
1
• Represent 2D point with a 3D vector
Homogeneous coordinates
heterogeneous homogeneous
coordinates coordinates
𝑥 𝑑𝑒𝑓
𝑥 𝑎𝑥
𝑦 𝑦 ≝ 𝑎𝑦
1 𝑎
• Represent 2D point with a 3D vector
• 3D vectors are only defined up to scale
2D translation
• scale invariance
Projective geometry
image plane
image point in
pixel coordinates
?
translation scaling
? ?
rotation shearing
2D transformations in heterogeneous coordinates
Re-write these transformations as 3x3 matrices:
translation scaling
? ?
rotation shearing
2D transformations in heterogeneous coordinates
Re-write these transformations as 3x3 matrices:
translation scaling
?
rotation shearing
2D transformations in heterogeneous coordinates
Re-write these transformations as 3x3 matrices:
translation scaling
rotation shearing
Matrix composition
Transformations can be combined by matrix multiplication:
p’ = ? ? ? p
Matrix composition
Transformations can be combined by matrix multiplication:
?
?
?
?
?
Classification of 2D transformations
Translation:
Euclidean (rigid):
rotation + translation
Euclidean (rigid):
rotation + translation
Euclidean (rigid):
rotation + translation
Classification of 2D transformations
what will happen to the
image if this increases?
Euclidean (rigid):
rotation + translation
Classification of 2D transformations
what will happen to the
image if this increases?
Euclidean (rigid):
rotation + translation
Classification of 2D transformations
Similarity:
uniform scaling + rotation
+ translation
Similarity:
uniform scaling + rotation
+ translation
Similarity:
uniform scaling + rotation
+ translation
Classification of 2D transformations
Affine transform:
uniform scaling + shearing
+ rotation + translation
Affine transform:
uniform scaling + shearing
+ rotation + translation
Affine transform:
uniform scaling + shearing
+ rotation + translation
image point in
pixel coordinates
image point in
pixel coordinates
B E
D
C
A
F
Determining unknown transformations
Suppose we have two triangles: ABC and DEF.
• What type of transformation will map A to D, B to E, and C to F?
B E
D
C
A
F
Determining unknown transformations
Suppose we have two triangles: ABC and DEF.
• What type of transformation will map A to D, B to E, and C to F?
• How do we determine the unknown parameters?
B E
A
F
Affine transform:
How many degrees of
uniform scaling + shearing
freedom do we have?
+ rotation + translation
Determining unknown transformations
Suppose we have two triangles: ABC and DEF.
• What type of transformation will map A to D, B to E, and C to F?
• How do we determine the unknown parameters?
B E
D
C
A
F
unknowns
• One point correspondence gives
how many equations?
• How many point
point correspondences correspondences do we need?
Determining unknown transformations
Suppose we have two triangles: ABC and DEF.
• What type of transformation will map A to D, B to E, and C to F?
• How do we determine the unknown parameters?
B E
D
C
A
F
unknowns
What is What is
this? this?
Least Squares Error Euclidean
(L2) norm
squared!
predicted measured
location location
Least Squares Error
Residual (projection
error)
Least Squares Error
(matrix form)
Determining unknown transformations
Why can we drop
Affine transformation:
the last line?
Vectorize transformation
parameters:
(matrix form)
x = A \ b
Expand the error:
T(x, y)
y y’
x x’
f(x, y) g(x’, y’)
Forward warping
Suppose we have two images.
• How do we compute the transform that takes one to the other?
T(x, y)
y y’
x x’
f(x, y) g(x’, y’)
1. Form enough pixel-to-pixel correspondences between two images
2. Solve for linear transform parameters as before
3. Send intensities f(x,y) in first image to their corresponding location in the second image
Forward warping
Suppose we have two images.
• How do we compute the transform that takes one to the other?
T(x, y)
y y’
x x’
f(x, y) g(x’, y’)
what is the problem
1. Form enough pixel-to-pixel correspondences between two images with this?
2. Solve for linear transform parameters as before
3. Send intensities f(x,y) in first image to their corresponding location in the second image
Forward warping
Pixels may end up between two points
• How do we determine the intensity of each point?
f(x,y) g(x’,y’)
T(x,y)
y y’
x x’
Forward warping
Pixels may end up between two points
• How do we determine the intensity of each point?
✓ We distribute color among neighboring pixels (x’,y’) (“splatting”)
f(x,y) g(x’,y’)
T(x,y)
y y’
x x’
• What if a pixel (x’,y’) receives intensity from more than one pixels (x,y)?
Forward warping
Pixels may end up between two points
• How do we determine the intensity of each point?
✓ We distribute color among neighboring pixels (x’,y’) (“splatting”)
f(x,y) g(x’,y’)
T(x,y)
y y’
x x’
• What if a pixel (x’,y’) receives intensity from more than one pixels (x,y)?
✓ We average their intensity contributions.
Alignment as fitting
• Previous lectures: fitting a model to features in one
image M
residual( x , M )
i
i
residual(T ( x ), x)
i
i i
What if you want to align but have no prior matched pairs?
• Important applications
• Difficulties
• Noise (typically 1-3 pixels)
• Outliers (often 30-50%)
• Many-to-one matches or multiple objects
Projective Transformations (homography)
• The transformation between two views of a planar
surface
Matched
keypoints
2. Solve for affine
transformation parameters
Affine
Parameters
3. Score by inliers and choose
solutions with score above This Class
# Inliers
threshold
Choose hypothesis with max
score above threshold
Overview of Keypoint Matching
1. Find a set of
distinctive key-
points
A1
2. Define a region
around each
A2 A3 keypoint
3. Extract and
normalize the
region content
fA fB
4. Compute a local
N pixels
Input
Image Stored
Image
• Recognition
– Feature matches may be
spread over several
training viewpoints
Use the known links to
“transfer votes” to other
viewpoints [Lowe01]
• SIFT usage
– Recognize
docking station
– Communicate
with visual cards
• Other uses
– Place recognition
– Loop closure in SLAM
Training
[Lowe04]
Slide credit: David Lowe
Another application: category
recognition
• Goal: identify what type of object is in the image
• Approach: align to known objects and choose
category with best match
“Shape matching and object recognition using low distortion correspondence”, Berg et
al., CVPR 2005: [Link]
Examples of Matches
Examples of Matches
Next week
• Object instance recognition
Computer Vision Q&A
Your Name
May 27, 2025
Contents
1
Questions and Answers
Questions and Answers
Question 1 (1 point)
What is a homography between two images?
Answer:
A homography is a transformation between two images that relates corresponding points
in the two images. It is a projective transformation that maps points from one plane to
another and is represented by a 3 × 3 matrix H. A homography accounts for effects such
as translation, rotation, scaling, and perspective distortion.
Mathematically, the homography transformation is written as:
x′ = Hx,
where:
• Perspective correction,
Question 2 (2 points)
Explain the distinction between motion field and optical flow in an image.
Use a diagram to illustrate your answer.
Answer:
The distinction between motion field and optical flow can be explained as follows:
• Motion Field: The motion field represents the actual projection of 3D scene points
onto the 2D image plane as they move through time. It describes the true physical
motion of points in the scene.
• Optical Flow: Optical flow, on the other hand, refers to the apparent motion of
brightness patterns in an image sequence. It estimates pixel movement between
frames based on the intensity values in images. However, it may not always corre-
spond to the actual motion field due to occlusions, lighting changes, or textureless
regions.
2
The relationship between the two can be summarized:
Motion Field ̸= Optical Flow (in general)
However, in ideal conditions (e.g., Lambertian surfaces and small displacements), optical
flow approximates the motion field.
motion_field_vs_optical_flow.png
Figure 1: Difference between Motion Field and Optical Flow. Motion field represents
actual motion in 3D, while optical flow estimates 2D apparent motion.
Question 3 (4 points)
Using appropriate mathematical expressions, define the following operations
commonly used in computer vision and briefly explain their function and
applications:
1. Convolution:
Convolution is a mathematical operation used to combine two functions to pro-
duce a third function. In computer vision, it is widely used for filtering images.
Mathematically, convolution is defined as:
XX
(f ∗ g)(x, y) = f (i, j)g(x − i, y − j),
i j
where f is the input image, g is the kernel (filter), and (x, y) are coordinates.
Convolution is fundamental to operations like image smoothing, sharpening, and
feature extraction.
3
2. Correlation:
Correlation is similar to convolution but involves a direct alignment of the kernel
without flipping it. It is defined as:
XX
(f ⋆ g)(x, y) = f (i, j)g(x + i, y + j).
i j
Correlation is used in template matching, where we slide a kernel over the image
to find matches.
4. Invariant Transform:
Invariant transforms are mathematical operations that produce outputs unchanged
under specific transformations (e.g., rotation, scaling, or translation). For example:
where f is the input image, and f ′ is the transformed image. Invariant transforms
are widely used in object recognition and image matching, where features must
remain consistent under different viewpoints or scales.
Question 4 (6 points)
Explain the notion of scale-space and how it is used in various areas of com-
puter vision. Include the following:
Answer:
Scale-space is a multi-scale representation of an image, where the image is analyzed at
multiple levels of resolution or scales. It addresses the problem of finding structures or
features in images that exist at different spatial scales, such as fine details (small scales)
or large structures (large scales).
4
The scale-space is generated by convolving an image I(x, y) with a Gaussian kernel
G(x, y; σ), where σ controls the scale. Mathematically, the scale-space representation
L(x, y; σ) is defined as:
L(x, y; σ) = I(x, y) ∗ G(x, y; σ),
where:
1 − x2 +y2 2
G(x, y; σ) = e 2σ .
2πσ 2
Here, σ determines the amount of smoothing (blurring).
• Image compression,
• Coarse-scale edge detectors capture large-scale structures, but miss finer details.
Edges detected at smaller scales may not persist at larger scales. Therefore, edges that
consistently appear across multiple scales are more robust and meaningful for computer
vision tasks.
• The set of all zero-crossings across scales forms a fingerprint in scale-space, which
encodes the structure of the image.
5
• Detecting object boundaries,
• Extracting meaningful image features at different resolutions.
Question 5 (2 points)
Why is it important that images are not under-sampled if you wish to accu-
rately track the images of objects?
Answer:
Under-sampling occurs when the sampling rate is insufficient to capture the variations
in an image. Inaccurate sampling can lead to loss of information and aliasing, where
high-frequency details are misrepresented as lower frequencies.
The importance of avoiding under-sampling for object tracking includes:
• Preserving Fine Details: Under-sampling can blur or distort object edges, mak-
ing it difficult to detect and track features like corners or edges.
• Avoiding Aliasing: Aliasing introduces artifacts that can mislead object tracking
algorithms. For instance, patterns in objects may appear shifted or incorrectly
matched.
• Ensuring Spatial Accuracy: Accurate sampling ensures that objects’ positions,
motion trajectories, and shapes are precisely captured.
To avoid under-sampling, the Nyquist criterion should be satisfied:
fs ≥ 2fmax ,
where fs is the sampling frequency and fmax is the highest frequency in the image.
Proper sampling is critical for applications such as motion estimation, optical flow,
and object tracking in computer vision.
Question 6 (3 points)
The Harris corner detection algorithm computes a 2 × 2 matrix at each pixel
based on the first derivatives at that point and then computes the two eigen-
values of the matrix, λ1 and λ2 , where λ1 ≤ λ2 . How can these two values be
used to label each pixel as either a locally smooth region (S), an edge point
(E), or a corner point (C)? Give your answer by specifying “Label pixel S if
...”, “Label pixel E if ...” and “Label pixel C if ...”
Answer:
The Harris corner detection algorithm analyzes the local structure of the image based
on the eigenvalues of the 2 × 2 structure tensor (autocorrelation matrix) defined as:
2
Ix Ix Iy
M= ,
Ix Iy Iy2
where Ix and Iy are the gradients of the image along the x- and y-directions, respectively.
The eigenvalues λ1 and λ2 represent the intensity variations in orthogonal directions.
The classification into smooth regions, edge points, and corner points is done as follows:
6
• Label pixel S (Smooth region) if:
λ1 ≈ 0 and λ2 ≈ 0.
λ1 ≈ 0 and λ2 > 0.
Explanation: One eigenvalue is large (significant variation in one direction), and the
other is close to zero. This indicates an edge where intensity changes significantly
along one direction but not the other.
To summarize:
Question 7 (4 points)
Given the two eigenvalues specified in (Question 6), explain in English the
rationale and difference(s) between detecting corner points using the crite-
rion λ1 > T1 versus the criterion λ1 λ2 > T2 , where T1 and T2 are appropriate
thresholds.
Answer:
The two criteria for detecting corner points using eigenvalues of the structure tensor
M are as follows:
• Criterion 1: λ1 > T1
This condition considers only the magnitude of the smaller eigenvalue λ1 relative
to a threshold T1 .
Rationale:
7
Difference: The criterion λ1 > T1 can lead to false positives for edges, as it does
not account for the second eigenvalue λ2 .
• Criterion 2: λ1 λ2 > T2
This condition considers the product of both eigenvalues relative to a threshold T2 .
Rationale:
– The product λ1 λ2 becomes large only if both eigenvalues are large.
– Since corners exhibit significant intensity variation in two orthogonal direc-
tions, both eigenvalues must be non-negligible for a corner to be detected.
– This criterion is more robust for corner detection because it avoids detecting
edges where only one eigenvalue is large.
Difference: The criterion λ1 λ2 > T2 is stricter than λ1 > T1 because it ensures that
intensity variation occurs in all directions, making it more reliable for detecting
corners.
Summary of Differences:
• λ1 > T1 detects regions with strong variation in at least one direction (both edges
and corners).
• λ1 λ2 > T2 detects regions with strong variation in two orthogonal directions (corner
points only).
Thus, the second criterion (λ1 λ2 > T2 ) is preferred for detecting corners because it
avoids misclassifying edges as corners.
Question 8 (4 points)
a. Explain the key principles underlying the Scale-Invariant Feature Trans-
form (SIFT).
b. What is it used for?
c. What goals does it achieve?
d. How does it achieve them?
Answer:
8
• Keypoint descriptor: A descriptor is constructed by analyzing the gradient
magnitude and direction around the keypoint in a local neighborhood. The
descriptor is represented as a 128-dimensional vector.
This multi-step pipeline ensures that SIFT can reliably detect and describe features
that are invariant to scale, rotation, and partially to affine transformations and
lighting changes.
9
Question 9 (3 points)
What special difficulties would you expect to have in computing optical flow
given:
a. Spatially aliased images?
b. Temporally aliased image sequence?
c. What can you do to deal with these situations?
Answer:
• High-frequency details such as fine edges, textures, or small objects may not
be accurately represented.
• Features that are spatially close together can overlap or appear as a single
feature, leading to ambiguous motion estimation.
• Optical flow algorithms rely on spatial gradients, and aliasing causes incorrect
gradient computation, resulting in inaccurate flow vectors.
10
– Use robust methods such as multi-frame optical flow estimation, which
integrates information across multiple frames rather than relying on two
consecutive frames.
Question 10 (4 points)
For an image I(x, y), define its gradient vector field ∇I(x, y).
2. Define the gradient magnitude over the image plane (x, y).
The gradient magnitude at a point (x, y) in the image is defined as the Euclidean
norm of the gradient vector:
s
2 2
∂I ∂I
|∇I(x, y)| = +
∂x ∂y
∂I ∂I
where ∂x and ∂y are the partial derivatives of the image intensity I(x, y) with
respect to the spatial coordinates x and y. The gradient magnitude indicates the
strength of the intensity change at each pixel.
3. Define the gradient direction over the image plane (x, y).
The gradient direction at a point (x, y) is the angle of the gradient vector relative
to the horizontal axis and is given by:
∂I
!
∂y
θ(x, y) = tan−1 ∂I
∂x
This direction represents the orientation of the steepest ascent in intensity, indicat-
ing the direction of maximum change in the image.
4. Explain how the gradient vector field is used in the Canny edge detector,
the main steps in its use, and its advantages.
The Canny edge detector uses the gradient vector field to detect edges in an image
by following these main steps:
11
• Step 1: Smoothing (Gaussian filtering): The image is smoothed using a
Gaussian filter to reduce noise that may interfere with edge detection.
• Step 2: Gradient computation: The gradient vector field of the smoothed
image is computed using Sobel or similar filters. This step calculates both the
gradient magnitude and direction at each pixel.
• Step 3: Non-maximum suppression: After calculating the gradient mag-
nitude, non-maximum suppression is applied to thin the edges by keeping only
local maxima in the gradient magnitude along the gradient direction.
• Step 4: Edge tracking by hysteresis:
– High gradient magnitude pixels are considered strong edges, and low mag-
nitude pixels are considered weak.
– Weak pixels are included in the final edge set if they are connected to
strong edges, forming continuous edges while eliminating noise.
Question 11 (5 points)
1. Define the gradient vector field ∇2 f (x, y) over an image f (x, y), and ex-
plain what makes it useful. Contrast its features and capabilities with
the ∇2 Gσ (x, y) (Laplacian of a Gaussian) operator.
The operator ∇2 f (x, y) is the Laplacian of the image function f (x, y). It is defined
as the sum of the second partial derivatives of the image intensity f (x, y) with
respect to x and y:
2 ∂ 2 f (x, y) ∂ 2 f (x, y)
∇ f (x, y) = +
∂x2 ∂y 2
The Laplacian operator measures the local rate of change of the gradient in an
image and is commonly used to detect regions of rapid intensity change, such as
edges and corners. It is particularly useful for identifying points of zero-crossing,
where the intensity changes from increasing to decreasing or vice versa.
The Laplacian of a Gaussian (∇2 Gσ (x, y)) operator is a combination of two opera-
tions:
2 2 1 − x2 +y2 2
∇ Gσ (x, y) = ∇ e 2σ
2πσ 2
This operator involves first applying a Gaussian smoothing function with a standard
deviation σ and then computing the Laplacian of the smoothed image. The key
difference is that the Laplacian of a Gaussian smooths the image before applying
12
the Laplacian operator, which helps suppress noise and enhance the detection of
edges at different scales. The ∇2 Gσ (x, y) operator is more robust to noise than
∇2 f (x, y) due to the smoothing effect.
– This kernel is convolved with the image to calculate the Laplacian at each
pixel.
• Laplacian of Gaussian (∇2 Gσ (x, y)):
– This can be implemented in two steps: first, apply a Gaussian filter
(smoothing), and then compute the Laplacian of the resulting image.
– The Gaussian kernel is defined as:
1 − x2 +y2 2
Gσ (x, y) = e 2σ
2πσ 2
– After smoothing, apply the Laplacian filter, which can also be approxi-
mated using discrete finite differences, such as the following kernel:
1 1 1
∇2 Gσ = 1 −8 1
1 1 1
13
• Discuss any neurobiological analogues for both:
In neurobiology, the operations of the Laplacian and Laplacian of Gaussian
are somewhat analogous to the way the human visual system detects edges
and contours:
– The Laplacian (∇2 f (x, y)) is analogous to the way simple cells in the visual
cortex (V1) respond to changes in contrast or the boundaries between
light and dark regions. These cells are sensitive to the first derivative of
intensity and help detect edges.
– The Laplacian of Gaussian (∇2 Gσ (x, y)) is similar to the behavior of com-
plex cells that are tuned to edges and contours at different scales. These
cells often respond to the second derivative of intensity and are thought to
integrate information across a region of visual space, making them more
robust to noise and more sensitive to significant features like edges at
various scales.
Question 12 (2 points)
(a) Would you aim to include, as design goals, the standard geometric
visual illusions as well?
No, in a machine vision system, especially one designed for a visual prosthesis
for the blind, the goal is to restore or enhance vision in a way that is as
close to natural human vision as possible. Standard geometric visual illusions,
which are artifacts of human perception, would not be a desirable design goal.
These illusions often arise due to limitations or biases in the way human brains
process visual stimuli, and they could lead to confusion or misinterpretation
of the visual data for users relying on the prosthesis.
(b) If such errors of vision emerged as unintended consequences of your
design, would you consider them to be features, or bugs, and why?
If visual illusions emerged as unintended consequences, they would be consid-
ered bugs, not features. The purpose of a prosthetic system is to replicate
or approximate normal vision, and introducing illusions would hinder the sys-
tem’s ability to provide accurate visual information. Illusions are generally
undesirable because they interfere with perception and could lead to misin-
terpretation of visual cues, which could be especially harmful in contexts like
navigation or object identification.
Question 13 (3 points)
(a) Provide a 3 × 3 discrete filter kernel array that approximates the
Laplacian operator.
A common 3x3 discrete kernel that approximates the Laplacian operator is:
0 1 0
1 −4 1
0 1 0
14
This kernel computes the Laplacian by approximating the second-order partial
derivatives of the image.
(b) Explain what the Laplacian might be used for:
The Laplacian operator is commonly used for edge detection in image process-
ing. It highlights regions where the intensity changes rapidly, such as edges,
and is useful in applications like object recognition, segmentation, and fea-
ture extraction. The Laplacian can also be used for image sharpening, as it
highlights areas where there are significant intensity transitions.
(c) What is the significance of the sum of all of the coefficients in the
filter?
The sum of the coefficients in the Laplacian filter kernel is typically zero. This
property ensures that the filter detects differences in intensity across the image
rather than affecting the overall brightness of the image. A zero sum ensures
that the filter responds only to relative intensity changes (edges) rather than
to global image intensity, which is important for preserving the contrast in the
image while detecting edges or regions of rapid intensity change.
Question 14 (2 points)
Extraction of visual features from images often involves convolution with filters
that are themselves constructed from combinations of differential operators. One
example is the Laplacian of a Gaussian Gσ (x, y) having scale parameter σ, gener-
ating the filter ∇2 Gσ (x, y) for convolution with the image I(x, y). Explain in detail
each of the following three operator sequences, where ∗ signifies two-dimensional
convolution.
15
(c) What are the differences amongst their effects on the image?
The key difference between the two sequences lies in the order of operations and
the resulting effects on the image. - In ∇2 [Gσ (x, y) ∗ I(x, y)], the convolution
with the Gaussian smooths the image first, and then the Laplacian detects
edges or regions with rapid intensity change. This sequence is more sensitive to
noise, but it provides a more detailed edge detection. - In Gσ (x, y) ∗ ∇2 I(x, y),
the Laplacian is applied first to detect edges, and then the Gaussian smooths
the result. This sequence is more robust to noise and small intensity variations
but may miss finer edge details because it operates on a second-derivative
image that has already been modified by the Laplacian.
Question 15 (2 points)
Explain the notion of scale-space and how it is used in various areas of computer
vision. Include the following:
16
Question 16 (4 points)
(a) Describe the Canny edge detection algorithm, including how the
edge strength and orientation is calculated.
The Canny edge detection algorithm is a multi-step process used to identify
edges in an image. It aims to detect areas with rapid intensity changes and is
considered one of the most effective edge detection techniques due to its accu-
racy and robustness to noise. The steps involved in the Canny edge detection
are as follows:
i. Smoothing: The image is first smoothed using a Gaussian filter to reduce
noise and fine details. The amount of smoothing is controlled by the
standard deviation σ of the Gaussian kernel.
ii. Gradient calculation: The gradients of the image in the horizontal and
vertical directions are calculated using operators such as the Sobel opera-
tor. This step helps to identify areas where intensity changes significantly.
The gradient magnitude is computed as:
q
G(x, y) = G2x (x, y) + G2y (x, y)
This gives the direction of the edge at each pixel, which is important for
accurate edge localization.
v. Non-maximum suppression: To thin the edges, a non-maximum sup-
pression step is performed. This involves comparing each pixel’s gradient
magnitude with the magnitudes of the pixels along the gradient direction.
If a pixel’s magnitude is not the maximum along the gradient direction,
it is suppressed.
vi. Edge tracking by hysteresis: Finally, edge tracking by hysteresis is
used to determine which edges are real. Pixels with a gradient magnitude
above a high threshold are considered strong edges, and pixels with a
magnitude between a low and high threshold are considered weak edges.
Weak edges that are connected to strong edges are retained, while others
are discarded.
(b) What is the spatial orientation tensor and how is it used?
The spatial orientation tensor, also known as the structure tensor or the second
moment matrix, is a representation of the local image structure at each pixel.
It captures the directionality of image features (such as edges or corners) by
considering the local gradients at that pixel. The tensor is constructed from
17
the gradients of the image in both the x- and y-directions and is given by:
Ix2 (x, y) Ix (x, y)Iy (x, y)
J(x, y) =
Ix (x, y)Iy (x, y) Iy2 (x, y)
where Ix and Iy are the partial derivatives of the image I(x, y) in the x- and
y-directions, respectively. The eigenvalues of this matrix can be used to assess
the local structure of the image:
• If both eigenvalues are small, the region is flat (i.e., no edge or corner).
• If one eigenvalue is large and the other is small, the region represents an
edge.
• If both eigenvalues are large, the region represents a corner.
The spatial orientation tensor is particularly useful for analyzing the direction-
ality and structure of edges and corners in an image. It is widely used in corner
detection algorithms and in applications where local image structure needs to
be understood for tasks such as feature matching and object recognition.
Question 17 (4 points)
Let the following matrix H denote a general 2D planar transformation:
h11 h12 h13
H = h21 h22 h23
h31 h32 h33
(a) Euclidean
• Parameterization of the hij ’s: In the Euclidean case, the transformation
includes rotation, translation, and possibly reflection. The matrix H is
parameterized by:
cos θ − sin θ tx
H = sin θ cos θ ty
0 0 1
where θ is the rotation angle and tx , ty are the translations in the x- and
y-directions, respectively.
• Degrees of freedom: The Euclidean transformation has 3 degrees of free-
dom: 2 for the translation and 1 for the rotation.
• Number of correspondences: To estimate the transformation, 3 correspon-
dences are needed (since each correspondence provides 2 equations for the
parameters).
(b) Similarity
18
• Parameterization of the hij ’s: A similarity transformation includes scal-
ing, rotation, and translation. The matrix H is parameterized as:
s cos θ −s sin θ tx
H = s sin θ s cos θ ty
0 0 1
where s is the scaling factor, θ is the rotation angle, and tx , ty are the
translations.
• Degrees of freedom: The similarity transformation has 4 degrees of free-
dom: 1 for scaling, 1 for rotation, and 2 for translation.
• Number of correspondences: To estimate the transformation, 4 correspon-
dences are needed.
(c) Affine
• Parameterization of the hij ’s: An affine transformation includes rotation,
translation, scaling, and shearing. The matrix H is parameterized as:
h11 h12 tx
H = h21 h22 ty
0 0 1
where h11 , h12 , h21 , h22 are the shearing and scaling parameters, and tx , ty
are the translations.
• Degrees of freedom: The affine transformation has 6 degrees of freedom:
2 for translation, 2 for scaling/shearing, and 2 for rotation.
• Number of correspondences: To estimate the transformation, 3 correspon-
dences are needed (since each correspondence provides 2 equations).
(d) Projective
• Parameterization of the hij ’s: A projective transformation (or homogra-
phy) includes all the affine transformations as well as perspective distor-
tion. The matrix H is parameterized as:
h11 h12 h13
H = h21 h22 h23
h31 h32 h33
Question 18 (5 points)
(a) Describe when it would be suitable to use the Prewitt masks.
19
The Prewitt masks are suitable for edge detection in images, particularly when
detecting edges in a noisy environment or when the image has strong gradients in
the horizontal or vertical directions. The Prewitt operator is a simple convolution-
based edge detection technique that is commonly used for detecting edges in images
by approximating the gradient in the horizontal and vertical directions.
Prewitt masks are specifically useful in cases where:
• The edges are relatively well-defined and the image is not excessively noisy.
• You want to detect horizontal and vertical edges in an image.
• A simpler edge-detection method is required, as it is less computationally
expensive compared to more complex methods such as Sobel.
(c) Sketch the result when one of these masks is applied to the greyscale
image below.
Given the following grayscale image:
6 7 7 8 8 9
6 7 8 34 38 39
7 8 9 36 38 39
Image =
7
6 36 35 37 40
6 6 8 36 18 39
3 7 8 35 20 39
We will apply the Prewitt masks to the central region of the image, covering the
area from the second to the fifth row and second to the fifth column.
7 7 8 34
8 9 36 38
Central region =
6 36 35
37
6 8 36 18
Let’s apply the Prewitt Horizontal Mask (for example) to this central region:
7 7 8 34
−1 0 1 8 9 36 38
Convolution = −1 0 1 ∗
6 36 35
37
−1 0 1
6 8 36 18
20
The result is computed by multiplying each element of the mask with the corre-
sponding image element and summing the products.
Similarly, apply the Prewitt Vertical Mask to the central region, and compute the
result.
The final edge magnitudes will be calculated using:
q
G= G2x + G2y
where Gx and Gy are the results of applying the horizontal and vertical Prewitt
masks, respectively.
Question 19 (5 points)
(a) Define the gradient vector field ∇f (x, y) over an image f (x, y), and
explain what makes it useful. Contrast its features and capabilities with
the ∇2 Gσ (x, y) (Laplacian of a Gaussian) operator.
The gradient vector field of an image f (x, y) is a vector field that describes the rate
and direction of change in intensity at each point in the image. It is defined as:
∂f ∂f
∇f (x, y) = ,
∂x ∂y
where ∂f
∂x
and ∂f
∂y
represent the partial derivatives of the image with respect to the
horizontal and vertical directions, respectively. The gradient is useful in image
processing because it highlights edges and boundaries in the image. Strong gradi-
ent magnitudes indicate areas of rapid intensity change, typically corresponding to
edges.
The gradient is used for tasks such as edge detection (e.g., Canny edge detector),
image segmentation, and texture analysis. The gradient provides information about
the orientation and magnitude of intensity changes in the image.
On the other hand, the Laplacian of a Gaussian (LoG) operator, ∇2 Gσ (x, y), is
defined as:
∂ 2 Gσ (x, y) ∂ 2 Gσ (x, y)
∇2 Gσ (x, y) = +
∂x2 ∂y 2
where Gσ (x, y) is a Gaussian kernel with scale parameter σ. The LoG operator
is used primarily for edge detection, especially in scale-space analysis, and is more
sensitive to the presence of corners and blobs in an image compared to the gradient.
It is useful for detecting edges with varying thickness and is often applied in multi-
scale approaches.
(b) Identify their respective orders as differential operators.
21
The gradient ∇f (x, y) is a first-order differential operator because it computes
the first derivatives of the image along the horizontal and vertical directions. It
measures the rate of change of intensity.
The Laplacian of a Gaussian ∇2 Gσ (x, y) is a second-order differential operator
because it involves second-order derivatives. The Laplacian operator calculates the
second derivatives, which help in identifying regions of the image where intensity
changes rapidly in all directions (i.e., edges or regions of interest).
(c) Explain how they can be implemented.
- The gradient can be implemented using discrete approximations of the partial
derivatives, such as the Sobel operator or the Prewitt operator, which convolve the
image with kernel masks to approximate the gradients in the horizontal and vertical
directions.
−1 0 1 −1 −2 −1
Sobel Operator (Horizontal) = −2 0 2 Sobel Operator (Vertical) = 0 0 0
−1 0 1 1 2 1
∂2 ∂2
1 − x2 +y2 2 2
Gσ (x, y) = e 2σ ∇ Gσ (x, y) = + Gσ (x, y)
2πσ 2 ∂x2 ∂y 2
Question 20 (5 points)
Explain the Iterative Closest Point (ICP) Algorithm to estimate the
transform between two dense sets of points.
The Iterative Closest Point (ICP) algorithm is used to estimate the rigid trans-
formation (rotation and translation) between two dense sets of points, commonly
22
referred to as the source set P and the target set Q. The goal is to align the source
point cloud with the target point cloud as accurately as possible.
Steps in the ICP Algorithm:
1. **Initialization**: - Start with an initial guess for the transformation T, typically
a 4×4 matrix combining rotation R and translation t. - Apply this transformation
to the source set P.
2. **Closest Point Association**: - For each point p ∈ P (transformed source
set), find the closest point q ∈ Q in the target set. This step involves a nearest
neighbor search, which can be computationally intensive for large datasets. - The
correspondence pairs are recorded as (pi , qi ) for all i.
3. **Error Metric Definition**: - Define the alignment error as the sum of squared
distances between each correspondence:
X
E(T) = ∥qi − (Rpi + t)∥2
i
Question 21 (5 points)
1. Explain the Hough Transform.
The Hough Transform is a feature extraction technique used in image analysis and
computer vision to detect specific shapes, such as lines, circles, and other parametric
23
curves. The method maps points from the image space to a parameter space where
the desired shape can be identified as an intersection or peak in the accumulator
space.
Steps in the Hough Transform for Lines: 1. **Parametric Representation**: - A
line in 2D can be expressed parametrically as:
ρ = x cos θ + y sin θ,
where: - ρ is the perpendicular distance from the origin to the line, - θ is the angle
between the line’s normal and the x-axis, - (x, y) are the coordinates of a point on
the line.
2. **Voting in Parameter Space**: - Each edge point (x, y) in the image contributes
votes to all possible (ρ, θ) pairs in the parameter space that could define a line
passing through the point. - This results in a sinusoidal curve in the (ρ, θ) space.
3. **Accumulator Array**: - The parameter space is discretized into an accumula-
tor array. - Votes are tallied for each (ρ, θ) combination. Peaks in the accumulator
indicate potential lines.
4. **Peak Detection**: - Peaks in the accumulator array correspond to lines in the
image. These peaks are identified and mapped back to the image space.
Applications: - Detecting lines in edge-detected images (e.g., road lane detection). -
Identifying shapes like circles, ellipses, or other geometric forms through extensions.
—
2. Can the Hough Transform be used with shapes other than circles and
lines? If so, how?
Yes, the Hough Transform can be extended to detect shapes other than circles and
lines by modifying the parameterization to match the equation of the desired shape.
Below are some examples:
a. Detecting Circles: 1. **Parametric Representation**: - A circle is defined by its
center (a, b) and radius r:
(x − a)2 + (y − b)2 = r2 .
- The parameter space becomes (a, b, r).
2. **Voting Process**: - For each edge point (x, y), votes are cast for all possible
(a, b, r) combinations that could define a circle passing through the point. - This
results in a 3D accumulator array where peaks correspond to potential circles.
b. Detecting Arbitrary Shapes: 1. **Parametric Representation**: - Shapes can
be represented by a set of equations, such as ellipses or parabolas. - For example,
an ellipse can be parameterized as:
(x − a)2 (y − b)2
+ = 1,
rx2 ry2
where (a, b) is the center, and rx , ry are the semi-major and semi-minor axes.
2. **Template Matching**: - For complex shapes, the Hough Transform can be
adapted by using a shape template. Edge points vote for transformations (e.g.,
translation, rotation, scaling) that align the template with the image edges.
24
3. **Generalized Hough Transform**: - For non-parametric or irregular shapes, a
reference table is constructed with information about the relative position of edge
points. - Votes are cast based on this table to detect the shape in the image.
—
Advantages of the Hough Transform: - Robust to noise and partial occlusions. -
Handles shapes parametrically or via templates.
Limitations: - Computationally expensive for high-dimensional parameter spaces.
- Sensitive to parameter discretization.
Question 22 (5 points)
1. Explain how to make the Harris Corner Detector scale-invariant.
The Harris Corner Detector is inherently not scale-invariant because it relies on a
fixed window size for evaluating local image gradients. To achieve scale invariance,
the following steps are applied:
—
Steps to Make the Harris Corner Detector Scale-Invariant:
1. **Incorporate Scale-Space Analysis:** - Perform the corner detection over a
scale-space representation of the image. - A scale-space representation is con-
structed by convolving the image I(x, y) with a Gaussian kernel at multiple scales
σ:
Iσ (x, y) = Gσ (x, y) ∗ I(x, y),
where Gσ (x, y) is the Gaussian kernel with standard deviation σ.
2. **Adjust the Structure Tensor:** - At each scale σ, compute the structure tensor:
Gσ ∗ Ix2 Gσ ∗ Ix Iy
Mσ = ,
Gσ ∗ Ix Iy Gσ ∗ Iy2
25
Why This Makes the Detector Scale-Invariant: - By analyzing the image
across multiple scales, the detector becomes sensitive to features at varying sizes.
- The use of Gaussian smoothing ensures that details irrelevant to the scale being
analyzed are suppressed. - Non-maximum suppression across scales ensures that
corners are detected consistently, regardless of the image’s scaling.
—
Applications: - The scale-invariant Harris Corner Detector is often used in tasks
where images may vary in size, such as object recognition or image matching.
Question 23 (5 points)
1. Explain how bilateral and guided image filters work.
—
Bilateral Filter
where:
26
Guided Filter
The guided filter is a more efficient edge-preserving filter that leverages a guidance
image to refine the filtering process.
1. **Definition:** The guided filter computes the filtered output Ifiltered based on a
guidance image G, which may or may not be the same as the input image I:
where:
covΩ (G, I)
a(x, y) = , b(x, y) = µΩ (I) − a(x, y) · µΩ (G),
varΩ (G) + ϵ
where µΩ and covΩ are the mean and covariance within Ω, and ϵ is a regularization
term. - Compute the final output using the guidance image.
3. **Key Features:** - Efficient and linear in complexity. - Suitable for applications
requiring real-time performance.
4. **Applications:** - Joint image filtering. - Image matting and feathering. - High
dynamic range (HDR) compression.
—
• Edge Preservation: Both filters preserve edges, but the guided filter is more
computationally efficient.
• Guidance: The bilateral filter relies only on the input image, while the guided
filter uses an explicit guidance image.
• Flexibility: The guided filter can be tailored to specific tasks by adjusting
the guidance image.
Question 23 (5 points)
1. Explain how image interpolation works and its uses.
—
27
Image Interpolation
where:
Ideal Reconstruction
1. **Definition:** The ideal reconstruction uses the sinc function to perfectly in-
terpolate pixel values. The sinc function is the Fourier transform of a rectangular
function in the frequency domain.
X
Iideal (x′ , y ′ ) = I(i, j) · sinc(x′ − i) · sinc(y ′ − j).
i,j
Nearest-Neighbor Interpolation
28
2. **Key Features:** - Fast and computationally efficient. - Produces blocky and
jagged artifacts, especially for enlargements.
—
Linear Interpolation
Gaussian Reconstruction
Comparison of Methods
29
Examination #1
1 _______ 15
2 _______ 15
3 _______ 20
4 _______ 10
5 _______ 15
6 _______ 10
Total _______ 85
CS 766 Exam #1 Fall 2004
(a) [3] Can the matrix P incorporate any lens distortions that might be in the camera?
Briefly explain.
(b) [4] Give two lists, one specifying the intrinsic camera parameters and the other
giving the extrinsic camera parameters.
(c) [4] Show how P can be decomposed into a product of matrices that contain
elements expressed in terms of the intrinsic and extrinsic camera parameters.
(d) [4] Give the main steps of an algorithm for computing the matrix P from a single
image of a known 3D “calibration object.”
2 of 8
CS 766 Exam #1 Fall 2004
(a) [3] What is the planar projective transformation that describes the relationship
between (X,Y) and (u,v)? Give your answer using homogeneous coordinates.
(b) [1] How many degrees of freedom does this transformation have?
(c) [1] How many point correspondences are required to determine this transformation?
(d) [2] Would having more correspondences than your answer to (c) be helpful in any
way? If no, briefly explain why not. If yes, explain how they could be used.
(f) [2] Give one invariant of a planar affine transformation that is not an invariant for a
planar projective transformation.
(g) [4] If the building has sets of lines on it running parallel to both the X and Y axes,
how could we use the corresponding lines in the image to determine if the building
plane is parallel to the image plane?
3 of 8
CS 766 Exam #1 Fall 2004
Ignore computing a value for the first and last image pixels (in other words, your result
will be 6 values). In addition to showing the result of the convolution, indicate where
edges would be detected and why.
8 15 19 17 11 6 1
(b) [6] What property of the coefficients of a kernel ensures that an appropriate output is
obtained for regions of constant intensity in an image when
(c) [4] Describe a major advantage or disadvantage of using an isotropic operator instead
of a non-isotropic operator with respect to the following issues:
(i) Computational efficiency.
4 of 8
CS 766 Exam #1 Fall 2004
(d) [5] What is the purpose of (i) non-maxima suppression and of (ii) hysteresis that are
done in the Canny edge detector?
5 of 8
CS 766 Exam #1 Fall 2004
(b) [4] The Harris corner detection algorithm computes a 2 × 2 matrix at each pixel based
on the first derivatives at that point and then computes the two eigenvalues of the
matrix, λ1 and λ2, where λ1 ≤ λ2. How can these two values be used to label each pixel
as either a locally smooth region (S), an edge point (E), or a corner point (C)? Give
your answer by specifying “Label pixel S if …”, “Label pixel E if …” and “Label pixel C if
…”
The Harris corner detector first computes R = λ1λ2 –k(λ1 + λ2)2 and
then
labels a pixel S if |R| ≈ 0 (or, alternatively, λ1 ≈ λ2 ≈ 0);
labels a pixel E if R < T1 < 0 (or, alternatively, λ1 ≈ 0
(corresponding to the direction of the edge) and λ2 is large
(corresponding to the normal direction at the edge)); or
labels a pixel C if R > T2 > 0 (or, alternatively, λ1 and λ1 are both
large).
(c) [4] Given the two eigenvalues specified in (b), explain in English the rationale and
difference(s) between detecting corner points using the criterion λ1 > T1 (which is used
in the Tomasi and Kanade algorithm) versus the criterion λ1λ2 > T2 (which is used in
the Harris algorithm), where T1 and T2 are appropriate thresholds.
6 of 8
CS 766 Exam #1 Fall 2004
(a) [9] For each of these three terms explain briefly what it measures, how it is
defined, and what happens if the term is omitted.
This will cause snaxels to bunch up and tend to cause the contour to
expand.
(c) [3] How could the energy functional definition be specified so as to cause the
contour to expand?
7 of 8
CS 766 Exam #1 Fall 2004
(b) [5] Say we want to find “regions” corresponding to object boundaries by grouping
connected chains of “strong” edge points that are adjacent and their gradients
imply a smooth contour. Define an affinity measure for this purpose that uses the
magnitude, mag(∇I(p)), and direction, direc(∇I(p)), of the gradient, ∇I, at a pixel p in
image I to compute affedge(x,y) for a pair of adjacent pixels x and y. Include any
additional computational steps before computing the affinity that make sense for
obtaining good results. Briefly explain your definition.
(c) [2] Say we want to segment images into regions of uniform texture using a
measure of the texture at each pixel by computing some function over a (2n+1) ×
(2n+1) neighborhood centered on the pixel. Briefly explain why using this function
to define the texture affinity between two pixels, afftexture(x,y), will be biased
depending on how close the pixels are to a region boundary.
8 of 8
Student Name: ____________________________ Student Number: _________________
EXAMINATION INSTRUCTIONS:
· Do not turn this page until asked to do so.
· Exam time is 120 minutes.
· Write your name and Number in the indicated places.
· Put the answers on the same question sheet, do not use any additional papers, even for
scratch.
· Write all steps used to get the final answers.
· Read the exam instructions carefully.
· This exam is for assessment purposes only, and will not be available for review.
· Read the honesty policy and sign the following statement.
· There are 15 pages including this page. Please make sure your exam has these many pages.
▶ Page 1 | 11
________________________________________________________________________________________
QUESTION 1 1 point
What is a homography between two images?
QUESTION 2 2 points
Explain the distinction between motion field and optical flow in an image. Use a diagram to illustrate
your answer.
Ideally, Optical Flow is equal to the Motion field. However, the optical flow is more
of the motion of the brightness patterns in one image (the velocity of the brightness pixels),
while the motion field is the actual velocity of an the object has moved from to
that is being compared to.
QUESTION 3 4 points
Using appropriate mathematical expressions, define the following operations commonly used in
computer vision and briefly explain their function and applications:
a. Convolution
b. Correlation
c. edge detection by second-derivative zero-crossings
d. invariant transform
Page 2 | 15
___________________________________________________________________________
QUESTION 4 6 points
Explain the notion of scale-space and how it is used in various areas of computer vision. Include the
following:
a. Pyramidal representations of image structure across successive scales of blurred
undersampling.
b. Edge detection operators that extract edges at particular scales of analysis, but not at others.
c. The behaviour of zero-crossings, their trajectories and “fingerprints” in scale-space.
a. First we get the pyramidal representations of the images using gaussian blurring and then
undersampling it. this is so useful in application like Hybrid image such as overlaying two aligned
shapes on top of each others while having one retains either frequencies. anotehr application is
the object detection where it is useful to try laplacian filters with different sigmas on different
image scale so we can find the maximum response.
[Link] Canny edge detector for example, this operator is used to retain the high frequencies
in an image after getting the gradients in both direction and retain the magnitude. So, it is a
powerful operator that is used to detect boundaries and edges using certain thresholds.
c. zero crossing is leverged using the Laplacian of the Gaussian operator which is resembling
the second derivative of an edge detector. it is widely used application is feature detection as well.
whenever the intensity in images changes, those trajectories appear more.
Zero-crossings are used in scale-space to detect edges by identifying points
where the image intensity undergoes a significant change in curvature.
In scale-space, these zero-crossings are tracked across different scales.
The trajectories of zero-crossings provide information about the object boundaries
at different scales and can reveal features that are robust across scales
QUESTION 5 2 points
Why is it important that images are not under-sampled if you wish to accurately track the images of
objects?
Using any technique of tracking objects will be through detecting keypoints or descriptors
like SIFT or using harris operator. When an image is undersampled, it becomes harder
for those appraoches to extract those features
undersamples images do lose alot of details and the target her is to find distinctive
features that can we use for tracking the motio of an object
Page 3 | 15
________________________________________________________________________________________
QUESTION 6 3 points
The Harris corner detection algorithm computes a 2 × 2 matrix at each pixel based on the first
derivatives at that point and then computes the two eigenvalues of the matrix, λ 1 and λ2, where λ1 ≤ λ2.
How can these two values be used to label each pixel as either a locally smooth region (S), an edge
point (E), or a corner point (C)? Give your answer by specifying “Label pixel S if …”, “Label pixel E if
…” and “Label pixel C if …”
There are three labels wherther an flat region, an edge, or a corner which is the goal to find.
if the Lambda 1 and lambda 2 are both very low, then they are labeled as a flat area region.
if lamdba 1 and lambda 2 are significantly high and are close to each others in value, then
this si definitely a corner point. the last case is when one eigen value lambda 1 is significantly
higher than the otehr, this is an egde point.
QUESTION 7 4 points
Given the two eigenvalues specified in (QUESTION 6), explain in English the rationale and
difference(s) between detecting corner points using the criterion λ1 > T1 versus the criterion λ1λ2 > T2,
where T1 and T2 are appropriate thresholds?
If there is lambda1 > T1, that only detects one eigen value whether it is larger than the threshold
or not. if using lambda12>T2 is ddetecting whether the product is producing the expected
product when the two eigen values are near each other and signifciantly high.
Page 4 | 15
___________________________________________________________________________
QUESTION 8 4 points
a. Explain the key principles underlying the Scale-Invariant Feature Transform (SIFT).
b. What is it used for?
c. What goals does it achieve?
d. How does it achieve them?
Page 5 | 15
________________________________________________________________________________________
QUESTION 9 3 points
What special difficulties would you expect to have in computing optical flow given
a. Spatially aliased images?
b. Temporally aliased image sequence?
c. What can you do to deal with these situations?
a. Spatially aliased images would affect the optical flow veclocities of the brightness pixels
to be detected successfully since the pixels themselves would be distorted/ loss
of motion details. this alias happens because of the sampling rate is two low.
solutions: high sampling rate and multi scale on the same images using pyramids to detect
flow on
b. it is also result of limited frame rate. this can cause the motion to miss some keypoints
that was already its velocity changed from one place to another.
Solution is to increase the frame rate as well.
Page 6 | 15
___________________________________________________________________________
QUESTION 10 4 points
For an image I(x,y), define its gradient vector field ∇ I(x,y).
a. Why is this vector field a useful thing to compute?
b. Define the gradient magnitude over the image plane (x,y).
c. Define the gradient direction over the image plane (x,y).
d. Explain how the gradient vector field is used in the Canny edge detector, the main
steps in its use, and its advantages.
a. The gradient vector will be useful for defining the gradient in teh ydirection and xdirection
where we can then use it in edge detection in canny for example or object detection or even in
corner detection by harris operation.
[Link] gradient magnitude is square root of x swuared + y swuared to result in all the edges in all directions
[Link] the image is being blurred by gaussian operator, then the gradient in x and y is obtained by usinh
some edge detection filter in both directions (fisrt derivative of gaussian). then the magnitude is
obtained to get the edge in both orientations, its advantages is that is its a robust approach for detecting
texture, features and even used as part of corner detection.
Page 7 | 15
________________________________________________________________________________________
QUESTION 11 5 points
a. Define the gradient vector field ∇2 f(x,y) over an image f(x,y), and explain what makes
it useful. Contrast its features and capabilities with the ∇2Gσ(x,y) (Laplacian of a
Gaussian) operator shown below.
b. Identify their respective orders as differential operators,
c. explain how they can be implemented, and
d. discuss any neurobiological analogues for both.
a. this gradient vector field is the laplacian of the gaussian filter (Second derivative) with derivative
of the gaussian. derivative of the gaussian or LOG both detect edges pretty good. but the LOG
is more robust and accurate since it uses the zero crossing as edge detector threshold.
b. Laplacian is the second order derivative while the otehr is first order.
c. The first one is taking the laplacian operator and apply it on the gaussian filter.
the second is applying any filter of the edge detecttion such as sobel or prewitt on the gaussian direct
[Link] retina performs similar tasks through its network of retinal ganglion cells,
which process visual information before sending it to the brain similar to the laplacian.
Page 8 | 15
___________________________________________________________________________
QUESTION 12 2 points
Suppose you were trying to design a machine vision system based as closely as possible upon
human vision; for example, perhaps a visual prosthesis for the blind.
a. Would you aim to include, as design goals, the standard geometric visual illusions as
well?
b. If such errors of vision emerged as unintended consequences of your design, would
you consider them to be features, or bugs, and why?
QUESTION 13 3 points
a. Provide a 3 × 3 discrete filter kernel array that approximates the Laplacian operator.
b. Explain what the Laplacian might be used for,
c. What is the significance of the sum of all of the coefficients in the filter?
0 -1 0
-1 4 -1 -1 -1 -1
-1 8 -1 the laplacian is the residual block that can retain the
0 -1 0 high frequencies of the image. it used in many ways:
-1 -1 -1
edge detection where i can apply laplacian to the
gaussian, image pyramid where i can reconstruct
the image by upscaling and add teh residual block to
it. it is also commonly used in sift detector where it can
be implemented on different scales and see the max
responses.
So i can check teh zero sum property. and to see if it is sum upto zero or not to check if it needs to
be normalized or does it maintain teh overall intensity or not.
Page 9 | 15
________________________________________________________________________________________
QUESTION 14 2 points
Extraction of visual features from images often involves convolution with filters that are
themselves constructed from combinations of differential operators.
One example is the Laplacian of a Gaussian Gσ(x,y) having scale parameter σ, generating the
filter ∇2Gσ(x,y) for convolution with the image I(x,y). Explain in detail each of the following
three operator sequences, where ∗ signifies two-dimensional convolution.
a. ∇2 [Gσ(x,y) ∗ I(x,y)]
b. Gσ(x,y) ∗∇2I(x,y)
c. What are the differences amongst their effects on the image?
Page 10 | 15
___________________________________________________________________________
QUESTION 15 2 points
Explain the notion of scale-space and how it is used in various areas of computer vision.
Include the following:
(a) Pyramidal representations of image structure across successive scales of blurred
undersampling.
(b) Edge detection operators that extract edges at particular scales of analysis, but
not at others.
a. This is used in different computer vision areas such as hybrid images for visualizing
the different scal of the image ro assess how good it was merged together. additionally, it is
used to extract different laplacian residuals blocks that is used in retaining the high frequencies
of the images. it is also used in blob and feature detection by SIFT where the different scales is
then been applied laplacian to it with different sigmas so we can see the maximum resonse
of the blov
b.
Page 11 | 15
________________________________________________________________________________________
QUESTION 16 4 points
a. Describe the Canny edge detection algorithm, including how the edge strength and
orientation is calculated.
b. What is the spatial orientation tensor and how is it used?
Canny edge detector first smoothes the image with a sigma specific. then the gradient is
obtained in the x and y directions. after that the magnitude is extracted and the orientations.
for making it more efficeint, canny uses non maxiimum suppression to retain only the thin
edges eliminating the other noises. also we are using histerises thresholfing where there are
two thresholds on high and one low to detect the strong edges and retain the weak edges
if it is only connected to one of the strong edges
Page 12 | 15
___________________________________________________________________________
QUESTION 17 4 points
Let the following matrix H denote a general 2D planar transformation:
a.
i. It has two: rotation and translation
degree of freedom: 3 theta and 2 for translaton
b.
i. Scaling + rotation + translation
dof: 4
Projective : 8 dofs
Page 13 | 15
________________________________________________________________________________________
QUESTION 18 5 points
a. Describe when it would be suitable to use the Prewitt masks.
b. Use a diagram to illustrate these masks.
c. Sketch the result when one of these masks is applied to the greyscale image below.
You only need to compute the values for the central region of the image that is fully
covered by the mask.
6 7 7 8 8 9
6 7 8 34 38 39
7 8 9 36 38 39
7 6 36 35 37 40
6 6 8 36 18 39
3 7 8 35 20 39
It is suitable to use prewitt masks when we need to retain the edges in the y direction and x firections as shown below:
prewitt vertical:
1 0 -1
1 0 -1
1 0 -1
Page 14 | 15
___________________________________________________________________________
Page 15 | 15