Module 1
Challenges in Computer Vision
Inverse Problem: Involves reconstructing three-dimensional scenes from two-dimensional images.
Physics-Based Models: Understanding light reflection, shading, and transparency.
Machine Learning Dependency: Requires large datasets to train models effectively.
Error Proneness: Unlike humans, computer vision algorithms often misinterpret images due to variations
in lighting, occlusions, and object distortions.
Computer vision has widespread applications across various industries. Some key areas include:
Industrial Applications
Optical Character Recognition (OCR): Used for reading handwritten postal codes in postal systems and for automatic number plate recognition.
Machine Inspection: Quality assurance in manufacturing, defect detection in materials.
Retail Automation: Self-checkout systems and automated stores.
Warehouse Logistics: Autonomous robots for sorting and package delivery.
Medical Imaging: Assists in diagnosing diseases and tracking brain morphology over time.
Self-Driving Vehicles: Autonomous navigation using computer vision for real-time object detection.
3D Model Building (Photogrammetry): Constructing 3D models from drone and aerial photography.
Match Move in Film Industry: Used for integrating CGI with real-world footage.
Motion Capture (MoCap): Capturing human movements for animation and virtual reality.
Surveillance: Security monitoring, traffic analysis, and public safety.
Fingerprint and Facial Recognition: Biometric authentication and forensic applications.
Figure 1.3 Some common optical illusions and what they might tell us about the visual system: (a) The classic Müller-Lyer illusion, where the lengths of the two horizontal lines appear different, probably due to the imagined perspective effects. (b) The “white” square B in the shadow and the “black” square A in the light actually have the same absolute intensity value. The percept is due to brightness constancy, the visual system’s attempt to discount illumination when interpreting colors. Image courtesy of Ted Adelson. (c) A variation of the Hermann grid illusion, courtesy of Hany Farid. As you move your eyes over the figure, gray spots appear at the intersections. (d) Count the red Xs in the left half of the figure. Now count them in the right half. Is it significantly harder? The explanation has to do with a pop-out effect, which tells us about the operations of parallel perception and integration pathways in the brain.
Consumer-Level Applications
Photo Stitching: Creating panoramas by merging overlapping images.
Exposure Bracketing: Merging multiple exposures taken under challenging lighting (e.g., strong sunlight and deep shadows) into a single well-exposed image.
Morphing: Transforming one image into another with smooth transitions.
3D Modelling: Converting images into 3D representations.
Video Stabilization: Removing camera shake and jitter from recorded videos.
Photo-Based Walkthroughs: Virtual navigation through image datasets.
Face Detection & Visual Authentication: Used in smart cameras and device security.
Algorithmic Approach in Computer Vision
Computer vision often involves solving inverse problems and estimating unknown quantities.
Algorithms should be robust to noise and adaptable to real-world conditions.
Efficient use of computational resources is crucial.
Bayesian (probabilistic) techniques, which model noise and uncertainty explicitly, help make estimates robust.
Figure 1.4 Some industrial applications of computer vision: (a) optical character recognition (OCR), (b) mechanical inspection, (c) warehouse picking, (d) medical imaging, (e) self-driving cars, (f) drone-based photogrammetry.
Figure 1.5 Some consumer applications of computer vision: (a) image stitching: merging different views; (b) exposure bracketing: merging different exposures; (c) morphing: blending between two photographs; (d) smartphone augmented reality showing real-time depth occlusion effects.
Advancements in deep learning and AI continue to enhance accuracy and real-world applicability.
Integration with robotics, augmented reality (AR), and the Internet of Things (IoT) is expanding computer
vision's capabilities.
Ethical considerations, including privacy and bias in AI models, are crucial for responsible deployment.
1.2 A Brief History of Computer Vision
Introduction: Computer vision, as a field of study, has evolved significantly over the past five decades. This section presents a chronological overview of key developments in the field, focusing on pioneering research, major breakthroughs, and the technological advancements that have shaped modern computer vision.
1970s: The Birth of Computer Vision
Computer vision emerged in the early 1970s as a subfield of artificial intelligence (AI) and robotics. The
initial goal was to develop visual perception systems capable of mimicking human vision, enabling robots
to interact intelligently with their surroundings.
Early Ambitions and Challenges: Researchers at institutions like MIT, Stanford, and Carnegie Mellon
University (CMU) believed that solving the problem of visual input processing was a stepping stone toward
achieving advanced reasoning and planning capabilities.
Historical Anecdote: In 1966, Marvin Minsky at MIT tasked undergraduate student Gerald Jay Sussman with a summer project: linking a camera to a computer and getting the computer to describe what it saw. The assignment greatly underestimated the complexity of the task, foreshadowing the vast challenges in replicating human vision computationally.
Differentiation from Image Processing: Unlike digital image processing, which focused on manipulating
images (e.g., filtering, enhancement), computer vision aimed to reconstruct the three-dimensional (3D)
structure of a scene from two-dimensional (2D) images.
Early Approaches:
o Edge Detection & Scene Understanding: Techniques such as line labeling (Huffman, 1971; Clowes, 1971;
Waltz, 1975) and edge-based 3D reconstruction (Roberts, 1965) laid foundational work.
o Shape Representation: Researchers explored ways to model 3D structures using geometric primitives, such
as generalized cylinders (Agin & Binford, 1976; Nevatia & Binford, 1977).
o Understanding Light & Shading: Efforts to interpret shading and lighting variations in images led to the
concept of intrinsic images (Barrow & Tenenbaum, 1981) and the 2.5D sketch model (Marr, 1982).
Figure 1.8 Examples of computer vision algorithms from the 1980s: (a) pyramid blending, (b) shape from shading, (c) edge detection, (d) physically based models, (e) regularization-based surface reconstruction, (f) range data acquisition and merging.
Figure 1.9 Examples of computer vision algorithms from the 1990s: (a) factorization-based structure from motion, (b) dense stereo matching, (c) multi-view reconstruction, (d) face tracking, (e) image segmentation.
1980s: Mathematical Rigor and Computational Techniques
The 1980s witnessed the adoption of sophisticated mathematical models for quantitative image and scene analysis.
Multi-Resolution Image Analysis: Image pyramids (Burt & Adelson, 1983) were developed for coarse-to-fine
image processing, leading to the emergence of scale-space representations (Witkin, 1983).
Shape-from-X Techniques: Various methods were introduced to infer 3D shapes from different visual cues:
1990s: Algorithmic Advancements and Statistical Learning
The 1990s saw the emergence of more advanced computational techniques, making computer vision more practical
for real-world applications.
Structure from Motion (SfM): Methods to reconstruct 3D scenes from moving cameras gained prominence
(Faugeras, 1992; Hartley & Zisserman, 2004).
Statistical Learning Approaches:
o Eigenfaces (Turk & Pentland, 1991) revolutionized face recognition using Principal Component Analysis (PCA).
o Hidden Markov Models (HMMs) and Kalman filters were applied to motion tracking and object recognition.
Segmentation & Graph-Based Approaches:
o The Normalized Cuts method (Shi & Malik, 2000) improved image segmentation techniques.
o Graph-cut algorithms (Boykov, Veksler, & Zabih, 1998) optimized stereo correspondence.
Computer Vision & Graphics Integration:
o Image-based rendering techniques enabled applications like panoramic stitching (Mann & Picard, 1994) and morphing (Beier & Neely, 1992).
Figure 1.10 Examples of computer vision algorithms from the 2000s: (a) image-based rendering, (b) image-based modeling, (c) interactive tone mapping, (d) texture synthesis, (e) feature-based recognition, (f) region-based recognition.
Figure 1.11 Examples of computer vision algorithms: (a) the SuperVision deep neural network; (b) object instance segmentation; (c) whole body, expression, and gesture fitting from a single image; (d) fusing multiple color-depth images using the KinectFusion real-time system; (e) smartphone augmented reality with real-time depth occlusion effects; (f) 3D map computed in real time on a fully autonomous drone.
2000s: The Rise of Machine Learning and Large-Scale Data
In the 2000s, computer vision benefited from increased computational power, larger datasets, and improved
machine learning models.
Feature-Based Recognition:
o Scale-Invariant Feature Transform (SIFT) (Lowe, 1999) became a standard feature extraction technique.
o Speeded-Up Robust Features (SURF) (Bay et al., 2006) improved efficiency in object detection.
The evolution of computer vision has been marked by groundbreaking theories, mathematical advancements, and increasing computational capabilities. From early edge-detection methods to modern deep learning-based recognition systems, computer vision continues to transform industries, including robotics, healthcare, surveillance, and entertainment. The field is expected to grow further, driven by AI advancements and the availability of large-scale visual datasets.
process of image formation. More comprehensive discussions on this topic can be found in textbooks related to
computer graphics and image synthesis (Cohen & Wallace, 1993; Sillion & Puech, 1994; Watt, 1995; Glassner, 1995;
Weyrich et al., 2009; Hughes et al., 2013; Marschner & Shirley, 2015).
2.2.1 Lighting and Its Role in Image Formation
Light is fundamental for image creation. Without illumination, a scene remains dark, and no image can be captured.
The process of image formation depends on the presence of one or more light sources. While some imaging
techniques, such as fluorescence microscopy and X-ray tomography, operate differently, the focus here is on
o Environment maps record incident light from all directions and map it to corresponding color values.
o These maps assume that all light sources exist at an infinite distance.
o They can be represented in different formats:
Cubical face maps (Greene, 1986)
Longitude–latitude maps (Blinn & Newell, 1976)
Reflective sphere images (Watt, 1995)
o A practical way to create an environment map is by capturing an image of a mirrored sphere and then processing it
to match the required mapping (Debevec, 1998).
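The longitude–latitude format can be made concrete with a short lookup routine. The following is a minimal Python sketch, not a standard API: `latlong_uv` is a hypothetical name, and the axis convention (z up, longitude from `atan2(y, x)`, v = 0 at the +z pole) is an assumption; renderers differ in orientation and in where the seam falls.

```python
import math


def latlong_uv(x, y, z):
    """Map a 3D direction to (u, v) in [0, 1] on a longitude-latitude map.

    Assumed convention: z points "up", longitude is atan2(y, x), and v runs
    from the +z pole (v = 0) to the -z pole (v = 1).
    """
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm          # work with a unit direction
    phi = math.atan2(y, x)                          # longitude in [-pi, pi]
    theta = math.acos(max(-1.0, min(1.0, z)))       # colatitude in [0, pi]
    u = (phi + math.pi) / (2.0 * math.pi)           # longitude -> [0, 1]
    v = theta / math.pi                             # colatitude -> [0, 1]
    return u, v


# Which pixel of a 512 x 256 map stores the light arriving from the +x direction?
u, v = latlong_uv(1.0, 0.0, 0.0)
col, row = int(u * 511), int(v * 255)               # (255, 127)
```

Because the map is indexed by angle, each row covers an equal band of colatitude, so rows near the poles are heavily oversampled; this is one reason cube maps are often preferred in practice.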
1. Bidirectional Reflectance Distribution Function (BRDF):
o A general function that describes how light is reflected at an interface given the incoming and outgoing directions of light.
o It accounts for both diffuse and specular reflections.
2. Diffuse Reflection:
o Occurs when light scatters uniformly in all directions upon hitting a rough surface.
o The Lambertian reflectance model assumes that the apparent brightness of the surface remains constant regardless of
the viewing angle.
3. Specular Reflection:
o Happens when light reflects in a particular direction, as seen in smooth and shiny surfaces (e.g., mirrors, polished
metals).
reflectance, we can create more visually accurate and physically meaningful representations of the world in digital
images.
Figure 2.15 (a) Light scatters when it hits a surface. (b) The bidirectional reflectance distribution function (BRDF) $f(\theta_i, \phi_i, \theta_r, \phi_r)$ is parameterized by the angles that the incident, $\hat{v}_i$, and reflected, $\hat{v}_r$, light ray directions make with the local surface coordinate frame $(\hat{d}_x, \hat{d}_y, \hat{n})$.
The BRDF is a four-dimensional function that defines how much light arriving from one direction (incident light)
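is reflected in another direction. Written in terms of the incident and reflected angles relative to the local surface frame, it is typically represented as

$$f_r(\theta_i, \phi_i, \theta_r, \phi_r; \lambda),$$

where λ is the wavelength of the light.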
In simpler terms, the BRDF describes how much light is reflected in a given direction based on how the light
originally struck the surface. This function is crucial for realistic rendering in computer graphics, image processing,
and physics-based simulations of light interactions.
Helmholtz Reciprocity and BRDF Symmetry
One of the fundamental properties of the BRDF is reciprocity, also known as Helmholtz reciprocity. This principle
states that if we swap the incident and reflected light directions, the BRDF remains unchanged. Mathematically, this
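means:

$$f_r(\theta_i, \phi_i, \theta_r, \phi_r; \lambda) = f_r(\theta_r, \phi_r, \theta_i, \phi_i; \lambda).$$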
1. Isotropic Surfaces
o These surfaces reflect light the same way when rotated about their surface normal, meaning there is no preferred orientation for light transport.
o Examples include plastic, unpolished wood, and matte surfaces.
o The BRDF for isotropic surfaces simplifies to:

$$f_r(\theta_i, \theta_r, |\phi_r - \phi_i|; \lambda)$$

This means that reflection depends on the incident and reflected elevation angles and only on the difference between their azimuths, rather than on their absolute orientations.
2. Anisotropic Surfaces
o These surfaces have directional dependence, meaning light reflection varies with the orientation of surface features.
o A common example is brushed aluminum or scratched metal, where the scratches create a preferred direction of reflection.
o The BRDF of anisotropic materials must account for this directional dependence and cannot be simplified in the same way as that of isotropic materials.
Key Points:
The BRDF models how light is reflected from a surface based on its material properties and incoming light direction.
It is a four-dimensional function that depends on incident and reflected angles, as well as the light wavelength.
Helmholtz reciprocity states that swapping incident and reflected directions does not change the BRDF value.
Isotropic surfaces reflect light evenly, while anisotropic surfaces have a directional dependence.
BRDFs are essential in computer graphics, image synthesis, and physics-based simulations to create realistic lighting effects.
Figure 2.16 This close-up of a statue shows both diffuse (smooth shading) and specular (shiny highlight) reflection,
as well as darkening in the grooves and creases due to reduced light visibility and interreflections.
BRDFs for a given surface can be obtained through physical modeling, heuristic modeling, or empirical observation. Typical BRDFs can often be split into their diffuse and specular components, as described below.
Diffuse Reflection
Diffuse reflection, also known as Lambertian reflection, occurs when light is scattered uniformly in all directions
upon striking a surface. This type of reflection is responsible for the soft, non-shiny appearance of materials such as
painted walls, paper, or statues.
Unlike specular reflection, where light is reflected in a single direction, diffuse reflection results from the selective
absorption and re-emission of light within the material. This interaction often gives objects their characteristic body
color, as certain wavelengths are absorbed while others are reflected.
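While light is scattered uniformly in all directions, i.e., the BRDF is constant, $f_d(\hat{v}_i, \hat{v}_r, \hat{n}; \lambda) = f_d(\lambda)$, the amount of reflected light still depends on the direction of the incoming light, as described next.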
Effect of Incident Angle on Diffuse Reflection
The amount of light reflected from a surface depends on the angle between the incident light direction and the
surface normal (denoted as θᵢ). At oblique angles, the same amount of light spreads over a larger surface area,
reducing its intensity.
When the surface normal points away from the light source, the surface is self-shadowed and receives no direct illumination. A practical example is how we adjust our body’s orientation toward the Sun or a
fireplace to maximize warmth. Similarly, a flashlight shining directly on a wall appears brighter than when projected
at an angle. The shading equation for diffuse reflection models this effect mathematically to determine surface
illumination.
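The diffuse shading equation is commonly written as $L_d = \sum_i L_i\, f_d\, \max(0, \hat{v}_i \cdot \hat{n})$, where the clamp at zero models self-shadowing. Below is a minimal Python sketch of that formula for a single wavelength, assuming distant light sources given as (direction, radiance) pairs; the function names are hypothetical.

```python
import math


def _unit(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def lambertian_radiance(normal, lights, f_d):
    """Diffuse radiance sum_i L_i * f_d * max(0, v_i . n) for a constant BRDF f_d.

    lights: list of (direction_to_light, radiance) pairs. Clamping the cosine at
    zero means a surface facing away from a light receives nothing from it.
    """
    n = _unit(normal)
    total = 0.0
    for direction, radiance in lights:
        cos_theta = sum(a * b for a, b in zip(_unit(direction), n))  # cos(theta_i)
        total += radiance * f_d * max(0.0, cos_theta)
    return total


up = (0.0, 0.0, 1.0)
head_on = lambertian_radiance(up, [((0.0, 0.0, 1.0), 1.0)], f_d=0.5)             # 0.5
tilted = lambertian_radiance(up, [((math.sqrt(3.0), 0.0, 1.0), 1.0)], f_d=0.5)   # 60 deg off the normal: ~0.25
behind = lambertian_radiance(up, [((0.0, 0.0, -1.0), 1.0)], f_d=0.5)             # 0.0 (self-shadowed)
```

The tilted light delivers half the radiance of the head-on light because cos 60° = 0.5, which is the flashlight-on-a-wall effect described above.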
Specular Reflection
Specular reflection (gloss or highlight reflection) depends strongly on the direction of the outgoing light. For a mirrored surface, incident light is reflected in a direction rotated by 180° around the surface normal ^n (Figure 2.17b). This means that incident light rays follow a predictable reflection path, computed using geometric principles. In simpler terms, specular reflection creates sharp highlights on smooth surfaces, like the shine on polished metal or a water surface.
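The 180° rotation about ^n has a simple closed form: with unit vectors pointing away from the surface, the mirror direction is $\hat{s}_i = 2(\hat{n}\cdot\hat{v}_i)\hat{n} - \hat{v}_i$. A minimal Python sketch, assuming a unit-length normal (`mirror_direction` is a hypothetical helper):

```python
import math


def mirror_direction(v_i, n):
    """Rotate the incident direction v_i by 180 degrees about the unit normal n.

    Both vectors point away from the surface (v_i toward the light); the result
    s = 2 (n . v_i) n - v_i keeps the normal component and flips the tangential one.
    """
    d = sum(a * b for a, b in zip(n, v_i))
    return tuple(2.0 * d * nc - vc for nc, vc in zip(n, v_i))


# Light arriving 45 degrees to one side of the normal leaves 45 degrees to the other side.
h = math.sqrt(0.5)
s = mirror_direction((h, 0.0, h), (0.0, 0.0, 1.0))   # approximately (-0.707, 0.0, 0.707)
```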
Phong Shading
Phong (1975) introduced a shading model that combines diffuse reflection, specular reflection, and an additional
component called ambient illumination. This ambient term accounts for general, indirect lighting caused by inter-
reflections (such as light bouncing off walls) or distant sources like the sky. Unlike diffuse and specular reflections,
the ambient component does not depend on surface orientation but is influenced by the colors of both the ambient
illumination Lₐ(λ) and the object’s ambient reflectivity kₐ(λ).
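$$L_r(\hat{v}_r; \lambda) = k_a(\lambda) L_a(\lambda) + k_d(\lambda) \sum_i L_i(\lambda) [\hat{v}_i \cdot \hat{n}]^+ + k_s(\lambda) \sum_i L_i(\lambda) (\hat{v}_r \cdot \hat{s}_i)^{k_e}$$

Here $[\cdot]^+$ clamps negative values to zero, and the specular exponent $k_e$ controls how sharp (shiny) the highlight is.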
Typically, the ambient and diffuse reflection color distributions kₐ(λ) and k_d(λ) are similar since both result from
subsurface scattering within the material. The specular reflection kₛ(λ) is usually uniform (white), as interface
reflections do not alter the light color. However, metallic materials like copper show color variation in specular
reflection.
The ambient illumination Lₐ(λ) often differs in color from the direct light sources Lᵢ(λ). For example, it may be blue in an outdoor setting or yellow in an indoor environment lit by candles or incandescent lights. This is why shadows cast outdoors on a sunny day appear bluer: they are lit mainly by the ambient sky illumination.
While the diffuse component depends on the angle of the incoming light direction ^vi, the specular component is
influenced by the angle between the viewer ^vr and the specular reflection direction ^si, which itself depends on
^vi and the surface normal ^n.
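Putting the three terms together, a single-channel version of the Phong model can be sketched as below. This is a hedged illustration, not a reference implementation: it evaluates $k_a L_a + \sum_i L_i\left(k_d[\hat{v}_i\cdot\hat{n}]^+ + k_s([\hat{v}_r\cdot\hat{s}_i]^+)^{k_e}\right)$ for one wavelength band with specular exponent $k_e$, the parameter values in the example are made up, and skipping lights behind the surface is an added assumption.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _unit(v):
    n = math.sqrt(_dot(v, v))
    return tuple(c / n for c in v)


def phong(n, v_r, lights, k_a, k_d, k_s, k_e, L_a):
    """Phong shading for a single color channel.

    n: surface normal; v_r: direction toward the viewer;
    lights: list of (v_i, L_i), with v_i pointing toward each light source.
    """
    n, v_r = _unit(n), _unit(v_r)
    radiance = k_a * L_a                                    # ambient term
    for v_i, L_i in lights:
        v_i = _unit(v_i)
        cos_i = _dot(v_i, n)
        if cos_i <= 0.0:
            # Practical guard (an assumption, not part of the basic formula):
            # a light behind the surface contributes neither diffuse nor specular light.
            continue
        s_i = tuple(2.0 * cos_i * nc - vc for nc, vc in zip(n, v_i))  # mirror direction
        radiance += k_d * L_i * cos_i                                   # diffuse term
        radiance += k_s * L_i * max(0.0, _dot(v_r, s_i)) ** k_e        # specular lobe
    return radiance


params = dict(k_a=0.1, k_d=0.6, k_s=0.3, k_e=20, L_a=0.2)
on_axis = phong((0, 0, 1), (0, 0, 1), [((0, 0, 1), 1.0)], **params)   # 0.02 + 0.6 + 0.3 = 0.92
off_axis = phong((0, 0, 1), (1, 0, 1), [((0, 0, 1), 1.0)], **params)  # highlight mostly gone, ~0.62
```

Moving the viewer 45° away from the mirror direction leaves the ambient and diffuse terms unchanged but shrinks the specular term by a factor of about 1000 with k_e = 20, which is why a large exponent gives a small, sharp highlight.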
Though widely used, the Phong shading model has been superseded in physical accuracy by models such as the Cook-Torrance model (1982), based on the microfacet theory of Torrance and Sparrow (1967). Graphics hardware long implemented the Phong model directly, but modern programmable pixel shaders make more complex and realistic shading models practical.
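The Torrance and Sparrow (1967) model of reflection also forms the basis of Shafer’s (1985) dichromatic reflection model, which states that the apparent color of a uniform material lit from a single source depends on the sum of two terms,

$$L_r(\hat{v}_r; \lambda) = L_i(\hat{v}_r, \hat{v}_i, \hat{n}; \lambda) + L_b(\hat{v}_r, \hat{v}_i, \hat{n}; \lambda) = c_i(\lambda)\, m_i(\hat{v}_r, \hat{v}_i, \hat{n}) + c_b(\lambda)\, m_b(\hat{v}_r, \hat{v}_i, \hat{n}).$$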
The reflected light radiance consists of two components: Lᵢ, the radiance reflected at the interface, and L_b, the radiance reflected from the surface body. Each is a product of a relative power spectrum c(λ), which depends only on wavelength, and a magnitude function m(^vᵣ, ^vᵢ, ^n), which depends only on geometry. This model, derived from a generalized Phong model, has been used effectively in computer vision for specular object segmentation (Klinker, 1993) and has influenced two-color models in applications like Bayer pattern demosaicing (Bennett et al., 2006).
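Because every pixel of a single material is a mix $m_i c_i + m_b c_b$ of just two color vectors, its RGB values lie in the plane spanned by $c_i$ and $c_b$, which is the property dichromatic segmentation methods exploit. The Python sketch below, using hypothetical color vectors, builds such a pixel and recovers the two magnitudes by least squares; it illustrates the model and is not Klinker's algorithm.

```python
def dichromatic_rgb(c_i, c_b, m_i, m_b):
    """Pixel color under the dichromatic model: m_i * c_i + m_b * c_b."""
    return tuple(m_i * a + m_b * b for a, b in zip(c_i, c_b))


def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def unmix(rgb, c_i, c_b):
    """Least-squares (m_i, m_b) for a color, solving the 2x2 normal equations."""
    a11, a12, a22 = _dot(c_i, c_i), _dot(c_i, c_b), _dot(c_b, c_b)
    r1, r2 = _dot(c_i, rgb), _dot(c_b, rgb)
    det = a11 * a22 - a12 * a12          # non-zero when c_i and c_b are not parallel
    return (a22 * r1 - a12 * r2) / det, (a11 * r2 - a12 * r1) / det


c_i = (1.0, 1.0, 1.0)     # hypothetical interface color: that of a white light source
c_b = (0.8, 0.2, 0.1)     # hypothetical body color: a reddish material
pixel = dichromatic_rgb(c_i, c_b, m_i=0.3, m_b=0.7)   # about (0.86, 0.44, 0.37)
m_i, m_b = unmix(pixel, c_i, c_b)                     # recovers approximately (0.3, 0.7)
```

A large recovered m_i relative to m_b flags a pixel as lying on a highlight, which is the basic idea behind separating specular and body reflection.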
Basic shading models assume light travels directly from sources to surfaces and then to the camera. However, in
reality, light can be blocked (shadows) and bounce multiple times before reaching the camera.
1. Ray Tracing / Path Tracing – Ideal for highly specular scenes (e.g., glass, mirrors). It traces individual rays,
following their reflections and refractions (Glassner, 1995).
2. Radiosity – Suited for diffuse surfaces, calculating light exchange between surfaces based on form factors and
reflectance (Cohen & Wallace, 1993).
Ray Tracing Algorithm:
Traces rays from the camera to surfaces, computing illumination using shading equations.
Radiosity Algorithm:
Associates light values with surface patches and computes light exchange.
Uses a linear system to solve for final illumination and allows rendering from different viewpoints.
Does not account for near-field effects such as the darkening inside corners; such effects have been exploited by some computer vision techniques (Nayar et al., 1991).
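The linear system used by radiosity is usually written as $B_i = E_i + \rho_i \sum_j F_{ij} B_j$, where $B_i$ is the radiosity of patch i, $E_i$ its emission, $\rho_i$ its diffuse reflectance, and $F_{ij}$ the form factor between patches i and j. A minimal Python sketch that solves it by Jacobi iteration, assuming the form factors are already known (the two-patch scene is made up for illustration):

```python
def solve_radiosity(emission, reflectance, form_factors, max_iters=500, tol=1e-12):
    """Solve B_i = E_i + rho_i * sum_j F_ij * B_j for all patches by Jacobi iteration.

    emission[i]: light emitted by patch i; reflectance[i]: its diffuse albedo (< 1);
    form_factors[i][j]: fraction of the light leaving patch i that reaches patch j.
    """
    n = len(emission)
    b = list(emission)                                  # start with direct emission only
    for _ in range(max_iters):
        new_b = [emission[i] + reflectance[i] * sum(form_factors[i][j] * b[j] for j in range(n))
                 for i in range(n)]                     # each sweep adds one more bounce of light
        if max(abs(x - y) for x, y in zip(new_b, b)) < tol:
            return new_b
        b = new_b
    return b


# Two facing patches: a light (E = 1) and an unlit wall; half of each one's light reaches the other.
B = solve_radiosity(emission=[1.0, 0.0], reflectance=[0.5, 0.8],
                    form_factors=[[0.0, 0.5], [0.5, 0.0]])   # about [1.111, 0.444]
```

Solving the system once gives view-independent patch brightnesses, which is why the scene can then be rendered from different viewpoints without recomputing the lighting.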
Both methods influence scene appearance and 3D interpretation and can be combined for more accurate global illumination.
2.2.3 Optics
After light from a scene enters the camera, it passes through the lens before reaching the sensor. A simple pinhole model assumes all rays pass through a single projection point. However, to account for focus, exposure, vignetting, and aberration, a more advanced optical model is needed (Möller, 1988; Ray, 2002; Hecht, 2015).
The thin lens model (Figure 2.19) represents a basic lens as a single glass element with equal curvature on both sides. According to the lens equation, the object distance (z₀) and the image distance (zᵢ) are related by

$$\frac{1}{z_0} + \frac{1}{z_i} = \frac{1}{f},$$

where f is the focal length of the lens.
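Solving the lens equation $1/z_0 + 1/z_i = 1/f$ for the image distance shows how far behind the lens the sensor must sit to bring an object into focus. A minimal Python sketch (the function name and the 50 mm example are illustrative assumptions):

```python
import math


def image_distance(z0, f):
    """Solve the thin-lens equation 1/z0 + 1/z_i = 1/f for the image distance z_i."""
    if math.isinf(z0):
        return f                     # object at infinity: image forms at the focal length
    if z0 <= f:
        raise ValueError("an object at or inside the focal length forms no real image")
    return 1.0 / (1.0 / f - 1.0 / z0)


F = 50.0                                  # 50 mm lens; all distances in millimetres
at_2f = image_distance(2 * F, F)          # approximately 100: an object at 2f images at 2f
portrait = image_distance(5000.0, F)      # about 50.5: roughly 0.5 mm beyond infinity focus
far_away = image_distance(math.inf, F)    # 50.0
```

The small change between 50.0 mm and 50.5 mm is why focusing from infinity down to a few metres only needs a tiny lens movement.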
Focal Length and Depth of Field:
The focal length (f) of a lens determines where light converges to form a sharp image. When an object is at infinity
(z₀ → ∞), the image forms at zᵢ = f, making the lens function similarly to a pinhole camera at distance f from the
focal plane (Figure 2.10).
If the focal plane is shifted (e.g., by adjusting the focus ring), objects at z₀ become out of focus, and each point is imaged as a circle of confusion (c) whose size depends on the focus shift (Δzᵢ) and the aperture diameter (d). The range of distances where objects remain acceptably sharp is the depth of field, which depends on the focus distance and the aperture size (Figure 2.20).
The f-number (f/# or N) defines the aperture relative to focal length.
Since the depth of field depends on the aperture diameter d, we also need to know how d relates to the commonly displayed f-number, which is usually denoted as f/# or N and is defined as

$$N = f/\# = \frac{f}{d},$$

where the focal length f and the aperture diameter d are measured in the same unit (say, millimetres).
The f-number (f/# or N) represents the ratio of a lens’s focal length (f) to its aperture diameter (d). It is commonly
written as f/1.4, f/2, f/2.8, …, f/22, where dividing f by the f-number gives the actual aperture diameter. Alternatively,
the notation N = 1.4, 2, 2.8, etc. is also used.
The standard progression of f-numbers follows powers of √2 (e.g., 1.4, 2, 2.8, 4, 5.6, etc.). Each step, known as a full stop, halves or doubles the amount of light entering the lens, because the aperture area is proportional to d² = (f/N)². This corresponds to a one-stop change in exposure value (EV), which has the same effect on the captured light as doubling or halving the exposure time (e.g., from 1/250s to 1/125s).
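The f-number arithmetic can be checked numerically. In this minimal Python sketch (helper names are hypothetical), full stops are generated as exact powers of √2, and the light admitted is taken to be proportional to the aperture area, i.e., to 1/N²:

```python
import math


def aperture_diameter(f, N):
    """Aperture diameter d = f / N, in the same unit as the focal length f."""
    return f / N


def full_stops(count):
    """Exact f-numbers one full stop apart, starting at f/1: N_k = sqrt(2) ** k."""
    return [math.sqrt(2.0) ** k for k in range(count)]


def light_ratio(N_wide, N_narrow):
    """How many times more light f/N_wide admits than f/N_narrow (area ~ 1 / N**2)."""
    return (N_narrow / N_wide) ** 2


stops = full_stops(8)              # 1.0, 1.41, 2.0, 2.83, 4.0, 5.66, 8.0, 11.31 (marked 1.4, 2.8, 5.6, 11)
d = aperture_diameter(50.0, 2.0)   # 25.0: a 50 mm lens at f/2 has a 25 mm opening
ratio = light_ratio(2.0, 2.8)      # about 1.96, i.e., roughly twice the light
```

The engraved values (1.4, 2.8, 5.6, 11) are rounded, which is why one marked stop gives a ratio of 1.96 rather than exactly 2.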
Depth of Field and Optical Calculations
Using the f-number, one can compute the depth of field (DoF), which depends on the focal length (f), the f-number (N), the acceptable circle of confusion (c), and the focus distance (z₀). These relationships can be used to generate DoF plots, helping photographers and optics engineers analyze focus characteristics, as demonstrated in Exercise 2.4.
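One way to compute these limits is with the hyperfocal distance $H = f^2/(N c) + f$, which gives a near limit $z_0(H-f)/(H+z_0-2f)$ and a far limit $z_0(H-f)/(H-z_0)$ (the far limit becomes infinite once $z_0 \ge H$). These are the standard thin-lens photographic formulas, not necessarily the exact form intended in Exercise 2.4. The Python sketch below also computes the thin-lens blur circle directly, so one can confirm that points at the computed limits blur to exactly c; the function names are hypothetical.

```python
import math


def dof_limits(z0, f, N, c):
    """Near and far limits of acceptable sharpness for a thin lens focused at z0.

    f: focal length; N: f-number; c: largest acceptable blur-circle diameter on the
    sensor. All lengths share one unit. Uses the hyperfocal distance H = f^2/(N c) + f.
    """
    H = f * f / (N * c) + f
    near = z0 * (H - f) / (H + z0 - 2.0 * f)
    far = math.inf if z0 >= H else z0 * (H - f) / (H - z0)
    return near, far


def blur_diameter(z, z0, f, N):
    """Blur-circle diameter on the sensor for a point at distance z (lens focused at z0)."""
    d = f / N                        # aperture diameter
    v0 = f * z0 / (z0 - f)           # sensor position: image distance of the focus plane
    v = f * z / (z - f)              # where a point at distance z comes to focus
    return d * abs(v - v0) / v       # similar triangles through the aperture


# 50 mm lens at f/2 focused at 5 m, with a 0.03 mm circle of confusion (lengths in mm).
near, far = dof_limits(z0=5000.0, f=50.0, N=2.0, c=0.03)   # about 4469 mm and 5674 mm
```

Sweeping z0 or N through `dof_limits` and plotting near and far against z0 produces the kind of DoF plot mentioned above; stopping down (larger N) shrinks H and widens the in-focus range.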
Unlike the ideal thin lens model, real lenses suffer from geometric aberrations due to their thickness and optical
design. Compound lens elements help correct these distortions. The five Seidel aberrations in third-order optics
include:
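1. Spherical Aberration – Light rays at different distances from the optical axis focus at different points.
2. Coma – Off-axis point sources are imaged as comet-shaped blurs.
3. Astigmatism – Rays in two perpendicular planes (tangential and sagittal) focus at different distances.
4. Curvature of Field – The surface of sharpest focus is curved, so a flat object cannot be sharp everywhere on a flat sensor.
5. Distortion – Straight lines appear bent because magnification varies across the image.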
Understanding these optical principles enables better lens design and image quality optimization (Möller 1988; Ray
2002; Hecht 2015).
1. Causes of Chromatic Aberration
Chromatic aberration arises because the index of refraction of glass varies with wavelength. This variation causes different colors of light to focus at slightly different distances from the lens, resulting in color fringing and blurring in images. The two primary types of chromatic aberration are:
Transverse Chromatic Aberration (TCA): This occurs when different wavelengths experience different magnification factors, leading to color shifts at the edges of the image. It can be modeled as per-color radial distortion and corrected using calibration techniques (see Section 11.1.4).
Longitudinal Chromatic Aberration (LCA): This happens when different wavelengths focus at different depths along the optical axis, causing color-dependent blurring. While some correction techniques exist (Section 10.1.4), severe longitudinal aberration can cause high-frequency image details to be irreversibly lost.
To minimize chromatic and other optical aberrations, modern photographic lenses use compound lens designs
composed of multiple glass elements, each with specialized coatings. Unlike simple lenses, compound lenses cannot
be approximated by a single nodal point (P) in a pinhole model. Instead, they have:
Front Nodal Point: The point through which rays from the scene enter the lens.
Rear Nodal Point: The point through which rays leave the lens on their way to the sensor.
For camera calibration, especially in applications like parallax-free panorama stitching, determining the exact
location of the front nodal point is crucial (Section 8.2.3; see Littlefield 2006, Houghton 2013).
3. Challenges in Modeling Wide-Angle and Fisheye Lenses
Not all lenses fit the simple nodal point model. Certain optical systems, including:
Catadioptric imaging systems (which combine lenses with curved mirrors, e.g., Baker and Nayar 1999)
…do not allow all captured rays to pass through a single nodal point. Instead of using a traditional lens model, a
mapping function or look-up table is created to directly relate pixel coordinates to corresponding 3D rays in
space. Various research efforts have developed methods for this approach (Gremban et al. 1988; Champleboux et
al. 1992; Grossberg & Nayar 2001; Sturm & Ramalingam 2004; Tardif et al. 2009), as discussed in Section
2.1.5.
By understanding and compensating for chromatic aberration and lens distortions, photographers and optical engineers can significantly enhance image quality and camera performance.
Vignetting is a phenomenon in which the brightness of an image decreases towards the edges. This happens due to the way light interacts with the lens and sensor.
1. Natural Vignetting – Caused by the way light travels through the lens.
Natural vignetting happens because light from objects at an angle (β) reaches the lens with reduced intensity.
The reasons for this include:
Distance-Based Light Fall-off (Inverse-Square Law):
Together, these effects cause brightness to drop more at the edges of the image than at the center.
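The irradiance E reaching the camera sensor is related to the scene radiance L by

$$E = L\,\frac{\pi}{4}\left(\frac{d}{f}\right)^2 \cos^4\beta,$$

where:
Aperture diameter (d) – The width of the lens opening.
Focal length (f) – The distance from the lens to the focus point.
Off-axis angle (β) – The angle between the incoming light and the optical axis.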
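The standard thin-lens radiometric relation $E = L\frac{\pi}{4}(d/f)^2\cos^4\beta$ can be evaluated directly to see how quickly natural vignetting darkens the corners; for example, at 30° off-axis the fall-off is cos⁴30° = 9/16 ≈ 0.56. A minimal Python sketch (function names are hypothetical; angles in radians):

```python
import math


def sensor_irradiance(L, d, f, beta):
    """E = L * (pi / 4) * (d / f)**2 * cos(beta)**4 for an ideal thin lens.

    L: scene radiance; d: aperture diameter; f: focal length; beta: off-axis angle (radians).
    """
    return L * (math.pi / 4.0) * (d / f) ** 2 * math.cos(beta) ** 4


def relative_illumination(beta):
    """Brightness at off-axis angle beta relative to the image center (cos^4 fall-off)."""
    return math.cos(beta) ** 4


for degrees in (0, 10, 20, 30, 40):
    share = relative_illumination(math.radians(degrees))
    print(f"{degrees:2d} degrees off-axis: {share:.1%} of the center brightness")
```

Since d/f = 1/N, the same formula shows that irradiance scales as 1/N², consistent with each full stop halving the light.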
Unlike natural vignetting, mechanical vignetting happens when parts of the lens block light near the edges.
This can be caused by:
Lens housing or internal elements blocking peripheral rays.
Unlike natural vignetting, mechanical vignetting can be reduced by using a smaller aperture (higher f-
number).
4. How to Reduce Vignetting?
Vignetting is not always a bad thing—some photographers use it artistically to draw attention to the center of an
image. However, if unwanted, here’s how to reduce it:
✅ Use a smaller aperture (higher f-number) to reduce mechanical vignetting.
✅ Choose high-quality lenses designed to minimize vignetting.
✅ Apply post-processing corrections in software like Adobe Lightroom or Photoshop.
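As a rough illustration of what a post-processing correction does, the sketch below divides each pixel by the predicted cos⁴ fall-off. It assumes the vignetting is purely natural, that the optical axis passes through the image center, and that the focal length in pixels (`f_px`, a hypothetical parameter) is known; practical tools typically rely on measured, lens-specific profiles, and mechanical vignetting is not modeled here.

```python
import math


def devignette(image, f_px):
    """Undo cos^4 natural vignetting in a grayscale image given as a list of rows.

    Assumes purely natural vignetting, an optical axis through the image center,
    and a known focal length f_px expressed in pixels (a hypothetical input).
    """
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    corrected = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            r2 = (x - cx) ** 2 + (y - cy) ** 2              # squared distance from center
            cos_beta = f_px / math.sqrt(f_px * f_px + r2)   # cosine of the off-axis angle
            new_row.append(value / cos_beta ** 4)           # divide out predicted fall-off
        corrected.append(new_row)
    return corrected


# Usage: a synthetic flat gray scene (value 100) vignetted by cos^4, then corrected.
F_PX = 20.0
H, W = 5, 7
flat = [[100.0 * (F_PX / math.sqrt(F_PX ** 2 + (x - 3.0) ** 2 + (y - 2.0) ** 2)) ** 4
         for x in range(W)] for y in range(H)]
restored = devignette(flat, F_PX)      # every pixel is back to approximately 100
```

Dividing by the fall-off also amplifies noise in the corners, which is one reason heavy vignetting correction can make image edges look grainier.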
By understanding vignetting, photographers and students can improve their image composition and optimize
camera settings for better brightness distribution across the frame.
When light travels from a source, reflects off surfaces, and passes through a camera’s lens, it eventually reaches the imaging sensor. The question is: how does this light get converted into the digital RGB (Red, Green, Blue) values that make up a digital image?
A digital camera follows a series of steps to process light into an image. This involves exposure settings (shutter speed and gain), non-linear adjustments, sampling, aliasing, and noise reduction. Researchers like Healey and Kondepudy (1994), Tsin, Ramesh, and Kanade (2001), and Liu, Szeliski et al. (2008) have developed simplified models to explain these processes. More advanced models, such as that of Chakrabarti, Scharstein, and Zickler (2009), use 24 parameters for better accuracy. Recent research by Brooks, Mildenhall et al. (2019) has focused on reversing (“unprocessing”) the in-camera processing of JPEG images to recover approximate RAW sensor data, which helps in training noise-reduction (denoising) methods.
When light reaches the camera sensor, it is detected by an active sensing area. The sensor integrates light
for a specific duration, known as exposure time (or shutter speed), measured in fractions of a second like
1/125, 1/60, or 1/30. The captured signal is then passed through amplifiers before being converted into
digital data.
There are two main types of image sensors used in modern digital cameras:
Charge-Coupled Device (CCD):
o The accumulated charge is transferred in a step-by-step manner (like a bucket brigade)
to sense amplifiers, which convert the signal for digital processing.
o Older CCDs had an issue called blooming, where excess charge from overexposed pixels spilled into neighboring ones. However, modern CCDs have anti-blooming technology to prevent this.
Complementary Metal-Oxide-Semiconductor (CMOS):
o Each pixel has its own amplifier, making data transfer faster and more power-efficient than in CCDs.
o CMOS sensors are widely used in modern digital cameras, smartphones, and video
recording devices.
Key Factors Affecting Image Sensor Performance
Shutter Speed:
Controls how long light is allowed to hit the sensor.
o Fast shutter speed (e.g., 1/1000s): Reduces motion blur, useful for action shots.
o Slow shutter speed (e.g., 1/30s): Allows more light in, useful in low-light conditions.
Sampling Pitch:
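o The physical spacing between adjacent sensor cells: a smaller pitch gives higher resolution, but each cell collects fewer photons, making it less light-sensitive and more prone to noise.
Noise: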
o Comes from different sources (e.g., sensor defects, electrical interference, photon
limitations).
o Noise is more noticeable in low-light images.
o Cameras use noise reduction algorithms to improve quality.
ADC (Analog-to-Digital Conversion):
o Converts the amplified analog sensor signal into digital values (e.g., 8-bit JPEG or 16-bit RAW).
o More bits allow finer quantization and a potentially higher dynamic range, but the number of usable bits is limited by the noise level.
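To see how exposure time, gain, noise, and ADC bit depth interact, here is a toy simulation of a single pixel in Python. It is a simplified sketch under stated assumptions, not a model of any real camera: the electron rate, gain, and read noise are made-up values, and shot noise uses a Gaussian approximation to the Poisson distribution.

```python
import random


def simulate_pixel(electrons_per_s, exposure_s, gain, read_noise, bits, rng):
    """Toy sensor model: collected charge + noise -> amplifier gain -> ADC.

    electrons_per_s: mean photo-electrons per second for this pixel (hypothetical);
    shot noise uses a Gaussian approximation to the Poisson distribution.
    """
    mean = electrons_per_s * exposure_s                 # longer exposures collect more light
    shot = rng.gauss(0.0, mean ** 0.5)                  # photon (shot) noise, std = sqrt(mean)
    read = rng.gauss(0.0, read_noise)                   # electronic read noise
    signal = gain * max(0.0, mean + shot + read)        # amplified analog signal
    max_code = 2 ** bits - 1
    return min(max_code, round(signal))                 # quantize and clip to the ADC range


rng = random.Random(0)
fast = [simulate_pixel(50_000, 1 / 1000, 0.5, 3.0, 12, rng) for _ in range(500)]
slow = [simulate_pixel(50_000, 1 / 30, 0.5, 3.0, 12, rng) for _ in range(500)]
clipped = [simulate_pixel(50_000, 1 / 30, 0.5, 3.0, 8, rng) for _ in range(10)]  # 8-bit ADC saturates at 255
```

Because shot noise grows only as the square root of the signal, the long exposure has a much better signal-to-noise ratio; the 8-bit run shows how a small ADC range clips the same signal.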
White Balance: Adjusts color to match real-world lighting conditions.
Gamma Correction: Adjusts brightness and contrast for better display.
Compression: Reduces file size (e.g., JPEG) while preserving quality.
Advancements in Camera Technology
Modern sensors are improving rapidly with advancements in image processing, noise reduction, and depth sensing. Conferences like the IS&T Symposium on Electronic Imaging Science and Technology and sources like the Image Sensors World blog track these developments.
If the sampling rate is too low, different signals become indistinguishable.
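This is aliasing. By the Nyquist criterion, a signal must be sampled at more than twice its highest frequency to be represented unambiguously. The minimal Python sketch below (frequencies chosen for illustration) samples a 2 Hz and a 12 Hz cosine at 10 samples per second; because 12 Hz exceeds the 5 Hz Nyquist limit, its samples coincide with those of the 2 Hz signal:

```python
import math


def sample_cosine(freq_hz, rate_hz, n):
    """Sample cos(2 pi f t) at times t = k / rate for k = 0 .. n-1."""
    return [math.cos(2.0 * math.pi * freq_hz * k / rate_hz) for k in range(n)]


RATE = 10.0                                  # samples/s, so the Nyquist limit is 5 Hz
low = sample_cosine(2.0, RATE, 20)           # below the Nyquist limit
high = sample_cosine(12.0, RATE, 20)         # 12 Hz = 2 Hz + RATE, above the limit
worst = max(abs(a - b) for a, b in zip(low, high))   # tiny (floating-point round-off only)
```

Once sampled, nothing distinguishes the two signals, so the 12 Hz content masquerades as 2 Hz; this is why the high frequencies must be removed (e.g., by optical blur) before sampling.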
100% fill-factor sensor
High-quality 9-tap filter
More examples of decimation filters in Section 3.5.2 and Figure 3.29.
Point Spread Function (PSF) & Aliasing Prediction
PSF (Point Spread Function) helps estimate aliasing effects.
Represents the sensor’s response to an ideal point light source.
PSF is a result of:
o Optical system (lens) blur
o Finite sensor resolution
The modulation transfer function (MTF), the magnitude of the Fourier transform of the PSF, can be used to estimate aliasing from the area of the Fourier magnitude that lies outside the Nyquist frequency band.
Defocusing the lens increases the blur radius to 2s (Figure 2.27c):
o Aliasing decreases significantly.
o But image detail also decreases (frequencies closer to f = fs are lost).
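This trade-off can be illustrated in closed form if the PSF is assumed to be Gaussian, an assumption for illustration rather than the PSF used in Figure 2.27. With a pixel pitch of 1, the Nyquist frequency is 0.5 cycles/pixel, and the MTF of a Gaussian PSF with standard deviation σ is $\exp(-2\pi^2\sigma^2 f^2)$. A minimal Python sketch (function names are hypothetical):

```python
import math


def gaussian_mtf(freq, sigma):
    """MTF of a Gaussian PSF with standard deviation sigma (pixels) at freq (cycles/pixel)."""
    return math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * freq ** 2)


def aliasing_fraction(sigma, nyquist=0.5):
    """Share of the MTF's area that lies beyond the Nyquist frequency.

    For MTF(f) = exp(-a f^2) with a = 2 pi^2 sigma^2, the area beyond f_N divided by
    the total area over f >= 0 equals erfc(f_N * sqrt(a)).
    """
    return math.erfc(nyquist * math.sqrt(2.0) * math.pi * sigma)


for sigma in (0.5, 1.0):                       # "sharp" vs. "defocused" PSF, in pixels
    print(f"sigma={sigma}: aliasing share {aliasing_fraction(sigma):.4f}, "
          f"MTF at Nyquist {gaussian_mtf(0.5, sigma):.4f}")
```

Doubling the blur cuts the aliasing share sharply, but the MTF at the Nyquist frequency drops too, which is the loss of fine detail noted above.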
Estimating PSF in Laboratory Conditions
Using a pinhole:
o A small light source is placed behind a pinhole in a piece of black cardboard.
o The pinhole's image forms the PSF, but only at pixel-level accuracy.
o It models large blur (e.g., defocus blur) but not sub-pixel detail.
Alternative method (Section 10.1.4):
o Uses a calibration pattern (e.g., slanted step edges).
The correct subtractive primary colors are:
o Cyan (light blue green)
o Magenta (pink)
o Yellow
Black is also used in four-color printing (CMYK).
Subtractive colors absorb certain wavelengths in the color spectrum.
Additive Color Mixing
Primary colors in additive mixing:
o Red, Green, Blue (RGB)
These combine (e.g., in TVs & monitors) to produce:
o Cyan, Magenta, Yellow, White, etc.
Additive colors are used in light-emitting devices.
How Do Two Colors Create a Third Color?
When red and green produce yellow, are wavelengths being mixed?
No, colors do not mix at a wavelength level.
This effect is due to the tri-stimulus (or tri-chromatic) nature of human vision:
o The eye has three types of cone cells, each responding to different wavelengths.
o Our brain interprets combined signals as new colors.
Note that for machine vision applications, such as remote sensing and terrain classification, it is
preferable to use many more wavelengths. Similarly, surveillance applications can often benefit
from sensing in the near-infrared (NIR) range.
CIE RGB and XYZ :
Based on the tri-chromatic theory of perception, which holds that the color of any monochromatic (single-wavelength) light can be matched by mixing three suitably chosen primary colors.
The CIE (Commission Internationale d’Éclairage) standardized the RGB representation
in the 1930s.
Primary colors and their wavelengths:
o Red → 700.0 nm
o Green → 546.1 nm
o Blue → 435.8 nm
1. Color Matching Experiment
Performed with a standard observer (average perceptual results over multiple subjects).
Key findings:
o For certain blue-green spectra, negative amounts of red had to be added to
achieve a color match.
o This means that some spectral colors cannot be represented using only non-negative amounts of the R, G, and B primaries.
2. Metamers & Lighting Effects
Metamers: colors with different spectral compositions that nevertheless appear identical under certain lighting.
Example: Two fabrics or paints may match in one lighting condition but look different in
another.
CIE XYZ Color Space
Developed by the CIE to address issues in the RGB model, especially the problem of mixing negative light.
XYZ contains all pure spectral colors with only positive values, making it more practical for color representation.
Chromaticity Diagram
The chromaticity diagram is formed by sweeping the monochromatic wavelength λ from 380 nm to 800 nm and plotting the resulting chromaticity coordinates (x, y).
Key observations:
o The outer curve represents all possible spectral colors.
o The straight-line segment (purple line) connects the endpoints and represents non-spectral purples.
The inset triangle spans the red, green, and blue primaries used in the RGB color-matching experiments.
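The (x, y) coordinates in the diagram are the chromaticity coordinates x = X/(X+Y+Z) and y = Y/(X+Y+Z), which discard overall brightness so that only hue and saturation remain. A minimal sketch; the XYZ values used in the example are the standard D65 daylight white normalized to Y = 1.

```python
def chromaticity(X, Y, Z):
    """Project an XYZ tristimulus value onto the (x, y) chromaticity plane.
    (The third coordinate z = 1 - x - y is redundant.)"""
    s = X + Y + Z
    return X / s, Y / s

# D65 daylight white (Y normalized to 1) lands near (0.3127, 0.3290).
x, y = chromaticity(0.950456, 1.0, 1.088754)

# Scaling the light's intensity leaves its chromaticity unchanged.
x2, y2 = chromaticity(2 * 0.950456, 2 * 1.0, 2 * 1.088754)
```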
Color cameras
RGB Video Cameras and Color Representation
How RGB Cameras Work
RGB and XYZ models describe perceived color but do not explain how cameras capture
color.
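Cameras do not simply measure the amount of light at the nominal red (700 nm), green (546.1 nm), and blue (435.8 nm) wavelengths, and color monitors do not emit only these wavelengths.
A monitor cannot emit negative red light, so it cannot exactly reproduce colors (such as saturated cyans) whose CIE RGB match requires a negative red component.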
Standard Definition & HDTV Color Models
The design of early RGB video cameras was driven by the colored phosphors available for CRT television displays.
The NTSC standard (ITU-R BT.601) mapped the RGB values that drive the CRT's three color guns to XYZ values for consistent color perception.
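HDTV and modern displays follow ITU-R BT.709, which defines the transformation from linear RGB to XYZ (with the D65 daylight illuminant as reference white):
$$\begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \begin{bmatrix} R_{709} \\ G_{709} \\ B_{709} \end{bmatrix}$$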
Color monitors do not strictly emit single wavelengths; they create colors through spectral
mixing.
Different display technologies (CRT, HDTV, modern LCD) use different RGB-to-XYZ
mappings.
Cameras integrate color over a spectrum, not just at fixed wavelengths.
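A minimal sketch of the BT.709 linear-RGB-to-XYZ mapping, using the commonly published BT.709/sRGB matrix for a D65 white point. It assumes the RGB values have already been linearized (gamma removed); feeding gamma-encoded values would give wrong XYZ values.

```python
import numpy as np

# ITU-R BT.709 (also sRGB) primaries with D65 white: linear RGB -> CIE XYZ.
RGB709_TO_XYZ = np.array([
    [0.412453, 0.357580, 0.180423],
    [0.212671, 0.715160, 0.072169],
    [0.019334, 0.119193, 0.950227],
])
XYZ_TO_RGB709 = np.linalg.inv(RGB709_TO_XYZ)  # the inverse maps XYZ back to RGB

def rgb709_to_xyz(rgb):
    """Map linear BT.709 RGB in [0, 1] to XYZ; accepts a triple or an (..., 3) array."""
    return np.asarray(rgb, dtype=float) @ RGB709_TO_XYZ.T

def xyz_to_rgb709(xyz):
    return np.asarray(xyz, dtype=float) @ XYZ_TO_RGB709.T

white = rgb709_to_xyz([1.0, 1.0, 1.0])  # reference white maps to D65 with Y = 1
```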
Camera Spectral Sensitivities and Color Filter Arrays:
Camera spectral sensitivities are not always publicly available unless the manufacturer provides data
or they are measured using monochromatic light. Standards like BT.709 do not define these
sensitivities, only the transformation needed to match standard color values. This means
manufacturers can use different sensor sensitivities as long as they can be converted to standard RGB
values through a linear transformation.
TV and computer monitors follow standard RGB output rules but can apply digital transformations
before displaying colors. Properly calibrated monitors provide color management, ensuring
consistency between real-life colors, screens, and printed images.
Early color cameras used three separate tubes or RGB sensors for capturing images. Modern digital
cameras, however, use a Color Filter Array (CFA), where individual sensors are covered with color
filters instead of using separate chips for red, green, and blue.
The Bayer pattern is the most common CFA used today. It has twice as many green filters as red and
blue, arranged in a checkerboard-like structure. This is because human vision is more sensitive to
luminance (brightness) than to chrominance (color). Green light plays a major role in luminance,
which helps maintain sharpness in images. This property is also used in color image compression
techniques.
Since each pixel only captures one color, missing color values are estimated using a process called demosaicing. This technique reconstructs full-color images by filling in the missing red, green, and blue values at each pixel. Similarly, color LCD monitors typically place stripes of red, green, and blue filters in front of their liquid crystal areas. This setup creates the perception of full-color images on the screen.
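The sketch below simulates a Bayer sensor and then performs simple bilinear demosaicing. It assumes an RGGB layout (actual sensors use different phases of the pattern) and fills each missing value with a normalized weighted average of the nearest samples of that color; practical demosaicing algorithms are considerably more sophisticated.

```python
import numpy as np

def bayer_masks(h, w):
    """Boolean masks for an RGGB Bayer layout (assumed here; sensors differ)."""
    rows, cols = np.mgrid[0:h, 0:w]
    r = (rows % 2 == 0) & (cols % 2 == 0)
    b = (rows % 2 == 1) & (cols % 2 == 1)
    g = ~(r | b)                       # green fills the remaining half of the pixels
    return r, g, b

def mosaic(rgb):
    """Simulate a CFA sensor: keep only one color sample per pixel."""
    raw = np.zeros(rgb.shape[:2])
    for c, m in enumerate(bayer_masks(*rgb.shape[:2])):
        raw[m] = rgb[..., c][m]
    return raw

def filter3x3(img, k):
    """3 x 3 filtering with zero padding (k is symmetric, so correlation = convolution)."""
    p = np.pad(img, 1)
    H, W = img.shape
    return sum(k[i, j] * p[i:i + H, j:j + W] for i in range(3) for j in range(3))

def demosaic_bilinear(raw):
    """Estimate each missing color as the average of the nearest samples of that color."""
    k = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]])
    out = np.zeros(raw.shape + (3,))
    for c, m in enumerate(bayer_masks(*raw.shape)):
        m = m.astype(float)
        est = filter3x3(raw * m, k) / filter3x3(m, k)  # normalized weighted average
        out[..., c] = np.where(m > 0, raw, est)        # keep measured samples unchanged
    return out

flat = np.ones((4, 6, 3)) * np.array([0.2, 0.5, 0.7])  # a uniform color patch
recovered = demosaic_bilinear(mosaic(flat))
```

A uniform patch is recovered exactly; real images show the familiar color fringing near sharp edges, which is what more advanced demosaicing methods try to avoid.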
The human visual system is more sensitive to brightness (luminance) than to color details
(chrominance). Because of this, digital techniques can be used to enhance the sharpness of images.
By applying special filtering methods to RGB or monochrome images, the perception of crispness
can be improved, making images appear clearer and more detailed.
Color Balance:
Cameras adjust color balance before encoding RGB values to ensure that white areas
appear truly white. This process shifts the white point of an image to compensate for
different lighting conditions. If the camera and lighting system match (such as using
daylight illuminant D65 in BT.709), minimal adjustment is needed. However, under
strong-colored lighting, like warm indoor lighting that adds a yellow or orange tint,
significant corrections are required.
A simple way to adjust color balance is by multiplying each RGB value by a specific factor,
effectively scaling the colors to achieve a more neutral white. More advanced methods use a 3×3
color transformation matrix, which can map colors to XYZ space and back, creating a more refined
correction known as a "color twist."
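A minimal sketch of the simple per-channel approach, assuming we know the camera's reading of a surface that should appear neutral (e.g., a gray card). Normalizing the gains to the green channel is a design choice made here for illustration, and the example reading is made up.

```python
import numpy as np

def white_balance(img, white_rgb):
    """Per-channel (diagonal) color balance.
    img: (..., 3) linear RGB; white_rgb: the camera's reading of a surface known
    to be neutral (white or gray) under the current illuminant."""
    white = np.asarray(white_rgb, dtype=float)
    gains = white[1] / white   # scale R and B to match G; G itself is unchanged
    return img * gains         # broadcasts the three gains over the channel axis

# Under warm indoor light a gray card might read as reddish, e.g. (0.8, 0.6, 0.3).
img = np.array([[[0.8, 0.6, 0.3], [0.4, 0.3, 0.15]]])
balanced = white_balance(img, [0.8, 0.6, 0.3])
```

A full 3×3 "color twist" generalizes this by replacing the diagonal gains with an arbitrary matrix.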
Gamma
Early black-and-white televisions used phosphors in CRT displays that responded non-
linearly to voltage, meaning the brightness did not increase proportionally with the input
signal. This relationship is described by a value called gamma (γ), which characterizes how
the display converts input voltage V into visible brightness B:
$$B = V^{\gamma},$$
with a γ of about 2.2. To compensate for this effect, the electronics in the TV camera would pre-map the sensed luminance Y through an inverse gamma,
$$Y' = Y^{1/\gamma},$$
with a typical value of 1/γ = 0.45.
Gamma Correction in Imaging and Its Challenges
In the early days of analog television, signals were gamma-corrected before being transmitted. This adjustment had an additional benefit: because the display's power-law response compresses small signal values, noise added during transmission became less visible in darker areas of the image. This mattered because human vision responds to relative rather than absolute changes in brightness, so noise is most noticeable in dark regions.
When color television was introduced, the same gamma correction process was applied separately to the red, green, and blue signals before they were combined and encoded. Modern digital systems no longer suffer from analog transmission noise, but gamma encoding is still useful: quantization and compression introduce errors of their own, and applying the inverse gamma before quantizing makes those errors less visible.
However, gamma correction can be problematic in computer vision and graphics. Many image-
processing techniques, such as shading and lighting simulations, assume a linear brightness scale. For
accurate results, calculations should be performed in a linear space before gamma correction is
applied for display. Unfortunately, many graphics systems do not follow this approach, leading to
errors in brightness and color representation.
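To see why the order of operations matters, the sketch below averages a black and a white pixel (as a 50% blend or a blur would) once directly on gamma-encoded values and once in linear space. It assumes a pure power-law gamma of 2.2; real standards such as sRGB use a slightly different piecewise curve.

```python
import numpy as np

GAMMA = 2.2  # simple power-law model (an assumption; sRGB uses a piecewise curve)

def encode(linear):
    """Camera side: apply the inverse gamma, Y' = Y ** (1 / gamma)."""
    return np.power(linear, 1.0 / GAMMA)

def decode(encoded):
    """Display side: undo the encoding, Y = Y' ** gamma."""
    return np.power(encoded, GAMMA)

black, white = encode(0.0), encode(1.0)

naive = (black + white) / 2                             # averaged on encoded values
correct = encode((decode(black) + decode(white)) / 2)   # averaged in linear space
```

The encoded-space average of 0.5 decodes to only about 0.22 in linear intensity, so the naive blend looks too dark; the linear-space result encodes to about 0.73.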
Gamma Correction and Its Role in Computer Vision
Many computer graphics systems directly use RGB values and display them without adjusting for
gamma. However, newer imaging standards like 16-bit scRGB use a linear space, reducing gamma-
related issues.
In computer vision, working with accurate brightness levels is essential. Techniques like photometric
stereo (used to estimate surface details) and image deblurring require brightness values to be in a
linear intensity scale. Therefore, before performing precise image calculations, it's important to
remove gamma correction and undo any automatic color adjustments made by the camera.
Researchers like Chakrabarti, Scharstein, and Zickler (2009) developed a 24-parameter model to
accurately simulate modern digital camera processing. They also provide a dataset of images useful
for testing.
However, in some vision applications, such as feature detection, stereo vision, or motion
estimation, linearization is not always necessary. Deciding whether to remove gamma requires
careful thought—for example, in image stitching, adjusting for brightness differences might not
require full gamma correction.
Understanding these processing steps can be complex. A useful way to explore these effects is through
hands-on testing—taking pictures of color charts and comparing RAW images with JPEG-
compressed versions.
Color Spaces
RGB and XYZ are the main color spaces used to describe color signals. However, other
representations have been developed for video and image processing.
One of the earliest color representations for video transmission was the YIQ standard (used in NTSC
video in North America) and the YUV standard (used in PAL video in Europe). These systems were
designed to include a luma (Y) channel, which mimics true luminance and is similar to a black-and-
white TV signal, along with two chroma channels for color information.
In these systems, the Y' (luma) signal is computed from the gamma-compressed components R'G'B' as follows:
$$Y'_{601} = 0.299\,R' + 0.587\,G' + 0.114\,B'.$$
In NTSC and PAL video systems, the chroma signals (color information) were filtered and
superimposed on the Y' luma signal. This allowed older black-and-white TVs to ignore the color
information, while color TVs could process it correctly.
In modern digital video and still image compression, the YCbCr color space is commonly used. It
is closely related to YUV but uses different scale factors to fit within the 8-bit digital signal range.
Y' (Luma) represents brightness and is adjusted within the range [16…235] in video formats.
Cb and Cr (Chroma) represent color differences and are scaled within [16…240] in video
formats.
For JPEG images, the full 8-bit range [0…255] is used for all values.
Here, R'G'B' are the gamma-compressed 8-bit RGB values.
This conversion is mainly handled by software when opening or saving an image.
It is useful for compression and color processing but may require careful handling in image
deblocking (removing artifacts from compressed images).
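A minimal sketch of the full-range JPEG (JFIF) conversion from gamma-compressed 8-bit R'G'B' values. The coefficients are the standard JFIF ones; 128 is added to Cb and Cr so that neutral grays map to the middle of the 8-bit range.

```python
import numpy as np

# JPEG/JFIF full-range R'G'B' -> Y'CbCr (all values in [0, 255]).
M = np.array([
    [ 0.299,     0.587,     0.114    ],
    [-0.168736, -0.331264,  0.5      ],
    [ 0.5,      -0.418688, -0.081312 ],
])
OFFSET = np.array([0.0, 128.0, 128.0])

def rgb_to_ycbcr(rgb):
    """Convert an (..., 3) array of 8-bit R'G'B' values to 8-bit Y'CbCr."""
    ycc = np.asarray(rgb, dtype=float) @ M.T + OFFSET
    return np.clip(np.round(ycc), 0, 255).astype(np.uint8)

pixels = np.array([[255, 255, 255], [0, 0, 0], [100, 100, 100]])
ycc = rgb_to_ycbcr(pixels)  # grays keep Cb = Cr = 128
```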
HSV Color Space
Another commonly used color space is HSV (Hue, Saturation, Value).
Hue represents the color type.
Saturation measures color intensity.
Value represents brightness (similar to luma).
This model is useful for intuitive color adjustments and computer vision applications.
HSV and Color Ratios
HSV (Hue, Saturation, Value) Color Space
HSV is another way to represent colors by transforming RGB values into:
o Hue (H): The color type, represented as an angle around a color wheel.
o Saturation (S): The intensity of the color, with higher values indicating purer colors
and lower values appearing more gray.
o Value (V): The brightness of the color, which can be defined as either the mean or the
maximum RGB value.
Usage:
o It approximates the Munsell color chart, a system for organizing colors based on
human perception.
o In HSV images, saturation is often shown in grayscale (darker = more saturated),
while hue is represented in full color.
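A sketch of the RGB-to-HSV conversion using the maximum definition of value; hue is returned in degrees around the color wheel. Python's standard colorsys module implements the same conversion (with hue expressed as a fraction of a full turn), which the example uses as a cross-check.

```python
import colorsys

def rgb_to_hsv(r, g, b):
    """RGB in [0, 1] -> (hue in degrees, saturation, value), using V = max(R, G, B)."""
    v = max(r, g, b)
    c = v - min(r, g, b)            # chroma: spread between largest and smallest channel
    s = 0.0 if v == 0 else c / v    # 0 for grays, 1 for fully saturated colors
    if c == 0:
        h = 0.0                     # hue is undefined for grays; use 0 by convention
    elif v == r:
        h = 60.0 * (((g - b) / c) % 6)
    elif v == g:
        h = 60.0 * ((b - r) / c + 2)
    else:
        h = 60.0 * ((r - g) / c + 4)
    return h, s, v

# Cross-check with the standard library, which reports hue as a fraction of a turn.
h_deg, s, v = rgb_to_hsv(0.2, 0.6, 0.4)
h_frac, s_std, v_std = colorsys.rgb_to_hsv(0.2, 0.6, 0.4)
```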
Using Color Ratios for Image Processing
If an algorithm should only affect luminance (brightness) and not saturation or hue, an
alternative approach is to use:
o Yxy color space, which separates luminance and chromaticity.
o Simpler color ratios, defined as:
$$r = \frac{R}{R+G+B}, \quad g = \frac{G}{R+G+B}, \quad b = \frac{B}{R+G+B}.$$
o These ratios help maintain color balance while modifying brightness.
Application:
o After adjusting the luma component (e.g., through histogram equalization), colors
can be restored by multiplying each color ratio by the ratio of new-to-old luma values.
o There are many color models, but in practice, their differences may not matter much
depending on the application.
o The L*a*b* color space, designed to align with human perception, serves a similar purpose to HSV.
o For more details, Charles Poynton's Color FAQ (available online) is a helpful
resource.
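A minimal sketch of the luma-ratio idea: rescaling all three channels of a pixel by the ratio of new to old luma leaves the color ratios r, g, b unchanged. The BT.601 luma weights are assumed for illustration; values pushed above the valid range would still need clipping, which does alter the ratios.

```python
import numpy as np

LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])  # BT.601 weights (assumed here)

def luma(rgb):
    """Weighted sum over the last (channel) axis."""
    return rgb @ LUMA_WEIGHTS

def set_luma(rgb, new_luma, eps=1e-8):
    """Rescale each pixel's RGB so its luma becomes new_luma, keeping R:G:B fixed."""
    scale = new_luma / np.maximum(luma(rgb), eps)  # ratio of new to old luma
    return rgb * scale[..., None]

rgb = np.array([[0.2, 0.4, 0.1], [0.5, 0.1, 0.3]])
target = 1.5 * luma(rgb)   # e.g., the output of a luma-only enhancement step
out = set_luma(rgb, target)
```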
1. Final Step in Camera Processing
o Before storing an image, most cameras compress the data (except when using formats
like RAW or PNG, which are lossless).
2. YCbCr Conversion
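o Images and videos are converted from RGB to YCbCr to prioritize luminance (Y) over color details (Cb, Cr), since human eyes are more sensitive to brightness than to color.
o The chroma channels (Cb, Cr) are then typically subsampled (e.g., by a factor of two horizontally and vertically in JPEG) to save space.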
3. Transforming the Image
o After reducing color data, the image is split into blocks for further compression.
o The Discrete Cosine Transform (DCT) is used, which is similar to the Discrete
Fourier Transform (DFT) but works with real values.
o DCT helps by concentrating most of the image’s energy into fewer values, making
compression more efficient.
o This method is used in JPEG (images) and MPEG (videos).
Image compression reduces file size by focusing on important details (luminance) while reducing less
noticeable details (color), using YCbCr conversion, subsampling, and DCT transformation.
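A minimal sketch of the 8 × 8 DCT used in JPEG, built as an orthonormal DCT-II matrix (the orthonormal scaling is an assumption here; codecs differ in how they scale coefficients). A smooth block ends up with nearly all of its energy in one or two low-frequency coefficients, which is what makes the subsequent quantization effective.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row u is cos(pi * (2x + 1) * u / (2n)), suitably scaled."""
    x = np.arange(n)
    u = np.arange(n)[:, None]
    C = np.cos(np.pi * (2 * x + 1) * u / (2 * n))
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    return C

C = dct_matrix(8)

def dct2(block):
    """2D DCT of an 8 x 8 block: transform along columns, then along rows."""
    return C @ block @ C.T

# A smooth 8 x 8 block (a gentle horizontal ramp from 100 to 120).
block = np.tile(np.linspace(100.0, 120.0, 8), (8, 1))
coeffs = dct2(block)
energy = coeffs ** 2
share = energy[0, :2].sum() / energy.sum()  # DC plus the first horizontal frequency
```

Because the basis is orthonormal, the total energy of the coefficients equals that of the pixels; the transform only redistributes it.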
2. Quantization and Coding
o After applying a transform (like DCT), the coefficients are quantized into small integers.
o These are then entropy-coded using Huffman coding or arithmetic coding.
o The DC coefficient of each block (its zero-frequency, block-average term) is predicted from the previous block's DC value.
o Quality settings in JPEG control the step size in quantization.
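A sketch of uniform quantization with a single step size. Real JPEG uses a separate step for each of the 64 DCT frequencies, scaled by the quality setting; the coefficient values below are made up for illustration.

```python
import numpy as np

def quantize(coeffs, step):
    """Uniform scalar quantization: each coefficient becomes a small integer index."""
    return np.round(coeffs / step).astype(int)

def dequantize(indices, step):
    """Decoder side: reconstruct approximate coefficient values."""
    return indices * step

coeffs = np.array([880.0, -45.3, 12.1, -4.8, 2.2, -1.1, 0.6, -0.2])  # made-up values
fine = quantize(coeffs, 2)     # small step: higher quality, larger file
coarse = quantize(coeffs, 16)  # large step: lower quality, many zeros to code cheaply
```

The reconstruction error of each coefficient is at most half the step size, and the runs of zeros produced by coarse steps are what the entropy coder compresses well.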
3. Motion Compensation in Videos
o Motion compensation is used in video compression to predict pixel values from
previous frames.
o MPEG uses 16×16 motion blocks, while newer standards use adaptive block sizes
and sub-pixel motion compensation for better efficiency.
o I-frames (independent frames) are mixed with P-frames (predicted frames) for better
error recovery and random access.
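A minimal full-search block-matching sketch: for one block of the current frame, try every displacement within a small search window in the previous frame and keep the one with the smallest sum of absolute differences (SAD). The 16 × 16 block size follows the MPEG description above; the ±4 pixel search range and the SAD criterion are assumptions chosen for illustration.

```python
import numpy as np

def best_motion_vector(prev, cur, top, left, block=16, search=4):
    """Return (dy, dx) such that prev[top+dy : top+dy+block, left+dx : left+dx+block]
    best matches the block of `cur` at (top, left), by sum of absolute differences."""
    target = cur[top:top + block, left:left + block].astype(float)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue  # candidate block would fall outside the previous frame
            sad = np.abs(prev[y:y + block, x:x + block].astype(float) - target).sum()
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best

# Synthetic test: the current frame is the previous frame shifted 2 px down, 3 px left,
# so the matching block in the previous frame lies 2 px up and 3 px right.
rng = np.random.default_rng(1)
prev = rng.random((64, 64))
cur = np.roll(prev, shift=(2, -3), axis=(0, 1))
mv = best_motion_vector(prev, cur, top=24, left=24)
```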
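4. Measuring Compression Quality
o Peak Signal-to-Noise Ratio (PSNR) is commonly used to evaluate compression quality.
o PSNR is derived from the Mean Square Error (MSE), which measures the difference between the original and compressed image: $\mathrm{PSNR} = 10 \log_{10} (I_{\max}^2 / \mathrm{MSE})$, where $I_{\max}$ is the maximum pixel value (255 for 8-bit images).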
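A minimal sketch of PSNR computed from the MSE, assuming 8-bit images (maximum value 255).

```python
import numpy as np

def psnr(original, compressed, max_val=255.0):
    """PSNR = 10 log10(max_val^2 / MSE), in decibels."""
    diff = original.astype(float) - compressed.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: no error at all
    return 10.0 * np.log10(max_val ** 2 / mse)

a = np.full((4, 4), 100, dtype=np.uint8)
b = a + 1          # every pixel off by one gray level, so MSE = 1
score = psnr(a, b) # equals 20 log10(255), roughly 48.1 dB
```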
Compression improves storage and transmission by reducing file size while preserving important
details, using techniques like block transforms, quantization, coding, and motion compensation.
While this is just a high-level sketch of how image compression works, it is useful to understand so
that the artifacts introduced by such techniques can be compensated for in various computer vision
applications.
3.1 Point operators
The simplest kinds of image processing transforms are point operators, where each output
pixel’s value depends on only the corresponding input pixel value (plus, potentially, some globally collected information or parameters).
Image Processing Operators
1. Basic Adjustments
o Image processing includes brightness and contrast adjustments, color correction,
and transformations.
o These are also called point processes, as they modify individual pixel values
independently.
2. Types of Image Processing Operations
o Simple point operators like brightness scaling and image addition adjust image
intensity.
o Color manipulation changes the appearance of colors in an image.
3.1.1 Pixel transforms:
Definition of Pixel Transforms
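A pixel transform is a function h that takes one or more input images and produces an output image.
For a single input image, it is represented mathematically as:
$$g(x) = h(f(x)),$$
or, for discrete images with pixel coordinates (i, j), $g(i, j) = h(f(i, j))$.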
Pixel transforms modify images by applying functions to pixel values, commonly adjusting
brightness and contrast for image enhancement.
Gain and Bias Parameters
A common pixel transform is multiplication by a gain and addition of a bias: $g(x) = a\,f(x) + b$.
Gain (a) controls contrast.
Bias (b) controls brightness.
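A minimal sketch of the gain and bias transform on an 8-bit image. Converting to floating point before the arithmetic avoids uint8 wrap-around, and the final clip back to [0, 255] is a common but not universal choice.

```python
import numpy as np

def gain_bias(img, a, b):
    """g(x) = a * f(x) + b, rounded and clipped back to the valid 8-bit range."""
    out = a * img.astype(float) + b    # float math: no uint8 overflow or wrap-around
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

img = np.array([[0, 50], [100, 200]], dtype=np.uint8)
brighter = gain_bias(img, 1.0, 40)         # bias shifts every value up by 40
more_contrast = gain_bias(img, 1.5, -60)   # gain stretches values; negatives clip to 0
```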
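Linear operations obey the superposition principle, $h(f_0 + f_1) = h(f_0) + h(f_1)$; examples include multiplicative gain and the linear blend $g(x) = (1 - \alpha) f_0(x) + \alpha f_1(x)$ used to cross-dissolve between two images. Adding a bias b ≠ 0 makes an operation affine rather than strictly linear.
Non-linear operations, like gamma correction, adjust image tones for better visual representation.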
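A quick numerical check of the superposition principle on two random images, applied to three example point operators (the specific gain, bias, and gamma values are arbitrary): only the pure gain passes.

```python
import numpy as np

def obeys_superposition(h, f0, f1):
    """Check h(f0 + f1) == h(f0) + h(f1) numerically on two sample images."""
    return bool(np.allclose(h(f0 + f1), h(f0) + h(f1)))

rng = np.random.default_rng(0)
f0, f1 = rng.random((4, 4)), rng.random((4, 4))

operators = {
    "gain": lambda f: 1.5 * f,           # multiplicative gain: linear
    "bias": lambda f: f + 0.1,           # additive bias: affine, not linear
    "gamma": lambda f: f ** (1 / 2.2),   # gamma correction: non-linear
}
results = {name: obeys_superposition(h, f0, f1) for name, h in operators.items()}
```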
1. Color Image Basics
o Color images have three channels (RGB) that are closely related.
o Adjusting colors affects image brightness, contrast, and overall appearance.
2. Effects of Brightening
o Simply adding a constant value to all RGB channels increases intensity but can also
change hue and saturation.
o To brighten without affecting color balance, use chromaticity coordinates or color
ratios.
3. Color Balancing Techniques
o Fixes color distortions caused by different lighting conditions (e.g., incandescent
light).
o Methods include:
Multiplying each RGB channel by a different scale factor.
Converting to XYZ color space, adjusting the white point, and converting back to RGB.
o An alpha-matted image is an intermediate version of the cut-out object before placing it on a
new background.
o It contains four channels: Red (R), Green (G), Blue (B), and Alpha (A).
3. Alpha Channel (Opacity Control)
o The alpha channel (A) controls transparency at each pixel.
o Fully opaque pixels (inside the object) have α = 1.
o Fully transparent pixels (outside the object) have α = 0.
o Boundary pixels have values between 0 and 1, creating a smooth transition and avoiding
jagged edges.
This technique helps blend objects naturally into new scenes without harsh edges.
The over operator is used to composite a foreground image over a background image.
Proposed by Porter and Duff (1984) and later studied by Blinn (1994a; 1994b).
The formula used is:
$$C = (1 - \alpha)\,B + \alpha\,F.$$
Functionality
The operator attenuates the influence of the background (B) by a factor of (1 − α).
It then adds the color and opacity values of the foreground layer (F).
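A minimal sketch of the over operator with un-multiplied foreground colors, assuming float images in [0, 1] and a separate (H, W) alpha matte.

```python
import numpy as np

def over(F, alpha, B):
    """Composite foreground F over background B: C = (1 - alpha) * B + alpha * F.
    F, B: (H, W, 3) float images with un-multiplied colors; alpha: (H, W) in [0, 1]."""
    a = alpha[..., None]  # add a channel axis so the matte broadcasts over R, G, B
    return (1.0 - a) * B + a * F

# A red foreground over a blue background: one row of three pixels that go from
# fully transparent, to half transparent (a boundary pixel), to fully opaque.
F = np.tile([1.0, 0.0, 0.0], (1, 3, 1))
B = np.tile([0.0, 0.0, 1.0], (1, 3, 1))
alpha = np.array([[0.0, 0.5, 1.0]])
C = over(F, alpha, B)
```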
Pre-Multiplied Alpha Representation
In many cases, storing and manipulating pre-multiplied alpha values (αF) is convenient.
Advantages of Pre-Multiplied RGBA Representation (Blinn, 1994b):
o Easier to blur or resample (e.g., rotate) images.
o No extra complications when handling alpha-matted images.
o Each RGBA band is treated independently.
When matting using local color consistency (Ruzon and Tomasi 2000; Chuang, Curless et al. 2001), using un-multiplied foreground colors (F) is preferred.
This ensures:
o Colors remain constant or vary slowly near object edges.
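A small example of why pre-multiplied RGBA is easier to resample. When a fully transparent pixel (whose stored color is arbitrary, red here) is averaged with an opaque green pixel, averaging un-multiplied values lets the invisible red leak into the result, while averaging pre-multiplied values treats each band independently and stays correct. The pixel values are made up for illustration.

```python
import numpy as np

def premultiply(rgba):
    """Store alpha * F in the color channels (alpha itself is unchanged)."""
    out = rgba.astype(float).copy()
    out[..., :3] *= out[..., 3:4]
    return out

def straight_over_black(rgba):
    """Composite an un-multiplied RGBA value over a black background: alpha * F."""
    return rgba[..., :3] * rgba[..., 3:4]

# A transparent pixel with a meaningless red color, next to an opaque green pixel.
pixels = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 1.0]])

# Downsampling by averaging un-multiplied RGBA: the invisible red leaks in.
leaky = straight_over_black(pixels.mean(axis=0))

# Averaging pre-multiplied RGBA: over black, the result is simply the color bands.
clean = premultiply(pixels).mean(axis=0)[:3]
```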
Porter and Duff (1984) describe other operations useful for photo editing and visual
effects.
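o Transparent motion occurs when scenes containing such reflections (e.g., light reflected off clean glass added to the light passing through it) are observed from a moving camera (Section 9.4.2).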
Matting Process
Matting: Extracting foreground, background, and alpha matte values from images.
Historical Perspective:
o Smith & Blinn (1996): Traditional blue-screen matting techniques.
o Toyama, Krumm et al. (1999): Review on difference matting.
Xu, Price et al. (2017).
o Aim: Extract mattes from single images (Figure 3.4a) or extended video sequences.
o Chuang, Agarwala et al. (2002) explore techniques for video sequences.
3.1.4 Histogram Equalization
The goal is to spread out pixel intensity values to use the full range of brightness levels.
Why is it Needed?
Basic brightness and gain controls (from Section 3.1.1) can improve an image, but choosing
the best values manually is hard.
Some images have too many dark or bright pixels and few mid-tone values, making them
look unbalanced.
A better approach is to adjust brightness and contrast dynamically to improve visibility.
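How Does it Work?
First, compute the intensity histogram h(I) and its running sum, the cumulative distribution $c(I) = \frac{1}{N}\sum_{i=0}^{I} h(i) = c(I-1) + \frac{1}{N}h(I)$, where N is the number of pixels in the image.
Then, use this sum to remap intensity values to evenly spread out the brightness.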
It works by spreading out intensity values using the Cumulative Distribution Function
(CDF).
This technique is useful for enhancing poorly lit images and ensuring a balanced
brightness range.
Each pixel's intensity (or a student's grade) is converted to a percentile using the Cumulative
Distribution Function (CDF).
When working with 8-bit images (grayscale images), the values range from 0 to 255.
Applying Histogram Equalization
The transformation $f(I) = c(I)$ is applied to adjust pixel brightness.
This results in a flat histogram, meaning pixel values are more evenly distributed.
However, the image may look dull or washed out due to reduced contrast.
Improving the Contrast
To fix this, a partial adjustment is used instead of full equalization: $f(I) = \alpha\, c(I) + (1 - \alpha)\, I$
This blends the original values with the equalized values, keeping some of the natural grayscale
balance.
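A minimal sketch of histogram equalization with the partial-blend parameter α for 8-bit grayscale images. Since c(I) lies in [0, 1], it is scaled by 255 here so that it can be blended with the original intensity I; that scaling is an assumption about units that the formulas leave implicit.

```python
import numpy as np

def equalize(img, alpha=1.0):
    """Histogram equalization of an 8-bit grayscale image, partially blended with
    the identity: f(I) = alpha * c(I) + (1 - alpha) * I, with c(I) scaled to [0, 255]."""
    hist = np.bincount(img.ravel(), minlength=256)      # h(I)
    cdf = np.cumsum(hist) / img.size                    # c(I), rising from ~0 to 1
    levels = np.arange(256)
    lut = alpha * 255.0 * cdf + (1.0 - alpha) * levels  # the remapping f(I) as a table
    return np.round(lut).astype(np.uint8)[img]          # apply the table to every pixel

# A dark, low-contrast test image whose values only span 0..63.
rng = np.random.default_rng(0)
dark = rng.integers(0, 64, size=(32, 32), dtype=np.uint8)
full = equalize(dark)               # values now spread up to 255
partial = equalize(dark, alpha=0.5) # a gentler stretch that keeps some original tones
```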
As shown in Figure 3.7f, this approach creates a more visually appealing result.
Instead of using a single adjustment, the image is divided into M × M pixel blocks.
Each block is equalized separately to adjust contrast locally.
However, this method creates visible blocky artifacts at the boundaries (Figure 3.8b).
o Pizer, Amburn et al. (1987) introduced CLAHE to balance contrast without amplifying
noise.
To blend the four lookup functions $\{f_{00}, f_{10}, f_{01}, f_{11}\}$, bilinear interpolation is used.
The formula in Equation 3.10 calculates a smooth transition between adjacent blocks:
$$f_{s,t}(I) = (1-s)(1-t)\,f_{00}(I) + s(1-t)\,f_{10}(I) + (1-s)\,t\,f_{01}(I) + s\,t\,f_{11}(I).$$
The blending function depends on the horizontal (s) and vertical (t) position within a block.
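A minimal sketch of the bilinear blend of four block lookup tables. The four tables here are hypothetical stand-ins; in the actual algorithm each one would come from equalizing the histogram of one neighboring block.

```python
import numpy as np

def blend_luts(I, s, t, f00, f10, f01, f11):
    """Bilinearly interpolate the outputs of four block lookup tables for intensity I.
    s, t in [0, 1]: horizontal and vertical fractional position between block centers."""
    return ((1 - s) * (1 - t) * f00[I] + s * (1 - t) * f10[I]
            + (1 - s) * t * f01[I] + s * t * f11[I])

levels = np.arange(256, dtype=float)
# Hypothetical per-block tables (stand-ins for real per-block equalization curves).
f00 = levels
f10 = np.clip(levels * 1.2, 0, 255)
f01 = np.clip(levels + 20, 0, 255)
f11 = np.sqrt(levels / 255) * 255

at_corner = blend_luts(100, 0.0, 0.0, f00, f10, f01, f11)  # exactly f00(100)
at_center = blend_luts(100, 0.5, 0.5, f00, f10, f01, f11)  # mean of the four outputs
```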
Instead of blending four lookup tables for each output pixel (which is slow), a better approach is to
blend the results of mapping a pixel through the four neighboring lookup tables.
This approach smooths transitions between blocks while maintaining efficiency.
Corner-Based Lookup Table Placement
A variation of the algorithm places the lookup tables at the corners of each M × M block.
This helps in better distribution of pixel values across the histogram.
Instead of assigning each input pixel to just one lookup table, it is distributed (with bilinear weights) into the four adjacent lookup tables during histogram computation.
Soft Histogramming
This soft histogramming technique distributes pixel values smoothly instead of assigning them
rigidly to one bin.
It is used in many applications, including:
o SIFT (Scale-Invariant Feature Transform) feature descriptors (Section 7.1.3).
o Vocabulary trees (Section 7.1.4).
3.1.5 Tonal Adjustment
What is Tonal Adjustment?
Tonal adjustment is a common image processing technique used to enhance contrast and
brightness in photographs.
It makes images look more attractive or easier to interpret.
Where is it Used?
Found in photo editing tools where users can adjust contrast, brightness, and color.
3.2 Linear Filtering
What is Linear Filtering?
A neighborhood operator (or local operator) that processes a pixel based on the values of
surrounding pixels.
Used for image enhancement by modifying pixel values in a small region.
Applications of Neighborhood Operators
Adjusting tones in specific areas of an image.
Filtering images for different effects:
o Blurring to soften details.
o Sharpening to enhance details.
o Edge detection to highlight boundaries.
o Noise removal to clean up the image.
Examples can be seen in Figures 3.10 and 3.11b–d.
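The correlation operator computes each output pixel as a weighted sum of neighboring input pixels, $g(i, j) = \sum_{k,l} f(i + k, j + l)\, h(k, l)$, where the entries of the kernel h are called the filter coefficients. Reversing the sign of the offsets gives the convolution operator,
$$g(i, j) = \sum_{k,l} f(i - k, j - l)\, h(k, l) = \sum_{k,l} f(k, l)\, h(i - k, j - l),$$
written $g = f * h$, and h is then called the impulse response function. The reason for this name is that the kernel function, h, convolved with an impulse signal, δ(i, j) (an image that is 0 everywhere except at the origin), reproduces itself, h ∗ δ = h, whereas correlation produces the reflected signal.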
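A minimal sketch of both operators with zero padding, using the standard definitions g(i, j) = Σ f(i + k, j + l) h(k, l) for correlation and g(i, j) = Σ f(i − k, j − l) h(k, l) for convolution, with the kernel center stored at h[r, r]. Filtering an impulse with an asymmetric kernel shows that convolution reproduces h, while correlation produces the reflected kernel.

```python
import numpy as np

def correlate2d(f, h):
    """g(i, j) = sum_{k,l} f(i + k, j + l) h(k, l) for a (2r+1) x (2r+1) kernel h
    whose center is h[r, r]; pixels outside f are treated as 0."""
    r = h.shape[0] // 2
    fp = np.pad(f.astype(float), r)  # zero padding keeps g the same size as f
    H, W = f.shape
    g = np.zeros((H, W))
    for k in range(-r, r + 1):
        for l in range(-r, r + 1):
            g += h[k + r, l + r] * fp[r + k:r + k + H, r + l:r + l + W]
    return g

def convolve2d(f, h):
    """g(i, j) = sum_{k,l} f(i - k, j - l) h(k, l): correlation with h flipped in both axes."""
    return correlate2d(f, h[::-1, ::-1])

# Impulse test: an image that is 0 everywhere except a 1 at the center.
delta = np.zeros((5, 5))
delta[2, 2] = 1.0
h = np.arange(9, dtype=float).reshape(3, 3)  # deliberately asymmetric kernel
conv = convolve2d(delta, h)   # reproduces h around the impulse
corr = correlate2d(delta, h)  # reproduces h reflected
```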
In fact, Equation (3.14) can be interpreted as the superposition (addition) of shifted impulse response
functions h(i − k, j − l) multiplied by the input pixel values f (k, l). Convolution has additional nice
properties, e.g., it is both commutative and associative. As well, the Fourier transform of two
convolved images is the product of their individual Fourier transforms.
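Correlation and convolution are both linear shift-invariant operators, so they can also be written as a sparse matrix-vector multiplication,
$$g = H f,$$
where the (sparse) H matrix contains the convolution kernels. Figure 3.12 shows how a one-dimensional convolution can be represented in matrix-vector form.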
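A minimal sketch of the matrix-vector view in one dimension: it builds H explicitly (as a dense array, for clarity; in practice H is sparse and rarely formed) and checks that H f matches numpy's `np.convolve` with `mode="same"`.

```python
import numpy as np

def convolution_matrix(n, h):
    """Build the banded n x n matrix H with (H @ f)[i] = sum_k h(k) f(i - k),
    for an odd-length kernel h centered at index len(h) // 2 (zero padding at the ends)."""
    r = len(h) // 2
    H = np.zeros((n, n))
    for i in range(n):
        for k in range(-r, r + 1):
            j = i - k
            if 0 <= j < n:
                H[i, j] = h[k + r]
    return H

f = np.array([72, 88, 62, 52, 37, 40, 55, 61], dtype=float)  # arbitrary 1D signal
h = np.array([1.0, 2.0, 3.0])                               # asymmetric, to expose flips
H = convolution_matrix(len(f), h)
g = H @ f
```

Each row of H is a shifted, flipped copy of the kernel, which is exactly the "superposition of shifted impulse responses" reading of convolution.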
Padding (border effects)
The astute reader will notice that the correlation shown in Figure 3.10 produces a result that is smaller than the original image, which may not be desirable in many applications. This is because the neighborhoods of typical correlation and convolution operations extend beyond the image boundaries near the edges, and so the filtered images suffer from boundary effects. To deal with this, a number of different padding or extension modes have been developed for neighborhood operations (Figure 3.13):
• zero: set all pixels outside the source image to 0 (a good choice for alpha-matted cutout images);
• constant (border color): set all pixels outside the source image to a specified border value;
• clamp (replicate or clamp to edge): repeat edge pixels indefinitely;
• (cyclic) wrap (repeat or tile): loop “around” the image in a “toroidal” configuration;
• mirror: reflect pixels across the image edge.
• extend: extend the signal by subtracting the mirrored version of the signal from the edge pixel value.
In the computer graphics literature, these mechanisms are known as the wrapping mode (OpenGL)
or texture addressing mode.
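The padding modes listed above map directly onto `numpy.pad` modes. A small sketch on a 1D signal; the border value 9 for the constant mode is arbitrary. numpy's "reflect" mirrors without repeating the edge pixel, while "symmetric" repeats it; either can serve as "mirror" in practice.

```python
import numpy as np

# How each padding mode can be reproduced with numpy.pad.
PAD_MODES = {
    "zero":     dict(mode="constant", constant_values=0),
    "constant": dict(mode="constant", constant_values=9),  # 9 = example border color
    "clamp":    dict(mode="edge"),
    "wrap":     dict(mode="wrap"),
    "mirror":   dict(mode="reflect"),                      # "symmetric" repeats the edge
    "extend":   dict(mode="reflect", reflect_type="odd"),  # 2 * edge - mirrored value
}

signal = np.array([1, 2, 3])
padded = {name: np.pad(signal, 2, **kw).tolist() for name, kw in PAD_MODES.items()}
```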
Figure 3.13 shows the effects of padding an image with each of the above mechanisms and then blurring the resulting padded image. As you can see, zero padding darkens the edges, clamp (replication) padding propagates border values inward, and mirror (reflection) padding preserves colors near the borders. Extension padding (not shown) keeps the border pixels fixed (during blur).
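Separable filtering is widely used in Gaussian blurring, box filtering, and other image processing applications due to its efficiency.
A K × K kernel is separable if it can be written as the outer product of a vertical and a horizontal 1D kernel; the image can then be filtered with one vertical and one horizontal 1D pass, reducing the cost per pixel from K² to 2K operations.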
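A minimal sketch comparing a direct 5 × 5 filter with the equivalent separable pair of 1D passes, using the 1D binomial kernel [1, 4, 6, 4, 1]/16 as an example of a Gaussian-like separable kernel. Both give the same result, but the separable version needs 10 rather than 25 multiply-adds per pixel.

```python
import numpy as np

def correlate_same(img, k):
    """Direct 2D correlation, zero padding, output the same size as img.
    Costs k.size multiply-adds per pixel."""
    rv, rh = k.shape[0] // 2, k.shape[1] // 2
    p = np.pad(img.astype(float), ((rv, rv), (rh, rh)))
    H, W = img.shape
    out = np.zeros((H, W))
    for i in range(k.shape[0]):
        for j in range(k.shape[1]):
            out += k[i, j] * p[i:i + H, j:j + W]
    return out

v = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # 1D binomial (Gaussian-like) kernel
K2d = np.outer(v, v)                            # the equivalent 5 x 5 kernel

rng = np.random.default_rng(0)
img = rng.random((12, 10))

full = correlate_same(img, K2d)                                          # 25 taps/pixel
separable = correlate_same(correlate_same(img, v[:, None]), v[None, :])  # 5 + 5 taps
```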
3.2.2 Examples of linear filtering
Linear filtering is a fundamental technique in image processing that involves applying mathematical
operations to enhance or modify images. Below are some key examples:
1. Smoothing Filters (Blurring or Low-Pass Filters)
These filters reduce noise and blur an image by averaging pixel values in a local region.
Moving Average (Box Filter)
o Averages pixel values in a K × K window.
o Implemented by convolving the image with a matrix of ones and then scaling.
o Efficient for large kernels by using summed area tables for quick computation.
Tent Filter (Bilinear Filter)
o Uses a piecewise linear function to smooth images.
o A 3 × 3 bilinear kernel is formed by the outer product of two linear splines.
o Helps preserve details while reducing noise.
Gaussian Filter
o Obtained by convolving the bilinear filter with itself.
o Provides a smoother result compared to the box filter.
o Can be approximated by iterating box filter operations multiple times.
o Used when rotationally symmetric smoothing is needed.
frequencies.
o Theoretical but not commonly used in real-world applications due to computational
complexity.
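A minimal sketch of a box filter computed with a summed area table: after a single pass of cumulative sums, the sum over any K × K window takes four lookups, independent of K. For simplicity it only returns windows that lie fully inside the image (no padding).

```python
import numpy as np

def summed_area_table(img):
    """S[i, j] = sum of img[:i, :j], with a leading row and column of zeros."""
    S = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    S[1:, 1:] = img.astype(float).cumsum(axis=0).cumsum(axis=1)
    return S

def box_filter(img, K):
    """Mean over every K x K window that fits inside the image ('valid' output).
    Each window sum uses 4 table lookups, whatever the value of K."""
    S = summed_area_table(img)
    H, W = img.shape
    total = (S[K:, K:] - S[:H - K + 1, K:]
             - S[K:, :W - K + 1] + S[:H - K + 1, :W - K + 1])
    return total / (K * K)

rng = np.random.default_rng(0)
img = rng.random((6, 7))
fast = box_filter(img, 3)
```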
2. Image Sharpening (Unsharp Masking)
Although smoothing filters usually blur images, they can also be used to sharpen them.
Unsharp Masking Process
1. Blur the image using a smoothing filter (e.g., Gaussian).
2. Subtract the blurred image from the original to extract high-frequency details.
3. Add this difference back to the original image to enhance sharpness.
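o Formula:
$$g_{\text{sharp}} = f + \gamma\,(f - h_{\text{blur}} * f),$$
where γ controls the amount of sharpening.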
Historical Method (Before Digital Processing)
In darkroom photography, photographers created a blurred ("positive") negative by misfocusing the image.
Overlaying this with the original negative produced a sharper final image.
The mathematical representation is:
$$g_{\text{unsharp}} = f\,(1 - \gamma\, h_{\text{blur}} * f).$$
o Though not a linear filter, this method remains effective in image enhancement.
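A minimal sketch of digital unsharp masking, g_sharp = f + γ(f − h_blur ∗ f). A small [1, 2, 1]/4 separable blur with edge replication stands in for h_blur here (a wider Gaussian is more typical). On a step edge, the sharpened result overshoots on both sides, which is what makes the edge look crisper.

```python
import numpy as np

def blur3(f):
    """Separable [1, 2, 1]/4 blur with edge replication (a small stand-in for a Gaussian)."""
    k = np.array([1.0, 2.0, 1.0]) / 4.0
    p = np.pad(f, 1, mode="edge")
    rows = k[0] * p[:-2, :] + k[1] * p[1:-1, :] + k[2] * p[2:, :]       # vertical pass
    return k[0] * rows[:, :-2] + k[1] * rows[:, 1:-1] + k[2] * rows[:, 2:]  # horizontal

def unsharp_mask(f, gamma=1.0):
    """g_sharp = f + gamma * (f - h_blur * f): add back the high-frequency detail."""
    f = f.astype(float)
    return f + gamma * (f - blur3(f))

step = np.zeros((5, 6))
step[:, 3:] = 1.0                 # a vertical step edge
sharp = unsharp_mask(step)
flat = unsharp_mask(np.full((4, 4), 0.3))
```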
3. Edge Detection Filters
Edge detection is crucial for feature extraction, object recognition, and computer vision applications.
Sobel Operator
o A 3 × 3 filter used to detect edges in an image.
o Combines:
Horizontal central difference filter (computes the horizontal derivative, so it responds to vertical edges).
Vertical tent filter (smooths noise in the vertical direction).
o Highlights vertical edges effectively.
Corner Detection
o Detects corners by analyzing horizontal and vertical second derivatives.
o A simple 3 × 3 kernel identifies corners but may also respond to diagonal edges.
o More advanced detectors provide rotation invariance, improving accuracy in feature
detection.
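A minimal sketch of the Sobel operator built as the outer product of the vertical tent [1, 2, 1] and the horizontal central difference [−1, 0, 1], applied as a correlation with edge replication. The 1/8 normalization is one common convention. A vertical step edge produces a strong response, while a horizontal one produces none.

```python
import numpy as np

# Vertical tent [1, 2, 1] (smoothing) times horizontal central difference [-1, 0, 1].
SOBEL_X = np.outer([1.0, 2.0, 1.0], [-1.0, 0.0, 1.0]) / 8.0

def correlate3x3(img, k):
    """3 x 3 correlation with edge replication (clamp) padding."""
    p = np.pad(img.astype(float), 1, mode="edge")
    H, W = img.shape
    return sum(k[i, j] * p[i:i + H, j:j + W] for i in range(3) for j in range(3))

vertical_edge = np.zeros((5, 6))
vertical_edge[:, 3:] = 1.0     # dark on the left, bright on the right
horizontal_edge = np.zeros((6, 5))
horizontal_edge[3:, :] = 1.0   # dark on top, bright at the bottom

gx_v = correlate3x3(vertical_edge, SOBEL_X)    # responds along columns 2 and 3
gx_h = correlate3x3(horizontal_edge, SOBEL_X)  # all zeros
```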
NOTE:
✔ Smoothing filters (box, bilinear, Gaussian) reduce noise and blur images.
✔ Corner detectors find key points in an image, useful for computer vision tasks.
Band-pass Filters:
Steerable Filters
Applications
Edge and corner detection: Enhances detection of edges in specific directions.
Texture analysis: Helps analyze patterns in images.
Image enhancement: Used in sharpening and feature extraction.
NOTE:
Band-pass filters enhance features by filtering out both low and high frequencies.
Steerable filters can be rotated to detect edges at any orientation efficiently.
The Laplacian of Gaussian (LoG) is a common band-pass filter.
Freeman and Adelson's steerable filters allow efficient multi-directional feature detection.
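A minimal sketch of both ideas with sampled Gaussian kernels (σ = 1.5 and a 13 × 13 support are arbitrary choices). The first-derivative-of-Gaussian filter at any angle θ is synthesized from just two basis filters, cos θ·G_x + sin θ·G_y, which is the simplest case of Freeman and Adelson's steerable filters; the LoG kernel is shifted so its weights sum to zero, giving no response to constant regions as a band-pass filter should. Applied to a linear ramp whose gradient points at 30°, the steered response is strongest at θ = 30°.

```python
import numpy as np

sigma, r = 1.5, 6
y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
G = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
G /= G.sum()

# Basis filters: x- and y-derivatives of the Gaussian.
Gx = -x / sigma ** 2 * G
Gy = -y / sigma ** 2 * G

def steer(theta):
    """Derivative-of-Gaussian kernel oriented at theta (radians), from two basis filters."""
    return np.cos(theta) * Gx + np.sin(theta) * Gy

# Laplacian of Gaussian: a band-pass kernel, shifted so its weights sum to 0.
LoG = ((x ** 2 + y ** 2) / sigma ** 4 - 2 / sigma ** 2) * G
LoG -= LoG.mean()

# Response at the center of a linear ramp whose gradient points at 30 degrees.
phi = np.deg2rad(30)
ramp = np.cos(phi) * x + np.sin(phi) * y
angles = np.deg2rad(np.arange(180))
response = np.array([np.sum(steer(a) * ramp) for a in angles])
best_deg = int(np.rad2deg(angles[np.argmax(np.abs(response))]).round())
```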