Pouring in color
• Grayscale image: 2D array of size M x N
containing scalar intensity values (graylevels).
• Color image: typically represented as a 3D
array of size M x N x 3 again containing scalar
values. But each pixel location now has three
values, called the R (red), G (green) and B (blue)
intensity values.
• All file formats store color images based on
this representation.
Questions
• What are RGB? How are red, green, blue
determined?
• Are there other ways of representing color?
• How do you distinguish between different
intensities of the same color (shades)?
Between varying levels of whiteness in a color
(tints)?
[Link]
More questions
• How would you define an edge in a color
image?
• How do you smooth color images?
• What do we know about human color
perception?
• Why do we consider only 3 channels (i.e.
RGB)? Are there images with more channels?
Where are they used?
Color perception and physics
• Human perception of color is not fully
understood, but there are some well
understood physical principles.
• The color of light is defined by its constituent
wavelengths (wavelength is inversely proportional to frequency).
• The visible part of the electromagnetic
spectrum lies roughly between 380 nm (violet) and
750 nm (red).
[Figure: the electromagnetic spectrum, with the visible band lying between ultraviolet and infrared]
Color Wavelength
violet 380–450 nm
blue 450–495 nm
green 495–570 nm
yellow 570–590 nm
orange 590–620 nm
red 620–750 nm
Color physics
• White light is a blend of several wavelengths of
light, which get separated by “dispersive
elements” such as prisms.
• Objects which reflect light that is balanced in
several visible wavelengths appear “white”.
• Objects which reflect light in a narrow range of
wavelengths appear “colored” (example: green
objects reflect light between 500 and 560 nm).
• No color starts or ends abruptly at a particular
wavelength – the transitions are smooth.
Human color perception
• The human retina has two types of receptor
cells that respond to light – the rods and the
cones.
• The rods work in the low-light regime and are
responsible for monochromatic vision.
• The cones respond to brighter light, and are
responsible for color perception.
• There are around 5-7 million cones in a single
retina.
Human color perception
• There are 3 types of cones. Each type responds
differently to light of different wavelengths: L
(responsive to long wavelengths, i.e. red), M
(medium wavelengths, i.e. green) and S (short
wavelengths, i.e. blue).
Yellow color: L is stimulated a bit more
than M and S is not stimulated
Red: L is stimulated much more than M
and S is not stimulated
Violet: S is stimulated, M and L are not
Color-blindness: absence of one or more
of the three types of cones
Response sensitivity functions for
LMS cells
Human color perception
• Consider a beam of light striking the retina.
Let its spectral intensity as a function of
wavelength λ be given as I(λ).
• The three types of cone cells re-weigh the
spectral intensity and produce the following
response:
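\[ a_L = \int I(\lambda)\, c_L(\lambda)\, d\lambda \approx \sum_{i=1}^{N_\lambda} I(\lambda_i)\, c_L(\lambda_i) = \mathbf{c}_L^T \mathbf{I} \]
\[ a_M = \int I(\lambda)\, c_M(\lambda)\, d\lambda \approx \sum_{i=1}^{N_\lambda} I(\lambda_i)\, c_M(\lambda_i) = \mathbf{c}_M^T \mathbf{I} \]
\[ a_S = \int I(\lambda)\, c_S(\lambda)\, d\lambda \approx \sum_{i=1}^{N_\lambda} I(\lambda_i)\, c_S(\lambda_i) = \mathbf{c}_S^T \mathbf{I} \]
\[ (a_L, a_M, a_S)^T = \mathbf{C}\,\mathbf{I} \]
C is a matrix of size 3 x Nλ and I is a vector of Nλ elements.
c_L, c_M, c_S are vectors of Nλ elements (the rows of C).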
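The discretized cone response above is just a matrix-vector product. The following is a minimal sketch of it, assuming Gaussian-shaped sensitivity curves with rough, illustrative peak wavelengths (the function `gaussian_sensitivity` and all numbers are hypothetical stand-ins, not measured cone data):

```python
import numpy as np

# Wavelength samples across the visible range (nm); N_lambda = 81 here.
wavelengths = np.linspace(380.0, 780.0, 81)

def gaussian_sensitivity(peak_nm, width_nm=40.0):
    """Hypothetical bell-shaped sensitivity curve; real cone curves differ."""
    return np.exp(-0.5 * ((wavelengths - peak_nm) / width_nm) ** 2)

# Rows of C are the sampled sensitivities c_L, c_M, c_S (3 x N_lambda).
# The peak wavelengths are illustrative values, not measured data.
C = np.stack([
    gaussian_sensitivity(565.0),  # L cones
    gaussian_sensitivity(535.0),  # M cones
    gaussian_sensitivity(440.0),  # S cones
])

def cone_responses(spectrum):
    """Return (a_L, a_M, a_S) = C @ I for a sampled spectrum I."""
    return C @ spectrum

# A narrow-band light centered near 650 nm (reddish) as a test spectrum.
I_red = np.exp(-0.5 * ((wavelengths - 650.0) / 10.0) ** 2)
a_L, a_M, a_S = cone_responses(I_red)
```

With these assumed curves, the reddish test light stimulates L more than M and barely stimulates S, matching the qualitative description of red given earlier.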
Human color perception
• The colors R, G, B are called primary colors –
their corresponding wavelengths are 700, 546.1 and
435.8 nm respectively.
• These values were standardized by CIE
(International Commission on Illumination –
Commission Internationale de l’Eclairage).
Display systems (CRT/LCD)
• The interior of a cathode ray tube (CRT) contains an
array of triangular dot patterns (triads) containing
electron-sensitive phosphor. Each dot in the triad
produces light in one of the three primary colors based
on the intensity of that primary color.
• The light from the three primary-color dots of a triad is mixed
in different proportions and, through the color-sensitive cones
of the human eye, is perceived as different colors.
• Though the electronics of an LCD system are different
from those of a CRT, the color display follows similar principles.
Color Models (Color Spaces)
• The purpose of a color model is to serve as a
method of representing color.
• Some color models are oriented towards
hardware (e.g. monitors, printers), others towards
applications involving color manipulation.
• Monitors: RGB, Printers: CMY, human
perception: HSI, efficient compression and
transmission: YCbCr.
RGB color model
• Defines a Cartesian coordinate system for
colors – in terms of R, G, B axes.
• Images in the RGB color model consist of
three component images, one for each
primary color.
• When an RGB image is given as input to a
display system, the three images combine
to produce the composite image on screen.
• Typically, an 8 bit integer is used to
represent the intensity value in each
channel, giving rise to (2^8)^3 = 16,777,216
(about 1.678 x 10^7) colors.
CMY(K) color space
• The colors cyan, magenta and yellow are “opponents” (complements)
of red, green and blue respectively: cyan and red lie
on diagonally opposite corners of the RGB cube, so that
C = 255-R, M = 255-G, Y = 255-B
• Cyan, magenta and yellow are called secondary colors
of light, or primary colors of pigments. A cyan colored
surface illuminated with white light will not allow the
reflection of red light. Likewise for magenta and green,
yellow and blue.
• CMY are the colors of ink pigments used in the printing
industry. Color printing is a subtractive process – the
ink subtracts certain color components from white
light.
• For purposes of display, white is the full combination
of RGB and black is the absence of light. For purposes
of printing, white is the absence of any printing, and
black is the full combination of CMY.
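The relation C = 255-R, M = 255-G, Y = 255-B can be sketched in a few lines, assuming 8-bit channel values (the function names are illustrative):

```python
import numpy as np

def rgb_to_cmy(rgb):
    """Complement each 8-bit channel: C = 255-R, M = 255-G, Y = 255-B."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    return np.uint8(255) - rgb

def cmy_to_rgb(cmy):
    """The inverse is the same complement operation."""
    cmy = np.asarray(cmy, dtype=np.uint8)
    return np.uint8(255) - cmy

red_as_cmy = rgb_to_cmy([255, 0, 0])        # no cyan, full magenta and yellow
white_as_cmy = rgb_to_cmy([255, 255, 255])  # white: no ink at all
```

Note how display white (full RGB) maps to no ink, and display black maps to the full combination of CMY, as described above.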
CMY(K) color space
• The printer puts down dots (of different sizes,
shapes) of CMY colors with tiny spacing (of
different widths) in between. The spacing is so
tiny that our eye perceives them as a single solid
color (optical illusion!). This process is called
color half-toning.
• While black is a full combination of CMY, it is
printed on paper using a separate black-color ink
(to save costs). This is the ‘K’ of the CMYK model.
Color half-toning
[Link]
Three examples of color half-toning with CMYK separations. From left to right:
The cyan separation, the magenta separation, the yellow separation, the black
separation, the combined halftone pattern and finally how the human eye would
observe the combined halftone pattern from a sufficient distance.
Digression: Gray-scale half-toning
[Link]
Left: Halftone dots. Right: How the human eye
would see this sort of arrangement from a
sufficient distance.
Digression: negative after-images!
[Link]
HSI color space
• RGB, CMY are not intuitive from the point of view
of human perception/description.
• We don’t think naturally of colors in the form of
combinations of RGB.
• We tend to think of color in terms of the following
components: hue (the “inherent/pure” color –
red, orange, purple, etc.), saturation (the purity of the
color, which drops as white is mixed in, e.g. pink versus
magenta), intensity (the brightness, which drops as black
is mixed in, e.g. dark red versus bright red).
• Intensity increases as we move from black to white on the intensity line.
• Consider a plane perpendicular to the intensity line (in 3D). Saturation of a color
increases as we move on that plane away from the point where the plane and the
intensity line intersect.
• How to determine hue? Pick any point (e.g. yellow) in the RGB cube, and draw a
triangle connecting that point with the white point and black point. All points inside
or on this triangle have the same hue. Any such point would be a color
corresponding to a convex combination of yellow, black and white, i.e. of the form
a x yellow + b x black + c x white, where a, b, c are non-negative and sum to 1. By
rotating this triangle about the intensity axis, you will get different hues.
HSI space
By rotating the triangle about the intensity axis, you will get
different hues. In fact hue is an ANGULAR quantity ranging
from 0 to 360 degrees. By convention, red is considered 0
degrees.
Primary colors are separated by 120 degrees. The secondary
colors (of light) are 60 degrees away from the primary colors.
To be very accurate, this HSI spindle is actually
hexagonal. But it is approximated as a circular
spindle for convenience. This approximation
does not alter the notion of hue or intensity
and has an insignificant effect on the
saturation.
RGB to HSI conversion
• Conversion formulae are obtained by making
the preceding geometric intuition more
precise:
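\[ \theta = \cos^{-1}\left\{ \frac{0.5\,[(R-G)+(R-B)]}{\sqrt{(R-G)^2 + (R-B)(G-B)}} \right\} \]
\[ \text{hue } H = \begin{cases} \theta & \text{if } B \le G \\ 2\pi - \theta & \text{if } B > G \end{cases} \]
\[ S = 1 - \frac{3\,\min(R,G,B)}{R+G+B}, \qquad I = \frac{R+G+B}{3} \]
Refer to textbook for formulae to convert back from HSI to RGB.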
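The conversion formulae above translate fairly directly into code. This is a minimal sketch, assuming RGB values are floats (e.g. in [0,1]) stored in the last array axis; the handling of gray and black pixels, where hue and saturation are undefined, is an arbitrary choice made here:

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an (..., 3) RGB array to (hue in radians, saturation, intensity)."""
    rgb = np.asarray(rgb, dtype=np.float64)
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]

    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt(np.maximum((R - G) ** 2 + (R - B) * (G - B), 0.0))
    # For gray pixels den == 0 and hue is undefined; dividing by 1 there
    # simply yields theta = pi/2 as a placeholder value.
    safe_den = np.where(den > 0, den, 1.0)
    theta = np.arccos(np.clip(num / safe_den, -1.0, 1.0))
    hue = np.where(B <= G, theta, 2.0 * np.pi - theta)

    total = R + G + B
    safe_total = np.where(total > 0, total, 1.0)
    # Saturation of pure black is undefined; it is set to 0 here.
    saturation = np.where(total > 0, 1.0 - 3.0 * rgb.min(axis=-1) / safe_total, 0.0)
    intensity = total / 3.0
    return hue, saturation, intensity

h_red, s_red, i_red = rgb_to_hsi([1.0, 0.0, 0.0])    # hue 0 by convention
h_blue, s_blue, i_blue = rgb_to_hsi([0.0, 0.0, 1.0])  # 240 degrees
```

Pure red lands at hue 0 and pure blue at 4π/3 (240 degrees), consistent with primaries being separated by 120 degrees.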
HSI and RGB
Practical use of hue
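\[ \theta = \cos^{-1}\left\{ \frac{0.5\,[(R-G)+(R-B)]}{\sqrt{(R-G)^2 + (R-B)(G-B)}} \right\} \]
\[ \text{hue } H = \begin{cases} \theta & \text{if } B \le G \\ 2\pi - \theta & \text{if } B > G \end{cases} \]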
Hue is invariant to:
• Scaling of R,G,B
• Constant offsets added to R,G,B
What does this mean physically?
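Both invariances can be checked numerically. The sketch below evaluates the hue formula at an arbitrary, made-up color and at scaled and offset versions of it; it assumes the color is not gray, so the denominator is non-zero:

```python
import numpy as np

def hue(rgb):
    """Hue angle (radians) of a single non-gray RGB triple."""
    R, G, B = (float(c) for c in rgb)
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B))
    theta = np.arccos(np.clip(num / den, -1.0, 1.0))
    return theta if B <= G else 2.0 * np.pi - theta

base = np.array([0.6, 0.3, 0.1])   # arbitrary illustrative color
h_base = hue(base)
h_scaled = hue(1.5 * base)         # scaling of R, G, B
h_offset = hue(base + 0.2)         # constant offset added to R, G, B
h_both = hue(1.5 * base + 0.05)    # both at once
```

All four values agree up to floating-point error: an offset cancels in every difference R-G, R-B, G-B, and a scale factor cancels between numerator and denominator.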
Practical use of hue
• To understand this, we need a model that tells
us what color is observed at a particular point
on the surface of an object illuminated by one
or more light sources.
• This color is given by:
\[ I^{(C)} = I^{(C)}_{ambient} + I^{(C)}_{diffuse} + I^{(C)}_{specular}, \quad C \in \{R, G, B\} \]
Ambient term: ambient light (say due to sunlight), constant effect on all points of the object’s surface.
Diffuse term: diffuse reflection of light from a directed source off a rough surface, varies from point to point on a surface.
Specular term: reflection from a shiny surface, varies from point to point on a surface.
[Figure panels: diffuse reflection from an irregular surface; specular reflection]
• Diffuse reflection from a rough surface: “diffuse” means that
incident light is reflected in all directions.
• Specular reflection: part of the surface acts like a mirror, the
incident light is reflected only in particular directions
[Figure panels: diffuse reflection; diffuse + specular reflection]
\[ I^{(C)} = I^{(C)}_{ambient} + I^{(C)}_{diffuse} + I^{(C)}_{specular} = k_a I_a + k_d^{(C)} L\,(\hat{n}\cdot\hat{s}) + k_s L\,(\hat{r}\cdot\hat{v})^{\alpha}, \quad C \in \{R, G, B\} \]
L = strength of white light source.
ka, kd, ks: surface reflectivity (fraction of incident light that is reflected off the surface).
n̂: vector normal to the surface at a point; ŝ: lighting direction;
v̂: viewing direction; r̂: direction of reflected light.
For shiny surfaces, α is large.
Practical use of hue
• The ambient and specular components are assumed to
be the same across RGB (neutral reflection model). So
they get subtracted out when computing R-G,G-B,B-R.
Hence hue is invariant to specular reflection!
• Notice: hue is independent of strength of lighting
(why?), lighting direction (why?) and viewing direction
(why?).
• This makes hue useful in object detection and object
recognition or in applications such as detection of
faces/foliage in color images.
• Hue is thus said to be an “illumination invariant”
feature.
Food for thought
• We’ve heaped praises on hue, all along. Any ideas
on its demerits?
• Suppose we define the following quantities (r,g,b)
[the chromaticity vector] derived from RGB:
r = R/(R+G+B), g = G/(R+G+B), b = B/(R+G+B)
Is the chromaticity vector also an illumination
invariant feature? How does it compare to hue?
Digression: Playing with color: seeing is
not (!) believing
[Link]
/a_02_p_vis/a_02_p_vis.html#
[Link]
Operations on color images
• Color image histogram equalization
• Color image filtering
• Color edge detection
Histogram equalization
• Method 1: perform histogram equalization on
RGB channels separately.
• Method 2: Convert RGB to HSI, histogram
equalize the intensity, convert back to RGB.
• Method 1 may cause alterations in the hue –
which is undesirable.
• Method 2 will change only the intensity,
leaving hue and saturation unaltered. It is the
preferred method.
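A minimal sketch of Method 2 follows. Instead of an explicit HSI round trip, it uses a shortcut that is an assumption of this sketch, not the method from the slides: since hue and saturation (as defined earlier) are unchanged when R, G, B are multiplied by a common factor, each pixel is scaled by the ratio of its equalized intensity to its original intensity. Values pushed above 1 are clipped, which can alter hue at those pixels.

```python
import numpy as np

def equalize_intensity(rgb, levels=256):
    """Histogram-equalize I = (R+G+B)/3 of a float RGB image with values in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    intensity = rgb.mean(axis=-1)

    # The empirical CDF of the quantized intensities is the equalizing map.
    bins = np.minimum((intensity * (levels - 1)).round().astype(int), levels - 1)
    hist = np.bincount(bins.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / bins.size
    new_intensity = cdf[bins]

    # Scale each pixel's RGB by I_new / I_old; pure black pixels stay black.
    safe_intensity = np.where(intensity > 0, intensity, 1.0)
    gain = np.where(intensity > 0, new_intensity / safe_intensity, 0.0)
    out = rgb * gain[..., None]
    # Clipping keeps values displayable but may shift hue where it triggers.
    return np.clip(out, 0.0, 1.0)
```

For pixels that are not clipped, the ratios R:G:B are preserved exactly, which is what keeps hue and saturation unaltered.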
Top row: original
images
Middle row: histogram
equalization channel by
channel
Bottom row: histogram
equalization on
intensity (of HSI) and
conversion back to RGB
Color image smoothing: bilateral
filtering
• Remember the bilateral filter (HW2): an edge-
preserving filter for grayscale images.
• It smooths the image using local weighted
averages, with weights driven by differences
in spatial coordinates and in intensity
values.
\[ \hat{I}(x,y) = \frac{\sum_{(i,j)\in N(x,y)} I(i,j)\, w(i,j)}{\sum_{(i,j)\in N(x,y)} w(i,j)}, \quad w(i,j) = \exp\left[ -\frac{(i-x)^2 + (j-y)^2}{\sigma_s^2} - \frac{(I(i,j) - I(x,y))^2}{\sigma_I^2} \right], \]
N(x,y) = small neighborhood (usually square) centered at (x,y)
Bilateral filtering for color images
• You can filter each channel separately, i.e.
\[ \hat{I}_C(x,y) = \frac{\sum_{(i,j)\in N(x,y)} I_C(i,j)\, w_C(i,j)}{\sum_{(i,j)\in N(x,y)} w_C(i,j)}, \quad w_C(i,j) = \exp\left[ -\frac{(i-x)^2 + (j-y)^2}{\sigma_s^2} - \frac{(I_C(i,j) - I_C(x,y))^2}{\sigma_I^2} \right], \]
N(x,y) = small neighborhood (usually square) centered at (x,y),
C ∈ {R, G, B}
Bilateral filtering for color images
• Or you can filter the three channels in a
coupled fashion, i.e. the smoothing weights
are same for all three channels and they are
derived using information from all three
channels.
\[ \hat{I}_C(x,y) = \frac{\sum_{(i,j)\in N(x,y)} I_C(i,j)\, w(i,j)}{\sum_{(i,j)\in N(x,y)} w(i,j)}, \quad w(i,j) = \exp\left[ -\frac{(i-x)^2 + (j-y)^2}{\sigma_s^2} - \frac{\sum_{C\in\{R,G,B\}} (I_C(i,j) - I_C(x,y))^2}{\sigma_I^2} \right], \]
N(x,y) = small neighborhood (usually square) centered at (x,y)
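The two variants differ only in how the intensity term of the weight is formed. The following is a minimal, unoptimized sketch assuming a float image of shape (H, W, 3); the parameter names and default values are illustrative, and replicating border pixels is a choice made here rather than something the formula prescribes:

```python
import numpy as np

def bilateral_color(img, radius=2, sigma_s=2.0, sigma_i=0.1, coupled=True):
    """Bilateral filter for a float (H, W, 3) image.

    coupled=True  : one weight map shared by all three channels.
    coupled=False : each channel is filtered with its own weights.
    """
    img = np.asarray(img, dtype=np.float64)
    H, W, _ = img.shape
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    num = np.zeros_like(img)
    den = np.zeros_like(img)

    # Loop over neighborhood offsets; every pixel is processed at once per offset.
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            shifted = padded[radius + di:radius + di + H,
                             radius + dj:radius + dj + W]
            diff2 = (shifted - img) ** 2
            if coupled:
                # Sum squared differences over R, G, B: same weight for all channels.
                diff2 = diff2.sum(axis=-1, keepdims=True)
            w = np.exp(-(di * di + dj * dj) / sigma_s**2 - diff2 / sigma_i**2)
            num += w * shifted
            den += w
    return num / den
```

Calling `bilateral_color(img, coupled=False)` gives the channel-by-channel version for comparison on the same image.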
What’s wrong with separate channel bilateral filtering?
Channel by channel: Color artifacts around edges. RGB channels are
highly inter-dependent – you shouldn’t treat them as independent.
[Figure panels: separate channel; coupled]
More examples: see figure 4.3 of
[Link]
Color Edges
• A color (RGB) image will have three gradient
vectors – one for each channel.
• We could compute edges separately for each
channel.
• Option: combine (add) the per-channel edge
maps to get a composite edge image. This is not
a good option. Why? (See next slide.)
Color Edges
• Problem: the two circled points have the same edge
strength (mathematically), though one appears to be a
stronger edge.
Point 1: (Rx,Gx,Bx) = (255,255,255), (Ry,Gy,By) = (0,0,0),
Rx^2 + Gx^2 + Bx^2 + Ry^2 + Gy^2 + By^2 = 3*255^2
Point 2: (Rx,Gx,Bx) = (255,255,0), (Ry,Gy,By) = (0,0,255),
Rx^2 + Gx^2 + Bx^2 + Ry^2 + Gy^2 + By^2 = 3*255^2
Color Edge
• We want to ask the question: along which
direction in XY space is the total magnitude of
change in intensity the maximum?
• The squared change in intensity in a direction
(cos ϴ, sin ϴ) is given by (square of the
directional derivative of the intensity):
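\[ E(\theta) = (R_x\cos\theta + R_y\sin\theta)^2 + (G_x\cos\theta + G_y\sin\theta)^2 + (B_x\cos\theta + B_y\sin\theta)^2 \]
\[ = (R_x^2 + G_x^2 + B_x^2)\cos^2\theta + (R_y^2 + G_y^2 + B_y^2)\sin^2\theta + 2\sin\theta\cos\theta\,(R_xR_y + G_xG_y + B_xB_y) \]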
• We want to maximize this w.r.t ϴ. Take derivative
with respect to ϴ and set it to zero.
Color Edge
• This gives the color gradient direction which
makes an angle ϴ w.r.t. the X axis, given by:
\[ \theta = \frac{1}{2}\tan^{-1}\left[ \frac{2\,(R_xR_y + G_xG_y + B_xB_y)}{(R_x^2 + G_x^2 + B_x^2) - (R_y^2 + G_y^2 + B_y^2)} \right] \]
• For a grayscale image, this turns out to be
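\[ \theta = \frac{1}{2}\tan^{-1}\left[ \frac{2 I_x I_y}{I_x^2 - I_y^2} \right] = \frac{1}{2}\tan^{-1}\left[ \frac{2\,(I_y/I_x)}{1 - (I_y/I_x)^2} \right] = \tan^{-1}\left( \frac{I_y}{I_x} \right) \]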
Color Edge
• Consider
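\[ E(\theta) = (R_x^2 + G_x^2 + B_x^2)\cos^2\theta + (R_y^2 + G_y^2 + B_y^2)\sin^2\theta + 2\sin\theta\cos\theta\,(R_xR_y + G_xG_y + B_xB_y) \]
\[ = \begin{pmatrix} \cos\theta & \sin\theta \end{pmatrix} \begin{pmatrix} R_x^2 + G_x^2 + B_x^2 & R_xR_y + G_xG_y + B_xB_y \\ R_xR_y + G_xG_y + B_xB_y & R_y^2 + G_y^2 + B_y^2 \end{pmatrix} \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \]
The 2 x 2 matrix in the middle is the local color gradient matrix.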
• It turns out that the ϴ (i.e. the color gradient) we
derived is given by the eigenvector of this matrix
corresponding to the larger eigenvalue. The direction
perpendicular to it (i.e. the eigenvector corresponding
to the smaller eigenvalue) is the color edge.
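The eigenvector computation can be sketched for a single pixel as follows. The inputs are the per-channel derivatives at that pixel; the function name is illustrative. The larger eigenvalue equals the maximum of E(ϴ) over all directions, so it is used here as the edge strength, and the sign of an eigenvector is arbitrary, so the returned angle is only defined up to 180 degrees.

```python
import numpy as np

def color_gradient(dx, dy):
    """dx = (Rx, Gx, Bx), dy = (Ry, Gy, By) at one pixel.

    Returns (largest eigenvalue of the local color gradient matrix,
             gradient angle theta in radians).
    """
    dx = np.asarray(dx, dtype=np.float64)
    dy = np.asarray(dy, dtype=np.float64)
    M = np.array([[dx @ dx, dx @ dy],
                  [dx @ dy, dy @ dy]])
    eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    v = eigvecs[:, -1]                    # eigenvector of the larger eigenvalue
    theta = np.arctan2(v[1], v[0])
    return eigvals[-1], theta

# The two circled points from the earlier "Color Edges" example.
strength_1, theta_1 = color_gradient([255, 255, 255], [0, 0, 0])
strength_2, theta_2 = color_gradient([255, 255, 0], [0, 0, 255])
```

Unlike the plain sum of squared derivatives, which is 3*255^2 for both points, the larger eigenvalue is 3*255^2 for the first point and 2*255^2 for the second, so the two edges are no longer rated equally strong.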
The YCbCr color space
RGB and correlation coefficient
• The RGB space is inefficient from the point of view of image
compression or transmission.
• This is because there is high correlation between the R,G,B
values at corresponding pixels.
• This is measured by the correlation coefficient which is given
as follows:
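\[ r(x,y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{\sqrt{\sum_{i=1}^{N} (x_i - \mu_x)^2 \, \sum_{i=1}^{N} (y_i - \mu_y)^2}} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{(N-1)\,\sigma_x \sigma_y} \]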
• The values of r lie from -1 to 1. A high absolute value of r
indicates high (positive or negative) correlation and a low
value (close to 0) indicates low correlation.
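The correlation coefficient between two channels can be computed as below. The test data is synthetic: two hypothetical "channels" that share a common brightness component plus a little independent noise, standing in for the R and G channels of a natural image.

```python
import numpy as np

def correlation(x, y):
    """Correlation coefficient r(x, y) between two arrays of equal size."""
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    xc, yc = x - x.mean(), y - y.mean()
    return (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

# Synthetic stand-in for two color channels of the same scene.
rng = np.random.default_rng(0)
shared = rng.random((32, 32))
R = shared + 0.1 * rng.random((32, 32))
G = shared + 0.1 * rng.random((32, 32))
r_RG = correlation(R, G)
```

The same value is returned by `np.corrcoef(R.ravel(), G.ravel())[0, 1]`, which can be used directly in practice.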
RGB and correlation coefficient
• For the following images, the values of r(R,G), r(G,B) and
r(B,R) were all around 0.9.
PCA on RGB values
• Suppose you take N color images and extract
RGB values of each pixel (3 x 1 vector at each
location).
• Now, suppose you build an eigen-space out of
these vectors – you get 3 eigenvectors, each
corresponding to one of 3 eigenvalues.
• The eigen-coefficients are said to be
decorrelated!
• Why? Because if the correlation matrix of the
RGB values is C, and V is the matrix whose columns
are the eigenvectors of C, then the correlation matrix of
the eigen-coefficients is V^T C V, which is a
diagonal matrix.
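The decorrelation can be verified numerically. This sketch uses synthetic, strongly correlated RGB samples (made-up data, not the images from the slides) and assumes C is computed from mean-subtracted values, i.e. as a covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "RGB" samples: a shared brightness plus small per-channel noise.
brightness = rng.random((5000, 1))
pixels = brightness + 0.05 * rng.standard_normal((5000, 3))  # N x 3

centered = pixels - pixels.mean(axis=0)
C = centered.T @ centered / (len(pixels) - 1)      # 3 x 3 matrix for RGB
eigvals, V = np.linalg.eigh(C)                     # columns of V: eigenvectors

coeffs = centered @ V                              # eigen-coefficients, N x 3
C_coeffs = coeffs.T @ coeffs / (len(pixels) - 1)   # equals V^T C V
```

`C` has large off-diagonal entries, whereas `C_coeffs` is diagonal up to rounding error, with the eigenvalues (the variances of the eigen-coefficients) on its diagonal.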
R channel B channel
G channel
Image containing eigencoefficient value corresponding to 1st eigenvector (with maximum eigenvalue)
Image containing eigencoefficient value corresponding to 2nd eigenvector (with second largest eigenvalue)
Image containing eigencoefficient value corresponding to 3rd eigenvector (with least eigenvalue)
The variances of the three eigen-coefficient values:
8411, 159.1, 71.7
YCbCr color space
• The YCbCr color space is a similarly
decorrelated color space with Y being the
luminance channel similar to the V in HSV.
• And Cb, Cr being the two chrominance
channels.
• Y gives intensity information and the “color”
information lies in Cb and Cr.
YCbCr color space
• The RGB to YCbCr conversion is given as
follows:
• The YCbCr to RGB conversion is as follows:
YCbCr color space
• The luminance channel (Y) carries most information
from the point of view of human perception, and the
human eye is less sensitive to changes in chrominance.
• This fact can be used to assign coarser quantization
levels (i.e. fewer bits) for storing or transmitting Cb and
Cr values as compared to the Y channel.
• This improves the compression rate without significant
loss in perceptual quality.
• The JPEG standard for color image compression uses
the YCbCr format. For an image of size M x N x 3, it
typically stores Y with full resolution (i.e. as an M x N image),
and Cb and Cr with 25% resolution, i.e. as M/2 x N/2
images.
Y channel Cr channel
Cb channel
The correlation coefficients between Y
and Cb, and between Y and Cr are
around 0.1 for this image. The Cb and Cr
correlation coefficient is around -0.4.
Where do the formulae for YCbCr
emerge from?
• From another color space used earlier, called
the YUV space, given as follows:
Y = 0.3R + 0.6G + 0.1B
U = B - Y
V = R - Y
[Link]
• Here U and V are the chroma (or chrominance)
components, and Y is the luma component.
Where do the formulae for YCbCr
emerge from?
• The formula for Y was obtained by
psychovisual experiments that estimated the
amount of red, green and blue that human
users perceive. This is roughly proportional to the
percentage of red, green and blue cones in the
retina, which are around 33%, 65% and 2%
respectively, though the blue cones are the most
sensitive.
Where do the formulae for YCbCr
emerge from?
• For RGB values in the [0,1] range, the value of
U lies in [-0.9,0.9] and the value of V lies in
[-0.7,0.7].
• In the YCbCr scheme, the Cb and Cr values are
scaled and shifted versions of U and V
respectively given as follows:
Cb = U/2 + 0.5
Cr = V/1.6 + 0.5
Beyond color: Hyperspectral images
• Hyperspectral images are images of the form
M x N x L, where L is the number of channels.
L can range from 30 to 30,000 or more.
• Finer division of wavelengths than possible in
RGB!
• Can contain wavelengths in the infrared or
ultraviolet regime.
Sources of confusion
• Hyperspectral images are abbreviated as HSI!
• Hyperspectral images are different from
multispectral images. The latter contain few, discrete
and discontinuous wavelengths. The former contain
many more wavelengths with continuity.
Beyond color: Hyperspectral images
• Widely used in remote sensing (satellite
images) – often different
materials/geographical entities (soil, water,
vegetation, concrete, landmines, mountains,
etc.) can be detected/classified by spectral
properties.
• Also used in chemistry, pharmaceutical
industry and pathology for classification of
materials/tissues.
Example multispectral image with 6 bands
Reference color image
Color Image Acquisition: Mosaicing
and demosaicing
• In grayscale image acquisition, the image is captured by a
CCD array.
• One would imagine there would need to be three such
CCD arrays of equal size for RGB (color) images.
• But CCD arrays are expensive! And there would be
spatial alignment issues in the R, G, B channel images
due to difference in location of the RGB sensors.
• So this is not followed in practice.
Color Image Acquisition: Mosaicing
and demosaicing
• Instead at any pixel, only one out of the RGB
values is stored.
• This is accomplished in hardware by means of
a color filter array (CFA).
Color Filter Arrays
• A CFA is an array of tiny color filters placed
before the image sensor array of a camera.
• The resolution of this array is the same as that
of the image sensor array.
• Each color filter may allow a different
wavelength of light to pass – this is pre-
determined during the camera design.
Color Filter Arrays
• The most common type of CFA is the Bayer
pattern which is shown below:
[Link]
• The Bayer pattern collects information at red,
green, blue wavelengths only in a repeated
pattern shown above.
*The word “mosaic” or “mosaiced” is not to be confused with image panorama
generation which is also called image mosaicing.
Color Filter Arrays
• The Bayer pattern uses twice the number of green
elements as compared to red or blue elements.
• This is because both the M and L cone cells of the retina are
sensitive to green light.
• The raw (uncompressed) output of the Bayer pattern is
called the Bayer pattern image or the mosaiced (*)
image.
• The mosaiced image needs to be converted to a normal
RGB image by a process called color image demosaicing.
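To make the mosaiced image concrete, the sketch below simulates it from a full RGB image by keeping one channel per pixel. It assumes an "RGGB" layout (red at even rows and even columns, blue at odd rows and odd columns); actual sensors may start the pattern at a different corner.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an (H, W, 3) image with an RGGB Bayer pattern; returns (H, W)."""
    rgb = np.asarray(rgb)
    H, W, _ = rgb.shape
    mosaic = np.zeros((H, W), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red: even rows, even columns
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green: even rows, odd columns
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green: odd rows, even columns
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue: odd rows, odd columns
    return mosaic
```

Half of the retained samples are green and a quarter each are red and blue, matching the 2:1:1 ratio of the Bayer pattern. Demosaicing then has to estimate the two missing values at every pixel.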
[Link]
“original scene” /wiki/Bayer_filter
Mosaiced image
Mosaiced image – just
coded with the Bayer
filter colors
“Demosaiced” image –
obtained by
interpolating the
missing color values at
all the pixels
A Demosaicing Algorithm
• There exists a plethora of demosaicing
algorithms.
• We will study one that uses the bilateral filter.
A Demosaicing Algorithm
• Why not just use simple linear interpolation to
fill in missing RGB values?
• It produces a color fringe artifact along edges.
• See figures 2 and 11 of the article
“Demosaicing methods for Bayer color arrays”
by Ramanath et al (2002, Journal of Electronic
Imaging)
A Demosaicing Algorithm: bilateral
filter
Assumption:
The values in the R, G, B channels
are highly correlated with one
another. Make use of this to
interpolate missing values in any of
the channels
Approach:
To estimate green values at a red/blue pixel, compute gradient for the green
channel in horizontal or vertical directions. Interpolate the green color using
pixels along the direction with the smaller gradient (i.e. along the edge).
To estimate red values at a green pixel, use a bilateral filter with spatial weights
defined as usual, and intensity weights using the differences between the
interpolated green values (red and blue values not used as they are not available).
For results and description of the approach, see the following paper:
Ramanath and Snyder, “Adaptive demosaicking”, Journal of Electronic Imaging,
2003
Mosaicing – a broader perspective
• Mosaicing is an example of acquiring a signal in
compressed format.
• Why? Because the underlying RGB image with N
pixels has some 3N values, but only N values are
measured by the camera.
• A software routine (demosaicing algorithm) then
interpolates the remaining values.
Mosaicing – a broader perspective
• This can be written in the form:
\[ y = \Phi x, \quad y \in \mathbb{R}^m,\; x \in \mathbb{R}^n,\; \Phi \in \mathbb{R}^{m \times n},\; m < n \]
• Recovering x from y and Φ is an ill-posed
problem – as the number of knowns is less
than the number of unknowns.
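For the Bayer pattern, Φ is simply a selection matrix with a single 1 in each row. The sketch below builds it for a tiny image, assuming an RGGB layout and an x obtained by flattening the RGB image in (row, column, channel) order; both conventions are choices made for this illustration.

```python
import numpy as np

def bayer_sensing_matrix(H, W):
    """Build Phi (m x n with m = H*W, n = 3*H*W) for an RGGB Bayer pattern."""
    m, n = H * W, 3 * H * W
    Phi = np.zeros((m, n))
    for i in range(H):
        for j in range(W):
            if i % 2 == 0 and j % 2 == 0:
                c = 0   # red site
            elif i % 2 == 1 and j % 2 == 1:
                c = 2   # blue site
            else:
                c = 1   # green site
            # Measurement (i, j) picks channel c of pixel (i, j) in x.
            Phi[i * W + j, (i * W + j) * 3 + c] = 1.0
    return Phi

Phi = bayer_sensing_matrix(4, 4)
x = np.arange(48.0)   # stands in for a flattened 4 x 4 x 3 RGB image
y = Phi @ x           # the 16 mosaiced measurements
```

Here m = 16 and n = 48: each measurement fixes one entry of x and says nothing about the other two values at that pixel, which is why extra assumptions (such as correlation between channels) are needed to recover x.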
Mosaicing – a broader perspective
• But there is some theory that x can actually be
recovered without error if x is a sparse vector and
if Φ obeys certain properties.
• This is called the theory of compressed sensing,
and is a very active area of research in signal and
image processing (we will cover this theory at the
end of the computer vision course).
• The Φ matrix in case of the Bayer pattern does
not satisfy the required properties however.