Image Pre-Processing


             Ashish Khare
Local pre-processing
   Pre-processing methods use a small
    neighborhood of a pixel in an input image to get a
    new brightness value in the output image.

   Such pre-processing operations are also called
    filtration (or filtering).

   Local pre-processing methods can be divided into
    two groups according to the goal of the
    processing: smoothing and gradient operators.
   Smoothing aims to suppress noise or other small
    fluctuations in the image
     equivalent to the suppression of high
        frequencies in the frequency domain.

   Unfortunately, smoothing also blurs all sharp
    edges that bear important information about the
    image.
   Gradient operators are based on local
    derivatives of the image function.

   Derivatives are bigger at locations of the image
    where the image function undergoes rapid
    changes. The aim of gradient operators is to
    indicate such locations in the image.

   Gradient operators have an effect similar to
    suppressing low frequencies in the frequency
    domain.
   Noise is often high frequency in nature;
    unfortunately, if a gradient operator is applied to
    an image the noise level increases
    simultaneously.

   Clearly, smoothing and gradient operators have
    conflicting aims.

   Some pre-processing algorithms solve this
    problem and permit smoothing and edge
    enhancement simultaneously.
   Another classification of local pre-processing
    methods is according to the transformation
    properties.

   Linear and nonlinear transformations can be
    distinguished.

   Linear operations calculate the resulting value in
    the output image pixel g(i,j) as a linear
    combination of brightnesses in a local
    neighborhood of the pixel f(i,j) in the input image.
   The contribution of the pixels in the neighborhood
    O is weighted by coefficients h:

       $$ g(i,j) = \sum_{(m,n) \in O} h(i-m,\, j-n)\, f(m,n) $$

   The above equation is equivalent to discrete
    convolution with the kernel h, which is called a
    convolution mask.
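
   The linear combination described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `convolve2d`, the list-of-lists image representation, and the choice to copy border pixels unchanged are not from the slides.

```python
def convolve2d(f, h):
    """Return g with g(i,j) = sum over the mask of h(a,b) * f(i-a, j-b)."""
    rows, cols = len(f), len(f[0])
    kr, kc = len(h) // 2, len(h[0]) // 2      # mask half-sizes (odd-sized mask assumed)
    g = [list(row) for row in f]              # assumption: border pixels are copied unchanged
    for i in range(kr, rows - kr):
        for j in range(kc, cols - kc):
            acc = 0.0
            for a in range(-kr, kr + 1):
                for b in range(-kc, kc + 1):
                    # h[a + kr][b + kc] holds the coefficient h(a, b)
                    acc += h[a + kr][b + kc] * f[i - a][j - b]
            g[i][j] = acc
    return g


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12]]
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
shift_right = [[0, 0, 0],
               [0, 0, 1],
               [0, 0, 0]]                     # h(0, 1) = 1, so g(i,j) = f(i, j-1)
unchanged = convolve2d(image, identity)
shifted = convolve2d(image, shift_right)
```

   The shifting mask makes the index flip of convolution visible: a coefficient to the right of the mask center picks up the pixel to the left.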
   Rectangular neighborhoods O are often used with
    an odd number of pixels in rows and columns,
    enabling the specification of the central pixel of
    the neighborhood.

   Local pre-processing methods typically use very
    little a priori knowledge about the image contents.
    It is very difficult to infer this knowledge while an
    image is processed as the known neighborhood
    O of the processed pixel is small.
   The choice of the local transformation, size, and
    shape of the neighborhood O depends strongly
    on the size of objects in the processed image.

   If objects are rather large, an image can be
    enhanced by smoothing of small degradations.
Image Smoothing
   Image smoothing is the set of local pre-
    processing methods which have the aim of
    suppressing image noise - it uses redundancy in
    the image data.

   Calculation of the new value is based on
    averaging of brightness values in some
    neighborhood O.
   Smoothing poses the problem of blurring sharp
    edges in the image, and so we shall concentrate
    on smoothing methods which are edge
    preserving. They are based on the general idea
    that the average is computed only from those
    points in the neighborhood which have similar
    properties to the processed point.

   Local image smoothing can effectively eliminate
    impulsive noise or degradations appearing as thin
    stripes, but does not work if degradations are
    large blobs or thick stripes.
Averaging
   Assume that the noise value ν at each pixel is an
    independent random variable with zero mean
    and standard deviation σ.
   We can obtain several such images by capturing the
    same static scene several times.
   The result of smoothing is an average of the
    same n points in these images g_1,...,g_n with
    noise values ν_1,...,ν_n:

       $$ \frac{g_1 + \cdots + g_n}{n} + \frac{\nu_1 + \cdots + \nu_n}{n} $$

   The second term here describes the effect of
    the noise: a random value with zero mean and
    standard deviation σ/√n.

   Thus if n images of the same scene are
    available, the smoothing can be accomplished
    without blurring the image by

       $$ f(i,j) = \frac{1}{n} \sum_{k=1}^{n} g_k(i,j) $$
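
   The effect of averaging several captures can be checked numerically. The sketch below simulates the captures with Gaussian noise; the scene, the noise level of 10, the 16 captures, and all function names are assumptions made for the illustration.

```python
import random
import statistics


def noisy_copy(scene, sigma, rng):
    """Add independent zero-mean Gaussian noise with standard deviation sigma."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in scene]


def average_images(images):
    """Pixel-wise mean of n images of the same static scene."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[i][j] for img in images) / n for j in range(cols)]
            for i in range(rows)]


def residual_std(image, scene):
    """Standard deviation of the remaining noise (image minus true scene)."""
    diffs = [image[i][j] - scene[i][j]
             for i in range(len(scene)) for j in range(len(scene[0]))]
    return statistics.pstdev(diffs)


rng = random.Random(0)                        # fixed seed, for repeatability only
scene = [[100.0 if j >= 10 else 20.0 for j in range(20)] for _ in range(20)]
captures = [noisy_copy(scene, 10.0, rng) for _ in range(16)]
averaged = average_images(captures)
single_std = residual_std(captures[0], scene)
averaged_std = residual_std(averaged, scene)  # theory suggests about 10 / sqrt(16)
```

   Because every pixel is averaged only with itself across captures, the step edge in the scene is not blurred while the noise shrinks.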
   In many cases only one image with noise is
    available, and averaging is then realized in a local
    neighborhood.

   Results are acceptable if the noise is smaller in
    size than the smallest objects of interest in the
    image, but blurring of edges is a serious
    disadvantage.
   Averaging is a special case of discrete
    convolution. For a 3 x 3 neighborhood the
    convolution mask h is

       $$ h = \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} $$
Averaging with Limited Data Validity
   Methods that average with limited data validity
    try to avoid blurring by averaging only those
    pixels which satisfy some criterion, the aim
    being to prevent involving pixels that are part of
    a separate feature.

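   A very simple criterion is to define a
    brightness interval of invalid data [min,max]
    and to apply averaging only to pixels whose
    brightness lies in that interval.
   Consider a point (m,n) in the image. If the
    point (m,n) has a valid intensity, then
    nothing is done.

   However, if a point (m,n) has an invalid gray
    level, then the convolution mask is calculated in
    the neighborhood O from the nonlinear formula

       $$ h(i,j) = \begin{cases} 1 & \text{for } g(m+i,\, n+j) \notin [\text{min}, \text{max}] \\ 0 & \text{otherwise} \end{cases} $$

    so that only valid neighbors contribute to the average.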
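
   A minimal sketch of this first criterion follows, assuming a 3 x 3 neighborhood and assuming that a pixel with no valid neighbor is left as it is. The name `repair_invalid` and its parameters are hypothetical.

```python
def repair_invalid(g, invalid_min, invalid_max):
    """Replace each invalid pixel by the mean of its valid 3 x 3 neighbors."""
    def invalid(v):
        return invalid_min <= v <= invalid_max

    rows, cols = len(g), len(g[0])
    out = [list(row) for row in g]
    for m in range(rows):
        for n in range(cols):
            if not invalid(g[m][n]):
                continue                      # valid intensity: nothing is done
            valid = [g[m + i][n + j]
                     for i in (-1, 0, 1) for j in (-1, 0, 1)
                     if 0 <= m + i < rows and 0 <= n + j < cols
                     and not invalid(g[m + i][n + j])]
            if valid:                         # assumption: skip if no valid neighbor exists
                out[m][n] = sum(valid) / len(valid)
    return out


spike = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
repaired = repair_invalid(spike, 250, 255)
corner = repair_invalid([[10, 20], [255, 30]], 250, 255)
```

   Valid pixels are never modified, so edges between valid regions are not blurred by this repair.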
   A second method performs the averaging
    only if the computed brightness change of a
    pixel is in some predefined interval.

   This method permits repair to large-area
    errors resulting from slowly changing
    brightness of the background without
    affecting the rest of the image.
   A third method uses edge strength (i.e.,
    magnitude of a gradient) as a criterion.

   The magnitude of some gradient operator is
    first computed for the entire image, and only
    pixels in the input image with a gradient
    magnitude smaller than a predefined
    threshold are used in averaging.
Averaging according to inverse gradient
   The convolution mask is calculated at each pixel
    according to the inverse gradient.

   Brightness change within a region is usually
    smaller than between neighboring regions.

   Let (i,j) be the central pixel of a convolution mask
    with odd size; the inverse gradient δ at the point
    (m,n) with respect to (i,j) is then

       $$ \delta(i,j,m,n) = \frac{1}{|g(m,n) - g(i,j)|} $$

   If g(m,n) = g(i,j) then we define δ(i,j,m,n) = 2;

   the inverse gradient is then in the interval (0,2],
    and is smaller on the edge than in the interior of
    a homogeneous region.

   Weight coefficients in the convolution mask h are
    normalized by the inverse gradient, and the
    whole term is multiplied by 0.5 to keep brightness
    values in the original range:

       $$ h(i,j,m,n) = 0.5\, \frac{\delta(i,j,m,n)}{\sum_{(m,n) \in O} \delta(i,j,m,n)} $$

   The constant 0.5 has the effect of assigning half
    the weight to the central pixel (i,j), and the other
    half to its neighborhood.

   The convolution mask coefficient corresponding
    to the central pixel is defined as h(i,j) = 0.5.

   The above method assumes sharp edges.
   When the convolution mask is close to an edge,
    pixels from the region have larger coefficients
    than pixels near the edge, and it is not blurred.
    Isolated noise points within homogeneous
    regions have small values of the inverse
    gradient; points from the neighborhood take
    part in averaging and the noise is removed.
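
   The weighting scheme can be sketched as follows for a 3 x 3 mask. The sketch assumes that the inverse gradient is 1/|g(m,n) - g(i,j)| (set to 2 for equal values), that the normalizing sum runs over the eight neighbors only, and that border pixels are copied; the function name is hypothetical.

```python
def inverse_gradient_smooth(g):
    """Average each interior pixel with weights given by the inverse gradient."""
    rows, cols = len(g), len(g[0])
    out = [[float(v) for v in row] for row in g]   # borders copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            weighted = []
            for m in range(i - 1, i + 2):
                for n in range(j - 1, j + 2):
                    if (m, n) == (i, j):
                        continue
                    diff = abs(g[m][n] - g[i][j])
                    delta = 2.0 if diff == 0 else 1.0 / diff
                    weighted.append((delta, g[m][n]))
            total = sum(d for d, _ in weighted)
            neighbors = sum(d * v for d, v in weighted) / total
            # half of the weight goes to the central pixel, half to its neighborhood
            out[i][j] = 0.5 * g[i][j] + 0.5 * neighbors
    return out


flat = inverse_gradient_smooth([[7, 7, 7], [7, 7, 7], [7, 7, 7]])
spike = inverse_gradient_smooth([[10, 10, 10], [10, 100, 10], [10, 10, 10]])
```

   In a single pass the isolated spike of 100 is pulled halfway toward its surroundings (to 55); repeated passes would reduce it further.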
Averaging using a rotating mask
   avoids edge blurring by searching for the
    homogeneous part of the current pixel
    neighborhood

   the resulting image is in fact sharpened

   brightness average is calculated only within
    the homogeneous region
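   a brightness dispersion σ² is used as the
    region homogeneity measure.

   let n be the number of pixels in a region R
    and g(i,j) be the input image. Dispersion σ²
    is calculated as

       $$ \sigma^2 = \frac{1}{n} \sum_{(i,j) \in R} \left( g(i,j) - \frac{1}{n} \sum_{(i,j) \in R} g(i,j) \right)^2 $$

   The computational complexity (number of
    multiplications) of the dispersion calculation can
    be reduced if expressed as follows

       $$ \sigma^2 = \frac{1}{n} \left( \sum_{(i,j) \in R} g(i,j)^2 - \frac{\left( \sum_{(i,j) \in R} g(i,j) \right)^2}{n} \right) $$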
   Rotated masks
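
   A simplified sketch of the rotating-mask idea is given below. As an assumption, the candidate regions are all 3 x 3 windows that contain the current pixel and fit inside the image, which is not necessarily the exact mask set shown on the slide; the dispersion uses the reduced form with sums of values and squared values. All names are hypothetical.

```python
def dispersion_and_mean(values):
    """Dispersion in the reduced form (sum of squares minus squared sum / n) / n."""
    n = len(values)
    s = sum(values)
    s2 = sum(v * v for v in values)
    return (s2 - s * s / n) / n, s / n


def rotating_mask_smooth(g):
    """Replace each pixel by the mean of the most homogeneous window containing it."""
    rows, cols = len(g), len(g[0])
    out = [[float(v) for v in row] for row in g]
    for i in range(rows):
        for j in range(cols):
            best = None
            for ci in range(i - 1, i + 2):          # candidate window centers
                for cj in range(j - 1, j + 2):
                    if not (1 <= ci < rows - 1 and 1 <= cj < cols - 1):
                        continue                    # window would leave the image
                    window = [g[a][b]
                              for a in range(ci - 1, ci + 2)
                              for b in range(cj - 1, cj + 2)]
                    disp, mean = dispersion_and_mean(window)
                    if best is None or disp < best[0]:
                        best = (disp, mean)
            if best is not None:
                out[i][j] = best[1]
    return out


edge = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
smoothed = rotating_mask_smooth(edge)
disp, mean = dispersion_and_mean([1, 2, 3])
```

   On the ideal step edge every pixel finds a window lying entirely on its own side, so the edge passes through unblurred.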
Median Smoothing
   In a set of ordered values, the median is the
    central value.

   Median filtering reduces blurring of edges.

   The idea is to replace the current point in the
    image by the median of the brightness in its
    neighborhood.

   not affected by individual noise spikes
   eliminates impulsive noise quite well

   does not blur edges much and can be applied
    iteratively.

   The main disadvantage of median filtering in a
    rectangular neighborhood is its damaging of
    thin lines and sharp corners in the image --
    this can be avoided if another shape of
    neighborhood is used.
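
   The behavior described above, including the damage to thin lines, can be reproduced with a small 3 x 3 median filter. This is a sketch that assumes a square neighborhood and copies border pixels unchanged; the function name is hypothetical.

```python
import statistics


def median_filter(g):
    """Replace each interior pixel by the median of its 3 x 3 neighborhood."""
    rows, cols = len(g), len(g[0])
    out = [list(row) for row in g]            # borders copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            window = [g[i + a][j + b] for a in (-1, 0, 1) for b in (-1, 0, 1)]
            out[i][j] = statistics.median(window)
    return out


spike = median_filter([[10, 10, 10], [10, 255, 10], [10, 10, 10]])
edge = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edge_out = median_filter(edge)
thin_line = [[0, 0, 100, 0, 0] for _ in range(5)]
line_out = median_filter(thin_line)
```

   The impulse is removed and the step edge is untouched, but the one-pixel-wide line is erased in the interior, which is the disadvantage noted above for rectangular neighborhoods.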
Edge Detectors
   locate sharp changes in the intensity function

   edges are pixels where brightness changes
    abruptly.

   Calculus describes changes of continuous
    functions using derivatives; an image function
    depends on two variables - partial derivatives.
   A change of the image function can be
    described by a gradient that points in the
    direction of the largest growth of the image
    function.

   An edge is a property attached to an
    individual pixel and is calculated from the
    image function behavior in a neighborhood of
    the pixel.
   It is a vector variable with two components:
       magnitude of the gradient
       direction Φ
   The gradient direction gives the direction of
    maximal growth of the function, e.g., from
    black (f(i,j)=0) to white (f(i,j)=255).

   This is illustrated below; closed lines are
    lines of the same brightness.

   The orientation 0° points East.
   Edges are often used in image analysis for
    finding region boundaries.

   Boundary and its parts (edges) are
    perpendicular to the direction of the
    gradient.
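   The gradient magnitude and gradient
    direction are continuous image functions

       $$ |\operatorname{grad}\, g(x,y)| = \sqrt{ \left( \frac{\partial g}{\partial x} \right)^2 + \left( \frac{\partial g}{\partial y} \right)^2 } $$

       $$ \Phi = \arg\!\left( \frac{\partial g}{\partial x},\, \frac{\partial g}{\partial y} \right) $$

    where arg(x,y) is the angle (in radians) from
    the x-axis to the point (x,y).
   Sometimes we are interested only in edge
    magnitudes without regard to their orientations.
   The Laplacian may then be used:

       $$ \nabla^2 g(x,y) = \frac{\partial^2 g(x,y)}{\partial x^2} + \frac{\partial^2 g(x,y)}{\partial y^2} $$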
   The Laplacian has the same properties in all
    directions and is therefore invariant to rotation in
    the image.
   Image sharpening makes edges steeper -- the
    sharpened image is intended to be observed by a
    human.

       $$ f(i,j) = g(i,j) - C\, S(i,j) $$

   Here g is the input image and f the sharpened
    output. C is a positive coefficient which gives the
    strength of sharpening and S(i,j) is a measure of
    the image function sheerness that is calculated
    using a gradient operator.
   The Laplacian is very often used to estimate
    S(i,j).
Laplace Operator
   The Laplace operator ∇² is a very
    popular operator approximating the second
    derivative which gives the gradient
    magnitude only.

   The Laplacian is approximated in digital
    images by a convolution sum.
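   3 x 3 masks for the 4-neighborhood and the
    8-neighborhood:

       $$ h = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad h = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -8 & 1 \\ 1 & 1 & 1 \end{bmatrix} $$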
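   A Laplacian operator with stressed
    significance of the central pixel or its
    neighborhood is sometimes used. In this
    approximation it loses invariance to rotation.

       $$ h = \begin{bmatrix} 2 & -1 & 2 \\ -1 & -4 & -1 \\ 2 & -1 & 2 \end{bmatrix}, \qquad h = \begin{bmatrix} -1 & 2 & -1 \\ 2 & -4 & 2 \\ -1 & 2 & -1 \end{bmatrix} $$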
   The Laplacian operator has a
    disadvantage -- it responds doubly to
    some edges in the image.
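
   The double response and the sharpening effect can both be seen on a small ramp edge. The sketch assumes the usual 4-neighborhood Laplacian mask (0 1 0 / 1 -4 1 / 0 1 0) and assumes the sharpened image is computed as g - C·S with S the Laplacian; function names and the zero border response are choices made for this illustration.

```python
LAPLACE_4 = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def laplacian(g, mask=LAPLACE_4):
    """Response of a 3 x 3 Laplacian mask; border responses are left at zero."""
    rows, cols = len(g), len(g[0])
    s = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            s[i][j] = sum(mask[a + 1][b + 1] * g[i + a][j + b]
                          for a in (-1, 0, 1) for b in (-1, 0, 1))
    return s


def sharpen(g, c):
    """Sharpened image g - c * S, with the Laplacian used as S (assumed form)."""
    s = laplacian(g)
    return [[g[i][j] - c * s[i][j] for j in range(len(g[0]))]
            for i in range(len(g))]


ramp = [[0, 0, 50, 100, 100] for _ in range(3)]
response = laplacian(ramp)[1]
sharpened = sharpen(ramp, 0.5)[1]
```

   The response has a positive and a negative lobe around the single edge, and subtracting it steepens the ramp with some overshoot on both sides.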
   Image sharpening / edge detection can be
    interpreted in the frequency domain as well.

   The result of the Fourier transform is a
    combination of harmonic functions.

   The derivative of the harmonic function sin (nx) is
    n cos (nx); thus the higher the frequency, the
    higher the magnitude of its derivative.

   This is another explanation of why gradient
    operators enhance edges.
   Unsharp masking is often used in printing
    industry applications - another image
    sharpening approach.

   A signal proportional to an unsharp image
    (e.g., blurred by some smoothing operator)
    is subtracted from the original image, again
    a parameter C may be used to control the
    weight of the subtraction.
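
   A possible reading of unsharp masking is sketched below. The 3 x 3 mean as the blurring operator and the division by (1 - C), which keeps flat regions at their original brightness, are assumptions added for the illustration and are not stated on the slide.

```python
def mean3x3(g):
    """3 x 3 averaging; border pixels are copied unchanged."""
    rows, cols = len(g), len(g[0])
    out = [[float(v) for v in row] for row in g]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = sum(g[i + a][j + b]
                            for a in (-1, 0, 1) for b in (-1, 0, 1)) / 9.0
    return out


def unsharp_mask(g, c):
    """Subtract c times the blurred image, then rescale by 1 / (1 - c)."""
    if not 0.0 <= c < 1.0:
        raise ValueError("c must be in [0, 1)")
    u = mean3x3(g)
    return [[(g[i][j] - c * u[i][j]) / (1.0 - c) for j in range(len(g[0]))]
            for i in range(len(g))]


flat = unsharp_mask([[10, 10, 10, 10] for _ in range(4)], 0.5)
edge = unsharp_mask([[0, 0, 100, 100] for _ in range(4)], 0.5)
```

   Flat regions are unchanged, while the two sides of the edge are pushed apart, which increases the local contrast.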
   A digital image is discrete in nature ... derivatives
    must be approximated by differences.

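   The first differences of the image g in the vertical
    direction (for fixed i) and in the horizontal
    direction (for fixed j) are given by

       $$ \Delta_j\, g(i,j) = g(i,j) - g(i,\, j-n) $$

       $$ \Delta_i\, g(i,j) = g(i,j) - g(i-n,\, j) $$

    where n is a small integer, usually 1.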

   The value n should be chosen small enough
    to provide a good approximation to the
    derivative, but large enough to neglect
    unimportant changes in the image function.
   Symmetric expressions for the difference
    are not usually used because they neglect
    the impact of the pixel (i,j) itself.
   Gradient operators can be divided into
    three categories
I

   Operators approximating derivatives of the image
    function using differences.

   Operators that are rotationally invariant (e.g.,
    the Laplacian) need one convolution mask only.

   Operators approximating first derivatives use
    several masks ... the orientation is estimated on
    the basis of the best matching of several simple
    patterns.
II

   Operators based on the zero crossings of
    the image function second derivative (e.g.,
    Marr-Hildreth or Canny edge detector).
III

   Operators which attempt to match an image
    function to a parametric model of edges.

   This category will not be covered here;
    parametric models describe edges more
    precisely than simple edge magnitude and
    direction and are much more
    computationally intensive.
   Individual gradient operators that examine
    small local neighborhoods are in fact
    convolutions and can be expressed by
    convolution masks.

   Operators which are able to detect edge
    direction as well are represented by a
    collection of masks, each corresponding to
    a certain direction.
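Roberts Operator
   The Roberts operator uses only a 2 x 2
    neighborhood of the current pixel; its
    convolution masks are

       $$ h_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad h_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} $$

   so the magnitude of the edge is computed as

       $$ |g(i,j) - g(i+1,\, j+1)| + |g(i,\, j+1) - g(i+1,\, j)| $$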
   The primary disadvantage of the Roberts
    operator is its high sensitivity to noise, because
    very few pixels are used to approximate the
    gradient.
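
   A short sketch of the Roberts magnitude follows, assuming the usual form with two diagonal differences in a 2 x 2 neighborhood. The function name is hypothetical, and the output is one row and one column smaller than the input.

```python
def roberts_magnitude(g):
    """Sum of absolute diagonal differences in each 2 x 2 neighborhood."""
    rows, cols = len(g), len(g[0])
    return [[abs(g[i][j] - g[i + 1][j + 1]) + abs(g[i][j + 1] - g[i + 1][j])
             for j in range(cols - 1)]
            for i in range(rows - 1)]


edge = roberts_magnitude([[0, 0, 100],
                          [0, 0, 100],
                          [0, 0, 100]])
noise = roberts_magnitude([[10, 10, 10],
                           [10, 40, 10],
                           [10, 10, 10]])
```

   A single noisy pixel produces a response in all four 2 x 2 neighborhoods that contain it, which illustrates the sensitivity to noise.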
Prewitt Operator
   The Prewitt operator, similarly to the Sobel,
    Kirsch, Robinson (as discussed later) and
    some other operators, approximates the first
    derivative.

   Operators approximating first derivative of
    an image function are sometimes called
    compass operators because of the ability to
    determine gradient direction.
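   The gradient is estimated in eight (for a 3 x 3
    convolution mask) possible directions, and the
    convolution result of greatest magnitude indicates
    the gradient direction. Larger masks are possible.

       $$ h_1 = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 0 & 1 & 1 \\ -1 & 0 & 1 \\ -1 & -1 & 0 \end{bmatrix}, \quad h_3 = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \ \ldots $$

   Only the first three masks are shown; the others
    are obtained by rotation.
   The direction of the gradient is given by the mask
    giving maximal response. This is valid for all
    following operators approximating the first
    derivative.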
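Sobel Operator

       $$ h_1 = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & 1 \\ -2 & -1 & 0 \end{bmatrix}, \quad h_3 = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \ \ldots $$

   Used as a simple detector of horizontality and
    verticality of edges, in which case only masks h1
    and h3 are used.
   If the h1 response is y and the h3 response x, we
    might then derive edge strength (magnitude) as

       $$ \sqrt{x^2 + y^2} \quad \text{or} \quad |x| + |y| $$

   and direction as arctan (y / x).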
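
   Edge strength and direction from the two Sobel responses can be sketched as below. The masks are the standard Sobel h1 and h3, written out here as an assumption; the masks are laid over the image without flipping, and `math.atan2` is used instead of a plain arctan so that all four quadrants are resolved. Function names are hypothetical.

```python
import math

H1 = [[1, 2, 1],
      [0, 0, 0],
      [-1, -2, -1]]       # responds to brightness growing toward the top
H3 = [[-1, 0, 1],
      [-2, 0, 2],
      [-1, 0, 1]]         # responds to brightness growing toward the right


def mask_response(g, h, i, j):
    """Sum of mask coefficients times the pixels under them, centered at (i, j)."""
    return sum(h[a + 1][b + 1] * g[i + a][j + b]
               for a in (-1, 0, 1) for b in (-1, 0, 1))


def sobel(g, i, j):
    """Return (magnitude, direction in radians) at an interior pixel."""
    y = mask_response(g, H1, i, j)
    x = mask_response(g, H3, i, j)
    return math.hypot(x, y), math.atan2(y, x)


bright_right = [[0, 0, 100], [0, 0, 100], [0, 0, 100]]
bright_top = [[100, 100, 100], [50, 50, 50], [0, 0, 0]]
mag_right, dir_right = sobel(bright_right, 1, 1)
mag_top, dir_top = sobel(bright_top, 1, 1)
```

   With these conventions a direction of 0 points East (toward the bright right side) and π/2 points toward the bright top of the image.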
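Robinson Operator

       $$ h_1 = \begin{bmatrix} 1 & 1 & 1 \\ 1 & -2 & 1 \\ -1 & -1 & -1 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 1 & 1 & 1 \\ -1 & -2 & 1 \\ -1 & -1 & 1 \end{bmatrix}, \quad h_3 = \begin{bmatrix} -1 & 1 & 1 \\ -1 & -2 & 1 \\ -1 & 1 & 1 \end{bmatrix}, \ \ldots $$

Kirsch Operator

       $$ h_1 = \begin{bmatrix} 3 & 3 & 3 \\ 3 & 0 & 3 \\ -5 & -5 & -5 \end{bmatrix}, \quad h_2 = \begin{bmatrix} 3 & 3 & 3 \\ -5 & 0 & 3 \\ -5 & -5 & 3 \end{bmatrix}, \quad h_3 = \begin{bmatrix} -5 & 3 & 3 \\ -5 & 0 & 3 \\ -5 & 3 & 3 \end{bmatrix}, \ \ldots $$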
Marr-Hildreth Edge Detection:
Zero crossings of the second derivative

   Edge detection techniques like the Kirsch,
    Sobel, Prewitt operators are based on
    convolution in very small neighborhoods and
    work well for specific images only.

   The main disadvantage of these edge
    detectors is their dependence on the size of
    objects and sensitivity to noise.
   The Marr-Hildreth edge detection technique,
    based on the zero crossings of the second
    derivative explores the fact that a step edge
    corresponds to an abrupt change in the
    image function.

   The first derivative of the image function
    should have an extreme at the position
    corresponding to the edge in the image, and
    so the second derivative should be zero at
    the same position.
   It is much easier and more precise to find a
    zero crossing position than an extremum.
   Robust calculation of the 2nd derivative:
       smooth an image first (to reduce noise) and then
        compute second derivatives.
       The 2D Gaussian smoothing operator G(x,y) is

           $$ G(x,y) = e^{-\frac{x^2 + y^2}{2\sigma^2}} $$
   The standard deviation sigma is the only
    parameter of the Gaussian filter - it is
    proportional to the size of neighborhood on
    which the filter operates.

   Pixels more distant from the center of the
    operator have smaller influence, and pixels
    further than 3 sigma from the center have
    negligible influence.
   Goal is to get second derivative of a
    smoothed 2D function f(x,y) ... the Laplacian
    operator gives the second derivative, and is
    moreover non-directional (isotropic).

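   Consider then the Laplacian of an image
    f(x,y) smoothed by a Gaussian (LoG):

       $$ \nabla^2 \left[ G(x,y,\sigma) * f(x,y) \right] $$

   The order of differentiation and convolution
    can be interchanged due to linearity of the
    operations:

       $$ \left[ \nabla^2 G(x,y,\sigma) \right] * f(x,y) $$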
   The derivative of the Gaussian filter is
    independent of the image under
    consideration and can be precomputed
    analytically reducing the complexity of the
    composite operation.
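   Using the substitution r² = x² + y², where r
    measures distance from the origin
    (reasonable as the Gaussian is circularly
    symmetric), the 2D Gaussian can be
    converted into a 1D function that is easier to
    differentiate:

       $$ G(r) = e^{-\frac{r^2}{2\sigma^2}} $$

   The first derivative is

       $$ G'(r) = -\frac{1}{\sigma^2}\, r\, e^{-\frac{r^2}{2\sigma^2}} $$

   and the second derivative (LoG) is

       $$ G''(r) = \frac{1}{\sigma^2} \left( \frac{r^2}{\sigma^2} - 1 \right) e^{-\frac{r^2}{2\sigma^2}} $$

   After returning to the original co-ordinates x, y
    and introducing a normalizing multiplicative
    coefficient c (that includes 1/σ²), we get a
    convolution mask of a zero crossing detector

       $$ h(x,y) = c \left( \frac{x^2 + y^2}{\sigma^2} - 1 \right) e^{-\frac{x^2 + y^2}{2\sigma^2}} $$

    where c normalizes the sum of mask
    elements to zero.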
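
   A zero-crossing mask of this kind can be sampled on a pixel grid as sketched below. Several choices here are assumptions: the mask is taken as (r²/σ² - 1)·exp(-r²/(2σ²)) with r² = x² + y², the mask radius defaults to 3σ rounded up, and the zero sum is enforced by subtracting the mean of the sampled values, which is one simple option rather than the method of the slides. The function name is hypothetical.

```python
import math


def log_mask(sigma, radius=None):
    """Sample the zero-crossing mask on a (2*radius + 1) square grid."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))    # assumption: ignore pixels beyond 3 sigma
    size = 2 * radius + 1
    h = [[0.0] * size for _ in range(size)]
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            h[y + radius][x + radius] = (
                (r2 / sigma ** 2 - 1.0) * math.exp(-r2 / (2.0 * sigma ** 2)))
    # assumption: force the elements to sum to zero by removing their mean
    mean = sum(map(sum, h)) / (size * size)
    return [[v - mean for v in row] for row in h]


mask = log_mask(1.0)
```

   The resulting mask is negative at the center and positive in a surrounding ring, so a constant image region gives a zero response.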
   Finding second derivatives in this way is
    very robust.

   Gaussian smoothing effectively suppresses
    the influence of the pixels that are more than a
    distance 3σ from the current pixel; then
    the Laplace operator is an efficient and
    stable measure of changes in the image.
   The location in the LoG image where the zero
    level is crossed corresponds to the position of the
    edges.
   The advantage of this approach compared to
    classical edge operators of small size is that a
    larger area surrounding the current pixel is taken
    into account; the influence of more distant points
    decreases according to the σ of the Gaussian.
   For an ideal isolated step edge, the variation of σ
    does not affect the location of the zero crossings.
   Convolution masks become large for larger σ; for
    example, σ = 4 needs a mask about 40 pixels wide.

   The practical implication of Gaussian smoothing is
    that edges are found reliably.

   If only globally significant edges are required, the
    standard deviation σ of the Gaussian smoothing
    filter may be increased, having the effect of
    suppressing less significant evidence.
   The LoG operator can be very effectively
    approximated by convolution with a mask that
    is the difference of two Gaussian averaging
    masks with substantially different σ - this
    method is called the Difference of
    Gaussians (DoG).

   Even coarser approximations to LoG are
    sometimes used - the image is filtered twice
    by an averaging operator with smoothing
    masks of different size and the difference
    image is produced.
   Disadvantages of zero-crossing:
       smooths the shape too much; for
        example sharp corners are lost
       tends to create closed loops of edges
Scale in Image Processing
   Many image processing techniques work
    locally

   The essential problem in such computation
    is scale

   Edges correspond to the gradient of the
    image function that is computed as a
    difference between pixels in some
    neighborhood
   There is seldom a sound reason for
    choosing a particular size of
    neighborhood
       The right size depends on the size of the
        objects under investigation.
       To know what the objects are assumes
        that it is clear how to interpret an image.
       This is not in general known at the pre-
        processing stage.
   Scale in image processing
    examples/solutions
       Processing of planar noisy curves at a
        range of scales - the segment of curve
        that represents the underlying structure
        of the scene needs to be found.
   After smoothing using the Gaussian filter with
    varying standard deviations, the significant
    segments of the original curve can be found.

   The task can be formulated as an
    optimization problem in which two criteria are
    used simultaneously
       the longer the curve segment the better
       the change of curvature should be minimal.
   Scale space filtering describes signals
    qualitatively with respect to scale.

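   The original 1D signal f(x) is smoothed by
    convolution with a 1D Gaussian

       $$ G(x,\sigma) = e^{-\frac{x^2}{2\sigma^2}} $$

   If the standard deviation σ is slowly changed,
    the function

       $$ F(x,\sigma) = f(x) * G(x,\sigma) $$

    represents a surface on the (x,σ) plane that is
    called the scale-space image.

   Inflection points of the curve F(x,σ0) for a
    distinct value σ0 satisfy

       $$ \frac{\partial^2 F(x,\sigma_0)}{\partial x^2} = 0, \qquad \frac{\partial^3 F(x,\sigma_0)}{\partial x^3} \neq 0 $$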
   The positions of inflection points can be
    drawn as a set of curves in (x,σ) co-ordinates.
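
   One row of the scale-space image and its inflection points can be computed as sketched below. The sketch makes several assumptions: the Gaussian is truncated at 3σ and normalized to unit sum, the signal is extended by repeating its end values, and inflection points are located as sign changes of the second difference. All names are hypothetical.

```python
import math


def gaussian_smooth(f, sigma):
    """Smooth a 1D signal with a truncated, normalized Gaussian."""
    radius = int(math.ceil(3 * sigma))
    kernel = [math.exp(-(k * k) / (2.0 * sigma * sigma))
              for k in range(-radius, radius + 1)]
    total = sum(kernel)
    n = len(f)
    out = []
    for x in range(n):
        acc = 0.0
        for k in range(-radius, radius + 1):
            idx = min(max(x - k, 0), n - 1)   # repeat end values at the borders
            acc += kernel[k + radius] * f[idx]
        out.append(acc / total)
    return out


def inflection_points(F, eps=1e-9):
    """Indices x where the second difference changes sign between x and x + 1."""
    second = [F[x - 1] - 2.0 * F[x] + F[x + 1] for x in range(1, len(F) - 1)]
    points = []
    for k in range(len(second) - 1):
        a, b = second[k], second[k + 1]
        if (a > eps and b < -eps) or (a < -eps and b > eps):
            points.append(k + 1)              # second[k] belongs to sample x = k + 1
    return points


step = [0.0] * 10 + [1.0] * 10
curves = {sigma: inflection_points(gaussian_smooth(step, sigma))
          for sigma in (1.0, 2.0, 3.0)}
```

   For this isolated step the inflection point stays between samples 9 and 10 at every σ, so its curve in (x,σ) co-ordinates is a vertical line.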
   Coarse to fine analysis of the curves
    corresponding to inflection points, i.e., in the
    direction of the decreasing value of the σ,
    localizes large-scale events.

   The qualitative information contained in the
    scale-space image can be transformed into
    a simple interval tree that expresses the
    structure of the signal f(x) over all observed
    scales.
   The interval tree is built from the root that
    corresponds to the largest scale.

   The scale-space image is searched in the
    direction of decreasing σ.

   The interval tree branches at those points
    where new curves corresponding to
    inflection points appear.
   The third example of the application of
    scale is the Canny edge detector.
Canny edge detection
   optimal for step edges corrupted by white noise
   optimality related to three criteria
       detection criterion ... important edges should not
        be missed, there should be no spurious responses
       localization criterion ... distance between the
        actual and located position of the edge should be
        minimal
       one response criterion ... minimizes multiple
        responses to a single edge (also partly covered by
        the first criterion since when there are two
        responses to a single edge one of them should be
        considered as false)
   Canny's edge detector is based on
    several ideas:
1. The edge detector was expressed for a 1D
   signal and the first two optimality criteria. A
   closed form solution was found using the
   calculus of variations.
2. If the third criterion (multiple responses) is
   added, the best solution may be found by
   numerical optimization. The resulting filter
   can be approximated effectively with error
   less than 20% by the first derivative of a
   Gaussian smoothing filter with standard
   deviation σ ; the reason for doing this is the
   existence of an effective implementation.
     There is a strong similarity here to the Marr-
      Hildreth edge detector (Laplacian of a Gaussian)
3. The detector is then generalized to two
   dimensions. A step edge is given by its position,
   orientation, and possibly magnitude (strength).
     It can be shown that convolving an image with a
      symmetric 2D Gaussian and then differentiating in
      the direction of the gradient (perpendicular to the
      edge direction) forms a simple and effective
      directional operator.
     Recall that the Marr-Hildreth zero crossing operator
      does not give information about edge direction as it
      uses Laplacian filter.
   Suppose G is a 2D Gaussian and assume we
    wish to convolve the image with an operator
    Gn which is a first derivative of G in the
    direction n:

       $$ G_n = \frac{\partial G}{\partial \mathbf{n}} = \mathbf{n} \cdot \nabla G $$

   The direction n should be oriented
    perpendicular to the edge
       this direction is not known in advance
       however, a robust estimate of it based on the
        smoothed gradient direction is available
       if g is the image, the normal to the edge is
        estimated as

           $$ \mathbf{n} = \frac{\nabla (G * g)}{|\nabla (G * g)|} $$

   The edge location is then at the local
    maximum in the direction n of the operator
    Gn convolved with the image g

       $$ \frac{\partial}{\partial \mathbf{n}}\, G_n * g = 0 $$

   Substituting for Gn gives

       $$ \frac{\partial^2}{\partial \mathbf{n}^2}\, G * g = 0 $$

   The above equation shows how to find local
    maxima in the direction perpendicular to the
    edge; this operation is often referred to as
    non-maximum suppression.
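
   In discrete implementations non-maximum suppression is commonly approximated by comparing each pixel with its two neighbors along the quantized gradient direction. The sketch below follows that common simplification and is not the exact formulation above; it assumes the gradient magnitude and direction (in radians, 0 pointing East, angles counter-clockwise, row index growing downward) have already been computed, and the function name is hypothetical.

```python
import math


def non_maximum_suppression(mag, direction):
    """Keep a pixel only if it is a local maximum along its gradient direction."""
    rows, cols = len(mag), len(mag[0])
    out = [[0.0] * cols for _ in range(rows)]       # borders are suppressed
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            angle = math.degrees(direction[i][j]) % 180.0
            if angle < 22.5 or angle >= 157.5:
                di, dj = 0, 1                       # East / West
            elif angle < 67.5:
                di, dj = -1, 1                      # North-East / South-West
            elif angle < 112.5:
                di, dj = 1, 0                       # North / South
            else:
                di, dj = -1, -1                     # North-West / South-East
            m = mag[i][j]
            if m > 0 and m >= mag[i + di][j + dj] and m >= mag[i - di][j - dj]:
                out[i][j] = m
    return out


ridge = [[0, 1, 3, 1, 0] for _ in range(3)]         # blurred vertical edge
east = [[0.0] * 5 for _ in range(3)]                # gradient points East everywhere
thinned = non_maximum_suppression(ridge, east)
```

   The three-pixel-wide response across the edge is thinned to the single pixel with the largest magnitude.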