Computer Vision Basics and Techniques
LEC 1
BY : DR. SHEFALI ARORA CHOUHAN
ASSISTANT PROFESSOR, DEPT. OF CSE
COURSE OUTCOMES
► Understand key features of Computer Vision to analyse and interpret the visible world
around us
► Design and implement multi-dimensional signal processing, feature extraction, pattern analysis, visual geometric modelling and stochastic optimization
► Apply computer vision concepts to biometrics, medical diagnosis, document processing, mining of visual content, surveillance and advanced rendering
SYLLABUS
What is an Image?
► In monochrome images the minimum value corresponds to black and the maximum to
white.
► The different values the intensity function can take are called gray levels.
► Gray level indicates brightness of a pixel
► For an intensity function normalized to the range [0, 1], f(x,y) = 0 is black and f(x,y) = 1 is white
DIGITIZATION
► Discretization (sampling): the function f(x,y) is sampled into an M x N array. Each element of this matrix is called a pixel.
► Quantization: the continuous range of f(x,y) is divided into K intervals, and every value in an interval is assigned the same discrete level.
► Digital images are commonly quantized into 256 gray levels (8 bits per pixel, values 0 to 255).
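A minimal NumPy sketch of quantization, assuming an 8-bit grayscale image stored as a `uint8` array. `quantize` is a hypothetical helper (not an OpenCV function) that splits 0..255 into K equal-width intervals and spreads the K output levels evenly over 0..255.

```python
import numpy as np

def quantize(img, k):
    """Hypothetical helper: reduce an 8-bit image to k gray levels (k >= 2)."""
    # Interval index 0..k-1 for every pixel (int32 avoids uint8 overflow)
    idx = (img.astype(np.int32) * k) // 256
    # Spread the k levels over 0..255 so black and white are kept
    return (idx * (255 // (k - 1))).astype(np.uint8)

# Hypothetical 2x4 image with 8 distinct gray levels
img = np.array([[0, 40, 80, 120],
                [160, 200, 240, 255]], dtype=np.uint8)
print(quantize(img, 4))
```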
COLOR QUANTIZATION
► A colour image can be represented by three channels (matrices), one each for red, green and blue
► Any colour is a combination of these three primary colours
► There are various colour models
RGB TO GRAYSCALE CONVERSION
► Average method: the simplest approach. For each pixel of the RGB image, add the R, G and B values and divide the sum by 3 to get the grayscale value.
Grayscale = (R + G + B) / 3
► Weighted (luminosity) method: the human eye is most sensitive to green light, less sensitive to red and least sensitive to blue. The channels are therefore weighted accordingly: green gets the largest weight, red a smaller one and blue the smallest.
New grayscale image = ( (0.3 * R) + (0.59 * G) + (0.11 * B) ).
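Both formulas above can be sketched in NumPy as follows, assuming an H x W x 3 array in R, G, B order (`to_gray_average` and `to_gray_weighted` are hypothetical helper names). Note that `cv2.imread` returns channels in B, G, R order, so they would need to be reversed first.

```python
import numpy as np

def to_gray_average(rgb):
    """Average method: (R + G + B) / 3 for an H x W x 3 RGB array."""
    return rgb.astype(np.float64).mean(axis=2)

def to_gray_weighted(rgb):
    """Weighted method: 0.3 R + 0.59 G + 0.11 B."""
    weights = np.array([0.3, 0.59, 0.11])
    return rgb.astype(np.float64) @ weights   # dot product over the channel axis

# If the image came from cv2.imread (BGR order): rgb = bgr[..., ::-1]
pixel = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)  # pure R, G, B
print(to_gray_average(pixel))    # every pure colour becomes 85
print(to_gray_weighted(pixel))   # green gives the brightest gray, blue the darkest
```

OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)` uses the closely related weights 0.299, 0.587 and 0.114.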
import cv2
from matplotlib import pyplot as plt
im = [Link]('/content/[Link]')
► Color consistency
► Shadows
► Lighting
► Brightness
► Contrast
# OpenCV loads images with channels in B, G, R order
b, g, r = [Link](im)
[Link](r)
[Link]()
[Link](g)
[Link]()
[Link](b)
[Link]()
HSI MODEL
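► Hue: the dominant colour, expressed as an angle
► Saturation: the purity of the colour (how little it is diluted with white)
► Intensity: the overall brightness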
IMAGE PRE-PROCESSING
import cv2
import numpy as np
from [Link] import cv2_imshow
bgr_img = [Link]('/content/[Link]')
hsv_img = [Link](bgr_img, cv2.COLOR_BGR2HSV)
cv2_imshow(hsv_img)
[Link](0)
[Link]()
What will this do?
► colored_negative = abs(255-im_rgb)
► cv2_imshow(colored_negative)
► cv2_imshow(ad)
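The expression above computes the image negative, s = (L - 1) - r, with L = 256 for 8-bit images. A small NumPy sketch of the same idea; `negative` is a hypothetical helper name. Because 255 - r is never negative for r in 0..255, the `abs()` call is not actually needed.

```python
import numpy as np

def negative(img, levels=256):
    """Image negative: s = (L - 1) - r for every pixel and channel."""
    # Compute in a wider integer type, then convert back to the input dtype
    return ((levels - 1) - img.astype(np.int32)).astype(img.dtype)

im_rgb = np.array([[[0, 128, 255]]], dtype=np.uint8)  # one hypothetical RGB pixel
print(negative(im_rgb))   # black becomes white and vice versa
```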
BRIGHTNESS OF IMAGE
b, g, r = [Link](image)
hist_b = [Link]([b], [0], None, [256], [0, 256])
hist_g = [Link]([g], [0], None, [256], [0, 256])
hist_r = [Link]([r], [0], None, [256], [0, 256])
[Link]('RGB Histogram')
[Link]('Pixel Value')
[Link]('Frequency')
[Link]()
[Link]()
HISTOGRAM EQUALIZATION
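Histogram equalization spreads the gray levels of a low-contrast image over the full range by mapping each level through the normalized cumulative histogram (CDF). A minimal NumPy sketch, assuming an 8-bit single-channel image; `equalize_hist` is a hypothetical name. OpenCV provides `cv2.equalizeHist` for the same purpose on 8-bit single-channel images.

```python
import numpy as np

def equalize_hist(img):
    """Histogram equalization for an 8-bit grayscale image (uint8 array)."""
    hist = np.bincount(img.ravel(), minlength=256)   # count of each gray level
    cdf = hist.cumsum()                              # cumulative distribution
    cdf_min = cdf[cdf > 0][0]                        # CDF value of the darkest level present
    if cdf_min == img.size:                          # constant image: nothing to spread
        return img.copy()
    # Map each level so that the output CDF is approximately uniform over 0..255
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255)
    lut = np.clip(lut, 0, 255).astype(np.uint8)
    return lut[img]                                  # apply the lookup table

low_contrast = np.array([[50, 51], [52, 53]], dtype=np.uint8)  # hypothetical image
print(equalize_hist(low_contrast))
```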
► An affine transformation is a geometric transformation that preserves lines and parallelism (but not necessarily distances and angles).
► To find the affine transformation matrix, we need three points from the input image and their corresponding locations in the output image.
Common transformations
img = [Link]('/content/[Link]')
rows, cols, ch = [Link]
# three points in the input image and the locations they should map to
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])
M = [Link](pts1, pts2)
dst = [Link](img, M, (cols, rows))
cv2_imshow(dst)
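To show what the three point pairs determine, the 2 x 3 affine matrix can be solved directly as a 3 x 3 linear system. This NumPy sketch uses the same `pts1` and `pts2` as above; `affine_from_points` is a hypothetical helper, and it should agree with what the OpenCV call above computes, since both solve the same equations.

```python
import numpy as np

def affine_from_points(src, dst):
    """Solve for the 2x3 affine matrix M with M @ [x, y, 1]^T = [x', y']^T."""
    src = np.asarray(src, dtype=np.float64)
    dst = np.asarray(dst, dtype=np.float64)
    X = np.hstack([src, np.ones((3, 1))])   # 3 x 3: each row is [x, y, 1]
    return np.linalg.solve(X, dst).T         # X @ M.T = dst, so M is 2 x 3

pts1 = [[50, 50], [200, 50], [50, 200]]
pts2 = [[10, 100], [200, 50], [100, 250]]
M = affine_from_points(pts1, pts2)
print(M)
```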
DIFFERENTIATE BETWEEN TRANSFORMATIONS
► Euclidean
► Affine
► Projective
Affine vs Non-affine
► Affine: includes scaling and translation; parallelism is preserved
► Non-affine: includes projective transformations (also called homography); parallelism is not preserved
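Euclidean transformation
► Rotation and translation only: distances, angles and parallelism are preserved
Projective (perspective) transformation
► 3D scenes are projected to 2D
► The resulting image depends on the camera's viewpoint
► Ratios and dimensions of objects change
► Angles may not be preserved
► Parallelism is not preserved
► A generalization of the affine transform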
► For an affine transformation, the projection vector (the first two entries of the last row of the 3 x 3 matrix) is 0. Thus, an affine transformation can be considered a special case of a perspective transformation.
► The perspective transformation matrix M has 8 degrees of freedom (9 entries, defined only up to scale). To find it, we select 4 points in the input image and map them to the desired locations in the output image according to the use case. Each point correspondence gives 2 equations, so the 4 points give 8 equations in 8 unknowns, which can be solved directly.
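The 8-equation system described above can be written out explicitly. This is a minimal NumPy sketch, assuming the bottom-right entry of M is fixed to 1 (the usual normalization); `perspective_from_points` and `apply_perspective` are hypothetical helpers. OpenCV's `cv2.getPerspectiveTransform` solves the same kind of system from 4 point pairs.

```python
import numpy as np

def perspective_from_points(src, dst):
    """Solve the 8x8 system for M (with M[2, 2] = 1) from 4 point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (m11 x + m12 y + m13) / (m31 x + m32 y + 1), rearranged to be linear
        A.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        # v = (m21 x + m22 y + m23) / (m31 x + m32 y + 1)
        A.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    m = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(m, 1.0).reshape(3, 3)

def apply_perspective(M, pts):
    """Map 2D points with M using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    h = np.hstack([pts, np.ones((len(pts), 1))]) @ M.T
    return h[:, :2] / h[:, 2:3]            # divide by the third coordinate

square = [[0, 0], [1, 0], [1, 1], [0, 1]]
quad = [[0, 0], [4, 0], [3, 3], [1, 3]]    # hypothetical target quadrilateral
M = perspective_from_points(square, quad)
print(apply_perspective(M, square))
```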
CONVOLUTION AND FILTERING
► To detect vertical and horizontal edges in an image, we can convolve it with suitable filters (kernels)
► An m x m filter is slid over an n x n image to extract features and detect edges
► Without padding and with stride 1, the output has dimension (n - m + 1) x (n - m + 1)
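A minimal NumPy sketch of the sliding-window operation, showing that a 6 x 6 image and a 3 x 3 filter give a 4 x 4 output. As in CNN layers, this computes a cross-correlation (the kernel is not flipped); the image and the vertical-edge filter are illustrative assumptions, and `conv2d_valid` is a hypothetical helper.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide an m x m kernel over an image with no padding and stride 1.

    Like CNN layers, this is a cross-correlation (no kernel flip);
    a true convolution would use np.flip(kernel) instead.
    """
    m = kernel.shape[0]
    out_h = image.shape[0] - m + 1
    out_w = image.shape[1] - m + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + m, j:j + m] * kernel)
    return out

image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)   # bright left, dark right
vertical_edge = np.array([[1, 0, -1]] * 3, dtype=float)       # simple vertical-edge filter
result = conv2d_valid(image, vertical_edge)
print(result.shape)   # (4, 4) = (6 - 3 + 1) x (6 - 3 + 1)
print(result)         # each row is 0, 30, 30, 0: the edge shows up in the middle
```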
Solve
Filters
TYPES OF FILTERS
► The weights in a filter matrix are chosen to extract particular features, for example to emphasise edges
► For instance, the Sobel filter gives more weight to the central row of pixels than the filter used before
Using different filters helps make feature extraction more robust
CONVOLUTIONAL NEURAL NETWORKS
• A CNN typically has three types of layers: convolutional layers, pooling layers and fully connected layers.
• The convolution layer is the core building block of the CNN. It carries the
main portion of the network’s computational load.
• This layer performs a dot product between two matrices, where one
matrix is the set of learnable parameters otherwise known as a kernel,
and the other matrix is the restricted portion of the receptive field.
[Figure: an n x n input convolved with a k x k kernel gives an (n-k+1) x (n-k+1) feature map; 2x2 pooling reduces a 4x4 feature map to a 2x2 matrix]
POOLING LAYER
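A pooling layer downsamples a feature map by summarizing each window, most commonly with the maximum. A minimal NumPy sketch of 2 x 2 max pooling with stride 2, which turns a 4 x 4 feature map into a 2 x 2 matrix; `max_pool2d` and the feature-map values are hypothetical.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling over size x size windows moved by `stride`."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()          # keep the strongest activation
    return out

fmap = np.array([[1, 3, 2, 1],
                 [4, 6, 5, 0],
                 [7, 2, 9, 8],
                 [1, 0, 3, 4]])               # hypothetical 4x4 feature map
print(max_pool2d(fmap))
```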
FULLY CONNECTED LAYER
► Neurons in this layer have full connectivity with all neurons in the preceding and
succeeding layer as seen in regular FCNN.
► This is why its output can be computed as usual by a matrix multiplication followed by adding a bias.
► The FC layer helps to map the representation between the input and the output.
NON-LINEAR ACTIVATION FUNCTIONS
► Sigmoid
► The sigmoid non-linearity has the mathematical form $\sigma(\kappa) = \frac{1}{1 + e^{-\kappa}}$
► It takes a real-valued number and “squashes” it into a range between 0 and 1.
► Tanh
► Tanh squashes a real-valued number to the range [-1, 1]. Like the sigmoid, the activation saturates, but unlike the sigmoid, its output is zero-centered.
► ReLU
► The Rectified Linear Unit (ReLU) has become very popular in the last few years. It computes the function $f(\kappa) = \max(0, \kappa)$. In other words, the activation is simply thresholded at zero.
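The three activation functions above can be written in a few lines of NumPy, applied element-wise to an array of pre-activations (the input values here are arbitrary examples).

```python
import numpy as np

def sigmoid(k):
    """Squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-k))

def tanh(k):
    """Squashes values into (-1, 1); the output is zero-centered."""
    return np.tanh(k)

def relu(k):
    """Thresholds at zero: max(0, k)."""
    return np.maximum(0, k)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```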
YOLO- YOU ONLY LOOK ONCE - 4/2
- Residual blocks
► The first step divides the original image into N x N grid cells of equal shape, where N = 4 in the example image. Each cell in the grid is responsible for localizing and predicting the class of the object that it covers, along with a probability/confidence value.
-Bounding boxes
The next step is to determine the bounding boxes which correspond to rectangles highlighting all
the objects in the image. We can have as many bounding boxes as there are objects within a given
image.
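► YOLO determines the attributes of these bounding boxes using a single regression module, in the following format, where Y is the final vector representation for each bounding box:
► Y = [pc, bx, by, bh, bw, c1, c2]
► Here pc is the probability that the cell contains an object, (bx, by) is the centre of the box, bh and bw are its height and width, and c1, c2 are the class scores (two classes in this example).
► This representation is especially important during the training phase of the model.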
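A sketch of how such a vector might be turned into a pixel bounding box and a label. It assumes bx, by, bh, bw are normalized to [0, 1] relative to the whole image; actual YOLO versions differ (some encode the centre relative to the grid cell). `decode_box`, the prediction values and the class names are hypothetical.

```python
import numpy as np

def decode_box(y, img_w, img_h, class_names):
    """Turn Y = [pc, bx, by, bh, bw, c1, c2, ...] into a pixel box and a label.

    Assumes bx, by, bh, bw are normalized to [0, 1] relative to the image.
    """
    pc, bx, by, bh, bw, *class_scores = y
    x1, x2 = (bx - bw / 2) * img_w, (bx + bw / 2) * img_w   # left / right edges
    y1, y2 = (by - bh / 2) * img_h, (by + bh / 2) * img_h   # top / bottom edges
    label = class_names[int(np.argmax(class_scores))]        # most likely class
    return pc, (x1, y1, x2, y2), label

Y = [0.9, 0.5, 0.5, 0.4, 0.2, 0.1, 0.8]          # hypothetical prediction
pc, box, label = decode_box(Y, 100, 100, ["class 1", "class 2"])
print(pc, box, label)
```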
YOLO- YOU ONLY LOOK ONCE
• Given an image generate bounding boxes, one for each detectable object in image
• For each bounding box, output 5 predictions: x, y, w, h, confidence.
• Also output class
x, y (coordinates for center of bounding box)
w,h (width and height)
confidence (probability bounding box has object)
class (classification of object in bounding box)
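In the original YOLO (v1) formulation, the image is split into an S x S grid; each cell predicts B boxes with the 5 values listed above, and the class probabilities are predicted once per cell. The output therefore has shape S x S x (B * 5 + C). A tiny sketch of that arithmetic (`yolo_v1_output_shape` is a hypothetical name), using S = 7, B = 2 and C = 20 as in the original paper on PASCAL VOC:

```python
def yolo_v1_output_shape(S, B, C):
    """S x S cells, each with B boxes of (x, y, w, h, confidence) plus C class scores."""
    return (S, S, B * 5 + C)

print(yolo_v1_output_shape(7, 2, 20))  # (7, 7, 30)
```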
Fourier Transform
► Sampled FFT: the FFT computes a sampled version of the Fourier transform
► The set of samples describes the spatial-domain image
► Helps to find periodic patterns in spatial domain images
► The inverse FFT converts an image from the frequency domain back to the spatial domain
► The image is separated into sine and cosine components
► By Euler's formula, $e^{i\theta} = \cos\theta + i\sin\theta$ and $e^{-i\theta} = \cos\theta - i\sin\theta$
► The complex exponential term is called the basis function.
Inverse FFT:
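For reference, the 2D discrete Fourier transform of an M x N image f(x, y) and its inverse can be written as follows. This is the convention used by NumPy's `np.fft.fft2` and `np.fft.ifft2`, where the 1/MN factor sits on the inverse; other texts place the factor differently.

$$F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, e^{-i 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}$$

$$f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v)\, e^{i 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)}$$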
FOURIER TRANSFORM
import cv2
import numpy as np
f_transform = [Link].fft2(image)
f_transform_shifted = [Link](f_transform)
magnitude_spectrum = [Link](np.abs(f_transform_shifted) + 1)
[Link](121), [Link](image, cmap='gray')
[Link](122), [Link](magnitude_spectrum, cmap='gray')
[Link]()
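To illustrate how the spectrum reveals periodic patterns, here is a NumPy sketch on a hypothetical synthetic image: a vertical sinusoidal grating with 8 cycles across a 64-pixel width. After `np.fft.fftshift`, nearly all the energy sits in two peaks 8 frequency units left and right of the centre, and `np.fft.ifft2` recovers the original image.

```python
import numpy as np

N = 64
x = np.arange(N)
# Hypothetical test image: every row is the same sine wave with 8 cycles
image = np.tile(np.sin(2 * np.pi * 8 * x / N), (N, 1))

F = np.fft.fftshift(np.fft.fft2(image))   # zero frequency moved to (32, 32)
magnitude = np.abs(F)
peaks = np.argwhere(magnitude > 0.5 * magnitude.max())
print(peaks)   # (row, column) positions of the dominant frequencies

# Inverse FFT: undo the shift, transform back, keep the real part
restored = np.fft.ifft2(np.fft.ifftshift(F)).real
```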
IMAGE NOISE & FILTERS 7/2
► Noise adds random variations in brightness and color information of an existing image
► Adding noise to images can help in testing the performance of image processing
algorithms such as denoising, segmentation, and feature detection under different
levels of noise.
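A minimal NumPy sketch of adding two common kinds of noise for such tests: Gaussian noise (random brightness variations) and salt-and-pepper noise (random black and white pixels). The helper names, the flat test image and the fixed seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so results are repeatable

def add_gaussian_noise(img, sigma=10.0):
    """Add zero-mean Gaussian noise and clip back to the 8-bit range."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(img, amount=0.05):
    """Set about a fraction `amount` of pixels to 0 (pepper) or 255 (salt)."""
    noisy = img.copy()
    r = rng.random(img.shape)
    noisy[r < amount / 2] = 0            # pepper
    noisy[r > 1 - amount / 2] = 255      # salt
    return noisy

clean = np.full((100, 100), 128, dtype=np.uint8)   # hypothetical flat gray image
gaussian_noisy = add_gaussian_noise(clean)
sp_noisy = add_salt_and_pepper(clean, amount=0.1)
```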
TYPES OF FILTERS
► LINEAR
Gaussian
Box filter
Weighted Average Filter
► NON-LINEAR
Median Filter
Min filter
Max filter
IMAGE SMOOTHING
► Weighted average filter: gives more weight to pixels near the output location
import numpy as np
def gaussian_kernel(size, sigma=1.0):
    kernel = [Link](
        lambda x, y: (1/(2*[Link]*sigma**2)) * [Link](-((x-(size-1)/2)**2 + (y-(size-1)/2)**2)/(2*sigma**2)),
        (size, size)
    )
    return kernel / [Link](kernel)
kernel_size = 5
sigma = 1.0
gaussian_kernel_matrix = gaussian_kernel(kernel_size, sigma)
print("Gaussian Kernel Matrix:")
for row in gaussian_kernel_matrix:
    print(["{:.5f}".format(value) for value in row])
IMAGE SMOOTHING USING GAUSSIAN FILTER
► After the gradient calculation, the result is close to the expected one, but some of the edges are thick and others are thin.
► The non-max suppression step helps thin the thick ones.
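The gradient step that produces this result can be sketched in NumPy: filter the (smoothed) image with the Sobel kernels Kx and Ky, then compute the gradient magnitude and direction per pixel. `filter3x3` and `sobel_gradients` are hypothetical helpers; zero padding keeps the output the same size as the input, which is an assumption (OpenCV's border handling may differ).

```python
import numpy as np

Kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)   # derivative along x (columns)
Ky = Kx.T                                   # derivative along y (rows, pointing down)

def filter3x3(img, k):
    """Cross-correlate img with a 3x3 kernel, zero padding to keep the size."""
    p = np.pad(img.astype(float), 1)
    out = np.zeros(img.shape, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += k[di, dj] * p[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out

def sobel_gradients(img):
    """Gradient magnitude and direction (degrees in [0, 180))."""
    ix = filter3x3(img, Kx)
    iy = filter3x3(img, Ky)
    magnitude = np.hypot(ix, iy)
    angle = np.degrees(np.arctan2(iy, ix)) % 180
    return magnitude, angle

img = np.zeros((5, 5))
img[:, 3:] = 100                 # hypothetical vertical step edge
mag, ang = sobel_gradients(img)
```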
NON-MAX SUPPRESSION
► The algorithm goes through all the points on the gradient intensity matrix and finds
the pixels with the maximum value in the edge directions.
► If either of the two neighbouring pixels in the edge direction is more intense than the pixel being processed, only the more intense one is kept: the intensity value of the current pixel (i, j) is set to 0. If neither pixel in the edge direction has a higher intensity, the value of the current pixel is kept.
• Create a matrix, initialized to 0, of the same size as the original gradient intensity matrix;
• Identify the edge direction based on the angle value from the angle matrix;
• Check if the pixel in the same direction has a higher intensity than the pixel that is
currently processed;
• Return the image processed with the non-max suppression algorithm.
• We can still notice some variation regarding the edges’ intensity: some pixels
seem to be brighter than others.
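The steps above can be sketched in NumPy as follows. The direction is quantized to 0°, 45°, 90° or 135°, and each pixel is compared with its two neighbours along the gradient. Assumptions: angles are in degrees in [0, 180), x runs along columns and y along rows (pointing down), border pixels stay 0, and ties are kept; `non_max_suppression` is a hypothetical name.

```python
import numpy as np

def non_max_suppression(mag, angle):
    """Keep a pixel only if it is at least as intense as both gradient-direction neighbours."""
    H, W = mag.shape
    out = np.zeros_like(mag)                     # matrix initialized to 0
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:           # ~0 deg: left / right
                q, r = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                       # ~45 deg: up-left / down-right
                q, r = mag[i - 1, j - 1], mag[i + 1, j + 1]
            elif a < 112.5:                      # ~90 deg: up / down
                q, r = mag[i - 1, j], mag[i + 1, j]
            else:                                # ~135 deg: up-right / down-left
                q, r = mag[i - 1, j + 1], mag[i + 1, j - 1]
            if mag[i, j] >= q and mag[i, j] >= r:
                out[i, j] = mag[i, j]            # local maximum: keep it
    return out

# Hypothetical thick vertical edge (3 pixels wide) with horizontal gradients
mag = np.tile([0, 0, 50, 100, 50, 0, 0], (5, 1)).astype(float)
angle = np.zeros_like(mag)
thin = non_max_suppression(mag, angle)
```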
DOUBLE THRESHOLDING
• High threshold is used to identify the strong pixels (intensity higher than the high
threshold)
• Low threshold is used to identify the non-relevant pixels (intensity lower than the
low threshold)
• All pixels having intensity between both thresholds are flagged as weak and the
Hysteresis mechanism will help us identify the ones that could be considered as
strong and the ones that are considered as non-relevant.
• Hysteresis consists of transforming weak pixels into strong ones if and only if at least one of the pixels around the one being processed is a strong one.
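A minimal NumPy sketch of double thresholding followed by hysteresis. It repeats the neighbour check until nothing changes, so a chain of weak pixels connected to a strong pixel is kept; a single pass would apply the one-neighbour rule above literally. `hysteresis` and the thresholds are hypothetical.

```python
import numpy as np

def hysteresis(img, low, high):
    """Double threshold, then promote weak pixels that touch a strong (edge) pixel."""
    strong = img >= high                       # strong pixels
    weak = (img >= low) & ~strong              # weak pixels (between the thresholds)
    edges = strong.copy()
    H, W = img.shape
    while True:
        padded = np.pad(edges, 1)              # boolean padding is False
        has_edge_neighbour = np.zeros_like(edges)
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if di or dj:                   # skip the centre pixel itself
                    has_edge_neighbour |= padded[1 + di:1 + di + H, 1 + dj:1 + dj + W]
        promote = weak & has_edge_neighbour & ~edges
        if not promote.any():
            return edges                       # non-relevant pixels stay False
        edges |= promote

img = np.array([[0,   0,  0,  0, 0],
                [0, 200, 80, 80, 0],
                [0,   0,  0,  0, 0],
                [0,   0,  0, 80, 0],
                [0,   0,  0,  0, 0]])          # hypothetical gradient magnitudes
edges = hysteresis(img, low=50, high=150)
```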
CANNY EDGE DETECTOR
import cv2
import numpy as np
from [Link] import cv2_imshow
image = [Link]('/content/[Link]', cv2.IMREAD_GRAYSCALE)
► Edge detection algorithms using Sobel Operator work on the first derivative of an
image.
► After the image is smoothed, the derivatives Ix and Iy with respect to x and y are calculated by convolving I with the Sobel kernels Kx and Ky, respectively.
► The first derivative of an image is sensitive to noise
► Laplacian operator makes use of second derivative of the images
LAPLACIAN OF GAUSSIAN (LOG)
► In the positive Laplacian, the standard mask has a negative centre element, zero corner elements and all remaining elements equal to 1.
► The positive Laplacian operator is used to extract outward edges in an image.
► In the negative Laplacian, the standard mask has a positive centre element, zero corner elements and all remaining elements equal to -1.
► The negative Laplacian operator is used to extract inward edges in an image.
► The Laplacian Operator achieves a sharpening effect by enhancing the grayscale contrast
of the image. As a second-order differential operator, it enhances areas with sudden
grayscale changes in the image and weakens areas with slow grayscale changes. But the
processed image loses the direction information of the edges and enhances the noise
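The two standard masks described above, plus a sampled Laplacian of Gaussian kernel, can be written in NumPy as below. The LoG formula is given up to a constant factor, and shifting it to zero sum (so flat regions give no response) is an assumption of this sketch; `log_kernel` is a hypothetical helper.

```python
import numpy as np

positive_laplacian = np.array([[0, 1, 0],
                               [1, -4, 1],
                               [0, 1, 0]])      # negative centre, zero corners
negative_laplacian = -positive_laplacian        # positive centre, -1 elsewhere, zero corners

def log_kernel(size, sigma):
    """Sampled Laplacian of Gaussian, up to a constant factor:
    (x^2 + y^2 - 2 sigma^2) / sigma^4 * exp(-(x^2 + y^2) / (2 sigma^2))."""
    ax = np.arange(size) - (size - 1) / 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    k = (r2 - 2 * sigma ** 2) / sigma ** 4 * np.exp(-r2 / (2 * sigma ** 2))
    return k - k.mean()          # force zero sum so flat regions give 0

flat_patch = np.full((3, 3), 7)
print((flat_patch * positive_laplacian).sum())   # 0: no response in a flat region
```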
CAMERA GEOMETRY
► How does a camera map perspective projection points on the image plane?
► A camera projects a 3D scene onto a 2D image plane. This transformation can be
represented using a projection matrix P, which maps a 3D point to a 2D image point
► Determining external and internal parameters of a camera is called camera calibration
► Linear camera models are easier to estimate than non-linear ones.
► Used to develop camera calibration
► Determine projection matrix
LINEAR CAMERA MODEL: 3D to 2D
INTRINSIC MATRIX
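► Let mx be the number of pixels per mm in the X direction and my the number of pixels per mm in the Y direction. A point (xi, yi) on the image plane (in mm) maps to pixel coordinates:
► $u = m_x x_i + o_x$
► $v = m_y y_i + o_y$
► (ox, oy) is the principal point, usually taken to be the centre of the image
► With perspective projection of a camera-frame point (xc, yc, zc), $x_i = f x_c / z_c$ and $y_i = f y_c / z_c$, so the focal lengths in pixels are $f_x = m_x f$ and $f_y = m_y f$
► fx, fy, ox, oy are known as the intrinsic parameters of a camera
► They describe the camera's internal geometry
► Corresponding intrinsic (calibration) matrix:
$K = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \end{bmatrix}$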
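A short NumPy sketch of how the intrinsic matrix maps a 3D point in camera coordinates to pixel coordinates: multiply by K, then divide by the depth. `project` is a hypothetical helper, and the camera numbers (4 mm focal length, 200 pixels/mm, principal point (320, 240)) are made-up example values.

```python
import numpy as np

def project(point_cam, fx, fy, ox, oy):
    """Project a 3D point (xc, yc, zc) in camera coordinates to pixels (u, v)."""
    K = np.array([[fx, 0, ox],
                  [0, fy, oy],
                  [0, 0, 1]], dtype=float)
    p = K @ np.asarray(point_cam, dtype=float)   # homogeneous pixel coordinates
    return p[:2] / p[2]                          # divide by depth zc

# Hypothetical camera: f = 4 mm, mx = my = 200 pixels/mm, centre (320, 240)
fx = fy = 4 * 200                                # fx = mx * f, in pixels
uv = project([0.1, -0.05, 2.0], fx, fy, 320, 240)
print(uv)
```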
EXTRINSIC MATRIX
► Stereo vision: images are captured at the same time by one integrated stereo camera or by two cameras
► Also called binocular vision
► Camera calibration is accurate with an integrated stereo camera
► When multiple cameras are used, they must not move relative to each other
► The orientation of one camera with respect to the other must be known
► Used for depth extraction
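For a rectified stereo pair, depth follows from disparity as Z = f * B / d, where f is the focal length in pixels, B the baseline between the two cameras and d the horizontal shift of a point between the left and right images. A minimal sketch; `depth_from_disparity` and the numbers are hypothetical, and zero disparity is mapped to infinite depth.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair (d = 0 gives infinity)."""
    d = np.asarray(disparity_px, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)

# Hypothetical rig: 700-pixel focal length, 12 cm baseline
depths = depth_from_disparity([42, 84, 0], focal_px=700, baseline_m=0.12)
print(depths)   # larger disparity means the point is closer
```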
DEPTH ESTIMATION
• Dilation expands the boundaries of an object in an image. It is done by sliding a structuring element, which determines the size and shape of the dilation, over the image; each output pixel takes the maximum value under the structuring element. The output is a new image in which the objects of the original image are expanded (dilated).
• Erosion is a morphological operation that shrinks the boundaries of an object in an image. It is also done by sliding a structuring element, which determines the size and shape of the erosion, over the image; each output pixel takes the minimum value under the structuring element. The output is a new image in which the objects of the original image are eroded (shrunk).
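A NumPy sketch of both operations with a k x k square structuring element, taking the maximum (dilation) or minimum (erosion) over the shifted copies of the image. The helper names and the replicated-border choice are assumptions for illustration.

```python
import numpy as np

def _stack_windows(img, k):
    """All k*k shifted copies of img covered by a k x k square structuring element."""
    r = k // 2
    H, W = img.shape
    p = np.pad(img, r, mode="edge")          # replicate the border pixels
    return np.stack([p[i:i + H, j:j + W] for i in range(k) for j in range(k)])

def dilate(img, k=3):
    return _stack_windows(img, k).max(axis=0)   # maximum under the element

def erode(img, k=3):
    return _stack_windows(img, k).min(axis=0)   # minimum under the element

img = np.zeros((7, 7), dtype=np.uint8)
img[2:5, 2:5] = 1                                # hypothetical 3x3 square object
print(dilate(img).sum(), erode(img).sum())       # grows to 5x5, shrinks to 1 pixel
```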
import cv2
import numpy as np
from [Link] import cv2_imshow
img = [Link]('/content/[Link]', 0)
kernel = [Link]((5, 5), np.uint8)
img_erosion = [Link](img, kernel, iterations=1)
img_dilation = [Link](img, kernel, iterations=1)
cv2_imshow(img)
cv2_imshow(img_erosion)
cv2_imshow(img_dilation)
[Link](0)
OPENING AND CLOSING
► cv2_imshow(image)
Advantages of Harris Detector
► While the basic ideas of detecting corners remain the same as the Harris detector,
the Hessian detector makes use of the Hessian matrix and determinant, instead of
second-moment matrix M and corner response function R, respectively.
► Entries in Hessian matrix are second derivatives.
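For comparison with the Hessian detector, the Harris corner response R = det(M) - k * trace(M)^2 can be sketched in NumPy, where M sums the products of the first derivatives over a small window. `harris_response`, the window size and k = 0.04 are assumptions (k = 0.04 to 0.06 is a commonly quoted range); the test image is a single bright quadrant with one corner at (10, 10).

```python
import numpy as np

def harris_response(img, k=0.04, win=3):
    """Harris response R = det(M) - k * trace(M)^2 at every pixel."""
    img = img.astype(float)
    iy, ix = np.gradient(img)            # derivatives along rows (y) and columns (x)
    r = win // 2
    H, W = img.shape

    def box_sum(a):
        """Sum a over a win x win window around each pixel (zero padding)."""
        p = np.pad(a, r)
        return sum(p[i:i + H, j:j + W] for i in range(win) for j in range(win))

    sxx, syy, sxy = box_sum(ix * ix), box_sum(iy * iy), box_sum(ix * iy)
    det = sxx * syy - sxy ** 2           # entries of the second-moment matrix M
    trace = sxx + syy
    return det - k * trace ** 2

img = np.zeros((20, 20))
img[10:, 10:] = 1.0                      # hypothetical image with one corner at (10, 10)
R = harris_response(img)
print(R[10, 10], R[15, 10], R[5, 5])     # corner > 0, edge < 0, flat = 0
```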
Disadvantages
► Once a corner gets magnified and becomes bigger than the size of the window by
zooming, the Harris and Hessian can no longer detect the corner.
► It is because what the detectors perceive through the window is not a corner
anymore but an edge due to the scale change.
Feature Extraction
► This is an area of image processing that uses algorithms to detect and isolate various
desired portions of a digitized image.
► A feature is a significant piece of information extracted from an image which provides
more detailed understanding of the image.
► Examples: detecting faces in an image filled with people and other objects; detecting facial features such as eyes, nose and mouth; detecting edges so that a feature can be extracted and compared with another
Feature Detection
► Object recognition requires objects to have unique and descriptive features
► Detection of different feature families:
► Local pixel features (SIFT, SURF, ...)
► Global pixel features (Histogram, Texture, Color)
► Shape of pixel regions(Area, Perimeter)
► Basis sets (FFT, Haar Wavelet)
Characteristics of features
► Salient
► Robust to clutter
► Repeatable
► Fewer and efficient
Local Feature Descriptors
► Detectors and descriptors can be used together or independently for local feature description
► Searching strategies can be pixel-wise or tiled
► Aims to find pieces of objects
► SIFT, SURF, HOG
Global Feature Descriptors
► Texture Histograms
► Spatial Dependency matrix
► Regional Descriptors
Shape Features
► Area
► Perimeter
► Centroid
Basis set descriptors
► Haar wavelets
► Fourier transforms
Basic CV Pipeline
► Sensor processing
► Image Processing
► Global Metrics
► Local features
► Training
► Augmentation & Control
► Performance
SIFT : Feature Detector and Descriptor
► The scale space of an image is a function produced by convolving the input image with a Gaussian kernel (blurring) at different scales
► Scale-space is separated into octaves and the number of octaves and scale depends on the
size of the original image.
► We generate several octaves of the original image.
► Each octave’s image size is half the previous one
► So far, we have generated a scale space and used it to calculate the Difference of Gaussians (DoG), i.e. the difference between adjacent blurred images in an octave.
► Each pixel is compared with its 8 neighbours in the same scale as well as the 9 pixels in the next scale and the 9 pixels in the previous scale.
► Select the pixel if it is larger/smaller than all the 26 pixels around it in spatial or scaled
neighborhood
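A NumPy sketch of one octave of this process: blur with increasing sigma, subtract adjacent images to get DoG images, and keep pixels that are strictly larger or smaller than all 26 neighbours. sigma0 = 1.6, k = sqrt(2), five blur levels and the contrast threshold are assumptions, and the helpers and test blob are hypothetical. Real SIFT also repeats this over several octaves of halved images.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with reflect padding (kernel truncated at about 3 sigma)."""
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    p = np.pad(img.astype(float), r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, p, g, mode="valid")     # blur along x
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="valid")  # blur along y

def dog_extrema(img, sigma0=1.6, k=2 ** 0.5, levels=5, threshold=0.01):
    """Scale-space extrema in one octave of Difference-of-Gaussian images."""
    blurred = [gaussian_blur(img, sigma0 * k ** i) for i in range(levels)]
    dog = np.stack([b - a for a, b in zip(blurred, blurred[1:])])   # levels - 1 DoGs
    S, H, W = dog.shape
    keypoints = []
    for s in range(1, S - 1):
        for i in range(1, H - 1):
            for j in range(1, W - 1):
                v = dog[s, i, j]
                cube = dog[s - 1:s + 2, i - 1:i + 2, j - 1:j + 2]   # pixel + 26 neighbours
                is_max = np.sum(cube < v) == 26
                is_min = np.sum(cube > v) == 26
                if abs(v) > threshold and (is_max or is_min):
                    keypoints.append((s, i, j))
    return keypoints

# Hypothetical test image: one Gaussian blob (sigma = 4) centred at (24, 24)
yy, xx = np.mgrid[0:48, 0:48]
blob = np.exp(-((xx - 24) ** 2 + (yy - 24) ** 2) / (2 * 4.0 ** 2))
kps = dog_extrema(blob)
```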
Keypoint localization
► The previous step produces a lot of candidate keypoints. Some of them lie along an edge, or they don't have enough contrast.
► We get rid of them. The approach is similar to the one used in the Harris corner detector for removing edge features.
► Reject low-contrast points
► Reject edges using a Harris-like response function
► For this edge test, SIFT uses the 2 x 2 Hessian matrix of the DoG image (the ratio of its principal curvatures)
Orientation estimation
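► Compute the gradient magnitude and orientation of the pixels around the keypoint and build a histogram of orientations
► Each pixel's contribution to its histogram entry is weighted by its gradient magnitude
► Select the peak as the direction of the keypoint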
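A minimal NumPy sketch of the orientation histogram, assuming 36 bins of 10 degrees (Lowe's paper also applies a Gaussian weighting window, omitted here). `dominant_orientation` is a hypothetical helper that returns the centre of the peak bin; y points down the rows.

```python
import numpy as np

def dominant_orientation(patch, bins=36):
    """Magnitude-weighted orientation histogram; returns (peak angle, histogram)."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360
    hist, edges = np.histogram(angle, bins=bins, range=(0, 360), weights=magnitude)
    peak = np.argmax(hist)
    return (edges[peak] + edges[peak + 1]) / 2, hist   # centre of the peak bin

ramp_x = np.tile(np.arange(16.0), (16, 1))   # intensity increases to the right
angle_x, _ = dominant_orientation(ramp_x)    # falls in the 0-10 degree bin
angle_y, _ = dominant_orientation(ramp_x.T)  # increases downward: 90-100 degree bin
```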
Describe the Keypoints
► Take a 16x16 window around the detected keypoint and calculate the local gradients
► Divide the window into a 4 x 4 grid of sub-blocks, each 4 x 4 pixels
► Construct a histogram of orientations in each sub-block
► An 8 x 8 window would give 4 histograms; the 16 x 16 window gives 16 histograms
► Each histogram has 8 orientation bins
► 16 histograms x 8 values = 128 values
► The 128 non-negative values obtained give us the keypoint descriptor (128-dimensional SIFT vector)
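The 16 x 16 to 128 bookkeeping can be sketched in NumPy as below. This is a simplified SIFT-like descriptor: it omits steps of the full method, such as Gaussian weighting, interpolation between bins, rotation to the keypoint orientation and clipping of large values. `sift_like_descriptor` is a hypothetical name.

```python
import numpy as np

def sift_like_descriptor(window):
    """128-D descriptor: 4x4 cells of 4x4 pixels, 8 orientation bins per cell."""
    assert window.shape == (16, 16)
    gy, gx = np.gradient(window.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360
    desc = []
    for ci in range(4):
        for cj in range(4):
            cell = (slice(4 * ci, 4 * ci + 4), slice(4 * cj, 4 * cj + 4))
            hist, _ = np.histogram(ang[cell], bins=8, range=(0, 360), weights=mag[cell])
            desc.extend(hist)                  # 8 values per cell
    desc = np.array(desc)                      # 16 cells x 8 bins = 128 values
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc   # unit length for comparison

rng = np.random.default_rng(0)
d = sift_like_descriptor(rng.random((16, 16)))   # hypothetical random patch
print(d.shape)   # (128,)
```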
import numpy as np
import cv2 as cv
img = [Link]('/content/[Link]')
gray = [Link](img, cv.COLOR_BGR2GRAY)
sift = cv.SIFT_create()
kp = [Link](gray, None)
img = [Link](gray, kp, img)
[Link]('sift_keypoints.jpg', img)
Advantages
► To find the HOG of a point, extract a block (square window) of some size
► Divide the block into smaller grids
► For each cell, find an orientation histogram
► Concatenate them together
► For instance, if we obtain 40 values, then a 40 dimensional vector will be obtained
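The HOG steps above can be sketched in NumPy for a single block: split it into cells, build a magnitude-weighted orientation histogram per cell and concatenate them. Unsigned orientations (0 to 180 degrees), 9 bins, 4 x 4 cells and L2 normalization are assumptions; with an 8 x 8 block this gives 4 cells x 9 bins = a 36-dimensional vector. `hog_block` is a hypothetical name.

```python
import numpy as np

def hog_block(block, cell=4, bins=9):
    """Concatenated per-cell orientation histograms for one block."""
    gy, gx = np.gradient(block.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180          # unsigned orientation
    feats = []
    for i in range(0, block.shape[0], cell):
        for j in range(0, block.shape[1], cell):
            h, _ = np.histogram(ang[i:i + cell, j:j + cell], bins=bins,
                                range=(0, 180), weights=mag[i:i + cell, j:j + cell])
            feats.append(h)
    v = np.concatenate(feats)                           # cells x bins values
    return v / (np.linalg.norm(v) + 1e-12)              # L2 normalization

block = np.tile(np.arange(8.0), (8, 1))                 # hypothetical horizontal ramp
v = hog_block(block)
print(v.shape)   # (36,)
```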
SURF- Speeded Up Robust Features
► SIFT suffers from some limitations, such as being computationally expensive and
not being able to handle images with repetitive patterns or cluttered backgrounds
effectively.
► Gradient Location Orientation Histogram (GLOH) is an extension of SIFT
► Takes into account the gradient orientation of keypoints, which provides rich
information about the local structure and texture of an image
► Includes the location information of keypoints, which allows it to capture the
spatial distribution of features in an image.
► This is particularly useful for tasks where the relative positions of objects or
regions in an image are important, such as object detection and tracking.
Advantages
1. Choose a pixel in the image and select its neighboring pixels in a circular or rectangular region around it.
2. Take the threshold to be the intensity of the selected (central) pixel, e.g. 50 in the example.
3. Go through every neighbouring pixel and compare its intensity with the threshold.
Assign 1 to the neighbouring pixel if its intensity is greater than or equal to the threshold.
Assign 0 to the neighbouring pixel if its intensity is less than the threshold.
4. Combine the binary values for all neighboring pixels to obtain a binary code for the central pixel
(Anti-clockwise, starting from the top left corner), and convert it to a decimal value.
5. Repeat steps 1–4 for each pixel in the image to obtain a binary code for each pixel.
6. Now use these LBP values to construct the histogram. By constructing a histogram of the LBP patterns,
we can capture the frequency of occurrence of different texture patterns in the image. This histogram can
then be used as a feature vector for texture classification tasks, where the goal is to automatically classify
images based on their texture properties.
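The steps above can be sketched in NumPy for the basic 3 x 3 LBP. Assumptions: a neighbour equal to the centre gets 1, the neighbours are read anti-clockwise starting at the top-left corner, and the first neighbour is the most significant bit. The helper names and the example patch are hypothetical.

```python
import numpy as np

# Neighbour offsets (row, col), anti-clockwise starting from the top-left corner
OFFSETS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]

def lbp_code(patch):
    """LBP code of the centre pixel of a 3x3 patch."""
    c = patch[1, 1]                                   # threshold = centre intensity
    bits = [1 if patch[1 + di, 1 + dj] >= c else 0 for di, dj in OFFSETS]
    return int("".join(map(str, bits)), 2)            # binary code -> decimal

def lbp_image(img):
    """LBP code for every interior pixel."""
    H, W = img.shape
    out = np.zeros((H - 2, W - 2), dtype=np.uint8)
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            out[i - 1, j - 1] = lbp_code(img[i - 1:i + 2, j - 1:j + 2])
    return out

def lbp_histogram(img):
    """Frequency of each of the 256 possible codes: the texture feature vector."""
    return np.bincount(lbp_image(img).ravel(), minlength=256)

patch = np.array([[60, 20, 70],
                  [40, 50, 30],
                  [55, 65, 45]])
print(lbp_code(patch))   # bits 10110010
```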
Advantages
LBP is robust to monotonic illumination variations, since each bit depends only on whether a neighbour is brighter or darker than the centre pixel. It can therefore capture texture information in images that have different lighting conditions. This makes it particularly useful for applications such as facial recognition and object detection, where lighting conditions can vary significantly.
LBP is a computationally efficient method for texture analysis, which makes it
suitable for processing large datasets and real-time applications.
The basic LBP operator is not rotation- or scale-invariant, but rotation-invariant variants (for example, taking the minimum code over all circular bit rotations) can capture texture information in images that have been rotated.
LBP has been shown to be highly discriminative for texture analysis
Disadvantages
LBP is sensitive to noise in the image, which can affect its ability to accurately capture texture information.
The LBP operator compares neighbouring pixel intensities, and if there is noise in the image, it can produce incorrect binary values that affect the resulting LBP histogram.
LBP only captures local texture information in the immediate vicinity of each pixel,
which can limit its ability to capture more global texture information in the image.
Rotation-invariant LBP variants discard rotational (orientation) information in the texture patterns.
LBP is typically applied to grayscale images, which means that it does not capture
color information in the texture patterns.
K means clustering
► Optical flow is the pattern of apparent motion of image objects between two consecutive frames, caused by the movement of the object or the camera. It is a 2D vector field where each vector is a displacement vector showing the movement of points from the first frame to the second.
► Motion of scene points in a sequence of images
► Frames should be converted to grayscale.
► Use a corner detector to select good feature points (corners) to track
► Track these points from the initial frame through consecutive frames of the video
► Use the Lucas-Kanade method
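The core of Lucas-Kanade is the brightness-constancy equation Ix u + Iy v + It = 0, solved by least squares over all pixels in a window. A minimal NumPy sketch for a single window with a small, hypothetical sub-pixel motion; real trackers solve this per feature point and use image pyramids (OpenCV offers `cv2.calcOpticalFlowPyrLK`). `lucas_kanade_window` is a hypothetical name.

```python
import numpy as np

def lucas_kanade_window(frame1, frame2):
    """Estimate one (u, v) displacement for the whole window by least squares."""
    f1 = frame1.astype(float)
    f2 = frame2.astype(float)
    iy, ix = np.gradient(f1)                 # spatial derivatives
    it = f2 - f1                             # temporal derivative
    A = np.stack([ix.ravel(), iy.ravel()], axis=1)
    b = -it.ravel()                          # Ix u + Iy v = -It
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Hypothetical frames: a smooth blob moved by (u, v) = (0.3, -0.2) pixels
y, x = np.mgrid[0:32, 0:32].astype(float)

def blob(cx, cy):
    return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * 4.0 ** 2))

u, v = lucas_kanade_window(blob(16, 16), blob(16.3, 15.8))
print(u, v)   # close to 0.3 and -0.2
```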
Object detection vs Object Tracking
► Object Tracking involves correspondence between frames to find the location of single
object or multiple objects
► Object detection involves finding objects in an image or video
► Traditional methods:
Background subtraction : Pixel differences in frames
Shadow removal
Shape based classification
Model/Feature based models for tracking
Lucas Kanade method
► A Gabor filter is a linear filter used in image processing for edge detection, texture
classification, feature extraction and disparity estimation.
► Combines Gaussian and sinusoidal terms to get weights and directionality
► A Gabor filter bank is a set of Gabor filters with different parameters.
► Different low-level features can be extracted from the original image via the convolution
operation by varying the Gabor parameters
How to extract features?
► The simplest way to generate features with Gabor Filters is to use the real or the imaginary
part of the response matrix as a feature vector
► The phase of the response matrix can be taken as a feature vector since it contains relevant
information about the edges
► The amplitude of the response matrix can be taken as a feature vector since it includes the
frequency spectrum
► The squared sum of different wavelength responses with the same orientation can be taken
as a feature vector. It represents the local energy in a particular direction
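A NumPy sketch of a complex Gabor kernel (a Gaussian envelope times a complex sinusoid along a rotated axis) and of the real, imaginary, amplitude and phase responses listed above. The parameterization (sigma, theta, wavelength lambd, aspect ratio gamma, phase psi) follows a common convention but is an assumption here, as are the helper names and the striped test image. A small bank with two orientations shows the stronger response when the filter matches the stripes.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Complex Gabor kernel: Gaussian envelope times a complex sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates by theta
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + gamma ** 2 * yr ** 2) / (2 * sigma ** 2))
    carrier = np.exp(1j * (2 * np.pi * xr / lambd + psi))
    return envelope * carrier

def gabor_features(img, kernel):
    """Valid-region filter response: real, imaginary, amplitude and phase maps."""
    k = kernel.shape[0]
    H, W = img.shape
    resp = np.zeros((H - k + 1, W - k + 1), dtype=complex)
    for i in range(resp.shape[0]):
        for j in range(resp.shape[1]):
            resp[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return resp.real, resp.imag, np.abs(resp), np.angle(resp)

# Hypothetical image: vertical stripes with period 8 pixels
img = np.tile(np.cos(2 * np.pi * np.arange(32) / 8), (32, 1))
bank = {theta: gabor_kernel(15, 4.0, theta, 8.0) for theta in (0.0, np.pi / 2)}
energy = {theta: gabor_features(img, kern)[2].mean() for theta, kern in bank.items()}
print(energy)   # the theta = 0 filter matches the stripes and responds strongly
```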
Bag of Visual Words
► Bag of visual words (BOVW) is commonly used in image classification. Its concept is
adapted from information retrieval and NLP’s bag of words (BOW).
► In bag of words (BOW), we count how many times each word appears in a document, use the frequency of each word to find the keywords of the document, and make a frequency histogram from it.
► The general idea of bag of visual words (BOVW) is to represent an image as a set of features. Features consist of keypoints and descriptors.
► We use the keypoints and descriptors to construct vocabularies and represent each image
as a frequency histogram of features that are in the image.
How to build Bag of Visual Words?
► We detect features, extract descriptors from each image in the dataset, and build a visual
dictionary.
► Detecting features and extracting descriptors in an image can be done by using feature
extractor algorithms (for example, SIFT, KAZE, etc).
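Once descriptors are extracted, a common way to build the visual dictionary is to cluster all descriptors (for example with k-means) and treat each cluster centre as a visual word; each image then becomes a histogram of its nearest words. A NumPy sketch under those assumptions: 2D points stand in for real 128-D SIFT descriptors, the data are simulated, and `kmeans` (with farthest-point initialization) and `bovw_histogram` are hypothetical helpers.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain k-means with farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])            # farthest point from chosen centres
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)                   # assign to nearest centre
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return centers

def bovw_histogram(descriptors, vocabulary):
    """Count how often each visual word is the nearest one to a descriptor."""
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    return np.bincount(d.argmin(axis=1), minlength=len(vocabulary))

# Simulated descriptors for two hypothetical images
rng = np.random.default_rng(0)
desc_a = np.vstack([rng.normal(0, 0.1, (5, 2)), rng.normal(10, 0.1, (1, 2))])
desc_b = np.vstack([rng.normal(0, 0.1, (2, 2)), rng.normal(10, 0.1, (4, 2))])
vocab = kmeans(np.vstack([desc_a, desc_b]), k=2)   # visual dictionary
hist_a = bovw_histogram(desc_a, vocab)
hist_b = bovw_histogram(desc_b, vocab)
```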
Image Texture Analysis
Gray Level Co-occurrence matrix
► The key concept of GLCM lies in analyzing how often gray levels (intensities) occur together
within an image, specifically considering neighboring pixels.
► Pixel offsets and directions: By counting co-occurrences in a specific pixel offset (often 1 or 2)
and specific directions (e.g., horizontal, vertical, diagonal), GLCM captures the spatial
arrangement of textures.
3 parameters used in calculating GLCM:
► 1. Distance (d): the displacement between two pixels.
► 2. Angle (θ): the direction in which pixel pairs are considered, typically 0°, 45°, 90° and 135°.
► 3. Number of Gray Levels (G): the number of discrete intensity levels in the image.
What does it measure?
► Contrast: How intense is the difference between neighboring pixels. High contrast values
indicate large differences between neighboring pixel intensities.
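A NumPy sketch of the GLCM for one (distance, angle) pair, given as a pixel offset (dx, dy), plus the contrast measure sum P(i, j) (i - j)^2. With d = 1, an angle of 0° corresponds to (dx, dy) = (1, 0) and 90° to (0, -1) when rows point down; this mapping, the helper names and the 4-level example image are assumptions.

```python
import numpy as np

def glcm(img, dx, dy, levels, symmetric=True, normed=True):
    """Count how often gray level i is followed by level j at offset (dx, dy)."""
    P = np.zeros((levels, levels), dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            i2, j2 = i + dy, j + dx
            if 0 <= i2 < H and 0 <= j2 < W:
                P[img[i, j], img[i2, j2]] += 1
    if symmetric:
        P = P + P.T                  # count each pair in both directions
    if normed:
        P /= P.sum()                 # turn counts into probabilities
    return P

def glcm_contrast(P):
    """Contrast: sum of P(i, j) * (i - j)^2."""
    i, j = np.indices(P.shape)
    return np.sum(P * (i - j) ** 2)

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])       # hypothetical image with G = 4 gray levels
P = glcm(img, dx=1, dy=0, levels=4, symmetric=False, normed=False)   # d = 1, 0 degrees
print(P)
print(glcm_contrast(P))
```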
► Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
► The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as:
► $\rho(a, b) = |x_2 - x_1| + |y_2 - y_1|$
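The assignment and update steps of K-means with this distance can be sketched in plain Python, using the three initial centres from the example. The point (6, 4) is a hypothetical extra point for illustration. The update uses the mean as in standard K-means; strictly, the median minimizes the total Manhattan distance.

```python
def manhattan(a, b):
    """rho(a, b) = |x2 - x1| + |y2 - y1|"""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def assign(point, centers):
    """Index of the nearest cluster centre under the Manhattan distance."""
    return min(range(len(centers)), key=lambda c: manhattan(point, centers[c]))

def update(points):
    """New centre of a cluster: the mean of its points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

centers = [(2, 10), (5, 8), (1, 2)]         # initial centres A1, A4, A7
print(manhattan(centers[0], centers[1]))    # 5
print(assign((6, 4), centers))              # 1: the hypothetical point is closest to A4
```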