Computer Vision Basics and Techniques

The document outlines an introductory course on Computer Vision by Dr. Shefali Arora Chouhan, focusing on key concepts such as image representation, digitization, color quantization, and image pre-processing techniques. It covers essential topics including recognition, reconstruction, and reorganization in computer vision, along with practical applications in various fields. Additionally, it discusses advanced techniques like convolutional neural networks and the YOLO algorithm for object detection.

Uploaded by

Sumit Chauhan
Copyright
© All Rights Reserved

COMPUTER VISION

LEC 1
BY : DR. SHEFALI ARORA CHOUHAN
ASSISTANT PROFESSOR, DEPT. OF CSE
COURSE OUTCOMES

► Understand key features of Computer Vision to analyse and interpret the visible world
around us
► Design and implement multi-dimensional signal processing, feature extraction, pattern
analysis, visual geometric modelling, and stochastic optimization
► Apply computer vision concepts to biometrics, medical diagnosis, document
processing, mining of visual content, surveillance, and advanced rendering
SYLLABUS
What is an Image?

► A signal is a function which depends on some variable
► An image is a signal which can be modelled as a 2D or 3D function
► The values of the function correspond to parameters such as brightness, pressure
etc.
THE 3Rs OF COMPUTER VISION

► The central problems in computer vision are recognition, reconstruction and
reorganization
► Recognition is about attaching semantic category labels to objects and scenes as well as to
events and activities.
► Reconstruction is traditionally about estimating shape, spatial layout, reflectance and
illumination – which could be used together to render the scene to produce an image.
► Reorganization is our term for what is usually called “perceptual organization” in human
vision; the “re” prefix makes the analogy with recognition and reconstruction more salient.
BASIC CONCEPTS

► In monochrome images the minimum value corresponds to black and the maximum to
white.
► The different values the intensity function can take are called gray levels.
► Gray level indicates brightness of a pixel
► f(x,y) = 0 (black), f(x,y) = 1 (white) when intensities are normalized to [0, 1]
DIGITIZATION

► Discretization: The function f(x,y) is sampled into an array of size MxN. Each element in this
matrix is called a pixel
► Quantization: The continuous range of f(x,y) is divided into K intervals and each interval is given a value
► Digital images with 8 bits per pixel can be quantized into up to 256 gray levels
COLOR QUANTIZATION

► Coloured images can be quantized into three vectors, one each for red, green and blue
► Any colour is a combination of these three primary colours
► There are various colour models
RGB TO GRAYSCALE CONVERSION

► Average method is the simplest one. Since it is an RGB image, you
add r, g and b and then divide the sum by 3 to get
your desired grayscale image.
Grayscale = (R + G + B) / 3
► Weighted Method: Red has the longest wavelength of the three
colors, and green not only has a shorter wavelength than
red but is also the color the eye is most sensitive to (it gives a more soothing effect to
the eyes). It means that we have to decrease the contribution of red,
increase the contribution of green, and give blue
the smallest contribution, as the weights below show.
New grayscale image = ( (0.3 * R) + (0.59 * G) + (0.11 * B) ).
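The two formulas above can be compared on a tiny example. This is a minimal sketch using NumPy only; the function names and the 1x3 test image are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def to_gray_average(rgb):
    """Average method: (R + G + B) / 3 for every pixel."""
    return rgb.astype(np.float64).mean(axis=2)

def to_gray_weighted(rgb, weights=(0.3, 0.59, 0.11)):
    """Weighted method: 0.3*R + 0.59*G + 0.11*B for every pixel."""
    return rgb.astype(np.float64) @ np.array(weights)

# Hypothetical 1x3 RGB image: one pure red, one pure green, one pure blue pixel
rgb = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)

gray_avg = to_gray_average(rgb)
gray_weighted = to_gray_weighted(rgb)
print(gray_avg)       # all three pixels get the same gray value
print(gray_weighted)  # green is brightest, blue is darkest
```

With the average method the three pure colors are indistinguishable, while the weighted method keeps green brighter than red and red brighter than blue.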
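import cv2
from matplotlib import pyplot as plt

# OpenCV loads images in BGR order; convert to RGB for matplotlib
im = cv2.imread('/content/[Link]')
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)

# Display the image using matplotlib
plt.imshow(im_rgb)
plt.show()

# Split into channels and show each one with a matching colormap
r, g, b = cv2.split(im_rgb)

plt.subplot(131)
plt.imshow(r, cmap='Reds')
plt.title('Red Channel')

plt.subplot(132)
plt.imshow(g, cmap='Greens')
plt.title('Green Channel')

plt.subplot(133)
plt.imshow(b, cmap='Blues')
plt.title('Blue Channel')

plt.show()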

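import cv2
import numpy as np
from google.colab.patches import cv2_imshow

src = cv2.imread('/content/[Link]')
print(src.shape)
# OpenCV stores channels as BGR, so index 2 is the red channel
red_channel = src[:, :, 2]
# Keep only the red channel; the other two stay zero
red_img = np.zeros(src.shape, dtype=src.dtype)
red_img[:, :, 2] = red_channel
cv2_imshow(red_img)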
MORE PARAMETERS

► Color consistency
► Shadows
► Lighting
► Brightness
► Contrast
# img is assumed to be an image already converted to RGB order
r, g, b = cv2.split(img)
plt.imshow(r)
plt.show()
plt.imshow(g)
plt.show()
plt.imshow(b)
plt.show()
HSI MODEL

► Represents colors the way the human eye perceives them
► Three components: Hue, Saturation & Intensity
► Saturation & intensity range from 0 to 1
► At intensity 1 every color appears the same (white)

Saturation

Intensity
IMAGE PRE-PROCESSING

► Series of operations at the lowest level of abstraction


► Both the input and the output are images
► Objective is to improve the image
► Remove distortions
► Improve quality or highlight important features
COMMON OPERATIONS

► Gray level transformations


► Histograms
► Geometric transformations
► Arithmetic Operations
► Convolution
► Smoothing
HSI MODEL

import cv2
import numpy as np
from google.colab.patches import cv2_imshow

bgr_img = cv2.imread('/content/[Link]')
# OpenCV provides HSV, a close relative of the HSI model
hsv_img = cv2.cvtColor(bgr_img, cv2.COLOR_BGR2HSV)
cv2_imshow(hsv_img)
cv2.waitKey(0)
cv2.destroyAllWindows()
What will this do?

colored_negative = 255 - im_rgb
cv2_imshow(colored_negative)
BRIGHTNESS OF IMAGE

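► Load the image
► Define a variable with the amount of brightness to be increased

brightness_increase = 50
# Widen the type before adding so that uint8 values do not wrap around, then clip to [0, 255]
brightened_image = np.clip(img.astype(np.int16) + brightness_increase, 0, 255).astype(np.uint8)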
GRAY LEVEL HISTOGRAMS

► Depicts frequency of occurrence of each gray value
► Can be interpreted as a probability density function (PDF)
► PDF represents the likelihood of a pixel having a particular intensity value
► To convert a histogram to a PDF, you need to normalize it.
► Normalization involves dividing each bin count by the total number of pixels in the image.
► The normalized histogram values then represent probabilities, indicating the likelihood of a pixel
having a specific intensity value.
► The values of the PDF should sum to 1, as this is the probability of a pixel having
an intensity value somewhere within the entire intensity range.
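The normalization step can be sketched as follows. This is a minimal NumPy illustration; the helper names and the tiny 4-level image are assumptions made for the example.

```python
import numpy as np

def gray_histogram(img, levels=256):
    """Count how often each gray level occurs."""
    return np.bincount(img.ravel(), minlength=levels)

def histogram_to_pdf(hist):
    """Divide each bin count by the total number of pixels."""
    return hist / hist.sum()

# Hypothetical 2x3 image with only 4 gray levels (0..3)
img = np.array([[0, 0, 1],
                [1, 1, 3]], dtype=np.uint8)

hist = gray_histogram(img, levels=4)
pdf = histogram_to_pdf(hist)
print(hist)       # bin counts
print(pdf)        # probabilities per gray level
print(pdf.sum())  # should be 1
```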
import cv2
import numpy as np
from matplotlib import pyplot as plt
from google.colab.patches import cv2_imshow

path = '/content/[Link]'
img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
cv2_imshow(img)

# calcHist expects a list of images
dst = cv2.calcHist([img], [0], None, [256], [0, 256])

plt.hist(img.ravel(), 256, [0, 256])
plt.title('Histogram for gray scale image')
plt.show()
# Reload in colour: the per-channel histograms need all three channels
img = cv2.imread(path)
image = cv2.resize(img, (200, 200))

# OpenCV stores channels in BGR order
b, g, r = cv2.split(image)

hist_b = cv2.calcHist([b], [0], None, [256], [0, 256])
hist_g = cv2.calcHist([g], [0], None, [256], [0, 256])
hist_r = cv2.calcHist([r], [0], None, [256], [0, 256])

plt.plot(hist_b, color='blue', label='Blue Channel')
plt.plot(hist_g, color='green', label='Green Channel')
plt.plot(hist_r, color='red', label='Red Channel')

plt.title('RGB Histogram')
plt.xlabel('Pixel Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()
HISTOGRAM EQUALIZATION

► Histogram equalization is a method in image processing of contrast adjustment using the
image’s histogram
► This allows for areas of lower local contrast to gain a higher contrast
► The goal is to create an image with evenly distributed gray levels
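One common way to do this is to map every gray level through the cumulative distribution of the histogram. The sketch below is a simplified NumPy version written for illustration; it is an assumption about the general technique and does not reproduce the exact formula used inside OpenCV.

```python
import numpy as np

def equalize(img, levels=256):
    """Simplified histogram equalization via the cumulative distribution."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = hist.cumsum() / img.size                    # cumulative distribution in [0, 1]
    lut = np.round(cdf * (levels - 1)).astype(np.uint8)  # new value for each old gray level
    return lut[img]

# Hypothetical low-contrast image: all values squeezed into 50..52
img = np.array([[50, 50, 51],
                [51, 52, 52]], dtype=np.uint8)

equalized = equalize(img)
print(equalized)  # values are now spread over a much wider range
```

The input spans only 3 neighbouring gray levels, while the output is stretched towards the full 0 to 255 range.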
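import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/[Link]', 0)
equ = cv2.equalizeHist(img)
# Stack the original and the equalized image side by side
res = np.hstack((img, equ))
cv2_imshow(res)
cv2.waitKey(0)
cv2.destroyAllWindows()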
Equalized Histogram
AFFINE TRANSFORMATIONS

► To find the transformation matrix, we need three points from the input image and their
corresponding locations in the output image
► An affine transformation is a geometric transformation that preserves lines and parallelism (but not necessarily
distances and angles)
Common transformations

► Rotation: the cv2.getRotationMatrix2D(center, angle, scale) function is used to make the
transformation matrix M which will be used for rotating an image.
► Translation: A translation matrix is created and passed to cv2.warpAffine to shift the
object’s location
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

image = cv2.imread('/content/[Link]')
height, width = image.shape[:2]
# shift by a quarter of the image size in each direction
tx, ty = width / 4, height / 4
# create the translation matrix using tx and ty, it is a NumPy array
translation_matrix = np.array([
    [1, 0, tx],
    [0, 1, ty]
], dtype=np.float32)
translated_image = cv2.warpAffine(src=image, M=translation_matrix, dsize=(width, height))
# display the translated image
cv2_imshow(translated_image)
AFFINE TRANSFORMATIONS

img = cv2.imread('/content/[Link]')
rows, cols, ch = img.shape

# three points in the input image and their locations in the output image
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])

M = cv2.getAffineTransform(pts1, pts2)
dst = cv2.warpAffine(img, M, (cols, rows))
cv2_imshow(dst)
DIFFERENTIATE BETWEEN
TRANSFORMATIONS

► Euclidean
► Affine
► Projective
Affine Vs Non Affine

Affine                           Non-affine
Includes scaling, translation    Includes projective transformations
Parallelism is preserved         Parallelism is not preserved
                                 Also called homography
Euclidean transformation

► In affine transformations, we can do the following operations: rotation, shearing,
translation, scaling etc. (a 2x3 matrix is used; refer to the slides)
► In a Euclidean transformation, we can do only rotation and translation
► It is a subset of the affine transform
► Preserves distances and shape
► Parallelism is maintained
► Also called an isometric transform
Projective/Perspective Transformation

► 3D scenes projected to 2D
► Resultant image depends on camera’s viewpoint
► Ratios or dimensions of objects change
► May not preserve angles
► Parallelism is not preserved
► Generalization of the affine transform
► For affine transformation, the projection vector is equal to 0. Thus, affine
transformation can be considered as a particular case of perspective transformation.
► Since the transformation matrix (M) is defined by 8 constants (degrees of freedom), to
find this matrix we first select 4 points in the input image and map these 4 points to the
desired locations in the output image according to the use case. Each point pair gives
2 equations, so we get 8 equations in 8 unknowns, which can be solved directly.
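The 8 equations can be written out and solved with a linear solver. The sketch below is a minimal NumPy illustration under the usual assumption that the bottom-right entry of M is fixed to 1; the function names and the point coordinates are made up for the example.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve the 8 unknowns of a 3x3 projective matrix (last entry fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and the same for v
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, dtype=np.float64), np.array(b, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

# Hypothetical correspondences: a unit square mapped to a general quadrilateral
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(0, 0), (2, 0), (3, 2), (0, 1)]
H = homography_from_4_points(src, dst)

# A pure translation is an affine special case: the projection vector comes out as 0
H_affine = homography_from_4_points(src, [(1, 1), (2, 1), (2, 2), (1, 2)])
print(H)
print(H_affine[2])  # last row of the affine case
```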
CONVOLUTION AND FILTERING

► In order to detect vertical and horizontal edges, we can go for convolution with the help
of filters
► Use of mxm filter on nxn image to extract features and detect edges
► Gives an (n-m+1) x (n-m+1) dimension image (no padding, stride 1)
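The output size can be checked with a small hand-written convolution. This is a minimal sketch assuming no padding and stride 1; the function name, the 6x6 test image and the filter values are illustrative assumptions.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Convolve without padding: an n x n image and m x m kernel give (n-m+1) x (n-m+1)."""
    n_rows, n_cols = image.shape
    m_rows, m_cols = kernel.shape
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    out = np.zeros((n_rows - m_rows + 1, n_cols - m_cols + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m_rows, j:j + m_cols] * flipped)
    return out

# Hypothetical 6x6 image: dark left half, bright right half (one vertical edge)
image = np.zeros((6, 6))
image[:, 3:] = 10

vertical_edge_filter = np.array([[1, 0, -1],
                                 [1, 0, -1],
                                 [1, 0, -1]], dtype=np.float64)

response = convolve2d_valid(image, vertical_edge_filter)
print(response.shape)  # (6-3+1, 6-3+1)
print(response)        # non-zero only around the vertical edge
```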
Solve

Filters
TYPES OF FILTERS

► Weights are added to a filter matrix in order to extract features or detect edges
► For instance, the Sobel filter adds more weight to the central row of pixels
as compared to the filter used before
► Use of different filters helps to add robustness
CONVOLUTIONAL NEURAL
NETWORKS
• A CNN typically has three types of layers: a convolutional layer, a pooling layer,
and a fully connected layer.
• The convolution layer is the core building block of the CNN. It carries the
main portion of the network’s computational load.
• This layer performs a dot product between two matrices, where one
matrix is the set of learnable parameters otherwise known as a kernel,
and the other matrix is the restricted portion of the receptive field.
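[Figure: an n x n input convolved with a k x k kernel (labelled 64 in the figure) gives an (n-k+1) x (n-k+1) output; pooling with a 2x2 window reduces a 4x4 matrix to a 2x2 matrix]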
POOLING LAYER
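A pooling layer reduces each small window of a feature map to a single value. The sketch below assumes max pooling with a 2x2 window and stride 2, which is a common choice but is not stated on the slide; the function name and the values are illustrative.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Max pooling with a size x size window and stride equal to size."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size  # drop rows/columns that do not fill a window
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Hypothetical 4x4 feature map
feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 0],
                        [7, 2, 9, 8],
                        [1, 0, 3, 4]])

pooled = max_pool(feature_map)
print(pooled)  # one value per 2x2 window
```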
FULLY CONNECTED LAYER

► Neurons in this layer have full connectivity with all neurons in the preceding and
succeeding layers, as seen in a regular fully connected neural network (FCNN).
► This is why it can be computed as usual by a matrix multiplication followed by a
bias offset.
► The FC layer helps to map the representation between the input and the output.
NON-LINEAR ACTIVATION FUNCTIONS

► Sigmoid
► The sigmoid non-linearity has the mathematical form
σ(κ) = 1 / (1 + e^(-κ))
► It takes a real-valued number and “squashes” it into a range between 0 and 1.
► Tanh
► Tanh squashes a real-valued number to the range [-1, 1]. Like sigmoid, the activation
saturates, but unlike the sigmoid neurons its output is zero centered.
► ReLU
► The Rectified Linear Unit (ReLU) has become very popular in the last few years. It
computes the function f(κ) = max(0, κ). In other words, the activation is simply thresholded at
zero.
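The three activation functions can be written directly from their definitions. This is a minimal sketch using only the standard library; the function names are illustrative.

```python
import math

def sigmoid(k):
    """Squashes a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-k))

def tanh(k):
    """Squashes a real number into the range (-1, 1); zero centered."""
    return math.tanh(k)

def relu(k):
    """Thresholds the input at zero."""
    return max(0.0, k)

for k in (-10.0, 0.0, 2.5):
    print(k, sigmoid(k), tanh(k), relu(k))
```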
YOLO- YOU ONLY LOOK ONCE - 4/2

► Object detection is the problem of both locating AND classifying objects


► Goal of YOLO algorithm is to do object detection both fast AND with high accuracy
Image Localization Using YOLO

► Image Localization is the process of identifying the correct location of one or
multiple objects using bounding boxes, which correspond to rectangular shapes
around the objects.
► This process is sometimes confused with image classification or image
recognition, which aims to predict the class of an image or an object within an
image into one of the categories or classes.
► The illustration below corresponds to the visual representation of the previous
explanation. The object detected within the image is “Person.”
Advantages

► High Detection Accuracy


► Speed
► Better Generalization
► Open-source
Approaches Behind YOLO

- Residual blocks
► This first step starts by dividing the original image (A) into NxN grid cells of equal shape, where
N in our case is 4 shown on the image on the right. Each cell in the grid is responsible for
localizing and predicting the class of the object that it covers, along with the
probability/confidence value.
-Bounding boxes
The next step is to determine the bounding boxes which correspond to rectangles highlighting all
the objects in the image. We can have as many bounding boxes as there are objects within a given
image.
► YOLO determines the attributes of these bounding boxes using a single regression module in
the following format, where Y is the final vector representation for each bounding box.
► Y = [pc, bx, by, bh, bw, c1, c2]
► This is especially important during the training phase of the model.
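The grid assignment and the vector Y can be sketched in a few lines. This is an illustration only: the function names, the image size, the box values and the two-class setup are assumptions, and the real YOLO training target has more structure than shown here.

```python
def responsible_cell(cx, cy, image_w, image_h, n):
    """Return (row, col) of the grid cell that contains the object centre."""
    col = min(int(cx / image_w * n), n - 1)
    row = min(int(cy / image_h * n), n - 1)
    return row, col

def target_vector(box, class_index, num_classes=2):
    """Build Y = [pc, bx, by, bh, bw, c1, c2] for a cell that contains an object."""
    bx, by, bh, bw = box
    one_hot = [1.0 if i == class_index else 0.0 for i in range(num_classes)]
    return [1.0, bx, by, bh, bw] + one_hot

# Hypothetical 400x400 image, 4x4 grid, object centred at pixel (250, 130)
row, col = responsible_cell(250, 130, 400, 400, n=4)

# Hypothetical box: centre offsets inside the cell, height and width relative to the cell
y = target_vector((0.5, 0.3, 1.2, 0.8), class_index=0)
print(row, col)
print(y)
```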
YOLO- YOU ONLY LOOK ONCE

• Given an image generate bounding boxes, one for each detectable object in image
• For each bounding box, output 5 predictions: x, y, w, h, confidence.
• Also output class
x, y (coordinates for center of bounding box)
w,h (width and height)
confidence (probability bounding box has object)
class (classification of object in bounding box)
YOLO- YOU ONLY LOOK ONCE
Fourier Transform

► Converts image from spatial domain to frequency domain


► Used for correction of images
► FFT works on grayscale images
► Single image band formation can help to detect spots, noise etc. from the image
Discrete FFT

► Sampled version of the Fourier transform
► Sets of samples which describe the spatial image
► Helps to find periodic patterns in spatial domain images
► Inverse FFT converts image from frequency to spatial domain
► Separation based on sine and cosine components
► e^(iƟ) = cos Ɵ + i sin Ɵ and e^(-iƟ) = cos Ɵ - i sin Ɵ
► The complex exponential term is called the basis function.
Inverse FFT:
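The forward and inverse transforms can be checked on a small synthetic image. This is a minimal NumPy sketch; the 8x8 cosine pattern is an assumption chosen so that the periodic pattern shows up as a clear peak in the frequency domain.

```python
import numpy as np

rows, cols = 8, 8
x = np.arange(cols)

# Hypothetical image: a horizontal cosine pattern with 2 cycles across the width
image = np.tile(np.cos(2 * np.pi * 2 * x / cols), (rows, 1))

spectrum = np.fft.fft2(image)               # spatial domain -> frequency domain
recovered = np.fft.ifft2(spectrum).real     # frequency domain -> spatial domain

# The strongest frequency component reveals the periodic pattern
peak = np.unravel_index(np.argmax(np.abs(spectrum)), spectrum.shape)
print(peak)
print(np.allclose(recovered, image))
```

The peak lies in row 0 (no vertical variation) at horizontal frequency 2 or its mirror 6, and the inverse transform returns the original image up to rounding error.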
FOURIER TRANSFORM

import cv2
import numpy as np
from matplotlib import pyplot as plt

image = cv2.imread('/content/[Link]', cv2.IMREAD_GRAYSCALE)

# 2D FFT, then move the zero-frequency component to the centre
f_transform = np.fft.fft2(image)
f_transform_shifted = np.fft.fftshift(f_transform)

# Log scale makes the wide range of magnitudes visible
magnitude_spectrum = np.log(np.abs(f_transform_shifted) + 1)

plt.subplot(121), plt.imshow(image, cmap='gray')
plt.title('Original Image'), plt.xticks([]), plt.yticks([])

plt.subplot(122), plt.imshow(magnitude_spectrum, cmap='gray')
plt.title('Fourier Transform'), plt.xticks([]), plt.yticks([])

plt.show()
IMAGE NOISE & FILTERS 7/2

► Noise adds random variations in brightness and color information of an existing image
► Adding noise to images can help in testing the performance of image processing
algorithms such as denoising, segmentation, and feature detection under different
levels of noise.
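Noise at different levels can be generated synthetically for such tests. The sketch below adds zero-mean Gaussian noise with NumPy; the function name, the flat gray test image and the sigma values are assumptions for illustration.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the valid 8-bit range."""
    noise = rng.normal(0.0, sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)                   # fixed seed so the run is repeatable
image = np.full((64, 64), 128, dtype=np.uint8)   # hypothetical flat gray image

noisy_low = add_gaussian_noise(image, sigma=5, rng=rng)
noisy_high = add_gaussian_noise(image, sigma=25, rng=rng)

print(noisy_low.std(), noisy_high.std())  # larger sigma gives larger variation
```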
TYPES OF FILTERS

► LINEAR
Gaussian
Box filter
Weighted Average Filter
► NON-LINEAR
Median Filter
Min filter
Max filter
IMAGE SMOOTHING

► Box filter- Uses a box matrix / kernel with equal coefficients

► Weighted average filter- Gives more weight to pixels near the output location

► Gaussian filter- Gets weights using 2D Gaussian function


A Gaussian kernel

import numpy as np

def gaussian_kernel(size, sigma=1.0):
    # Evaluate the 2D Gaussian at every (x, y), centred on the middle of the kernel
    kernel = np.fromfunction(
        lambda x, y: (1/(2*np.pi*sigma**2)) * np.exp(-((x-(size-1)/2)**2 + (y-(size-1)/2)**2)/(2*sigma**2)),
        (size, size)
    )
    # Normalize so that the weights sum to 1
    return kernel / np.sum(kernel)

kernel_size = 5
sigma = 1.0
gaussian_kernel_matrix = gaussian_kernel(kernel_size, sigma)
print("Gaussian Kernel Matrix:")
for row in gaussian_kernel_matrix:
    print(["{:.5f}".format(value) for value in row])
IMAGE SMOOTHING USING GAUSSIAN
FILTER

► The Gaussian filter is one of the most widely used smoothing filters
► Window weights follow a Gaussian distribution
► Influence of neighboring pixels decreases with distance to the center
► Degree of smoothing depends on the standard deviation.
► The larger the standard deviation, the broader the Gaussian distribution
► The normalization factor is important to ensure that the area under the distribution remains
the same (the weights sum to 1), so smoothing does not change the overall brightness
GAUSSIAN NOISE & GAUSSIAN FILTER

import cv2
import matplotlib.pyplot as plt

def apply_gaussian_filter(image, kernel_size=(5, 5), sigma=4):
    return cv2.GaussianBlur(image, kernel_size, sigma)

# image is assumed to be the noisy input
image = cv2.imread('/content/[Link]')
filtered_image = apply_gaussian_filter(image)

# Save the filtered image to a file
cv2.imwrite('path/to/your/output/image_filtered.jpg', filtered_image)

# Show the input and the filtered result side by side
plt.subplot(121), plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB)), plt.title('Original Image')
plt.subplot(122), plt.imshow(cv2.cvtColor(filtered_image, cv2.COLOR_BGR2RGB)), plt.title('Filtered Image')
plt.show()
EDGE DETECTION: CANNY EDGE
DETECTION

► Contours are found by looking for differences between adjacent pixels
► Regions are found by looking for similarities between adjacent pixel values
► To segment images, we can separate pixels based on gray levels, depth, texture etc.
► Contours appear if they correspond to edges
► Canny Edge Detector helps to detect various edges in an image
► Step 1: Apply Gaussian filter (for noise removal)
► Step 2: Find the gradients
► Step 3: Apply non-max suppression (keep only the local maxima along the gradient direction)
► Step 4: Binarize the image using double thresholding and hysteresis
NOISE REDUCTION & GRADIENT
CALCULATION

► Edge detection results are highly sensitive to image noise
► A Gaussian filter is applied to smooth the image
► Edges correspond to a change of pixels’ intensity.
► To detect it, the easiest way is to apply filters that highlight this intensity
change in both directions, horizontal (x) and vertical (y), by convolving
Sobel filters with the image
NOISE REDUCTION & GRADIENT
CALCULATION

► Magnitude of Gradient and angle is calculated as:

► The result is almost the expected one, but some of the edges are thick
and others are thin.
► Non-Max Suppression step will help us mitigate the thick ones.
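The gradient magnitude and angle are usually computed as |G| = sqrt(Ix^2 + Iy^2) and theta = arctan(Iy / Ix), where Ix and Iy are the responses of the two Sobel filters. The sketch below assumes these standard formulas and the standard 3x3 Sobel kernels; the helper name and the test image are illustrative.

```python
import numpy as np

KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)  # Sobel, x direction
KY = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=np.float64)  # Sobel, y direction

def filter_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    k = kernel.shape[0]
    out = np.zeros((image.shape[0] - k + 1, image.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

# Hypothetical 5x5 image with one vertical edge
image = np.zeros((5, 5))
image[:, 3:] = 1.0

ix = filter_valid(image, KX)
iy = filter_valid(image, KY)
magnitude = np.hypot(ix, iy)   # sqrt(ix**2 + iy**2)
angle = np.arctan2(iy, ix)     # gradient direction in radians

print(magnitude)
print(angle)
```

The edge shows up two pixels wide in the magnitude, which is the kind of thick response that non-max suppression is meant to thin.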
NON-MAX SUPPRESSION

► The algorithm goes through all the points on the gradient intensity matrix and finds
the pixels with the maximum value in the edge directions.
► For each pixel, its two neighbours along the edge direction are examined. If one of those
two pixels is more intense than the one being processed, then only the more intense one is
kept, and the intensity value of the current pixel (i, j) is set to 0.
► If there are no pixels in the edge direction having more intense values, then the value
of the current pixel is kept.
• Create a matrix initialized to 0 of the same size of the original gradient intensity
matrix;
• Identify the edge direction based on the angle value from the angle matrix;
• Check if the pixel in the same direction has a higher intensity than the pixel that is
currently processed;
• Return the image processed with the non-max suppression algorithm.
• We can still notice some variation regarding the edges’ intensity: some pixels
seem to be brighter than others.
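The steps listed above can be sketched as follows. This is a simplified NumPy version that quantizes the direction into four bins and skips the border pixels; the bin limits and the image coordinate convention (y pointing down) are assumptions of the sketch.

```python
import numpy as np

def non_max_suppression(magnitude, angle):
    """Keep a pixel only if it is not weaker than its two neighbours along the gradient."""
    rows, cols = magnitude.shape
    out = np.zeros_like(magnitude)          # matrix initialised to 0
    degrees = np.rad2deg(angle) % 180       # direction folded into [0, 180)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            d = degrees[i, j]
            if d < 22.5 or d >= 157.5:      # about 0 degrees: compare left / right
                a, b = magnitude[i, j - 1], magnitude[i, j + 1]
            elif d < 67.5:                  # about 45 degrees
                a, b = magnitude[i - 1, j - 1], magnitude[i + 1, j + 1]
            elif d < 112.5:                 # about 90 degrees: compare up / down
                a, b = magnitude[i - 1, j], magnitude[i + 1, j]
            else:                           # about 135 degrees
                a, b = magnitude[i - 1, j + 1], magnitude[i + 1, j - 1]
            if magnitude[i, j] >= a and magnitude[i, j] >= b:
                out[i, j] = magnitude[i, j]
    return out

# Hypothetical thick vertical edge: the gradient points along x (angle 0)
magnitude = np.tile(np.array([0.0, 2.0, 5.0, 3.0, 0.0]), (5, 1))
angle = np.zeros_like(magnitude)

thin = non_max_suppression(magnitude, angle)
print(thin)  # only the strongest column of the edge survives
```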
DOUBLE THRESHOLDING

• High threshold is used to identify the strong pixels (intensity higher than the high
threshold)
• Low threshold is used to identify the non-relevant pixels (intensity lower than the
low threshold)
• All pixels having intensity between both thresholds are flagged as weak and the
Hysteresis mechanism will help us identify the ones that could be considered as
strong and the ones that are considered as non-relevant.
• The hysteresis step transforms weak pixels into strong ones if and only if
at least one of the pixels around the one being processed is a strong one.
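Double thresholding and hysteresis can be sketched together. In this minimal NumPy version the pass over the weak pixels is repeated until nothing changes, so that strength can spread along a chain of weak pixels; that repetition, the function name and the threshold values are assumptions of the sketch.

```python
import numpy as np

def hysteresis(magnitude, low, high):
    """Classify pixels as strong / weak / non-relevant, then promote connected weak pixels."""
    strong = magnitude > high
    weak = (magnitude >= low) & ~strong
    edges = strong.copy()
    changed = True
    while changed:
        changed = False
        for i, j in zip(*np.nonzero(weak & ~edges)):
            # 3x3 neighbourhood around (i, j), cut off at the image border
            window = edges[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            if window.any():
                edges[i, j] = True
                changed = True
    return edges

# Hypothetical gradient magnitudes: one strong pixel, a weak chain next to it, one isolated weak pixel
magnitude = np.array([[9.0, 4.0, 4.0, 0.0, 4.0],
                      [0.0, 0.0, 0.0, 0.0, 0.0]])

edges = hysteresis(magnitude, low=3.0, high=8.0)
print(edges)
```

The weak pixels connected to the strong one are kept, while the isolated weak pixel on the right is discarded.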
CANNY EDGE DETECTOR

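import cv2
import numpy as np
from google.colab.patches import cv2_imshow

image = cv2.imread('/content/[Link]', cv2.IMREAD_GRAYSCALE)

# Smooth first: edge detection is sensitive to noise
blurred_image = cv2.GaussianBlur(image, (5, 5), 0)

# 30 and 150 are the low and high thresholds used for hysteresis
canny_edges = cv2.Canny(blurred_image, 30, 150)
cv2_imshow(image)
cv2_imshow(canny_edges)
cv2.waitKey(0)
cv2.destroyAllWindows()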

LAPLACIAN OPERATOR

► Edge detection algorithms using the Sobel operator work on the first derivative of an
image.
► After the image is smoothed, the derivatives Ix and Iy w.r.t. x and y are
calculated by convolving I with Sobel kernels Kx and Ky, respectively.
► The first derivative of an image is sensitive to noise
► The Laplacian operator makes use of the second derivative of the image
LAPLACIAN OF GAUSSIAN (LOG)

► We can approximate the second derivatives by using the following convolutional
kernels
► An edge occurs where the graph of the second derivative crosses zero
► Calculating just the Laplacian will result in a lot of noise, so we need to convolve a
Gaussian smoothing filter with the Laplacian filter to reduce noise prior to
computing the second derivatives.
► The LoG kernel is convolved with a grayscale input image to detect the zero
crossings of the second derivative. We set a threshold for these zero crossings
and retain only those zero crossings that exceed the threshold.
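The zero-crossing idea can be seen on a small example. The sketch below assumes the common 4-neighbour Laplacian kernel and an input that is already smooth, so the Gaussian step is left out; the helper names, the ramp image and the threshold are illustrative assumptions.

```python
import numpy as np

# Common 4-neighbour approximation of the Laplacian (an assumption; the slide's kernels are not shown)
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float64)

def filter_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    k = kernel.shape[0]
    out = np.zeros((image.shape[0] - k + 1, image.shape[1] - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def zero_crossings_along_rows(second_derivative, threshold):
    """Mark places where horizontally adjacent values change sign by more than the threshold."""
    left, right = second_derivative[:, :-1], second_derivative[:, 1:]
    sign_change = (left * right) < 0
    strong = np.abs(left - right) > threshold
    return sign_change & strong

# Hypothetical smooth edge: intensity ramps from 0 to 4 across each row
row = np.array([0, 0, 0, 1, 3, 4, 4, 4], dtype=np.float64)
image = np.tile(row, (5, 1))

response = filter_valid(image, LAPLACIAN)
crossings = zero_crossings_along_rows(response, threshold=1.0)
print(response[0])   # positive on one side of the edge, negative on the other
print(crossings[0])  # True where the second derivative crosses zero
```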
Types of Laplacian operator

► In the positive Laplacian we have a standard mask in which the center element
is negative, the corner elements are zero, and the remaining elements are 1.
► The positive Laplacian operator is used to extract outward edges in an image.
► In the negative Laplacian operator we also have a standard mask, in which the center element
is positive, all the corner elements are zero, and the rest of the
elements in the mask are -1.
► The negative Laplacian operator is used to extract inward edges in an image.
► The Laplacian operator achieves a sharpening effect by enhancing the grayscale contrast
of the image. As a second-order differential operator, it enhances areas with sudden
grayscale changes in the image and weakens areas with slow grayscale changes. But the
processed image loses the direction information of the edges and the noise is enhanced.
CAMERA GEOMETRY

► How does a camera map perspective projection points on the image plane?
► A camera projects a 3D scene onto a 2D image plane. This transformation can be
represented using a projection matrix P, which maps a 3D point to a 2D image point
► Determining external and internal parameters of a camera is called camera calibration
► Estimating linear models is easier than non-linear camera models.
► Used to develop camera calibration
► Determine projection matrix
LINEAR CAMERA MODEL: 3D to 2D
INTRINSIC MATRIX

► Given mx as the number of pixels per mm in the X direction and my as the number of pixels per
mm in the Y direction
► u = mx*xi + ox
► v = my*yi + oy
► (ox, oy) is the centre of the image (the principal point)
► fx, fy, ox, oy are known as the intrinsic parameters of a camera, where fx = mx*f and fy = my*f are the focal lengths in pixels
► Also called the camera’s internal geometry
► Corresponding intrinsic matrix:

Calibration
matrix
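With these four parameters the intrinsic (calibration) matrix is commonly written as K = [[fx, 0, ox], [0, fy, oy], [0, 0, 1]]. The sketch below assumes this standard form and a point already expressed in camera coordinates; the function names and all numeric values are hypothetical.

```python
import numpy as np

def intrinsic_matrix(fx, fy, ox, oy):
    """Standard 3x3 intrinsic matrix built from focal lengths (pixels) and principal point."""
    return np.array([[fx, 0.0, ox],
                     [0.0, fy, oy],
                     [0.0, 0.0, 1.0]])

def project(K, point_camera):
    """Project a 3D point in camera coordinates to pixel coordinates (u, v)."""
    u, v, w = K @ np.asarray(point_camera, dtype=np.float64)
    return u / w, v / w

# Hypothetical camera: 800 px focal length, 640x480 image with the principal point at the centre
K = intrinsic_matrix(fx=800.0, fy=800.0, ox=320.0, oy=240.0)

u, v = project(K, (0.5, -0.25, 2.0))          # a point 2 units in front of the camera
u_axis, v_axis = project(K, (0.0, 0.0, 5.0))  # a point on the optical axis
print(u, v)
print(u_axis, v_axis)  # lands on (ox, oy)
```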
EXTRINSIC MATRIX

► Position C and orientation R of the camera are its extrinsic parameters
► R is a rotation matrix that describes the camera's orientation
EXTRACTING THE PARAMETERS
STEREO VISION

► Process of comparing 2 or more images of the same scene


► Recovers the 3D structure of a scene from 2D images
► Also called binocular vision
► Used in self driving cars, robotics etc.
Stereo Vision Acquisition
STEREO VISION

► Images captured from one integrated stereo vision camera or two cameras at a time
► Also called binocular vision
► Camera calibration accurate in integrated camera
► No movement in case of multiple cameras
► Orientation of one camera with respect to another
► Used for depth extraction
DEPTH ESTIMATION

► Take images from two cameras


► Calculate the disparities between images
► Obtain the disparity maps and depth maps
► This gives the exact depth, or distance, of that object
► Disparity is calculated as the difference between xl & xr
Camera systems at (0,0,0) and (b,0,0)
DEPTH ESTIMATION

► Disparity is inversely proportional to depth of a point


► If a point is close to the two-camera system, disparity will be large
► Disparity shrinks as we move away
► The closer a point/more the disparity, the brighter it is in disparity map
► Stereo Matching refers to finding disparities
► There will be no disparity in vertical direction
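The inverse relation between disparity and depth is usually written as z = f * b / (xl - xr) for two parallel cameras a baseline b apart. The sketch below assumes that standard formula; the function names, focal length, baseline and pixel coordinates are hypothetical.

```python
def disparity(x_left, x_right):
    """Horizontal shift of the same scene point between the left and right image."""
    return x_left - x_right

def depth_from_disparity(focal_length_px, baseline, d):
    """Depth of a point for parallel cameras: z = f * b / d."""
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline / d

f_px = 700.0   # hypothetical focal length in pixels
b = 0.1        # hypothetical baseline in metres

near = depth_from_disparity(f_px, b, disparity(435.0, 400.0))  # large disparity
far = depth_from_disparity(f_px, b, disparity(407.0, 400.0))   # small disparity
print(near, far)  # the point with the larger disparity is closer
```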
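import cv2 as cv
from matplotlib import pyplot as plt

# StereoBM expects 8-bit single-channel (grayscale) images;
# the right and left image file names are elided in the source
imgR = cv.imread('/content/[Link]', cv.IMREAD_GRAYSCALE)
imgL = cv.imread('/content/[Link]', cv.IMREAD_GRAYSCALE)

stereo = cv.StereoBM_create(numDisparities=16, blockSize=15)

disparity = stereo.compute(imgL, imgR)
plt.imshow(disparity, 'gray')
plt.show()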
BASIC MORPHOLOGICAL OPERATIONS
ON IMAGES
MORPHOLOGICAL OPERATIONS

• Performed on binarized images


• Each pixel is adjusted based on value in the neighborhood
• Processing done based on kernel which defines the operation
EROSIONS

• Erodes away the boundaries of the foreground object
• Used to diminish the features of an image.
• A kernel (a matrix of odd size: 3, 5, 7) is convolved with the image.
• A pixel in the original image (either 1 or 0) will be considered 1 only if all the pixels
under the kernel are 1; otherwise it is eroded (set to 0).
• It decreases the white region in the image.
DILATION & EROSION

• Dilation expands the boundaries of an object in an image. This is done by convolving the
image with a structuring element, which determines the size and shape of the dilation. The
output of the dilation operation is a new image where the pixels in the original image are
expanded or dilated.
• Erosion is a morphological operation that shrinks the boundaries of an object in an image.
This is done by convolving the image with a structuring element, which determines the
size and shape of the erosion. The output of the erosion operation is a new image where
the pixels in the original image are eroded or shrunk.
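Both operations can be written as a minimum or maximum over the neighbourhood of each pixel. The sketch below is a NumPy illustration with a square k x k structuring element and zero padding at the border, which are assumptions of the sketch and may differ from OpenCV's border handling.

```python
import numpy as np

def _neighbourhoods(binary, k):
    """Stack the k*k shifted copies of the image so each pixel sees its neighbourhood."""
    pad = k // 2
    padded = np.pad(binary, pad, mode='constant', constant_values=0)
    rows, cols = binary.shape
    return np.stack([padded[i:i + rows, j:j + cols] for i in range(k) for j in range(k)])

def erode(binary, k=3):
    """A pixel stays 1 only if every pixel under the kernel is 1."""
    return _neighbourhoods(binary, k).min(axis=0)

def dilate(binary, k=3):
    """A pixel becomes 1 if any pixel under the kernel is 1."""
    return _neighbourhoods(binary, k).max(axis=0)

# Hypothetical 5x5 binary image with a 3x3 white square in the middle
img = np.zeros((5, 5), dtype=np.uint8)
img[1:4, 1:4] = 1

eroded = erode(img)
dilated = dilate(img)
print(eroded)   # the square shrinks to its centre pixel
print(dilated)  # the square grows by one pixel on every side
```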
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

img = cv2.imread('/content/[Link]', 0)
# 5x5 structuring element of ones
kernel = np.ones((5, 5), np.uint8)
img_erosion = cv2.erode(img, kernel, iterations=1)
img_dilation = cv2.dilate(img, kernel, iterations=1)
cv2_imshow(img)
cv2_imshow(img_erosion)
cv2_imshow(img_dilation)
cv2.waitKey(0)
OPENING AND CLOSING

• Opening is just another name of erosion followed by dilation.


• It is useful in removing noise
• Closing is reverse of Opening, Dilation followed by Erosion.
• It is useful in closing small holes inside the foreground objects, or small black points on
the object.
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

# Reading the input image
img = cv2.imread('/content/[Link]', 0)

# Taking a matrix of size 5 as the kernel
kernel = np.ones((5, 5), np.uint8)

# Opening: erosion followed by dilation
opening = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)

cv2_imshow(img)
cv2_imshow(opening)
cv2.waitKey(0)
GRADIENT

► It is the difference between dilation and erosion of an image.


► Helps to find outlines of images
TOP HAT

► Highlights minor details of images (small bright details on a darker background)
► Input image - Opening
BLACK HAT

► Highlights dark objects on a bright background
► Closing - Input image
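The three derived operations can be built from erosion and dilation: gradient = dilation - erosion, top hat = image - opening, black hat = closing - image. The sketch below is a NumPy illustration with a 3x3 square kernel and zero padding; the helper names and the two test images are assumptions, with objects kept away from the border so that the padding does not matter.

```python
import numpy as np

def _stack(img, k):
    """Stack the k*k shifted copies of the zero-padded image."""
    pad = k // 2
    padded = np.pad(img, pad, mode='constant', constant_values=0)
    r, c = img.shape
    return np.stack([padded[i:i + r, j:j + c] for i in range(k) for j in range(k)])

def erode(img, k=3):
    return _stack(img, k).min(axis=0)

def dilate(img, k=3):
    return _stack(img, k).max(axis=0)

def opening(img, k=3):
    return dilate(erode(img, k), k)

def closing(img, k=3):
    return erode(dilate(img, k), k)

# Hypothetical image A: a 4x4 bright square plus one isolated bright speck
A = np.zeros((9, 9), dtype=np.int64)
A[2:6, 2:6] = 1
A[7, 7] = 1

# Hypothetical image B: a 5x5 bright square with one dark hole in the middle
B = np.zeros((9, 9), dtype=np.int64)
B[2:7, 2:7] = 1
B[4, 4] = 0

gradient = dilate(A) - erode(A)   # outline of the objects
top_hat = A - opening(A)          # small bright detail removed by opening
black_hat = closing(B) - B        # small dark detail filled by closing

print(gradient)
print(top_hat)
print(black_hat)
```

The top hat isolates the bright speck at (7, 7), and the black hat isolates the dark hole at (4, 4).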
LINE DETECTION: HOUGH TRANSFORM

► Problem of extraneous data in edge detection problems


► Given the edge points, we can detect lines from the points
► Detects line y=mx+c
► Image space depicts the point and parameter space depicts number of lines passing
through that point
LINE DETECTION: HOUGH TRANSFORM

► Create an accumulator array A
► Set all values A(m,c) = 0
► For each edge point (xi,yi)
Convert the point from the (x,y) plane to a line in the (m,c) plane
c = -m*xi + yi
For every value of m, compute c and increment the corresponding cell:
A(m,c) = A(m,c) + 1 for all lines passing through this point
At the points of intersection, we will get values such as 2, 3 and so on
Disadvantages

► If accumulator array is small, lines might get missed


► If array is large, it will waste memory
► Solution : Use line equation x sin Ɵ – y cos Ɵ + r =0
► Ɵ lies between 0 and 180
► R is finite (distance of line from origin)
► Better parameterization
Using line parameters

► Each point maps to a sinusoidal wave
► Better parameterization
► Mapping as shown in figure
► If the accumulator cells are too big, different lines may be merged
► If the cells are too small, lines might be missed
► Extract peaks in the accumulator array
Detecting a circle

► Works on a given set of edge points on a circle
► (x-a)^2 + (y-b)^2 = r^2
► A point (xi,yi) on the circle will map to a circle
in the Hough space
► All the circles intersect at a point (a,b)
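import cv2
import numpy as np

img = cv2.imread('/content/[Link]')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150, apertureSize=3)
# Each detected line is returned as (r, theta); 200 is the vote threshold
lines = cv2.HoughLines(edges, 1, np.pi/180, 200)
for r_theta in lines:
    arr = np.array(r_theta[0], dtype=np.float64)
    r, theta = arr
    a = np.cos(theta)
    b = np.sin(theta)
    # (x0, y0) is the point on the line closest to the origin
    x0 = a*r
    y0 = b*r
    # Extend 1000 pixels along the line in both directions
    x1 = int(x0 + 1000*(-b))
    # The source stops here; the lines below are an assumed, typical continuation
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))
    cv2.line(img, (x1, y1), (x2, y2), (0, 0, 255), 2)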
How to solve?

► Consider an image of 100x100
► Select the first point (x,y) and vary values of theta from 0 to 180
► Compute the value of r for each theta
► For every (r, theta) pair, increment the accumulator array by 1
► Try the next point and repeat the procedure
► For instance, the blue point will be voted up
► The accumulator cell with the maximum votes indicates a line
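The voting procedure can be sketched with NumPy. The sketch assumes the parameterization r = x cos(theta) + y sin(theta), the one used by OpenCV's HoughLines, with theta in 1 degree steps; the function name and the edge points are made up for the example.

```python
import numpy as np

def hough_lines(points, size, theta_step_deg=1):
    """Vote in (r, theta) space for every edge point."""
    thetas = np.deg2rad(np.arange(0, 180, theta_step_deg))
    r_max = int(np.ceil(np.hypot(size, size)))            # largest possible |r|
    accumulator = np.zeros((2 * r_max + 1, len(thetas)), dtype=np.int64)
    columns = np.arange(len(thetas))
    for x, y in points:
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[r + r_max, columns] += 1               # one vote per theta
    return accumulator, thetas, r_max

# Hypothetical edge points on the horizontal line y = 20, plus one outlier
points = [(x, 20) for x in range(0, 100, 10)] + [(50, 70)]

accumulator, thetas, r_max = hough_lines(points, size=100)
r_index, theta_index = np.unravel_index(np.argmax(accumulator), accumulator.shape)
best_r = r_index - r_max
best_theta_deg = np.rad2deg(thetas[theta_index])
print(best_r, best_theta_deg, accumulator.max())
```

The cell with the most votes corresponds to r = 20 and theta = 90 degrees, which is the line y = 20; the outlier contributes only single votes elsewhere.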
Corner detection: Harris & Hessian

► A corner is a point whose local neighborhood is characterized by large intensity
variation in all directions.
► Corners are important features in computer vision because they are points stable over
changes of viewpoint and illumination
► Large variation in gradients at all points of interest
► Intersection of two lines
► Matching of corners is easier than edges
Harris Corner Detector

► Recognize a point by looking through a small window
► At a corner, shifting the window in any direction will give a large change in intensity
► No change in intensity in a flat region, and no change along the edge direction in an edge region
► Change in intensity in all directions at the corners
► The corner response R is calculated from the eigenvalues λ1 and λ2 of the second-moment matrix M
► When |R| is small, the region is flat.
► When R<0, the region is an edge.
► When R is large (λ1 and λ2 are large and λ1∼λ2), the region is a corner.
Algorithm

► Apply a Gaussian filter to smooth out any noise


► Apply Sobel operator to find the x and y gradient values for every pixel in the
grayscale image
► For each pixel p in the grayscale image, consider a 3×3 window around it and
compute the corner strength function. Call this its Harris value.
► Find all pixels that exceed a certain threshold and are the local maxima within a
certain window (to prevent redundant dupes of features)
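The corner strength is commonly computed as R = det(M) - k * trace(M)^2, where M sums the gradient products over the window. The sketch below assumes that standard formula and simplifies the algorithm: it uses central differences instead of Sobel, skips the Gaussian smoothing and the local-maxima step, and all names and the test image are illustrative.

```python
import numpy as np

def harris_response(image, k=0.07, window=3):
    """Harris corner strength R = det(M) - k * trace(M)^2 for every pixel."""
    iy, ix = np.gradient(image.astype(np.float64))   # derivatives along rows (y) and columns (x)
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy

    def window_sum(a):
        # Sum each value over a window x window neighbourhood (zero padded)
        pad = window // 2
        p = np.pad(a, pad, mode='constant')
        rows, cols = a.shape
        return sum(p[i:i + rows, j:j + cols] for i in range(window) for j in range(window))

    sxx, syy, sxy = window_sum(ixx), window_sum(iyy), window_sum(ixy)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2

# Hypothetical 10x10 image: a bright square in the bottom-right quadrant
image = np.zeros((10, 10))
image[5:, 5:] = 1.0

R = harris_response(image)
corner = np.unravel_index(np.argmax(R), R.shape)
print(corner)    # location of the strongest corner response
print(R[7, 5])   # on the vertical edge: negative
print(R[1, 1])   # flat region: zero
```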
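import cv2
import numpy as np
from google.colab.patches import cv2_imshow

image = cv2.imread('/content/[Link]')
operatedImage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# cornerHarris expects a float32 image
operatedImage = np.float32(operatedImage)

# blockSize=2, Sobel aperture=5, Harris parameter k=0.07
dest = cv2.cornerHarris(operatedImage, 2, 5, 0.07)

# Dilate the response so the marked corners are easier to see
dest = cv2.dilate(dest, None)

# Mark strong corners in red (BGR order)
image[dest > 0.01 * dest.max()] = [0, 0, 255]

cv2_imshow(image)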
Advantages of Harris Detector

► It finds pixel-intensity displacement of (u, v) such that the function E gets
maximized for pixels in the window
► Estimates the second moment matrix to get a clue about whether a corner lies
inside of the window or not, by looking at the matrix.
► If one eigenvalue is significantly higher => derivative with respect to one direction
is much stronger than the other => pixel lies on an edge
► If both eigenvalues are small => pixel intensities do not change in any direction =>
pixel lies on a flat region
► If both eigenvalues are large => pixel intensities change largely in
both x and y direction => pixel lies on a corner
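The eigenvalue reasoning above can be checked numerically with a small sketch. It uses the standard Harris response R = det(M) - k (trace(M))^2; the value k = 0.07 is borrowed from the `cornerHarris` call shown earlier, while the threshold, the three example matrices, and the helper names are assumptions made for illustration.

```python
import math

def harris_response(m, k=0.07):
    """Return (lambda1, lambda2, R) for a symmetric 2x2 second-moment matrix."""
    (a, b), (_, c) = m
    det = a * c - b * b
    trace = a + c
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_gap = math.sqrt(((a - c) / 2) ** 2 + b * b)
    lam1, lam2 = trace / 2 + half_gap, trace / 2 - half_gap
    return lam1, lam2, det - k * trace ** 2

def classify(m, threshold=1.0):
    """Label a window from the sign and size of R (threshold is hypothetical)."""
    _, _, r = harris_response(m)
    if r > threshold:
        return "corner"
    if r < -threshold:
        return "edge"
    return "flat"

flat = [[0.1, 0.0], [0.0, 0.1]]        # both eigenvalues small
edge = [[100.0, 0.0], [0.0, 0.1]]      # one eigenvalue much larger
corner = [[100.0, 0.0], [0.0, 90.0]]   # both eigenvalues large and similar

for name, m in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, classify(m))
```

Because det(M) = λ1 λ2 and trace(M) = λ1 + λ2, R can be evaluated without computing the eigenvalues explicitly; they are returned here only so the three cases can be inspected.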
Hessian Detector

► While the basic ideas of detecting corners remain the same as the Harris detector,
the Hessian detector makes use of the Hessian matrix and determinant, instead of
second-moment matrix M and corner response function R, respectively.
► Entries in Hessian matrix are second derivatives.
Disadvantages

► Once a corner gets magnified and becomes bigger than the size of the window by
zooming, the Harris and Hessian can no longer detect the corner.
► It is because what the detectors perceive through the window is not a corner
anymore but an edge due to the scale change.
Feature Extraction

► This is an area of image processing that uses algorithms to detect and isolate various
desired portions of a digitized image.
► A feature is a significant piece of information extracted from an image which provides
more detailed understanding of the image.
► Examples: detecting faces in an image filled with people and other objects; detecting
facial features such as eyes, nose and mouth; detecting edges, so that a feature can be
extracted and compared with another
Feature Detection

► Feature detection is to identify the presence of a certain type of feature or object in an
image.
► Feature detection is usually achieved by studying the statistical variations of certain regions
and their backgrounds to locate unusual activities.
► Once an interesting feature has been detected, the representation of this feature will be
used to compare with all possible features known to the processor.
Need For Feature Descriptors

► Need to recognize objects with unique and descriptive features in the process of object
recognition
► Detection of different feature families:
► Local pixels (SIFT, SURF..)
► Global pixel features (Histogram, Texture, Color)
► Shape of pixel regions(Area, Perimeter)
► Basis sets (FFT, Haar Wavelet)
Characteristics of features

► Salient
► Robust to clutter
► Repeatable
► Fewer and efficient
Local Feature Descriptors

► Detectors and descriptors can be used combined or independently for local feature
descriptions
► Searching strategies can be pixel-wise or tiled
► Aims to find pieces of objects
► SIFT, SURF, HOG
Global Feature Descriptors

► Texture Histograms
► Spatial Dependency matrix
► Regional Descriptors
Shape Features

► Area
► Perimeter
► Centroid
Basis set descriptors

► HAAR Wavelets
► Fourier transforms
Basic CV Pipeline

► Sensor processing
► Image Processing
► Global Metrics
► Local features
► Training
► Augmentation & Control
► Performance
SIFT : Feature Detector and Descriptor

► Scale Invariant Feature Transform


► 2D object detection
► Image Alignment
► Detects patches with local appearance
► Detects key interest points
► Handles multiple scales (position, magnification)
► SIFT detector and descriptor
SIFT: Scale Invariant Feature Transform

► Transform image data into scale invariant keypoint coordinates


► Feature descriptors describe the local characteristics around the keypoint
► Helps in recognition and motion tracking, irrespective of transformations such as shearing and scaling
► Step 1: Scale space Detection
Detect interesting points (for example, using the Harris corner detector, DoG, etc.)
► Step 2: Keypoint localization
Determine location and scale of each candidate location
► Step 3: Orientation estimation
Use local image gradients on localized keypoints
► Step 4: Keypoint descriptor
Extract local image gradients and form a representation invariant to shape/distortion
Scale Space Detection

► The scale space of an image is a function produced from the convolution of a Gaussian
kernel(Blurring) at different scales with the input image
► Scale-space is separated into octaves and the number of octaves and scale depends on the
size of the original image.
► We generate several octaves of the original image.
► Each octave’s image size is half the previous one

Blurred image: L(x, y, σ) = G(x, y, σ) * I(x, y), where * denotes convolution


► Use those blurred images to generate another set of images, the Difference of Gaussians
(DoG).
► These DoG images are great for finding out interesting keypoints in the image.
► The difference of Gaussian is obtained as the difference of Gaussian blurring of an image
with two different σ, let it be σ and kσ.

D(x, y, σ) = L(x, y, kσ) - L(x, y, σ)


Finding keypoints

► Up till now, we have generated a scale space and used the scale space to calculate the
Difference of Gaussians.
► One pixel in an image is compared with its 8 neighbors as well as 9 pixels in the next scale
and 9 pixels in previous scales.
► Select the pixel if it is larger/smaller than all the 26 pixels around it in spatial or scaled
neighborhood
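The 26-neighbour test can be sketched as follows. This is a minimal illustration that assumes the three inputs are 3x3 patches cut from adjacent DoG scales at the same position; the function name and the sample values are hypothetical, and a strict comparison is used so that ties are not selected.

```python
def is_scale_space_extremum(below, current, above):
    """True if the centre of `current` is larger or smaller than all 26 neighbours."""
    centre = current[1][1]
    neighbours = [v for row in below for v in row]            # 9 in the previous scale
    neighbours += [v for row in above for v in row]           # 9 in the next scale
    neighbours += [current[i][j] for i in range(3) for j in range(3)
                   if (i, j) != (1, 1)]                       # 8 in the same scale
    return centre > max(neighbours) or centre < min(neighbours)

low = [[1, 2, 1], [2, 3, 2], [1, 2, 1]]
peak = [[2, 3, 2], [3, 9, 3], [2, 3, 2]]
plain = [[2, 3, 2], [3, 3, 3], [2, 3, 2]]

print(is_scale_space_extremum(low, peak, low))    # True: 9 exceeds every neighbour
print(is_scale_space_extremum(low, plain, low))   # False: centre ties with neighbours
```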
Keypoint localization

► The previous step produces a lot of keypoints. Some of them lie
along an edge, or they don’t have enough contrast.
► We get rid of them. The approach is similar to the one used in the Harris Corner Detector
for removing edge features.
► Reject low contrast points
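► Reject edges (use a response function built from a 2x2 matrix, as in the Harris detector)
► SIFT builds this matrix from second derivatives of the DoG image (a Hessian) and rejects points whose ratio of principal curvatures is too high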
Orientation estimation


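► Compute the gradient magnitude and orientation of the pixels around each keypoint and build a
histogram of orientations
► Each pixel increases the histogram entry for its orientation by an amount based on its gradient magnitude
► Select the peak as the direction of keypoint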
Describe the Keypoints

► Take a 16x16 window around the detected keypoint and calculate the local gradients
► Divide the window into a 4x4 grid of cells, each 4x4 pixels
► Construct a histogram of orientations in each cell
► This gives 4 histograms for an 8x8 window and 16 histograms for a 16x16 window
► Each histogram has 8 orientation bins
► 16 histograms x 8 values = 128 values
► The 128 non-negative values obtained give us the keypoint descriptor
(128 dimensional SIFT vector)
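The arithmetic behind the 128 values can be made concrete with a simplified sketch. It assumes the 16x16 gradient magnitudes and orientations (in degrees) are already available, and it leaves out the Gaussian weighting, interpolation between bins, and rotation to the keypoint orientation that a full SIFT implementation applies. The function name is hypothetical.

```python
import math

def sift_like_descriptor(magnitude, orientation_deg):
    """Concatenate 8-bin orientation histograms from a 4x4 grid of 4x4-pixel cells."""
    descriptor = []
    for cell_row in range(4):
        for cell_col in range(4):
            hist = [0.0] * 8                       # 8 bins of 45 degrees each
            for dy in range(4):
                for dx in range(4):
                    y, x = cell_row * 4 + dy, cell_col * 4 + dx
                    bin_index = int((orientation_deg[y][x] % 360) // 45)
                    hist[bin_index] += magnitude[y][x]
            descriptor.extend(hist)
    # Normalise to unit length so the vector is less sensitive to contrast changes.
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    return [v / norm for v in descriptor]

# Toy input: unit gradients, all pointing at 90 degrees.
magnitude = [[1.0] * 16 for _ in range(16)]
orientation = [[90.0] * 16 for _ in range(16)]
desc = sift_like_descriptor(magnitude, orientation)
print(len(desc))   # 128
```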
import numpy as np
import cv2 as cv
img = cv.imread('/content/[Link]')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
sift = cv.SIFT_create()
kp = sift.detect(gray, None)           # detect keypoints on the grayscale image
img = cv.drawKeypoints(gray, kp, img)  # draw the keypoints
cv.imwrite('sift_keypoints.jpg', img)
Advantages

► Robust Feature Detector


► Handles changes in rotation
► Handles changes in viewpoint
► Handles changes in illumination
► Fast and efficient
Applications

► Used in image stitching


► Find keypoints in two images and understand the transformation of points between two
scenarios.
► Stitch them together to form a panorama
Histogram of Oriented Gradients

► To find the HOG of a point, extract a block (square window) of some size
► Divide the block into smaller grids
► For each cell, find an orientation histogram
► Concatenate them together
► For instance, if we obtain 40 values, then a 40 dimensional vector will be obtained
SURF- Speeded Up Robust Features

► High dimensionality is a drawback of SIFT feature detector


► Reduction in dimensionality reduces accuracy
► Based on Hessian matrix
► Approximates the Laplacian of Gaussian with box filters, where SIFT uses DoG
► Describes distribution of Haar wavelet responses within neighborhood
► Uses only 64 dimensions
► Indexing step based on sign of Laplacian
The Concept of SURF

► Makes use of integral images


► Each entry stores the sum of all pixels above and to the left of it, including the pixel itself
► Helps to detect rectangular features quickly
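A minimal sketch of an integral image and a rectangle sum computed from four lookups. The extra row and column of zeros is a convenience chosen here, and the function names and the 3x3 sample image are made up for the example.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows 0..y-1 and columns 0..x-1."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom][left..right] (inclusive) as D - C - B + A."""
    a = ii[top][left]
    b = ii[top][right + 1]
    c = ii[bottom + 1][left]
    d = ii[bottom + 1][right + 1]
    return d - c - b + a

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))   # 28, i.e. 5 + 6 + 8 + 9
```

The cost of `box_sum` is four lookups regardless of the rectangle size, which is why box filters of any size can be evaluated cheaply.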
Keypoint detection

► Fast Hessian detector


► In order to calculate the determinant of the Hessian matrix,
first we need to apply convolution with Gaussian kernel, then
second-order derivative.
► SURF pushes the approximation(both convolution and
second-order derivative) even further with box filters.
► These approximate second-order Gaussian derivatives and
can be evaluated at a very low computational cost using
integral images and independently of size, and this is part of
the reason why SURF is fast.
Haar Wavelets

► Haar filters to get keypoint orientations


► Handles blur and rotation variations
► Reported to be about 3 times faster than SIFT
Scale space analysis vs box filters

► Scale spaces implemented as image pyramids


► Repetitively smoothed and subsampled
► Box filter can be directly applied on any image
Orientation assignment

► Grid of equispaced points


► Gradient of x and y directions calculated for 100 keypoints
► Haar wavelet filter gives summation of pixels in white region minus summation of pixels
in the black region
► Response: D-C-B+A
► Use Gaussian of sigma 2.5 to give weightage to central pixels
SIFT & SURF Pipeline
GLOH

► SIFT suffers from some limitations, such as being computationally expensive and
not being able to handle images with repetitive patterns or cluttered backgrounds
effectively.
► Gradient Location Orientation Histogram (GLOH) is an extension of SIFT
► Takes into account the gradient orientation of keypoints, which provides rich
information about the local structure and texture of an image
► Includes the location information of keypoints, which allows it to capture the
spatial distribution of features in an image.
► This is particularly useful for tasks where the relative positions of objects or
regions in an image are important, such as object detection and tracking.
Advantages

► Robust to Scale, Rotation, and Illumination Changes


► Efficient and Scalable
► Discriminative and Informative
► Can be used on large datasets
► Applications: Image recognition and object tracking
PCA

► Principal Component Analysis


► Used for dimensionality reduction
► Four main parts: feature covariance, eigendecomposition, principal component
transformation, and choosing components in terms of explained variance.
► Used to solve the ‘Curse of Dimensionality’ problem: as the number of features or
dimensions grows, the amount of data we need to generalize accurately grows
exponentially
PCA

► Identifies a set of uncorrelated variables called principal components


► Find maximum amount of variance from lowest number of principal components
► Covariance is calculated if we want to use variables of higher variance
► Correlation can be used if all variables need to be weighed equally
STEPS
OF PCA
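The four parts of PCA named earlier can be sketched for two-dimensional data, where the eigendecomposition of the 2x2 covariance matrix has a closed form. This is an illustrative sketch only: the function name and sample points are assumptions, and a numerical library would normally be used for higher dimensions.

```python
import math

def pca_2d(points):
    """Return (first principal direction, projected scores, explained variance ratio)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # 1. Feature covariance (sample covariance, dividing by n - 1).
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # 2. Eigendecomposition of [[sxx, sxy], [sxy, syy]].
    half_gap = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1 = (sxx + syy) / 2 + half_gap
    lam2 = (sxx + syy) / 2 - half_gap
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    length = math.hypot(vx, vy)
    pc1 = (vx / length, vy / length)
    # 3. Principal component transformation: project onto the first component.
    scores = [x * pc1[0] + y * pc1[1] for x, y in centred]
    # 4. Explained variance of the first component (assumes the data is not constant).
    explained = lam1 / (lam1 + lam2)
    return pc1, scores, explained

# Points lying exactly on y = 2x: one component explains all the variance.
pc1, scores, explained = pca_2d([(1, 2), (2, 4), (3, 6), (4, 8)])
print(pc1, explained)
```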
Local Binary Patterns

► LBP is based on appearance features.


► It is a way to describe the local structure of an
image in a way that is invariant to changes in
illumination.
► LBP works by comparing the intensity of a
central pixel in a small neighborhood with the
intensity of its surrounding pixels. Each pixel in
the neighborhood is assigned a binary value
based on whether its intensity is greater than or
less than the intensity of the central pixel
► These binary values are then concatenated into
a binary number, which represents the texture of
that neighborhood.
STEPS

1. Choose a pixel in the image and select its neighboring pixels in a circular or rectangular region around it.
2. Take the threshold (intensity of the selected pixel, here it is 50).
3. Go through every neighboring pixel and check whether its intensity is greater than or less than the
threshold.
Assign 1 to the neighboring pixel, if the intensity of the neighboring pixel is greater than or equal to the threshold.
Assign 0 to the neighboring pixel, if the intensity of the neighboring pixel is less than the threshold.
4. Combine the binary values for all neighboring pixels to obtain a binary code for the central pixel
(Anti-clockwise, starting from the top left corner), and convert it to a decimal value.
5. Repeat steps 1–4 for each pixel in the image to obtain a binary code for each pixel.
6. Now use these LBP values to construct the histogram. By constructing a histogram of the LBP patterns,
we can capture the frequency of occurrence of different texture patterns in the image. This histogram can
then be used as a feature vector for texture classification tasks, where the goal is to automatically classify
images based on their texture properties.
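The steps above can be sketched for a 3x3 neighbourhood. The patch values are made up (only the centre value 50 comes from the slide), neighbours equal to the centre are assumed to map to 1, and the function names are hypothetical.

```python
def lbp_code(patch):
    """Return (bits, decimal code) for the centre pixel of a 3x3 patch."""
    centre = patch[1][1]
    # Anti-clockwise, starting from the top-left corner (row, column).
    order = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
    bits = [1 if patch[r][c] >= centre else 0 for r, c in order]
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return bits, code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels of a grayscale image."""
    hist = [0] * 256
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            patch = [row[x - 1:x + 2] for row in img[y - 1:y + 2]]
            hist[lbp_code(patch)[1]] += 1
    return hist

patch = [[60, 40, 70],
         [30, 50, 80],
         [20, 90, 55]]
bits, code = lbp_code(patch)
print(bits, code)   # [1, 0, 0, 1, 1, 1, 1, 0] 158
```

The histogram returned by `lbp_histogram` is the feature vector described in step 6.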
Advantages

► LBP is robust to illumination variations, which means that it can effectively capture
texture information in images that have different lighting conditions. This makes it
particularly useful for applications such as facial recognition and object detection,
where lighting conditions can vary significantly.
► LBP is a computationally efficient method for texture analysis, which makes it
suitable for processing large datasets and real-time applications.
► Rotation-invariant and multi-scale variants of LBP can capture texture information
in images that have been rotated or scaled; the basic operator is not invariant to either.
► LBP has been shown to be highly discriminative for texture analysis.
Disadvantages

► LBP is sensitive to noise in the image. This can affect its ability to accurately
capture texture information. The LBP operator compares neighboring pixel intensities,
and if there is noise in the image, it can result in incorrect binary values that can
affect the resulting LBP histogram.
► LBP only captures local texture information in the immediate vicinity of each pixel,
which can limit its ability to capture more global texture information in the image.
► Even the rotation-invariant variants of LBP do not capture rotational information
in the texture patterns.
► LBP is typically applied to grayscale images, which means that it does not capture
color information in the texture patterns.
K means clustering

► K-Means clustering algorithm is an unsupervised algorithm and it is
used to segment the interest area from the background. It clusters, or
partitions the given data into K-clusters or parts based on the
K-centroids.
► Image segmentation is the
classification of an image into
different groups.
► Many kinds of research have been
done in the area of image
segmentation using clustering.
Steps

1. Choose the number of clusters K.


2. Select at random K points, the centroids(not necessarily from your dataset).
3. Assign each data point to the closest centroid → that forms K clusters.
4. Compute and place the new centroid of each cluster.
5. Reassign each data point to the new closest centroid. If any reassignment took place, go
to step 4; otherwise, the model is ready.
► Image Segmentation involves converting an image into a collection of regions of pixels that are
represented by a mask or a labeled image.
► By dividing an image into segments, you can process only the important segments of the image
instead of processing the entire image.
► A common technique is to look for abrupt discontinuities in pixel values, which typically indicate
edges that define a region.
► Another common approach is to detect similarities in the regions of an image. Some techniques
that follow this approach are region growing, clustering, and thresholding.
► A variety of other approaches to perform image segmentation have been developed over the years
using domain-specific knowledge to effectively solve segmentation problems in specific
application areas.
Optical Flow Tracking

► Optical flow is the pattern of apparent motion of image objects between two consecutive
frames caused by the movement of object or camera. It is 2D vector field where each
vector is a displacement vector showing the movement of points from first frame to
second.
► Motion of scene points in a sequence of images
► Image should be a grayscale image.
► Use a corner detector algorithm to select and discard corners
► Compare consecutive frames of videos to initial images and frames
► Use Lucas Kanade method
Object detection vs Object Tracking

► Object Tracking involves correspondence between frames to find the location of single
object or multiple objects
► Object detection involves finding objects in an image or video
► Traditional methods:
Background subtraction : Pixel differences in frames
Shadow removal
Shape based classification
Model/Feature based models for tracking
Lucas Kanade method

► For each pixel, motion flow is constant in a small neighbourhood


► The constraint is as follows:
► Ix(k,l)u + Iy(k,l)v + It(k,l) = 0 (for points (k,l) in a window)
► (Derivative along x axis, derivative along y axis and derivative with respect to time)
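Since every pixel in the window contributes one such equation in the two unknowns (u, v), the flow is usually obtained as a least-squares solution. The sketch below solves the resulting 2x2 normal equations directly; it assumes the derivatives Ix, Iy, It have already been computed for the window, and the function name and sample values are hypothetical.

```python
def lucas_kanade_window(ix, iy, it):
    """Least-squares (u, v) for one window; ix, iy, it are flat lists of derivatives."""
    sxx = sum(a * a for a in ix)
    syy = sum(b * b for b in iy)
    sxy = sum(a * b for a, b in zip(ix, iy))
    sxt = sum(a * t for a, t in zip(ix, it))
    syt = sum(b * t for b, t in zip(iy, it))
    # Normal equations: [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None   # gradients too uniform: flow is not uniquely determined
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v

# Synthetic window whose temporal derivative matches a true flow of (1.0, -0.5).
ix = [1.0, 0.0, 2.0, 1.0]
iy = [0.0, 1.0, 1.0, 3.0]
it = [-(a * 1.0 + b * -0.5) for a, b in zip(ix, iy)]
print(lucas_kanade_window(ix, iy, it))
```

The 2x2 matrix here is the same second-moment matrix used by the Harris detector, which is why corners are good points to track: they make the system well conditioned.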
Applications: Video Surveillance
Gabor Filters

► A Gabor filter is a linear filter used in image processing for edge detection, texture
classification, feature extraction and disparity estimation.
► Combines Gaussian and sinusoidal terms to get weights and directionality
► A Gabor filter bank is a set of Gabor filters with different parameters.
► Different low-level features can be extracted from the original image via the convolution
operation by varying the Gabor parameters
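A small sketch of how a Gabor kernel and a filter bank could be generated. It uses the common real-valued form (a Gaussian envelope multiplied by a cosine carrier); the parameter names `sigma`, `theta`, `wavelength`, `gamma` and `psi` and the chosen values are assumptions for illustration, not taken from the slides.

```python
import math

def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Real part of a Gabor filter as a size x size list of lists (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid runs along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# A bank of four orientations with otherwise identical parameters.
bank = [gabor_kernel(9, 2.0, math.radians(angle), 4.0) for angle in (0, 45, 90, 135)]
print(len(bank), len(bank[0]), len(bank[0][0]))   # 4 9 9
```

Convolving an image with each kernel in `bank` gives one response per orientation; varying `wavelength` and `sigma` as well would extend the bank to several scales.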
How to extract features?

► The simplest way to generate features with Gabor Filters is to use the real or the imaginary
part of the response matrix as a feature vector
► The phase of the response matrix can be taken as a feature vector since it contains relevant
information about the edges
► The amplitude of the response matrix can be taken as a feature vector since it includes the
frequency spectrum
► The squared sum of different wavelength responses with the same orientation can be taken
as a feature vector. It represents the local energy in a particular direction
Bag of Visual Words

► Bag of visual words (BOVW) is commonly used in image classification. Its concept is
adapted from information retrieval and NLP’s bag of words (BOW).
► In bag of words (BOW), we count how many times each word appears in a document, use
the frequency of each word to know the keywords of the document, and make a frequency
histogram from it.
► The general idea of bag of visual words (BOVW) is to represent an image as a set of
features. Features consist of keypoints and descriptors.
► We use the keypoints and descriptors to construct vocabularies and represent each image
as a frequency histogram of features that are in the image.
How to build Bag of Visual Words?

► We detect features, extract descriptors from each image in the dataset, and build a visual
dictionary.
► Detecting features and extracting descriptors in an image can be done by using feature
extractor algorithms (for example, SIFT, KAZE, etc).
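Once a visual dictionary exists, an image can be turned into a frequency histogram by assigning each descriptor to its nearest visual word. The sketch below assumes the vocabulary has already been built (commonly by clustering descriptors, for example with K-means) and uses toy 2-dimensional descriptors instead of real SIFT or KAZE vectors; all names and values are illustrative.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))

def bovw_histogram(descriptors, vocabulary):
    """Count how many descriptors of one image fall on each visual word."""
    hist = [0] * len(vocabulary)
    for d in descriptors:
        hist[nearest_word(d, vocabulary)] += 1
    return hist

vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]                  # toy visual words
image_descriptors = [(1.0, 1.0), (9.0, 0.5), (8.0, 1.0), (0.5, 9.0)]
print(bovw_histogram(image_descriptors, vocabulary))   # [1, 2, 1]
```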
Image Texture Analysis

► Texture describes repeating patterns in an image


► It can describe distribution of pixels in an image
► Texture can be coarse, fine, smooth etc.
Types of texture analysis

► Texture classification: Identification of texture into a set of classes


► Texture segmentation : Partitioning of regions or boundaries based on texture
► Texture synthesis : Synthesizing a given image from a sample based on structural content
Gray Level Co-occurrence matrix

► The key concept of GLCM lies in analyzing how often gray levels (intensities) occur together
within an image, specifically considering neighboring pixels.
► Pixel offsets and directions: By counting co-occurrences in a specific pixel offset (often 1 or 2)
and specific directions (e.g., horizontal, vertical, diagonal), GLCM captures the spatial
arrangement of textures.
3 parameters used in calculating GLCM:
► 1. Distance (d): The displacement between two pixels.
► 2. Angle (θ): The direction in which pixel pairs are considered, typically in 0°, 45°, 90°, and
135°.
► 3. Number of Gray Levels (G): The number of discrete intensity levels in the image.
What does it measure?

► Contrast: How intense is the difference between neighboring pixels. High contrast values
indicate large differences between neighboring pixel intensities.

► Dissimilarity: Measures the average difference in intensity between neighboring pixels.
High dissimilarity values indicate greater heterogeneity in texture.
► Correlation: Measures the linear dependency between pixel pairs. High correlation values
indicate a more predictable texture.
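A minimal sketch of building a GLCM and computing two of these measures. It assumes a non-symmetric count matrix for a single offset (d = 1, θ = 0°, i.e. the pixel immediately to the right) on a toy 4-level image; the function names and the image are made up for the example.

```python
def glcm(img, dy, dx, levels):
    """Count how often gray level i has gray level j at offset (dy, dx)."""
    matrix = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                matrix[img[y][x]][img[y2][x2]] += 1
    return matrix

def contrast(matrix):
    """Mean squared gray-level difference over all counted pixel pairs."""
    total = sum(sum(row) for row in matrix)
    return sum((i - j) ** 2 * c
               for i, row in enumerate(matrix) for j, c in enumerate(row)) / total

def dissimilarity(matrix):
    """Mean absolute gray-level difference over all counted pixel pairs."""
    total = sum(sum(row) for row in matrix)
    return sum(abs(i - j) * c
               for i, row in enumerate(matrix) for j, c in enumerate(row)) / total

img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
m = glcm(img, 0, 1, 4)
print(m)   # [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
print(contrast(m), dissimilarity(m))
```

Changing `(dy, dx)` to `(-1, 1)`, `(-1, 0)` or `(-1, -1)` gives the 45°, 90° and 135° directions.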
K means clustering

► Cluster the following eight points (with (x, y) representing locations) into three clusters:
► A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)

► Initial cluster centers are: A1(2, 10), A4(5, 8) and A7(1, 2).
► The distance function between two points a = (x1, y1) and b = (x2, y2) is defined as:
► ρ(a, b) = |x2 - x1| + |y2 - y1|
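The first iteration of this exercise can be worked through with a short sketch. It assigns each point to the nearest centre under the distance ρ defined above and then recomputes each centre as the mean of its members, following the K-means steps; the function names are hypothetical, and keeping the old centre for an empty cluster is a choice made here.

```python
def manhattan(a, b):
    """The distance rho(a, b) = |x2 - x1| + |y2 - y1|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def kmeans_step(points, centres):
    """One assignment step followed by one centroid update."""
    labels = [min(range(len(centres)), key=lambda k: manhattan(p, centres[k]))
              for p in points]
    new_centres = []
    for k, old in enumerate(centres):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centres.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
        else:
            new_centres.append(old)   # keep the centre of an empty cluster unchanged
    return labels, new_centres

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]   # A1..A8
centres = [(2, 10), (5, 8), (1, 2)]                                           # A1, A4, A7
labels, centres = kmeans_step(points, centres)
print(labels)    # [0, 2, 1, 1, 1, 1, 2, 1]
print(centres)   # [(2.0, 10.0), (6.0, 6.0), (1.5, 3.5)]
```

After the first iteration the clusters are {A1}, {A3, A4, A5, A6, A8} and {A2, A7}. Calling `kmeans_step` again with the new centres continues the procedure until the labels stop changing.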
