
YOLO: Object Detection Overview

  • Concepts in Object Detection
  • A Brief History of Object Detection
  • YOLO: You Only Look Once
  • Training and Performance
  • Criticisms of YOLO
  • Security Concerns

You Only Look Once

path to design a detector


Feng Wang

AIRD, Coretronic Co.


Apr 17, 2019
The slides and a list of references can be found at
[Link]
Outline

  • Concepts in object detection
  • A brief history of object detection
  • YOLO
      • design
      • loss function
      • training
      • weaknesses
Classification vs detection/recognition
Common tasks on images

https://medium.com/@nikasa1889/the-modern-history-of-object-recognition-infographic-aea18517c318
Bounding box proposal
Region of interest, region proposal, box proposal
Ground truth

Proposed bounding box

5 parameters
  • x, y
  • w, h
  • confidence score: how likely the box contains an object, and how accurate the box is
How good: Intersection over Union (IOU)

IOU = Overlap Area / Union Area

Examples
  • IOU = 0: the two boxes do not overlap at all
  • IOU = 1: the two boxes coincide exactly
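As a minimal sketch of the IOU ratio, assuming boxes are given by their corner coordinates (x1, y1, x2, y2); both this convention and the function name `iou` are chosen here for illustration and are not fixed by the slides:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2
    (an assumed convention for this sketch).
    """
    # Corners of the overlap rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (3, 3, 5, 5)))  # disjoint boxes: 0.0
print(iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes: 1.0
print(iou((0, 0, 2, 2), (1, 0, 3, 2)))  # half-shifted boxes: overlap 2, union 6
```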
A brief history of object detection

https://stats385.github.io
A brief history of object detection

  • Before CNNs, people used handcrafted features to locate and classify objects. (not too bad)
  • CNNs boosted the accuracy of classification.

ImageNet
A brief history of object detection

Region proposal -> classification
  • e.g. RCNN
  • accurate
  • slow

Single shot: region proposal + classification
  • e.g. YOLO, SSD
  • fast
  • less accurate
YOLO: you only look once

Look once

Results
  • x, y, w, h
  • confidence score: whether the box contains an object, and box accuracy
  • class score: how likely the object belongs to a class

"Let's use a CNN, since it's good."
"Why not regress? They are just numbers."
Let's go to CNN

YOLO v1's CNN: GoogLeNet variant, 24 layers

YOLO v2's CNN: darknet-19, 19 layers

YOLO v3's CNN: darknet-53


Let's do regression
... wait, wait: how many bounding boxes? Where are they initially?
Better solution: using grids

Results for one box
  • x, y, w, h
  • confidence score: whether the box contains an object, and box accuracy
  • class score: how likely the object belongs to a class

  • Maybe set N, the number of boxes, to a large number?
  • Maybe initially put them randomly?

Note: N is large, but much smaller than the number of region proposals in R-CNN.
Let's do regression with non-maximal suppression
Grid | Proposed box 1               | Proposed box 2               | Class scores
1    | x, y, w, h, confidence score | x, y, w, h, confidence score | class 1, class 2, ..., class 20
...  | ...                          | ...                          | ...
SxS  | x, y, w, h, confidence score | x, y, w, h, confidence score | class 1, class 2, ..., class 20

Vector size: S x S x (5 x 2 + 20)

We can use a CNN to extract features, and finally perform a regression to detect objects.
  • YOLO v1: fully connected layers
  • v2 & v3: convolutional layers

arXiv: 1506.02640, 1612.08242, 1804.02767
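A small sketch of the vector-size arithmetic, and of how one grid cell's numbers could be split back into boxes and class scores. S = 7 is the setting used in the YOLO v1 paper; the flat layout inside a cell and the helper name `split_cell` are assumptions made for illustration, since the actual memory order depends on the implementation.

```python
S, B, C = 7, 2, 20  # grid size (YOLO v1 paper), boxes per cell, classes

cell_len = 5 * B + C             # 5 numbers per box, plus the class scores
vector_size = S * S * cell_len   # length of the full regression output


def split_cell(cell):
    """Split one cell's flat list into (boxes, class_scores).

    Assumed layout: B groups of (x, y, w, h, confidence), then C class scores.
    """
    assert len(cell) == cell_len
    boxes = [tuple(cell[5 * b: 5 * b + 5]) for b in range(B)]
    class_scores = cell[5 * B:]
    return boxes, class_scores


cell = [float(i) for i in range(cell_len)]  # dummy values 0.0 .. 29.0
boxes, class_scores = split_cell(cell)
print(cell_len, vector_size)  # 30 1470
print(boxes[1])               # (5.0, 6.0, 7.0, 8.0, 9.0)
print(len(class_scores))      # 20
```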
Loss function
Problems
  • One object is partially/fully covered by several boxes.
  • Most boxes contain no object.
  • Multi-task training problem: location & class
  • Small objects need more accurate location & box size.

Solution
"Oh, no math please. Let's speak human language."

Problem 1: One object is partially/fully covered by several boxes.

  • Each true object has one proposed box "responsible" for it.
    Rule: the one with the highest overlap with the ground-truth box.
  • At inference, we use non-maximal suppression to select the best among the proposals.
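A minimal sketch of greedy non-maximal suppression, assuming corner-format boxes (x1, y1, x2, y2) and an IOU threshold of 0.5; the threshold and the function names are illustrative choices, not values taken from the slides. In practice the procedure is usually run separately for each class.

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximal suppression; returns the indices of the kept boxes."""
    # Visit boxes from the highest score to the lowest.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IOU 81/119
```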
Human language

Problem 2: Most boxes contain no object.
The confidence loss of boxes without an object is down-weighted by a factor of 0.5.
Human language

Problem 3: Multi-task training problem: location & class.
The loss is simply a weighted sum of the location and class terms; here the problem is left untouched.
Human language

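Problem 4: Small objects need more accurate location & box size.
sqrt: the loss compares the square roots of w and h, so the same absolute error costs more for a small box than for a large one.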
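For readers who do want the math after all, the loss as written in the YOLO v1 paper (arXiv:1506.02640) is reproduced below. There, the indicator for box j of cell i being "responsible" for an object handles Problem 1, the weight λ_noobj = 0.5 handles Problem 2, the plain sum of terms (with λ_coord = 5) is the weighted sum of Problem 3, and the square roots address Problem 4.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}}
    \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
         + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
    \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
    \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

Here C denotes the confidence score and p(c) the class score of class c; hatted symbols are the network's predictions.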
Other problems
  • x, y can fall outside the grid cell
  • smaller objects can be localized worse than larger ones
  • probability can fall outside [0, 1]


Fix them in YOLO v2
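A sketch of how the YOLO v2 paper (arXiv:1612.08242) constrains the raw outputs: a sigmoid keeps the box center inside its grid cell and the confidence inside (0, 1), while the width and height rescale a pre-defined box size. The function name `decode_box` and the use of grid-cell units are choices made for this illustration.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def decode_box(tx, ty, tw, th, to, cx, cy, pw, ph):
    """Map raw network outputs to a box, in grid-cell units.

    (cx, cy) is the top-left corner of the grid cell and (pw, ph) is the
    pre-defined box size, following the parameterization in the YOLO v2 paper.
    """
    bx = cx + sigmoid(tx)     # center stays inside the cell: cx < bx < cx + 1
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)    # rescaled pre-defined width, always positive
    bh = ph * math.exp(th)
    confidence = sigmoid(to)  # always inside (0, 1)
    return bx, by, bw, bh, confidence


print(decode_box(0.0, 0.0, 0.0, 0.0, 0.0, cx=3, cy=4, pw=2.0, ph=1.5))
# (3.5, 4.5, 2.0, 1.5, 0.5)
```

Even for extreme raw values such as tx = 10 or ty = -10, the decoded center remains inside the cell whose corner is (cx, cy).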

Pre-defined box size


Pre-defined box: anchor
  • Naturally, objects have characteristic aspect ratios and sizes.
      • These can be a good starting point.
      • We don't need randomly initialized box shapes.
  • Handcrafted box sizes vs clustering algorithms
  • Boxes can reshape during training.
  • The number of pre-defined boxes is a hyperparameter.
      • v2 uses 5
      • v3 uses 9

[Figure: anchors used in YOLO v2]

Anchor-free detection is a research topic; see [Link] for an example.
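The clustering option can be sketched as follows. The YOLO v2 paper runs k-means on the training-set box sizes with the distance 1 - IOU; the code below is a simplified illustration of that idea, with made-up toy data, a deterministic initialization, and hypothetical function names.

```python
def wh_iou(a, b):
    """IOU of two boxes that share the same center, given as (w, h)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iterations=20):
    """Cluster (w, h) pairs with distance 1 - IOU; returns k anchor sizes."""
    # Deterministic start for this sketch: k evenly spaced samples.
    step = len(sizes) // k
    anchors = [sizes[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for wh in sizes:
            # Highest IOU is the same as smallest 1 - IOU distance.
            best = max(range(k), key=lambda j: wh_iou(wh, anchors[j]))
            clusters[best].append(wh)
        # Move each anchor to the mean size of its cluster (keep it if empty).
        anchors = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c)) if c else a
            for c, a in zip(clusters, anchors)
        ]
    return anchors


# Toy data: three tall boxes and three wide boxes (made-up numbers).
sizes = [(10, 30), (12, 33), (11, 27), (40, 12), (44, 10), (36, 14)]
print(kmeans_anchors(sizes, k=2))  # [(11.0, 30.0), (40.0, 12.0)]
```

Using IOU rather than Euclidean distance keeps large boxes from dominating the clustering, which is the motivation given in the YOLO v2 paper.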
Improvements (in v2)
  • Multi-scale training: resize the input images randomly during training: {320, 352, ..., 608}
      • The CNN only downsamples an image by a constant factor (here 32), hence it is robust to the input image size.
      • Resize every 10 batches.
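A minimal sketch of the size schedule, assuming square inputs and Python's standard `random` module; the helper name `pick_input_size` is hypothetical. Every candidate size is a multiple of the downsampling factor 32, so each one maps to a whole number of grid cells.

```python
import random

STRIDE = 32  # total downsampling factor of the network, as stated on the slide
SIZES = list(range(320, 608 + 1, STRIDE))  # 320, 352, ..., 608


def pick_input_size(rng):
    """Pick a new square input size; to be called every few training steps."""
    return rng.choice(SIZES)


rng = random.Random(0)  # seeded only to make the sketch repeatable
size = pick_input_size(rng)
print(SIZES)
print(size, "->", size // STRIDE, "x", size // STRIDE, "grid cells")
```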

  • Passthrough layer
      • No loss when reshaping the feature map
  • Odd number of grid cells
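The lossless reshaping of the passthrough layer can be illustrated with a space-to-depth rearrangement. In the YOLO v2 paper this turns a 26 x 26 x 512 feature map into 13 x 13 x 2048 by stacking adjacent features into channels. The sketch below uses plain nested lists and one possible channel ordering; real implementations may order the channels differently.

```python
def space_to_depth(fmap, block=2):
    """Rearrange H x W x C into (H/block) x (W/block) x (C*block*block).

    fmap is a nested list indexed as fmap[row][col][channel]. Every value is
    moved to a new position and none is dropped.
    """
    h, w = len(fmap), len(fmap[0])
    assert h % block == 0 and w % block == 0
    out = []
    for r in range(0, h, block):
        out_row = []
        for c in range(0, w, block):
            stacked = []
            # Concatenate the channels of the block x block neighbors.
            for dr in range(block):
                for dc in range(block):
                    stacked.extend(fmap[r + dr][c + dc])
            out_row.append(stacked)
        out.append(out_row)
    return out


# 4 x 4 map with 1 channel, values 0..15
fmap = [[[4 * r + c] for c in range(4)] for r in range(4)]
out = space_to_depth(fmap)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 4
print(out[0][0])  # [0, 1, 4, 5]
```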
Training
ImageNet: classification dataset
COCO/PASCAL VOC: detection dataset

YOLO
Step 1 (on ImageNet):
  • train classification backbone
Step 2 (transfer learning, on COCO/PASCAL VOC):
  • remove head layers
  • add regression as new head
  • fine-tune backbone & train head

Training tricks
  • decaying learning rate
  • batch normalization
  • data augmentation
Performance
Generalizability

Picasso & People-Art dataset


But ... no free lunch
  • YOLO is not as accurate as RCNN-series models
      • multi-task problem: YOLO makes fewer background errors, but more localization errors.
  • YOLO is poor at detecting small objects
      • CNN: training on ImageNet may not generalize well to small objects (classification)
      • the loss function equalizes location weights for small & large objects (localization)
  • YOLO is not good at crowded objects
      • cause: non-maximal suppression. See an improvement: Adaptive NMS (arXiv:1904.03629)
  • YOLO is bad when encountering unusual aspect ratios
      • cause: pre-defined anchors, or anchors learned from data. Go anchor-free (arXiv:1904.01355).
Security
CNNs (classification) can be fooled, and so can YOLO; the issues can be even worse.

Non-maximal suppression can be fooled.

Daedalus: Breaking Non-Maximum Suppression in Object Detection via Adversarial Examples. arXiv:1902.02067
Is there anything helpful to improve?
Darwin's evolution

arXiv: 1807.05511
