📚
DLIP
  • Introduction
  • Prerequisite
  • Image Processing Basics
    • Notes
      • Thresholding
      • Spatial Filtering
      • Masking with Bitwise Operation
      • Model n Calibration
    • Tutorial
      • Tutorial: Install OpenCV C++
      • Tutorial: Create OpenCV Project
      • Tutorial: C++ basics
      • Tutorial: OpenCV Basics
      • Tutorial: Image Watch for Debugging
      • Tutorial: Spatial Filter
      • Tutorial: Thresholding and Morphology
      • Tutorial: Camera Calibration
      • Tutorial: Color Image Processing
      • Tutorial: Edge Line Circle Detection
      • Tutorial: Corner Detection and Optical Flow
      • Tutorial: OpenCV C++ Cheatsheet
      • Tutorial: Installation for Py OpenCV
      • Tutorial: OpenCv (Python) Basics
    • LAB
      • Lab Report Template
      • Lab Report Grading Criteria
      • LAB Report Instruction
      • LAB: Grayscale Image Segmentation
        • LAB: Grayscale Image Segmentation -Gear
        • LAB: Grayscale Image Segmentation - Bolt and Nut
      • LAB: Color Image Segmentation
        • LAB: Facial Temperature Measurement with IR images
        • LAB: Magic Cloak
      • LAB: Straight Lane Detection and Departure Warning
      • LAB: Dimension Measurement with 2D camera
      • LAB: Tension Detection of Rolling Metal Sheet
  • Deep Learning for Perception
    • Notes
      • Lane Detection with Deep Learning
      • Overview of Deep Learning
        • Object Detection
        • Deep Learning Basics: Introduction
        • Deep Learning State of the Art
        • CNN, Object Detection
      • Perceptron
      • Activation Function
      • Optimization
      • Convolution
      • CNN Overview
      • Evaluation Metric
      • LossFunction Regularization
      • Bias vs Variance
      • BottleNeck Unit
      • Object Detection
      • DL Techniques
        • Technical Strategy by A.Ng
    • Tutorial - PyTorch
      • Tutorial: Install PyTorch
      • Tutorial: Python Numpy
      • Tutorial: PyTorch Tutorial List
      • Tutorial: PyTorch Example Code
      • Tutorial: Tensorboard in Pytorch
      • Tutorial: YOLO in PyTorch
        • Tutorial: Yolov8 in PyTorch
        • Tutorial: Train Yolo v8 with custom dataset
          • Tutorial: Train Yolo v5 with custom dataset
        • Tutorial: Yolov5 in Pytorch (VS code)
        • Tutorial: Yolov3 in Keras
    • LAB
      • Assignment: CNN Classification
      • Assignment: Object Detection
      • LAB: CNN Object Detection 1
      • LAB: CNN Object Detection 2
      • LAB Grading Criteria
    • Tutorial- Keras
      • Train Dataset
      • Train custom dataset
      • Test model
      • LeNet-5 Tutorial
      • AlexNet Tutorial
      • VGG Tutorial
      • ResNet Tutorial
    • Resource
      • Online Lecture
      • Programming tutorial
      • Books
      • Hardware
      • Dataset
      • Useful sites
  • Must Read Papers
    • AlexNet
    • VGG
    • ResNet
    • R-CNN, Fast-RCNN, Faster-RCNN
    • YOLOv1-3
    • Inception
    • MobileNet
    • SSD
    • ShuffleNet
    • Recent Methods
  • DLIP Project
    • Report Template
    • DLIP 2021 Projects
      • Digital Door Lock Control with Face Recognition
      • People Counting with YOLOv4 and DeepSORT
      • Eye Blinking Detection Alarm
      • Helmet-Detection Using YOLO-V5
      • Mask Detection using YOLOv5
      • Parking Space Management
      • Vehicle, Pedestrian Detection with IR Image
      • Drum Playing Detection
      • Turtle neck measurement program using OpenPose
    • DLIP 2022 Projects
      • BakeryCashier
      • Virtual Mouse
      • Sudoku Program with Hand gesture
      • Exercise Posture Assistance System
      • People Counting Embedded System
      • Turtle neck measurement program using OpenPose
    • DLIP Past Projects
  • Installation Guide
    • Installation Guide for Pytorch
      • Installation Guide 2021
    • Anaconda
    • CUDA cuDNN
      • CUDA 10.2
    • OpenCV
      • OpenCV Install and Setup
        • OpenCV 3.4.13 with VS2019
        • OpenCV3.4.7 VS2017
        • MacOS OpenCV C++ in XCode
      • Python OpenCV
      • MATLAB-OpenCV
    • Framework
      • Keras
      • TensorFlow
        • Cheat Sheet
        • Tutorial
      • PyTorch
    • IDE
      • Visual Studio Community
      • Google Codelab
      • Visual Studio Code
        • Python with VS Code
        • Notebook with VS Code
        • C++ with VS Code
      • Jupyter Notebook
        • Install
        • How to use
    • Ubuntu
      • Ubuntu 18.04 Installation
      • Ubuntu Installation using Docker in Win10
      • Ubuntu Troubleshooting
    • ROS
  • Programming
    • Python_Numpy
      • Python Tutorial - Tips
      • Python Tutorial - For Loop
      • Python Tutorial - List Tuple, Dic, Set
    • Markdown
      • Example: API documentation
    • Github
      • Create account
      • Tutorial: Github basic
      • Tutorial: Github Desktop
    • Keras
      • Tutorial Keras
      • Cheat Sheet
    • PyTorch
      • Cheat Sheet
      • Autograd in PyTorch
      • Simple ConvNet
      • MNIST using LeNet
      • Train ConvNet using CIFAR10
  • Resources
    • Useful Resources
    • Github
Powered by GitBook
On this page
  • Introduction
  • Single Shot MultiBox Detector
  • Architecture
  • Results
  • Additional Notes On SSD

Was this helpful?

  1. Must Read Papers

SSD

PreviousMobileNetNextShuffleNet

Last updated 3 years ago

Was this helpful?

Introduction

(by W.Liu, C. Szegedy et al.,2016), object detector scoring over 74% mAP at 59 FPS on and .

  • Single Shot: this means that the tasks of object localization and classification _are done in a _single forward pass of the network

  • MultiBox: this is the name of a technique for bounding box regression developed by Szegedy et al. (we will briefly cover it shortly)

  • Detector: The network is an object detector that also classifies those detected objects

Single Shot MultiBox Detector

Architecture

Based on VGG-16. No FC layers but added a set of auxiliary convolutional layers (from conv6 onwards). This can extract features at multiple scales and progressively decrease the size of the input to each subsequent layer.

Baseline: VGG architecture

MultiBox

Multibox: It is for Bounding Box Proposal of fast class-agnostic bounding box coordinate proposals.

Class-agnostic means bounding box of object without classfication

Multibox contains 11 priors per feature map cell (8x8, 6x6, 4x4, 3x3, 2x2) and only one on the 1x1 feature map, resulting in a total of 1420 priors per image

11*(8*8)+11*(6*6)+...+11*(2*2)=1419 + 1(1*1)=1420

MultiBox’s loss function also combined two critical components:

multibox_loss = confidence_loss + alpha * location_loss

Results

Default Bounding Boxes

The SSD paper has around 6 bounding boxes per feature map cell.

Pascal VOC 2007

Additional Notes On SSD

The SSD paper makes the following additional observations:

  • more default boxes results in more accurate detection, although there is an impact on speed

  • having MultiBox on multiple layers results in better detection as well, due to the detector running on features at multiple resolutions

  • 80% of the time is spent on the base VGG-16 network: this means that with a faster and equally accurate network SSD’s performance could be even better

  • SSD confuses objects with similar categories (e.g. animals). This is probably because locations are shared for multiple classes

  • SSD-500 (the highest resolution variant using 512x512 input images) achieves best mAP on Pascal VOC2007 at 76.8%, but at the expense of speed, where its frame rate drops to 22 fps. SSD-300 is thus a much better trade-off with 74.3 mAP at 59 fps.

  • SSD produces worse performance on smaller objects, as they may not appear across all feature maps. Increasing the input image resolution alleviates this problem but does not completely address it

The bounding box regression technique of SSD is inspired by Szegedy’s work on . MultiBox starts with the priors as predictions and attempt to regress closer to the ground truth bounding boxes. At the end, MultiBox only retains the top K predictions that have minimised both location (LOC) and confidence (CONF) losses.

Confidence Loss: this measures how confident the network is of the objectness of the computed bounding box. Categorical is used to compute this loss.

Location Loss: this measures how far away the network’s predicted bounding boxes are from the ground truth ones from the training set. is used here.

MultiBox
cross-entropy
L2-Norm
SSD: Single Shot MultiBox Detector
PascalVOC
COCO
A comparison between two single shot detection models: SSD and YOLO [5]. Our SSD model adds several feature layers to the end of a base network, which predict the offsets to default boxes of different scales and aspect ratios and their associated confidences. SSD with a 300 × 300 input size significantly outperforms its 448 × 448 YOLO counterpart in accuracy on VOC2007 test while also improving the speed.
VGG architecture (input is 224x224x3)
Architecture of multi-scale convolutional prediction of the location and confidences of multibox