Convolution

Last updated 3 years ago

CNNs use several types of convolution, such as:

  • 2D Conv, 3D Conv

  • 1x1 Conv, BottleNeck

  • Spatially Separable

  • Depthwise Separable

  • Grouped Convolution

  • Shuffled Grouped

For more detailed explanations of the types of convolution, read the blog post "A Comprehensive Introduction to Different Types of Convolutions in Deep Learning" (Medium).

Convolution: single channel

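Convolution slides a window (the kernel) over the input. At each position, the kernel and the covered input patch are multiplied element-wise and the products are summed to give one output value.

For more detailed information, read "Intuitive Understanding of Convolution" and "Guide to convolution arithmetic for deep learning".

Example: with a 3x3 kernel, a 5x5 input (25 features) produces a 3x3 output (9 features).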

Common techniques in convolution

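  • Padding: pad the edges with 0, 1, or other values

    • With ("same") padding: WxHxC --> WxHxC

    • Without padding: WxHxC --> (W-w+1)x(H-h+1)xC, for a w x h kernel

  • Striding: skip some of the slide locations

    • General output size: ⌊(nh−kh+ph+sh)/sh⌋ × ⌊(nw−kw+pw+sw)/sw⌋, where n is the input size, k the kernel size, p the total padding added along that dimension, and s the stride

    • With padding: WxHxC --> ⌊(W+S-1)/S⌋ x ⌊(H+S-1)/S⌋ x C

    • Without padding: WxHxC --> ⌊(W-w+S)/S⌋ x ⌊(H-h+S)/S⌋ x C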

Filter vs Kernel

For 2D convolution on a single channel, "kernel" and "filter" mean the same thing.

For multi-channel input (a 3D filter), a filter is the collection of stacked kernels, one kernel per input channel.

2D Convolution: multiple channel

The filter has the same depth (channel) as the input matrix.

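The output is a 2D matrix.

Example: the input is a 5x5x3 matrix and the filter is a 3x3x3 matrix.

Each of the three kernels in the filter is first convolved with its own input channel, giving three 3x3 maps. These three maps are then summed element-wise to form one single channel (3x3x1).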
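The two steps of the example (one kernel per channel, then an element-wise sum) can be sketched in plain Python. The helper names `conv2d_single` and `conv2d_multi` are illustrative only, and the operation is written as cross-correlation (no kernel flip), assuming stride 1 and no padding.

```python
def conv2d_single(channel, kernel):
    """Sliding-window sum of products on one channel (stride 1, no padding)."""
    H, W = len(channel), len(channel[0])
    h, w = len(kernel), len(kernel[0])
    out = []
    for i in range(H - h + 1):
        row = []
        for j in range(W - w + 1):
            row.append(sum(channel[i + a][j + b] * kernel[a][b]
                           for a in range(h) for b in range(w)))
        out.append(row)
    return out


def conv2d_multi(channels, kernels):
    """Apply one kernel per input channel, then sum the per-channel maps."""
    maps = [conv2d_single(c, k) for c, k in zip(channels, kernels)]
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) for j in range(cols)]
            for i in range(rows)]


# 5x5x3 input of ones and a 3x3x3 filter of ones
channels = [[[1] * 5 for _ in range(5)] for _ in range(3)]
kernels = [[[1] * 3 for _ in range(3)] for _ in range(3)]
out = conv2d_multi(channels, kernels)
print(len(out), len(out[0]))  # 3 3
print(out[0][0])              # 27 (9 products per channel, 3 channels)
```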

3D Convolution

A generalization of 2D convolution in which the filter depth is smaller than the input depth (channel size). The filter therefore moves in three directions: height, width, and channel.

The output is a 3D matrix.
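A small shape calculation shows why the output becomes 3D. This is a sketch assuming stride 1 and no padding in every direction; `conv3d_output_shape` is a name made up for this note.

```python
def conv3d_output_shape(input_shape, filter_shape):
    """Output shape when the filter slides along height, width, and depth
    (stride 1, no padding)."""
    H, W, D = input_shape
    h, w, d = filter_shape
    return (H - h + 1, W - w + 1, D - d + 1)


# Filter depth smaller than input depth: the output keeps a depth axis
print(conv3d_output_shape((5, 5, 5), (3, 3, 3)))  # (3, 3, 3)
# Filter depth equal to input depth: this reduces to the 2D case (depth 1)
print(conv3d_output_shape((5, 5, 3), (3, 3, 3)))  # (3, 3, 1)
```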

1x1 Convolution

Input: HxWxD. Filtering with a 1x1xD filter produces an HxWx1 output.

Initially proposed in 'Network-in-Network (2013)'. Widely used after it was adopted in 'Inception (2014)'.

  • Dimensionality reduction for efficient computations

    • HxWxD --> HxWx1

  • Efficient low dimensional embedding, or feature pooling

  • Applying nonlinearity again after convolution

    • After a 1x1 convolution, a non-linear activation (ReLU, etc.) can be added
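As a rough illustration of the points above, a 1x1 convolution is a weighted sum over the D channels at each pixel, optionally followed by ReLU. The function `conv1x1` and the sample values are hypothetical and use plain Python lists.

```python
def conv1x1(volume, weights, relu=True):
    """volume[i][j] holds the D channel values of one pixel;
    weights holds the D values of a single 1x1xD filter.
    Returns an HxW map (depth 1)."""
    out = []
    for row in volume:
        out_row = []
        for pixel in row:
            v = sum(x * w for x, w in zip(pixel, weights))
            out_row.append(max(0, v) if relu else v)
        out.append(out_row)
    return out


# 2x2x3 input and one 1x1x3 filter
volume = [[[1, 2, 3], [4, 5, 6]],
          [[-1, -2, -3], [0, 0, 1]]]
weights = [1, 0, -1]
print(conv1x1(volume, weights, relu=False))  # [[-2, -2], [2, -1]]
print(conv1x1(volume, weights))              # [[0, 0], [2, 0]]
```

Applying N such filters and stacking the maps would give an HxWxN output, which is how the depth is reduced or expanded.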

Cost of Convolution

Calculation cost for a convolution depends on:

  1. Input size: i*i*D

  2. Kernel Size: k*k*D

  3. Stride: s

  4. Padding: p

The output image (o*o*1) then has side length o = ⌊(i + 2p − k)/s⌋ + 1, where p is the padding added on each side.

The required operations are

  • o*o repetitions of { (k*k*D) multiplications and (k*k*D-1) additions }

In terms of multiplications

  • For an input of size H x W x D, 2D convolution (stride=1, padding=0) with Nc kernels of size h x h x D

  • Total multiplications: Nc x h x h x D x (H-h+1) x (W-h+1)
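The multiplication count can be evaluated directly. The sketch below assumes stride 1 and no padding, as in the formula; the function name `conv_multiplications` is illustrative.

```python
def conv_multiplications(H, W, D, h, Nc):
    """Multiplications for Nc filters of size h x h x D on an H x W x D input
    (stride 1, no padding)."""
    positions = (H - h + 1) * (W - h + 1)   # number of output locations
    per_position = h * h * D                # products per output value
    return Nc * per_position * positions


# 7x7x3 input with 128 filters of size 3x3x3
print(conv_multiplications(7, 7, 3, 3, 128))  # 86400
```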

Separable Convolution

Spatially Separable Convolution

Rarely used in deep learning. It decomposes a 2D convolution kernel into two smaller kernels that are applied one after the other.

Example: A Sobel kernel can be divided into a 3 x 1 and a 1 x 3 kernel.
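The Sobel example can be verified by multiplying the two small kernels back together. This is a minimal sketch using the horizontal-gradient Sobel kernel; the variable names are illustrative.

```python
# 3x1 column kernel (smoothing) and 1x3 row kernel (difference)
col = [1, 2, 1]
row = [-1, 0, 1]

# Their outer product rebuilds the 3x3 Sobel kernel
sobel = [[c * r for r in row] for c in col]
for line in sobel:
    print(line)
# [-1, 0, 1]
# [-2, 0, 2]
# [-1, 0, 1]

# Multiplications per output position
full_cost = 3 * 3        # one 3x3 kernel
separable_cost = 3 + 3   # a 3x1 pass followed by a 1x3 pass
print(full_cost, separable_cost)  # 9 6
```

Only kernels that can be written as such an outer product are separable this way, which is one reason the technique is less common in learned networks.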

Depthwise Separable Convolution

Commonly used in deep learning models such as MobileNet and Xception. It consists of two steps: (1) depthwise convolution and (2) 1x1 convolution.

For example: Input 7*7*3 --> 128 of 3*3*3 filters --> 5*5*128 output

  • Step 1: Depthwise convolution

    • Each layer of a single filter is separated into kernels (e.g. 3 kernels of 3x3x1)

    • Each kernel convolves with only 1 channel of the input layer, giving a 5*5*1 map per kernel

    • Then, stack the maps to get the final output, e.g. (5*5*3)

  • Step 2: 1*1 Convolution

    • Apply one 1*1 convolution with a 1*1*3 kernel to get a 5*5*1 map

    • Apply 128 such 1*1 convolutions to get a 5*5*128 map

  • Standard 2D convolution vs Depthwise Convolution

Calculation comparison

  • Standard: 128*(3*3*3)*(5*5) multiplications

    • 128*(3*3*3)*(5*5) =86,400

    • Nc x h x h x D x (H-h+1) x (W-h+1)

  • Separable: 3*(3*3*1)*(5*5)+128*(1*1*3)*(5*5) multiplications

    • =675+9600=10,275 (12%)

    • D x h x h x 1 x (H-h+1) x (W-h+1) + Nc x 1 x 1 x D x (H-h+1) x (W-h+1) = (h x h + Nc) x D x (H-h+1) x (W-h+1)

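  • The ratio of multiplications (separable / standard) is (h x h + Nc) / (Nc x h x h) = 1/Nc + 1/(h^2)

  • If Nc >> h^2, the ratio is approx. 1/(h^2). For 5x5 filters, the standard convolution needs about 25 times more multiplications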
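The comparison above can be reproduced numerically. The sketch assumes stride 1 and no padding; the function names are made up for this note.

```python
def standard_mults(H, W, D, h, Nc):
    """Standard 2D convolution with Nc filters of size h x h x D."""
    return Nc * h * h * D * (H - h + 1) * (W - h + 1)


def separable_mults(H, W, D, h, Nc):
    """Depthwise step (D kernels of h x h x 1) plus Nc pointwise 1x1xD filters."""
    positions = (H - h + 1) * (W - h + 1)
    depthwise = D * h * h * positions
    pointwise = Nc * D * positions
    return depthwise + pointwise


std = standard_mults(7, 7, 3, 3, 128)
sep = separable_mults(7, 7, 3, 3, 128)
print(std)                  # 86400
print(sep)                  # 10275
print(round(sep / std, 4))  # 0.1189, i.e. about 12%
# The closed-form ratio 1/Nc + 1/h^2 gives the same value
print(round(1 / 128 + 1 / 3 ** 2, 4))  # 0.1189
```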

Grouped Convolution

Introduced in AlexNet (2012) to run convolutions in parallel. The filters are separated into different groups, and each group performs a standard 2D convolution on its own slice of the input depth. The outputs of all groups are then concatenated along the depth dimension.

  • Model-Parallelization for efficient training

    • each group can be handled by different GPUs

    • Better than data parallelization using batches

  • Efficient Computation

    • Standard (number of weights): h x w x Din x Dout

    • Grouped, with 2 groups: 2*(h x w x Din/2 x Dout/2) = (1/2)*(h x w x Din x Dout)
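The same count generalizes to any number of groups G, where the cost drops by a factor of G. The sketch below is illustrative; `grouped_conv_weights` and the layer sizes are hypothetical examples.

```python
def grouped_conv_weights(h, w, d_in, d_out, groups=1):
    """Number of weights when the input and output depths are split
    evenly into `groups` independent convolutions."""
    if d_in % groups or d_out % groups:
        raise ValueError("depths must be divisible by the number of groups")
    return groups * (h * w * (d_in // groups) * (d_out // groups))


# 3x3 filters, 64 input channels, 128 output channels
print(grouped_conv_weights(3, 3, 64, 128))            # 73728
print(grouped_conv_weights(3, 3, 64, 128, groups=2))  # 36864 (half)
print(grouped_conv_weights(3, 3, 64, 128, groups=4))  # 18432 (quarter)
```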

Shuffled Grouped Convolution

Pointwise grouped convolution

Typically, the group operation is applied to the 3x3 spatial convolution but not to the 1x1 convolution. ShuffleNet proposed applying group convolution to the 1x1 convolution as well.

That is, group convolution with 1x1 filters instead of NxN filters (N>1).

Figure: Two-dimensional cross-correlation with padding.

Figure: Difference between "layer" ("filter") and "channel" ("kernel").

Depthwise separable convolution is used in MobileNet (2017) and Xception (2016) for efficient processing.

Shuffled grouped convolution was introduced by ShuffleNet (2017) for computation-efficient convolution. The idea is to mix the information from different filter groups so that information flows between the channel groups.

Read this blog for the paper explanations.
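The mixing of filter groups described above is usually implemented as a channel shuffle: view the channels as a (groups x channels-per-group) table, transpose it, and flatten it again. The sketch below works on a plain list of channel labels; `channel_shuffle` is a name chosen for this note, not an API from any library.

```python
def channel_shuffle(channels, groups):
    """Interleave channels so that every new group contains channels
    from all of the original groups."""
    n = len(channels)
    if n % groups:
        raise ValueError("channel count must be divisible by groups")
    per_group = n // groups
    # Equivalent to reshape (groups, per_group) -> transpose -> flatten
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


# 6 channels in 2 groups: [0, 1, 2] and [3, 4, 5]
print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, splitting the list into two groups again gives [0, 3, 1] and [4, 2, 5], so each group now sees channels that came from both original groups.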

References

  • Intuitive Understanding of Convolution

  • Guide to convolution arithmetic for deep learning

  • MobileNet (2017)

  • Xception (2016)

  • ShuffleNet (2017)

  • A Comprehensive Introduction to Different Types of Convolutions in Deep Learning (Medium)
Figure captions

  • Convolution for a single channel. Image adopted from medium@IrhumShafkat.

  • Same padding [1].

  • Cross-correlation with strides of 3 and 2 for height and width, respectively.

  • A stride-2 convolution without padding [1].

  • The first step of 2D convolution for multiple channels: each kernel in the filter is applied separately to one of the three channels of the input layer.

  • Another way to think about 2D convolution: a 3D filter matrix slides through the input layer. The input layer and the filter have the same depth (channel number = kernel number). The 3D filter moves in only two directions, the height and width of the image, which is why the operation is called 2D convolution even though a 3D filter processes 3D volumetric data. The output is a one-layer matrix.

  • In 3D convolution, a 3D filter can move in all three directions (height, width, and channel of the image). At each position, the element-wise multiplication and addition provide one number. Since the filter slides through a 3D space, the output numbers are arranged in a 3D space as well, so the output is 3D data.

  • 1 x 1 convolution, where the filter size is 1 x 1 x D.

  • For output o*o.

  • Spatially separable convolution with 1 channel.

  • Depthwise separable convolution, first step: instead of a single filter of size 3 x 3 x 3 as in 2D convolution, 3 kernels of size 3 x 3 x 1 are used separately. Each kernel convolves with only 1 channel of the input layer and provides a map of size 5 x 5 x 1. Stacking these maps gives an output of size 5 x 5 x 3.

  • Depthwise separable convolution, second step: apply multiple 1 x 1 convolutions to modify the depth.

  • Standard 2D convolution.

  • The overall process of depthwise separable convolution.

  • Grouped convolution with 2 filter groups.

  • Channel shuffle.