LossFunction Regularization

Batch Normalization

Batch normalization greatly smooths the loss landscape: it reduces the variation of the loss and improves the predictiveness of the gradients (β-smoothness), making the task of navigating the terrain to find the global error minimum much easier. In practice, it brings several benefits:

  • More freedom in setting the initial learning rate. Large initial learning rates will not result in missing out on the minimum during optimization, and can lead to quicker convergence.

  • Accelerate the learning rate decay.

  • Remove dropout. When using batch normalization, one can often get away without dropout layers, since dropout can hurt performance and/or slow down training, and batch normalization itself provides an additional form of resistance to overfitting.

  • Reduce L2 weight regularization.

  • Help solve the vanishing gradient problem.

  • Help solve the exploding gradient problem.

Reducing Internal Covariate Shift

For example, suppose we train the network only on images of black cats. If we then apply this network to images of colored cats, it obviously will not do well: the training set and the prediction set are both cat images, but their distributions differ slightly. In other words, if an algorithm has learned some X to Y mapping and the distribution of X changes, we may need to retrain the algorithm so that the learned mapping still holds under the new distribution of X.

Batch normalization allows each layer of a network to learn a little more independently of the other layers.

  • We can use higher learning rates because batch normalization ensures that no activation becomes extremely high or extremely low. As a result, parts of the network that previously could not train begin to train.

  • It reduces overfitting because it has a slight regularization effect: similar to dropout, it adds some noise to each hidden layer's activations. Therefore, when we use batch normalization we can use less dropout, which is good because we lose less information. However, we should not rely on batch normalization alone for regularization; it is better to use it together with dropout (see the example after this list).
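
To make the learning-rate and dropout points above concrete, here is a minimal PyTorch sketch (not code from the course materials) of a small CNN with batch normalization inserted between the convolution and the activation. The layer sizes and the relatively large learning rate of 0.1 are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal conv block: BatchNorm is applied to the convolution output,
# before the non-linearity, as in most CNN implementations.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),           # normalizes each of the 16 channels over the mini-batch
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),  # assumes 32x32 input images and 10 classes
)

# Thanks to batch normalization, a relatively large initial learning rate
# (0.1 here, an illustrative value) is usually safe.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

x = torch.randn(8, 3, 32, 32)     # dummy mini-batch of 8 RGB 32x32 images
out = model(x)
print(out.shape)                  # torch.Size([8, 10])
```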

Effect of Regularization

Regularization refers to the practice of constraining (regularizing) a model so that it does not learn overly complex concepts, thereby reducing the risk of overfitting.

Regularization Methods

  • Dropout Regularization

  • L2 Regularization

  • L1 Regularization

Effects of Methods

  • Dropout performs best among these regularizers: it has both a weight-regularization effect and induces sparsity.

  • L1 regularization tends to produce sparse weights, whereas L2 regularization produces small weights (see the sketch after this list).

  • Regularization hyperparameters for CONV and FC layers should be tuned separately.
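
The following is a minimal PyTorch sketch, not taken from the course materials, showing how the three methods are typically applied: Dropout as a layer, L2 regularization via the optimizer's weight_decay, and L1 regularization as an explicit penalty added to the loss. The model shape and coefficient values are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Simple FC model with dropout regularization on the hidden layer.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),       # dropout regularization
    nn.Linear(256, 10),
)

# L2 regularization (weight decay) is built into the optimizer.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)

criterion = nn.CrossEntropyLoss()
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = criterion(model(x), y)

# L1 regularization is added manually as a penalty on the weights;
# it tends to drive many weights exactly to zero (sparsity).
l1_lambda = 1e-5
loss = loss + l1_lambda * sum(p.abs().sum() for p in model.parameters())

loss.backward()
optimizer.step()
```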

Effect of Batch size

We use mini-batches because they tend to converge more quickly and allow us to parallelize computations.

What is Batch Size

Neural networks are trained to minimize a loss function of the following form:

Figure 1: Loss function. Adapted from Keskar et al [1].

Stochastic gradient descent computes the gradient on a subset of the training data, B_k, as opposed to the entire training dataset.
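
The equation images referenced by Figures 1 and 2 are not reproduced in this text; the following is a plausible reconstruction in the notation of Keskar et al. [1], where M is the number of training samples, f_i is the loss of sample i, B_k is the mini-batch at step k (as in the text above), and α_k is the learning rate.

```latex
% Loss over the full training set of M samples (Figure 1)
\min_{x \in \mathbb{R}^n} f(x) := \frac{1}{M} \sum_{i=1}^{M} f_i(x)

% SGD update computed only on the mini-batch B_k (Figure 2)
x_{k+1} = x_k - \alpha_k \left( \frac{1}{|B_k|} \sum_{i \in B_k} \nabla f_i(x_k) \right)
```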

Smaller batch sizes usually perform better.

Training with small batch sizes tends to converge to flat minimizers, where the loss varies only slightly within a small neighborhood of the minimizer, whereas training with large batch sizes tends to converge to sharp minimizers, where the loss changes rapidly [1].

  • Small batch sizes perform best with smaller learning rates, while large batch sizes do best on larger learning rates.

  • Linear scaling rule: when the mini-batch size is multiplied by k, multiply the learning rate by k (see the sketch after this list).

  • When the right learning rate is chosen, larger batch sizes can train faster, especially when parallelized.
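
As a small illustration of the linear scaling rule (not code from the text), the base batch size of 64 and base learning rate of 0.01 below are assumed values.

```python
base_batch_size = 64
base_lr = 0.01            # learning rate tuned for the base batch size


def scaled_lr(batch_size, base_bs=base_batch_size, lr=base_lr):
    """Linear scaling rule: scale the learning rate by the same
    factor k that the mini-batch size was scaled by."""
    k = batch_size / base_bs
    return lr * k


for bs in (64, 128, 256, 512):
    print(bs, scaled_lr(bs))   # 0.01, 0.02, 0.04, 0.08

# Usage with an optimizer (model is any nn.Module):
# optimizer = torch.optim.SGD(model.parameters(), lr=scaled_lr(256), momentum=0.9)
```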

Reference video: Deeplearning.ai, "Why Does Batch Norm Work?" (C2W3L06)

[Figure 2: Stochastic gradient descent update equation. Adapted from Keskar et al. [1]]
[Figure 5: Training and validation loss curves for different batch sizes]
[Figure 23: Training and validation loss for different batch sizes, with adjusted learning rates]

References

  • [1] Keskar et al., On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
  • Batch Normalization: The Greatest Breakthrough in Deep Learning (Medium)
  • Effect of Regularization in Neural Net Training (Medium)
  • Effect of Batch Size on Neural Net Training (Medium)
  • Batch Normalization paper: https://arxiv.org/pdf/1502.03167v3.pdf