VGG
Very Deep Convolutional Networks for Large-Scale Image Recognition. Karen Simonyan and Andrew Zisserman. ICLR 2015 (arXiv preprint, 2014).
VGGNet was developed by the Visual Geometry Group (VGG) at the University of Oxford. It was the first runner-up in the ILSVRC 2014 classification task, with a top-5 error below 10%.
GoogLeNet won the ILSVRC 2014 classification task, but VGGNet beat GoogLeNet in the localization task, which it won.
Top-5 error improved from 16.4% with AlexNet (8 layers) to 7.3% with VGG-16.
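This is a very influential CNN paper: it uses only 3×3 convolutions throughout, and many later networks build on the VGG architecture.
It stacks small filters instead of using large ones in the first layers, such as the 11×11 filters in AlexNet or the 7×7 filters in ZFNet.
Three stacked 3×3 conv layers replace a single 7×7 layer; both cover a 7×7 receptive field.
The stack has fewer parameters (27C² versus 49C² weights for C input and output channels) and more non-linearity, because a ReLU follows each of the three layers.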
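The receptive-field and parameter comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming stride-1 convolutions, the same number of input and output channels in every layer, and no bias terms; the function names and the channel width C = 256 are hypothetical and chosen only for illustration.

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions."""
    # Each stride-1 layer widens the receptive field by (kernel_size - 1).
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size: int, channels: int, num_layers: int = 1) -> int:
    """Weight count (biases ignored) with `channels` inputs and outputs per layer."""
    return num_layers * kernel_size * kernel_size * channels * channels


C = 256  # hypothetical channel width, for illustration only

rf_stack = stacked_receptive_field(3, 3)    # three 3x3 layers
rf_single = stacked_receptive_field(7, 1)   # one 7x7 layer
w_stack = conv_weights(3, C, num_layers=3)  # 27 * C^2 weights
w_single = conv_weights(7, C)               # 49 * C^2 weights

print(rf_stack, rf_single)
print(w_stack, w_single, w_stack / w_single)
```

Under these assumptions the two designs see the same 7×7 window, while the stacked version needs 27/49 (about 55%) of the weights.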
VGG-16 (Conv1), the variant with 1×1 convolutions, obtains a 9.4% top-5 error rate, better than the 13-layer network it extends, which means the three additional 1×1 conv layers help classification accuracy. A 1×1 convolution increases the non-linearity of the decision function: it is a linear projection onto a space of the same dimensionality, leaving the input and output dimensions unchanged, followed by a ReLU.
VGG-16 with only 3×3 convolutions obtains an 8.8% error rate, which means the network is still improving as layers are added.
VGG-19 obtains a 9.0% error rate, which means the network is no longer improving as layers are added, so the authors stopped adding layers.
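To make the depth comparison concrete, the sketch below writes the 16-layer and 19-layer configurations as lists and counts their weight layers and parameters. It assumes the standard layout (3×3 convolutions with biases, five 2×2 max-pooling stages, a 224×224×3 input, and three fully connected layers of 4096, 4096, and 1000 units); the list and function names are hypothetical.

```python
# Numbers are output channels of 3x3 convolutions; "M" marks 2x2 max pooling.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]  # fully connected layers after the conv stack


def count_weight_layers(cfg):
    """Conv layers plus fully connected layers (pooling has no weights)."""
    return sum(1 for v in cfg if v != "M") + len(FC_SIZES)


def count_parameters(cfg, in_channels=3, input_size=224):
    """Total weights and biases for a configuration."""
    total, channels, size = 0, in_channels, input_size
    for v in cfg:
        if v == "M":
            size //= 2                          # pooling halves width and height
        else:
            total += 3 * 3 * channels * v + v   # 3x3 kernel weights + biases
            channels = v
    features = channels * size * size           # flattened conv output
    for out in FC_SIZES:
        total += features * out + out
        features = out
    return total


for name, cfg in (("VGG-16", VGG16), ("VGG-19", VGG19)):
    print(name, count_weight_layers(cfg), count_parameters(cfg))
```

Most of the parameters sit in the first fully connected layer, so going from 16 to 19 layers adds comparatively few weights.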
Multi-scale training: each image is rescaled so that its shorter side is a random value between 256 and 512 px, and a 224×224 crop is then taken, which contains the object fully or partially. This acts as data augmentation by scaling and translation, which helps to reduce overfitting.
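The rescale-then-crop step can be sketched as pure geometry, without touching pixel data. This is an illustrative sketch, assuming the shorter side is rescaled to a random length between 256 and 512 px with the aspect ratio preserved and that the crop position is uniform; the function name and return format are hypothetical, and the actual resizing would be done by an image library.

```python
import random


def jittered_crop_box(width, height, rng, s_min=256, s_max=512, crop=224):
    """Return the rescaled image size and a random crop box.

    The box is (left, top, right, bottom) in rescaled-image coordinates.
    """
    s = rng.randint(s_min, s_max)        # target length of the shorter side
    scale = s / min(width, height)       # one factor for both sides
    new_w = round(width * scale)
    new_h = round(height * scale)
    left = rng.randint(0, new_w - crop)  # random translation of the crop
    top = rng.randint(0, new_h - crop)
    return (new_w, new_h), (left, top, left + crop, top + crop)


rng = random.Random(0)                   # seeded only to make the demo repeatable
size, box = jittered_crop_box(500, 375, rng)
print(size, box)
```

A small sampled scale makes the crop cover most of the image, while a large one makes it cover only part of the object, which is where the scaling and translation augmentation comes from.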
Dataset: ImageNet (ILSVRC), 3-channel images rescaled to a shorter side of 256 px in the single-scale setting
Input: 224×224×3 image with data augmentation from multi-scaling
Batch normalization: not used in the original paper, which predates it; later variants such as VGG-16-BN add it
Multinomial logistic regression loss
Mini-batch GD with momentum
Batch size: 256
Momentum v: 0.9
Weight Decay: 0.0005
Learning rate: 0.01, decreased by a factor of 10 when the validation accuracy stops improving
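The hyperparameters above fit together as in the following minimal sketch of one momentum update with weight decay. It assumes the common formulation in which weight decay is added to the gradient as an L2 penalty; the function names are hypothetical, and the weights are plain Python lists to keep the example dependency-free.

```python
def sgd_momentum_step(weights, grads, velocity, lr,
                      momentum=0.9, weight_decay=0.0005):
    """One mini-batch update; returns the new weights and velocities."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w   # gradient of the L2 penalty
        v = momentum * v - lr * g  # accumulate velocity
        new_velocity.append(v)
        new_weights.append(w + v)
    return new_weights, new_velocity


def decayed_lr(num_drops, base_lr=0.01):
    """Learning rate after `num_drops` reductions by a factor of 10."""
    return base_lr / (10 ** num_drops)


weights, velocity = [1.0, -2.0], [0.0, 0.0]
grads = [0.5, 0.1]  # made-up mini-batch gradients, for illustration only
weights, velocity = sgd_momentum_step(weights, grads, velocity, lr=decayed_lr(0))
print(weights, velocity)
```

In a real training loop, `num_drops` would be incremented each time the validation accuracy plateaus.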