Optimization

Loss Functions

A loss function measures the error between the prediction and the ground truth. The total training loss averages this error over all examples in the dataset.

Logistic Loss and Multinomial Logistic Loss are other names for Cross-Entropy loss. For predicted class probabilities $s_i$ and one-hot ground-truth values $t_i$ over $C$ classes, the Cross-Entropy loss is

$$CE = -\sum_{i}^{C} t_i \log(s_i)$$

In a binary classification problem, where $C' = 2$, the Cross-Entropy loss can also be defined as

$$CE = -\sum_{i=1}^{C'=2} t_i \log(s_i) = -t_1 \log(s_1) - (1 - t_1)\log(1 - s_1)$$
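As a quick check of the formula, here is a minimal NumPy sketch (the names `probs` and `target` are illustrative, not from the original text):

```python
import numpy as np

def cross_entropy(probs, target):
    """Cross-entropy between predicted probabilities and a one-hot target."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(target * np.log(probs + eps))

# Binary case (C' = 2): probability 0.8 for the positive class, true label positive
probs = np.array([0.8, 0.2])
target = np.array([1.0, 0.0])
print(cross_entropy(probs, target))  # -log(0.8) ≈ 0.223
```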

For Classification

Softmax Loss

Softmax with Cross-Entropy loss is often used for multi-class classification. With this loss, we train a CNN to output a probability distribution over the $C$ classes for each image.

Note that the Softmax function cannot be applied independently to each $s_i$, since it depends on all elements of $s$. For a given class $s_i$, the Softmax function is computed as:

$$f(s)_i = \frac{e^{s_i}}{\sum_{j}^{C} e^{s_j}}$$
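A minimal NumPy sketch of this function (subtracting the max score is a standard numerical-stability trick, not part of the formula above):

```python
import numpy as np

def softmax(s):
    """Softmax over a score vector s: each output depends on all elements of s."""
    shifted = s - np.max(s)   # numerical stability; does not change the result
    exp_s = np.exp(shifted)
    return exp_s / np.sum(exp_s)

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))  # ≈ [0.659 0.242 0.099], sums to 1
```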

Binary Cross-Entropy Loss

Also called Sigmoid Cross-Entropy loss: a Sigmoid activation followed by a Cross-Entropy loss. Unlike Softmax loss, it is independent for each vector component (class), meaning that the loss computed for one CNN output component is not affected by the other components' values. It is used for multi-label classification, where an image can belong to several classes at once.
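A minimal per-class sketch in NumPy (names are illustrative; note that each class is scored independently):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_cross_entropy(logits, targets):
    """Sigmoid + binary cross-entropy, applied independently per class."""
    eps = 1e-12  # avoid log(0)
    p = sigmoid(logits)
    return -(targets * np.log(p + eps) + (1 - targets) * np.log(1 - p + eps))

# Multi-label example: the image contains classes 0 and 2, but not class 1
logits = np.array([3.0, -1.0, 0.5])
targets = np.array([1.0, 0.0, 1.0])
print(binary_cross_entropy(logits, targets))  # one loss value per class
```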

Further Reading

Optimization

We want to find the model weights $W$ that minimize the value of the loss function, so that predictions are accurate. How can we change the model parameters during training? An optimizer moves the parameters along the slope (gradient) toward a minimum (or maximum) of the objective.

Gradient Descent

Gradient Descent minimizes an objective function $J(w)$ by updating the parameters $w$ in the opposite direction of the gradient $\nabla J(w)$, i.e., following the negative gradient of the objective function to find the minimum of the loss. The step size is controlled by the learning rate $\eta$:

$$w \leftarrow w - \eta \, \nabla J(w)$$
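As a concrete illustration, here is a minimal gradient-descent sketch on a toy objective (the objective $J(w) = (w - 3)^2$ and the learning rate are assumptions for this example, not from the text):

```python
def J(w):
    """Toy objective: J(w) = (w - 3)^2, minimized at w = 3."""
    return (w - 3.0) ** 2

def grad_J(w):
    """Analytical gradient of J: dJ/dw = 2(w - 3)."""
    return 2.0 * (w - 3.0)

w = 0.0    # initial parameter
eta = 0.1  # learning rate
for step in range(50):
    w = w - eta * grad_J(w)  # move opposite to the gradient
print(w)  # ≈ 3.0
```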

There are two approaches to finding the derivative: 1) analytical, 2) numerical. If possible, use the analytical approach, which gives a faster and exact gradient; the numerical approach is approximate but useful as a check.
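To see the two approaches agree, here is a minimal sketch (the toy objective from the previous block and the centered-difference step size $h$ are assumptions for illustration):

```python
def J(w):
    return (w - 3.0) ** 2

def grad_J(w):
    return 2.0 * (w - 3.0)  # analytical gradient: exact

def numerical_grad(f, w, h=1e-5):
    """Centered finite-difference approximation of df/dw."""
    return (f(w + h) - f(w - h)) / (2.0 * h)

w = 1.5
print(grad_J(w))             # analytical: 2 * (1.5 - 3) = -3.0
print(numerical_grad(J, w))  # numerical: ≈ -3.0, approximate and slower
```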

Examples of optimizers include the following (a short usage sketch follows the list):

  • SGD (Stochastic Gradient Descent)

In practice, "SGD" often refers to mini-batch Gradient Descent.

  • Adagrad

  • Momentum

  • Adam
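These optimizers are implemented in common deep-learning frameworks. A minimal PyTorch sketch (the model, data, and hyperparameter values here are illustrative placeholders, not recommendations):

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model for illustration

# Each optimizer from the list above, constructed the same way:
sgd      = torch.optim.SGD(model.parameters(), lr=0.01)                # mini-batch SGD
momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # SGD with Momentum
adagrad  = torch.optim.Adagrad(model.parameters(), lr=0.01)
adam     = torch.optim.Adam(model.parameters(), lr=0.001)

# Typical training step with any of them:
optimizer = adam
loss_fn = torch.nn.CrossEntropyLoss()  # Softmax + Cross-Entropy in one module
x = torch.randn(4, 10)                 # dummy batch of 4 examples
y = torch.tensor([0, 1, 0, 1])         # dummy class labels
optimizer.zero_grad()                  # clear old gradients
loss = loss_fn(model(x), y)            # forward pass + loss
loss.backward()                        # compute gradients
optimizer.step()                       # update parameters along the negative gradient
```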

Further Reading
