Activation Functions

An activation function (or transfer function) determines the output of a neuron or node. It acts as a mathematical gate between the input feeding the current neuron and the output going to the next layer.

In deep learning, the commonly used non-linear activation functions include (see the sketch after this list):

  • Sigmoid: squashes its input to the range (0, 1), but suffers from the vanishing gradient problem and is rarely used in hidden layers anymore

  • ReLU (rectified linear unit): the most commonly used activation in CNN hidden layers

  • Others: Tanh, Leaky ReLU, Maxout...
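
As a rough illustration (not taken from the lecture), the functions listed above can be written in a few lines of NumPy; the example input vector is made up:

```python
import numpy as np

def sigmoid(x):
    # Squashes each element to (0, 1); saturates for large |x|, causing vanishing gradients.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered; squashes each element to (-1, 1).
    return np.tanh(x)

def relu(x):
    # Element-wise max(0, x); the usual default for CNN hidden layers.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but keeps a small slope (alpha) for negative inputs.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(leaky_relu(x))  # [-0.02  -0.005  0.     0.5    2.   ]
```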

Cheat sheet of commonly used activation functions (image from MIT Deep Learning lecture)

Output Activation Functions

These functions are transformations we apply to the score vector $s$ coming out of the CNN before the loss computation. [reference]

Sigmoid

It squashes a vector to the range (0, 1). It is applied independently to each element $s_i$ of the score vector $s$. It is also called the logistic function.
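
A minimal sketch of this element-wise behaviour, assuming a made-up 3-class score vector:

```python
import numpy as np

def sigmoid(s):
    # Applied independently to each score s_i; outputs lie in (0, 1)
    # but do not need to sum to 1 across classes.
    return 1.0 / (1.0 + np.exp(-s))

scores = np.array([2.0, -1.0, 0.5])   # raw CNN output scores for 3 classes
probs = sigmoid(scores)
print(probs)        # ~[0.881 0.269 0.622]
print(probs.sum())  # ~1.772 -> not a probability distribution over the classes
```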

Softmax

Softmax is a function, not a loss. It squashes a vector to the range (0, 1), and all the resulting elements add up to 1. It is applied to the output scores $s$. As the elements represent classes, they can be interpreted as class probabilities. The Softmax function cannot be applied independently to each $s_i$, since it depends on all elements of $s$. For a given class $i$, the Softmax activation can be computed as:

$$f(s)_i = \frac{e^{s_i}}{\sum_{j=1}^{C} e^{s_j}}$$

where $s_j$ are the scores inferred by the net for each class in $C$. Note that the Softmax activation for a class $s_i$ depends on all the scores in $s$.
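
A minimal NumPy sketch of the formula above; subtracting the maximum score is a standard numerical-stability trick, and the score values are made up for illustration:

```python
import numpy as np

def softmax(s):
    # Depends on all elements of s: each output is e^{s_i} / sum_j e^{s_j}.
    # Subtracting max(s) does not change the result but avoids overflow in exp.
    e = np.exp(s - np.max(s))
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])   # raw CNN output scores for C = 3 classes
probs = softmax(scores)
print(probs)        # ~[0.786 0.039 0.175]
print(probs.sum())  # 1.0 -> a valid probability distribution over the classes
```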
