Activation Functions
An activation function (or transfer function) determines the output of a neuron or node. It acts as a mathematical gate between the input feeding the current neuron and its output going to the next layer.

In deep learning, commonly used non-linear activation functions include the following (a minimal sketch of each appears after this list):
Sigmoid: limits the output to (0, 1), but suffers from the vanishing-gradient problem, so it is rarely used in hidden layers anymore
ReLU (rectified linear unit): the most commonly used in CNN hidden layers
Others: Tanh, Leaky ReLU, Maxout...
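
The sketch below implements the functions listed above with NumPy (the function names and the Leaky ReLU slope alpha=0.01 are illustrative choices, not from the text):

```python
import numpy as np

def sigmoid(x):
    # Squashes each input into (0, 1); saturates for large |x|,
    # which is where the vanishing-gradient problem comes from.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Rectified linear unit: max(0, x), applied element-wise.
    return np.maximum(0.0, x)

def tanh(x):
    # Squashes each input into (-1, 1).
    return np.tanh(x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but keeps a small slope alpha for negative inputs
    # so the gradient never becomes exactly zero.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x))     # all values in (0, 1)
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(leaky_relu(x))  # [-0.02  -0.005  0.  0.5  2. ]
```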


Output Activation Function
These functions are transformations applied to the score vector $s$ coming out of the CNN before the loss computation.
Sigmoid
It squashes a vector into the range (0, 1). It is applied independently to each element $s_i$ of $s$. It is also called the logistic function:

$$f(s_i) = \frac{1}{1 + e^{-s_i}}$$
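A minimal sketch of this element-wise behavior (the score values are hypothetical):

```python
import numpy as np

scores = np.array([1.0, -2.0, 0.5])  # hypothetical net outputs s

# Sigmoid acts on each s_i independently: changing one score
# does not affect the output for the others.
probs = 1.0 / (1.0 + np.exp(-scores))
print(probs)  # each value lies in (0, 1); they need not sum to 1
```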
Softmax
Softmax is a function, not a loss. It squashes a vector into the range (0, 1), and all the resulting elements add up to 1. It is applied to the output scores $s$. Since the elements represent classes, they can be interpreted as class probabilities.
The Softmax function cannot be applied independently to each $s_i$, since it depends on all the elements of $s$. For a given class $s_i$, the Softmax function can be computed as:

$$f(s)_i = \frac{e^{s_i}}{\sum_{j=1}^{C} e^{s_j}}$$

where the $s_j$ are the scores inferred by the net for each of the $C$ classes. Note that the Softmax activation for a class $s_i$ depends on all the scores in $s$.
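
A minimal NumPy sketch of this computation (subtracting the maximum score is a standard numerical-stability trick, not part of the definition above; it leaves the result unchanged because Softmax is shift-invariant):

```python
import numpy as np

def softmax(s):
    # Shift scores by their max for numerical stability, then
    # normalize the exponentials so the outputs sum to 1.
    exps = np.exp(s - np.max(s))
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])  # hypothetical class scores s
probs = softmax(scores)
print(probs)        # ~[0.659 0.242 0.099] -- each depends on all scores
print(probs.sum())  # 1.0
```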