Recently a new activation function named the Exponential Linear Unit, widely known as ELU, was introduced. Research shows that the function tends to converge the cost toward zero faster and produce more accurate results. Unlike other activation functions, ELU has an extra alpha constant, which should be a positive number.
ELU is very similar to ReLU except for negative inputs. Both are the identity function for non-negative inputs. For negative inputs, however, ELU smoothly saturates toward −α, whereas ReLU drops sharply to zero. Notice that α is equal to +1 in the following illustration.
The derivative of the activation function is fed to the backpropagation algorithm during learning. That's why both the function and its derivative should have a low computation cost.
f(x) = x if x ≥ 0 (identity function)
f(x) = α·(e^x − 1) if x < 0
As seen, ELU consists of two different equations. That's why its derivative must be calculated separately for each piece.
Firstly, the derivative of ELU is one for x greater than or equal to zero, because the derivative of the identity function is always one.
On the other hand, what would be the derivative of ELU if x is less than zero? It’s easy.
y = α·(e^x − 1) = α·e^x − α
dy/dx = α·e^x
That's the derivative, but it can be expressed more simply. Adding and subtracting α from the derivative term doesn't change the result, and it lets us express the derivative in terms of y itself.
dy/dx = α·e^x − α + α = (α·e^x − α) + α = y + α
y = α·(e^x − 1)
dy/dx = y + α
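The y + α trick means the backward pass can reuse the forward output instead of recomputing the exponential. A minimal sketch (the function names here are illustrative, not from any particular library):

```python
import math

def elu(x, alpha=1.0):
    # Forward pass: identity for x >= 0, alpha*(e^x - 1) otherwise
    if x >= 0:
        return x
    return alpha * (math.exp(x) - 1)

def elu_derivative(x, alpha=1.0):
    # Derivative is 1 for x >= 0; for x < 0 it is y + alpha,
    # where y is the forward output -- no extra exp() call needed
    if x >= 0:
        return 1.0
    return elu(x, alpha) + alpha

print(elu_derivative(3.0))   # identity region: slope is exactly 1
print(elu_derivative(-1.0))  # equals alpha * e^(-1), computed as y + alpha
```

Reusing the cached forward output this way is exactly why the derivation above matters for computation cost during backpropagation.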
So, ELU is a strong alternative to ReLU as an activation function in neural networks. Both functions are common in convolutional neural networks. Unlike ReLU, ELU can produce negative outputs.