ELU as a Neural Network Activation Function

Recently, a new activation function named the Exponential Linear Unit, widely known as ELU, was introduced. Research shows that the function tends to drive the cost towards zero faster and to produce more accurate results. Unlike other activation functions, ELU has an extra alpha constant (α), which should be a positive number.

[Figure: ELU Dance Move (Inspired from Imaginary)]

ELU is very similar to ReLU except for negative inputs. Both are identity functions for non-negative inputs. For negative inputs, however, ELU saturates smoothly until its output equals -α, whereas ReLU bends sharply at zero. Notice that α is equal to +1 in the following illustration.


[Figure: ELU and ReLU functions are very similar]

Derivative

The derivative of the activation function is fed to the backpropagation algorithm during learning. That's why both the function and its derivative should have a low computational cost.

f(x) = x if x ≥ 0 (identity function)

f(x) = α·(eˣ – 1) if x < 0
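
For concreteness, here is a minimal NumPy sketch of this piecewise definition; the function name and the default α = 1 are illustrative choices rather than part of any official implementation.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for x >= 0, alpha * (e^x - 1) for x < 0."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

print(elu([-3.0, -1.0, 0.0, 2.0]))  # negative inputs saturate towards -alpha
```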

As seen, ELU consists of two different equations. That's why their derivatives should be calculated separately.

Firstly, the derivative of ELU is one for x greater than or equal to zero, because the derivative of the identity function is always one.

On the other hand, what would be the derivative of ELU if x is less than zero? It’s easy.

y = α·(eˣ – 1) = α·eˣ – α

dy/dx = α·eˣ

That's the derivative, but it can be expressed in a simpler way. Adding and subtracting α in the derivative term doesn't change the result, and it lets us express the derivative in terms of the function's own output.

dy/dx = α·eˣ – α + α = (α·eˣ – α) + α = y + α

y = α·(eˣ – 1)

dy/dx = y + α
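
Putting both pieces together, here is a small sketch of the full ELU derivative that uses the y + α shortcut for negative inputs; again, the function name and α = 1 are illustrative assumptions.

```python
import numpy as np

def elu_derivative(x, alpha=1.0):
    """Derivative of ELU: 1 for x >= 0, y + alpha (= alpha * e^x) for x < 0."""
    x = np.asarray(x, dtype=float)
    y = np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))  # the ELU output itself
    return np.where(x >= 0, 1.0, y + alpha)             # reuse y instead of a second exponential

print(elu_derivative([-3.0, -1.0, 0.0, 2.0]))  # ≈ [0.0498, 0.3679, 1.0, 1.0]
```

Reusing the already computed output y avoids evaluating the exponential a second time during backpropagation, which is exactly why the y + α form keeps the computational cost low.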

[Figure: ELU and its derivative]

So, ELU is a strong alternative to ReLU as an activation function in neural networks. Both functions are common in convolutional neural networks. Unlike ReLU, ELU can produce negative outputs.
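
As a rough illustration of that point, in a framework such as Keras, ELU can be used in a convolutional network simply by naming it as the layer activation; the toy architecture below is an arbitrary example, not something from the original post.

```python
import tensorflow as tf

# "elu" is a built-in Keras activation (alpha defaults to 1.0)
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="elu", input_shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()
```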

Let’s dance

These are the dance moves of the most common activation functions in deep learning. Make sure to turn the volume up 🙂

