Leaky ReLU as a Neural Network Activation Function

Convolutional neural networks have made the ReLU activation function very popular. Common alternatives such as sigmoid or tanh have upper limits and saturate, whereas ReLU does not saturate for positive inputs. However, it still tends to saturate for negative inputs. Herein, we will make a small modification so that the function produces a constant times the input value for negative inputs. In this way, the function does not saturate in either direction.

Parametric ReLU, or PReLU, has a general form: it produces the maximum of x and αx. Leaky ReLU, or LReLU, is a customized version of PReLU in which the constant multiplier α is fixed to 0.1; some sources set α to 0.01 instead. Finally, Randomized ReLU picks a random α value for each session.
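As a minimal sketch of the family (the α = 0.01 default and the sampling range below are illustrative assumptions, not fixed by any standard), the variants differ only in how α is chosen:

import random

def prelu(x, alpha):
    # PReLU: returns the maximum of x and alpha * x
    # (in practice, alpha is a learnable parameter)
    return max(x, alpha * x)

def lrelu(x, alpha=0.01):
    # Leaky ReLU: alpha is a small fixed constant (0.1 or 0.01)
    return prelu(x, alpha)

def rrelu(x, low=0.1, high=0.3):
    # Randomized ReLU: alpha is sampled randomly (range chosen here for illustration)
    return prelu(x, random.uniform(low, high))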



Leaky ReLU Dance Move (Inspired by Imaginary)

Function

We will handle the feed-forward pass of PReLU as coded below.

def leaky_relu(alpha, x):
    # scale negative inputs by alpha, pass positive inputs through unchanged
    if x <= 0:
        return alpha * x
    else:
        return x
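For instance, with α = 0.1, a few sample calls behave as follows:

print(leaky_relu(0.1, 5))   # positive input passes through unchanged: 5
print(leaky_relu(0.1, -5))  # negative input is scaled by alpha: -0.5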

The graph of the function is demonstrated below.

PReLU

Derivative

Similarly, the derivative of the function is α for negative values and 1 for positive inputs. We'll calculate the derivative as coded below. So, the derivative of PReLU is very similar to a step function.

def derive_leaky_relu(alpha, x):
    # slope is 1 on the positive side and alpha on the negative side
    if x >= 0:
        return 1
    else:
        return alpha
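Again with α = 0.1, the derivative is constant on each side of zero:

print(derive_leaky_relu(0.1, 5))   # positive input: 1
print(derive_leaky_relu(0.1, -5))  # negative input: 0.1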

Multiplying small numbers by other small numbers produces even smaller numbers. So, a constant multiplier picked between 0 and 1, as in LReLU, may cause trouble in recurrent neural networks, because a unit in an RNN is connected to itself and its output also serves as its own input.
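As a rough, hypothetical illustration of that effect, repeatedly scaling a gradient by α = 0.1 over 10 time steps of a self-connected unit shrinks it dramatically:

alpha = 0.1
gradient = 1.0
# back-propagating through 10 steps of a unit whose input stayed negative
# multiplies the gradient by alpha at every step
for step in range(10):
    gradient *= alpha
print(gradient)  # roughly 1e-10, effectively a vanished gradient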

Conclusion

Even though its contribution to the field is a matter of debate and some researchers believe it is unstable, recent studies have reported that using PReLU consistently improves convergence. Remember that designing a neural network structure, including the optimal activation function, is still more art than science, and no option can be ruled out without testing.

Let’s dance

These are the dance moves of the most common activation functions in deep learning. Make sure to turn the volume up 🙂

