ReLU as a Neural Network Activation Function

The rectified linear unit, more widely known as ReLU, has become popular over the past several years thanks to its performance and speed. In contrast to other common activation functions, ReLU is piecewise linear: its derivative is either 0 or 1. However, you might remember that the derivatives of activation functions are used in backpropagation. So, what makes ReLU different from a plain linear function?

[Image: ReLU function dance move (imaginary)]

The ReLU function produces 0 when x is less than or equal to 0, and it is equal to x when x is greater than 0. The function output can be generalized as max(0, x).

[Image: ReLU function]
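For concreteness, here is a minimal NumPy sketch of the function and its derivative (not tied to any particular deep learning framework):

```python
import numpy as np

def relu(x):
    # Element-wise max(0, x)
    return np.maximum(0, x)

def relu_derivative(x):
    # 1 where x > 0, otherwise 0 (the value at exactly x = 0 is taken as 0 here)
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))             # -> [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))  # -> [0. 0. 0. 1. 1.]
```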

Previously, we’ve mentioned the softplus function. The secret is that the ReLU function is very similar to softplus everywhere except near 0. In fact, smoothing ReLU gives rise to the softplus function, as illustrated below.

[Image: ReLU vs Softplus]
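To see how close the two functions are, a small sketch can evaluate both at a few sample points, using the standard definition softplus(x) = ln(1 + e^x):

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def softplus(x):
    # softplus(x) = ln(1 + e^x), the smooth counterpart of ReLU
    return np.log1p(np.exp(x))

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"x = {x:+.1f}   relu = {relu(x):.4f}   softplus = {softplus(x):.4f}")
# The two curves almost coincide away from 0; the visible gap is near x = 0,
# where softplus(0) = ln(2) ≈ 0.6931 while relu(0) = 0.
```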

Pros

The sigmoid function produces outputs in the range [0, +1]. Similarly, the tanh function produces results in the range [-1, +1]. Both functions saturate: they produce nearly the same output for any sufficiently large positive input, and likewise for any sufficiently large negative input. This means the gradient of these functions shrinks toward 0 as x increases or decreases, which is the vanishing gradient problem. ReLU avoids this problem for positive inputs, because its derivative is 1 when x is greater than 0 (and 0 when x is less than or equal to 0).
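As a numerical illustration of this point, the sketch below compares the sigmoid gradient σ(x)(1 − σ(x)) with the ReLU gradient for increasingly large inputs (the sample values are chosen arbitrarily):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # Derivative of ReLU: 1 for x > 0, 0 otherwise
    return 1.0 if x > 0 else 0.0

for x in [1.0, 5.0, 10.0, 20.0]:
    print(f"x = {x:5.1f}   sigmoid gradient = {sigmoid_grad(x):.2e}   relu gradient = {relu_grad(x):.1f}")
# The sigmoid gradient shrinks toward 0 as x grows, while the ReLU gradient stays at 1.
```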

What’s more, the dataset must be normalized if the output of the activation function has upper and lower limits. We can skip this task for ReLU-based systems, because the function produces outputs in the range [0, +∞).

Finally, calculating the function’s output and gradient is an easy task because it does not involve exponentials. Thus, both the feed-forward and backpropagation steps can be processed quickly. That’s why experiments show ReLU to be about six times faster than other well-known activation functions, and it is the reason ReLU is commonly used in convolutional neural networks.
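If you want to check the computational argument yourself, a rough timing sketch such as the one below compares the two operations; the absolute numbers depend entirely on your hardware and NumPy build, so treat it as an illustration rather than a benchmark:

```python
import timeit
import numpy as np

x = np.random.randn(1_000_000)

relu_time = timeit.timeit(lambda: np.maximum(0, x), number=100)
sigmoid_time = timeit.timeit(lambda: 1.0 / (1.0 + np.exp(-x)), number=100)

# ReLU only needs a comparison per element, while sigmoid needs an exponential,
# so the ReLU timing is typically the smaller of the two.
print(f"relu:    {relu_time:.3f} s")
print(f"sigmoid: {sigmoid_time:.3f} s")
```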
