Softplus as a Neural Networks Activation Function

An activation unit calculates the net output of a neuron in a neural network. The backpropagation algorithm multiplies the error signal by the derivative of the activation function, so the chosen activation function has to be differentiable. For example, the step function is useless in backpropagation: its derivative is zero everywhere except at the jump, where it is undefined, so no error can be propagated back through it. Beyond differentiability, researchers tend to prefer activation functions with meaningful, easy-to-compute derivatives, although that is not a strict requirement. That’s why sigmoid and hyperbolic tangent are the most common activation functions in the literature. Softplus is newer than sigmoid and tanh; it was first introduced in 2001. It is an alternative to the traditional functions because it is differentiable and its derivative is easy to derive. Besides, it has a surprising derivative!


Softplus function: f(x) = ln(1+e^x)

The function is illustrated below.

Outputs produced by the sigmoid and tanh functions have upper and lower limits, whereas the softplus function produces outputs in the range (0, +∞). That’s the essential difference.
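To make the difference in output ranges concrete, here is a minimal Python sketch that evaluates softplus next to sigmoid and tanh. The function names and the rearranged, overflow-safe form of ln(1+e^x) are my own choices for illustration, not something the original formula requires.

```python
import math

def softplus(x):
    # ln(1 + e^x) rewritten as max(x, 0) + ln(1 + e^(-|x|)),
    # which is the same value but avoids overflow for large x.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    # Two branches so that exp() is never called with a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Sigmoid stays inside (0, 1) and tanh inside (-1, 1),
# while softplus keeps growing with x and never goes below 0.
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(x, softplus(x), sigmoid(x), math.tanh(x))
```

For large positive inputs softplus behaves almost like the identity function, and for large negative inputs it approaches 0 without reaching it.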

Derivative

You might remember that the derivative of ln(x) is 1/x. Let’s apply this rule, together with the chain rule, to the softplus function.

f'(x) = dy/dx = (ln(1+e^x))' = (1/(1+e^x)) · (1+e^x)' = (1/(1+e^x)) · e^x = e^x / (1+e^x)

So, we’ve calculated the derivative of the softplus function. However, we can transform this derivative into an alternative form. Let’s factor e^x out of the denominator.

dy/dx = e^x / (1+e^x) = e^x / ( e^x · (e^-x + 1) )

Now the numerator and denominator both contain the factor e^x, so we can cancel it.

dy/dx = 1 / (1 + e^-x)

So, that’s the derivative of the softplus function in a simpler form. You might notice that the derivative is equal to the sigmoid function. Softplus and sigmoid are like Russian dolls: one is nested inside the other!
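The result can also be checked numerically. The sketch below compares a central finite-difference approximation of the softplus derivative with the sigmoid function at a few sample points. The helper name numerical_derivative and the step size h are assumptions made for this example.

```python
import math

def softplus(x):
    # Overflow-safe form of ln(1 + e^x).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    # 1 / (1 + e^-x), written so exp() never overflows.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def numerical_derivative(f, x, h=1e-5):
    # Central difference: (f(x + h) - f(x - h)) / (2h).
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    approx = numerical_derivative(softplus, x)
    exact = sigmoid(x)
    # The two columns should agree to several decimal places.
    print(x, approx, exact)
```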

To sum up, the following equation and derivative belong to the softplus function. We can use softplus as an activation function in our neural network models.

f(x) = ln(1+e^x)

dy/dx = 1 / (1 + e^-x)
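As a rough illustration of how these two formulas are used together, the following sketch trains a single neuron with a softplus activation by gradient descent. Everything here is hypothetical and chosen only for the example: the squared-error loss, the learning rate, the starting weights, and the single training sample.

```python
import math

def softplus(z):
    # Forward pass: f(z) = ln(1 + e^z), in an overflow-safe form.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Backward pass: the derivative of softplus is 1 / (1 + e^-z).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_step(w, b, x, target, lr=0.1):
    # One gradient descent step for y = softplus(w*x + b)
    # with the loss 0.5 * (y - target)^2.
    z = w * x + b
    y = softplus(z)
    loss = 0.5 * (y - target) ** 2
    dloss_dy = y - target
    dy_dz = sigmoid(z)  # chain rule: this is where the derivative is multiplied in
    w -= lr * dloss_dy * dy_dz * x
    b -= lr * dloss_dy * dy_dz
    return w, b, loss

w, b = 0.5, 0.0  # arbitrary starting values
losses = []
for _ in range(200):
    w, b, loss = train_step(w, b, x=1.0, target=2.0)
    losses.append(loss)

print(losses[0], losses[-1])  # the loss should shrink over the iterations
```

Note that the backward pass needs nothing beyond a sigmoid evaluation of the same net input used in the forward pass.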

Proof of concept

Additionally, I’ve recorded a step-by-step video of the derivative calculation for the softplus function.

Let’s dance

These are the dance moves of the most common activation functions in deep learning. Make sure to turn the volume up 🙂


4 Comments

Apollys says:

July 16, 2018 at 10:59 pm

Your graph of the softplus function is a little misleading, what it’s actually showing is a scaled version, something like:

5y = ln(1 + e^(5x) )

1. Sefik Serengil says:
  
  July 17, 2018 at 1:20 am
  
  The function 5y = ln(1 + e^(5x)) is a differentiable function, but it is not softplus. Why do you think it is misleading?
  
  1. Soft Plush says:
    
    May 14, 2019 at 11:05 pm
    
    Because softplus(0) = ln(1+e^0) = ln(2) ~ 0.693, but on the graph it’s somewhere around 0.1 – 0.2, and softplus(1) ~ 1.3, but on the graph, it’s much closer to 1 than 1.5.
    


Discover more from Sefik Ilkin Serengil