Softsign as a Neural Networks Activation Function

Activation functions play a pivotal role in neural networks. Softsign is an activation function that serves as an alternative to the hyperbolic tangent. Even though the tanh and softsign functions are closely related, tanh converges exponentially whereas softsign converges polynomially. And even though softsign appears in the literature, it has not been adopted in practice as much as tanh.

softsign_dance
Softsign function dance move (inspired by Imaginary)

Softsign function: y = x / (1 + |x|)
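If you want to try it, here is a minimal NumPy sketch of the formula above (the function name and the sample inputs are just illustrative):

```python
import numpy as np

def softsign(x):
    # Softsign activation: x / (1 + |x|), output in (-1, +1)
    return x / (1.0 + np.abs(x))

print(softsign(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# -> [-0.909 -0.5   0.    0.5   0.909] (approximately)
```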



softsign
Softsign

Both the tanh and softsign functions produce outputs in the range [-1, +1].
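The small sketch below compares how quickly the two functions approach their limits, echoing the exponential vs. polynomial convergence mentioned above (the sample points are arbitrary):

```python
import numpy as np

def softsign(x):
    return x / (1.0 + np.abs(x))

# tanh saturates much faster than softsign as |x| grows
for x in [1.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  tanh={np.tanh(x):.4f}  softsign={softsign(x):.4f}")
# at x=5, tanh is already ~0.9999 while softsign is only ~0.8333
```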

Derivative

We need the function's derivative in order to backpropagate.

Recall the quotient rule first. Applying the quotient rule to the function gives:

dy/dx = [x' · (1 + |x|) − x · (1 + |x|)'] / (1 + |x|)^2

We first have to find the derivative of |x|. We already know that the following statements are true.

abs_graph
Graph of absolute x

When x > 0, then |x|' = 1

when x < 0, then |x|' = -1

when x = 0, then |x|' is undefined because the function has no unique slope at that point.

So |x|' is ±1 for x ≠ 0. We can express that result differently.

y = |x| = (x^2)^(1/2)

dy/dx = (1/2) · (x^2)^(1/2 − 1) · 2x = (1/2) · (x^2)^(-1/2) · 2x = (1/2) · [1/(x^2)^(1/2)] · 2x

We already know that the square root of x squared is equal to the absolute value of x.

Let's substitute (x^2)^(1/2) = |x| into the equation above.

dy/dx = (1/2) · [1/|x|] · 2x = x / |x|

So, the derivative of |x| is equal to x over |x|.

|x|’ = x /|x|
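As a quick sanity check, a central finite difference of |x| should match x/|x| away from zero; this is only an illustrative verification, not part of the derivation:

```python
def numerical_derivative(f, x, h=1e-6):
    # central finite-difference approximation
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-3.0, -0.5, 0.5, 3.0]:   # x = 0 is excluded (derivative undefined there)
    analytic = x / abs(x)          # x / |x|, i.e. the sign of x
    numeric = numerical_derivative(abs, x)
    print(f"x={x:5.1f}  analytic={analytic:+.4f}  numeric={numeric:+.4f}")
```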

Now put this derivative term into the main equation.

dy/dx = [1 · (1 + |x|) − x · (0 + |x|')] / (1 + |x|)^2

dy/dx = [1 · (1 + |x|) − x · (0 + x/|x|)] / (1 + |x|)^2

dy/dx = [(1 + |x|) − (x^2 / |x|)] / (1 + |x|)^2

dy/dx = [1 + |x| − (x^2 / |x|)] / (1 + |x|)^2

Let's plug x = +3 and x = −3 into the expression |x| − (x^2 / |x|).

for x = +3 → |3| − 3^2/|3| = 3 − 9/3 = 0

for x = −3 → |−3| − [(−3)·(−3)/|−3|] = 3 − 9/3 = 0

The expression |x| − (x^2 / |x|) is equal to 0 for both positive and negative values; in general x^2/|x| = |x| for x ≠ 0, so the two terms cancel.

dy/dx = [1 + |x| − (x^2 / |x|)] / (1 + |x|)^2

dy/dx = [1 + 0] / (1 + |x|)^2

dy/dx = 1 / (1 + |x|)^2
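To double-check the result, here is a small sketch comparing the closed-form derivative with a finite-difference approximation (the sample points are arbitrary):

```python
import numpy as np

def softsign(x):
    return x / (1.0 + np.abs(x))

def softsign_derivative(x):
    # Derivative derived above: 1 / (1 + |x|)^2
    return 1.0 / (1.0 + np.abs(x)) ** 2

h = 1e-6
for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    numeric = (softsign(x + h) - softsign(x - h)) / (2 * h)
    print(f"x={x:5.1f}  analytic={softsign_derivative(x):.6f}  numeric={numeric:.6f}")
```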





softsign-and-derivative
Softsign and its derivative

So, softsign is one of dozens of activation functions. It may never be adopted by practitioners as widely as tanh, and that keeps it uncommon. But do not forget that choosing an activation function is still an open, problem-dependent question. Softsign might be the most convenient transfer function for your problem.
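If you want to experiment with it, softsign is available as a built-in activation name in tf.keras; the layer sizes and input shape below are just placeholders:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Toy binary classifier using softsign activations in the hidden layers
model = keras.Sequential([
    layers.Dense(64, activation="softsign", input_shape=(20,)),
    layers.Dense(64, activation="softsign"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()
```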

Let’s dance

These are the dance moves of the most common activation functions in deep learning. Make sure to turn the volume up 🙂

