Activation functions play a pivotal role in neural networks. Softsign is an activation function that serves as an alternative to the hyperbolic tangent. Even though tanh and softsign are closely related, tanh converges exponentially whereas softsign converges polynomially. Although softsign appears in the literature, it has not been adopted in practice as widely as tanh.
Softsign function: y = x / (1 + |x|)
Both tanh and softsign produce outputs in the range [-1, +1].
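To make this concrete, here is a minimal NumPy sketch (the sample points are arbitrary, chosen only for illustration) that evaluates both functions side by side. Notice how tanh has effectively saturated at ±1 by |x| = 10, while softsign is still around ±0.91.

```python
import numpy as np

def softsign(x):
    # softsign: x / (1 + |x|), bounded in (-1, +1)
    return x / (1 + np.abs(x))

# arbitrary sample points, just for illustration
for x in [-10.0, -3.0, -1.0, 0.0, 1.0, 3.0, 10.0]:
    print(f"x = {x:6.1f}  tanh = {np.tanh(x):+.4f}  softsign = {softsign(x):+.4f}")
```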
Derivative
We need the function’s derivative to backpropagate.
Recall the quotient rule first: (u/v)’ = (u’·v - u·v’) / v². We apply it to the softsign function:
dy/dx = [x’·(1 + |x|) - x·(1 + |x|)’] / (1 + |x|)²
First, we have to find the derivative of |x|. We already know that the following statements are true:
when x > 0, then |x|’ = 1
when x < 0, then |x|’ = -1
when x = 0, then |x|’ is undefined, because the function has no unique slope at that point.
So |x|’ is ±1 for x ≠ 0. We can express this result differently by rewriting |x| as a power:
|x| = (x²)^(1/2)
d|x|/dx = (1/2)·(x²)^(1/2 - 1)·2x = (1/2)·(x²)^(-1/2)·2x = (1/2)·[1/(x²)^(1/2)]·2x
We already know that the square root of x² is equal to |x|.
Let’s substitute (x²)^(1/2) = |x| in the equation above:
d|x|/dx = (1/2)·[1/|x|]·2x = x/|x|
So the derivative of |x| is equal to x over |x|, which is simply sign(x) for x ≠ 0.
|x|’ = x /|x|
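As a quick sanity check, the following sketch (plain NumPy, with arbitrary test points away from zero) compares a central finite difference of |x| against x/|x|:

```python
import numpy as np

def numerical_derivative(f, x, h=1e-6):
    # central finite difference approximation with step size h
    return (f(x + h) - f(x - h)) / (2 * h)

# avoid x = 0, where the derivative of |x| is undefined
for x in [-3.0, -0.5, 0.5, 3.0]:
    approx = numerical_derivative(np.abs, x)
    exact = x / np.abs(x)  # equals sign(x) for x != 0
    print(f"x = {x:+.1f}  finite difference = {approx:+.4f}  x/|x| = {exact:+.4f}")
```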
Now put this derivative term into the main equation:
dy/dx = [1·(1 + |x|) - x·(0 + |x|’)] / (1 + |x|)²
dy/dx = [1·(1 + |x|) - x·(0 + x/|x|)] / (1 + |x|)²
dy/dx = [(1 + |x|) - (x²/|x|)] / (1 + |x|)²
dy/dx = [1 + (|x| - x²/|x|)] / (1 + |x|)²
Let’s plug x = +3 and x = -3 into the term |x| - (x²/|x|):
for x = +3 -> |3| - 3²/|3| = 3 - 9/3 = 0
for x = -3 -> |-3| - [(-3)·(-3)/|-3|] = 3 - 9/3 = 0
The term |x| - (x²/|x|) is equal to 0 for both positive and negative values. Indeed, since x² = |x|², we have x²/|x| = |x|, so the term vanishes for every x ≠ 0.
dy/dx = [1 + (|x| - x²/|x|)] / (1 + |x|)²
dy/dx = [1 + 0] / (1 + |x|)²
dy/dx = 1 / (1 + |x|)²
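The derived formula is easy to verify numerically. Here is a minimal sketch (plain NumPy, arbitrary test points) that compares the analytic derivative 1 / (1 + |x|)² against a central finite difference of softsign itself:

```python
import numpy as np

def softsign(x):
    return x / (1 + np.abs(x))

def softsign_derivative(x):
    # the result derived above: 1 / (1 + |x|)^2
    return 1 / (1 + np.abs(x)) ** 2

def numerical_derivative(f, x, h=1e-6):
    # central finite difference approximation with step size h
    return (f(x + h) - f(x - h)) / (2 * h)

for x in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(f"x = {x:+.1f}  analytic = {softsign_derivative(x):.6f}  "
          f"numerical = {numerical_derivative(softsign, x):.6f}")
```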
So, softsign is one of dozens of activation functions. It may not be adopted by practitioners as widely as others, and that keeps it uncommon. But do not forget that choosing an activation function is still more art than settled science; softsign might be the most convenient transfer function for your problem.
Let’s dance
These are the dance moves of the most common activation functions in deep learning. Make sure to turn the volume up 🙂