Sigmoid Function as Neural Network Activation Function

The sigmoid function (aka logistic function) is one of the most commonly chosen activation functions in neural networks, because its derivative is easy to compute. It produces outputs in the range (0, 1), while its input is only meaningful roughly between -5 and +5; outside this range the function saturates and produces almost the same outputs. In this post, we'll walk through the proof of the derivative calculation.

Figure: Sigmoid dance move (imaginary)



The sigmoid function is formulated as follows:

f(x) = 1 / (1 + e^(-x))

Figure: Sigmoid Function
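To see these properties numerically, here is a minimal sketch in plain Python; the helper name and the sample points are illustrative:

```python
import math

def sigmoid(x):
    """Sigmoid (logistic) function: squashes any real input into (0, 1)."""
    return 1 / (1 + math.exp(-x))

# Outputs always stay between 0 and 1, and beyond roughly [-5, +5]
# the function saturates: the output barely changes anymore.
for x in [-10, -5, -1, 0, 1, 5, 10]:
    print(f"sigmoid({x:+d}) = {sigmoid(x):.5f}")
```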

 

The function can also be written in the following equivalent form, moving the divisor up as a factor with a negative exponent:

f(x) = (1) · (1 + e^(-x))^(-1) = (1 + e^(-x))^(-1)

Derivative of the sigmoid function

d f(x) / dx = (-1) · (1 + e^(-x))^(-1-1) · d(1 + e^(-x))/dx

d f(x) / dx = (-1) · (1 + e^(-x))^(-2) · (e^(-x)) · d(-x)/dx

d f(x) / dx = (-1) · (1 + e^(-x))^(-2) · (e^(-x)) · (-1)

d f(x) / dx = e^(-x) / (1 + e^(-x))^2
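For example, at x = 0 this evaluates to e^0 / (1 + e^0)^2 = 1 / 4 = 0.25, which is in fact the maximum value the derivative can take.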

That's the derivative of the sigmoid function. However, it can be expressed in a simpler form. Let's add and subtract 1 in the numerator; this way, the result is unchanged.

d f(x) / dx = (e^(-x) + 1 - 1) / (1 + e^(-x))^2

d f(x) / dx = [(1 + e^(-x)) / (1 + e^(-x))^2] - [1 / (1 + e^(-x))^2]

d f(x) / dx = [1 / (1 + e^(-x))] - [1 / (1 + e^(-x))^2]

d f(x) / dx = [1 / (1 + e^(-x))] - [1 / (1 + e^(-x))] · [1 / (1 + e^(-x))]

d f(x) / dx = (1 / (1 + e^(-x))) · [1 - (1 / (1 + e^(-x)))]

If f(x) is substituted for 1 / (1 + e^(-x)) in the equation above, the formula becomes:

d f(x) / dx = f(x) · (1 - f(x))
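The result is easy to sanity-check numerically. Here is a minimal sketch, assuming only the standard library, that compares f(x) · (1 - f(x)) against a central finite-difference approximation of the derivative; the function names and probe points are illustrative:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_derivative(x):
    """Analytic derivative: f(x) . (1 - f(x))."""
    fx = sigmoid(x)
    return fx * (1 - fx)

# A central finite-difference approximation should match the analytic form.
h = 1e-6
for x in [-2.0, 0.0, 2.0]:
    numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
    print(f"x={x:+.1f}  analytic={sigmoid_derivative(x):.6f}  numeric={numeric:.6f}")
```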

To sum up, the sigmoid function and its derivative are given by the following formulas:

f(x) = 1 / (1 + e^(-x))

d f(x) / dx = f(x) · (1 - f(x))

Proof of concept

If those formulas confuse you, the step-by-step derivative calculation video may help you understand them.

Let’s dance

These are the dance moves of the most common activation functions in deep learning. Make sure to turn the volume up 🙂

 




12 Comments

    1. You might try to manipulate the equation as 1 / (1 + e^(-n*x)) and try different n values, e.g. 7 or 9. This modification will change the slope of the equation, and it still has a derivative, but it will not be the sigmoid anymore.

      Alternatively, I recommend you research tanh, the hyperbolic tangent function.
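A minimal sketch of this idea in plain Python; the scaled_sigmoid helper and the choice n = 7 are illustrative:

```python
import math

def scaled_sigmoid(x, n=7):
    """1 / (1 + e^(-n*x)): a steeper curve than the plain sigmoid."""
    return 1 / (1 + math.exp(-n * x))

# tanh produces outputs in (-1, +1) instead of (0, 1); it is a shifted
# and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1.
for x in [-1.0, -0.1, 0.0, 0.1, 1.0]:
    print(f"x={x:+.1f}  scaled_sigmoid={scaled_sigmoid(x):.5f}  tanh={math.tanh(x):.5f}")
```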
