In neural networks, the hyperbolic tangent function can be used as an activation function in place of the sigmoid function. During backpropagation, the derivative of the activation function enters the calculation of how the error affects each weight. The derivative of the hyperbolic tangent has a simple form, just like the derivative of the sigmoid. This is one reason why hyperbolic tangent is common in neural networks.
Hyperbolic Tangent Function: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
The function produces outputs in the open interval (-1, +1): the values -1 and +1 are approached as x goes to negative and positive infinity, but they are never reached. Moreover, it is a continuous function that is defined for every real x value.
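As a quick illustration, the definition can be evaluated directly and compared with the standard library. This is a minimal sketch: the helper name `tanh` and the sample points are chosen here for illustration only.

```python
import math

def tanh(x: float) -> float:
    """Hyperbolic tangent computed directly from its exponential definition."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# Compare the hand-written version with math.tanh at a few sample points.
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x = {x:+.1f}  tanh(x) = {tanh(x):+.6f}  math.tanh(x) = {math.tanh(x):+.6f}")
```

The outputs stay strictly between -1 and +1 for these inputs. Note that this direct form raises an OverflowError for very large |x| because of math.exp, so in practice a library routine such as math.tanh is the safer choice.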
Derivative of Hyperbolic Tangent Function
Before we begin, let’s recall the quotient rule.
Suppose that function h is the quotient of function f and function g. If the derivatives of both function f and function g exist, and g(x) is not zero, then the derivative of function h is given by the following formula.
h(x) = f(x) / g(x)
d(h(x)) / dx = ( f'(x).g(x) - g'(x).f(x) ) / (g(x))^2
or d(h(x)) / dx = ( (df(x)/dx).g(x) - (dg(x)/dx).f(x) ) / (g(x))^2
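The quotient rule can be sanity-checked numerically. The sketch below uses a hypothetical pair f(x) = x^2 and g(x) = x + 3, which do not come from the original post, and compares the rule against a central finite difference.

```python
def f(x):
    return x ** 2        # hypothetical numerator

def df(x):
    return 2 * x         # derivative of the numerator

def g(x):
    return x + 3         # hypothetical denominator, nonzero near x = 1

def dg(x):
    return 1.0           # derivative of the denominator

def h(x):
    return f(x) / g(x)

def quotient_rule(x):
    """Derivative of h = f / g according to the quotient rule."""
    return (df(x) * g(x) - dg(x) * f(x)) / g(x) ** 2

def central_difference(func, x, eps=1e-6):
    """Numerical estimate of the derivative of func at x."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x0 = 1.0
print("quotient rule     :", quotient_rule(x0))
print("finite difference :", central_difference(h, x0))
```

At x = 1 the rule gives (2.4 - 1.1) / 16 = 7/16, and the finite difference should agree to several decimal places.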
We can apply this rule to the hyperbolic tangent function, because hyperbolic tangent is the quotient of the hyperbolic sine and hyperbolic cosine functions.
tanh(x) = sinh(x) / cosh(x)
d(tanh(x))/dx = ( (d(sinh(x))/dx).cosh(x) - (d(cosh(x))/dx).sinh(x) ) / (cosh(x))^2
Let’s calculate the derivatives of sinh(x) and cosh(x), starting from their definitions.
sinh(x) = (e^x - e^(-x)) / 2
cosh(x) = (e^x + e^(-x)) / 2
d(sinh(x))/dx = d( (e^x - e^(-x)) / 2 )/dx
= d(e^x/2)/dx - d(e^(-x)/2)/dx
= (1/2).(d(e^x)/dx) - (1/2).(d(e^(-x))/dx)
= (1/2).e^x - (1/2).e^(-x).(-1)
= (1/2).e^x + (1/2).e^(-x)
= (e^x + e^(-x)) / 2 = cosh(x)
d(cosh(x))/dx = d( (e^x + e^(-x)) / 2 )/dx
= d(e^x/2)/dx + d(e^(-x)/2)/dx
= (1/2).(d(e^x)/dx) + (1/2).(d(e^(-x))/dx)
= (1/2).e^x + (1/2).e^(-x).(-1)
= (1/2).e^x - (1/2).e^(-x)
= (e^x - e^(-x)) / 2 = sinh(x)
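The two results, that the derivative of sinh(x) is cosh(x) and the derivative of cosh(x) is sinh(x), can be checked numerically. This is only a sketch: the `central_difference` helper and the sample points are illustrative choices, not part of the original derivation.

```python
import math

def central_difference(func, x, eps=1e-6):
    """Numerical estimate of the derivative of func at x."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

for x in (-2.0, 0.0, 0.5, 2.0):
    d_sinh = central_difference(math.sinh, x)
    d_cosh = central_difference(math.cosh, x)
    print(f"x = {x:+.1f}")
    print(f"  d(sinh)/dx ~ {d_sinh:+.6f}   cosh(x) = {math.cosh(x):+.6f}")
    print(f"  d(cosh)/dx ~ {d_cosh:+.6f}   sinh(x) = {math.sinh(x):+.6f}")
```

Each numerical estimate should match the analytic value next to it up to small floating-point error.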
Let’s return to the calculation for the tanh function.
d(tanh(x))/dx = ( (d(sinh(x))/dx).cosh(x) - (d(cosh(x))/dx).sinh(x) ) / (cosh(x))^2
d(tanh(x))/dx = ( cosh(x).cosh(x) - sinh(x).sinh(x) ) / (cosh(x))^2
d(tanh(x))/dx = ( (cosh(x))^2 - (sinh(x))^2 ) / (cosh(x))^2
d(tanh(x))/dx = 1 - (sinh(x))^2/(cosh(x))^2 = 1 - ( sinh(x)/cosh(x) )^2
d(tanh(x))/dx = 1 - (tanh(x))^2
To sum up, the hyperbolic tangent function and its derivative are given by the following formulas:
f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
d(f(x))/dx = 1 - (f(x))^2
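Because the derivative depends only on the function's output, a backpropagation step can reuse the activation already computed in the forward pass. The sketch below illustrates this for a single hypothetical neuron with output tanh(w.x) and a squared-error loss. The names `loss`, `loss_gradient`, and the sample values are assumptions made for illustration, not the author's implementation.

```python
import math

def tanh_derivative_from_output(y):
    """Derivative of tanh expressed through its own output y = tanh(x)."""
    return 1.0 - y * y

def central_difference(func, x, eps=1e-6):
    """Numerical estimate of the derivative of func at x."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)

def loss(w, x, target):
    """Squared error of a single tanh neuron (hypothetical toy setup)."""
    return 0.5 * (math.tanh(w * x) - target) ** 2

def loss_gradient(w, x, target):
    """Chain rule: dLoss/dw = (y - target) * (1 - y^2) * x."""
    y = math.tanh(w * x)  # forward pass output, reused for the derivative
    return (y - target) * tanh_derivative_from_output(y) * x

# Check the derivative formula itself.
x0 = 0.7
print("analytic :", tanh_derivative_from_output(math.tanh(x0)))
print("numeric  :", central_difference(math.tanh, x0))

# Check the gradient of the toy loss with respect to the weight.
w0, x_in, target = 0.3, 1.5, 0.8
print("gradient analytic :", loss_gradient(w0, x_in, target))
print("gradient numeric  :", central_difference(lambda w: loss(w, x_in, target), w0))
```

In both comparisons the analytic and numerical values should agree to several decimal places.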
Proof of concept
If the formulas are confusing, a step-by-step video of the derivative calculation may help.