Hyperbolic Secant As Neural Networks Activation Function

Hyperbolic functions are common activation functions in neural networks. Previously, we covered the hyperbolic tangent as an activation function. Now, we will look at a less common one. The hyperbolic secant, usually written sech(x), appears mostly in probability theory and in Gaussian distributions in statistics. However, we can also use the function as a neural network activation function.

[Figure: dance move for hyperbolic secant]

Some resources describe the function as the reciprocal of the hyperbolic cosine (not to be confused with the inverse function, arcosh). Remember the formula of the hyperbolic cosine.

y = 1 / cosh(x), where cosh(x) = (e^x + e^-x)/2

So, the pure form of the function is as follows.

y = 2 / (e^x + e^-x)

The function produces outputs in the range (0, 1]. The output decreases and approaches 0 as x goes to ±∞. However, the function never actually produces 0, even for very large inputs.
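To make this concrete, here is a minimal NumPy sketch of the function (the function name and the test values are my own, for illustration):

```python
import numpy as np

def sech(x):
    # hyperbolic secant: y = 2 / (e^x + e^-x)
    return 2.0 / (np.exp(x) + np.exp(-x))

print(sech(0.0))    # 1.0, the maximum of the function
print(sech(10.0))   # ~9.1e-05, close to 0 but still positive
print(sech(-10.0))  # same value, since sech is symmetric around 0
```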

[Figure: graph of sech(x)]

Derivative of hyperbolic secant

The hyperbolic secant formula contributes to the feed-forward step in neural networks. On the other hand, the derivative of the function is involved in backpropagation.
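As a rough sketch of where each piece fits, the toy single-neuron example below (the weights, bias and inputs are made up for illustration) applies sech in the feed-forward step; the backward pass would need the derivative we derive next:

```python
import numpy as np

def sech(x):
    return 2.0 / (np.exp(x) + np.exp(-x))

# illustrative weights, bias and input for a single neuron
w = np.array([0.4, -0.7])
b = 0.1
x = np.array([1.5, 2.0])

net = np.dot(w, x) + b  # weighted sum (feed-forward step)
out = sech(net)         # activation output, ~0.797 here
```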

y = 2 · (e^x + e^-x)^-1

Applying the chain rule:

dy/dx = 2 · (-1) · (e^x + e^-x)^-2 · [d(e^x + e^-x)/dx]

dy/dx = 2 · (-1) · (e^x + e^-x)^-2 · (e^x + (-1) · e^-x) = 2 · (-1) · (e^x + e^-x)^-2 · (e^x - e^-x)

dy/dx = (-2) · (e^x - e^-x)/(e^x + e^-x)^2

dy/dx = 2 · (-e^x + e^-x)/(e^x + e^-x)^2
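If you want to double-check the algebra, a quick SymPy sketch (my own verification, not part of the original derivation) confirms this form matches the symbolic derivative:

```python
import sympy as sp

x = sp.symbols('x')
y = 2 / (sp.exp(x) + sp.exp(-x))

dy = sp.diff(y, x)  # derivative computed symbolically
hand_derived = 2 * (-sp.exp(x) + sp.exp(-x)) / (sp.exp(x) + sp.exp(-x))**2

print(sp.simplify(dy - hand_derived))  # prints 0: the two forms agree
```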

Alternatively, we can rearrange the derivative into a simpler form. Adding and subtracting e^x in the numerator does not change the result.

dy/dx = 2 · (e^x - e^x - e^x + e^-x)/(e^x + e^-x)^2 = 2 · (e^x + e^-x - e^x - e^x)/(e^x + e^-x)^2

dy/dx = 2 · (e^x + e^-x)/[(e^x + e^-x) · (e^x + e^-x)] - 2 · (e^x + e^x)/(e^x + e^-x)^2

dy/dx = 2/(e^x + e^-x) - (2 · 2e^x)/[(e^x + e^-x) · (e^x + e^-x)]

dy/dx = 2/(e^x + e^-x) - 2 · [2/(e^x + e^-x)] · [e^x/(e^x + e^-x)]

You might notice that the term above contains the hyperbolic secant function itself. Substitute y for 2/(e^x + e^-x).

dy/dx = y - 2y · [e^x/(e^x + e^-x)]
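Putting the final form into code, here is a small sketch (names are illustrative) that also cross-checks the analytic derivative against a numerical finite-difference approximation:

```python
import numpy as np

def sech(x):
    return 2.0 / (np.exp(x) + np.exp(-x))

def sech_derivative(x):
    # dy/dx = y - 2y * [e^x / (e^x + e^-x)]
    y = sech(x)
    return y - 2.0 * y * np.exp(x) / (np.exp(x) + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)
analytic = sech_derivative(x)

# central finite-difference approximation as a sanity check
eps = 1e-6
numeric = (sech(x + eps) - sech(x - eps)) / (2.0 * eps)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```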

[Figure: graph of hyperbolic secant and its derivative]

Notice that both the function and its derivative are computationally expensive: each requires evaluating exponential terms.
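One practical observation (mine, not from the derivation above): since dy/dx also equals -y · tanh(x), the backward pass can reuse the output y that was already computed in the feed-forward step instead of re-evaluating the exponentials:

```python
import numpy as np

def sech(x):
    return 2.0 / (np.exp(x) + np.exp(-x))

x = np.linspace(-3.0, 3.0, 13)
y = sech(x)  # already available from the feed-forward step

cheap = -y * np.tanh(x)  # dy/dx = -y * tanh(x), reusing y
full = y - 2.0 * y * np.exp(x) / (np.exp(x) + np.exp(-x))  # form derived above

print(np.allclose(cheap, full))  # True: same values, fewer exponentials
```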

Let’s dance

These are the dance moves of the most common activation functions in deep learning. Make sure to turn the volume up 🙂

