Hyperbolic Secant As Neural Networks Activation Function

Hyperbolic functions are common activation functions in neural networks. Previously, we covered the hyperbolic tangent as an activation function. Now, we will look at a less common one. The hyperbolic secant, usually written sech(x), appears mostly in probability theory and in Gaussian distributions in statistics. However, we can also use the function as a neural network activation function.

[Figure: dance move for hyperbolic secant]

Some resources describe the function as the reciprocal of the hyperbolic cosine (not to be confused with the inverse function, arcosh). Remember the formula of the hyperbolic cosine.

y = 1 / cosh(x), where cosh(x) = (e^x + e^-x)/2

So, the pure form of the function is as follows.

y = 2 / (e^x + e^-x)

The function produces outputs in the range (0, 1]. The output decreases and approaches 0 as x goes to ±∞. However, the function never actually produces 0, even for very large inputs.
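To make this concrete, here is a minimal NumPy sketch of the function (the function name and the test values are my own, for illustration):

```python
import numpy as np

def sech(x):
    # hyperbolic secant: y = 2 / (e^x + e^-x)
    return 2.0 / (np.exp(x) + np.exp(-x))

print(sech(0.0))    # 1.0, the maximum of the function
print(sech(10.0))   # ~9.1e-05, close to 0 but still positive
print(sech(-10.0))  # same value, since sech is symmetric around 0
```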

[Figure: graph of sech(x)]

Derivative of hyperbolic secant

The hyperbolic secant formula contributes to the feed-forward step in neural networks. On the other hand, the derivative of the function is involved in backpropagation.
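As a rough sketch of where each piece fits, the toy single-neuron example below (the weights, bias and inputs are made up for illustration) applies sech in the feed-forward step; the backward pass would need the derivative we derive next:

```python
import numpy as np

def sech(x):
    return 2.0 / (np.exp(x) + np.exp(-x))

# illustrative weights, bias and input for a single neuron
w = np.array([0.4, -0.7])
b = 0.1
x = np.array([1.5, 2.0])

net = np.dot(w, x) + b  # weighted sum (feed-forward step)
out = sech(net)         # activation output, ~0.797 here
```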

y = 2 · (e^x + e^-x)^-1

Applying the chain rule:

dy/dx = 2 · (-1) · (e^x + e^-x)^-2 · [d(e^x + e^-x)/dx]

dy/dx = 2 · (-1) · (e^x + e^-x)^-2 · (e^x + (-1) · e^-x) = 2 · (-1) · (e^x + e^-x)^-2 · (e^x - e^-x)

dy/dx = (-2) · (e^x - e^-x)/(e^x + e^-x)^2

dy/dx = 2 · (-e^x + e^-x)/(e^x + e^-x)^2
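If you want to double-check the algebra, a quick SymPy sketch (my own verification, not part of the original derivation) confirms this form matches the symbolic derivative:

```python
import sympy as sp

x = sp.symbols('x')
y = 2 / (sp.exp(x) + sp.exp(-x))

dy = sp.diff(y, x)  # derivative computed symbolically
hand_derived = 2 * (-sp.exp(x) + sp.exp(-x)) / (sp.exp(x) + sp.exp(-x))**2

print(sp.simplify(dy - hand_derived))  # prints 0: the two forms agree
```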

Alternatively, we can rearrange the derivative into a simpler form. Adding and subtracting e^x in the numerator does not change the result.

dy/dx = 2 · (e^x - e^x - e^x + e^-x)/(e^x + e^-x)^2 = 2 · (e^x + e^-x - e^x - e^x)/(e^x + e^-x)^2

dy/dx = 2 · (e^x + e^-x)/[(e^x + e^-x) · (e^x + e^-x)] - 2 · (e^x + e^x)/(e^x + e^-x)^2

dy/dx = 2/(e^x + e^-x) - (2 · 2e^x)/[(e^x + e^-x) · (e^x + e^-x)]

dy/dx = 2/(e^x + e^-x) - 2 · [2/(e^x + e^-x)] · [e^x/(e^x + e^-x)]

You might notice that the term above contains the hyperbolic secant function itself. Substitute y for 2/(e^x + e^-x).

dy/dx = y - 2y · [e^x/(e^x + e^-x)]
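Putting the final form into code, here is a small sketch (names are illustrative) that also cross-checks the analytic derivative against a numerical finite-difference approximation:

```python
import numpy as np

def sech(x):
    return 2.0 / (np.exp(x) + np.exp(-x))

def sech_derivative(x):
    # dy/dx = y - 2y * [e^x / (e^x + e^-x)]
    y = sech(x)
    return y - 2.0 * y * np.exp(x) / (np.exp(x) + np.exp(-x))

x = np.linspace(-3.0, 3.0, 7)
analytic = sech_derivative(x)

# central finite-difference approximation as a sanity check
eps = 1e-6
numeric = (sech(x + eps) - sech(x - eps)) / (2.0 * eps)
print(np.allclose(analytic, numeric, atol=1e-6))  # True
```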

[Figure: graph of hyperbolic secant and its derivative]

Notice that both the function and its derivative are computationally expensive: each requires evaluating exponential terms.
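One practical observation (mine, not from the derivation above): since dy/dx also equals -y · tanh(x), the backward pass can reuse the output y that was already computed in the feed-forward step instead of re-evaluating the exponentials:

```python
import numpy as np

def sech(x):
    return 2.0 / (np.exp(x) + np.exp(-x))

x = np.linspace(-3.0, 3.0, 13)
y = sech(x)  # already available from the feed-forward step

cheap = -y * np.tanh(x)  # dy/dx = -y * tanh(x), reusing y
full = y - 2.0 * y * np.exp(x) / (np.exp(x) + np.exp(-x))  # form derived above

print(np.allclose(cheap, full))  # True: same values, fewer exponentials
```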

Let’s dance

These are the dance moves of the most common activation functions in deep learning. Make sure to turn the volume up 🙂

