Softmax as a Neural Networks Activation Function

Convolutional neural networks have popularized softmax as an activation function. However, softmax is not a traditional activation function: the other activation functions produce a single output for a single input, whereas softmax produces multiple outputs for an input array. In this way, we can build neural network models that classify more than 2 classes, instead of being limited to binary classification.

Softmax function

\sigma(x_j) = \frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}} \quad (\text{for } j = 1, \dots, n)

Softmax normalizes the input array to the scale [0, 1], and the sum of the softmax outputs is always equal to 1. So, the neural network model classifies an instance as the class whose index has the maximum output.

[Figure: Softmax function applied to the inputs x_1 = 2, x_2 = 1, x_3 = 0.1]

For example, the following results are obtained when softmax is applied to the inputs above.

\sigma(x_1) = \frac{e^{x_1}}{e^{x_1} + e^{x_2} + e^{x_3}} = \frac{e^{2}}{e^{2} + e^{1} + e^{0.1}} \approx 0.7

\sigma(x_2) = \frac{e^{x_2}}{e^{x_1} + e^{x_2} + e^{x_3}} = \frac{e^{1}}{e^{2} + e^{1} + e^{0.1}} \approx 0.2

\sigma(x_3) = \frac{e^{x_3}}{e^{x_1} + e^{x_2} + e^{x_3}} = \frac{e^{0.1}}{e^{2} + e^{1} + e^{0.1}} \approx 0.1

As seen, the inputs are normalized to [0, 1]. Also, the sum of the results is equal to 0.7 + 0.2 + 0.1 = 1.
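
We can reproduce this computation with a few lines of code. The following is a minimal NumPy sketch; the softmax helper below is my own illustration, not a library call.

```python
import numpy as np

def softmax(x):
    # Exponentiate each input, then normalize by the sum of the exponentials.
    # (In practice, np.exp(x - x.max()) is often used for numerical stability.)
    e = np.exp(x)
    return e / e.sum()

x = np.array([2.0, 1.0, 0.1])
s = softmax(x)

print(s)             # [0.659 0.242 0.099] -> roughly 0.7, 0.2, 0.1
print(s.sum())       # 1.0
print(np.argmax(s))  # 0 -> the instance is assigned to class 0
```

Notice that np.argmax returns the index of the maximum output, which is exactly how the model picks the predicted class.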

Derivative

You might remember that the activation function is consumed in the feedforward step, whereas its derivative is consumed in the backpropagation step. Now, we will find its partial derivative.

The quotient rule states that if a function can be expressed as a division of two differentiable functions, then its derivative can be expressed as illustrated below.

f(x) = \frac{g(x)}{h(x)}

f'(x) = \frac{g'(x)\,h(x) - g(x)\,h'(x)}{h(x)^2}
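
As a quick sanity check, here is a small SymPy sketch confirming that the quotient rule matches direct differentiation; the choice of g and h below is just an illustration.

```python
import sympy as sp

x = sp.symbols('x')
g = sp.exp(x)       # illustrative numerator
h = 1 + sp.exp(x)   # illustrative denominator

direct = sp.diff(g / h, x)
quotient_rule = (sp.diff(g, x) * h - g * sp.diff(h, x)) / h**2

# The difference simplifies to zero, so the two forms agree.
print(sp.simplify(direct - quotient_rule))  # 0
```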

We can apply the quotient rule to the softmax function:

\frac{\partial \sigma(x_j)}{\partial x_j} = \frac{(e^{x_j})' \cdot \sum_{i=1}^{n} e^{x_i} - e^{x_j} \cdot \left(\sum_{i=1}^{n} e^{x_i}\right)'}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2}

Firstly, you might remember that the derivative of e^x is e^x again.

Secondly, we need to differentiate the sum term:

\sum_{i=1}^{n} e^{x_i} = e^{x_1} + e^{x_2} + \dots + e^{x_j} + \dots + e^{x_n}

\frac{\partial}{\partial x_j} \sum_{i=1}^{n} e^{x_i} = 0 + 0 + \dots + e^{x_j} + \dots + 0 = e^{x_j}

Now, we can apply these derivatives in the quotient-rule expression for the softmax function.

\frac{\partial \sigma(x_j)}{\partial x_j} = \frac{e^{x_j} \cdot \sum_{i=1}^{n} e^{x_i} - e^{x_j} \cdot e^{x_j}}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2}

We can distribute the divisor over the terms of the dividend.

\frac{\partial \sigma(x_j)}{\partial x_j} = \frac{e^{x_j} \cdot \sum_{i=1}^{n} e^{x_i}}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2} - \frac{e^{x_j} \cdot e^{x_j}}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2}

\frac{\partial \sigma(x_j)}{\partial x_j} = \frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}} - \frac{e^{x_j} \cdot e^{x_j}}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2}

\frac{\partial \sigma(x_j)}{\partial x_j} = \frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}} - \left(\frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}}\right)^2

Do you notice that the pure softmax function appears in the equation above? Let's substitute it back in.

\sigma(x_j) = \frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}}

\frac{\partial \sigma(x_j)}{\partial x_j} = \sigma(x_j) - \sigma(x_j)^2

\frac{\partial \sigma(x_j)}{\partial x_j} = \sigma(x_j)\,(1 - \sigma(x_j))
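
We can verify this result numerically. Here is a minimal finite-difference sketch, reusing the softmax helper from the earlier snippet; the index j and the step eps are illustrative choices.

```python
import numpy as np

def softmax(x):
    e = np.exp(x)
    return e / e.sum()

x = np.array([2.0, 1.0, 0.1])
j, eps = 0, 1e-6

# Numerical derivative of sigma(x_j) with respect to x_j (central difference)
x_plus, x_minus = x.copy(), x.copy()
x_plus[j] += eps
x_minus[j] -= eps
numeric = (softmax(x_plus)[j] - softmax(x_minus)[j]) / (2 * eps)

# Analytic derivative: sigma(x_j) * (1 - sigma(x_j))
s = softmax(x)
analytic = s[j] * (1 - s[j])

print(numeric, analytic)  # both ~= 0.2247
```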

That is the derivative of σ(x_j) with respect to x_j. So, what would be the derivative of σ(x_j) with respect to x_k in case j is not equal to k? Again, we apply the quotient rule to the term.

\sigma(x_j) = \frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}} \quad (\text{for } j = 1, \dots, n)

\frac{\partial \sigma(x_j)}{\partial x_k} = \frac{(e^{x_j})' \cdot \sum_{i=1}^{n} e^{x_i} - e^{x_j} \cdot \left(\sum_{i=1}^{n} e^{x_i}\right)'}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2}

(e^{x_j})' = \frac{\partial e^{x_j}}{\partial x_k} = 0 \quad (\text{because } e^{x_j} \text{ is constant with respect to } x_k)

\frac{\partial \sigma(x_j)}{\partial x_k} = \frac{0 \cdot \sum_{i=1}^{n} e^{x_i} - e^{x_j} \cdot \left(\sum_{i=1}^{n} e^{x_i}\right)'}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2}

\sum_{i=1}^{n} e^{x_i} = e^{x_1} + e^{x_2} + \dots + e^{x_k} + \dots + e^{x_n}

\left(\sum_{i=1}^{n} e^{x_i}\right)' = \frac{\partial}{\partial x_k} \sum_{i=1}^{n} e^{x_i} = 0 + 0 + \dots + e^{x_k} + \dots + 0 = e^{x_k}

\frac{\partial \sigma(x_j)}{\partial x_k} = \frac{0 \cdot \sum_{i=1}^{n} e^{x_i} - e^{x_j} \cdot e^{x_k}}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2}

\frac{\partial \sigma(x_j)}{\partial x_k} = -\frac{e^{x_j} \cdot e^{x_k}}{\left(\sum_{i=1}^{n} e^{x_i}\right)^2}

\frac{\partial \sigma(x_j)}{\partial x_k} = -\frac{e^{x_j} \cdot e^{x_k}}{\left(\sum_{i=1}^{n} e^{x_i}\right)\left(\sum_{i=1}^{n} e^{x_i}\right)}

\frac{\partial \sigma(x_j)}{\partial x_k} = -\left(\frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}}\right)\left(\frac{e^{x_k}}{\sum_{i=1}^{n} e^{x_i}}\right)

\frac{\partial \sigma(x_j)}{\partial x_k} = -\,\sigma(x_j)\,\sigma(x_k)

Let's put the derivatives in a general form:

\sigma(x_j) = \frac{e^{x_j}}{\sum_{i=1}^{n} e^{x_i}} \quad (\text{for } j = 1, \dots, n)

\frac{\partial \sigma(x_j)}{\partial x_k} = \sigma(x_j)\,(1 - \sigma(x_j)), \quad \text{if } j = k

\frac{\partial \sigma(x_j)}{\partial x_k} = -\,\sigma(x_j)\,\sigma(x_k), \quad \text{if } j \neq k
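
These two cases together form the softmax Jacobian matrix, which can be written compactly as diag(s) - s s^T, an identity that follows directly from the two cases above. Here is a sketch that builds the Jacobian and checks it against finite differences; the helper name softmax_jacobian is my own illustration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x)
    return e / e.sum()

def softmax_jacobian(x):
    # J[j, k] = d sigma(x_j) / d x_k
    # diagonal:     sigma_j * (1 - sigma_j)
    # off-diagonal: -sigma_j * sigma_k
    # Both cases collapse into diag(s) - s s^T.
    s = softmax(x)
    return np.diag(s) - np.outer(s, s)

x = np.array([2.0, 1.0, 0.1])
J = softmax_jacobian(x)

# Finite-difference check of the full Jacobian, column by column
eps = 1e-6
J_num = np.zeros((len(x), len(x)))
for k in range(len(x)):
    xp, xm = x.copy(), x.copy()
    xp[k] += eps
    xm[k] -= eps
    J_num[:, k] = (softmax(xp) - softmax(xm)) / (2 * eps)

print(np.allclose(J, J_num, atol=1e-6))  # True
```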

So, surprisingly, the derivative of the softmax function is easy to derive. We mostly consume the softmax function in the final layer of convolutional neural networks, because CNNs are very good at image classification and classification studies mostly involve more than 2 classes.
