Gradient descent is one of the most powerful optimizing method. However, learning time is a challange, too. Standard version of gradient descent learns slowly. That’s why, some modifications are included into the gradient descent in real world applications. These approaches would be applied to converge faster. In previous post, incorporating momentum is already mentioned to work for same purpose. Now, we will focus on applying adaptive learning to learn faster.

As you might remember, weights are updated by the following formula in back propagation.

w_{i} = w_{i} – α . (∂_{Error} / ∂_{wi})

Alpha refers to learning rate in the formula. Applying adaptive learning rate proposes to increase / decrease alpha based on cost changes. The following code block would realize this process.

To sum up, learning rate should be increased if cost decreases. In contrast, alpha should be decreased if cost increases. This procedure should be applied in each gradient descent iteration.

Let’s transform the theory to the practice. If adaptive learning rate is applied on classical xor dataset, cost value decreases dramatically whereas standard gradient descent decreases cost stable. Thus, network could converge faster as illustrated below. Of course, momentum incorporation and adaptive learning rate are applied both.

So, we’ve focused on a method that improve gradient descent performance. The experiment shows that this approach improves performance considerably. Moreover, it is easy to implement. Only a few code lines could perform this action. Finally, adaptive learning capability is added to neural-networks repository shared on my GitHub profile.