As you might remember, weights are updated by the following formula in backpropagation.
wi = wi – α · (∂Error / ∂wi)
Alpha refers to the learning rate in the formula. Applying an adaptive learning rate means increasing or decreasing alpha based on how the cost changes from one iteration to the next. The following code block realizes this process.
a = 0.1, b = 0.5; //adaptive learning params

if(currentCost < previousCost){
    learningRate = learningRate + a; //cost decreased: increase alpha additively
}
else{
    learningRate = learningRate - b * learningRate; //cost increased: shrink alpha multiplicatively
}

previousCost = currentCost;
To sum up, the learning rate should be increased if the cost decreases; in contrast, alpha should be decreased if the cost increases. This procedure should be applied in each gradient descent iteration.
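To make the loop structure concrete, here is a minimal self-contained sketch of the rule applied at every iteration. It uses a toy one-weight quadratic cost rather than a neural network, and the cost function, the starting weight and the constants are illustrative assumptions, not code from the post.

//Minimal sketch: adaptive learning rate inside a plain gradient descent loop.
//The quadratic cost and all constants below are illustrative assumptions.
let w = 5.0; //single weight
let learningRate = 0.1; //alpha
const a = 0.1, b = 0.5; //adaptive learning params

const cost = (w) => (w - 2) * (w - 2); //Error(w), minimized at w = 2
const gradient = (w) => 2 * (w - 2); //∂Error / ∂w

let previousCost = cost(w);

for (let epoch = 0; epoch < 20; epoch++){
    w = w - learningRate * gradient(w); //wi = wi – α · (∂Error / ∂wi)
    const currentCost = cost(w);

    if(currentCost < previousCost){
        learningRate = learningRate + a; //cost decreased: increase alpha
    }
    else{
        learningRate = learningRate - b * learningRate; //cost increased: shrink alpha
    }

    previousCost = currentCost;
}

console.log(w, learningRate); //w approaches 2 while alpha is adjusted along the way

Notice the asymmetry in the rule: alpha grows by a fixed additive step when the cost improves, but shrinks multiplicatively when it gets worse, so a single overshoot is corrected quickly and the scheme stays stable.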
Convergence
Let’s transform the theory into practice. If an adaptive learning rate is applied to the classical XOR dataset, the cost decreases dramatically, whereas standard gradient descent decreases the cost only steadily. Thus, the network converges faster, as illustrated below. Of course, momentum and the adaptive learning rate are applied together here.
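For reference, momentum only changes the weight update step itself; the adaptive rule still adjusts alpha between iterations. Below is a minimal sketch of a momentum-augmented update for a single weight; the momentum coefficient mu, the initial values and the example gradient are assumptions for illustration, not code from the repository.

//Minimal sketch: momentum-augmented update for one weight.
const mu = 0.9; //momentum coefficient (assumed value)
let learningRate = 0.1; //alpha, still adapted per iteration by the rule above
let wi = 0.5; //a single weight
let previousDelta = 0.0; //Δwi from the previous iteration

function momentumStep(dErrorDwi){
    const delta = -learningRate * dErrorDwi + mu * previousDelta; //gradient step plus momentum term
    wi = wi + delta;
    previousDelta = delta;
}

momentumStep(0.3); //example gradient value
console.log(wi, previousDelta);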

So, we’ve focused on a method that improves gradient descent performance. The experiment shows that this approach speeds up convergence considerably. Moreover, it is easy to implement; only a few lines of code are needed. Finally, adaptive learning capability has been added to the neural-networks repository shared on my GitHub profile.
Support this blog if you like it!