Handling Overfitting with Dropout in Neural Networks

Overfitting is a troublemaker for neural networks. Designing a network structure that is too complex can cause overfitting. Dropout was introduced to overcome this problem: the technique randomly drops nodes during training. Experiments show that dropout reduces overfitting significantly.

Dropout of the McFly family (Back to the Future)

Neural networks, particularly deep neural networks, have wide and deep structures. Even though these structures can solve many non-linear problems, they may also fall into overfitting. Overfitting means learning the training data too well: memorizing the training set can make the model lose its way on unseen examples. Instead of re-designing the network structure, dropping out units can win the day.



Procedure

Applying this technique is a very easy task. You need to ignore some units randomly while training the network, both in the feed forward and back propagation passes. In this way, you can prevent overfitting. The operation drops both the units and their connections. Dropped units can be located in hidden layers as well as in the input or output layer. Additionally, training time per epoch reduces dramatically. A minimal sketch of the operation follows the figure below.

Dropout operation
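
To make the operation concrete, here is a minimal sketch of a dropout mask in plain NumPy. This is not the code used in this post; it just assumes a hidden-layer activation vector and a 20% drop ratio, and it uses the common inverted-dropout scaling so no rescaling is needed at test time.

import numpy as np

def dropout_forward(activations, drop_ratio=0.2, training=True):
    if not training:
        return activations #at test time, activations pass through unchanged
    mask = np.random.rand(*activations.shape) > drop_ratio #keep each unit with probability 1 - drop_ratio
    return activations * mask / (1.0 - drop_ratio) #zero dropped units, scale the survivors

hidden = np.array([0.7, 0.1, 0.9]) #hypothetical hidden-layer outputs for one sample
print(dropout_forward(hidden)) #some entries become 0 during training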

Dropout can be applied in Keras easily. You might have already applied it several times if you follow this blog. It is enough to state the dropout ratio for a layer as demonstrated below. Here, 20% of that layer's units would be dropped.

I will apply dropout to the basic XOR example.

from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

model = Sequential()
model.add(Dense(3, input_shape=(len(attributes[0]),))) #3 hidden units, num of features in input layer
model.add(Activation('sigmoid')) #activation function from input layer to 1st hidden layer
model.add(Dropout(0.2)) #drop 20% of the hidden units during training
model.add(Dense(len(labels[0]))) #num of classes in output layer
model.add(Activation('softmax')) #activation function from 1st hidden layer to output layer

We can compare the loss change for the raw model and the dropout-applied version.
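
The plotting code below refers to two training histories. As a rough sketch of how they could be produced, assuming raw_model and dropout_model are two compiled networks and x and y hold the XOR features and one-hot labels, the histories would come from two fit calls; the optimizer and epoch count here are illustrative, not the post's original settings.

raw_model.compile(loss='categorical_crossentropy', optimizer='adam') #hypothetical raw network without dropout
dropout_model.compile(loss='categorical_crossentropy', optimizer='adam') #hypothetical network with Dropout(0.2)
rawmodel_score = raw_model.fit(x, y, epochs=1000, verbose=0) #History object used below
dropout_score = dropout_model.fit(x, y, epochs=1000, verbose=0) #History object used below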


import matplotlib.pyplot as plt
plt.plot(rawmodel_score.history['loss'], label='raw model')
plt.plot(dropout_score.history['loss'], label='dropout applied')
plt.legend(); plt.show()

Even though the dropout-applied model shows some instability, it reduces the loss significantly at some epochs, whereas the raw model cannot get close.

Effect on the XOR problem

So, dropout shows that small is beautiful even for the most complex systems. You should not overrate it, but you should use it: it is basically a regularization technique. Today, even the most complicated neural network models, such as Inception V3, include dropout units.

Bonus: The idea of dropout was first proposed by Geoffrey Hinton. You might be familiar with this name if you are interested in AI. He is one of the pioneers behind back-propagation and deep neural networks, and he is often called the Godfather of AI. This technique drew my attention because of its creator.





