A Gentle Introduction to Convolutional Neural Networks

Convolutional neural networks (also known as CNNs or ConvNets) are a modified version of traditional neural networks. These networks have a wide and deep structure, which is why they are called deep neural networks and their training is referred to as deep learning. Nowadays, they are very popular because they are particularly good at classifying image-based data.


The traditional approach is to feed all image pixels directly to a neural network. In contrast, we will detect some patterns first and then feed these patterns to the neural network. In this way, we can process images in a less complex way and still get more successful results.

Suppose that you would like to classify images as cat or not cat. In the most general approach, the shapes of the ears and tails of cats are extracted from known examples. Then, we look for these shapes (filters) in new instances.

 

cnn-filter
Sample filter for cat detection in CNN

We would classify an image as a cat if it matches the ear and tail filters, as illustrated below.

cat-classification
Cat classification in CNN

Of course, the math behind convolutional neural networks is not that simple.

CNN procedures

Convolution layer

In this layer, we reduce the image size based on the filter size. For example, a 3×3 convolved feature is created when a 3×3 filter is applied to a 5×5 image, as in the following animation. In the filter, x1 items are equal to 1 whereas x0 items are equal to 0. The convolved feature is calculated by multiplying the filter element-wise with a same-sized frame of the image and summing the results.

Convolution_animation
Convolution

This animation shows the application of a single filter. In practice, we apply several filters to the image so that we can detect different patterns (e.g. separate filters for the cat's ear and tail). That is why the 2-dimensional image is transformed into a 3-dimensional convolution output in the CNN procedures illustration.
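To make the sliding-window arithmetic concrete, here is a minimal NumPy sketch of a valid convolution with stride 1. The 5×5 image and 3×3 filter values below are the commonly used textbook example, not necessarily the exact ones in the animation, so treat them as illustrative only.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid convolution with stride 1: slide the kernel over the image,
    # multiply element-wise with each same-sized frame and sum the products.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 5x5 binary image and a 3x3 filter of 1s and 0s (illustrative values)
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

print(conv2d(image, kernel))  # a 3x3 convolved feature

# Applying a second filter and stacking the results gives the 3-D output
kernel2 = np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
])
feature_maps = np.stack([conv2d(image, kernel), conv2d(image, kernel2)])
print(feature_maps.shape)  # (2, 3, 3): two feature maps, one per filter
```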

Activation layer

ReLU is computed faster than other common activation functions in both the feed-forward and backpropagation passes. That is why CNNs mostly use ReLU as the activation function.

relu-graph
ReLU

On the other hand, the softmax function should be used at the output layer of the fully connected network to produce the final class probabilities of the CNN.
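Both functions fit in a few lines of NumPy; this is my own sketch for illustration, not code from any particular library.

```python
import numpy as np

def relu(x):
    # ReLU: negative inputs become 0, positive inputs pass through unchanged
    return np.maximum(0, x)

def softmax(x):
    # Softmax: turns raw scores into probabilities that sum to 1
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([-1.0, 2.0, 0.5])
print(relu(scores))     # [0.  2.  0.5]
print(softmax(scores))  # class probabilities, summing to 1
```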

Max-pooling layer

Features are reduced again in the pooling layer. In this layer, we keep only the maximum-valued item of the convolved features within each filter-sized frame.

maxpool
Max pooling
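A minimal NumPy sketch of 2×2 max pooling with stride 2 looks like this; the 4×4 feature map values are made up for illustration.

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Keep only the maximum value inside each size x size frame
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            frame = feature_map[i * stride:i * stride + size,
                                j * stride:j * stride + size]
            out[i, j] = frame.max()
    return out

feature_map = np.array([
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
])
print(max_pool(feature_map))  # [[6. 8.]
                              #  [3. 4.]]
```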

We should apply the convolution, activation and max-pooling procedures several times. After that, we flatten the reduced features and feed them to a fully connected neural network as inputs.
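Putting the pieces together, a typical stack looks like the following Keras sketch. The post itself does not name a framework, and the input shape (64×64 RGB images) and layer sizes here are assumptions chosen only for illustration.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # convolution + ReLU activation + max-pooling, applied twice
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # flatten the reduced features and feed them to fully connected layers
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),  # e.g. cat vs. not cat
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

Each Conv2D / MaxPooling2D pair corresponds to the convolution, activation and pooling steps above, and the Flatten / Dense part is the fully connected network described next.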

Neural networks layer

This layer is a typical neural network layer. The backpropagation algorithm handles learning from this point on. You might notice that the network in the following illustration is not fully connected; only the meaningful nodes are involved in the model.

Now, the machine can tell that the following image is most probably a dog, even though I cannot!

cat-or-dog
Is this a cat or dog?

Is It Perfect?

But CNNs are not perfect. They only look for patterns; they are not interested in whether the locations of these patterns are meaningful or not. For example, swapping the ear and the lips of Kim Kardashian does not change the result.

kim-kardesian-cnn
Kim Kardashian

So, convolutional neural networks are a powerful deep learning method for pixel-based tasks. They produce successful results even on a single CPU. The most successful approach for detecting handwritten digits is still a ConvNet.

