A Gentle Introduction to Convolutional Neural Networks

Convolutional neural networks (aka CNN and ConvNet) are modified version of traditional neural networks. These networks have wide and deep structure therefore they are also known as deep neural networks or deep learning. Nowadays, they are so popular because they are also good at classifying image based things.

All image pixels were transferred to neural networks in traditional approach. In contrast, some patterns are detected first, and these patterns are fed to neural networks in convolutional neural networks. In this way, images can be processed in less complex way whereas the more successful results can be produced.

Tradional neural networks vs convolutional neural networks

Suppose that you would like to classify images as cat or not cat. In the most general action, shapes of ear and tail of cats are extracted from known ones. Then,  these shapes would be looked for in new instances. These shapes are also called as filters in CNN.


Sample filter for cat detection in CNN

We would classify ones as cat if they include ear and tail filters as illustrated below.

Cat classification in CNN

Of course, math behind the convolutional neural networks is not limited with applying filters.

CNN procedures

Convolution layer

In this layer, image size would be reduced based on the filter size. For example, 3×3 sized convolved feature would be created when 3×3 sized filter would be applied to 5×5 sized image in the following animation. x1 items are equal to 1 whereas x0 items are equal to 0 in the filter. Colvolved feature is calculated by multiplying filter and same size frame in the image.


This animation is applied for one filter. Several filters should be applied to the image. In this way, we can detect different patterns (e.g. different filters for both cat ear and tail). That’s why, 2-dimensional image would be transformed to 3-dimensional convolution object in CNN procedures illustration.

Activation layer

ReLU produces outputs faster than other common activation functions for both feed forward and backpropagation. That’s why, it is commonly consumed in convolutional networks as activation function.


On the other hand, softmax function should be consumed to connect fully connected neural networks output layer and CNN output layer.

Max-pooling layer

Features reduced again in pooling layer. Convolved layer’s maximum valued items in the filter size frames would be stored in this layer.

Max pooling

Convolution, activation and max-pooling procedures should be applied several times. After then, reduced features would be flattened to feed fully connected neural networks as inputs.

Neural networks layer

This layer is typical neural networks layer. Backpropagation algorithm would handle learning from now on. You might realize that neural networks model is not fully connected in the following illustration. Only meaningful nodes involved in the model.

Now, machine can respond that the following image would be most probably dog, even I cannot!

Is this a cat or dog?

But CNNs are not perfect. They only look for some patterns. They do not interested in about that the locations of these patterns are meaningful or not. For example, changing ear and lip of Kim Kardesian doesn’t change the result.

Kim Kardesian

So, convolutional neural networks are powerful deep learning method for pixel based things. They produce successful results even on single CPUs. The most succefful approach for detecting handwritten digits is still ConvNets.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s