Convolutional neural networks (aka CNN and ConvNet) are modified version of traditional neural networks. These networks have wide and deep structure therefore they are also known as deep neural networks or deep learning. Nowadays, they are so popular because they are also good at classifying image based things.
All image pixels were transferred to neural networks in traditional approach. In contrast, some patterns are detected first, and these patterns are fed to neural networks in convolutional neural networks. In this way, images can be processed in less complex way whereas the more successful results can be produced.
Suppose that you would like to classify images as cat or not cat. In the most general action, shapes of ear and tail of cats are extracted from known ones. Then, these shapes would be looked for in new instances. These shapes are also called as filters in CNN.
We would classify ones as cat if they include ear and tail filters as illustrated below.
Of course, math behind the convolutional neural networks is not limited with applying filters.
In this layer, image size would be reduced based on the filter size. For example, 3×3 sized convolved feature would be created when 3×3 sized filter would be applied to 5×5 sized image in the following animation. x1 items are equal to 1 whereas x0 items are equal to 0 in the filter. Colvolved feature is calculated by multiplying filter and same size frame in the image.
This animation is applied for one filter. Several filters should be applied to the image. In this way, we can detect different patterns (e.g. different filters for both cat ear and tail). That’s why, 2-dimensional image would be transformed to 3-dimensional convolution object in CNN procedures illustration.
On the other hand, softmax function should be consumed to connect fully connected neural networks output layer and CNN output layer.
Features reduced again in pooling layer. Convolved layer’s maximum valued items in the filter size frames would be stored in this layer.
Convolution, activation and max-pooling procedures should be applied several times. After then, reduced features would be flattened to feed fully connected neural networks as inputs.
Neural networks layer
This layer is typical neural networks layer. Backpropagation algorithm would handle learning from now on. You might realize that neural networks model is not fully connected in the following illustration. Only meaningful nodes involved in the model.
Now, machine can respond that the following image would be most probably dog, even I cannot!
But CNNs are not perfect. They only look for some patterns. They do not interested in about that the locations of these patterns are meaningful or not. For example, changing ear and lip of Kim Kardesian doesn’t change the result.
So, convolutional neural networks are powerful deep learning method for pixel based things. They produce successful results even on single CPUs. The most succefful approach for detecting handwritten digits is still ConvNets.