Convolutional neural networks (aka CNNs or ConvNets) are a modified version of traditional neural networks. They have a wide and deep structure, which is why they are called deep neural networks, or simply deep learning. Nowadays they are very popular because they are especially good at classifying image-based data.
The traditional approach is to feed all image pixels directly to a neural network. In contrast, a CNN first detects patterns in the image and then feeds those patterns to the network. In this way, we can process images in a less complex way and still get more successful results.
Suppose that you would like to classify images as cat or not cat. Roughly speaking, the shapes of cat ears and tails are extracted from known cat images. Then, we look for these shapes (filters) in new instances.
We would classify an image as a cat if it matches the ear and tail filters, as illustrated below.
Of course, the math behind convolutional neural networks is not that simple.
Convolution layer
In this layer, we reduce the image size based on the filter size. For example, a 3×3 convolved feature is created when a 3×3 filter is applied to a 5×5 image, as in the following animation. In the filter, x1 items are equal to 1 whereas x0 items are equal to 0. The convolved feature is calculated by multiplying the filter element-wise with a same-sized frame of the image and summing the results.
The animation shows the application of a single filter. In practice, we apply several filters to the image so that we can detect different patterns (e.g. separate filters for cat ears and tails). That is why a 2-dimensional image is transformed into a 3-dimensional convolution object in CNN illustrations.
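To make the filtering step concrete, here is a minimal sketch in plain NumPy. The 5×5 image and the 3×3 filter values are illustrative placeholders that only match the sizes mentioned above, not the exact numbers in the animation.

```python
import numpy as np

# Hypothetical 5x5 single-channel image and 3x3 filter (illustrative values only)
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])

kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

def convolve2d(image, kernel):
    """Slide the filter over the image and sum the element-wise products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

print(convolve2d(image, kernel))  # 3x3 convolved feature
```

Applying several different kernels in the same way and stacking their outputs is what produces the 3-dimensional convolution object mentioned above.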
Activation layer
ReLU produces outputs faster than other common activation functions for both feed forward and backpropagation. That is why CNNs mostly use ReLU as the activation function.
On the other hand, the softmax function should be used in the output layer, where the fully connected network produces the final CNN predictions as class probabilities.
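As a small illustration, here is how ReLU and softmax could be written in NumPy; the example scores are made up.

```python
import numpy as np

def relu(x):
    # ReLU: max(0, x), applied element-wise after each convolution
    return np.maximum(0, x)

def softmax(x):
    # softmax: turns the final layer's raw scores into class probabilities
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([-1.0, 2.0, 0.5])
print(relu(scores))     # [0.  2.  0.5]
print(softmax(scores))  # probabilities that sum to 1
```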
Max-pooling layer
Features are reduced again in the pooling layer. In this layer, we keep only the maximum-valued item of the convolved feature within each filter-sized frame.
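A minimal NumPy sketch of max-pooling, assuming a 2×2 frame with a stride of 2 and an arbitrary example feature map:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """Keep only the maximum value inside each size x size frame."""
    h, w = feature_map.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

conv_output = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [3, 1, 2, 2],
    [0, 2, 4, 3],
])
print(max_pool(conv_output))  # [[6. 5.], [3. 4.]]
```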
We apply the convolution, activation and max-pooling procedures several times. After that, we flatten the reduced features and feed them to a fully connected neural network as inputs; a sketch of the full pipeline appears after the next section.
Neural networks layer
This layer is a typical neural network layer. The backpropagation algorithm handles learning from this point on. You might notice that the network in the following illustration is not fully connected; only the meaningful nodes are involved in the model.
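Putting the layers together, a minimal Keras sketch of such a pipeline could look like the following. The filter counts, the 28×28 input size and the 10-class softmax output are assumptions for illustration, not values from this post.

```python
# Minimal sketch of the whole pipeline with tf.keras (assumed framework)
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    # convolution + ReLU activation, followed by max-pooling, repeated twice
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    # flatten the reduced features and feed them to a fully connected network
    Flatten(),
    Dense(128, activation='relu'),
    # softmax in the output layer produces class probabilities
    Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```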
Now the machine can tell that the following image is most probably a dog, even when I cannot!
Is It Perfect?
But CNNs are not perfect. They only look for certain patterns; they are not interested in whether the locations of those patterns are meaningful or not. For example, swapping Kim Kardashian's ear and lips does not change the result.
So, convolutional neural networks are a powerful deep learning method for pixel-based tasks. They produce successful results even on a single CPU. The most successful approach for recognizing handwritten digits is still ConvNets.