Image recognition has reached remarkable accuracy levels over the past several years. Deep learning has clearly overtaken traditional computer vision techniques in this field.
Previously, we applied a classical neural network to the MNIST dataset to recognize handwritten digits. That approach feeds all image pixels into a fully connected neural network, and it produced a 98.01% accuracy rate on the test set. This success rate is good, but it is not perfect.
Applying convolutional neural networks can produce much better results. In contrast to the traditional approach, only the important features of the image are passed to the fully connected network instead of all raw pixels. Filters are applied to the picture to detect these important features.
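To make the idea of a filter concrete, here is a minimal pure-Python sketch of sliding a 3x3 filter over a small image. The image and the vertical-edge filter are hypothetical values chosen for illustration; a real CNN learns its filter weights during training, and Keras does not compute convolutions this way internally.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the weighted sum at each position, as a CNN layer does."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Hypothetical 5x5 image: dark on the left (0), bright on the right (1)
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-picked vertical-edge filter (an assumption for illustration)
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 3, 3], [0, 3, 3], [0, 3, 3]]
```

The filter returns 0 over the flat dark region and a positive value wherever its window covers the dark-to-bright edge, which is what "detecting important pixels" means here.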
Keras is a high-level API that sits on top of common deep learning frameworks and makes building deep learning models easier. It also reduces code complexity: the same task takes less code in Keras. In addition, the same Keras code can run on different backends such as TensorFlow or Theano. All you need to do is change the configuration to switch the backend. We will use Keras to create a convolutional neural network model.
Before you begin, you need to get the TensorFlow and Keras environments up and running. You can follow the instructions in the TensorFlow 101 course for installation (these lectures are preview enabled).
First, we import the required Keras libraries.
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator
Second, we load the MNIST dataset. This dataset is already separated into train and test sets. Both the train and test sets include features and labels.
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Third, Keras requires 3D matrices for input features, so we transform the train and test set features into 3D matrices. Each input is a two-dimensional matrix of size 28×28. The pixel values stay the same; we just add a dummy dimension so that each matrix becomes 28x28x1. Moreover, input features have to be between 0 and 1. That is why the features are divided by 255, which normalizes them to [0, 1].
# shape[0] is the number of images; add a trailing channel dimension
x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

x_train /= 255  # inputs have to be between [0, 1]
x_test /= 255
Dataset labels range from 0 to 9. Keras requires binary (one-hot) class labels here. The following block transforms the labels to binary format (e.g. label 2 is represented as 0010000000).
num_classes = 10  # digits 0 to 9

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
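The transformation above can be sketched in plain Python to show what a one-hot label looks like. The helper `one_hot` is a hypothetical name used only for illustration; it is not part of Keras.

```python
def one_hot(label, num_classes=10):
    """Return a list of zeros with a single 1 at the label's index."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector

encoded = one_hot(2)
print("".join(str(v) for v in encoded))  # 0010000000
```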
This is not mandatory, but we will stick to the following structure. Convolution and pooling operations are applied twice. After that, the learned features are passed to a fully connected neural network with one hidden layer. You might change the structure of the network and monitor the effect on accuracy.
Now, we construct the CNN structure.
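As a rough illustration of the pooling step, the sketch below applies 2x2 max pooling to a small feature map in pure Python. The input values are made up, and `max_pool2d` is a hypothetical helper rather than the Keras implementation; it assumes non-overlapping windows, which matches a pool size of (2, 2) with the default stride.

```python
def max_pool2d(feature_map, pool_size=2):
    """Keep the largest value in each non-overlapping window.
    Leftover rows or columns that do not fill a window are dropped."""
    out_rows = len(feature_map) // pool_size
    out_cols = len(feature_map[0]) // pool_size
    pooled = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            window = [
                feature_map[r * pool_size + i][c * pool_size + j]
                for i in range(pool_size)
                for j in range(pool_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

Pooling halves the width and height while keeping the strongest response in each region, which is why it reduces the amount of data passed on to the next layer.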
model = Sequential()

# 1st convolution layer: apply 32 filters of size (3, 3)
model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

# 2nd convolution layer: apply 64 filters of size (3, 3)
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())

# Fully connected layer: 1 hidden layer consisting of 512 nodes
model.add(Dense(512))
model.add(Activation('relu'))

# 10 outputs, one per digit
model.add(Dense(10, activation='softmax'))
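It can help to trace how the image size changes through the layers above. The arithmetic below is a sketch that assumes the Keras defaults for these layers: no padding and stride 1 for the convolutions, non-overlapping 2x2 pooling, and a bias term in every layer.

```python
def conv_output(size, kernel=3):
    """Output width/height of a convolution with no padding and stride 1."""
    return size - kernel + 1

def pool_output(size, pool=2):
    """Output width/height of non-overlapping pooling."""
    return size // pool

size = 28
size = conv_output(size)   # 26 after the 1st convolution
size = pool_output(size)   # 13 after the 1st pooling
size = conv_output(size)   # 11 after the 2nd convolution
size = pool_output(size)   # 5 after the 2nd pooling
flattened = size * size * 64  # 64 feature maps of 5x5

# Trainable parameters per layer: weights plus one bias per output unit
params_conv1 = 32 * (3 * 3 * 1 + 1)
params_conv2 = 64 * (3 * 3 * 32 + 1)
params_dense = flattened * 512 + 512
params_out = 512 * 10 + 10
total_params = params_conv1 + params_conv2 + params_dense + params_out

print(flattened)     # 1600
print(total_params)  # 843658
```

Under these assumptions the fully connected part receives 1,600 values per image, and most of the parameters sit in the 512-node hidden layer. Calling model.summary() on the real model is the way to confirm the numbers.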
You may have noticed that the output layer of the fully connected network uses a non-linear activation function, softmax. In this way, output values are normalized to the range [0, 1], and the sum of the outputs is always equal to 1. Finally, the index of the maximum output gives the predicted digit.
The standard dataset consists of 60K training instances. It is hard to process all instances several times on a personal computer. That is why I prefer to train the network on randomly selected batches. You might skip this step if you have time or strong hardware and want to work on all instances.
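The behaviour of softmax can be checked with a few lines of plain Python. The raw scores below are hypothetical values for the 10 output nodes; the function is a minimal sketch of the standard formula, not the Keras implementation.

```python
import math

def softmax(scores):
    """Turn raw scores into values in [0, 1] that sum to 1."""
    highest = max(scores)
    # Subtracting the maximum avoids overflow and does not change the result
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs for digits 0 to 9
scores = [0.5, 1.2, 4.0, 0.1, -1.0, 0.3, 0.0, 2.1, 0.7, -0.5]

probabilities = softmax(scores)
predicted_digit = probabilities.index(max(probabilities))

print(round(sum(probabilities), 6))  # 1.0
print(predicted_digit)               # 2
```

The largest raw score always maps to the largest probability, so taking the index of the maximum output picks the same class either way; softmax just makes the outputs readable as probabilities.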
# batch_size comes from the configuration at the end of this post
gen = ImageDataGenerator()
train_generator = gen.flow(x_train, y_train, batch_size=batch_size)
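Conceptually, the generator hands out shuffled mini-batches of the training set. The sketch below imitates that idea with the standard library only; `batch_indices` is a hypothetical helper, and this is not how Keras implements the generator.

```python
import random

def batch_indices(num_samples, batch_size, rng):
    """Shuffle sample indices and yield them in consecutive batches."""
    order = list(range(num_samples))
    rng.shuffle(order)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batches = list(batch_indices(60000, 250, rng))

print(len(batches))     # 240
print(len(batches[0]))  # 250
```

With 60,000 training images and a batch size of 250, one full pass over the data takes 240 batches. The training call in this post asks for 250 steps per epoch, so each epoch draws slightly more than one pass.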
Now, it is time to train the network.
model.compile(
    loss='categorical_crossentropy',
    optimizer=keras.optimizers.Adam(),
    metrics=['accuracy']
)

# Newer Keras versions accept a generator in model.fit directly
model.fit_generator(
    train_generator,
    steps_per_epoch=batch_size,
    epochs=epochs,
    validation_data=(x_test, y_test)
)
Once the network is trained, we can check the success metrics.
# evaluate returns [loss, accuracy] because one metric was compiled in
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', 100 * score[1])
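The categorical cross-entropy loss used above compares the one-hot label with the softmax output. Here is a minimal pure-Python sketch for a single sample; the two predictions are hypothetical, and the small `eps` guard against log(0) is an assumption of this sketch.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log of the probability assigned to the true class."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

# One-hot label for the digit 2
y_true = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

# Hypothetical prediction that is confident and correct
confident = [0.01] * 10
confident[2] = 0.91

# Hypothetical prediction that spreads probability evenly
unsure = [0.1] * 10

loss_confident = categorical_crossentropy(y_true, confident)
loss_unsure = categorical_crossentropy(y_true, unsure)

print(loss_confident < loss_unsure)  # True
```

The loss shrinks as the probability assigned to the correct digit approaches 1, which is what the optimizer pushes the network towards during training.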
The classical fully connected neural network reached 98.01% accuracy, whereas the convolutional neural network exceeded 99% accuracy. That is an impressive result. All these tests were performed on a Core i7 CPU. Moreover, increasing the batch size and the number of epochs may increase the accuracy further.
Finally, I created the model with the following configuration. These variables must be defined before the generator and training steps above are run.
batch_size = 250
epochs = 10
So, convolutional neural networks take image recognition a step further. Also, the entire code is shared on GitHub.
If you would like to dig deeper into deep learning concepts, you should check out the online course TensorFlow 101: Introduction to Deep Learning.