Handwritten Digit Recognition Using CNN with Keras

Image recognition studies have reached incredible accuracy levels in the past several years. It is an undeniable fact that deep learning has surpassed traditional computer vision techniques in this field.


Previously, we applied a classical fully connected neural network to the MNIST dataset to recognize handwritten digits. That approach transfers all image pixels into fully connected layers, and it produced a 98.01% accuracy rate on the test set. This success rate is good, but not perfect.

Applying convolutional neural networks can produce much more successful results. In contrast to the traditional approach, we will transfer only the important image pixels to the fully connected layers instead of all of them. To do this, we apply filters that detect important pixel groups.
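
To build intuition for what a filter does, here is a minimal NumPy sketch (not from the original post; the image and kernel values are illustrative) that slides a 3x3 vertical-edge kernel over a tiny 5x5 image:

import numpy as np

#a tiny 5x5 "image": a bright vertical stripe in the middle
image = np.array([
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
], dtype=float)

#a 3x3 vertical edge filter
kernel = np.array([
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
], dtype=float)

#slide the kernel over the image (cross-correlation, as CNN layers compute)
h, w = image.shape
kh, kw = kernel.shape
output = np.zeros((h - kh + 1, w - kw + 1))
for i in range(output.shape[0]):
    for j in range(output.shape[1]):
        output[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)

print(output) #strong positive/negative responses on the stripe's edges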

Keras is an API that wraps common deep learning frameworks and makes building deep learning models easier. It also reduces code complexity: we can write shorter code to implement the same task. Moreover, the same Keras code can run on different backends such as TensorFlow or Theano; all you need to do is change the configuration to switch frameworks. We will use Keras to create a convolutional neural network model.
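
For reference, the backend switch lives in Keras' configuration file, typically found at ~/.keras/keras.json. A minimal sketch (exact contents may vary by Keras version):

{
    "backend": "theano",
    "image_data_format": "channels_last",
    "epsilon": 1e-07,
    "floatx": "float32"
}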

Before we begin, you need to get the TensorFlow and Keras environments up. You can follow the instructions in the TensorFlow 101 course for installation (these lectures are preview enabled).
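
Alternatively, a plain pip-based setup is usually enough for this post (a sketch; exact versions depend on your environment):

pip install tensorflow
pip install keras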

Firstly, we will import the required Keras libraries.

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator

Secondly, we load the MNIST dataset. The dataset is already separated into train and test sets, and both include features and labels.

(x_train, y_train), (x_test, y_test) = mnist.load_data()
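
We can verify the split by printing the shapes of the returned arrays:

print(x_train.shape) #(60000, 28, 28) - 60K training images of 28x28 pixels
print(y_train.shape) #(60000,) - one integer label per image
print(x_test.shape) #(10000, 28, 28)
print(y_test.shape) #(10000,)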

Thirdly, Keras requires each input sample to be a 3D matrix. So, we transform the train and test set features accordingly. Input features are two dimensional matrices of size 28x28; the pixel values remain the same, we just add a dummy channel dimension to get 28x28x1. Moreover, input features have to be between 0 and 1. That's why features are divided by 255 to normalize them to [0, 1].

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

x_train /= 255 #inputs have to be between [0, 1]
x_test /= 255
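
A quick sanity check confirms that the features are now in the expected range:

print(x_train.min(), x_train.max()) #0.0 1.0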

Dataset labels are on a scale of 0 to 9. Keras expects class labels in one-hot (binary) format rather than integers, so the following block transforms the labels accordingly (e.g., 0010000000 represents the digit 2). Note that num_classes must be defined first; there are ten digits.

num_classes = 10 #ten classes: digits 0 to 9

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
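
For instance, the label 2 becomes a 10-dimensional vector with a 1 at index 2:

print(keras.utils.to_categorical([2], num_classes))
#[[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]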

This is not a must, but we will stay loyal to the following structure. We will apply convolution and pooling operations twice. After that, the learned features are transferred to a fully connected neural network consisting of one hidden layer. You might change the structure of the network and monitor the effect on accuracy.

[Figure: CNN procedure]

Now, we will construct the CNN structure.

model = Sequential()

#1st convolution layer
model.add(Conv2D(32, (3, 3), input_shape=(28,28,1))) #apply 32 filters of size (3, 3)
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

#2nd convolution layer
model.add(Conv2D(64, (3, 3))) #apply 64 filters of size (3, 3)
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))

model.add(Flatten())

# Fully connected layer. 1 hidden layer consisting of 512 nodes
model.add(Dense(512))
model.add(Activation('relu'))

#10 outputs
model.add(Dense(10, activation='softmax'))
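
Optionally, we can print a summary of the constructed network. For this structure, the layer output shapes work out as follows:

model.summary()
#Conv2D -> (26, 26, 32), MaxPooling2D -> (13, 13, 32)
#Conv2D -> (11, 11, 64), MaxPooling2D -> (5, 5, 64)
#Flatten -> (1600), Dense -> (512), Dense -> (10)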

Notice that the connection between the fully connected hidden layer and the output layer uses a non-linear activation function: softmax. In this way, output values are normalized to [0, 1] and always sum to 1, so they can be interpreted as class probabilities. Finally, the index with the maximum value fires as the predicted digit.
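
To see what softmax does numerically, here is a tiny sketch with made-up raw scores for three classes:

import numpy as np

scores = np.array([1.0, 3.0, 0.5]) #hypothetical raw outputs
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs) #approximately [0.111 0.821 0.067] - each value in [0, 1]
print(probs.sum()) #1.0
print(np.argmax(probs)) #1 - the maximum index fires the result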

The standard training set consists of 60K instances, and it is hard to pass over all of them several times on a personal computer. That's why I prefer to feed the network with randomly selected batches via a generator. You might skip this step if you have time or strong hardware and want to work on all instances. Note that batch_size and epochs are defined here because the following code needs them.

batch_size = 250 #training configuration, also summarized at the end of the post
epochs = 10
gen = ImageDataGenerator()
train_generator = gen.flow(x_train, y_train, batch_size=batch_size)

Now, it is time to train the network.

model.compile(loss='categorical_crossentropy'
 , optimizer=keras.optimizers.Adam()
 , metrics=['accuracy']
)

model.fit_generator(train_generator
 , steps_per_epoch=batch_size #number of batches drawn from the generator per epoch
 , epochs=epochs
 , validation_data=(x_test, y_test))
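
If you prefer to train on all instances without the generator, a minimal alternative (assuming the same batch_size and epochs) is to call fit on the raw arrays:

model.fit(x_train, y_train
 , batch_size=batch_size
 , epochs=epochs
 , validation_data=(x_test, y_test))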

Now that training is complete, we can check the success metrics.

Accuracy

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', 100*score[1])
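
We can also inspect a single prediction to see the maximum index firing in practice (a minimal sketch):

import numpy as np

predictions = model.predict(x_test[0:1]) #softmax probabilities for the first test image
print(np.argmax(predictions)) #predicted digit
print(np.argmax(y_test[0])) #actual digit (labels are one-hot encoded)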

The classical fully connected neural network retrieved 98.01% accuracy, whereas the convolutional neural network exceeded the 99% accuracy limit. That is an incredible result. I ran this project on a Core i7 CPU. Moreover, increasing the batch size and number of epochs might improve the accuracy further.

[Figure: Final score]

To recap, I trained the model with the following configuration:

batch_size = 250
epochs = 10

So, convolutional neural networks take image recognition studies a step further. Also, the entire code of this post is shared on GitHub.

If you would like to dig deeper into deep learning concepts, you should check out the online course TensorFlow 101: Introduction to Deep Learning.





