Kaggle announced a facial expression recognition challenge in 2013. Researchers were expected to build models that detect 7 different emotions from human faces. However, even today, results remain far from excellent. That is why this topic is still a rewarding subject.
Both training and evaluation are handled with the FER2013 dataset. The compressed dataset takes 92 MB, whereas the uncompressed version takes 295 MB. There are 28K training and 3K test images in the dataset. Each image is stored as 48×48 pixels. The raw dataset consists of the image pixels (48×48 = 2304 values), the emotion label of each image, and its usage type (training or test instance).
Suppose that the dataset is already stored under the data folder. We can then read its content as shown below.
import numpy as np

# Read the raw CSV file; the first line is the header.
with open("/data/fer2013.csv") as f:
    content = f.readlines()

lines = np.array(content)
num_of_instances = lines.size
print("number of instances: ", num_of_instances)
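Each data row holds the three fields described earlier. As a minimal sketch, the row format can be checked with the standard library on a synthetic row instead of the real file. The column order (emotion, pixels, usage) follows the description above; the header text and the helper name `parse_row` are assumptions made for illustration.

```python
import csv
import io

def parse_row(row):
    """Split one FER2013-style row into (label, pixel list, usage)."""
    emotion, pixels, usage = row
    values = [int(p) for p in pixels.split(" ")]
    if len(values) != 48 * 48:
        raise ValueError("expected 2304 pixel values, got %d" % len(values))
    return int(emotion), values, usage.strip()

# Synthetic stand-in for the CSV file: an assumed header plus one all-black image.
fake_csv = "emotion,pixels,Usage\n3," + " ".join(["0"] * 2304) + ",Training\n"

reader = csv.reader(io.StringIO(fake_csv))
next(reader)  # skip the header line
label, values, usage = parse_row(next(reader))
print(label, len(values), usage)
```

Validating the pixel count up front makes a malformed row fail early rather than later, when the image is reshaped.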
Deep learning has dominated computer vision research in recent years. Even academic computer vision conferences have largely turned into deep learning events. Here, we apply convolutional neural networks to tackle this task, and we construct the CNN with Keras using the TensorFlow backend.
We've already loaded the dataset. Now the training and test sets can be stored in dedicated variables.
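import keras

num_classes = 7  # angry, disgust, fear, happy, sad, surprise, neutral

x_train, y_train, x_test, y_test = [], [], [], []

# Start at 1 to skip the header line.
for i in range(1, num_of_instances):
    emotion, img, usage = lines[i].split(",")
    val = img.split(" ")
    pixels = np.array(val, 'float32')
    emotion = keras.utils.to_categorical(int(emotion), num_classes)

    if 'Training' in usage:
        y_train.append(emotion)
        x_train.append(pixels)
    elif 'PublicTest' in usage:
        y_test.append(emotion)
        x_test.append(pixels)

# Convert the lists to arrays, scale pixels to [0, 1] and
# reshape to (samples, 48, 48, 1) to match the network input.
x_train = np.array(x_train, 'float32').reshape(-1, 48, 48, 1) / 255
x_test = np.array(x_test, 'float32').reshape(-1, 48, 48, 1) / 255
y_train = np.array(y_train, 'float32')
y_test = np.array(y_test, 'float32')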
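The call to to_categorical turns each numeric label into a one-hot vector. A plain-Python sketch of the same idea, where `one_hot` is a hypothetical helper and not part of Keras:

```python
def one_hot(label, num_classes=7):
    """Return a list with 1.0 at index `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

# Label 3 is 'happy' in the label order used later in this post.
print(one_hot(3))
```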
Time to construct the CNN structure.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
from keras.layers import Dense, Dropout, Flatten

model = Sequential()

#1st convolution layer
model.add(Conv2D(64, (5, 5), activation='relu', input_shape=(48, 48, 1)))
model.add(MaxPooling2D(pool_size=(5, 5), strides=(2, 2)))

#2nd convolution layer
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(AveragePooling2D(pool_size=(3, 3), strides=(2, 2)))

#3rd convolution layer
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(AveragePooling2D(pool_size=(3, 3), strides=(2, 2)))

model.add(Flatten())

#fully connected neural networks
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(num_classes, activation='softmax'))
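To see how the 48×48 input shrinks through this stack, the spatial size after each layer can be worked out by hand. The sketch below assumes the Keras defaults used above (no padding, stride 1 for convolutions); `conv_out` and `pool_out` are hypothetical helper names.

```python
def conv_out(size, kernel):
    """Output width of a stride-1 convolution without padding."""
    return size - kernel + 1

def pool_out(size, pool, stride):
    """Output width of a pooling layer without padding."""
    return (size - pool) // stride + 1

size = 48
size = conv_out(size, 5)               # 1st block: 5x5 convolution
size = pool_out(size, 5, 2)            # 5x5 max pooling, stride 2
size = conv_out(conv_out(size, 3), 3)  # 2nd block: two 3x3 convolutions
size = pool_out(size, 3, 2)            # 3x3 average pooling, stride 2
size = conv_out(conv_out(size, 3), 3)  # 3rd block: two 3x3 convolutions
size = pool_out(size, 3, 2)            # 3x3 average pooling, stride 2

flattened = size * size * 128          # 128 filters in the last block
print(size, flattened)
```

Under these assumptions the feature map is reduced to a single spatial position, so the Flatten layer passes 128 values to the first dense layer.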
We can now train the network. To finish training in less time, I prefer to train on randomly selected training set instances. That is why an image data generator and fit_generator are used. The loss function is categorical cross-entropy because the task is multi-class classification.
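from keras.preprocessing.image import ImageDataGenerator

batch_size = 256  # assumed value; the text does not state it
epochs = 5        # assumed value; the text does not state it

gen = ImageDataGenerator()
train_generator = gen.flow(x_train, y_train, batch_size=batch_size)

model.compile(loss='categorical_crossentropy',
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

# In recent Keras versions fit_generator is deprecated and
# model.fit accepts the generator directly.
model.fit_generator(train_generator, steps_per_epoch=batch_size, epochs=epochs)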
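For intuition about the loss, here is a small plain-Python sketch of categorical cross-entropy for a single sample. It is an illustration of the formula, not Keras's implementation, and the probability vectors are made up.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross entropy between a one-hot label and a probability vector."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0, 0, 0, 1, 0, 0, 0]  # one-hot label for class 3 ('happy')
confident = [0.01, 0.01, 0.01, 0.94, 0.01, 0.01, 0.01]
uniform = [1 / 7] * 7           # a model that guesses blindly

print(categorical_crossentropy(y_true, confident))
print(categorical_crossentropy(y_true, uniform))
```

A blind guess over 7 classes gives a loss of ln(7), while a confident correct prediction pushes the loss toward zero.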
Training is over. We can now evaluate the network.
# evaluate returns [loss, accuracy] because the model was compiled with metrics=['accuracy']
train_score = model.evaluate(x_train, y_train, verbose=0)
print('Train loss:', train_score[0])
print('Train accuracy:', 100 * train_score[1])

test_score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', test_score[0])
print('Test accuracy:', 100 * test_score[1])
I kept the number of epochs low to avoid overfitting and got the following results. When I increased the number of epochs, the model overfit.
Test loss: 2.27945706329
Test accuracy: 57.4254667071
Train loss: 0.223031098232
Train accuracy: 92.0512731201
Error rates alone don't tell the whole story, so let's try to recognize the facial expressions in some custom images.
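from keras.preprocessing import image
import matplotlib.pyplot as plt

# Newer Keras versions use color_mode="grayscale" instead of grayscale=True.
img = image.load_img("/data/pablo.png", grayscale=True, target_size=(48, 48))

x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)  # add the batch dimension
x /= 255

custom = model.predict(x)
# predict returns one row per image; pass the single row of 7 scores.
# emotion_analysis is defined in the next snippet and must be defined first.
emotion_analysis(custom[0])

x = np.array(x, 'float32')
x = x.reshape([48, 48])

plt.gray()
plt.imshow(x)
plt.show()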
Emotions are stored as numeric labels from 0 to 6. Keras produces an output array holding the scores for these 7 emotions. We can visualize each prediction as a bar chart.
def emotion_analysis(emotions):
    objects = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
    y_pos = np.arange(len(objects))

    plt.bar(y_pos, emotions, align='center', alpha=0.5)
    plt.xticks(y_pos, objects)
    plt.ylabel('percentage')
    plt.title('emotion')

    plt.show()
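Besides the bar chart, it can be handy to reduce the 7 scores to a single label. A minimal sketch, where `top_emotion` is a hypothetical helper and the scores are made up for illustration rather than taken from the model:

```python
labels = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')

def top_emotion(scores):
    """Return the (label, score) pair with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best], scores[best]

# Made-up scores for illustration, not real model output.
scores = [0.02, 0.01, 0.05, 0.80, 0.04, 0.03, 0.05]
print(top_emotion(scores))
```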
If you watch the famous Netflix series Narcos, you will be familiar with the following picture of Pablo Escobar, taken at a police station when he was taken into custody. It seems that the model we've constructed successfully recognizes Pablo's happy mood.
Secondly, we will test a scene of Marlon Brando playing Don Corleone in The Godfather, in which Corleone weeps beside his son's dead body. It seems that the model can recognize Brando's facial expression, too.
What's more, Hugh Jackman comes to my mind as an always-angry figure, so I would like to test him. Specifically, I chose a picture of Jackman as Wolverine from X-Men. The result seems very successful.
Finally, art authorities still cannot agree on Mona Lisa's emotion. The network says that Mona Lisa is in a neutral mood.
So, we've constructed a CNN model to recognize human facial expressions. As mentioned before, the model reached 57% accuracy on the test set. That can be acceptable because the winner of the Kaggle challenge got 34% accuracy.
Processing detected faces instead of the entire image increases accuracy. That's a little trick: I crop the faces manually before running the network.
The entire code of the project is pushed to GitHub. You might also want to apply transfer learning and use pre-trained weights. The pre-trained weights and the pre-constructed network structure are pushed to GitHub, too.
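A hypothetical sketch of that cropping step, assuming the face's bounding box is already known (for example, picked by hand). The helper `crop_face`, the box values and the nearest-neighbour resize are illustrative choices, not the code used in this post.

```python
import numpy as np

def crop_face(img, box, size=48):
    """Crop box=(top, left, height, width) from a 2-D array and resize to size x size."""
    top, left, h, w = box
    face = img[top:top + h, left:left + w]
    rows = np.arange(size) * h // size  # nearest-neighbour source rows
    cols = np.arange(size) * w // size  # nearest-neighbour source columns
    return face[rows][:, cols]

# Synthetic 200x300 grayscale image standing in for a real photo.
img = np.arange(200 * 300, dtype="float32").reshape(200, 300)
face = crop_face(img, box=(50, 100, 96, 96))
print(face.shape)
```

The cropped array already has the 48×48 size the network expects, so it only needs scaling and the extra batch and channel dimensions before prediction.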