Kaggle announced a facial expression recognition challenge in 2013. Participants were expected to build models that detect 7 different emotions from human faces. Even today, published results are still far from perfect, which is why this topic remains a rewarding subject to work on.

Dataset
Both training and evaluation will be handled with the FER2013 dataset. The compressed version of the dataset takes 92 MB of space whereas the uncompressed version takes 295 MB. There are 28K training and 3K testing images in the dataset, each stored as 48×48 grayscale pixels. The raw dataset consists of the image pixels (48×48 = 2304 values), the emotion label of each image and the usage type (train or test instance).
Suppose that the dataset is already downloaded under the data folder. Then we can read its content as shown below.
import numpy as np

with open("/data/fer2013.csv") as f:
    content = f.readlines()

lines = np.array(content)
num_of_instances = lines.size
print("number of instances: ", num_of_instances)
Learning Procedure
Deep learning has dominated computer vision studies in recent years; even academic computer vision conferences have largely turned into deep learning venues. Here, we will apply convolutional neural networks (CNN) to tackle this task, building the CNN with Keras on a TensorFlow backend.
We’ve already loaded the dataset above. Now the train and test sets can be stored in dedicated variables.
import keras

num_classes = 7  # angry, disgust, fear, happy, sad, surprise, neutral

x_train, y_train, x_test, y_test = [], [], [], []

for i in range(1, num_of_instances):
    try:
        emotion, img, usage = lines[i].split(",")
        val = img.split(" ")
        pixels = np.array(val, 'float32')
        emotion = keras.utils.to_categorical(emotion, num_classes)

        if 'Training' in usage:
            y_train.append(emotion)
            x_train.append(pixels)
        elif 'PublicTest' in usage:
            y_test.append(emotion)
            x_test.append(pixels)
    except:
        pass  # skip malformed lines such as the header row
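The lists above still hold flat 2304-value vectors. Before training, they need to be cast to numpy arrays, reshaped to 48×48×1 and scaled to the [0, 1] range so that they match the input shape the network expects below. A minimal sketch is shown here; the division by 255 mirrors the normalization used in the testing section later, but treat the exact preprocessing as an assumption.

# cast to numpy arrays and scale pixel values into [0, 1]
x_train = np.array(x_train, 'float32') / 255
y_train = np.array(y_train, 'float32')
x_test = np.array(x_test, 'float32') / 255
y_test = np.array(y_test, 'float32')

# reshape flat 2304-value vectors into 48x48x1 grayscale images
x_train = x_train.reshape(x_train.shape[0], 48, 48, 1)
x_test = x_test.reshape(x_test.shape[0], 48, 48, 1)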
Time to construct the CNN structure.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, AveragePooling2D
from keras.layers import Dense, Dropout, Flatten

model = Sequential()

#1st convolution layer
model.add(Conv2D(64, (5, 5), activation='relu', input_shape=(48, 48, 1)))
model.add(MaxPooling2D(pool_size=(5, 5), strides=(2, 2)))

#2nd convolution layer
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(AveragePooling2D(pool_size=(3, 3), strides=(2, 2)))

#3rd convolution layer
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(Conv2D(128, (3, 3), activation='relu'))
model.add(AveragePooling2D(pool_size=(3, 3), strides=(2, 2)))

model.add(Flatten())

#fully connected neural networks
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.2))

model.add(Dense(num_classes, activation='softmax'))
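To double-check that the layers stack up as intended, you can print the model summary; the final layer should output num_classes (7) softmax scores.

model.summary()
print(model.output_shape)  # expected: (None, 7)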
Now we can train the network. To complete training in less time, I prefer to learn from randomly selected training instances; that is why an ImageDataGenerator and fit_generator are used. The loss function is categorical cross entropy because the task is multi-class classification.
from keras.preprocessing.image import ImageDataGenerator

batch_size = 256  # example values; tune for your hardware
epochs = 5

gen = ImageDataGenerator()
train_generator = gen.flow(x_train, y_train, batch_size=batch_size)

model.compile(loss='categorical_crossentropy'
    , optimizer=keras.optimizers.Adam()
    , metrics=['accuracy']
)

# train on batch_size randomly selected batches per epoch
model.fit_generator(train_generator, steps_per_epoch=batch_size, epochs=epochs)
Training is complete. We can now evaluate the network.
train_score = model.evaluate(x_train, y_train, verbose=0)
print('Train loss:', train_score[0])
print('Train accuracy:', 100*train_score[1])

test_score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', test_score[0])
print('Test accuracy:', 100*test_score[1])
I got the following results before falling into overfitting; I faced overfitting when I increased the number of epochs.
Test loss: 2.27945706329
Test accuracy: 57.4254667071
Train loss: 0.223031098232
Train accuracy: 92.0512731201
Confusion Matrix
Of course, accuracy alone does not give the right impression for multi-class classification problems. The confusion matrix of this model is shown below. Rows represent actual values whereas columns state predictions. For example, there are 467 angry instances in the test set; we classify 214 angry items correctly, while 9 items predicted as disgust are actually angry ones.
| | Angry | Disgust | Fear | Happy | Sad | Surprise | Neutral |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Angry | 214 | 9 | 53 | 30 | 67 | 8 | 86 |
| Disgust | 10 | 24 | 9 | 2 | 6 | 0 | 5 |
| Fear | 45 | 2 | 208 | 29 | 89 | 45 | 78 |
| Happy | 24 | 0 | 40 | 696 | 37 | 18 | 80 |
| Sad | 65 | 3 | 83 | 56 | 285 | 10 | 151 |
| Surprise | 7 | 1 | 42 | 27 | 9 | 303 | 26 |
| Neutral | 45 | 2 | 68 | 65 | 88 | 8 | 331 |
The confusion matrix above was produced with scikit-learn.
from sklearn.metrics import classification_report, confusion_matrix

predictions = model.predict(x_test)

pred_list = []; actual_list = []

for i in predictions:
    pred_list.append(np.argmax(i))

for i in y_test:
    actual_list.append(np.argmax(i))

confusion_matrix(actual_list, pred_list)
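Since classification_report is already imported above, it can also print per-class precision and recall, which is often more informative than overall accuracy for an imbalanced set like FER2013. A small usage sketch; the class-name order follows the 0 to 6 label mapping used later in this post.

labels = ['angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral']
print(classification_report(actual_list, pred_list, target_names=labels))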
Face detection
Images in the training set are already cropped so that only the facial area is in focus. This is not a must, but we should detect faces in custom test images and feed only the facial area to the neural network model. This increases accuracy dramatically.
There are several face detection solutions. OpenCV offers haar cascade and Single Shot Multibox Detector (SSD). Dlib offers Histogram of Oriented Gradients (HOG) and Max-Margin Object Detection (MMOD). Finally, Multi-task Cascaded Convolutional Networks (MTCNN) is another common solution for face detection. Haar cascade and HOG are legacy methods whereas SSD, MMOD and MTCNN are modern deep learning based solutions. You can see the detection performance of those models in the following video.
Here, you can watch how to use different face detectors in Python.
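Whichever detector you choose, the idea is the same: crop the facial area and resize it to the network's 48×48 input. A minimal sketch with OpenCV's haar cascade is shown below; the image path and the resize step are illustrative assumptions, not part of the original training pipeline.

import cv2

# load OpenCV's bundled frontal face haar cascade
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("/data/pablo.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face = gray[y:y+h, x:x+w]          # crop the facial area only
    face = cv2.resize(face, (48, 48))  # match the model's 48x48 input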
Here, RetinaFace is the cutting-edge face detection technology. It can even detect faces in a crowd, and it finds facial landmarks including eye coordinates; that is why its alignment score is very high.
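If you want to try it, the retina-face package exposes a simple interface; a minimal sketch is shown below (the image path is an assumption).

# pip install retina-face
from retinaface import RetinaFace

faces = RetinaFace.detect_faces("img.jpg")                # facial areas, landmarks and confidence
crops = RetinaFace.extract_faces("img.jpg", align=True)   # aligned face crops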
Testing
Let’s try to recognize facial expressions of custom images, because error rates alone do not tell the whole story.
from keras.preprocessing import image
import matplotlib.pyplot as plt

img = image.load_img("/data/pablo.png", grayscale=True, target_size=(48, 48))

x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x /= 255

custom = model.predict(x)
emotion_analysis(custom[0])

x = np.array(x, 'float32')
x = x.reshape([48, 48])

plt.gray()
plt.imshow(x)
plt.show()
Emotions are stored as numerical labels from 0 to 6. Keras produces an output array containing these 7 emotion scores. We can visualize each prediction as a bar chart.
def emotion_analysis(emotions):
    objects = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
    y_pos = np.arange(len(objects))

    plt.bar(y_pos, emotions, align='center', alpha=0.5)
    plt.xticks(y_pos, objects)
    plt.ylabel('percentage')
    plt.title('emotion')
    plt.show()
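Besides the bar chart, you can print the single most likely emotion by taking the argmax of the score array; a one-line sketch:

objects = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')
print("most likely emotion:", objects[np.argmax(custom[0])])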
If you watch the famous Netflix series Narcos, you will be familiar with the following picture. This photo of Pablo Escobar was taken in a police station when he was taken into custody. It seems that the model we've constructed can successfully recognize Pablo's happy mood.

Secondly, we will test a scene of Marlon Brando acting as Don Corleone in The Godfather. Corleone cries beside the dead body of his son. It seems that the model can recognize Brando's facial expression, too.

What's more, Hugh Jackman always comes to my mind as an angry figure. That's why I would like to test him; in particular, I chose a picture of Jackman as Wolverine in X-Men. The result seems very successful.

Finally, art authorities still cannot agree on Mona Lisa's emotion. The network says that Mona Lisa is in a neutral mood.

Real time solution
Besides, we can apply emotion analysis to a video stream or webcam capture. I've written a dedicated blog post about this subject; its demo is shown below.
Web cam
My colleagues and I tried to act out all the emotion classes. As seen, this implementation runs very fast.
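A minimal real-time sketch with OpenCV is shown below; the haar cascade crop and the frame loop are assumptions for illustration, and the dedicated blog post mentioned above describes the full implementation.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
objects = ('angry', 'disgust', 'fear', 'happy', 'sad', 'surprise', 'neutral')

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(gray[y:y+h, x:x+w], (48, 48)) / 255.0
        scores = model.predict(face.reshape(1, 48, 48, 1))[0]
        label = objects[np.argmax(scores)]
        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("emotion", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()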
Video streaming
Remember Mark Zuckerberg's testimony after the Cambridge Analytica scandal. Facebook lost 134 billion dollars after this news; that would make anyone unhappy, just like Mark, as detected below.
Conclusion
So, we've constructed a CNN model to recognize facial expressions of human beings. The model produces 57% accuracy on the test set. That can be acceptable given that the winner of the Kaggle challenge got 34% accuracy.
Processing detected faces instead of the entire image increases accuracy; that is a little trick. I cropped the faces manually before running the network.
The entire code of the project has been pushed to GitHub. You might also want to apply transfer learning and use pre-trained weights; the pre-trained weights and the pre-constructed network structure are pushed to GitHub, too.
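If you prefer to skip training, a minimal sketch for loading the published weights into the same architecture is shown below. The file name comes from the comments at the end of this post, and loading only works if the layer structure matches exactly.

# build the same architecture as above, then load the published weights
model.load_weights('facial_expression_model_weights.h5')

# the model is now ready for predictions without re-training,
# e.g. with a preprocessed 1x48x48x1 input as in the Testing section
scores = model.predict(x)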
This post covers my custom design for the facial expression recognition task. I was able to improve the accuracy from 57% to 66% with Auto-Keras for the same task.
If you are interested in this post, you might also be interested in deep face recognition.
Python library
Herein, deepface is a lightweight facial analysis framework covering both face recognition and demography analysis such as age, gender, race and emotion. If you are not interested in building neural network models from scratch, you might adopt deepface. It is fully open-source, available on PyPI, and lets you make predictions with a few lines of code. It also supports real-time implementations.
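A minimal emotion-analysis sketch with deepface is shown below; the image path is an assumption.

# pip install deepface
from deepface import DeepFace

result = DeepFace.analyze(img_path="img.jpg", actions=["emotion"])
print(result)  # includes the dominant emotion and per-emotion scores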

Here, you can watch how to apply facial attribute analysis in Python with just a few lines of code.
You can run deepface in real time with your web cam as well.
Also, deepface has its own UI built with React JS for real-time facial attribute analysis purposes.
Anti-Spoofing and Liveness Detection
What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, DeepFace includes an anti-spoofing feature for face verification or liveness detection.
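To my knowledge, the anti-spoofing check is exposed through an anti_spoofing flag on face extraction; a hedged sketch is shown below (the flag and the is_real field are assumptions based on recent deepface releases).

from deepface import DeepFace

# each detected face carries an is_real flag when anti_spoofing is enabled
faces = DeepFace.extract_faces(img_path="img.jpg", anti_spoofing=True)
for face in faces:
    print(face.get("is_real"))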

Hi blog poster, thank you for sharing the post.
The fer2013 file is not showing any image and cannot be opened at all after download.
Do I first need to download OpenCV or not, to read, resize and convert to grayscale?
Do I need to install numpy?
Do I need to install Keras or TensorFlow? Is Keras a library that lives inside TensorFlow?
What should I start with first? I have viewed many webpages and GitHub repos, and they all confuse me; I really do not know how to start the project at all, and I have spent many days searching without understanding. I am new to Python but need to do my FYP project.
Before running your code, do I need to apply transfer learning and use pre-trained weights? What do I do to apply transfer learning and use pre-trained weights? Do I run the given code first and then run your code?
What is the pre-trained weights file (facial_expression_model_weights.h5) to be downloaded for? After clicking and downloading it, it cannot be opened.
Hello,
1- To read fer2013, you need to install numpy but you do not have to install OpenCV.
2- Yes, you must install Keras and TensorFlow because this post shares Keras code.
3- Please follow only the steps mentioned in this post. If something confuses you, please contact me.
4- It depends. I recommend you to train on the dataset instead of applying transfer learning.
5- Once you have trained a network and understand how the system works, you might apply transfer learning. And yes, facial_expression_model_weights.h5 refers to the pre-trained weights. Would you try a different browser? I can download it in Chrome.
Thank you a lot, really. I have installed numpy and Keras, downloaded the fer2013 file and converted it to .csv. However, I am able to download facial_expression_model_weights.h5 in Chrome but unable to open it. Which app do you use to open/view it, or do I just download it and apply it directly?
Besides, there is a problem when I copy your code from this line onward:
for i in range(1, num_of_instances):
    try:
        emotion, img, usage = lines[i].split(",")
        val = img.split(" ")
        pixels = np.array(val, 'float32')
        emotion = keras.utils.to_categorical(emotion, num_classes)
        if 'Training' in usage:
            y_train.append(emotion)
            x_train.append(pixels)
        elif 'PublicTest' in usage:
            y_test.append(emotion)
            x_test.append(pixels)
    except:
        print(end=" +")
It jumps directly to the except statement and prints out ++++++.
Output:
no of instances: 35918
instance length: 2304
++++++++++++++++
Or do I need to copy all the code from start to end directly and just put it in?
Downloading facial_expression_model_weights.h5 is enough. You don't have to open it; it's a binary file. Your Python code will consume it.
Jumping to the except block would not be a problem. It seems you can read the valid lines in fer2013 because you can dump the number of instances.
Thank you very much. Doesn't overfitting mean that the test accuracy is lower than the train accuracy?
For 5 epochs I get the following result:
Train loss: 1.103590385833875
Train accuracy: 58.18036155919678
Test loss: 1.2256663155509224
Test accuracy: 53.552521594173896
For 7 epochs:
Train loss: 1.0120716843428483
Train accuracy: 62.001462955971306
Test loss: 1.1952817713726063
Test accuracy: 54.332683201984096
Is it overfitting, and if so, how can it be solved?
No, in your case, both train and test accuracy increase over the epochs. If your train accuracy increases while your test accuracy decreases, then you have fallen into overfitting.
I have the same problem as described below here: the training accuracy is higher than the testing accuracy, so I suppose that it's overfitting. How can I solve that?
I tend to use EarlyStopping and ModelCheckpoint to avoid overfitting. ModelCheckpoint saves the weights of the best epoch based on validation loss, whereas EarlyStopping terminates training if validation loss does not decrease for 200 epochs. You should pass both EarlyStopping and ModelCheckpoint to the fit command as callbacks, as illustrated below.
from keras.callbacks import ModelCheckpoint, EarlyStopping

checkpointer = ModelCheckpoint(
    filepath='model.hdf5'
    , monitor="val_loss"
    , verbose=1
    , save_best_only=True
    , mode='auto'
)

eStop = EarlyStopping(monitor='val_loss'
    , patience=200
    , verbose=1)

model.fit(
    train_x, train_y
    , epochs=5000
    , verbose=1
    , validation_data=(test_x, test_y)
    , callbacks=[eStop, checkpointer]
)
BTW, if you still want to apply transfer learning for this case, I recommend you to read this blog post: https://sefiks.com/2019/02/13/apparent-age-and-gender-prediction-in-keras/
The trick is that you freeze the earlier layers and do not update their weights. In this way, you keep the outcome of the pre-trained model. For example, the early layers of the Inception model are responsible for detecting edges; freezing the early layers lets you detect edges too. The last 3 or 4 layers are free to update their weights. In this way, the Inception model is customized for your custom problem.
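A minimal freezing sketch in Keras is shown below; the number of trainable layers (the last 4) is an assumption for illustration.

# freeze every layer except the last 4, then recompile before fine-tuning
for layer in model.layers[:-4]:
    layer.trainable = False

model.compile(loss='categorical_crossentropy',
    optimizer=keras.optimizers.Adam(), metrics=['accuracy'])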