Modern face recognition pipelines consist of four common stages: detection, alignment, representation and verification. These stages might be confusing for beginners. In this post, we take a step back and walk through a face recognition pipeline conceptually. You should follow the links to dive deeper into each concept.
Vlog
The following video covers a hands-on face recognition workshop from scratch in Python.
🙋‍♂️ You may consider enrolling in my top-rated machine learning course on Udemy.
DeepFace
We will use the deepface framework for Python in this post. You can install it by running the following command.
!pip install deepface
Stage 1 and 2: Detection and Alignment
There are several face detection solutions. OpenCV offers Haar cascade and Single Shot Multibox Detector (SSD); Dlib offers Histogram of Oriented Gradients (HOG) and a CNN-based Max-Margin Object Detection (MMOD); finally, Multi-task Cascaded Convolutional Networks (MTCNN) is another common solution for face detection.
Here, you can watch how to use different face detectors in Python.
Alignment is easy once the face and eyes are detected. Experiments show that applying face alignment increases model accuracy by more than 1%. Unfortunately, neither OpenCV nor dlib offers face alignment as an out-of-the-box function, so we have to do some trigonometry here to align faces.
You can find out the math behind face alignment in the following video.
Here, RetinaFace is the cutting-edge face detection technology. It can even detect faces in a crowd, and it finds facial landmarks including eye coordinates. That's why its alignment score is very high.
Herein, deepface offers both face detection and face alignment as a function. It wraps OpenCV's haar cascade, SSD, dlib HoG, MTCNN and RetinaFace, and it does the trigonometry needed to align faces. You just need to pass the path of the image. If you do not set the detector_backend argument, it will use its default configuration, OpenCV's haar cascade.
import numpy as np
from deepface import DeepFace
from deepface.commons import functions

model_name = "VGG-Face"
target_size = functions.find_target_size(model_name=model_name)  # e.g. (224, 224) for VGG-Face

# extract_faces returns a list of detected faces; we take the first one
img1 = DeepFace.extract_faces(img_path="img1.jpg", target_size=target_size, detector_backend="mtcnn")[0]["face"]
img2 = DeepFace.extract_faces(img_path="img2.jpg", target_size=target_size, detector_backend="mtcnn")[0]["face"]

img1 = np.expand_dims(img1, axis=0)  # (224, 224, 3) to (1, 224, 224, 3)
img2 = np.expand_dims(img2, axis=0)  # (224, 224, 3) to (1, 224, 224, 3)
Stage 2.5: Normalization
Face detectors return faces in a rectangular area, so detected faces come with some noise such as background pixels. Here, dlib can find 68 facial landmarks. We can extract the exact facial area and get rid of any noise this way. This optional step is called normalization in facial recognition.
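For example, here is a minimal sketch of extracting those 68 landmarks with dlib. It assumes the shape_predictor_68_face_landmarks.dat model file has been downloaded from dlib's model zoo and placed next to the script.

import dlib
import cv2

img = cv2.imread("img1.jpg")

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

faces = detector(img, 1)  # upsample once to catch smaller faces
for face in faces:
    landmarks = predictor(img, face)
    # the 68 (x, y) landmark coordinates of this face
    points = [(landmarks.part(i).x, landmarks.part(i).y) for i in range(68)]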
In addition, MediaPipe can find 468 landmarks. Please see its real time implementation in the following video. Recommended tutorials: Deep Face Detection with MediaPipe, Zoom Style Virtual Background Setup with MediaPipe.
Stage 3: Representation
Deep learning only appears in this representation stage. We will feed face images to a convolutional neural network model, but the task here is not classification. We will use CNN models to find embeddings, similar to autoencoders.
The most popular face recognition models are VGG-Face, Google FaceNet, OpenFace and Facebook DeepFace. Luckily, these models are all provided by the deepface framework for Python as well. You can build these models as illustrated below.
model_name = "VGG-Face" model = DeepFace.build_model(model_name = model_name)
These models have different input and output shapes. For example, VGG-Face expects (224, 224, 3) shaped inputs and returns a 2622-dimensional vector as output, whereas Google FaceNet expects (160, 160, 3) shaped inputs and returns a 128-dimensional vector. Notice that we have to pass the expected input shape to the extract_faces call in the detection and alignment stage. We can get the input shape expected by the built model as shown below, so you must retrieve the input shape before calling extract_faces.
from deepface.commons import functions

model_name = "VGG-Face"
target_size = functions.find_target_size(model_name=model_name)
Question: how were those models trained?
These face recognition models were previously built to classify the identities of face images on a large scale data set. Consider a data set containing 1M images of 1000 unique persons. The output layer of the CNN model would have 1000 nodes in this case, and the model is trained to find the identities of the fed images. When training is over, the output layer is dropped and the layer just before it becomes the new output layer. Now, the new model will not classify identities but return representations of faces. We can feed new images that do not appear in the training data set and the model still finds representations.
Dashed lines in the final layer mean exactly this in the Facebook DeepFace architecture.
This concept is called Siamese networks in the literature.
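To make the idea concrete, here is a hedged Keras sketch of cutting the classification head. Here, classifier stands for a hypothetical model trained to recognize 1000 identities; the layer just before its softmax output becomes the embedding layer.

from tensorflow.keras.models import Model

# classifier is a hypothetical CNN trained to classify 1000 identities
embedding_model = Model(
    inputs=classifier.input,
    outputs=classifier.layers[-2].output  # the layer just before the softmax output
)

# the new model returns representations even for identities it has never seen
embedding = embedding_model.predict(img1)[0]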
Representations
We’ve detected and aligned face images and fed them to a face recognition model in the previous steps. Now we have a vector representation for each image. This is an abstract concept; to make it concrete, I will visualize it.
I will transform the 1D vectors into 2D matrices by repeating the vector itself. In this way, each row of the matrix will hold the same information.
img1_representation = model.predict(img1)[0].tolist()
img2_representation = model.predict(img2)[0].tolist()

img1_graph = []
img2_graph = []

for i in range(0, 200):
    img1_graph.append(img1_representation)
    img2_graph.append(img2_representation)

img1_graph = np.array(img1_graph)
img2_graph = np.array(img2_graph)
This is similar to legacy barcodes. They just store data horizontally. If you damage a barcode horizontally, you can still read its data; however, vertical damage causes data loss.
To visualize the representations, the following code block will help us.
import matplotlib.pyplot as plt

fig = plt.figure()

ax1 = fig.add_subplot(3, 2, 1)
plt.imshow(img1[0][:, :, ::-1])
plt.axis('off')

ax2 = fig.add_subplot(3, 2, 2)
im = plt.imshow(img1_graph, interpolation='nearest', cmap=plt.cm.ocean)
plt.colorbar()

ax3 = fig.add_subplot(3, 2, 3)
plt.imshow(img2[0][:, :, ::-1])
plt.axis('off')

ax4 = fig.add_subplot(3, 2, 4)
im = plt.imshow(img2_graph, interpolation='nearest', cmap=plt.cm.ocean)
plt.colorbar()
The VGG-Face representation has 2622 slots horizontally. Each slot is represented with a different color, and the meaning of the colors is explained in the colorbar on the right.
If we set the face recognition model to Google FaceNet, the representation would have a different shape and content: it would have 128 dimensions.
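For instance, assuming the same deepface version as above, we could build FaceNet and confirm the embedding length. Note that the face must be re-extracted with FaceNet's own target size.

facenet = DeepFace.build_model("Facenet")

facenet_target_size = functions.find_target_size(model_name="Facenet")  # (160, 160)
facenet_face = DeepFace.extract_faces(img_path="img1.jpg",
                                      target_size=facenet_target_size,
                                      detector_backend="mtcnn")[0]["face"]
facenet_input = np.expand_dims(facenet_face, axis=0)  # (1, 160, 160, 3)

print(facenet.predict(facenet_input).shape)  # expected to be (1, 128)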
So, we will decide whether these two images are the same person or not based on those vector representations instead of the face images themselves.
Question: which single face recognition model is the best?
We could use VGG-Face, FaceNet, OpenFace or DeepFace to find representations of faces. They are all state-of-the-art face recognition models. Some are designed by tech giants such as Google and Facebook, whereas others are designed by top universities such as the University of Oxford or Carnegie Mellon University. So, which single model performs better than the others? Let’s have a short discussion about this topic.
Stage 4: Verification
We will compare the vector representations of the images. The easiest way to compare two vectors is to find the Euclidean distance between them. We all remember it from the Pythagorean theorem in our high school days. However, that was a 2-dimensional equation; here we have an n-dimensional vector as a representation.
To adapt the Pythagorean theorem to n-dimensional space, we will find the squared difference of each slot value in our representations. This new vector is the distance vector, and the square root of the sum of its slots will be the distance.
distance_vector = np.square(np.array(img1_representation) - np.array(img2_representation))
distance = np.sqrt(distance_vector.sum())
We can visualize the distance vector as well.
distance_graph = []
for i in range(0, 200):
    distance_graph.append(distance_vector)
distance_graph = np.array(distance_graph)

ax6 = fig.add_subplot(3, 2, 6)
im = plt.imshow(distance_graph, interpolation='nearest', cmap=plt.cm.ocean)
plt.colorbar()

plt.show()
The distance vector appears in the third row. As seen, its slots are mostly green colored. Notice that green represents values close to 0.
Let’s look at a pair of different people.
Decision
We know that the distance would be 0 if we fed the same image twice, because the representations would be identical and the difference in each slot would be 0 as well.
Besides, we see that the distance value is smaller when we feed images of the same person, and it increases when we feed images of different people. So, we will check whether the distance value is smaller than a threshold value.
Threshold
However, what is the threshold value that determines whether the distance is small enough to classify a pair as the same person?
This is a very deep topic as well. Here, you can find a detailed post about determining the threshold in a face recognition pipeline. Besides, the following vlog covers how to fine-tune the threshold value in a face recognition pipeline.
To sum up, the Euclidean distance value for the VGG-Face model should be checked as shown below.
if distance <= 0.55: return True else: return False
The threshold should be different for different face recognition models. My experiments show that thresholds should be tuned as demonstrated below.
def findThreshold(model_name):
    threshold = 0

    if model_name == 'VGG-Face':
        threshold = 0.55
    elif model_name == 'OpenFace':
        threshold = 0.55
    elif model_name == 'Facenet':
        threshold = 10
    elif model_name == 'DeepFace':
        threshold = 64

    return threshold
BTW, we can also use cosine similarity to compare vectors. You can see the threshold values for cosine distances here.
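As a minimal sketch, the cosine distance between the two representations could be computed as follows. Lower values again mean more similar faces, but the matching threshold differs from the Euclidean one.

a = np.array(img1_representation)
b = np.array(img2_representation)

cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
cosine_distance = 1 - cosine_similarity  # 0 means identical direction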
Testing
I’ve applied the Facebook DeepFace model in real time in the following video. Results are satisfactory for both accuracy and speed, aren’t they?
Namesakes
As seen, face recognition is mainly based on comparing two images. We do not train a CNN model with multiple photos of each identity; we just feed an image. That’s why this concept is also called one-shot learning in the literature. Besides, some sources refer to this technology as face verification instead of face recognition, which obviously comes from verifying faces.
Deepface itself
DeepFace handles all the pipeline stages mentioned in this post in the background as well. You can apply face recognition tests with a few lines of code. We’ve just focused on the pipeline stages to understand how a face recognition system works.
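For instance, the verify function runs detection, alignment, representation and verification in a single call; the snippet below is a minimal example using the same image pair as above.

result = DeepFace.verify(img1_path="img1.jpg",
                         img2_path="img2.jpg",
                         model_name="VGG-Face",
                         detector_backend="mtcnn")

print(result["verified"])  # True if the pair is decided to be the same person
print(result["distance"])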
Large scale face recognition
In this post, we’ve actually covered how to apply face verification, which has O(1) complexity in big O notation. Face recognition requires finding a face in a data set, and this becomes O(n) complexity where n is the number of instances in your data set.
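This O(n) case is what deepface's find function does: it compares the target image against every image stored in a database folder. A minimal sketch, assuming a local folder named my_db that contains facial images:

dfs = DeepFace.find(img_path="img1.jpg",
                    db_path="my_db",
                    model_name="VGG-Face",
                    detector_backend="mtcnn")

# depending on the deepface version, the result is a dataframe or a list of
# dataframes holding the closest matches and their distances
print(dfs)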
We can use a trick to speed large scale face recognition up dramatically.
Approximate Nearest Neighbor
As explained in this tutorial, facial recognition models are used to verify whether a face pair belongs to the same person or to different persons. This is actually face verification instead of face recognition, because face recognition requires performing face verification many times. Now, suppose that you need to find an identity in a billion-scale database, e.g. the citizen database of a country, and a citizen may have many images. This problem has O(n x logn) time complexity where n is the number of entries in your database.
On the other hand, the approximate nearest neighbor algorithm reduces the time complexity dramatically to O(logn)! Vector indexes such as Annoy, Voyager and Faiss, and vector databases such as Postgres with pgvector and RediSearch, run this algorithm to find a similar vector to a given vector even in billions of entries in just milliseconds.
So, if you have a robust facial recognition model, then it is not a big deal to run it on billions of entries!
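As a sketch of the approximate nearest neighbor idea with Annoy: embeddings below is a hypothetical list of precomputed facial vectors and target_embedding is the vector of the face being searched.

from annoy import AnnoyIndex

dimensions = 2622  # VGG-Face embedding size
index = AnnoyIndex(dimensions, "euclidean")

for i, embedding in enumerate(embeddings):
    index.add_item(i, embedding)

index.build(10)  # number of trees; more trees give better accuracy

# ids of the 5 most similar faces, found without scanning the whole index
neighbors = index.get_nns_by_vector(target_embedding, 5)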
Real time face recognition
Besides, we can run face recognition tasks in real time as well.
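For example, deepface's stream function opens the webcam and applies recognition against a database folder; the snippet below is a minimal sketch, again assuming a hypothetical my_db folder with facial images.

# opens the default webcam and looks detected faces up in my_db
DeepFace.stream(db_path="my_db",
                model_name="VGG-Face",
                detector_backend="opencv")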
Meanwhile, you can run face verification tasks directly in your browser with its custom UI built with ReactJS.
Anti-Spoofing and Liveness Detection
What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, DeepFace includes an anti-spoofing feature for face verification or liveness detection.
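A minimal sketch, assuming a recent deepface version that exposes the anti_spoofing flag; each detected face then reports whether it looks real.

faces = DeepFace.extract_faces(img_path="img1.jpg", anti_spoofing=True)

for face in faces:
    print(face["is_real"])  # expected to be False for spoofed or replayed faces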
DeepFace API
DeepFace offers a web service for face verification, facial attribute analysis and vector embedding generation through its API. You can watch a tutorial on using the DeepFace API here:
Additionally, DeepFace can be run with Docker to access its API. Learn how in this video:
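As a rough sketch of calling the dockerized service from Python: the port, the /verify endpoint and the payload keys below are assumptions based on the default setup, so check the deepface repository for the exact contract of your version.

import base64
import requests

def to_base64(path):
    with open(path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

payload = {
    "img1_path": to_base64("img1.jpg"),  # payload key names are assumptions
    "img2_path": to_base64("img2.jpg"),
    "model_name": "VGG-Face",
}

response = requests.post("http://localhost:5005/verify", json=payload)  # assumed port mapping
print(response.json())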
Ensemble method
We’ve mentioned just a single face recognition model at a time. On the other hand, there are several state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace and DeepID. Even though all of those models perform well, there is no absolutely better one. Still, we can apply an ensemble method to build a grandmaster model. In this approach, we feed the predictions of those models to a boosting model, as sketched below. Accuracy metrics including precision, recall and F1 score increase dramatically with the ensemble method, whereas the running time gets longer.
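A hedged sketch of that idea: distances coming from several deepface models become features for a boosting classifier. Here, pairs and labels are hypothetical labelled face pairs (1 for same person, 0 for different persons).

from sklearn.ensemble import GradientBoostingClassifier

model_names = ["VGG-Face", "Facenet", "OpenFace", "DeepFace"]

def pair_features(img1_path, img2_path):
    # one distance per base model for a given pair
    return [
        DeepFace.verify(img1_path, img2_path, model_name=name)["distance"]
        for name in model_names
    ]

X = [pair_features(p1, p2) for p1, p2 in pairs]
booster = GradientBoostingClassifier().fit(X, labels)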
Tech Stack Recommendations
Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.
Conclusion
So, we have covered how face recognition works and the common stages of a face recognition pipeline. We have used pre-built models provided by the deepface framework. I strongly recommend you follow the links to understand the concepts well.
I pushed the source code of this blog post to GitHub. You can support this study by starring the GitHub repo as well.
Support this blog if you like it!
Hi Sir,
I was interested in using the deepface project for one of my research projects. I found that the code is quite outdated and I was curious to ask if you have upgraded it for the newest tensorflow version (2.3.x) or similar.
I run deepface with tf 2.2.0. What trouble did you have when running deepface?
Hi Sefik,
Great stuff! Thanks for sharing.
There seems to be a problem with all the embedded videos since I always get an error when trying to play them.
I’m trying to follow along with the first video and noticed that
input_shape = model.layers[0].input_shape[1:] returns an empty list, whereas
input_shape = model.layers[0].input_shape[0][1:] returns a tuple
Hopefully I won’t have too much trouble following along with the rest of the video, but I was able to replicate the output by adding that 0 index
Most probably your tf version behaves differently from mine. Thank you for the comment!