Facebook researchers announced their face recognition model, DeepFace. Its performance is very close to human level: humans score 97.53% on the Labeled Faces in the Wild (LFW) data set whereas the DeepFace model scores 97.35% ± 0.25%. This means the model can sometimes even beat human beings. In this post, we will build the DeepFace model in Keras for Python.
Objective
Face recognition is a combination of CNN, autoencoder and transfer learning studies. I strongly recommend reading the How Face Recognition Works post to understand what a face recognition pipeline exactly is.
You may consider enrolling in my top-rated machine learning course on Udemy.
DeepFace structure
The DeepFace model is an 8-layer convolutional neural network. Each layer is named with a letter and a number as seen in the illustration. Here, the number refers to the index from 1 to 8 and the letter states the type of the layer: C refers to a convolutional layer, M to max pooling, L to a locally connected layer and F to a fully connected layer.
Even though it has a limited number of layers, its parameter size is huge. It is 6 times wider than FaceNet and 36 times wider than OpenFace, whereas the complexities of VGG-Face and DeepFace are close. Imagine how hard it would be to train this network from scratch.
Training the network from scratch
Swarup Ghosh trained the DeepFace model from scratch for Keras.
The original study trained the model on the SFC data set. That data set has 4.4 million photos of 4,030 people, which is exactly the number of nodes in the final layer, F8 in the illustration.
On the other hand, Swarup trained the same model on the VGGFace2 data set. This data set has 3.3 million photos of 8,631 people. That's why the number of nodes in the final layer is 8,631 here, which is different from the original study.
Notice that the final layer F8 is drawn with dashed lines in the illustration. This is because F8 will be dropped in the face recognition task and the outputs of the earlier layer F7 will be used instead.
DeepFace model
The DeepFace model can be built with the sequential API of Keras.
from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, LocallyConnected2D, Flatten, Dense, Dropout

model = Sequential()
model.add(Convolution2D(32, (11, 11), activation='relu', name='C1', input_shape=(152, 152, 3)))
model.add(MaxPooling2D(pool_size=3, strides=2, padding='same', name='M2'))
model.add(Convolution2D(16, (9, 9), activation='relu', name='C3'))
model.add(LocallyConnected2D(16, (9, 9), activation='relu', name='L4'))
model.add(LocallyConnected2D(16, (7, 7), strides=2, activation='relu', name='L5'))
model.add(LocallyConnected2D(16, (5, 5), activation='relu', name='L6'))
model.add(Flatten(name='F0'))
model.add(Dense(4096, activation='relu', name='F7'))
model.add(Dropout(rate=0.5, name='D0'))
model.add(Dense(8631, activation='softmax', name='F8'))
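Once the model is built, you can confirm the huge parameter size mentioned earlier by printing a summary:

model.summary()  # lists each layer with its output shape and parameter count
print(model.count_params())  # total number of parameters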
Loading pre-trained weights
Swarup shared the pre-trained weights of the DeepFace model for Keras.
# pre-trained weights can be downloaded from https://github.com/swghosh/DeepFace/releases
model.load_weights("VGGFace2_DeepFace_weights_val-0.9034.h5")
Representation layer
As I mentioned before, the F7 layer is the representation layer. So, we no longer need the final layer F8 and we can drop it right now.
from keras.models import Model

# model.layers[-3] is the F7 dense layer; this cuts off the D0 dropout and F8 output layers
deepface_model = Model(inputs=model.layers[0].input, outputs=model.layers[-3].output)
So, the DeepFace model expects a 152×152 facial image as input and represents it as a 4096-dimensional vector.
Comparing representations
The output of DeepFace is a 4096-dimensional vector, also called an embedding. In the original paper, the Euclidean distance between L2-normalized vectors is used to measure the similarity of two embeddings.
import numpy as np

def l2_normalize(x):
    # scale the vector to unit length
    return x / np.sqrt(np.sum(np.multiply(x, x)))

def findEuclideanDistance(source_representation, test_representation):
    euclidean_distance = source_representation - test_representation
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    return euclidean_distance
First, we will represent the two face images as vectors. For programmers, representation actually means calling the predict function. So, we will make predictions for two facial images and then find the Euclidean distance between the two outputs.
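Note that predict expects a pre-processed numpy array rather than a raw file path. The following helper is a minimal sketch for loading an image; the preprocess_image name and the [0, 1] pixel scaling are my assumptions rather than part of the original snippet:

from keras.preprocessing import image

def preprocess_image(img_path):
    # load the face image and resize it to the expected 152x152 input shape
    img = image.load_img(img_path, target_size=(152, 152))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)  # add a batch dimension: (1, 152, 152, 3)
    return img / 255  # scale pixels to [0, 1]; the normalization used in training may differ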
img1_embedding = deepface_model.predict(preprocess_image("img1.jpg"))[0]
img2_embedding = deepface_model.predict(preprocess_image("img2.jpg"))[0]

euclidean_l2_distance = findEuclideanDistance(l2_normalize(img1_embedding), l2_normalize(img2_embedding))
The expectation is that the distance between two face images should be small for the same person and large for different people. My experiments show that the threshold value should be 0.55.
If you wonder how to determine the threshold value for a face recognition model, this blog post explains it in depth.
if euclidean_l2_distance <= 0.55:
    verified = True
else:
    verified = False
Unit tests
I've tested the DeepFace model on a set of Angelina Jolie and Scarlett Johansson photos. The following 6 experiments were classified with perfect accuracy. To be honest, this satisfies me a lot.
Real time implementation
You can run the Facebook DeepFace model in real time as well. You can find the source code for DeepFace in real time here. Results seem very fast and very satisfactory in the following video.
This is a sample study of transfer learning and autoencoders.
Approximate Nearest Neighbor
As explained in this tutorial, facial recognition models are used to verify whether a face pair belongs to the same person or to different people. This is actually face verification rather than face recognition, because face recognition requires performing face verification many times. Now, suppose that you need to find an identity in a billion-scale database, e.g. the citizen database of a country, where a citizen may have many images. Searched naively, this problem has O(n) time complexity, where n is the number of entries in your database.
On the other hand, the approximate nearest neighbor algorithm reduces the time complexity dramatically to O(log n)! Vector indexes such as Annoy, Voyager and Faiss, and vector databases such as Postgres with pgvector and RediSearch, run this algorithm to find a vector similar to a given one among billions of entries in just milliseconds.
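As an illustrative sketch with Spotify's Annoy library; the 4096-dimensional DeepFace embeddings, the embeddings list and the tree count here are assumptions for this example:

from annoy import AnnoyIndex

dims = 4096  # dimensionality of the DeepFace embeddings
index = AnnoyIndex(dims, "euclidean")

# embeddings is assumed to be a list of pre-calculated facial vectors
for i, embedding in enumerate(embeddings):
    index.add_item(i, embedding)

index.build(10)  # 10 trees; more trees give higher precision at query time
index.save("faces.ann")

# find the 3 approximate nearest neighbors of a target embedding
neighbors = index.get_nns_by_vector(target_embedding, 3)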
So, if you have a robust facial recognition model, then running it at billion scale is not a big deal!
Early stages of face recognition pipeline
Modern face recognition pipelines consist of 4 stages: detect, align, represent and classify / verify. We've skipped the face detection and face alignment steps so as not to make this post too complex. However, they are really important for face recognition tasks.
Face detection can be done with many solutions such as OpenCV, Dlib or MTCNN. OpenCV offers haar cascade and single shot multibox detector (SSD). Dlib offers Histogram of Oriented Gradients (HOG) and Max-Margin Object Detection (MMOD). Finally, MTCNN is a popular solution in the open source community as well. Herein, SSD, MMOD and MTCNN are modern deep learning based approaches whereas haar cascade and HOG are legacy methods. Besides, SSD is the fastest one. You can monitor the detection performance of these methods in the following video.
Here, you can watch how to use different face detectors in Python.
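For instance, a haar cascade based detection in OpenCV can be sketched as follows; the image path is a placeholder:

import cv2

# the frontal face cascade file ships with the opencv-python package
detector = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("img1.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    detected_face = img[y:y+h, x:x+w]  # crop the rectangular area of the face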
For instance, Google declared that face alignment increases the accuracy of its face recognition model FaceNet from 98.87% to 99.63%. This is almost a 1% accuracy improvement, which means a lot in engineering studies. Here, you can find a detailed tutorial for face alignment in Python with OpenCV.
You can find out more about the math behind alignment in the following video:
Besides, face detectors return faces in a rectangular area, so detected faces come with some noise such as background. With dlib, we can find 68 different landmarks of a face and in this way get rid of that noise in a facial image.
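A minimal dlib sketch for extracting those 68 landmarks might look like the following; note that the shape_predictor_68_face_landmarks.dat model file has to be downloaded from dlib separately:

import dlib

detector = dlib.get_frontal_face_detector()
# the 68-point landmark model must be downloaded from dlib's model repository
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("img1.jpg")
for rect in detector(img):
    shape = predictor(img, rect)
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]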
In addition, MediaPipe can find 468 landmarks. Please see its real time implementation in the following video. Recommended tutorials: Deep Face Detection with MediaPipe, Zoom Style Virtual Background Setup with MediaPipe.
Here, RetinaFace is the cutting-edge face detection technology. It can even detect faces in a crowd, and it finds facial landmarks including eye coordinates. That's why its alignment score is very high.
Conclusion
So, we've covered Facebook's DeepFace face recognition model in this post. It comes with human-level face recognition performance according to the declaration of Facebook researchers. It has a huge, wide and very complex architecture. I am very grateful to Swarup Ghosh for his contributions to open source.
I've pushed the source code of this study to GitHub as a notebook. You can support this study by starring the repo.
Python library
You might not want to spend time building the FB DeepFace model from scratch, loading pre-trained weights, finding vector embeddings of faces and computing distances between embeddings. Herein, deepface is a lightweight face recognition framework for Python. It currently supports the most common face recognition models, including VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace and DeepID.
It handles all of those steps in the background. You can verify faces with just a few lines of code. It is available on PyPI as well; you can install it with the command "pip install deepface". Its code is also open-sourced on GitHub, and the repo has detailed documentation for developers. BTW, you can support this project by starring the repo.
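For example, face verification with the Facebook DeepFace model comes down to a couple of lines; the image paths are placeholders:

from deepface import DeepFace

# builds the model, detects and aligns faces, and compares embeddings in the background
result = DeepFace.verify(img1_path="img1.jpg", img2_path="img2.jpg", model_name="DeepFace")
print(result["verified"])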
Here, you can watch the video of deepface framework.
Besides, you can run deepface in real time with your webcam as well.
Meanwhile, you can run face verification tasks directly in your browser with its custom UI built with ReactJS.
Anti-Spoofing and Liveness Detection
What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, the deepface library includes an anti-spoofing feature for face verification and liveness detection.
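A sketch of enabling it in verification, assuming a recent deepface version that supports the anti_spoofing flag:

from deepface import DeepFace

# raises an exception if a spoofed or fake face is detected in either image
result = DeepFace.verify(img1_path="img1.jpg", img2_path="img2.jpg", anti_spoofing=True)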
Large scale face recognition
Large scale face recognition requires applying face verification many times. However, we can store the representations of known identities in our database once. In this way, we only need to find the representation of the target image, and finding distances between representations can be handled very fast. So, we can find an identity in a large scale data set in just seconds. Deepface offers an out-of-the-box find function to handle large scale face recognition as well, as sketched below.
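A minimal usage sketch of that function, where the my_db folder name and target image are placeholders:

from deepface import DeepFace

# searches the target image among the pre-stored representations of all images in the folder
dfs = DeepFace.find(img_path="target.jpg", db_path="my_db", model_name="DeepFace")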
Notice that face recognition has O(n) time complexity, and this might be problematic for data at the millions or billions scale. Herein, the approximate nearest neighbor (a-nn) algorithm reduces the time complexity dramatically. Spotify Annoy, Facebook Faiss and NMSLIB are amazing a-nn libraries. Besides, Elasticsearch wraps NMSLIB and comes with high scalability. You should run deepface with those a-nn libraries if you have a really large scale database.
On the other hand, the a-nn algorithm does not guarantee to always find the closest one. We can still apply the exact k-nn algorithm here. The map reduce technology of big data systems might satisfy both the speed and confidence requirements. MongoDB, Cassandra and Hadoop are the most popular NoSQL solutions. Besides, if you have a powerful database such as Oracle Exadata, then an RDBMS and regular SQL might satisfy your concerns as well.
Ensemble method
We've mentioned just a single face recognition model. On the other hand, there are several state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace and DeepID. Even though all of those models perform well, there is no absolutely better model. Still, we can apply an ensemble method to build a grandmaster model. In this approach, we feed the predictions of those models into a boosting model. Accuracy metrics including precision, recall and f1 score increase dramatically with the ensemble method, whereas the running time gets longer.
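A rough sketch of the idea with scikit-learn's gradient boosting; the feature layout and variable names here are illustrative assumptions, not the exact setup of the study:

from sklearn.ensemble import GradientBoostingClassifier

# each row of X holds one distance per base model for a face pair, e.g.
# [vggface_dist, facenet_dist, openface_dist, deepface_dist];
# y holds 1 for same-person pairs and 0 for different ones
clf = GradientBoostingClassifier()
clf.fit(X_train, y_train)
is_same_person = clf.predict(X_test)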
Tech Stack Recommendations
Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.
The Best Single Model
There are a few state-of-the-art face recognition models: VGG-Face, FaceNet, OpenFace and DeepFace. Some are designed by tech giants such as Google and Facebook whereas some are designed by top universities such as the University of Oxford and Carnegie Mellon University. We discuss the best single face recognition model in this video.
DeepFace API
DeepFace offers a web service for face verification, facial attribute analysis and vector embedding generation through its API. You can watch a tutorial on using the DeepFace API here:
Additionally, DeepFace can be run with Docker to access its API. Learn how in this video:
Support this blog if you like it!
Dear Serengil,
Why have you not stored the images in folders? This would allow you to store e.g. 10 images of one person under a folder name. Wouldn't this increase the accuracy?
You can, but this is one-shot learning. If one picture is enough, why would you need many pictures?
Hi,
What is the source for FaceNet's parameter count of 22,808,144?
Some websites say that FaceNet has 140M parameters.
I built the model from scratch and called model.summary() in Keras. You should follow the related FaceNet link.
Hi…
I read the DeepFace paper. It says that DeepFace runs at 0.33 seconds per image, accounting for image decoding, face detection and alignment, the feedforward network, and the final classification output, using a single-core Intel 2.2GHz CPU.
When I use the model here on a GPU with 11,700 images (450 persons), it takes 24 seconds to predict.
They might have built the model in native C or C++, whereas this is a Keras implementation. Besides, we do not know what hardware they had. We should not compare performances.
Hi
How can I calculate the accuracy using the accuracy_score function with your implementation?
You can get help from this program: https://github.com/serengil/deepface/blob/master/tests/unit_tests.py
I basically call the predict function to get embeddings, find the distance between embeddings, and assume two faces belong to the same person if the distance is less than the threshold. Finally, the predictions and real values can be stored in a data frame and you can pass this data frame to the accuracy_score function.
thank you
I am using KNN classification to train the model. The accuracy returned is 17%!!!
Why?
Aren't you using the pre-trained weights? Why do you need training?
I am using the deepface pre-trained weights that you put here, but instead of using your way to predict I am using KNN classification:
yhat_class = knn_classifier.predict(test_faces_embedding)
score = accuracy_score(test_labels ,yhat_class)
Any help?
Hi…
I tested the DeepFace model with 100 faces and it predicted only 18 persons correctly.
– the dataset contains 46000 images (1000 persons)
Is that the accuracy of the model?
You should apply face alignment first. Besides, using multiple images per person increases the accuracy. As mentioned in the real time video, it finds the correct faces in my experiments.
I am using MTCNN to detect faces. It applies face alignment by default.
Dear Sefik,
What are the versions you have used?
OpenCV –
Keras –
Tensorflow –
Thank You
pip install opencv-python==3.4.4
pip install tensorflow==1.9.0
pip install keras==2.2.0
Could you suggest a system configuration which would be optimal for the DeepFace model with a real time feed?
Thank you
Dear Sefik,
I have a question: can I use the pre-trained weights to train a new, different model? I'm planning to create a face recognition model for people wearing face masks. Is it possible, or do I have to train it from scratch?
Yes, you can. Please read the transfer learning posts on my blog (e.g. apparent age prediction).