Face Recognition with FaceNet in Keras

Google announced FaceNet as its deep learning based face recognition model. It is built on the Inception architecture, which we know from the Kaggle ImageNet competitions. The core idea is to represent two face images as lower-dimensional vectors and to decide identity based on the similarity of those vectors, just like in Oxford's VGG-Face.

katy-perry-facenet
Katy Perry wears a funeral face net

Objective

Face recognition combines CNNs, autoencoders and transfer learning. I strongly recommend reading the How Face Recognition Works post first to understand what a face recognition pipeline is.



Transfer learning

We will apply transfer learning to benefit from previous research. David Sandberg shared pre-trained weights obtained after 30 hours of GPU training. However, that work was in raw TensorFlow. Your friendly neighborhood blogger converted the pre-trained weights into Keras format. I put the weights in Google Drive because the file exceeds GitHub's upload size limit. You can find the pre-trained weights here. Also, FaceNet has a very complex model structure; you can find it here in JSON format.

We can create the FaceNet model as illustrated below.

from keras.models import model_from_json

#facenet model structure: https://github.com/serengil/tensorflow-101/blob/master/model/facenet_model.json
model = model_from_json(open("facenet_model.json", "r").read())

#pre-trained weights https://drive.google.com/file/d/1971Xk5RwedbudGgTIrGAL4F7Aifu7id1/view?usp=sharing
model.load_weights("facenet_weights.h5")

model.summary()

Some readers informed me that Python 3.6 users need to downgrade to Python 3.5 to import the model structure JSON. If you are a 3.6 user and insist on keeping this version, you can alternatively load the model as demonstrated below.

# put the following file in the same directory:
# https://github.com/serengil/tensorflow-101/blob/master/model/inception_resnet_v1.py
from inception_resnet_v1 import *
model = InceptionResNetV1()

The FaceNet model expects 160×160 RGB images and produces 128-dimensional representations. These auto-encoded representations are called embeddings in the research paper. Additionally, the researchers put an extra l2 normalization layer at the end of the network. Recall what l2 normalization is:

||x||₂ = √(x₁² + x₂² + … + xₙ²) for an n-dimensional vector x = (x₁, x₂, …, xₙ)

They also constrained the 128-dimensional output embedding to live on the 128-dimensional hypersphere. In other words, the output vector is divided element-wise by its l2 norm.

import numpy as np

def l2_normalize(x):
    return x / np.sqrt(np.sum(np.multiply(x, x)))

512 dimensional FaceNet model

The original FaceNet study creates 128-dimensional vectors, and the model mentioned above does as well. David Sandberg later published an extended version of FaceNet that creates 512-dimensional embeddings and reaches 99.60% accuracy on the LFW data set! Here, you can find how to build the FaceNet-512 model.

#!pip install deepface
from deepface import DeepFace

model = DeepFace.build_model("Facenet512")
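
In recent deepface versions, you can also let the library handle detection, alignment and embedding in a single represent call; a minimal sketch with a placeholder image path:

from deepface import DeepFace

#returns a list with one dictionary per detected face; the "embedding" key holds the 512-d vector
embedding_objs = DeepFace.represent(img_path="img1.jpg", model_name="Facenet512")
print(len(embedding_objs[0]["embedding"]))  #512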

Finding similarity

The researchers also mentioned that they used euclidean distance instead of cosine similarity to compare two vectors. Euclidean distance is simply the distance between two vectors in euclidean space.





def findEuclideanDistance(source_representation, test_representation):
    euclidean_distance = source_representation - test_representation
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    return euclidean_distance

Finally, we can find the distance between two different images via FaceNet.

img1_representation = l2_normalize(model.predict(preprocess_image("img1.jpg"))[0,:])
img2_representation = l2_normalize(model.predict(preprocess_image("img2.jpg"))[0,:])

euclidean_distance = findEuclideanDistance(img1_representation, img2_representation)
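
The snippet above calls preprocess_image, a helper from the accompanying notebook that is not shown in this post. A minimal sketch of what it needs to do, assuming Keras' image utilities and the 160×160 input size (the exact pixel scaling in the notebook may differ):

import numpy as np
from keras.preprocessing import image

def preprocess_image(image_path):
    #load the photo at the 160x160 resolution FaceNet expects
    img = image.load_img(image_path, target_size=(160, 160))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img /= 255  #the notebook may apply a different normalization, e.g. prewhitening
    return img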

The distance should be small for images of the same person and large for pictures of different people. The research paper sets the threshold to 0.20, but I got the most successful results when it is set to 0.35.

If you wonder how to determine the threshold value for this face recognition model, this blog post explains it in depth.

threshold = 0.35
if euclidean_distance < threshold:
    print("verified... they are same person")
else:
    print("unverified! they are not same person!")

Alternatively, we can check cosine similarity between the two vectors. In this case, I got the most successful results when I set the threshold to 0.07. Notice that l2 normalization is skipped for this metric.

def findCosineSimilarity(source_representation, test_representation):
    #despite its name, this returns cosine distance, i.e. 1 minus cosine similarity
    a = np.matmul(np.transpose(source_representation), test_representation)
    b = np.sum(np.multiply(source_representation, source_representation))
    c = np.sum(np.multiply(test_representation, test_representation))
    return 1 - (a / (np.sqrt(b) * np.sqrt(c)))

img1_representation = model.predict(preprocess_image("img1.jpg"))[0,:]
img2_representation = model.predict(preprocess_image("img2.jpg"))[0,:]

cosine_similarity = findCosineSimilarity(img1_representation, img2_representation)
print(cosine_similarity)

threshold = 0.07
if cosine_similarity < threshold:
    print("verified... they are same person")
else:
    print("unverified! they are not same person!")

Testing

Well, we have built the model. What matters now is how successful it is. I tested FaceNet with the same instances used in the VGG-Face tests.

The model succeeded when I tested it on very different images of Angelina Jolie.

facenet-test1
True positive results for Angelina Jolie

Similarly, FaceNet succeeded when tested on different photos of Jennifer Aniston.

facenet-test3
True positive results for Jennifer Aniston

It handles true negative cases successfully as well.

facenet-test2
Angelina Jolie and Jennifer Aniston are true negative

Real Time Implementation

We can run Google's FaceNet model in real time as well. The following video applies FaceNet to find vector representations of both the images in the database and the captured frame. OpenCV handles face detection here, and euclidean distance compares the two representations.

I removed the l2 normalization step here because it produces unstable results in real time, which is why the threshold value changed. My experiments show that face images with a euclidean distance of less than 21 belong to the same person when l2 normalization is disabled. They also show that euclidean distance without l2 normalization is more stable than cosine distance for the FaceNet model.
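
You can find the real implementation on GitHub (linked below); as a rough, hypothetical sketch of how such a loop can be wired together with OpenCV's haar cascade and the model and findEuclideanDistance function defined above (the database_embeddings dictionary and the pixel scaling are illustrative assumptions, not the exact implementation):

import cv2
import numpy as np

#haar cascade face detector shipped with opencv-python
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

#database_embeddings is assumed: a dict mapping person names to pre-computed 128-d vectors
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2RGB)
        face = cv2.resize(face, (160, 160)).astype("float32") / 255  #scaling is an assumption
        embedding = model.predict(np.expand_dims(face, axis=0))[0, :]
        for name, ref_embedding in database_embeddings.items():
            if findEuclideanDistance(embedding, ref_embedding) < 21:  #threshold without l2 normalization
                cv2.putText(frame, name, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("facenet", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  #press q to quit
        break
cap.release()
cv2.destroyAllWindows()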





You can find the source code for this real time implementation in GitHub.

Approximate Nearest Neighbor

As explained in this tutorial, facial recognition models are used to verify whether a face pair belongs to the same person or to different persons. That is actually face verification rather than face recognition, because face recognition requires performing face verification many times. Now, suppose that you need to find an identity in a billion-scale database, e.g. the citizen database of a country, where a citizen may have many images. A brute-force search has O(n) time complexity where n is the number of entries in your database.

On the other hand, the approximate nearest neighbor algorithm reduces the time complexity dramatically to O(log n)! Vector indexes such as Annoy, Voyager and Faiss, and vector databases such as Postgres with pgvector and RediSearch, run this algorithm to find vectors similar to a given vector in milliseconds, even among billions of entries.

So, if you have a robust facial recognition model, then running it at billion scale is not a big deal!
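
As an illustration, here is a minimal sketch of indexing FaceNet embeddings with Spotify's Annoy library; the embeddings list and the target vector are assumed to have been produced with model.predict beforehand, and any of the libraries above would work similarly.

#!pip install annoy
from annoy import AnnoyIndex

dimension = 128  #FaceNet embedding size
index = AnnoyIndex(dimension, "euclidean")

#embeddings is assumed to be a list of 128-d vectors computed beforehand
for i, embedding in enumerate(embeddings):
    index.add_item(i, embedding)

index.build(10)  #10 trees; more trees mean better accuracy but slower builds

#find the 5 closest stored vectors to a target embedding in sub-linear time
neighbors, distances = index.get_nns_by_vector(target_embedding, 5, include_distances=True)
print(neighbors, distances)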

Face alignment

Modern face recognition pipelines consist of 4 stages: detect, align, represent and classify / verify. We skipped the face detection and face alignment steps to keep this post simple. However, they are really important for face recognition tasks.

Face detection can be done with many solutions such as OpenCV, Dlib or MTCNN. OpenCV offers haar cascade and single shot multibox detector (SSD). Dlib offers Histogram of Oriented Gradients (HOG) and Max-Margin Object Detection (MMOD). Finally, MTCNN is a popular solution in the open source community as well. Herein, SSD, MMOD and MTCNN are modern deep learning based approaches whereas haar cascade and HOG are legacy methods. Besides, SSD is the fastest one. You can see the detection performance of those methods in the following video.

Here, you can watch how to use different face detectors in Python.

Moreover, Google declared that face alignment increases the accuracy of its FaceNet face recognition model from 98.87% to 99.63%. This is almost a 1% accuracy improvement, which means a lot for engineering studies. Here, you can find a detailed tutorial for face alignment in Python with OpenCV.
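
To give a feel for the idea before diving into that tutorial, here is a minimal, hypothetical alignment sketch that rotates an image so the eyes become horizontal; the eye coordinates are assumed to come from a landmark detector, and PIL handles the rotation.

import numpy as np
from PIL import Image

def align_face(img_path, left_eye, right_eye):
    #left_eye and right_eye are (x, y) pixel coordinates from a landmark detector
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))  #tilt of the eye line with respect to the horizontal axis
    return Image.open(img_path).rotate(angle)  #rotate so that the eye line becomes horizontal

#hypothetical eye coordinates just for illustration
aligned_img = align_face("img1.jpg", left_eye=(70, 95), right_eye=(130, 105))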

rotate-from-scratch
Face alignment

You can find out more about the math behind alignment in the following video:





Besides, face detectors detect faces within a rectangular area, so detected faces come with some noise such as background pixels. We can find 68 different facial landmarks with dlib and use them to get rid of that noise.
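
A minimal sketch of extracting those 68 landmarks with dlib could look like the following; it assumes the shape_predictor_68_face_landmarks.dat file has been downloaded separately from dlib's model zoo.

#!pip install dlib
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  #downloaded separately

img = dlib.load_rgb_image("img1.jpg")
for face in detector(img, 1):  #the second argument upsamples the image once for small faces
    shape = predictor(img, face)
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
    print(landmarks[36:48])  #points 36 to 47 cover the two eye regions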

In addition, MediaPipe can find 468 landmarks. Please see its real time implementation in the following video. Recommended tutorials: Deep Face Detection with MediaPipe, Zoom Style Virtual Background Setup with MediaPipe.

Here, RetinaFace is the cutting-edge face detection technology. It can detect faces even in a crowd, and it finds facial landmarks including eye coordinates. That is why its alignment score is very high.

Conclusion

So, we’ve implemented Google’s face recognition model on-premise in this post. We have combined representations with autoencoders, transfer learning and vector similarity concepts to build FaceNet. Original paper includes face alignment steps but we skipped them in this post. Instead of including alignment, I fed already aligned images as inputs. Moreover, FaceNet has a much more complex model structure than VGG-Face. Still, VGG-Face produces more successful results than FaceNet based on experiments. This might cause to produce slower results in real time. Finally, I pushed the code of this post into GitHub.

katy-perry-facenet-2
Katy Perry with her Face Net

Python Library

Herein, deepface is a lightweight face recognition framework for Python. It currently supports the most common face recognition models, including VGG-Face, Facenet, OpenFace and DeepID.

It handles model building, loading pre-trained weights, finding vector embeddings of faces and applying similarity metrics in the background. You can verify faces with just a few lines of code. It is available on PyPI; run the command "pip install deepface" to install it. Its code is also open-sourced on GitHub, and the repo has detailed documentation for developers. BTW, you can support this project by starring the repo.
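
For instance, verifying two face images with the Facenet model takes just a couple of lines; the image paths below are placeholders.

from deepface import DeepFace

#detects, aligns, represents and verifies the two faces in the background
result = DeepFace.verify("img1.jpg", "img2.jpg", model_name="Facenet")
print(result["verified"])  #True if both images belong to the same person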

deepface-simple
deepface

Here, you can watch the how-to video for deepface.

Besides, you can run deepface in real time with your webcam as well.

Meanwhile, you can run face verification tasks directly in your browser with its custom UI built with ReactJS.





Anti-Spoofing and Liveness Detection

What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, DeepFace includes an anti-spoofing feature for face verification or liveness detection.
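
In recent deepface releases this is exposed through an anti_spoofing flag; a minimal sketch, assuming your installed version supports it:

from deepface import DeepFace

#with anti_spoofing enabled, deepface rejects faces that look printed, replayed or otherwise spoofed
result = DeepFace.verify("img1.jpg", "img2.jpg", model_name="Facenet", anti_spoofing=True)
print(result["verified"])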

Large scale face recognition

Large scale face recognition requires applying face verification many times. However, we can store the representations of the identities in our database up front. In this way, we just need to find the representation of the target image, and computing distances between representations is very fast. So, we can find an identity in a large scale data set in just seconds. Deepface offers an out-of-the-box function to handle large scale face recognition as well.
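
Deepface's find function searches a facial database folder for a given target image; a minimal sketch with placeholder paths:

from deepface import DeepFace

#searches every image under db_path for the identity in the target image
#the first call builds and caches embeddings for the whole database folder
results = DeepFace.find(img_path="target.jpg", db_path="my_facial_database", model_name="Facenet")
print(results)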

Notice that face recognition has O(n) time complexity, and this might be problematic for data at the millions or billions scale. Herein, the approximate nearest neighbor (a-nn) algorithm reduces time complexity dramatically. Spotify Annoy, Facebook Faiss and NMSLIB are amazing a-nn libraries. Besides, Elasticsearch wraps NMSLIB and is highly scalable. You should run deepface with those a-nn libraries if you have a really large scale database.

On the other hand, the a-nn algorithm does not always guarantee finding the closest match. We can still apply the exact k-nn algorithm here; the map-reduce technology of big data systems might satisfy both speed and confidence requirements. MongoDB, Cassandra and Hadoop are the most popular NoSQL solutions. Besides, if you have a powerful database such as Oracle Exadata, then an RDBMS and regular SQL might satisfy your concerns as well.

Tech Stack Recommendations

Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.

Ensemble method

We’ve mentioned just a single face recognition model. On the other hand, there are several state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace and DeepID. Even though all of those models perform well, there is no absolute better model. Still, we can apply an ensemble method to build a grandmaster model. In this approach, we will feed the predictions of those models to a boosting model. Accuracy metrics including precision, recall and f1 score increase dramatically in ensemble method whereas running time lasts longer.

The Best Single Model

There are a few state-of-the-art face recognition models: VGG-Face, FaceNet, OpenFace and DeepFace. Some are designed by tech giants such as Google and Facebook, whereas others are designed by top universities such as the University of Oxford and Carnegie Mellon University. We discuss the best single face recognition model in this video.

DeepFace API

DeepFace offers a web service for face verification, facial attribute analysis and vector embedding generation through its API. You can watch a tutorial on using the DeepFace API here:

Additionally, DeepFace can be run with Docker to access its API. Learn how in this video:





Encrypt Vector Embeddings

Facial recognition relies on vector embedding comparisons, much like other vector-based models such as reverse image search, recommendation engines, and LLMs. While embeddings do not allow the original data to be reconstructed, they still contain sensitive information, similar to fingerprints. If leaked, they can expose systems to adversarial attacks.

Encrypting embeddings can enhance security, but traditional symmetric key algorithms like AES have limitations. The private key needed for decryption must remain secure and cannot be transmitted to the cloud without risking exposure. If encrypted embeddings are retrieved for decryption and distance calculations on on-premises systems, the cloud’s computational power remains underutilized.

Homomorphic encryption offers a powerful alternative through public-key cryptography. It enables computations directly on encrypted data without requiring decryption, allowing full utilization of the cloud’s computational resources.

You can encrypt embeddings and compute vector similarity directly on encrypted data using homomorphic encryption. This ensures privacy while still enabling similarity calculations. Here, you can find an implementation using partially homomorphic encryption.
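
As a rough, hypothetical illustration with the python-paillier (phe) package, which supports adding ciphertexts and multiplying them by plaintext scalars, the cloud can compute an encrypted dot product between an encrypted embedding and a plaintext one:

#!pip install phe
from phe import paillier

#on-prem: generate the key pair and encrypt the stored embedding element by element
public_key, private_key = paillier.generate_paillier_keypair()
source_embedding = [0.12, -0.54, 0.33]  #toy 3-dimensional example instead of 128 dimensions
encrypted_source = [public_key.encrypt(x) for x in source_embedding]

#cloud: computes the dot product on encrypted data without ever decrypting it
target_embedding = [0.10, -0.40, 0.30]
encrypted_similarity = sum(c * t for c, t in zip(encrypted_source, target_embedding))

#on-prem: only the private key holder can restore the similarity value
print(private_key.decrypt(encrypted_similarity))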

The same use case can also be implemented with fully homomorphic encryption (FHE), but it comes with significant drawbacks. FHE is slower, requires more computational power, and generates much longer ciphertexts. Additionally, its private and public keys are not well-suited for memory-constrained environments like IoT devices.

