OpenFace is a lightweight and minimalist model for face recognition. Like FaceNet, it is distributed under a free license that allows commercial use, whereas VGG-Face is restricted to non-commercial purposes. In this post, we will cover how to adapt OpenFace for your face recognition tasks in Python with Keras.
Objective
Face recognition is a combination of CNN, autoencoder and transfer learning studies. I strongly recommend reading the How Face Recognition Works post to understand what a face recognition pipeline exactly is.
Impediments
The OpenFace model, including its structure and weights, is public, but it is built with Lua Torch. PyTorch no longer supports loading Torch7 (.t7) models since its 1.0 release. Even though previous PyTorch versions (e.g. 0.4.1) still support this, the published OpenFace Torch7 model cannot be converted because it is a really old Torch model. Some researchers propose loading this model with a new Lua Torch version and saving it back. However, I do not have a Lua environment.
After this painful process, I found this repository – Keras-OpenFace. The OpenFace weights are already converted to Keras here. The whole model, including structure and weights, is saved as a standalone file here. Unfortunately, this causes trouble if you load the model with a different Keras version. Weights and structure should be separated to be compatible with all Keras environments. Weights can be shared in binary h5 format, but the structure should be constructed by hand. Loading the model structure from a json file might cause “bad marshal data” exceptions, similar to loading the standalone model.
Luckily, converted OpenFace weights in csv files are already stored in this repository. Your friendly neighbourhood blogger will convert the OpenFace model, with structure and weights stored separately, to Keras. This notebook helped me to convert the weights.
CNN Model
The OpenFace model expects (96×96) RGB images as input. It has a 128-dimensional output. The model is built on Inception ResNet V1. I have already built the CNN model for Keras. You can find the built model here as a json file. Even though the model seems complex, its number of parameters is much smaller than VGG-Face's.
import tensorflow as tf
from keras.models import model_from_json

model = model_from_json(open("openface_structure.json", "r").read(), custom_objects={'tf': tf})
You might have some trouble when loading the model from the json file because Keras expects you to use the same environment as the publisher. In this case, you can build the model manually. Here, you can find the manual model building.
Next, you should load the pre-trained weights. The file is 14 MB in size. That's why I stored it in my Google Drive.
#Pre-trained OpenFace weights: https://bit.ly/2Y34cB8
model.load_weights("openface_weights.h5")
Vector representation
Similar to VGG-Face and FaceNet, we will apply one-shot learning. The CNN model finds vector representations of faces and expresses them as embeddings.
p1 = 'openface-samples/img-1.jpg'
p2 = 'openface-samples/img-2.jpg'

img1_representation = model.predict(preprocess_image(p1))[0,:]
img2_representation = model.predict(preprocess_image(p2))[0,:]
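The preprocess_image helper loads an image and reshapes it to the input the model expects. Here is a minimal sketch of it; the normalization to [0, 1] is my assumption, so check the notebook for the exact version.

from keras.preprocessing import image
import numpy as np

def preprocess_image(image_path):
    #load the image and resize it to the 96x96 input the model expects
    img = image.load_img(image_path, target_size=(96, 96))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img /= 255 #assumed normalization of pixel values to [0, 1]
    return img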
Different photos of the same person should have a low distance, whereas different faces should have a high distance. Euclidean distance or cosine distance can be the metric here. I mostly prefer not to apply l2 normalization when calculating Euclidean distance.
import numpy as np

def findCosineDistance(source_representation, test_representation):
    a = np.matmul(np.transpose(source_representation), test_representation)
    b = np.sum(np.multiply(source_representation, source_representation))
    c = np.sum(np.multiply(test_representation, test_representation))
    return 1 - (a / (np.sqrt(b) * np.sqrt(c)))

def l2_normalize(x, axis=-1, epsilon=1e-10):
    output = x / np.sqrt(np.maximum(np.sum(np.square(x), axis=axis, keepdims=True), epsilon))
    return output

def findEuclideanDistance(source_representation, test_representation):
    euclidean_distance = source_representation - test_representation
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    #euclidean_distance = l2_normalize(euclidean_distance)
    return euclidean_distance

cosine = findCosineDistance(img1_representation, img2_representation)
euclidean = findEuclideanDistance(img1_representation, img2_representation)
My best practices, gathered from personal experiments on OpenFace, show that the distance threshold for the model should be 0.02 for cosine distance and 0.20 for Euclidean distance.
If you wonder how to determine the threshold value for this face recognition model, this blog post explains it in depth.
if cosine <= 0.02:
    print("these are same")
else:
    print("these are different")

"""
if euclidean <= 0.20:
    print("these are same")
else:
    print("these are different")
"""
Tests
I fed some photos of Katy Perry, Miley Cyrus and Angelina Jolie to the OpenFace model and checked the identities of several combinations. The results are satisfactory.
Real time
We can run the OpenFace implementation in real time as well. OpenCV's haarcascade module handles detecting faces, and we feed each detected face to the OpenFace model. You can find the source code of the following video here. This solution is much faster than VGG-Face and FaceNet.
Thresholds are tuned for the real-time implementation: 0.45 for cosine distance and 0.95 for Euclidean distance without l2 normalization.
Approximate Nearest Neighbor
As explained in this tutorial, facial recognition models are used to decide whether a face pair belongs to the same person or to different persons. This is actually face verification rather than face recognition, because face recognition requires performing face verification many times. Now, suppose that you need to find an identity in a billion-scale database, e.g. the citizen database of a country, where a citizen may have many images. A brute-force search has O(n) time complexity, where n is the number of entries in your database.
On the other hand, the approximate nearest neighbor algorithm reduces the time complexity dramatically to O(logn)! Vector indexes such as Annoy, Voyager and Faiss, and vector databases such as Postgres with pgvector and RediSearch, run this algorithm to find a similar vector to a given vector, even among billions of entries, in just milliseconds.
So, if you have a robust facial recognition model, then it is not a big deal to run it on billions of entries! The sketch below illustrates the idea.
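As an illustration, here is a minimal sketch of indexing OpenFace embeddings with Annoy; the embeddings list and target_representation variable are hypothetical placeholders.

from annoy import AnnoyIndex

#OpenFace embeddings are 128-dimensional
index = AnnoyIndex(128, 'euclidean')

#embeddings is a hypothetical list of pre-computed face representations
for i, embedding in enumerate(embeddings):
    index.add_item(i, embedding)

index.build(10) #10 trees; more trees means higher precision but a slower build

#find the 5 approximate nearest neighbors of a target face
neighbors = index.get_nns_by_vector(target_representation, 5)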
Face alignment
Modern face recognition pipelines consist of 4 stages: detect, align, represent and classify / verify. We have ignored the face detection and face alignment steps so as not to make this post too complex. However, they are really important for face recognition tasks.
Face detection can be done with many solutions such as OpenCV, Dlib or MTCNN. OpenCV offers haar cascade and single shot multibox detector (SSD). Dlib offers Histogram of Oriented Gradients (HOG) and Max-Margin Object Detection (MMOD). Finally, MTCNN is a popular solution in the open source community as well. Herein, SSD, MMOD and MTCNN are modern deep-learning-based approaches, whereas haar cascade and HOG are legacy methods. Besides, SSD is the fastest one. You can monitor the detection performance of these methods in the following video.
Here, you can watch how to use different face detectors in Python.
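For instance, a minimal haar cascade detection sketch with OpenCV might look like this; the image path is a placeholder.

import cv2

#load the haar cascade for frontal faces shipped with OpenCV
detector = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("img.jpg") #hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

faces = detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    detected_face = img[y:y+h, x:x+w] #crop the detected face area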
For instance, Google declared that face alignment increases the accuracy of its FaceNet face recognition model from 98.87% to 99.63%. This is almost a 1% accuracy improvement, which means a lot in engineering studies. Here, you can find a detailed tutorial for face alignment in Python with OpenCV.
You can find out more about the math behind alignment in the following video:
Besides, face detectors detect faces in a rectangular area, so detected faces come with some noise such as background color. We can find 68 different landmarks of a face with dlib and, in this way, get rid of the noise in a facial image, as sketched below.
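A minimal dlib landmark sketch might look as follows; it assumes you have downloaded the shape_predictor_68_face_landmarks.dat model file separately.

import dlib

detector = dlib.get_frontal_face_detector()
#the predictor file must be downloaded separately from the dlib model zoo
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("img.jpg") #hypothetical input image
for rect in detector(img, 1):
    shape = predictor(img, rect)
    #collect the 68 (x, y) landmark coordinates
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]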
In addition, MediaPipe can find 468 landmarks. Please see its real-time implementation in the following video. Recommended tutorials: Deep Face Detection with MediaPipe, Zoom Style Virtual Background Setup with MediaPipe.
Here, RetinaFace is the cutting-edge face detection technology. It can even detect faces in a crowd, and it finds facial landmarks including eye coordinates. That's why its alignment score is very high.
Conclusion
OpenFace is a lightweight face recognition model. It is not the most accurate, but it is a strong alternative to heavier models such as VGG-Face or FaceNet. It has 3.7M trainable parameters, compared to 145M in VGG-Face and 22.7M in FaceNet. Besides, the OpenFace weight file is 14 MB, whereas VGG-Face weights are 566 MB and FaceNet weights are 90 MB. This compactness brings speed. That's why adoption of OpenFace is very high. You can deploy it even on a mobile device.
To be honest, this model is not perfect; it can fail some obvious tests. You should adopt VGG-Face if you have no tolerance for errors. On the other hand, if the speed of your implementation is your first benchmark, OpenFace would be a pretty solution.
Besides, OpenFace was developed by researchers at Carnegie Mellon University, whereas VGG-Face was developed by University of Oxford researchers. We owe much to these academic research groups!
Source code
I pushed the source code of this blog post to GitHub as a notebook. Besides, the model, including its structure and pre-trained weights in Keras format, is shared as well. If the model building step seems too complex, you can also load it with a single line of code by referencing the model in JSON format. You can support this work just by starring the repository.
Python Library
Herein, deepface is a lightweight face recognition framework for Python. It currently supports the most common face recognition models, including VGG-Face, FaceNet, OpenFace and DeepID.
It handles model building, loading pre-trained weights, finding vector embeddings of faces and applying similarity metrics to recognize faces in the background. You can verify faces with just a few lines of code, as shown below. It is available on PyPI; you can run the command “pip install deepface” to install it. Its code is also open-sourced on GitHub, and the GitHub repo has detailed documentation for developers. BTW, you can support this project by starring the repo.
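A minimal verification sketch with deepface might look like this; the image paths are placeholders.

from deepface import DeepFace

#compare two hypothetical face images with the OpenFace model
result = DeepFace.verify("img1.jpg", "img2.jpg", model_name = "OpenFace")
print(result["verified"], result["distance"])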
Here, you can watch the how-to video for deepface.
Besides, you can run deepface in real time with your webcam as well.
Meanwhile, you can run face verification tasks directly in your browser with its custom UI built with ReactJS.
Anti-Spoofing and Liveness Detection
What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, DeepFace includes an anti-spoofing feature for face verification and liveness detection.
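Assuming a recent deepface version, the feature can be enabled with a flag; this is a sketch only, so check the repo documentation for the exact behavior.

from deepface import DeepFace

#extract faces with the anti-spoofing analysis enabled (hypothetical image path)
faces = DeepFace.extract_faces("img1.jpg", anti_spoofing = True)
for face in faces:
    print(face["is_real"]) #True if the face is judged to be live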
Large scale face recognition
Large scale face recognition requires applying face verification many times. However, we can store the representations of known identities in our database in advance. In this way, we just need to find the representation of the target image. Finding distances between representations can be handled very fast, so we can find an identity in a large scale dataset in just seconds. Deepface offers an out-of-the-box function to handle large scale face recognition as well, sketched below.
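A minimal sketch of that function; the target image and database folder paths are placeholders.

from deepface import DeepFace

#search a hypothetical facial database folder for the target image
dfs = DeepFace.find(img_path = "target.jpg", db_path = "my_db", model_name = "OpenFace")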
Notice that face recognition has O(n) time complexity, and this might be problematic for millions or billions of entries. Herein, the approximate nearest neighbor (a-nn) algorithm reduces the time complexity dramatically. Spotify Annoy, Facebook Faiss and NMSLIB are amazing a-nn libraries. Besides, Elasticsearch wraps NMSLIB and comes with high scalability. You should run deepface with those a-nn libraries if you have a really large scale database.
On the other hand, the a-nn algorithm does not guarantee always finding the closest one. We can still apply the k-nn algorithm here. The map-reduce technology of big data systems might satisfy both speed and confidence. MongoDB, Cassandra and Hadoop are the most popular NoSQL solutions. Besides, if you have a powerful database such as Oracle Exadata, then an RDBMS and regular SQL might satisfy your concerns as well.
Tech Stack Recommendations
Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor in building robust facial recognition systems. I summarize my tech stack recommendations in the following video.
Ensemble method
We’ve mentioned just a single face recognition model. On the other hand, there are several state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace and DeepID. Even though all of those models perform well, there is no absolute better model. Still, we can apply an ensemble method to build a grandmaster model. In this approach, we will feed the predictions of those models to a boosting model. Accuracy metrics including precision, recall and f1 score increase dramatically in ensemble method whereas running time lasts longer.
The Best Single Model
There are a few state-of-the-art face recognition models: VGG-Face, FaceNet, OpenFace and DeepFace. Some are designed by tech giants such as Google and Facebook, whereas some are designed by top universities such as the University of Oxford and Carnegie Mellon University. We discuss the best single face recognition model in this video.
DeepFace API
DeepFace offers a web service for face verification, facial attribute analysis and vector embedding generation through its API. You can watch a tutorial on using the DeepFace API here:
Additionally, DeepFace can be run with Docker to access its API. Learn how in this video:
Support this blog if you like it!
Hello. How do I use more than one image per person? Thank you.
This is one-shot learning, but you can adapt the system as described below. Feed multiple images of a person, for example alex_1.jpg, alex_2.jpg and alex_3.jpg. Suppose that you are going to find who x.jpg is. To compare x and Alex, you need to find the distances x-alex_1, x-alex_2 and x-alex_3. Find the average of all of these comparisons. If the average value is less than the threshold, then x is Alex. The sketch after my reply shows the idea.
I hope this explains it well.
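A minimal sketch of that averaging logic, reusing the findCosineDistance function from the post; the representation variables are hypothetical.

import numpy as np

def is_same_person(target_representation, reference_representations, threshold = 0.02):
    #average the distance between the target and every reference image of the person
    distances = [findCosineDistance(ref, target_representation) for ref in reference_representations]
    return np.mean(distances) <= threshold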
Yeah, great explanation. Thank you (again) for sharing knowledge and educating people.
Thanks for such a great tutorial. I have tried it, but it's not labelling me correctly: it labels me either as unknown or as other people. I have also added more pictures of myself, as you suggested above, and removed the guy the algorithm was predicting me as, but now it still predicts me as unknown or as the other guy left in the data.
How can I improve it?
Is your reference picture cropped?
Yes, they are. I have one more question: at which phase is this model trained on our dataset? I mean, in transfer learning we train the last layers according to our data. I want to know how this model knows about our data.
It is tuned to find face embeddings. We do not train it any further.
Thank you for the tutorial! I tried out your real time code from GitHub and I'm getting strange results. All the source images (“database” folder) are cropped and the cv2 face detection works correctly. For instance, when I try the net on Angelina, I'm getting the best match result for Brad. With a photo of my own the detections are inconsistent – I'm being labeled as myself and then as someone else. Maybe you could share the dataset you have used for the YT video (testing dataset)? Otherwise, what network should I use for more usable results and also for constrained devices?
This is interesting. Could you crop the testing images in the database folder with OpenCV face detection? I found those testing images randomly.
Actually, the OpenCV detection algorithm crops the images differently than they are cropped in your samples, but it changes nothing. It is noticeable that OpenFace has a very low margin of error – most wrong/correct classifications are just near the threshold.
What network could I choose for more precise recognition?
You should switch back to VGG-Face.
Hi …
What is the difference between OpenFace v1 and OpenFace v2?
Another Question:
Which algorithm should I use if I have 1000 persons (49000 images)?
Do you need to apply face recognition in real time? If yes, OpenFace is pretty, because its accuracy is close to complex models such as VGG-Face, FaceNet and DeepFace. But if you can do it as a batch job, these complex models would be better.
Please check it here: https://cmusatyalab.github.io/openface/models-and-accuracies/
The two models have different designs, complexity and accuracy scores.
Thank you very much for the response.
What is better: OpenFace with MTCNN, or dlib face recognition?
I have not tested them both.
Dear Mr. Serengil,
First, I would like to say thanks for your amazing work.
I have an important question about this.
I'm using your framework, but I don't know why it's not giving me the same distances (similarity scores) as the original OpenFace framework.
My code is:
OpenFaceModel = DeepFace.build_model("OpenFace")
result = DeepFace.verify(lennon1, clapton2, "OpenFace", "euclidean_l2", OpenFaceModel, detector_backend="dlib", enforce_detection=True)
print(result['distance'])
And I wanted to check if I get the same result as here: https://cmusatyalab.github.io/openface/demo-2-comparison/
The verification between the lennon1 image and the clapton2 image gives me a distance of 0.6152904504327942, but according to the OpenFace website, the distance should be 1.145.
Why is this so different?
Hi, this covers a TensorFlow re-implementation of OpenFace. Even though it has the same structure, its weights are different.
If you need identical model, please read this tutorial: https://sefiks.com/2020/07/24/face-recognition-with-opencv-dnn-in-python/
Thanks for your advice.
I managed to run the code from https://sefiks.com/2020/07/24/face-recognition-with-opencv-dnn-in-python/ but it still doesn't give the same distance scores between the images.
For the lennon1 and lennon2 image comparison I get these scores:
Euclidean distance: 0.6823
EuclideanL2 distance: 0.6823
Cosine distance: 0.2328
For the lennon1 and clapton2 image comparison I get these scores:
Euclidean distance: 0.5396
EuclideanL2 distance: 0.5396
Cosine distance: 0.1456
It's using the identical model, yet the scores are still different.
Could it be that the difference is happening because of the different detection methods? In your example you are using OpenCV, whereas OpenFace uses dlib.
Thank you for your help.
I'm looking for a way to use deepface with FaceNet and OpenFace where I get the same scores as the original frameworks.
The reason is that the detection methods are different.