Face Recognition with OpenFace in Keras

OpenFace is a lightweight and minimalist model for face recognition. Similar to Facenet, it is licensed freely and allows commercial use, whereas VGG-Face is restricted to non-commercial use. In this post, we will see how to adapt OpenFace for your face recognition tasks in Python with Keras.


Objective

Face recognition is a combination of CNN, autoencoder and transfer learning studies. I strongly recommend you to read the How Face Recognition Works post to understand what a face recognition pipeline exactly is.



Impediments

The OpenFace model, including both structure and weights, is public but it is built with Lua Torch. PyTorch dropped support for loading Torch7 (.t7) models after its 1.0 release. Even though previous PyTorch versions (e.g. 0.4.1) still offer this support, the published OpenFace Torch7 model cannot be converted because it is a really old Torch model. Some researchers propose loading this model with a newer Lua Torch version and saving it back. However, I do not have a Lua environment.

After this painful process, I found this repository – Keras-OpenFace. The OpenFace weights are already converted to Keras there. The whole model, including structure and weights, is saved as a standalone file here. Unfortunately, this causes trouble if you load the model with a different Keras version. Weights and structure should be separated to be compatible with all Keras environments. Weights can be shared as a binary h5 file, but the structure should be constructed by hand. Loading the model structure from a json file might cause “bad marshal data” exceptions, similar to loading the standalone model.

Luckily, the converted OpenFace weights are already stored as csv files in this repository. Your friendly neighbourhood blogger will convert the OpenFace model, including structure and weights, to Keras separately. This notebook helped me to convert the weights.

CNN Model

The OpenFace model expects (96×96) RGB images as input and it has a 128 dimensional output. The model is built on Inception Resnet V1. I have already built the CNN model for Keras. You can find the built model here as a json file. Even though the model seems complex, its number of parameters is much lower than that of VGG-Face.

import tensorflow as tf
from tensorflow.keras.models import model_from_json

model = model_from_json(open("openface_structure.json", "r").read(), custom_objects={'tf': tf})

You might have some trouble when loading the model from the json file because Keras expects you to use the same environment as the publisher. In that case, you can build the model manually. Here, you can find the manual model building.

Next, you should load the pre-trained weights. The file is 14 MB in size. That's why I stored it in my Google Drive.

#Pre-trained OpenFace weights: https://bit.ly/2Y34cB8
model.load_weights("openface_weights.h5")

Vector representation

Similar to VGG-Face and Facenet, we will apply one shot learning. The CNN model finds vector representations of faces and expresses them as embeddings.

p1 = 'openface-samples/img-1.jpg'
p2 = 'openface-samples/img-2.jpg'

img1_representation = model.predict(preprocess_image(p1))[0,:]
img2_representation = model.predict(preprocess_image(p2))[0,:]
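The preprocess_image helper is not shown above; here is a minimal sketch of what it could look like, assuming the model expects 96×96 RGB inputs with pixel values scaled to [0, 1] (the exact normalization in the original code may differ):

import numpy as np
from tensorflow.keras.preprocessing import image

def preprocess_image(image_path):
    img = image.load_img(image_path, target_size=(96, 96))  # resize to the model's input size
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)  # add a batch dimension
    img /= 255  # assumption: scale pixels to [0, 1]
    return img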

Different photos of the same person should have a low distance, whereas photos of different people should have a high distance. Euclidean distance or cosine distance can be the metric here. I mostly prefer not to apply l2 normalization when calculating Euclidean distance.

import numpy as np

def findCosineDistance(source_representation, test_representation):
    a = np.matmul(np.transpose(source_representation), test_representation)
    b = np.sum(np.multiply(source_representation, source_representation))
    c = np.sum(np.multiply(test_representation, test_representation))
    return 1 - (a / (np.sqrt(b) * np.sqrt(c)))

def l2_normalize(x, axis=-1, epsilon=1e-10):
    output = x / np.sqrt(np.maximum(np.sum(np.square(x), axis=axis, keepdims=True), epsilon))
    return output

def findEuclideanDistance(source_representation, test_representation):
    euclidean_distance = source_representation - test_representation
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    #euclidean_distance = l2_normalize(euclidean_distance)
    return euclidean_distance

cosine = findCosineDistance(img1_representation, img2_representation)
euclidean = findEuclideanDistance(img1_representation, img2_representation)

My own best practices, gathered from personal experiences and experiments on OpenFace, show that the distance threshold for the model should be 0.02 for cosine distance and 0.20 for Euclidean distance.

If you wonder how to determine the threshold value for this face recognition model, this blog post explains it deeply.

if cosine <= 0.02:
    print("these are same")
else:
    print("these are different")

"""
if euclidean <= 0.20:
    print("these are same")
else:
    print("these are different")
"""

Tests

I fed some photos of Katy Perry, Miley Cyrus and Angelina Jolie to the OpenFace model and checked the identity of some combinations. The results are satisfactory.

Tests for OpenFace

Real time

We can run the OpenFace implementation in real time as well. OpenCV's haar cascade module handles detecting faces, and we feed the detected faces to the OpenFace model. You can find the source code of the following video here. This solution is much faster than VGG-Face and Facenet.

Thresholds are tuned for the real time implementation: 0.45 for cosine distance and 0.95 for Euclidean distance without l2 normalization.

Face alignment

Modern face recognition pipelines consist of 4 stages: detect, align, represent and classify / verify. We've ignored the face detection and face alignment steps so far to keep this post simple. However, they are really important for face recognition tasks.

Face detection can be done with many solutions such as OpenCV, Dlib or MTCNN. OpenCV offers haar cascade and single shot multibox detector (SSD). Dlib offers Histogram of Oriented Gradients (HOG) and Max-Margin Object Detection (MMOD). Finally, MTCNN is a popular solution in the open source community as well. Herein, SSD, MMOD and MTCNN are modern deep learning based approaches, whereas haar cascade and HOG are legacy methods. Besides, SSD is the fastest one. You can monitor the detection performance of those methods in the following video.
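For instance, a rough haar cascade sketch with OpenCV might look as follows; the image path is a placeholder and the detection parameters are illustrative:

import cv2

# the frontal face cascade ships with the OpenCV package
detector = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("img-1.jpg")  # hypothetical image path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    detected_face = img[y:y+h, x:x+w]  # crop the rectangular face area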

Here, you can watch how to use different face detectors in Python.

For instance, Google declared that face alignment increases the accuracy of its face recognition model FaceNet from 98.87% to 99.63%. This is almost a 1% accuracy improvement, which means a lot for engineering studies. Here, you can find a detailed tutorial for face alignment in Python with OpenCV.

Face alignment
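As a rough illustration of the idea, alignment can rotate the image so that the eyes lie on a horizontal line; the eye coordinates here are assumed to come from a landmark detector:

import numpy as np
from PIL import Image

def align_face(img_array, left_eye, right_eye):
    # angle between the eye line and the horizontal axis
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))
    # rotate the image by that angle so the eyes become horizontal
    return np.array(Image.fromarray(img_array).rotate(angle))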

You can find out more about the math behind alignment in the following video:

Besides, face detectors detect faces in a rectangular area, so detected faces come with some noise such as background. We can find 68 different landmarks of a face with dlib. In this way, we can get rid of the noise in a facial image.
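A minimal dlib sketch could look like this; the shape predictor model file is distributed separately by dlib and its local path here is an assumption:

import dlib

detector = dlib.get_frontal_face_detector()
# hypothetical local path to dlib's 68-point landmark model
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = dlib.load_rgb_image("img-1.jpg")  # hypothetical image path
for rect in detector(img, 1):
    shape = predictor(img, rect)
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]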

In addition, MediaPipe can find 468 landmarks. Please see its real time implementation in the following video. Recommended tutorials: Deep Face Detection with MediaPipe, Zoom Style Virtual Background Setup with MediaPipe.

Here, RetinaFace is the cutting-edge face detection technology. It can even detect faces in the crowd, and it finds facial landmarks including eye coordinates. That's why its alignment score is very high.

Conclusion

OpenFace is a lightweight face recognition model. It is not the best, but it is a strong alternative to stronger ones such as VGG-Face or Facenet. It has 3.7M trainable parameters, whereas VGG-Face has 145M and Facenet has 22.7M. Besides, the OpenFace weights file is 14 MB, while the VGG-Face weights are 566 MB and the Facenet weights are 90 MB. This comes with speed. That's why adoption of OpenFace is very high. You can deploy it even on a mobile device.

To be honest, this model is not perfect. It can fail some obvious tests. You should adopt VGG-Face if you do not have a tolerance for errors. On the other hand, if the speed of your implementation is your first benchmark, OpenFace would be a pretty solution.

Besides, OpenFace was developed by researchers at Carnegie Mellon University, whereas VGG-Face was developed by University of Oxford researchers. We owe much to these academic researchers!

Source code

I pushed the source code of this blog post to GitHub as a notebook. Besides, the model, including structure and pre-trained weights in Keras format, is shared as well. If the model building step seems too complex, you can also load the model with a single line of code by referencing the model in JSON format. You can support this work just by starring the repository.

Python Library

Herein, deepface is a lightweight face recognition framework for Python. It currently supports the most common face recognition models including VGG-Face, Facenet, OpenFace and DeepID.

It handles model building, loading pre-trained weights, finding vector embeddings of faces and applying similarity metrics to recognize faces in the background. You can verify faces with just a few lines of code. It is available on PyPI; you should run the command “pip install deepface” to install it. Its code is also open-sourced on GitHub, and the repo has detailed documentation for developers. BTW, you can support this project by starring the repo.
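For instance, a verification call with the OpenFace model could look as follows; the image paths are placeholders:

from deepface import DeepFace

# verify whether two hypothetical images belong to the same person
result = DeepFace.verify("img-1.jpg", "img-2.jpg", model_name = "OpenFace")
print(result["verified"])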


Here, you can watch the how-to video for deepface.

Besides, you can run deepface in real time with your webcam as well.

Large scale face recognition

Large scale face recognition requires applying face verification several times. However, we can store the representations of known identities in our database beforehand. In this way, we just need to find the representation of the target image. Finding distances between representations can be handled very fast, so we can find an identity in a large scale data set in just seconds. Deepface offers an out-of-the-box function to handle large scale face recognition as well.
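A sketch of that function could be as follows; the target image and database folder paths are assumptions:

from deepface import DeepFace

# search a facial database folder for the target image;
# representations of the database images are found once and reused afterwards
df = DeepFace.find(img_path = "target.jpg", db_path = "my_db", model_name = "OpenFace")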

Notice that face recognition has O(n) time complexity and this might be problematic for millions or billions level data. Herein, approximate nearest neighbor (a-nn) algorithms reduce the time complexity dramatically. Spotify Annoy, Facebook Faiss and NMSLIB are amazing a-nn libraries. Besides, Elasticsearch wraps NMSLIB and it comes with high scalability. You should run deepface with those a-nn libraries if you have a really large scale database.
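As a sketch with Spotify Annoy, assuming the 128-dimensional embeddings are already extracted (random vectors stand in for them here):

from annoy import AnnoyIndex
import numpy as np

embeddings = np.random.rand(1000, 128)  # stand-in for stored facial embeddings
target = np.random.rand(128)            # stand-in for the target embedding

index = AnnoyIndex(128, "euclidean")
for i, embedding in enumerate(embeddings):
    index.add_item(i, embedding)
index.build(10)  # 10 trees; more trees give higher precision

# approximate 3 nearest neighbours of the target face
neighbours = index.get_nns_by_vector(target, 3)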

On the other hand, an a-nn algorithm does not guarantee to always find the closest one. We can still apply the exact k-nn algorithm here. The map reduce technology of big data systems might satisfy both speed and confidence here. MongoDB, Cassandra and Hadoop are the most popular no-sql solutions. Besides, if you have a powerful database such as Oracle Exadata, then an RDBMS and regular sql might satisfy your concerns as well.
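For comparison, exact nearest neighbour search is a few lines with numpy, again assuming stored embeddings (random vectors stand in for them here):

import numpy as np

embeddings = np.random.rand(1000, 128)  # stand-in for stored facial embeddings
target = np.random.rand(128)            # stand-in for the target embedding

# brute-force k-nn: exact, but O(n) per query
distances = np.linalg.norm(embeddings - target, axis=1)
closest = np.argmin(distances)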

Tech Stack Recommendations

Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.

Ensemble method

We've mentioned just a single face recognition model so far. On the other hand, there are several state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace and DeepID. Even though all of those models perform well, there is no absolutely better model. Still, we can apply an ensemble method to build a grandmaster model. In this approach, we feed the predictions of those models to a boosting model. Accuracy metrics including precision, recall and f1 score increase dramatically with the ensemble method, whereas the running time becomes longer.
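As a rough sketch, the distances produced by several models can become the features of a boosting classifier; the training data for the classifier is omitted here, the image paths are placeholders, and the model names follow deepface conventions:

from deepface import DeepFace

models = ["VGG-Face", "Facenet", "OpenFace", "DeepFace"]

# distances of one image pair under each model become one feature vector
features = [
    DeepFace.verify("img-1.jpg", "img-2.jpg", model_name = m)["distance"]
    for m in models
]
# a boosting model (e.g. a GBM) would then be trained on many such
# labeled feature vectors to output a final same/different decision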

The Best Single Model

There are a few state-of-the-art face recognition models: VGG-Face, FaceNet, OpenFace and DeepFace. Some are designed by tech giants such as Google and Facebook, whereas some are designed by top universities such as the University of Oxford and Carnegie Mellon University. We will have a discussion about the best single face recognition model in this video.




22 Comments

    1. This is one shot learning, but you can manipulate the system as described below. Feed multiple images of a person, for example alex_1.jpg, alex_2.jpg and alex_3.jpg. Suppose that you are going to find who x.jpg is. To compare x and Alex, you need to find the distances x-alex_1, x-alex_2 and x-alex_3, then find the average value of all of these comparisons. If the average value is less than the threshold, then x is Alex.

      I hope this explains it well.

  1. Thanks for such a great tutorial. I have tried it, but it's not labelling me correctly. It either labels me as unknown or as one of the other guys. I have also increased the number of my pictures, as you suggested above, and removed the guy the algorithm was predicting me as, but it still predicts me as unknown or as the other guy left in the data.
    How can I improve it?

      1. Yes, they are. I have one more question: at which phase is this model trained on our dataset? I mean, in transfer learning we train the last layers according to our data. I want to know how this model knows about our data.

  2. Thank you for the tutorial! I tried out your real time code from GitHub and I’m getting strange results. All the source images (“database” folder) are cropped, the cv2 face detection works correctly. For instance when I try the net on Angelina I’m getting the best match result for Brad. With a photo of my own the detections are inconsistent – I’m being labeled as myself and then as someone else. Maybe you could share the dataset you have used for the YT video (testing dataset)? Otherwise, what network should I use for more usable results and also for the constrained devices?

    1. This is interesting. Could you crop the testing images in the database folder with opencv face detection? I picked these testing images randomly.

      1. Actually the opencv detection algorithm crops the images differently than they are cropped in your samples, but it changes nothing. It is noticeable that OpenFace has a very low margin of error – most wrong/correct classifications are just near the threshold.
        What network could I choose for more precise recognition?

    1. Another question:
      What would be a good algorithm to use if I have 1000 persons (49000 images)?

      1. Do you need to apply face recognition in real time? If yes, OpenFace is pretty good, because its accuracy is close to complex models such as VGG-Face, FaceNet and DeepFace. But if you can do it as a batch job, these complex models would be better.

      1. Thank you very much for the response.

        Which is better: OpenFace with MTCNN, or dlib face recognition?

  3. Dear Mr. Serengil,

    First, I would like to say thanks for your amazing work.
    I have an important question about this.

    I'm using your framework, but I don't know why it's not giving me the same distances (similarity scores) as the original openface framework.

    My code is:

    OpenFaceModel = DeepFace.build_model("OpenFace")
    result = DeepFace.verify(lennon1, clapton2, "OpenFace", "euclidean_l2", OpenFaceModel, detector_backend="dlib", enforce_detection=True)
    print(result['distance'])

    And i wanted to check if i get the same result like here: https://cmusatyalab.github.io/openface/demo-2-comparison/

    The verification between the lennon1 image and the clapton2 image gives me a distance of 0.6152904504327942. But if you check the openface website, the distance should be 1.145.

    Why is this so different?

      1. Thanks for your advice.
        I managed to run the code from https://sefiks.com/2020/07/24/face-recognition-with-opencv-dnn-in-python/ but it still doesn't give the same distance scores between the images.

        For the lennon1 and lennon2 image comparison I get these scores:
        Euclidean distance: 0.6823
        EuclideanL2 distance: 0.6823
        Cosine distance: 0.2328

        For the lennon1 and clapton2 image comparison I get these scores:
        Euclidean distance: 0.5396
        EuclideanL2 distance: 0.5396
        Cosine distance: 0.1456

        It's using the identical model though, but the scores are still different.
        Could it be that the difference is happening because of the different detection methods? In your example you are using opencv whereas openface uses dlib.

        Thank you for your help.

        I'm looking for a way to use deepface with facenet and openface, where I get the same scores as the original frameworks.
