Face recognition has always been a challenging topic for both science and fiction. A woman dyes her hair or wears a hat to disguise herself. Deep learning tasks usually expect to be fed multiple instances of a custom class to learn from (e.g. lots of pictures of someone). This makes face recognition a challenging task, because training has to be handled with a limited number of instances – mostly only one shot of a person exists. Moreover, adding new classes should not require re-building the model. In this post, we’ll create a deep face recognition model from scratch with Keras based on recent research.

The Oxford Visual Geometry Group announced its deep face recognition architecture. We are already familiar with VGG from the ImageNet challenge: we can recognize hundreds of image classes just by applying transfer learning, and we have used the same model for style transfer as well.

Objective
Face recognition is a combination of CNN, auto-encoder and transfer learning studies. I strongly recommend reading the How Face Recognition Works post to understand what a face recognition pipeline is.
Network configuration
Even though the research paper is named Deep Face Recognition, the researchers call the model VGG-Face. This might be because Facebook researchers also called their face recognition system DeepFace – without a blank. VGG-Face is deeper than Facebook’s DeepFace: it has 22 layers and 37 deep units.
The structure of the VGG-Face model is demonstrated below. Only the output layer differs from the ImageNet version – you might compare the two.

The research paper denotes the layer structure as shown below.

I visualized the VGG-Face architecture below to make it easier to understand.

Let’s construct the VGG-Face model in Keras.
```python
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D, Dropout, Flatten, Activation

model = Sequential()

# block 1
model.add(ZeroPadding2D((1, 1), input_shape=(224, 224, 3)))
model.add(Convolution2D(64, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(64, (3, 3), activation="relu"))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# block 2
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(128, (3, 3), activation="relu"))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# block 3
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(256, (3, 3), activation="relu"))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# block 4
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, (3, 3), activation="relu"))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# block 5
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, (3, 3), activation="relu"))
model.add(ZeroPadding2D((1, 1)))
model.add(Convolution2D(512, (3, 3), activation="relu"))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

# fully connected layers implemented as convolutions
model.add(Convolution2D(4096, (7, 7), activation="relu"))
model.add(Dropout(0.5))
model.add(Convolution2D(4096, (1, 1), activation="relu"))
model.add(Dropout(0.5))
model.add(Convolution2D(2622, (1, 1)))
model.add(Flatten())
model.add(Activation("softmax"))
```
Learning outcomes
The research group shared pre-trained weights on its group page under the path vgg_face_matconvnet / data / vgg_face.mat, but they are MATLAB compatible. Here, your friendly neighborhood blogger has already transformed the pre-trained weights for Keras. If you wonder how the MATLAB weights were converted to Keras, you can read this article. Because the weight file is 500 MB and GitHub enforces a 25 MB limit on uploaded files, I had to upload the pre-trained weights to Google Drive. You can find the pre-trained weights here.
```python
from keras.models import model_from_json
model.load_weights("vgg_face_weights.h5")
```
Finally, we’ll use the layer just before the output layer for representation. The following usage will give the output of that layer.
```python
from keras.models import Model
vgg_face_descriptor = Model(inputs=model.layers[0].input, outputs=model.layers[-2].output)
```
Representation
In this way, we can represent images as 2622-dimensional vectors, as illustrated below.
```python
img1_representation = vgg_face_descriptor.predict(preprocess_image("1.jpg"))[0, :]
img2_representation = vgg_face_descriptor.predict(preprocess_image("2.jpg"))[0, :]
```
Notice that the VGG model expects 224x224x3 sized input images. Here, the 3rd dimension refers to the number of channels, i.e. RGB colors. Besides, the preprocess_input function normalizes the input to the [-1, +1] range.
```python
import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.imagenet_utils import preprocess_input

def preprocess_image(image_path):
    img = load_img(image_path, target_size=(224, 224))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = preprocess_input(img)
    return img
```
Vector Similarity
We’ve represented the input images as vectors. We will decide whether both pictures belong to the same person or not by comparing these vector representations. So, we need to find the distance between these vectors. There are two common ways to do this: cosine distance and Euclidean distance. Cosine distance is equal to 1 minus cosine similarity. No matter which measurement we adopt, they all serve to find similarities between vectors.
```python
import numpy as np

def findCosineDistance(source_representation, test_representation):
    a = np.matmul(np.transpose(source_representation), test_representation)
    b = np.sum(np.multiply(source_representation, source_representation))
    c = np.sum(np.multiply(test_representation, test_representation))
    return 1 - (a / (np.sqrt(b) * np.sqrt(c)))

def findEuclideanDistance(source_representation, test_representation):
    euclidean_distance = source_representation - test_representation
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    return euclidean_distance
```
Recognizing images
We’ve represented images as vectors and found similarity measures between two vectors. If both images belong to the same person, the measurement should be small; otherwise, it should be large. Here, the epsilon value states the threshold.
If you wonder how to determine the threshold value for this face recognition model, then this blog post explains it in depth.
```python
epsilon = 0.40  # cosine distance threshold
#epsilon = 120  # euclidean distance threshold

def verifyFace(img1, img2):
    img1_representation = vgg_face_descriptor.predict(preprocess_image(img1))[0, :]
    img2_representation = vgg_face_descriptor.predict(preprocess_image(img2))[0, :]

    cosine_distance = findCosineDistance(img1_representation, img2_representation)
    euclidean_distance = findEuclideanDistance(img1_representation, img2_representation)

    if cosine_distance < epsilon:
        print("verified... they are same person")
    else:
        print("unverified! they are not same person!")
```
Based on my observations, cosine distance should be less than 0.40, or Euclidean distance should be less than 120, for a positive decision. Thresholds might be tuned for your problem, and it is all up to you which similarity measurement to choose.
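For instance, the following calls (with hypothetical file names) run the verification end to end:

```python
# hypothetical image files; replace with your own face photos
verifyFace("angelina-1.jpg", "angelina-2.jpg")   # expected: same person
verifyFace("angelina-1.jpg", "jennifer-1.jpg")   # expected: different persons
```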
This is a one-shot learning process. I mean that we do not feed multiple images of a person to the network. Suppose that we store one picture of a person in our database, take a photo of that person at the entrance of a building and verify him against the stored picture. Strictly speaking, this process is face verification rather than face recognition.
Some researchers call this setup – running the same network on two images and comparing their representations – a Siamese network.
Testing
I tested the developed model with variations of Angelina Jolie and Jennifer Aniston. Surprisingly, the model could verify all instances I fed it. For example, Angelina Jolie is either blonde or brunette in the following test set. She even wears a hat in one image.

The model is also very successful for true negative cases. The descriptor can reliably tell Angelina Jolie and Jennifer Aniston apart.

The true positive results for Jennifer Aniston fascinate me. I might not have recognized the 3rd one myself (2nd row, 1st column). Jennifer is at least 10 years younger in that photo.

I think the most striking test was on Katy Perry. The face recognition model can recognize her even through dramatic appearance changes.

Of course, I can only test the model on a limited number of instances. The model got 98.78% accuracy on the Labeled Faces in the Wild dataset, which contains 13K images of 5K people. BTW, the researchers fed 2.6M images to tune the model weights.
Having one’s hair dyed or wearing a hat, just like in the movies, does not work against AI systems. Movie producers should find more creative solutions.

Real Time Deep Face Recognition
We can apply deep face recognition in real time as well. Face pictures in the database are represented as 2622-dimensional vectors once, at program initialization. Luckily, OpenCV can handle face detection. Then, we represent each detected face and check the similarities.
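The following is a minimal sketch of that flow, assuming one stored image per person and OpenCV’s haar cascade for detection; the identity names and file paths are illustrative, and it reuses the model and helper functions built above.

```python
import cv2
import numpy as np

# represent every database face once at program initialization
# (assumes one image per person; names and paths are illustrative)
db_images = {"angelina": "db/angelina.jpg", "jennifer": "db/jennifer.jpg"}
database = {name: vgg_face_descriptor.predict(preprocess_image(path))[0, :]
            for name, path in db_images.items()}

# haar cascade based face detector shipped with OpenCV
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        # crop and resize the detected face to the 224x224x3 input the model expects
        face = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2RGB)
        face = cv2.resize(face, (224, 224)).astype("float32")
        face = preprocess_input(np.expand_dims(face, axis=0))
        representation = vgg_face_descriptor.predict(face)[0, :]

        # compare against every stored identity and keep the closest one
        distances = {name: findCosineDistance(rep, representation) for name, rep in database.items()}
        name, distance = min(distances.items(), key=lambda kv: kv[1])
        label = name if distance < 0.40 else "unknown"

        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

    cv2.imshow("face recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```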
Early stages of face recognition pipeline
Modern face recognition pipelines consist of 4 stages: detect, align, represent and classify / verify. We’ve skipped the face detection and face alignment steps so as not to make this post too complex. However, they are really important for face recognition tasks.
Face detection can be done with many solutions such as OpenCV, Dlib or MTCNN. OpenCV offers haar cascade and single shot multibox detector (SSD). Dlib offers Histogram of Oriented Gradients (HOG) and Max-Margin Object Detection (MMOD). Finally, MTCNN is a popular solution in the open source community as well. Herein, SSD, MMOD and MTCNN are modern deep learning based approaches, whereas haar cascade and HOG are legacy methods. Besides, SSD is the fastest one. You can monitor the detection performance of those methods in the following video.
Here, you can watch how to use different face detectors in Python.
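As an illustration, a minimal MTCNN usage might look like the sketch below (assuming the mtcnn pip package; the image path is hypothetical):

```python
import cv2
from mtcnn import MTCNN

detector = MTCNN()
img = cv2.cvtColor(cv2.imread("img.jpg"), cv2.COLOR_BGR2RGB)  # MTCNN expects RGB input

for detection in detector.detect_faces(img):
    x, y, w, h = detection["box"]          # bounding box of the detected face
    keypoints = detection["keypoints"]      # left_eye, right_eye, nose, mouth_left, mouth_right
    confidence = detection["confidence"]
    print(x, y, w, h, confidence, keypoints["left_eye"], keypoints["right_eye"])
```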
Moreover, Google reported that face alignment increases the accuracy of its face recognition model FaceNet from 98.87% to 99.63%. This is almost a 1% accuracy improvement, which means a lot for engineering studies. Here, you can find a detailed tutorial for face alignment in Python with OpenCV.

You can find out the math behind alignment more on the following video:
Besides, face detectors detect faces within a rectangular area, so detected faces come with some noise such as background. We can find 68 different landmarks of a face with dlib. In this way, we can get rid of much of the noise in a facial image.
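For example, a rough alignment sketch based on the eye landmarks could look like the following (assuming dlib and its pre-trained shape_predictor_68_face_landmarks.dat model; the image path is illustrative):

```python
import dlib
import numpy as np
from PIL import Image

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

img = np.array(Image.open("face.jpg").convert("RGB"))

for face in detector(img, 1):
    landmarks = predictor(img, face)
    # landmark indexes 36-41 belong to the right eye, 42-47 to the left eye
    left_eye = np.mean([(landmarks.part(i).x, landmarks.part(i).y) for i in range(42, 48)], axis=0)
    right_eye = np.mean([(landmarks.part(i).x, landmarks.part(i).y) for i in range(36, 42)], axis=0)

    # rotate the image so that the eyes lie on a horizontal line
    dy, dx = left_eye[1] - right_eye[1], left_eye[0] - right_eye[0]
    angle = np.degrees(np.arctan2(dy, dx))
    aligned = Image.fromarray(img).rotate(angle)
```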
In addition, MediaPipe can find 468 landmarks. Please see its real time implementation in the following video. Recommended tutorials: Deep Face Detection with MediaPipe, Zoom Style Virtual Background Setup with MediaPipe.
Here, RetinaFace is the cutting-edge face detection technology. It can even detect faces in a crowd, and it finds facial landmarks including eye coordinates. That’s why its alignment score is very high.
Approximate Nearest Neighbor
As explained in this tutorial, facial recognition models are used to verify whether a face pair belongs to the same person or to different persons. This is actually face verification instead of face recognition, because face recognition requires performing face verification many times. Now, suppose that you need to find an identity in a billion-scale database, e.g. the citizen database of a country, where a citizen may have many images. This problem has O(n x logn) time complexity, where n is the number of entries in your database.
On the other hand, the approximate nearest neighbor algorithm reduces the time complexity dramatically to O(logn)! Vector indexes such as Annoy, Voyager and Faiss, and vector databases such as Postgres with pgvector and RediSearch, run this algorithm to find a similar vector to a given vector, even among billions of entries, in just milliseconds.
So, if you have a robust facial recognition model, then it is not a big deal to run it at billion scale!
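As a minimal sketch of the idea (assuming the annoy pip package and the 2622-dimensional VGG-Face descriptor built above; identities and file names are illustrative):

```python
from annoy import AnnoyIndex

dimensions = 2622  # size of a VGG-Face embedding
index = AnnoyIndex(dimensions, "euclidean")

# add every known identity's embedding to the index once, offline
for i, img_path in enumerate(["db/person_0.jpg", "db/person_1.jpg", "db/person_2.jpg"]):
    embedding = vgg_face_descriptor.predict(preprocess_image(img_path))[0, :]
    index.add_item(i, embedding)

index.build(10)  # more trees -> better accuracy, larger index
index.save("faces.ann")

# querying: find the closest stored identities to a target face very fast
target = vgg_face_descriptor.predict(preprocess_image("target.jpg"))[0, :]
neighbors, distances = index.get_nns_by_vector(target, 3, include_distances=True)
print(neighbors, distances)
```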
Conclusion
So, we can recognize faces easily on-premise by combining transfer learning and auto-encoder concepts. Additionally, some linear algebra ideas such as cosine similarity contribute to the decision. We’ve fed frontal images to the model directly. Finally, I pushed the source code of this project to my GitHub profile. BTW, I ran the code with the TensorFlow backend.
Finally, Google has FaceNet, Carnegie Mellon University has OpenFace and Facebook has DeepFace as face recognition models that are alternatives to VGG-Face.
Python Library
Herein, deepface is a lightweight face recognition framework for Python. It currently supports the most common face recognition models, including VGG-Face, Facenet, OpenFace, Facebook DeepFace and DeepID.
It handles building pre-designed models, loading pre-trained weights, finding vector embeddings of faces and computing similarity in the background to recognize faces. So, you can verify faces with just a few lines of code.
It is available on PyPI as well; you can run the command “pip install deepface” to get it. Its code is also fully open-sourced on GitHub, and the repository has detailed documentation for developers. BTW, you can support this project by starring the repo.
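For instance, face verification with deepface comes down to a few lines (image paths are illustrative):

```python
from deepface import DeepFace

result = DeepFace.verify(img1_path="img1.jpg", img2_path="img2.jpg", model_name="VGG-Face")
print(result["verified"])  # True if both images belong to the same person
```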

Here, you can watch the how-to video for deepface.
Besides, you can run deepface in real time with your webcam as well.
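A minimal way to launch the real time analysis (assuming a folder of facial images as the database; the folder path is illustrative):

```python
from deepface import DeepFace

# opens the webcam, detects faces and looks them up in the given facial database folder
DeepFace.stream(db_path="my_db")
```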
Meanwhile, you can run face verification tasks directly in your browser with its custom UI built with ReactJS.
Anti-Spoofing and Liveness Detection
What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, DeepFace includes an anti-spoofing feature for face verification or liveness detection.
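For example, spoof checking can be enabled with an argument in verification and face extraction calls in recent deepface versions – the following is a sketch, so consult the repository documentation for the exact interface:

```python
from deepface import DeepFace

# verification raises an error / refuses the pair if a spoofed (printed, replayed) image is detected
result = DeepFace.verify(img1_path="img1.jpg", img2_path="img2.jpg", anti_spoofing=True)

# face extraction also reports whether each detected face looks real
faces = DeepFace.extract_faces(img_path="img1.jpg", anti_spoofing=True)
print(faces[0]["is_real"], faces[0]["antispoof_score"])
```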
Large scale face recognition
Large scale face recognition requires applying face verification many times. However, we can store the representations of known identities in our database beforehand. In this way, we only need to find the representation of the target image, and finding distances between representations can be handled very fast. So, we can find an identity in a large scale data set in just seconds. Deepface offers an out-of-the-box function to handle large scale face recognition as well.
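That out-of-the-box function is DeepFace.find. A minimal usage sketch (paths are illustrative):

```python
from deepface import DeepFace

# searches the target image's embedding against all faces stored under db_path;
# recent versions return a list of pandas DataFrames with matching identities and distances
dfs = DeepFace.find(img_path="target.jpg", db_path="my_db", model_name="VGG-Face")
print(dfs[0].head())
```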
Notice that face recognition has O(n) time complexity, and this might be problematic for data at the millions or billions level. Herein, the approximate nearest neighbor (a-nn) algorithm reduces the time complexity dramatically. Spotify Annoy, Facebook Faiss and NMSLIB are amazing a-nn libraries. Besides, Elasticsearch wraps NMSLIB and comes with high scalability. You should run deepface with those a-nn libraries if you have a really large scale database.
On the other hand, the a-nn algorithm does not guarantee finding the closest match every time. We can still apply the exact k-nn algorithm here. The map-reduce technology of big data systems might satisfy both the speed and the confidence requirements. MongoDB, Cassandra and Hadoop are the most popular NoSQL solutions. Besides, if you have a powerful database such as Oracle Exadata, then an RDBMS and regular SQL might satisfy your concerns as well.
Tech Stack Recommendations
Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.
Ensemble method
We’ve mentioned just a single face recognition model so far. On the other hand, there are several state-of-the-art models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace and DeepID. Even though all of those models perform well, there is no absolutely better model. Still, we can apply an ensemble method to build a grandmaster model. In this approach, we feed the predictions of those models to a boosting model. Accuracy metrics, including precision, recall and F1 score, increase dramatically with the ensemble method, whereas the running time gets longer.
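As an illustration, a simplified voting-style ensemble over deepface models might look like the sketch below; note that the actual study feeds model outputs into a boosting model, so this is just a toy approximation with illustrative image paths:

```python
from deepface import DeepFace

models = ["VGG-Face", "Facenet", "OpenFace", "DeepFace"]

def ensemble_verify(img1, img2):
    # collect one verification decision per model, then apply simple majority voting
    votes = 0
    for model_name in models:
        result = DeepFace.verify(img1_path=img1, img2_path=img2, model_name=model_name)
        votes += 1 if result["verified"] else 0
    return votes >= len(models) / 2

print(ensemble_verify("img1.jpg", "img2.jpg"))
```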
The Best Single Model
There are a few state-of-the-art face recognition models: VGG-Face, FaceNet, OpenFace and DeepFace. Some were designed by tech giants such as Google and Facebook, whereas others were designed by top universities such as the University of Oxford and Carnegie Mellon University. We discuss the best single face recognition model in this video.
DeepFace API
DeepFace offers a web service for face verification, facial attribute analysis and vector embedding generation through its API. You can watch a tutorial on using the DeepFace API here:
Additionally, DeepFace can be run with Docker to access its API. Learn how in this video:
Encrypt Vector Embeddings
Facial recognition relies on vector embedding comparisons, much like other vector-based models such as reverse image search, recommendation engines, and LLMs. While embeddings do not allow the original data to be reconstructed, they still contain sensitive information, similar to fingerprints. If leaked, they can expose systems to adversarial attacks.
Encrypting embeddings can enhance security, but traditional symmetric key algorithms like AES have limitations. The private key needed for decryption must remain secure and cannot be transmitted to the cloud without risking exposure. If encrypted embeddings are retrieved for decryption and distance calculations on on-premises systems, the cloud’s computational power remains underutilized.
Homomorphic encryption offers a powerful alternative through public-key cryptography. It enables computations directly on encrypted data without requiring decryption, allowing full utilization of the cloud’s computational resources.
You can encrypt embeddings and compute vector similarity directly on encrypted data using homomorphic encryption. This ensures privacy while still enabling similarity calculations. Here, you can find an implementation using partially homomorphic encryption.
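As a toy illustration of the idea – not the exact implementation referenced above – the additively homomorphic python-paillier (phe) package can compute the dot product of an encrypted embedding with a plaintext embedding on the cloud side:

```python
import numpy as np
from phe import paillier

# on-prem: generate the key pair and encrypt the source embedding
public_key, private_key = paillier.generate_paillier_keypair()
source_embedding = np.random.rand(128)  # stands in for a real facial embedding (kept short: Paillier is slow)
encrypted_source = [public_key.encrypt(float(x)) for x in source_embedding]

# cloud: knows only the public key, the ciphertexts and a plaintext target embedding
target_embedding = np.random.rand(128)
# additive homomorphism allows ciphertext * plaintext scalar and ciphertext + ciphertext
encrypted_dot_product = sum(c * float(t) for c, t in zip(encrypted_source, target_embedding))

# on-prem: decrypt the similarity with the private key
dot_product = private_key.decrypt(encrypted_dot_product)
print(abs(dot_product - np.dot(source_embedding, target_embedding)) < 1e-6)
```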
The same use case can also be implemented with fully homomorphic encryption (FHE), but it comes with significant drawbacks. FHE is slower, requires more computational power, and generates much longer ciphertexts. Additionally, its private and public keys are not well-suited for memory-constrained environments like IoT devices.
Hello, Mr. Sefalak
I have two questions:
1) Is it possible to access the test and training data of the dataset??? And can we retrain these models???
2) Are these models compressed (pruning, quantization, etc.) or not???
If the answers are yes, please provide the access link, thank you very much
In this post, we used a pre-trained VGG-Face model, so we did not perform training – it is already trained. That pre-trained model was trained on the VGG-Face data set, which is public and accessible on the VGG group web site. So, you can re-train this model.
Pruning is mostly discussed for decision tree algorithms, and in any case no pruning or quantization was applied here, so this is not a compressed model.