Face recognition technology is mainly built on face verification. Every scenario boils down to feeding two face photos to a convolutional neural network and retrieving their vector representations. A decision is then made based on the distance between those vectors, and that part is easy.
On the other hand, building a CNN model and calling its predict function are both costly operations. That is why applying face recognition to a large scale data set seems problematic. In this post, we will cover a workaround that handles large scale face recognition in an easy and fast way.
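To make the idea concrete, here is a minimal sketch of that flow with deepface: represent two photos as vectors and compare their cosine distance against a threshold. The file names and the 0.40 threshold are illustrative assumptions, not values from this post.

import numpy as np
from deepface import DeepFace

# illustrative file names; replace with your own photos
emb1 = np.array(DeepFace.represent(img_path = "img1.jpg", model_name = "VGG-Face")[0]["embedding"])
emb2 = np.array(DeepFace.represent(img_path = "img2.jpg", model_name = "VGG-Face")[0]["embedding"])

# cosine distance between the two vector representations
distance = 1 - np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))

# 0.40 is an assumed threshold; each model has its own tuned value
print("same person" if distance < 0.40 else "different persons")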
A Face Recognition Pipeline
Let’s remember the flow of a modern face recognition pipeline.
It consists of four common stages: detect, align, represent and verify. We will focus on the represent and verify stages in this post.
Big O Notation
Firstly, face verification has O(1) complexity whereas face recognition has O(n) complexity in big O notation. In other words, face recognition requires calling the face verification function n times, where n is the number of instances in the database.
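The following toy sketch shows why recognition is O(n): it simply calls deepface's verify function once per identity in the database. The identity file names are illustrative assumptions.

from deepface import DeepFace

# illustrative database of face photos
identities = ["alice.jpg", "bob.jpg", "carol.jpg"]

# face recognition = n face verifications
for identity in identities:
    result = DeepFace.verify(img1_path = "target.jpg", img2_path = identity)
    if result["verified"]:
        print("match:", identity)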
Vlog
Now, you can either watch the following vlogs or follow this blog post. They both cover the large scale face recognition topic.
The first vlog focuses on just face verification for face pairs.
The second one covers face recognition, which requires applying face verification several times; deepface handles this in the background.
Approximate Nearest Neighbor
But this might be problematic for billion-level data. In this case, we can apply an approximate nearest neighbor algorithm to decrease the complexity. Spotify Annoy, Facebook Faiss and NMSLIB are very popular a-nn libraries.
Herein, Elasticsearch wraps NMSLIB, and it comes with a highly scalable architecture: we can run it on many clusters.
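For illustration, here is a sketch of indexing and querying embeddings with the official Elasticsearch Python client. It assumes an 8.x-style cluster, an index layout of my own choosing, and (name, embedding) pairs plus a target embedding like the ones produced later in this post.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local instance

# assumed index layout: a keyword name plus a dense_vector embedding field
# set dims to your embedding size, e.g. len(target_representation)
es.indices.create(index = "faces", mappings = {
    "properties": {
        "name": {"type": "keyword"},
        "embedding": {
            "type": "dense_vector",
            "dims": 4096,
            "index": True,
            "similarity": "cosine",
        },
    }
})

# index each stored face (name and embedding from the represent step)
for name, embedding in representations:
    es.index(index = "faces", document = {"name": name, "embedding": embedding})

# approximate k-nn query for the target face
response = es.search(index = "faces", knn = {
    "field": "embedding",
    "query_vector": target_representation,
    "k": 5,
    "num_candidates": 100,
})
for hit in response["hits"]["hits"]:
    print(hit["_source"]["name"], hit["_score"])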
Face recognition is a complex task
Both building a face recognition model and calling its prediction function are costly operations.
Imagine the time required to look for a face in a data set consisting of 100 samples, based on the table above. Do we really need the building time plus 100 times the verification time? That would not be acceptable.
The hacker’s way
We might have hundreds of face photos in our database. The verification stage of the pipeline requires finding the distance between the vector of each image in the database and the vector of the target image.
The trick is that we can store the vector representations of the faces in our database in advance.
import os
import pickle
from deepface import DeepFace

#--------------------------
# walk the database folder and collect the exact paths of face photos
db_path = "my_db"  # folder storing the facial images

employees = []
for r, d, f in os.walk(db_path):  # r=root, d=directories, f=files
    for file in f:
        if '.jpg' in file:
            exact_path = r + "/" + file
            employees.append(exact_path)
#--------------------------
# find the vector embedding of each facial image
representations = []
for employee in employees:
    representation = DeepFace.represent(img_path = employee, model_name = "VGG-Face")[0]["embedding"]
    instance = []
    instance.append(employee)
    instance.append(representation)
    representations.append(instance)
#--------------------------
# store the embeddings on disk for later recognition tasks
f = open('representations.pkl', "wb")
pickle.dump(representations, f)
f.close()
So, we will already have representations of identities existing in the database when a face recognition task is called.
Then, we only need to find the vector representation of the single target face when we look for someone.
target_path = "target.jpg"

# detect and align the target face (represent also does this internally)
target_img = DeepFace.extract_faces(img_path = target_path)[0]["face"]

# find the vector embedding of the target image
target_representation = DeepFace.represent(img_path = target_path, model_name = "VGG-Face")[0]["embedding"]
Besides, we could build the face recognition model in advance and wait for a target face. In that case, we only need to spend the time mentioned in the verification row, which lasts less than a second even in the worst case scenario.
Then, all we need is to find the distances between the target vector and the source image vectors. As you can imagine, this can be handled very fast.
import pickle
import numpy as np
from deepface.commons import distance as dst  # moved to deepface.modules.verification in newer releases

# load representations of faces in the database
f = open('representations.pkl', 'rb')
representations = pickle.load(f)
f.close()

# find the distance between the target and every source embedding
distances = []
for i in range(0, len(representations)):
    source_name = representations[i][0]
    source_representation = representations[i][1]
    distance = dst.findCosineDistance(source_representation, target_representation)
    distances.append(distance)

# the identity with the minimum distance is the match
idx = np.argmin(distances)
matched_name = representations[idx][0]
My experiments show that finding an identity is completed in less than a second for a data set consisting of tens of instances if the face recognition model is already built.
Approximate Nearest Neighbor
As explained in this tutorial, facial recognition models are used to verify whether a face pair belongs to the same person or different persons. This is actually face verification rather than face recognition, because face recognition requires performing face verification many times. Now, suppose that you need to find an identity in a billion-scale database, e.g. the citizen database of a country, where a citizen may have many images. This problem has O(n log n) time complexity, where n is the number of entries in your database.
On the other hand, an approximate nearest neighbor algorithm reduces the time complexity dramatically to O(log n)! Vector indexes such as Annoy, Voyager and Faiss, and vector databases such as Postgres with pgvector and RediSearch, run this algorithm to find a similar vector to a given vector even among billions of entries in just milliseconds.
So, if you have a robust facial recognition model, then it is not a big deal to run it on billions of entries!
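As a minimal sketch of this idea, the snippet below builds a Spotify Annoy index over the representations stored earlier and queries the approximate nearest neighbors of the target embedding; the tree and neighbor counts are illustrative values.

from annoy import AnnoyIndex

# dimensionality is taken from the stored embeddings themselves
dims = len(representations[0][1])
index = AnnoyIndex(dims, "angular")  # angular metric approximates cosine distance

# add every stored embedding to the index
for i, (name, embedding) in enumerate(representations):
    index.add_item(i, embedding)

index.build(10)  # 10 trees: an illustrative speed vs accuracy trade-off

# approximate nearest neighbors of the target face
neighbor_ids = index.get_nns_by_vector(target_representation, 3)
for i in neighbor_ids:
    print(representations[i][0])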
DeepFace
Deepface offers an out-of-the-box find function to handle this task. All of those stages are handled in the background.
# !pip install deepface
from deepface import DeepFace

# find returns a list of pandas data frames
dfs = DeepFace.find(img_path = "target.jpg", db_path = "C:/my_db")
for df in dfs:
    print(df.head())
As seen, this can be handled in just a few lines of code.
Anti-Spoofing and Liveness Detection
What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, DeepFace includes an anti-spoofing feature for face verification and liveness detection.
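As a minimal sketch, assuming a recent deepface release where extract_faces accepts an anti_spoofing flag, spoofed inputs can be filtered before verification.

from deepface import DeepFace

# anti_spoofing is a flag available in recent deepface releases
face_objs = DeepFace.extract_faces(img_path = "target.jpg", anti_spoofing = True)
for face_obj in face_objs:
    # each detected face reports whether it looks real or spoofed
    print("is real:", face_obj["is_real"])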
Map reduce technology
An approximate nearest neighbor algorithm reduces the time complexity dramatically, but it does not guarantee to always find the nearest ones. Big data technologies and NoSQL databases come with the power of map reduce, and we can run the exact k-nn algorithm easily in this way. Herein, MongoDB, Cassandra, Redis and Hadoop are strong candidates.
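To illustrate the map reduce idea without installing any of those systems, here is a toy sketch in plain Python: the map step finds the best match within chunks of the database in parallel worker processes, and the reduce step picks the global minimum. It assumes the representations list and target_representation from the earlier snippets.

from multiprocessing import Pool

import numpy as np

def find_local_best(chunk):
    # map step: exact nearest neighbor within one chunk of the database
    best_name, best_distance = None, float("inf")
    for name, embedding in chunk:
        source = np.array(embedding)
        target = np.array(target_representation)
        distance = 1 - np.dot(source, target) / (np.linalg.norm(source) * np.linalg.norm(target))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance

if __name__ == "__main__":
    # split the database into 4 chunks and map them to worker processes
    chunks = [representations[i::4] for i in range(4)]
    with Pool(4) as pool:
        local_bests = pool.map(find_local_best, chunks)
    # reduce step: pick the global nearest neighbor among the local winners
    matched_name, min_distance = min(local_bests, key = lambda pair: pair[1])
    print(matched_name, min_distance)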
Tech Stack Recommendations
Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.
Conclusion
So, we have covered large scale face recognition in this blog post. Even though building a face recognition pipeline is a complex process, applying some hacking skills helps us find a workaround. In this way, we can find a given face in a large scale data set in just seconds.
You can support this study by starring the GitHub repo as well.
Does the code from your first code block assume that all of the images in the db_path directory are aligned / shaped in the correct way? If they're not (say the db_path has image files of Angelina Jolie taken off of Google Images), should there be an added step of extracting the faces before representing and pickling them?
Also, the second code block saves the output of extract_faces to the variable "target_img", but that variable isn't used. Should the target representation be created using "target_img" instead of "target_path"?
Yes, I suppose they are aligned already.