Large Scale Face Recognition with Facebook Faiss

The Facebook research team developed an amazing product – Faiss – to handle the large scale similarity search problem. The name of the library comes from Facebook AI Similarity Search. Scalability is mostly ignored in facial recognition studies. We will adopt Facebook Faiss for a large scale face recognition task in this post. However, this tutorial will also guide you if you have a similarity search problem, such as image or document search, in a data set of millions.

Mark Zuckerberg
Webinar

There are billions of images indexed by Google but reverse image search returns responses in just seconds. This is not just a matter of the hardware Google has. We will talk about large scale machine learning in this video.



As Ara Güler said, if the best camera took the best photograph, then the owner of the best typewriter would be the best novelist.

Face recognition pipeline

We will apply the approximate nearest neighbor (a-nn) method with the Facebook Faiss library. You should remember how a face recognition pipeline works. A modern face recognition pipeline consists of 4 common stages: detect, align, represent and verify.

The detection and alignment stages improve the model accuracy but they are not a must. The representation stage is mainly based on feeding face images to a CNN model, which returns a multidimensional vector. The verification stage is based on finding the distance between the representations of a pair. If the distance is less than a threshold, the face pair will be verified as the same person. In this post, we will mainly focus on the verification stage of a face recognition pipeline.
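
As a minimal sketch of that verification logic, assuming two embeddings produced by the same model and a purely illustrative threshold value of 10 (the real threshold depends on the model and the distance metric):

import numpy as np

def verify(embedding1, embedding2, threshold = 10):
    # euclidean distance between the two representation vectors
    distance = np.linalg.norm(np.array(embedding1) - np.array(embedding2))
    # pairs closer than the threshold are accepted as the same person
    return distance < threshold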

Face recognition requires applying the explained pipeline several times. It has O(n x d) time complexity where n is the number of instances in the database and d is the number of dimensions of the representation vector. This becomes problematic for very large data sets. A-nn algorithms reduce the complexity to O(log n) levels. Herein, Faiss is a strong a-nn library.

Installation

Faiss is not supported on Windows. I ran all of my experiments on Linux. Besides, the library is recommended to be installed with conda, but I could not install it that way. I was able to install it with the unofficial pypi package below.

#!pip install --user faiss-cpu
Data set

The unit test folder of deepface contains several face images. I will represent those images in this study. Let's walk the directory and collect the file names first.

import os

# collect the paths of all jpg images under the deepface unit test folder
files = []
for r, d, f in os.walk("deepface/tests/dataset/"):
    for file in f:
        if file.endswith(".jpg"):
            files.append(os.path.join(r, file))

Face recognition model

The deepface framework for Python wraps several state-of-the-art face recognition models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, Dlib and ArcFace.

We will build a Google FaceNet model and represent face images as vectors. The represent function of deepface detects and aligns faces before feeding them to the FaceNet model. Then, we will store both the image name and its representation in a list.

from deepface import DeepFace

# represent each face image as a 128 dimensional FaceNet embedding
representations = []
for img_path in files:
    embedding = DeepFace.represent(img_path = img_path, model_name = "Facenet")[0]["embedding"]
    representations.append([img_path, embedding])

FaceNet, VGG-Face, Dlib and ArcFace outperform the others. Here, you can watch how to determine the best model.

Synthetic data

There are almost 60 instances in the deepface unit test directory. I will generate synthetic data to make the problem much more complex.

import random

embedding_size = 128 # FaceNet represents faces as 128 dimensional vectors

# pad the data set with random vectors up to 1M instances
for i in range(70, 1000000):
    key = 'dataset/img_%d.jpg' % (i)
    vector = [random.gauss(-0.35, 0.48) for z in range(embedding_size)]
    representations.append([key, vector])

Faiss expects a 2-dimensional matrix as a float32 numpy array. That's why I will convert the representations list into the required format.

import numpy as np

# extract just the embedding vectors and cast them to a float32 numpy array
embeddings = []
for i in range(0, len(representations)):
    embedding = representations[i][1]
    embeddings.append(embedding)

embeddings = np.array(embeddings, dtype='f')

Now, the embeddings object is a (1M, 128) shaped float32 numpy array. In other words, it stores 1M different 128-dimensional vectors in float32 format.
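
You can quickly confirm that the matrix has the shape and type Faiss expects:

print(embeddings.shape) # roughly (1000000, 128)
print(embeddings.dtype) # float32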

Target image

We have stored a million face representations in the representations object. Let's find the nearest neighbors of the following target image with Faiss.

Target image
target_representation = DeepFace.represent(img_path = "target.jpg", model_name = "Facenet")[0]["embedding"]

Here, the target representation is a 128-dimensional vector but Faiss expects a 2D matrix. That's why I will add a dummy dimension to the target representation. Besides, Faiss expects a float32 numpy array.

target_representation = np.array(target_representation, dtype='f')
target_representation = np.expand_dims(target_representation, axis=0)

Now, the target representation is a (1, 128) shaped float32 numpy array.

Initializing faiss index

We need to initialize the Faiss index first. I show below how to use different distance metrics such as euclidean distance or cosine similarity. Initialization takes 0.000261 seconds.

import faiss

dimensions = 128 # FaceNet output is a 128 dimensional vector

metric = 'euclidean' # euclidean or cosine

if metric == 'euclidean':
    index = faiss.IndexFlatL2(dimensions)
elif metric == 'cosine':
    # inner product on L2-normalized vectors is equivalent to cosine similarity
    index = faiss.IndexFlatIP(dimensions)
    faiss.normalize_L2(embeddings)

We can feed the vectors in bulk once the Faiss index is initialized. Here, the embeddings object is a (1M, 128) shaped numpy array. Notice that it has to be float32. Adding 1M 128-dimensional vectors to the Faiss index takes 0.222926 seconds. This is amazing!

index.add(embeddings)
Save and restore the index

The number of vectors might be much larger in your case. Then, you might want to skip the embedding generation and index building stages on later runs. Herein, you can save the built index and restore it as illustrated below.

#save
faiss.write_index(index,"vector.index")

#restore
index = faiss.read_index("vector.index")

Saving takes 0.7817 seconds whereas restoring takes 1.2570 seconds. The built index is 512 MB on disk for 1M 128-dimensional vectors.
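
If you want to check the index size yourself, a one-liner is enough:

import os
print(os.path.getsize("vector.index") / (1024 * 1024)) # prints the size of the saved index in MiB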

Finding nearest neighbor

We have a million vectors in memory. Those are the database images. Now, I want to find the nearest neighbors of the target image. Production actually starts here because the previous stages could be handled as a batch operation.

k = 3
distances, neighbors = index.search(target_representation, k)

Search takes 0.2537 seconds. This is amazing! I can find the nearest neighbors of an identity among millions in just milliseconds. Besides, it returns the distance values as well. In this way, I can verify whether those images belong to the same person as the target image.

The neighbors object stores the index values of the nearest neighbors of the target image. The representations object already stores the image names. Let's visualize the nearest ones. The results are very satisfactory. We can find them in just 250 milliseconds!
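
As a quick sketch of that lookup, assuming the representations list built earlier in this post, the returned index values can be mapped back to image paths and distances as follows:

# neighbors and distances are (1, k) shaped because we searched for a single target
for idx, distance in zip(neighbors[0], distances[0]):
    img_name = representations[idx][0]
    print(img_name, distance)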

Nearest neighbors of the target

Now, you might imagine how Google Image Search works. When you drag and drop an image into the search bar, it is represented as a vector. Google has already stored the vector representations of billions of images. Image search then basically applies an approximate nearest neighbor algorithm and returns the most similar ones. To sum up, image and document search is mainly based on a CNN model that represents entities as vectors and an a-nn algorithm run on those vectors.

Facebook Faiss vs Spotify Annoy

In Spotify Annoy, we have to build an index before saving it, and we cannot add new items to a loaded index. Still, we can keep an index in memory and add items into it as long as we don't save it.

This matters if you will look for a new item in your existing database. Imagine that you have millions of identities in your database and you will apply face recognition on real-time webcam data. A captured webcam frame is new and was not stored in the database before. In Annoy, the vector must already be in the index to find its nearest neighbors, so we would have to append it to the index first.

In contrast, in Faiss the vector whose nearest neighbors you are looking for does not have to be in the index. You can build an index with your database images and look for a new identity in it.

In my opinion, Spotify Annoy is more convenient for finding the nearest neighbors of existing entities, such as finding similar customers, whereas Facebook Faiss is more convenient for finding the nearest neighbors of new entities, as in image or document search or face recognition.

Besides, Facebook Faiss can be run on both CPU and GPU whereas Spotify Annoy can be run on just CPU.
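
For example, assuming the faiss-gpu package is installed, a CPU index can be cloned to the available GPUs as sketched below:

# clone the flat CPU index to all available GPUs (requires faiss-gpu)
gpu_index = faiss.index_cpu_to_all_gpus(index)
distances, neighbors = gpu_index.search(target_representation, k)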

I've mentioned that the installation of Faiss is a little bit problematic and it is not supported on Windows. However, Spotify Annoy is very stable; I could run it on my first try.

The other ann packages

Spotify Annoy is not the only approximate nearest neighbor implementation in the open source community. If you like this post, I strongly recommend reading this tutorial as well:

Picking the right tool for the right job is important. If your use case rarely requires re-building the index, then search time is the key performance metric. On the other hand, if your use case requires re-building the index often, then the times for adding embeddings and building the index matter as well. The following table shows the times I measured for those a-nn packages on 1M vectors.

Performances of ann packages
Scalability

Spotify Annoy, Facebook Faiss and NMSLIB are amazing libraries that enable us to search very large scale data sets fast. But they are low-level libraries, and their scalability in production pipelines might be problematic. Herein, Elasticsearch wraps NMSLIB to perform approximate nearest neighbor search and it comes with high scalability by default. In this way, we can run an a-nn algorithm on many clusters easily.

Face recognition in deepface

This technical stuff might discourage and confuse you. Herein, the deepface package for Python offers large scale face recognition with just a few lines of code. It wraps state-of-the-art face recognition models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID and Dlib.

It returns a pandas data frame of the strong candidates for an identity.

#!pip install deepface
from deepface import DeepFace

models = ['VGG-Face', 'Facenet', 'OpenFace', 'DeepFace', 'DeepID', 'Dlib']
df = DeepFace.find(img_path = 'img1.jpg', db_path = 'C:/my_db', model_name = models[0])

print(df.head())

Here, you can watch large scale face verification within the deepface library.

Face recognition requires applying face verification several times. Deepface handles this in the background.

Map Reduce Technology

The approximate nearest neighbor algorithm reduces the time complexity dramatically but it does not always guarantee finding the closest ones. If you have data at the millions scale and your concern is not to discard any important candidates, big data systems and map reduce technology might meet your needs. Herein, MongoDB, Cassandra, Redis and Hadoop are the most popular solutions.
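
For contrast, here is a minimal numpy sketch of the exact search that such systems distribute: it is guaranteed to find the true closest vectors, but it scans all n instances in O(n x d) time.

import numpy as np

def exact_nearest_neighbors(target, embeddings, k = 3):
    # euclidean distance from the target to every stored vector: O(n x d)
    distances = np.linalg.norm(embeddings - target, axis = 1)
    # indexes of the k smallest distances
    neighbors = np.argsort(distances)[0:k]
    return neighbors, distances[neighbors]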

The elephant is the iconic mascot of Hadoop
Tech Stack Recommendations

Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.

Conclusion

So, we've covered an implementation of the approximate nearest neighbor algorithm – Faiss. It has some pros and cons compared to Spotify Annoy. It comes with very high speed and practicality. However, its installation is a little bit problematic. Even though this post applies the Faiss library to a face recognition task, you can adapt it to any problem that requires similarity search.

I pushed the source code of this study to GitHub. You can support this work by starring the repo.




6 Comments

  1. k = 3
    distances, neighbors = index.search(target_representation, k)

    Maybe this is a silly question, but how can I show which images the returned values correspond to?

    1. Looking at the indexes returned in neighbors will be enough.

      for i in range(0, len(neighbors[0])):
          item = representations[neighbors[0][i]]
          file_name = item[0]

  2. What is the threshold for determining that two faces belong to the same person when using faiss?
