Large Scale Face Recognition with Spotify Annoy

Face recognition requires applying face verification many times. Nowadays, we already have state-of-the-art face verification models such as VGG-Face or FaceNet. However, scalability is the most challenging stage and it is mostly missing in studies. We will mention how to run face recognition on millions of instances with the awesome Spotify tool Annoy. Even though this tutorial mainly mentions face recognition, it will also guide anyone who wants to adopt approximate nearest neighbors (a-nn) in daily projects.

Painted Ladies in Silicon Valley by Burak Arik
Webinar

There are billions of images indexed by Google, but reverse image search returns responses in just seconds. This has nothing to do with the hardware it runs on. We will talk about Large Scale Machine Learning in this webinar.


🙋‍♂️ You may consider enrolling in my top-rated machine learning course on Udemy

Decision Trees for Machine Learning

As Ara Guler declared, if the best camera had taken the best photograph, then the best novelist would be the one who has the best typewriter.

Face recognition pipeline

Let’s remember what a modern face recognition pipeline is first. It consists of detection, alignment, representation and verification stages. Detection and alignment are pre-processing stages but they increase model accuracy dramatically. We mostly have a CNN in the representation stage. We feed a face image and the CNN returns a vector representation. We then compare represented vectors with some common formula such as euclidean distance or cosine similarity. We finally verify that two images belong to the same person if the similarity is high.
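
For instance, the comparison step boils down to a few lines of numpy. Here is a minimal sketch of both metrics; the verification threshold you compare against depends on the model in use.

import numpy as np

def euclidean_distance(a, b):
    #l2 distance between two vector representations
    a, b = np.array(a), np.array(b)
    return np.sqrt(np.sum(np.square(a - b)))

def cosine_similarity(a, b):
    #cosine of the angle between two vector representations
    a, b = np.array(a), np.array(b)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))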

Herein, the verification stage requires O(n x d) time complexity, where n is the number of instances and d is the number of dimensions in a vector.

Conventional Way

Feeding a facial image to a neural network and retrieving its output is the most costly operation in the face recognition pipeline. We could store the vector representations of faces beforehand in our database. In this way, we just need to find the vector representation of a target image. Then, finding distances between the target and database images can be handled very fast. We have already mentioned this approach in a blog post.

However, this is mainly based on the k nearest neighbor algorithm and it has a time complexity of O(n x d), where n is the number of instances and d is the number of dimensions in the vector representation. Notice that this might be problematic if you need to find an identity among millions of faces.
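
To make that cost concrete, a brute force k-NN search is nothing more than computing n distances and picking the smallest ones. The following is a minimal numpy sketch; db is an assumed n x d matrix of stored embeddings.

import numpy as np

def knn_brute_force(db, target, k = 3):
    #db is an (n, d) matrix of stored embeddings, target is a (d,) query vector
    distances = np.linalg.norm(db - target, axis = 1) #costs n x d operations
    return np.argsort(distances)[:k] #indices of the k closest rows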

The Spotify Way

Face recognition might seem a utopic case for searching in millions. However, Spotify has to handle this issue. They refresh playlist recommendations every day. The company has 70M users, 4M unique songs and 40 features. This means that they have to apply roughly 70M x 4M x 40 ≈ 1.12 x 10^16 operations, i.e. more than 11 peta operations (peta = 10 to the power of 15). This is really challenging!

Spotify

Herein, Spotify engineers implemented an approximate nearest neighbor approach and published it as an open-source package: annoy. Annoy comes with a huge speed-up but it does not guarantee to find the exact nearest one. It just approximates. It reduces the time complexity to O(log n).

Data set

We will use the images in this folder. Let’s walk the directory and collect the file names here.





import os

files = []
for r, d, f in os.walk("deepface/tests/dataset/"):
    for file in f:
        if '.jpg' in file:
            #join the directory root with the file name
            exact_path = os.path.join(r, file)
            files.append(exact_path)

Face recognition model

The deepface framework for python wraps several state-of-the-art face recognition models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, Dlib and ArcFace.

We are going to build a Google FaceNet model.
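
Building the model once avoids reloading the weights on every call. A minimal sketch is shown below; recent deepface versions also cache built models internally, so later calls with the same model_name reuse it.

from deepface import DeepFace

#build the FaceNet model once
model = DeepFace.build_model("Facenet")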

FaceNet, VGG-Face, Dlib and ArcFace outperform the others. Here, you can watch how to determine the best model.

Facial embeddings

Then, we will feed face images as input to the model and find vector representations of them. Face detection and alignment should be handled in a face recognition pipeline to increase the model accuracy, and deepface handles these pre-processing stages under the hood when representing an image. Finally, we will store each item name and its representation in the representations list.

from deepface import DeepFace

representations = []
for img_path in files:
    #represent runs detection and alignment before feeding the face to FaceNet
    embedding = DeepFace.represent(img_path = img_path, model_name = "Facenet")[0]["embedding"]
    representations.append([img_path, embedding])
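
Notice that detection and alignment can be tuned as well. Recent deepface versions expose them as arguments of the represent function; the values below are just illustrative.

from deepface import DeepFace

embedding = DeepFace.represent(
    img_path = "img1.jpg",
    model_name = "Facenet",
    detector_backend = "opencv", #detection stage
    align = True #alignment stage
)[0]["embedding"]
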
Synthetic data generation

However, there are just 60 instances here. We need a very large data set to push the limits of the annoy package. Finding millions of face images is hard. That’s why I will generate synthetic data. In this way, I will have millions of instances.

import random

embedding_size = 128 #FaceNet output size

for i in range(70, 1000000): #1M instances
    key = 'dataset/img_%d.jpg' % (i)
    #draw random vectors with a similar distribution to the real embeddings
    vector = [random.gauss(-0.35, 0.48) for z in range(embedding_size)]
    representations.append([key, vector])
Storing representations in annoy

We already have the vector representations of the database identities. We will add those items, including an index value and the vector representation, to an annoy index object. Then, the annoy index can be built. The build command expects the number of trees as an argument; annoy builds a forest of random projection trees.

from annoy import AnnoyIndex

embedding_size = 128 #FaceNet output size
t = AnnoyIndex(embedding_size, 'euclidean')

for i in range(0, len(representations)):
    representation = representations[i]
    img_path = representation[0]
    embedding = representation[1]

    t.add_item(i, embedding)

t.build(3) #3 trees

Herein, the adding items block lasts 3.04 seconds for 1M instances and building is completed in 7.29 seconds. However, if we have already built an annoy model, we can skip these stages.

Built annoy models can be stored and restored for the next runs.

#save the built model
t.save('result.ann')

#restore the built model
t = AnnoyIndex(embedding_size, 'euclidean')
t.load('result.ann')

The built model can find the nearest neighbors of a given item. Suppose that I would like to find the 3 nearest neighbors of the 1st item in the list.

idx = 0 #0 index item is the target
k = 3 #find the k nearest neighbors
neighbors = t.get_nns_by_item(idx, k+1) #k+1 because the item itself comes first

Herein, the get_nns_by_item function lasts 0.00018 seconds, which is really impressive. It can find the k nearest vectors among 1M vectors in just milliseconds!
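
That figure comes from simple wall-clock timing. A sketch of how you might measure it yourself:

import time

tic = time.time()
neighbors = t.get_nns_by_item(idx, k+1)
toc = time.time()
print(toc - tic, "seconds") #0.00018 seconds for 1M vectors in my environment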





Consider playlist recommendation in Spotify. They already have a built annoy model and they can recommend you a playlist in just milliseconds in this way.

The neighbors object stores the k nearest neighbors. The first one will be the queried item itself. That’s why I ask for the k+1 nearest neighbors.

print(representations[neighbors[0]][0], " is highly correlated with ",
    representations[neighbors[1]][0])

Annoy says img1.jpg is highly correlated with img4.jpg. That seems acceptable.
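
Notice that you can also query with a raw vector instead of a stored item. Annoy’s get_nns_by_vector function covers the case where the target face is not in the index; target.jpg below is a hypothetical image.

from deepface import DeepFace

#represent an unseen image and search the index with its vector
target_embedding = DeepFace.represent(img_path = "target.jpg", model_name = "Facenet")[0]["embedding"]
neighbors = t.get_nns_by_vector(target_embedding, k)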

Nearest neighbor
Brute force

We could find the most similar vector with a brute force method as well.

import numpy as np

def findEuclideanDistance(source_representation, test_representation):
    #embeddings are python lists; cast them to numpy arrays first
    euclidean_distance = np.array(source_representation) - np.array(test_representation)
    euclidean_distance = np.sum(np.multiply(euclidean_distance, euclidean_distance))
    euclidean_distance = np.sqrt(euclidean_distance)
    return euclidean_distance

idx = 0
source_embedding = representations[idx][1]

distances = []
for i in range(1, len(representations)):
    target_path = representations[i][0]
    target_embedding = representations[i][1]
    distance = findEuclideanDistance(source_embedding, target_embedding)
    distances.append(distance)

#distances list starts from the 1st item, so shift the argmin index by one
candidate_idx = np.argmin(distances)
print(representations[idx][0], " is highly correlated with ", representations[candidate_idx+1][0])

The brute force method returns the same identity as annoy but it performs slower.

Test results

I tested the annoy and brute force methods for 1M and 2M vectors. Annoy is up to 1.5 times faster than the brute force method for those sample sizes in my experiments.

vectors | annoy (seconds) | brute force (seconds)
1000000 | 10.37 | 12.21
2000000 | 31.41 | 46.88

BTW, those time values include the add_item and build stages of annoy. If you have already built the model, then finding the nearest neighbor lasts just milliseconds. For example, I can find the nearest neighbor among 1M vectors in just 0.00018 seconds. On the other hand, brute force requires 12.21 seconds.

Of course, one might ask what if I apply the brute force method and store all the distance candidates in a matrix. This requires storing a 1M x 1M shaped matrix. Allocating a matrix of that size in memory would be a problem. Secondly, you have to spend 12 seconds to compute a single row of the matrix. This means that filling the whole matrix takes 12M seconds = 200K minutes = 3333 hours = 138 days.
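
The back-of-the-envelope arithmetic below shows why; it assumes 4-byte floats and the 12 second per-row timing measured above.

n = 1000000
matrix_bytes = n * n * 4 #1M x 1M float32 distance matrix
print(matrix_bytes / 1024**4, "TB") #~3.6 TB of memory
row_seconds = 12 #measured brute force time per row
print(n * row_seconds / 86400, "days") #~139 days to fill the whole matrix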

BTW, you might realize that the distance from a to b and from b to a will be the same. That’s why the matrix is symmetric. This can decrease the required time by half, but you still need 69 days. You need 7 days even if you can run 10 computers in parallel. However, the a-nn approach lasts just 10 seconds serially. So, storing all distance candidates beforehand for 1M vectors is not meaningful.





Scalability

Spotify Annoy, Facebook Faiss and NMSLIB are amazing libraries enabling us to search on very large scale data sets fast. But they are core libraries and their scalability in production pipelines might be problematic. Herein, Elasticsearch wraps NMSLIB to perform the approximate nearest neighbor algorithm, and it comes with high scalability by default. In this way, we can run the a-nn algorithm on many clusters easily.
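
As an illustration, here is a sketch of creating an index with a dense_vector field via the Python Elasticsearch client. The index and field names are assumptions, and the exact a-nn query syntax depends on your Elasticsearch distribution and version.

from elasticsearch import Elasticsearch

es = Elasticsearch()

#hypothetical index storing 128-dimensional facial embeddings
es.indices.create(index = "face_recognition", body = {
    "mappings": {
        "properties": {
            "img_path": {"type": "keyword"},
            "embedding": {"type": "dense_vector", "dims": 128}
        }
    }
})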

Face recognition in deepface

Face recognition can be handled within deepface. It handles all common stages of a face recognition pipeline: detect, align, represent and verify. Also, it applies face verification several times in the background. All you need is to call its find function, and it returns a pandas data frame.

#!pip install deepface
from deepface import DeepFace

models = ['VGG-Face', 'Facenet', 'OpenFace', 'DeepFace', 'DeepID', 'Dlib']

df = DeepFace.find(img_path = "img.jpg", db_path = "C:/my_db", model_name = models[0])
print(df.head())

It wraps state-of-the-art VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID and Dlib face recognition models.

Here, you can watch how large scale face verification is handled in deepface.

Face recognition requires applying face verification several times. Deepface handles this in the background.
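
The underlying primitive is a one-to-one verification call; find applies this logic one-to-many. A minimal sketch:

from deepface import DeepFace

#one-to-one verification between two images
result = DeepFace.verify(img1_path = "img1.jpg", img2_path = "img2.jpg", model_name = "Facenet")
print(result["verified"])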

The other ann packages

Spotify Annoy is not the only approximate nearest neighbor implementation in the open source community. If you like this post, I strongly recommend you to read this tutorial:

Picking the right tool for the right job is important. If your use case builds the index once and searches it often, then search time is the key performance metric. On the other hand, if your use case requires re-building the index often, then adding embeddings and index building times are important. The following table demonstrates the times I spent for those a-nn packages on 1M vectors.

Performance of packages
Map Reduce Technology

The approximate nearest neighbor algorithm reduces the time complexity dramatically but it does not guarantee to always find the closest ones. If you have million-level data, big data systems and map reduce technology might match your expectations if your concern is not to discard important ones. Herein, MongoDB, Cassandra, Redis and Hadoop are the most popular solutions.

Elephant is an iconic mascot of Hadoop
Tech Stack Recommendations

Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize my tech stack recommendations in the following video.

Conclusion

We have mentioned the approximate nearest neighbor implementation of the Spotify team: annoy. It aims to accelerate the nearest neighbor algorithm. This is a must for operating on very big data. Spotify or YouTube have to handle peta-scale operations (10 to the power of 15) daily. Annoy comes with an advantage for this level of data. Besides, its get_nns_by_item command is very fast. If you have already built the annoy model, then you can find the nearest neighbor much faster than the brute force method.





Special thanks to Oguzhan Gencoglu for informing me about the approximate nearest neighbor concept and the annoy package.



