Billion-scale Fast Vector Similarity Search with Spotify Voyager

Spotify has long been at the forefront of music streaming and recommendation systems, leveraging cutting-edge technologies to enhance the user experience. Recently, however, its traditional approach of using Annoy for large-scale recommendation began to hit performance limits, prompting Spotify to explore new horizons. Enter Spotify Voyager, a solution that replaces Annoy's tree-based approximate nearest neighbor search with a more efficient alternative: HNSW (Hierarchical Navigable Small World) graphs. In this blog post, we apply Voyager to large-scale face recognition, harnessing the DeepFace library in Python. We will generate vector embeddings for a facial database, augment it with synthetic data to reach a million entries, and then search for a face that is not present in the database, retrieving similar images in just milliseconds. The same approach is not limited to faces or music: it can be adapted to any vector model, including embeddings produced for chatbots such as ChatGPT or other language models.

Gray and Black Galaxy Wallpaper From Pexels

Prerequisites

In this experiment, we will depend on the DeepFace library for Python to represent facial images as vectors, Spotify Voyager to store the vector embeddings in a vector index, numpy to manipulate the vector data, and the opencv and matplotlib pair to display results. To sum up, the requirements are deepface, voyager, numpy, opencv-python and matplotlib, and the imports are shown below.

# built-in dependencies
import os
import time

# third-party dependencies
import numpy as np
import cv2
import matplotlib.pyplot as plt
from deepface import DeepFace
from voyager import Index, Space

Configurations

DeepFace wraps many cutting-edge facial recognition models and face detectors. In this experiment, we will adopt the FaceNet and MTCNN pair. You may also consider a different configuration, such as VGG-Face with RetinaFace.

model_name = 'Facenet'
detector_backend = 'mtcnn'
num_dimensions = 128 # Facenet produces 128-dimensional vectors 
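
If you switch to a different facial recognition model, the embedding size will change as well. Rather than hardcoding it, one option is to infer the dimensionality from a sample embedding. This is a minimal sketch; the file name sample.jpg is a hypothetical placeholder for any image in your dataset.

# infer the vector size from a sample embedding so that num_dimensions
# always matches the chosen model_name ('sample.jpg' is a placeholder)
sample_objs = DeepFace.represent(
    'sample.jpg', model_name=model_name, detector_backend=detector_backend
)
num_dimensions = len(sample_objs[0]['embedding'])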

Finding embeddings for facial database

In this experiment, we will use the unit test images of DeepFace. DeepFace offers a represent function that transforms facial images into vector embeddings. The chosen facial recognition model, FaceNet, generates 128-dimensional vector embeddings. We will first store all of these embeddings in a dedicated list, and their corresponding image names in a separate list.

img_names = []
embeddings = []

for dirpath, dirnames, filenames in os.walk('deepface/tests/dataset/'):
    for filename in filenames:
        if filename.lower().endswith('.jpg'):
            # build the full path of the image
            img_name = os.path.join(dirpath, filename)

            # represent returns a list of embedding objects, one per detected face
            embedding_objs = DeepFace.represent(
                img_name, model_name=model_name, detector_backend=detector_backend
            )
            embedding = embedding_objs[0]['embedding']

            embeddings.append(embedding)
            img_names.append(img_name)

Creating synthetic data

The unit test folder in DeepFace currently contains 62 items, which is insufficient for conducting large-scale vector search experiments. To address this limitation, we will generate synthetic data to scale up our vector database to the million-level, providing a more challenging benchmark for testing the capabilities of the Voyager index.

target_size = 1000000
for i in range(len(embeddings), target_size):
    embedding = np.random.uniform(-5, +5, num_dimensions)
    embeddings.append(embedding)
    img_names.append(f'synthetic_{i}.jpg')

embeddings_np = np.array(embeddings)

Now we have a million-level vector database!
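
As a quick sanity check, we can confirm the scale and dimensionality of the stacked embeddings before indexing them.

# expected shape: (1000000, 128)
print(embeddings_np.shape)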

Storing embeddings into Voyager Index

We will begin by initializing the Voyager index, configuring its distance metric, and specifying the number of dimensions of the vector embeddings. In this experiment, we will opt for Euclidean distance, which Voyager exposes as Space.Euclidean. As previously mentioned, FaceNet generates 128-dimensional vectors, so the num_dimensions argument will be set to 128. Voyager then offers an add_items function that accepts a 2-dimensional array: one dimension is the number of instances, 1 million in our case, and the other is the dimensionality of a single vector embedding, which is 128. Adding all vectors in one call is significantly more efficient than adding items one by one in a conventional for loop.

index = Index(Space.Euclidean, num_dimensions=num_dimensions)
index.add_items(embeddings_np)
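
To measure the insertion time yourself, you can wrap the add_items call with wall-clock timestamps. This is a minimal sketch that uses the time module imported earlier.

# measure how long it takes to index all embeddings in one call
tic = time.time()
index.add_items(embeddings_np)
toc = time.time()
print(f'{len(embeddings_np)} vectors stored in {toc - tic:.2f} seconds')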

In my case, the Voyager index stored 1 million vectors with 128 dimensions in a mere 27.64 seconds.

Finding a target image in Voyager Index

Next, we will conduct a search in the Voyager index for an item that is not present in our facial database.

Target Image – Not Available In Facial Database

We must first obtain its vector embedding using DeepFace.

target_img = 'target.jpg'
embedding_objs = DeepFace.represent(
    target_img, model_name=model_name, detector_backend=detector_backend
)
target_embedding = embedding_objs[0]['embedding']

Vector Similarity Search

We will query the constructed Voyager index with the target embedding. The query method also expects k, the number of nearest neighbors to return. In this case, we will search for the 3 nearest neighbors.

neighbors, distances = index.query(target_embedding, k=3)
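
As with insertion, the query latency can be measured with simple wall-clock timestamps. Again, this is a minimal sketch using the time module imported earlier.

# measure the latency of a single nearest-neighbor query
tic = time.time()
neighbors, distances = index.query(target_embedding, k=3)
toc = time.time()
print(f'query completed in {toc - tic} seconds')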

In my case, the Voyager index answered the query in about 0.00023 seconds (roughly 0.2 milliseconds) among 1 million vectors with 128 dimensions. That is a remarkable level of performance!

Results

The query method of Voyager returns the indexes of the nearest neighbors along with their distances. Since we stored the corresponding image names in the img_names list in the same order, we can use the same indexes to retrieve the original image names. We can then use matplotlib to plot the target and each neighbor side by side.

target_img = cv2.imread('target.jpg')

for i, neighbor in enumerate(neighbors):
    # map the neighbor index back to the original image name
    img_name = img_names[neighbor]
    label = img_name.split('/')[-1]
    distance = distances[i]
    print(
        f'{i+1}. nearest neighbor is {label} with distance {round(distance, 2)}'
    )

    # show the target image and its neighbor side by side
    fig = plt.figure(figsize=(7, 7))

    fig.add_subplot(1, 2, 1)
    plt.imshow(target_img[:, :, ::-1])  # BGR to RGB
    plt.axis('off')

    fig.add_subplot(1, 2, 2)
    img = cv2.imread(img_name)
    plt.imshow(img[:, :, ::-1])  # BGR to RGB
    plt.axis('off')

    plt.show()

Upon plotting the 3-nearest neighbors of the target image, it becomes evident that all the images feature Angelina Jolie. This outcome strongly indicates the successful end-to-end functionality of Voyager.

Nearest Neighbors

Highlights

According to the Spotify Engineering blog post, Voyager exceeds Annoy in speed by more than 10 times at the same recall, or achieves up to 50% more accuracy at the same speed. This efficiency and accuracy make Voyager a standout choice for large-scale vector similarity search applications. In addition, Voyager uses up to 4 times less memory than Annoy, underlining its efficiency in resource utilization.

One notable advantage of Voyager over Annoy is that new vectors can be added to an existing index at any time and then searched immediately. Annoy, in contrast, requires all items to be added before the index is built and cannot accept new items afterwards. This distinction holds particular significance in the context of playlist or music recommendations, where the catalog keeps growing: Voyager lets the index evolve together with the data, enhancing the flexibility and adaptability of the recommendation engine.
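
As a minimal sketch of this workflow with the index built above, a new embedding can be inserted with add_item and queried right away; the vector below is randomly generated purely for illustration.

# insert a brand-new vector into the already-built index
new_embedding = np.random.uniform(-5, +5, num_dimensions)
new_id = index.add_item(new_embedding)

# the newly added vector is immediately searchable
neighbors, distances = index.query(new_embedding, k=3)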

One notable drawback of Voyager compared to Annoy is its more limited operating system support. While Annoy supports Windows, the readme of Voyager's GitHub repo states that it requires Linux or macOS, which is a constraint for those working in a Windows environment. On the other hand, the official blog post mentions compatibility with the Windows operating system. I ran my experiments on macOS, so if you try this on Windows, please leave a comment on this blog post.

Conclusion

In conclusion, our journey into large-scale face recognition with Spotify Voyager has shown how much the choice of index structure matters in recommendation systems. By stepping away from the conventional Annoy approach and embracing HNSW, Spotify has not only addressed performance challenges but also set a new standard in speed and accuracy. Generating vector embeddings with the DeepFace library, scaling the facial database with synthetic data, and querying a million vectors in a fraction of a millisecond demonstrate that the pipeline works end to end. Beyond music and faces, the approach adapts to any vector model, such as embeddings for chatbots or language models, which opens up exciting possibilities for diverse applications. Spotify Voyager's ability to identify faces quickly in a vast database, coupled with its potential for broader vector-based models, marks a significant stride forward in the evolution of recommendation systems.

I pushed the source code of this experiment to GitHub. If you like this post, you can support this work by starring ⭐ its repo.
