Face recognition is mainly based on similarity search over facial embeddings. Research mostly focuses on storing vector representations in memory. However, production-driven solutions must interact with a database. Herein, MongoDB is a handy tool to store facial vector representations. We can easily use it in the cloud or on-premise. Besides, we can work as a thin client and expect the server side to do the heavy calculations. It is highly scalable, and we get the power of map reduce.
Vlog
Watch this video to see how to perform vector similarity search in MongoDB with a deep face recognition use case.
🙋‍♂️ You may consider enrolling in my top-rated machine learning course on Udemy.
Dependencies
We will need the pymongo and dnspython packages to communicate with MongoDB from Python. You will also need MongoDB Compass as a visual editor. Finally, I will use MongoDB in the cloud, which doesn't require any installation.
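The dependencies above, plus the libraries used later in this post, can be installed with pip. These are the common PyPI package names; pin versions as you see fit.

```shell
pip install pymongo dnspython deepface pandas tqdm
```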
I've created the database and collection (equivalent to regular tables in RDBMSs) within Compass. Both the database and the collection will be named deepface in this experiment.
Connecting to MongoDB
The pymongo distribution offers a Mongo client. We can connect to our MongoDB instance with a basic connection string, which basically expects a username and password pair. Once the client connection is established, we access the deepface database we created in the previous stage.
```python
from pymongo import MongoClient

connection = "mongodb+srv://..."
client = MongoClient(connection)

database = 'deepface'; collection = 'deepface'
db = client[database]
```
Face recognition pipeline
A modern face recognition pipeline consists of 4 common stages: detect, align, represent and verify.
Luckily, deepface framework for python wraps several state-of-the-art face recognition models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, Dlib and ArcFace.
We are going to store the facial embeddings in MongoDB instead of the images themselves. That's why we will use the deepface framework and the Google FaceNet model to represent facial images as vector embeddings.
```python
#!pip install deepface
from deepface import DeepFace
```
FaceNet, VGG-Face and ArcFace outperform the others. Here, you can watch how to determine the best model.
Loading images
The unit test content of deepface will serve as the facial database in this experiment. Let's find the image names in that folder first.
```python
import os

facial_img_paths = []
for root, directory, files in os.walk("deepface/tests/dataset"):
    for file in files:
        if '.jpg' in file:
            facial_img_paths.append(root + "/" + file)
```
Once the exact image paths and names are stored, we will find the vector representations of the facial images. Herein, the represent function of deepface detects and aligns faces under the hood. The FaceNet model expects 160x160x3 inputs and generates a 128-dimensional output.
```python
from tqdm import tqdm

instances = []
for i in tqdm(range(0, len(facial_img_paths))):
    facial_img_path = facial_img_paths[i]
    embedding = DeepFace.represent(img_path = facial_img_path, model_name = "Facenet")[0]["embedding"]

    instance = []
    instance.append(facial_img_path)
    instance.append(embedding)
    instances.append(instance)
```
Face detection
To get more information about face detection and alignment in deepface, you should watch the following video.
Deepface wraps several face detectors: OpenCV, SSD, Dlib and MTCNN. Here, MTCNN is the most robust but it is really slow. SSD is the fastest but its alignment score is lower than MTCNN's. You can watch their detection performance in the following video.
Herein, RetinaFace is the cutting-edge technology for face detection. It can even detect faces in a crowd. Besides, it finds some facial landmarks, including eye coordinates, so its alignment score is high as well.
Preprocessing
We have the exact image names and embeddings in the instances list. I prefer to store this 2D information in pandas instead of a Python list.
```python
import pandas as pd

df = pd.DataFrame(instances, columns = ["img_name", "embedding"])
df.head()
```
The pandas data frame stores the exact image names and the facial embeddings as columns.
Storing embeddings in MongoDB
We will walk over the rows of the pandas data frame and save the image name and embedding column values into the document database.
```python
for index, instance in tqdm(df.iterrows(), total = df.shape[0]):
    db[collection].insert_one({"img_path": instance["img_name"], "embedding": instance["embedding"].tolist()})
```
Interestingly, if you try to save the embedding with the command list(instance["embedding"]), then you will get the exception: cannot encode object: …, of type.
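The reason is that pandas stores the embedding with numpy scalar types, which the BSON encoder does not know how to serialize, whereas tolist() converts every item to a native Python float. A minimal sketch of the difference, with a made-up 4-dimensional embedding:

```python
import numpy as np

# a fake 4-dimensional embedding, as numpy would store it
embedding = np.array([0.1, -0.2, 0.3, 0.4], dtype=np.float32)

# list() keeps numpy scalar types, which BSON cannot encode
print(type(list(embedding)[0]))  # <class 'numpy.float32'>

# tolist() converts every item to a native Python float
plain = embedding.tolist()
print(type(plain[0]))            # <class 'float'>
```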
We stored the facial embeddings in MongoDB. When I check it in Compass, I see that a document has image path and embedding fields. Also, the embedding field is of type array and it stores a 128-dimensional vector.
When a new user joins our team, we just add their embedding into the facial database. We don't have to find the embeddings of the existing users again.
Target face
We stored our facial database in MongoDB. Now, we will look for an identity in our document-based database. We will apply the preprocessing and embedding stages to the target image.
```python
target_img_path = "target.jpg"
target_img = DeepFace.extract_faces(img_path = target_img_path)[0]["face"]
target_embedding = DeepFace.represent(img_path = target_img_path, model_name = "Facenet")[0]["embedding"]
```
Some queries
We can query existing image names or embeddings from MongoDB as shown below. But that doesn't satisfy our requirement, because the target image does not appear in Mongo.
```python
db[collection].find_one({'img_path': 'deepface/tests/dataset/img1.jpg'})
db[collection].find_one({'embedding': df.iloc[0].embedding.tolist()})
```
We can retrieve all image embeddings and compare each one with the target image. In this way, I can find the closest one. However, that's not ideal either, because the number of images in Mongo might be in the millions. Besides, I would spend all the processing power on the client side.
```python
documents = db[collection].find()
for document in documents:
    print(document["img_path"], document["embedding"])
```
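For illustration, the client-side brute-force comparison would look like the sketch below. The documents list and the target embedding are stubbed with toy 3-dimensional vectors here; in the real flow they would come from the find() cursor and from DeepFace.represent respectively.

```python
import math

# stand-ins for db[collection].find() results and the target embedding
documents = [
    {"img_path": "img1.jpg", "embedding": [0.1, 0.2, 0.3]},
    {"img_path": "img2.jpg", "embedding": [0.9, 0.8, 0.7]},
    {"img_path": "img3.jpg", "embedding": [0.2, 0.1, 0.4]},
]
target_embedding = [0.1, 0.2, 0.3]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# O(n) scan over every stored embedding
distances = [(doc["img_path"], euclidean(doc["embedding"], target_embedding))
             for doc in documents]
closest = min(distances, key=lambda pair: pair[1])
print(closest)  # ('img1.jpg', 0.0)
```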
Power of mongo
As a better solution, I can spend the processing power on the server side. I mean that I will send just the embedding of the target image to Mongo, and then I expect Mongo to find the closest one. The client might be an edge device such as a Raspberry Pi in this case. It will just query and get the results.
The first question is how to send an array to Mongo.
```python
query = db[collection].aggregate([
    {"$addFields": {"target_embedding": target_embedding}}
])
```
This will return the existing image path and embedding fields, and also the target embedding I passed above.
```
{
    '_id': ObjectId('600b1eed023a302a7526a2a6'),
    'img_path': 'deepface/tests/dataset/img1.jpg',
    'embedding': [1.0574054718017578, 1.096140742301941, 1.2643184661865234, …],
    'target_embedding': [-0.5445538759231567, 0.24550628662109375, 0.687471330165863, …]
}
```
Euclidean distance
I need to find the Euclidean distance value for each document in the collection. This requires finding, for each index, the difference between the items of the embedding and target embedding arrays first.
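In plain Python, the computation we will build up step by step in the aggregation pipeline looks as follows; the toy 3-dimensional vectors are made up for illustration.

```python
import math

embedding = [1.0, 2.0, 3.0]          # stored vector (toy values)
target_embedding = [0.5, 2.5, 2.0]   # queried vector (toy values)

# 1) pair the items that share the same index
pairs = list(zip(embedding, target_embedding))

# 2) subtract and square each pair, then sum the squares
summed = sum((e - t) ** 2 for e, t in pairs)

# 3) take the square root to finalize the Euclidean distance
distance = math.sqrt(summed)
print(distance)  # sqrt(1.5) ≈ 1.2247
```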
Subtracting two arrays can be handled with the unwind stage in Mongo.
```python
{"$addFields": {"target_embedding": target_embedding}}
, {"$unwind": {"path": "$embedding"}}
, {"$unwind": {"path": "$target_embedding"}}
```
Matching unwind dimensions
However, unwinding two arrays causes a cartesian product. If you add the includeArrayIndex argument in the unwind stage, then you can add a basic comparison rule: if the source and target embedding indexes are the same, the comparison will return 0. That's why we filter for the zeros with the match stage.
```python
{"$addFields": {"target_embedding": target_embedding}}
, {"$unwind": {"path": "$embedding", "includeArrayIndex": "embedding_index"}}
, {"$unwind": {"path": "$target_embedding", "includeArrayIndex": "target_index"}}
, {"$project": {
    "img_path": 1, "embedding": 1, "target_embedding": 1,
    "compare": {"$cmp": ['$embedding_index', '$target_index']}
}}
, {"$match": {"compare": 0}}
```
Group by clause
Now, there will be 128 lines for each image. Luckily, we can group them by the image path key. Once the array items of the same image path are grouped, we can run the Euclidean distance formula. Remember that it requires subtracting each dimension first, taking the power of 2 second, and finding the sum of those powers third.
```python
{"$addFields": {"target_embedding": target_embedding}}
, {"$unwind": {"path": "$embedding", "includeArrayIndex": "embedding_index"}}
, {"$unwind": {"path": "$target_embedding", "includeArrayIndex": "target_index"}}
, {"$project": {
    "img_path": 1, "embedding": 1, "target_embedding": 1,
    "compare": {"$cmp": ['$embedding_index', '$target_index']}
}}
, {"$match": {"compare": 0}}
, {"$group": {
    "_id": "$img_path",
    "distance": {"$sum": {"$pow": [{"$subtract": ['$embedding', '$target_embedding']}, 2]}}
}}
```
Finalize Euclidean distance formula
We finally need to find the square root of the sum.
```python
{"$addFields": {"target_embedding": target_embedding}}
, {"$unwind": {"path": "$embedding", "includeArrayIndex": "embedding_index"}}
, {"$unwind": {"path": "$target_embedding", "includeArrayIndex": "target_index"}}
, {"$project": {
    "img_path": 1, "embedding": 1, "target_embedding": 1,
    "compare": {"$cmp": ['$embedding_index', '$target_index']}
}}
, {"$match": {"compare": 0}}
, {"$group": {
    "_id": "$img_path",
    "distance": {"$sum": {"$pow": [{"$subtract": ['$embedding', '$target_embedding']}, 2]}}
}}
, {"$project": {"_id": 1, "distance": {"$sqrt": "$distance"}}}
```
Discarding distant ones
We discard face pairs whose distance value is greater than 10 for the FaceNet and Euclidean distance pair. We should add this rule to the query. In this way, Mongo will return just the identities matching the target image.
```python
query = db[collection].aggregate([
    {"$addFields": {"target_embedding": target_embedding}}
    , {"$unwind": {"path": "$embedding", "includeArrayIndex": "embedding_index"}}
    , {"$unwind": {"path": "$target_embedding", "includeArrayIndex": "target_index"}}
    , {"$project": {
        "img_path": 1, "embedding": 1, "target_embedding": 1,
        "compare": {"$cmp": ['$embedding_index', '$target_index']}
    }}
    , {"$match": {"compare": 0}}
    , {"$group": {
        "_id": "$img_path",
        "distance": {"$sum": {"$pow": [{"$subtract": ['$embedding', '$target_embedding']}, 2]}}
    }}
    , {"$project": {"_id": 1, "distance": {"$sqrt": "$distance"}}}
    , {"$project": {"_id": 1, "distance": 1, "cond": {"$lte": ["$distance", 10]}}}
    , {"$match": {"cond": True}}
    , {"$sort": {"distance": 1}}
])

for i in query:
    print(i)
```
Results
Mongo returns the following images as the same person as the target image. There are lots of people in the unit test folder but it returns just the faces of Angelina Jolie.
```
{'_id': 'deepface/tests/dataset/img2.jpg', 'distance': 7.0178008611285865, 'cond': True}
{'_id': 'deepface/tests/dataset/img10.jpg', 'distance': 7.629044595250684, 'cond': True}
{'_id': 'deepface/tests/dataset/img6.jpg', 'distance': 8.467856464704305, 'cond': True}
{'_id': 'deepface/tests/dataset/img4.jpg', 'distance': 8.58306099238241, 'cond': True}
{'_id': 'deepface/tests/dataset/img7.jpg', 'distance': 8.762336403333999, 'cond': True}
{'_id': 'deepface/tests/dataset/img11.jpg', 'distance': 8.944895856870483, 'cond': True}
{'_id': 'deepface/tests/dataset/img1.jpg', 'distance': 9.297199074856135, 'cond': True}
```
Obviously, we can apply the same approach for reverse image search, similar to Google Images.
Lightweight way
MongoDB comes with high scalability and this is very good for big data. However, you might not always need that much scale. A lightweight way exists!
Validation
The query has become very complex and we might have made a mistake. That's why we should find the closest ones on the client side as well. We can do it with deepface easily.
```python
from deepface import DeepFace

target_img_path = "target.jpg"

dfs = DeepFace.find(img_path = target_img_path
    , db_path = 'deepface/tests/dataset'
    , model_name = 'Facenet'
    , distance_metric = 'euclidean'
    , detector_backend = 'opencv')

print(dfs[0].head(10))
```
This returns exactly the same results as Mongo. So, everything seems fine.
To get more information about the verify and find functions of the deepface framework, you should watch the following videos.
Applying face verification several times in the background.
Large scale face recognition
Notice that handling face recognition in Mongo requires O(n) time complexity and this might still cause trouble at the millions level. You might consider some solutions based on approximate nearest neighbors: Elasticsearch, Annoy, Faiss or NMSLIB. Those libraries reduce time complexity dramatically.
The Best Single Model
DeepFace has many cutting-edge models in its portfolio. Find out the best configuration for facial recognition model, detector, similarity metric and alignment mode.
DeepFace API
DeepFace offers a web service for face verification, facial attribute analysis and vector embedding generation through its API. You can watch a tutorial on using the DeepFace API here:
Additionally, DeepFace can be run with Docker to access its API. Learn how in this video:
Other NoSQL solutions
As alternatives to MongoDB, Cassandra, Redis and Hadoop are strong solutions. They come with the power of map reduce technology. In particular, Cassandra and Redis are key-value stores and offer high performance especially in face verification tasks, as opposed to face recognition.
Super Fast Vector Search
In this post, we focused on using the k-NN algorithm to find similar vectors. However, this approach becomes problematic with large databases due to its time complexity of O(n + n log(n)). Imagine indexing all images on Google! To address this, we use the approximate nearest neighbor algorithm, which significantly reduces complexity and allows for super-fast vector searches. With this method, you can find the nearest vectors in a billion-scale database in just milliseconds. Many vector databases and indexing tools, such as Annoy, Faiss, ElasticSearch, NMSLIB, and Redis, adopt a similar approach.
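To make the idea concrete, here is a toy sketch of approximate nearest neighbor search using random hyperplane hashing, the trick behind many LSH-style indexes. This is illustrative only; real systems such as Annoy or Faiss are far more sophisticated, and all dimensions and vectors below are made up.

```python
import math
import random

random.seed(42)

DIM, N_PLANES = 4, 6

# random hyperplanes; the sign of the dot product gives one hash bit each
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]

def lsh_hash(vector):
    return tuple(1 if sum(p * v for p, v in zip(plane, vector)) >= 0 else 0
                 for plane in planes)

# index: bucket the vectors by their hash so a query only scans one bucket
vectors = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(100)]
buckets = {}
for vec in vectors:
    buckets.setdefault(lsh_hash(vec), []).append(vec)

def ann_search(query):
    # scan only the query's bucket; fall back to a full scan if it is empty
    candidates = buckets.get(lsh_hash(query)) or vectors
    return min(candidates,
               key=lambda v: math.sqrt(sum((a - b) ** 2 for a, b in zip(v, query))))

# an already-indexed vector hashes to its own bucket, so it finds itself
print(ann_search(vectors[0]) == vectors[0])
```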
Tech Stack Recommendations
Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.
Conclusion
So, we've applied face recognition depending heavily on a document database, MongoDB. In this way, even thin clients and low-level hardware (such as edge devices, IoT boards or a Raspberry Pi) can apply face recognition very cheaply. They will consume less processing power but still get robust results.
I pushed the source code of this study to GitHub as a notebook. You can support this study by starring ⭐ the repo.
Support this blog if you like it!