Deep Face Recognition with MongoDB

Face recognition is mainly based on similarity search over facial embeddings. Research mostly focuses on storing the vector representations in memory. However, production-driven solutions must interact with a database. Herein, MongoDB is a handy tool for storing facial vector representations. We can easily use it in the cloud or on-premises. Besides, we can work with a thin client and expect the server side to do the heavy computation. It is highly scalable, and we get the power of map reduce.

mongoDB headquarters

Vlog

Watch this video to see how to perform vector similarity search in MongoDB with a deep face recognition use case.


You may consider enrolling in my top-rated machine learning course on Udemy

Decision Trees for Machine Learning

Dependencies

We will need the pymongo and dnspython packages to communicate with MongoDB from Python. You may also want MongoDB Compass as a visual editor. Finally, I will use MongoDB in the cloud, which doesn't require any installation.
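
A quick way to install the Python dependencies, mirroring the notebook-style install used later in this post:

#!pip install pymongo dnspython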

I've created the database and collection (the equivalent of regular tables in RDBMSs) within Compass. Both the database and the collection will be named deepface in this experiment.

Creating database and collection
Connecting to MongoDB

The pymongo distribution offers a Mongo client. We can connect to our MongoDB instance with a basic connection string, which basically expects a username and password pair. Once a client connection is established, we are going to access the deepface database we created in the previous stage.

from pymongo import MongoClient

# connection string of the cloud (or on-premise) MongoDB instance
connection = "mongodb+srv://..."
client = MongoClient(connection)

# database and collection created in Compass in the previous stage
database = 'deepface'
collection = 'deepface'

db = client[database]
Face recognition pipeline

A modern face recognition pipeline consists of 4 common stages: detect, align, represent and verify.

Luckily, the deepface framework for Python wraps several state-of-the-art face recognition models: VGG-Face, Google FaceNet, OpenFace, Facebook DeepFace, DeepID, Dlib and ArcFace.
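
For instance, a single verify call runs all four pipeline stages end-to-end for a pair of images. A minimal sketch using two of the unit test images shipped with deepface; the decision threshold is handled by the library:

from deepface import DeepFace

# verify whether two facial photos belong to the same person with FaceNet
result = DeepFace.verify(
    img1_path = "deepface/tests/dataset/img1.jpg",
    img2_path = "deepface/tests/dataset/img2.jpg",
    model_name = "Facenet"
)
print(result["verified"])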

We are going to store the facial embeddings in MongoDB instead of the images themselves. That's why we will use the deepface framework and the Google FaceNet model to represent facial images as vector embeddings.

#!pip install deepface
from deepface import DeepFace

FaceNet, VGG-Face and ArcFace outperform the others. Here, you can watch how to determine the best model.

Loading images

The unit test images of deepface will also serve as the facial database in this experiment. Let's find the image names in that folder first.





import os

facial_img_paths = []
for root, directory, files in os.walk("deepface/tests/dataset"):
    for file in files:
        if '.jpg' in file:
            facial_img_paths.append(root+"/"+file)

Once the exact image paths and names are stored, we will find the vector representations of the facial images. Herein, deepface detects and aligns the face before feeding it to the model. The FaceNet model expects 160x160x3 inputs and generates a 128-dimensional output vector.

from tqdm import tqdm

instances = []

for facial_img_path in tqdm(facial_img_paths):
    # detect, align and represent the face as a 128-dimensional FaceNet embedding
    embedding = DeepFace.represent(img_path = facial_img_path, model_name = "Facenet")[0]["embedding"]
    instances.append([facial_img_path, embedding])
Face detection

To have more information about face detection and alignment in deepface, you should watch the following video.

Deepface wraps several face detectors: OpenCV, SSD, Dlib and MTCNN. Here, MTCNN is the most robust but it is really slow. SSD is the fastest but its alignment score is lower than MTCNN's. You can watch their detection performance in the following video.

Herein, RetinaFace is the cutting-edge technology for face detection. It can even detect faces in a crowd. Besides, it finds some facial landmarks, including eye coordinates, so its alignment score is high as well.
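
If you want to try RetinaFace here, recent deepface versions expose a detector_backend argument in the represent function; a minimal sketch (the image path is just an example from the unit test folder):

embedding = DeepFace.represent(
    img_path = "deepface/tests/dataset/img1.jpg",
    model_name = "Facenet",
    detector_backend = "retinaface" # alternatives: opencv, ssd, dlib, mtcnn
)[0]["embedding"]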

Preprocessing

We now have the exact image names and embeddings in the instances list. I prefer to store this 2D information in a pandas data frame instead of a Python list.

import pandas as pd
df = pd.DataFrame(instances, columns = ["img_name", "embedding"])
df.head()

The pandas data frame stores the exact image names and facial embeddings as columns.

Data frame
Storing embeddings in mongoDb

We will walk over the rows of the pandas data frame and save the image name and embedding column values into the document database.

for index, instance in tqdm(df.iterrows(), total = df.shape[0]):
    db[collection].insert_one({"img_path": instance["img_name"], "embedding" : instance["embedding"].tolist()})

Interestingly, if you try to save the embedding with the command list(instance["embedding"]), then you will get the exception: cannot encode object: …, of type. Calling tolist() instead converts the items to plain Python floats that MongoDB can encode.

We have stored the facial embeddings in MongoDB. When I check it in Compass, I see that each document has image path and embedding fields. The embedding field is an array type and stores the 128-dimensional vector.

Collection

When a new user joins our team, we just add their embedding to the facial database. We don't have to compute the embeddings of existing users again.
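
Enrolling a single new member might look like this. The image file name is hypothetical, and if your deepface version returns a numpy array instead of a plain list, convert it with tolist() as above.

new_img_path = "new_member.jpg"
new_embedding = DeepFace.represent(img_path = new_img_path, model_name = "Facenet")[0]["embedding"]

# only the newcomer's document is inserted; existing documents stay untouched
db[collection].insert_one({"img_path": new_img_path, "embedding": new_embedding})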





Target face

We have stored our facial database in MongoDB. Now, we will look for an identity in our document-based database. We will apply the same preprocessing and embedding stages to the target image.

target_img_path = "target.jpg"
target_img = DeepFace.extract_faces(img_path = target_img_path)[0]["face"]
target_embedding = DeepFace.represent(img_path = target_img_path, model_name = "Facenet")[0]["embedding"]
Angelina Jolie as target
Some queries

We can query existing image names or embeddings from MongoDB as shown below. But that doesn't satisfy our requirement, because the target image does not appear in MongoDB.

db[collection].find_one({'img_path': 'deepface/tests/dataset/img1.jpg'})
db[collection].find_one({'embedding': df.iloc[0].embedding.tolist()})

We could retrieve all image embeddings and compare each one with the target image. In this way, I can find the closest one. However, that's not the best approach either, because the number of images in MongoDB might reach millions, and I would spend all the processing power on the client side.

documents = db[collection].find()

for document in documents:
    print(document["img_path"], document["embedding"])
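
For completeness, the brute-force client-side comparison described above would look roughly like this; numpy is an extra dependency here and the loop is only a sketch:

import numpy as np

best_img, best_distance = None, float("inf")

for document in db[collection].find():
    source_embedding = np.array(document["embedding"])
    # plain Euclidean distance between the stored and target embeddings
    distance = np.linalg.norm(source_embedding - np.array(target_embedding))
    if distance < best_distance:
        best_img, best_distance = document["img_path"], distance

print(best_img, best_distance)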

Power of mongo

A better solution is to spend the processing power on the server side. I mean that I will send just the embedding of the target image to MongoDB and expect it to find the closest one. The client might be an edge device such as a Raspberry Pi in this case; it will just send the query and get the results.

The first question is how to send an array to MongoDB.

query = db[collection].aggregate( [
   {
       "$addFields": { 
           "target_embedding": target_embedding
       }
   }
] )

This will return the existing image path and embedding fields, along with the target embedding I passed above.

{
'_id': ObjectId('600b1eed023a302a7526a2a6')
, 'img_path': 'deepface/tests/dataset/img1.jpg'
, 'embedding': [1.0574054718017578, 1.096140742301941, 1.2643184661865234, …]
, 'target_embedding': [-0.5445538759231567, 0.24550628662109375, 0.687471330165863, …]
}
Euclidean distance

I need to find the Euclidean distance value for each document in the collection. This first requires finding the difference of the items at the same index in the embedding and target embedding arrays.

Euclidean distance by dataaspirant
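
For reference, this is the standard Euclidean distance formula the aggregation pipeline will reproduce, where p is a stored embedding and q is the target embedding:

d(p, q) = sqrt( (p_1 - q_1)^2 + (p_2 - q_2)^2 + … + (p_128 - q_128)^2 )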

Subtracting two arrays can be handled with the $unwind stage in MongoDB.

{
    "$addFields": { 
        "target_embedding": target_embedding
    }
}
, {"$unwind" : { "path" : "$embedding"}}
, {"$unwind" : { "path" : "$target_embedding" }}
Matching unwind dimensions

However, unwinding two arrays produces their Cartesian product. If you add the includeArrayIndex argument to the unwind stages, then you can add a basic compare rule: $cmp returns 0 when the source and target embedding indexes are the same. That's why we filter the zero ones with a $match stage.

{
    "$addFields": { 
        "target_embedding": target_embedding
    }
}
, {"$unwind" : { "path" : "$embedding", "includeArrayIndex": "embedding_index"}}
, {"$unwind" : { "path" : "$target_embedding", "includeArrayIndex": "target_index" }}

, {
    "$project": {
        "img_path": 1,
        "embedding": 1,
        "target_embedding": 1,
        "compare": {
            "$cmp": ['$embedding_index', '$target_index']
        }
    }
}
, {"$match": {"compare": 0}}
Group by clause

Now, there will be 128 lines for each image. Luckily, we can group them by the image path key. Once the array items of the same image path are grouped, we can apply the Euclidean distance formula. Remember that it requires subtracting each dimension first, then raising the differences to the power of 2, and finally summing the resulting powers.





{
    "$addFields": { 
        "target_embedding": target_embedding
    }
}
, {"$unwind" : { "path" : "$embedding", "includeArrayIndex": "embedding_index"}}
, {"$unwind" : { "path" : "$target_embedding", "includeArrayIndex": "target_index" }}
, {
    "$project": {
        "img_path": 1,
        "embedding": 1,
        "target_embedding": 1,
        "compare": {
            "$cmp": ['$embedding_index', '$target_index']
        }
    }
}
, {"$match": {"compare": 0}}
, {
  "$group": {
    "_id": "$img_path",
    "distance": {
            "$sum": {
                "$pow": [{
                    "$subtract": ['$embedding', '$target_embedding']
                }, 2]
            }
    }
  }
}
Finalize Euclidean distance formula

We finally need to find the square root of the sum.

{
    "$addFields": { 
        "target_embedding": target_embedding
    }
}
, {"$unwind" : { "path" : "$embedding", "includeArrayIndex": "embedding_index"}}
, {"$unwind" : { "path" : "$target_embedding", "includeArrayIndex": "target_index" }}
, {
    "$project": {
        "img_path": 1,
        "embedding": 1,
        "target_embedding": 1,
        "compare": {
            "$cmp": ['$embedding_index', '$target_index']
        }
    }
}
, {"$match": {"compare": 0}}
, {
  "$group": {
    "_id": "$img_path",
    "distance": {
            "$sum": {
                "$pow": [{
                    "$subtract": ['$embedding', '$target_embedding']
                }, 2]
            }
    }
  }
}
, { 
    "$project": {
        "_id": 1
        #, "distance": 1
        , "distance": {"$sqrt": "$distance"}
    }
}
Discarding distant ones

We discard face pairs whose distance value is greater than 10 for the FaceNet model and Euclidean distance metric. We should add this rule to the query. In this way, MongoDB will return only the images that belong to the same identity as the target.

query = db[collection].aggregate( [
{
    "$addFields": { 
        "target_embedding": target_embedding
    }
}
, {"$unwind" : { "path" : "$embedding", "includeArrayIndex": "embedding_index"}}
, {"$unwind" : { "path" : "$target_embedding", "includeArrayIndex": "target_index" }}
, {
    "$project": {
        "img_path": 1,
        "embedding": 1,
        "target_embedding": 1,
        "compare": {
            "$cmp": ['$embedding_index', '$target_index']
        }
    }
}
, {"$match": {"compare": 0}}
, {
  "$group": {
    "_id": "$img_path",
    "distance": {
            "$sum": {
                "$pow": [{
                    "$subtract": ['$embedding', '$target_embedding']
                }, 2]
            }
    }
  }
}
, { 
    "$project": {
        "_id": 1
        #, "distance": 1
        , "distance": {"$sqrt": "$distance"}
    }
}
, { 
    "$project": {
        "_id": 1
        , "distance": 1
        , "cond": { "$lte": [ "$distance", 10 ] }
    }
}
, {"$match": {"cond": True}}
, { "$sort" : { "distance" : 1 } }
] )

for i in query:
    print(i)
Results

MongoDB returns the following images as the same person as the target. There are lots of different people in the unit test folder, but it returns only the photos of Angelina Jolie.

{'_id': 'deepface/tests/dataset/img2.jpg', 'distance': 7.0178008611285865, 'cond': True}
{'_id': 'deepface/tests/dataset/img10.jpg', 'distance': 7.629044595250684, 'cond': True}
{'_id': 'deepface/tests/dataset/img6.jpg', 'distance': 8.467856464704305, 'cond': True}
{'_id': 'deepface/tests/dataset/img4.jpg', 'distance': 8.58306099238241, 'cond': True}
{'_id': 'deepface/tests/dataset/img7.jpg', 'distance': 8.762336403333999, 'cond': True}
{'_id': 'deepface/tests/dataset/img11.jpg', 'distance': 8.944895856870483, 'cond': True}
{'_id': 'deepface/tests/dataset/img1.jpg', 'distance': 9.297199074856135, 'cond': True}
Results

Obviously, we can apply the same approach to reverse image search, similar to Google Images.

Lightweight way

MongoDB comes with high scalability, and this is very good for big data. However, you might not always need that much scale. A lightweight way exists!

Validation

The query has become quite complex, and we might have made a mistake somewhere. That's why we should also find the closest ones on the client side. We can do it easily with deepface.

from deepface import DeepFace

target_img_path = "target.jpg"

dfs = DeepFace.find(target_img_path
, db_path = 'deepface/tests/dataset'
, model_name = 'Facenet'
, distance_metric = 'euclidean'
, detector_backend = 'opencv')

print(dfs[0].head(10))

This returns exactly the same results as MongoDB. So, everything seems fine.

Validation result

To have more information about the verify and find functions of the deepface framework, you should watch the following videos.

Applying face verification several times in the background.

Large scale face recognition

Notice that handling face recognition in MongoDB requires O(n) time complexity, and this might still be troublesome for data at the millions level. You might consider some solutions based on approximate nearest neighbors: Elasticsearch, Annoy, Faiss or NMSLIB. Those libraries reduce the time complexity dramatically.
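
As a rough illustration (not part of the MongoDB pipeline above), an approximate index over the same embeddings could be built with the annoy package; the tree count and neighbor count below are arbitrary choices:

#!pip install annoy
from annoy import AnnoyIndex

# build an approximate nearest neighbor index over the 128-dimensional FaceNet embeddings
index = AnnoyIndex(128, "euclidean")
for i, embedding in enumerate(df["embedding"]):
    index.add_item(i, list(embedding))
index.build(10) # number of trees

# approximate search: the 5 closest stored faces to the target embedding
neighbors, distances = index.get_nns_by_vector(target_embedding, 5, include_distances = True)
for idx, distance in zip(neighbors, distances):
    print(df.iloc[idx]["img_name"], distance)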





Other NoSQL solutions

As alternatives to MongoDB, Cassandra, Redis and Hadoop are strong solutions. They come with the power of map reduce technology. In particular, Cassandra and Redis are key-value stores and offer high performance for face verification (one-to-one) tasks rather than face recognition (one-to-many) tasks.

The elephant is the iconic mascot of Hadoop
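
As an illustration of the key-value approach, face verification boils down to one lookup per user. A minimal sketch with the redis-py package, assuming a local Redis instance; the key name and image paths are hypothetical, and the threshold of 10 reuses the FaceNet / Euclidean rule from above:

#!pip install redis
import json
import numpy as np
import redis
from deepface import DeepFace

r = redis.Redis(host = "localhost", port = 6379)

# enrollment: compute and store one FaceNet embedding per user id
alice_embedding = DeepFace.represent(img_path = "alice_enrollment.jpg", model_name = "Facenet")[0]["embedding"]
r.set("user:alice", json.dumps(list(alice_embedding)))

# verification: fetch the claimed identity's embedding and compare it with a fresh capture
claimed = np.array(json.loads(r.get("user:alice")))
live = np.array(DeepFace.represent(img_path = "alice_login.jpg", model_name = "Facenet")[0]["embedding"])

distance = np.linalg.norm(claimed - live)
print("verified" if distance <= 10 else "not verified") # 10 is the FaceNet / Euclidean threshold used above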
Tech Stack Recommendations

Face recognition is mainly based on representing facial images as vectors. Herein, storing the vector representations is a key factor for building robust facial recognition systems. I summarize the tech stack recommendations in the following video.

Conclusion

So, we have applied face recognition depending heavily on a document database, MongoDB. In this way, even thin clients and low-level hardware (such as edge devices, IoT boards or a Raspberry Pi) can perform face recognition very cheaply. They will consume less processing power but still get robust results.

I pushed the source code of this study to GitHub as a notebook. You can support this study by starring the repo.



