Determining the architecture is a significant stage for production-driven applications. Herein, a facial recognition pipeline serves a wide range of products. In this post, we are going to discuss the tech stack for several use cases of facial recognition.
Vlog
You can either follow this blog post or watch the following video. They both cover the same subject.
🙋♂️ You may consider enrolling in my top-rated machine learning course on Udemy
The big picture
We are going to focus on the recommended tech stack shown below.
Pipeline
Remember that a facial recognition pipeline consists of 4 common stages: detect, align, represent and verify.
The representation stage encodes facial images as vectors. We can do this with several state-of-the-art face recognition models: VGG-Face, Google FaceNet, Dlib and ArcFace. Those models have already reached and passed human-level accuracy.
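For instance, here is a minimal sketch of extracting an embedding with deepface; the file name img.jpg is a placeholder, and the exact return type may differ slightly between library versions.

```python
from deepface import DeepFace

# represent a facial image as a vector embedding with FaceNet
# (img.jpg is a hypothetical input file)
embedding_objs = DeepFace.represent(img_path="img.jpg", model_name="Facenet")
embedding = embedding_objs[0]["embedding"]

print(len(embedding))  # 128 dimensions for FaceNet
```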
Where to store vector embeddings of facial images? That’s the question!
Face recognition 101
Face verification and face recognition might be confusing for newbies. Let's define those tasks first. We can run both with deepface for python in a few lines of code.
Face verification
Face verification is mainly based on comparing two facial images and deciding whether they belong to the same person or to different persons. In other words, face verification looks for the answer to a question pattern like: is this Mark? Here, you already have a pre-taken photo of Mark. Then, someone attempts to enter the system as Mark because Mark is authorized to enter. But is that person really Mark?
Face ID in iPhones is a neat application of face verification.
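A minimal verification sketch with deepface might look like this; the image file names are hypothetical.

```python
from deepface import DeepFace

# compare the pre-taken photo of Mark with the photo of the person
# attempting to enter the system
result = DeepFace.verify(img1_path="mark.jpg", img2_path="candidate.jpg")

print(result["verified"])  # True if both photos belong to the same person
```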
Face verification has O(d) time complexity in big O notation, where d is the number of dimensions in the facial vector embedding. For example, FaceNet represents facial images as 128-dimensional vectors, so it has O(128) time complexity, which we can basically express as O(1) since d is a small constant.
Face recognition
On the other hand, face recognition looks for the answer to a question pattern like: who is this?
Here, we have the pre-taken photos of many people. We will look for the identity of someone in those pre-taken photos.
The face recognition task has O(d×n) time complexity in big O notation, where d is the number of dimensions in the facial vector embedding and n is the number of instances in our database. Consider a very large database: it might hold millions or billions of instances. Here, n is much greater than d. That's why the time complexity can basically be expressed as O(n).
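A minimal recognition sketch with deepface could look like the following; the target image and the my_db folder of pre-taken photos are hypothetical, and recent versions of the library return a list of pandas DataFrames.

```python
from deepface import DeepFace

# search for the identity of a target image in a folder of pre-taken photos
dfs = DeepFace.find(img_path="target.jpg", db_path="my_db")

# each item is a pandas DataFrame of matching images sorted by distance
print(dfs[0].head())
```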
Relational databases
In face verification, we already know the identity of Mark. His facial vector embedding is computed beforehand. Where is the best place to store that embedding?
Regular relational databases are a convenient way to store the embeddings. Notice that the identity of Mark should be defined as an index. If you ask for the embedding of Mark, where Mark is a unique index, relational databases will respond very quickly.
Herein, vector embeddings are actually regular arrays in programming languages. You cannot store those items as-is in relational databases because there is no data type for arrays. Still, you can store each dimension of the vector (or each item of the array) as a column or a row. Storing dimension values in columns makes your data structure static, and that would not be pretty. For example, today we could use the FaceNet model, which returns 128-dimensional vectors, but in the future we might switch from FaceNet to VGG-Face. VGG-Face returns 2622-dimensional vectors. We would have to change the table structure because we defined it with 128 columns to store the representation but we now need 2622 columns. It would be better to store dimension values as rows in a separate table. The foreign key of the embeddings table should reference the primary key of the identity table.
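A minimal sketch of this two-table design with SQLite might look like the following; the table and column names are just illustrative.

```python
import sqlite3

conn = sqlite3.connect("faces.db")
cur = conn.cursor()

# identity table holds one row per person
cur.execute("CREATE TABLE identity (id INTEGER PRIMARY KEY, name TEXT)")

# embeddings table holds one row per dimension; its foreign key
# references the primary key of the identity table
cur.execute("""
    CREATE TABLE embedding (
        identity_id INTEGER REFERENCES identity(id),
        dim INTEGER,
        value REAL
    )
""")

cur.execute("INSERT INTO identity VALUES (1, 'mark')")
vector = [0.12, -0.05, 0.33]  # a 128-dimensional FaceNet vector in practice
cur.executemany(
    "INSERT INTO embedding VALUES (1, ?, ?)",
    list(enumerate(vector)),
)
conn.commit()

# restore the embedding of mark, ordered by dimension
cur.execute("SELECT value FROM embedding WHERE identity_id = 1 ORDER BY dim")
restored = [row[0] for row in cur.fetchall()]
```

This way, switching from a 128-dimensional model to a 2622-dimensional one requires no schema change at all.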
Here, Oracle, Microsoft SQL Server and IBM DB2 are strong relational database alternatives. Besides, SQLite and MySQL are more lightweight solutions.
Key-value stores
Relational databases are very good at storing relational data. If your task is, say, to return a customer's purchases over time, relational databases are perfect. However, in the face verification case we just need the embedding of an identity. Do we really need that kind of complex architecture?
So, key-value stores outperform regular relational databases here. Besides, they offer array-like data types to store a facial embedding as-is, in contrast to relational databases.
Redis is a very simple key-value store. You can store just one value per key. Suppose that the key is mark and the value is his facial embedding.
Cassandra is a wide column store, meaning that it can store several columns for a key. Suppose that the key is mark and the values are his facial embedding, the facial image itself, his birthplace, etc.
On the other hand, Redis can store just one value per key. You have to query Redis several times if you need more than one value for a key. We can manage this in Redis by adding prefixes to keys. Suppose that one key is embedding:mark with his embedding as its value, and another key is birthplace:mark with the value California. Notice that each query has a cost. So, if you will need some additional fields, Cassandra will be a better fit than Redis.
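A minimal sketch of the prefix approach with the redis-py client could look like this; serializing the embedding as a JSON string is just one possible choice, since plain Redis string values cannot hold arrays directly.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# prefixed keys let us store several values for the same identity
embedding = [0.12, -0.05, 0.33]  # a real embedding has 128+ dimensions
r.set("embedding:mark", json.dumps(embedding))
r.set("birthplace:mark", "California")

# notice that each get is a separate round trip to the server
stored = json.loads(r.get("embedding:mark"))
```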
No matter how large the database is, in both relational databases and key-value stores we will be able to access the metadata of an identity very quickly, because we access it by index. Suppose that Apple stores the facial embeddings of all iPhone users in a database. That's totally fine!
Big data
Face recognition requires running face verification many times. In other words, face verification is a subset of face recognition. In theory, you can run face recognition with face verification tools as well. On the other hand, tools for face verification will not work for face recognition when the data size becomes huge. You will need some big data solutions to handle face recognition.
Hadoop is a convenient way to store big data. You can still run SQL-like queries with Hive, even on petabytes of data. Besides, you can use the Spark framework from Java or Python to run queries on Hadoop as well.
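For instance, a nearest neighbor scan over stored embeddings could be sketched in PySpark as follows; the tiny in-memory data frame stands in for a real Hive table, and the distance function is a plain squared Euclidean distance.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("face-recognition").getOrCreate()

# in practice, embeddings would be read from Hive with spark.sql(...)
df = spark.createDataFrame(
    [("mark", [0.1, 0.3]), ("alice", [0.9, 0.7])],
    ["name", "embedding"],
)

target = [0.1, 0.2]  # embedding of the image we are searching for

# squared euclidean distance between the target and every stored vector
@F.udf("double")
def distance(vec):
    return float(sum((a - b) ** 2 for a, b in zip(vec, target)))

df.withColumn("distance", distance("embedding")).orderBy("distance").show()
```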
MongoDB is a document database and it comes with high scalability. We can run it on a cluster with minimal effort.
Both Hadoop and MongoDB have array data types to store embeddings as-is, in contrast to relational databases.
Approximate nearest neighbor
Big data solutions perform well if you have strong hardware. If you have hundreds of data nodes for Hadoop or tens of clusters for MongoDB, you can handle face recognition easily. However, those are costly solutions.
Consider Google reverse image search or Facebook's image tagging. Google has indexed hundreds of billions of images, but it can find similar images in just seconds. Facebook has billions of users and hundreds of billions of images, but it can find you in a photo even if the person who uploaded it is not your connection. Is that because of the hardware Google and Facebook have?
The famous photographer Ara Güler once said that if the best camera took the best photograph, then the person with the best typewriter would be the best novelist. So, Google and Facebook can handle billions-level data in just seconds, and that is not because of the hardware they have!
With those big data solutions we actually apply the k-NN algorithm to find the nearest neighbors. On the other hand, the approximate nearest neighbor algorithm reduces the time complexity dramatically.
Spotify Annoy, Facebook Faiss and NMSLIB are strong approximate nearest neighbor libraries. Playlist recommendation in Spotify is based on Annoy. Think of the number of users and songs in Spotify, and you can see that it is a very challenging problem. Annoy is the acronym of Approximate Nearest Neighbors Oh Yeah. Faiss runs in the photo tagging module at Facebook; it is the acronym of Facebook AI Similarity Search. Finally, NMSLIB was developed by four PhD students. Elasticsearch wraps NMSLIB and comes with high scalability as well.
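A minimal Annoy sketch might look like the following; the random vectors stand in for real embeddings coming from the representation stage.

```python
import random
from annoy import AnnoyIndex

d = 128  # FaceNet embedding size
index = AnnoyIndex(d, "euclidean")

# in practice these vectors come from the representation stage
for i in range(1000):
    index.add_item(i, [random.random() for _ in range(d)])

index.build(10)  # more trees give higher precision but slower builds

# find the 3 approximate nearest neighbors of a target embedding
target = [random.random() for _ in range(d)]
neighbor_ids = index.get_nns_by_vector(target, 3)
```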
Finally, Pinecone is a vector database running in the cloud. You can store your data there and use its processing power on a cloud system.
Graph database
Graph databases come with the power of discovering relations that are hard to find. Herein, Neo4j is a neat graph database. For example, we cannot verify the pair img12 and img55 if we check their distance only. However, they may still be connected through intermediate images that match both, and such indirect connections are easy to discover with graph databases.
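A minimal sketch with the Neo4j Python driver could look like this; the connection settings, the Image label and the SAME_PERSON relationship are all hypothetical.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # connect images whose embedding distance is under the threshold
    session.run(
        "MERGE (a:Image {name: 'img12'}) "
        "MERGE (b:Image {name: 'img34'}) "
        "MERGE (a)-[:SAME_PERSON]->(b)"
    )

    # img12 and img55 may still be reachable through intermediate images,
    # even if their direct distance is above the threshold
    result = session.run(
        "MATCH path = (a:Image {name: 'img12'})-[:SAME_PERSON*]-"
        "(b:Image {name: 'img55'}) RETURN path LIMIT 1"
    )
    print(result.single())  # None if no connection exists
```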
k-NN or a-NN
The approximate nearest neighbor algorithm runs very fast on limited hardware, but it might miss some close items that the k-NN algorithm is guaranteed to find. If your task is something like finding similar photos, the a-NN algorithm is good enough. If your task is finding a suspect, it requires an exact search with k-NN, and big data solutions are better.
Conclusion
So, in this post we have covered tech stack recommendations for facial recognition for different use cases. Facial recognition studies mainly focus on the representation stage, but determining the architecture might be more important for production-driven applications, because most face recognition models have already passed human-level accuracy. Innovation is only possible with the right tech stack.
Support this blog if you like it!