Introducing Brand New Face Recognition in DeepFace

As humans, recognizing a person usually means matching a face we see against the people we already know, the ones kept in our mental database. In computer vision, however, face recognition has a much more precise definition. From an academic perspective, face recognition is fundamentally a face verification problem: given a pair of face images, the task is to classify whether they belong to the same person or to two different individuals. Almost all face recognition models are trained and evaluated under this formulation, using benchmark datasets such as LFW, which consist of face pairs labeled as same or different. This is also the principle behind systems we use in daily life. For example, Face ID on smartphones verifies whether the face captured by the front camera belongs to the same person as the one previously enrolled on the device.


There is, however, another closely related but practically different problem: searching for a person within a crowd. This is the scenario we usually mean in real-world face recognition applications. For example, in a forensic or surveillance setting, individuals appearing in CCTV footage are searched against a database of known or wanted people. Conceptually, this problem can be reduced to applying face verification multiple times: the query face is compared against every identity stored in the database.
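The reduction from search to repeated verification can be sketched with toy embeddings and cosine distance. The names, vectors, and threshold below are illustrative, not DeepFace internals:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 means identical direction, 2 means opposite.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

# Toy database mapping identity -> face embedding
database = {
    "alice": [0.9, 0.1, 0.0],
    "bob":   [0.1, 0.9, 0.2],
}

def search(query, threshold=0.3):
    # Search is just verification repeated against every stored identity.
    matches = []
    for identity, emb in database.items():
        if cosine_distance(query, emb) <= threshold:
            matches.append(identity)
    return matches

print(search([0.85, 0.15, 0.05]))  # -> ['alice']
```

Every stored identity is compared once, which is exactly why this brute-force formulation scales linearly with the size of the database, a point we will come back to later.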


🙋‍♂️ You may consider enrolling in my top-rated machine learning course on Udemy

Decision Trees for Machine Learning

In this post, we will focus on how face recognition has been handled in DeepFace so far, and why this approach needed to evolve for large-scale and production use cases. We will first briefly revisit the traditional, directory-based find function and its limitations, especially in stateful and API-driven environments. We will then introduce the new register, build_index, and search functions added in DeepFace v0.9.7, and explain how they enable scalable, stateless face search backed by databases and approximate nearest neighbor indexing.

Vlog

You can either continue to read this post or watch the following video; both cover the same subject with the same steps.

How Face Recognition Has Been Handled So Far

In DeepFace, face verification is handled by the verify function. If you want to know whether two images belong to the same person, you simply pass the image pair to this function and receive a similarity decision.

result: dict = DeepFace.verify(
    img1_path = "img1.jpg",
    img2_path = "img2.jpg",
)

For face recognition (i.e., searching within a dataset), DeepFace traditionally provided the find function. With find, you pass a target image and a directory containing reference images.

dfs: List[pd.DataFrame] = DeepFace.find(
    img_path = "img1.jpg",
    db_path = "C:/my_db",
)

DeepFace extracts embeddings for images in that directory and stores them on disk in pickle format. Once embeddings are computed, subsequent searches are fast, and only newly added or removed images are processed in later runs — not the entire dataset from scratch.
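The caching behaviour can be illustrated with a minimal stdlib sketch. The file name and the stand-in embedding function are illustrative; the real pickle layout is DeepFace's internal detail:

```python
import os
import pickle

CACHE_FILE = "embeddings.pkl"  # illustrative; DeepFace uses its own file name

def fake_embed(path):
    # Stand-in for a real model; in DeepFace this is an expensive forward pass.
    return [len(path)]

def get_embeddings(image_paths):
    cache = {}
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, "rb") as f:
            cache = pickle.load(f)
    # Drop entries for images removed from the folder, and only compute
    # embeddings for images not seen in a previous run.
    cache = {p: e for p, e in cache.items() if p in image_paths}
    new = [p for p in image_paths if p not in cache]
    for p in new:
        cache[p] = fake_embed(p)
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)
    return cache, new

_, computed = get_embeddings(["a.jpg", "b.jpg"])
print(computed)            # first run: every image is processed
_, computed = get_embeddings(["a.jpg", "b.jpg", "c.jpg"])
print(computed)            # second run: only the newly added image
os.remove(CACHE_FILE)      # clean up the toy cache
```

The second call only touches "c.jpg", mirroring why find is slow on its first run but fast afterwards.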

First run of find function takes minutes
Embeddings are stored in a pickle in the same folder after 1st run
Find function performs much faster from its second run

However, there was an important limitation. DeepFace ships not only as a Python library but also with a REST API. While the API exposes functions such as verify, analyze, and represent, the find function could not be exposed because it is stateful by design: it depends on a local image directory and pickle files stored on the server.

Introducing brand new register and search

Starting from DeepFace v0.9.7, three new core functions were introduced:

  • register
  • build_index
  • search

These functions enable a stateless and scalable face recognition pipeline. With register, you can extract face embeddings and store them directly in a database. With search, you can query a target image against the stored embeddings. In other words, the stateful find function is now complemented by a stateless search alternative, making large-scale deployments much easier. Also, unlike find, search returns results from the very first request, because embeddings are registered ahead of time.

# register images into the database
DeepFace.register(
    img = "img1.jpg"
)
DeepFace.register(
    img = ["img2.jpg", "img3.jpg"]
)

# perform exact search
dfs: List[pd.DataFrame] = DeepFace.search(
    img = "target.jpg"
)

Currently supported backend databases are:

  • Postgres (default)
  • MongoDB
  • Weaviate
  • Neo4j
  • pgvector

We will add more backends in the short term, but the principle will stay the same. The default backend database is set to postgres. In other words, if you haven't specified anything, embeddings will be stored in postgres. However, you can set the database type in the input arguments of these functions.

# register images into the database
DeepFace.register(
    img = "img1.jpg",
    database_type = "mongo",
)

# perform exact search
dfs: List[pd.DataFrame] = DeepFace.search(
    img = "target.jpg",
    database_type = "mongo",
)
Result of exact nearest neighbour

Wise Search

If your database size is n, performing a brute-force search has a time complexity of O(n). This is manageable for small datasets (n ~ 10K), but as computer scientists, we usually try to avoid O(n) solutions when n grows large. For databases containing millions or even billions of embeddings, brute-force search quickly becomes impractical.

The original find function relied on brute-force search, and by default, the new search function behaves the same way. However, search introduces a powerful option: Approximate Nearest Neighbor (ANN) search.





By configuring the search_type argument, you can switch from exact brute-force search to ANN search, reducing the complexity from O(n) to approximately O(log n). This allows searches to complete in seconds even when the database contains tens of millions of faces.
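To see why ANN helps, here is a toy, stdlib-only sketch of one classic ANN idea, random-hyperplane locality-sensitive hashing. It only illustrates the principle of pruning the candidate set; it is not how DeepFace or FAISS are implemented:

```python
import random

random.seed(0)
DIM, N_PLANES = 8, 4

# Random hyperplanes act as a coarse hash: vectors that fall on the same
# side of every plane land in the same bucket (the LSH-style ANN idea).
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]

def bucket(vec):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                 for plane in planes)

# Index maps bucket key -> list of (identity, embedding) pairs.
index = {}
for i in range(1000):
    emb = [random.gauss(0, 1) for _ in range(DIM)]
    index.setdefault(bucket(emb), []).append((f"person_{i}", emb))

query = [random.gauss(0, 1) for _ in range(DIM)]
candidates = index.get(bucket(query), [])
# Only candidates sharing the query's bucket need an exact comparison,
# a small fraction of the 1000 stored embeddings.
print(len(candidates))
```

Instead of scoring all 1000 embeddings, the exact distance computation runs only on the handful sharing the query's bucket, which is the trade-off ANN makes: a little recall for a large drop in comparisons.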

# perform approximate nearest neighbour search
dfs: List[pd.DataFrame] = DeepFace.search(
    img = "target.jpg",
    search_type = "ann"
)
Result of approximate nearest neighbour

Indexing Embeddings and Vector Databases

For ANN search, indexing is required:

  • If you use Postgres or Mongo as the backend, embeddings are stored in those databases, but you need to call build_index to index them. This uses FAISS in the background, and the resulting index is stored in the database as well.
  • The build_index function is resumable and should be re-run whenever new identities are added to the database.
  • If you use Weaviate, Neo4j, or pgvector, which are vector databases, indexing is handled internally, and calling build_index is not necessary.
# build index on registered embeddings (for postgres and mongo only)
DeepFace.build_index()
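The resumable behaviour can be pictured with a toy sketch: on a re-run, only embeddings that are not yet in the index get processed. All names here are illustrative, not DeepFace's internals:

```python
# Stand-ins for the embeddings table and the persisted FAISS index.
stored_embeddings = {"alice": [0.1, 0.2], "bob": [0.3, 0.4]}
index = {}

def build_index_resumable():
    # Skip identities already indexed in a previous run; index only new ones.
    newly_indexed = []
    for identity, emb in stored_embeddings.items():
        if identity not in index:
            index[identity] = emb
            newly_indexed.append(identity)
    return newly_indexed

print(build_index_resumable())  # first run indexes everyone
stored_embeddings["carol"] = [0.5, 0.6]
print(build_index_resumable())  # re-run only processes the new identity
```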

As a rule of thumb:

  • Tens of thousands of embeddings → exact (brute-force) search is acceptable
  • Millions of embeddings → ANN search with FAISS is recommended
  • Tens of millions and beyond → FAISS may require GPU acceleration
  • 10M+ scale → a vector database becomes the most suitable choice
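The rule of thumb above could be captured in a small helper. The thresholds and function name are illustrative, not part of DeepFace:

```python
def recommended_search_strategy(n_embeddings: int) -> str:
    """Map database size to a search strategy, following the rule of thumb."""
    if n_embeddings < 100_000:
        return "exact brute-force search"
    if n_embeddings < 10_000_000:
        return "ANN search with FAISS"
    return "vector database (or GPU-accelerated FAISS)"

print(recommended_search_strategy(50_000))     # exact brute-force search
print(recommended_search_strategy(2_000_000))  # ANN search with FAISS
```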

API Support

Because register, build_index, and search are stateless, they can be safely exposed through the DeepFace REST API. This makes it possible to build scalable face recognition services without relying on local directories, pickle files, or in-memory state — a key requirement for production-grade systems.

# register facial images and embeddings to db
$ curl -X POST http://localhost:5005/register \
   -H "Content-Type: application/json" \
   -d '{"model_name":"Facenet", "img":"img1.jpg"}'

# index embeddings (for postgres and mongo only)
$ curl -X POST http://localhost:5005/build/index \
   -H "Content-Type: application/json" \
   -d '{"model_name":"Facenet"}'

# search an identity in database
$ curl -X POST http://localhost:5005/search \
   -H "Content-Type: application/json" \
   -d '{"img":"target.jpg", "model_name":"Facenet"}'

Conclusion

Face recognition in DeepFace has evolved from a directory-based, stateful approach toward a more scalable and production-ready architecture. While face verification remains the core academic problem, real-world face recognition requires efficient search across large collections of identities. With the introduction of register, build_index, and search, DeepFace now supports stateless face search backed by databases and approximate nearest neighbor indexing. This shift enables DeepFace to scale from small, local datasets to millions of faces while remaining compatible with REST-based deployments. As a result, DeepFace can now serve both research-oriented use cases and large-scale, real-world face recognition systems more effectively.


Support this blog financially if you like it!

Buy me a coffee

