Deep Face Detection with MTCNN in Python

Face detection is an essential stage of a robust face recognition pipeline. MTCNN, short for Multi-task Cascaded Convolutional Networks, is a strong face detector with high detection scores. As its name suggests, it is a modern deep learning based approach. In this post, we will cover face detection and alignment with MTCNN.

The most famous selfie in the Academy Awards 2014
It is a high-performing detector

It comes with a huge improvement in detection accuracy over OpenCV’s haar cascade and Dlib’s histogram based approaches. Both SSD and MTCNN outperform them on face detection.



On the other hand, SSD is much faster than MTCNN. I tested different face detectors on a 720p video. SSD processed 9.20 frames per second, whereas MTCNN processed 1.54 fps. In other words, MTCNN is almost 6 times slower. You can compare the detection performance of those methods in the following video.
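
The speed comparison above can be reproduced with a small timing helper. This is only a minimal sketch: measure_fps and slowdown are hypothetical helper names, and detect stands for any callable that takes a frame, for example the detect_faces method of an MTCNN detector.

```python
import time

def measure_fps(detect, frames):
    """Return how many frames per second detect(frame) handles on average."""
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    # guard against a zero elapsed time on very fast callables
    return len(frames) / elapsed if elapsed > 0 else float("inf")

def slowdown(fast_fps, slow_fps):
    """Return how many times slower the second detector is."""
    return fast_fps / slow_fps

# the figures reported in this post: SSD 9.20 fps, MTCNN 1.54 fps
print(round(slowdown(9.20, 1.54), 2))  # 5.97
```

Timing the detector call alone, without video decoding or drawing, keeps the comparison between detectors fair.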

Here, you can watch how to use different face detectors in Python.

Model structure

MTCNN is mainly based on 3 separate CNN models: P-Net, R-Net and O-Net.

MTCNN architecture: P-Net, R-Net, O-Net

The name of P-Net comes from proposal network. It looks for faces in 12×12 sized windows. The mission of this network is to produce results fast.

The name of R-Net comes from refine network. It has a deeper structure than P-Net. All candidates coming from the previous network, P-Net, are fed to R-Net. R-Net rejects a huge number of these candidates.

Finally, the output network, or O-Net for short, returns the bounding box (face area) and the facial landmark positions.
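
Because P-Net only looks at 12×12 windows, MTCNN implementations typically rescale the input into an image pyramid so that larger faces also fit into such a window. The sketch below shows how those scales could be derived. The function name is hypothetical, and min_face_size=20 and scale_factor=0.709 are assumed values rather than something stated in this post.

```python
def pyramid_scales(height, width, min_face_size=20, scale_factor=0.709, window=12):
    """Return the scales at which the image would be fed to a 12x12 proposal network."""
    scales = []
    # first scale maps the smallest face we care about onto the 12x12 window
    scale = window / min_face_size
    min_side = min(height, width) * scale
    # keep shrinking until the image is smaller than a single window
    while min_side >= window:
        scales.append(scale)
        scale *= scale_factor
        min_side *= scale_factor
    return scales

scales = pyramid_scales(720, 1280)
print(len(scales), scales[0])
```

Each extra scale means another pass of P-Net over the image, which is one reason a cascade like this costs more than a single-shot detector.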

Installation

The mtcnn package requires tensorflow and keras to be installed as prerequisites. It is heavily inspired by David Sandberg’s FaceNet implementation. It is available on PyPI.

pip install mtcnn

Face detection

MTCNN is as lightweight as it can be. We will first construct an MTCNN detector and then feed a numpy array as input to the detect_faces function of its interface. I load the input image with OpenCV in the following code block. The detect_faces function returns a list of objects, one for each detected face. Each returned object stores the coordinates of the detected face under the box key.





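from mtcnn import MTCNN
import cv2

detector = MTCNN()

img = cv2.imread("img.jpg")
#OpenCV reads images as BGR whereas MTCNN expects RGB
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
detections = detector.detect_faces(img_rgb)

for detection in detections:
    score = detection["confidence"]
    if score > 0.90:
        #box stores x, y, width and height of the detected face
        x, y, w, h = detection["box"]
        detected_face = img[int(y):int(y+h), int(x):int(x+w)]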
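
The cropping step can be wrapped into a small helper. This is a minimal sketch under one assumption: a box close to the image border may extend outside the frame, and a negative index would make numpy slice from the opposite side of the array. The crop_faces name and the fake detection list are hypothetical. The list only mimics the box and confidence keys described above.

```python
import numpy as np

def crop_faces(img, detections, min_confidence=0.90):
    """Crop each detection above min_confidence, clamping its box to the image."""
    height, width = img.shape[:2]
    faces = []
    for detection in detections:
        if detection["confidence"] <= min_confidence:
            continue
        x, y, w, h = detection["box"]
        # clamp the corners so that slicing never wraps around
        x1, y1 = max(0, int(x)), max(0, int(y))
        x2, y2 = min(width, int(x + w)), min(height, int(y + h))
        if x2 > x1 and y2 > y1:
            faces.append(img[y1:y2, x1:x2])
    return faces

# fake inputs shaped like the detector output, for illustration only
img = np.zeros((100, 200, 3), dtype=np.uint8)
detections = [
    {"box": [-5, 10, 30, 40], "confidence": 0.99},
    {"box": [50, 50, 20, 20], "confidence": 0.50},
]
faces = crop_faces(img, detections)
print(len(faces), faces[0].shape)
```

Only the first fake detection passes the confidence threshold, and its box is trimmed at the left border.
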

Facial landmarks

Even though the OpenCV based SSD offers the same level of accuracy, MTCNN also finds some facial landmarks such as the eye, nose and mouth locations. In particular, extracting the eye locations is very important for aligning faces. Notice that face alignment increases the accuracy of a face recognition model by almost 1% according to the Google FaceNet research.

On the other hand, OpenCV finds eye locations with the conventional haar cascade method, which under-performs. In other words, we have to depend on the legacy haar cascade in OpenCV to align faces even if we adopt the modern SSD.

The object returned by the detect_faces function also stores the facial landmarks. I focus only on the eye locations here.

keypoints = detection["keypoints"]
left_eye = keypoints["left_eye"]
right_eye = keypoints["right_eye"]
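
Once the eye locations are available, simple geometry such as the midpoint between the eyes and the distance between them can be derived. The sketch below is only illustrative: eye_geometry is a hypothetical helper, and the keypoints dictionary is a hand-made stand-in using the left_eye and right_eye keys shown above.

```python
import math

def eye_geometry(keypoints):
    """Return the midpoint between the eyes and the distance between them."""
    left_x, left_y = keypoints["left_eye"]
    right_x, right_y = keypoints["right_eye"]
    midpoint = ((left_x + right_x) / 2, (left_y + right_y) / 2)
    eye_distance = math.hypot(right_x - left_x, right_y - left_y)
    return midpoint, eye_distance

# hand-made keypoints, for illustration only
keypoints = {"left_eye": (100, 120), "right_eye": (160, 120)}
midpoint, eye_distance = eye_geometry(keypoints)
print(midpoint, eye_distance)
```
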

Face alignment procedure

We can align faces if we know the exact locations of the eyes in the detected face. We have covered this topic in a dedicated blog post. The following code block is copied from the source code of the deepface framework.

Alignment procedure

To sum up, we will rotate the base image until the line between the two eyes becomes horizontal. However, I highly recommend you to read the dedicated blog post about face alignment.

import math
import numpy as np
from PIL import Image
from deepface.commons import distance #module path in the deepface version this code was copied from

def alignment_procedure(img, left_eye, right_eye):
    #this function aligns given face in img based on left and right eye coordinates

    left_eye_x, left_eye_y = left_eye
    right_eye_x, right_eye_y = right_eye

    #-----------------------
    #find rotation direction

    if left_eye_y > right_eye_y:
        point_3rd = (right_eye_x, left_eye_y)
        direction = -1 #rotate same direction to clock
    else:
        point_3rd = (left_eye_x, right_eye_y)
        direction = 1 #rotate inverse direction of clock

    #-----------------------
    #find length of triangle edges

    a = distance.findEuclideanDistance(np.array(left_eye), np.array(point_3rd))
    b = distance.findEuclideanDistance(np.array(right_eye), np.array(point_3rd))
    c = distance.findEuclideanDistance(np.array(right_eye), np.array(left_eye))

    #-----------------------
    #apply cosine rule

    if b != 0 and c != 0: #this multiplication causes division by zero in cos_a calculation

        cos_a = (b*b + c*c - a*a)/(2*b*c)
        angle = np.arccos(cos_a) #angle in radian
        angle = (angle * 180) / math.pi #radian to degree

        #-----------------------
        #rotate base image

        if direction == -1:
            angle = 90 - angle

        img = Image.fromarray(img)
        img = np.array(img.rotate(direction * angle))

    #-----------------------

    return img #return img anyway
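
To check the geometry, the signed rotation angle can also be computed directly with atan2 instead of the cosine rule. The sketch below is not part of deepface. eye_rotation_angle is a hypothetical helper that mirrors the direction convention of the function above, and the eye coordinates are made up.

```python
import math

def eye_rotation_angle(left_eye, right_eye):
    """Return the signed angle in degrees that makes the eye line horizontal."""
    left_x, left_y = left_eye
    right_x, right_y = right_eye
    if left_x == right_x:
        # mirrors the b == 0 guard above: nothing to rotate in this case
        return 0.0
    # tilt of the eye line with respect to the horizontal axis
    tilt = math.degrees(math.atan2(abs(right_y - left_y), abs(right_x - left_x)))
    # same convention as above: -1 is clockwise, 1 is counter-clockwise
    direction = -1 if left_y > right_y else 1
    return direction * tilt

# made-up coordinates: the left eye sits 20 px lower than the right eye
angle = eye_rotation_angle((100, 120), (200, 100))
print(round(angle, 2))  # -11.31
```

Passing this value to the rotate function of PIL should give the same rotation as the direction * angle expression above, since both measure the tilt of the same right triangle.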

Here, you can see how MTCNN detects and aligns a face.

Face detection and alignment with MTCNN

MTCNN in deepface

Running detection and alignment separately might seem complex, and it may confuse and discourage you. Herein, deepface wraps the OpenCV haar cascade, SSD, Dlib HoG and MTCNN detectors. You can detect and align faces within deepface with just a few lines of code.

from deepface import DeepFace
from deepface.commons import functions

backends = ['opencv', 'ssd', 'dlib', 'mtcnn']

for backend in backends:
    #face detection and alignment
    detected_and_aligned_face = DeepFace.detectFace("img.jpg", detector_backend = backend)

    #------------------------

    #face detection
    detected_face = functions.detect_face(img = "img.jpg", detector_backend = backend)

    #face alignment
    aligned_face = functions.align_face(img = detected_face, detector_backend = backend)

On the other hand, the face recognition pipeline covers the detection and alignment stages in the background. In other words, you don’t have to detect and align faces manually in the pipeline.

#face verification
obj = DeepFace.verify("img1.jpg", "img2.jpg", detector_backend = 'mtcnn')

#face recognition
df = DeepFace.find(img_path = "img.jpg", db_path = "my_db", detector_backend = 'mtcnn')

You can watch how to apply those pre-processing stages within deepface in the following video as well.

Face recognition pipeline

The early stages of a modern face recognition pipeline are detection and alignment. We handled those stages in this blog post with a strong algorithm. The next stage is representation.





Conclusion

So, we’ve covered how to detect faces with MTCNN and how to apply face alignment based on the facial landmarks it provides. Its face detection score is very high, but it is slower than its competitors. Herein, you should adopt MTCNN if your priority is accuracy, whereas you should adopt SSD if your priority is speed. Besides, SSD won’t find facial landmarks, so we have to use OpenCV’s haar cascade to find the eye locations to align faces. This might have a negative effect in production.

