Deep Face Detection with Mediapipe

Mediapipe is a machine learning solution from Google that covers the face detection task as well. It focuses on live and streaming data, so it offers high speed besides its robustness. It also has cross-platform support, and we are able to use it through its Python client. In this post, we are going to use mediapipe for both face detection and facial landmark detection. Notice that these are required stages of a modern facial recognition pipeline.

Facial landmark demonstration

Summary

This is a spoiler alert πŸ™‚ Once you read this tutorial, you will be able to use mediapipe for face detection and facial landmark detection in facial recognition pipelines, similar to the following videos.


πŸ™‹β€β™‚οΈ You may consider to enroll my top-rated machine learning course on Udemy

Decision Trees for Machine Learning

Highlighting the landmarks

Face mesh with tesselation

Vlog

You can either watch the following videos or continue reading this tutorial. They cover both deep face detection and facial landmark detection with mediapipe.

This covers 468-point facial landmark detection. We will highlight the landmarks in the image itself (2D) and also project them into a 3D model. Besides, we will extract specific areas of the face such as the eyes, eyebrows, lips or face oval with those landmarks. Finally, we will find the face mesh.

Installing The Package

The easiest way to install mediapipe is from PyPI with the following command. I ran my experiments with its 0.8.9.1 version.

#!pip install mediapipe==0.8.9.1

Then, you will be able to import the library and use its functionalities.

import mediapipe

Testing set

I am going to use the featured picture of this post as the test image. Let’s read the image with opencv.

Photo by cottonbro from Pexels
import cv2
#https://www.pexels.com/photo/portrait-photo-of-woman-in-white-crew-neck-shirt-8090149/
img = cv2.imread("pexels-cottonbro-8090149.jpg")

Building The Detector

The face detection module exists under mediapipe solutions. Even if you are going to run the detector in real time, you still need to build it just once; the same detector object can then be reused for every frame, as sketched below.

mp_face_detection = mediapipe.solutions.face_detection
face_detector = mp_face_detection.FaceDetection(min_detection_confidence = 0.6) # discard detections below 0.6 confidence
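For instance, here is a minimal real-time sketch, assuming a webcam is available at index 0. It is for illustration only; we will continue with a static image in the next section.

cap = cv2.VideoCapture(0) # assumes a default webcam at index 0
while cap.isOpened():
    has_frame, frame = cap.read()
    if not has_frame:
        break

    # mediapipe expects RGB input while opencv captures frames in BGR
    results = face_detector.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

    if results.detections:
        for face in results.detections:
            box = face.location_data.relative_bounding_box
            x = int(box.xmin * frame.shape[1])
            y = int(box.ymin * frame.shape[0])
            w = int(box.width * frame.shape[1])
            h = int(box.height * frame.shape[0])
            cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 255, 255), 2)

    cv2.imshow("mediapipe face detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"): # press q to quit
        break

cap.release()
cv2.destroyAllWindows()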

Running The Detector

We have already read the input image and built the mediapipe detector. Now, we are going to pass the input image to the detector. Notice that mediapipe expects an RGB image while opencv reads images in BGR order, so we convert first.

results = face_detector.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

Then, the results object stores the confidence score and facial area coordinates for each detected face.

if results.detections:
    for face in results.detections:
        confidence = face.score[0] # score is a list with a single confidence value
        bounding_box = face.location_data.relative_bounding_box

        # the bounding box is normalized to [0, 1]; scale it back to pixels
        x = int(bounding_box.xmin * img.shape[1])
        w = int(bounding_box.width * img.shape[1])
        y = int(bounding_box.ymin * img.shape[0])
        h = int(bounding_box.height * img.shape[0])

        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 255), thickness = 2)

The object also stores information about some facial landmarks: the eyes, ears, nose and mouth.

# keypoints are normalized as well; they are labeled from the subject’s perspective
landmarks = face.location_data.relative_keypoints

right_eye = (int(landmarks[0].x * img.shape[1]), int(landmarks[0].y * img.shape[0]))
left_eye = (int(landmarks[1].x * img.shape[1]), int(landmarks[1].y * img.shape[0]))
nose = (int(landmarks[2].x * img.shape[1]), int(landmarks[2].y * img.shape[0]))
mouth = (int(landmarks[3].x * img.shape[1]), int(landmarks[3].y * img.shape[0]))
right_ear = (int(landmarks[4].x * img.shape[1]), int(landmarks[4].y * img.shape[0]))
left_ear = (int(landmarks[5].x * img.shape[1]), int(landmarks[5].y * img.shape[0]))

cv2.circle(img, right_eye, 15, (0, 0, 255), -1)
cv2.circle(img, left_eye, 15, (0, 0, 255), -1)
cv2.circle(img, nose, 15, (0, 0, 255), -1)
cv2.circle(img, mouth, 15, (0, 0, 255), -1)
cv2.circle(img, right_ear, 15, (0, 0, 255), -1)
cv2.circle(img, left_ear, 15, (0, 0, 255), -1)
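To inspect the annotated image, a quick matplotlib sketch can be used; opencv stores images in BGR order while matplotlib expects RGB, so we convert before plotting.

import matplotlib.pyplot as plt

plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.axis("off")
plt.show()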

So, the result of the mediapipe face detector is illustrated below. It is very satisfactory, right?

The results of the detector

The technology behind face detection

According to its documentation, the mediapipe face detection module uses BlazeFace in the background, a lightweight detector built on SSD (Single Shot Multibox Detector). You can find out more about SSD in this post: Deep Face Detection with OpenCV in Python, where we implement SSD in Python for face detection tasks.

Alignment

Face alignment comes after the detection in the modern facial recognition pipelines.

Once we have the facial area and the positions of the eyes, we are able to align the image with a high school math trick. You should watch the following video or read this blog post to find out more about alignment: Face Alignment for Facial Recognition.
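Here is a minimal sketch of that trick with the eye coordinates found by the detector above. The align_face helper is hypothetical, just for illustration: it rotates the image by the angle of the line between the eyes so that they end up on a horizontal line. Remember that mediapipe labels the eyes from the subject’s perspective, so the subject’s right eye appears on the left side of the image.

import numpy as np

def align_face(img, image_left_eye, image_right_eye):
    # angle of the eye line with respect to the horizontal axis
    dx = image_right_eye[0] - image_left_eye[0]
    dy = image_right_eye[1] - image_left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))

    # rotate around the image center so that the eye line becomes horizontal
    center = (img.shape[1] // 2, img.shape[0] // 2)
    rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)
    return cv2.warpAffine(img, rotation_matrix, (img.shape[1], img.shape[0]))

# keypoint 0 (right_eye) is on the image left, keypoint 1 (left_eye) on the image right
aligned_img = align_face(img, image_left_eye = right_eye, image_right_eye = left_eye)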

DeepFace Library

DeepFace already wraps mediapipe. You are able to extract detected faces directly, or run face recognition and facial analysis with mediapipe. All of the functions handle detection and alignment in the background.

I summarize the main functionalities of deepface with the mediapipe detector below.

#!pip install deepface
from deepface import DeepFace

#face detection
img = DeepFace.detectFace("img.jpg", detector_backend = "mediapipe")

#face verification
obj = DeepFace.verify(img1_path = "img1.jpg", img2_path = "img2.jpg", detector_backend = "mediapipe")
print(obj["verified"])

#face recognition
df = DeepFace.find(img_path = "img.jpg", db_path = "my_db", detector_backend =  "mediapipe")
print(df.head())

#facial analysis
demography = DeepFace.analyze(img_path = "img4.jpg", detector_backend =  "mediapipe")
print(demography)

However, mediapipe currently has a bug on macOS: it hangs if tensorflow is imported before mediapipe. That’s why you may have trouble if you are a mac user and use mediapipe as a backend.
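Following that bug description, a possible workaround is to make sure mediapipe is imported before anything that pulls in tensorflow:

import mediapipe # must be imported first on macOS
from deepface import DeepFace # deepface imports tensorflow internally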

Normalization

Normalization is the third, optional stage of a modern facial recognition pipeline. Even though we extract just the face and align it in the previous stages, the pre-processed image still stores some noisy information such as background pixels. Advanced detectors come with strong facial landmark detection skills. For example, we are able to find 68 facial landmark points with Dlib, as sketched below.
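For comparison, here is a minimal Dlib sketch. It assumes the pre-trained shape_predictor_68_face_landmarks.dat model has been downloaded and extracted next to the script.

import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
for rect in detector(gray):
    shape = predictor(gray, rect)
    for idx in range(68): # 68 landmark points per face
        point = shape.part(idx)
        cv2.circle(img, (point.x, point.y), 5, (0, 255, 0), -1)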

Herein, mediapipe comes with a more advanced technology: it can find 468 facial landmark points!

Building The Landmark Detector

Similar to the regular face detector, we are able to build the facial landmark detector under mediapipe solutions. It is built once, even if you will feed many images to the landmark detector.

mp_face_mesh = mediapipe.solutions.face_mesh
face_mesh = mp_face_mesh.FaceMesh(static_image_mode = True) # treat inputs as independent images instead of a video stream

Running The Landmark Detector

Let’s feed the facial image to the facial landmark detector.

results = face_mesh.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

The results object stores the facial landmark information as a python list with one entry per detected face; multi_face_landmarks is None when no face is found, and by default FaceMesh returns at most one face (pass max_num_faces to detect more). I’m going to get the first one in this example because there is just one face in our testing sample.

landmarks = results.multi_face_landmarks[0]

Highlighting The Landmarks

As I mentioned, there are 468 landmark points. Let’s highlight each one in the base image. Here, each landmark has x and y coordinates scaled to [0, 1]. These are normalized coordinate values: in order to denormalize them, we need to multiply the x coordinate by the width and the y coordinate by the height.

for landmark in landmarks.landmark:
    x = landmark.x
    y = landmark.y

    # denormalize to pixel coordinates
    relative_x = int(img.shape[1] * x)
    relative_y = int(img.shape[0] * y)

    cv2.circle(img, (relative_x, relative_y), 5, (0, 0, 255), -1)

It is very satisfactory, isn’t it?

468 landmark points

The Marvel Way

If you are a marvel fan like me, then you might wonder how those animations were made. For instance, many facial landmarks were highlighted manually on the face of Josh Brolin, and then Thanos was born!

Josh Brolin as Thanos in Avengers Infinity War and Endgame

Nowadays, we are able to find many more landmark points on the face, even in real-time, without manual highlighting! Please watch the following video from this perspective.

Notice that there is a camera recording Josh Brolin frontally; he never turns his back to the camera. Our facial landmark detector shows high performance on frontal faces, but its performance is a little lower from behind. So, mediapipe works for landmark detection in marvel style movies as long as it always analyzes frontal images.

3D projection

Even though we highlighted the facial landmarks in 2-dimensional space, mediapipe offers the depth of each landmark point as well. I mean that it has x, y and z dimension values for each landmark point. We can project the landmarks into 3-dimensional space from the 2-dimensional input with matplotlib. We will not denormalize any dimension values in this case.

import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(projection='3d')

for landmark in landmarks.landmark:
    x = landmark.x
    y = landmark.y
    z = landmark.z # depth of the landmark, normalized like x and y

    ax.scatter(x, y, z, color = 'blue')

plt.show()

Views from four sides are shown below. That’s really amazing! It seems that mediapipe has a lot of potential in 3d modelling and augmented reality.

3D projection of facial landmarks

The Dlib facial landmark detector does not offer depth with a z-dimension. Mediapipe, on the other hand, comes with more landmark points and depth information with the z-dimension as well.

Custom landmarks

Mediapipe groups the 468 landmark points into custom facial areas such as the eyes, eyebrows, lips and the outer oval of the face. We are able to extract these custom facial areas as well.

Mediapipe already stores the index values and routes among the 468 landmark points for many facial areas. Some are shown below:

facial_areas = {
    'Contours': mp_face_mesh.FACEMESH_CONTOURS
    , 'Lips': mp_face_mesh.FACEMESH_LIPS
    , 'Face_oval': mp_face_mesh.FACEMESH_FACE_OVAL
    , 'Left_eye': mp_face_mesh.FACEMESH_LEFT_EYE
    , 'Left_eye_brow': mp_face_mesh.FACEMESH_LEFT_EYEBROW
    , 'Right_eye': mp_face_mesh.FACEMESH_RIGHT_EYE
    , 'Right_eye_brow': mp_face_mesh.FACEMESH_RIGHT_EYEBROW
    , 'Tesselation': mp_face_mesh.FACEMESH_TESSELATION
}

We will highlight each custom face area. To do this task, let’s create a generic function expecting the custom area object. It’s going to get the source and target coordinates, denormalize them and highlight with lines in the base image.

def plot_landmark(img_base, facial_area_obj):
    # each entry is a (source, target) pair of landmark indices forming an edge
    for source_idx, target_idx in facial_area_obj:
        source = landmarks.landmark[source_idx]
        target = landmarks.landmark[target_idx]

        # denormalize the coordinates to pixels
        relative_source = (int(img_base.shape[1] * source.x), int(img_base.shape[0] * source.y))
        relative_target = (int(img_base.shape[1] * target.x), int(img_base.shape[0] * target.y))

        cv2.line(img_base, relative_source, relative_target, (255, 255, 255), thickness = 2)

Then, we are going to call the plot landmark function for each custom facial area, drawing each one on a fresh copy of the base image.

for facial_area, facial_area_obj in facial_areas.items():
    img_base = img.copy() # fresh copy so the areas do not overlap in one image
    plot_landmark(img_base, facial_area_obj)

Results

In particular, the face oval result would be very helpful for normalization in facial recognition pipelines because we can focus on just the facial area with this approach; a masking sketch follows the figure below.

Face oval
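Here is a sketch of such a normalization step: we collect the landmark indices referenced by the face oval routes, build a convex mask from them and black out everything outside it. The masked_img variable is just for illustration.

import numpy as np

# gather the landmark indices referenced by the face oval routes
oval_idx = set()
for source_idx, target_idx in mp_face_mesh.FACEMESH_FACE_OVAL:
    oval_idx.add(source_idx)
    oval_idx.add(target_idx)

# denormalize the oval points to pixel coordinates
points = np.array([
    (int(landmarks.landmark[idx].x * img.shape[1]),
     int(landmarks.landmark[idx].y * img.shape[0]))
    for idx in oval_idx
])

# fill the convex hull of the oval and keep only the pixels inside it
mask = np.zeros(img.shape[:2], dtype = np.uint8)
cv2.fillConvexPoly(mask, cv2.convexHull(points), 255)
masked_img = cv2.bitwise_and(img, img, mask = mask)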

Here are the results of different facial areas.

Contours

Tracking the eyes and eyebrows will help liveness detection analysis in facial recognition studies.

Left eye
Left eye brow
Lips

We can draw the face mesh with the tesselation set, as shown below.
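For instance, drawing the tesselation set on a fresh copy of the image produces the mesh shown below.

mesh_img = img.copy()
plot_landmark(mesh_img, mp_face_mesh.FACEMESH_TESSELATION)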

Tesselation

Please watch the following video for face mesh analysis in real-time.

Conclusion

So, we have mentioned how to use mediapipe for the pre-processing stages of a modern facial recognition pipeline: detection, alignment and normalization. Mediapipe is a very powerful face detector, disrupting existing detectors in terms of both accuracy and speed.

Finally, I pushed the source code of this study to GitHub already. You can support this study if you star⭐️ its repo πŸ™

