Normalization for Facial Recognition with MediaPipe

Detection and alignment are mandatory early stages of a modern facial recognition pipeline. However, face detection extracts faces as rectangles, so even detected and aligned facial images come with some noise in the background. Normalization is an optional stage of a modern facial recognition pipeline: it aims to decrease the noise in the inputs and increase the accuracy of the pipeline. The normalization stage is mainly based on facial landmark detection for the face oval. In this post, we are going to focus on facial landmark detection with the Google-powered MediaPipe library and extract the face oval from an image.

Face oval extraction with MediaPipe

Vlog

You can either continue reading this tutorial or watch the following video. Both cover how to extract the facial area with the MediaPipe library.


πŸ™‹β€β™‚οΈ You may consider to enroll my top-rated machine learning course on Udemy

Decision Trees for Machine Learning

Notice that face extraction is very common in facial recognition pipelines to feed cleaner inputs to face recognition models; it is also used in deep fake videos for face swapping.

Requirements

I am currently using Python 3.8.12 and MediaPipe 0.8.9.1. I recommend you have the same versions of those packages to avoid environment issues.

$ pip install mediapipe==0.8.9.1
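
Besides MediaPipe, this tutorial uses OpenCV, pandas, NumPy and matplotlib. If they are not already available in your environment, they can be installed the same way:

$ pip install opencv-python pandas numpy matplotlib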

Thereafter, you are going to be able to import the MediaPipe library.

import mediapipe
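
You can confirm the active version before moving on:

print(mediapipe.__version__)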

Reading a raw image

I’m going to use the following image as input.

Photo from Pexels

import cv2
import matplotlib.pyplot as plt

img = cv2.imread("pexels-cottonbro-8090149-scaled.jpeg")

fig = plt.figure(figsize = (8, 8))
plt.axis('off')
plt.imshow(img[:, :, ::-1])
plt.show()
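
Notice that cv2.imread returns None instead of raising an error when the file path is wrong; a one-line guard right after reading prevents a confusing failure later:

assert img is not None, "could not read the input image"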

Building the facial landmarks detector

The facial landmarks detector model comes with the face mesh object of the solutions module. Once we build the detector object, we can feed an image as input.

mp_face_mesh = mediapipe.solutions.face_mesh
face_mesh = mp_face_mesh.FaceMesh(static_image_mode=True)
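
Setting static_image_mode to True treats every input as an independent still image instead of a frame in a video stream. The constructor accepts a few more optional arguments as well; the sketch below makes two of them explicit with what I believe are the library defaults:

face_mesh = mp_face_mesh.FaceMesh(
    static_image_mode=True,
    max_num_faces=1,              # detect at most one face per image
    min_detection_confidence=0.5, # discard detections below this score
)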

Running the detector

# MediaPipe expects RGB inputs whereas OpenCV reads images in BGR order
results = face_mesh.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
landmarks = results.multi_face_landmarks[0]
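
Notice that results.multi_face_landmarks is None when no face can be detected, so production code should check it before indexing:

if results.multi_face_landmarks is None:
    raise ValueError("no face found in the input image")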

Focusing on face oval

MediaPipe finds 468 landmark points, but we will focus on just the face oval points in this study. The face mesh object stores the categories of landmark points as well.

face_oval = mp_face_mesh.FACEMESH_FACE_OVAL

import pandas as pd

# each row holds one line of the face oval: source (p1) and target (p2) landmark indices
df = pd.DataFrame(list(face_oval), columns = ["p1", "p2"])

This is going to return the 36 lines of the face oval. Notice that we need 2 points to have a line, so the lines come with 72 endpoint entries in total; since consecutive lines share endpoints, these cover 36 distinct landmark points.
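
A quick sanity check confirms those counts:

print(df.shape[0])  # 36 lines
print(len(set(df["p1"]).union(df["p2"])))  # 36 distinct landmark points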

Ordering face oval lines

Unfortunately, the lines of the face oval are not ordered. We need ordered lines to extract an area in an image with OpenCV. Let's order them!

routes_idx = []

# start with the first line in the dataframe
p1 = df.iloc[0]["p1"]
p2 = df.iloc[0]["p2"]

for i in range(0, df.shape[0]):

    # find the line whose source point is the current target point
    obj = df[df["p1"] == p2]
    p1 = obj["p1"].values[0]
    p2 = obj["p2"].values[0]

    route_idx = []
    route_idx.append(p1)
    route_idx.append(p2)
    routes_idx.append(route_idx)

# -------------------------------

for route_idx in routes_idx:
    print(f"Draw a line between {route_idx[0]}th landmark point to {route_idx[1]}th landmark point")

Here, the routes_idx list stores the ordered lines between pre-defined points of the face mesh object. Notice that the target point of each line is the source point of the next line.

Draw a line from 149th landmark point to 150th landmark point

Draw a line from 150th landmark point to 136th landmark point

Draw a line from 136th landmark point to 172th landmark point

...

Draw a line from 152th landmark point to 148th landmark point

Draw a line from 148th landmark point to 176th landmark point

Draw a line from 176th landmark point to 149th landmark point

So, the last target point is also the first source point. We can define a closed area in this way.

Finding the coordinates of points

We ordered the pre-defined landmark routes in the previous section. Let's find the coordinates of each point. Notice that MediaPipe returns coordinates normalized to [0, 1], so relative_source and relative_target will store the de-normalized pixel coordinates.

routes = []

for source_idx, target_idx in routes_idx:

    source = landmarks.landmark[source_idx]
    target = landmarks.landmark[target_idx]

    # landmark coordinates are normalized to [0, 1]; scale them back to pixel coordinates
    relative_source = (int(img.shape[1] * source.x), int(img.shape[0] * source.y))
    relative_target = (int(img.shape[1] * target.x), int(img.shape[0] * target.y))

    routes.append(relative_source)
    routes.append(relative_target)
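
Before masking, you can verify the ordering visually by drawing the lines on a copy of the image, so that the original stays untouched for the extraction step:

preview = img.copy()

# routes stores (source, target) pairs back to back
for i in range(0, len(routes), 2):
    cv2.line(preview, routes[i], routes[i + 1], (255, 255, 255), thickness = 2)

fig = plt.figure(figsize = (8, 8))
plt.axis('off')
plt.imshow(preview[:, :, ::-1])
plt.show()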

Extracting face oval

The fillConvexPoly function of OpenCV expects a route of lines, which we already store in the routes list. Besides, we initialize the mask object as a black image because a zero pixel means black whereas 255 means white.

import numpy as np

# start with an all-black, single-channel mask (zero pixels mean black)
mask = np.zeros((img.shape[0], img.shape[1]))

# fill the area enclosed by the ordered face oval points with ones;
# OpenCV expects the polygon points as a 32-bit integer array
mask = cv2.fillConvexPoly(mask, np.array(routes, dtype=np.int32), 1)
mask = mask.astype(bool)

# keep only the pixels inside the face oval, black out everything else
out = np.zeros_like(img)
out[mask] = img[mask]

fig = plt.figure(figsize = (15, 15))
plt.axis('off')
plt.imshow(out[:, :, ::-1])
plt.show()
Normalization Result
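
If you would like to persist the normalized face instead of just displaying it, OpenCV can write it back to disk; the output file name here is arbitrary:

cv2.imwrite("normalized-face.jpg", out)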

Real-time facial landmarks detection

MediaPipe can find 468 facial landmark points for a given face. The following video shows how it can find facial landmarks in real-time!

MediaPipe vs Dlib

We are already able to find 68 landmark points with Dlib, but just 27 of them are for the face oval. On the other hand, MediaPipe can find 468 facial landmark points, and 36 of them are for the face oval. So, we can extract the facial area much more precisely with MediaPipe than with Dlib! You can see below how elegant the result of MediaPipe is compared to Dlib.

Dlib vs MediaPipe

Conclusion

So, we have mentioned how to use the MediaPipe facial landmarks detector for face normalization in facial recognition pipelines. MediaPipe comes with more precise and faster results. In this way, we can get rid of any noise in the background of a facial image and focus just on the facial area.

Finally, I pushed the source code of this study as a notebook to GitHub. You can support this study if you star ⭐ the repo 🙏

