Apparent Age and Gender Prediction in Keras

Computer vision researchers of ETH Zurich University (Switzerland) announced a very successful apparent age and gender prediction models. They both shared how they designed the machine learning model and pre-trained weights for transfer learning. Their implementation was based on Caffe framework. Even though I tried to convert Caffe model and weights to Keras / TensorFlow, I couldn’t handle this. That’s why, I intend to adopt this research from scratch in Keras.

katy-perry-ages-v2 — Katy Perry Transformation

What this post offers?

We can apply age and gender predictions in real time.

🙋‍♂️ You may consider to enroll my top-rated machine learning course on Udemy

DeepFace library for Python covers age prediction. You can run age estimation with a few lines of code.

Pre-trained model

In this post, we are going to re-train the age and gender prediction models from scratch. If you focus on just prediction stage, then the following video might attract your attention. This subject is covered in a dedicated blog post actually: Age and Gender Prediction with Deep Learning in OpenCV. We will use pre-trained models for Caffe within OpenCV in this case. Besides, you don’t have to have Caffe on your environment. OpenCV handles to build Caffe models with its dnn module.

On the other hand, if training stage attracts your attention, then you should continue to read this blog post.

Dataset

The original work consumed face pictures collected from IMDB (7 GB) and Wikipedia (1 GB). You can find these data sets here. In this post, I will just consume wiki data source to develop solution fast. You should download faces only files.

Extracting wiki_crop.tar creates 100 folders and an index file (wiki.mat). The index file is saved as Matlab format. We can read Matlab files in python with SciPy.

import scipy.io
mat = scipy.io.loadmat(&amp;#039;wiki_crop/wiki.mat&amp;#039;)

Converting pandas dataframe will make transformations easier.

instances = mat[&amp;#039;wiki&amp;#039;][0][0][0].shape[1]

columns = [&amp;amp;quot;dob&amp;amp;quot;, &amp;amp;quot;photo_taken&amp;amp;quot;, &amp;amp;quot;full_path&amp;amp;quot;, &amp;amp;quot;gender&amp;amp;quot;, &amp;amp;quot;name&amp;amp;quot;, &amp;amp;quot;face_location&amp;amp;quot;, &amp;amp;quot;face_score&amp;amp;quot;, &amp;amp;quot;second_face_score&amp;amp;quot;]

import pandas as pd
df = pd.DataFrame(index = range(0,instances), columns = columns)

for i in mat:
if i == &amp;amp;quot;wiki&amp;amp;quot;:
current_array = mat[i][0][0]
for j in range(len(current_array)):
df[columns[j]] = pd.DataFrame(current_array[j][0])

Data set contains date of birth (dob) in Matlab datenum format. We need to convert this to Python datatime format. We just need the birth year.

from datetime import datetime, timedelta
def datenum_to_datetime(datenum):
days = datenum % 1
hours = days % 1 * 24
minutes = hours % 1 * 60
seconds = minutes % 1 * 60
exact_date = datetime.fromordinal(int(datenum)) \
+ timedelta(days=int(days)) + timedelta(hours=int(hours)) \
+ timedelta(minutes=int(minutes)) + timedelta(seconds=round(seconds)) \
- timedelta(days=366)

return exact_date.year

df[&amp;#039;date_of_birth&amp;#039;] = df[&amp;#039;dob&amp;#039;].apply(datenum_to_datetime)

wiki-crop-dataset-dob — Adding exact birth date

Extracting date of birth from matlab datenum format

Now, we have both date of birth and photo taken time. Subtracting these values will give us the ages.

df[&amp;#039;age&amp;#039;] = df[&amp;#039;photo_taken&amp;#039;] - df[&amp;#039;date_of_birth&amp;#039;]

Data cleaning

Some pictures don’t include people in the wiki data set. For example, a vase picture exists in the data set. Moreover, some pictures might include two person. Furthermore, some are taken distant. Face score value can help us to understand the picture is clear or not. Also, age information is missing for some records. They all might confuse the model. We should ignore them. Finally, unnecessary columns should be dropped to occupy less memory.

#remove pictures does not include face
df = df[df[&amp;#039;face_score&amp;#039;] != -np.inf]

#some pictures include more than one face, remove them
df = df[df[&amp;#039;second_face_score&amp;#039;].isna()]

#check threshold
df = df[df[&amp;#039;face_score&amp;#039;] &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;gt;= 3]

#some records do not have a gender information
df = df[~df[&amp;#039;gender&amp;#039;].isna()]

df = df.drop(columns = [&amp;#039;name&amp;#039;,&amp;#039;face_score&amp;#039;,&amp;#039;second_face_score&amp;#039;,&amp;#039;date_of_birth&amp;#039;,&amp;#039;face_location&amp;#039;])

Some pictures are taken for unborn people. Age value seems to be negative for some records. Dirty data might cause this. Moreover, some seems to be alive for more than 100. We should restrict the age prediction problem for 0 to 100 years.

#some guys seem to be greater than 100. some of these are paintings. remove these old guys
df = df[df[&amp;#039;age&amp;#039;] &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;lt;= 100]

#some guys seem to be unborn in the data set
df = df[df[&amp;#039;age&amp;#039;] &amp;amp;amp;amp;amp;amp;amp;amp;amp;amp;gt; 0]

The raw data set will be look like the following data frame.

We can visualize the target label distribution.

histogram_age = df[&amp;#039;age&amp;#039;].hist(bins=df[&amp;#039;age&amp;#039;].nunique())
histogram_gender = df[&amp;#039;gender&amp;#039;].hist(bins=df[&amp;#039;gender&amp;#039;].nunique())

age-gender-distribution — Age and gender distribution in the data set

Full path column states the exact location of the picture on the disk. We need its pixel values.

target_size = (224, 224)

def getImagePixels(image_path):
img = image.load_img(&amp;amp;quot;wiki_crop/%s&amp;amp;quot; % image_path[0], grayscale=False, target_size=target_size)
x = image.img_to_array(img).reshape(1, -1)[0]
#x = preprocess_input(x)
return x

df[&amp;#039;pixels&amp;#039;] = df[&amp;#039;full_path&amp;#039;].apply(getImagePixels)

We can extract the real pixel values of pictures

wiki-crop-dataset-pixels — Adding pixels

Apparent age prediction model

Age prediction is a regression problem. But researchers define it as a classification problem. There are 101 classes in the output layer for ages 0 to 100. they applied transfer learning for this duty. Their choice was VGG for imagenet.

Preparing input output

Pandas data frame includes both input and output information for age and gender prediction tasks. Wee should just focus on the age task.

classes = 101 #0 to 100
target = df[&amp;#039;age&amp;#039;].values
target_classes = keras.utils.to_categorical(target, classes)

features = []

for i in range(0, df.shape[0]):
features.append(df[&amp;#039;pixels&amp;#039;].values[i])

features = np.array(features)
features = features.reshape(features.shape[0], 224, 224, 3)

Also, we need to split data set as training and testing set.

from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(features, target_classes, test_size=0.30)

The final data set consists of 22578 instances. It is splitted into 15905 train instances and 6673 test instances .

Transfer learning

As mentioned, researcher used VGG imagenet model. Still, they tuned weights for this data set. Herein, I prefer to use VGG-Face model. Because, this model is tuned for face recognition task. In this way, we might have outcomes for patterns in the human face.

#VGG-Face model
model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(224,224, 3)))
model.add(Convolution2D(64, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation=&amp;#039;relu&amp;#039;))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Convolution2D(4096, (7, 7), activation=&amp;#039;relu&amp;#039;))
model.add(Dropout(0.5))
model.add(Convolution2D(4096, (1, 1), activation=&amp;#039;relu&amp;#039;))
model.add(Dropout(0.5))
model.add(Convolution2D(2622, (1, 1)))
model.add(Flatten())
model.add(Activation(&amp;#039;softmax&amp;#039;))

Load the pre-trained weights for VGG-Face model. You can find the related blog post here.

#pre-trained weights of vgg-face model.
#you can find it here: https://github.com/serengil/deepface_models/releases/download/v1.0/vgg_face_weights.h5
#related blog post: https://sefiks.com/2018/08/06/deep-face-recognition-with-keras/
model.load_weights(&amp;#039;vgg_face_weights.h5&amp;#039;)

We should lock the layer weights for early layers because they could already detect some patterns. Fitting the network from scratch might cause to lose this important information. I prefer to freeze all layers except last 3 convolution layers (make exception for last 7 model.add units). Also, I cut the last convolution layer because it has 2622 units. I need just 101 (ages from 0 to 100) units for age prediction task. Then, add a custom convolution layer consisting of 101 units.

for layer in model.layers[:-7]:
layer.trainable = False

base_model_output = Sequential()
base_model_output = Convolution2D(101, (1, 1), name=&amp;#039;predictions&amp;#039;)(model.layers[-4].output)
base_model_output = Flatten()(base_model_output)
base_model_output = Activation(&amp;#039;softmax&amp;#039;)(base_model_output)

age_model = Model(inputs=model.input, outputs=base_model_output)

Training

This is a multi-class classification problem. Loss function must be categorical crossentropy. Optimization algorithm will be Adam to converge loss faster. I create a checkpoint to monitor model over iterations and avoid overfitting. The iteration which has the minimum validation loss value will include the optimum weights. That’s why, I’ll monitor validation loss and save the best one only.

To avoid overfitting, I feed random 256 instances for each epoch.

age_model.compile(loss=&amp;#039;categorical_crossentropy&amp;#039;, optimizer=keras.optimizers.Adam(), metrics=[&amp;#039;accuracy&amp;#039;])

checkpointer = ModelCheckpoint(filepath=&amp;#039;age_model.hdf5&amp;#039;
, monitor = &amp;amp;quot;val_loss&amp;amp;quot;, verbose=1, save_best_only=True, mode = &amp;#039;auto&amp;#039;)

scores = []
epochs = 250; batch_size = 256

for i in range(epochs):
print(&amp;amp;quot;epoch &amp;amp;quot;,i)

ix_train = np.random.choice(train_x.shape[0], size=batch_size)

score = age_model.fit(train_x[ix_train], train_y[ix_train]
, epochs=1, validation_data=(test_x, test_y), callbacks=[checkpointer])

scores.append(score)

It seems that validation loss reach the minimum. Increasing epochs will cause to overfitting.

age-prediction-loss-v2 — Loss for age prediction task

Model evaluation on test set

We can evaluate the final model on the test set.

age_model.evaluate(test_x, test_y, verbose=1)

This gives both validation loss and accuracy respectively for 6673 test instances. It seems that we have the following results.

[2.871919590848929, 0.24298789490543357]

24% accuracy seems very low, right? Actually, it is not. Herein, researchers develop an age prediction approach and convert classification task to regression. They propose that you should multiply each softmax out with its label. Summing this multiplications will be the apparent age prediction.

This is a very easy operation in Python numpy.

predictions = age_model.predict(test_x)

output_indexes = np.array([i for i in range(0, 101)])
apparent_predictions = np.sum(predictions * output_indexes, axis = 1)

Herein, mean absolute error metric might be more meaningful to evaluate the system.

mae = 0

for i in range(0 ,apparent_predictions.shape[0]):
prediction = int(apparent_predictions[i])
actual = np.argmax(test_y[i])

abs_error = abs(prediction - actual)
actual_mean = actual_mean + actual

mae = mae + abs_error

mae = mae / apparent_predictions.shape[0]

print(&amp;amp;quot;mae: &amp;amp;quot;,mae)
print(&amp;amp;quot;instances: &amp;amp;quot;,apparent_predictions.shape[0])

Our apparent age prediction model averagely predict ages ± 4.65 error. This is acceptable.

Testing model on custom images

We can feel the power of the model when we feed custom images into it.

from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator

def loadImage(filepath):
test_img = image.load_img(filepath, target_size=(224, 224))
test_img = image.img_to_array(test_img)
test_img = np.expand_dims(test_img, axis = 0)
test_img /= 255
return test_img

picture = &amp;amp;quot;marlon-brando.jpg&amp;amp;quot;
prediction = age_model.predict(loadImage(picture))

Prediction variable stores distribution for each age class. Monitoring it might be intersting.

y_pos = np.arange(101)
plt.bar(y_pos, prediction[0], align=&amp;#039;center&amp;#039;, alpha=0.3)
plt.ylabel(&amp;#039;percentage&amp;#039;)
plt.title(&amp;#039;age&amp;#039;)
plt.show()

This is the age prediction distribution of Marlon Brando in Godfather. The most dominant age class is 44 whereas weighted age is 48 which is the exact age of him in 1972.

We’ll calculate apparent age from these age distributions

img = image.load_img(picture)
plt.imshow(img)
plt.show()

print(&amp;amp;quot;most dominant age class (not apparent age): &amp;amp;quot;,np.argmax(prediction))

apparent_age = np.round(np.sum(prediction * output_indexes, axis = 1))
print(&amp;amp;quot;apparent age: &amp;amp;quot;, int(apparent_age[0]))

Results are very satisfactory even though it does not have a good perspective. Marlon Brando was 48 and Al Pacino was 32 in Godfather Part I.

age-prediction-for-godfather-v2 — Apparent Age Prediction in Godfather

Compare to original study

As I mentioned before, we re-trained the base model because the original study is mainly based on Caffe and I need pre-trained weights for Keras. The original study was the winner of the ChaLearn Looking at People (LAP) challenge on Apparent age V1 (ICCV ’15).

You are expected to predict the age of someone and there are several predictions of his/her age instead of actual age. So, your predictions will be evaluated by the mean and standard deviation the the jury predictions.

If your prediction is equal to the mean of the predictions, then error becomes 0. Besides, if your prediction is not close to the mean of the predictions but the standard deviation of jury predictions are high, then the error closes to 0 as well. On the other hand, you will be fined if your prediction is not close to the mean of predictions and the standard deviation of the jury predictions is low as well.

from math import e
df[&amp;#039;epsilon&amp;#039;] = e ** ( -1*( (df[&amp;#039;prediction&amp;#039;] - df[&amp;#039;mean_age&amp;#039;]) ** 2 ) / (2*(df[&amp;#039;std_age&amp;#039;]**2)) )
df[&amp;#039;epsilon&amp;#039;].mean()

The ε value of this model is 0.387378, and MAE is 7.887859 for 1079 instances. On the other hand, the ε value of the original study was 0.264975. They declared that human reference of ε was 0.34. So, the original study is still little bit more accurate than the model I created in this post. Besides, my model is close to the human level for age prediction.

You can find the evaluation test data set and its labels here.

Face detection

Train set images are already cropped and just facial areas are mentioned. Testing a custom image requires to detect faces. This will increase the accuracy dramatically. Besides, face alignment is not a must but it is a plus for this study.

There are several face detection solutions. OpenCV offers haar cascade and single shot multibox detector (SSD). Dlib offers Histogram of Oriented Gradients (HOG) and Max-Margin Object Detection (MMOD). Finally Multi-task Cascaded Convolutional Networks (MTCNN) is a common solution for face detection. Herein, haar cascade and HoG are legacy methods whereas SSD, MMOD and MTCNN are deep learning based modern solutions. You can see the face detection performance of those model in the following video.

Here, you can also see how to run those different face detectors in a single line of code with deepface framework for python.

You can find out the math behind face alignment more on the following video:

Face detectors extract faces in a rectangle area. So, it comes with a noise such as background color. Here, we can find 68 landmarks of a facial image with dlib.

Here, retinaface is the cutting-edge face detection technology. It can even detect faces in the crowd and it finds facial landmarks including eye coordinates. That’s why, its alignment score is very high.

Gender prediction model

Apparent age prediction was a challenging problem. However, gender prediction is much more predictable.

We’ll apply binary encoding to target gender class.

target = df[&amp;#039;gender&amp;#039;].values
target_classes = keras.utils.to_categorical(target, 2)

We then just need to put 2 classes in the output layer for man and woman.

for layer in model.layers[:-7]:
layer.trainable = False

base_model_output = Sequential()
base_model_output = Convolution2D(2, (1, 1), name=&amp;#039;predictions&amp;#039;)(model.layers[-4].output)
base_model_output = Flatten()(base_model_output)
base_model_output = Activation(&amp;#039;softmax&amp;#039;)(base_model_output)

gender_model = Model(inputs=model.input, outputs=base_model_output)

Now, the model is ready to fit.

scores = []
epochs = 250; batch_size = 256

for i in range(epochs):
print(&amp;amp;quot;epoch &amp;amp;quot;,i)

ix_train = np.random.choice(train_x.shape[0], size=batch_size)

score = gender_model.fit(train_x[ix_train], train_y[ix_train]
, epochs=1, validation_data=(test_x, test_y), callbacks=[checkpointer])

scores.append(score)

It seems that the model is saturated. Terminating training will be clever.

loss-for-gender-v2 — Loss for gender prediction

Evaluation

gender_model.evaluate(test_x, test_y, verbose=1)

The model has the following validation loss and accuracy. It is really satisfactory.

[0.07324957040103375, 0.9744245524655362]

Confusion matrix

This is a real classification problem instead of age prediction. The accuracy should not be the only metric we need to monitor. Precision and recall should also be checked.

from sklearn.metrics import classification_report, confusion_matrix

predictions = gender_model.predict(test_x)

pred_list = []; actual_list = []

for i in predictions:
pred_list.append(np.argmax(i))

for i in test_y:
actual_list.append(np.argmax(i))

confusion_matrix(actual_list, pred_list)

The model generates the following confusion matrix. Columns are prediction whereas rows are actual value labels.

| Female | Male |
Female | 1873 | 98 |
Male | 72 | 4604 |

This means that we have 96.29% precision, 95.05% recall. These metrics are as satisfactory as the accuracy.

Testing gender for custom images

We just need to feed images to the model.

picture = &amp;amp;quot;katy-perry.jpg&amp;amp;quot;
prediction = gender_model.predict(loadImage(picture))

img = image.load_img(picture)#, target_size=(224, 224))
plt.imshow(img)
plt.show()
gender = &amp;amp;quot;Male&amp;amp;quot; if np.argmax(prediction) == 1 else &amp;amp;quot;Female&amp;amp;quot;
print(&amp;amp;quot;gender: &amp;amp;quot;, gender)

Conclusion

So, we’ve built an apparent age and gender predictors from scratch based on the research article of computer vision group of ETH Zurich. In particular, the way they proposed to calculate apparent age is an over-performing novel method. Deep learning really has a limitless power for learning.

I pushed the source code for both apparent age prediction and gender prediction to GitHub. Similarly, real time age and gender prediction implementation is pushed here. You might want to just use pre-trained weights. I put pre-trained weights for age and gender tasks to Google Drive.

Python library

Herein, deepface is a lightweight facial analysis framework covering both face recognition and demography such as age, gender, race and emotion. If you are not interested in building neural networks models from scratch, then you might adopt deepface. It is fully open-source and available on PyPI. You can make predictions with a few lines of code.

Here, you can watch how to apply facial attribute analysis in python with a just few lines of code.

You can run deepface in real time with your web cam as well.

Meanwhile, you can run face verification tasks directly in your browser with its custom ui built with ReactJS.

Also, deepface has its React JS ui for facial attribute analysis purposes.

Anti-Spoofing and Liveness Detection

What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, DeepFace includes an anti-spoofing feature for face verification or liveness detection.

Support this blog financially if you do like!

83 Comments

Zaki says:

February 5, 2023 at 9:44 pm

Hi, Thanks for the amazing tutorial..

The link to the pre-trained weight is broken, could you pin point where can I download it?

Best

Log in to Reply
1. Sefik Serengil says:
  
  February 7, 2023 at 10:33 pm
  
  Fixing it. Thank you.
  
  Log in to Reply

Apparent Age and Gender Prediction in Keras

What this post offers?

Pre-trained model

Dataset

Data cleaning

Apparent age prediction model

Preparing input output

Transfer learning

Training

Model evaluation on test set

Testing model on custom images

Compare to original study

Face detection

Gender prediction model

Evaluation

Confusion matrix

Testing gender for custom images

Conclusion

Python library

Anti-Spoofing and Liveness Detection

Related

83 Comments

Leave a Reply Cancel reply

What this post offers?

Pre-trained model

Dataset

Data cleaning

Apparent age prediction model

Preparing input output

Transfer learning

Training

Model evaluation on test set

Testing model on custom images

Compare to original study

Face detection

Gender prediction model

Evaluation

Confusion matrix

Testing gender for custom images

Conclusion

Python library

Anti-Spoofing and Liveness Detection

Related

83 Comments

Leave a Reply Cancel reply

Discover more from Sefik Ilkin Serengil