Apparent Age and Gender Prediction in Keras


Computer vision researchers at ETH Zurich (Switzerland) announced very successful apparent age and gender prediction models. They shared both how they designed the machine learning model and the pre-trained weights for transfer learning. Their implementation was based on the Caffe framework. Even though I tried to convert the Caffe model and weights to Keras / TensorFlow, I couldn't manage it. That's why I intend to reproduce this research from scratch in Keras.

Katy Perry Transformation

What does this post offer?

We can apply age and gender predictions in real time.



The DeepFace library for Python covers age prediction. You can run age estimation with a few lines of code.

Pre-trained model

In this post, we are going to re-train the age and gender prediction models from scratch. If you focus on just the prediction stage, then the following video might attract your attention. This subject is actually covered in a dedicated blog post: Age and Gender Prediction with Deep Learning in OpenCV. In that case, we use models pre-trained in Caffe through OpenCV. Besides, you don't have to have Caffe in your environment, because OpenCV can load Caffe models with its dnn module.

On the other hand, if the training stage attracts your attention, then you should continue reading this blog post.

Dataset

The original work consumed face pictures collected from IMDB (7 GB) and Wikipedia (1 GB). You can find these data sets here. In this post, I will just consume the wiki data source to develop the solution fast. You should download the faces-only files.

Extracting wiki_crop.tar creates 100 folders and an index file (wiki.mat). The index file is saved in Matlab format. We can read Matlab files in Python with SciPy.

import scipy.io
mat = scipy.io.loadmat('wiki_crop/wiki.mat')

Converting it to a pandas data frame will make transformations easier.

instances = mat['wiki'][0][0][0].shape[1]

columns = ["dob", "photo_taken", "full_path", "gender", "name", "face_location", "face_score", "second_face_score"]

import pandas as pd
df = pd.DataFrame(index = range(0,instances), columns = columns)

for i in mat:
    if i == "wiki":
        current_array = mat[i][0][0]
        for j in range(len(current_array)):
            df[columns[j]] = pd.DataFrame(current_array[j][0])
Initial data set

The data set contains date of birth (dob) in Matlab datenum format. We need to convert this to Python datetime format. We just need the birth year.

from datetime import datetime, timedelta

def datenum_to_datetime(datenum):
    days = datenum % 1
    hours = days % 1 * 24
    minutes = hours % 1 * 60
    seconds = minutes % 1 * 60
    exact_date = datetime.fromordinal(int(datenum)) \
        + timedelta(days=int(days)) + timedelta(hours=int(hours)) \
        + timedelta(minutes=int(minutes)) + timedelta(seconds=round(seconds)) \
        - timedelta(days=366)
    return exact_date.year

df['date_of_birth'] = df['dob'].apply(datenum_to_datetime)
Adding the exact birth date extracted from the Matlab datenum format




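To sanity-check the conversion: Matlab's datenum counts days from January 0 of year 0, while Python's ordinal counts from January 1 of year 1, hence the 366-day offset in the function above. A quick check with a known value (the Matlab datenum for 2000-01-01 is 730486):

```python
from datetime import datetime, timedelta

def datenum_to_year(datenum):
    # Matlab datenum and Python ordinals differ by 366 days
    # (year 0, which Matlab counts, is a leap year)
    return (datetime.fromordinal(int(datenum)) - timedelta(days=366)).year

print(datenum_to_year(730486))  # Matlab datenum for 2000-01-01 -> 2000
```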

Now we have both the date of birth and the time the photo was taken. Subtracting these values gives us the age.

df['age'] = df['photo_taken'] - df['date_of_birth']

Data cleaning

Some pictures in the wiki data set don't include people. For example, there is a vase picture in the data set. Moreover, some pictures include two people, and some are taken from a distance. The face score value helps us decide whether a picture is clear or not. Also, age information is missing for some records. All of these might confuse the model, so we should ignore them. Finally, unnecessary columns should be dropped to occupy less memory.

import numpy as np

#remove pictures that do not include a face
df = df[df['face_score'] != -np.inf]

#some pictures include more than one face, remove them
df = df[df['second_face_score'].isna()]

#discard unclear faces below the score threshold
df = df[df['face_score'] >= 3]

#some records do not have gender information
df = df[~df['gender'].isna()]

df = df.drop(columns = ['name','face_score','second_face_score','date_of_birth','face_location'])

Some records have negative age values, as if the picture was taken before the person was born. Dirty data might cause this. Moreover, some people seem to be older than 100. We should restrict the age prediction problem to 0 to 100 years.

#some ages are greater than 100; some of these are paintings. remove them
df = df[df['age'] <= 100]

#some records have non-positive ages in the data set
df = df[df['age'] > 0]

The raw data set will look like the following data frame.

Raw data set

We can visualize the target label distribution.

histogram_age = df['age'].hist(bins=df['age'].nunique())
histogram_gender = df['gender'].hist(bins=df['gender'].nunique())
Age and gender distribution in the data set

The full path column states the exact location of each picture on disk. We need its pixel values.

from keras.preprocessing import image

target_size = (224, 224)

def getImagePixels(image_path):
    img = image.load_img("wiki_crop/%s" % image_path[0], grayscale=False, target_size=target_size)
    x = image.img_to_array(img).reshape(1, -1)[0]
    #x = preprocess_input(x)
    return x

df['pixels'] = df['full_path'].apply(getImagePixels)

We can extract the real pixel values of the pictures.

Adding pixels

Apparent age prediction model

Age prediction is inherently a regression problem, but the researchers defined it as a classification problem: there are 101 classes in the output layer for ages 0 to 100. They applied transfer learning for this task. Their choice was VGG trained on ImageNet.

Preparing input output

The pandas data frame includes both input and output information for the age and gender prediction tasks. We should just focus on the age task here.

import keras
import numpy as np

classes = 101 #0 to 100
target = df['age'].values
target_classes = keras.utils.to_categorical(target, classes)

features = []

for i in range(0, df.shape[0]):
    features.append(df['pixels'].values[i])

features = np.array(features)
features = features.reshape(features.shape[0], 224, 224, 3)

Also, we need to split the data set into training and testing sets.





from sklearn.model_selection import train_test_split
train_x, test_x, train_y, test_y = train_test_split(features, target_classes, test_size=0.30)

The final data set consists of 22578 instances. It is split into 15905 training instances and 6673 test instances.

Transfer learning

As mentioned, the researchers used the VGG ImageNet model and tuned its weights for this data set. Herein, I prefer to use the VGG-Face model instead, because that model is tuned for the face recognition task. In this way, we might capture patterns in the human face better.

from keras.models import Sequential, Model
from keras.layers import Convolution2D, ZeroPadding2D, MaxPooling2D, Flatten, Dropout, Activation

#VGG-Face model
model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(224,224, 3)))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Convolution2D(4096, (7, 7), activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(4096, (1, 1), activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(2622, (1, 1)))
model.add(Flatten())
model.add(Activation('softmax'))

Load the pre-trained weights for VGG-Face model. You can find the related blog post here.

#pre-trained weights of the vgg-face model
model.load_weights('vgg_face_weights.h5')

We should lock the weights of the early layers because they can already detect some generic patterns; fitting the network from scratch might cause this important information to be lost. I prefer to freeze all layers except the last 3 convolution layers (in other words, everything except the last 7 model.add units). Also, I drop the last convolution layer because it has 2622 units, while I need just 101 units (ages from 0 to 100) for the age prediction task. Then, I add a custom convolution layer consisting of 101 units.

for layer in model.layers[:-7]:
    layer.trainable = False

base_model_output = Convolution2D(101, (1, 1), name='predictions')(model.layers[-4].output)
base_model_output = Flatten()(base_model_output)
base_model_output = Activation('softmax')(base_model_output)

age_model = Model(inputs=model.input, outputs=base_model_output)

Training

This is a multi-class classification problem, so the loss function must be categorical crossentropy. The optimization algorithm will be Adam to converge faster. I create a checkpoint to monitor the model over iterations and avoid overfitting. The iteration with the minimum validation loss will have the optimum weights. That's why I'll monitor validation loss and save the best model only.

To avoid overfitting, I feed 256 randomly chosen instances in each epoch.

from keras.callbacks import ModelCheckpoint

age_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(), metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='age_model.hdf5'
    , monitor = "val_loss", verbose=1, save_best_only=True, mode = 'auto')

scores = []
epochs = 250; batch_size = 256

for i in range(epochs):
    print("epoch ",i)

    ix_train = np.random.choice(train_x.shape[0], size=batch_size)

    score = age_model.fit(train_x[ix_train], train_y[ix_train]
        , epochs=1, validation_data=(test_x, test_y), callbacks=[checkpointer])

    scores.append(score)

It seems that validation loss reaches its minimum. Increasing the number of epochs further would cause overfitting.

Loss for age prediction task

Model evaluation on test set

We can evaluate the final model on the test set.

age_model.evaluate(test_x, test_y, verbose=1)

This gives the validation loss and accuracy, respectively, for the 6673 test instances. We have the following results.

[2.871919590848929, 0.24298789490543357]





24% accuracy seems very low, right? Actually, it is not. Herein, the researchers developed an approach that converts the classification output back into a regression value: multiply each softmax output by its class label. The sum of these multiplications is the apparent age prediction.

Age prediction approach

This is a very easy operation with numpy in Python.

predictions = age_model.predict(test_x)

output_indexes = np.array([i for i in range(0, 101)])
apparent_predictions = np.sum(predictions * output_indexes, axis = 1)

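As a quick sanity check of this expected-value trick, consider a made-up softmax output with all its mass split between ages 30 and 40; the apparent age is the expected value of the distribution:

```python
import numpy as np

# hypothetical softmax output for one image over the 101 age classes
probs = np.zeros((1, 101))
probs[0, 30] = 0.5
probs[0, 40] = 0.5

output_indexes = np.arange(101)
apparent_age = np.sum(probs * output_indexes, axis=1)
print(apparent_age[0])  # 35.0, the expected value of the distribution
```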
Herein, mean absolute error metric might be more meaningful to evaluate the system.

mae = 0

for i in range(0, apparent_predictions.shape[0]):
    prediction = int(apparent_predictions[i])
    actual = np.argmax(test_y[i])

    abs_error = abs(prediction - actual)
    mae = mae + abs_error

mae = mae / apparent_predictions.shape[0]

print("mae: ",mae)
print("instances: ",apparent_predictions.shape[0])

Our apparent age prediction model predicts ages with a ±4.65 mean absolute error. This is an acceptable result.
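The MAE loop above can also be written in vectorized form with numpy, which is handy for larger test sets (shown here with toy arrays standing in for the real predictions and labels):

```python
import numpy as np

# hypothetical apparent predictions and one-hot actual labels
apparent_predictions = np.array([23.7, 41.2, 67.9])
test_y = np.eye(101)[[25, 40, 70]]  # actual ages 25, 40, 70

actuals = np.argmax(test_y, axis=1)
# astype(int) truncates just like int() in the loop version
mae = np.mean(np.abs(apparent_predictions.astype(int) - actuals))
print(mae)  # mean of |23-25|, |41-40|, |67-70| = 2.0
```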

Testing model on custom images

We can feel the power of the model when we feed custom images into it.

from keras.preprocessing import image

def loadImage(filepath):
    test_img = image.load_img(filepath, target_size=(224, 224))
    test_img = image.img_to_array(test_img)
    test_img = np.expand_dims(test_img, axis = 0)
    test_img /= 255
    return test_img

picture = "marlon-brando.jpg"
prediction = age_model.predict(loadImage(picture))

The prediction variable stores the distribution over the age classes. Monitoring it might be interesting.

import matplotlib.pyplot as plt

y_pos = np.arange(101)
plt.bar(y_pos, prediction[0], align='center', alpha=0.3)
plt.ylabel('percentage')
plt.title('age')
plt.show()

This is the age prediction distribution for Marlon Brando in The Godfather. The most dominant age class is 44, whereas the weighted age is 48, which was his exact age in 1972.

Age prediction distribution for Marlon Brando in The Godfather

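A tiny numeric illustration of why the most dominant class and the weighted apparent age can differ, using a made-up distribution that peaks at 44 but is skewed to the right of its peak:

```python
import numpy as np

probs = np.zeros(101)
probs[44] = 0.4   # the single most likely age class
probs[50] = 0.3
probs[55] = 0.3

print(np.argmax(probs))                # 44, the dominant class
print(np.sum(probs * np.arange(101)))  # ~49.1, the weighted (apparent) age
```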
We'll calculate the apparent age from these age distributions.

img = image.load_img(picture)
plt.imshow(img)
plt.show()

print("most dominant age class (not apparent age): ",np.argmax(prediction))

apparent_age = np.round(np.sum(prediction * output_indexes, axis = 1))
print("apparent age: ", int(apparent_age[0]))

The results are very satisfactory even though the frame does not have a good perspective. Marlon Brando was 48 and Al Pacino was 32 in The Godfather Part I.

Apparent Age Prediction in The Godfather

Comparison to the original study

As I mentioned before, we re-trained the base model because the original study is mainly based on Caffe and I need pre-trained weights for Keras. The original study was the winner of the ChaLearn Looking at People (LAP) challenge on Apparent age V1 (ICCV ’15).





You are expected to predict someone's apparent age, and instead of a single actual age there are several jury predictions of his/her age. So, your predictions are evaluated against the mean and standard deviation of the jury predictions.

Evaluation formula

If your prediction is equal to the mean of the jury predictions, then the error becomes 0. Besides, if your prediction is not close to the mean but the standard deviation of the jury predictions is high, then the error is close to 0 as well. On the other hand, you will be penalized if your prediction is not close to the mean and the standard deviation of the jury predictions is low.

from math import e
df['epsilon'] = e ** ( -1*( (df['prediction'] - df['mean_age']) ** 2 ) / (2*(df['std_age']**2)) )
df['epsilon'].mean()

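To build intuition for the formula, here is a toy check with hypothetical jury statistics. Note that the snippet above computes the exponential term itself, which equals 1 for a perfect prediction; the challenge's error is commonly reported as 1 minus this term.

```python
from math import e

def eps_term(prediction, mean_age, std_age):
    # exponential term of the evaluation formula, as in the snippet above
    return e ** (-((prediction - mean_age) ** 2) / (2 * std_age ** 2))

print(eps_term(30, 30, 5))   # 1.0: prediction equals the jury mean
print(eps_term(40, 30, 5))   # ~0.135: 10 years off when the jury agrees closely
print(eps_term(40, 30, 20))  # ~0.88: the same miss is tolerated when the jury disagrees
```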
The ε value of this model is 0.387378, and its MAE is 7.887859 for the 1079 test instances. On the other hand, the ε value of the original study was 0.264975, and they declared the human reference ε to be 0.34. So, the original study is still a little bit more accurate than the model created in this post, but my model is close to the human level for age prediction.

You can find the evaluation test data set and its labels here.

Face detection

The training set images are already cropped to just the facial area. Testing a custom image requires detecting faces first; this increases the accuracy dramatically. Besides, face alignment is not a must, but it is a plus for this study.

There are several face detection solutions. OpenCV offers haar cascade and single shot multibox detector (SSD). Dlib offers Histogram of Oriented Gradients (HOG) and Max-Margin Object Detection (MMOD). Finally, Multi-task Cascaded Convolutional Networks (MTCNN) is a common solution for face detection. Herein, haar cascade and HOG are legacy methods, whereas SSD, MMOD and MTCNN are modern deep learning based solutions. You can see the face detection performance of those models in the following video.

Here, you can also see how to run those different face detectors in a single line of code with deepface framework for python.

You can find out the math behind face alignment more on the following video:

Face detectors extract faces within a rectangular area, so the crop comes with noise such as background color. Here, we can find 68 landmarks of a facial image with dlib.

Here, RetinaFace is the cutting-edge face detection technology. It can even detect faces in a crowd, and it finds facial landmarks including eye coordinates. That's why its alignment score is very high.





Gender prediction model

Apparent age prediction was a challenging problem. However, gender prediction is a much easier task.

We’ll apply binary encoding to target gender class.

target = df['gender'].values
target_classes = keras.utils.to_categorical(target, 2)

We then just need to put 2 classes in the output layer, for man and woman.

for layer in model.layers[:-7]:
    layer.trainable = False

base_model_output = Convolution2D(2, (1, 1), name='predictions')(model.layers[-4].output)
base_model_output = Flatten()(base_model_output)
base_model_output = Activation('softmax')(base_model_output)

gender_model = Model(inputs=model.input, outputs=base_model_output)

Now, the model is ready to fit.

#compile the gender model and track its best weights in a separate file
#train_y and test_y are assumed to be re-split with the gender labels here
gender_model.compile(loss='categorical_crossentropy', optimizer=keras.optimizers.Adam(), metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='gender_model.hdf5'
    , monitor = "val_loss", verbose=1, save_best_only=True, mode = 'auto')

scores = []
epochs = 250; batch_size = 256

for i in range(epochs):
    print("epoch ",i)

    ix_train = np.random.choice(train_x.shape[0], size=batch_size)

    score = gender_model.fit(train_x[ix_train], train_y[ix_train]
        , epochs=1, validation_data=(test_x, test_y), callbacks=[checkpointer])

    scores.append(score)

It seems that the model saturates. Terminating training here would be wise.

Loss for gender prediction

Evaluation

gender_model.evaluate(test_x, test_y, verbose=1)

The model has the following validation loss and accuracy. This is really satisfactory.

[0.07324957040103375, 0.9744245524655362]

Confusion matrix

This is a true classification problem, unlike age prediction. Accuracy should not be the only metric we monitor; precision and recall should also be checked.

from sklearn.metrics import classification_report, confusion_matrix

predictions = gender_model.predict(test_x)

pred_list = []; actual_list = []

for i in predictions:
    pred_list.append(np.argmax(i))

for i in test_y:
    actual_list.append(np.argmax(i))

confusion_matrix(actual_list, pred_list)

The model generates the following confusion matrix. Columns are predictions, whereas rows are actual labels.

       | Female | Male
Female |   1873 |   98
Male   |     72 | 4604





This means that we have roughly 96.3% precision and 95.0% recall, taking female as the positive class. These metrics are as satisfactory as the accuracy.
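These figures can be reproduced directly from the confusion matrix above, with female taken as the positive class:

```python
# confusion matrix values from above: rows are actual, columns are predicted
tp, fn = 1873, 98    # actual female: predicted female / predicted male
fp, tn = 72, 4604    # actual male:   predicted female / predicted male

precision = tp / (tp + fp)  # 1873 / 1945
recall = tp / (tp + fn)     # 1873 / 1971
print(round(precision * 100, 2), round(recall * 100, 2))  # 96.3 95.03
```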

Testing gender for custom images

We just need to feed images to the model.

picture = "katy-perry.jpg"
prediction = gender_model.predict(loadImage(picture))

img = image.load_img(picture)
plt.imshow(img)
plt.show()

gender = "Male" if np.argmax(prediction) == 1 else "Female"
print("gender: ", gender)

Conclusion

So, we've built apparent age and gender predictors from scratch based on the research article of the computer vision group at ETH Zurich. In particular, the way they proposed to calculate apparent age is a high-performing, novel method. Deep learning really has a limitless power for learning.

I pushed the source code for both apparent age prediction and gender prediction to GitHub. Similarly, the real time age and gender prediction implementation is pushed here. You might want to just use the pre-trained weights; I put the pre-trained weights for the age and gender tasks on Google Drive.

Python library

Herein, deepface is a lightweight facial analysis framework for Python covering both face recognition and demography, such as age, gender, race and emotion. If you are not interested in building neural network models from scratch, then you might adopt deepface. It is fully open-source and available on PyPI. You can make predictions with a few lines of code.

Deep Face Analysis

Here, you can watch how to apply facial attribute analysis in Python with just a few lines of code.

You can run deepface in real time with your web cam as well.

Meanwhile, you can run face verification tasks directly in your browser with its custom ui built with ReactJS.

Also, deepface has a ReactJS UI for facial attribute analysis purposes.





Anti-Spoofing and Liveness Detection

What if DeepFace is given fake or spoofed images? This becomes a serious issue if it is used in a security system. To address this, DeepFace includes an anti-spoofing feature for face verification and liveness detection.

