Race and Ethnicity Prediction in Keras

In previous posts we covered how to predict identity, emotion, age and gender with deep learning. Ethnicity and race are facial attributes as well, and we can predict them, too. Recognizing ethnicity from face photos could contribute substantially to missing-children cases, search investigations, the refugee crisis and genealogy research. We’ve previously discussed ethnicity prediction from the perspective of AI ethics.

Ethnicity diversity

Data set

I’ve found two public data sets of face pictures labeled with ethnicity.



The first one is FairFace. It is a large-scale data set consisting of 86K train and 11K test instances. Its labels are East Asian, Southeast Asian, Indian, Black, White, Middle Eastern and Latino-Hispanic. I think it is better to merge the East Asian and Southeast Asian races into a single Asian race.

import pandas as pd

train_df = pd.read_csv("fairface_label_train.csv")
test_df = pd.read_csv("fairface_label_val.csv") #validation split of FairFace is used as the test set

The second one is UTKFace. It is a small-scale data set with 10K instances. Its labels are Asian, Indian, Black, White and Others (Latino and Middle Eastern).

Merging the two data sets increased accuracy from 68% to 72% in my experiments, but I had to relabel the Latino and Middle Eastern races as Others to do so. In other words, UTKFace did not increase accuracy as much as expected. That’s why I prefer to train my model on the FairFace data set alone.

Ethnicity distribution

The number of instances for each race is roughly balanced in the FairFace data set.

100*train_df.groupby(['race']).count()[['file']]/train_df.groupby(['race']).count()[['file']].sum()
Distribution in FairFace

I’ve merged two Asian races into a single Asian race.

idx = train_df[(train_df['race'] == 'East Asian') | (train_df['race'] == 'Southeast Asian')].index
train_df.loc[idx, 'race'] = 'Asian'

idx = test_df[(test_df['race'] == 'East Asian') | (test_df['race'] == 'Southeast Asian')].index
test_df.loc[idx, 'race'] = 'Asian'

After these manipulations, the distribution becomes as illustrated below.

Distribution after data manipulation
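
The same merge and distribution check can be written more compactly. The following is a minimal sketch on a toy data frame (the rows are invented for illustration, they are not FairFace data). It uses pandas `replace` with a mapping and `value_counts(normalize=True)` instead of index lookups and a double `groupby`.

```python
import pandas as pd

# Toy stand-in for the FairFace label file; the rows are invented for illustration.
toy_df = pd.DataFrame({
    "file": ["1.jpg", "2.jpg", "3.jpg", "4.jpg", "5.jpg"],
    "race": ["East Asian", "Southeast Asian", "Indian", "Black", "White"],
})

# Map both Asian sub-groups to a single label; other labels are left unchanged.
toy_df["race"] = toy_df["race"].replace(
    {"East Asian": "Asian", "Southeast Asian": "Asian"}
)

# Percentage of instances per race.
distribution = toy_df["race"].value_counts(normalize=True) * 100
print(distribution)
```

In this toy frame, the two merged rows make Asian the largest class at 40% while each of the others holds 20%.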

Reading image pixels

The original data set includes just image file names and their race labels.

FairFace head

We will read image pixels based on the file names.





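from keras.preprocessing import image
from tqdm import tqdm
tqdm.pandas() #enables progress_apply on pandas objects

target_size = (224, 224)
def getImagePixels(file):
    #load the image resized to the network input size (RGB is the default color mode)
    img = image.load_img(file, target_size=target_size)
    #flatten to a 1D array so that it fits into a single data frame cell
    x = image.img_to_array(img).reshape(1, -1)[0]
    return x

train_df['pixels'] = train_df['file'].progress_apply(getImagePixels)
test_df['pixels'] = test_df['file'].progress_apply(getImagePixels)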

Now, image pixels are stored as a column.

Image pixels added as a column

Input features

Pixels are stored as a flat array in each row. We need to reshape each row to (224, 224, 3). Besides, inputs should be normalized for neural networks because of the activation functions, so we divide pixel values by 255 to scale them into [0, 1]. This is the input feature we will pass to the network.

train_features = []; test_features = []

for i in range(0, train_df.shape[0]):
    train_features.append(train_df['pixels'].values[i])

for i in range(0, test_df.shape[0]):
    test_features.append(test_df['pixels'].values[i])

train_features = np.array(train_features)
train_features = train_features.reshape(train_features.shape[0], 224, 224, 3)

test_features = np.array(test_features)
test_features = test_features.reshape(test_features.shape[0], 224, 224, 3)

train_features = train_features / 255
test_features = test_features / 255
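
To make the reshape and normalization steps concrete, here is a small NumPy sketch on a single fake image (random pixel values, not a real face). It also gives a rough memory estimate, assuming every image is kept in memory as float32; the 86,000 figure is just the approximate train size quoted above.

```python
import numpy as np

# A fake 224x224 RGB image with pixel values in [0, 255].
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(224, 224, 3)).astype("float32")

# Flatten to one row, as getImagePixels does, then restore the original shape.
flat = img.reshape(1, -1)[0]
restored = flat.reshape(224, 224, 3)

# Scale pixel values into [0, 1].
normalized = restored / 255

print(flat.shape)  # (150528,)
print(normalized.min() >= 0, normalized.max() <= 1)

# Rough memory estimate for N images stored as float32 (4 bytes per value).
n_images = 86_000  # approximate FairFace train size
gigabytes = n_images * flat.size * 4 / 1024**3
print(round(gigabytes, 1))
```

Under these assumptions the full train set needs tens of gigabytes, so on a machine with limited memory it may be necessary to work on a subset or to load images in chunks.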

Target

The race column is the target value we will predict. However, we need to one-hot encode it. The network will have 6 outputs, which is the number of races in the data set.

#copy to avoid modifying a slice of the original data frames
train_label = train_df[['race']].copy()
test_label = test_df[['race']].copy()

races = train_df['race'].unique()

for j in range(len(races)): #label encoding
    current_race = races[j]
    print("replacing ",current_race," to ", j+1)
    train_label['race'] = train_label['race'].replace(current_race, str(j+1))
    test_label['race'] = test_label['race'].replace(current_race, str(j+1))

train_label = train_label.astype({'race': 'int32'})
test_label = test_label.astype({'race': 'int32'})

train_target = pd.get_dummies(train_label['race'], prefix='race')
test_target = pd.get_dummies(test_label['race'], prefix='race')
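
One thing to watch with one-hot encoding is that `get_dummies` builds its columns from the values it sees, so train and test targets only line up if both contain every class. A possible safeguard, sketched below on invented labels, is to fix the class list first with `pd.Categorical`. The `one_hot` helper and the four-class list are hypothetical and only serve the illustration.

```python
import pandas as pd

# Hypothetical class list; fixing the order up front keeps columns aligned
# between train and test, even if a class is absent from one of them.
classes = ["Asian", "Black", "Indian", "White"]

toy_train = pd.Series(["White", "Asian", "Black", "Indian", "Asian"])
toy_test = pd.Series(["Black", "White"])  # "Asian" and "Indian" are missing here

def one_hot(series, classes):
    # A categorical series remembers all categories, including unused ones.
    categorical = pd.Series(pd.Categorical(series, categories=classes))
    return pd.get_dummies(categorical, prefix="race")

toy_train_target = one_hot(toy_train, classes)
toy_test_target = one_hot(toy_test, classes)

print(list(toy_train_target.columns))
print(toy_test_target.shape)
```

Both targets end up with one column per class in the same order, so the position of the largest output maps to the same race for train and test.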

Train and validation split

Train and test sets are separate. We will predict the test set at the end of this study. We should split the train set into train and validation sets to avoid overfitting. In this way, we can apply early stopping.

from sklearn.model_selection import train_test_split

train_x, val_x, train_y, val_y = train_test_split(
	train_features, train_target.values
	, test_size=0.12, random_state=17
)
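
To see what a 12% validation split produces, here is a small sketch on toy data (100 invented samples in 4 equally sized classes). The `stratify` argument is an optional addition that the split above does not use; it keeps the class proportions identical in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and one-hot targets: 100 samples, 4 classes, 25 samples each.
features = np.arange(100).reshape(100, 1)
class_ids = np.repeat(np.arange(4), 25)
targets = np.eye(4)[class_ids]

toy_train_x, toy_val_x, toy_train_y, toy_val_y = train_test_split(
    features, targets,
    test_size=0.12, random_state=17,
    stratify=class_ids,  # optional: not part of the original split
)

print(toy_train_x.shape, toy_val_x.shape)
print(toy_val_y.sum(axis=0))  # validation samples per class
```

With 100 samples, 12 go to validation and 88 stay in the train part, and stratification gives every class the same share of the validation set.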

Base Model

We will use VGG-Face for transfer learning. Let’s construct it first.

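from keras.models import Model, Sequential
from keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D
from keras.layers import Dropout, Flatten, Activation

model = Sequential()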
model.add(ZeroPadding2D((1,1),input_shape=(224,224, 3)))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Convolution2D(4096, (7, 7), activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(4096, (1, 1), activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(2622, (1, 1)))
model.add(Flatten())
model.add(Activation('softmax'))

#related blog post: https://sefiks.com/2018/08/06/deep-face-recognition-with-keras/
model.load_weights('vgg_face_weights.h5')

Transfer Learning

Its early layers can already detect some facial patterns. We do not have to train it from scratch, because we do not have millions of train set instances. We can lock its early layers and expect the late layers to learn.

for layer in model.layers[:-7]:
    layer.trainable = False

In this way, all layers except the last 7 are locked and their weights will not be updated. We expect the last 7 layers to learn something.
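
To see which layers the `[:-7]` slice covers, the sketch below rebuilds the layer order of the VGG-Face definition as a plain Python list of type names. It is only an illustration and does not require Keras; `conv_block` is a hypothetical helper.

```python
# Layer types of the VGG-Face definition above, in order. Each conv block is a
# run of (ZeroPadding2D, Convolution2D) pairs followed by one MaxPooling2D.
def conv_block(n_convs):
    return ["ZeroPadding2D", "Convolution2D"] * n_convs + ["MaxPooling2D"]

layers = (
    conv_block(2) + conv_block(2) + conv_block(3) + conv_block(3) + conv_block(3)
    + ["Convolution2D", "Dropout", "Convolution2D", "Dropout",
       "Convolution2D", "Flatten", "Activation"]
)

frozen = layers[:-7]     # what model.layers[:-7] selects
trainable = layers[-7:]  # the layers left trainable

print(len(layers), len(frozen))  # 38 31
print(trainable)
```

The frozen part ends with the last max pooling layer. Among the 7 remaining layers only the three convolution layers carry weights, since Dropout, Flatten and Activation have nothing to train.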

The original VGG-Face network has 2622 outputs but here we need just 6 outputs related to races. We will customize the VGG-Face here and it is going to be VGG-Race now.

num_of_classes = 6 #number of races after merging the two Asian labels
base_model_output = Convolution2D(num_of_classes, (1, 1), name='predictions')(model.layers[-4].output)
base_model_output = Flatten()(base_model_output)
base_model_output = Activation('softmax')(base_model_output)

race_model = Model(inputs=model.input, outputs=base_model_output)

Training

Instead of feeding all the train data at once, I prefer to feed it in batches. I got the best result with a batch size of 16,384 (2^14), so I feed 16K randomly selected instances in every epoch. If validation loss does not decrease for 50 consecutive epochs, training is terminated to avoid overfitting.

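import keras
from keras.callbacks import ModelCheckpoint

race_model.compile(loss='categorical_crossentropy'
, optimizer=keras.optimizers.Adam(), metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='race_model_single_batch.hdf5'
, monitor = "val_loss", verbose=1, save_best_only=True, mode = 'auto')

epochs = 250 #assumed upper bound, the original value is not given; early stopping ends training sooner
batch_size = pow(2, 14); patience = 50
last_improvement = 0; best_iteration = 0
loss = 1000000 #initialize as a large value
val_scores = []; train_scores = [] #loss history for plotting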

for i in range(0, epochs):
    print("Epoch ", i, ". ", end='')
    ix_train = np.random.choice(train_x.shape[0], size=batch_size)

    score = race_model.fit(
        train_x[ix_train], train_y[ix_train]
        , epochs=1
        , validation_data=(val_x, val_y)
        , callbacks=[checkpointer]
    )

    val_loss = score.history['val_loss'][0]; train_loss = score.history['loss'][0]
    val_scores.append(val_loss); train_scores.append(train_loss)

    if val_loss < loss:
        loss = val_loss * 1
        last_improvement = 0
        best_iteration = i * 1
    else:
        last_improvement = last_improvement + 1
        print("try to decrease val loss for ",patience - last_improvement," epochs more")

    if last_improvement == patience:
        print("there is no loss decrease in validation for ",patience," epochs. early stopped")
        break
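
The early stopping logic in this loop can be tested without training anything. The following is a minimal sketch that replays a list of made-up validation losses through the same patience rule; `early_stopping_run` is a hypothetical helper, not part of Keras.

```python
def early_stopping_run(val_losses, patience):
    """Return (best_iteration, stopped_at) for a sequence of validation losses."""
    best_loss = float("inf")
    best_iteration = 0
    last_improvement = 0
    stopped_at = len(val_losses) - 1  # default: ran through every epoch

    for i, val_loss in enumerate(val_losses):
        if val_loss < best_loss:
            # New best loss: remember the epoch and reset the counter.
            best_loss = val_loss
            best_iteration = i
            last_improvement = 0
        else:
            last_improvement += 1

        if last_improvement == patience:
            stopped_at = i
            break

    return best_iteration, stopped_at

# Made-up losses: improvement until epoch 2, then three epochs without progress.
losses = [1.20, 1.05, 0.90, 0.95, 0.97, 0.99, 0.80]
print(early_stopping_run(losses, patience=3))  # (2, 5)
```

Note that the run stops at epoch 5 and never sees the lower loss at epoch 6. A larger patience trades longer training for a smaller chance of stopping too early.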

Loss

The best epoch was 29. I trained the network for 80 rounds, but beyond epoch 30 the train loss kept decreasing while the validation loss increased. That’s exactly overfitting.





import matplotlib.pyplot as plt

plt.plot(val_scores[0:best_iteration+1], label='val_loss')
plt.plot(train_scores[0:best_iteration+1], label='train_loss')
plt.legend(loc='upper right')
plt.show()
Train and validation loss

That’s why I loaded the weights of the best iteration.

from keras.models import load_model
race_model = load_model("race_model_single_batch.hdf5")
race_model.save_weights('race_model_single_batch.h5')

Evaluation

We train the network on the train set and use the validation set to apply early stopping. The selected epoch is the best iteration for the validation set. However, the network could still be overfitted to the validation set, because that set decided when to stop. That’s why we haven’t fed the test set to the network yet. If the model is robust, test and validation loss should be close.

test_perf = race_model.evaluate(test_features, test_target.values, verbose=1)
print(test_perf)

validation_perf = race_model.evaluate(val_x, val_y, verbose=1)
print(validation_perf)

abs(validation_perf[0] - test_perf[0])

Both test and validation loss are 0.88, and accuracy is 68% on both. We can say that the model is robust.

Prediction

We can make predictions for the test set.

predictions = race_model.predict(test_features)

Also, we can print prediction and actual values and plot the original image as well.

prediction_classes = []; actual_classes = [] #class names collected for the confusion matrix
for i in range(0, predictions.shape[0]):
    prediction = np.argmax(predictions[i])
    prediction_classes.append(races[prediction])

    actual = np.argmax(test_target.values[i])
    actual_classes.append(races[actual])

    if i == 10:
        print("Actual: ",races[actual])
        print("Predicted: ",races[prediction])
        img = (test_df.iloc[i]['pixels'].reshape([224, 224, 3])) / 255
        plt.imshow(img); plt.show()
Predictions for randomly selected instances in the test set

Confusion matrix

Accuracy alone can be misleading for classification problems. We need precision and recall values as well. A confusion matrix is a good way to monitor the success of your model.

from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sn

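#pass labels explicitly; otherwise rows and columns are sorted alphabetically and do not match races
cm = confusion_matrix(actual_classes, prediction_classes, labels=races)
df_cm = pd.DataFrame(cm, index=races, columns=races)
sn.heatmap(df_cm, annot=True,annot_kws={"size": 10})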

The following heat map shows how often each race is predicted correctly and which races are confused with each other.

Heat map
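
Precision and recall per class can be read from the same inputs with scikit-learn’s `classification_report`. The sketch below uses a handful of invented labels for three classes; with the real model, the actual and predicted class lists from the prediction step would take their place.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Invented labels for illustration only.
labels = ["Asian", "Black", "White"]
actual = ["Asian", "Asian", "Black", "Black", "White", "White"]
predicted = ["Asian", "Black", "Black", "Black", "White", "Asian"]

# Rows are actual classes, columns are predicted classes, in `labels` order.
toy_cm = confusion_matrix(actual, predicted, labels=labels)
print(toy_cm)

# Per-class precision, recall and F1 score.
print(classification_report(actual, predicted, labels=labels))
```

In this toy example Black has a recall of 1.0 but a precision of 2/3, because one Asian instance is also predicted as Black. This is the kind of detail a single accuracy number hides.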

Predicting custom images

We can predict the ethnicity for custom images as well.

demo_set = ['fei-fei-li.jpg', 'sundar-pichai.jpg', 'obama.jpg', 'katy.jpg']

for file in demo_set:
    path = 'demo/%s' % (file)
    img = image.load_img(path, target_size=target_size) #RGB is the default color mode
    img = image.img_to_array(img) #shape is (224, 224, 3) already
    img = img / 255

    plt.imshow(img)
    plt.show()

    img = np.expand_dims(img, axis=0) 

    prediction_proba = race_model.predict(img)

    print("Prediction: ",races[np.argmax(prediction_proba)])
    print("---------------------------")

I’ve also run the model on the characters of Silicon Valley. The results are really satisfactory.

Ethnicity of silicon valley characters

Loading pre-trained network

I shared the pre-trained network weights on Google Drive. You can skip the training step and load the weights once the race model is built.





race_model.load_weights('race_model_weights_full_v2.h5')

Real Time Ethnicity Prediction

We can apply race prediction in real time as well. Its source code is already pushed to GitHub. OpenCV’s haar cascade module detects the face, and we pass the detected face to the model.


Conclusion

So, we’ve covered how to build a race and ethnicity classifier from scratch in this post. I pushed the source code of this post to GitHub as a notebook. Its real-time implementation is pushed to GitHub, too. The pre-trained network weights are shared on Google Drive because of their size. There are many ways to support a project; starring the GitHub repo is just one.

Python library

deepface is a lightweight facial analysis framework covering both face recognition and demography such as age, gender, race and emotion. If you are not interested in building neural network models from scratch, then you might adopt deepface. It is fully open-source and available on PyPI. You can make predictions with a few lines of code.

Deep Face Analysis

Here, you can watch how to apply facial attribute analysis in Python with just a few lines of code.




13 Comments

  1. Hi Sefik,

    This is awesome!

    I’m a big fan of your works.

    I am trying to implement your approach but I am having a memory issue when loading the images to arrays. Can you please advise on how can I proceed?

    Thanks and more power!

  2. Hello Sefik,

    Your work helped me a lot!
    I’m wondering at the transfer learning part, how should we define the number of classes(num_of_classes)? What number should we put?

    base_model_output = Convolution2D(num_of_classes, (1, 1), name=’predictions’)(model.layers[-4].output)

    Is here refer to the number of classes of race output?

  3. hi sefik,how can i predict the race of multiple images which i put them in a csv file?
    can you share code for predicting race of images through reading csv files?

  4. hi,how did you upload train and val csv file?
    we have to convert the downloaded zip file into csv?
    tell me how to…

  5. code:train_df[‘pixels’] = train_df[‘file’].progress_apply(getImagePixels)

    No such file or directory: ‘FairFace/train/1.jpg’

    how to fix this issue?

  6. link of download dataset is not working, can you please modify link, & also is there any new release of model?
