Remember the famous quote in Snow White: the Evil Queen asks the mirror on the wall who is the fairest of them all. Here, fairest means the whitest; in the German original, the question was who is the most beautiful in the land. Would you like to ask this question to deep learning? The first beauty competition judged by an AI system has already been organized. Moreover, the South China University of Technology published a research paper about facial beauty prediction and also open-sourced the data set. So, we can build our own beauty score prediction application. It can be used to find the most beautiful / most handsome person in a community, or to pick the best photo of a single person.
Data set
It is a 172 MB data set consisting of 5500 face images and beauty scores in the range [1, 5]. The beauty scores were collected from 60 different labelers.
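The snippets below are written for Keras 2.x with a TensorFlow backend. The post loads its dependencies gradually; a consolidated list of the imports the following code relies on (my own assumption, gathered from the calls used throughout) would look like this.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from math import sqrt

import keras
from keras.preprocessing import image
from keras.models import Model, Sequential
from keras.layers import Convolution2D, ZeroPadding2D, MaxPooling2D, Flatten, Dense, Dropout, Activation
from keras.callbacks import ModelCheckpoint, EarlyStopping

from sklearn.metrics import mean_absolute_error, mean_squared_error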
path = "SCUT-FBP5500_v2/train_test_files/split_of_60%training and 40%testing" train_df = pd.read_csv("%s/train.txt" % (path), sep=" ", names=["image", "score"]) test_df = pd.read_csv("%s/test.txt" % (path), sep=" ", names=["image", "score"])
I find it easier to store the train, test and validation sets in the same data frame, with a purpose column marking which split each row belongs to.
train_df['purpose'] = 'train'
test_df['purpose'] = 'test'

np.random.seed(17) #important for reproducibility

#move 40% of the training rows into the validation set
val_set_size = int((40*train_df.shape[0])/100)
val_idx = np.random.choice(train_df.shape[0], size=val_set_size, replace=False) #replace=False avoids picking the same row twice
train_df.loc[val_idx, 'purpose'] = 'validation'

#reset the index so that row labels match positions in the pixel array built later
df = pd.concat([train_df, test_df]).reset_index(drop=True)
del train_df, test_df

df.sample(5)
When we sort the beauty scores from highest to lowest, the most beautiful ones are shown below. Number one is an Asian lady whom I do not recognize. The second one is Audrey Hepburn, the third one is Emma Watson and the fourth one is Jessica Alba. I trust the data set because Audrey Hepburn appears at the top of the labeled data!
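If you want to reproduce this ranking yourself, a minimal sketch is to sort the data frame by score.

#list the top rated faces in the labeled data
df.sort_values(by='score', ascending=False)[['image', 'score']].head(5)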
Demography
Besides, the image file names include some demographic information such as ethnicity and gender. The first letter of the file name refers to ethnicity (Asian or Caucasian), and the second letter states gender (Male or Female).
We can check the homogeneity of the data set based on demography.
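Note that there are no race and gender columns in the data frame yet. A minimal sketch to derive them from the file name prefixes described above (the exact letter mapping is my assumption based on that convention) could look like this.

#derive demography columns from the file name, e.g. AF1.jpg -> Asian, Female
df['race'] = df['image'].str[0].map({'A': 'Asian', 'C': 'Caucasian'})
df['gender'] = df['image'].str[1].map({'F': 'Female', 'M': 'Male'})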
pd.DataFrame(df.purpose.value_counts()).rename(columns = {"purpose": "instances"})

pd.DataFrame(100*df.race.value_counts()/df.race.value_counts().sum()).rename(columns = {"race": "instance_percentage"})

pd.DataFrame(100*df.gender.value_counts()/df.gender.value_counts().sum()).rename(columns = {"gender": "instance_percentage"})
It seems that the data set is homogeneous with respect to gender, but the Asian ethnicity is dominant.
Reading images
Only the image names are stored in the data frame. We need to read these images and extract their pixels.
def retrievePixels(img_name):
    path = "SCUT-FBP5500_v2/Images/%s" % (img_name)
    img = image.load_img(path, grayscale=False, target_size=(224, 224))
    x = image.img_to_array(img).reshape(1, -1)[0] #flatten the image into a 1-dimensional array
    return x

df['pixels'] = df['image'].apply(retrievePixels)
This approach stores all pixels of an image as a 1-dimensional array. We have to do that because we cannot store n-dimensional arrays as a column in a pandas data frame, and that causes some trouble. We can then read the pixel values row by row and append them to a numpy array.
features = []
pixels = df['pixels'].values

for i in range(0, pixels.shape[0]):
    features.append(pixels[i])

features = np.array(features)
features = features.reshape(features.shape[0], 224, 224, 3)
features = features / 255 #normalize inputs within [0, 1]
Modelling
The total number of instances including train, test and validation is 5500, and only roughly half of it is available for training. We would need hundreds of thousands of images to train a neural network model from scratch. Still, we can apply transfer learning here. We already know that VGG-Face is very successful for facial attribute based applications. We will build the VGG-Face model in Keras (TensorFlow backend).
model = Sequential()
model.add(ZeroPadding2D((1,1),input_shape=(224,224, 3)))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Convolution2D(4096, (7, 7), activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(4096, (1, 1), activation='relu'))
model.add(Dropout(0.5))
model.add(Convolution2D(2622, (1, 1)))
model.add(Flatten())
model.add(Activation('softmax'))

#load pre-trained weights of the vgg-face model.
#you can find it here: https://drive.google.com/file/d/1CPSeum3HpopfomUEK1gybeuIVoeJT_Eo/view?usp=sharing
#related blog post: https://sefiks.com/2018/08/06/deep-face-recognition-with-keras/
model.load_weights('vgg_face_weights.h5')
We can use the outputs of its early layers and train only a few final layers instead. This approach is almost the same as the one we used for age and gender prediction. The only difference here is that this is a regression problem; that is why the final layer must have a size of 1. We also do not need to normalize the target values.
num_of_classes = 1 #this is a regression problem

#freeze all layers of VGG-Face except the last 7
for layer in model.layers[:-7]:
    layer.trainable = False

base_model_output = Sequential()
base_model_output = Flatten()(model.layers[-4].output)
base_model_output = Dense(num_of_classes)(base_model_output)

beauty_model = Model(inputs=model.input, outputs=base_model_output)
Training
To avoid overfitting, I'll check the validation loss in every epoch and terminate the training if there is no improvement for 50 epochs. Mean squared error is a good choice as the loss metric, because this is a regression problem and categorical cross entropy cannot be used here.
beauty_model.compile(loss='mean_squared_error', optimizer=keras.optimizers.Adam())

checkpointer = ModelCheckpoint(
    filepath='beauty_model.hdf5'
    , monitor = "val_loss"
    , verbose=1
    , save_best_only=True
    , mode = 'auto'
)

earlyStop = EarlyStopping(monitor='val_loss', patience=50)
Notice that I have a monolithic data frame including the train, validation and test sets. I will use the train set for training and the validation set for early stopping, but I will not feed the test set to the network in any case.
train_idx = df[(df['purpose'] == 'train')].index
val_idx = df[(df['purpose'] == 'validation')].index
test_idx = df[(df['purpose'] == 'test')].index

score = beauty_model.fit(
    features[train_idx], df.iloc[train_idx].score
    , epochs=5000
    , validation_data=(features[val_idx], df.iloc[val_idx].score)
    , callbacks=[checkpointer, earlyStop]
)
I got the best validation score in the 48th epoch. The training loss was 0.1180 and the validation loss was 0.1338 in my experiment. We can monitor how the loss changes over the epochs.
best_iteration = np.argmin(score.history['val_loss'])+1

val_scores = score.history['val_loss'][0:best_iteration]
train_scores = score.history['loss'][0:best_iteration]

plt.plot(val_scores, label='val_loss')
plt.plot(train_scores, label='train_loss')
plt.legend(loc='upper right')
plt.show()
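Since the ModelCheckpoint callback saved the weights of the best epoch, it makes sense to restore them before evaluating on the test set; a one-line sketch, assuming the same beauty_model.hdf5 file name used above:

#restore the weights of the best epoch saved by ModelCheckpoint
beauty_model.load_weights('beauty_model.hdf5')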
Performance
The reference study shows the Pearson Correlation Coefficient (PC), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) metrics for 3 different models.
We can calculate the same error metrics for our model as shown below.
actuals = df.iloc[test_idx].score.values
predictions = beauty_model.predict(features[test_idx])

perf = pd.DataFrame(actuals, columns = ["actuals"])
perf["predictions"] = predictions

print("PC: ", perf[['actuals', 'predictions']].corr(method ='pearson').values[0,1])
print("MAE: ", mean_absolute_error(actuals, predictions))
print("RMSE: ", sqrt(mean_squared_error(actuals, predictions)))
Our model got the following performance metrics. It is more successful than the reference study on all 3 metrics, and the difference is not minor. We can say that we made a significant improvement over the reference study.
PC: 0.93629415207285
MAE: 0.1759907865360815
RMSE: 0.25044935574228544
A scatter plot of predictions against actual values is informative for regression problems.
#limits of the diagonal reference line; derived from the data here because they were not defined elsewhere
min_limit = min(actuals.min(), predictions.min())
max_limit = max(actuals.max(), predictions.max())

best_predictions = []
for i in np.arange(int(min_limit), int(max_limit) + 1, 0.01):
    best_predictions.append(round(i, 2))

plt.scatter(best_predictions, best_predictions, s=1, color = 'black', alpha=0.3)
plt.scatter(predictions, actuals, s=20, alpha=0.1)
plt.show()
The x-axis shows the predictions whereas the y-axis shows the actual values. If we could predict the test set faultlessly, the blue points would all lie on the black line. Still, the predictions seem to be very close to the actual values.
IMDB data set
Remember that we’ve used IMDB data set to build a age and gender classifier. There are almost 100K instances in that data set. I’ve applied beauty score prediction for all imdb data set. The AI system votes the following ladies as the most beautiful actresses. Notice that score of some of them are exceeded the 100%.
Berenice Marlohe is the most beautiful lady; you might be familiar with her from Skyfall. January Jones comes after Berenice; she is the star of Mad Men. Olivia Palermo and Kristen Stewart (Twilight, Charlie's Angels) are on the podium, too.
I’ve created IMDb list covering the most beautiful actresses including ranks. Also, this following video show the most beautiful 25 one.
I also applied the model to the actors on IMDb.
We can also apply beauty score prediction in real time. I've applied real-time beauty score prediction to a short one-minute video of the Victoria's Secret super model Adriana Lima.
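A rough sketch of such a real-time loop with OpenCV is shown below. The haar cascade face detector and the video file name are my own assumptions for illustration, not the exact pipeline used in that video.

import cv2

#hypothetical input video and OpenCV's stock haar cascade face detector
cap = cv2.VideoCapture("adriana_lima.mp4")
face_detector = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

while True:
    ret, frame = cap.read()
    if not ret:
        break

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, 1.3, 5)

    for (x, y, w, h) in faces:
        face = cv2.cvtColor(frame[y:y+h, x:x+w], cv2.COLOR_BGR2RGB) #model expects RGB
        face = cv2.resize(face, (224, 224))
        face = np.expand_dims(face / 255.0, axis=0)
        score = beauty_model.predict(face)[0][0]

        cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
        cv2.putText(frame, "%.2f" % score, (x, y-10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

    cv2.imshow("beauty score", frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()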
Future work
The Chicago Face Database (CFD) offers a large volume of data including attractiveness and babyface scores; it is 1.5 GB in size. The data set also includes some additional demographic information such as ethnicity. It might help to improve the performance of race and ethnicity prediction.
Conclusion
So, we’ve built a beauty score predictor in this post. It is very similar to any facial attribute based classifier in our previous studies. Besides, our experiments improved the accuracy of reference academic study. This study might be adapted to a beauty competition and find the most beautiful ones in a community or deciding the best photo of a single person.
Finally, I’ve pushed the source code of this study to GitHub. There are many ways to support a project – starring the GitHub repos is one.
Support this blog if you like it!