Transfer learning has triggered a spirit of sharing among machine learning practitioners. However, they work with very different tools: PyTorch, Caffe2, TensorFlow, Theano and MatLab are the most common machine learning frameworks. Keras provides an amazing API to write code in a shared and simpler format. However, existing models, including both the model structure and the pre-trained weights, must be converted to Keras format first. This is a really painful process. Most researchers just look for already converted weights and give up if none exist. This post will cover converting MatLab models to Keras format.
MatLab to Keras
MatConvNet is a MatLab toolbox for convolutional neural networks. Academia mostly builds its models with MatLab, typically as research studies. The transcendent face recognition model VGG-Face was built on this technology. You can find its pre-trained weights here – vgg_face_matconvnet.tar.gz.
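Once downloaded, the archive can be extracted with Python's tarfile module. This is just a minimal sketch, assuming the archive sits in your working directory.

import tarfile

#extract vgg_face_matconvnet.tar.gz into the current directory
with tarfile.open('vgg_face_matconvnet.tar.gz', 'r:gz') as archive:
    archive.extractall('.')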
Model Structure
The original paper describes each layer in detail. We will need this information to construct the CNN model.
They also provide a model summary.
Firstly, we can construct this CNN model by hand. I will refer to this hand-constructed model as the base model.
from keras.models import Sequential
from keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D, Flatten, Dropout, Activation

model = Sequential()
model.add(ZeroPadding2D((1,1), input_shape=(224,224, 3)))
model.add(Convolution2D(64, (3, 3), name='conv1_1'))
model.add(Activation('relu', name='relu1_1'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(64, (3, 3), name='conv1_2'))
model.add(Activation('relu', name='relu1_2'))
model.add(MaxPooling2D((2,2), strides=(2,2), name='pool1'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), name='conv2_1'))
model.add(Activation('relu', name='relu2_1'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(128, (3, 3), name='conv2_2'))
model.add(Activation('relu', name='relu2_2'))
model.add(MaxPooling2D((2,2), strides=(2,2), name='pool2'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), name='conv3_1'))
model.add(Activation('relu', name='relu3_1'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), name='conv3_2'))
model.add(Activation('relu', name='relu3_2'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(256, (3, 3), name='conv3_3'))
model.add(Activation('relu', name='relu3_3'))
model.add(MaxPooling2D((2,2), strides=(2,2), name='pool3'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), name='conv4_1'))
model.add(Activation('relu', name='relu4_1'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), name='conv4_2'))
model.add(Activation('relu', name='relu4_2'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), name='conv4_3'))
model.add(Activation('relu', name='relu4_3'))
model.add(MaxPooling2D((2,2), strides=(2,2), name='pool4'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), name='conv5_1'))
model.add(Activation('relu', name='relu5_1'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), name='conv5_2'))
model.add(Activation('relu', name='relu5_2'))
model.add(ZeroPadding2D((1,1)))
model.add(Convolution2D(512, (3, 3), name='conv5_3'))
model.add(Activation('relu', name='relu5_3'))
model.add(MaxPooling2D((2,2), strides=(2,2), name='pool5'))
model.add(Convolution2D(4096, (7, 7), name='fc6'))
model.add(Activation('relu', name='relu6'))
model.add(Dropout(0.5, name='dropout6'))
model.add(Convolution2D(4096, (1, 1), name='fc7'))
model.add(Activation('relu', name='relu7'))
model.add(Dropout(0.5, name='dropout7'))
model.add(Convolution2D(2622, (1, 1), name='fc8'))
model.add(Flatten())
model.add(Activation('softmax', name='softmax'))
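To sanity-check the hand-built structure, we can print its summary. The final layer should produce a (None, 2622) output, one score per identity in the VGG-Face training set.

model.summary()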
Secondly, we can construct this model programmatically from the reference model itself. In this case, we still need to inspect the reference model first. I will cover how to do this in the later steps of this post.
Reading reference model
SciPy offers a way to read MatLab weight files. We will need its loadmat function to read a MatLab file.
from scipy.io import loadmat
The weight file – vgg_face.mat – is stored in the vgg_face_matconvnet/data directory.
data = loadmat('data/vgg_face.mat', matlab_compatible=False, struct_as_record=False)
Loading the MatLab file stores the following information in the data variable.
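Printing the variable shows its top-level keys:

print(data)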
{'__globals__': [],
'__header__': b'MATLAB 5.0 MAT-file, Platform: MACI64, Created on: Tue Oct 13 16:54:01 2015',
'__version__': '1.0',
'net': array([[<scipy.io.matlab.mio5_params.mat_struct object>]], dtype=object)}
We need the net key here because it stores the mat_struct object.
net = data['net'][0][0]
Now, the net object exposes classes, layers and normalization sub-objects when I press tab. I discard classes and normalization because I need layers; the weights should be stored there. I will call this MatLab model the reference model.
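By the way, tab completion is not the only way to discover these. SciPy's mat_struct objects keep their field names in an internal _fieldnames attribute, so the same information can be listed programmatically:

#list the sub objects of the mat_struct instance
print(net._fieldnames) #expect something like ['layers', 'classes', 'normalization']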
ref_model_layers = net.layers
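Let's check how many layers it stores.

print(ref_model_layers.shape)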
The shape of the reference model layers is (1, 39).
However, the original study states that there are 37 layers in the model. Let's print the layer names to find the source of the difference.
ref_model_layers = ref_model_layers[0] #squeeze (1, 39) down to (39,)
for layer in ref_model_layers:
    print(layer[0,0].name[0])
The two extra layers are the dropout layers placed after the fully connected (fc) layers. They don't appear in the reference table, and in fact dropout layers don't store any weights. So, the layers in the MatLab file are consistent with the reference study.
Setting weights
Only convolution and fully connected layers have weights. Trying to access the weights of other layers such as pooling or relu raises an exception. That's why we will use a try/except mechanism.
See the weight shape for each layer of the base model.
for layer in model.layers:
    layer_name = layer.name
    try:
        print(layer_name, ": ", layer.weights[0].shape)
    except:
        pass #this layer has no weights (e.g. pooling, relu)
Now, see the weight shapes for each layer of the reference model.
for i in range(ref_model_layers.shape[0]):
    ref_model_layer = ref_model_layers[i][0,0].name[0]
    try:
        weights = ref_model_layers[i][0,0].weights[0,0]
        print(ref_model_layer, ": ", weights.shape)
    except:
        pass #this layer has no weights
Both code blocks print the following output. This means that the MatLab model and the hand-constructed model match.
conv1_1 : (3, 3, 3, 64)
conv1_2 : (3, 3, 64, 64)
conv2_1 : (3, 3, 64, 128)
conv2_2 : (3, 3, 128, 128)
conv3_1 : (3, 3, 128, 256)
conv3_2 : (3, 3, 256, 256)
conv3_3 : (3, 3, 256, 256)
conv4_1 : (3, 3, 256, 512)
conv4_2 : (3, 3, 512, 512)
conv4_3 : (3, 3, 512, 512)
conv5_1 : (3, 3, 512, 512)
conv5_2 : (3, 3, 512, 512)
conv5_3 : (3, 3, 512, 512)
fc6 : (7, 7, 512, 4096)
fc7 : (1, 1, 4096, 4096)
fc8 : (1, 1, 4096, 2622)
Notice the 1st, 2nd and 4th items of each shape tuple. For example, the conv1_1 layer has 64 convolution filters of size (3×3). Similarly, the fc8 layer has 2622 filters of size (1×1). We can use this information to create the model automatically and skip constructing the CNN model by hand.
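For instance, the following minimal sketch reads the kernel size and filter count of the very first reference layer (conv1_1), using the same indexing as above:

weights = ref_model_layers[0][0,0].weights[0,0] #conv1_1 kernel
kernel_size = (weights.shape[0], weights.shape[1]) #(3, 3)
number_of_filters = weights.shape[3] #64
print(kernel_size, number_of_filters)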
The procedure is straightforward. We will iterate over the reference model layers. If a layer exists in both the reference model and the base model, then we copy its weights from the reference model to the base model. Copying is handled only for convolution and fully connected layers.
base_model_layer_names = [layer.name for layer in model.layers]
num_of_ref_model_layers = ref_model_layers.shape[0]

for i in range(num_of_ref_model_layers):
    ref_model_layer = ref_model_layers[i][0][0].name[0]
    if ref_model_layer in base_model_layer_names:
        #we just need to set convolution and fully connected weights
        if ref_model_layer.find("conv") == 0 or ref_model_layer.find("fc") == 0:
            print(i, ". ", ref_model_layer)
            base_model_index = base_model_layer_names.index(ref_model_layer)
            weights = ref_model_layers[i][0][0].weights[0,0]
            bias = ref_model_layers[i][0][0].weights[0,1]
            model.layers[base_model_index].set_weights([weights, bias[:,0]])
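As a quick sanity check, we can confirm that a copied kernel really matches the reference weights. This is just a sketch, assuming numpy is available as np.

import numpy as np

#compare the copied conv1_1 kernel against the reference weights
idx = base_model_layer_names.index('conv1_1')
ref_weights = ref_model_layers[0][0,0].weights[0,0]
print(np.array_equal(model.layers[idx].get_weights()[0], ref_weights)) #expect True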
Now, the base model has its weights in Keras format. We can save it.
#save model structure and weights separately
model.save_weights('model_weights.h5')
model_config = model.to_json()
open("model_structure.json", "w").write(model_config)

#or save the whole model (structure and weights) in a single file
model.save("model.hdf5")
We will be able to load the model in the future.
#load model structure and weights separately
from keras.models import model_from_json, load_model
model = model_from_json(open("model_structure.json", "r").read())
model.load_weights('model_weights.h5') #load weights

#or load the whole model including structure and weights
model = load_model("model.hdf5")
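To make sure the restored model works end to end, we can run a minimal smoke test with a random image-shaped tensor. This is just a sketch; real inputs would of course need proper VGG-Face preprocessing.

import numpy as np

#feed a random 224x224 RGB tensor through the network
dummy_input = np.random.rand(1, 224, 224, 3)
prediction = model.predict(dummy_input)
print(prediction.shape) #expect (1, 2622)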
Model construction
We constructed the base model by hand above. Instead, we can construct it in a smarter way from the reference model layers. Based on the reference study table, a zero padding layer always precedes each convolution layer.
model = Sequential()

for i in range(num_of_ref_model_layers):
    ref_model_layer = ref_model_layers[i][0][0].name[0]
    if ref_model_layer.find("conv") == 0 or ref_model_layer.find("fc") == 0:
        weights = ref_model_layers[i][0,0].weights
        weights_shape = weights[0][0].shape
        filter_x = weights_shape[0]
        filter_y = weights_shape[1]
        number_of_filters = weights_shape[3]

        if ref_model_layer.find("conv") == 0:
            print("ZeroPadding2D((1,1))")
            if i == 0:
                model.add(ZeroPadding2D((1,1), input_shape=(224,224, 3)))
            else:
                model.add(ZeroPadding2D((1,1)))

        print("Convolution2D(", number_of_filters, ", (", filter_x, ", ", filter_y, "), name='", ref_model_layer, "')")
        model.add(Convolution2D(number_of_filters, (filter_x, filter_y), name=ref_model_layer))
    else:
        if ref_model_layer.find("relu") == 0:
            print("Activation('relu', name=", ref_model_layer, ")")
            model.add(Activation('relu', name=ref_model_layer))
        elif ref_model_layer.find("dropout") == 0:
            print("Dropout(0.5, name=", ref_model_layer, ")")
            model.add(Dropout(0.5, name=ref_model_layer))
        elif ref_model_layer.find("pool") == 0:
            print("MaxPooling2D((2,2), strides=(2,2), name=", ref_model_layer, ")")
            model.add(MaxPooling2D((2,2), strides=(2,2), name=ref_model_layer))
        elif ref_model_layer.find("softmax") == 0:
            model.add(Flatten()) #flatten (1, 1, 2622) to (2622,) as in the hand-built model
            print("Activation('softmax', name=", ref_model_layer, ")")
            model.add(Activation('softmax', name=ref_model_layer))
        else:
            print("This layer was not processed: ", ref_model_layer)
Conclusion
So, we have covered how to convert MatLab models to Keras format. This matters a lot for transfer learning: some researchers can build their models with different tools, and others can still benefit from their outcomes. You can find the source code of this post as an iPython notebook on GitHub.
Support this blog if you like it!