Pushing neural networks to their limits requires applying several state-of-the-art techniques, such as the choice of activation function or the network design. The way weights are initialized is another such technique.
Random initialization did not exist in the legacy perceptron, and adding hidden layers alone is not enough to generalize non-linear problems. Let's see how initializing all weight values to zero fails for a multi-layer perceptron: it cannot learn even an XOR gate problem, even though it has a hidden layer with 4 nodes.
import numpy as np

def initialize_weights(rows, columns):
	weights = np.zeros((rows+1, columns)) #+1 refers to bias unit
	return weights
As seen, the final weight values end up identical within each layer. This symmetry is the reason for the failure.
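To see the symmetry concretely, here is a minimal sketch (not the blog's own training code) of a single forward and backward pass of a 2-4-1 network on the XOR inputs; the sigmoid activation and squared-error loss are assumptions made only for this illustration. With all-zero weights, every hidden unit computes the same value and receives the same gradient, so the units can never differentiate.

import numpy as np

def sigmoid(x):
	return 1 / (1 + np.exp(-x))

#XOR inputs with a leading bias term, and the target outputs
x = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
y = np.array([[0], [1], [1], [0]])

w1 = np.zeros((3, 4)) #input (+ bias) to 4 hidden nodes
w2 = np.zeros((5, 1)) #hidden (+ bias) to output

#forward pass: every hidden unit computes exactly the same value (0.5)
hidden = sigmoid(x @ w1)
hidden_b = np.hstack([np.ones((4, 1)), hidden]) #prepend bias unit
output = sigmoid(hidden_b @ w2)

#backward pass with squared-error loss
delta_out = (output - y) * output * (1 - output)
grad_w2 = hidden_b.T @ delta_out #identical gradient for every hidden unit
delta_hidden = (delta_out @ w2[1:].T) * hidden * (1 - hidden)
grad_w1 = x.T @ delta_hidden #all zeros because w2 is zero

print(grad_w2.flatten()) #the 4 hidden-unit entries are equal
print(grad_w1) #zero matrix: the first layer receives no learning signal at all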
Pure Initialization
On the other hand, initializing the weights randomly enables back propagation to work. You can populate the weights with random samples from a uniform distribution over [0, 1).
def initialize_weights(rows, columns):
	weights = np.random.random((rows+1, columns)) #+1 refers to bias unit
	return weights
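As a quick sanity check (assuming numpy is imported and the function above is defined), a hypothetical call for a layer with 3 inputs feeding 4 nodes looks like this; the sizes are just illustrative.

w0 = initialize_weights(3, 4) #3 input features (+ bias) feeding 4 hidden nodes
print(w0.shape) #(4, 4)
print(0 <= w0.min() and w0.max() < 1) #True: samples come from [0, 1)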
Xavier Initialization
You can improve convergence by applying some additional techniques. Here, the scale of the initial weights depends on the size of the layer they are connected from. This is called Xavier initialization, and it works well with the tanh activation.
def initialize_weights(rows, columns):
	weights = np.random.randn(rows+1, columns) #normal distribution, +1 refers to bias unit
	weights = weights * np.sqrt(1/rows)
	return weights
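As an illustrative check, the scaled weights end up with a standard deviation of roughly sqrt(1/rows); the layer sizes below are hypothetical.

w = initialize_weights(81, 64) #hypothetical layer sizes
print(w.std(), np.sqrt(1/81)) #both close to 0.11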
Xavier Initialization for ReLU
Changing the numerator of the scale factor from 1 to 2 works better for ReLU.
weights = weights * np.sqrt(2/(rows+1)) #+1 refers to bias unit
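Put together, a full version of the function for ReLU layers would look like the sketch below; it has the same structure as the Xavier version above, only the scale factor changes.

def initialize_weights(rows, columns):
	weights = np.random.randn(rows+1, columns) #normal distribution, +1 refers to bias unit
	weights = weights * np.sqrt(2/(rows+1)) #ReLU-friendly scale factor
	return weights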
Normalized Initialization
The same research proposes another technique, called normalized initialization, which is based on the sizes of both the previous layer and the following layer.
weights = weights * np.sqrt(6/((rows+1) + columns)) #+1 refers to bias unit
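For intuition, the snippet below simply prints the three scale factors for a hypothetical layer with 128 incoming and 64 outgoing nodes; the sizes are illustrative only.

rows, columns = 128, 64 #hypothetical layer sizes
print(np.sqrt(1/rows)) #Xavier for tanh: ~0.088
print(np.sqrt(2/(rows+1))) #Xavier variant for ReLU: ~0.125
print(np.sqrt(6/((rows+1) + columns))) #normalized initialization: ~0.176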
Weight Initialization
You can create the initial weight values in Python as coded below:
num_of_layers = len(hidden_layers) + 2 #plus input layer and output layer
w = [0 for i in range(num_of_layers-1)]

#weights from input layer to first hidden layer
w[0] = initialize_weights(num_of_features, hidden_layers[0])

#weights connecting a hidden layer to another hidden layer
if len(hidden_layers)-1 != 0:
	for i in range(len(hidden_layers) - 1):
		w[i+1] = initialize_weights(hidden_layers[i], hidden_layers[i+1])

#weights from final hidden layer to output layer
w[num_of_layers-2] = initialize_weights(hidden_layers[len(hidden_layers) - 1], num_of_classes)
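For example, with the hypothetical settings below (2 input features, hidden layers of 4 and 3 nodes, 1 output class) and the initialization code above, the resulting weight matrices chain together shape-wise, each with an extra bias row.

num_of_features = 2
hidden_layers = [4, 3]
num_of_classes = 1

#... run the initialization code above ...

for i in range(len(w)):
	print(i, w[i].shape) #(3, 4), (5, 3), (4, 1)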
So, we have focused on why random initialization is important for neural networks, and we've mentioned a few initialization techniques. However, applying one of these scaled approaches is not a must: neural networks can still learn as long as the weights are simply initialized randomly. Finally, I've pushed the weight initialization logic to GitHub.
Support this blog if you like it!