Developers tend to handle problems with conditional statements and loops. This is one of the main points separating developers from data scientists. If statements are a requirement for rule-based systems, but for machine learning problems, the fewer if statements there are, the better the solution tends to be. On the other hand, nested for loops slow down performance and reduce code readability. Herein, vectorization lets us write clean code and increase its performance.
Handling feed forward with loops
Suppose that you are going to construct a neural network. Using for loops requires storing the relations between nodes and weights to apply feed forward propagation. I have applied this approach once. It might be good for beginners, but you have to pay particular attention to following the algorithm's instructions. Even a basic feed forward propagation can be coded as illustrated below; it takes almost 50 lines of code.
def applyForwardPropagation(nodes, weights, instance, activation_function):
    #transfer bias unit values as +1
    for j in range(len(nodes)):
        if nodes[j].get_is_bias_unit() == True:
            nodes[j].set_net_value(1)
    #------------------------------
    #transfer instance features to the input layer. no activation function is applied for the input layer.
    for j in range(len(instance) - 1): #final item is the output of an instance, that's why len(instance) - 1 is used to iterate on features
        var = instance[j]
        for k in range(len(nodes)):
            if j+1 == nodes[k].get_index():
                nodes[k].set_net_value(var)
                break
    #------------------------------
    for j in range(len(nodes)):
        if nodes[j].get_level() > 0 and nodes[j].get_is_bias_unit() == False:
            net_input = 0
            net_output = 0
            target_index = nodes[j].get_index()
            for k in range(len(weights)):
                if target_index == weights[k].get_to_index():
                    wi = weights[k].get_value()
                    source_index = weights[k].get_from_index()
                    for m in range(len(nodes)):
                        if source_index == nodes[m].get_index():
                            xi = nodes[m].get_net_value()
                            net_input = net_input + (xi * wi)
                            break
            #iterate on weights end
            net_output = Activation.activate(activation_function, net_input)
            nodes[j].set_net_input_value(net_input)
            nodes[j].set_net_value(net_output)
    #------------------------------
    return nodes
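The listing above leans on Node, Weight and Activation helper classes that I don't show here. In case you wonder what they roughly look like, here is a minimal sketch; the attribute names and the Activation.activate behaviour are my assumptions inferred from the getters and setters used above, not the exact classes from the repo.

import math

#minimal sketch of the helper classes assumed above - inferred, not the original classes
class Node:
    def __init__(self, index, level, is_bias_unit=False):
        self.index = index                #unique id of the node in the network
        self.level = level                #0 for the input layer, 1 for the first hidden layer, ...
        self.is_bias_unit = is_bias_unit
        self.net_input_value = 0
        self.net_value = 0

    def get_index(self): return self.index
    def get_level(self): return self.level
    def get_is_bias_unit(self): return self.is_bias_unit
    def get_net_value(self): return self.net_value
    def set_net_value(self, value): self.net_value = value
    def set_net_input_value(self, value): self.net_input_value = value

class Weight:
    def __init__(self, from_index, to_index, value):
        self.from_index = from_index      #index of the source node
        self.to_index = to_index          #index of the target node
        self.value = value

    def get_from_index(self): return self.from_index
    def get_to_index(self): return self.to_index
    def get_value(self): return self.value

class Activation:
    @staticmethod
    def activate(activation_function, net_input):
        if activation_function == "sigmoid":
            return 1 / (1 + math.exp(-net_input))
        raise ValueError("unsupported activation function")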
Linear Algebra
So, is this really that complex? Of course not. We will focus on linear algebra to transform the neural network concept into its vectorized version.
You might notice that the notation of the weights is a little different.
E.g. w^(2)_11 refers to a weight connecting the 2nd layer to the 3rd layer because of the (2) superscript; it is not a power expression. Moreover, this weight connects the 1st node in the previous layer to the 1st node in the following layer because of the 11 subscript. The first item in the subscript tells which node the weight is connected from, and the second item tells which node it is connected to. Similarly, w^(1)_12 refers to the weight connecting the 1st layer's 1st node to the 2nd layer's 2nd node.
Let's express the inputs and weights as vectors and matrices. The input features are expressed as a column vector of size n x 1, where n is the total number of inputs.
Now imagine what happens if the transposed weight matrix and the input features are multiplied.
Yes, you are right! This matrix multiplication stores the netinput of the hidden layer.
We additionally need to pass these netinputs through an activation function (e.g. sigmoid) to calculate the netoutputs.
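To make those two steps concrete, here is a small standalone numpy sketch with made-up toy weights (not the ones used later in this post). A single matrix multiplication followed by an element-wise sigmoid covers the whole layer at once:

import numpy as np

#toy example: bias plus 2 inputs, 2 hidden units (weights are made up for illustration)
x = np.array([[1.0], [0.0], [1.0]])          #column vector: bias, x1, x2
W = np.array([[0.5, -0.3],                   #row i holds the weights leaving node i of the input layer
              [0.2,  0.8],
              [-0.7, 0.1]])                  #shape (3, 2): 3 source nodes, 2 hidden nodes

net_input = np.matmul(W.T, x)                #(2, 3) x (3, 1) -> (2, 1): netinputs of the hidden layer
net_output = 1 / (1 + np.exp(-net_input))    #element-wise sigmoid gives the hidden layer netoutputs

print(net_input.shape)                       #(2, 1)
print(net_output)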
Vectorization saves lives
So, what would vectorization contribute compared to the loop approach?
We will use only the following libraries in our Python program. NumPy is a very powerful Python library that makes matrix operations easier.
import math
import numpy as np
Here, let's initialize the input features and weights:
x = np.array( #xor dataset
    [
        #bias, #x1, #x2
        [[1],[0],[0]], #instance 1
        [[1],[0],[1]], #instance 2
        [[1],[1],[0]], #instance 3
        [[1],[1],[1]]  #instance 4
    ]
)

w = np.array(
    [
        [ #weights for input layer to 1st hidden layer
            [0.8215133714710082, -4.781957888088778, 4.521206980948031],
            [-1.7254199547588138, -9.530462129807947, -8.932730568307496],
            [2.3874630239703, 9.221735768691351, 9.27410475328787]
        ],
        [ #weights for hidden layer to output layer
            [3.233334754817538],
            [-0.3269698166346504],
            [6.817229313048568],
            [-6.381026998906089]
        ]
    ], dtype=object #the two layers have different shapes, so store them as an object array
)
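One note: the target labels are not part of the listing above, but the training loop in the backpropagation section below reads them from a y variable and also needs the number of instances. Assuming the standard XOR truth table for the four instances, a minimal definition would be:

#assumed XOR target labels for the four instances above (not shown in the original listing)
y = np.array([[0], [1], [1], [0]])
num_of_instances = x.shape[0] #4 training instances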
Now, it is time to code. We can express the feed forward logic in 2 meaningful steps (matmul, which performs matrix multiplication, and sigmoid, which applies the activation function) as illustrated below. The other lines handle initialization. As seen, there is neither a loop nor a conditional statement over individual nodes and weights.
num_of_layers = w.shape[0] + 1

def applyFeedForward(x, w):
    netoutput = [i for i in range(num_of_layers)] #placeholder list, one item per layer
    netinput = [i for i in range(num_of_layers)]
    netoutput[0] = x #the netoutput of the input layer is the instance itself
    for i in range(num_of_layers - 1):
        netinput[i+1] = np.matmul(np.transpose(w[i]), netoutput[i]) #matrix multiplication replaces the nested loops
        netoutput[i+1] = sigmoid(netinput[i+1]) #activation function
    return netoutput
Additionally, we need the following function to transform the netinput into the netoutput at each layer.
def sigmoid(netinput):
    #initialized to ones because the first entry plays the role of the bias unit
    #also, the output is 1 item larger than the input because of that bias unit
    netoutput = np.ones((netinput.shape[0] + 1, 1))
    for i in range(netinput.shape[0]):
        netoutput[i+1] = 1/(1 + math.exp(-netinput[i][0]))
    return netoutput
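As a quick sanity check, we can push a single instance through the vectorized network. This is just a usage sketch; remember that index 0 of every layer's output is the bias unit, so the actual prediction sits at index 1:

outputs = applyFeedForward(x[0], w)          #feed the first instance (bias=1, x1=0, x2=0) forward
prediction = outputs[num_of_layers - 1][1]   #skip index 0, which is the bias unit of the output layer
print(prediction)                            #predicted output for this instance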
Backpropagation
A similar approach can be applied to the learning process in neural networks. Element-wise multiplication and scalar multiplication ease the construction.
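Before the full training loop, here is a tiny sketch with made-up values of the numpy operations the sigma and delta calculations below rely on: the * operator multiplies element by element and broadcasts scalars, whereas np.matmul performs real matrix multiplication.

a = np.array([[0.2], [0.9]])
b = np.array([[0.5], [2.0]])

print(a * b)                 #element-wise: [[0.1], [1.8]]
print(np.array([0.1]) * a)   #scalar broadcast: [[0.02], [0.09]]
print(np.matmul(a.T, b))     #matrix multiplication: [[1.9]]

With those operations in mind, the whole training loop becomes: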
for epoch in range(10000):
    for i in range(num_of_instances):
        instance = x[i]
        nodes = applyFeedForward(instance, w)

        predict = nodes[num_of_layers - 1][1]
        actual = y[i]
        error = actual - predict

        sigmas = [i for i in range(num_of_layers)] #error should not be reflected to the input layer
        sigmas[num_of_layers - 1] = error
        for j in range(num_of_layers - 2, -1, -1):
            if sigmas[j + 1].shape[0] == 1:
                sigmas[j] = w[j] * sigmas[j + 1]
            else:
                if j == num_of_layers - 2: #output layer has no bias unit
                    sigmas[j] = np.matmul(w[j], sigmas[j + 1])
                else: #otherwise remove the bias unit of the following layer because it is not connected to the previous layer
                    sigmas[j] = np.matmul(w[j], sigmas[j + 1][1:])
        #sigma calculation end

        derivative_of_sigmoid = nodes * (np.array([1]) - nodes) #element-wise multiplication and scalar multiplication
        sigmas = derivative_of_sigmoid * sigmas

        for j in range(num_of_layers - 1):
            delta = nodes[j] * np.transpose(sigmas[j+1][1:])
            w[j] = w[j] + np.array([0.1]) * delta
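Once training finishes, a short check like the following (again only a usage sketch, not part of the original program) prints the prediction next to the expected label for each instance:

#quick check of the trained network on every instance
for i in range(num_of_instances):
    outputs = applyFeedForward(x[i], w)
    prediction = outputs[num_of_layers - 1][1][0] #skip the bias unit of the output layer
    print("instance", i + 1, "-> predicted:", round(prediction, 3), "expected:", y[i][0])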
Performance
It is clear that vectorization makes the code more readable. What about performance? I tested both the loop approach and the vectorized version on the XOR dataset with the same configuration (10000 epochs, 2 hidden layers, with a varying number of nodes on the x axis). It seems vectorization beats the loop approach even for a basic dataset. That is engineering! You can test it on your own from this GitHub repo: NN.py contains the loop approach whereas Vectorization.py contains the vectorized version.
So, we have replaced the loop approach with vectorization in the feed forward step of neural networks. This approach speeds up performance and increases code readability radically. I've pushed the code for both the vectorized and the loop approach to GitHub. It is not surprising that Prof. Andrew Ng mentioned that you should not use loops. BTW, Barbara Fusinska defines neural networks and deep learning as matrix multiplication, a lot of matrix multiplication. I like this definition as well.