Developers tend to handle problems with conditional statements and loops. This is one of the main points separating developers from data scientists. If statements are a requirement for rule-based systems, but for machine learning problems, the fewer if statements there are, the better the solution tends to be. On the other hand, nested for loops slow down performance and reduce code readability. Herein, vectorization lets us write clean code and increase its performance.
Handling feed forward with loops
Suppose that you are going to construct a neural network. Using for loops requires storing the relations between nodes and weights to apply feed forward propagation. I have applied this approach once. It might be good for beginners, but you have to pay particular attention to following the algorithm's instructions. Even a basic feed forward propagation can be coded as illustrated below; it takes almost 50 lines of code.
def applyForwardPropagation(nodes, weights, instance, activation_function):
    #transfer bias unit values as +1
    for j in range(len(nodes)):
        if nodes[j].get_is_bias_unit() == True:
            nodes[j].set_net_value(1)
    #------------------------------
    #transfer instance features to the input layer. no activation function is applied for the input layer.
    for j in range(len(instance) - 1): #final item is the output of an instance, that's why len(instance) - 1 is used to iterate on features
        var = instance[j]
        for k in range(len(nodes)):
            if j+1 == nodes[k].get_index():
                nodes[k].set_net_value(var)
                break
    #------------------------------
    for j in range(len(nodes)):
        if nodes[j].get_level() > 0 and nodes[j].get_is_bias_unit() == False:
            net_input = 0
            net_output = 0
            target_index = nodes[j].get_index()
            for k in range(len(weights)):
                if target_index == weights[k].get_to_index():
                    wi = weights[k].get_value()
                    source_index = weights[k].get_from_index()
                    for m in range(len(nodes)):
                        if source_index == nodes[m].get_index():
                            xi = nodes[m].get_net_value()
                            net_input = net_input + (xi * wi)
                            break
            #iterate on weights end
            net_output = Activation.activate(activation_function, net_input)
            nodes[j].set_net_input_value(net_input)
            nodes[j].set_net_value(net_output)
    #------------------------------
    return nodes
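The listing above leans on Node, Weight and Activation helper classes that I don't show here. In case you wonder what they roughly look like, here is a minimal sketch; the attribute names and the Activation.activate behaviour are my assumptions inferred from the getters and setters used above, not the exact classes from the repo.

import math

#minimal sketch of the helper classes assumed above - inferred, not the original classes
class Node:
    def __init__(self, index, level, is_bias_unit=False):
        self.index = index                #unique id of the node in the network
        self.level = level                #0 for the input layer, 1 for the first hidden layer, ...
        self.is_bias_unit = is_bias_unit
        self.net_input_value = 0
        self.net_value = 0

    def get_index(self): return self.index
    def get_level(self): return self.level
    def get_is_bias_unit(self): return self.is_bias_unit
    def get_net_value(self): return self.net_value
    def set_net_value(self, value): self.net_value = value
    def set_net_input_value(self, value): self.net_input_value = value

class Weight:
    def __init__(self, from_index, to_index, value):
        self.from_index = from_index      #index of the source node
        self.to_index = to_index          #index of the target node
        self.value = value

    def get_from_index(self): return self.from_index
    def get_to_index(self): return self.to_index
    def get_value(self): return self.value

class Activation:
    @staticmethod
    def activate(activation_function, net_input):
        if activation_function == "sigmoid":
            return 1 / (1 + math.exp(-net_input))
        raise ValueError("unsupported activation function")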
Linear Algebra
So, is this really that complex? Of course not. We will focus on linear algebra to transform the neural network concept into its vectorized version.
You might notice that the notation of the weights is a little different.
E.g. w^(2)_11 refers to a weight connecting the 2nd layer to the 3rd layer because of the (2) superscript; it is not a power expression. Moreover, this weight connects the 1st node in the previous layer to the 1st node in the following layer because of the 11 subscript. The first item in the subscript tells which node the weight is connected from, and the second item tells which node it is connected to. Similarly, w^(1)_12 refers to the weight connecting the 1st layer's 1st node to the 2nd layer's 2nd node.
Let's express the inputs and weights as vectors and matrices. The input features are expressed as a column vector of size n x 1, where n is the total number of inputs.
Now imagine what happens if the transposed weight matrix and the input features are multiplied.
Yes, you are right! This matrix multiplication stores the netinput of the hidden layer.
We additionally need to pass these netinputs through an activation function (e.g. sigmoid) to calculate the netoutputs.
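To make those two steps concrete, here is a small standalone numpy sketch with made-up toy weights (not the ones used later in this post). A single matrix multiplication followed by an element-wise sigmoid covers the whole layer at once:

import numpy as np

#toy example: bias plus 2 inputs, 2 hidden units (weights are made up for illustration)
x = np.array([[1.0], [0.0], [1.0]])          #column vector: bias, x1, x2
W = np.array([[0.5, -0.3],                   #row i holds the weights leaving node i of the input layer
              [0.2,  0.8],
              [-0.7, 0.1]])                  #shape (3, 2): 3 source nodes, 2 hidden nodes

net_input = np.matmul(W.T, x)                #(2, 3) x (3, 1) -> (2, 1): netinputs of the hidden layer
net_output = 1 / (1 + np.exp(-net_input))    #element-wise sigmoid gives the hidden layer netoutputs

print(net_input.shape)                       #(2, 1)
print(net_output)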
Vectorization saves lives
So, what would vectorization contribute compared to the loop approach?
We will use only the following libraries in our Python program. NumPy is a very powerful Python library that makes matrix operations easier.
import math
import numpy as np
Here, let's initialize the input features and weights:
x = np.array( #xor dataset
    [
        #bias, #x1, #x2
        [[1],[0],[0]], #instance 1
        [[1],[0],[1]], #instance 2
        [[1],[1],[0]], #instance 3
        [[1],[1],[1]]  #instance 4
    ]
)

w = np.array(
    [
        [ #weights for input layer to 1st hidden layer
            [0.8215133714710082, -4.781957888088778, 4.521206980948031],
            [-1.7254199547588138, -9.530462129807947, -8.932730568307496],
            [2.3874630239703, 9.221735768691351, 9.27410475328787]
        ],
        [ #weights for hidden layer to output layer
            [3.233334754817538],
            [-0.3269698166346504],
            [6.817229313048568],
            [-6.381026998906089]
        ]
    ], dtype=object #the two layers have different shapes, so store them as an object array
)
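One note: the target labels are not part of the listing above, but the training loop in the backpropagation section below reads them from a y variable and also needs the number of instances. Assuming the standard XOR truth table for the four instances, a minimal definition would be:

#assumed XOR target labels for the four instances above (not shown in the original listing)
y = np.array([[0], [1], [1], [0]])
num_of_instances = x.shape[0] #4 training instances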
Now, it is time to code. We can express the feed forward logic in 2 meaningful steps (matmul, which performs matrix multiplication, and sigmoid, which applies the activation function) as illustrated below. The other lines handle initialization. As seen, there is neither a loop nor a conditional statement over individual nodes and weights.
num_of_layers = w.shape[0] + 1

def applyFeedForward(x, w):
    netoutput = [i for i in range(num_of_layers)] #placeholder list, one item per layer
    netinput = [i for i in range(num_of_layers)]
    netoutput[0] = x #the netoutput of the input layer is the instance itself
    for i in range(num_of_layers - 1):
        netinput[i+1] = np.matmul(np.transpose(w[i]), netoutput[i]) #matrix multiplication replaces the nested loops
        netoutput[i+1] = sigmoid(netinput[i+1]) #activation function
    return netoutput
Additionally, we need the following function to transform the netinput into the netoutput at each layer.
def sigmoid(netinput):
    #initialized to ones because the first entry plays the role of the bias unit
    #also, the output is 1 item larger than the input because of that bias unit
    netoutput = np.ones((netinput.shape[0] + 1, 1))
    for i in range(netinput.shape[0]):
        netoutput[i+1] = 1/(1 + math.exp(-netinput[i][0]))
    return netoutput
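As a quick sanity check, we can push a single instance through the vectorized network. This is just a usage sketch; remember that index 0 of every layer's output is the bias unit, so the actual prediction sits at index 1:

outputs = applyFeedForward(x[0], w)          #feed the first instance (bias=1, x1=0, x2=0) forward
prediction = outputs[num_of_layers - 1][1]   #skip index 0, which is the bias unit of the output layer
print(prediction)                            #predicted output for this instance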
Backpropagation
A similar approach can be applied to the learning process in neural networks. Element-wise multiplication and scalar multiplication ease the construction.
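Before the full training loop, here is a tiny sketch with made-up values of the numpy operations the sigma and delta calculations below rely on: the * operator multiplies element by element and broadcasts scalars, whereas np.matmul performs real matrix multiplication.

a = np.array([[0.2], [0.9]])
b = np.array([[0.5], [2.0]])

print(a * b)                 #element-wise: [[0.1], [1.8]]
print(np.array([0.1]) * a)   #scalar broadcast: [[0.02], [0.09]]
print(np.matmul(a.T, b))     #matrix multiplication: [[1.9]]

With those operations in mind, the whole training loop becomes: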
for epoch in range(10000):
    for i in range(num_of_instances):
        instance = x[i]
        nodes = applyFeedForward(instance, w)

        predict = nodes[num_of_layers - 1][1]
        actual = y[i]
        error = actual - predict

        sigmas = [i for i in range(num_of_layers)] #error should not be reflected to the input layer
        sigmas[num_of_layers - 1] = error
        for j in range(num_of_layers - 2, -1, -1):
            if sigmas[j + 1].shape[0] == 1:
                sigmas[j] = w[j] * sigmas[j + 1]
            else:
                if j == num_of_layers - 2: #output layer has no bias unit
                    sigmas[j] = np.matmul(w[j], sigmas[j + 1])
                else: #otherwise remove the bias unit of the following layer because it is not connected to the previous layer
                    sigmas[j] = np.matmul(w[j], sigmas[j + 1][1:])
        #sigma calculation end

        derivative_of_sigmoid = nodes * (np.array([1]) - nodes) #element-wise multiplication and scalar multiplication
        sigmas = derivative_of_sigmoid * sigmas

        for j in range(num_of_layers - 1):
            delta = nodes[j] * np.transpose(sigmas[j+1][1:])
            w[j] = w[j] + np.array([0.1]) * delta
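Once training finishes, a short check like the following (again only a usage sketch, not part of the original program) prints the prediction next to the expected label for each instance:

#quick check of the trained network on every instance
for i in range(num_of_instances):
    outputs = applyFeedForward(x[i], w)
    prediction = outputs[num_of_layers - 1][1][0] #skip the bias unit of the output layer
    print("instance", i + 1, "-> predicted:", round(prediction, 3), "expected:", y[i][0])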
Performance
It is clear that vectorization makes the code more readable. What about performance? I tested both the loop approach and the vectorized version on the XOR dataset with the same configuration (10000 epochs, 2 hidden layers, with a varying number of nodes on the x axis). It seems vectorization beats the loop approach even for a basic dataset. That is engineering! You can test it on your own from this GitHub repo: NN.py contains the loop approach whereas Vectorization.py contains the vectorized version.
So, we have replaced the loop approach with vectorization in the feed forward step of neural networks. This approach speeds up performance and increases code readability radically. I've pushed the code for both the vectorized and the loop approach to GitHub. It is not surprising that Prof. Andrew Ng mentioned that you should not use loops. BTW, Barbara Fusinska defines neural networks and deep learning as matrix multiplication, a lot of matrix multiplication. I like this definition as well.