A Developer’s Guide to Machine Learning

Machine learning related jobs have a huge demand nowadays. However, there is a significant gap between supply and demand because of the talent shortage. Today, software developers are one of the most potential candidates of machine learning practitioners. However, there are still strict differences between these motivations.  In this post, we will mention how software developers and machine learning practitioners are different, and how to adapt developers into machine learning field.

super-cdo
New supermen are data science team members

Coding

Machine learning practitioners have to have coding, math and communication skills. Developers already have high coding skills. However, they mostly lack in math oriented thinking. Transforming from code based perspectives to math based mindset can handle most of issues.


🙋‍♂️ You may consider to enroll my top-rated machine learning course on Udemy

Decision Trees for Machine Learning

Imagine a basic neural cell. There are inputs connecting to weights. Here, each input multiplied by its own weight, and sum of input weight multiplications stores summation unit. Summation will be transformed to an activation function, and in this way output of the cell can be calculated.

artificial-neural-network-cell
A neural cell.

Developers tend to apply this calculation with for loops. They might store features in an input array, similarly put weights in a weight array.  Then, they would apply for loop, multiply same indexed input and weight array items in current iteration. Multiplication should also be added into a global variable.

import numpy as np

inputs = np.array([1,0,1])
weights = np.array([0.3, 0.8, 0.4])

sum = 0
for i in range(inputs.shape[0]):
sum = sum + inputs[i] * weights[i]

print(sum)

This approach is fine. It will totally work. However, what if there are hundred of thousand input parameters? Compiler needs to find each variable in different memory slot. This causes a really performance problem. Alternatively, machine learning practitioners tend to solve same problem differently.

vectorized_sum = np.matmul(np.transpose(weights), inputs)

Inputs and weights are same sized vectors. Linear algebra says that matrices can be multiplied only if row size of a matrix is equal to column size of the other one. Here, we can handle matrix multiplication if we transpose weights vector. Then, inputs can be multiplied by transposed weights. The matrix multiplication operation will produce same result as for loops. This approach is called as vectorization.

Vectorization makes your code 4 times cleaner. As Bjarne Stroustrup, who created C++, mentioned that “I like my code elegant and efficient”. Remember the quote of Audrey Hepburn, “elegance is the only beauty that never fades”. Of course, this is not all!

Very complex deep neural networks structures can be reduced into matrix multiplication operations for both backwardly training and feeding-forwardly predictions. Vectorization is almost 150 times faster than traditional loops if input and weight pairs are really large.

My favorite definition of Deep Learning is matrix multiplication, a lot of matrix multiplication… – Barbara Fusinska

This is one of the reason why machine learning community adopts python. Even though there are matrices in high level languages such as Java or C#, there is no out-of-the-box function for matrix operations. Matrix multiplications can be handled by for loops in this high level languages. Herein, python offers overperforming matrix multiplication operations with numpy library. Similarly, tensorflow includes powerful matrix multiplication supporting. However, this is not meaning that python is much faster than the others. Python is faster only if you know how to apply linear algebra in your program.





Even though, matlab is much more powerful than python for matrix based computations, it is not good at data transformation, database connection and service integration topics. On the other hand, high level languages such as java and .net are good at these topics but they are not good at matrix operations. Herein, python is strong computation language and it also supports object oriented programming, database and service subjects. I imagine python as a programming language between matlab and java.

Modelling and training a neural network might last weeks but model can respond immediately if training is over. Herein, well accepted approach is separating training and prediction tasks. You should run training asynchronously as batch process, maybe once a week. High level languages such as java can consume tensorflow based pretrained models. In this way, you don’t need to do anything more, if your production environment is based on java. We can say that training is responsibility of data scientist whereas developers are in charge of prediction.

Testing

Today, unit tests are very important part of software development and devops process. Such that test driven development proposes to write unit tests even before coding. The question is that when to write unit tests. The answer is that when you know the exact result for that program. That’s why, you should not adapt this approach to machine learning lifecycle.

Suppose that you would like to develop an app to classify man and woman from taken photos. Let’s say you have 100 images for training. You created a model and this model can classify all of training images correctly. However, when you test the model with unseen images it can classify 30 of 100 correctly. This means that your model didn’t learn anything, it could only memorize the training instances.

Your model would be more successful if it can classify 70% correctly for both seen and unseen ones. This is the most challenging problem in machine learning studies called over- fitting. It is dangerous for researchers because overfitting misguides you. You might think that you’ve got a satisfactory accuracy, but it could only be in seen examples. That’s why, we will let the model fail to handle overfitting.

This is the reason why unit testing is not convenient in machine learning. Instead of this, we might separate 80% of data set for learning, and 20% of data set for testing. In this way, we can evaluate the model for unseen examples.

hulk-unit-test
Skipping tests

Data driven organization

Most of enterprises appoint Chief Data Officer role directly reports to CEO or COO. This role is a must for a data driven organization. Beyond the CDO, some organizations hire Chief AI Officer roles. This is important because you cannot evaluate data science team members as software engineers.

You can expect to develop a new lines of code from software engineers. Well defined expectations exist for them. However, data scientists spent most of their time for tuning machine learning models. I mean that they design neural network structure, decide input features of the network, try different number of layers and nodes in the network, make decisions about activation function and optimization algorithm.  Because there is no function to determine these configurations. This is the state-of-the-art part of the study. Still they have the luxury to fail because they are working on research projects. Similarly, data engineers responsible for preparing data for data scientists to be able to work. Typical software engineer team manager could not understand and evaluate the responsibilities of data science team members.

la-et-avengers
Avengers, assemble!

Google’s recent research, hidden technical debt in machine learning systems, reveals that only a small fraction of real world machine learning system is composed of the core machine learning code. The required surrounding infrastructure is wide and complex.

ml-code-in-ml-system
ML code is in a real world ML system

Choose the right knife for the job

Machine learning practitioners mostly discuss about algorithms instead of technology or programming language.





right-knife
The right knife for the right job

There are mainly 3 different problem type in machine learning field. If your problem responds the question how much or how many, then the problem is regression. You might think that what will Apple shares be in next week. If your problem responds the question such that is it this or not, that would be classification. Recognizing cat photos is a common binary classification task. Number of classes could be greater than 2, this is multi class classification.

hotdog-or-not
Binary classification

You can evaluate your predictions for both regression and classification because you have already known the actual results. I mean that your model can recognize an image as cat, but actually it is a dog, you can say that is incorrect. Actual labels are supervisors of your learning process. That’s why, both regression and classification tasks are kind of supervised learning.

What if your data doesn’t have these actual labels? If there is no supervisor labels, then learning type will be unsupervised learning. You might apply clusters based on some attributes.

Regression, classification and clustering must be handled different machine learning algorithms. For example, (standard) support vector machines are very powerful algorithm to solve classification problems. You can spent a year and you can be an expert of SVM, but still you cannot apply the typical algorithm for regression problems. So, you cannot solve regression problems with SVM. Similarly, you can apply linear regression for regression problems but you cannot do it for classification tasks fairly.

Herein, neural networks are like Swiss army knifes. You can apply this algorithm for any kind of machine learning task. You just need to feed the data to the algorithm. They can solve regression, classification and regression tasks.

My favorite machine learning is Neural Nets. That’s my favorite. My 2nd favorite machine learning is SVD. Everyone says, oh don’t you prefer gradient boosted trees? I know GBTs are great but I like NN best and I like SVD next best – Francois Chollet

So, machine learning practitioners have to have math, code and communication skills. Here, developers are one of the most potential candidates for machine learning field because they are a past master at coding. However, they need to be transformed from from code based mindset to math based perspective. Also, data science projects are not how typical software projects work. Developers need to be adapted to a new data driven business manner instead of letting them to transform business manners just like in software development teams.

The following webinar might attract you if you enjoy this blog post


Like this blog? Support me on Patreon

Buy me a coffee


2 Comments

  1. Hi,
    There are some misleading concepts.

    “So, you cannot solve regression problems with SVM. Similarly, you can apply linear or logistic regression for regression problems but you cannot do it for classification tasks.”

    We can use SVM for regression tasks,
    We can use Logistic Regression for binary classification tasks.

    Best regards

    1. Hello, yes, logistic regression can be applied for classification problems, I fixed the term. But support vector regression is a little different. Your sentence is not incorrect, but SVM requires a few minor differences such as deploying non-linear kernel to apply regression problems. Regards

Comments are closed.