Machine Learning Wars: Deep Learning vs GBM

Machine learning has unfortunately become polarized. Practitioners mostly adopt either deep learning or gradient boosting machines (GBM), and they often support their algorithm of choice just like sports fans. Keep in mind that human beings are biased life forms. In this post, we will discuss the pros and cons of each approach as unbiasedly as possible. Let's wage a war between deep learning and GBM!

Deep learning side

It is a fact that deep learning offers superpowers. Face recognition, mood analysis and art generation are not hard tasks anymore. However, deep neural networks hit a wall when decision making matters, because they are total black boxes. They cannot answer why and how questions. Why does your network have this number of hidden layers and nodes? Why did you set the learning rate and training time to these values? More importantly, how does it work? Under the hood, neural networks are just combinations of matrix multiplications, non-linear functions (e.g. sigmoid, relu), derivatives and normalizations. Notice that basic information gathering requires answering 5W1H; otherwise, it won't be considered complete.
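To show what "matrix multiplications and non-linear functions" means in practice, here is a minimal NumPy sketch of a single forward pass. The layer sizes and random weights are arbitrary illustration values of my own, not taken from any particular model.

```python
import numpy as np

# A minimal sketch of one forward pass: the "black box" is just
# matrix multiplications chained with non-linear activations.
# Layer sizes and weights here are arbitrary illustration values.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))          # 4 input features
W1 = rng.normal(size=(8, 4))       # hidden layer with 8 nodes
W2 = rng.normal(size=(1, 8))       # single output node

hidden = relu(W1 @ x)              # matrix multiplication + non-linearity
output = sigmoid(W2 @ hidden)      # another multiplication + squashing
print(output)                      # a probability-like score, but no "why"
```

The numbers come out, but nothing in this pipeline tells you why a particular input produced a particular score.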



Being unexplainable also makes it hard to fix problems. Remember that Microsoft shut down its chatbot Tay instead of publishing a patch when it became racist.

tay-ai-hitler
Tay became a racist bot within hours

Moreover, some industries such as banking and finance are heavily regulated. Interpretability is a must there: you cannot deploy an unexplainable model to production even if it works.

Decision tree side

On the other hand, decisions made by decision tree algorithms can be read clearly, because a tree can be transformed into plain if statements. This also lets you modify the rules based on your custom requirements. Moreover, the rules are not complex. A decision tree algorithm finds the most dominant feature and checks it in every if block. For example, if the features are luggage capacity and number of doors, then a single branch checks either luggage capacity or number of doors, never a mix. There cannot be a rule like "if luggage capacity is big, otherwise number of doors is 2"; the else condition must check the same feature as the if statement, as in the sketch below.
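Here is a hand-written sketch of what an exported tree looks like as if/else rules. The feature names follow the luggage capacity and number of doors example above, while the thresholds and class labels are made up for illustration.

```python
# A made-up decision tree written out as plain if/else rules.
# A real tree built by ID3, C4.5 or CART has the same structure:
# each if and its matching else test the SAME feature.

def predict_car_class(luggage_capacity, number_of_doors):
    if luggage_capacity > 400:           # dominant feature tested first
        if number_of_doors >= 4:
            return "family"
        else:                            # else still checks number_of_doors
            return "coupe"
    else:                                # else still checks luggage_capacity
        return "city"

print(predict_car_class(luggage_capacity=450, number_of_doors=5))  # family
```

Because the whole model is readable like this, you can audit it, explain it to a regulator, or even hand-edit a rule.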

The most dominant feature is found by information gain in ID3, gain ratio in C4.5, the GINI index in CART, or standard deviation reduction in regression trees. The decision tree algorithm is then run on each sub data set recursively. This is called divide and conquer.
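As a minimal sketch of the ID3-style criterion, the snippet below computes the entropy of a label column and the information gain of splitting on one feature. The toy weather-like data set is invented for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, feature_index):
    # entropy before the split minus weighted entropy after the split
    total = len(labels)
    gain = entropy(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature_index], []).append(label)
    for subset in partitions.values():
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# toy data set: (outlook, temperature) -> decision
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "hot")]
labels = ["no", "no", "yes", "no"]
print(information_gain(rows, labels, feature_index=0))  # gain of feature 0
```

The feature with the highest gain becomes the next if block, and the algorithm recurses on each partition.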

Decision trees are transparent algorithms, but interpretability and accuracy are inversely proportional, and we cannot ignore accuracy for the sake of interpretability.

interpretability-vs-accuracy
Interpretability vs accuracy

GBM side

GBM pushes decision trees close to the accuracy level of neural networks. Its premise is that a single decision tree is not strong enough, but repeatedly fitting a new decision tree on the error of the previous round gets close to, or even beyond, neural network accuracy. Besides, it is still explainable.
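To make the "fit on the error of the previous round" idea concrete, here is a minimal boosting sketch. It assumes scikit-learn's DecisionTreeRegressor as the weak learner and uses made-up toy regression data, so it is an illustration of the idea rather than a full GBM implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: noisy sine wave (invented for illustration)
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.zeros_like(y)   # start from a constant (zero) prediction
trees = []

for _ in range(30):             # 30 boosting rounds
    residual = y - prediction                         # error of previous rounds
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)     # shrink and add the new tree
    trees.append(tree)

print("final training error:", np.mean((y - prediction) ** 2))
```

Each individual tree is still a readable set of if statements; the ensemble is just their shrunken sum.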

gbm-illustration
Playing golf is very similar to GBM (by Terence Parr)

A Comparative Study

Consider the two-spiral data set. It is really hard to classify this kind of data set. It is available in TensorFlow Playground. Enabling all input candidates and increasing the number of nodes in the first hidden layer to 8 performs very well.

spiral-dataset-playground
Classification of the two-spiral data set in deep learning

A similar interface exists for GBM, named the gradient boosting interactive playground. Recursively building 30 decision trees performs very well, too.

spiral-dataset-gbm
Playground in GBM
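The playground screenshots cannot be reproduced here, but the comparison can be roughly repeated offline. The sketch below generates a synthetic two-spiral data set and trains a small neural network and a 30-tree GBM with scikit-learn; the data generator and hyper-parameters are my own assumptions, not the playgrounds' exact configurations.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

def two_spirals(n=500, noise=0.2, seed=0):
    # Generate two interleaved spirals (assumed data generator, not
    # the TensorFlow Playground's internal one)
    rng = np.random.default_rng(seed)
    t = np.sqrt(rng.uniform(0, 1, n)) * 3 * np.pi
    spiral = np.column_stack((t * np.cos(t), t * np.sin(t)))
    X = np.vstack((spiral, -spiral)) + rng.normal(scale=noise, size=(2 * n, 2))
    y = np.hstack((np.zeros(n), np.ones(n)))
    return X, y

X, y = two_spirals()

# Illustrative hyper-parameters: a small network and a 30-tree GBM
mlp = MLPClassifier(hidden_layer_sizes=(8, 4), max_iter=5000, random_state=0)
gbm = GradientBoostingClassifier(n_estimators=30, random_state=0)

print("neural network accuracy:", mlp.fit(X, y).score(X, y))
print("gbm accuracy:", gbm.fit(X, y).score(X, y))
```

With a handful of trees or a handful of hidden nodes, both models can carve out the spiral shape; neither family has a monopoly on this kind of non-linear boundary.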

Deep learning might be Superman, able to move a really heavy rock. Decision trees do not have superpowers, but several decision trees come together and become GBM, and GBM can move the same heavy rock, too. So, GBM would be the Flash if deep learning were Superman!

The race

Hopefully, I have convinced you that GBM is as strong as deep learning. Could it be even stronger?

Kaggle is the platform for data enthusiasts. It is not mandatory, but winning model owners can share their model details. We have data for 29 winning solutions from 2015: 17 solutions are based on GBM whereas 11 solutions use neural networks. This means that GBM brings more than half of the winning solutions. GBM dominates the KDDCup challenge, too: every team in the top 10 used GBM.

superman-flash-comic
The Flash seems to pull ahead of Superman in the comics

Still, comparing GBM and deep learning this way might not be fair. Firstly, 9 solutions use both neural networks and GBM as ensemble models. Moreover, deep learning winning solutions are mostly related to image-based unstructured data, whereas GBM winning solutions are mostly related to structured data. The ensemble of GBM and neural networks is clearly successful, because these combined models keep appearing on the podium.

avengers-assembe
Avengers assemble!

Kaggle also published a survey on data science and machine learning. Half of the respondents stated that they use decision trees in their daily work, whereas only 37.6% declared that they use neural networks. We have already mentioned the heavy regulations in the banking and finance industry. There, decision trees are in the toolbox of 60% of finance employees, whereas only 30% have put neural networks in their toolbox.

Marvel Universe

I would like to use imagery when I describe concepts.

Photo Finish

In this post, I have tried to convince you that neither GBM nor deep learning is superior to the other. GBM is a very powerful machine learning algorithm that practitioners should put in their toolbox; it should not be ignored by any practitioner.

photo-finish-for-superman-flash
Photo finish

Even though the result of the race between Superman and the Flash is not shown in Justice League, it appears in the comic books: they cross the finish line simultaneously. There is no winner. Maybe the challenge between deep learning and GBM has no winner either. The real winner might be ensemble models.

Acknowledgment

I created the content of this blog post for the Bilisim IO Tech Talks event, which was held in Turkish. I recorded the same slides in webinar format in English and published them on my YouTube channel. You can find the Machine Learning Wars webinar here.
