Boosting makes decision trees cool again. Gradient boosting and AdaBoost are the most common boosting techniques for tree-based machine learning. In this post, we are going to compare these two techniques and explain where they are similar and where they differ.
Vlog
You can either continue to read this tutorial or watch the following video. They both cover the comparison of gradient boosting and AdaBoost.
🙋♂️ You may consider enrolling in my top-rated machine learning course on Udemy
Naming
Gradient boosting gets its name from boosting the results with the gradient descent algorithm. Some sources call it gradient boosting machines (GBM) or gradient boosting decision trees (GBDT).
AdaBoost is short for adaptive boosting.
Boosting
Both gradient boosting and AdaBoost build many decision trees sequentially: they build a tree, then build the next one on the error of the previous one.
Gradient boosting boosts its results by finding the difference between the prediction and the actual value of an instance. The target label of that instance is then replaced with this residual in the next round. Using the difference between actual and prediction comes from the derivative of the mean squared error loss function.
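Here is a minimal sketch of that residual-fitting loop with scikit-learn's DecisionTreeRegressor. The toy data, the number of rounds and the tree depth are illustrative choices, not any library's defaults.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy regression data (illustrative only)
rng = np.random.default_rng(42)
X = rng.random((100, 3))
y = 5 * X[:, 0] + np.sin(6 * X[:, 1]) + rng.normal(0, 0.1, 100)

trees = []
residual = y.copy()          # round 1 fits the original target

for _ in range(5):           # 5 boosting rounds
    tree = DecisionTreeRegressor(max_depth=5)
    tree.fit(X, residual)
    # the next round's target is actual minus prediction
    residual = residual - tree.predict(X)
    trees.append(tree)
```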
AdaBoost applies a similar procedure. It builds a decision tree, then increases the weights of incorrectly predicted instances and decreases the weights of correctly predicted ones, so the next tree is fit with those updated sample weights.
In this way, instances with high error become more important in the next rounds of both techniques.
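A minimal sketch of AdaBoost's weight update for labels in {-1, +1}, again with scikit-learn trees. The data, the number of rounds and the small epsilon term are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# toy binary classification data with labels in {-1, +1} (illustrative only)
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = np.where(X[:, 0] + X[:, 1] > 1, 1, -1)

w = np.full(len(y), 1 / len(y))   # start with uniform instance weights
stumps, alphas = [], []

for _ in range(5):                # 5 boosting rounds
    stump = DecisionTreeClassifier(max_depth=1)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)

    err = np.sum(w * (pred != y)) / np.sum(w)        # weighted error rate
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # this stump's weight

    # misclassified instances get larger weights, correct ones smaller
    w = w * np.exp(-alpha * y * pred)
    w = w / w.sum()

    stumps.append(stump)
    alphas.append(alpha)
```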
Weights
In gradient boosting, each tree has the same weight. To make the final prediction, we sum the predictions of those sequential trees.
On the other hand, trees have different weights in AdaBoost. Each tree contributes to the prediction in proportion to its weight.
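Continuing the two sketches above, the final predictions could be formed like this (the variable names carry over from the earlier snippets):

```python
import numpy as np

# gradient boosting: plain, equally weighted sum of the sequential trees
gbm_prediction = np.sum([tree.predict(X) for tree in trees], axis=0)

# AdaBoost: each stump votes in proportion to its alpha weight
ada_scores = np.sum([alpha * stump.predict(X)
                     for alpha, stump in zip(alphas, stumps)], axis=0)
ada_prediction = np.sign(ada_scores)
```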
Decision tree algorithm
Both gradient boosting and AdaBoost run regression trees under the hood, no matter what kind of data set you have. Even if you have a classification problem, you first have to transform it into a regression task.
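For example, class labels can be encoded as numeric targets so that a regression tree can fit them. The {-1, +1} encoding below is just one common choice; libraries may use other encodings such as one column per class.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

labels = np.array(["cat", "dog", "dog", "cat", "dog", "cat"])
X = np.array([[0.1, 0.9], [0.8, 0.2], [0.9, 0.1],
              [0.2, 0.8], [0.7, 0.3], [0.3, 0.7]])

# map the classes to numeric targets for a regression tree
y = np.where(labels == "dog", 1.0, -1.0)

tree = DecisionTreeRegressor(max_depth=1)
tree.fit(X, y)

# the continuous output is mapped back to a class at prediction time
predicted_labels = np.where(tree.predict(X) > 0, "dog", "cat")
```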
Tree depth
AdaBoost builds depth-1 regression trees (decision stumps). Each stump only has to be slightly better than random guessing, e.g. more than 50% accuracy on a binary problem.
On the other hand, gradient boosting builds deeper trees. For instance, LightGBM and XGBoost are popular gradient boosting implementations, and they grow trees several levels deep by default: XGBoost sets max_depth to 6 out of the box, while LightGBM bounds tree size by the number of leaves (31 by default) rather than depth.
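A quick way to see this difference with scikit-learn's built-in ensembles; the depth values are set explicitly here for illustration, and note that older scikit-learn versions name the AdaBoost parameter base_estimator instead of estimator.

```python
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier

# AdaBoost: decision stumps (depth 1) as the weak learner
ada = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # base_estimator in older versions
    n_estimators=100,
)

# gradient boosting: deeper trees in every round
gbm = GradientBoostingClassifier(max_depth=5, n_estimators=100)
```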
Linearity
A single AdaBoost tree is an almost linear model because it only splits once. However, predictions are made by combining many of these simple models, so the final result is non-linear anyway.
On the other hand, built trees are already non-linear in gradient boosting, so the final predictions are non-linear as well.
Adoption
Nowadays, gradient boosting is widely adopted in data science competitions. It can even compete with deep learning in many Kaggle challenges.
On the other hand, AdaBoost is a legacy technique and it is not adopted as commonly as gradient boosting. Still, it appears in the face and eye detection module (Haar cascade) of OpenCV.
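If you want to see that legacy in action, OpenCV ships the boosted Haar cascade models with its Python package. The image file name below is hypothetical.

```python
import cv2

# the Haar cascade face model is bundled with the opencv-python package
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_detector = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("photo.jpg")                    # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(faces)                                     # (x, y, w, h) boxes
```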
Conclusion
So, we have covered two important boosting techniques in tree-based machine learning. Both gradient boosting and AdaBoost work like moving a heavy rock: a single weak worker cannot move it, but many weak workers together can. This is the idea behind boosting!
Support this blog if you like it!