XGBoost vs LightGBM

XGBoost and LightGBM are the most popular gradient boosting frameworks nowadays. Of course, there is no absolutely better solution; both frameworks have advantages and disadvantages. In this post, we are going to compare these frameworks and list their pros and cons. In this way, you can form an opinion about the right framework for your case.

Rocky Balboa vs Ivan Drago in Rocky IV (1985)

Vlog

The following vlog video covers the same topics as this post.


🙋‍♂️ You may consider enrolling in my top-rated machine learning course on Udemy:

Decision Trees for Machine Learning

Random Forest vs Gradient Boosting

Both random forest and gradient boosting are ensemble approaches rather than core decision tree algorithms themselves. They require a core decision tree algorithm to run, and they build many decision trees in the background. Both LightGBM and XGBoost cover these approaches, but gradient boosting is the default, as sketched below. We discuss how they are similar and how they differ in the following video.
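
To make that concrete, here is a minimal sketch of how the ensemble type is selected in each framework. The parameter values below are illustrative rather than a full configuration.

#LightGBM and XGBoost
lgb_params = {
    'boosting_type': 'gbdt'  # gradient boosted decision trees are the default; 'rf' switches to a random-forest-style ensemble (requires bagging parameters)
}

xgb_params = {
    'booster': 'gbtree'      # tree-based gradient boosting is the default; random-forest behaviour can be emulated with num_parallel_tree
}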

Growing trees

The main difference between these frameworks is the way they grow trees. XGBoost applies level-wise tree growth whereas LightGBM applies leaf-wise tree growth. The level-wise approach grows the tree horizontally whereas the leaf-wise approach grows it vertically.

Level-wise vs leaf-wise by Felipe Sulser

For example, in the illustration above, XGBoost expands the 1st level of the tree first, and only then expands the 2nd level. On the other hand, LightGBM does not wait to finish the 1st level before expanding child nodes in the 2nd or 3rd level. It goes to the maximum depth vertically.

If we let trees grow fully, then both approaches build the same trees. However, we mostly do not let trees grow fully. We might apply early stopping criteria or pre- or post-pruning to avoid overfitting. Besides, leaf-wise splits nodes based on their contribution to the global loss, whereas level-wise splits based on the contribution to the loss of a particular branch. That is why the two approaches build different trees in practice.

Herein, leaf-wise growth is mostly faster than level-wise growth. That is why LightGBM is almost 10 times faster than XGBoost based on experiments. Even though XGBoost recently started to support the leaf-wise approach in its newer versions, its default approach is still level-wise.
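
If you want to try the leaf-wise approach in XGBoost, the histogram-based tree method exposes a grow_policy parameter. The sketch below is just an illustration; the max_leaves value is picked arbitrarily here.

#XGBoost
params = {
    'tree_method': 'hist',       # grow_policy requires the histogram-based tree method
    'grow_policy': 'lossguide',  # leaf-wise growth; 'depthwise' is the level-wise default
    'max_leaves': 31             # cap on leaves, similar in spirit to LightGBM's num_leaves
}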

On the other hand, leaf-wise growth tends to overfit. That is why XGBoost builds more robust models than LightGBM. XGBoost offers models that are almost 1 or 2 percent more accurate.

Categorical features

Both XGBoost and LightGBM expect you to transform nominal features into numerical ones. However, they split trees based on rules that check whether a value is greater than or equal to a threshold, or less than it.

Suppose that you have a gender feature and set man to 1, woman to 2 and unknown to 0. Gender is a categorical feature, and the assigned values do not express anything about order. I mean that being a woman is not "greater" than being a man and vice versa, even though the new label of woman is greater than the label of man. Moreover, your decision rule might be (if gender >= 2). Its else condition then covers both man and unknown. We do not want this to happen. Rules should cover each gender separately in this case. I mean (if gender == 0, else if gender == 1, else if gender == 2).

Passing gender as a categorical feature to LightGBM handles this. LightGBM splits the gender feature based on equality only.

#LightGBM
import lightgbm

train = lightgbm.Dataset(x, y,
    feature_name = ['Age', 'Gender', 'Weight', 'Height'],
    categorical_feature = ['Gender'])
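
Training on this Dataset then follows the regular flow. The objective below is just an assumption for illustration.

#LightGBM
params = {'objective': 'binary'}  # assumed objective for illustration
model = lightgbm.train(params, train)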

However, XGBoost cannot create rules based on equality alone. We have a workaround for this: if we expand the gender column in the data set into is_man, is_woman and is_unknown columns, these new columns store just 0 and 1 values. In other words, we have to apply one hot encoding for categorical features in XGBoost.

#XGBoost
import pandas as pd

# build one hot columns for each gender class and merge them back into the data set
unique_classes = df['Gender'].unique()
one_hot = pd.get_dummies(unique_classes, prefix = 'Gender')
one_hot['Gender'] = unique_classes
df = df.merge(one_hot, on = ['Gender'], how = 'left')
df = df.drop(columns = ['Gender'])
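
By the way, pandas can do the same expansion in a single call; the following sketch is roughly equivalent for this example.

#XGBoost
df = pd.get_dummies(df, columns = ['Gender'])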

One hot encoding might be easy to code, but it takes a long time for large data sets. I think this is a huge handicap for XGBoost.

However, H2O wraps XGBoost as well, and it supports native handling of categorical features. Besides, its preprocessing module performs multiprocessing.
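
As a rough sketch of what that looks like, assume the data lives in a pandas data frame df with a target column named 'Target'; the feature names are taken from the earlier example. The categorical column just needs to be marked as a factor.

#H2O XGBoost
import h2o
from h2o.estimators.xgboost import H2OXGBoostEstimator

h2o.init()
hf = h2o.H2OFrame(df)                   # convert the pandas data frame to an H2OFrame
hf['Gender'] = hf['Gender'].asfactor()  # mark gender as a categorical (factor) column

model = H2OXGBoostEstimator()
model.train(x = ['Age', 'Gender', 'Weight', 'Height'], y = 'Target', training_frame = hf)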

On the other hand, if you have a feature expressing the weekday, you can set it to values from 1 to 7. In this case, the rule weekday >= 6 covers the weekend and its else condition covers working days. Applying label encoding is fine for ordinal features like this.

Processing unit

The right tool differs based on the processing unit you have.

I said that LightGBM is 10 times faster than XGBoost, but this holds on CPU. If you are going to run these frameworks on GPU, then XGBoost becomes faster than LightGBM.

CPU workers

Besides, running LightGBM on GPU is really problematic. The default installation package does not support GPU. You have to build the GPU distribution yourself, and you might run into trouble during installation.

#LightGBM
!pip install lightgbm --install-option=--gpu
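
Even after installing a GPU-enabled build, you still have to request the GPU explicitly in the training parameters. A minimal sketch, reusing the train Dataset from the categorical features example and an assumed objective:

#LightGBM
params = {
    'objective': 'binary',  # assumed objective for illustration
    'device': 'gpu'         # run training on the GPU build
}
model = lightgbm.train(params, train)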

However, running XGBoost on GPU is easy. You just need to set the GPU-related configuration parameters for training.

#XGBoost
params = {
    'learning_rate': 0.01,
    'n_estimators': 250,
    'objective': 'multi:softmax',  # multi-class objectives also need a num_class parameter
    'nthread': 4,
    'gpu_id': 0,                   # index of the GPU to use
    'tree_method': 'gpu_hist'      # GPU implementation of the histogram algorithm
}
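
Training with these parameters then follows the regular flow; the sketch below assumes x and y are already loaded.

#XGBoost
import xgboost

dtrain = xgboost.DMatrix(x, label = y)
# with the native API, the number of boosting rounds is passed here rather than via n_estimators
model = xgboost.train(params, dtrain, num_boost_round = 250)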

Conclusion

Remember the principle: the right knife for the right job.

The right knife for the job

If you are going to build machine learning models in an enterprise environment, I mean one where you have GPUs and strong CPUs, then you should use XGBoost. It is more scalable than LightGBM.

If you are going to build models in a personal environment, where you do not have a GPU and have limited CPU power, you might use LightGBM in the early stages of your project because it is 10 times faster than XGBoost, and this lets you spend much more time on feature engineering. You should switch your model to XGBoost in the final stages, because feature engineering will be done and XGBoost will build more robust models.

However, if your data set has many categorical features, you might skip the regular XGBoost alternative and consider using H2O XGBoost.

You can find the source code for building a model on the same data set with both XGBoost and LightGBM here.

