How Random Forests Can Keep You From a Decision Tree

The life cycle of a tree begins with a seed. The seed grows and becomes a young plant. Over the years, the young plant is transformed into a tree. Then, trees turn their surroundings into forests. Even giant forests all came from single seeds at first. Rule-based systems follow similar steps.

Forest

Data would be the seed in this life cycle. Growing data becomes a dataset, and that would be the young plant. Decision tree algorithms transform datasets into rule-based trees: the algorithm is applied to the dataset, and the dataset becomes a tree.


🙋‍♂️ You may consider enrolling in my top-rated machine learning course on Udemy: Decision Trees for Machine Learning

Decision tree

Herein, random forest is a newer algorithm derived from decision trees. Instead of applying a decision tree algorithm to the whole dataset, the dataset is separated into random subsets, and the same decision tree algorithm is applied to each subset. The final decision is made by majority vote over the subset results.

Random forest
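
To make the idea concrete, here is a minimal sketch of that bagging-and-voting procedure, assuming scikit-learn is available; the Iris dataset, the subset size and the number of trees are illustrative choices, not part of the original post.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(42)

# train one decision tree per random subset of the data (bagging)
trees = []
for _ in range(5):
    idx = rng.choice(len(X), size=len(X) // 2, replace=True)
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# every tree votes; the class with the highest number of votes wins
votes = np.array([tree.predict(X[:3]) for tree in trees])
print([np.bincount(column).argmax() for column in votes.T])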

Decision Trees are Good, but Random Forests are Better

So, why did the traditional decision tree algorithm evolve into random forests? Working on the whole dataset may cause overfitting. In other words, it might cause memorizing instead of learning. In this case, you would have high accuracy on the training set, but you would fail on new instances.

What's more, random forests work on multiple small subsets of the data. Increasing the dataset size increases the learning time considerably for a single decision tree. Because each tree is trained on its own subset, you can parallelize the learning procedure in a random forest. In this way, learning takes less time than a single decision tree.
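
As an illustration of that parallelism, scikit-learn's RandomForestClassifier can build its trees on all available CPU cores; treat this as a sketch, since the dataset and parameter values here are placeholders rather than recommendations.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# n_jobs=-1 builds the trees in parallel on all cores,
# max_samples keeps each bootstrap subset small
forest = RandomForestClassifier(n_estimators=100, max_samples=0.5, n_jobs=-1)
forest.fit(X, y)
print(forest.score(X, y))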

If the decision tree algorithm were the wise sage around you, then a random forest would be a group of smart people. The sage might know every domain, whereas each smart person can be an expert in a different domain. The sage would most probably give the correct answer, but you might not always have the opportunity to ask him. However, bringing the smart people together would most probably produce an acceptable answer.

You might remember the Voltron cartoon series. That was a legendary cartoon in my childhood, and nowadays Netflix has rebooted it. To sum up, there are 5 different lion robots and they have the ability to fight individually. These robots can also combine and become a giant robot called Voltron. Every episode they try to fight individually, but finally they need to combine and become Voltron to defeat the enemy. So, random forests are just like combining into Voltron!

Voltron is a super robot combined from 5 different robots

Still, there is nothing new under the sun!

Random Forest vs Gradient Boosting

Both random forest and gradient boosting are approaches rather than core decision tree algorithms themselves. They require core decision tree algorithms to run, and they both build many decision trees in the background. So, we will discuss how they are similar and how they are different in the following video.
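
For a quick, hands-on comparison, the sketch below fits both ensembles with scikit-learn; the breast cancer dataset and the tree counts are assumptions chosen only to make the example runnable.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# random forest grows its trees independently, while gradient boosting
# grows them sequentially on the errors of the previous trees
for model in (RandomForestClassifier(n_estimators=100),
              GradientBoostingClassifier(n_estimators=100)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean())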

Python Code

Hands-on coding might help some people understand algorithms better. You can find the Python implementation of the random forest algorithm here. This is Chefboost, and it supports common decision tree algorithms such as ID3, C4.5, CART and regression trees, as well as some bagging methods such as random forest and some boosting methods such as gradient boosting and AdaBoost.
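
As a rough sketch of how a random forest might be trained with Chefboost: the config keys enableRandomForest and num_of_trees, and the convention that the target column is named Decision, follow my reading of the project's documentation and may differ between versions, so please check the repository for the exact interface. The golf.csv file is a hypothetical dataset used only for illustration.

import pandas as pd
from chefboost import Chefboost as chef

# hypothetical dataset whose target column is named 'Decision'
df = pd.read_csv("golf.csv")

config = {
    "algorithm": "C4.5",         # core decision tree algorithm to run
    "enableRandomForest": True,  # bagging: build several trees on subsets
    "num_of_trees": 5,
}

model = chef.fit(df, config)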





There are many ways to support a project – starring the GitHub repos is just one.



