A Step-by-Step CART Decision Tree Example

An algorithm can be transparent only if its decisions can be read and understood by people clearly. Even though deep learning is the superstar of machine learning nowadays, it is an opaque algorithm and we do not know the reasons behind its decisions. Herein, decision tree algorithms still keep their popularity because they can produce transparent decisions. ID3 uses information gain whereas C4.5 uses gain ratio for splitting. CART is an alternative decision tree building algorithm that can handle both classification and regression tasks. For classification tasks, it uses a metric named the gini index to create decision points. In this post, we will walk through a step-by-step CART decision tree example by hand, from scratch.


Vlog

Here, you should watch the following video to understand how decision tree algorithms work. No matter which decision tree algorithm you are running (ID3, C4.5, CART, CHAID or regression trees), they all look for the feature offering the highest information gain. Then, they add a decision rule for that feature and build another decision tree for the sub data set recursively until they reach a decision.



Besides, regular decision tree algorithms are designed to create branches for categorical features. Still, we are able to build trees with continuous and numerical features. The trick here is that we convert continuous features into categorical ones: we split the numerical feature at the point that offers the highest information gain (or, for CART, the lowest gini index).
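If you are curious how that conversion might look in code, here is a minimal sketch (plain Python, with made-up humidity readings) that scans candidate thresholds and keeps the cut point with the lowest weighted gini; an ID3 or C4.5 style tree would maximize gain instead.

def gini(labels):
    # gini = 1 - sum of squared class probabilities
    total = len(labels)
    return 1 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_threshold(values, labels):
    # try every observed value as a cut point and keep the one with the lowest weighted gini
    best = (None, float("inf"))
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        score = len(left) / len(labels) * gini(left) + len(right) / len(labels) * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# made-up numeric humidity readings and decisions, just to demonstrate the idea
humidity = [65, 70, 70, 78, 80, 80, 85, 90, 95, 96]
decision = ["Yes", "No", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "Yes"]
print(best_threshold(humidity, decision))  # prints (80, 0.3166...), i.e. split as Humidity <= 80 vs Humidity > 80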

CART in Python

This blog post explains the CART algorithm in depth, and we will solve a problem step by step. On the other hand, you might just want to run the CART algorithm, and its mathematical background might not attract your attention.

Herein, you can find a Python implementation of the CART algorithm. You can build CART decision trees with a few lines of code. This package supports the most common decision tree algorithms such as ID3, C4.5, CART, CHAID and regression trees, as well as some bagging methods such as random forest and some boosting methods such as gradient boosting and adaboost.
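As a rough usage sketch, the package in question is the author's chefboost library; the snippet below assumes a CSV file (here named golf.txt) holding the data set used in this post, with the target column named Decision. The exact function signatures may differ between versions, so treat this as a sketch and check the package documentation.

import pandas as pd
from chefboost import Chefboost as chef

# assuming golf.txt holds the 14 instances below, with a Decision target column
df = pd.read_csv("golf.txt")

config = {"algorithm": "CART"}   # other decision tree algorithms can be selected here as well
model = chef.fit(df, config=config)

# predict the decision for a single instance (feature values only)
print(chef.predict(model, ["Sunny", "Hot", "High", "Weak"]))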

Objective

Decision rules will be found based on gini index values.

Data set

We will work on the same dataset as in the ID3 example. There are 14 instances of golf playing decisions based on outlook, temperature, humidity and wind factors.

Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No

Gini index

Gini index is a metric for classification tasks in CART. It is calculated by subtracting the sum of the squared class probabilities from one. We can formulate it as illustrated below.

Gini = 1 – Σ (Pᵢ)², for i = 1 to the number of classes
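In code, the formula boils down to a tiny function; a plain Python sketch:

def gini(class_counts):
    # gini = 1 - sum of squared class probabilities
    total = sum(class_counts)
    return 1 - sum((n / total) ** 2 for n in class_counts)

# gini of the whole data set: 9 yes vs 5 no decisions
print(round(gini([9, 5]), 3))  # 0.459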

Outlook

Outlook is a nominal feature. It can be sunny, overcast or rain. I will summarize the final decisions for the outlook feature.





Outlook Yes No Number of instances
Sunny 2 3 5
Overcast 4 0 4
Rain 3 2 5

Gini(Outlook=Sunny) = 1 – (2/5)² – (3/5)² = 1 – 0.16 – 0.36 = 0.48

Gini(Outlook=Overcast) = 1 – (4/4)² – (0/4)² = 0

Gini(Outlook=Rain) = 1 – (3/5)² – (2/5)² = 1 – 0.36 – 0.16 = 0.48

Then, we will calculate the weighted sum of gini indexes for the outlook feature.

Gini(Outlook) = (5/14) x 0.48 + (4/14) x 0 + (5/14) x 0.48 = 0.171 + 0 + 0.171 = 0.342
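These numbers can be double-checked with a few lines of Python, reusing the (yes, no) counts from the table above:

def gini(class_counts):
    total = sum(class_counts)
    return 1 - sum((n / total) ** 2 for n in class_counts)

# (yes, no) counts per outlook value, taken from the table above
outlook = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
total = sum(sum(counts) for counts in outlook.values())  # 14 instances

weighted = sum(sum(c) / total * gini(c) for c in outlook.values())
print(round(weighted, 3))  # 0.343 (the 0.342 above comes from rounding the terms first)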

Temperature

Similarly, temperature is a nominal feature and it can have 3 different values: Cool, Hot and Mild. Let's summarize the decisions for the temperature feature.

Temperature Yes No Number of instances
Hot 2 2 4
Cool 3 1 4
Mild 4 2 6

Gini(Temp=Hot) = 1 – (2/4)² – (2/4)² = 0.5

Gini(Temp=Cool) = 1 – (3/4)² – (1/4)² = 1 – 0.5625 – 0.0625 = 0.375

Gini(Temp=Mild) = 1 – (4/6)² – (2/6)² = 1 – 0.444 – 0.111 = 0.445

We'll calculate the weighted sum of gini indexes for the temperature feature.





Gini(Temp) = (4/14) x 0.5 + (4/14) x 0.375 + (6/14) x 0.445 = 0.142 + 0.107 + 0.190 = 0.439

Humidity

Humidity is a binary class feature. It can be high or normal.

Humidity Yes No Number of instances
High 3 4 7
Normal 6 1 7

Gini(Humidity=High) = 1 – (3/7)² – (4/7)² = 1 – 0.184 – 0.327 = 0.489

Gini(Humidity=Normal) = 1 – (6/7)² – (1/7)² = 1 – 0.735 – 0.020 = 0.245

The weighted sum for the humidity feature is calculated next.

Gini(Humidity) = (7/14) x 0.489 + (7/14) x 0.245 = 0.367

Wind

Wind is a binary class feature similar to humidity. It can be weak or strong.

Wind Yes No Number of instances
Weak 6 2 8
Strong 3 3 6

Gini(Wind=Weak) = 1 – (6/8)² – (2/8)² = 1 – 0.5625 – 0.0625 = 0.375

Gini(Wind=Strong) = 1 – (3/6)² – (3/6)² = 1 – 0.25 – 0.25 = 0.5

Gini(Wind) = (8/14) x 0.375 + (6/14) x 0.5 = 0.428





Time to decide

We've calculated gini index values for each feature. The winner is the outlook feature because it has the lowest gini index.

Feature Gini index
Outlook 0.342
Temperature 0.439
Humidity 0.367
Wind 0.428

We'll put the outlook decision at the top of the tree.
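For completeness, here is a short pandas sketch that reproduces the comparison table above and picks the feature with the lowest gini index:

import pandas as pd

# the 14-instance golf data set from the beginning of the post
data = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
df = pd.DataFrame(data, columns=["Outlook", "Temp.", "Humidity", "Wind", "Decision"])

def weighted_gini(df, feature):
    # weighted sum of gini indexes over the classes of the given feature
    score = 0.0
    for _, group in df.groupby(feature):
        p = group["Decision"].value_counts(normalize=True)
        score += len(group) / len(df) * (1 - (p ** 2).sum())
    return score

scores = {f: round(weighted_gini(df, f), 3) for f in ["Outlook", "Temp.", "Humidity", "Wind"]}
print(scores)                       # {'Outlook': 0.343, 'Temp.': 0.44, 'Humidity': 0.367, 'Wind': 0.429}
print(min(scores, key=scores.get))  # Outlook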

[Figure cart-step-1: the first decision is the outlook feature]

You might realize that the sub dataset in the overcast leaf has only yes decisions. This means that the overcast leaf is done.

[Figure cart-step-2: the tree is complete for the overcast outlook leaf]

We will apply the same principles to the remaining sub datasets in the following steps.

Let's focus on the sub dataset for the sunny outlook. We need to find the gini index scores for the temperature, humidity and wind features, respectively.

Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes

Gini of temperature for sunny outlook

Temperature Yes No Number of instances
Hot 0 2 2
Cool 1 0 1
Mild 1 1 2

Gini(Outlook=Sunny and Temp.=Hot) = 1 – (0/2)² – (2/2)² = 0

Gini(Outlook=Sunny and Temp.=Cool) = 1 – (1/1)² – (0/1)² = 0

Gini(Outlook=Sunny and Temp.=Mild) = 1 – (1/2)² – (1/2)² = 1 – 0.25 – 0.25 = 0.5

Gini(Outlook=Sunny and Temp.) = (2/5) x 0 + (1/5) x 0 + (2/5) x 0.5 = 0.2

Gini of humidity for sunny outlook

Humidity Yes No Number of instances
High 0 3 3
Normal 2 0 2

Gini(Outlook=Sunny and Humidity=High) = 1 – (0/3)² – (3/3)² = 0

Gini(Outlook=Sunny and Humidity=Normal) = 1 – (2/2)² – (0/2)² = 0

Gini(Outlook=Sunny and Humidity) = (3/5) x 0 + (2/5) x 0 = 0

Gini of wind for sunny outlook

Wind Yes No Number of instances
Weak 1 2 3
Strong 1 1 2

Gini(Outlook=Sunny and Wind=Weak) = 1 – (1/3)² – (2/3)² = 0.444

Gini(Outlook=Sunny and Wind=Strong) = 1 – (1/2)² – (1/2)² = 0.5

Gini(Outlook=Sunny and Wind) = (3/5) x 0.444 + (2/5) x 0.5 = 0.266 + 0.2 = 0.466

Decision for sunny outlook

We've calculated gini index scores for each feature when the outlook is sunny. The winner is humidity because it has the lowest value.

Feature Gini index
Temperature 0.2
Humidity 0
Wind 0.466
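These three scores can be verified with a few lines of plain Python, using only the five sunny rows listed above:

# the five sunny-outlook rows: (Temp., Humidity, Wind, Decision)
sunny = [
    ("Hot", "High", "Weak", "No"), ("Hot", "High", "Strong", "No"),
    ("Mild", "High", "Weak", "No"), ("Cool", "Normal", "Weak", "Yes"),
    ("Mild", "Normal", "Strong", "Yes"),
]

def gini(labels):
    return 1 - sum((labels.count(c) / len(labels)) ** 2 for c in set(labels))

def weighted_gini(rows, index):
    # weighted sum of gini indexes over the classes of the feature at the given column index
    score = 0.0
    for value in {r[index] for r in rows}:
        subset = [r[-1] for r in rows if r[index] == value]
        score += len(subset) / len(rows) * gini(subset)
    return score

for name, index in [("Temperature", 0), ("Humidity", 1), ("Wind", 2)]:
    print(name, round(weighted_gini(sunny, index), 3))  # 0.2, 0.0 and 0.467 (0.466 above, due to rounding)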

We'll put a humidity check at the extension of the sunny outlook branch.

[Figure cart-step-3: sub datasets for high and normal humidity]

As seen, the decision is always no for high humidity and sunny outlook. On the other hand, the decision is always yes for normal humidity and sunny outlook. This branch is complete.

[Figure cart-step-4: decisions for high and normal humidity]

Now, we need to focus on the rain outlook.

Rain outlook

Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No

We'll calculate gini index scores for the temperature, humidity and wind features when the outlook is rain.





Gini of temperature for rain outlook

Temperature Yes No Number of instances
Cool 1 1 2
Mild 2 1 3

Gini(Outlook=Rain and Temp.=Cool) = 1 – (1/2)² – (1/2)² = 0.5

Gini(Outlook=Rain and Temp.=Mild) = 1 – (2/3)² – (1/3)² = 0.444

Gini(Outlook=Rain and Temp.) = (2/5) x 0.5 + (3/5) x 0.444 = 0.466

Gini of humidity for rain outlook

Humidity Yes No Number of instances
High 1 1 2
Normal 2 1 3

Gini(Outlook=Rain and Humidity=High) = 1 – (1/2)² – (1/2)² = 0.5

Gini(Outlook=Rain and Humidity=Normal) = 1 – (2/3)² – (1/3)² = 0.444

Gini(Outlook=Rain and Humidity) = (2/5) x 0.5 + (3/5) x 0.444 = 0.466

Gini of wind for rain outlook

Wind Yes No Number of instances
Weak 3 0 3
Strong 0 2 2

Gini(Outlook=Rain and Wind=Weak) = 1 – (3/3)² – (0/3)² = 0

Gini(Outlook=Rain and Wind=Strong) = 1 – (0/2)² – (2/2)² = 0

Gini(Outlook=Rain and Wind) = (3/5) x 0 + (2/5) x 0 = 0

Decision for rain outlook

The winner is the wind feature for the rain outlook because it has the minimum gini index score among the features.





Feature Gini index
Temperature 0.466
Humidity 0.466
Wind 0

Put the wind feature under the rain outlook branch and examine the new sub data sets.

[Figure cart-step-5: sub data sets for weak and strong wind under the rain outlook]

As seen, the decision is always yes when the wind is weak. On the other hand, the decision is always no if the wind is strong. This means that this branch is complete.

[Figure cart-step-6: final form of the decision tree built by the CART algorithm]
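Written as plain rules, the final tree in the figure corresponds to the following small Python function:

# the final CART tree from this post, expressed as nested rules
def predict(outlook, humidity, wind):
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # rain outlook
    return "Yes" if wind == "Weak" else "No"

print(predict("Sunny", "High", "Weak"))   # No  (matches day 1)
print(predict("Rain", "High", "Strong"))  # No  (matches day 14)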

Feature Importance

Decision trees are naturally explainable and interpretable algorithms. Besides, we can extract feature importance values as well to understand how the model works.
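As an illustration, one convenient way to obtain gini-based feature importance values is scikit-learn's DecisionTreeClassifier. The sketch below one-hot encodes the nominal features first (sklearn trees need numeric inputs), so the importances are reported per dummy column rather than per original feature; as discussed in the comments below, sklearn may also grow a different binary tree than the one derived by hand here.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# the same 14-instance golf data set
data = [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
df = pd.DataFrame(data, columns=["Outlook", "Temp.", "Humidity", "Wind", "Decision"])

X = pd.get_dummies(df.drop(columns=["Decision"]))  # one-hot encode the nominal features
y = df["Decision"]

model = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
for name, importance in zip(X.columns, model.feature_importances_):
    print(name, round(importance, 3))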

Gradient Boosting Decision Trees

Nowadays, gradient boosting decision trees are very popular in the machine learning community. They are actually not that different from the decision tree algorithm mentioned in this blog post. They mainly build sequential decision trees based on the errors of the previous round.
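As a rough illustration of that idea, the following toy regression sketch fits each new tree to the residual errors of the current prediction (not a full gradient boosting implementation, just the core loop):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# toy one-dimensional regression data
X = np.arange(10).reshape(-1, 1)
y = np.array([1.0, 1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 8.0, 9.2])

learning_rate = 0.5
prediction = np.zeros_like(y)

for _ in range(20):
    residual = y - prediction                        # errors from the previous round
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    prediction += learning_rate * tree.predict(X)    # each new tree corrects the remaining error

print(np.round(prediction, 2))  # close to y after enough rounds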

Random Forest vs Gradient Boosting

Both random forest and gradient boosting are ensemble approaches rather than core decision tree algorithms themselves. They require a core decision tree algorithm to run, and they build many decision trees in the background. We will discuss how they are similar and how they differ in the following video.

Adaboost vs Gradient Boosting

Both gradient boosting and adaboost are boosting techniques for decision tree based machine learning models. We will discuss how they are similar and how they differ from each other.

Conclusion

So, the decision tree building is complete. We have built a decision tree by hand. By the way, you might realize that we've created exactly the same tree as in the ID3 example. This does not mean that the ID3 and CART algorithms always produce the same trees; we were just lucky. Finally, I believe that CART is easier than ID3 and C4.5, isn't it?
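If you would like to see the whole procedure end to end, the following compact sketch builds the same tree recursively with plain Python (no third-party packages):

from collections import Counter

# the 14-instance golf data set as a list of dicts
columns = ["Outlook", "Temp.", "Humidity", "Wind", "Decision"]
rows = [dict(zip(columns, r)) for r in [
    ("Sunny", "Hot", "High", "Weak", "No"), ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]]

def gini(subset):
    counts = Counter(r["Decision"] for r in subset)
    return 1 - sum((n / len(subset)) ** 2 for n in counts.values())

def weighted_gini(subset, feature):
    score = 0.0
    for value in {r[feature] for r in subset}:
        part = [r for r in subset if r[feature] == value]
        score += len(part) / len(subset) * gini(part)
    return score

def build(subset, features):
    decisions = {r["Decision"] for r in subset}
    if len(decisions) == 1 or not features:          # pure leaf or nothing left to split on
        return Counter(r["Decision"] for r in subset).most_common(1)[0][0]
    best = min(features, key=lambda f: weighted_gini(subset, f))
    return {best: {
        value: build([r for r in subset if r[best] == value],
                     [f for f in features if f != best])
        for value in {r[best] for r in subset}
    }}

print(build(rows, ["Outlook", "Temp.", "Humidity", "Wind"]))
# {'Outlook': {'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Overcast': 'Yes',
#              'Rain': {'Wind': {'Weak': 'Yes', 'Strong': 'No'}}}}  (key order may vary)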




Comments

  1. But Cart Algorithm isn’t used only with max 2 splits? Because it is a binary tree… In this example we can see three split (Sunny, Overcast and Rainy)… How is that possible? Sorry but I don’t understand.

    Thanks

    1. There is no upper limit on the number of classes a feature can have in the CART algorithm. As you mentioned, the classes of the outlook feature are sunny, overcast and rain. We've checked the gini score for the sunny outlook, overcast outlook and rainy outlook respectively. Then, we combine these calculations and calculate the gini score for the outlook feature.

      If you want to build a binary tree, you might still construct a tree for multiple classes as illustrated below.

      if outlook == 'sunny':
          # do something
      else:
          if outlook == 'overcast':
              # do something
          else:  # this case refers to rainy outlook
              # do something

    2. But Cart Algorithm isn’t used only with max 2 splits? Because it is a binary tree… In this example we can see three split (Sunny, Overcast and Rainy)… How is that possible? Sorry but I don’t understand.
      Thanks

        1. For the overcast outlook, gini is 0 (pure subset), hence no sub-split is required.
          Hence the tree as built is correct. Therefore, by default, play golf if the outlook is overcast.

          Any suggestions are welcome on this post.

  2. The blogs are awesome, but I am confused too, because the representation of the
    CART model is a binary tree. Maybe the calculation needs to be slightly adjusted?
    Just a small question.

  3. Hello Sefik,
    I am equally confused with this implementation of the CART algorithm on this dataset. I have modeled this dataset by hand by calculating gini scores and I get the exact same tree as you did, where outlook is selected as the root node with three branches – sunny, overcast and rain. However, when I model the same data using sklearn’s implementation of the CART algorithm, I get a different tree structure with Humidity as the root node. Could you please provide me with an example of this model using sklearn?
    Thank you in anticipation of your assistance
    Bonny

    1. I do not actually know what sklearn does in the background. Possibly they add some modifications to improve accuracy or performance.

  4. Hi,

    I appreciate the efforts you went to creating this informative example, but unfortunately, it’s not really a good example for understanding Breiman’s CART logic. As others posted here, CART is a binary approach.

    If you have categorised data here, then the splits that you should be using are Rain vs Not Rain or Mild vs Not Mild. You should calculate the gini index for each example member of the category i.e. for Sunny/Overcast/Rain & Hot/Mild/Cold etc. Each of these binary splits gives you a value for that member of the category. You then take the minimum of these values as your gini for that attribute. You then compare the lowest gini value for each of the four attributes to decide what should be your first level split. You then have a left and a right quantity.

    With these two new quantities you repeat the exercise e.g. if it was humidity (as suggested above) then you have one set of humid data and one set of not humid data and you use again all the members of a category i.e. Sunny/Overcast/Rain & Hot/Mild/Cold etc. to find the next split. You stop splitting when you have nothing meaningful to gain i.e. all data will fall 50/50 due to the splits.

    There is also the stuff about pruning the tree back up if your decision tree is too deep, but that is no fun for a manually worked example.

    I am also unaware of whether you can do a “weighted gini” of the individual ginis as you have done here.

    “Gini(Outlook) = (5/14) x 0.48 + (4/14) x 0 + (5/14) x 0.48 = 0.171 + 0 + 0.171 = 0.342”.

    This is again, not a binary split of the data. See the section “twoing the data” in Breiman’s book.

    Anyhow, I just wanted to correct what you have written. You have made a good attempt to explain it in a friendly way, unfortunately it is not quite accurate.

    Regards

    Conan

    1. Hello Conan, is there any tutorial that you would recommend solving the (Play Tennis) problem according to your approach?

      1. I'll check the sources I used while writing this blog post. Then, I'll share them here soon.

    2. Interesting of what you said.. do you have any suggestions for good sources that explain this kind of topics in details? thanks.

      1. I put together the sources I found and gave links in the post. Please follow the links. Besides, the tutorials of computer science department professors guided me a lot.

  5. Very neat explanation. Your post doesn't involve lots of math symbols and doesn't confuse the readers like others do. Easy to understand.

  6. Nice solving of the example. But I just have one doubt: what should I do if the values for the attributes are continuous, like the humidity and temperature in the C4.5 example? Do I have to convert them to nominal ones using the gini index as a threshold?

    1. Yes, you have to convert continuous / numeric features to nominal ones, similar to the C4.5 post.

  7. Hi,
    Why use gini index instead of
    PL, PR, P(j|tL),P(j|tR), Q(s\t), and ϕ(s|t) ?
    I see another journal publication use PL, PR, P(j|tL),P(j|tR), Q(s\t), and ϕ(s|t).

    1. The original study uses gini for CART. The studies you've mentioned would be extended approaches.

  8. Thanks for the clear explanation.
    One question: there are possibilities as below that you could face at each level of the split, so how would you choose which feature goes first?
    - In a case where the minimum cost for 2 features is equal, i.e. assume at the 2nd level the data gives you the same minimum gini index for Humidity and Wind and neither feature gives you a straight Yes/No answer
    - In a case where one feature gives the minimum gini index for different branches, i.e. assume Humidity gives you the minimum gini index for Sunny and Rainy at the same time

    1. You can pick randomly if two features or classes have the same scores.
