A Step By Step C4.5 Decision Tree Example

Decision trees are still a hot topic in the data science world. ID3 is the most common conventional decision tree algorithm, but it has bottlenecks: attributes must be nominal, the dataset must not include missing data, and the algorithm tends to fall into overfitting. Ross Quinlan, the inventor of ID3, made improvements for these bottlenecks and created a new algorithm named C4.5. The new algorithm can create more generalized models, including models over continuous data, and it can handle missing values. Additionally, some resources such as Weka name this algorithm J48. It actually refers to a re-implementation of C4.5 release 8.


Vlog

Here, you should watch the following video to understand how decision tree algorithms work. No matter which decision tree algorithm you run (ID3, C4.5, CART, CHAID or regression trees), they all look for the feature offering the highest information gain. Then, they add a decision rule for the found feature and recursively build another decision tree for the sub data set until they reach a decision.



Besides, regular decision tree algorithms are designed to create branches for categorical features. Still, we are able to build trees with continuous and numerical features. The trick here is that we convert continuous features into categorical ones: we split the numerical feature at the point that offers the highest information gain.

C4.5 in Python

This blog post explains the C4.5 algorithm in depth, and we will solve a problem step by step. On the other hand, you might just want to run the C4.5 algorithm without digging into its mathematical background.

Herein, you can find a Python implementation of the C4.5 algorithm. You can build C4.5 decision trees with a few lines of code. This package supports the most common decision tree algorithms such as ID3, CART, CHAID and regression trees, as well as some bagging methods such as random forest and some boosting methods such as gradient boosting and adaboost.
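For instance, training a C4.5 tree with the chefboost package looks roughly like the sketch below. Treat it as an illustration rather than a reference: the exact configuration keys and function signatures may differ between releases, and the file name is just an example.

import pandas as pd
from chefboost import Chefboost as chef

# the target column is assumed to be named 'Decision', as in the golf dataset
df = pd.read_csv("golf2.txt")

config = {"algorithm": "C4.5"}
model = chef.fit(df, config=config)

# predicting day 1 of the dataset: sunny outlook, temperature 85, humidity 85, weak wind
prediction = chef.predict(model, ["Sunny", 85, 85, "Weak"])
print(prediction)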

Objective

Decision rules will be found based on the entropy and information gain ratio of each feature. At each level of the decision tree, the feature having the maximum gain ratio becomes the decision rule.

Data set

We are going to create a decision tree for the following dataset. It records the factors affecting the decision to play tennis outside over the previous 14 days. The dataset might be familiar from the ID3 post. The difference is that the temperature and humidity columns have continuous values instead of nominal ones.

Day Outlook Temp. Humidity Wind Decision
1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No
3 Overcast 83 78 Weak Yes
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
6 Rain 65 70 Strong No
7 Overcast 64 65 Strong Yes
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
10 Rain 75 80 Weak Yes
11 Sunny 75 70 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Weak Yes
14 Rain 71 80 Strong No
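If you want to follow the calculations below in code, the same table can be loaded as a pandas DataFrame. The snippet below is just one way to do it; the column names are my own choice.

import pandas as pd

df = pd.DataFrame({
    "Day": range(1, 15),
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temp": [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71],
    "Humidity": [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80],
    "Wind": ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
             "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "Decision": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes",
                 "Yes", "Yes", "Yes", "Yes", "No"],
})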

We will do what we did in the ID3 example. Firstly, we need to calculate the global entropy. There are 14 examples; 9 instances refer to a yes decision, and 5 instances refer to a no decision.

Entropy(Decision) = ∑ – p(I) . log2p(I) = – p(Yes) . log2p(Yes) – p(No) . log2p(No) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940
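As a quick sanity check, the same number can be reproduced in a couple of lines of Python:

from math import log2

p_yes, p_no = 9 / 14, 5 / 14
entropy = -p_yes * log2(p_yes) - p_no * log2(p_no)
print(round(entropy, 3))  # 0.94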

In ID3 algorithm, we’ve calculated gains for each attribute. Here, we need to calculate gain ratios instead of gains.





GainRatio(A) = Gain(A) / SplitInfo(A)

SplitInfo(A) = -∑ |Dj|/|D| x log2|Dj|/|D|
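These formulas translate almost directly into code. The helper functions below are a sketch with names of my own choosing; each partition is the list of decision labels that falls into one branch of a candidate split.

from collections import Counter
from math import log2

def entropy(labels):
    # H = -sum p_i * log2(p_i) over the classes appearing in labels
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gain(labels, partitions):
    # information gain of a split: parent entropy minus weighted child entropies
    total = len(labels)
    return entropy(labels) - sum((len(p) / total) * entropy(p) for p in partitions)

def split_info(partitions):
    # SplitInfo = -sum |Dj|/|D| * log2(|Dj|/|D|)
    total = sum(len(p) for p in partitions)
    return -sum((len(p) / total) * log2(len(p) / total) for p in partitions if p)

def gain_ratio(labels, partitions):
    return gain(labels, partitions) / split_info(partitions)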

Wind Attribute

Wind is a nominal attribute. Its possible values are weak and strong.

Gain(Decision, Wind) = Entropy(Decision) – ∑ ( p(Decision|Wind) . Entropy(Decision|Wind) )

Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ] – [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]

There are 8 weak wind instances. 2 of them are concluded as no, 6 of them are concluded as yes.

Entropy(Decision|Wind=Weak) = – p(No) . log2p(No) – p(Yes) . log2p(Yes) = – (2/8) . log2(2/8) – (6/8) . log2(6/8) = 0.811

Entropy(Decision|Wind=Strong) = – (3/6) . log2(3/6) – (3/6) . log2(3/6) = 1

Gain(Decision, Wind) = 0.940 – (8/14).(0.811) – (6/14).(1) = 0.940 – 0.463 – 0.428 = 0.049

There are 8 decisions for weak wind, and 6 decisions for strong wind.





SplitInfo(Decision, Wind) = -(8/14).log2(8/14) – (6/14).log2(6/14) = 0.461 + 0.524 = 0.985

GainRatio(Decision, Wind) = Gain(Decision, Wind) / SplitInfo(Decision, Wind) = 0.049 / 0.985 = 0.049
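To verify these figures, the wind split can be computed directly. This is a self-contained sketch; the printed numbers match the values above up to rounding.

from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total) for c in set(labels))

decisions = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
             "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
wind = ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
        "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"]

weak = [d for d, w in zip(decisions, wind) if w == "Weak"]
strong = [d for d, w in zip(decisions, wind) if w == "Strong"]

gain = entropy(decisions) - (len(weak) / 14) * entropy(weak) - (len(strong) / 14) * entropy(strong)
split_info = -(len(weak) / 14) * log2(len(weak) / 14) - (len(strong) / 14) * log2(len(strong) / 14)

print(round(gain, 3), round(split_info, 3), round(gain / split_info, 3))  # roughly 0.048 0.985 0.049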

Outlook Attribute

Outlook is a nominal attribute, too. Its possible values are sunny, overcast and rain.

Gain(Decision, Outlook) = Entropy(Decision) – ∑ ( p(Decision|Outlook) . Entropy(Decision|Outlook) )

Gain(Decision, Outlook) = Entropy(Decision) – p(Decision|Outlook=Sunny) . Entropy(Decision|Outlook=Sunny) – p(Decision|Outlook=Overcast) . Entropy(Decision|Outlook=Overcast) – p(Decision|Outlook=Rain) . Entropy(Decision|Outlook=Rain)

There are 5 sunny instances. 3 of them are concluded as no, 2 of them are concluded as yes.

Entropy(Decision|Outlook=Sunny) = – p(No) . log2p(No) – p(Yes) . log2p(Yes) = -(3/5).log2(3/5) – (2/5).log2(2/5) = 0.441 + 0.528 = 0.970

Entropy(Decision|Outlook=Overcast) = – p(No) . log2p(No) – p(Yes) . log2p(Yes) = -(0/4).log2(0/4) – (4/4).log2(4/4) = 0

Notice that log2(0) is actually equal to -∞, but the 0.log2(0) term is taken as 0 because lim (x->0) x.log2(x) = 0. If you wonder about the proof, please look at this post.

Entropy(Decision|Outlook=Rain) = – p(No) . log2p(No) – p(Yes) . log2p(Yes) = -(2/5).log2(2/5) – (3/5).log2(3/5) = 0.528 + 0.441 = 0.970





Gain(Decision, Outlook) = 0.940 – (5/14).(0.970) – (4/14).(0) – (5/14).(0.970) = 0.246

There are 5 instances for sunny, 4 instances for overcast and 5 instances for rain.

SplitInfo(Decision, Outlook) = -(5/14).log2(5/14) -(4/14).log2(4/14) -(5/14).log2(5/14) = 1.577

GainRatio(Decision, Outlook) = Gain(Decision, Outlook)/SplitInfo(Decision, Outlook) = 0.246/1.577 = 0.155

Humidity Attribute

As an exception, humidity is a continuous attribute. We need to convert its continuous values into nominal ones. C4.5 proposes performing a binary split based on a threshold value. The threshold should be the value that offers the maximum gain for that attribute. Let’s focus on the humidity attribute. Firstly, we need to sort the humidity values from smallest to largest.

Day Humidity Decision
7 65 Yes
6 70 No
9 70 Yes
11 70 Yes
13 75 Yes
3 78 Yes
5 80 Yes
10 80 Yes
14 80 No
1 85 No
2 90 No
12 90 Yes
8 95 No
4 96 Yes

Now, we need to iterate over all humidity values and separate the dataset into two parts: instances less than or equal to the current value, and instances greater than the current value. We calculate the gain or gain ratio for each candidate. The value which maximizes the gain will be the threshold. A short code sketch of this search is given after the threshold summary below.

Check 65 as a threshold for humidity

Entropy(Decision|Humidity<=65) = – p(No) . log2p(No) – p(Yes) . log2p(Yes) = -(0/1).log2(0/1) – (1/1).log2(1/1) = 0

Entropy(Decision|Humidity>65) = -(5/13).log2(5/13) – (8/13).log2(8/13) =0.530 + 0.431 = 0.961

Gain(Decision, Humidity<> 65) = 0.940 – (1/14).0 – (13/14).(0.961) = 0.048





* The statement above refers to what the two branches of the decision tree would be: less than or equal to 65, and greater than 65. It does not mean that humidity is not equal to 65!

SplitInfo(Decision, Humidity<> 65) = -(1/14).log2(1/14) -(13/14).log2(13/14) = 0.371

GainRatio(Decision, Humidity<> 65) = 0.126

Check 70 as a threshold for humidity

Entropy(Decision|Humidity<=70) = – (1/4).log2(1/4) – (3/4).log2(3/4) = 0.811

Entropy(Decision|Humidity>70) =  – (4/10).log2(4/10) – (6/10).log2(6/10) = 0.970

Gain(Decision, Humidity<> 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.940 – 0.231 – 0.692 = 0.014

SplitInfo(Decision, Humidity<> 70) = -(4/14).log2(4/14) -(10/14).log2(10/14) = 0.863

GainRatio(Decision, Humidity<> 70) = 0.016

Check 75 as a threshold for humidity





Entropy(Decision|Humidity<=75) = – (1/5).log2(1/5) – (4/5).log2(4/5) = 0.721

Entropy(Decision|Humidity>75) = – (4/9).log2(4/9) – (5/9).log2(5/9) = 0.991

Gain(Decision, Humidity<> 75) = 0.940 – (5/14).(0.721) – (9/14).(0.991) = 0.940 – 0.2575 – 0.637 = 0.045

SplitInfo(Decision, Humidity<> 75) = -(5/14).log2(5/14) -(9/14).log2(9/14) = 0.940

GainRatio(Decision, Humidity<> 75) = 0.047

I think these calculation demonstrations are enough. From now on, I will skip the calculations and write only the results.

Gain(Decision, Humidity <> 78) =0.090, GainRatio(Decision, Humidity <> 78) =0.090

Gain(Decision, Humidity <> 80) = 0.101, GainRatio(Decision, Humidity <> 80) = 0.107

Gain(Decision, Humidity <> 85) = 0.024, GainRatio(Decision, Humidity <> 85) = 0.027

Gain(Decision, Humidity <> 90) = 0.010, GainRatio(Decision, Humidity <> 90) = 0.016





Gain(Decision, Humidity <> 95) = 0.048, GainRatio(Decision, Humidity <> 95) = 0.128

Here, I ignore the value 96 as a threshold because it is the maximum humidity value in the dataset; no instance can fall on the greater-than side of that split.

As seen, the gain is maximized when the threshold is equal to 80 for humidity. This means that, when we compare attributes to create a branch, humidity will be represented by this binary split at 80.
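The whole threshold search can be written as a short brute-force loop. The sketch below uses only the gain metric, and it reproduces 80 as the best humidity split.

from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total) for c in set(labels))

humidity = [85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80]
decisions = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
             "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

base = entropy(decisions)
best_threshold, best_gain = None, -1.0
for threshold in sorted(set(humidity))[:-1]:  # the maximum value 96 is skipped
    left = [d for h, d in zip(humidity, decisions) if h <= threshold]
    right = [d for h, d in zip(humidity, decisions) if h > threshold]
    g = base - (len(left) / 14) * entropy(left) - (len(right) / 14) * entropy(right)
    if g > best_gain:
        best_threshold, best_gain = threshold, g

print(best_threshold, round(best_gain, 3))  # 80 0.102 (0.101 with the rounding used above)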

The temperature feature is continuous as well. When I apply a binary split to temperature at every possible split point, the following decision rule maximizes both the gain and the gain ratio.

Gain(Decision, Temperature <> 83) = 0.113, GainRatio(Decision, Temperature<> 83) = 0.305

Let’s summarize the calculated gains and gain ratios. The outlook attribute comes with the maximum gain, whereas temperature comes with the maximum gain ratio.

Attribute Gain GainRatio
Wind  0.049  0.049
Outlook  0.246  0.155
Humidity <> 80  0.101  0.107
Temperature <> 83  0.113  0.305

If we use the gain metric, then outlook will be the root node because it has the highest gain value. On the other hand, if we use the gain ratio metric, then temperature will be the root node because it has the highest gain ratio value. I prefer to use gain here, as in ID3. As homework, please try to build a C4.5 decision tree based on the gain ratio metric.
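Picking the root is then just an argmax over the summary table. The small sketch below shows how the choice changes with the metric.

summary = {
    "Wind": {"gain": 0.049, "gain_ratio": 0.049},
    "Outlook": {"gain": 0.246, "gain_ratio": 0.155},
    "Humidity <> 80": {"gain": 0.101, "gain_ratio": 0.107},
    "Temperature <> 83": {"gain": 0.113, "gain_ratio": 0.305},
}

print(max(summary, key=lambda a: summary[a]["gain"]))        # Outlook
print(max(summary, key=lambda a: summary[a]["gain_ratio"]))  # Temperature <> 83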

From then on, we apply similar steps just as in ID3 and create the following decision tree. Outlook is put into the root node. Now, we should look at the decisions for each outlook value.

Outlook = Sunny

We’ve split humidity into greater than 80, and less than or equal to 80. Surprisingly, the decision is always no when humidity is greater than 80 and outlook is sunny. Similarly, the decision is always yes when humidity is less than or equal to 80 for a sunny outlook.

Day Outlook Temp. Hum. > 80 Wind Decision
1 Sunny 85 Yes Weak No
2 Sunny 80 Yes Strong No
8 Sunny 72 Yes Weak No
9 Sunny 69 No Weak Yes
11 Sunny 75 No Strong Yes

Outlook = Overcast

If outlook is overcast, then no matter what temperature, humidity or wind are, the decision will always be yes.





Day Outlook Temp. Hum. > 80 Wind Decision
3 Overcast 83 No Weak Yes
7 Overcast 64 No Strong Yes
12 Overcast 72 Yes Strong Yes
13 Overcast 81 No Weak Yes

Outlook = Rain

We’ve just filtered the rain outlook instances. As seen, the decision is yes when wind is weak, and no when wind is strong.

Day Outlook Temp. Hum. > 80 Wind Decision
4 Rain 70 Yes Weak Yes
5 Rain 68 No Weak Yes
6 Rain 65 No Strong No
10 Rain 75 No Weak Yes
14 Rain 71 No Strong No

The final form of the decision tree is demonstrated below.

c45-result
Decision tree generated by C4.5
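The same tree can also be expressed as nested decision rules, in the style of the rule set shown in the bonus section below. This is a sketch of the gain based tree derived above; the function name is my own choice.

def find_decision(outlook, humidity, wind):
    # gain-based C4.5 tree: outlook at the root, humidity <> 80 under sunny, wind under rain
    if outlook == "Overcast":
        return "Yes"
    elif outlook == "Sunny":
        return "No" if humidity > 80 else "Yes"
    elif outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"

# day 1: sunny outlook, humidity 85, weak wind -> No
print(find_decision("Sunny", 85, "Weak"))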

Feature Importance

Decision trees are naturally explainable and interpretable algorithms. Besides, we can extract feature importance values to understand how the model works.

Gradient Boosting Decision Trees

Nowadays, gradient boosting decision trees are very popular in the machine learning community. They are not really different from the decision tree algorithm mentioned in this blog post. They mainly build sequential decision trees based on the errors of the previous round.

Random Forest vs Gradient Boosting

Both random forest and gradient boosting are approaches rather than core decision tree algorithms themselves. They require running a core decision tree algorithm, and they build many decision trees in the background. We discuss how they are similar and how they differ in the following video.

Adaboost vs Gradient Boosting

Both gradient boosting and adaboost are boosting techniques for decision tree based machine learning models. We discuss how they are similar and how they differ from each other.

Conclusion

So, the C4.5 algorithm solves most of the problems in ID3. The algorithm can use gain ratios instead of gains. In this way, it creates more generalized trees and does not fall into overfitting. Moreover, the algorithm transforms continuous attributes into nominal ones based on gain maximization, and in this way it can handle continuous data. Additionally, it can ignore instances including missing data and thus handle missing values. On the other hand, both ID3 and C4.5 require high CPU and memory. Besides, many authorities place decision tree algorithms in the data mining field instead of machine learning.

Bonus

In this post, we have used the gain metric to build a C4.5 decision tree. If we use gain ratio as the decision metric, then the built decision tree looks different, as shown by the following decision rules.

def findDecision(Outlook, Temperature, Humidity, Wind):
   if Temperature <= 83:
      if Outlook == 'Rain':
         if Wind == 'Weak':
            return 'Yes'
         elif Wind == 'Strong':
            return 'No'
      elif Outlook == 'Overcast':
         return 'Yes'
      elif Outlook == 'Sunny':
         if Humidity > 65:
            if Wind == 'Strong':
               return 'No'
            elif Wind == 'Weak':
               return 'No'
   elif Temperature > 83:
      return 'No'

I ask you to work through the gain ratio metric as homework to understand the C4.5 algorithm better. The gain ratio based C4.5 decision tree looks like the decision rules above.








81 Comments

  1. GainRatio(Decision, Humidity 65) = 0.126
    GainRatio(Decision, Humidity 80) = 0.107

    How comes you took into account a threshold of 80 when GainRatio for 65 is higher?

    1. In case of Humidity<=80, there are 2 no and 7 yes decisions. Total number of instances is 9
      Entropy(Decision|Humidity<=80) = – p(No) . log2p(No) – p(Yes) . log2p(Yes) = - (2/9) * log(2/9) - (7/9)*log(7/9) = 0.764 (BTW, log refers to the base 2)
      In case of Humidity>80, there are 3 no and 2 yes decisions. Total number of instances is 5
      Entropy(Decision|Humidity>80) = – p(No) . log2p(No) – p(Yes) . log2p(Yes) = -(3/5)*log(3/5) – (2/5)*log(2/5) = 0.971

      Global entropy was calculated as 0.940 in previous steps

      Now, it is time to calculate Gain.
      Gain(Decision, Humidity<> 80) = Entropy(Decision) – p(Humidity<=80) * Entropy(Decision|Humidity<=80) - p(Humidity>80)*Entropy(Decision|Humidity>80)
      Gain(Decision, Humidity<> 80) = 0.940 – (9/14)*0.764 – (5/14)*0.971 = 0.101

      Now, we can calculate GainRatio but before we need to calculate SplitInfo first.

      SplitInfo(Decision, Humidity<> 80) = -p(No)*log(p(No)) – p(Yes)*log(P(Yes)) = -(9/14)*log(9/14) – (5/14)*log(5/14) = 0.940

      GainRatio = Gain / SplitInfo = 0.101 / 0.940 = 0.107

      I hope this explanation is understandable.

      1. I don’t understand why we choose 80 its gain ratio is only 0.107
        for 65 its 0.126
        and 0.107<0.126 ?

        1. If we would use gain ratio, then you will be right. Humidity=65 (0.126) is more dominant than Humidity=80 (0.107).
          But I use information gain instead of gain ratio here. In this case, Humidity=80 (0.101) is more dominant than Humidity= 65 (0.048).

          If you ask that why information gain is used instead of gain ratio, it is all up to you. You might want to use gain ratio, and herein 65 will be your choice.

  2. You have the following
    Gain(Decision, Humidity 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.940 – 0.231 – 0.692 = 0.014

    there are 3 values of 70 so surely P(Humidity 70) = P( No) = 3/14 not 4/14

    1. Here, 4 is the number of instances which are less than or equal to 70. Humidity of instances for day 6, 7, 9, 11 are less than or equal to 70.

      Similarly, 10 is the number of instances which are greater than 70. Number of instances greater than 70 is 10.

  3. I’m sorry I posted the wrong reference. It should have been
    Gain(Decision, Humidity 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.940 – 0.231 – 0.692 = 0.014

    so,
    there are 3 values of 70 so surely P(Humidity 70) = P( No) = 3/14 not 4/14
    thanks for your attention

    1. You were right if we need Gain(Decision, Humidity = 70) but we need Gain(Decision, Humidity <= 70). Gain(Decision, Humidity ? 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.940 – 0.231 – 0.692 = 0.014 In this equation 4/14 is probability of instances less than or equal to 70, and 10/14 is probability of instances greater than 70.

      1. I do understand that but the statement is ‘not equal’ to 70, not less than or equal. If the objective is to have values less than or equal and values greater than then the calculation is that of the global entropy, i.e. all values surely.

        1. Right, cause of poor communication. Actually, I would not intent as your understanding. I should mention that in the post.

          The statement Gain(Decision, Humidity ? 70) refers to that what would be if the branch of decision tree is for less than or equal to 70, and greater than 70. All calculations made with this approach.

          I hope everything is clear now. Thank you for your attention.

  4. Looks as if the editor loses the not-equal sign, hence the poor communication..
    The expression in question is
    Gain(Decision, Humidity¬= 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.940 – 0.231 – 0.692 = 0.014

    There being 3 instances of 70 so 3 instances where P(Humidity¬=70) is false, i.e. P(No) = 3/14

  5. I’m sorry to have to return to the point made by Dinca Andrei but I think the confusion arises from a statement in your blog
    The value which maximizes the gain would be the threshold.

    Is it not the case that the threshold is the value that minimises Entropy(Decision|Humidity threshold)
    Here are some calculations, which if taken with the ones you perform, does show 80 as the splitting point – Entropy(Decision|Humidity 80) is the least value

    le 65 0
    > 65 0.961237
    65 0.892577
    g ratio 0.047423
    split 0.371232
    g ratio 0.127745

    le 78 0.650022
    >78 1
    78 0.85001
    g 0.08999
    split 0.985228
    g ratio 0.09134

    le 80 0.764205
    >80 0.970951
    80 0.838042
    g 0.101958
    spli 0.940286
    g ratio 0.108433

    le 85 0.881291
    >85 1
    85 0.915208
    g 0.024792
    split 0.863121
    g ratio 0.028724

    Thanks for your attention.

    1. Yes, you are absolutely right. I summarized gain and gain ratios for every possible threshold point.

      branch-> 65 70 75 78 80 85 90
      gain 0.048 0.014 0.045 0.09 0.101 0.024 0.01
      gain ratio 0.126 0.016 0.047 0.09 0.107 0.027 0.016

      I stated that “We would calculate the gain OR gain ratio for every step. The value which maximizes the gain would be the threshold.”. Now, it is all up to you to decide threshold point based on gain or gain ratio. If prefer to use gain and my threshold would be 80. If you prefer to use gain ratio metric, your threshold would be 65. The both approaches are correct.

  6. HI ,, are this algorithm good for a large database , like a dataset for large manufacture
    Thank you

    1. Decision tree algorithms require high memory demand. You should look its extended version – random forests, this might be adapted better for your problem

    1. It’s all up to you. You can use either information gain or gain ratio. In this case, using information gain is my choice.

  7. why when i tried your chefboost on github, i got an error

    File “Chefboost.py”, line 81
    print(header,end=”)
    ^
    SyntaxError: invalid syntax

    how to solve this, thanks

      1. ah sorry,i used python 2.x

        i have another question, so in the end ‘num_of_trees’ variable is not used? i thought we can use the variable for limit the number of leaf tree

        sorry if I am wrong in reading the code

        1. The variable number of trees is using in random forest. Regular tree algorithms such as id3 or c4.5 will create a single tree.

  8. Hi
    Thank you for this excellent description .I just have a question , how I can calculate accuracy of training data?
    thank you

    1. When you run a decision tree algorithm, it builds decision rules. For example, I use C4.5 algorithm and the data set https://github.com/serengil/chefboost/blob/master/dataset/golf2.txt , the following rules created.

      def findDecision(Outlook,Temperature,Humidity,Wind):
      …if.Temperature<=83:
      ……if Outlook==’Rain’:
      ………if Wind==’Weak’:
      …………return ‘Yes’
      ………if Wind==’Strong’:
      …………return ‘No’
      ……if Outlook==’Overcast’:
      ………return ‘Yes’
      ……if Outlook==’Sunny’:
      ………if Humidity>65:
      …………if Wind==’Weak’:
      ……………return Yes’
      …………if Wind==’Strong’:
      ……………return ‘Yes’
      …if Temperature>83:
      ……return ‘No’

      Now, I check all instances in the same data set. Let’s focus on the first instance.

      Outlook,Temp.,Humidity,Wind,Decision
      Sunny,85,85,Weak,No

      Decision rules say that prediction is no because temperature is 85 and it is greater than 83. On the other hand, decision column says actual value and it is no, too. This instance is classified correctly. I will apply same procedures for all instances and check how many of them are classified correctly. Dividing the correctly classified instances to all instances will be the accuracy on training set.

      Similarly, if you apply these decision rules for a data set haven’t been fed to training set will be you test accuracy.

      I strongly recommend you to run this python code to understand clear: https://github.com/serengil/chefboost

  9. i drew decision tree using info gain…it came perfectly… branches were more. while I drew decision tree using gain ratio… the tree is pruned and I cannot solve it further because I get gain ratio either more than 1.0 or in -ve. while splitting, one I got [0,0,2](classified as III class) and another is [0,49,3] (not classified fully, can I finalize the branch as II class). is it correct to stop abruptly, telling class-II as a leaf node?

      1. for a particular dataset i got different decision tree for id3 and c4.5(output of both id3 and c4.5 decision tree are not the same);
        while using c4.5, i calculated a gain ratio, i draw decision tree using it, but suddenly i started to get gain ratio values either more than 1.0 or in -ve. can i stop drawing decision tree while i get values of gain ratio like that?

          1. Cross validation is not currently supported in chefboost. I plan to add this feature in the next release.

      1. I mean you didn’t calculate Entropy, Gain, Split Info, and Gain Ratio for Temp. attribute. In summarize calculated Gain and Gain Ratios table you only provide Wind, Outlook, and Humidity 80 attributes. I confused because i have calculated all attributes and i found Gain = 0.113401 and Gain Ratio = 0.305472 for Temp. 83 which makes Temp. 83 comes with both maximized Gain and Gain Ratio and should be the root node ?

        1. Right, temp is missing and I’ve added it to summary table. But in this post I check gain instead of gain ratio and outlook’s gain is 0.246 and it is greater than the gain of temperature – it was 0.113 as you mentioned.

          1. Just to make sure is the exact rule are use max Gain to choose continuous attribute value from binary split process and then use max Gain Ratio for determine the root and branches ?

          2. Nope, you can either choose max gain or gain ratio to determine the root node. Both are true but you should choose one.

          3. So if i choose to use Gain i should use Gain to choose continuous attribute value from binary split process and use Gain too for determine the root node but, if choose to use Gain Ratio i should use Gain Ratio in choose continuous attribute value from binary split and determine the root node ?

          4. One more question sir, which one should i choose when there is duplicate Gain or Gain Ratio, in this case i try use Gain Ratio metric and max Gain Ratio for Humidity is 0.128515 where it’s in Humidity 65 and Humidity 95

          5. Interesting case. I haven’t have a similar case before. You might try to find max gain ratio, if there is a duplicate record, then you might find the highest gain in this case. Because, both metrics are meaningful. If duplication still appears, then you might choose the first one.

        2. Additionally, I’ve put what would built decision tree look like if we use gain ratio metric. As you mentioned, temperature would be in the root node in this case.

  10. Thanks for the post.
    I don’t understand one thing – you have calculated the best split for continuous values (humidity and temperature) globally. However, when the root node is Outlook, shouldn’t you calculate the best split again for the part of the tree we are already in?

    For example, in Figure 1, I think we should calculate the optimal split for the Humidity assuming that Outlook == Sunny, because we are already in a subspace defined by Outlook | Sunny.

    I’ve implemented your algorithm in my version and got the same tree apart from the value that Humidity should be splat on 🙂

    1. Hello, we already calculated the best splits again when outlook is the root node.

  11. 1. if parent entropy is less than child entropy can we split the child node (can child node become subtree)?
    2. if one branch is pure set(4 yes/ 0 no) and another branch
    impure(3 yes / 3 no). what should we do now. can we split the impure node(3 yes/3no) when the parent entropy is less than child entropy?

    1. 2- I mostly return the first class for impure branches.
      1- Parent and child nodes are totally different. You should not split child node.

      1. Thank you so much for your immediate response. still more i shall ask question later. i just need some confirmation. so that i need to ask question. Thank you for your reply in advance.

  12. after selecting outlook attribute as root node..
    we have three branches from outlook that is “sunny” ,” overcast” and “Rain” right…
    After this the next step you do, when you consider “outlook=sunny” then how table becomes short??? I cannot understand?

    1. I filtered the raw data set satisfying outlook==sunny instances. Then, I will calculate gain / gain ratio pair for this filtered data set.

    1. No, it should not. Because adaboost uses a weak classifier (decision stamps) but C4.5 is strong enough.

    1. Suppose that your column has values in scale of [0, +∞]. If you set -1 to missing values, then the algorithm can handle to build decision trees now. To sum up, find the minimum value in that column, and set a smaller value to missing ones.

  13. In the Gain ratio based C4.5 decision tree in the ‘Bonus’ section, I think the tree is wrong. In “outlook=sunny” -> “humidity>65” -> “wind=strong” -> “return no”, and also in “outlook=sunny” -> “humidity>65” -> “wind=weak” -> “return no”, I would like to tell you that there is 1 ‘yes’ decision each for wind=strong and wind=weak.

    Please have a look at the dataset and let me know if the decision tree given in ‘Bonus’ section perfectly works for the given dataset or not.

    1. The tree in the bonus section was built with information gain instead of gain ratio. It is all up to you choosing the metric. I mean that your tree is valid as well.

  14. Thank you for your quick response of my previous question. I want to apply chefboost for analysis of different algorithms, how to apply it for confusion matrix with training and testing dataset?

    1. You need to call its prediction function extract confusion matrix by yourself. As I mentioned, I should add this feature in the next release.

  15. May I know how the decision tree would look like if gain ratio metric is used? Temperature would be the root node because it has the highest gain ratio metric but how does it proceed from there?


  17. Hey Sefik, I really appreciate your effort on this wonderful library. Anyway, Is there any walkaround when testing using a raw, undiscritized test data? I’ve constructed the decision tree using a discritized train data (using Fayyad-Irani’s EBB) and when I tried to run a test on it, I got some error message instead.

    PS: For your information, I’m using traffic flow data so it’s all number.

    Thank you

  18. Sir, I have a doubt regarding continuous dataset [Iris dataset], threshold value is found for continuous dataset. Then, based on the threshold value, which gives the highest gain ratio, the decision node is selected and the branches hold the value of threshold ‘less than or equal to [svalue]’ and ‘ greater than [svalue]’. My question is should we remove the best_attribute once selected or can we have that same attribute for next iteration also. [In ID3 we have to remove the best attribute once selected as decision node but in C4.5 algorithm that too in continuous dataset; what should we do sir?]

  19. Sir, I am not good in Coding specially in Python . I don’t even know which code i need to write to get Split Info and Gain Ratio as well as Max Gain ratio.I tried non-numerical values dataset to implement C4.5 Algorithm. Can you please help this out. 🙏

  20. Hi, I have a question. If say the rules.py file is very large then do you have any method from which I can get a decision tree. as in any packages which can directly give the decision tree.

Comments are closed.