A Step by Step ID3 Decision Tree Example

Decision tree algorithms transform raw data into rule-based decision trees. Herein, ID3 is one of the most common decision tree algorithms. It was first introduced in 1986, and its name is an acronym for Iterative Dichotomiser 3. Dichotomisation means dividing into two completely opposite things. The algorithm iteratively divides the attributes into two groups, the most dominant attribute and the rest, to construct a tree. First, it calculates the entropy and information gain of each attribute and finds the most dominant one. The most dominant attribute is then placed in the tree as a decision node. After that, entropy and gain scores are calculated again among the remaining attributes to find the next most dominant attribute. This procedure continues until a decision is reached for every branch. That is why it is called Iterative Dichotomiser.


For instance, the following table records the factors behind the decision to play tennis outside over the previous 14 days.

Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
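
For readers who want to follow the calculations in code, here is a minimal sketch of the same 14-row dataset as a plain Python list of dictionaries; the key names (Outlook, Temp, Humidity, Wind, Decision) are just illustrative choices for this sketch.

```python
# The play-tennis dataset from the table above, one dictionary per day.
dataset = [
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Decision": "No"},
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "Decision": "No"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Decision": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Decision": "Yes"},
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Decision": "Yes"},
    {"Outlook": "Rain",     "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Decision": "No"},
    {"Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Decision": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Decision": "No"},
    {"Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Decision": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Decision": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "Decision": "Yes"},
    {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Decision": "Yes"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "Decision": "Yes"},
    {"Outlook": "Rain",     "Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Decision": "No"},
]
```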

The ID3 algorithm is based on the following two formulas.

Entropy(S) = ∑ – p(I) . log2p(I)

Gain(S, A) = Entropy(S) – ∑ [ p(S|A) . Entropy(S|A) ]

These formulas might look confusing at first; working through an example makes them much easier to understand.
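
As a rough illustration of the two formulas, they can be written as short Python helpers; the names entropy and information_gain below are hypothetical choices for this sketch, not part of the original post.

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Entropy(S) = sum over classes of -p(class) * log2(p(class))
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, attribute, target="Decision"):
    # Gain(S, A) = Entropy(S) - sum over values of A of p(value) * Entropy(subset)
    gain = entropy([row[target] for row in rows])
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        gain -= (len(subset) / len(rows)) * entropy(subset)
    return gain

# Quick check: 9 yes and 5 no labels give the overall entropy computed below.
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))  # 0.94
```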

Entropy

Initially, we need to calculate the overall entropy.

The decision column consists of 14 instances and includes two labels: yes and no. There are 9 decisions labeled yes and 5 labeled no.

Entropy(Decision) = – p(Yes) . log2p(Yes) – p(No) . log2p(No)

Entropy(Decision) = – (9/14) . log2(9/14) – (5/14) . log2(5/14) = 0.940

Now, we need to find the most dominant factor for the decision.

Wind factor on decision

Gain(Decision, Wind) = Entropy(Decision) – ∑ [ p(Decision|Wind) . Entropy(Decision|Wind) ]

The wind attribute has two labels: weak and strong. We reflect these in the formula.

Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ] – [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]

We need to calculate Entropy(Decision|Wind=Weak) and Entropy(Decision|Wind=Strong) respectively.

Weak wind factor on decision

Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
13 Overcast Hot Normal Weak Yes

There are 8 instances of weak wind. The decision is no for 2 of them and yes for the remaining 6.

Entropy(Decision|Wind=Weak) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)

Entropy(Decision|Wind=Weak) = – (2/8) . log2(2/8) – (6/8) . log2(6/8) = 0.811

Strong wind factor on decision

Day Outlook Temp. Humidity Wind Decision
2 Sunny Hot High Strong No
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
14 Rain Mild High Strong No

There are 6 instances of strong wind. The decisions are split into two equal parts: 3 yes and 3 no.

Entropy(Decision|Wind=Strong) = – p(No) . log2p(No) – p(Yes) . log2p(Yes)

Entropy(Decision|Wind=Strong) = – (3/6) . log2(3/6) – (3/6) . log2(3/6) = 1

Now, we can return to the Gain(Decision, Wind) equation.

Gain(Decision, Wind) = Entropy(Decision) – [ p(Decision|Wind=Weak) . Entropy(Decision|Wind=Weak) ] – [ p(Decision|Wind=Strong) . Entropy(Decision|Wind=Strong) ]

Gain(Decision, Wind) = 0.940 – [ (8/14) . 0.811 ] – [ (6/14) . 1 ] = 0.048
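
As a small self-contained sanity check, the same numbers can be reproduced with a few lines of Python; entropy_from_counts here is just a variant of the earlier helper that takes class counts directly.

```python
from math import log2

def entropy_from_counts(counts):
    # Entropy from a list of class counts, e.g. [yes_count, no_count].
    total = sum(counts)
    return -sum((n / total) * log2(n / total) for n in counts if n)

e_all = entropy_from_counts([9, 5])      # overall decision: 9 yes, 5 no -> 0.940
e_weak = entropy_from_counts([6, 2])     # weak wind: 6 yes, 2 no        -> 0.811
e_strong = entropy_from_counts([3, 3])   # strong wind: 3 yes, 3 no      -> 1.0

gain_wind = e_all - (8 / 14) * e_weak - (6 / 14) * e_strong
print(round(gain_wind, 3))  # 0.048
```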

The calculations for the wind column are done. Now, we need to apply the same calculations to the other columns to find the most dominant factor in the decision.

Other factors on decision

The following gain values are obtained when similar calculations are applied to the other columns.

Gain(Decision, Outlook) = 0.246

Gain(Decision, Temperature) = 0.029

Gain(Decision, Humidity) = 0.151

As seen, the outlook attribute produces the highest gain. That is why outlook will appear in the root node of the tree.
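
These four gains can also be reproduced in one loop, reusing the hypothetical dataset list and information_gain helper sketched earlier; the printed values differ from the ones above only in the last rounded digit.

```python
# Assumes `dataset` and `information_gain` from the earlier sketches.
for attribute in ["Outlook", "Temp", "Humidity", "Wind"]:
    print(attribute, round(information_gain(dataset, attribute), 3))
# Roughly: Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048
```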

[Figure: root decision node of the tree]

Now, we need to repeat the procedure on the subsets of the dataset for each value of the outlook attribute.
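
This "pick the best attribute, then recurse on each subset" step is the core of ID3. As a rough sketch of that recursion (assuming the dataset, the Counter import, and the information_gain helper from the earlier snippets):

```python
def build_tree(rows, attributes, target="Decision"):
    labels = [row[target] for row in rows]
    # Every row carries the same label: this branch is decided.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: fall back to the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    branches = {}
    for value in {row[best] for row in rows}:
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, remaining, target)
    return {"attribute": best, "branches": branches}

tree = build_tree(dataset, ["Outlook", "Temp", "Humidity", "Wind"])
```

The rest of the post walks through what this recursion does by hand, one outlook value at a time.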

Overcast outlook on decision

The following table reveals that the decision is always yes when the outlook is overcast.

Day Outlook Temp. Humidity Wind Decision
3 Overcast Hot High Weak Yes
7 Overcast Cool Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes

Sunny outlook on decision

Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes

There are 5 instances of sunny outlook: 3 of the decisions are no and 2 are yes. The entropy of this subset is – (3/5) . log2(3/5) – (2/5) . log2(2/5) = 0.971.

Gain(Outlook=Sunny|Temperature) = 0.570

Gain(Outlook=Sunny|Humidity) = 0.970

Gain(Outlook=Sunny|Wind) = 0.019

The next decision node is humidity, because it produces the highest score when the outlook is sunny.
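
The sunny-branch gains can be checked with the same loop idea, again reusing the hypothetical dataset and information_gain from the earlier sketches.

```python
# Gains computed only on the rows where the outlook is sunny.
sunny = [row for row in dataset if row["Outlook"] == "Sunny"]
for attribute in ["Temp", "Humidity", "Wind"]:
    print(attribute, round(information_gain(sunny, attribute), 3))
# Roughly: Temp 0.571, Humidity 0.971, Wind 0.020
```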

The decision is always no if the humidity is high.

Day Outlook Temp. Humidity Wind Decision
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
8 Sunny Mild High Weak No

The decision is always yes if the humidity is normal.

Day Outlook Temp. Humidity Wind Decision
9 Sunny Cool Normal Weak Yes
11 Sunny Mild Normal Strong Yes

This means that when the outlook is sunny, we need to check the humidity to reach a decision.

Rain outlook on decision

Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No

There are 5 instances of rain outlook: 3 of the decisions are yes and 2 are no, so the entropy of this subset is 0.971. Applying the same gain calculations to this subset gives:

Gain(Outlook=Rain | Temperature) = 0.020

Gain(Outlook=Rain | Humidity) = 0.020

Gain(Outlook=Rain | Wind) = 0.971

In this case, wind produces the highest score when the outlook is rain. That is why we need to check the wind attribute at the second level when the outlook is rain.
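
The same check works for the rain subset, again as a sketch reusing the earlier dataset and information_gain.

```python
# Gains computed only on the rows where the outlook is rain.
rain = [row for row in dataset if row["Outlook"] == "Rain"]
for attribute in ["Temp", "Humidity", "Wind"]:
    print(attribute, round(information_gain(rain, attribute), 3))
# Roughly: Temp 0.020, Humidity 0.020, Wind 0.971
```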

The subset reveals that the decision is always yes if the wind is weak and the outlook is rain.

Day Outlook Temp. Humidity Wind Decision
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes

Also, the decision is always no if the wind is strong and the outlook is rain.

Day Outlook Temp. Humidity Wind Decision
6 Rain Cool Normal Strong No
14 Rain Mild High Strong No

So, the decision tree looks like the following illustration.

[Figure: final version of the decision tree]
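
To make the final tree concrete, here is one possible way to encode it as nested dictionaries with a small prediction helper; the structure and the predict function are illustrative choices, not part of the original post.

```python
# The tree derived above: outlook at the root, humidity under sunny, wind under rain.
decision_tree = {
    "attribute": "Outlook",
    "branches": {
        "Overcast": "Yes",
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Rain": {"attribute": "Wind",
                 "branches": {"Weak": "Yes", "Strong": "No"}},
    },
}

def predict(tree, row):
    # Walk the nested dictionaries until a leaf label (a plain string) is reached.
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attribute"]]]
    return tree

print(predict(decision_tree, {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```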

So, decision tree algorithms transform raw data into a rule-based mechanism. In this post, we have covered one of the most common decision tree algorithms, named ID3. Unlike many other machine learning algorithms, decision trees can work with nominal attributes directly; on the other hand, numeric attributes have to be transformed into nominal ones for ID3. Even though decision trees have a powerful learning capability, they have long training times and tend to overfit. Their evolved versions, named random forests, tend not to overfit and have shorter training times.
