CHAID is one of the oldest decision tree algorithms. It was proposed in 1980 by Gordon V. Kass. CART was introduced in 1984, ID3 was proposed in 1986 and C4.5 was announced in 1993. CHAID is an acronym for chi-square automatic interaction detection. Here, chi-square is a metric used to measure the significance of a feature: the higher the value, the higher the statistical significance. Like the others, CHAID builds decision trees for classification problems, which means it expects data sets with a categorical target variable.
Vlog
Here, you should watch the following video to understand how decision tree algorithms work. No matter which decision tree algorithm you are running, whether ID3, C4.5, CART, CHAID or regression trees, they all look for the feature offering the highest gain under their own metric. Then, they add a decision rule for that feature and recursively build another decision tree for the sub data set until they reach a decision.
You may consider enrolling in my top-rated machine learning course on Udemy.
Besides, regular decision tree algorithms are designed to create branches for categorical features. Still, we are able to build trees with continuous and numerical features. The trick here is to convert continuous features into categorical ones: we split the numerical feature at the point where it offers the highest gain, as sketched below.
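As a minimal sketch of that idea, the helper below assumes a score_split(split, labels) callback that rates a candidate binary split with whatever metric the tree uses (chi-square here, information gain for ID3); it tries the midpoints between consecutive sorted values and keeps the best-scoring threshold.

```python
# Hypothetical sketch: pick the threshold that scores best under the given metric.
def best_threshold(values, labels, score_split):
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    # each candidate threshold induces a binary (True/False) split of the rows
    scored = [(score_split([v > t for v in values], labels), t) for t in candidates]
    return max(scored)[1]
```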
CHAID in Python
This blog post explains the CHAID algorithm in depth, and we will solve a problem step by step. On the other hand, you might just want to run the CHAID algorithm without diving into its mathematical background.
Herein, you can find a Python implementation of the CHAID algorithm. This package supports the most common decision tree algorithms such as ID3, C4.5, CART and regression trees, as well as some bagging methods such as random forest and some boosting methods such as gradient boosting and adaboost.
Here, you can find a hands-on video as well.
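For reference, a minimal usage sketch might look like the following. It assumes the chefboost package linked above is installed, and golf.csv is just a placeholder file name for a data set whose target column is named Decision.

```python
import pandas as pd
from chefboost import Chefboost as cb

df = pd.read_csv("golf.csv")        # placeholder: a data set like the one in this post
config = {"algorithm": "CHAID"}     # ID3, C4.5, CART and regression trees are also supported
model = cb.fit(df, config)
```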
Objective
Decision rules will be found based on chi-square values of features.
Formula
CHAID uses chi-square tests to find the most dominant feature, whereas ID3 uses information gain, C4.5 uses gain ratio and CART uses the GINI index. Chi-square testing was introduced by Karl Pearson, who also developed the Pearson correlation coefficient. Today, most programming libraries (e.g. Pandas for Python) use the Pearson metric for correlation by default.
The formula of chi-square testing is easy.
√((y − y')² / y')
where y is the actual value and y' is the expected value.
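As a tiny Python helper, that cell value looks like this; the chi-square value of a feature in this post is the sum of these cell values over its classes and decisions (a sketch of the post's own convention, not a library function).

```python
import math

# sqrt((y - y')^2 / y'), where y is the actual count and y' is the expected count
def chi_cell(actual, expected):
    return math.sqrt((actual - expected) ** 2 / expected)

print(round(chi_cell(2, 2.5), 3))  # 0.316, the sunny/yes cell worked below
```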
Data set
We are going to build decision rules for the following data set. The decision column is the target we want to predict based on the other features.
BTW, we will ignore the day column because it just states the row number.
Day | Outlook | Temp. | Humidity | Wind | Decision |
---|---|---|---|---|---|
1 | Sunny | Hot | High | Weak | No |
2 | Sunny | Hot | High | Strong | No |
3 | Overcast | Hot | High | Weak | Yes |
4 | Rain | Mild | High | Weak | Yes |
5 | Rain | Cool | Normal | Weak | Yes |
6 | Rain | Cool | Normal | Strong | No |
7 | Overcast | Cool | Normal | Strong | Yes |
8 | Sunny | Mild | High | Weak | No |
9 | Sunny | Cool | Normal | Weak | Yes |
10 | Rain | Mild | Normal | Weak | Yes |
11 | Sunny | Mild | Normal | Strong | Yes |
12 | Overcast | Mild | High | Strong | Yes |
13 | Overcast | Hot | Normal | Weak | Yes |
14 | Rain | Mild | High | Strong | No |
We need to find the most dominant feature in this data set.
Outlook feature
The outlook feature has 3 classes: sunny, rain and overcast. There are 2 decisions: yes and no. We first find the number of yes and no decisions for each class.
Outlook | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
Sunny | 2 | 3 | 5 | 2.5 | 0.316 | 0.316 |
Overcast | 4 | 0 | 4 | 2 | 1.414 | 1.414 |
Rain | 3 | 2 | 5 | 2.5 | 0.316 | 0.316 |
The total column is the sum of the yes and no decisions for each row. The expected values are half of the total because there are 2 classes in the decision. It is easy to calculate the chi-square values based on this table.
For example, the chi-square yes value for the sunny outlook is √((2 − 2.5)² / 2.5) = 0.316, where the actual count is 2 and the expected count is 2.5.
Chi-square value of outlook is the sum of chi-square yes and no columns.
0.316 + 0.316 + 1.414 + 1.414 + 0.316 + 0.316 = 4.092
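The same sum can be checked with a couple of lines of plain Python. The exact result is 4.093; the line above reads 4.092 only because it sums cell values that were already rounded to three digits.

```python
# Each cell is sqrt((actual - expected)^2 / expected); the feature value is their sum.
cells = [
    (2, 2.5), (3, 2.5),  # sunny: yes, no
    (4, 2.0), (0, 2.0),  # overcast: yes, no
    (3, 2.5), (2, 2.5),  # rain: yes, no
]
print(round(sum(((y - e) ** 2 / e) ** 0.5 for y, e in cells), 3))  # 4.093
```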
Now, we will find chi-square values for other features. The feature having the maximum chi-square value will be the decision point.
Temperature feature
This feature has 3 classes: hot, mild and cool. The following table summarizes the chi-square values for these classes.
Temperature | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
Hot | 2 | 2 | 4 | 2 | 0 | 0 |
Mild | 4 | 2 | 6 | 3 | 0.577 | 0.577 |
Cool | 3 | 1 | 4 | 2 | 0.707 | 0.707 |
The chi-square value of the temperature feature is
0 + 0 + 0.577 + 0.577 + 0.707 + 0.707 = 2.569
This is less than the chi-square value of outlook, which means that the outlook feature is more important than the temperature feature based on chi-square testing.
Humidity feature
Humidity has 2 classes: high and normal. Let’s summarize the chi-square values.
Humidity | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
High | 3 | 4 | 7 | 3.5 | 0.267 | 0.267 |
Normal | 6 | 1 | 7 | 3.5 | 1.336 | 1.336 |
So, the chi-square value of the humidity feature is
0.267 + 0.267 + 1.336 + 1.336 = 3.207
This is less than the chi-square value of outlook as well. What about the wind feature?
Wind feature
Wind feature has 2 classes: weak and strong. The following table is the pivot table.
Wind | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
Weak | 6 | 2 | 8 | 4 | 1.000 | 1.000 |
Strong | 3 | 3 | 6 | 3 | 0.000 | 0.000 |
Herein, the chi-square value of the wind feature is
1 + 1 + 0 + 0 = 2
We've found the chi-square values of all features. Let's see them all in a table.
Feature | Chi-square value |
---|---|
Outlook | 4.092 |
Temperature | 2.569 |
Humidity | 3.207 |
Wind | 2 |
As seen, the outlook feature has the highest chi-square value. This means that it is the most significant feature. So, we will put this feature at the root node.
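To double check these numbers, here is a short pandas sketch (a hand-rolled reproduction, not the library's code) that recomputes each feature's chi-square value from the raw data set with the cell formula above. The outlook score comes out as 4.093 rather than 4.092 only because the hand-worked table sums rounded cell values.

```python
import pandas as pd

# The play-golf data set from the table above.
data = {
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                 "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Temp.":    ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                 "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                 "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":     ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                 "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "Decision": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                 "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
}
df = pd.DataFrame(data)

def chi_square_of_feature(frame, feature, target="Decision"):
    # Sum sqrt((actual - expected)^2 / expected) over every class/decision cell.
    labels = frame[target].unique()
    score = 0.0
    for _, group in frame.groupby(feature):
        expected = len(group) / len(labels)  # e.g. half of the class total for 2 decisions
        for label in labels:
            actual = (group[target] == label).sum()
            score += ((actual - expected) ** 2 / expected) ** 0.5
    return score

for feature in ["Outlook", "Temp.", "Humidity", "Wind"]:
    print(feature, round(chi_square_of_feature(df, feature), 3))
# Outlook 4.093, Temp. 2.569, Humidity 3.207, Wind 2.0
```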
We've filtered the raw data set based on the outlook classes in the illustration above. For example, the overcast branch has only yes decisions in its sub data set. This means that the CHAID tree returns YES if the outlook is overcast.
Both the sunny and rain branches have yes and no decisions. We will apply chi-square tests to these sub data sets.
Outlook = Sunny branch
This branch has 5 instances. Now, we look for the most dominant feature. BTW, we will ignore the outlook column now because its values are all the same. In other words, we will find the most dominant feature among temperature, humidity and wind.
Day | Outlook | Temp. | Humidity | Wind | Decision |
---|---|---|---|---|---|
1 | Sunny | Hot | High | Weak | No |
2 | Sunny | Hot | High | Strong | No |
8 | Sunny | Mild | High | Weak | No |
9 | Sunny | Cool | Normal | Weak | Yes |
11 | Sunny | Mild | Normal | Strong | Yes |
Temperature feature for sunny outlook
Temperature | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
Hot | 0 | 2 | 2 | 1 | 1 | 1 |
Mild | 1 | 1 | 2 | 1 | 0 | 0 |
Cool | 1 | 0 | 1 | 0.5 | 0.707 | 0.707 |
So, the chi-square value of the temperature feature for the sunny outlook is
1 + 1 + 0 + 0 + 0.707 + 0.707 = 3.414
Humidity feature for sunny outlook
Humidity | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
High | 0 | 3 | 3 | 1.5 | 1.225 | 1.225 |
Normal | 2 | 0 | 2 | 1 | 1 | 1 |
The chi-square value of the humidity feature for the sunny outlook is
1.225 + 1.225 + 1 + 1 = 4.449
Wind feature for sunny outlook
Wind | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
Weak | 1 | 2 | 3 | 1.5 | 0.408 | 0.408 |
Strong | 1 | 1 | 2 | 1 | 0 | 0 |
The chi-square value of the wind feature for the sunny outlook is
0.408 + 0.408 + 0 + 0 = 0.816
We've found the chi-square values for the sunny outlook branch. Let's see them all in a table.
Feature | Chi-square value |
---|---|
Temperature | 3.414 |
Humidity | 4.449 |
Wind | 0.816 |
Now, humidity is the most dominant feature for the sunny outlook branch. We will use this feature as the decision rule for this branch.
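Assuming the df and chi_square_of_feature helper from the earlier sketch are still defined, the same loop reproduces this branch; swapping "Sunny" for "Rain" reproduces the rain outlook branch further below in exactly the same way.

```python
# Score the remaining features on the sunny sub data set only.
sunny = df[df["Outlook"] == "Sunny"]
for feature in ["Temp.", "Humidity", "Wind"]:
    print(feature, round(chi_square_of_feature(sunny, feature), 3))
# Temp. 3.414, Humidity 4.449, Wind 0.816
```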
Now, both humidity branches for the sunny outlook have just one decision each, as illustrated above. The CHAID tree will return NO for sunny outlook and high humidity, and it will return YES for sunny outlook and normal humidity.
Rain outlook branch
This branch still has both yes and no decisions. We need to apply the chi-square test to this branch to find exact decisions. It has 5 instances, as shown in the following sub data set. Let's find the most dominant feature among temperature, humidity and wind.
Day | Outlook | Temp. | Humidity | Wind | Decision |
---|---|---|---|---|---|
4 | Rain | Mild | High | Weak | Yes |
5 | Rain | Cool | Normal | Weak | Yes |
6 | Rain | Cool | Normal | Strong | No |
10 | Rain | Mild | Normal | Weak | Yes |
14 | Rain | Mild | High | Strong | No |
Temperature feature for rain outlook
This feature has 2 classes: mild and cool. Notice that even though hot temperature appears in the raw data set, this branch has no hot instance.
Temperature | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
Mild | 2 | 1 | 3 | 1.5 | 0.408 | 0.408 |
Cool | 1 | 1 | 2 | 1 | 0 | 0 |
The chi-square value of the temperature feature for the rain outlook is
0.408 + 0.408 + 0 + 0 = 0.816
Humidity feature for rain outlook
This feature in this branch has 2 classes: high and normal.
Humidity | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
High | 1 | 1 | 2 | 1 | 0 | 0 |
Normal | 2 | 1 | 3 | 1.5 | 0.408 | 0.408 |
The chi-square value of the humidity feature for the rain outlook is
0 + 0 + 0.408 + 0.408 = 0.816
Wind feature for rain outlook
This feature in this branch has 2 classes: weak and strong.
Wind | Yes | No | Total | Expected | Chi-square Yes | Chi-square No |
---|---|---|---|---|---|---|
Weak | 3 | 0 | 3 | 1.5 | 1.225 | 1.225 |
Strong | 0 | 2 | 2 | 1 | 1 | 1 |
So, the chi-square value of the wind feature for the rain outlook is
1.225 + 1.225 + 1 + 1 = 4.449
We've found all chi-square values for the rain outlook branch. Let's see them all in a single table.
Feature | Chi-square value |
---|---|
Temperature | 0.816 |
Humidity | 0.816 |
Wind | 4.449 |
So, the wind feature is the winner for the rain outlook branch. We put this feature in the related branch and look at the corresponding sub data sets.
As seen, all branches now have sub data sets with a single decision. So, we can build the CHAID tree as illustrated below.
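Written out as plain decision rules, the tree we just derived looks like this (predict is a hypothetical helper name, added only to make the final logic concrete):

```python
# The finished CHAID tree from this post, expressed as plain rules.
def predict(outlook, humidity, wind):
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # rain branch: the decision depends on wind
    return "Yes" if wind == "Weak" else "No"

print(predict("Sunny", "High", "Weak"))   # No, matching day 1
print(predict("Rain", "High", "Strong"))  # No, matching day 14
```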
Feature importance
Decision trees are naturally explainable and interpretable algorithms. Besides, they let us compute feature importance values to understand the built model better.
Gradient Boosting Decision Trees
Nowadays, gradient boosting decision trees are very popular in the machine learning community. They are actually not that different from the decision tree algorithms mentioned in this blog post. They mainly build sequential decision trees based on the errors of the previous round.
Random Forest vs Gradient Boosting
Both random forest and gradient boosting are ensemble approaches rather than core decision tree algorithms themselves. They require a core decision tree algorithm to run, and they build many decision trees in the background. So, we will discuss how they are similar and how they differ in the following video.
Adaboost vs Gradient Boosting
Both gradient boosting and adaboost are boosting techniques for decision tree based machine learning models. We will discuss how they are similar and how they differ from each other.
Conclusion
So, we've built a CHAID decision tree step by step in this post. CHAID uses the chi-square metric to find the most dominant feature and applies this recursively until the sub data sets have a single decision. Even though it is a legacy decision tree algorithm, it is still a common approach for classification problems.
Support this blog if you like it!
There is an error:
Wind feature
Wind feature has 2 classes: weak and strong. The following table is the pivot table.
Yes No Total Expected Chi-square Yes Chi-square No
Weak 5 2 7 3.5 0.802 0.802
Strong 3 3 6 3 0.000 0.000
The correct figures for weak are 6 yes, 2 no and 8 for the total; consequently, the chi-square value is higher.
Really cleared everything up and helped me in my studies! Thank you very much!
P.S.: From my understanding, CHAID isn’t actually the oldest decision tree learning algorithm, although it’s among the oldest. AID (Automatic Interaction Detector) and THAID (THeta Automatic Interaction Detector) were published in the ’70s.
import pandas as pd
from chefboost import Chefboost as cb
df = pd.read_csv('exECLAT.csv')
config = {'algorithm': 'CHAID'}
CHAIDtree = cb.fit(df, config)
What is this error and why does it occur?
assert group is None, ‘group argument must be None for now’
AssertionError: group argument must be None for now
A,B,C,D,E,F,Decision
2,2,0,0,0,1,1
1,3,0,1,0,3,0
3,3,3,3,3,3,0
…
Could you send me the data set?
Greetings,
How can we change the code if the name of the target variable is not "Decision"?
I get this error
Please confirm that name of the target column is “Decision” and it is put to the right in pandas data frame
You can do it as: df = df.rename(columns = {'Target': 'Decision'})
Great work. In case we have a huge data set (~5L cases) to study, running CHAID takes a lot of time (more than 5-6 hrs). Is this common? Could you please help with this?
Try enabling random forest with CHAID to speed it up.
How do I enable random forest with CHAID to speed it up? Thanks
Nota bene: this is not the complete, original CHAID algorithm. ID3, for example, will pick a feature like Outlook because Overcast was really good, but Rainy and Sunny may not be particularly good predictors. However, in ID3, we are forced to split the data on all three feature values, which may diffuse the "signal" in the data.
What CHAID had, however, was a test to see if all the values of a feature (“categories” is what Kass called them) were good in their own right. If they weren’t, they were merged. This allows the introduction of OR. So you might have Overcast as one split and Rainy OR Sunny as a second split. (p. 121 of his paper, Steps 2 and 3 are missing here). You can see the results of CHAID on the tree of page 126. At the first layer, note the merger of categories 3 and 4.
Otherwise, you simply have ID3/C4.5 with Chi-square instead of Entropy/Information Gain as your metric but it’s not the original CHAID algorithm.
Thanks for this! A nice explanation.