Weka is a pretty cool tool for small-sized ML projects. Previously, we built a regression model with Weka and mentioned how to re-use it. We can also use Weka to build classification models, and the two approaches are quite similar. In this post, we will apply supervised learning to the Exclusive OR (aka XOR) dataset and build both regression and classification models with Weka in Java.
Regression
Regression models have a single output and they produce continuous values in the range (-∞, +∞). The following dataset will be used for training. As seen, the result column holds continuous outputs.
x1,x2,result
0,0,0
0,1,1
1,0,1
1,1,0
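Loading the dataset and training a model could look like the following. This is a minimal sketch: the file name xor-regression.csv and the choice of a multilayer perceptron are assumptions, not requirements; any Weka learner that can handle a numeric class would do.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.functions.MultilayerPerceptron;

Instances trainingData = new DataSource("xor-regression.csv").getDataSet(); //assumed file name
trainingData.setClassIndex(trainingData.numAttributes() - 1); //result is the target column
MultilayerPerceptron model = new MultilayerPerceptron(); //assumed learner
model.setHiddenLayers("3"); //a small hidden layer is enough for XOR
model.buildClassifier(trainingData);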
Prediction is made with the distributionForInstance function. Calling distributionForInstance produces a one dimensional array, and this array contains a single item for regression studies. That is why we can directly read its first item.
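The newInstance object must be built against the training header before it can be predicted. A minimal sketch, assuming the trainingData object from the snippet above and the sample values x1 = 1, x2 = 0:

import weka.core.DenseInstance;
import weka.core.Instance;

Instance newInstance = new DenseInstance(trainingData.numAttributes());
newInstance.setDataset(trainingData); //attach the header so attribute types are known
newInstance.setValue(0, 1); //x1
newInstance.setValue(1, 0); //x2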
double prediction = model.distributionForInstance(newInstance)[0];
Classification
In contrast, classification models have multiple output items and they produce discrete outputs in the range [0, 1]. Moreover, only one output item should produce a value of 1 while the others produce 0. The decision is made by checking which output node produces the highest value. That is why the approach changes a tiny bit for classification studies.
The following dataset will be used for training. As seen, the result column holds nominal values.
x1,x2,result
0,0,false
0,1,true
1,0,true
1,1,false
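Training is almost identical to the regression case; only the dataset changes. Again a minimal sketch, where the file name xor-classification.csv and the multilayer perceptron are assumptions, and the true/false column is expected to be loaded as a nominal attribute.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.classifiers.functions.MultilayerPerceptron;

Instances trainingData = new DataSource("xor-classification.csv").getDataSet(); //assumed file name
trainingData.setClassIndex(trainingData.numAttributes() - 1); //result is the class attribute
MultilayerPerceptron model = new MultilayerPerceptron(); //assumed learner, as in the regression case
model.setHiddenLayers("3");
model.buildClassifier(trainingData);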
Similar to regression, we can predict the class of a new instance with the distributionForInstance function. This function produces a one dimensional array consisting of two items which stand for true and false. Then, the index of the greatest item in the array will be our classification result.
double[] distributions = model.distributionForInstance(newInstance);
double maxValue = -1, maxIndex = -1;
for(int j = 0; j < distributions.length; j++){
	System.out.println("class_"+j+": "+distributions[j]+" ~ "+Math.round(distributions[j]));
	if(distributions[j] > maxValue){
		maxIndex = j;
		maxValue = distributions[j];
	}
}
System.out.println("classified as class_"+maxIndex+" ("+100*maxValue+"%)");
We can also get the index of the highest item with the classifyInstance function. Both the classifiedIndex and maxIndex variables should be equal.
double classifiedIndex = model.classifyInstance(newInstance);
Until now, we've predicted the class index of the new instance, but the results are strings in the classification dataset. We should get the class name as nominal text instead of the class index. Firstly, we should set the class value of the new instance. Secondly, we can use the stringValue function to get the class name.
newInstance.setClassValue(classifiedIndex);
String classifiedText = newInstance.stringValue(newInstance.numAttributes() - 1);
System.out.println("classified as: "+classifiedText);
Only one output class produces a value of 1 for a given instance, as illustrated above. In this way, we can classify new instances.
Finally, we can get all available classes as demonstrated below.
String allClasses = newInstance.attribute(newInstance.numAttributes() - 1).toString();
System.out.println(allClasses);
All available classes would be dumped as:
@attribute result {false,true}
So, we've previously built regression models with Weka in our Java code. Today, we've mentioned how to apply classification in a similar way. Thus, we can apply both regression and classification in our studies. As usual, the source code of the project is shared on GitHub.
Support this blog if you like it!