Classifying Instances with Weka In Java

Weka is a handy tool for small-scale machine learning projects. Previously, we built a regression model with Weka and covered how to re-use it. Weka can also build classification models, and the two kinds of model are handled in much the same way. In this post, we apply supervised learning to the Exclusive OR (XOR) dataset and build both regression and classification models with Weka in Java.



Figure: Weka and Java come together

Regression

A regression model has a single output and produces a continuous value in the range (-∞, +∞). The following dataset is used for training. As seen, the result column holds numeric (continuous) values.

x1,x2,result
0,0,0
0,1,1
1,0,1
1,1,0

Figure: Regression model for Exclusive OR

Predictions are made with the distributionForInstance function. It returns a one-dimensional array, which contains a single item for regression. That is why we can read its first item directly.

double prediction = model.distributionForInstance(newInstance)[0];

Figure: Results for regression model

Classification

In contrast, a classification model has one output per class, and each output is a value in the range [0, 1]. Ideally, exactly one output is 1 (or close to it) while the others are 0. The decision is made by checking which output has the highest value. That is why the approach changes slightly for classification.
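The "one output per class" idea can be sketched in plain Java, without Weka. This is only an illustration: the class name OneHotSketch, the LABELS array, and the oneHot helper are hypothetical, and the label order (false first, then true) is an assumption about how the classes are declared.

```java
// Sketch only: plain Java, no Weka dependency. All names here are hypothetical.
class OneHotSketch {
    // Assumed class order: index 0 = false, index 1 = true.
    static final String[] LABELS = {"false", "true"};

    // Builds the ideal output vector for a label: 1.0 at its index, 0.0 elsewhere.
    static double[] oneHot(String label) {
        double[] target = new double[LABELS.length];
        for (int i = 0; i < LABELS.length; i++) {
            if (LABELS[i].equals(label)) {
                target[i] = 1.0;
                return target;
            }
        }
        throw new IllegalArgumentException("Unknown label: " + label);
    }

    public static void main(String[] args) {
        // Print the ideal output vector for each known label.
        for (String label : LABELS) {
            System.out.println(label + " -> " + java.util.Arrays.toString(oneHot(label)));
        }
    }
}
```

A trained model rarely produces exact zeros and ones, so in practice the outputs are compared with each other rather than checked for equality with 1.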

The following dataset is used for training. As seen, the result column holds nominal values.

x1,x2,result
0,0,false
0,1,true
1,0,true
1,1,false
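The regression and classification datasets differ only in how the result column is written. As a rough illustration, the plain-Java sketch below regenerates both variants from the XOR truth table. The class name XorCsvSketch and the toCsv helper are made up for this example and are not part of the original project.

```java
// Sketch only: regenerates the two XOR datasets shown in this post.
class XorCsvSketch {
    // nominal = false -> numeric result (0/1), as in the regression dataset
    // nominal = true  -> nominal result (false/true), as in the classification dataset
    static String toCsv(boolean nominal) {
        StringBuilder csv = new StringBuilder("x1,x2,result\n");
        for (int x1 = 0; x1 <= 1; x1++) {
            for (int x2 = 0; x2 <= 1; x2++) {
                int xor = x1 ^ x2; // 1 when exactly one input is 1
                String result = nominal ? String.valueOf(xor == 1) : String.valueOf(xor);
                csv.append(x1).append(',').append(x2).append(',').append(result).append('\n');
            }
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        System.out.print(toCsv(false)); // regression variant
        System.out.print(toCsv(true));  // classification variant
    }
}
```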

Figure: Classification model for Exclusive OR

As with regression, we can predict the class of a new instance with the distributionForInstance function. Here it returns a one-dimensional array with two items, one for each class (false and true). The index of the greatest item in the array is our classification result.

double[] distributions = model.distributionForInstance(newInstance);

// Track the largest value and the index of the class that holds it.
double maxValue = -1;
int maxIndex = -1;

for (int j = 0; j < distributions.length; j++) {
    System.out.println("class_" + j + ": " + distributions[j] + " ~ " + Math.round(distributions[j]));

    if (distributions[j] > maxValue) {
        maxIndex = j;
        maxValue = distributions[j];
    }
}

System.out.println("classified as class_" + maxIndex + " (" + 100 * maxValue + "%)");

We can also get the index of the highest item directly with the classifyInstance function. The classifiedIndex and maxIndex variables should be equal.

double classifiedIndex = model.classifyInstance(newInstance);

So far, we have predicted the class index of the new instance, but the class values in the classification dataset are strings. We want the class name as nominal text rather than the class index. First, we set the class value of the new instance to the predicted index. Second, we use the stringValue function to get the class name.

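// Store the predicted index as the class value of the instance.
newInstance.setClassValue(classifiedIndex);
// Read the class back as its nominal label (assumes the class is the last attribute).
String classifiedText = newInstance.stringValue(newInstance.numAttributes() - 1);

System.out.println("classified as: " + classifiedText);

Figure: Results for classification model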

For each instance, only one output class gets a value of 1 (or close to it), as illustrated above. In this way, we can classify new instances.
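The whole chain, from a class distribution to a readable label, can be sketched in plain Java without Weka. Everything here is an assumption made for illustration: the class name PredictLabelSketch, the helpers argMax and describe, the label order (false, then true), and the distribution values used in main.

```java
// Sketch only: mimics "pick the largest output, then look up its label" in plain Java.
class PredictLabelSketch {
    // Assumed class order: index 0 = false, index 1 = true.
    static final String[] LABELS = {"false", "true"};

    // Index of the largest value; the first one wins on ties. Assumes a non-empty array.
    static int argMax(double[] values) {
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    // Turns a class distribution into "label (confidence%)".
    static String describe(double[] distribution) {
        int index = argMax(distribution);
        return LABELS[index] + " (" + Math.round(100 * distribution[index]) + "%)";
    }

    public static void main(String[] args) {
        double[] distribution = {0.03, 0.97}; // hypothetical model output
        System.out.println("classified as: " + describe(distribution));
    }
}
```

In the Weka version, the label lookup is done by the instance itself through setClassValue and stringValue, so no separate label array is needed.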

Finally, we can list all available classes as demonstrated below:

String allClasses = newInstance.attribute(newInstance.numAttributes() - 1).toString();

All available classes are dumped as:

@attribute result {false,true}
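If only this dumped string is at hand, the labels can be pulled out with a little string handling. The sketch below is plain Java with hypothetical names (AttributeLabelsSketch, parseLabels), and it assumes simple labels without quotes or embedded commas. In Weka itself, the Attribute object can be queried directly instead, for example with numValues() and value(int).

```java
// Sketch only: extracts the labels from a nominal attribute declaration string.
class AttributeLabelsSketch {
    // Returns the comma-separated labels found between the braces.
    static String[] parseLabels(String declaration) {
        int open = declaration.indexOf('{');
        int close = declaration.lastIndexOf('}');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("Not a nominal attribute: " + declaration);
        }
        String[] labels = declaration.substring(open + 1, close).split(",");
        for (int i = 0; i < labels.length; i++) {
            labels[i] = labels[i].trim();
        }
        return labels;
    }

    public static void main(String[] args) {
        String[] labels = parseLabels("@attribute result {false,true}");
        for (int i = 0; i < labels.length; i++) {
            System.out.println("class_" + i + " = " + labels[i]);
        }
    }
}
```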

Previously, we built regression models with Weka in Java. In this post, we covered how to apply classification in a similar way, so we can now use both regression and classification in our projects. As usual, the source code of the project is shared on GitHub.

