Homer Sometimes Nods: Error Metrics in Machine Learning

Even the worthy Homer sometimes nods. The idiom means that even the most gifted person occasionally makes mistakes. We can adapt this saying to the machine learning lifecycle: even the best ML models should make some mistakes (otherwise, we face an overfitting problem). The important thing is knowing how to measure those errors. There are many metrics for evaluating forecasts. In this post, we will cover the evaluation metrics most meaningful for ML studies.


🙋‍♂️ You may consider enrolling in my top-rated machine learning course on Udemy

Decision Trees for Machine Learning

Homer says D’oh when something bad has happened

The sign of the difference between actual and predicted values should not be considered when calculating the total error of a system. Otherwise, a series containing equally large underestimations and overestimations might measure a very low total error, even though its forecasts are poor. Only forecasts with genuinely small underestimations and overestimations should produce a low total error. Discarding the sign gets rid of this cancellation effect, and squaring the differences is one way to discard it. This metric is called Mean Squared Error, or simply MSE.
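As a minimal sketch of the idea (plain NumPy; the sample series is made up for illustration):

```python
import numpy as np

def mse(actual, predicted):
    # squaring the differences discards the sign,
    # so under- and over-estimations cannot cancel out
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean((actual - predicted) ** 2)

# equally large under- and over-estimations no longer sum to zero
print(mse([10, 10], [5, 15]))  # 25.0
```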

Of course, calculating absolute differences (Mean Absolute Error, or MAE) discards signs too, but it gives up two important advantages. Firstly, the square of a number is greater than the number itself unless the number is between 0 and 1. Thus, MSE reflects small errors into the total error as even smaller, whereas it makes large errors contribute even more. Secondly, the squared difference has a meaningful derivative: its gradient scales with the size of the error. The absolute difference, on the other hand, is piecewise linear; its derivative is a constant ±1 (and undefined at zero). That is why the derivative of the squared error is the one propagated back in neural networks.
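For comparison, here is a minimal MAE sketch plus the per-sample gradients of both losses (the helper names are just illustrative):

```python
import numpy as np

def mae(actual, predicted):
    # absolute differences discard the sign as well
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted))

# gradients with respect to the prediction for a single sample
def mse_grad(actual, predicted):
    return 2 * (predicted - actual)     # scales with the error size

def mae_grad(actual, predicted):
    return np.sign(predicted - actual)  # always +/-1, undefined at zero

print(mae([10, 10], [5, 15]))  # 5.0
print(mse_grad(10.0, 15.0))    # 10.0 - a large error yields a large gradient
print(mae_grad(10.0, 15.0))    # 1.0  - the gradient magnitude never changes
```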

What’s more, correlation aims to capture how strongly a pair of datasets are related. The comparison of temperature and ice cream sales is the most common illustration of correlation. In our case, we would apply correlation to the actual and predicted series. The correlation coefficient ranges from -1 to +1; it approaches -1 or +1 for strongly related datasets. The sign of the coefficient states the direction of the relation (directly or inversely related), while a coefficient near zero means no relationship between the datasets. However, correlation gives no opinion about deviations or error: a pair of datasets could have high correlation while also having high deviations.
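A quick sketch with NumPy's corrcoef illustrates this pitfall (the series values are invented):

```python
import numpy as np

actual    = np.array([10, 20, 30, 40, 50])
predicted = actual + 100  # every prediction is off by a constant 100

# Pearson correlation coefficient between the two series
r = np.corrcoef(actual, predicted)[0, 1]
print(r)  # 1.0 - perfectly correlated...

# ...yet the deviation, and hence the error, is huge
print(np.mean((actual - predicted) ** 2))  # 10000.0
```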

Confusion Matrix

These metrics are mostly meaningful for regression studies. In contrast, they lose their significance for classification studies. Suppose that you are working on forecasting rare events, e.g. diagnosing a disease. In this case, the counts of those forecasted as patients but actually healthy, and those forecasted as healthy but actually patients, are important metrics too. A two-dimensional confusion matrix should be applied to measure system performance, as illustrated below.

Sample Confusion Matrix for Disease Detection

There are 4 different outcomes based on the confusion matrix.

True Positive: Ones classified as having the disease and they really have the disease.

True Negative: Ones classified as being healthy and they are really healthy.

False Positive: Ones classified as having the disease but they are actually healthy.

False Negative: Ones classified as being healthy but they actually have the disease.

So, these outcomes reveal the following metrics.

Precision: Of the ones classified as patients, how many are really patients.

Recall: Of the ones who are really patients, how many are classified as patients.

Herein, precision is more convenient for the fraud detection case, whereas recall is more convenient for the cancer diagnosis case: falsely flagging a legitimate transaction (a false positive) is expensive, while missing an actual cancer patient (a false negative) can be fatal.
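To make these definitions concrete, here is a minimal sketch that derives both metrics from raw counts (the counts are invented for illustration):

```python
# invented counts for a disease detection confusion matrix
tp, tn, fp, fn = 40, 900, 10, 50

precision = tp / (tp + fp)  # of those classified as patients, how many really are
recall    = tp / (tp + fn)  # of the real patients, how many were caught

print(f"precision = {precision:.2f}")  # 0.80
print(f"recall    = {recall:.2f}")     # 0.44 - more than half the patients were missed
```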

ROC Curve and AUC

However, sometimes precision and recall are not enough. Some classifiers, such as neural networks or logistic regression, return probabilities instead of discrete classes. Herein, precision, recall and accuracy all change depending on the threshold you pick to turn a probability into a class. At this point, the ROC curve and the AUC score help us to evaluate the built machine learning models across all thresholds at once.
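As a sketch, scikit-learn's roc_curve and roc_auc_score can evaluate probability outputs directly; the labels and scores below are made up:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# made-up ground-truth labels and predicted probabilities
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

# one (false positive rate, true positive rate) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# area under that curve: 1.0 is a perfect ranking, 0.5 is random guessing
print(roc_auc_score(y_true, y_score))  # 0.875
```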

Conclusion

So, there are three kinds of lies: lies, damned lies, and statistics. Indeed, facts might be abused or manipulated by those who know statistics well. That is why marketing staff might reference the correlation metric to appear more successful, whereas scientists might reference MSE to stay rigorous. Hopefully, you now have these metrics down to a fine art.


Like this blog? Support me on Patreon

Buy me a coffee