Even the worthy Homer sometimes nods. The idiom means that even the most gifted person occasionally makes mistakes. We can adapt this saying to the machine learning lifecycle: even the best ML models make mistakes, and a model that makes none on its training data is probably overfitting. The important thing is to know how to measure errors. There are many metrics for evaluating forecasts. In this post, we will cover the evaluation metrics that are meaningful for ML studies.
The sign of the difference between actual and predicted values should not be taken into account when calculating the total error of a system. Otherwise, a series with equally large underestimations and overestimations could report a very low total error, because the positive and negative differences cancel out. A low total error should instead mean that both the underestimations and the overestimations are small. Discarding the sign removes this cancellation effect, and squaring the differences is one way to discard it. The average of the squared differences is called Mean Squared Error, or MSE for short.
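The cancellation effect can be seen in a minimal sketch. The function names and the sample series below are made up for illustration and do not come from any particular library.

```python
def mean_error(actual, predicted):
    """Signed mean error: over- and underestimations cancel each other."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Mean of squared differences: the sign is discarded by squaring."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical series: every forecast is off by 5, alternately too high and too low.
actual = [10.0, 10.0, 10.0, 10.0]
predicted = [15.0, 5.0, 15.0, 5.0]

signed_error = mean_error(actual, predicted)   # the +5 and -5 differences cancel
mse = mean_squared_error(actual, predicted)    # each squared difference is 25

print(signed_error, mse)
```

The signed mean error reports a perfect forecast here, while MSE still exposes the deviation.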
Of course, taking absolute differences (Mean Absolute Error, or MAE) discards signs too, but it gives up two important advantages. Firstly, the square of a number is larger than its absolute value unless that absolute value is 1 or less, in which case the square is no larger. Thus, MSE lets small errors contribute less to the total error, whereas large errors contribute disproportionately more. Secondly, the squared difference has a more informative derivative. The derivative of the squared error is proportional to the error itself, so larger errors produce larger gradients. In contrast, the absolute difference is piecewise linear: its derivative is +1 or -1 depending on the sign of the error, regardless of its size, and it is undefined at zero. That is why the derivative of the squared error is the one commonly propagated during backpropagation in neural networks.
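Both points can be checked numerically. The sketch below uses hypothetical helper names, and returning 0 for the absolute error derivative at a zero difference is an assumed convention, since the true derivative is undefined there.

```python
def squared_error_grad(actual, predicted):
    """Derivative of (predicted - actual)^2 with respect to predicted."""
    return 2 * (predicted - actual)


def absolute_error_grad(actual, predicted):
    """Derivative of |predicted - actual| with respect to predicted.

    Assumption: returns 0 when the difference is exactly zero,
    where the true derivative is undefined.
    """
    diff = predicted - actual
    return (diff > 0) - (diff < 0)  # sign of the difference: -1, 0 or +1


# A small error (0.5) and a large error (4.0) against the same target.
target = 0.0
small, large = 0.5, 4.0

small_squared = (small - target) ** 2   # shrinks below the absolute error
large_squared = (large - target) ** 2   # grows well above the absolute error

# The squared error gradient scales with the error; the absolute one does not.
grads_squared = [squared_error_grad(target, small), squared_error_grad(target, large)]
grads_absolute = [absolute_error_grad(target, small), absolute_error_grad(target, large)]

print(small_squared, large_squared, grads_squared, grads_absolute)
```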
What's more, correlation measures how strongly a pair of datasets is related. The comparison of temperature and ice cream sales is the most common example of correlation. In our case, we would apply correlation to the actual and predicted series. The correlation coefficient ranges from -1 to +1. It approaches -1 or +1 for strongly related datasets, and its sign states the direction of the relationship (directly or inversely related). A coefficient near 0 means there is no linear relationship between the datasets. However, correlation says nothing about deviations or error. A pair of datasets can be highly correlated while still deviating greatly from each other.
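To illustrate the last point, here is a small sketch of the Pearson correlation coefficient written in plain Python. The function name and the sample series are assumptions chosen for the example: the predictions follow the actual values perfectly but are all shifted by 10.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical series: predictions track the actual values but are offset by 10.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [11.0, 12.0, 13.0, 14.0, 15.0]

correlation = pearson_correlation(actual, predicted)
squared_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
mse = sum(squared_errors) / len(squared_errors)

print(correlation, mse)
```

The two series are perfectly correlated, yet every single prediction misses its target by 10, which only the error metric reveals.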
These metrics are meaningful mostly for regression studies. In contrast, they lose their significance for classification studies. Suppose that you are working on forecasting rare events, e.g. diagnosing a disease. In this case, the number of people predicted as sick but actually healthy, and the number predicted as healthy but actually sick, are important metrics, too. A two-dimensional confusion matrix should be used to measure system performance, as illustrated below.
The confusion matrix yields four possible outcomes.
True Positive: People classified as having the disease who really have it.
True Negative: People classified as healthy who really are healthy.
False Positive: People classified as having the disease who are actually healthy.
False Negative: People classified as healthy who actually have the disease.
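The four outcomes can be counted with a few lines of code. This is only a sketch: the function name, the label encoding (1 for having the disease, 0 for healthy) and the sample labels are all assumptions made for the example.

```python
def confusion_counts(actual, predicted):
    """Count the four confusion matrix outcomes for binary labels.

    Assumed encoding: 1 means "has the disease", 0 means "healthy".
    """
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["TP"] += 1   # classified sick, really sick
        elif p == 0 and a == 0:
            counts["TN"] += 1   # classified healthy, really healthy
        elif p == 1 and a == 0:
            counts["FP"] += 1   # classified sick, actually healthy
        else:
            counts["FN"] += 1   # classified healthy, actually sick
    return counts


# Hypothetical labels for ten people, three of whom have the disease.
actual_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

outcome_counts = confusion_counts(actual_labels, predicted_labels)
print(outcome_counts)
```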
These outcomes give rise to the following metrics.
Precision: The share of people classified as sick who really are sick, i.e. TP / (TP + FP).
Recall: The share of people who really are sick that are classified as sick, i.e. TP / (TP + FN).
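Both metrics follow directly from the outcome counts, as sketched below. The function names and the counts are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention, not a rule.

```python
def precision(tp, fp):
    """Share of positive classifications that are correct: TP / (TP + FP)."""
    total_predicted_positive = tp + fp
    # Assumed convention: 0.0 when nothing was classified as positive.
    return tp / total_predicted_positive if total_predicted_positive else 0.0


def recall(tp, fn):
    """Share of actual positives that are detected: TP / (TP + FN)."""
    total_actual_positive = tp + fn
    # Assumed convention: 0.0 when there are no actual positives.
    return tp / total_actual_positive if total_actual_positive else 0.0


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
tp, fp, fn = 8, 2, 8

model_precision = precision(tp, fp)   # 8 of the 10 people flagged are really sick
model_recall = recall(tp, fn)         # 8 of the 16 sick people are found

print(model_precision, model_recall)
```

In this example the classifier is right most of the time when it flags someone, yet it misses half of the people who are actually sick, so the two metrics should be read together.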
As the saying goes, there are three kinds of lies: lies, damned lies, and statistics. Indeed, facts can be abused or manipulated by those who know statistics well. That is why marketing staff might cite correlation to appear more successful, whereas scientists might cite MSE to show how far predictions actually deviate. Hopefully, you now have these metrics down to a fine art.