Posted in Machine Learning
JULY 13, 2019

In the previous post, we saw the various metrics that are used to assess a machine learning model's performance. Among those, the **confusion matrix** is used to evaluate a classification model's accuracy, while **mean squared error** (MSE) and **mean absolute error** (MAE) are used to evaluate a regression model's accuracy.

F1 score is useful when the size of the positive class is relatively small.

The area under the ROC curve (ROC AUC) is useful when it does not matter whether the smaller class is the positive one, in contrast to the F1 score, where which class is treated as positive matters.

In today's post, we will understand what MAE is and how to interpret it. In addition, we will discuss a few more metrics that help us decide whether a machine learning model would be useful in real-life scenarios.

An error is the difference between the actual (true) value and the predicted value. The absolute error ignores the sign of that difference, so a negative result is treated as positive.

Hence, **Absolute error = |True value - Predicted value|**

MAE takes the **average** of this error from every sample in a dataset and gives the output.
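To make the averaging concrete, here is a minimal sketch that computes MAE by hand with NumPy. The numbers are made up purely for illustration.

```python
import numpy as np

# Made-up true and predicted values, for illustration only
actual = np.array([3.0, -0.5, 2.0, 7.0])
predicted = np.array([2.5, 0.0, 2.0, 8.0])

# Absolute error per sample: the sign of the difference is dropped
absolute_errors = np.abs(actual - predicted)   # [0.5, 0.5, 0.0, 1.0]

# MAE is the mean of those absolute errors
mae = absolute_errors.mean()
print(mae)  # 0.5
```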

This can be implemented using **sklearn**'s **mean_absolute_error** function:

```
from sklearn.metrics import mean_absolute_error

# predicting home prices in some area
# (mycity_model is assumed to be an already fitted model; X holds the features, y the true prices)
predicted_home_prices = mycity_model.predict(X)
mean_absolute_error(y, predicted_home_prices)
```

However, this value may not tell us much about real-life performance, because the same data is used both to build the model and to evaluate it. The model has had no exposure to real, never-seen-before data, so it may perform extremely well on seen data and still fail badly on unseen data. Evaluating on held-out data that the model did not see during training gives a more honest estimate.
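One way to see the gap between seen and unseen data is to hold back part of the dataset before fitting. The sketch below is only an illustration: the data is synthetic, and the choice of `DecisionTreeRegressor` and the split size are assumptions, not the model used in this post.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption; not real home prices)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 2.0, size=200)

# Hold back a quarter of the rows; the model never sees them while fitting
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# Error on the data the model was fitted on vs. error on unseen data
train_mae = mean_absolute_error(y_train, model.predict(X_train))
val_mae = mean_absolute_error(y_val, model.predict(X_val))
print(f"train MAE: {train_mae:.3f}, validation MAE: {val_mae:.3f}")
```

An unrestricted tree can memorize its training rows, so the training MAE here should be close to zero while the validation MAE is noticeably larger.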

This leads to the concepts of underfitting and overfitting:

**Overfitting:** The scenario when a machine learning model almost exactly matches the training data but performs very poorly when it encounters new data or a validation set.

**Underfitting:** The scenario when a machine learning model is unable to capture the important patterns and insights from the data, which results in the model performing poorly even on the training data.
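Both situations can be sketched by varying model complexity and comparing training and validation error. The example below assumes a synthetic noisy sine curve and uses the `max_depth` of a decision tree as the complexity knob; these choices are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy sine curve (an illustrative assumption)
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=1
)

results = {}
# max_depth=1 is a very simple tree; None lets the tree grow until it
# fits the training rows exactly
for depth in (1, 4, None):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=1)
    tree.fit(X_train, y_train)
    train_mae = mean_absolute_error(y_train, tree.predict(X_train))
    val_mae = mean_absolute_error(y_val, tree.predict(X_val))
    results[depth] = (train_mae, val_mae)
    print(f"max_depth={depth}: train MAE {train_mae:.3f}, validation MAE {val_mae:.3f}")
```

A model that is too simple tends to show high error on both sets, while a model that is too flexible tends to show near-zero training error alongside a clearly higher validation error.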

**P.S.** In the upcoming posts, we will understand how to fit the model in the right way using many methods like feature normalization, feature generation and much more.

MSE is calculated by taking the average of the square of the difference between the original and predicted values of the data.

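Hence,

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - \hat{y}_i\right)^2$$

Here `N` is the total number of observations/rows in the dataset, $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for the `i`-th observation. The `sigma` symbol denotes that the squared difference between actual and predicted values is summed over every `i` ranging from **1 to N**.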

This can be implemented using **sklearn**'s `mean_squared_error` function:

```
from sklearn.metrics import mean_squared_error
actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
mean_squared_error(actual_values, predicted_values)
```
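The same number can be checked by hand. This sketch reuses the values from the example above and follows the definition directly: square each difference, then average.

```python
import numpy as np

actual_values = np.array([3, -0.5, 2, 7])
predicted_values = np.array([2.5, 0.0, 2, 8])

# Squared difference per observation
squared_errors = (actual_values - predicted_values) ** 2   # [0.25, 0.25, 0.0, 1.0]

# Average over the N observations
mse = squared_errors.sum() / len(actual_values)
print(mse)  # 0.375
```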

In most of the regression problems, mean squared error is used to determine the model's performance.

RMSE (root mean squared error) is the square root of the MSE. It can be read as the standard deviation of the errors that occur when predictions are made on a dataset, and it is expressed in the same units as the target variable.

```
from math import sqrt
from sklearn.metrics import mean_squared_error

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
# keep the MSE value so its square root can be taken
mse = mean_squared_error(actual_values, predicted_values)
# taking root of mean squared error
root_mean_squared_error = sqrt(mse)
```
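Because RMSE is in the same units as the target, it can be put side by side with MAE. The following sketch computes both with plain NumPy on the same small example; RMSE is never smaller than MAE, and the gap widens as the errors become more uneven.

```python
import numpy as np

actual_values = np.array([3, -0.5, 2, 7])
predicted_values = np.array([2.5, 0.0, 2, 8])
errors = actual_values - predicted_values

mae = np.mean(np.abs(errors))          # average absolute error
rmse = np.sqrt(np.mean(errors ** 2))   # square root of the MSE

print(mae)   # 0.5
print(rmse)  # about 0.612
```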

R squared (R²) is also known as the **coefficient of determination**. This metric gives an indication of how well a model fits a given dataset. It indicates how close the regression line (i.e. the predicted values plotted) is to the actual data values. The **R squared value typically lies between 0 and 1**, where 0 indicates that the model does no better than always predicting the mean of the data and 1 indicates that the model fits the dataset perfectly. On data the model was not fitted on, such as cross-validation folds, the value can also be negative, which means the model does worse than predicting the mean.
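A common way to compute this value is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below applies that definition by hand and compares it with sklearn's `r2_score`; the `r_squared` helper and the `bad_predictions` array are hypothetical names introduced for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

actual_values = np.array([3, -0.5, 2, 7])
good_predictions = np.array([2.5, 0.0, 2, 8])
bad_predictions = np.array([7.0, 7.0, 7.0, 7.0])  # hypothetical poor model

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1 - ss_res / ss_tot

print(r_squared(actual_values, good_predictions))  # about 0.949
print(r_squared(actual_values, bad_predictions))   # negative: worse than predicting the mean
print(r2_score(actual_values, good_predictions))   # sklearn's value, for comparison
```

Under this definition the score drops below 0 whenever the predictions are further from the data than the mean of the data is.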

```
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = np.random.randn(100, 1)  # 100 samples, 1 feature (sklearn expects a 2D X)
y = np.random.randn(100)     # y has nothing to do with X whatsoever
scores = cross_val_score(LinearRegression(), X, y, scoring='r2')
print(scores)  # expect values near or below 0, since X carries no information about y
```

**MAE:** It is less sensitive to outliers than MSE because the errors are not squared, so huge errors are not punished disproportionately. It is usually used when the performance is measured on continuous variable data. It is a linear score: every individual difference is weighted equally in the average. The lower the value, the better the model's performance.

**MSE:** It is one of the most commonly used metrics. Because the errors are squared, a single bad prediction can dominate the score, so it is least useful when the dataset contains a lot of noise or outliers that should not drive the evaluation. It is most useful when unexpected values (too high or too low) matter and large errors should be penalized heavily.

**RMSE:** In RMSE, the errors are squared before they are averaged, so RMSE assigns a higher weight to larger errors. This makes RMSE more useful when large errors are present and they are particularly undesirable. It avoids taking the absolute value of the error, and this trait is useful in many mathematical calculations. For this metric too, the lower the value, the better the performance of the model.
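The difference in outlier sensitivity can be sketched with a few made-up errors. The `report` helper and the error values below are hypothetical and only meant to show how each metric reacts when one large miss is added.

```python
import numpy as np

def report(errors):
    """Return (MAE, MSE, RMSE) for a list of prediction errors."""
    errors = np.asarray(errors, dtype=float)
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    return float(mae), float(mse), float(rmse)

# Hypothetical prediction errors: five small misses,
# then the same set with one large miss
typical = report([1, -1, 1, -1, 1])
with_outlier = report([1, -1, 1, -1, 20])

print(typical)       # (1.0, 1.0, 1.0)
print(with_outlier)  # MAE 4.8, MSE 80.8, RMSE about 8.99
```

With one large miss, MAE grows roughly fivefold, while RMSE grows about ninefold and MSE about eightyfold, which is the heavier weighting of large errors described above.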