
LAST UPDATED: MARCH 29, 2023


In the previous post, we saw the various metrics that are used to assess a machine learning model's performance. Among those, the **confusion matrix** is used to evaluate a classification problem's accuracy, while **mean squared error** (MSE) and **mean absolute error** (MAE) are used to evaluate a regression problem's accuracy.

The F1 score is useful when the size of the positive class is relatively small.

The ROC Area Under Curve (AUC) is useful when we are not specifically concerned with which class is the positive one, in contrast to the F1 score, where the positive class is what matters.

In today's post, we will understand what MAE is and explore what these metrics mean in practice. In addition, we will discuss a few more metrics that help us decide whether a machine learning model would be useful in real-life scenarios.

Mean Absolute Error (MAE) is the average magnitude of the errors in a set of predictions. An error is the absolute difference between the actual (true) value and the predicted value; taking the absolute difference means that a negative sign on the result is ignored.

Hence, **Error = |True value – Predicted value|**

MAE takes the **average** of this error from every sample in a dataset and gives the output.
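Before reaching for a library, the definition can be sketched in a few lines of plain Python (the helper name and sample values below are just for illustration):

```python
def mean_absolute_error_manual(actual, predicted):
    # absolute difference for every sample, then the average
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
print(mean_absolute_error_manual(actual_values, predicted_values))  # 0.5
```

The absolute errors here are 0.5, 0.5, 0, and 1, which average to 0.5.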

This can be implemented using **sklearn**’s **mean_absolute_error** method:

```
from sklearn.metrics import mean_absolute_error

# predicting home prices in some area
# (mycity_model is assumed to be an already-fitted regression model,
#  and X, y the features and true prices it was trained on)
predicted_home_prices = mycity_model.predict(X)
mean_absolute_error(y, predicted_home_prices)
```

But this value might not be meaningful in a real-life situation, because the same data is used both to build the model and to evaluate it: the model has had no exposure to real, never-seen-before data. It may perform extremely well on the data it has seen but fail miserably when it encounters unseen data.
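One common way to expose the model to unseen data is to hold out a validation set. Here is a minimal sketch using sklearn's `train_test_split`; the synthetic data below is an assumption, standing in for the home-price dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# synthetic data standing in for the home-price dataset (an assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# hold out a validation set the model never sees during training
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("train MAE:", mean_absolute_error(y_train, model.predict(X_train)))
print("validation MAE:", mean_absolute_error(y_val, model.predict(X_val)))
```

The validation MAE, not the training MAE, is the number that estimates real-world performance.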

The concepts of underfitting and overfitting follow from here:

**Overfitting:** The scenario where a machine learning model matches the training data almost exactly but performs very poorly when it encounters new data or a validation set.

**Underfitting:** The scenario where a machine learning model is unable to capture the important patterns and insights from the data, which results in the model performing poorly even on the training data itself.

**P.S.** In the upcoming posts, we will understand how to fit the model in the right way using many methods like feature normalization, feature generation, and much more.

The Mean Squared Error (MSE) is the mean of the squared differences between the actual values and the predicted values.

Steps to calculate the MSE from a set of X and Y values:

- First, find the regression line.
- Insert the X values into the linear regression equation to find the new Y values (Y′).
- Subtract the new Y values from the original values to get the errors.
- Square the errors.
- Add up the squared errors.
- Find the mean.
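The steps above can be followed literally on a small toy dataset (the X/Y values below are made up for illustration):

```python
import numpy as np

# toy X/Y values, roughly following the line Y = 2X
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Step 1: find the regression line (slope and intercept via least squares)
slope, intercept = np.polyfit(X, Y, 1)

# Step 2: insert the X values into the line to get the new Y values (Y')
Y_pred = slope * X + intercept

# Steps 3-6: subtract, square, add up, take the mean
errors = Y - Y_pred
mse = np.mean(errors ** 2)
print(mse)
```

Because the toy data lies very close to a straight line, the resulting MSE is small.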

Hence, **MSE = (1/N) × Σ (Yᵢ – Y′ᵢ)²**

Here `N` is the total number of observations/rows in the dataset. The `Σ` (sigma) symbol denotes the sum of the squared differences between the actual and predicted values, taken over every `i` ranging from **1 to N**.

This can be implemented using **sklearn**'s `mean_squared_error` method:

```
from sklearn.metrics import mean_squared_error
actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
mean_squared_error(actual_values, predicted_values)  # 0.375
```

In most regression problems, mean squared error is used to determine the model's performance.

RMSE (Root Mean Squared Error) is the standard deviation of the prediction errors (residuals) that occur when a prediction is made on a dataset. It is the same as MSE, except that the square root of the value is taken when determining the accuracy of the model.

```
from sklearn.metrics import mean_squared_error
from math import sqrt

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
mse = mean_squared_error(actual_values, predicted_values)

# taking the square root of the mean squared error
root_mean_squared_error = sqrt(mse)  # ≈ 0.612
```

R-squared (R²) is also known as the **coefficient of determination**. This metric indicates how well a model fits a given dataset, i.e. how close the regression line (the plotted predicted values) is to the actual data values. The **R² value typically lies between 0 and 1**, where 0 indicates that the model doesn't fit the given data and 1 indicates that it fits the dataset perfectly (it can even be negative for a model that fits worse than simply predicting the mean).

```
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score  # cross_validation was renamed

X = np.random.randn(100, 1)  # sklearn expects 2-D feature arrays
y = np.random.randn(100)  # y has nothing to do with X whatsoever
scores = cross_val_score(LinearRegression(), X, y, scoring='r2')
# scores hover around (or below) zero, since there is no real relationship
```
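R² can also be computed directly from its definition: 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper below is a from-scratch sketch of that formula:

```python
import numpy as np

def r_squared(actual, predicted):
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1 - ss_res / ss_tot

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]
print(r_squared(actual_values, predicted_values))  # ≈ 0.9486
```

This matches what sklearn's `r2_score` returns for the same inputs.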

**MAE:** It is not very sensitive to outliers in comparison to MSE, since it doesn't punish huge errors disproportionately. It is usually used when performance is measured on continuous variable data. It gives a linear score that weights all individual differences equally when averaging. The lower the value, the better the model's performance.

**MSE:** It is one of the most commonly used metrics, but it is least useful when the dataset contains a lot of noise, since a single bad prediction can dominate the score and ruin the metric for an otherwise good model. It is most useful when large errors are particularly undesirable and should be penalized heavily.

**RMSE:** In RMSE, the errors are squared before they are averaged. This basically implies that RMSE assigns a higher weight to larger errors. This indicates that RMSE is much more useful when large errors are present and they drastically affect the model's performance. It avoids taking the absolute value of the error and this trait is useful in many mathematical calculations. In this metric also, the lower the value, the better the performance of the model.
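The difference in outlier sensitivity can be seen on a small example: two sets of predictions with the same total absolute error, one of which concentrates the error in a single sample (the values below are made up for illustration):

```python
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error

actual = [3.0, 5.0, 7.0, 9.0]
good_preds = [3.5, 4.5, 7.5, 8.5]  # every error is 0.5
bad_preds = [3.0, 5.0, 7.0, 11.0]  # one large error of 2.0

for preds in (good_preds, bad_preds):
    mae = mean_absolute_error(actual, preds)
    rmse = sqrt(mean_squared_error(actual, preds))
    print(f"MAE={mae:.2f}  RMSE={rmse:.2f}")
```

Both prediction sets have the same MAE (0.5), but RMSE doubles for the second set because squaring amplifies the single large error.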

Mean Squared Error (MSE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-Squared (R²) are all popular metrics for assessing the accuracy of regression models. MSE and MAE report the average (squared or absolute) difference between predicted and actual values, whereas RMSE reports similar information in the same unit as the target variable. R² is the proportion of variance in the target variable explained by the model. These metrics are useful for evaluating model performance and comparing different models.

MSE measures the average squared difference between predicted and actual values.

RMSE is the square root of the average squared difference between predicted and actual values.

R² measures the proportion of variance in the target variable that is explained by the model.

MAE measures the average absolute difference between predicted and actual values, providing a more easily interpretable metric for non-normal distributions.
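All four metrics can be computed side by side on the same small sample used earlier:

```python
from math import sqrt
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

actual_values = [3, -0.5, 2, 7]
predicted_values = [2.5, 0.0, 2, 8]

mae = mean_absolute_error(actual_values, predicted_values)
mse = mean_squared_error(actual_values, predicted_values)
rmse = sqrt(mse)
r2 = r2_score(actual_values, predicted_values)
print(mae, mse, rmse, r2)  # 0.5 0.375 0.612... 0.948...
```

Note that RMSE (≈0.612) lands back in the same unit as the target values, which is why it is often easier to interpret than MSE.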

