
3 Regression Metrics You Must Know: MAE, MSE, and RMSE

Yashmeet Singh · 9 minute read


Let’s say you’ve built a new machine learning model. How do you know if it’s going to make good predictions?

How can you measure your model’s expected performance in the real world?

Today we’ll address this question for regression models. Specifically, we’ll look at three widely used regression metrics:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)

Then I’ll show you how to calculate these metrics using Python and Scikit-Learn.

Let’s get started!

Image: Regression metrics (MAE, MSE, RMSE) explained using a model that predicts airfare. Image credit: Manfred Irmer

Regression Error

In statistics and machine learning, regression refers to a set of techniques used to predict a numerical value based on some inputs.

Suppose you want to train a model to predict airfare for US domestic flights. That would be a regression task because the output (airfare) can take on any value, say, from $100 to $1,000.

Once you’ve trained the model, you must measure its performance using a test dataset.

Let’s say we have a test dataset with 10 entries. We use the model to predict price for each entry. And then compare our predictions against the actual prices:

TABLE 1: Actual vs Predicted Values

Entry   Actual ($)   Predicted ($)
1       250          265
2       110          140
3       500          480
4       200          215
5       330          290
6       490          515
7       670          750
8       210          210
9       435          420
10      375          285

How far off were our predictions from the actual prices? Let’s find out the prediction error for each entry using the below formula:

$$\text{Error} = \text{Actual Value} - \text{Predicted Value}$$

TABLE 2: Prediction Errors

Entry   Actual ($)   Predicted ($)   Error
1       250          265             -15
2       110          140             -30
3       500          480             20
4       200          215             -15
5       330          290             40
6       490          515             -25
7       670          750             -80
8       210          210             0
9       435          420             15
10      375          285             90
Total                                0

The model over-predicted for some entries (1, 2, 4, 6, and 7). That is, the prediction was higher than the actual value. Thus the error is negative.

For other entries (3, 5, 9, and 10), the model under-predicted as its prediction was lower than the actual value. Hence the error is positive.

Knowing individual errors for each entry is fine. But how can we combine all of these errors to give us one metric?

What if we add up all the errors? That won’t work. The negative errors would cancel out the positive errors.

For example, the sum of all errors in TABLE 2 is 0. That would lead us to believe that our model is perfect. That’s not true, though - the model made incorrect predictions for 9 out of 10 test cases!
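
You can reproduce this cancellation yourself. Here's a minimal sketch in plain Python that uses the ten rows from TABLE 2 (the variable names are my own, chosen only for illustration):

```python
# Actual and predicted airfares for the ten test entries (TABLE 1)
actual = [250, 110, 500, 200, 330, 490, 670, 210, 435, 375]
predicted = [265, 140, 480, 215, 290, 515, 750, 210, 420, 285]

# Error = Actual Value - Predicted Value, one per entry
errors = [a - p for a, p in zip(actual, predicted)]

# Adding the signed errors lets negatives cancel positives
total_error = sum(errors)

# Count how many predictions were actually wrong
wrong_predictions = sum(1 for e in errors if e != 0)

print(errors)
print(total_error)        # 0
print(wrong_predictions)  # 9
```

The total is 0 even though nine of the ten predictions missed, which is exactly why a plain sum is a misleading summary.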

Let’s try a different approach.

Mean Absolute Error (MAE)

As we saw above, the prediction error can be positive or negative. But what if we focus only on the size of the error and ignore the sign? That is, we measure the absolute value of the error.

In that case, we’ll treat two errors the same if they have equal size but only differ in sign (e.g., -80 and +80). Both are equally off from the expected value.

We can get absolute errors by dropping the sign from all the negative values:

TABLE 3: Absolute Errors

Entry   Actual ($)   Predicted ($)   Error   Absolute Error
1       250          265             -15     15
2       110          140             -30     30
3       500          480             20      20
4       200          215             -15     15
5       330          290             40      40
6       490          515             -25     25
7       670          750             -80     80
8       210          210             0       0
9       435          420             15      15
10      375          285             90      90
Total                                0       330

And then divide the sum of absolute errors by the number of predictions. That’ll give us the Mean Absolute Error (MAE):

$$\text{Mean Absolute Error (MAE)} = \frac{\text{Sum of Absolute Errors}}{\text{Number of Predictions}}$$

Let’s apply this formula to our example:

$$\text{MAE} = \frac{330}{10} = 33$$

Thus, the MAE for our model is 33. On this test set, the predicted ticket price was off from the actual price by $33 on average.
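
The same calculation takes only a few lines of plain Python. This is a sketch with illustrative variable names, using the errors from TABLE 3:

```python
# Prediction errors (Actual - Predicted) for the ten test entries
errors = [-15, -30, 20, -15, 40, -25, -80, 0, 15, 90]

# Drop the sign so over- and under-predictions no longer cancel
absolute_errors = [abs(e) for e in errors]

# MAE = sum of absolute errors / number of predictions
mae = sum(absolute_errors) / len(absolute_errors)

print(sum(absolute_errors))  # 330
print(mae)                   # 33.0
```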

Mean Squared Error (MSE)

MAE treats absolute errors linearly - a change in the error will have a proportional effect on MAE. For example, an error of 40 is twice as bad as an error of 20.

In reality, however, we want to build models that don’t generate larger errors too often. Thus we need a metric that penalizes larger errors more harshly than smaller ones.

We can create such a metric using the square of the errors. That'll ensure that a larger error produces a far more pronounced effect.

Consider two error values - 20 and 40. Their squared values are 400 and 1600, respectively. Even though 40 is only twice 20, it contributes 4 times as much to the total squared error.

Let’s calculate the square of errors for the airfare model:

TABLE 4: Squared Errors

Entry   Actual ($)   Predicted ($)   Error   Absolute Error   Squared Error
1       250          265             -15     15               225
2       110          140             -30     30               900
3       500          480             20      20               400
4       200          215             -15     15               225
5       330          290             40      40               1600
6       490          515             -25     25               625
7       670          750             -80     80               6400
8       210          210             0       0                0
9       435          420             15      15               225
10      375          285             90      90               8100
Total                                0       330              18700

Dividing the sum of squared errors by the number of predictions will give us the Mean Squared Error (MSE):

$$\text{Mean Squared Error (MSE)} = \frac{\text{Sum of Squared Errors}}{\text{Number of Predictions}}$$

Let’s apply this formula to our example problem:

$$\text{MSE} = \frac{18700}{10} = 1870$$
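
Here's a small plain-Python sketch of that calculation (variable names are illustrative). It also checks the 20 vs. 40 comparison from earlier:

```python
# Prediction errors (Actual - Predicted) for the ten test entries
errors = [-15, -30, 20, -15, 40, -25, -80, 0, 15, 90]

# Squaring removes the sign and magnifies large errors
squared_errors = [e ** 2 for e in errors]

# MSE = sum of squared errors / number of predictions
mse = sum(squared_errors) / len(squared_errors)

# An error twice as large contributes four times as much
penalty_ratio = (40 ** 2) / (20 ** 2)

print(sum(squared_errors))  # 18700
print(mse)                  # 1870.0
print(penalty_ratio)        # 4.0
```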

Root Mean Squared Error (RMSE)

MSE is a helpful metric, but it is hard to interpret. By definition, it involves squaring the error terms. Thus MSE doesn't have the same units as the value we want to predict.

For example, the MSE for our airfare prediction model is 1870. We cannot report it in dollar terms because its units are squared dollars: an MSE of 1870 is meaningless when the price range is $100 - $1,000.

It’s easy to convert MSE to a value that we can understand. Taking a square root of MSE will give us Root Mean Squared Error (RMSE):

$$\text{Root Mean Squared Error (RMSE)} = \sqrt{\text{MSE}}$$

Here’s the RMSE for our model:

$$\text{RMSE} = \sqrt{1870} \approx 43.24$$

This value makes sense. We can report that RMSE for our model is $43.24.
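
As a quick check, here's a sketch of the conversion using Python's standard math module (the variable names are mine):

```python
import math

# MSE of the airfare model, in squared dollars
mse = 1870

# The square root brings the metric back to dollars
rmse = math.sqrt(mse)

print(round(rmse, 2))  # 43.24
```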


Our model’s RMSE ($43.24) is significantly higher than the MAE ($33). Why is that?

Notice in TABLE 4 that we have two absolute errors (80 and 90) that are much larger than the others.

When we square all the errors to find RMSE, these two large errors dominate the others (see the last column in TABLE 4). Hence, they push RMSE to a considerably higher value than MAE.

This explains why RMSE would be a superior metric when we want to minimize larger errors.
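
To see how much those two errors matter, here's a rough sketch in plain Python. The helper functions mae and rmse are my own illustrative definitions, not library calls. The code measures what share of each total comes from the two largest errors, then recomputes both metrics without them:

```python
import math

# Prediction errors from TABLE 2
errors = [-15, -30, 20, -15, 40, -25, -80, 0, 15, 90]

def mae(errs):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errs) / len(errs)

def rmse(errs):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e ** 2 for e in errs) / len(errs))

# The two largest errors by size (90 and -80)
largest = sorted(errors, key=abs, reverse=True)[:2]

# Share of each total contributed by those two errors
abs_share = sum(abs(e) for e in largest) / sum(abs(e) for e in errors)
sq_share = sum(e ** 2 for e in largest) / sum(e ** 2 for e in errors)
print(round(abs_share, 2), round(sq_share, 2))  # 0.52 0.78

# Without the two largest errors, the gap between the metrics narrows
rest = [e for e in errors if e not in largest]
print(mae(errors), round(rmse(errors), 2))  # 33.0 43.24
print(mae(rest), round(rmse(rest), 2))      # 20.0 22.91
```

The two big misses account for roughly half of the total absolute error but more than three quarters of the total squared error. Remove them and RMSE drops much closer to MAE.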

Practice using Python & Scikit-Learn

Now you are familiar with the regression metrics MAE, MSE, and RMSE. Let’s learn how to calculate them using Python and Scikit-Learn.

Load Dataset

We'll use a Kaggle dataset that contains height and weight measurements for 25,000 individuals.

We’ll first train a model to predict a person’s weight based on height. Then we’ll calculate the metrics to evaluate the model.

First off, let’s load the dataset using pandas:

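import pandas as pd
dataset = pd.read_csv(
    # Placeholder path: point this at your own copy of the Kaggle CSV
    'heights_weights.csv',
    # dataset has an extra index column. We don't need it.
    # Just load height and weight columns
    usecols=[1, 2]
)
dataset.info()
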
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 2 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Height(Inches)  25000 non-null  float64
 1   Weight(Pounds)  25000 non-null  float64
dtypes: float64(2)
memory usage: 390.8 KB

The data types for both columns look good. And there are no missing values.

Next, use the seaborn scatterplot to see if heights and weights are associated:

import seaborn as sns
sns.scatterplot(data=dataset, x='Height(Inches)', y='Weight(Pounds)')

Figure: Scatterplot of height vs. weight, plotted using Seaborn scatterplot()

The weight generally goes up as the height increases. So a machine learning model should be able to capture this pattern and predict the weight with reasonable accuracy.

Build Regression Model

Let’s use linear regression to build the model. First, we store the inputs and output in separate variables:

# Input
X = dataset['Height(Inches)']
# Output
y = dataset['Weight(Pounds)']

Next, split the dataset into training and test sets. We’ll use the training set to build the model. And then evaluate the model using the test set.

from sklearn.model_selection import train_test_split
# 67% - training set (X_train, y_train)
# 33% - test set (X_test, y_test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# X_train and X_test are instances of pandas Series because
# they contain only one column. Convert them to DataFrames
X_train = X_train.to_frame()
X_test = X_test.to_frame()
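
One thing to keep in mind: train_test_split shuffles the rows randomly, so each run produces a different split and slightly different metrics. If you want repeatable results, you can pass a fixed random_state. The sketch below is my own addition and uses made-up toy lists rather than the Kaggle data; the seed value 42 is arbitrary:

```python
from sklearn.model_selection import train_test_split

# Toy stand-in data (hypothetical), not the Kaggle dataset
heights = list(range(100))
weights = [2 * h for h in heights]

# Same random_state -> same shuffle -> identical splits
split_a = train_test_split(heights, weights, test_size=0.33, random_state=42)
split_b = train_test_split(heights, weights, test_size=0.33, random_state=42)

X_train, X_test, y_train, y_test = split_a

print(len(X_train), len(X_test))  # 67 33
print(split_a == split_b)         # True
```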

Finally, create and train a model using Scikit-Learn’s LinearRegression:

from sklearn.linear_model import LinearRegression
# Create a new model
model = LinearRegression()
# Build the model using the training data
model.fit(X_train, y_train)

Calculate Metrics - MAE, MSE, and RMSE

We now have a fully trained model. It's time to measure its performance using the metrics we learned today.

First, let’s predict the weights for the test set:

predicted = model.predict(X_test)
actual_vs_predicted = pd.DataFrame(
    {'Actual': y_test,
     'Predicted': predicted}
)
# Show first 5 rows
actual_vs_predicted.head()

        Actual   Predicted
7799    127.88   126.79
4427    108.97   117.22
14941   122.29   127.96
11644   118.53   124.57
15548   120.58   126.88

Scikit-Learn provides built-in functions to calculate a variety of metrics. Let’s import two of them we’ll use today:

from sklearn.metrics import (
    mean_absolute_error, # MAE
    mean_squared_error # MSE
)

First, we’ll compute Mean Absolute Error (MAE) using the function mean_absolute_error:

MAE = mean_absolute_error(
    y_true=y_test, # actual values
    y_pred=predicted # predicted values
)

And then calculate Mean Squared Error (MSE) using mean_squared_error:

MSE = mean_squared_error(
    y_true=y_test, # actual values
    y_pred=predicted # predicted values
)

Older versions of Scikit-Learn don't provide a dedicated function to compute Root Mean Squared Error (RMSE), while version 1.4 and later include root_mean_squared_error. Either way, we can get RMSE by taking the square root of MSE:

# Square root of MSE gives RMSE
RMSE = MSE**(1/2)

Thus our model will predict weights with MAE and RMSE of 8.06 and 10.13 pounds, respectively.
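
If you'd like to convince yourself that these functions match the hand calculations from earlier, you can feed them the airfare numbers from TABLE 1. This is just a sanity-check sketch of my own; it assumes Scikit-Learn is installed and uses only the two functions imported above:

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Actual and predicted airfares from TABLE 1
actual = [250, 110, 500, 200, 330, 490, 670, 210, 435, 375]
predicted = [265, 140, 480, 215, 290, 515, 750, 210, 420, 285]

# Convert to plain floats for tidy printing
mae = float(mean_absolute_error(actual, predicted))
mse = float(mean_squared_error(actual, predicted))

# Square root of MSE gives RMSE
rmse = mse ** (1 / 2)

print(mae)             # 33.0
print(mse)             # 1870.0
print(round(rmse, 2))  # 43.24
```

These are the same values we computed by hand: an MAE of 33, an MSE of 1870, and an RMSE of about 43.24.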

Summary & Next Steps

This post introduced the most commonly used metrics to evaluate regression models. Let’s recap what you learned today:

  • What is regression prediction error?
  • How to use prediction errors to calculate MAE, MSE, and RMSE.
  • MAE vs. RMSE: what’s the difference, and why does it matter?
  • How to compute these metrics using Python and Scikit-Learn’s built-in functions.

Now that you know regression metrics, you might wonder: what about classification models - how do I evaluate them? You can learn all about that here and here.

Title Image by babilkulesi