## Introduction 🔗

Let’s say you’ve built a new machine learning model. How do you know if it’s going to make good predictions?

**How can you measure your model’s expected performance in the real world?**

Today we’ll address this question for regression models. Specifically, we’ll look at three widely used regression metrics:

- Mean Absolute Error (
**MAE**) - Mean Squared Error (
**MSE**) - Root Mean Squared Error (
**RMSE**)

Then I’ll show you how to calculate these metrics using **Python** and **Scikit-Learn**.

Let’s get started!

Image Credit: Manfred Irmer

## Regression Error 🔗

In statistics and machine learning, **regression** refers to a set of techniques used to **predict a numerical value** based on some inputs.

Suppose you want to train a model to predict airfare for US domestic flights. That would be a regression task because the output (airfare) can take on any value, say, from $100 to $1,000.

Once you’ve trained the model, you must measure its performance using a **test dataset**.

Let’s say we have a test dataset with 10 entries. We use the model to predict price for each entry. And then compare our predictions against the actual prices:

Ticket # |
Actual Price |
Predicted Price |
---|---|---|

1 | 250 | 265 |

2 | 110 | 140 |

3 | 500 | 480 |

4 | 200 | 215 |

5 | 330 | 290 |

6 | 490 | 515 |

7 | 670 | 750 |

8 | 210 | 210 |

9 | 435 | 420 |

10 | 375 | 285 |

How far off were our predictions from the actual prices? Let’s find out the **prediction error** for each entry using the below formula:

$$ Error = Actual \medspace Value - Predicted \medspace Value$$

Ticket # | Actual Price | Predicted Price | Error |
---|---|---|---|

1 | 250 | 265 | -15 |

2 | 110 | 140 | -30 |

3 | 500 | 480 | 20 |

4 | 200 | 215 | -15 |

5 | 330 | 290 | 40 |

6 | 490 | 515 | -25 |

7 | 670 | 750 | -80 |

8 | 210 | 210 | 0 |

9 | 435 | 420 | 15 |

10 | 375 | 285 | 90 |

Total | 0 |

The model **over-predicted** for some entries (in red). That is, the prediction was higher than the actual value. Thus the error is **negative**.

For other cases (in yellow), the model **under-predicted** as its prediction was lower than the actual value. Hence the error is **positive**.

Knowing individual errors for each entry is fine. But **how can we combine all of these errors to give us one metric?**

What if we add up all the errors? That won’t work. The negative errors would cancel out the positive errors.

For example, the sum of all errors in **TABLE 2** is 0. That would lead us to believe that our model is perfect. That’s not true, though - the model made incorrect predictions for 9 out of 10 test cases!

Let’s try a different approach.

## Mean Absolute Error (MAE) 🔗

As we saw above, the prediction error can be positive or negative. But **what if we focus only on the size of the error and ignore the sign?** That is, we measure the **absolute value** of the error.

In that case, we’ll treat two errors the same if they have equal size but only differ in sign (e.g., **-80** and **+80**). Both are equally off from the expected value.

We can get absolute errors by dropping the sign from all the negative values:

Ticket # | Actual Price | Predicted Price | Error | Absolute Error |
---|---|---|---|---|

1 | 250 | 265 | -15 | 15 |

2 | 110 | 140 | -30 | 30 |

3 | 500 | 480 | 20 | 20 |

4 | 200 | 215 | -15 | 15 |

5 | 330 | 290 | 40 | 40 |

6 | 490 | 515 | -25 | 25 |

7 | 670 | 750 | -80 | 80 |

8 | 210 | 210 | 0 | 0 |

9 | 435 | 420 | 15 | 15 |

10 | 375 | 285 | 90 | 90 |

Total | 0 | 330 |

And then divide the sum of absolute errors by the number of predictions. That’ll give us the **Mean Absolute Error (MAE)**:

$$ Mean \medspace Absolute \medspace Error \medspace (MAE) = \frac{Sum \medspace of \medspace Absolute \medspace Errors}{Number \medspace of \medspace Predictions}$$

Let’s apply this formula to our example:

$$ MAE = \frac{330}{10} = 33 $$

Thus, the **MAE** for our model is 33. The average difference between the predicted and actual ticket prices will be $33.

## Mean Squared Error 🔗

**MAE** treats absolute errors linearly - a change in the error will have a proportional effect on **MAE**. For example, an error of 40 is twice as bad as an error of 20.

In reality, however, we want to build models that don’t generate larger errors too often. Thus **we need a metric that penalizes larger errors more harshly than smaller ones**.

Wc can create a metric using the **square of errors**. That’ll ensure that a larger error will produce a far more pronounced effect.

Consider two error values - 20 and 40. Their squared values are 400 and 1600, respectively. Even though 40 is twice of 20, it’ll contribute 4 times to the total squared error.

Let’s calculate the **square of errors** for the airfare model:

Ticket # | Actual Price | Predicted Price | Error | Absolute Error | Squared Error |
---|---|---|---|---|---|

1 | 250 | 265 | -15 | 15 | 225 |

2 | 110 | 140 | -30 | 30 | 900 |

3 | 500 | 480 | 20 | 20 | 400 |

4 | 200 | 215 | -15 | 15 | 225 |

5 | 330 | 290 | 40 | 40 | 1600 |

6 | 490 | 515 | -25 | 25 | 625 |

7 | 670 | 750 | -80 | 80 | 6400 |

8 | 210 | 210 | 0 | 0 | 0 |

9 | 435 | 420 | 15 | 15 | 225 |

10 | 375 | 285 | 90 | 90 | 8100 |

Total | 0 | 330 | 18700 |

Dividing the sum of squared errors by the number of predictions will give us the **Mean Squared Error (MSE)**:

$$ Mean \medspace Squared \medspace Error \medspace (MSE) = \frac{Sum \medspace of \medspace Squared \medspace Errors}{Number \medspace of \medspace Predictions} $$

Let’s apply this formula to our example problem:

$$ MSE = \frac{18700}{10} = 1870 $$

## Root Mean Squared Error (RMSE) 🔗

**MSE** is a helpful metric, but **it is hard to interpret**. It, by definition, involved squaring of error terms. Thus **MSE** doesn’t have the same units as the value we want to predict.

For example, the **MSE** for our airfare prediction model is 1870. We cannot report it in dollar terms: an **MSE** of 1870 is meaningless when the price range is $100 - $1,000.

It’s easy to convert **MSE** to a value that we can understand. Taking a square root of **MSE** will give us **Root Mean Squared Error (RMSE)**:

$$ Root \medspace Mean \medspace Squared \medspace Error \medspace (RMSE) = \sqrt{ MSE }$$

Here’s the **RMSE** for our model:

$$ RMSE = \sqrt{1870} = 43.24 $$

This value makes sense. We can report that **RMSE** for our model is $43.24.

## MAE vs. RMSE 🔗

Our model’s **RMSE** ($43.24) is significantly higher than the **MAE** ($33). Why is that?

Notice in **TABLE 4** that we have two absolute errors (**80** and **90**) that are much larger than the others.

When we square all the errors to find **RMSE**, these two large errors dominate the others (see the last column in **TABLE 4**). Hence, they push **RMSE** to a considerably higher value than **MAE**.

This explains why **RMSE** would be a superior metric when we want to minimize larger errors.

## Practice using Python & Scikit-Learn 🔗

Now you are familiar with the regression metrics **MAE**, **MSE**, and **RMSE**. Let’s learn how to calculate them using **Python** and **Scikit-Learn**.

### Load Dataset 🔗

We’ll use a kaggle dataset that contains heights and weights measurements for 25,000 individuals.

We’ll first train a model to predict a person’s weight based on height. Then we’ll calculate the metrics to evaluate the model.

First off, let’s load the dataset using pandas:

```
import pandas as pd
dataset = pd.read_csv(
'SOCR-HeightWeight.csv',
# dataset has an extra index column. We don't need it.
# Just load height and weight columns
usecols=[1, 2]
)
dataset.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Height(Inches) 25000 non-null float64
1 Weight(Pounds) 25000 non-null float64
dtypes: float64(2)
memory usage: 390.8 KB
```

The data types for both columns look good. And there are no missing values.

Next, use the seaborn scatterplot to see if heights and weights are associated:

```
import seaborn as sns
sns.scatterplot(
x=dataset['Height(Inches)'],
y=dataset['Weight(Pounds)'],
)
```

The weight generally goes up as the height increases. So a machine learning model should be able to capture this pattern and predict the weight with reasonable accuracy.

### Build Regression Model 🔗

Let’s use linear regression to build the model. First, we store the inputs and output in separate variables:

```
# Input
X = dataset['Height(Inches)']
# Output
y = dataset['Weight(Pounds)']
```

Next, split the dataset into training and test sets. We’ll use the training set to build the model. And then evaluate the model using the test set.

```
from sklearn.model_selection import train_test_split
# 67% - training set (X_train, y_train)
# 33% - test set (X_test, y_test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)
# X_train and X_test are instances of pandas Series because
# they contain only one column. Convert them to DataFrames
X_train = X_train.to_frame()
X_test = X_test.to_frame()
```

Finally, create and train a model using **Scikit-Learn’s** **LinearRegression**:

```
from sklearn.linear_model import LinearRegression
# Create a new model
model = LinearRegression()
# build the model using the traing data
model.fit(X_train, y_train)
```

### Calculate Metrics - MAE, MSE, and RMSE 🔗

We now have a fully trained model. Its time to measure it’s performance using the metrics we learned today.

First, let’s predict the weights for the test set:

```
predicted = model.predict(X_test)
actual_vs_predicted = pd.DataFrame(
{'Actual': y_test,
'Predicted':predicted}
)
# Show first 5 rows
actual_vs_predicted.head().round(2)
```

Actual | Predicted | |
---|---|---|

7799 | 127.88 | 126.79 |

4427 | 108.97 | 117.22 |

14941 | 122.29 | 127.96 |

11644 | 118.53 | 124.57 |

15548 | 120.58 | 126.88 |

Scikit-Learn provides **built-in functions** to calculate a variety of metrics. Let’s import two of them we’ll use today:

```
from sklearn.metrics import (
mean_absolute_error, # MAE
mean_squared_error # MSE
)
```

First, we’ll compute **Mean Absolute Error (MAE)** using the function **mean_absolute_error**:

```
MAE = mean_absolute_error(
y_true=y_test, # actual values
y_pred=predicted # predicted values
)
MAE.round(2)
```

```
8.06
```

And then calculate **Mean Squared Error (MSE)** using **mean_squared_error**:

```
MSE = mean_squared_error(
y_true=y_test, # actual values
y_pred=predicted # predicted values
)
MSE.round(2)
```

```
102.62
```

**Scikit-Learn** doesn’t provide a function to provide **Root Mean Squared Error (RMSE)**. But we can get **RMSE** by taking a square root of **MSE**:

```
# Square root of MSE gives RMSE
RMSE = MSE**(1/2)
RMSE.round(2)
```

```
10.13
```

Thus our model will predict weights with **MAE** and **RMSE** of 8.06 and 10.13 pounds, respectively.

## Summary & Next Steps 🔗

This post introduced the most commonly used metrics to evaluate regression models. Let’s recap what you learned today:

- What is regression
**prediction error**? - How to use prediction errors to calculate
**MAE**,**MSE**, and**RMSE**. **MAE**vs.**RMSE**: what’s the difference, and why does it matter?- How to compute these metrics using
**Python**and**Scikit-Learn**’s built-in functions.

Now that you know regression metrics, you might wonder: * what about classification models - how do I evaluate them?* You can learn all about that

**here**and

**here**.