MAPE v/s MAE% v/s RMSE

What does MAPE-puts-a-larger-penalty-on-negative-errors mean!?

Agrima Bahl
3 min read · Nov 22, 2019

In machine learning or forecasting, the error measurement between the estimated value and the actual value is useful both to assess the performance of the model and to define its objective function.

Mean Absolute Error (MAE) is a widely used, simple measure of error. MAE is the mean of the absolute error, i.e.

MAE = (1/n) Σᵗ |aᵗ − fᵗ|

where aᵗ is the actual value, fᵗ the estimated (forecast) value, and n the number of data points.

MAE is simple to compute, available in Scikit-learn, and works fairly well for regression workflows. However, it comes with limitations -

  1. MAE is on the same scale as the data being measured, so it provides little insight when comparing errors across data of different scales.
  2. It treats large errors/outliers and small errors the same way. If different treatment is required, RMSE or a customized objective function can be explored.
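The first limitation can be sketched in a few lines. The helper below is a plain NumPy version of MAE (Scikit-learn's `mean_absolute_error` does the same thing); the two calls show models with the same relative accuracy producing MAEs that are two orders of magnitude apart, purely because of scale:

```python
import numpy as np

def mae(actual, forecast):
    """Mean Absolute Error: the mean of |actual - forecast|."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(actual - forecast))

# Both models are off by roughly 10% on every point, but the raw
# MAE values are not comparable because the data scales differ.
print(mae([100, 200, 300], [110, 190, 310]))  # 10.0
print(mae([1.0, 2.0, 3.0], [1.1, 1.9, 3.1]))  # ~0.1
```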

The first issue can be resolved by normalizing the error. This is commonly done in one of two ways: Mean Absolute Percentage Error (MAPE) and normalized Mean Absolute Error (nMAE or MAE%).

MAPE is the absolute error normalized over the actual value, computed for every data point and then averaged, which allows the error to be compared across data with different scales:

MAPE = (1/n) Σᵗ |aᵗ − fᵗ| / |aᵗ|

nMAE is different from MAPE in that the mean absolute error is normalized over the mean of all the actual values, i.e.

nMAE = MAE / mean(aᵗ) = Σᵗ |aᵗ − fᵗ| / Σᵗ aᵗ

This small difference in how the error is computed can produce very different results, especially when the measure is used as an objective function. Because MAPE is computed for every data point and then averaged, it is more sensitive to individual errors and outliers. nMAE, on the other hand, can lose some of that detail because the errors are aggregated before averaging.
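The divergence is easy to demonstrate. In this sketch (the data is made up to illustrate the point), a single small-valued point with a 100% relative error dominates MAPE while barely moving nMAE:

```python
import numpy as np

def mape(actual, forecast):
    """Per-point percentage errors, then averaged."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(actual - forecast) / np.abs(actual))

def nmae(actual, forecast):
    """Total absolute error normalized by the total of the actuals."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.sum(np.abs(actual - forecast)) / np.sum(actual)

actual   = [100, 100, 100, 1]
forecast = [100, 100, 100, 2]

# The last point has a 100% relative error; the per-point errors
# are (0, 0, 0, 1.0), so MAPE averages to 0.25. nMAE sees a total
# error of 1 against a total of 301, i.e. about 0.003.
print(mape(actual, forecast))  # 0.25
print(nmae(actual, forecast))  # ~0.0033
```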

MAPE however comes with its share of drawbacks -

  1. To compute MAPE, data points with the actual value zero need to be excluded to avoid a division by zero error.
  2. MAPE puts a larger penalty on negative errors 🙈. What this means is that for the same absolute error, the percentage error is higher when aᵗ < fᵗ than when aᵗ > fᵗ.

For example, for an actual value of 100 and an estimated value of 90, the MAPE is 0.10. For the same estimated value and an actual value of 80, the MAPE is 0.125. Therefore, when MAPE is used as an objective function, the estimator prefers smaller estimates and can be biased towards under-forecasting.
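The asymmetry from the example above can be verified directly. The same absolute error of 10, against the same forecast of 90, yields a larger percentage error when the actual value is on the smaller side:

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error over paired arrays."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.abs(actual - forecast) / np.abs(actual))

# Identical absolute error (10) and forecast (90) in both cases:
print(mape([100], [90]))  # actual > forecast: 10/100 = 0.10
print(mape([80], [90]))   # actual < forecast: 10/80  = 0.125
```

Note also that over-forecasting is unbounded (the percentage error grows without limit as the forecast increases), while under-forecasting is capped at 100% when the forecast reaches zero.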

Different error measures target different requirements depending on the dataset. MAPE and nMAE are not available in scikit-learn, so explaining the exact computation can be more useful than relying on nomenclature alone.
