Box-Cox Transformation

A Box-Cox transformation reshapes a non-normal dependent variable so that it more closely follows a normal distribution. Normality is a key assumption in many statistical techniques. If your data isn’t normal, applying a Box-Cox transformation lets you run a wider range of tests.

The Box-Cox transformation is named after the statisticians George Box and Sir David Roxbee Cox, who developed the technique together in a 1964 paper.

Re-reading has recently become a habit of mine. I picked up Forecasting: Principles and Practice again to refresh my knowledge of forecasting (who would have guessed).

The textbook is concise, simple, and easy for beginners to use. It’s a great addition to the myriad of books, YouTube videos, and MOOCs that data scientists “must” know.

The textbook is also free.

The authors devote one sub-chapter to transforming data (Section 3.2: “Transformations & Adjustments”), in which they cover four types of transformations.

The Box-Cox transformation is one of these, and the first one I was exposed to during my undergrad.

Why would we want to transform our data?

The Box-Cox method transforms data so that it more closely matches a normal distribution.

Many statistical techniques assume that errors are normally distributed. This assumption allows us to construct confidence intervals and conduct hypothesis tests. If your target variable is not normal, transforming it can bring your errors closer to a normal distribution.

Transforming our variables can also improve the predictive power of our models, as a transformation can stabilize the variance and dampen noise in the data.

What’s the Box-Cox Transformation?

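The transformation is defined as:

$$
w_t = \begin{cases} \log(y_t) & \text{if } \lambda = 0, \\ \dfrac{y_t^{\lambda} - 1}{\lambda} & \text{otherwise.} \end{cases}
$$

Here w_t is our transformed variable, y_t is the target variable, t is the time period, and lambda is the parameter we choose. The logarithm is the natural logarithm. You can also perform the Box-Cox transformation on non-time-series data.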
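To make the definition concrete, here is a minimal Python sketch of the transformation as it is usually written: the natural log when lambda is 0, and (y^lambda - 1) / lambda otherwise. The function names `box_cox` and `inverse_box_cox` and the sample values are my own, chosen for illustration, and the sketch assumes strictly positive data.

```python
import math

def box_cox(values, lam):
    """Apply the one-parameter Box-Cox transformation to positive values."""
    # Assumption: the plain Box-Cox form only handles strictly positive data.
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def inverse_box_cox(transformed, lam):
    """Map transformed values back to the original scale."""
    if lam == 0:
        return [math.exp(w) for w in transformed]
    return [(lam * w + 1) ** (1 / lam) for w in transformed]

# Hypothetical sample data, for illustration only.
sales = [1.0, 4.0, 9.0, 16.0]
print(box_cox(sales, 0.5))  # a square-root-style transform
print(box_cox(sales, 0))    # the natural log
```

The inverse is worth keeping around: forecasts made on the transformed scale usually need to be mapped back to the original units before they are reported.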

What happens when lambda equals 1? Every value shifts down by 1, but the shape of our data doesn’t change. So if lambda’s optimal value is 1, the data is already approximately normally distributed and the Box-Cox transformation is not necessary.
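A quick way to convince yourself of the lambda = 1 case is to run it. The sketch below uses made-up numbers and a stripped-down `box_cox` helper (my own naming, valid only for non-zero lambda) to show that every point moves down by 1 while the gaps between points stay the same.

```python
def box_cox(values, lam):
    # Assumes lam is non-zero; the lambda = 0 case uses the natural log instead.
    return [(v ** lam - 1) / lam for v in values]

y = [2.0, 5.0, 11.0, 20.0]  # hypothetical data
w = box_cox(y, 1)

# Every value moves down by exactly 1 ...
shifts = [yi - wi for yi, wi in zip(y, w)]

# ... so the gaps between consecutive points are untouched.
gaps_before = [b - a for a, b in zip(y, y[1:])]
gaps_after = [b - a for a, b in zip(w, w[1:])]

print(shifts)                     # [1.0, 1.0, 1.0, 1.0]
print(gaps_before == gaps_after)  # True
```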

How can we choose lambda?

We select the value of lambda that brings our response variable closest to a normal distribution.
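One common way to make "closest to normal" precise is maximum likelihood: for each candidate lambda, transform the data and score how well a normal distribution fits, then keep the best-scoring lambda. The sketch below does this with a simple grid search. The criterion, the grid range of -2 to 2, the function names, and the simulated data are all assumptions of mine for illustration; other selection rules exist and may give a different answer.

```python
import math
import random

def box_cox(values, lam):
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def log_likelihood(values, lam):
    """Profile log-likelihood of lambda under a normal model (constants dropped)."""
    n = len(values)
    w = box_cox(values, lam)
    mean = sum(w) / n
    var = sum((x - mean) ** 2 for x in w) / n  # maximum-likelihood variance
    # The second term is the Jacobian of the transformation.
    return -0.5 * n * math.log(var) + (lam - 1) * sum(math.log(v) for v in values)

def choose_lambda(values, lo=-2.0, hi=2.0, steps=81):
    """Return the grid value of lambda with the highest log-likelihood."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda lam: log_likelihood(values, lam))

# Simulated log-normal data: its log is normal, so lambda should land near 0.
random.seed(0)
skewed = [math.exp(random.gauss(0, 1)) for _ in range(500)]
best = choose_lambda(skewed)
print(best)
```

In practice you rarely need to write this yourself. For example, SciPy's `scipy.stats.boxcox` estimates lambda by maximum likelihood when you do not pass one, and returns it alongside the transformed data.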