Using AIC to Test ARIMA Models

The Akaike Information Critera (AIC) is a widely used measure of a statistical model. It basically quantifies 1) the goodness of fit, and 2) the simplicity/parsimony, of the model into a single statistic.

When comparing two models, the one with the lower AIC is generally “better”. Now, let us apply this powerful tool in comparing various ARIMA models, often used to model time series.

The dataset we will use is the Dow Jones Industrial Average (DJIA), a stock market index that constitutes 30 of America’s biggest companies, such as Hewlett Packard and Boeing. First, let us perform a time plot of the DJIA data. This massive dataframe comprises almost 32000 records, going back to the index’s founding in 1896. There was an actual lag of 3 seconds between me calling the function and R spitting out the below graph!

DJIA since March 1896

Dow Jones Industrial Average since March 1896

But it immediately becomes apparent that there is a lot more at play here than an ARIMA model. Since 1896, the DJIA has seen several periods of rapid economic growth, the Great Depression, two World Wars, the Oil shock, the early 2000s recession, the current recession, etcetera. Therefore, I opted to narrow the dataset to the period 1988-1989, which saw relative stability. As is clear from the timeplot, and slow decay of the ACF, the DJIA 1988-1989 timeseries is not stationary:

Timeseries (left) and AIC (right): DJIA 1988-1989

Time plot (left) and AIC (right): DJIA 1988-1989

So, we may want to take the first difference of the DJIA 1988-1989 index. This is expressed in the equation below:

\Delta Y = Y_t - Y_{t-1}

The first difference is thus, the difference between an entry and entry preceding it. The timeseries and AIC of the First Difference are shown below. They indicate a stationary time series.

First Difference of DJIA 1988-1989: Timeseries (left) and ACF (right)

First Difference of DJIA 1988-1989: Time plot (left) and ACF (right)

Now, we can test various ARMA models against the DJIA 1988-1989 First Difference. I will test 25 ARMA models: ARMA(1,1); ARMA(1,2), … , ARMA(3,3), … , ARMA(5,5). To compare these 25 models, I will use the AIC.

Table of AICs: ARMA(1,1) through ARMA(5,5)

Table of AICs: ARMA(1,1) through ARMA(5,5)

I have highlighted in green the two models with the lowest AICs. Their low AIC values suggest that these models nicely straddle the requirements of goodness-of-fit and parsimony. I have also highlighted in red the worst two models: i.e. the models with the highest AICs. Since ARMA(2,3) is the best model for the First Difference of DJIA 1988-1989, we use ARIMA(2,1,3) for DJIA 1988-1989.

The AIC works as such: Some models, such as ARIMA(3,1,3), may offer better fit than ARIMA(2,1,3), but that fit is not worth the loss in parsimony imposed by the addition of additional AR and MA lags. Similarly, models such as ARIMA(1,1,1) may be more parsimonious, but they do not explain DJIA 1988-1989 well enough to justify such an austere model.

Note that the AIC has limitations and should be used heuristically. The above is merely an illustration of how the AIC is used. Nonetheless, it suggests that between 1988 and 1989, the DJIA followed the below ARIMA(2,1,3) model:

\Delta Y_t = \phi_2 Y_{t-2} + \phi_1 Y_{t-1} + \theta_{3} \epsilon_{t-3} + \theta_{2} \epsilon_{t-2} + \theta_1 \epsilon_{t-1} + \epsilon_{t}

Next: Determining the above coefficients, and forecasting the DJIA.

Analysis conducted on R. Credits to the St Louis Fed for the DJIA data.

Abbas Keshvani


Limits of Akaike Information Criteria (AIC)


We often use AIC to discern the best model among candidates.

Now suppose we have two (non-parametric) models, which use mass points and weights to model a random variable:

  • model A uses 4 mass points to model a random variable (i.e. the height of men in Singapore)
  • model B uses 5 mass points to mode the same random variable

We consider model A to be nested in model B. This is because model A is basically a version of model B, where one mass point is “de-activated”.

Thus, we must not use small differences in AIC or BIC alone to judge between these models. If the model with a constraint on one or more parameters (model A) is regarded as nested on within the model without the constraint (model B) , a chi-square difference test, or Likelihood Ratio (LR) test, is performed to test the reasonableness of the constraint, using a central chi-square with degrees of freedom equal to the number of parameters constraints.

However, under the null hypothesis, the parameter of interest takes its value on the boundary of the parameter space (next post). For this reason, the asymptotic distribution of the chi-square difference, or Likelihood Ratio (LR) statistic, is not that of a central chi-square distributed random variable with one degree of freedom. This boundary problem affects goodness of fit measures like AIC and BIC4. As a result, the AIC and BIC should be used heuristically, in conjunction with graphs and other criteria to evaluate estimates from the chosen model.

Abbas Keshvani