The erratic movements in the time series plot seen in the preliminary analysis section suggest modelling the data using ARIMA
models. Also, given the absence of any trend or seasonality in the time series plot, an ARIMA model again seems like a logical choice.
To model the data using the ad hoc ARIMA method, a stationary mean is necessary by definition. The autocorrelation function
(ACF) plot above is dominated by large positive autocorrelations. This suggests a non-stationary mean.
Achieving a Stationary Mean Model
Several differencing techniques were examined in order to obtain a stationary mean, including:
› 1st order (y_t - y_{t-1})
› 2nd order (y_t - 2 y_{t-1} + y_{t-2})
Achieving a constant variance in the above plot posed a challenge. Many transformations of the original variable
(goals per game) were attempted in order to eliminate the irregular variance before differencing. The transformations
tried included the log of goals per game, the square root of goals per game, and goals per game raised to the power
of 0.25. After examination of these time series graphs, it was concluded that first-order differencing without any
transformation best represented a stationary mean model.
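The transformations and differencing steps described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `goals` values are made up and are not the report's data, and the helper name `difference` is an assumption introduced here.

```python
import math

def difference(series, order=1):
    """Apply first differencing `order` times (order=1 gives y_t - y_{t-1})."""
    out = list(series)
    for _ in range(order):
        out = [current - previous for previous, current in zip(out, out[1:])]
    return out

# Hypothetical goals-per-game values, for illustration only.
goals = [6.0, 6.4, 5.8, 6.1, 6.9, 6.5]

# Variance-stabilizing transformations mentioned in the text.
transforms = {
    "log": [math.log(y) for y in goals],
    "sqrt": [math.sqrt(y) for y in goals],
    "power_0.25": [y ** 0.25 for y in goals],
}

first_diff = difference(goals, order=1)   # y_t - y_{t-1}
second_diff = difference(goals, order=2)  # y_t - 2*y_{t-1} + y_{t-2}
```

Each pass of differencing shortens the series by one observation, which is worth keeping in mind when comparing plots of the differenced and undifferenced data.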
Determining an Appropriate ARIMA Model
To determine an appropriate ARIMA model it is necessary to examine the ACF and PACF plots of the differenced series (i.e.
y_t - y_{t-1}).
Based on the ACF and PACF plots, it is not immediately clear which model is most appropriate for this data. The possibilities
include an ARIMA model with first-order differencing and a moving average term of order 4 (MA(4)), or an ARIMA model with
first-order differencing and an autoregressive term of order 4 (AR(4)). An AR(4) model calls for a cut-off (i.e. spikes falling
inside the confidence lines) after lag 4 on the PACF plot and exponential decay on the ACF plot; an MA(4) model calls for the
reverse. Both patterns can be read into the above plots.
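To make the identification step concrete, here is a minimal sketch of how the sample ACF and PACF could be computed and compared against approximate 95% confidence limits. It assumes the usual large-sample limit of ±1.96/√n and uses the Durbin-Levinson recursion for the PACF; the software behind the report's plots may compute either quantity slightly differently, and the function names are introduced here for illustration.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelations r_0..r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    phi_prev = []  # coefficients of the AR(k-1) fit, phi_prev[j] = phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_new = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
        phi_new.append(phi_kk)
        result.append(phi_kk)
        phi_prev = phi_new
    return result

def significant_lags(correlations, n):
    """Lags (excluding lag 0) whose correlation lies outside +/- 1.96/sqrt(n)."""
    limit = 1.96 / math.sqrt(n)
    return [k for k, value in enumerate(correlations) if k > 0 and abs(value) > limit]
```

Applied to the differenced series, a single spike at lag 4 in `significant_lags(pacf(...), n)` would point toward the AR(4) candidate, and the same pattern in the ACF would point toward MA(4).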
The two models were fitted to the data and criteria measuring goodness of fit were examined (see Appendix D-2). The p-value
tests the null hypothesis that a coefficient is zero, i.e. that the term does not belong in the model. For the AR(4) and MA(4)
models the p-values are 0.8% and 2.6% respectively, so both terms are significant at the 5% level. The sum of squared errors
(SSE) is the sum of the squared differences between the fitted model and the actual data. Lastly, the Akaike information
criterion (AIC) and Schwarz Bayesian criterion (SBC) both measure goodness of fit while penalizing model complexity. The AR(4)
model seems to be the best ARIMA model based on these criteria.
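As a rough illustration of how AIC and SBC trade fit against complexity, the sketch below uses one common least-squares form of the two criteria, computed from the SSE, the number of observations n, and the number of estimated parameters k. This is an assumption: the output in Appendix D-2 may come from a likelihood-based definition that differs by a constant, so only comparisons between models fitted to the same data are meaningful.

```python
import math

def aic(sse, n, k):
    """AIC in least-squares form: n * ln(SSE / n) + 2k."""
    return n * math.log(sse / n) + 2 * k

def sbc(sse, n, k):
    """SBC (BIC) in least-squares form: n * ln(SSE / n) + k * ln(n)."""
    return n * math.log(sse / n) + k * math.log(n)
```

Lower values are better under both criteria. SBC penalizes each extra parameter by ln(n) rather than 2, so it favors smaller models more strongly once n exceeds about 8.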
The output in Appendix D-2 suggests that the constant should not be included in the model. This is based on a p-value of .5288 or
52.88%. Hence, the final model, based on the analysis, is y_t = y_{t-1} + φ_4 (y_{t-4} - y_{t-5}) + e_t. The model validation
procedure, located in Appendix D-3, shows that this model is acceptable. The scatterplot of residuals versus the predicted values
shows no evidence of non-constant variance. The residual ACF and PACF plots show no autocorrelation pattern. Finally, the Q-Q
plot shows that the residuals resemble a normal distribution.
Using the estimated parameters we have,
y_t = y_{t-1} + 0.38982 (y_{t-4} - y_{t-5}) + e_t
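The fitted equation can be turned into point forecasts by setting the future error term to its expected value of zero and feeding each forecast back in as the next "observed" value. The sketch below assumes exactly that; the function name `forecast_arima_410` is introduced here for illustration, and only the coefficient 0.38982 comes from the text.

```python
PHI_4 = 0.38982  # lag-4 coefficient estimate quoted in the text

def forecast_arima_410(history, steps, phi4=PHI_4):
    """Point forecasts from y_t = y_{t-1} + phi4 * (y_{t-4} - y_{t-5}), with e_t set to 0."""
    if len(history) < 5:
        raise ValueError("need at least 5 observations")
    values = list(history)
    forecasts = []
    for _ in range(steps):
        # values[-1] is y_{t-1}, values[-4] is y_{t-4}, values[-5] is y_{t-5}
        next_value = values[-1] + phi4 * (values[-4] - values[-5])
        values.append(next_value)
        forecasts.append(next_value)
    return forecasts
```

For example, `forecast_arima_410(series, steps=5)` would return five point forecasts following the last observation in `series`; it does not produce the confidence limits shown in the report's sequence plot.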
Based on this model, the sequence plot below was created, showing upper and lower confidence limits, as well as predictions to
the year 2005.
Interpreting the ARIMA (4,1,0) Model
While the ARIMA (4,1,0) model has the best theoretical fit (the lowest standard error, AIC, and SBC values), it is fairly
difficult to interpret logically. There does not seem to be any reasonable explanation for a correlation between changes in
average goals per game that are four seasons apart. A possible explanation for this occurrence could simply be random
noise. Although goals per season is one of the longest data series in the NHL, it is still not long enough to
reasonably rule out the possibility of overfitting the random errors. Since the 4th lag in the ACF plot was only slightly above
the confidence limits, we can reasonably suggest that this was due only to random noise. If so, keeping the lag-4 term would
likely hinder our ability to make future predictions. Therefore, we suggest fitting an ARIMA (0,1,0) model, and the results are below.
Thus, the ARIMA (0,1,0) model makes more intuitive sense and has only a slightly worse theoretical fit than the ARIMA (4,1,0)
model. We will therefore use the ARIMA (0,1,0) model to forecast future average goals per game in a season. This model is:
y_t = y_{t-1} + e_t.
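Under this random walk model, the point forecast for every future season is simply the last observed value, while the forecast variance grows linearly with the horizon h (standard deviation σ√h). The sketch below illustrates this under two assumptions made here: σ is estimated as the root mean square of the observed first differences (no constant term), and the limits use a normal multiplier of 1.96. The report's software may estimate σ differently.

```python
import math

def random_walk_forecast(history, steps, z=1.96):
    """Forecasts for y_t = y_{t-1} + e_t as (point, lower, upper) per horizon."""
    diffs = [current - previous for previous, current in zip(history, history[1:])]
    # No constant in the model, so e_t is taken to have mean zero.
    sigma = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    last = history[-1]
    return [
        (last, last - z * sigma * math.sqrt(h), last + z * sigma * math.sqrt(h))
        for h in range(1, steps + 1)
    ]
```

The widening limits reflect the main practical cost of the simpler model: it says nothing about the direction of future change, only that uncertainty accumulates season by season.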