 AIC (Akaike's Information Criterion) - Maple Help

TimeSeriesAnalysis

 AIC
 Akaike's information criterion
 AICc
 Akaike's information criterion with sample size correction
 BIC
 Bayesian information criterion

Calling Sequence

 AIC(model, ts, ll)
 AICc(model, ts, ll)
 BIC(model, ts, ll)

Parameters

 model - Exponential smoothing model
 ts - Time series consisting of a single data set
 ll - (optional) equation of the form loglikelihood = value to pass in a precomputed log likelihood value

Description

 • Information criteria are functions used to evaluate goodness of fit for a model representing a time series.
 • The functions take into account both the goodness of fit itself and the number of parameters of the model: a model is better if it fits more closely and if it has fewer parameters.
 • Akaike's information criterion is defined by

$\mathrm{AIC}=2k-2l$

 where $k$ is the number of parameters and $l$ the log likelihood of obtaining the given time series from the given model.
 • Akaike's information criterion gives very good results when used to evaluate goodness of fit against a large sample size (i.e., a long time series), but for smaller sample sizes a correction is needed. The corrected criterion, AICc, is obtained as follows:

$\mathrm{AICc}=\left\{\begin{array}{cc}2k-2l+\frac{2k\left(k+1\right)}{n-k-1}& k+1<n\\ \mathrm{\infty }& \mathrm{otherwise}\end{array}\right.$

 where $k$ and $l$ are as before and $n$ is the size of the sample.
 • Finally, the Bayesian information criterion is given by

$\mathrm{BIC}=k\mathrm{log}\left(n\right)-2l$

 where $k$, $l$, and $n$ are as above.
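The three definitions above can be checked outside Maple with a short numeric sketch. Python is used here purely for illustration; the function names are hypothetical and not part of the TimeSeriesAnalysis package.

```python
import math

def aic(k, loglik):
    # AIC = 2k - 2l
    return 2*k - 2*loglik

def aicc(k, loglik, n):
    # AICc adds a small-sample correction; it is infinite when k + 1 >= n,
    # because the denominator n - k - 1 is then zero or negative
    if k + 1 >= n:
        return math.inf
    return aic(k, loglik) + 2*k*(k + 1)/(n - k - 1)

def bic(k, loglik, n):
    # BIC = k log(n) - 2l, with the natural logarithm
    return k*math.log(n) - 2*loglik

# example: k = 4 parameters, log likelihood l = -5, sample size n = 8
print(aic(4, -5))      # 18.0
print(aicc(4, -5, 8))  # 18.0 + 40/3
print(bic(4, -5, 8))   # 4 ln(8) + 10
```

Note that for a fixed log likelihood, AICc and BIC differ from AIC only in the penalty attached to the parameter count k.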
 • The number of parameters of the model is always computed by the information criterion procedure, as is the sample size. The log likelihood can also be computed, but if it is known beforehand (e.g., from running the Optimize command), it can be passed in using the loglikelihood option. This prevents recomputing the log likelihood and thereby improves efficiency slightly.

Examples

 > $\mathrm{with}\left(\mathrm{TimeSeriesAnalysis}\right):$

Consider the following time series.

 > $\mathrm{ts}≔\mathrm{TimeSeries}\left(\left[1.8,3.4,2.1,2.9,2.4,2.9,2.5,3.1\right],\mathrm{period}=2\right)$
 ${\mathrm{ts}}{≔}\left[\begin{array}{c}{\mathrm{Time series}}\\ {\mathrm{data set}}\\ {\mathrm{8 rows of data:}}\\ {\mathrm{2013 - 2020}}\end{array}\right]$ (1)

We create a list of potentially applicable models and optimize them.

 > $\mathrm{models}≔\mathrm{Specialize}\left(\mathrm{ExponentialSmoothingModel}\left(\right),\mathrm{ts}\right)$
 ${\mathrm{models}}{≔}\left[{\mathrm{< an ETS\left(A,A,A\right) model >}}{,}{\mathrm{< an ETS\left(A,A,N\right) model >}}{,}{\mathrm{< an ETS\left(A,Ad,A\right) model >}}{,}{\mathrm{< an ETS\left(A,Ad,N\right) model >}}{,}{\mathrm{< an ETS\left(A,N,A\right) model >}}{,}{\mathrm{< an ETS\left(A,N,N\right) model >}}{,}{\mathrm{< an ETS\left(M,A,A\right) model >}}{,}{\mathrm{< an ETS\left(M,A,M\right) model >}}{,}{\mathrm{< an ETS\left(M,A,N\right) model >}}{,}{\mathrm{< an ETS\left(M,Ad,A\right) model >}}{,}{\mathrm{< an ETS\left(M,Ad,M\right) model >}}{,}{\mathrm{< an ETS\left(M,Ad,N\right) model >}}{,}{\mathrm{< an ETS\left(M,M,M\right) model >}}{,}{\mathrm{< an ETS\left(M,M,N\right) model >}}{,}{\mathrm{< an ETS\left(M,Md,M\right) model >}}{,}{\mathrm{< an ETS\left(M,Md,N\right) model >}}{,}{\mathrm{< an ETS\left(M,N,A\right) model >}}{,}{\mathrm{< an ETS\left(M,N,M\right) model >}}{,}{\mathrm{< an ETS\left(M,N,N\right) model >}}\right]$ (2)
 > $\mathrm{map}\left(\mathrm{Optimize},\mathrm{models},\mathrm{ts}\right)$
 $\left[{-0.014065563}{,}{-5.416379083}{,}{0.031961103}{,}{-5.145765873}{,}{0.002827297}{,}{-5.804974813}{,}{-0.603105141}{,}{-1.517769899}{,}{-6.694364663}{,}{-0.487294335}{,}{-1.376178599}{,}{-6.769140123}{,}{-0.975463079}{,}{-6.912940013}{,}{-1.134387223}{,}{-6.715408243}{,}{-1.772582973}{,}{-2.135892023}{,}{-6.808173573}\right]$ (3)
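As a sanity check on the AIC formula, the fifth log likelihood above, l ≈ 0.002827297 for the ETS(A,N,A) model, reproduces the AIC value reported for that model further down, provided it has k = 4 parameters (a count inferred here for illustration; the page itself does not state it):

```python
# AIC = 2k - 2l with k = 4 (assumed parameter count) and the
# optimized log likelihood of the ETS(A,N,A) model
k, loglik = 4, 0.002827297
print(2*k - 2*loglik)  # matches 7.994336806 up to rounding of the printed l
```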

We compute Akaike's information criterion for each model.

 > $\mathbf{for}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}m\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{in}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathrm{models}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{do}\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\phantom{\rule[-0.0ex]{2.0em}{0.0ex}}\mathrm{print}\left(m,\mathrm{AIC}\left(m,\mathrm{ts}\right)\right)\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\mathbf{end}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{do}$
 ${\mathrm{< an ETS\left(A,A,A\right) model >}}{,}{12.01066840}$
 ${\mathrm{< an ETS\left(A,A,N\right) model >}}{,}{18.83275817}$
 ${\mathrm{< an ETS\left(A,Ad,A\right) model >}}{,}{13.89411477}$
 ${\mathrm{< an ETS\left(A,Ad,N\right) model >}}{,}{20.29153175}$
 ${\mathrm{< an ETS\left(A,N,A\right) model >}}{,}{7.994336806}$
 ${\mathrm{< an ETS\left(A,N,N\right) model >}}{,}{15.60994963}$
 ${\mathrm{< an ETS\left(M,A,A\right) model >}}{,}{13.26051935}$
 ${\mathrm{< an ETS\left(M,A,M\right) model >}}{,}{15.27182966}$
 ${\mathrm{< an ETS\left(M,A,N\right) model >}}{,}{21.38872933}$
 ${\mathrm{< an ETS\left(M,Ad,A\right) model >}}{,}{15.51428223}$
 ${\mathrm{< an ETS\left(M,Ad,M\right) model >}}{,}{16.93433454}$
 ${\mathrm{< an ETS\left(M,Ad,N\right) model >}}{,}{23.53828025}$
 ${\mathrm{< an ETS\left(M,M,M\right) model >}}{,}{13.94926348}$
 ${\mathrm{< an ETS\left(M,M,N\right) model >}}{,}{21.82588003}$
 ${\mathrm{< an ETS\left(M,Md,M\right) model >}}{,}{16.21600049}$
 ${\mathrm{< an ETS\left(M,Md,N\right) model >}}{,}{23.43081649}$
 ${\mathrm{< an ETS\left(M,N,A\right) model >}}{,}{11.54039483}$
 ${\mathrm{< an ETS\left(M,N,M\right) model >}}{,}{12.29990591}$
 ${\mathrm{< an ETS\left(M,N,N\right) model >}}{,}{17.61634715}$ (4)

The $\left(A,N,A\right)$ model has the best balance between number of parameters and goodness of fit, according to this criterion, and $\left(M,\mathrm{Ad},N\right)$ the worst.

Because the sample size is rather small, it might be useful to consider the criterion with sample size correction.

 > $\mathbf{for}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}m\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{in}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathrm{models}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{do}\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\phantom{\rule[-0.0ex]{2.0em}{0.0ex}}\mathrm{print}\left(m,\mathrm{AICc}\left(m,\mathrm{ts}\right)\right)\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\mathbf{end}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{do}$
 ${\mathrm{< an ETS\left(A,A,A\right) model >}}{,}{96.01066840}$
 ${\mathrm{< an ETS\left(A,A,N\right) model >}}{,}{32.16609150}$
 ${\mathrm{< an ETS\left(A,Ad,A\right) model >}}{,}{\mathrm{\infty }}$
 ${\mathrm{< an ETS\left(A,Ad,N\right) model >}}{,}{50.29153175}$
 ${\mathrm{< an ETS\left(A,N,A\right) model >}}{,}{21.32767014}$
 ${\mathrm{< an ETS\left(A,N,N\right) model >}}{,}{18.00994963}$
 ${\mathrm{< an ETS\left(M,A,A\right) model >}}{,}{97.26051935}$
 ${\mathrm{< an ETS\left(M,A,M\right) model >}}{,}{99.27182966}$
 ${\mathrm{< an ETS\left(M,A,N\right) model >}}{,}{34.72206266}$
 ${\mathrm{< an ETS\left(M,Ad,A\right) model >}}{,}{\mathrm{\infty }}$
 ${\mathrm{< an ETS\left(M,Ad,M\right) model >}}{,}{\mathrm{\infty }}$
 ${\mathrm{< an ETS\left(M,Ad,N\right) model >}}{,}{53.53828025}$
 ${\mathrm{< an ETS\left(M,M,M\right) model >}}{,}{97.94926348}$
 ${\mathrm{< an ETS\left(M,M,N\right) model >}}{,}{35.15921336}$
 ${\mathrm{< an ETS\left(M,Md,M\right) model >}}{,}{\mathrm{\infty }}$
 ${\mathrm{< an ETS\left(M,Md,N\right) model >}}{,}{53.43081649}$
 ${\mathrm{< an ETS\left(M,N,A\right) model >}}{,}{24.87372816}$
 ${\mathrm{< an ETS\left(M,N,M\right) model >}}{,}{25.63323924}$
 ${\mathrm{< an ETS\left(M,N,N\right) model >}}{,}{20.01634715}$ (5)

This time, the $\left(A,N,N\right)$ model does best. Note how some of the models have a value of $\mathrm{\infty }$; this happens for models with so many parameters that $k+1\ge n$, so that the denominator $n-k-1$ in the correction term is zero or negative.

Alternatively, one can use the Bayesian information criterion; it also corrects for the sample size, but not as strongly as AICc in this case.
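The claim that BIC corrects less strongly than AICc here can be quantified by comparing the penalty each criterion adds on top of $-2l$, for $n=8$ observations and an illustrative parameter count of $k=4$ (a numeric sketch, not Maple code):

```python
import math

n, k = 8, 4  # sample size from the example; illustrative parameter count
aic_pen = 2*k                              # AIC penalty: 8
aicc_pen = 2*k + 2*k*(k + 1)/(n - k - 1)   # AICc penalty: 8 + 40/3, about 21.3
bic_pen = k*math.log(n)                    # BIC penalty: about 8.32
print(aic_pen, aicc_pen, bic_pen)
```

For such a small n, the BIC penalty barely exceeds the plain AIC penalty, while the AICc correction more than doubles it.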

 > $\mathbf{for}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}m\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{in}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathrm{models}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{do}\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\phantom{\rule[-0.0ex]{2.0em}{0.0ex}}\mathrm{print}\left(m,\mathrm{BIC}\left(m,\mathrm{ts}\right)\right)\phantom{\rule[-0.0ex]{0.0em}{0.0ex}}\mathbf{end}\phantom{\rule[-0.0ex]{0.3em}{0.0ex}}\mathbf{do}$
 ${\mathrm{< an ETS\left(A,A,A\right) model >}}{,}{12.48731765}$
 ${\mathrm{< an ETS\left(A,A,N\right) model >}}{,}{19.15052434}$
 ${\mathrm{< an ETS\left(A,Ad,A\right) model >}}{,}{14.45020556}$
 ${\mathrm{< an ETS\left(A,Ad,N\right) model >}}{,}{20.68873946}$
 ${\mathrm{< an ETS\left(A,N,A\right) model >}}{,}{8.312102973}$
 ${\mathrm{< an ETS\left(A,N,N\right) model >}}{,}{15.76883271}$
 ${\mathrm{< an ETS\left(M,A,A\right) model >}}{,}{13.73716860}$
 ${\mathrm{< an ETS\left(M,A,M\right) model >}}{,}{15.74847891}$
 ${\mathrm{< an ETS\left(M,A,N\right) model >}}{,}{21.70649550}$
 ${\mathrm{< an ETS\left(M,Ad,A\right) model >}}{,}{16.07037302}$
 ${\mathrm{< an ETS\left(M,Ad,M\right) model >}}{,}{17.49042533}$
 ${\mathrm{< an ETS\left(M,Ad,N\right) model >}}{,}{23.93548796}$
 ${\mathrm{< an ETS\left(M,M,M\right) model >}}{,}{14.42591273}$
 ${\mathrm{< an ETS\left(M,M,N\right) model >}}{,}{22.14364620}$
 ${\mathrm{< an ETS\left(M,Md,M\right) model >}}{,}{16.77209128}$
 ${\mathrm{< an ETS\left(M,Md,N\right) model >}}{,}{23.82802420}$
 ${\mathrm{< an ETS\left(M,N,A\right) model >}}{,}{11.85816099}$
 ${\mathrm{< an ETS\left(M,N,M\right) model >}}{,}{12.61767207}$
 ${\mathrm{< an ETS\left(M,N,N\right) model >}}{,}{17.77523023}$ (6)

The Bayesian information criterion also favors the $\left(A,N,A\right)$ model.

Compatibility

 • The TimeSeriesAnalysis[AIC], TimeSeriesAnalysis[AICc], and TimeSeriesAnalysis[BIC] commands were introduced in Maple 18.