Time series methods

BSMM8740-2-R-2024F [WEEK - 6]

L.L. Odette

Recap of last week:

  • Last week we introduced classification and clustering methods within the Tidymodels framework in R.
  • Today we look at methods for analysing time series, and we use the timetk and modeltime packages in conjunction with Tidymodels to create and evaluate time series models.

This week:

  • Today we will explore time series - data where each observation includes a time measurement and the time measurements are ordered.

  • We’ll look at how to manipulate our time values, create time-based features, plot our time series, and decompose time series into components.

  • Finally we will use our time series for forecasting, using regression, exponential smoothing and ARIMA models.

Time Series Methods

Time series

Time series data is a data frame (tibble) with an ordered temporal measurement.

Why is this a separate area of study? Consider the simple linear regression model

\[ y_t=x_t^\top\beta + \epsilon_t;\;t=1,\ldots,T \]

For the OLS estimates to have their usual properties, the errors should not be serially correlated:

  • \(\mathbb{E}[\epsilon_t]=\mathbb{E}[\epsilon_t|\epsilon_{t-1},\epsilon_{t-2},\ldots]=0\), and
  • \(\mathbb{E}[\epsilon_t\epsilon_{t-j}]=0,\;\forall j\neq0\)
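
As a quick illustration (a sketch, not from the slides), we can simulate a regression whose errors follow an AR(1) process and inspect the residuals of an OLS fit:

# simulate a regression with serially correlated (AR(1)) errors
set.seed(8740)
n <- 200
x <- rnorm(n)                                   # a single regressor
e <- as.numeric(arima.sim(list(ar = 0.8), n))   # AR(1) errors, phi = 0.8
y <- 1 + 2 * x + e                              # true beta = (1, 2)

fit <- lm(y ~ x)
# the residuals inherit the serial correlation, violating the
# assumptions above; the low-lag autocorrelations are far from zero
stats::acf(residuals(fit), lag.max = 10, plot = FALSE)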

Time series: characteristics

  • Most economic and financial time series exhibit some form of serial correlation
    • If economic output is large during the previous quarter then there is a good chance that it is going to be large in the current quarter
  • A change that arises in the current period may only affect other variables in the distant future
  • A particular shock may affect variables over successive quarters
    • Hence, we need to start thinking about the dynamic structure of the system that we are investigating

Time series: dynamics

Whether time series data is used for forecasting or for testing various theories/hypotheses, we always need to identify the dynamic evolution of the variables, e.g.

\[ \begin{align*} \text{(trend)}\qquad T_{t} & =1+0.05t\qquad\\ \text{(seasonality)}\qquad S_{t} & =1.5\cos(t\pi\times0.166)\\ \text{(noise)}\qquad I_{t} & =0.5I_{t-1}+\epsilon_{t}\quad\epsilon_t\sim\mathscr{N}(0,\sigma^2) \end{align*} \]

Time series: generation

creating a time series
> set.seed(8740)
> 
> dat <- tibble::tibble(
+   date = seq(as.Date('2015-04-7'),as.Date('2020-03-22'),'2 weeks')
+ ) |> 
+   tibble::rowid_to_column("t") |> 
+   dplyr::mutate(
+     trend = 1 + 0.05 * t
+     , seasonality = 1.5 * cos(pi * t * 0.166)
+     , noise = rnorm(length(t))
+     , temp = dplyr::lag(noise)
+   ) |> 
+   tidyr::replace_na(list(temp = 0)) |> 
+   dplyr::mutate(
+     noise =
+       purrr::map2_dbl(
+         noise
+         , temp
+         , ~ .x + 0.5 * .y
+       )
+   ) |> 
+   dplyr::select(-temp)
> dat
# A tibble: 130 × 5
       t date       trend seasonality   noise
   <int> <date>     <dbl>       <dbl>   <dbl>
 1     1 2015-04-07  1.05     1.30     1.59  
 2     2 2015-04-21  1.1      0.755    0.937 
 3     3 2015-05-05  1.15     0.00942 -0.941 
 4     4 2015-05-19  1.2     -0.739   -0.460 
 5     5 2015-06-02  1.25    -1.29     0.318 
 6     6 2015-06-16  1.3     -1.50    -0.0151
 7     7 2015-06-30  1.35    -1.31    -1.23  
 8     8 2015-07-14  1.4     -0.772   -1.55  
 9     9 2015-07-28  1.45    -0.0283   0.697 
10    10 2015-08-11  1.5      0.723   -0.0404
# ℹ 120 more rows

Time series: generation

Code
> dat |> 
+   tidyr::pivot_longer(-c(t, date)) |> 
+   ggplot(aes(x=date, y=value, color=name)) + geom_line(linewidth=1) + 
+   labs(
+     title = 'trend, seasonality, and noise'
+     , subtitle = "deterministic: trend, seasonality | stochastic: noise"
+     , color=NULL) + theme_minimal() + theme(plot.margin = margin(0,0,0,0, "cm"))

Code
> dat |> 
+   dplyr::mutate(y = trend + seasonality + noise) |> 
+   ggplot(aes(x=date, y=y)) + geom_line(linewidth=1) + geom_point(color="red") +
+   labs(
+     title = 'trend + seasonality + noise'
+     , subtitle = "deterministic: trend, seasonality | stochastic: noise") + theme_minimal() +
+   theme(plot.margin = margin(0,0,0,0, "cm"))

Time series - processes

  • A time series is a collection of observations indexed by the date of each observation

  • Using notation where the first observation occurs at time \(t=1\) and the last at \(t=T\), the time series is the set of observations:

\[\{y_1,y_2,y_3,\ldots,y_T\}\]

  • The time index can be of any frequency (e.g. daily, quarterly, etc.)

Time series - processes

Deterministic and Stochastic Processes

  • deterministic processes always produce the same output from a given starting point or state

  • stochastic processes have indeterminacy

    • Usually described by some form of statistical distribution
    • Examples include: white noise processes, random walks, Brownian motions, Markov chains, martingale difference sequences, etc.

Time series - processes

Stochastic Processes - White noise

  • A white noise series is made of serially uncorrelated random variables with zero mean and finite variance
  • For example, errors may be characterised by a Gaussian white noise process, where such a variable has a normal distribution
  • A slightly stronger condition is that they are independent of one another

\[\epsilon_{t}\sim\text{{i.i.d.}}\mathscr{N}\left(0,\sigma_{\epsilon_{t}}^{2}\right)\]

Time series - processes

Stochastic Processes - White noise

implications:

  • \(\mathbb{E}\left[\epsilon_{t}\right]=\mathbb{E}\left[\left.\epsilon_{t}\right|\epsilon_{t-1},\epsilon_{t-2},\ldots\right]=0\)
  • \(\mathbb{E}\left[\epsilon_{t}\epsilon_{t-j}\right]=\text{cov}\left(\epsilon_{t},\epsilon_{t-j}\right)=0\)
  • \(\text{var}\left(\epsilon_{t}\right)=\text{cov}\left(\epsilon_{t},\epsilon_{t}\right)=\sigma_{\epsilon_{t}}^{2}\)

Time series - processes

Stochastic Processes - Random walk

  • The random walk definition implies that the effect of a shock is permanent. \[y_t=y_{t-1}+\epsilon_t\]
  • Given the starting value \(y_0\), and using recursive substitution, this process can be represented as \[y_t=y_0+\sum_{j=1}^t\epsilon_j\]
  • Since the effects of past shocks do not dissipate, we say the process has infinite memory

Time series - processes

Stochastic Processes - Random walk + drift

  • Random walk plus a constant term. \[y_t=\mu+y_{t-1}+\epsilon_t\]
  • Given the starting value \(y_0=0\), and using recursive substitution, this process can be represented as \[y_t=\mu t+\sum_{j=1}^t\epsilon_j\]
  • Shocks have permanent effects, and the level is also driven by the deterministic drift \(\mu t\)
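
Both definitions can be simulated directly with cumulative sums of shocks (a minimal sketch, not from the slides; the drift value is an illustrative assumption):

# random walk vs. random walk + drift, via recursive substitution
set.seed(8740)
n       <- 250
eps     <- rnorm(n)                  # white-noise shocks
walk    <- cumsum(eps)               # y_t = sum of shocks (y_0 = 0)
mu      <- 0.1                       # drift
drifted <- mu * (1:n) + cumsum(eps)

matplot(cbind(walk, drifted), type = "l", lty = 1,
        xlab = "t", ylab = "y",
        main = "random walk vs. random walk + drift")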

Time series - processes

Stochastic Processes - Random walks

Characteristics:

  1. Independence:
    • The steps in a random walk are independent of each other. The future position depends only on the current position and a random step.
  2. Types:
    • Simple Random Walk: The step sizes are often \(\pm 1\), with equal probability.
    • General Random Walk: The step sizes can follow any distribution.
  3. Applications:
    • Stock price movements, particle diffusion, and population genetics.

Time series - processes

Stochastic Processes - Markov chains

Characteristics:

  1. State Space:

    • A Markov chain consists of a set of states and transition probabilities between these states.

  2. Markov Property:

    • The probability of transitioning to the next state depends only on the current state: \(\mathbb{P}(X_{t+1} = s' | X_t = s, X_{t-1}, ..., X_0) = \mathbb{P}(X_{t+1} = s' | X_t = s)\).
  3. Transition Matrix:

    • The transitions are governed by a matrix of probabilities, where each entry \(\mathsf{P}_{i,j}\) represents the probability of moving from state \(i\) to state \(j\).
  4. Types:

    • Discrete-Time Markov Chain: The process is observed at discrete time intervals.
    • Continuous-Time Markov Chain: The process evolves continuously over time.
  5. Applications:

    • Weather modeling, queueing theory, board games (like Monopoly), speech recognition and speech generation.

Time series - processes

Stochastic Processes - Markov chains

In the discrete case, given a state transition matrix \(A\) and an initial state \(\pi_0\), then

\[ \begin{align*} \pi_1 & = \pi_0A\\ \pi_2 & = \pi_1A = \pi_0A^2\\ &\vdots\\ \pi_n & = \pi_0A^n \end{align*} \]

If the Markov chain has a stationary distribution \(\pi^{s}\), then \(\pi^{s}A=\pi^{s}\) by definition, and \(\pi^{s}(I-A)=0\) determines \(\pi^{s}\) up to a constant (fixed by requiring the probabilities to sum to 1).
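
A minimal sketch of this calculation in R: \(\pi^{s}A=\pi^{s}\) makes \(\pi^{s}\) a left eigenvector of \(A\) with eigenvalue 1, i.e. an ordinary eigenvector of the transpose \(A^\top\). The two-state transition matrix is an illustrative assumption.

# stationary distribution of a two-state Markov chain
A <- matrix(c(0.9, 0.1,
              0.3, 0.7), nrow = 2, byrow = TRUE)   # rows sum to 1

v    <- eigen(t(A))$vectors[, 1]   # eigenvector for eigenvalue 1
pi_s <- Re(v) / sum(Re(v))         # normalize so the probabilities sum to 1
pi_s                               # 0.75 0.25 for this A
pi_s %*% A                         # check: returns pi_s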

Time series - processes

Autoregressive Processes

  • In an AR(1) process the current value is a linear function of the previous value.

\[ y_t=\phi_1 y_{t-1}+\epsilon_t \]

  • Fixing the starting value at \(y_0=0\), and with repeated substitution, this process could be represented as

\[ y_t=\sum_{j=1}^t\phi_1^{t-j}\epsilon_j \]

  • The distribution of each error term is \(\epsilon_t\sim\mathscr{N}(0,\sigma^2)\) with \(\mathbb{E}[\epsilon_i\epsilon_j]=0,\,\forall i\ne j\), and we can generalize to several lags (i.e. an AR(p) model).

Time series - processes

Moments of distribution

  • The first moment of a stochastic process is the average of \(y_t\) over all possible realisations \[\bar{y}=\mathbb{E}[y_t];\;\;t=1,\ldots,T\]
  • The second moment is defined as the variance \[\text{var}(y_t)=\mathbb{E}\left[(y_t- \mathbb{E}[y_t])^2\right];\;\;t=1,\ldots,T\]
  • The autocovariance at lag \(j\) is \[\text{cov}(y_t,y_{t-j})=\mathbb{E}[(y_t- \mathbb{E}[y_t])(y_{t-j}- \mathbb{E}[y_{t-j}])];\;\;t=1,\ldots,T\]

Time series - processes

Conditional moments

  • The conditional distribution is based on past realisations of a random variable
  • For the AR(1) model (where the \(\epsilon_t\) are iid Gaussian and \(|\phi|<1\)) \[y_t=\phi y_{t-1}+\epsilon_t\]
  • The conditional moments are \[ \begin{align*} \mathbb{E}\left[\left.y_{t}\right|y_{t-1}\right] & =\phi y_{t-1}\\ \text{var}\left(\left.y_{t}\right|y_{t-1}\right) & =\mathbb{E}\left[\left(\phi y_{t-1}+\epsilon_{t}-\phi y_{t-1}\right)^{2}\right]=\mathbb{E}\left[\epsilon_{t}^{2}\right]=\sigma^{2}\\ \text{cov}\left(\left.y_{t}\right|y_{t-1},\left.y_{t-j}\right|y_{t-j-1}\right) & =0;\;j\geq1 \end{align*} \]

Time series - processes

Conditional moments

  • Conditioning on \(y_{t-2}\) for \(y_t\) \[ \begin{align*} \mathbb{E}\left[\left.y_{t}\right|y_{t-2}\right] & =\phi^{2}y_{t-2}\\ \text{var}\left(\left.y_{t}\right|y_{t-2}\right) & =\left(1+\phi^{2}\right)\sigma^{2}\\ \text{cov}\left(\left.y_{t}\right|y_{t-2},\left.y_{t-j}\right|y_{t-j-2}\right) & =\phi\sigma^{2};\;j=1\\ & =0;\;j>1 \end{align*} \]

Time series - processes

Unconditional moments

  • For the AR(1) model (where the \(\epsilon_t\) are iid Gaussian and \(|\phi|<1\)) \[y_t=\phi y_{t-1}+\epsilon_t\]
  • The unconditional moments are (assuming stationarity, which requires \(|\phi|<1\)) \[ \begin{align*} \mathbb{E}\left[y_{t}\right] & =0\\ \text{var}\left(y_{t}\right) & =\text{var}\left(\phi y_{t-1}+\epsilon_t\right)\\ & = \phi^2 \text{var}\left(y_{t-1}\right) + \text{var}\left(\epsilon_{t}\right)\\ & = \frac{\sigma^2}{1-\phi^2}\\ \text{cov}\left(y_{t},y_{t-k}\right) & =\phi^k\frac{\sigma^2}{1-\phi^2} \end{align*} \]
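
These moments are easy to check numerically (a sketch, not from the slides; the parameter values are illustrative):

# the sample variance of a long simulated AR(1) should be close to
# sigma^2 / (1 - phi^2)
set.seed(8740)
phi   <- 0.7
sigma <- 1
y <- as.numeric(arima.sim(list(ar = phi), n = 1e5, sd = sigma))

var(y)                   # simulated variance
sigma^2 / (1 - phi^2)    # theoretical variance: about 1.96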

Time series - processes

Stationarity

  • A time series is strictly stationary if for any \(\{j_1,j_2,\ldots,j_n\}\)
  • the joint distribution of \(\{y_t,y_{t+j_1},y_{t+j_2},\ldots,y_{t+j_n}\}\)
  • depends only on the intervals separating the dates \(\{j_1,j_2,\ldots,j_n\}\)
  • and not on the date \(t\) itself

Time series - processes

Covariance stationary

  • If neither the mean \(\bar{y}\) nor the covariance \(\text{cov}(y_t,y_{t-j})\) depend on \(t\)
  • Then the process for \(y_t\) is said to be covariance (weakly) stationary, where \(\forall t,j\) \[ \begin{align*} \mathbb{E}\left[y_{t}\right] & =\bar{y}\\ \mathbb{E}\left[\left(y_{t}-\bar{y}\right)\left(y_{t-j}-\bar{y}\right)\right] & =\text{cov}\left(y_{t},y_{t-j}\right) \end{align*} \]
  • Note that the process \(y_t=\alpha t+\epsilon_t\) would not be stationary, as the mean clearly depends on \(t\)
  • We saw that the unconditional moments of the AR(1) with \(|\phi|<1\) had a mean and covariance that did not depend on time

Time series - processes

Autocorrelation function (ACF)

  • For a stationary process we can plot the standardised covariance of the process over successive lags

  • The autocovariance function is denoted by \(\gamma (j)\equiv\text{cov}(y_t,y_{t-j})\) for \(t=1,\ldots,T\)

  • The autocovariance function is standardized by dividing by the variance \(\gamma(0)\), giving the ACF for successive values of \(j\)

    \[\rho(j)\equiv\frac{\gamma(j)}{\gamma(0)}\]

  • To display the results of the ACF we usually plot \(\rho(j)\) against (non-negative) \(j\) to illustrate the degree of persistence in a variable

Time series - processes

Partial autocorrelation function (PACF)

  • With an AR(1) process \(y_t=\phi y_{t-1}+\epsilon_t\), the ACF would suggest \(y_t\) and \(y_{t-2}\) are correlated, even though \(y_{t-2}\) does not appear in the model.

  • This is due to pass-through: we noted that \(\mathbb{E}\left[\left.y_t\right|y_{t-2}\right]=\phi^2y_{t-2}\) when performing recursive substitution

  • The PACF eliminates the effect of pass-through and puts the focus on the direct relationship between \(y_t\) and \(y_{t-2}\)

Time series - processes

Partial autocorrelation function (PACF)

demonstration of an AR(1) process, its ACF, and its PACF
> library(ggfortify)
> # Parameters for the AR(1) process
> phi = 0.8  # Autoregressive coefficient (should be less than 1 in absolute value)
> n = 100    # Number of observations
> 
> # Simulate AR(1) process
> set.seed(123)  # For reproducibility
> epsilon = rnorm(n)  # White noise
> X = rep(0, n)  # Initialize the series
> 
> # Generate the AR(1) series
> for (t in 2:n) {
+   X[t] = phi * X[t-1] + epsilon[t]
+ }
> 
> # Plot the AR(1) series
> tibble::tibble(y=X, x=1:length(X)) |> ggplot(aes(x=x, y=y)) + geom_line() + 
+   labs(title="AR(1) Process", x = "Time", y = "Value")
> # Plot the autocorrelation function (ACF)
> autoplot(stats::acf(X, plot = FALSE)) + labs(title = "Autocorrelation of AR(1) Process")
> # Plot the partial autocorrelation function (PACF)
> autoplot(stats::pacf(X, plot = FALSE)) + labs(title = "Partial Autocorrelation of AR(1) Process")

Time series - processes

Autoregressive Processes

  • The characteristic equation for an AR(p) process is derived from the autoregressive parameters \((\phi_1, \phi_2, \ldots, \phi_p)\).

  • The characteristic equation is: \(1-\phi_1 z-\phi_2 z^2-\cdots-\phi_p z^p = 0\)

  • A process \(\{y_t\}\) is strictly stationary if for every \(k\), \(t\), and \(n\), the distribution of \(\{y_t,\ldots,y_{t+k}\}\) is the same as the distribution of \(\{y_{t+n},\ldots,y_{t+k+n}\}\)

  • For an AR process to be stationary (its statistical properties do not change over time), the parameters \(\phi_1, \phi_2, \ldots, \phi_p\) must satisfy certain conditions (typically related to the roots of the characteristic equation lying outside the unit circle).

Time series - processes

Autoregressive Processes

  • The AR(p) model is sometimes expressed in terms of the Lag operator \(\mathrm{L}\), where \(\mathrm{L}y_t=y_{t-1}\).

  • The lag operator can be raised to powers, e.g. \(\mathrm{L}^2y_t=y_{t-2}\) and its powers can be combined into polynomials to form a new operator: \(a(\mathrm{L})=a_0+a_1\mathrm{L}+a_2\mathrm{L}^2+\ldots+a_p\mathrm{L}^p\) such that \(a(\mathrm{L})y_t=a_0y_t+a_1y_{t-1}+a_2y_{t-2}+\ldots+a_py_{t-p}\)

  • Lag polynomials can be multiplied, and multiplication commutes: \(a(\mathrm{L})b(\mathrm{L})=b(\mathrm{L})a(\mathrm{L})\).

  • We define \((1-\rho\mathrm{L})^{-1}\) by \((1-\rho\mathrm{L})(1-\rho\mathrm{L})^{-1}\equiv1\).

Time series - processes

Autoregressive Processes

\[ (1-\rho\mathrm{L})^{-1} = \sum_{i=0}^\infty\rho^i\mathrm{L}^i \]

  • EXERCISE: check that \((1-\rho\mathrm{L})(1-\rho\mathrm{L})^{-1}\equiv1\) holds when the series above is multiplied by \((1-\rho\mathrm{L})\) on either side.

  • If the sum is to converge, then we need \(|\rho|<1\)

Time series - processes

Autoregressive Processes

For the AR(1) process, using the lag operator we can write:

\[ \begin{align*} y_t & = \phi y_{t-1}+\epsilon_t\\ (1-\phi \mathrm{L})y_t & = \epsilon_t\\ y_t & = (1-\phi \mathrm{L})^{-1} \epsilon_t\\ & = \left(\sum_{i=0}^\infty\phi^i\mathrm{L}^i\right)\epsilon_t\\ & = \sum_{j=1}^t\phi^{t-j}\epsilon_j \end{align*} \]

  • EXERCISE: check that the last equality holds (it uses the starting value \(y_0=0\)).

Time series - processes

Autoregressive Processes

AR(p) processes can be written as AR(1) vector processes, e.g. for AR(2)

\[ \left(\begin{array}{c} y_{t}\\ y_{t-1} \end{array}\right)=\left(\begin{array}{cc} \phi_{1} & \phi_{2}\\ 1 & 0 \end{array}\right)\left(\begin{array}{c} y_{t-1}\\ y_{t-2} \end{array}\right)+\left(\begin{array}{c} \epsilon_{t}\\ 0 \end{array}\right) \] where the companion matrix \(A\) for the AR(2) is:

\[ A=\left(\begin{array}{cc} \phi_{1} & \phi_{2}\\ 1 & 0 \end{array}\right) \]

Time series - processes

Autoregressive Processes

Repeating the argument for AR(1) processes

\[ \begin{align*} \vec{y}_t & = (I-A \mathrm{L})^{-1} \vec{\epsilon}_t\\ & = \left(\sum_{i=0}^\infty A^i\mathrm{L}^i\right)\vec{\epsilon}_t \end{align*} \]

And the sum only converges if \(A^i\rightarrow0 \;\mathrm{as}\;i\rightarrow\infty\), i.e. all eigenvalues of \(A\) have magnitude less than 1.

Time series - processes

Autoregressive Processes

  • In general, if an \(n\times n\) matrix \(A\) has \(n\) distinct eigenvalues, then all eigenvalues must have magnitude \(<1\) for \(A^i\rightarrow0 \;\mathrm{as}\;i\rightarrow\infty\).
  • In this case we have \(A=X\Lambda X^{-1}\), where the columns of \(X\) are the eigenvectors of \(A\) and \(\Lambda\) is a diagonal matrix with the eigenvalues on the diagonal.
  • Then \(A^i=X\Lambda^i X^{-1}\), so \(A^i\rightarrow0\) as \(i\rightarrow\infty\) exactly when \(\Lambda^i\rightarrow0\), i.e. when every eigenvalue has magnitude less than 1 (see the sketch below).
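
A minimal sketch of this check for an AR(2), using illustrative coefficients:

# stationarity check via the eigenvalues of the companion matrix
phi1 <- 0.5; phi2 <- 0.3
A <- matrix(c(phi1, phi2,
              1,    0), nrow = 2, byrow = TRUE)

Mod(eigen(A)$values)   # all magnitudes < 1, so A^i -> 0: stationary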

Time series - processes

Computing determinants

  1. Minor:
    • The minor \(M_{ij}\) of an element \(a_{ij}\) in an \(n \times n\) matrix \(A\) is the determinant of the \((n-1) \times (n-1)\) matrix that results from removing the \(i\)-th row and \(j\)-th column from \(A\).
  2. Cofactor:
    • The cofactor \(C_{ij}\) of an element \(a_{ij}\) is given by: \(C_{ij} = (-1)^{i+j} M_{ij}\)
    • The sign factor \((-1)^{i+j}\) alternates according to the position of the element in the matrix.

Cofactor Expansion

The determinant of an \(n \times n\) matrix \(A\) can be computed by expanding along any row or column. The cofactor expansion along the \(i\)-th row is given by:

\[ \det(A) = \sum_{j=1}^n a_{ij} C_{ij} \] Similarly, the cofactor expansion along the \(j\)-th column is: \(\det(A) = \sum_{i=1}^n a_{ij} C_{ij}\)
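
A direct (if inefficient) implementation of cofactor expansion along the first row, checked against base R's det() (a sketch, not from the slides):

# recursive cofactor expansion: det(A) = sum_j a_{1j} * C_{1j}
cofactor_det <- function(A) {
  n <- nrow(A)
  if (n == 1) return(A[1, 1])
  sum(sapply(seq_len(n), function(j) {
    minor <- A[-1, -j, drop = FALSE]          # delete row 1 and column j
    A[1, j] * (-1)^(1 + j) * cofactor_det(minor)
  }))
}

M <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)
cofactor_det(M)   # 8
det(M)            # 8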

Time series - processes

Computing determinants

The structure of the AR(1) vector process makes it easy to compute the determinant polynomial (the characteristic equation) in terms of the coefficients of the process.

The roots of the polynomial can be computed using polyroot, passing in a vector of polynomial coefficients in increasing order; their magnitudes can then be computed with the base function Mod.

# roots of 1 + 0.2z + 3z^2 + 0.6z^3 (coefficients in increasing order)
polyroot(c(1, .2, 3, .6)) |> Mod()

Time series - processes

Moving Average Processes

  • In an MA(q) process the present value is a weighted sum of the current and previous errors. \[y_t= \epsilon_t +\theta_1\epsilon_{t-1}\quad\text{(MA(1))}\]
  • MA(q) models describe processes where it takes a bit of time for an error (or “shock”) to dissipate
  • This type of expression may be used to describe a wide variety of stationary time series processes
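
Because an MA(q) shock dissipates after \(q\) periods, the sample ACF of an MA(1) should cut off after lag 1. A quick sketch (the value of theta is an illustrative assumption):

# simulate an MA(1) and inspect its autocorrelations
set.seed(8740)
y <- arima.sim(list(ma = 0.7), n = 500)

stats::acf(y, lag.max = 10, main = "ACF of a simulated MA(1) process")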

Time series - processes

Moving Average Processes

  • There is a duality of AR() and MA() processes.

  • We have already seen how AR processes can be expressed as MA() processes:

\[ \vec{y}_t = (I-A \mathrm{L})^{-1} \vec{\epsilon}_t = \left(\sum_{i=0}^\infty A^i\mathrm{L}^i\right)\vec{\epsilon}_t \]

and MA() processes can be expressed as AR processes by a similar argument.

Time series - processes

ARMA Processes

  • A combination of an AR(1) and a MA(1) process is termed an ARMA(1,1) process. \[y_t=\phi y_{t-1}+\epsilon_t+\theta\epsilon_{t-1}\]
  • An ARMA(p,q) process takes the form \[y_t=\sum_{j=1}^p\phi_jy_{t-j}+\epsilon_t+\sum_{i=1}^q\theta_i\epsilon_{t-i}\]
  • This model was popularized by Box & Jenkins, who developed a methodology that may be used to identify the terms that should be included in the model

Time series - processes

Long memory & fractional differencing

  • Most AR(p), MA(q) and ARMA(p,q) processes are termed short-memory processes because the coefficients in their representations are dominated by exponential decay
  • Long-memory (or persistent) time series are considered intermediate compromises between the short-memory models and integrated nonstationary processes

Time series - processes

ARIMA

  • AutoRegressive Integrated Moving Average combines three key concepts: autoregression (AR), differencing (I - for Integrated), and moving average (MA).
  • AR (AutoRegressive): captures the relationship between an observation and lagged observations.
  • I (Integrated): Differencing is used to make the time series stationary.
  • MA (Moving Average): captures the relationship between an observation and a residual error from a moving average model.
  • An ARIMA model is denoted as ARIMA(p, d, q), where:
    • p is the number of lag observations in the model (AR part),
    • d is the number of times that the raw observations are differenced (I part),
    • q is the size of the moving average window (MA part).
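
A minimal fitting sketch with base R (the simulated series and the chosen orders are illustrative assumptions):

# simulate a series whose first differences follow an ARMA(1,1),
# then fit an ARIMA(1,1,1)
set.seed(8740)
y <- cumsum(as.numeric(arima.sim(list(ar = 0.5, ma = 0.3), n = 300)))

fit <- stats::arima(y, order = c(1, 1, 1))   # (p, d, q)
fit
predict(fit, n.ahead = 12)$pred              # 12-step-ahead forecasts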

Time series - exponential smoothing

Exponential smoothing or exponential moving average (EMA) is a rule of thumb technique for smoothing time series data using the exponential window function.

For a raw data series \(\{x_t\}\) the output of the exponential smoothing process is \(\{y_t\}\), where, for \(0\le\alpha\le1\) and \(t>0\)

\[ \begin{align*} y_0 & = x_0 \\ y_t & = \alpha x_t + (1-\alpha)y_{t-1} \end{align*} \]

\(\alpha\) is called the smoothing factor.
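
The recursion translates directly into R (a minimal sketch, not from the slides):

# simple exponential smoothing: y_t = alpha * x_t + (1 - alpha) * y_{t-1}
ema <- function(x, alpha) {
  y <- numeric(length(x))
  y[1] <- x[1]                                # y_0 = x_0
  for (t in 2:length(x)) {
    y[t] <- alpha * x[t] + (1 - alpha) * y[t - 1]
  }
  y
}

set.seed(8740)
x <- cumsum(rnorm(100)) + rnorm(100)          # a noisy series
plot(x, type = "l", col = "grey")
lines(ema(x, alpha = 0.2), col = "red")       # smaller alpha = heavier smoothing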

Time series - exponential smoothing

The time constant \(\tau\) is the amount of time for the smoothed response of a unit step function to reach \(1-1/e\approx 63.2\%\) of the original signal.

\[ \begin{align*} \alpha & = 1-e^{-\Delta T/\tau}\\ \tau & = -\frac{\Delta T}{\log(1-\alpha)} \end{align*} \]

where \(\Delta T\) is the time interval.

Time series - exponential smoothing

We can also include a trend term

\[ \begin{align*} y_0 & = x_0 \\ b_0 & = x_1 - x_0 \end{align*} \]

and for \(t>0\), \(0\le\alpha\le1\) and \(0\le\beta\le1\)

\[ \begin{align*} y_t & = \alpha x_t + (1-\alpha)(y_{t-1} + b_{t-1})\\ b_t & = \beta(y_t-y_{t-1}) + (1-\beta)b_{t-1} \end{align*} \]

Time series - exponential smoothing

Finally we can also include a seasonality term, with a cycle length of \(L\) time intervals.

For \(t>0\), \(0\le\alpha\le1\), \(0\le\beta\le1\) and \(0\le\gamma\le1\)

\[ \begin{align*} y_t & = \alpha \frac{x_t}{c_{t-L}} + (1-\alpha)(y_{t-1} + b_{t-1})\\ b_t & = \beta(y_t-y_{t-1}) + (1-\beta)b_{t-1}\\ c_t & = \gamma\frac{x_t}{y_t} + (1-\gamma)c_{t-L} \end{align*} \]
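
Base R's HoltWinters() estimates \(\alpha\), \(\beta\) and \(\gamma\) for exactly these level/trend/seasonal recursions; a quick sketch on a built-in dataset:

# Holt-Winters with trend and multiplicative seasonality
fit <- HoltWinters(AirPassengers, seasonal = "multiplicative")
c(fit$alpha, fit$beta, fit$gamma)   # fitted smoothing parameters

plot(fit)                           # observed vs. fitted values
predict(fit, n.ahead = 24)          # two years of forecasts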

Time series - ETS models

ETS stands for Error-Trend-Seasonality, and the exponential smoothing models above clearly fall in this class. The more general taxonomy is:

  • Error: “Additive” (A), or “Multiplicative” (M);
  • Trend: “None” (N), “Additive” (A), “Additive damped” (Ad), “Multiplicative” (M), or “Multiplicative damped” (Md);
  • Seasonality: “None” (N), or “Additive” (A), or “Multiplicative” (M).

In this taxonomy, the fully additive exponential smoothing model is denoted as ETS(A,A,A); the seasonal variant above, with its multiplicative seasonal term, corresponds to ETS(A,A,M).
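
If the forecast package is installed, its ets() function fits models from this taxonomy directly via the three-letter code (a sketch; the dataset is illustrative):

# fit a fully additive ETS model, then let AIC choose the components
fit  <- forecast::ets(AirPassengers, model = "AAA")   # ETS(A,A,A)
auto <- forecast::ets(AirPassengers, model = "ZZZ")   # "Z" = choose automatically
auto$method                                           # e.g. "ETS(M,Ad,M)"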

Time series - ETS models

The additive ETS models follow the recursions developed above; a more detailed discussion of the full taxonomy can be found in Hyndman & Athanasopoulos, Forecasting: Principles and Practice.

The timetk package

Time series - plotting

> timetk::bike_sharing_daily |> 
+   dplyr::slice_head(n=5) |> 
+   dplyr::glimpse()
Rows: 5
Columns: 16
$ instant    <dbl> 1, 2, 3, 4, 5
$ dteday     <date> 2011-01-01, 2011-01-02, 2011-01-03, 2011-01-04, 2011-01-05
$ season     <dbl> 1, 1, 1, 1, 1
$ yr         <dbl> 0, 0, 0, 0, 0
$ mnth       <dbl> 1, 1, 1, 1, 1
$ holiday    <dbl> 0, 0, 0, 0, 0
$ weekday    <dbl> 6, 0, 1, 2, 3
$ workingday <dbl> 0, 0, 1, 1, 1
$ weathersit <dbl> 2, 2, 1, 1, 1
$ temp       <dbl> 0.344167, 0.363478, 0.196364, 0.200000, 0.226957
$ atemp      <dbl> 0.363625, 0.353739, 0.189405, 0.212122, 0.229270
$ hum        <dbl> 0.805833, 0.696087, 0.437273, 0.590435, 0.436957
$ windspeed  <dbl> 0.160446, 0.248539, 0.248309, 0.160296, 0.186900
$ casual     <dbl> 331, 131, 120, 108, 82
$ registered <dbl> 654, 670, 1229, 1454, 1518
$ cnt        <dbl> 985, 801, 1349, 1562, 1600

Time series - plotting

The timetk::plot_time_series() function is a good way to get a quick time series plot. From a tidy table we

  • select the time value and the columns we want to plot
  • pivot (longer) the columns we want to plot
  • plot

The timetk::plot_time_series() function has many options that can be used to customize the plot.

Time series - plotting

Code
> timetk::bike_sharing_daily |> 
+   dplyr::select(dteday, casual, registered) |> 
+   tidyr::pivot_longer(-dteday) |> 
+   timetk::plot_time_series(
+     .date_var = dteday
+     , .value = value
+     , .color_var = name
+   )

Time series - timetk::

time downscaling

time-downscale the bike sharing data
> timetk::bike_sharing_daily |> 
+   timetk::summarise_by_time(
+     .date_var = dteday
+     , .by = "week"
+     , .week_start = 7
+     , casual = sum(casual)
+     , registered = mean(registered)
+     , max_cnt = max(cnt)
+   )
# A tibble: 106 × 4
   dteday     casual registered max_cnt
   <date>      <dbl>      <dbl>   <dbl>
 1 2010-12-26    331       654      985
 2 2011-01-02    745      1235.    1606
 3 2011-01-09    477      1167.    1421
 4 2011-01-16    706      1183.    1927
 5 2011-01-23    632       994.    1985
 6 2011-01-30    550      1314.    1708
 7 2011-02-06   1075      1450.    1746
 8 2011-02-13   2333      1734.    2927
 9 2011-02-20   1691      1405.    1969
10 2011-02-27   2120      1631.    2402
# ℹ 96 more rows

Time series - timetk::

time upscaling

time-upscale the bike sharing data
> timetk::bike_sharing_daily |> 
+   dplyr::select(dteday, casual) |> 
+   timetk::pad_by_time(.date_var = dteday, .by = "hour") |> 
+   timetk::mutate_by_time(
+     .date_var = dteday
+     , .by = "day"
+     , casual = sum(casual,na.rm=T)/24
+   )
# A tibble: 17,521 × 2
   dteday              casual
   <dttm>               <dbl>
 1 2011-01-01 00:00:00   13.8
 2 2011-01-01 01:00:00   13.8
 3 2011-01-01 02:00:00   13.8
 4 2011-01-01 03:00:00   13.8
 5 2011-01-01 04:00:00   13.8
 6 2011-01-01 05:00:00   13.8
 7 2011-01-01 06:00:00   13.8
 8 2011-01-01 07:00:00   13.8
 9 2011-01-01 08:00:00   13.8
10 2011-01-01 09:00:00   13.8
# ℹ 17,511 more rows

Time series - timetk::

time filtering

filter the bike sharing data by date range
> timetk::bike_sharing_daily |>
+   timetk::filter_by_time(
+     .date_var = dteday
+     , .start_date="2012-01-15"
+     , .end_date = "2012-07-01"
+   ) |> 
+   timetk::plot_time_series(.date_var = dteday, casual)

Time series - timetk::

time offsets

filter the bike sharing data by offset
> # attach timetk so that the %+time% operator is available
> require(timetk, quietly = TRUE)
> timetk::bike_sharing_daily |>
+   timetk::filter_by_time(
+     .date_var = dteday
+     , .start_date="2012-01-15"
+     , .end_date = "2012-01-15" %+time% "12 weeks"
+   ) |> 
+   timetk::plot_time_series(.date_var = dteday, casual)

Time series - timetk::

mutate by period

add columns summarizing each 7-day period (values repeat within a period)
> timetk::bike_sharing_daily |>
+   dplyr::select(dteday, casual) |> 
+   timetk::mutate_by_time(
+     .date_var = dteday
+     , .by = "7 days"
+     , casual_mean = mean(casual)
+     , casual_median = median(casual)
+     , casual_max = max(casual)
+     , casual_min = min(casual)
+   )
# A tibble: 731 × 6
   dteday     casual casual_mean casual_median casual_max casual_min
   <date>      <dbl>       <dbl>         <dbl>      <dbl>      <dbl>
 1 2011-01-01    331       144             120        331         82
 2 2011-01-02    131       144             120        331         82
 3 2011-01-03    120       144             120        331         82
 4 2011-01-04    108       144             120        331         82
 5 2011-01-05     82       144             120        331         82
 6 2011-01-06     88       144             120        331         82
 7 2011-01-07    148       144             120        331         82
 8 2011-01-08     68        46.1            43         68         25
 9 2011-01-09     54        46.1            43         68         25
10 2011-01-10     41        46.1            43         68         25
# ℹ 721 more rows

Time series - timetk::

summarize by period

aggregate columns to one row per 7-day period
> timetk::bike_sharing_daily |>
+   timetk::summarize_by_time(
+     .date_var = dteday
+     , .by = "7 days"
+     , casual_mean = mean(casual)
+     , registered_mean = mean(registered)
+     , windspeed_max = max(windspeed)
+   )
# A tibble: 119 × 4
   dteday     casual_mean registered_mean windspeed_max
   <date>           <dbl>           <dbl>         <dbl>
 1 2011-01-01       144             1201.         0.249
 2 2011-01-08        46.1           1147.         0.362
 3 2011-01-15       119.            1203.         0.353
 4 2011-01-22        86              981.         0.294
 5 2011-01-29       102.            1130          0.187
 6 2011-02-01       120.            1377.         0.278
 7 2011-02-08       172.            1455.         0.418
 8 2011-02-15       366             1618.         0.507
 9 2011-02-22       233.            1546.         0.347
10 2011-03-01       243.            1495          0.343
# ℹ 109 more rows

Time series - timetk::

create a timeseries

create a timeseries
> tibble::tibble(
+   date = 
+     timetk::tk_make_timeseries(
+       start_date = "2024"
+       , length_out = 100
+       , by = "month"
+     )
+   , values=1:100
+ )
# A tibble: 100 × 2
   date       values
   <date>      <int>
 1 2024-01-01      1
 2 2024-02-01      2
 3 2024-03-01      3
 4 2024-04-01      4
 5 2024-05-01      5
 6 2024-06-01      6
 7 2024-07-01      7
 8 2024-08-01      8
 9 2024-09-01      9
10 2024-10-01     10
# ℹ 90 more rows

Time series - timetk::

create a timeseries

add columns for holidays
> timetk::tk_make_holiday_sequence(
+   start_date = "2024"
+   , end_date = "2026"
+   , calendar = "TSX"
+ ) %>% 
+   timetk::tk_get_holiday_signature(holiday_pattern = "Thanksgiving",locale_set = "CA", exchange = "TSX") |> 
+   dplyr::slice_head(n = 6) |> 
+   dplyr::glimpse()
Rows: 6
Columns: 6
$ index              <date> 2024-01-01, 2024-02-19, 2024-02-19, 2024-02-19, 2024-03-29, 2024-…
$ exch_TSX           <dbl> 1, 1, 1, 1, 1, 1
$ locale_CA          <dbl> 0, 1, 1, 1, 0, 1
$ CA_ThanksgivingDay <dbl> 0, 0, 0, 0, 0, 0
$ JP_ThanksgivingDay <dbl> 0, 0, 0, 0, 0, 0
$ US_ThanksgivingDay <dbl> 0, 0, 0, 0, 0, 0

Time series - timetk::

plot raw windspeed data
> # plot wind speed
> timetk::bike_sharing_daily |> 
+   timetk::plot_time_series(dteday, windspeed, .title = "Time Series - Raw")
plot transformed windspeed data
> # plot transformed speed
> timetk::bike_sharing_daily |> 
+   timetk::plot_time_series(
+     dteday
+     , timetk::box_cox_vec(windspeed, lambda="auto",  silent = T)
+     , .title = "Time Series - Box Cox Transformed")

Time series - timetk::

timeseries transformations

See Also

  • Lag Transformation: lag_vec()

  • Differencing Transformation: diff_vec()

  • Rolling Window Transformation: slidify_vec()

  • Loess Smoothing Transformation: smooth_vec()

  • Fourier Series: fourier_vec()

  • Missing Value Imputation for Time Series: ts_impute_vec(), ts_clean_vec()

Other common transformations to reduce variance: log(), log1p() and sqrt()
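
A quick sketch of a few of these on a toy vector (assumes timetk is installed; arguments follow timetk's documented vector functions):

x <- c(10, 12, 15, 11, 14, 18, 16, 20)

timetk::lag_vec(x, lag = 1)                      # shift back one period
timetk::diff_vec(x, lag = 1, difference = 1)     # first differences
timetk::slidify_vec(x, .f = mean, .period = 3)   # 3-period rolling mean
log1p(x)                                         # variance-reducing transform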

Time series - Feature engineering

feature engineering
> subscribers_tbl   <- readRDS("data/00_data/mailchimp_users.rds")
> 
> data_prepared_tbl <- subscribers_tbl |> 
+   timetk::summarize_by_time(optin_time, .by="day", optins=dplyr::n()) |> 
+   timetk::pad_by_time(.pad_value=0) |> 
+   # preprocessing
+   dplyr::mutate(optins_trans=timetk::log_interval_vec(optins, limit_lower=0, offset=1)) |> 
+   dplyr::mutate(optins_trans=timetk::standardize_vec(optins_trans)) |> 
+   # fix missing vals at start
+   timetk::filter_by_time(.start_date = "2018-07-03") |> 
+   # outliers clean
+   dplyr::mutate(optins_trans_cleaned = timetk::ts_clean_vec(optins_trans, period=7)) |> 
+   dplyr::mutate(optins_trans=ifelse(optin_time |> timetk::between_time("2018-11-18","2018-11-20")
+                              , optins_trans_cleaned
+                              , optins_trans
+                              )) |> 
+   dplyr::select(-optins, -optins_trans_cleaned)
> # show the prepared data
> data_prepared_tbl     
# A tibble: 609 × 2
   optin_time optins_trans
   <date>            <dbl>
 1 2018-07-03      -0.492 
 2 2018-07-04      -0.153 
 3 2018-07-05      -0.578 
 4 2018-07-06      -0.413 
 5 2018-07-07      -1.20  
 6 2018-07-08      -1.66  
 7 2018-07-09      -0.274 
 8 2018-07-10      -0.212 
 9 2018-07-11      -0.0986
10 2018-07-12      -0.274 
# ℹ 599 more rows
plot the new data
> data_prepared_tbl |> # plot the table
+   tidyr::pivot_longer(contains("trans")) |> 
+   timetk::plot_time_series(optin_time,value,name) 

Date Features

add date features
> data_prep_signature_tbl <- 
+   data_prepared_tbl |> 
+   timetk::tk_augment_timeseries_signature(
+     .date_var = optin_time
+   ) 
> data_prep_signature_tbl |> dplyr::slice_head(n = 4) |> dplyr::glimpse()
Rows: 4
Columns: 30
$ optin_time   <date> 2018-07-03, 2018-07-04, 2018-07-05, 2018-07-06
$ optins_trans <dbl> -0.4919060, -0.1534053, -0.5779424, -0.4133393
$ index.num    <dbl> 1530576000, 1530662400, 1530748800, 1530835200
$ diff         <dbl> NA, 86400, 86400, 86400
$ year         <int> 2018, 2018, 2018, 2018
$ year.iso     <int> 2018, 2018, 2018, 2018
$ half         <int> 2, 2, 2, 2
$ quarter      <int> 3, 3, 3, 3
$ month        <int> 7, 7, 7, 7
$ month.xts    <int> 6, 6, 6, 6
$ month.lbl    <ord> July, July, July, July
$ day          <int> 3, 4, 5, 6
$ hour         <int> 0, 0, 0, 0
$ minute       <int> 0, 0, 0, 0
$ second       <int> 0, 0, 0, 0
$ hour12       <int> 0, 0, 0, 0
$ am.pm        <int> 1, 1, 1, 1
$ wday         <int> 3, 4, 5, 6
$ wday.xts     <int> 2, 3, 4, 5
$ wday.lbl     <ord> Tuesday, Wednesday, Thursday, Friday
$ mday         <int> 3, 4, 5, 6
$ qday         <int> 3, 4, 5, 6
$ yday         <int> 184, 185, 186, 187
$ mweek        <int> 1, 1, 1, 1
$ week         <int> 27, 27, 27, 27
$ week.iso     <int> 27, 27, 27, 27
$ week2        <int> 1, 1, 1, 1
$ week3        <int> 0, 0, 0, 0
$ week4        <int> 3, 3, 3, 3
$ mday7        <int> 1, 1, 1, 1

Trend

add regression
> data_prep_signature_tbl |> 
+   timetk::plot_time_series_regression(
+     .date_var = optin_time
+     , .formula = optins_trans ~ index.num
+   )

Seasonality

Code
> # Weekly Seasonality
> data_prep_signature_tbl |> 
+   timetk::plot_time_series_regression(
+     .date_var=optin_time
+     , .formula=optins_trans ~ wday.lbl + splines::bs(index.num, degree=3)
+     , .show_summary = FALSE
+     , .title = "Weekday seasonality"
+   )
Code
> data_prep_signature_tbl |> 
+   timetk::plot_time_series_regression(
+     .date_var=optin_time
+     , .formula=optins_trans ~ month.lbl + splines::bs(index.num, degree=3)
+     , .show_summary = FALSE
+     , .title = "Monthly seasonality"
+   )

Seasonality

regress with a formula
> # ** Together with Trend
> model_formula_seasonality <- as.formula(
+   optins_trans ~ wday.lbl + month.lbl +
+     splines::ns(index.num
+                 , knots=quantile(index.num, probs=c(0.25, 0.5, 0.75))) + .
+ )
> data_prep_signature_tbl |> 
+   timetk::plot_time_series_regression(
+     .date_var=optin_time
+     , .formula = model_formula_seasonality
+     , .show_summary = FALSE
+     , .title = "Day and Month seasonality + Cubic spline, 3 knots"
+   )

Fourier series

show autocorrelations
> data_prep_signature_tbl |> 
+   timetk::plot_acf_diagnostics(optin_time,optins_trans)

Fourier series

add seasonality using sin and cos
> data_prep_fourier_tbl <- 
+   data_prep_signature_tbl |> 
+   timetk::tk_augment_fourier(optin_time, .periods=c(7,14,30,90,365), .K=2)
> 
> data_prep_fourier_tbl |> dplyr::slice_head(n=3) |> dplyr::glimpse()
Rows: 3
Columns: 50
$ optin_time           <date> 2018-07-03, 2018-07-04, 2018-07-05
$ optins_trans         <dbl> -0.4919060, -0.1534053, -0.5779424
$ index.num            <dbl> 1530576000, 1530662400, 1530748800
$ diff                 <dbl> NA, 86400, 86400
$ year                 <int> 2018, 2018, 2018
$ year.iso             <int> 2018, 2018, 2018
$ half                 <int> 2, 2, 2
$ quarter              <int> 3, 3, 3
$ month                <int> 7, 7, 7
$ month.xts            <int> 6, 6, 6
$ month.lbl            <ord> July, July, July
$ day                  <int> 3, 4, 5
$ hour                 <int> 0, 0, 0
$ minute               <int> 0, 0, 0
$ second               <int> 0, 0, 0
$ hour12               <int> 0, 0, 0
$ am.pm                <int> 1, 1, 1
$ wday                 <int> 3, 4, 5
$ wday.xts             <int> 2, 3, 4
$ wday.lbl             <ord> Tuesday, Wednesday, Thursday
$ mday                 <int> 3, 4, 5
$ qday                 <int> 3, 4, 5
$ yday                 <int> 184, 185, 186
$ mweek                <int> 1, 1, 1
$ week                 <int> 27, 27, 27
$ week.iso             <int> 27, 27, 27
$ week2                <int> 1, 1, 1
$ week3                <int> 0, 0, 0
$ week4                <int> 3, 3, 3
$ mday7                <int> 1, 1, 1
$ optin_time_sin7_K1   <dbl> -9.749279e-01, -7.818315e-01, 2.256296e-13
$ optin_time_cos7_K1   <dbl> -0.2225209, 0.6234898, 1.0000000
$ optin_time_sin7_K2   <dbl> 4.338837e-01, -9.749279e-01, 4.512593e-13
$ optin_time_cos7_K2   <dbl> -0.9009689, -0.2225209, 1.0000000
$ optin_time_sin14_K1  <dbl> 7.818315e-01, 4.338837e-01, -1.128148e-13
$ optin_time_cos14_K1  <dbl> -0.6234898, -0.9009689, -1.0000000
$ optin_time_sin14_K2  <dbl> -9.749279e-01, -7.818315e-01, 2.256296e-13
$ optin_time_cos14_K2  <dbl> -0.2225209, 0.6234898, 1.0000000
$ optin_time_sin30_K1  <dbl> 1.126564e-13, -2.079117e-01, -4.067366e-01
$ optin_time_cos30_K1  <dbl> -1.0000000, -0.9781476, -0.9135455
$ optin_time_sin30_K2  <dbl> -2.253127e-13, 4.067366e-01, 7.431448e-01
$ optin_time_cos30_K2  <dbl> 1.0000000, 0.9135455, 0.6691306
$ optin_time_sin90_K1  <dbl> -0.8660254, -0.8290376, -0.7880108
$ optin_time_cos90_K1  <dbl> 0.5000000, 0.5591929, 0.6156615
$ optin_time_sin90_K2  <dbl> -0.8660254, -0.9271839, -0.9702957
$ optin_time_cos90_K2  <dbl> -0.5000000, -0.3746066, -0.2419219
$ optin_time_sin365_K1 <dbl> -0.2135209, -0.2303057, -0.2470222
$ optin_time_cos365_K1 <dbl> -0.9769385, -0.9731183, -0.9690098
$ optin_time_sin365_K2 <dbl> 0.4171936, 0.4482293, 0.4787338
$ optin_time_cos365_K2 <dbl> 0.9088176, 0.8939186, 0.8779601

Visualization

regress with a formula again
> # Model
> model_formula_fourier <- 
+   as.formula(
+     optins_trans ~ . +
+       splines::ns(index.num
+                   , knots=quantile(index.num, probs=c(0.25, 0.5, 0.75)))
+   )
> 
> # Visualize
> data_prep_fourier_tbl |> 
+   timetk::filter_by_time(.start_date="2018-09-13") |> 
+   timetk::plot_time_series_regression(
+     .date_var = optin_time
+     , .formula = model_formula_fourier
+     , .show_summary = FALSE
+     , .title = "Fourier + Cubic spline, 3 knots"
+   )

Test-train splits

test/train splits in timeseries
> dat <- subscribers_tbl |> 
+   timetk::summarize_by_time(optin_time, .by="day", optins=dplyr::n()) |> 
+   timetk::pad_by_time(.pad_value=0) |> 
+   timetk::filter_by_time(.start_date = "2018-12-15")
> 
> # Split Data 80/20
> splits <- 
+   timetk::time_series_split(
+     data = dat
+     , initial = "12 months"
+     , assess = "1 months"
+   )
> 
> splits |>
+   timetk::tk_time_series_cv_plan() |>
+   timetk::plot_time_series_cv_plan(.date_var = optin_time, .value = optins)

Feature engineering w/ recipes

using recipes to engineer features
> time_rec <- dat |> 
+   recipes::recipe(optins ~ ., data = rsample::training(splits)) |> 
+   timetk::step_log_interval(optins, limit_lower = 0, offset = 1) |> 
+   recipes::step_normalize(recipes::all_outcomes()) |> 
+   timetk::step_timeseries_signature(optin_time) |> 
+   timetk::step_fourier(optin_time, period = c(7,14,30,90,365), K=2)
> 
> time_rec |> recipes::prep(training = rsample::training(splits)) |> 
+   recipes::bake(new_data = NULL) |> 
+   timetk::plot_time_series_regression(
+     .date_var = optin_time
+     , .formula = optins ~ .
+     , .show_summary = FALSE
+   )
$optins
[1] 0

$optins
[1] 400

Workflows

> # process engineering with workflows: ARIMA model
> model_spec_arima <- modeltime::arima_reg() |>
+     parsnip::set_engine("auto_arima")
> 
> recipe_spec_fourier <- 
+   recipes::recipe(
+     optins ~ optin_time
+     , data = rsample::training(splits)
+   ) |>
+     timetk::step_fourier(optin_time, period = c(7, 14, 30, 90), K = 1) 
> 
> workflow_fit_arima <- workflows::workflow() |>
+   workflows::add_recipe(recipe_spec_fourier) |>
+   workflows::add_model(model_spec_arima) |>
+   parsnip::fit(rsample::training(splits))

Workflows

> # process engineering with workflows: linear model
> model_spec_lm <- parsnip::linear_reg() |>
+   parsnip::set_engine("lm") 
> 
> recipe_spec_linear <- 
+   recipes::recipe(
+     optins ~ optin_time
+     , data = rsample::training(splits)
+   ) |>
+     timetk::step_fourier(optin_time, period = c(7, 14, 30, 90), K = 1) 
> 
> workflow_fit_linear <- workflows::workflow() |>
+   workflows::add_recipe(recipe_spec_linear) |>
+   workflows::add_model(model_spec_lm) |>
+   parsnip::fit(rsample::training(splits))

Predict

forecasting with different models
> models_tbl <- 
+   modeltime::modeltime_table(workflow_fit_arima, workflow_fit_linear)
> 
> calibration_tbl <- models_tbl |>
+   modeltime::modeltime_calibrate(new_data = rsample::testing(splits))
> 
> calibration_tbl |>
+   modeltime::modeltime_forecast(
+     new_data    = rsample::testing(splits),
+     actual_data = dat
+   ) |>
+   modeltime::plot_modeltime_forecast(
+     .legend_max_width = 25, # For mobile screens
+     .interactive      = TRUE
+   )

Evaluate

evaluate model accuracy with test data
> calibration_tbl |>
+   modeltime::modeltime_accuracy() |>
+   modeltime::table_modeltime_accuracy(
+     .interactive = FALSE
+   )
Accuracy Table

.model_id  .model_desc                                     .type    mae   mape  mase  smape    rmse   rsq
1          REGRESSION WITH ARIMA(2,0,1)(2,0,0)[7] ERRORS   Test   60.82  65.18   0.6  72.87  124.54  0.08
2          LM                                              Test   60.80  65.60   0.6  68.92  126.34  0.02

Re-fit

forecast out of sample
> refit_tbl <- calibration_tbl |>
+   # use all the data
+   modeltime::modeltime_refit(data = dat)
> 
> refit_tbl |>
+   modeltime::modeltime_forecast(h = "3 months", actual_data = dat) |>
+   modeltime::plot_modeltime_forecast(
+     .legend_max_width = 12, # For mobile screens
+     .interactive      = TRUE
+   )

Recap

  • In this section we have worked with the tidymodels and timetk packages to build a workflow that facilitates building and evaluating multiple models.

  • Combined with the recipes package, we now have a complete data modeling framework for time series.