Lab 6 - Time Series Methods
BSMM 8740 Fall 2024
Introduction
In today’s lab, you’ll practice building workflows with `recipes`, `parsnip` models, `rsample` cross-validation, and model comparison in the context of time series data.
Getting started
To complete the lab, log on to your GitHub account and then go to the class GitHub organization and find the 2024-lab-6-[your github username] repository.

Create an R project using your 2024-lab-6-[your github username] repository (remember to create a PAT, etc.) and add your answers by editing the `2024-lab-6.qmd` file in your repository. When you are done, be sure to save your document, then stage, commit, and push your work.
To access GitHub from the lab, you will need to make sure you are logged in as follows:

- username: .\daladmin
- password: Business507!
Remember to create a PAT and set your git credentials:

- create your PAT using `usethis::create_github_token()`,
- store your PAT with `gitcreds::gitcreds_set()`,
- set your username and email with `usethis::use_git_config(user.name = ___, user.email = ___)`.
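Putting those three steps together, a minimal sketch (substitute your own name and email for the placeholders):

```r
# opens a browser page to create a GitHub personal access token (PAT)
usethis::create_github_token()

# paste the new PAT when prompted; it is stored in the git credential store
gitcreds::gitcreds_set()

# set the git identity attached to your commits
usethis::use_git_config(
  user.name  = "Your Name",
  user.email = "you@example.com"
)
```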
Packages
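The package-loading chunk did not survive rendering here; based on the exercises below, a plausible set (an assumption, not the original chunk) is:

```r
library(tidymodels)  # recipes, parsnip, rsample, workflows, yardstick
library(timetk)      # time series plotting, splitting, feature engineering
library(modeltime)   # model tables, calibration, forecasting
```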
The Data
Today we will be using electricity demand data, based on a paper by James W Taylor:
Taylor, J.W. (2003) Short-term electricity demand forecasting using double seasonal exponential smoothing. Journal of the Operational Research Society, 54, 799-805.
The data can be found in the `timetk` package as `timetk::taylor_30_min`, a tibble with dimensions 4,032 x 2:

- `date`: a date-time variable in 30-minute increments
- `value`: electricity demand in megawatts
Exercise 1: EDA
Plot the data using the functions `timetk::plot_time_series`, `timetk::plot_acf_diagnostics` (using 100 lags), and `timetk::plot_seasonal_diagnostics`.
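A minimal sketch of the three calls (argument names follow the `timetk` documentation):

```r
# interactive time series plot of demand
timetk::plot_time_series(taylor_30_min, date, value)

# ACF / PACF diagnostics with 100 lags
timetk::plot_acf_diagnostics(taylor_30_min, date, value, .lags = 100)

# distribution of demand across hour, weekday, and other seasonal periods
timetk::plot_seasonal_diagnostics(taylor_30_min, date, value)
```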
Exercise 2: Time scaling
The raw data has 30-minute intervals between data points. Downscale the data to 60-minute intervals using `timetk::summarise_by_time`, revising the electricity demand (`value`) variable by adding the two 30-minute values in each 60-minute interval. Assign the downscaled data to the variable `taylor_60_min`.
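One way to write this (a sketch; `.by = "hour"` gives 60-minute bins):

```r
taylor_60_min <- taylor_30_min |>
  timetk::summarise_by_time(
    .date_var = date,
    .by       = "hour",     # aggregate to 60-minute intervals
    value     = sum(value)  # add the two 30-minute demand values
  )
```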
Exercise 3: Training and test datasets
- Split the new (60 min) time series into training and test sets using `timetk::time_series_split`; set the training period (‘initial’) to ‘2 months’ and the assessment period to ‘1 weeks’.
- Prepare the data resample specification with `timetk::tk_time_series_cv_plan()` and plot it with `timetk::plot_time_series_cv_plan`.
- Separate the training and test data sets using `rsample`.
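A sketch of all three steps (the names `train_tbl` and `test_tbl` are my own and are reused in later sketches):

```r
splits <- taylor_60_min |>
  timetk::time_series_split(
    date_var = date,
    initial  = "2 months",  # training period
    assess   = "1 weeks"    # assessment (test) period
  )

# visualize the train/test partition
splits |>
  timetk::tk_time_series_cv_plan() |>
  timetk::plot_time_series_cv_plan(date, value)

train_tbl <- rsample::training(splits)
test_tbl  <- rsample::testing(splits)
```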
Exercise 4: recipes
Create a base recipe (`base_rec`) using the formula `value ~ date` and the training data. This will be used for non-regression models.

Create a recipe (`lm_rec`) using the formula `value ~ .` and the training data. This will be used for regression models. For this recipe:

- add time series signature features using `timetk::step_timeseries_signature` with the appropriate argument,
- add a step to select the columns `value`, `date_index.num`, `date_month.lbl`, `date_wday.lbl`, `date_hour`,
- add a normalization step targeting `date_index.num`,
- add a step to mutate `date_hour`, changing it to a factor,
- add a step to one-hot encode nominal predictors.
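A minimal sketch of both recipes, assuming `train_tbl` from Exercise 3 (the particular step functions are one reasonable reading of the instructions):

```r
base_rec <- recipes::recipe(value ~ date, data = train_tbl)

lm_rec <- recipes::recipe(value ~ ., data = train_tbl) |>
  timetk::step_timeseries_signature(date) |>
  recipes::step_select(
    value, date_index.num, date_month.lbl, date_wday.lbl, date_hour
  ) |>
  recipes::step_normalize(date_index.num) |>
  recipes::step_mutate(date_hour = factor(date_hour)) |>
  recipes::step_dummy(recipes::all_nominal_predictors(), one_hot = TRUE)
```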
Exercise 5: models
Now we will create several models to estimate electricity demand, as follows:

- Create a model specification for an exponential smoothing model using engine ‘ets’
- Create a model specification for an arima model using engine ‘auto_arima’
- Create a model specification for a linear model using engine ‘glmnet’ with penalty = 0.02, mixture = 0.5
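A sketch of the three specifications (the object names are my own):

```r
ets_spec <- modeltime::exp_smoothing() |>
  parsnip::set_engine("ets")

arima_spec <- modeltime::arima_reg() |>
  parsnip::set_engine("auto_arima")

lm_spec <- parsnip::linear_reg(penalty = 0.02, mixture = 0.5) |>
  parsnip::set_engine("glmnet")
```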
Exercise 6: model fitting
Create a workflow for each model using `workflows::workflow`.

- Add a recipe to each workflow:
  - the linear model uses the `lm_rec` recipe created above
  - the `ets` and `arima` models use the `base_rec` recipe created above
- Add a model to each workflow
- Fit with the training data
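A minimal sketch, reusing the recipe and specification names from the earlier sketches:

```r
ets_fit <- workflows::workflow() |>
  workflows::add_recipe(base_rec) |>
  workflows::add_model(ets_spec) |>
  parsnip::fit(data = train_tbl)

arima_fit <- workflows::workflow() |>
  workflows::add_recipe(base_rec) |>
  workflows::add_model(arima_spec) |>
  parsnip::fit(data = train_tbl)

lm_fit <- workflows::workflow() |>
  workflows::add_recipe(lm_rec) |>
  workflows::add_model(lm_spec) |>
  parsnip::fit(data = train_tbl)
```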
This is a good place to save, stage, commit, and push changes to your remote lab repo on GitHub. Click the checkbox next to each file in the Git pane to stage the updates you’ve made, write an informative commit message, and push. After you push the changes, the Git pane in RStudio should be empty.
Exercise 7: calibrate
In this exercise we’ll use the testing data with our fitted models.

- Create a table with the fitted workflows using `modeltime::modeltime_table`.
- Using the table you just created, run a calibration on the test data with the function `modeltime::modeltime_calibrate`.
- Compare the accuracy of the models using `modeltime::modeltime_accuracy()` on the results of the calibration.
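A sketch, assuming the fitted workflows and `test_tbl` from earlier sketches:

```r
calibration_tbl <- modeltime::modeltime_table(
  ets_fit, arima_fit, lm_fit
) |>
  modeltime::modeltime_calibrate(new_data = test_tbl)

# out-of-sample accuracy metrics (MAE, RMSE, etc.) per model
calibration_tbl |>
  modeltime::modeltime_accuracy()
```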
Exercise 8: forecast - training data
Use the calibration table with `modeltime::modeltime_forecast` to graphically compare the fits to the training data with the observed values.
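One way to read this (a sketch; `train_tbl` and `taylor_60_min` come from earlier sketches, and the choice of `new_data` is your call per the exercise wording):

```r
calibration_tbl |>
  modeltime::modeltime_forecast(
    new_data    = train_tbl,      # predict over the training period
    actual_data = taylor_60_min   # overlay the observed demand
  ) |>
  modeltime::plot_modeltime_forecast()
```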
Exercise 9: forecast - future
Now refit the models using the full data set (using the calibration table and `modeltime::modeltime_refit`). Save the result in the variable `refit_tbl`.

- Use the refit data in the variable `refit_tbl`, along with `modeltime::modeltime_forecast` and argument `h = '2 weeks'` (remember to also set the `actual_data` argument). This will use the models to forecast electricity demand two weeks into the future.
- Plot the forecast with `modeltime::plot_modeltime_forecast`.
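A sketch, assuming `taylor_60_min` is the full 60-minute data set:

```r
refit_tbl <- calibration_tbl |>
  modeltime::modeltime_refit(data = taylor_60_min)

refit_tbl |>
  modeltime::modeltime_forecast(
    h           = "2 weeks",      # forecast horizon
    actual_data = taylor_60_min   # overlay observed demand
  ) |>
  modeltime::plot_modeltime_forecast()
```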
You’re done and ready to submit your work! Save, stage, commit, and push all remaining changes. You can use the commit message “Done with Lab 6!”, and make sure you have committed and pushed all changed files to GitHub (your Git pane in RStudio should be empty) and that all documents are updated in your repo on GitHub.
I will pull (copy) everyone’s repository submissions at 5:00pm on the Sunday following class, and I will work only with these copies, so anything submitted after 5:00pm will not be graded. (don’t forget to commit and then push your work!)
Grading
Total points available: 30 points.
| Component | Points |
|-----------|--------|
| Ex 1 - 9  | 30     |