The following supplemental notes were created by Dr. Maria Tackett for STA 210. They are provided for students who want to dive deeper into the mathematics behind regression and reflect some of the material covered in STA 211: Mathematics of Regression. Additional supplemental notes will be added throughout the semester.
This document provides the details for the matrix notation for multiple linear regression. We assume the reader has familiarity with some linear algebra. Please see Chapter 1 of An Introduction to Statistical Learning for a brief review of linear algebra.
Introduction
Suppose we have $n$ observations. Let the $i^{th}$ observation be $(x_{i1}, x_{i2}, \ldots, x_{ip}, y_i)$, such that $x_{i1}, \ldots, x_{ip}$ are the explanatory variables (predictors) and $y_i$ is the response variable. We assume the data can be modeled using the least-squares regression model, such that the mean response for a given combination of explanatory variables follows the form in Equation 1.

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p \tag{1}$$

We can write the response for the $i^{th}$ observation as shown in Equation 2,

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \epsilon_i \tag{2}$$

such that $\epsilon_i$ is the amount $y_i$ deviates from $\mu_{y|x_{i1}, \ldots, x_{ip}}$, the mean response for a given combination of explanatory variables. We assume each $\epsilon_i \sim N(0, \sigma^2)$, where $\sigma^2$ is a constant variance for the distribution of the response $y$ for any combination of the explanatory variables $x_1, \ldots, x_p$.
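As a quick illustration of the model in Equation 2, the sketch below simulates responses from a linear model with Normal errors. The coefficient values, error standard deviation, and predictor distributions are made-up numbers chosen for the example, not values from these notes.

```python
import numpy as np

# Simulate n responses from y_i = b0 + b1*x_i1 + b2*x_i2 + eps_i,
# with eps_i ~ N(0, sigma^2). All numeric values are hypothetical.
rng = np.random.default_rng(210)

n = 1000
beta = np.array([1.0, 2.0, -0.5])   # hypothetical beta_0, beta_1, beta_2
sigma = 1.5                         # hypothetical constant error SD

x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma, size=n)  # errors: mean 0, constant variance sigma^2

y = beta[0] + beta[1] * x1 + beta[2] * x2 + eps
```

With a large `n`, the simulated errors have sample mean near 0 and sample standard deviation near `sigma`, matching the stated assumptions on $\epsilon_i$.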
Matrix Representation for the Regression Model
We can represent Equation 1 and Equation 2 using matrix notation. Let

$$\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \hspace{1em} \mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1p} \\ 1 & x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{np} \end{bmatrix} \hspace{1em} \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} \hspace{1em} \boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$

Thus,

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon} \tag{3}$$

Therefore, the estimated response for a given combination of explanatory variables and the associated residuals can be written as

$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}} \hspace{2em} \mathbf{e} = \mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}} \tag{4}$$
Estimating the Coefficients
The least-squares model is the one that minimizes the sum of the squared residuals. Therefore, we want to find the coefficients $\hat{\boldsymbol{\beta}}$ that minimize

$$\sum_{i=1}^{n} e_i^2 = \mathbf{e}^T\mathbf{e} = (\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \tag{5}$$

where $\mathbf{e}^T$ is the transpose of the matrix $\mathbf{e}$. Expanding Equation 5 gives

$$(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})^T(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}}) = \mathbf{Y}^T\mathbf{Y} - \mathbf{Y}^T\mathbf{X}\hat{\boldsymbol{\beta}} - \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{Y} + \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} \tag{6}$$

Note that $\hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{Y} = (\mathbf{Y}^T\mathbf{X}\hat{\boldsymbol{\beta}})^T$. Since these are both scalars (i.e. $1 \times 1$ matrices), $\hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{Y} = \mathbf{Y}^T\mathbf{X}\hat{\boldsymbol{\beta}}$. Thus, Equation 6 becomes

$$\mathbf{Y}^T\mathbf{Y} - 2\hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{Y} + \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} \tag{7}$$

Since we want to find the $\hat{\boldsymbol{\beta}}$ that minimizes Equation 5, we will find the value of $\hat{\boldsymbol{\beta}}$ such that the derivative of Equation 7 with respect to $\hat{\boldsymbol{\beta}}$ is equal to 0:

$$\frac{\partial}{\partial \hat{\boldsymbol{\beta}}}\left(\mathbf{Y}^T\mathbf{Y} - 2\hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{Y} + \hat{\boldsymbol{\beta}}^T\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}}\right) = -2\mathbf{X}^T\mathbf{Y} + 2\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = 0$$

$$\Rightarrow \mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}$$

Thus, the estimate of the model coefficients is $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$.
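The formula $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$ can be checked numerically. The sketch below builds a design matrix with a column of 1s for the intercept and solves the normal equations on simulated data; the data and coefficient values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Simulate data from a linear model with made-up coefficients.
rng = np.random.default_rng(42)
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=(n, p))])  # intercept column of 1s
beta = np.array([1.0, 2.0, -0.5])
Y = X @ beta + rng.normal(0, 1.0, size=n)

# Solve the normal equations X'X beta_hat = X'Y.
# (Solving the linear system is numerically preferable to forming the inverse.)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Fitted values and residuals (Equation 4)
Y_hat = X @ beta_hat
e = Y - Y_hat
```

The normal-equations solution agrees with `np.linalg.lstsq(X, Y)`, and the residual vector `e` is orthogonal to every column of `X` — that orthogonality is exactly the condition $\mathbf{X}^T\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}^T\mathbf{Y}$ rearranged.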
Variance-covariance matrix of the coefficients
We will use two properties to derive the form of the variance-covariance matrix of the coefficients:

1. $E[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T] = \sigma^2\mathbf{I}$
2. $\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\epsilon}$

First, we will show that $E[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T] = \sigma^2\mathbf{I}$. Writing out the product,

$$\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \begin{bmatrix} \epsilon_1 & \epsilon_2 & \dots & \epsilon_n \end{bmatrix} = \begin{bmatrix} \epsilon_1^2 & \epsilon_1\epsilon_2 & \dots & \epsilon_1\epsilon_n \\ \epsilon_2\epsilon_1 & \epsilon_2^2 & \dots & \epsilon_2\epsilon_n \\ \vdots & \vdots & \ddots & \vdots \\ \epsilon_n\epsilon_1 & \epsilon_n\epsilon_2 & \dots & \epsilon_n^2 \end{bmatrix} \tag{8}$$

Recall the regression assumption that the errors $\epsilon_i$ are Normally distributed with mean 0 and variance $\sigma^2$. Thus, $E[\epsilon_i^2] = Var(\epsilon_i) + (E[\epsilon_i])^2 = \sigma^2$ for all $i$. Additionally, recall the regression assumption that the errors are uncorrelated, i.e. $E[\epsilon_i\epsilon_j] = 0$ for all $i \neq j$. Using these assumptions, we can write the expected value of Equation 8 as

$$E[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T] = \begin{bmatrix} \sigma^2 & 0 & \dots & 0 \\ 0 & \sigma^2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \sigma^2 \end{bmatrix} = \sigma^2\mathbf{I}$$

where $\mathbf{I}$ is the $n \times n$ identity matrix.
Next, we show that $\hat{\boldsymbol{\beta}} = \boldsymbol{\beta} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\epsilon}$.

Recall that $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$ and $\hat{\boldsymbol{\beta}} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Y}$. Then,

$$\begin{aligned} \hat{\boldsymbol{\beta}} &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}) \\ &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\boldsymbol{\beta} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\epsilon} \\ &= \boldsymbol{\beta} + (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\epsilon} \end{aligned}$$
Using these two properties, we derive the form of the variance-covariance matrix for the coefficients. Note that the covariance matrix is

$$Var(\hat{\boldsymbol{\beta}}) = E\left[(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})^T\right]$$

From the second property, $\hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\epsilon}$. Thus,

$$\begin{aligned} Var(\hat{\boldsymbol{\beta}}) &= E\left[(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\epsilon}\,\boldsymbol{\epsilon}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\right] \\ &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\,E[\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T]\,\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \\ &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T(\sigma^2\mathbf{I})\,\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1} \\ &= \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} \end{aligned}$$

Here $\mathbf{X}$ is treated as fixed, so it can be moved outside the expectation, and $\left((\mathbf{X}^T\mathbf{X})^{-1}\right)^T = (\mathbf{X}^T\mathbf{X})^{-1}$ since $\mathbf{X}^T\mathbf{X}$ is symmetric.
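The result $Var(\hat{\boldsymbol{\beta}}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$ can be illustrated by simulation: hold $\mathbf{X}$ fixed, redraw the errors many times, refit, and compare the empirical covariance of the estimates with the theoretical matrix. All numeric values below (coefficients, $\sigma$, the predictor distribution, the number of replicates) are made-up for this sketch.

```python
import numpy as np

# Fixed design matrix with an intercept column; hypothetical values.
rng = np.random.default_rng(7)
n = 100
sigma = 2.0
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
beta = np.array([1.0, 0.5])

XtX_inv = np.linalg.inv(X.T @ X)
theoretical = sigma**2 * XtX_inv          # sigma^2 (X'X)^{-1}

# Redraw the errors, refit, and collect beta_hat each time.
reps = 5000
estimates = np.empty((reps, 2))
for r in range(reps):
    Y = X @ beta + rng.normal(0, sigma, size=n)
    estimates[r] = XtX_inv @ X.T @ Y      # beta_hat for this replicate

# Empirical variance-covariance matrix of the estimated coefficients.
empirical = np.cov(estimates, rowvar=False)
```

With a few thousand replicates, `empirical` matches `theoretical` entry by entry up to simulation noise, including the negative covariance between the intercept and slope estimates.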