Deriving the Least-Squares Estimates for Simple Linear Regression

Note

The following supplemental notes were created by Dr. Maria Tackett for STA 210. They are provided for students who want to dive deeper into the mathematics behind regression and reflect some of the material covered in STA 211: Mathematics of Regression. Additional supplemental notes will be added throughout the semester.

This document contains the mathematical details for deriving the least-squares estimates for the slope ($\beta_1$) and intercept ($\beta_0$). We obtain the estimates, $\hat{\beta}_1$ and $\hat{\beta}_0$, by finding the values that minimize the sum of squared residuals, as shown in Equation 1.

$$SSR = \sum\limits_{i=1}^{n}[y_i - \hat{y}_i]^2 = \sum\limits_{i=1}^{n}[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2 = \sum\limits_{i=1}^{n}[y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i]^2 \qquad(1)$$
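
As an added illustration (not part of the original notes), here is a minimal Python sketch that evaluates the sum of squared residuals in Equation 1 for a candidate intercept and slope; the data values are hypothetical.

```python
import numpy as np

# Hypothetical example data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def ssr(b0, b1, x, y):
    """Sum of squared residuals for a candidate intercept b0 and slope b1 (Equation 1)."""
    residuals = y - (b0 + b1 * x)
    return np.sum(residuals ** 2)

print(ssr(0.0, 2.0, x, y))  # SSR for one candidate pair; the least-squares estimates minimize this
```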

Recall that we can find the values of $\hat{\beta}_1$ and $\hat{\beta}_0$ that minimize the sum of squared residuals in Equation 1 by taking the partial derivatives of Equation 1 and setting them equal to 0. Thus, the values of $\hat{\beta}_1$ and $\hat{\beta}_0$ that solve these equations minimize the sum of squared residuals. The partial derivatives are shown in Equation 2.

$$\begin{aligned} \frac{\partial \text{SSR}}{\partial \hat{\beta}_1} &= -2 \sum\limits_{i=1}^{n}x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) \\ \frac{\partial \text{SSR}}{\partial \hat{\beta}_0} &= -2 \sum\limits_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) \end{aligned} \qquad(2)$$
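
As a quick sanity check (an addition, not from the original notes), the analytic partial derivatives in Equation 2 can be compared against central finite-difference approximations of the SSR; the data and the step size `h` are arbitrary illustrative choices.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def ssr(b0, b1):
    return np.sum((y - (b0 + b1 * x)) ** 2)

def analytic_partials(b0, b1):
    # Partial derivatives from Equation 2
    d_b1 = -2 * np.sum(x * (y - b0 - b1 * x))
    d_b0 = -2 * np.sum(y - b0 - b1 * x)
    return d_b0, d_b1

b0, b1, h = 0.5, 1.8, 1e-6
num_d_b0 = (ssr(b0 + h, b1) - ssr(b0 - h, b1)) / (2 * h)
num_d_b1 = (ssr(b0, b1 + h) - ssr(b0, b1 - h)) / (2 * h)
print(analytic_partials(b0, b1))   # analytic values
print(num_d_b0, num_d_b1)          # should closely match the numerical approximations
```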

The derivation of $\hat{\beta}_0$ is shown in Equation 3.

$$\begin{aligned}\frac{\partial \text{SSR}}{\partial \hat{\beta}_0} &= -2 \sum\limits_{i=1}^{n}(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\&\Rightarrow \sum\limits_{i=1}^{n}(-y_i + \hat{\beta}_0 + \hat{\beta}_1 x_i) = 0 \\&\Rightarrow - \sum\limits_{i=1}^{n}y_i + n\hat{\beta}_0 + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i = 0 \\&\Rightarrow n\hat{\beta}_0 = \sum\limits_{i=1}^{n}y_i - \hat{\beta}_1\sum\limits_{i=1}^{n}x_i \\&\Rightarrow \hat{\beta}_0 = \frac{1}{n}\Big(\sum\limits_{i=1}^{n}y_i - \hat{\beta}_1\sum\limits_{i=1}^{n}x_i\Big)\\&\Rightarrow \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \end{aligned} \qquad(3)$$
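
To see the result in Equation 3 numerically (an illustrative addition with hypothetical data, using NumPy's `np.polyfit` as the reference least-squares fit), the fitted intercept should equal $\bar{y} - \hat{\beta}_1\bar{x}$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, intercept = np.polyfit(x, y, 1)     # least-squares line, degree 1
print(intercept)                           # beta0-hat reported by the fit
print(np.mean(y) - slope * np.mean(x))     # ybar - beta1-hat * xbar, per Equation 3
```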

The derivation of $\hat{\beta}_1$, using the $\hat{\beta}_0$ we just derived, is shown in Equation 4.

$$\begin{aligned}&\frac{\partial \text{SSR}}{\partial \hat{\beta}_1} = -2 \sum\limits_{i=1}^{n}x_i(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\&\Rightarrow -\sum\limits_{i=1}^{n}x_iy_i + \hat{\beta}_0\sum\limits_{i=1}^{n}x_i + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = 0 \\\text{(Fill in }\hat{\beta}_0\text{)}&\Rightarrow -\sum\limits_{i=1}^{n}x_iy_i + (\bar{y} - \hat{\beta}_1\bar{x})\sum\limits_{i=1}^{n}x_i + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = 0 \\&\Rightarrow (\bar{y} - \hat{\beta}_1\bar{x})\sum\limits_{i=1}^{n}x_i + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = \sum\limits_{i=1}^{n}x_iy_i \\&\Rightarrow \bar{y}\sum\limits_{i=1}^{n}x_i - \hat{\beta}_1\bar{x}\sum\limits_{i=1}^{n}x_i + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = \sum\limits_{i=1}^{n}x_iy_i \\&\Rightarrow n\bar{y}\bar{x} - \hat{\beta}_1n\bar{x}^2 + \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 = \sum\limits_{i=1}^{n}x_iy_i \\&\Rightarrow \hat{\beta}_1\sum\limits_{i=1}^{n}x_i^2 - \hat{\beta}_1n\bar{x}^2 = \sum\limits_{i=1}^{n}x_iy_i - n\bar{y}\bar{x} \\&\Rightarrow \hat{\beta}_1\Big(\sum\limits_{i=1}^{n}x_i^2 -n\bar{x}^2\Big) = \sum\limits_{i=1}^{n}x_iy_i - n\bar{y}\bar{x} \\ &\Rightarrow \hat{\beta}_1 = \frac{\sum\limits_{i=1}^{n}x_iy_i - n\bar{y}\bar{x}}{\sum\limits_{i=1}^{n}x_i^2 -n\bar{x}^2}\end{aligned} \qquad(4)$$
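
The slope formula at the end of Equation 4 can be computed directly and compared with a standard least-squares fit. This is an added illustration; the data and the use of `np.polyfit` as the reference are assumptions, not part of the original notes.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Slope from the closed form at the end of Equation 4
beta1_hat = (np.sum(x * y) - n * np.mean(y) * np.mean(x)) / \
            (np.sum(x ** 2) - n * np.mean(x) ** 2)

slope, intercept = np.polyfit(x, y, 1)     # reference least-squares fit
print(beta1_hat, slope)                    # the two slopes should agree
```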

To write $\hat{\beta}_1$ in a form that's more recognizable, we will use the following:

$$\sum\limits_{i=1}^{n} x_iy_i - n\bar{y}\bar{x} = \sum\limits_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = (n-1)\text{Cov}(x,y) \qquad(5)$$

$$\sum\limits_{i=1}^{n} x_i^2 - n\bar{x}^2 = \sum\limits_{i=1}^{n}(x_i - \bar{x})^2 = (n-1)s_x^2 \qquad(6)$$

where $\text{Cov}(x,y)$ is the covariance of $x$ and $y$, and $s_x^2$ is the sample variance of $x$ ($s_x$ is the sample standard deviation).
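
Both identities can be checked numerically. This is an added illustration with hypothetical data; `np.cov` and `np.var` with `ddof=1` give the sample covariance and sample variance.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Equation 5: all three expressions should match
print(np.sum(x * y) - n * np.mean(y) * np.mean(x))
print(np.sum((x - np.mean(x)) * (y - np.mean(y))))
print((n - 1) * np.cov(x, y)[0, 1])        # sample covariance uses n - 1

# Equation 6: all three expressions should match
print(np.sum(x ** 2) - n * np.mean(x) ** 2)
print(np.sum((x - np.mean(x)) ** 2))
print((n - 1) * np.var(x, ddof=1))         # sample variance uses n - 1
```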

Thus, applying Equation 5 and Equation 6, we have

$$\begin{aligned}\hat{\beta}_1 &= \frac{\sum\limits_{i=1}^{n}x_iy_i - n\bar{y}\bar{x}}{\sum\limits_{i=1}^{n}x_i^2 -n\bar{x}^2} \\&= \frac{\sum\limits_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum\limits_{i=1}^{n}(x_i-\bar{x})^2}\\&= \frac{(n-1)\text{Cov}(x,y)}{(n-1)s_x^2}\\&= \frac{\text{Cov}(x,y)}{s_x^2}\end{aligned} \qquad(7)$$
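
As a numerical check of Equation 7 (an illustrative addition with hypothetical data), the slope from a least-squares fit should match the sample covariance divided by the sample variance of $x$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, _ = np.polyfit(x, y, 1)             # least-squares slope
cov_xy = np.cov(x, y)[0, 1]                # sample covariance of x and y
var_x = np.var(x, ddof=1)                  # sample variance of x
print(slope, cov_xy / var_x)               # should agree, per Equation 7
```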

The correlation between $x$ and $y$ is $r = \frac{\text{Cov}(x,y)}{s_x s_y}$. Thus, $\text{Cov}(x,y) = r s_x s_y$. Plugging this into Equation 7, we have

$$\hat{\beta}_1 = \frac{\text{Cov}(x,y)}{s_x^2} = r\frac{s_ys_x}{s_x^2} = r\frac{s_y}{s_x} \qquad(8)$$
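
Finally, Equation 8 can be checked the same way (an added illustration with hypothetical data): the least-squares slope equals the sample correlation times $s_y/s_x$.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

slope, _ = np.polyfit(x, y, 1)             # least-squares slope
r = np.corrcoef(x, y)[0, 1]                # sample correlation
s_x = np.std(x, ddof=1)                    # sample standard deviation of x
s_y = np.std(y, ddof=1)                    # sample standard deviation of y
print(slope, r * s_y / s_x)                # should agree, per Equation 8
```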