Log Transformations in Linear Regression
The following supplemental notes were created by Dr. Maria Tackett for STA 210. They are provided for students who want to dive deeper into the mathematics behind regression and reflect some of the material covered in STA 211: Mathematics of Regression. Additional supplemental notes will be added throughout the semester.
This document provides details about the model interpretation when the predictor and/or response variables are log-transformed. For simplicity, we will discuss transformations for the simple linear regression model as shown in Equation 1.
$$y = \beta_0 + \beta_1 x + \epsilon, \quad \epsilon \sim N(0, \sigma^2) \tag{1}$$
All results and interpretations can be easily extended to transformations in multiple regression models.
Note: log refers to the natural logarithm.
Log-transformation on the response variable
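Suppose we fit a linear regression model with $\log(y)$, the log-transformed $y$, as the response variable. Under this model, we assume a linear relationship exists between $x$ and $\log(y)$, such that $\log(y) \sim N(\beta_0 + \beta_1 x, \sigma^2)$ for some $\beta_0$, $\beta_1$, and $\sigma^2$. In other words, we can model the relationship between $x$ and $\log(y)$ using the model in Equation 2.
$$\log(y) = \beta_0 + \beta_1 x + \epsilon, \quad \epsilon \sim N(0, \sigma^2) \tag{2}$$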
If we interpret the model in terms of $\log(y)$, then we can use the usual interpretations for slope and intercept. When reporting results, however, it is best to give all interpretations in terms of the original response variable $y$, since interpretations using log-transformed variables are often more difficult to truly understand.
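To get back to the original scale, we need to use the exponential function (also known as the anti-log), $\exp\{z\} = e^{z}$. Therefore, although we use the model in Equation 2 for interpretations and predictions, we will use Equation 3 to state our conclusions in terms of $y$.
$$y = \exp\{\log(y)\} = \exp\{\beta_0 + \beta_1 x + \epsilon\} = \exp\{\beta_0\}\exp\{\beta_1 x\}\exp\{\epsilon\} \tag{3}$$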
In order to interpret the slope and intercept, we need to first understand the relationship between the mean, median and log transformations.
Mean, Median, and Log Transformations
Suppose we have a dataset y that contains the following observations:
3, 5, 6, 7, 8
If we log-transform the values of y and then calculate the mean and median, we have
mean_log_y | median_log_y |
---|---|
1.70503 | 1.79176 |
If we instead calculate the mean and median of y and then log-transform them, we have
log_mean | log_median |
---|---|
1.75786 | 1.79176 |
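This simple illustration shows that
- the mean and the log do not commute: in general, $\log(\text{Mean}(y)) \neq \text{Mean}(\log(y))$
- the median and the log do commute: $\log(\text{Median}(y)) = \text{Median}(\log(y))$
The second result holds because the log is an increasing function, so it preserves the ordering of the observations and the middle observation stays in the middle.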
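The comparison above can be checked numerically. The following is a minimal sketch in Python using only the standard library (the language is our choice, not necessarily the one used to produce the tables). It takes the five observations 3, 5, 6, 7, 8, which reproduce the tabulated values, and uses the same names as the table headers.

```python
import math
import statistics

# The five observations; they reproduce the values in the two tables.
y = [3, 5, 6, 7, 8]
log_y = [math.log(v) for v in y]  # math.log is the natural logarithm

# Log-transform first, then summarize.
mean_log_y = statistics.mean(log_y)
median_log_y = statistics.median(log_y)

# Summarize first, then log-transform.
log_mean = math.log(statistics.mean(y))
log_median = math.log(statistics.median(y))

print(f"mean_log_y = {mean_log_y:.5f}, median_log_y = {median_log_y:.5f}")
print(f"log_mean   = {log_mean:.5f}, log_median   = {log_median:.5f}")
```

The two median values agree, while the two mean values differ.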
Interpretation of model coefficients
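Using Equation 2, the mean of $\log(y)$ for any given value of $x$ is $\beta_0 + \beta_1 x$; however, this does not imply that the mean of $y$ is $\exp\{\beta_0 + \beta_1 x\}$ (see previous section). From the assumptions of linear regression, we assume that for any given value of $x$, the distribution of $\log(y)$ is Normal, and therefore symmetric. Thus the median of $\log(y)$ is equal to the mean of $\log(y)$, i.e., $\text{Median}(\log(y)) = \beta_0 + \beta_1 x$.
Since the log and the median commute, $\text{Median}(\log(y)) = \beta_0 + \beta_1 x \Rightarrow \text{Median}(y) = \exp\{\beta_0 + \beta_1 x\}$. Thus, when we log-transform the response variable, the interpretations of the intercept and slope are in terms of the effect on the median of $y$.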
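A small simulation can make this argument concrete. The sketch below is illustrative only: the values of `beta0`, `beta1`, `sigma`, and `x` are hypothetical, and the seed is arbitrary. It draws $\log(y)$ from a Normal distribution, back-transforms to $y$, and compares the sample median and mean of $y$ with $\exp\{\beta_0 + \beta_1 x\}$. The last printed line uses the standard result that a log-normal variable has mean $\exp\{\mu + \sigma^2/2\}$, which is not derived in these notes.

```python
import math
import random
import statistics

random.seed(210)  # arbitrary seed so the sketch is reproducible

# Hypothetical values, chosen only for illustration.
beta0, beta1, sigma = 1.0, 0.5, 0.5
x = 2.0
mu = beta0 + beta1 * x  # mean (and median) of log(y) at this x

# Simulate log(y) ~ N(mu, sigma^2), then back-transform to y.
y = [math.exp(random.gauss(mu, sigma)) for _ in range(100_000)]

sample_median = statistics.median(y)
sample_mean = sum(y) / len(y)

print("exp(mu)              :", round(math.exp(mu), 3))
print("sample median of y   :", round(sample_median, 3))
print("sample mean of y     :", round(sample_mean, 3))
print("exp(mu + sigma^2 / 2):", round(math.exp(mu + sigma**2 / 2), 3))
```

Under these assumptions the sample median should land close to $\exp\{\mu\}$, while the sample mean should be noticeably larger.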
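Intercept: The intercept describes the expected median of $y$ when the predictor variable equals 0. When $x = 0$,
$$\text{Median}(\log(y)) = \beta_0 + \beta_1 \times 0 = \beta_0 \Rightarrow \text{Median}(y) = \exp\{\beta_0\}$$
Interpretation: When $x = 0$, the median of $y$ is expected to be $\exp\{\beta_0\}$.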
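Slope: The slope describes the expected change in the median of $y$ when $x$ increases by 1 unit. On the original scale this change is multiplicative, so we compare the two medians through their ratio:
$$\frac{\text{Median}(y \mid x + 1)}{\text{Median}(y \mid x)} = \frac{\exp\{\beta_0 + \beta_1 (x + 1)\}}{\exp\{\beta_0 + \beta_1 x\}} = \frac{\exp\{\beta_0\}\exp\{\beta_1 x\}\exp\{\beta_1\}}{\exp\{\beta_0\}\exp\{\beta_1 x\}} = \exp\{\beta_1\}$$
Thus, the median of $y$ for $x + 1$ is $\exp\{\beta_1\}$ times the median of $y$ for $x$.
Interpretation: When $x$ increases by one unit, the median of $y$ is expected to multiply by a factor of $\exp\{\beta_1\}$.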
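As a quick numerical check of the slope interpretation, the sketch below uses hypothetical coefficients `b0` and `b1` (not estimates from any real data) and a helper `median_y` introduced only for this illustration. It shows that the ratio of predicted medians for a one-unit increase in $x$ does not depend on the starting value of $x$.

```python
import math

# Hypothetical coefficients from a fitted model log(y) = b0 + b1 * x.
b0, b1 = 1.2, 0.3

def median_y(x):
    """Predicted median of y on the original scale: exp(b0 + b1 * x)."""
    return math.exp(b0 + b1 * x)

# Ratio of medians for a one-unit increase, at several starting values of x.
ratios = [median_y(x + 1) / median_y(x) for x in (0, 1, 5, 10)]

factor = math.exp(b1)                 # multiplicative change in the median
percent_change = (factor - 1) * 100   # the same change expressed as a percent

print("ratios        :", [round(r, 4) for r in ratios])
print("exp(b1)       :", round(factor, 4))
print("percent change:", round(percent_change, 1))
```

With these hypothetical values, each one-unit increase in $x$ multiplies the predicted median of $y$ by $\exp\{0.3\}$, which is roughly a 35% increase.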
Log-transformation on the predictor variable
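Suppose we fit a linear regression model with $\log(x)$, the log-transformed $x$, as the predictor variable. Under this model, we assume a linear relationship exists between $\log(x)$ and $y$, such that $y \sim N(\beta_0 + \beta_1 \log(x), \sigma^2)$ for some $\beta_0$, $\beta_1$, and $\sigma^2$. In other words, we can model the relationship between $\log(x)$ and $y$ using the model in Equation 4.
$$y = \beta_0 + \beta_1 \log(x) + \epsilon, \quad \epsilon \sim N(0, \sigma^2) \tag{4}$$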
Intercept: The intercept is the mean of $y$ when $\log(x) = 0$, i.e., $x = 1$.
Interpretation: When $x = 1$ (so that $\log(x) = 0$), the mean of $y$ is expected to be $\beta_0$.
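Slope: The slope is interpreted in terms of the change in the mean of $y$ when $x$ is multiplied by a factor of $C$, since $\log(Cx) = \log(x) + \log(C)$. Thus, when $x$ is multiplied by a factor of $C$, the change in the mean of $y$ is
$$[\beta_0 + \beta_1 \log(Cx)] - [\beta_0 + \beta_1 \log(x)] = \beta_1 [\log(Cx) - \log(x)] = \beta_1 \log(C)$$
Thus the mean of $y$ changes by $\beta_1 \log(C)$ units.
Interpretation: When $x$ is multiplied by a factor of $C$, the mean of $y$ is expected to change by $\beta_1 \log(C)$ units. For example, if $x$ is doubled, then the mean of $y$ is expected to change by $\beta_1 \log(2)$ units.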
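The additive change of $\beta_1 \log(C)$ can be verified with a short sketch. The coefficients `b0` and `b1` and the helper `mean_y` below are hypothetical and serve only to illustrate the algebra. The change in the predicted mean is the same whether $x$ doubles from 1 to 2 or from 50 to 100.

```python
import math

# Hypothetical coefficients from a fitted model y = b0 + b1 * log(x).
b0, b1 = 10.0, 4.0

def mean_y(x):
    """Predicted mean of y: b0 + b1 * log(x), defined for x > 0."""
    return b0 + b1 * math.log(x)

C = 2  # multiply x by this factor (here, doubling)

# Change in the predicted mean when x is multiplied by C, at several x.
changes = [mean_y(C * x) - mean_y(x) for x in (1, 3, 50)]
expected_change = b1 * math.log(C)

print("changes    :", [round(c, 4) for c in changes])
print("b1 * log(C):", round(expected_change, 4))
print("mean at x=1:", mean_y(1))  # equals b0 because log(1) = 0
```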
Log-transformation on the response and predictor variables
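Suppose we fit a linear regression model with $\log(x)$, the log-transformed $x$, as the predictor variable and $\log(y)$, the log-transformed $y$, as the response variable. Under this model, we assume a linear relationship exists between $\log(x)$ and $\log(y)$, such that $\log(y) \sim N(\beta_0 + \beta_1 \log(x), \sigma^2)$ for some $\beta_0$, $\beta_1$, and $\sigma^2$. In other words, we can model the relationship between $\log(x)$ and $\log(y)$ using the model in Equation 5.
$$\log(y) = \beta_0 + \beta_1 \log(x) + \epsilon, \quad \epsilon \sim N(0, \sigma^2) \tag{5}$$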
Because the response variable is log-transformed, the interpretations on the original scale will be in terms of the median of $y$ (see the section on the log-transformed response variable for more detail).
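Intercept: The intercept is the mean of $\log(y)$, and by symmetry also its median, when $\log(x) = 0$, i.e., $x = 1$. Therefore, when $\log(x) = 0$,
$$\text{Median}(\log(y)) = \beta_0 + \beta_1 \times 0 = \beta_0 \Rightarrow \text{Median}(y) = \exp\{\beta_0\}$$
Interpretation: When $x = 1$ (so that $\log(x) = 0$), the median of $y$ is expected to be $\exp\{\beta_0\}$.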
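Slope: The slope is interpreted in terms of the change in the median of $y$ when $x$ is multiplied by a factor of $C$, since $\log(Cx) = \log(x) + \log(C)$. Thus, when $x$ is multiplied by a factor of $C$, the multiplicative change in the median of $y$ is
$$\frac{\text{Median}(y \mid Cx)}{\text{Median}(y \mid x)} = \frac{\exp\{\beta_0 + \beta_1 \log(Cx)\}}{\exp\{\beta_0 + \beta_1 \log(x)\}} = \exp\{\beta_1 [\log(Cx) - \log(x)]\} = \exp\{\beta_1 \log(C)\} = C^{\beta_1}$$
Thus, the median of $y$ for $Cx$ is $C^{\beta_1}$ times the median of $y$ for $x$.
Interpretation: When $x$ is multiplied by a factor of $C$, the median of $y$ is expected to multiply by a factor of $C^{\beta_1}$. For example, if $x$ is doubled, then the median of $y$ is expected to multiply by $2^{\beta_1}$.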
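To see the log-log interpretation end to end, the sketch below fits a least-squares line to $\log(y)$ on $\log(x)$ and checks the doubling rule. Everything in it is an assumption made for illustration: the data are hypothetical and noise-free (generated from $y = 3x^{1.5}$), and `ols` and `median_y` are small helpers written for this example rather than functions from any particular package.

```python
import math

def ols(u, v):
    """Least-squares intercept and slope for the line v = a + b * u."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    sxy = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    slope = sxy / sxx
    return v_bar - slope * u_bar, slope

# Hypothetical noise-free data following y = 3 * x^1.5, for illustration only.
x = [1, 2, 4, 8, 16]
y = [3 * xi ** 1.5 for xi in x]

# Fit log(y) = b0 + b1 * log(x).
b0, b1 = ols([math.log(xi) for xi in x], [math.log(yi) for yi in y])

def median_y(x_new):
    """Predicted median of y on the original scale: exp(b0) * x_new ** b1."""
    return math.exp(b0) * x_new ** b1

# Doubling x (from 5 to 10) should multiply the predicted median by 2 ** b1.
doubling_factor = median_y(10) / median_y(5)

print("b0, b1         :", round(b0, 4), round(b1, 4))
print("exp(b0)        :", round(math.exp(b0), 4))
print("doubling factor:", round(doubling_factor, 4))
print("2 ** b1        :", round(2 ** b1, 4))
```

Because the data are noise-free, the fit recovers the exponent exactly up to floating-point error; with real data, `b0` and `b1` would be estimates and the statements would apply to the expected median.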