2.4 — OLS: Goodness of Fit and Bias — Class Content


Tuesday, September 21, 2021

Problem Set 2 is due by the end of the day today.

Overview

Today we continue looking at basic OLS regression. We will cover how to measure whether a regression line is a good fit (using $R^{2}$ and $\hat{σ_{u}}$, the standard error of the regression or SER), and whether the OLS estimators are biased. These will depend on four critical assumptions about $u$.

In doing so, we begin an ongoing exploration into inferential statistics, which will finally become clear in another week. The most confusing part is recognizing that there is a sampling distribution of each OLS estimator. We want to measure the center of that sampling distribution, to see if the estimator is biased. Next class we will measure the spread of that distribution.

We continue the extended example about class sizes and test scores, which comes from a (Stata) dataset from an old textbook that I used to use, Stock and Watson, 2007. Download and follow along with the data from today’s example:¹

caschool.dta

I have also made an RStudio Cloud project documenting all of the things we have been doing with this data that may help you when you start working with regressions (next class):

Class Size Regression Analysis

Readings

Ch. 3.2-3.4, 3.7-3.8 in Bailey, Real Econometrics

Slides

Below, you can find the slides in two formats. Clicking the image will bring you to the html version of the slides in a new tab. Note that while going through the slides, you can type h to see a special list of viewing options, and type o for an outline view of all the slides.

The lower button will allow you to download a PDF version of the slides. I suggest printing the slides beforehand and using them to take additional notes in class (not everything is in the slides)!

Download as PDF

Assignments

Problem Set 2

Problem Set 2 is due by Tuesday September 21. Please see the instructions for more information on how to submit your assignment (there are multiple ways!).

Math Appendix

Deriving the OLS Estimators

The population linear regression model is:

$Y_{i} = β_{0} + β_{1} X_{i} + u_{i}$

The errors $(u_{i})$ are unobserved, but for candidate values of $\hat{β_{0}}$ and $\hat{β_{1}}$, we can compute the residual, which serves as an estimate of the error. Algebraically, the residual is:

$\hat{u_{i}} = Y_{i} - \hat{β_{0}} - \hat{β_{1}} X_{i}$

Recall our goal is to find the $\hat{β_{0}}$ and $\hat{β_{1}}$ that minimize the sum of squared errors (SSE):

$S S E = \sum_{i = 1}^{n} {\hat{u_{i}}}^{2}$

So our minimization problem is:

$\min_{\hat{β_{0}}, \hat{β_{1}}} \sum_{i = 1}^{n} (Y_{i} - \hat{β_{0}} - \hat{β_{1}} X_{i})^{2}$

Using calculus, we take the partial derivatives and set each equal to 0 to find a minimum. The first order conditions are:

$\begin{aligned} \frac{\partial S S E}{\partial \hat{β_{0}}} & = - 2 \sum_{i = 1}^{n} (Y_{i} - \hat{β_{0}} - \hat{β_{1}} X_{i}) = 0 \\ \frac{\partial S S E}{\partial \hat{β_{1}}} & = - 2 \sum_{i = 1}^{n} (Y_{i} - \hat{β_{0}} - \hat{β_{1}} X_{i}) X_{i} = 0 \end{aligned}$

Finding $\hat{β_{0}}$

Working with the first FOC, divide both sides by $- 2$ :

$\sum_{i = 1}^{n} (Y_{i} - \hat{β_{0}} - \hat{β_{1}} X_{i}) = 0$

Then expand the summation across all terms and divide by $n$ :

$\underbrace{\frac{1}{n} \sum_{i = 1}^{n} Y_{i}}_{\bar{Y}} - \underbrace{\frac{1}{n} \sum_{i = 1}^{n} \hat{β_{0}}}_{\hat{β_{0}}} - \underbrace{\frac{1}{n} \sum_{i = 1}^{n} \hat{β_{1}} X_{i}}_{\hat{β_{1}} \bar{X}} = 0$

Note the first term is $\bar{Y}$ , the second is $\hat{β_{0}}$ , the third is $\hat{β_{1}} \bar{X}$ .²

So we can rewrite as: $\bar{Y} - \hat{β_{0}} - \hat{β_{1}} \bar{X} = 0$

Rearranging:

$\hat{β_{0}} = \bar{Y} - \hat{β_{1}} \bar{X}$

Finding $\hat{β_{1}}$

To find $\hat{β_{1}}$ , take the second FOC and divide by $- 2$ :

$\sum_{i = 1}^{n} (Y_{i} - \hat{β_{0}} - \hat{β_{1}} X_{i}) X_{i} = 0$

From the formula for $\hat{β_{0}}$ , substitute in for $\hat{β_{0}}$ :

$\sum_{i = 1}^{n} (Y_{i} - [\bar{Y} - \hat{β_{1}} \bar{X}] - \hat{β_{1}} X_{i}) X_{i} = 0$

Combining similar terms:

$\sum_{i = 1}^{n} ([Y_{i} - \bar{Y}] - [X_{i} - \bar{X}] \hat{β_{1}}) X_{i} = 0$

Distribute $X_{i}$ and expand terms into the subtraction of two sums (and pull out $\hat{β_{1}}$ as a constant in the second sum):

$\sum_{i = 1}^{n} [Y_{i} - \bar{Y}] X_{i} - \hat{β_{1}} \sum_{i = 1}^{n} [X_{i} - \bar{X}] X_{i} = 0$

Move the second term to the righthand side:

$\sum_{i = 1}^{n} [Y_{i} - \bar{Y}] X_{i} = \hat{β_{1}} \sum_{i = 1}^{n} [X_{i} - \bar{X}] X_{i}$

Divide to keep just $\hat{β_{1}}$ on the right:

$\frac{\sum_{i = 1}^{n} [Y_{i} - \bar{Y}] X_{i}}{\sum_{i = 1}^{n} [X_{i} - \bar{X}] X_{i}} = \hat{β_{1}}$

Note that from the rules about summation operators:

$\sum_{i = 1}^{n} [Y_{i} - \bar{Y}] X_{i} = \sum_{i = 1}^{n} (Y_{i} - \bar{Y}) (X_{i} - \bar{X})$

and:

$\sum_{i = 1}^{n} [X_{i} - \bar{X}] X_{i} = \sum_{i = 1}^{n} (X_{i} - \bar{X}) (X_{i} - \bar{X}) = \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}$
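Both facts rest on the same step: deviations from a mean sum to zero, so subtracting $\bar{X}$ times that sum changes nothing. As a worked sketch of the first identity, starting from its right-hand side:

$\begin{aligned} \sum_{i = 1}^{n} (Y_{i} - \bar{Y}) (X_{i} - \bar{X}) & = \sum_{i = 1}^{n} (Y_{i} - \bar{Y}) X_{i} - \bar{X} \sum_{i = 1}^{n} (Y_{i} - \bar{Y}) \\ & = \sum_{i = 1}^{n} (Y_{i} - \bar{Y}) X_{i} - \bar{X} (0) \\ & = \sum_{i = 1}^{n} (Y_{i} - \bar{Y}) X_{i} \end{aligned}$

The second identity follows in the same way, with $(X_{i} - \bar{X})$ in place of $(Y_{i} - \bar{Y})$.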

Plug in these two facts:

$\frac{\sum_{i = 1}^{n} (Y_{i} - \bar{Y}) (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} = \hat{β_{1}}$
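To make the two formulas concrete, here is a minimal Python sketch that computes $\hat{β_{1}}$ and $\hat{β_{0}}$ directly from the sums derived above. The function name `ols_fit` and the five data points are made up for illustration (they are not from the caschool data), and the class itself works in R rather than Python.

```python
from statistics import mean

def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the closed-form OLS formulas."""
    x_bar, y_bar = mean(x), mean(y)
    # Numerator: sum of cross-deviations of Y and X
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    # Denominator: sum of squared deviations of X
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = s_xy / s_xx
    # Intercept from the first FOC: beta0_hat = Y_bar - beta1_hat * X_bar
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Hypothetical toy data, chosen so the arithmetic is easy to check by hand
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = ols_fit(x, y)
print(b0, b1)  # approximately 2.2 and 0.6
```

By hand: $\bar{X} = 3$, $\bar{Y} = 4$, the cross-deviations sum to 6 and the squared deviations of $X$ sum to 10, so $\hat{β_{1}} = 0.6$ and $\hat{β_{0}} = 4 - 0.6 (3) = 2.2$.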

Algebraic Properties of OLS Estimators

The OLS residuals $\hat{u}$ and predicted values $\hat{Y}$ are chosen by the minimization problem to satisfy:

The expected value (average) of the residuals is 0: $E (\hat{u_{i}}) = \frac{1}{n} \sum_{i = 1}^{n} \hat{u_{i}} = 0$
The covariance between $X$ and the residuals is 0: ${\hat{σ}}_{X, \hat{u}} = 0$

Note the first two properties imply strict exogeneity. That is, this is only a valid model if $X$ and $u$ are not correlated.

The expected predicted value of $Y$ is equal to the expected value of $Y$ : $\bar{\hat{Y}} = \frac{1}{n} \sum_{i = 1}^{n} \hat{Y_{i}} = \bar{Y}$
Total sum of squares is equal to the explained sum of squares plus sum of squared errors: $\begin{aligned} T S S & = E S S + S S E \\ \sum_{i = 1}^{n} (Y_{i} - \bar{Y})^{2} & = \sum_{i = 1}^{n} (\hat{Y_{i}} - \bar{Y})^{2} + \sum_{i = 1}^{n} {\hat{u_{i}}}^{2} \end{aligned}$

Recall $R^{2}$ is $\frac{E S S}{T S S}$ or $1 - \frac{S S E}{T S S}$.

The regression line passes through the point $(\bar{X}, \bar{Y})$ , i.e. the mean of $X$ and the mean of $Y$ .
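These properties hold mechanically for any OLS fit, which can be checked numerically. The sketch below is a minimal Python illustration on made-up data (the five points and all variable names are hypothetical): it fits the line with the formulas from the derivation, then computes each quantity in the list.

```python
from statistics import mean

# Hypothetical toy data, not the caschool data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
x_bar, y_bar = mean(x), mean(y)

# Closed-form OLS estimates
b1 = (sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]             # predicted values
u_hat = [yi - yh for yi, yh in zip(y, y_hat)]  # residuals

mean_resid = sum(u_hat) / n                                           # average residual
cov_x_resid = sum((xi - x_bar) * ui for xi, ui in zip(x, u_hat)) / n  # covariance of X and residuals
mean_fitted = mean(y_hat)                                             # average predicted value

tss = sum((yi - y_bar) ** 2 for yi in y)
ess = sum((yh - y_bar) ** 2 for yh in y_hat)
sse = sum(ui ** 2 for ui in u_hat)
r2 = ess / tss

line_at_x_bar = b0 + b1 * x_bar  # height of the regression line at X_bar

print(mean_resid, cov_x_resid)   # both zero up to floating-point error
print(mean_fitted, y_bar)        # equal up to floating-point error
print(tss, ess + sse)            # equal up to floating-point error
print(r2, 1 - sse / tss)         # two equivalent ways to get R^2
print(line_at_x_bar, y_bar)      # the line passes through (X_bar, Y_bar)
```

For this toy data, $T S S = 6$, $E S S = 3.6$, and $S S E = 2.4$, so $R^{2} = 0.6$ either way.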

Bias in $\hat{β_{1}}$

Begin with the formula we derived for $\hat{β_{1}}$ :

$\hat{β_{1}} = \frac{\sum_{i = 1}^{n} (Y_{i} - \bar{Y}) (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}$

Recall from Rule 6 of summations, we can rewrite the numerator as:

$\sum_{i = 1}^{n} (Y_{i} - \bar{Y}) (X_{i} - \bar{X}) = \sum_{i = 1}^{n} Y_{i} (X_{i} - \bar{X})$

$\hat{β_{1}} = \frac{\sum_{i = 1}^{n} Y_{i} (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}$

We know the true population relationship is expressed as:

$Y_{i} = β_{0} + β_{1} X_{i} + u_{i}$

Substituting this in for $Y_{i}$ in the rewritten formula for $\hat{β_{1}}$:

$\hat{β_{1}} = \frac{\sum_{i = 1}^{n} (β_{0} + β_{1} X_{i} + u_{i}) (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}$

Breaking apart the sums in the numerator:

$\hat{β_{1}} = \frac{\sum_{i = 1}^{n} β_{0} (X_{i} - \bar{X}) + \sum_{i = 1}^{n} β_{1} X_{i} (X_{i} - \bar{X}) + \sum_{i = 1}^{n} u_{i} (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}$

We can simplify this expression using Rules 4 and 5 of summations.

The first term in the numerator $[\sum_{i = 1}^{n} β_{0} (X_{i} - \bar{X})]$ has the constant $β_{0}$ , which can be pulled out of the summation. This gives us the summation of deviations, which add up to 0 as per Rule 4:

$\begin{aligned} \sum_{i = 1}^{n} β_{0} (X_{i} - \bar{X}) & = β_{0} \sum_{i = 1}^{n} (X_{i} - \bar{X}) \\ & = β_{0} (0) \\ & = 0 \end{aligned}$

The second term in the numerator $[\sum_{i = 1}^{n} β_{1} X_{i} (X_{i} - \bar{X})]$ has the constant $β_{1}$ , which can be pulled out of the summation. Additionally, Rule 5 tells us $\sum_{i = 1}^{n} X_{i} (X_{i} - \bar{X}) = \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}$ :

$\begin{aligned} \sum_{i = 1}^{n} β_{1} X_{i} (X_{i} - \bar{X}) & = β_{1} \sum_{i = 1}^{n} X_{i} (X_{i} - \bar{X}) \\ & = β_{1} \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2} \end{aligned}$

When placed back in the context of being the numerator of a fraction, we can see this term simplifies to just $β_{1}$ :

$\begin{aligned} \frac{β_{1} \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} & = \frac{β_{1}}{1} \times \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} \\ & = β_{1} \end{aligned}$

Thus, we are left with:

$\hat{β_{1}} = β_{1} + \frac{\sum_{i = 1}^{n} u_{i} (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}$

Now, take the expectation of both sides:

$E [\hat{β_{1}}] = E [β_{1} + \frac{\sum_{i = 1}^{n} u_{i} (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}]$

We can break this up, using properties of expectations. First, recall $E [a + b] = E [a] + E [b]$ , so we can break apart the two terms.

$E [\hat{β_{1}}] = E [β_{1}] + E [\frac{\sum_{i = 1}^{n} u_{i} (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}]$

Second, the true population value of $β_{1}$ is a constant, so $E [β_{1}] = β_{1}$ .

Third, since we assume $X$ is also “fixed” and not random, the sum of squared deviations of $X$, $\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}$, in the denominator is just a constant and can be brought outside the expectation.

$E [\hat{β_{1}}] = β_{1} + \frac{E [\sum_{i = 1}^{n} u_{i} (X_{i} - \bar{X})]}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}}$

Thus, the properties of the equation are primarily driven by the expectation $E [\sum_{i = 1}^{n} u_{i} (X_{i} - \bar{X})]$ . We now turn to this term.

Use the property of summation operators to expand the numerator term:

$\begin{aligned} \hat{β_{1}} & = β_{1} + \frac{\sum_{i = 1}^{n} u_{i} (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} \\ \hat{β_{1}} & = β_{1} + \frac{\sum_{i = 1}^{n} (u_{i} - \bar{u}) (X_{i} - \bar{X})}{\sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} \end{aligned}$

Now multiply the numerator and denominator of the second term by $\frac{1}{n}$. This gives us the covariance between $X$ and $u$ in the numerator and the variance of $X$ in the denominator, based on their respective definitions.

$\begin{aligned} \hat{β_{1}} & = β_{1} + \frac{\frac{1}{n} \sum_{i = 1}^{n} (u_{i} - \bar{u}) (X_{i} - \bar{X})}{\frac{1}{n} \sum_{i = 1}^{n} (X_{i} - \bar{X})^{2}} \\ \hat{β_{1}} & = β_{1} + \frac{c o v (X, u)}{v a r (X)} \\ \hat{β_{1}} & = β_{1} + \frac{s_{X, u}}{s_{X}^{2}} \end{aligned}$

By the Zero Conditional Mean assumption of OLS, $s_{X, u} = 0$ in expectation, so the second term drops out and $E [\hat{β_{1}}] = β_{1}$.

Alternatively, we can express the bias in terms of correlation instead of covariance:

$E [\hat{β_{1}}] = β_{1} + \frac{c o v (X, u)}{v a r (X)}$

From the definition of correlation:

$\begin{aligned} c o r (X, u) & = \frac{c o v (X, u)}{s_{X} s_{u}} \\ c o r (X, u) s_{X} s_{u} & = c o v (X, u) \end{aligned}$

Plugging this in:

$\begin{aligned} E [\hat{β_{1}}] & = β_{1} + \frac{c o v (X, u)}{v a r (X)} \\ E [\hat{β_{1}}] & = β_{1} + \frac{[c o r (X, u) s_{X} s_{u}]}{s_{X}^{2}} \\ E [\hat{β_{1}}] & = β_{1} + \frac{c o r (X, u) s_{u}}{s_{X}} \\ E [\hat{β_{1}}] & = β_{1} + c o r (X, u) \frac{s_{u}}{s_{X}} \end{aligned}$
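The bias formula can be illustrated with a small simulation. The sketch below is a Python illustration under an assumed data-generating process that is not from the class example: $u \sim N(0, 1)$ and $X = z + γ u$ with $z \sim N(0, 1)$ independent of $u$, so that $c o v (X, u) = γ$ and $v a r (X) = 1 + γ^{2}$. The names `ols_slope`, `simulate_slopes`, and `gamma`, along with the parameter values, are hypothetical.

```python
import random

def ols_slope(x, y):
    """Closed-form OLS slope estimate."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

def simulate_slopes(beta0, beta1, gamma, n=200, reps=1000, seed=42):
    """Draw `reps` samples of size `n` and return the OLS slope from each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        u = [rng.gauss(0, 1) for _ in range(n)]
        # Assumed process: X = z + gamma * u, so X is correlated with u when gamma != 0
        x = [rng.gauss(0, 1) + gamma * ui for ui in u]
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        slopes.append(ols_slope(x, y))
    return slopes

gamma = 0.5
slopes = simulate_slopes(beta0=1.0, beta1=2.0, gamma=gamma)
avg_slope = sum(slopes) / len(slopes)

# beta1 + cov(X, u) / var(X) = 2 + 0.5 / 1.25 = 2.4
predicted = 2.0 + gamma / (1 + gamma ** 2)
print(avg_slope, predicted)
```

The average of the simulated slopes should land close to 2.4 rather than the true $β_{1} = 2$. Setting `gamma = 0` removes the correlation between $X$ and $u$, and the average should then settle near 2.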

Proof of the Unbiasedness of $\hat{β_{1}}$

Begin with equation:³

$\hat{β_{1}} = \frac{\sum Y_{i} X_{i}}{\sum X_{i}^{2}}$

Substitute for $Y_{i}$ :

$\hat{β_{1}} = \frac{\sum (β_{1} X_{i} + u_{i}) X_{i}}{\sum X_{i}^{2}}$

Distribute $X_{i}$ in the numerator:

$\hat{β_{1}} = \frac{\sum (β_{1} X_{i}^{2} + u_{i} X_{i})}{\sum X_{i}^{2}}$

Separate the sum into additive pieces:

$\hat{β_{1}} = \frac{\sum β_{1} X_{i}^{2}}{\sum X_{i}^{2}} + \frac{\sum u_{i} X_{i}}{\sum X_{i}^{2}}$

$β_{1}$ is constant, so we can pull it out of the first sum:

$\hat{β_{1}} = β_{1} \frac{\sum X_{i}^{2}}{\sum X_{i}^{2}} + \frac{\sum u_{i} X_{i}}{\sum X_{i}^{2}}$

Simplifying the first term, we are left with:

$\hat{β_{1}} = β_{1} + \frac{\sum u_{i} X_{i}}{\sum X_{i}^{2}}$

Now if we take expectations of both sides:

$E [\hat{β_{1}}] = E [β_{1}] + E [\frac{\sum u_{i} X_{i}}{\sum X_{i}^{2}}]$

$β_{1}$ is a constant, so the expectation of $β_{1}$ is itself.

$E [\hat{β_{1}}] = β_{1} + E [\frac{\sum u_{i} X_{i}}{\sum X_{i}^{2}}]$

Using the properties of expectations, we can pull out $\frac{1}{\sum X_{i}^{2}}$ as a constant:

$E [\hat{β_{1}}] = β_{1} + \frac{1}{\sum X_{i}^{2}} E [\sum u_{i} X_{i}]$

Again using the properties of expectations, we can put the expectation inside the summation operator (the expectation of a sum is the sum of expectations):

$E [\hat{β_{1}}] = β_{1} + \frac{1}{\sum X_{i}^{2}} \sum E [u_{i} X_{i}]$

Under the exogeneity condition, the correlation between $X_{i}$ and $u_{i}$ is 0, so $E [u_{i} X_{i}] = 0$ for every $i$ and the second term vanishes:

$E [\hat{β_{1}}] = β_{1}$
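The result says the sampling distribution of $\hat{β_{1}}$ is centered on $β_{1}$. A minimal Python simulation sketch of that idea follows, using the same simplified model with no intercept as the proof. The fixed $X$ values, the true $β_{1} = 2$, the standard normal errors, and the name `slope_through_origin` are all assumptions made for illustration.

```python
import random

def slope_through_origin(x, y):
    """Slope estimate for the no-intercept model: sum(Y*X) / sum(X^2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

rng = random.Random(0)
beta1 = 2.0                           # assumed true slope
x = [float(i) for i in range(1, 21)]  # X held fixed across repeated samples

estimates = []
for _ in range(5000):
    # Errors are drawn independently of X, so the exogeneity condition holds
    u = [rng.gauss(0, 1) for _ in x]
    y = [beta1 * xi + ui for xi, ui in zip(x, u)]
    estimates.append(slope_through_origin(x, y))

# Center of the simulated sampling distribution
center = sum(estimates) / len(estimates)
print(center)  # close to the true beta1 of 2
```

Any single estimate differs from 2 because of the random errors, but the average across repeated samples sits very close to it, which is what unbiasedness means.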

¹ Note this is a .dta Stata file. You will need to (install and) load the package haven to read_dta() Stata files into a dataframe.
² From the rules about summation operators, we define the mean of a random variable $X$ as $\bar{X} = \frac{1}{n} \sum_{i = 1}^{n} X_{i}$. The mean of a constant, like $β_{0}$ or $β_{1}$, is itself.
³ Admittedly, this is a simplified version where $\hat{β_{0}} = 0$, but there is no loss of generality in the results.

Last updated on Sep 26, 2021