$$Y_i=\beta_0+\beta_1X_i+u_i$$
\(u_i\) includes all other variables that affect \(Y\)
Every regression model always has omitted variables assumed in the error
Again, we assume \(u\) is random, with \(E[u|X]=0\) and \(var(u)=\sigma^2_u\)
Sometimes, omission of variables can bias OLS estimators \((\hat{\beta_0}\) and \(\hat{\beta_1})\)
Omitting a variable \(Z\) biases OLS if both of the following conditions hold:
1. \(Z\) is a determinant of \(Y\)
2. \(Z\) is correlated with the regressor \(X\)
Omitted variable bias makes \(X\) endogenous
Violates the zero conditional mean assumption: $$E(u_i|X_i)\neq 0$$
It follows that \(\hat{\beta_1}\) is biased: \(E[\hat{\beta_1}] \neq \beta_1\)
\(\hat{\beta_1}\) systematically over- or under-estimates the true relationship \((\beta_1)\)
\(\hat{\beta_1}\) “picks up” both pathways: the direct effect of \(X\) on \(Y\) and the association that runs through \(Z\)
Example: Consider our recurring class size and test score example: $$\text{Test score}_i = \beta_0 + \beta_1 \text{STR}_i + u_i$$
\(Z_i\): time of day of the test
\(Z_i\): parking space per student
\(Z_i\): percent of ESL students
$$E[\hat{\beta_1}]=\beta_1+cor(X,u)\frac{\sigma_u}{\sigma_X}$$
1) If \(X\) is exogenous: \(cor(X,u)=0\), we're just left with \(\beta_1\)
2) The larger \(|cor(X,u)|\) is, the larger the bias: \(\left(E[\hat{\beta_1}]-\beta_1 \right)\)
3) We can “sign” the direction of the bias based on \(cor(X,u)\)
† See 2.4 class notes for proof.
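The bias formula can be checked numerically. The sketch below uses simulated data, not the class data: the coefficients (2 and -3), the 0.5 loading of `x` on `z`, and all object names are assumptions chosen only for illustration. It compares the slope from a regression that omits `z` with \(\beta_1+cor(X,u)\frac{\sigma_u}{\sigma_X}\) computed from sample analogues.

```r
# Simulated data: every number here is an assumption chosen for illustration
set.seed(42)
n <- 100000
z <- rnorm(n)              # omitted variable Z
x <- 0.5 * z + rnorm(n)    # regressor X, correlated with Z
beta_1 <- 2                # assumed true effect of X on Y
u <- -3 * z + rnorm(n)     # error term: it contains Z, so cor(x, u) != 0
y <- 1 + beta_1 * x + u

# Slope from the regression that omits z
b1_short <- coef(lm(y ~ x))[["x"]]

# Bias implied by the formula, using sample correlation and standard deviations
bias_formula <- cor(x, u) * sd(u) / sd(x)

c(estimate = b1_short, formula = beta_1 + bias_formula)
```

In this setup \(cor(X,u)\) is negative, so the bias is negative and the estimate falls well below the assumed true value of 2. In population terms the bias is \(cov(X,u)/var(X) = -1.5/1.25 = -1.2\).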
# Select only the three variables we want (there are many)
CAcorr <- CASchool %>%
  select("str","testscr","el_pct")

# Make a correlation table
cor_table <- cor(CAcorr)
cor_table # look at it
##                str    testscr     el_pct
## str      1.0000000 -0.2263628  0.1876424
## testscr -0.2263628  1.0000000 -0.6441237
## el_pct   0.1876424 -0.6441237  1.0000000
el_pct is strongly (negatively) correlated with testscr (Condition 1)
el_pct is reasonably (positively) correlated with str (Condition 2)
# Make a correlation plot
library(corrplot)
corrplot(cor_table, type="upper", method = "circle", order="original")
# make a new variable called ESL
# = "High ESL" (if el_pct is above median) or = "Low ESL" (otherwise)
CASchool <- CASchool %>%
  mutate(ESL = ifelse(el_pct > median(el_pct), # test if el_pct is above median
                      yes = "High ESL",        # if yes, label the district "High ESL"
                      no = "Low ESL"))         # if no, label the district "Low ESL"

# get average test score by high/low ESL
CASchool %>%
  group_by(ESL) %>%
  summarize(Average_test_score = mean(testscr))
ggplot(data = CASchool)+
  aes(x = testscr, fill = ESL)+
  geom_density(alpha=0.5)+
  labs(x = "Test Score", y = "Density")+
  ggthemes::theme_pander(
    base_family = "Fira Sans Condensed",
    base_size=20
  )+
  theme(legend.position = "bottom")
esl_scatter <- ggplot(data = CASchool)+
  aes(x = str, y = testscr, color = ESL)+
  geom_point()+
  geom_smooth(method = "lm")+
  labs(x = "STR", y = "Test Score")+
  ggthemes::theme_pander(
    base_family = "Fira Sans Condensed",
    base_size=20
  )+
  theme(legend.position = "bottom")
esl_scatter
esl_scatter+
  facet_grid(~ESL)+
  guides(color = "none") # hide the redundant color legend
$$E[\hat{\beta_1}]=\beta_1+bias$$
$$E[\hat{\beta_1}]=\beta_1+cor(X,u)\frac{\sigma_u}{\sigma_X}$$
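\(cor(STR,\%EL)\) is positive
\(cor(\%EL, \text{Test score})\) is negative, so \(\%EL\) enters \(u\) with a negative sign and \(cor(STR,u)\) is negative
\(\beta_1\) is negative (between Test score and STR)
Bias is negative: \(\hat{\beta_1}\) is pushed further below zero than \(\beta_1\), overstating the effect of STR on Test score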
If school districts with higher Test Scores tend to have lower STR, AND districts with lower \(STR\) tend to have less \(\%EL\) ...
How can we say \(\hat{\beta_1}\) estimates the marginal effect of \(\Delta STR \rightarrow \Delta \text{Test Score}\)?
(We can’t.)
Consider an ideal randomized controlled trial (RCT)
Randomly assign experimental units (e.g. people, cities, etc.) into two (or more) groups, such as a treatment group and a control group
Compare results of two groups to get average treatment effect
Example: Imagine an ideal RCT for measuring the effect of STR on Test Score
School districts would be randomly assigned a student-teacher ratio
With random assignment, all factors in \(u\) (%ESL students, family size, parental income, years in the district, day of the week of the test, climate, etc) are distributed independently of class size
Thus, \(cor(STR, u)=0\) and \(E[u|STR]=0\), i.e. exogeneity
Our \(\hat{\beta_1}\) would be an unbiased estimate of \(\beta_1\), measuring the true causal effect of STR \(\rightarrow\) Test Score
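A small simulation can make the role of random assignment concrete. This is a sketch on made-up data: the true effect of -1, the weight of -3 on the background factor `z`, and the names `x_obs` and `x_rct` are all assumptions. The only difference between the two worlds is whether the regressor depends on `z` or is assigned independently of it.

```r
# Simulated data: all coefficients and names are assumptions for illustration
set.seed(1)
n <- 100000
z <- rnorm(n)                 # background factor sitting in u (think %EL)
beta_1 <- -1                  # assumed true causal effect of x on y
u <- -3 * z + rnorm(n)        # error term contains z

# Observational world: x is related to z
x_obs <- 0.5 * z + rnorm(n)
# Experimental world: x is assigned at random, independently of z
x_rct <- rnorm(n)

y_obs <- 10 + beta_1 * x_obs + u
y_rct <- 10 + beta_1 * x_rct + u

b_obs <- coef(lm(y_obs ~ x_obs))[["x_obs"]]
b_rct <- coef(lm(y_rct ~ x_rct))[["x_rct"]]

round(c(cor_obs = cor(x_obs, u), cor_rct = cor(x_rct, u),
        b_obs = b_obs, b_rct = b_rct), 3)
```

Under random assignment the sample correlation between the regressor and `u` is close to zero and the slope is close to the assumed -1. In the observational world the slope is far from it.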
But we didn't run an RCT; it's observational data!
“Treatment” of having a large or small class size is NOT randomly assigned!
\(\%EL\): plausibly fits criteria of O.V. bias!
Thus, “control” group and “treatment” group differ systematically!
[Figure: Treatment Group vs. Control Group]
Pathways connecting str and test score: the direct path str \(\rightarrow\) test score, and a second path that runs through ESL
DAG rules tell us we need to control for ESL in order to identify the causal effect of str \(\rightarrow\) test score
So now, how do we control for a variable?
Look at effect of STR on Test Score by comparing districts with the same %EL
The simple fix is just to not omit %EL!
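Both ideas, comparing units with the same value of the confounder and simply including it in the regression, can be sketched on simulated data. Everything below is hypothetical: `z` stands in for %EL, `x` for STR, the assumed true effect of `x` is -1, and the band `abs(z) < 0.05` is an arbitrary choice for "almost the same `z`".

```r
# Simulated data: all coefficients and names are assumptions for illustration
set.seed(7)
n <- 100000
z <- rnorm(n)                         # confounder (plays the role of %EL)
x <- 0.5 * z + rnorm(n)               # regressor (plays the role of STR)
y <- 10 - 1 * x - 3 * z + rnorm(n)    # assumed true effect of x is -1

# 1) Compare only units with (almost) the same z
same_z <- abs(z) < 0.05
b_within <- coef(lm(y[same_z] ~ x[same_z]))[[2]]

# 2) Let multiple regression hold z constant for every unit at once
b_control <- coef(lm(y ~ x + z))[["x"]]

# 3) For contrast, the regression that omits z
b_omit <- coef(lm(y ~ x))[["x"]]

round(c(within_band = b_within, controlled = b_control, omitted = b_omit), 2)
```

The first two slopes land near the assumed -1, while the regression that omits `z` does not. Including `z` uses the whole sample, which is why it is the more practical of the two approaches.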
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_kX_{ki} +u_i$$
† Note that Bailey defines \(k\) as the number of variables plus the constant.
$$Y_i= \beta_0+\beta_1 X_{1i} + \beta_2 X_{2i}$$
$$\begin{align*} Y&= \beta_0+\beta_1 X_{1} + \beta_2 X_{2} && \text{Before the change}\\ Y+\Delta Y&= \beta_0+\beta_1 (X_{1}+\Delta X_1) + \beta_2 X_{2} && \text{After the change}\\ \Delta Y&= \beta_1 \Delta X_1 && \text{The difference}\\ \frac{\Delta Y}{\Delta X_1} &= \beta_1 && \text{Solving for } \beta_1\\ \end{align*}$$
$$\beta_1 =\frac{\Delta Y}{\Delta X_1}\text{ holding } X_2 \text{ constant}$$
Similarly, for \(\beta_2\):
$$\beta_2 =\frac{\Delta Y}{\Delta X_2}\text{ holding }X_1 \text{ constant}$$
And for the constant, \(\beta_0\):
$$\beta_0 =\text{predicted value of Y when } X_1=0, \; X_2=0$$
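These interpretations can be verified on a fitted model. The following is a minimal sketch on simulated data (the coefficients 5, 2, and -4 and the data frames `before` and `after` are assumptions): two hypothetical observations share the same \(X_2\) and differ by one unit in \(X_1\), so the difference in their predicted values should equal \(\hat{\beta_1}\).

```r
# Simulated data: coefficients and names are assumptions for illustration
set.seed(3)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 5 + 2 * x1 - 4 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)

# Two hypothetical observations: same x2, x1 differs by exactly 1
before <- data.frame(x1 = 1, x2 = 3)
after  <- data.frame(x1 = 2, x2 = 3)
delta_y <- unname(predict(fit, after) - predict(fit, before))

# The change in predicted Y equals the coefficient on x1
c(delta_y = delta_y, beta_1_hat = coef(fit)[["x1"]])

# The constant is the predicted Y at x1 = 0, x2 = 0
at_zero <- unname(predict(fit, data.frame(x1 = 0, x2 = 0)))
c(at_zero = at_zero, beta_0_hat = coef(fit)[["(Intercept)"]])
```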
We have been envisioning OLS regressions as the equation of a line through a scatterplot of data on two variables, \(X\) and \(Y\)
With 3+ variables, OLS regression is no longer a “line” for us to estimate...
Alternatively, we can write the population regression equation as: $$Y_i=\beta_0\color{#e64173}{X_{0i}}+\beta_1X_{1i}+\beta_2X_{2i}+u_i$$
Here, we attached a regressor \(X_{0i}\) to \(\beta_0\)
\(X_{0i}\) is a constant regressor, as we define \(X_{0i}=1\) for all \(i\) observations
Likewise, \(\beta_0\) is more generally called the “constant” term in the regression (instead of the “intercept”)
This may seem silly and trivial, but this will be useful next class!
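R builds this constant regressor automatically. As a small sketch on made-up data (the data frame `df` and the column `x0` are hypothetical), `model.matrix()` shows the regressors actually used by `lm()`, and supplying a column of ones by hand while suppressing the automatic intercept gives the same coefficients.

```r
# Made-up data: values and names are assumptions for illustration
set.seed(5)
df <- data.frame(x1 = rnorm(6), x2 = rnorm(6))
df$y <- 1 + 2 * df$x1 - df$x2 + rnorm(6)

fit <- lm(y ~ x1 + x2, data = df)

# The regressors lm() actually used: the first column is X0, all ones
X <- model.matrix(fit)
X

# Supply the constant regressor by hand; "0 +" removes the automatic intercept
df$x0 <- 1
fit_manual <- lm(y ~ 0 + x0 + x1 + x2, data = df)

rbind(automatic = unname(coef(fit)), manual = unname(coef(fit_manual)))
```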
Example:
$$\text{Beer Consumption}_i=\beta_0+\beta_1Price_i+\beta_2Income_i+\beta_3\text{Nachos Price}_i+\beta_4\text{Wine Price}_i+u_i$$
Let's see what you remember from micro(econ)!
What measures the price effect? What sign should it have?
What measures the income effect? What sign should it have? What should inferior or normal (necessities & luxury) goods look like?
What measures the cross-price effect(s)? What sign should substitutes and complements have?
Example:
$$\widehat{\text{Beer Consumption}_i}=20-1.5Price_i+1.25Income_i-0.75\text{Nachos Price}_i+1.3\text{Wine Price}_i$$
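To read the estimated equation, it can help to plug in numbers. In the sketch below the coefficients are copied from the fitted equation, while the helper `predict_beer()` and the input values (price 4, income 10, nachos price 6, wine price 8) are hypothetical, since the slides do not state units. Each difference changes one regressor by one unit and holds the others constant.

```r
# Coefficients copied from the fitted equation
b <- c(const = 20, price = -1.5, income = 1.25, nachos = -0.75, wine = 1.3)

# Hypothetical helper: predicted beer consumption for given regressor values
predict_beer <- function(price, income, nachos, wine) {
  unname(b["const"] + b["price"] * price + b["income"] * income +
           b["nachos"] * nachos + b["wine"] * wine)
}

# Hypothetical baseline values
base <- predict_beer(price = 4, income = 10, nachos = 6, wine = 8)

# Change one regressor by one unit, holding the others constant
own_price   <- predict_beer(5, 10, 6, 8) - base   # price of beer up by 1
income_eff  <- predict_beer(4, 11, 6, 8) - base   # income up by 1
cross_nacho <- predict_beer(4, 10, 7, 8) - base   # nachos price up by 1
cross_wine  <- predict_beer(4, 10, 6, 9) - base   # wine price up by 1

round(c(own_price = own_price, income = income_eff,
        nachos = cross_nacho, wine = cross_wine), 2)
```

Each difference reproduces the corresponding coefficient. The signs match the micro story: a negative own-price effect, a positive income effect (a normal good), a negative cross-price effect for nachos (a complement), and a positive one for wine (a substitute).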
# run regression of testscr on str and el_pct
school_reg_2 <- lm(testscr ~ str + el_pct, data = CASchool)
lm(y ~ x1 + x2, data = df)
y is the dependent variable (listed first!)
~ means “is modeled by” or “is explained by”
x1 and x2 are the independent variables
df is the dataframe where the data is stored
# look at reg object
school_reg_2
## 
## Call:
## lm(formula = testscr ~ str + el_pct, data = CASchool)
## 
## Coefficients:
## (Intercept)          str       el_pct  
##    686.0322      -1.1013      -0.6498
The result is stored as an lm object called school_reg_2, a list object
summary(school_reg_2) # get full summary
## 
## Call:
## lm(formula = testscr ~ str + el_pct, data = CASchool)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -48.845 -10.240  -0.308   9.815  43.461 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 686.03225    7.41131  92.566  < 2e-16 ***
## str          -1.10130    0.38028  -2.896  0.00398 ** 
## el_pct       -0.64978    0.03934 -16.516  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.46 on 417 degrees of freedom
## Multiple R-squared:  0.4264, Adjusted R-squared:  0.4237 
## F-statistic:   155 on 2 and 417 DF,  p-value: < 2.2e-16
# load packages
library(broom)

# tidy regression output
tidy(school_reg_2)
library(huxtable)
huxreg("Model 1" = school_reg,
       "Model 2" = school_reg_2,
       coefs = c("Intercept" = "(Intercept)",
                 "Class Size" = "str",
                 "%ESL Students" = "el_pct"),
       statistics = c("N" = "nobs",
                      "R-Squared" = "r.squared",
                      "SER" = "sigma"),
       number_format = 2)
 | Model 1 | Model 2 |
---|---|---|
Intercept | 698.93 *** | 686.03 *** |
 | (9.47) | (7.41) |
Class Size | -2.28 *** | -1.10 ** |
 | (0.48) | (0.38) |
%ESL Students | | -0.65 *** |
 | | (0.04) |
N | 420 | 420 |
R-Squared | 0.05 | 0.43 |
SER | 18.58 | 14.46 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
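The gap between a coefficient with and without a control has an exact in-sample decomposition: the short-regression slope equals the long-regression slope plus the coefficient on the added variable times the slope from regressing that variable on the original regressor. The sketch below checks this identity on simulated data that only loosely mimics the class-size setting. It does not use CASchool, and every number and name in it is an assumption.

```r
# Simulated data loosely mimicking the setting: all numbers are assumptions
set.seed(11)
n <- 420
x <- rnorm(n, mean = 20, sd = 2)                 # plays the role of str
z <- 15 + 2 * (x - 20) + rnorm(n, sd = 10)       # plays the role of el_pct
y <- 690 - 1 * x - 0.6 * z + rnorm(n, sd = 14)   # plays the role of testscr

short <- lm(y ~ x)        # analogue of Model 1 (omits z)
long  <- lm(y ~ x + z)    # analogue of Model 2 (controls for z)
aux   <- lm(z ~ x)        # auxiliary regression of the omitted variable on x

b_short <- coef(short)[["x"]]
b_long  <- coef(long)[["x"]]
b_z     <- coef(long)[["z"]]
delta   <- coef(aux)[["x"]]

# Short slope = long slope + (coefficient on z) * (slope of z on x)
c(short = b_short, long_plus_gap = b_long + b_z * delta)
```

The two numbers agree up to floating-point error. The sign of the gap is the sign of `b_z * delta`, which is how the direction of omitted variable bias can be worked out from the two conditions.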