The `gapminder` example:

$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$

$$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$$

$$\color{orange}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\ln \text{GDP}_i}$$
Another useful model for nonlinear data is the logarithmic model†
The logarithmic model has two additional advantages
† Don’t confuse this with a logistic (logit) model for dependent dummy variables.
The exponential function, \(Y=e^X\) or \(Y=\exp(X)\), where the base is \(e=2.71828\ldots\)

The natural logarithm is its inverse, \(Y=\ln(X)\)
\(\ln(\frac{1}{x})=-\ln(x)\)
\(\ln(ab)=\ln(a)+\ln(b)\)
\(\ln(\frac{x}{a})=\ln(x)-\ln(a)\)
\(\ln(x^a)=a \, \ln(x)\)
\(\frac{d \, \ln \, x}{d \, x} = \frac{1}{x}\)
$$\underbrace{\ln(x+\Delta x) - \ln(x)}_{\text{Difference in logs}} \approx \underbrace{\frac{\Delta x}{x}}_{\text{Relative change}}$$

Example: Let \(x=100\) and \(\Delta x =1\); the relative change is:

$$\frac{\Delta x}{x} = \frac{(101-100)}{100} = 0.01 \text{ or }1\%$$
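This approximation is easy to check numerically (a minimal sketch using only base R):

```r
x  <- 100
dx <- 1

# difference in logs: ln(101) - ln(100)
log(x + dx) - log(x)  # about 0.00995

# relative change
dx / x                # 0.01
```

The two agree to within about half a percent, and the approximation gets better the smaller \(\Delta x\) is relative to \(x\).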
$$\epsilon_{Y,X}=\frac{\% \Delta Y}{\% \Delta X} =\cfrac{\left(\frac{\Delta Y}{Y}\right)}{\left( \frac{\Delta X}{X}\right)}$$
One of the (many) reasons why economists love Cobb-Douglas functions: $$Y=AL^{\alpha}K^{\beta}$$

Taking logs, the relationship becomes linear:

$$\ln(Y)=\ln(A)+\alpha \ln(L)+ \beta \ln(K)$$
Example: Cobb-Douglas production function: $$Y=2L^{0.75}K^{0.25}$$

$$\ln Y=\ln 2+0.75 \ln L + 0.25 \ln K$$
A 1% change in \(L\) will yield a 0.75% change in output \(Y\)
A 1% change in \(K\) will yield a 0.25% change in output \(Y\)
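We can verify the first elasticity numerically for the production function above (a minimal sketch; the 1% figure is approximate for discrete changes):

```r
# the hypothetical Cobb-Douglas function from the example
Y <- function(L, K) 2 * L^0.75 * K^0.25

base <- Y(100, 100)  # output at L = 100, K = 100
up_L <- Y(101, 100)  # raise L by 1%, hold K fixed

(up_L - base) / base * 100  # about 0.75 (% change in Y)
```

The same check with \(K\) raised by 1% gives roughly 0.25, matching the exponents.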
R's `log()` function can easily take the logarithm:

```r
gapminder <- gapminder %>%
  mutate(loggdp = log(gdpPercap)) # log GDP per capita

gapminder %>% head() # look at it
```

`log()` by default is the natural logarithm \(\ln()\), i.e. base \(e\)

Can set a different base with, e.g., `log(x, base = 5)`

Shortcuts exist for common bases: `log10()`, `log2()`

```r
log10(100)
## [1] 2

log2(16)
## [1] 4

log(19683, base = 3)
## [1] 9
```
We can also wrap `log()` around a variable in the regression itself:

```r
# using the pre-computed log variable
lm(lifeExp ~ loggdp, data = gapminder) %>% tidy()

# equivalently, taking the log inside the regression
lm(lifeExp ~ log(gdpPercap), data = gapminder) %>% tidy()
```
Linear-log model: \(Y_i=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)

Log-linear model: \(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1X_i\)

Log-log model: \(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)
$$\begin{align*} Y_i&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\Delta Y}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}$$
```r
lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder)

library(broom)
lin_log_reg %>% tidy()
```

$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$

A 1% change in GDP \(\rightarrow\) a \(\frac{8.41}{100}=\) 0.0841 year increase in Life Expectancy

A 25% fall in GDP \(\rightarrow\) a \((-25 \times 0.0841)=\) 2.1025 year decrease in Life Expectancy

A 100% rise in GDP \(\rightarrow\) a \((100 \times 0.0841)=\) 8.41 year increase in Life Expectancy
```r
ggplot(data = gapminder)+
  aes(x = gdpPercap, y = lifeExp)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", formula = y ~ log(x), color = "orange")+
  scale_x_continuous(labels = scales::dollar, breaks = seq(0, 120000, 20000))+
  scale_y_continuous(breaks = seq(0, 100, 10), limits = c(0, 100))+
  labs(x = "GDP per Capita", y = "Life Expectancy (Years)")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size = 16)
```

```r
ggplot(data = gapminder)+
  aes(x = loggdp, y = lifeExp)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  scale_y_continuous(breaks = seq(0, 100, 10), limits = c(0, 100))+
  labs(x = "Log GDP per Capita", y = "Life Expectancy (Years)")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size = 16)
```
$$\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 X_i\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\Delta X}\\ \end{align*}$$
We will again have very large/small coefficients if we deal with GDP directly, so again let's transform `gdpPercap` into $1,000s and call it `gdp_t`

Then take the log of `lifeExp`:

```r
gapminder <- gapminder %>%
  mutate(gdp_t = gdpPercap/1000,  # first make GDP/capita in $1000s
         loglife = log(lifeExp))  # take the log of lifeExp

gapminder %>% head() # look at it
```
```r
log_lin_reg <- lm(loglife ~ gdp_t, data = gapminder)
log_lin_reg %>% tidy()
```

$$\widehat{\ln\text{Life Expectancy}}_i=3.967+0.013 \, \text{GDP}_i$$

A $1 (thousand) change in GDP \(\rightarrow\) a \(0.013 \times 100\%=\) 1.3% increase in Life Expectancy

A $25 (thousand) fall in GDP \(\rightarrow\) a \((-25 \times 1.3\%)=\) 32.5% decrease in Life Expectancy

A $100 (thousand) rise in GDP \(\rightarrow\) a \((100 \times 1.3\%)=\) 130% increase in Life Expectancy
```r
ggplot(data = gapminder)+
  aes(x = gdp_t, y = loglife)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  scale_x_continuous(labels = scales::dollar, breaks = seq(0, 120, 20))+
  labs(x = "GDP per Capita ($ Thousands)", y = "Log Life Expectancy")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size = 16)
```
$$\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}$$
Marginal effect of \(\mathbf{X \rightarrow Y}\): a 1% change in \(X \rightarrow\) a \(\beta_1\) % change in \(Y\)
\(\beta_1\) is the elasticity of \(Y\) with respect to \(X\)!
```r
log_log_reg <- lm(loglife ~ loggdp, data = gapminder)
log_log_reg %>% tidy()
```

$$\widehat{\text{ln Life Expectancy}}_i=2.864+0.147 \, \text{ln GDP}_i$$

A 1% change in GDP \(\rightarrow\) a 0.147% increase in Life Expectancy

A 25% fall in GDP \(\rightarrow\) a \((-25 \times 0.147\%)=\) 3.675% decrease in Life Expectancy

A 100% rise in GDP \(\rightarrow\) a \((100 \times 0.147\%)=\) 14.7% increase in Life Expectancy
```r
ggplot(data = gapminder)+
  aes(x = loggdp, y = loglife)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  labs(x = "Log GDP per Capita", y = "Log Life Expectancy")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size = 16)
```
Model | Equation | Interpretation |
---|---|---|
Linear-Log | \(Y=\beta_0+\beta_1 \color{#e64173}{\ln X}\) | 1% change in \(X \rightarrow \frac{\hat{\beta_1}}{100}\) unit change in \(Y\) |
Log-Linear | \(\color{#e64173}{\ln Y}=\beta_0+\beta_1X\) | 1 unit change in \(X \rightarrow \hat{\beta_1}\times 100\)% change in \(Y\) |
Log-Log | \(\color{#e64173}{\ln Y}=\beta_0+\beta_1\color{#e64173}{\ln X}\) | 1% change in \(X \rightarrow \hat{\beta_1}\)% change in \(Y\) |
```r
library(huxtable)
huxreg("Life Exp." = lin_log_reg,
       "Log Life Exp." = log_lin_reg,
       "Log Life Exp." = log_log_reg,
       coefs = c("Constant" = "(Intercept)",
                 "GDP ($1000s)" = "gdp_t",
                 "Log GDP" = "loggdp"),
       statistics = c("N" = "nobs",
                      "R-Squared" = "r.squared",
                      "SER" = "sigma"),
       number_format = 2)
```
|  | Life Exp. | Log Life Exp. | Log Life Exp. |
|---|---|---|---|
| Constant | -9.10 *** | 3.97 *** | 2.86 *** |
|  | (1.23) | (0.01) | (0.02) |
| GDP ($1000s) |  | 0.01 *** |  |
|  |  | (0.00) |  |
| Log GDP | 8.41 *** |  | 0.15 *** |
|  | (0.15) |  | (0.00) |
| N | 1704 | 1704 | 1704 |
| R-Squared | 0.65 | 0.30 | 0.61 |
| SER | 7.62 | 0.19 | 0.14 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
| Linear-Log | Log-Linear | Log-Log |
|---|---|---|
| \(\hat{Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}\) | \(\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}X_i\) | \(\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}\) |
| \(R^2=0.65\) | \(R^2=0.30\) | \(R^2=0.61\) |
$$\hat{Y_i}=\beta_0+\beta_1 X_1+\beta_2 X_2 $$
We often want to compare coefficients to see which variable \(X_1\) or \(X_2\) has a bigger effect on \(Y\)
What if \(X_1\) and \(X_2\) are different units?
Example: $$\begin{align*} \widehat{\text{Salary}_i}&=\beta_0+\beta_1\, \text{Batting average}_i+\beta_2\, \text{Home runs}_i\\ \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36}\, \text{Home runs}_i\\ \end{align*}$$
$$X_Z=\frac{X_i-\overline{X}}{sd(X)}$$
† Also called “centering” or “scaling.”
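As a quick sanity check (a minimal sketch with made-up numbers), R's built-in `scale()` reproduces the formula above:

```r
x <- c(2, 4, 6, 8)

# standardize by hand: subtract the mean, divide by the standard deviation
manual_z <- (x - mean(x)) / sd(x)

# compare with R's built-in scale()
all.equal(as.numeric(scale(x)), manual_z) # TRUE
```

`scale()` centers and scales by default, which is exactly the \(X_Z\) transformation.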
Variable | Mean | Std. Dev. |
---|---|---|
Salary | $2,024,616 | $2,764,512 |
Batting Average | 0.267 | 0.031 |
Home Runs | 12.11 | 10.31 |
$$\begin{align*}\scriptsize \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36} \, \text{Home runs}_i\\ \widehat{\text{Salary}_Z}&=\text{0.00}+\text{0.14} \, \text{Batting average}_Z+\text{0.48} \, \text{Home runs}_Z\\ \end{align*}$$
Marginal effects on \(Y\) (in standard deviations of \(Y\)) from 1 standard deviation change in \(X\):
\(\hat{\beta_1}\): a 1 standard deviation increase in Batting Average increases Salary by 0.14 standard deviations
$$0.14 \times \$2,764,512=\$387,032$$
\(\hat{\beta_2}\): a 1 standard deviation increase in Home Runs increases Salary by 0.48 standard deviations

$$0.48 \times \$2,764,512=\$1,326,966$$
In R:

| Variable | Mean | SD |
|---|---|---|
| `lifeExp` | 59.47 | 12.92 |
| `gdpPercap` | $7215.32 | $9857.46 |

Use the `scale()` command inside `mutate()` to standardize a variable:

```r
gapminder <- gapminder %>%
  mutate(life_Z = scale(lifeExp),
         gdp_Z = scale(gdpPercap))

std_reg <- lm(life_Z ~ gdp_Z, data = gapminder)
tidy(std_reg)
```

```
## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept) 1.10e-16    0.0197  5.57e-15 1.00e+  0
## 2 gdp_Z       5.84e- 1    0.0197  2.97e+ 1 3.57e-156
```
A 1 standard deviation increase in `gdpPercap` will increase `lifeExp` by 0.584 standard deviations \((0.584 \times 12.92 = 7.55\) years\()\)

Example: Return again to:
$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$

Maybe region doesn't affect wages at all?

\(H_0: \beta_2=0, \, \beta_3=0, \, \beta_4=0\)

This is a joint hypothesis to test
A joint hypothesis tests against the null hypothesis of a value for multiple parameters: $$\mathbf{H_0: \beta_1= \beta_2=0}$$ the hypotheses that multiple regressors are equal to zero (have no causal effect on the outcome)
Our alternative hypothesis is that: $$H_1: \text{ either } \beta_1\neq0\text{ or } \beta_2\neq0\text{ or both}$$ or simply, that \(H_0\) is not true
1) \(H_0\): \(\beta_1=\beta_2=0\)

2) \(H_0\): \(\beta_1=\beta_2\)

3) \(H_0:\) ALL \(\beta\)'s \(=0\)
The F-statistic is the test-statistic used to test joint hypotheses about regression coefficients with an F-test

This involves comparing two models: an unrestricted model (all variables) and a restricted model (imposing the null hypothesis)

\(F\) is an analysis of variance (ANOVA)

\(F\) has its own distribution, with two sets of degrees of freedom
Example: Return again to:

$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$

\(H_0: \beta_2=\beta_3=\beta_4=0\)

\(H_a\): \(H_0\) is not true (at least one \(\beta_i \neq 0\))

Unrestricted model (all variables):

$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$

Restricted model (under \(H_0\)):

$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i$$
$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-\color{#6A5ACD}{R^2_r})}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}$$

\(\color{#e64173}{R^2_u}\): the \(R^2\) from the unrestricted model (all variables)

\(\color{#6A5ACD}{R^2_r}\): the \(R^2\) from the restricted model (under the null hypothesis)

\(q\): number of restrictions (number of \(\beta\)'s \(=0\) under the null hypothesis)

\(k\): number of \(X\) variables in the unrestricted model (all variables)

\(F\) has two sets of degrees of freedom: \(q\) in the numerator and \(n-k-1\) in the denominator
Key takeaway: The bigger the difference between \((R^2_u-R^2_r)\), the greater the improvement in fit by adding variables, the larger the \(F\)!
This formula is (believe it or not) actually a simplified version (assuming homoskedasticity)
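As a sketch of how this works, we could compute \(F\) by hand from the two models' \(R^2\); this assumes the homoskedastic formula above, applied to the `unrestricted_reg` and `restricted_reg` objects estimated from the `wage1` data below:

```r
# by-hand F-statistic from the two models' R-squared values
# (assumes unrestricted_reg and restricted_reg have been estimated)
r2_u <- summary(unrestricted_reg)$r.squared # unrestricted R^2
r2_r <- summary(restricted_reg)$r.squared   # restricted R^2

q <- 3   # number of restrictions
n <- 526 # observations
k <- 4   # regressors in the unrestricted model

F_stat <- ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))
F_stat # close to the value linearHypothesis() reports
```

Since both models share the same total sum of squares, this \(R^2\) version is algebraically the same as the sum-of-squares version that `linearHypothesis()` prints.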
Example: Let's use the `wooldridge` package's `wage1` data again

```r
# load in data from wooldridge package
library(wooldridge)
wages <- wage1

# run regressions
unrestricted_reg <- lm(wage ~ female + northcen + west + south, data = wages)
restricted_reg <- lm(wage ~ female, data = wages)
```
$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Female_i+\hat{\beta_2}Northcen_i+\hat{\beta_3}West_i+\hat{\beta_4}South_i$$
$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Female_i$$
\(H_0: \beta_2 = \beta_3 = \beta_4 =0\)
\(q = 3\) restrictions (F numerator df)
\(n-k-1 = 526-4-1=521\) (F denominator df)
Use the `car` package's `linearHypothesis()` command to run an \(F\)-test:

```r
# load car package for additional regression tools
library(car)

# F-test
linearHypothesis(unrestricted_reg, c("northcen", "west", "south"))
```

```
## Linear hypothesis test
## 
## Hypothesis:
## northcen = 0
## west = 0
## south = 0
## 
## Model 1: restricted model
## Model 2: wage ~ female + northcen + west + south
## 
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1    524 6332.2                                
## 2    521 6174.8  3    157.36 4.4258 0.004377 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
Example:

$$\widehat{wage_i}=\beta_0+\beta_1 \text{Adolescent height}_i + \beta_2 \text{Adult height}_i + \beta_3 \text{Male}_i$$

$$H_0: \beta_1=\beta_2$$

Under \(H_0\), the restricted model becomes:

$$\widehat{wage_i}=\beta_0+\beta_1(\text{Adolescent height}_i + \text{Adult height}_i )+ \beta_3 \text{Male}_i$$
```r
# load in data
heightwages <- read_csv("../data/heightwages.csv")

# make a "heights" variable as the sum of adolescent (height81) and adult (height85) height
heightwages <- heightwages %>%
  mutate(heights = height81 + height85)

height_reg <- lm(wage96 ~ height81 + height85 + male, data = heightwages)
height_restricted_reg <- lm(wage96 ~ heights + male, data = heightwages)
```
```r
linearHypothesis(height_reg, "height81 = height85") # F-test
```

```
## Linear hypothesis test
## 
## Hypothesis:
## height81 - height85 = 0
## 
## Model 1: restricted model
## Model 2: wage96 ~ height81 + height85 + male
## 
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1   6591 5128243                           
## 2   6590 5127284  1     959.2 1.2328 0.2669
```
Insufficient evidence to reject \(H_0\)!
We cannot reject that adolescent and adult height have the same effect on wages
```r
summary(unrestricted_reg)
```

```
## 
## Call:
## lm(formula = wage ~ female + northcen + west + south, data = wages)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3269 -2.0105 -0.7871  1.1898 17.4146 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.5654     0.3466  21.827   <2e-16 ***
## female       -2.5652     0.3011  -8.520   <2e-16 ***
## northcen     -0.5918     0.4362  -1.357   0.1755    
## west          0.4315     0.4838   0.892   0.3729    
## south        -1.0262     0.4048  -2.535   0.0115 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.443 on 521 degrees of freedom
## Multiple R-squared:  0.1376, Adjusted R-squared:  0.131 
## F-statistic: 20.79 on 4 and 521 DF,  p-value: 6.501e-16
```
The `F-statistic` reported at the bottom of `summary()` is the "All F-test" \((H_0:\) ALL \(\beta\)'s \(=0)\); if it is high enough, it is significant (`p-value` \(<0.05\)) enough to reject \(H_0\)

With `broom` instead of `summary()`:

The `glance()` command makes a table of regression summary statistics

`tidy()` only shows the coefficients

```r
library(broom)
glance(unrestricted_reg)
```

```
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.138         0.131  3.44      20.8 6.50e-16     4 -1394. 2800. 2826.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```
`statistic` is the All F-test; the `p.value` next to it is the p-value from the F-test
gapminder
examplegapminder
example$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$
gapminder
example$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$
$$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$$
gapminder
example$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$
$$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$$
$$\color{orange}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\ln \text{GDP}_i}$$
Another useful model for nonlinear data is the logarithmic model†
Logarithmic model has two additional advantages
† Don’t confuse this with a logistic (logit) model for dependent dummy variables.
The exponential function, \(Y=e^X\) or \(Y=exp(X)\), where base \(e=2.71828...\)
Natural logarithm is the inverse, \(Y=ln(X)\)
\(\ln(\frac{1}{x})=-\ln(x)\)
\(\ln(ab)=\ln(a)+\ln(b)\)
\(\ln(\frac{x}{a})=\ln(x)-\ln(a)\)
\(\ln(x^a)=a \, \ln(x)\)
\(\frac{d \, \ln \, x}{d \, x} = \frac{1}{x}\)
$$\underbrace{\ln(x+\Delta x) - \ln(x)}_{\text{Difference in logs}} \approx \underbrace{\frac{\Delta x}{x}}_{\text{Relative change}}$$
$$\underbrace{\ln(x+\Delta x) - \ln(x)}_{\text{Difference in logs}} \approx \underbrace{\frac{\Delta x}{x}}_{\text{Relative change}}$$
Example: Let \(x=100\) and \(\Delta x =1\), relative change is:
$$\frac{\Delta x}{x} = \frac{(101-100)}{100} = 0.01 \text{ or }1\%$$
$$\underbrace{\ln(x+\Delta x) - \ln(x)}_{\text{Difference in logs}} \approx \underbrace{\frac{\Delta x}{x}}_{\text{Relative change}}$$
Example: Let \(x=100\) and \(\Delta x =1\), relative change is:
$$\frac{\Delta x}{x} = \frac{(101-100)}{100} = 0.01 \text{ or }1\%$$
$$\epsilon_{Y,X}=\frac{\% \Delta Y}{\% \Delta X} =\cfrac{\left(\frac{\Delta Y}{Y}\right)}{\left( \frac{\Delta X}{X}\right)}$$
$$\epsilon_{Y,X}=\frac{\% \Delta Y}{\% \Delta X} =\cfrac{\left(\frac{\Delta Y}{Y}\right)}{\left( \frac{\Delta X}{X}\right)}$$
$$\epsilon_{Y,X}=\frac{\% \Delta Y}{\% \Delta X} =\cfrac{\left(\frac{\Delta Y}{Y}\right)}{\left( \frac{\Delta X}{X}\right)}$$
One of the (many) reasons why economists love Cobb-Douglas functions: $$Y=AL^{\alpha}K^{\beta}$$
Taking logs, relationship becomes linear:
One of the (many) reasons why economists love Cobb-Douglas functions: $$Y=AL^{\alpha}K^{\beta}$$
Taking logs, relationship becomes linear:
$$\ln(Y)=\ln(A)+\alpha \ln(L)+ \beta \ln(K)$$
One of the (many) reasons why economists love Cobb-Douglas functions: $$Y=AL^{\alpha}K^{\beta}$$
Taking logs, relationship becomes linear:
$$\ln(Y)=\ln(A)+\alpha \ln(L)+ \beta \ln(K)$$
Example: Cobb-Douglas production function: $$Y=2L^{0.75}K^{0.25}$$
Example: Cobb-Douglas production function: $$Y=2L^{0.75}K^{0.25}$$
$$\ln Y=\ln 2+0.75 \ln L + 0.25 \ln K$$
Example: Cobb-Douglas production function: $$Y=2L^{0.75}K^{0.25}$$
$$\ln Y=\ln 2+0.75 \ln L + 0.25 \ln K$$
A 1% change in \(L\) will yield a 0.75% change in output \(Y\)
A 1% change in \(K\) will yield a 0.25% change in output \(Y\)
log()
function can easily take the logarithmgapminder <- gapminder %>% mutate(loggdp = log(gdpPercap)) # log GDP per capitagapminder %>% head() # look at it
log()
by default is the natural logarithm \(ln()\), i.e. base e
log(x, base = 5)
log10
, log2
log10(100)
## [1] 2
log2(16)
## [1] 4
log(19683, base=3)
## [1] 9
log()
around a variable in the regressionlm(lifeExp ~ loggdp, data = gapminder) %>% tidy()
lm(lifeExp ~ log(gdpPercap), data = gapminder) %>% tidy()
Linear-log model: \(Y_i=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)
Log-linear model: \(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1X_i\)
Linear-log model: \(Y_i=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)
Log-linear model: \(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1X_i\)
Log-log model: \(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)
$$\begin{align*} Y&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\Delta Y}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}$$
$$\begin{align*} Y&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\Delta Y}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}$$
lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder)library(broom)lin_log_reg %>% tidy()
$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$
lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder)library(broom)lin_log_reg %>% tidy()
$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$
lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder)library(broom)lin_log_reg %>% tidy()
$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$
A 1% change in GDP \(\rightarrow\) a \(\frac{9.41}{100}=\) 0.0841 year increase in Life Expectancy
A 25% fall in GDP \(\rightarrow\) a \((-25 \times 0.0841)=\) 2.1025 year decrease in Life Expectancy
lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder)library(broom)lin_log_reg %>% tidy()
$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$
A 1% change in GDP \(\rightarrow\) a \(\frac{9.41}{100}=\) 0.0841 year increase in Life Expectancy
A 25% fall in GDP \(\rightarrow\) a \((-25 \times 0.0841)=\) 2.1025 year decrease in Life Expectancy
A 100% rise in GDP \(\rightarrow\) a \((100 \times 0.0841)=\) 8.4100 year increase in Life Expectancy
ggplot(data = gapminder)+ aes(x = gdpPercap, y = lifeExp)+ geom_point(color="blue", alpha=0.5)+ geom_smooth(method="lm", formula=y~log(x), color="orange")+ scale_x_continuous(labels=scales::dollar, breaks=seq(0,120000,20000))+ scale_y_continuous(breaks=seq(0,100,10), limits=c(0,100))+ labs(x = "GDP per Capita", y = "Life Expectancy (Years)")+ ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16)
ggplot(data = gapminder)+ aes(x = loggdp, y = lifeExp)+ geom_point(color="blue", alpha=0.5)+ geom_smooth(method="lm", color="orange")+ scale_y_continuous(breaks=seq(0,100,10), limits=c(0,100))+ labs(x = "Log GDP per Capita", y = "Life Expectancy (Years)")+ ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16)
$$\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 X\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\Delta X}\\ \end{align*}$$
$$\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 X\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\Delta X}\\ \end{align*}$$
We will again have very large/small coefficients if we deal with GDP directly, again let's transform gdpPercap
into $1,000s, call it gdp_t
Then log LifeExp
We will again have very large/small coefficients if we deal with GDP directly, again let's transform gdpPercap
into $1,000s, call it gdp_t
Then log LifeExp
gapminder <- gapminder %>% mutate(gdp_t = gdpPercap/1000, # first make GDP/capita in $1000s loglife = log(lifeExp)) # take the log of LifeExpgapminder %>% head() # look at it
```r
log_lin_reg <- lm(loglife ~ gdp_t, data = gapminder)
log_lin_reg %>% tidy()
```

$$\widehat{\ln \text{Life Expectancy}}_i=3.967+0.013 \, \text{GDP}_i$$
A $1 (thousand) change in GDP \(\rightarrow\) a \(0.013 \times 100\%=\) 1.3% increase in Life Expectancy
A $25 (thousand) fall in GDP \(\rightarrow\) a \((-25 \times 1.3\%)=\) 32.5% decrease in Life Expectancy
A $100 (thousand) rise in GDP \(\rightarrow\) a \((100 \times 1.3\%)=\) 130% increase in Life Expectancy
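One caveat worth noting: these percentage interpretations rest on the approximation \(\ln(1+\Delta) \approx \Delta\), which is accurate only for small changes. For a large change like the $100 (thousand) rise, the exact change implied by the model is

$$\% \Delta \widehat{Y} = 100 \times \left(e^{\hat{\beta_1} \Delta X}-1\right) = 100 \times \left(e^{0.013 \times 100}-1\right) \approx 267\%$$

so the 130% figure is best read as a rough approximation that deteriorates for large changes in \(X\).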
```r
ggplot(data = gapminder)+
  aes(x = gdp_t, y = loglife)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  scale_x_continuous(labels = scales::dollar, breaks = seq(0, 120, 20))+
  labs(x = "GDP per Capita ($ Thousands)", y = "Log Life Expectancy")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size = 16)
```
$$\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}$$
Marginal effect of \(\mathbf{X \rightarrow Y}\): a 1% change in \(X \rightarrow\) a \(\beta_1\) % change in \(Y\)
\(\beta_1\) is the elasticity of \(Y\) with respect to \(X\)!
```r
log_log_reg <- lm(loglife ~ loggdp, data = gapminder)
log_log_reg %>% tidy()
```

$$\widehat{\ln \text{Life Expectancy}}_i=2.864+0.147 \, \ln \text{GDP}_i$$
A 1% change in GDP \(\rightarrow\) a 0.147% increase in Life Expectancy
A 25% fall in GDP \(\rightarrow\) a \((-25 \times 0.147\%)=\) 3.675% decrease in Life Expectancy
A 100% rise in GDP \(\rightarrow\) a \((100 \times 0.147\%)=\) 14.7% increase in Life Expectancy
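The same caveat applies here: the elasticity interpretation is an approximation that weakens for large percent changes. The exact prediction of the log-log model for a 100% (doubling) rise in GDP is

$$\% \Delta \widehat{Y} = 100 \times \left(2^{0.147}-1\right) \approx 10.7\%$$

somewhat less than the 14.7% the approximation suggests.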
```r
ggplot(data = gapminder)+
  aes(x = loggdp, y = loglife)+
  geom_point(color = "blue", alpha = 0.5)+
  geom_smooth(method = "lm", color = "orange")+
  labs(x = "Log GDP per Capita", y = "Log Life Expectancy")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size = 16)
```
Model | Equation | Interpretation |
---|---|---|
Linear-Log | \(Y=\beta_0+\beta_1 \color{#e64173}{\ln X}\) | 1% change in \(X \rightarrow \frac{\hat{\beta_1}}{100}\) unit change in \(Y\) |
Log-Linear | \(\color{#e64173}{\ln Y}=\beta_0+\beta_1X\) | 1 unit change in \(X \rightarrow \hat{\beta_1}\times 100\)% change in \(Y\) |
Log-Log | \(\color{#e64173}{\ln Y}=\beta_0+\beta_1\color{#e64173}{\ln X}\) | 1% change in \(X \rightarrow \hat{\beta_1}\)% change in \(Y\) |
```r
library(huxtable)
huxreg("Life Exp." = lin_log_reg,
       "Log Life Exp." = log_lin_reg,
       "Log Life Exp." = log_log_reg,
       coefs = c("Constant" = "(Intercept)",
                 "GDP ($1000s)" = "gdp_t",
                 "Log GDP" = "loggdp"),
       statistics = c("N" = "nobs",
                      "R-Squared" = "r.squared",
                      "SER" = "sigma"),
       number_format = 2)
```
| | Life Exp. | Log Life Exp. | Log Life Exp. |
---|---|---|---|
Constant | -9.10 *** | 3.97 *** | 2.86 *** |
 | (1.23) | (0.01) | (0.02) |
GDP ($1000s) | | 0.01 *** | |
 | | (0.00) | |
Log GDP | 8.41 *** | | 0.15 *** |
 | (0.15) | | (0.00) |
N | 1704 | 1704 | 1704 |
R-Squared | 0.65 | 0.30 | 0.61 |
SER | 7.62 | 0.19 | 0.14 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
Linear-Log | Log-Linear | Log-Log |
---|---|---|
*(scatterplot with fitted curve)* | *(scatterplot with fitted curve)* | *(scatterplot with fitted curve)* |
\(\hat{Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}\) | \(\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}X_i\) | \(\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}\) |
\(R^2=0.65\) | \(R^2=0.30\) | \(R^2=0.61\) |
$$\hat{Y_i}=\beta_0+\beta_1 X_1+\beta_2 X_2 $$
We often want to compare coefficients to see which variable \(X_1\) or \(X_2\) has a bigger effect on \(Y\)
What if \(X_1\) and \(X_2\) are different units?
Example: $$\begin{align*} \widehat{\text{Salary}_i}&=\beta_0+\beta_1\, \text{Batting average}_i+\beta_2\, \text{Home runs}_i\\ \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36}\, \text{Home runs}_i\\ \end{align*}$$
$$X_Z=\frac{X_i-\overline{X}}{sd(X)}$$
† Also called “centering” or “scaling.”
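A minimal sketch of this formula in R (with hypothetical data): standardizing subtracts the mean and divides by the standard deviation, which is exactly what R's built-in scale() function does.

```r
# standardize ("z-score") a variable by hand: z = (x - mean(x)) / sd(x)
x <- c(10, 20, 30, 40, 50)          # hypothetical data
x_Z <- (x - mean(x)) / sd(x)

# a standardized variable has mean 0 and standard deviation 1
mean(x_Z)
sd(x_Z)

# and matches R's built-in scale()
all.equal(x_Z, as.numeric(scale(x)))
```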
Variable | Mean | Std. Dev. |
---|---|---|
Salary | $2,024,616 | $2,764,512 |
Batting Average | 0.267 | 0.031 |
Home Runs | 12.11 | 10.31 |
$$\begin{align*}\scriptsize \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36} \, \text{Home runs}_i\\ \widehat{\text{Salary}_Z}&=\text{0.00}+\text{0.14} \, \text{Batting average}_Z+\text{0.48} \, \text{Home runs}_Z\\ \end{align*}$$
Marginal effects on \(Y\) (in standard deviations of \(Y\)) from 1 standard deviation change in \(X\):
\(\hat{\beta_1}\): a 1 standard deviation increase in Batting Average increases Salary by 0.14 standard deviations of Salary:

$$0.14 \times \$2,764,512=\$387,032$$

\(\hat{\beta_2}\): a 1 standard deviation increase in Home Runs increases Salary by 0.48 standard deviations of Salary:

$$0.48 \times \$2,764,512=\$1,326,966$$
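As a check (a sketch using the values from the table above; the identity itself is standard, not stated in the slides): a standardized coefficient equals the raw coefficient times \(sd(X)/sd(Y)\).

```r
# standardized beta = raw beta * sd(X) / sd(Y), using the reported values
sd_salary <- 2764512
raw_ba <- 12417629.72; sd_ba <- 0.031   # batting average
raw_hr <- 129627.36;   sd_hr <- 10.31   # home runs

round(raw_ba * sd_ba / sd_salary, 2)  # 0.14
round(raw_hr * sd_hr / sd_salary, 2)  # 0.48
```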
In R, use the scale() command inside the mutate() function to standardize a variable:

Variable | Mean | SD |
---|---|---|
lifeExp | 59.47 | 12.92 |
gdpPercap | $7,215.32 | $9,857.46 |

```r
gapminder <- gapminder %>%
  mutate(life_Z = scale(lifeExp),   # standardized life expectancy
         gdp_Z = scale(gdpPercap))  # standardized GDP per capita

std_reg <- lm(life_Z ~ gdp_Z, data = gapminder)
tidy(std_reg)
```

## # A tibble: 2 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept) 1.10e-16    0.0197  5.57e-15 1.00e+  0
## 2 gdp_Z       5.84e- 1    0.0197  2.97e+ 1 3.57e-156
A 1 standard deviation increase in gdpPercap will increase lifeExp by 0.584 standard deviations \((0.584 \times 12.92 = 7.55\) years)
Example: Return again to:
$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$
Maybe region doesn't affect wages at all?
\(H_0: \beta_2=0, \, \beta_3=0, \, \beta_4=0\)
This is a joint hypothesis to test
A joint hypothesis specifies values for multiple parameters at once under the null: $$\mathbf{H_0: \beta_1= \beta_2=0}$$ here, the hypothesis that multiple regressors are all equal to zero (have no causal effect on the outcome)
Our alternative hypothesis is that: $$H_1: \text{ either } \beta_1\neq0\text{ or } \beta_2\neq0\text{ or both}$$ or simply, that \(H_0\) is not true
1) \(H_0\): \(\beta_1=\beta_2=0\)
2) \(H_0\): \(\beta_1=\beta_2\)
3) \(H_0:\) ALL \(\beta\)'s \(=0\)
The F-statistic is the test-statistic used to test joint hypotheses about regression coefficients with an F-test
This involves comparing two models:
\(F\) is an analysis of variance (ANOVA)
\(F\) has its own distribution, with two sets of degrees of freedom
Example: Return again to:

$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$

\(H_0: \beta_2=\beta_3=\beta_4=0\)

\(H_a\): \(H_0\) is not true (at least one \(\beta_i \neq 0\))

Unrestricted model: $$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$

Restricted model: $$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i$$
$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-\color{#6A5ACD}{R^2_r})}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}$$
\(\color{#e64173}{R^2_u}\): the \(R^2\) from the unrestricted model (all variables)
\(\color{#6A5ACD}{R^2_r}\): the \(R^2\) from the restricted model (null hypothesis)
\(q\): number of restrictions (number of \(\beta's=0\) under null hypothesis)
\(k\): number of \(X\) variables in unrestricted model (all variables)
\(F\) has two sets of degrees of freedom:
$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(R^2_u-R^2_r)}{q}\right)}{\left(\displaystyle\frac{(1-R^2_u)}{(n-k-1)}\right)}$$
Key takeaway: The bigger the difference between \((R^2_u-R^2_r)\), the greater the improvement in fit by adding variables, the larger the \(F\)!
This formula is (believe it or not) actually a simplified version (assuming homoskedasticity)
We'll use the wooldridge package's wage1 data again:

```r
# load in data from wooldridge package
library(wooldridge)
wages <- wage1

# run regressions
unrestricted_reg <- lm(wage ~ female + northcen + west + south, data = wages)
restricted_reg <- lm(wage ~ female, data = wages)
```
Unrestricted model: $$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Female_i+\hat{\beta_2}Northcen_i+\hat{\beta_3}West_i+\hat{\beta_4}South_i$$

Restricted model: $$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Female_i$$
\(H_0: \beta_2 = \beta_3 = \beta_4 =0\)
\(q = 3\) restrictions (F numerator df)
\(n-k-1 = 526-4-1=521\) (F denominator df)
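With these degrees of freedom we can look up the 5% critical value directly; a quick sketch using base R's qf() quantile function:

```r
# 5% critical value for F with q = 3 numerator and n - k - 1 = 521 denominator df
qf(0.95, df1 = 3, df2 = 521)  # about 2.62; reject H0 if the F-statistic exceeds this
```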
Use the car package's linearHypothesis() command to run an \(F\)-test:

```r
# load car package for additional regression tools
library(car)

# F-test
linearHypothesis(unrestricted_reg, c("northcen", "west", "south"))
```

## Linear hypothesis test
##
## Hypothesis:
## northcen = 0
## west = 0
## south = 0
##
## Model 1: restricted model
## Model 2: wage ~ female + northcen + west + south
##
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)
## 1    524 6332.2
## 2    521 6174.8  3    157.36 4.4258 0.004377 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
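As a sanity check (a sketch, not part of the original output), we can recover this F-statistic by hand from the R-squared formula, using reported values for this regression (unrestricted \(R^2 = 0.1376\); residual sums of squares of 6332.2 restricted and 6174.8 unrestricted):

```r
# rebuild the F-statistic from reported quantities (base R only)
R2_u  <- 0.1376             # unrestricted R-squared
RSS_u <- 6174.8             # unrestricted residual sum of squares
RSS_r <- 6332.2             # restricted residual sum of squares

# both models share the same total sum of squares, so we can back out R2_r
TSS  <- RSS_u / (1 - R2_u)
R2_r <- 1 - RSS_r / TSS

q <- 3; n <- 526; k <- 4
F_stat <- ((R2_u - R2_r) / q) / ((1 - R2_u) / (n - k - 1))
round(F_stat, 2)            # about 4.43, matching linearHypothesis()
```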
Example:
$$\widehat{wage_i}=\beta_0+\beta_1 \text{Adolescent height}_i + \beta_2 \text{Adult height}_i + \beta_3 \text{Male}_i$$
$$H_0: \beta_1=\beta_2$$
$$\widehat{wage_i}=\beta_0+\beta_1(\text{Adolescent height}_i + \text{Adult height}_i )+ \beta_3 \text{Male}_i$$
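To see why the restricted model takes this form, impose the null \(H_0: \beta_1=\beta_2\) on the unrestricted model and factor out the common coefficient:

$$\begin{align*} \widehat{wage_i}&=\beta_0+\beta_1 \text{Adolescent height}_i + \beta_1 \text{Adult height}_i + \beta_3 \text{Male}_i\\ &=\beta_0+\beta_1(\text{Adolescent height}_i + \text{Adult height}_i) + \beta_3 \text{Male}_i\\ \end{align*}$$

This single equality counts as \(q=1\) restriction in the F-test's numerator degrees of freedom.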
```r
# load in data
heightwages <- read_csv("../data/heightwages.csv")

# make a "heights" variable as the sum of adolescent (height81) and adult (height85) height
heightwages <- heightwages %>%
  mutate(heights = height81 + height85)

height_reg <- lm(wage96 ~ height81 + height85 + male, data = heightwages)
height_restricted_reg <- lm(wage96 ~ heights + male, data = heightwages)
```
```r
linearHypothesis(height_reg, "height81 = height85")  # F-test
```

## Linear hypothesis test
##
## Hypothesis:
## height81 - height85 = 0
##
## Model 1: restricted model
## Model 2: wage96 ~ height81 + height85 + male
##
##   Res.Df     RSS Df Sum of Sq      F Pr(>F)
## 1   6591 5128243
## 2   6590 5127284  1     959.2 1.2328 0.2669
Insufficient evidence to reject \(H_0\)!

The data are consistent with adolescent height and adult height having the same effect on wages
```r
summary(unrestricted_reg)
```

##
## Call:
## lm(formula = wage ~ female + northcen + west + south, data = wages)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.3269 -2.0105 -0.7871  1.1898 17.4146
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   7.5654     0.3466  21.827   <2e-16 ***
## female       -2.5652     0.3011  -8.520   <2e-16 ***
## northcen     -0.5918     0.4362  -1.357   0.1755
## west          0.4315     0.4838   0.892   0.3729
## south        -1.0262     0.4048  -2.535   0.0115 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.443 on 521 degrees of freedom
## Multiple R-squared:  0.1376, Adjusted R-squared:  0.131
## F-statistic: 20.79 on 4 and 521 DF,  p-value: 6.501e-16
The last line of summary() output reports the "All F-test": an F-statistic that, if high enough (p-value \(<0.05\)), is significant enough to reject \(H_0\) that all coefficients equal zero

With broom instead of summary():

the glance() command makes a table of regression summary statistics

tidy() only shows coefficients

```r
library(broom)
glance(unrestricted_reg)
```
## # A tibble: 1 × 12
##   r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl> <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.138         0.131  3.44      20.8 6.50e-16     4 -1394. 2800. 2826.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
In the glance() output, statistic is the All F-test statistic, and the p.value next to it is the p-value from that F-test