class: center, middle, inverse, title-slide # 3.9 — Logarithmic Regression ## ECON 480 • Econometrics • Fall 2021 ### Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsF21
metricsF21.classes.ryansafner.com
--- class: inverse # Outline ### [Natural Logarithms](#7) ### [Linear-Log Model](#37) ### [Log-Linear Model](#47) ### [Log-Log Model](#58) ### [Comparing Across Units](#72) ### [Joint Hypothesis Testing](#78) --- # Nonlinearities .pull-left[ - Consider the `gapminder` example ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-1-1.png" width="504" /> ] --- # Nonlinearities .pull-left[ - Consider the `gapminder` example .quitesmall[ `$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$` ] ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-2-1.png" width="504" /> ] --- # Nonlinearities .pull-left[ - Consider the `gapminder` example .quitesmall[ `$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$` `$$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$$` ] ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-3-1.png" width="504" /> ] --- # Nonlinearities .pull-left[ - Consider the `gapminder` example .quitesmall[ `$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$` `$$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$$` `$$\color{orange}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\ln \text{GDP}_i}$$` ] ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-4-1.png" width="504" /> ] --- class: inverse, center, middle # Natural Logarithms --- # Logarithmic Models .pull-left[ .smallest[ - Another useful model for nonlinear data is the .hi[logarithmic model]<sup>.magenta[†]</sup> - We transform either `\(X\)`, `\(Y\)`, or *both* by taking the .hi-purple[(natural) logarithm] - Logarithmic model has two additional advantages 1. We can easily interpret coefficients as **percentage changes** or **elasticities** 2. 
Useful economic shape: diminishing returns (production functions, utility functions, etc) ] .tiny[<sup>.magenta[†]</sup> Don’t confuse this with a .hi[logistic (logit) model] for *dependent* dummy variables.] ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-5-1.png" width="504" /> ] --- # The Natural Logarithm .pull-left[ <img src="3.9-slides_files/figure-html/unnamed-chunk-6-1.png" width="504" /> ] .pull-right[ - The .red[exponential function], `\(Y=e^X\)` or `\(Y=exp(X)\)`, where base `\(e=2.71828...\)` - .blue[Natural logarithm] is the inverse, `\(Y=ln(X)\)` ] --- # The Natural Logarithm: Review I .smallest[ - .hi[Exponents] are defined as `$$\color{#6A5ACD}{b}^{\color{#e64173}{n}}=\underbrace{\color{#6A5ACD}{b} \times \color{#6A5ACD}{b} \times \cdots \times \color{#6A5ACD}{b}}_{\color{#e64173}{n} \text{ times}}$$` - where base `\(\color{#6A5ACD}{b}\)` is multiplied by itself `\(\color{#e64173}{n}\)` times ] -- .smallest[ - .green[**Example**]: `\(\color{#6A5ACD}{2}^{\color{#e64173}{3}}=\underbrace{\color{#6A5ACD}{2} \times \color{#6A5ACD}{2} \times \color{#6A5ACD}{2}}_{\color{#e64173}{n=3}}=\color{#314f4f}{8}\)` ] -- .smallest[ - .hi-purple[Logarithms] are the inverse, defined as the exponents in the expressions above `$$\text{If } \color{#6A5ACD}{b}^{\color{#e64173}{n}}=\color{#314f4f}{y}\text{, then }log_{\color{#6A5ACD}{b}}(\color{#314f4f}{y})=\color{#e64173}{n}$$` - `\(\color{#e64173}{n}\)` is the number you must raise `\(\color{#6A5ACD}{b}\)` to in order to get `\(\color{#314f4f}{y}\)` ] -- .smallest[ - .green[**Example**]: `\(log_{\color{#6A5ACD}{2}}(\color{#314f4f}{8})=\color{#e64173}{3}\)` ] --- # The Natural Logarithm: Review II - Logarithms can have any base, but common to use the .hi-purple[natural logarithm] `\((\ln)\)` with base `\(\mathbf{e=2.71828...}\)` `$$\text{If } e^n=y\text{, then } \ln(y)=n$$` --- # The Natural Logarithm: Properties - Natural logs have a lot of useful properties: 1. `\(\ln(\frac{1}{x})=-\ln(x)\)` 2. 
`\(\ln(ab)=\ln(a)+\ln(b)\)` 3. `\(\ln(\frac{x}{a})=\ln(x)-\ln(a)\)` 4. `\(\ln(x^a)=a \, \ln(x)\)` 5. `\(\frac{d \, \ln \, x}{d \, x} = \frac{1}{x}\)` --- # The Natural Logarithm: Example .smallest[ - Most useful property: for a small change in `\(x\)`, `\(\Delta x\)`: `$$\underbrace{\ln(x+\Delta x) - \ln(x)}_{\text{Difference in logs}} \approx \underbrace{\frac{\Delta x}{x}}_{\text{Relative change}}$$` ] -- .smallest[ .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: Let `\(x=100\)` and `\(\Delta x =1\)`, relative change is: `$$\frac{\Delta x}{x} = \frac{(101-100)}{100} = 0.01 \text{ or }1\%$$` - The logged difference: `$$\ln(101)-\ln(100) = 0.00995 \approx 1\%$$` ] ] -- .smallest[ - This allows us to very easily interpret coefficients as **percent changes** or .hi-purple[elasticities] ] --- # Elasticity .smallest[ - An .hi[elasticity] between any two variables, `\(\epsilon_{Y,X}\)` describes the .hi-purple[responsiveness] (in %) of one variable `\((Y)\)` to a change in another `\((X)\)` ] -- `$$\epsilon_{Y,X}=\frac{\% \Delta Y}{\% \Delta X} =\cfrac{\left(\frac{\Delta Y}{Y}\right)}{\left( \frac{\Delta X}{X}\right)}$$` -- .smallest[ - Numerator is the relative change in `\(Y\)`, denominator is the relative change in `\(X\)` ] -- .smallest[ - .hi-purple[Interpretation]: a 1% change in `\(X\)` will cause an `\(\epsilon_{Y,X}\)`% change in `\(Y\)` ] --- # Math FYI: Cobb Douglas Functions and Logs - One of the (many) reasons why economists love Cobb-Douglas functions: `$$Y=AL^{\alpha}K^{\beta}$$` -- - Taking logs, the relationship becomes linear: -- `$$\ln(Y)=\ln(A)+\alpha \ln(L)+ \beta \ln(K)$$` -- - With data on `\((Y, L, K)\)` and linear regression, can estimate `\(\alpha\)` and `\(\beta\)` - `\(\alpha\)`: elasticity of `\(Y\)` with respect to `\(L\)` - A 1% change in `\(L\)` will lead to an `\(\alpha\)`% change in `\(Y\)` - `\(\beta\)`: elasticity of `\(Y\)` with respect to `\(K\)` - A 1% change in `\(K\)` will lead to a `\(\beta\)`% change in `\(Y\)`
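---

# Math FYI: Cobb Douglas Functions and Logs in R (Sketch)

- A minimal sketch of the idea, using *simulated* data: the sample size, the ranges of `L` and `K`, the noise level, and the "true" values `A = 2`, `alpha = 0.75`, `beta = 0.25` are all assumptions chosen for illustration, not course data
- If output really follows a Cobb-Douglas form, regressing `log(Y)` on `log(L)` and `log(K)` should roughly recover `\(\alpha\)` and `\(\beta\)` as the slopes

```r
# Simulated inputs (hypothetical values, for illustration only)
set.seed(480)
n <- 500
L <- runif(n, min = 10, max = 100)   # labor
K <- runif(n, min = 10, max = 100)   # capital

# Assumed "true" parameters of the production function
A <- 2
alpha <- 0.75
beta <- 0.25

# Cobb-Douglas output with small multiplicative noise
u <- rnorm(n, mean = 0, sd = 0.05)
Y <- A * L^alpha * K^beta * exp(u)

# Taking logs makes the model linear in alpha and beta
cd_reg <- lm(log(Y) ~ log(L) + log(K))

# Slopes should be near alpha and beta; intercept near log(A)
coef(cd_reg)
```

- The slope on `log(L)` is read as an elasticity: a 1% change in `L` goes with roughly that many percent change in `Y`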
--- # Math FYI: Cobb Douglas Functions and Logs .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .hi-green[Example]: Cobb-Douglas production function: `$$Y=2L^{0.75}K^{0.25}$$` ] -- - Taking logs: `$$\ln Y=\ln 2+0.75 \ln L + 0.25 \ln K$$` -- - A 1% change in `\(L\)` will yield a 0.75% change in output `\(Y\)` - A 1% change in `\(K\)` will yield a 0.25% change in output `\(Y\)` --- # Logarithms in R I - The `log()` function can easily take the logarithm .smallest[ ```r gapminder <- gapminder %>% mutate(loggdp = log(gdpPercap)) # log GDP per capita gapminder %>% head() # look at it ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["country"],"name":[1],"type":["fct"],"align":["left"]},{"label":["continent"],"name":[2],"type":["fct"],"align":["left"]},{"label":["year"],"name":[3],"type":["int"],"align":["right"]},{"label":["lifeExp"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["pop"],"name":[5],"type":["int"],"align":["right"]},{"label":["gdpPercap"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["loggdp"],"name":[7],"type":["dbl"],"align":["right"]}],"data":[{"1":"Afghanistan","2":"Asia","3":"1952","4":"28.801","5":"8425333","6":"779.4453","7":"6.658583"},{"1":"Afghanistan","2":"Asia","3":"1957","4":"30.332","5":"9240934","6":"820.8530","7":"6.710344"},{"1":"Afghanistan","2":"Asia","3":"1962","4":"31.997","5":"10267083","6":"853.1007","7":"6.748878"},{"1":"Afghanistan","2":"Asia","3":"1967","4":"34.020","5":"11537966","6":"836.1971","7":"6.728864"},{"1":"Afghanistan","2":"Asia","3":"1972","4":"36.088","5":"13079460","6":"739.9811","7":"6.606625"},{"1":"Afghanistan","2":"Asia","3":"1977","4":"38.438","5":"14880372","6":"786.1134","7":"6.667101"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- # Logarithms in R II - Note, `log()` by default is the **natural logarithm `\(ln()\)`**, i.e. 
base `e` - Can change base with e.g. `log(x, base = 5)` - Some common built-in logs: `log10`, `log2` ```r log10(100) ``` ``` ## [1] 2 ``` ```r log2(16) ``` ``` ## [1] 4 ``` ```r log(19683, base=3) ``` ``` ## [1] 9 ``` --- # Logarithms in R III - Note when running a regression, you can pre-transform the data into logs (as I did above), or just add `log()` around a variable in the regression .pull-left[ .tiny[ ```r lm(lifeExp ~ loggdp, data = gapminder) %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"-9.100889","3":"1.227674","4":"-7.413117","5":"1.934812e-13"},{"1":"loggdp","2":"8.405085","3":"0.148762","4":"56.500206","5":"0.000000e+00"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .tiny[ ```r lm(lifeExp ~ log(gdpPercap), data = gapminder) %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"-9.100889","3":"1.227674","4":"-7.413117","5":"1.934812e-13"},{"1":"log(gdpPercap)","2":"8.405085","3":"0.148762","4":"56.500206","5":"0.000000e+00"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] --- # 
Types of Logarithmic Models - Three types of log regression models, depending on which variables we log -- 1. .hi-purple[Linear-log model:] `\(Y_i=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)` -- 2. .hi-purple[Log-linear model:] `\(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1X_i\)` -- 3. .hi-purple[Log-log model:] `\(\color{#e64173}{\ln Y_i}=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\)` --- class: inverse, center, middle # Linear-Log Model --- # Linear-Log Model - .hi-purple[Linear-log model] has an independent variable `\((X)\)` that is logged -- `$$\begin{align*} Y&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\Delta Y}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}$$` -- - .hi-purple[**Marginal effect of** `\\(\mathbf{X \rightarrow Y}\\)`: a **1%** change in `\\(X \rightarrow\\)` a `\\(\frac{\beta_1}{100}\\)` **unit** change in `\\(Y\\)`] --- # Linear-Log Model in R .pull-left[ .tiny[ ```r lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder) library(broom) lin_log_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"-9.100889","3":"1.227674","4":"-7.413117","5":"1.934812e-13"},{"1":"loggdp","2":"8.405085","3":"0.148762","4":"56.500206","5":"0.000000e+00"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$` ] ] --- # Linear-Log Model in R .pull-left[ .tiny[ ```r lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder) library(broom) lin_log_reg %>% tidy() ``` <div data-pagedtable="false"> 
<script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"-9.100889","3":"1.227674","4":"-7.413117","5":"1.934812e-13"},{"1":"loggdp","2":"8.405085","3":"0.148762","4":"56.500206","5":"0.000000e+00"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$` - A **1% change in GDP** `\(\rightarrow\)` a `\(\frac{8.41}{100}=\)` **0.0841 year increase** in Life Expectancy ] ] --- # Linear-Log Model in R .pull-left[ .tiny[ ```r lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder) library(broom) lin_log_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"-9.100889","3":"1.227674","4":"-7.413117","5":"1.934812e-13"},{"1":"loggdp","2":"8.405085","3":"0.148762","4":"56.500206","5":"0.000000e+00"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$` - A **1% change in GDP** `\(\rightarrow\)` a `\(\frac{8.41}{100}=\)` **0.0841 year increase** in Life Expectancy - A **25% fall in GDP**
`\(\rightarrow\)` a `\((-25 \times 0.0841)=\)` **2.1025 year _decrease_** in Life Expectancy ] ] --- # Linear-Log Model in R .pull-left[ .tiny[ ```r lin_log_reg <- lm(lifeExp ~ loggdp, data = gapminder) library(broom) lin_log_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"-9.100889","3":"1.227674","4":"-7.413117","5":"1.934812e-13"},{"1":"loggdp","2":"8.405085","3":"0.148762","4":"56.500206","5":"0.000000e+00"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\text{Life Expectancy}}_i=-9.10+8.41 \, \text{ln GDP}_i$$` - A **1% change in GDP** `\(\rightarrow\)` a `\(\frac{8.41}{100}=\)` **0.0841 year increase** in Life Expectancy - A **25% fall in GDP** `\(\rightarrow\)` a `\((-25 \times 0.0841)=\)` **2.1025 year _decrease_** in Life Expectancy - A **100% rise in GDP** `\(\rightarrow\)` a `\((100 \times 0.0841)=\)` **8.4100 year increase** in Life Expectancy ] ] --- # Linear-Log Model Graph I .pull-left[ .code50[ ```r ggplot(data = gapminder)+ aes(x = gdpPercap, y = lifeExp)+ geom_point(color="blue", alpha=0.5)+ * geom_smooth(method="lm", * formula=y~log(x), * color="orange")+ scale_x_continuous(labels=scales::dollar, breaks=seq(0,120000,20000))+ scale_y_continuous(breaks=seq(0,100,10), limits=c(0,100))+ labs(x = "GDP per Capita", y = "Life Expectancy (Years)")+ ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16) ``` ] ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-16-1.png" width="504" /> ] --- #
Linear-Log Model Graph II .pull-left[ .code50[ ```r ggplot(data = gapminder)+ * aes(x = loggdp, y = lifeExp)+ geom_point(color="blue", alpha=0.5)+ * geom_smooth(method="lm", color="orange")+ scale_y_continuous(breaks=seq(0,100,10), limits=c(0,100))+ labs(x = "Log GDP per Capita", y = "Life Expectancy (Years)")+ ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16) ``` ] ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-17-1.png" width="504" /> ] --- class: inverse, center, middle # Log-Linear Model --- # Log-Linear Model - .hi-purple[Log-linear model] has the dependent variable `\((Y)\)` logged -- `$$\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 X_i\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\Delta X}\\ \end{align*}$$` -- - .hi-purple[**Marginal effect of** `\\(\mathbf{X \rightarrow Y}\\)`: a **1 unit** change in `\\(X \rightarrow\\)` a `\\(\beta_1 \times 100\\)` **%** change in `\\(Y\\)`] --- # Log-Linear Model in R (Preliminaries) .smallest[ - We would again get awkwardly scaled (very large or very small) coefficients if we used GDP per capita directly, so let's again transform `gdpPercap` into $1,000s and call it `gdp_t` - Then take the log of `lifeExp` ] -- .quitesmall[ ```r gapminder <- gapminder %>% mutate(gdp_t = gdpPercap/1000, # first make GDP/capita in $1000s loglife = log(lifeExp)) # take the log of LifeExp gapminder %>% head() # look at it ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json">
{"columns":[{"label":["country"],"name":[1],"type":["fct"],"align":["left"]},{"label":["continent"],"name":[2],"type":["fct"],"align":["left"]},{"label":["year"],"name":[3],"type":["int"],"align":["right"]},{"label":["lifeExp"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["pop"],"name":[5],"type":["int"],"align":["right"]},{"label":["gdpPercap"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["loggdp"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["gdp_t"],"name":[8],"type":["dbl"],"align":["right"]},{"label":["loglife"],"name":[9],"type":["dbl"],"align":["right"]}],"data":[{"1":"Afghanistan","2":"Asia","3":"1952","4":"28.801","5":"8425333","6":"779.4453","7":"6.658583","8":"0.7794453","9":"3.360410"},{"1":"Afghanistan","2":"Asia","3":"1957","4":"30.332","5":"9240934","6":"820.8530","7":"6.710344","8":"0.8208530","9":"3.412203"},{"1":"Afghanistan","2":"Asia","3":"1962","4":"31.997","5":"10267083","6":"853.1007","7":"6.748878","8":"0.8531007","9":"3.465642"},{"1":"Afghanistan","2":"Asia","3":"1967","4":"34.020","5":"11537966","6":"836.1971","7":"6.728864","8":"0.8361971","9":"3.526949"},{"1":"Afghanistan","2":"Asia","3":"1972","4":"36.088","5":"13079460","6":"739.9811","7":"6.606625","8":"0.7399811","9":"3.585960"},{"1":"Afghanistan","2":"Asia","3":"1977","4":"38.438","5":"14880372","6":"786.1134","7":"6.667101","8":"0.7861134","9":"3.649047"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- # Log-Linear Model in R .pull-left[ .tiny[ ```r log_lin_reg <- lm(loglife~gdp_t, data = gapminder) log_lin_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> 
{"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"3.966639","3":"0.0058345501","4":"679.85339","5":"0.000000e+00"},{"1":"gdp_t","2":"0.012917","3":"0.0004777072","4":"27.03958","5":"2.920378e-134"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\ln\text{Life Expectancy}}_i=3.967+0.013 \, \text{GDP}_i$$` ] ] --- # Log-Linear Model in R .pull-left[ .tiny[ ```r log_lin_reg <- lm(loglife~gdp_t, data = gapminder) log_lin_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"3.966639","3":"0.0058345501","4":"679.85339","5":"0.000000e+00"},{"1":"gdp_t","2":"0.012917","3":"0.0004777072","4":"27.03958","5":"2.920378e-134"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\ln\text{Life Expectancy}}_i=3.967+0.013 \, \text{GDP}_i$$` - A **$1 (thousand) change in GDP** `\(\rightarrow\)` a `\(0.013 \times 100\%=\)` **1.3% increase** in Life Expectancy ] ] --- # Log-Linear Model in R .pull-left[ .tiny[ ```r log_lin_reg <- lm(loglife~gdp_t, data = gapminder) log_lin_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source 
type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"3.966639","3":"0.0058345501","4":"679.85339","5":"0.000000e+00"},{"1":"gdp_t","2":"0.012917","3":"0.0004777072","4":"27.03958","5":"2.920378e-134"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{ln(\text{Life Expectancy})}_i=3.967+0.013 \, \text{GDP}_i$$` - A **$1 (thousand) change in GDP** `\(\rightarrow\)` a `\(0.013 \times 100\%=\)` **1.3% increase** in Life Expectancy - A **$25 (thousand) fall in GDP** `\(\rightarrow\)` a `\((-25 \times 1.3\%)=\)` **32.5% decrease** in Life Expectancy ] ] --- # Log-Linear Model in R .pull-left[ .tiny[ ```r log_lin_reg <- lm(loglife~gdp_t, data = gapminder) log_lin_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"3.966639","3":"0.0058345501","4":"679.85339","5":"0.000000e+00"},{"1":"gdp_t","2":"0.012917","3":"0.0004777072","4":"27.03958","5":"2.920378e-134"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{ln(\text{Life Expectancy})}_i=3.967+0.013 \, \text{GDP}_i$$` - A **$1 (thousand) change in GDP** `\(\rightarrow\)` a 
`\(0.013 \times 100\%=\)` **1.3% increase** in Life Expectancy - A **$25 (thousand) fall in GDP** `\(\rightarrow\)` a `\((-25 \times 1.3\%)=\)` **32.5% decrease** in Life Expectancy - A **$100 (thousand) rise in GDP** `\(\rightarrow\)` a `\((100 \times 1.3\%)=\)` **130% increase** in Life Expectancy ] ] --- # Log-Linear Model Graph I .pull-left[ .code50[ ```r ggplot(data = gapminder)+ * aes(x = gdp_t, * y = loglife)+ geom_point(color="blue", alpha=0.5)+ * geom_smooth(method="lm", color="orange")+ scale_x_continuous(labels=scales::dollar, breaks=seq(0,120,20))+ labs(x = "GDP per Capita ($ Thousands)", y = "Log Life Expectancy")+ ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16) ``` ] ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-23-1.png" width="504" /> ] --- class: inverse, center, middle # Log-Log Model --- # Log-Log Model - .hi-purple[Log-log model] has both variables `\((X \text{ and } Y)\)` logged -- `$$\begin{align*} \color{#e64173}{\ln Y_i}&=\beta_0+\beta_1 \color{#e64173}{\ln X_i}\\ \beta_1&=\cfrac{\big(\frac{\Delta Y}{Y}\big)}{\big(\frac{\Delta X}{X}\big)}\\ \end{align*}$$` -- - .hi-purple[**Marginal effect of** `\\(\mathbf{X \rightarrow Y}\\)`: a **1%** change in `\\(X \rightarrow\\)` a `\\(\beta_1\\)` **%** change in `\\(Y\\)`] - `\(\beta_1\)` is the .hi-turquoise[elasticity] of `\(Y\)` with respect to `\(X\)`!
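---

# Log-Log Model: Why the Slope Is an Elasticity

- A short derivation sketch of the claim above, assuming changes in `\(X\)` and `\(Y\)` are small enough that the derivative rule `\(\frac{d \, \ln \, x}{d \, x} = \frac{1}{x}\)` is a good approximation
- Differentiate both sides of the log-log equation, then solve for `\(\beta_1\)`:

`$$\begin{align*} \ln Y &=\beta_0+\beta_1 \ln X\\ \frac{1}{Y}\, dY &= \beta_1 \, \frac{1}{X} \, dX\\ \beta_1 &= \cfrac{\big(\frac{dY}{Y}\big)}{\big(\frac{dX}{X}\big)} \approx \frac{\% \Delta Y}{\% \Delta X} = \epsilon_{Y,X}\\ \end{align*}$$`

- The same steps with only one side logged give the linear-log and log-linear interpretations; for large changes, the percent interpretation is only approximate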
--- # Log-Log Model in R .pull-left[ .tiny[ ```r log_log_reg <- lm(loglife ~ loggdp, data = gapminder) log_log_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"2.864177","3":"0.02328274","4":"123.01718","5":"0"},{"1":"loggdp","2":"0.146549","3":"0.00282126","4":"51.94452","5":"0"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\text{ln Life Expectancy}}_i=2.864+0.147 \, \text{ln GDP}_i$$` ] ] --- # Log-Log Model in R .pull-left[ .tiny[ ```r log_log_reg <- lm(loglife ~ loggdp, data = gapminder) log_log_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"2.864177","3":"0.02328274","4":"123.01718","5":"0"},{"1":"loggdp","2":"0.146549","3":"0.00282126","4":"51.94452","5":"0"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\text{ln Life Expectancy}}_i=2.864+0.147 \, \text{ln GDP}_i$$` - A **1% change in GDP** `\(\rightarrow\)` a **0.147% increase** in Life Expectancy ] ] --- # Log-Log Model in R .pull-left[ .tiny[ ```r log_log_reg <- 
lm(loglife ~ loggdp, data = gapminder) log_log_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"2.864177","3":"0.02328274","4":"123.01718","5":"0"},{"1":"loggdp","2":"0.146549","3":"0.00282126","4":"51.94452","5":"0"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\text{ln Life Expectancy}}_i=2.864+0.147 \, \text{ln GDP}_i$$` - A **1% change in GDP** `\(\rightarrow\)` a **0.147% increase** in Life Expectancy - A **25% fall in GDP** `\(\rightarrow\)` a `\((-25 \times 0.147\%)=\)` **3.675% decrease** in Life Expectancy ] ] --- # Log-Log Model in R .pull-left[ .tiny[ ```r log_log_reg <- lm(loglife ~ loggdp, data = gapminder) log_log_reg %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"2.864177","3":"0.02328274","4":"123.01718","5":"0"},{"1":"loggdp","2":"0.146549","3":"0.00282126","4":"51.94452","5":"0"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ .smallest[ `$$\widehat{\text{ln Life Expectancy}}_i=2.864+0.147 \, \text{ln GDP}_i$$` - A **1% change in 
GDP** `\(\rightarrow\)` a **0.147% increase** in Life Expectancy - A **25% fall in GDP** `\(\rightarrow\)` a `\((-25 \times 0.147\%)=\)` **3.675% decrease** in Life Expectancy - A **100% rise in GDP** `\(\rightarrow\)` a `\((100 \times 0.147\%)=\)` **14.7% increase** in Life Expectancy ] ] --- # Log-Log Model Graph I .pull-left[ .code50[ ```r ggplot(data = gapminder)+ * aes(x = loggdp, * y = loglife)+ geom_point(color="blue", alpha=0.5)+ * geom_smooth(method="lm", color="orange")+ labs(x = "Log GDP per Capita", y = "Log Life Expectancy")+ ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16) ``` ] ] .pull-right[ <img src="3.9-slides_files/figure-html/unnamed-chunk-28-1.png" width="504" /> ] --- # Comparing Models I | Model | Equation | Interpretation | |-------|----------|----------------| | Linear-.hi[Log] | `\(Y=\beta_0+\beta_1 \color{#e64173}{\ln X}\)` | 1.hi[%] change in `\(X \rightarrow \frac{\hat{\beta_1}}{100}\)` **unit** change in `\(Y\)` | | .hi[Log]-Linear | `\(\color{#e64173}{\ln Y}=\beta_0+\beta_1X\)` | 1 **unit** change in `\(X \rightarrow \hat{\beta_1}\times 100\)`.hi[%] change in `\(Y\)` | | .hi[Log]-.hi[Log] | `\(\color{#e64173}{\ln Y}=\beta_0+\beta_1\color{#e64173}{\ln X}\)` | 1.hi[%] change in `\(X \rightarrow \hat{\beta_1}\)`.hi[%] change in `\(Y\)` | - Hint: the variable that gets .hi[logged] changes in .hi[percent] terms, the variable not logged changes in **unit** terms - Going from units `\(\rightarrow\)` percent: multiply by 100 - Going from percent `\(\rightarrow\)` units: divide by 100 --- # Comparing Models II .pull-left[ .code50[ ```r library(huxtable) huxreg("Life Exp." = lin_log_reg, "Log Life Exp." = log_lin_reg, "Log Life Exp." = log_log_reg, coefs = c("Constant" = "(Intercept)", "GDP ($1000s)" = "gdp_t", "Log GDP" = "loggdp"), statistics = c("N" = "nobs", "R-Squared" = "r.squared", "SER" = "sigma"), number_format = 2) ``` ] - Models are very different units, how to choose? 
- Compare `\\(R^2\\)`’s - Compare graphs - Compare intuition ] .pull-right[ .tiny[
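| | Life Exp. | Log Life Exp. | Log Life Exp. |
|----------|----------:|----------:|----------:|
| Constant | -9.10 *** | 3.97 *** | 2.86 *** |
| | (1.23) | (0.01) | (0.02) |
| GDP ($1000s) | | 0.01 *** | |
| | | (0.00) | |
| Log GDP | 8.41 *** | | 0.15 *** |
| | (0.15) | | (0.00) |
| N | 1704 | 1704 | 1704 |
| R-Squared | 0.65 | 0.30 | 0.61 |
| SER | 7.62 | 0.19 | 0.14 |

*** p < 0.001; ** p < 0.01; * p < 0.05.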
] ] --- # Comparing Models III .smallest[ | Linear-.hi[Log] | .hi[Log]-Linear | .hi[Log]-.hi[Log] | |:----------:|:----------:|:-------:| | ![](3.9-slides_files/figure-html/unnamed-chunk-17-1.png) | ![](3.9-slides_files/figure-html/unnamed-chunk-23-1.png) | ![](3.9-slides_files/figure-html/unnamed-chunk-28-1.png) | | `\(\hat{Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}\)` | `\(\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}X_i\)` | `\(\color{#e64173}{\ln Y_i}=\hat{\beta_0}+\hat{\beta_1}\color{#e64173}{\ln X_i}\)` | | `\(R^2=0.65\)` | `\(R^2=0.30\)` | `\(R^2=0.61\)` | ] --- # When to Log? .smaller[ - In practice, the following types of variables are logged: - Variables that must always be positive (prices, sales, market values) - Very large numbers (population, GDP) - Variables we want to talk about as percentage changes or growth rates (money supply, population, GDP) - Variables that have diminishing returns (output, utility) - Variables that have nonlinear scatterplots ] -- .smaller[ - Avoid logs for: - Variables that are less than one, decimals, 0, or negative - Categorical variables (season, gender, political party) - Time variables (year, week, day) ] --- class: inverse, center, middle # Comparing Across Units --- # Comparing Coefficients of Different Units I .smallest[ `$$\hat{Y_i}=\beta_0+\beta_1 X_1+\beta_2 X_2 $$` - We often want to compare coefficients to see which variable `\(X_1\)` or `\(X_2\)` has a bigger effect on `\(Y\)` - What if `\(X_1\)` and `\(X_2\)` are different units? 
.bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: `$$\begin{align*} \widehat{\text{Salary}_i}&=\beta_0+\beta_1\, \text{Batting average}_i+\beta_2\, \text{Home runs}_i\\ \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36}\, \text{Home runs}_i\\ \end{align*}$$` ] ] --- # Comparing Coefficients of Different Units II - An easy way is to .hi[standardize]<sup>.magenta[†]</sup> the variables (i.e. take the `\(Z\)`-score) `$$X_Z=\frac{X_i-\overline{X}}{sd(X)}$$` .footnote[<sup>.magenta[†]</sup> Also called “centering” or “scaling.”] --- # Comparing Coefficients of Different Units: Example .smallest[ | Variable | Mean | Std. Dev. | |----------|------|-----------| | Salary | $2,024,616 | $2,764,512 | | Batting Average | 0.267 | 0.031 | | Home Runs | 12.11 | 10.31 | ] .quitesmall[ `$$\begin{align*}\scriptsize \widehat{\text{Salary}_i}&=-\text{2,869,439.40}+\text{12,417,629.72} \, \text{Batting average}_i+\text{129,627.36} \, \text{Home runs}_i\\ \widehat{\text{Salary}_Z}&=\text{0.00}+\text{0.14} \, \text{Batting average}_Z+\text{0.48} \, \text{Home runs}_Z\\ \end{align*}$$` ] -- .quitesmall[ - .hi-purple[Marginal effects] on `\(Y\)` (in *standard deviations* of `\(Y\)`) from 1 *standard deviation* change in `\(X\)`: - `\(\hat{\beta_1}\)`: a 1 standard deviation increase in Batting Average increases Salary by 0.14 standard deviations $$0.14 \times \$2,764,512=\$387,032$$ - `\(\hat{\beta_2}\)`: a 1 standard deviation increase in Home Runs increases Salary by 0.48 standard deviations $$0.48 \times \$2,764,512=\$1,326,966$$ ] --- # Standardizing in `R` .tiny[ | Variable | Mean | SD | |----------|-----:|---:| | `LifeExp` | 59.47 | 12.92 | | `gdpPercap` | $7215.32 | $9857.46 | ] .quitesmall[ - Use the `scale()` command inside `mutate()` function to standardize a variable ] .code60[ ```r gapminder <- gapminder %>% * mutate(life_Z = scale(lifeExp), * gdp_Z = scale(gdpPercap)) std_reg <- 
lm(life_Z ~ gdp_Z, data = gapminder) tidy(std_reg) ``` ``` ## # A tibble: 2 × 5 ## term estimate std.error statistic p.value ## <chr> <dbl> <dbl> <dbl> <dbl> ## 1 (Intercept) 1.10e-16 0.0197 5.57e-15 1.00e+ 0 ## 2 gdp_Z 5.84e- 1 0.0197 2.97e+ 1 3.57e-156 ``` ] .quitesmall[ - A 1 standard deviation increase in `gdpPercap` will increase `lifeExp` by 0.584 standard deviations `\((0.584 \times 12.92 = 7.55\)` years) ] --- class: inverse, center, middle # Joint Hypothesis Testing --- # Joint Hypothesis Testing I .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: Return again to: `$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$` ] -- - Maybe region doesn't affect wages *at all*? -- - `\(H_0: \beta_2=0, \, \beta_3=0, \, \beta_4=0\)` -- - This is a .hi[joint hypothesis] to test --- # Joint Hypothesis Testing II - A .hi[joint hypothesis] tests against the null hypothesis of a value for **multiple** parameters: `$$\mathbf{H_0: \beta_1= \beta_2=0}$$` the hypothesis that **multiple** regressors are equal to zero (have no causal effect on the outcome) -- - Our .hi-purple[alternative hypothesis] is that: `$$H_1: \text{ either } \beta_1\neq0\text{ or } \beta_2\neq0\text{ or both}$$` or simply, that `\(H_0\)` is not true --- # Types of Joint Hypothesis Tests .smallest[ 1) `\(H_0\)`: `\(\beta_1=\beta_2=0\)` - Testing against the claim that multiple variables don't matter - Useful under high multicollinearity between variables - `\(H_a\)`: at least one parameter `\(\neq\)` 0 ] -- .smallest[ 2) `\(H_0\)`: `\(\beta_1=\beta_2\)` - Testing whether two variables matter the same - Variables must be the same units - `\(H_a: \beta_1 (\neq, <, \text{ or }>) \beta_2\)` ] -- .smallest[ 3) `\(H_0:\)` ALL `\(\beta\)`'s `\(=0\)` - The "**Overall F-test"** - Testing against claim that regression model explains *NO* variation in `\(Y\)` ] --- # Joint Hypothesis Tests: F-statistic - The 
.hi-turquoise[F-statistic] is the test-statistic used to test joint hypotheses about regression coefficients with an .hi-turquoise[F-test] -- - This involves comparing two models: 1. .hi[Unrestricted model]: regression with all coefficients 2. .hi-purple[Restricted model]: regression under null hypothesis (coefficients equal hypothesized values) -- - `\(F\)` is an .hi-turquoise[analysis of variance (ANOVA)] - essentially tests whether `\(R^2\)` increases statistically significantly as we go from the restricted model `\(\rightarrow\)` unrestricted model -- - `\(F\)` has its own distribution, with *two* sets of degrees of freedom --- # Joint Hypothesis F-test: Example I .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: Return again to: `$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$` ] -- - `\(H_0: \beta_2=\beta_3=\beta_4=0\)` -- - `\(H_a\)`: `\(H_0\)` is not true (at least one `\(\beta_i \neq 0\)`) --- # Joint Hypothesis F-test: Example II .smallest[ .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: Return again to: `$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$` ] - .hi[Unrestricted model]: `$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i+\hat{\beta_2}Northeast_i+\hat{\beta_3}Midwest_i+\hat{\beta_4}South_i$$` ] -- .smallest[ - .hi-purple[Restricted model]: `$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Male_i$$` ] -- .smallest[ - `\(F\)`-test: **does going from .hi-purple[restricted] to .hi[unrestricted] model statistically significantly improve `\(R^2\)`?** ] --- # Calculating the F-statistic .pull-left[ `$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(R^2_u-R^2_r)}{q}\right)}{\left(\displaystyle\frac{(1-R^2_u)}{(n-k-1)}\right)}$$` ] .pull-right[ ] --- # Calculating the F-statistic .pull-left[ 
`$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-R^2_r)}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}$$` ] .pull-right[ .smallest[ - `\(\color{#e64173}{R^2_u}\)`: the `\(R^2\)` from the .hi[unrestricted model] (all variables) ] ] --- # Calculating the F-statistic .pull-left[ `$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-\color{#6A5ACD}{R^2_r})}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}$$` ] .pull-right[ .smallest[ - `\(\color{#e64173}{R^2_u}\)`: the `\(R^2\)` from the .hi[unrestricted model] (all variables) - `\(\color{#6A5ACD}{R^2_r}\)`: the `\(R^2\)` from the .hi-purple[restricted model] (null hypothesis) ] ] --- # Calculating the F-statistic .pull-left[ `$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-\color{#6A5ACD}{R^2_r})}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}$$` ] .pull-right[ .smallest[ - `\(\color{#e64173}{R^2_u}\)`: the `\(R^2\)` from the .hi[unrestricted model] (all variables) - `\(\color{#6A5ACD}{R^2_r}\)`: the `\(R^2\)` from the .hi-purple[restricted model] (null hypothesis) - `\(q\)`: number of restrictions (number of `\(\beta's=0\)` under null hypothesis) ] ] --- # Calculating the F-statistic .pull-left[ `$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-\color{#6A5ACD}{R^2_r})}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}$$` ] .pull-right[ .smallest[ - `\(\color{#e64173}{R^2_u}\)`: the `\(R^2\)` from the .hi[unrestricted model] (all variables) - `\(\color{#6A5ACD}{R^2_r}\)`: the `\(R^2\)` from the .hi-purple[restricted model] (null hypothesis) - `\(q\)`: number of restrictions (number of `\(\beta's=0\)` under null hypothesis) - `\(k\)`: number of `\(X\)` variables in .hi[unrestricted model] (all variables) ] ] --- # Calculating the F-statistic .pull-left[ 
`$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(\color{#e64173}{R^2_u}-\color{#6A5ACD}{R^2_r})}{q}\right)}{\left(\displaystyle\frac{(1-\color{#e64173}{R^2_u})}{(n-k-1)}\right)}$$` ] .pull-right[ .smallest[ - `\(\color{#e64173}{R^2_u}\)`: the `\(R^2\)` from the .hi[unrestricted model] (all variables) - `\(\color{#6A5ACD}{R^2_r}\)`: the `\(R^2\)` from the .hi-purple[restricted model] (null hypothesis) - `\(q\)`: number of restrictions (number of `\(\beta's=0\)` under null hypothesis) - `\(k\)`: number of `\(X\)` variables in .hi[unrestricted model] (all variables) - `\(F\)` has two sets of degrees of freedom: - `\(q\)` for the numerator, `\((n-k-1)\)` for the denominator ] ] --- # Calculating the F-statistic II .pull-left[ `$$F_{q,(n-k-1)}=\cfrac{\left(\displaystyle\frac{(R^2_u-R^2_r)}{q}\right)}{\left(\displaystyle\frac{(1-R^2_u)}{(n-k-1)}\right)}$$` ] .pull-right[ - .hi-purple[Key takeaway]: The bigger the difference between `\((R^2_u-R^2_r)\)`, the greater the improvement in fit by adding variables, the larger the `\(F\)`! 
- This formula is (believe it or not) actually a simplified version (assuming homoskedasticity) - I give you this formula to **build your intuition of what F is measuring** ] --- # F-test Example I - We'll use the `wooldridge` package's `wage1` data again ```r # load in data from wooldridge package library(wooldridge) wages <- wage1 # run regressions unrestricted_reg <- lm(wage ~ female + northcen + west + south, data = wages) restricted_reg <- lm(wage ~ female, data = wages) ``` --- # F-test Example II - .hi[Unrestricted model]: `$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Female_i+\hat{\beta_2}Northcen_i+\hat{\beta_3}West_i+\hat{\beta_4}South_i$$` - .hi-purple[Restricted model]: `$$\widehat{Wage_i}=\hat{\beta_0}+\hat{\beta_1}Female_i$$` - `\(H_0: \beta_2 = \beta_3 = \beta_4 =0\)` - `\(q = 3\)` restrictions (F numerator df) - `\(n-k-1 = 526-4-1=521\)` (F denominator df) --- # F-test Example III .smallest[ - We can use the `car` package's `linearHypothesis()` command to run an `\(F\)`-test: - first argument: name of the (unrestricted) regression - second argument: vector of variable names (in quotes) you are testing ] -- .smallest[ .code50[ ```r # load car package for additional regression tools library(car) # F-test linearHypothesis(unrestricted_reg, c("northcen", "west", "south")) ``` ``` ## Linear hypothesis test ## ## Hypothesis: ## northcen = 0 ## west = 0 ## south = 0 ## ## Model 1: restricted model ## Model 2: wage ~ female + northcen + west + south ## ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 524 6332.2 ## 2 521 6174.8 3 157.36 4.4258 0.004377 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] ] -- .smallest[ - `\(p\)`-value on `\(F\)`-test `\(<0.05\)`, so we can reject `\(H_0\)` ] --- # Second F-test Example: Are Two Coefficients Equal? 
.smallest[ - The second type of test is whether two coefficients equal one another .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: `$$\widehat{wage_i}=\beta_0+\beta_1 \text{Adolescent height}_i + \beta_2 \text{Adult height}_i + \beta_3 \text{Male}_i$$` ] ] -- .smallest[ - Does height as an adolescent have the same effect on wages as height as an adult? `$$H_0: \beta_1=\beta_2$$` ] -- .smallest[ - What is the .hi-purple[restricted] regression? `$$\widehat{wage_i}=\beta_0+\beta_1(\text{Adolescent height}_i + \text{Adult height}_i )+ \beta_3 \text{Male}_i$$` - `\(q=1\)` restriction ] --- # Second F-test Example: Data ```r # load in data heightwages <- read_csv("../data/heightwages.csv") # make a "heights" variable as the sum of adolescent (height81) and adult (height85) height heightwages <- heightwages %>% mutate(heights = height81 + height85) height_reg <- lm(wage96 ~ height81 + height85 + male, data = heightwages) height_restricted_reg <- lm(wage96 ~ heights + male, data = heightwages) ``` --- # Second F-test Example: Test - For the second argument, set the two variables equal, in quotes ```r linearHypothesis(height_reg, "height81 = height85") # F-test ``` ``` ## Linear hypothesis test ## ## Hypothesis: ## height81 - height85 = 0 ## ## Model 1: restricted model ## Model 2: wage96 ~ height81 + height85 + male ## ## Res.Df RSS Df Sum of Sq F Pr(>F) ## 1 6591 5128243 ## 2 6590 5127284 1 959.2 1.2328 0.2669 ``` - Insufficient evidence to reject `\(H_0\)`! - We cannot conclude that adolescent and adult height have different effects on wages --- # All F-test I .pull-left[ .code50[ ```r summary(unrestricted_reg) ``` ``` ## ## Call: ## lm(formula = wage ~ female + northcen + west + south, data = wages) ## ## Residuals: ## Min 1Q Median 3Q Max ## -6.3269 -2.0105 -0.7871 1.1898 17.4146 ## ## Coefficients: ## Estimate Std. 
Error t value Pr(>|t|) ## (Intercept) 7.5654 0.3466 21.827 <2e-16 *** ## female -2.5652 0.3011 -8.520 <2e-16 *** ## northcen -0.5918 0.4362 -1.357 0.1755 ## west 0.4315 0.4838 0.892 0.3729 ## south -1.0262 0.4048 -2.535 0.0115 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 3.443 on 521 degrees of freedom ## Multiple R-squared: 0.1376, Adjusted R-squared: 0.131 ## F-statistic: 20.79 on 4 and 521 DF, p-value: 6.501e-16 ``` ] ] .pull-right[ .smallest[ - Last line of regression output from `summary()` is an **All F-test** - `\(H_0:\)` all `\(\beta's=0\)` - the regression explains no variation in `\(Y\)` - Calculates an `F-statistic` that, if high enough, is significant (`p-value` `\(<0.05)\)` enough to reject `\(H_0\)` ] ] --- # All F-test II - Alternatively, if you use `broom` instead of `summary()`: - `glance()` command makes table of regression summary statistics - `tidy()` only shows coefficients .quitesmall[ ```r library(broom) glance(unrestricted_reg) ``` ``` ## # A tibble: 1 × 12 ## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC ## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ## 1 0.138 0.131 3.44 20.8 6.50e-16 4 -1394. 2800. 2826. ## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int> ``` ] - `statistic` is the All F-test, `p.value` next to it is the p-value from the F test
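---

# F-statistics by Hand: A Sketch

.smallest[
- As a rough check on the `\(F\)` formula, the statistics reported in the earlier output can be reproduced in base `R`. This is only a minimal sketch: the numbers are copied from the printed `linearHypothesis()` and `summary()` output, and the object names (`rss_r`, `f_joint`, and so on) are hypothetical.
- The joint test uses the residual-sum-of-squares form of the formula (algebraically equivalent to the `\(R^2\)` form), since the restricted `\(R^2\)` is not printed. The All F-test uses the `\(R^2\)` form with `\(R^2_r=0\)` and `\(q=k\)`.
]

```r
# values copied from the linearHypothesis() output (F-test Example III)
rss_r <- 6332.2   # residual sum of squares, restricted model
rss_u <- 6174.8   # residual sum of squares, unrestricted model
q     <- 3        # number of restrictions
df_u  <- 521      # n - k - 1 = 526 - 4 - 1

# joint F-test, RSS form (equivalent to the R-squared form)
f_joint <- ((rss_r - rss_u) / q) / (rss_u / df_u)
p_joint <- pf(f_joint, df1 = q, df2 = df_u, lower.tail = FALSE)

# All F-test: the restricted model has only an intercept,
# so R2_r = 0 and the number of restrictions equals k
r2_u  <- 0.1376   # Multiple R-squared from summary(unrestricted_reg)
k     <- 4
f_all <- (r2_u / k) / ((1 - r2_u) / df_u)

round(c(f_joint = f_joint, p_joint = p_joint, f_all = f_all), 4)
```

.smallest[
- Both statistics should land close to the reported 4.4258 and 20.79; any small gap comes from rounding in the printed output.
]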