class: center, middle, inverse, title-slide # 3.8 — Polynomial Regression ## ECON 480 • Econometrics • Fall 2021 ### Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsF21
metricsF21.classes.ryansafner.com
--- class: inverse # Outline ### [The Quadratic Model](#34) ### [The Quadratic Model: Maxima and Minima](#61) ### [Are Polynomials Necessary?](#68) --- # *Linear* Regression .pull-left[ - OLS is commonly known as “.hi[*linear* regression]” as it fits a **straight line** to data points - Often, data and relationships between variables may *not* be linear ] -- .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-1-1.png" width="504" /> ] --- # *Linear* Regression .pull-left[ - OLS is commonly known as “.hi[*linear* regression]” as it fits a **straight line** to data points - Often, data and relationships between variables may *not* be linear .quitesmall[ `$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-2-1.png" width="504" /> ] --- # *Linear* Regression .pull-left[ - OLS is commonly known as “.hi[*linear* regression]” as it fits a **straight line** to data points - Often, data and relationships between variables may *not* be linear .quitesmall[ `$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$` `$$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-3-1.png" width="504" /> ] --- # *Linear* Regression .pull-left[ - OLS is commonly known as “.hi[*linear* regression]” as it fits a **straight line** to data points - Often, data and relationships between variables may *not* be linear - Get rid of the outliers (>$60,000) ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-4-1.png" width="504" /> ] --- # *Linear* Regression .pull-left[ - OLS is commonly known as “.hi[*linear* regression]” as it fits a **straight line** to data points - Often, data and relationships between variables may *not* be linear - Get rid of the outliers (>$60,000) .quitesmall[
`$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-5-1.png" width="504" /> ] --- # *Linear* Regression .pull-left[ - OLS is commonly known as “.hi[*linear* regression]” as it fits a **straight line** to data points - Often, data and relationships between variables may *not* be linear - Get rid of the outliers (>$60,000) .quitesmall[ `$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$` `$$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-6-1.png" width="504" /> ] --- # *Linear* Regression .pull-left[ - OLS is commonly known as “.hi[*linear* regression]” as it fits a **straight line** to data points - Often, data and relationships between variables may *not* be linear - Get rid of the outliers (>$60,000) .quitesmall[ `$$\color{red}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i}$$` `$$\color{green}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\text{GDP}_i+\hat{\beta_2}\text{GDP}_i^2}$$` `$$\color{orange}{\widehat{\text{Life Expectancy}_i}=\hat{\beta_0}+\hat{\beta_1}\ln(\text{GDP}_i)}$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-7-1.png" width="504" /> ] --- # Nonlinear Effects in Linear Regression .smallest[ - Despite being “linear regression”, OLS can handle this with an easy fix - OLS requires all *parameters* (i.e. 
the `\(\beta\)`'s) to be linear; the *regressors* (the `\(X\)`'s) can be nonlinear: ] -- .smallest[ `$$Y_i=\beta_0+\beta_1 X_i^2 \quad ✅$$` ] -- .smallest[ `$$Y_i=\beta_0+\beta_1^2X_i \quad ❌$$` ] -- .smallest[ `$$Y_i=\beta_0+\beta_1 \sqrt{X_i} \quad ✅$$` ] -- .smallest[ `$$Y_i=\beta_0+\sqrt{\beta_1} X_i \quad ❌$$` ] -- .smallest[ `$$Y_i=\beta_0+\beta_1 (X_{1i} \times X_{2i}) \quad ✅$$` ] -- .smallest[ `$$Y_i=\beta_0+\beta_1 \ln(X_i) \quad ✅$$` ] -- .smallest[ - In the end, each `\(X\)` is always just a number in the data, so OLS can always estimate parameters for it; but *plotting* the modelled points `\((X_i, \hat{Y_i})\)` can result in a curve! ] --- # Sources of Nonlinearities - Effect of `\(X_1 \rightarrow Y\)` might be nonlinear if: -- 1. `\(X_1 \rightarrow Y\)` is different for different levels of `\(X_1\)` - e.g. **diminishing returns**: `\(\uparrow X_1\)` increases `\(Y\)` at a *decreasing* rate - e.g. **increasing returns**: `\(\uparrow X_1\)` increases `\(Y\)` at an *increasing* rate -- 2. `\(X_1 \rightarrow Y\)` is different for different levels of `\(X_2\)` - e.g. interaction effects (last lesson) --- # Nonlinearities Alter Marginal Effects .pull-left[ - **Linear**: `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X$$` - Marginal effect (slope), `\((\hat{\beta_1}) = \frac{\Delta Y}{\Delta X}\)`, is constant for all `\(X\)` ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-8-1.png" width="504" /> ] --- # Nonlinearities Alter Marginal Effects .pull-left[ - **Polynomial**: `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2$$` - Marginal effect, “slope” `\(\left(\neq \hat{\beta_1}\right)\)` *depends on the value of* `\(X\)`! ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-9-1.png" width="504" /> ] --- # Nonlinearities Alter Marginal Effects .pull-left[ - **Interaction Effect**: `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X_1+\hat{\beta_2}X_2+\hat{\beta_3}X_1 \times X_2$$` - Marginal effect of `\(X_1\)`, the “slope”, *depends on the value of* `\(X_2\)`! 
- Easy example: if `\(X_2\)` is a dummy variable: - .blue[`\\(X_2=0\\)` (control)] vs. .pink[`\\(X_2=1\\)` (treatment)] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-10-1.png" width="504" /> ] --- # Polynomial Functions of `\(X\)` I .pull-left[ .smallest[ - .blue[Linear] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-11-1.png" width="504" /> ] --- # Polynomial Functions of `\(X\)` I .pull-left[ .smallest[ - .blue[Linear] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X$$` - .green[Quadratic] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-12-1.png" width="504" /> ] --- # Polynomial Functions of `\(X\)` I .pull-left[ .smallest[ - .blue[Linear] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X$$` - .green[Quadratic] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2$$` - .orange[Cubic] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2+\hat{\beta_3}X^3$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-13-1.png" width="504" /> ] --- # Polynomial Functions of `\(X\)` I .pull-left[ .smallest[ - .blue[Linear] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X$$` - .green[Quadratic] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2$$` - .orange[Cubic] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2+\hat{\beta_3}X^3$$` - .purple[Quartic] `$$\hat{Y}=\hat{\beta_0}+\hat{\beta_1}X+\hat{\beta_2}X^2+\hat{\beta_3}X^3+\hat{\beta_4}X^4$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-14-1.png" width="504" /> ] --- # Polynomial Functions of `\(X\)` II `$$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2 + \cdots + \hat{\beta_{\color{#e64173}{r}}} X_i^{\color{#e64173}{r}}$$` -- - Where `\(\color{#e64173}{r}\)` is the highest power `\(X_i\)` is raised to - quadratic `\(\color{#e64173}{r=2}\)` - cubic `\(\color{#e64173}{r=3}\)` -- - The graph of an
`\(r\)`<sup>th</sup>-degree polynomial function has up to `\((r-1)\)` bends -- - Just another multivariate OLS regression model! --- class: inverse, center, middle # The Quadratic Model --- # Quadratic Model `$$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2$$` - .hi[Quadratic model] has `\(X\)` and `\(X^2\)` variables in it (yes, need both!) -- - How to interpret coefficients (betas)? - `\(\beta_0\)` as “intercept” and `\(\beta_1\)` as “slope” makes no sense 🧐 - `\(\beta_1\)` as effect `\(X_i \rightarrow Y_i\)` holding `\(X_i^2\)` constant??<sup>.magenta[†]</sup> .footnote[<sup>.magenta[†]</sup> Note: this is *not* a perfect multicollinearity problem! `\(X_i^2\)` is not a *linear* function of `\(X_i\)`, and correlation only measures *linear* relationships!] -- - **Estimate marginal effects** by calculating predicted `\(\hat{Y_i}\)` for different levels of `\(X_i\)` --- # Quadratic Model: Calculating Marginal Effects `$$\hat{Y_i} = \hat{\beta_0} + \hat{\beta_1} X_i + \hat{\beta_2} X_i^2$$` - What is the .hi[marginal effect] of `\(\Delta X_i \rightarrow \Delta Y_i\)`? 
-- - Take the **derivative** of `\(Y_i\)` with respect to `\(X_i\)`: `$$\frac{\partial \, Y_i}{\partial \, X_i} = \hat{\beta_1}+2\hat{\beta_2} X_i$$` -- - .hi[Marginal effect] of a 1 unit change in `\(X_i\)` is a `\(\color{#6A5ACD}{\left(\hat{\beta_1}+2\hat{\beta_2} X_i \right)}\)` unit change in `\(Y\)` --- # Quadratic Model: Example I .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: `$$\widehat{\text{Life Expectancy}_i} = \hat{\beta_0}+\hat{\beta_1} \, \text{GDP per capita}_i+\hat{\beta_2}\, \text{GDP per capita}^2_i$$` ] - Use `gapminder` package and data ```r library(gapminder) ``` --- # Quadratic Model: Example II .smallest[ - These coefficients will be very large, so let's transform `gdpPercap` to be in $1,000's ```r gapminder <- gapminder %>% mutate(GDP_t = gdpPercap/1000) gapminder %>% head() # look at it ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["country"],"name":[1],"type":["fct"],"align":["left"]},{"label":["continent"],"name":[2],"type":["fct"],"align":["left"]},{"label":["year"],"name":[3],"type":["int"],"align":["right"]},{"label":["lifeExp"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["pop"],"name":[5],"type":["int"],"align":["right"]},{"label":["gdpPercap"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["GDP_t"],"name":[7],"type":["dbl"],"align":["right"]}],"data":[{"1":"Afghanistan","2":"Asia","3":"1952","4":"28.801","5":"8425333","6":"779.4453","7":"0.7794453"},{"1":"Afghanistan","2":"Asia","3":"1957","4":"30.332","5":"9240934","6":"820.8530","7":"0.8208530"},{"1":"Afghanistan","2":"Asia","3":"1962","4":"31.997","5":"10267083","6":"853.1007","7":"0.8531007"},{"1":"Afghanistan","2":"Asia","3":"1967","4":"34.020","5":"11537966","6":"836.1971","7":"0.8361971"},{"1":"Afghanistan","2":"Asia","3":"1972","4":"36.088","5":"13079460","6":"739.9811","7":"0.7399811"},{"1":"Afghanistan","2":"Asia","3":"1977","4":"38.438","5":"14880372","6":"
786.1134","7":"0.7861134"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- # Quadratic Model: Example III .smallest[ - Let’s also create a squared term, `GDP_sq` ```r gapminder <- gapminder %>% mutate(GDP_sq = GDP_t^2) gapminder %>% head() # look at it ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["country"],"name":[1],"type":["fct"],"align":["left"]},{"label":["continent"],"name":[2],"type":["fct"],"align":["left"]},{"label":["year"],"name":[3],"type":["int"],"align":["right"]},{"label":["lifeExp"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["pop"],"name":[5],"type":["int"],"align":["right"]},{"label":["gdpPercap"],"name":[6],"type":["dbl"],"align":["right"]},{"label":["GDP_t"],"name":[7],"type":["dbl"],"align":["right"]},{"label":["GDP_sq"],"name":[8],"type":["dbl"],"align":["right"]}],"data":[{"1":"Afghanistan","2":"Asia","3":"1952","4":"28.801","5":"8425333","6":"779.4453","7":"0.7794453","8":"0.6075350"},{"1":"Afghanistan","2":"Asia","3":"1957","4":"30.332","5":"9240934","6":"820.8530","7":"0.8208530","8":"0.6737997"},{"1":"Afghanistan","2":"Asia","3":"1962","4":"31.997","5":"10267083","6":"853.1007","7":"0.8531007","8":"0.7277808"},{"1":"Afghanistan","2":"Asia","3":"1967","4":"34.020","5":"11537966","6":"836.1971","7":"0.8361971","8":"0.6992257"},{"1":"Afghanistan","2":"Asia","3":"1972","4":"36.088","5":"13079460","6":"739.9811","7":"0.7399811","8":"0.5475720"},{"1":"Afghanistan","2":"Asia","3":"1977","4":"38.438","5":"14880372","6":"786.1134","7":"0.7861134","8":"0.6179742"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- # Quadratic Model: Example IV .smallest[ - Can “manually” run a multivariate regression with `GDP_t` and `GDP_sq` ```r library(broom) reg1 <- lm(lifeExp ~ GDP_t + GDP_sq, data = gapminder) reg1 %>% tidy() ``` <div data-pagedtable="false"> 
<script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"50.52400578","3":"0.2978134673","4":"169.64984","5":"0.000000e+00"},{"1":"GDP_t","2":"1.55099112","3":"0.0373734945","4":"41.49976","5":"1.292863e-260"},{"1":"GDP_sq","2":"-0.01501927","3":"0.0005794139","4":"-25.92149","5":"3.935809e-125"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- # Quadratic Model: Example V - OR use `GDP_t` alone and square it inside the regression formula with `I()`, i.e. `I(GDP_t^2)` .smallest[ ```r reg1_alt <- lm(lifeExp ~ GDP_t + I(GDP_t^2), data = gapminder) reg1_alt %>% tidy() ``` <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"50.52400578","3":"0.2978134673","4":"169.64984","5":"0.000000e+00"},{"1":"GDP_t","2":"1.55099112","3":"0.0373734945","4":"41.49976","5":"1.292863e-260"},{"1":"I(GDP_t^2)","2":"-0.01501927","3":"0.0005794139","4":"-25.92149","5":"3.935809e-125"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] --- # Quadratic Model: Example VI .smallest[ <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> 
{"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"50.52400578","3":"0.2978134673","4":"169.64984","5":"0.000000e+00"},{"1":"GDP_t","2":"1.55099112","3":"0.0373734945","4":"41.49976","5":"1.292863e-260"},{"1":"GDP_sq","2":"-0.01501927","3":"0.0005794139","4":"-25.92149","5":"3.935809e-125"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] -- .smallest[ `$$\widehat{\text{Life Expectancy}_i} = 50.52+1.55 \, \text{GDP}_i - 0.015\, \text{GDP}^2_i$$` ] -- .smallest[ - Positive effect `\((\hat{\beta_1}>0)\)`, with diminishing returns `\((\hat{\beta_2}<0)\)` - Marginal effect of GDP on Life Expectancy **depends on initial value of GDP!** ] --- # Quadratic Model: Example VII .smallest[ <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"50.52400578","3":"0.2978134673","4":"169.64984","5":"0.000000e+00"},{"1":"GDP_t","2":"1.55099112","3":"0.0373734945","4":"41.49976","5":"1.292863e-260"},{"1":"GDP_sq","2":"-0.01501927","3":"0.0005794139","4":"-25.92149","5":"3.935809e-125"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] .smallest[ - .hi[Marginal effect] of GDP on Life Expectancy: ] -- .smallest[ `$$\begin{align*} \frac{\partial \, 
Y}{\partial \; X} &= \hat{\beta_1}+2\hat{\beta_2} X_i\\ \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &\approx 1.55+2(-0.015) \, \text{GDP}\\ &\approx \color{#e64173}{1.55-0.03 \, \text{GDP}}\\ \end{align*}$$` ] --- # Quadratic Model: Example VIII `$$\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.03 \, \text{GDP}$$` Marginal effect of GDP if GDP `\(=5\)` ($ thousand): `$$\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.03\text{GDP}\\ &= 1.55-0.03(5)\\ &= 1.55-0.15\\ &=1.40\\ \end{align*}$$` -- - i.e. for every additional $1 (thousand) in GDP per capita, average life expectancy increases by 1.40 years --- # Quadratic Model: Example IX `$$\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.03 \, \text{GDP}$$` Marginal effect of GDP if GDP `\(=25\)` ($ thousand): -- `$$\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.03\text{GDP}\\ &= 1.55-0.03(25)\\ &= 1.55-0.75\\ &=0.80\\ \end{align*}$$` -- - i.e. for every additional $1 (thousand) in GDP per capita, average life expectancy increases by 0.80 years --- # Quadratic Model: Example X `$$\frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} = 1.55-0.03 \, \text{GDP}$$` Marginal effect of GDP if GDP `\(=50\)` ($ thousand): -- `$$\begin{align*} \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.03\text{GDP}\\ &= 1.55-0.03(50)\\ &= 1.55-1.50\\ &=0.05\\ \end{align*}$$` -- - i.e. 
for every additional $1 (thousand) in GDP per capita, average life expectancy increases by only 0.05 years --- # Quadratic Model: Example XI .smallest[ `$$\begin{align*}\widehat{\text{Life Expectancy}_i} &= 50.52+1.55 \, \text{GDP per capita}_i - 0.015\, \text{GDP per capita}^2_i \\ \frac{\partial \, \text{Life Expectancy}}{\partial \, \text{GDP}} &= 1.55-0.03\text{GDP} \\ \end{align*}$$` ] | *Initial* GDP per capita | Marginal Effect<sup>.magenta[†]</sup> | |----------------|-------------------:| | $5,000 | `\(1.40\)` years | | $25,000 | `\(0.80\)` years | | $50,000 | `\(0.05\)` years | .footnote[<sup>.magenta[†]</sup> Of +$1,000 GDP/capita on Life Expectancy.] --- # Quadratic Model: Example XII .pull-left[ .code50[ ```r ggplot(data = gapminder)+ aes(x = GDP_t, y = lifeExp)+ geom_point(color="blue", alpha=0.5)+ * stat_smooth(method = "lm", * formula = y ~ x + I(x^2), * color="green")+ geom_vline(xintercept=c(5,25,50), linetype="dashed", color="red", size = 1)+ scale_x_continuous(labels=scales::dollar, breaks=seq(0,120,10))+ scale_y_continuous(breaks=seq(0,100,10), limits=c(0,100))+ labs(x = "GDP per Capita (in Thousands)", y = "Life Expectancy (Years)")+ ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16) ``` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-22-1.png" width="504" /> ] --- class: inverse, center, middle # The Quadratic Model: Maxima and Minima --- # Quadratic Model: Maxima and Minima I - For a polynomial model, we can also find the predicted **maximum** or **minimum** of `\(\hat{Y_i}\)` -- - A quadratic model has a single global maximum or minimum (1 bend) -- - By calculus, a minimum or maximum occurs where: `$$\begin{align*} \frac{ \partial \, Y_i}{\partial \, X_i} &=0\\ \beta_1 + 2\beta_2 X_i &= 0\\ 2\beta_2 X_i&= -\beta_1\\ X_i^*&=-\frac{\beta_1}{2\beta_2}\\ \end{align*}$$` --- # Quadratic Model: Maxima and Minima II .pull-left[ .quitesmall[ <div data-pagedtable="false"> <script data-pagedtable-source 
type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"50.52400578","3":"0.2978134673","4":"169.64984","5":"0.000000e+00"},{"1":"GDP_t","2":"1.55099112","3":"0.0373734945","4":"41.49976","5":"1.292863e-260"},{"1":"GDP_sq","2":"-0.01501927","3":"0.0005794139","4":"-25.92149","5":"3.935809e-125"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] -- .pull-right[ `$$\begin{align*} GDP_i^*&=-\frac{\beta_1}{2\beta_2}\\ GDP_i^*&=-\frac{(1.55)}{2(-0.015)}\\ GDP_i^*& \approx 51.67\\ \end{align*}$$` ] --- # Quadratic Model: Maxima and Minima III .pull-left[ .code50[ ```r ggplot(data = gapminder)+ aes(x = GDP_t, y = lifeExp)+ geom_point(color="blue", alpha=0.5)+ * stat_smooth(method = "lm", * formula = y ~ x + I(x^2), * color="green")+ geom_vline(xintercept=51.67, linetype="dashed", color="red", size = 1)+ geom_label(x=51.67, y=90, label="$51.67", color="red")+ scale_x_continuous(labels=scales::dollar, breaks=seq(0,120,10))+ scale_y_continuous(breaks=seq(0,100,10), limits=c(0,100))+ labs(x = "GDP per Capita (in Thousands)", y = "Life Expectancy (Years)")+ ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16) ``` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-24-1.png" width="504" /> ] --- class: inverse, center, middle # Are Polynomials Necessary? 
--- # Determining Polynomials are Necessary I .smallest[ <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"50.52400578","3":"0.2978134673","4":"169.64984","5":"0.000000e+00"},{"1":"GDP_t","2":"1.55099112","3":"0.0373734945","4":"41.49976","5":"1.292863e-260"},{"1":"GDP_sq","2":"-0.01501927","3":"0.0005794139","4":"-25.92149","5":"3.935809e-125"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> - Is the quadratic term necessary? ] -- .smallest[ - Determine if `\(\hat{\beta_2}\)` (on `\(X_i^2\)`) is statistically significant: - `\(H_0: \beta_2=0\)` - `\(H_a: \beta_2 \neq 0\)` ] -- .smallest[ - Statistically significant `\(\implies\)` we should keep the quadratic model - If we only ran a linear model, it would be misspecified! ] --- # Determining Polynomials are Necessary II .pull-left[ .smaller[ - Should we keep going up in polynomials? 
] .quitesmall[ `$$\color{#6A5ACD}{\widehat{\text{Life Expectancy}_i} = \hat{\beta_0}+\hat{\beta_1} GDP_i+\hat{\beta_2}GDP^2_i+\hat{\beta_3}GDP_i^3}$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-26-1.png" width="504" /> ] --- # Determining Polynomials are Necessary III .pull-left[ - In general, you should have a .hi-purple[compelling theoretical reason] why data or relationships should .hi-purple[“change direction”] multiple times - Or clear data patterns that have multiple “bends” - Recall, [we care more](https://metricsf21.classes.ryansafner.com/slides/3.1-slides#3) about accurately measuring the causal effect between `\(X\)` and `\(Y\)`, rather than getting the most accurate prediction possible for `\(\hat{Y}\)` ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-27-1.png" width="504" /> ] --- # A Second Polynomial Example I .pull-left[ .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: How does a school district's average income affect Test scores? ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-29-1.png" width="504" /> ] --- # A Second Polynomial Example I .pull-left[ .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: How does a school district's average income affect Test scores? ] .smallest[ `$$\color{red}{\widehat{\text{Test Score}_i}=\hat{\beta_0}+\hat{\beta_1}\text{Income}_i}$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-30-1.png" width="504" /> ] --- # A Second Polynomial Example I .pull-left[ .bg-washed-green.b--dark-green.ba.bw2.br3.shadow-5.ph4.mt5[ .green[**Example**]: How does a school district's average income affect Test scores? 
] .quitesmall[ `$$\color{green}{\widehat{\text{Test Score}_i}=\hat{\beta_0}+\hat{\beta_1}\text{Income}_i+\hat{\beta_2}\text{Income}_i^2}$$` ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-31-1.png" width="504" /> ] --- # A Second Polynomial Example II .pull-left[ .tiny[ <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> {"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"607.30173501","3":"3.046219282","4":"199.362449","5":"0.000000e+00"},{"1":"avginc","2":"3.85099474","3":"0.304261693","4":"12.656850","5":"2.690099e-31"},{"1":"I(avginc^2)","2":"-0.04230846","3":"0.006260061","4":"-6.758474","5":"4.713383e-11"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-33-1.png" width="504" /> ] --- # A Second Polynomial Example III .pull-left[ .tiny[ <div data-pagedtable="false"> <script data-pagedtable-source type="application/json"> 
{"columns":[{"label":["term"],"name":[1],"type":["chr"],"align":["left"]},{"label":["estimate"],"name":[2],"type":["dbl"],"align":["right"]},{"label":["std.error"],"name":[3],"type":["dbl"],"align":["right"]},{"label":["statistic"],"name":[4],"type":["dbl"],"align":["right"]},{"label":["p.value"],"name":[5],"type":["dbl"],"align":["right"]}],"data":[{"1":"(Intercept)","2":"6.000790e+02","3":"5.8295880342","4":"102.936774","5":"4.611745e-298"},{"1":"avginc","2":"5.018677e+00","3":"0.8594537744","4":"5.839379","5":"1.056874e-08"},{"1":"I(avginc^2)","2":"-9.580515e-02","3":"0.0373591998","4":"-2.564433","5":"1.068452e-02"},{"1":"I(avginc^3)","2":"6.854842e-04","3":"0.0004719549","4":"1.452436","5":"1.471343e-01"}],"options":{"columns":{"min":{},"max":[10]},"rows":{"min":[10],"max":[10]},"pages":{}}} </script> </div> ] .smallest[ - Should we keep going? ] ] .pull-right[ <img src="3.8-slides_files/figure-html/unnamed-chunk-35-1.png" width="504" /> ] --- # Strategy for Polynomial Model Specification 1. Are there good theoretical reasons for relationships changing (e.g. increasing/decreasing returns)? -- 2. Plot your data: does a straight line fit well enough? -- 3. Specify a polynomial function of a higher power (start with 2) and estimate OLS regression -- 4. Use `\(t\)`-test to determine if higher-power term is significant -- 5. Interpret effect of change in `\(X\)` on `\(Y\)` -- 6. Repeat steps 3-5 as necessary
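
---

# Strategy for Polynomial Model Specification: A Sketch in R

.smallest[
The steps above can be sketched in base R. This is only an illustration under stated assumptions: the data are *simulated* (so the code runs without any packages), and the names `sim`, `reg_lin`, `reg_quad`, `reg_cubic`, `marginal_effect`, and `x_star` are made up for this sketch rather than taken from the course code.
]

```r
# ASSUMPTION: simulated data, not the course data. The true relationship is
# quadratic with diminishing returns and a turning point at x = 50.
set.seed(480)
n <- 500
x <- runif(n, min = 0, max = 100)
y <- 50 + 1.5 * x - 0.015 * x^2 + rnorm(n, sd = 5)
sim <- data.frame(x = x, y = y)

# Step 3: estimate linear, quadratic, and cubic models by OLS
reg_lin   <- lm(y ~ x, data = sim)
reg_quad  <- lm(y ~ x + I(x^2), data = sim)
reg_cubic <- lm(y ~ x + I(x^2) + I(x^3), data = sim)

# Step 4: p-value of the t-test on the highest-power term in each model
p_quad  <- summary(reg_quad)$coefficients["I(x^2)", "Pr(>|t|)"]
p_cubic <- summary(reg_cubic)$coefficients["I(x^3)", "Pr(>|t|)"]

# Step 5: marginal effect (b1 + 2 * b2 * x) and turning point (-b1 / (2 * b2))
b <- coef(reg_quad)
marginal_effect <- function(x) unname(b["x"] + 2 * b["I(x^2)"] * x)
x_star <- unname(-b["x"] / (2 * b["I(x^2)"]))

marginal_effect(c(5, 25, 50)) # slope of the fitted curve at three values of x
x_star                        # estimated x at which predicted y peaks
```

.smallest[
- The same base R calls (`coef()`, `summary()`) should work on `reg1` from the life expectancy example, with `GDP_t` and `GDP_sq` as the coefficient names
- Computing the marginal effect from the stored coefficients, rather than from rounded values typed by hand, avoids rounding error in `\(\hat{\beta_2}\)`
]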