Examples:
Note: we can test a lot of hypotheses about a lot of population parameters, e.g.
We will focus on hypotheses about the population regression slope (^β1), i.e., the causal effect† of X on Y
† With a model this simple, it's almost certainly not causal, but this is the ultimate direction we are heading...
Null hypothesis assigns a value (or a range) to a population parameter
Alternative hypothesis must mathematically contradict the null hypothesis
A null hypothesis, H0
An alternative hypothesis, Ha
A test statistic to determine if we reject H0 when the statistic reaches a "critical value"
A conclusion whether or not to reject H0 in favor of Ha
Sample statistic (^β1) will rarely be exactly equal to the hypothesized parameter (β1)
Difference between observed statistic and true parameter could be because:
Parameter is not the hypothesized value
Parameter is truly hypothesized value but sampling variability gave us a different estimate
We cannot distinguish between these two possibilities with any certainty
Type I error (false positive): rejecting H0 when it is in fact true
Type II error (false negative): failing to reject H0 when it is in fact false
William Blackstone
(1723-1780)
"It is better that ten guilty persons escape than that one innocent suffer."
Blackstone, William, 1765-1770, Commentaries on the Laws of England
α=P(Reject H0|H0 is true)
The confidence level is defined as (1−α)
The probability of a Type II error is defined as β:
β=P(Don't reject H0|H0 is false)
Power=1−β=P(Reject H0|H0 is false)
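These definitions can be made concrete with a quick simulation. The sketch below (in Python, purely for illustration; the setup and all names are my own, not from the slides) shows that when H0 is really true and we reject whenever |z| exceeds the 5% critical value, we commit a Type I error about 5% of the time:

```python
import math
import random
import statistics

random.seed(42)

alpha = 0.05
z_crit = 1.96            # two-sided critical value for alpha = 0.05
n, reps = 50, 2000
rejections = 0

for _ in range(reps):
    # Draw a sample from a world where H0 (mu = 0) is TRUE, with known sigma = 1
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = statistics.mean(sample) * math.sqrt(n)   # z-statistic for H0: mu = 0
    if abs(z) > z_crit:
        rejections += 1   # rejecting a true H0: a Type I error (false positive)

type_i_rate = rejections / reps
print(round(type_i_rate, 3))  # should land near alpha = 0.05
```

The rejection rate hovers around α by construction: α is exactly the test's false-positive rate when the null holds.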
p(δ≥δi|H0 is true)
After running our test, we need to make a decision between the competing hypotheses
Compare p-value with pre-determined α (commonly, α=0.05, 95% confidence level)
If p<α: statistically significant evidence sufficient to reject H0 in favor of Ha
If p≥α: insufficient evidence to reject H0
Sir Ronald A. Fisher
(1890-1962)
"The null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis."
1935, The Design of Experiments
Modern philosophy of science is largely based on hypothesis testing and falsifiability, which form the "Scientific Method"†
For something to be "scientific", it must be falsifiable, or at least testable
Hypotheses can be corroborated with evidence, but remain tentative until falsified by data suggesting an alternative hypothesis
"All swans are white" is a hypothesis rejected upon discovery of a single black swan
We will use an R package called infer:
Calculate a statistic, δi†, from a sample of data
Simulate a world where δ is null (H0)
Examine the distribution of δ across the null world
Calculate the probability that δi could exist in the null world
Decide if δi is statistically significant
† δ can stand in for any test statistic in any hypothesis test! For our purposes, δ is the slope of our regression sample, ^β1.
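The five steps above can be sketched in code. Here is a minimal, illustrative Python version with a made-up dataset (the course does this in R with infer; the data and all names here are my own):

```python
import random
import statistics

def slope(x, y):
    """OLS slope: cov(x, y) / var(x)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

random.seed(1)
# Toy data loosely inspired by the class-size example (made up, not CASchool)
x = [random.uniform(14, 26) for _ in range(100)]
y = [700 - 2.3 * xi + random.gauss(0, 10) for xi in x]

delta_i = slope(x, y)                     # 1. calculate a statistic from the sample

null_slopes = []
for _ in range(1000):                     # 2. simulate a null world (H0)...
    y_perm = random.sample(y, len(y))     #    ...by permuting y, breaking any x-y link
    null_slopes.append(slope(x, y_perm))  # 3. distribution of delta across the null world

# 4. probability of a statistic at least as extreme as ours (two-sided)
p_value = sum(abs(s) >= abs(delta_i) for s in null_slopes) / 1000

print(delta_i < 0, p_value < 0.05)        # 5. decide if delta_i is statistically significant
```

Permuting y enforces β1 = 0 by construction, which is exactly what `generate(type = "permute")` does in infer below.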
lm() runs this hypothesis test (H0: β1 = 0, Ha: β1 ≠ 0) automatically, but infer allows you to run through these steps manually to understand the process:
specify() a model
hypothesize() the null
generate() simulations of the null world
calculate() the statistic in each simulation (then get_p_value())
visualize() with a histogram (optional)
Test statistic (δ): measures how far what we observed in our sample (^β1) is from what we would expect if the null hypothesis were true (β1=0)
Rejection region: if the test statistic reaches a "critical value" of δ, then we reject the null hypothesis
† Again, see last class's appendix for more on the t-distribution. k is the number of independent variables our model has, in this case, with just one X, k=1. We use two degrees of freedom to calculate ^β0 and ^β1, hence we have n−2 df.
Our world, and a world where β1=0 by assumption.
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 698.932952 | 9.4674914 | 73.824514 | 6.569925e-242 |
| str | -2.279808 | 0.4798256 | -4.751327 | 2.783307e-06 |
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 647.8027952 | 9.7147718 | 66.6822452 | 6.997699e-225 |
| str | 0.3235038 | 0.4923581 | 0.6570499 | 5.115104e-01 |
# save as sample_slope
sample_slope <- school_reg_tidy %>%  # this is the regression tidied with broom's tidy()
  filter(term == "str") %>%
  pull(estimate)

# confirm what it is
sample_slope
## [1] -2.279808
data %>% specify(y ~ x)
The specify() function essentially plays the role of lm() for regression (for our purposes):

CASchool %>%
  specify(testscr ~ str)
| testscr | str |
|---|---|
| 690.8 | 17.88991 |
| 661.2 | 21.52466 |
| 643.6 | 18.69723 |
%>% hypothesize(null = "independence")
In infer's language, we are hypothesizing that str and testscr are independent (β1 = 0)†:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence")
| testscr | str |
|---|---|
| 690.8 | 17.88991 |
| 661.2 | 21.52466 |
| 643.6 | 18.69723 |
%>% generate(reps = n, type = "permute")
Choose the number of reps and set the type equal to "permute": a permutation (not a bootstrap!) because we are simulating a world where β1 = 0 by construction!

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute")
%>% calculate(stat = "")
We calculate() the sample statistic for each of the 1,000 replicate samples. In our case, we calculate the slope (^β1) for each replicate:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope")
Other stats available for calculation: "mean", "median", "prop", "diff in means", "diff in props", etc. (see the package information)
%>% get_p_value(obs_stat = "", direction = "both")
We can calculate the p-value of our sample_slope (-2.28) in our simulated null distribution. For the two-sided alternative Ha: β1 ≠ 0, we double the raw p-value:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  get_p_value(obs_stat = sample_slope, direction = "both")
| p_value |
|---|
| 0 |
%>% visualize()
CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize()
%>% visualize()
We can add our sample_slope to show our finding on the null distribution:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize(obs_stat = sample_slope)
%>% visualize() + shade_p_value()
Add shade_p_value() to see what p is:

CASchool %>%
  specify(testscr ~ str) %>%
  hypothesize(null = "independence") %>%
  generate(reps = 1000, type = "permute") %>%
  calculate(stat = "slope") %>%
  visualize(obs_stat = sample_slope) +
  shade_p_value(obs_stat = sample_slope, direction = "two_sided")
infer's visualize() function is just a wrapper function for ggplot(). We can take the simulations tibble and just ggplot() a normal histogram:

simulations %>%
  ggplot(data = .) +
  aes(x = stat) +
  geom_histogram(color = "white", fill = "#e64173") +
  geom_vline(xintercept = sample_slope, color = "blue", size = 2, linetype = "dashed") +
  annotate(geom = "label", x = -2.28, y = 100,
           label = expression(paste("Our ", hat(beta)[1])), color = "blue") +
  scale_y_continuous(lim = c(0, 120), expand = c(0, 0)) +
  labs(x = expression(paste("Sampling distribution of ", hat(beta)[1],
                            " under ", H[0], ": ", beta[1] == 0)),
       y = "Samples") +
  theme_classic(base_family = "Fira Sans Condensed", base_size = 20)
R does things the old-fashioned way, using a theoretical null distribution instead of simulating one
A t-distribution with n−k−1 df†
Calculate a t-statistic for ^β1:
test statistic = (estimate − null hypothesis) / (standard error of estimate)
† k is the number of X variables.
t has the same interpretation as Z: the number of standard deviations away from the sampling distribution's expected value E[^β1]† (if H0 were true)
Compares to a critical value of t∗ (pre-determined by α-level & n−k−1 df)
† The expected value is 0, because our null hypothesis was β1=0
‡ Again, the 68-95-99.7% empirical rule!
t = (^β1 − β1,0) / se(^β1) = (−2.28 − 0) / 0.48 ≈ −4.75
p-value: prob. of a test statistic at least as large (in magnitude) as ours if the null hypothesis were true
2 × p(t418 > |−4.75|) = 0.0000028
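As a check on the arithmetic, here is a small sketch (in Python, my own, using the estimate and standard error from the regression output; with 418 df the t-distribution is close to standard normal, so a normal approximation suffices for illustration):

```python
from statistics import NormalDist

beta1_hat = -2.28    # estimated slope (from the slides)
beta1_null = 0       # hypothesized value under H0
se = 0.48            # standard error of the estimate

t = (beta1_hat - beta1_null) / se
# Two-sided p-value, using the standard normal as a large-df approximation
p_approx = 2 * (1 - NormalDist().cdf(abs(t)))

print(round(t, 2))   # -4.75
print(p_approx)      # on the order of 1e-6 (the exact t-dist value is 2.8e-06)
```

The normal approximation gives a p-value slightly smaller than R's exact t-distribution answer, but both are far below any conventional α.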
Ha:β1<0
p-value: p(t≤ti)
Ha:β1>0
p-value: p(t≥ti)
Ha:β1≠0
p-value: 2×p(t≥|ti|)
pt() calculates probabilities on a t-distribution, with arguments:
df = the degrees of freedom
lower.tail = TRUE if looking at the area to the LEFT of the value, FALSE if looking at the area to the RIGHT of the value

2 * pt(4.75,           # I'll double the right tail
       df = 418,
       lower.tail = F) # right tail
## [1] 2.800692e-06
summary(school_reg)
## 
## Call:
## lm(formula = testscr ~ str, data = CASchool)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -47.727 -14.251   0.483  12.822  48.540 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 698.9330     9.4675  73.825  < 2e-16 ***
## str          -2.2798     0.4798  -4.751 2.78e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.58 on 418 degrees of freedom
## Multiple R-squared:  0.05124, Adjusted R-squared:  0.04897 
## F-statistic: 22.58 on 1 and 418 DF,  p-value: 2.783e-06
broom's tidy() (with confidence intervals):

tidy(school_reg, conf.int = TRUE)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 698.932952 | 9.4674914 | 73.824514 | 6.569925e-242 |
| str | -2.279808 | 0.4798256 | -4.751327 | 2.783307e-06 |
The p-value for str is 0.00000278.

H0: β1 = 0
Ha: β1 ≠ 0
Because the hypothesis test's p-value < α (0.05)...
We have sufficient evidence to reject H0 in favor of our alternative hypothesis. Our sample suggests that there is a relationship between class size and test scores.
Using the confidence intervals:
We are 95% confident that, from similarly constructed samples, the true marginal effect of class size on test scores is between -3.22 and -1.34.
Confidence intervals are all two-sided by nature: CI0.95 = (^β1 − 2 × se(^β1), ^β1 + 2 × se(^β1)), where 2 × se(^β1) is the margin of error (MOE)
Hypothesis test (t-test) of H0: β1 = 0 computes a t-value1 of t = ^β1 / se(^β1), and p < 0.05 when t ≥ 2 (approximately)
1 Since our null hypothesis is that β1,0 = 0, the test statistic simplifies to this neat fraction.
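To connect the interval to the reported numbers, a small sketch (in Python, my own) reproduces the confidence interval from the estimate and standard error, assuming a critical value t* ≈ 1.97 for 418 df (close to the rule-of-thumb 2):

```python
beta1_hat = -2.279808   # estimate from the regression output
se = 0.4798256          # its standard error
t_star = 1.9657         # assumed approximate 97.5th percentile of t with 418 df

moe = t_star * se                  # margin of error
ci = (beta1_hat - moe, beta1_hat + moe)
print(round(ci[0], 2), round(ci[1], 2))  # roughly (-3.22, -1.34), as on the slide
```

Because 0 lies outside this interval, the two-sided test at α = 0.05 rejects H0: β1 = 0; the CI and the t-test are two views of the same calculation.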
Consider what 95% confident or α=0.05 means
If we repeat a procedure 20 times, we should expect 1 in 20 (5%) to produce a fluke result!
Image source: Seeing Theory
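A quick back-of-the-envelope calculation (in Python, my own sketch) makes the point concrete:

```python
# If each of 20 independent tests has a 5% false-positive rate,
# flukes are not rare at all.
alpha = 0.05
n_tests = 20

expected_flukes = alpha * n_tests                  # expected number of false positives
p_at_least_one_fluke = 1 - (1 - alpha) ** n_tests  # chance of at least one

print(round(expected_flukes, 2))        # 1.0 -> expect about 1 fluke in 20 tests
print(round(p_at_least_one_fluke, 2))   # 0.64
```

This is why running many tests and reporting only the "significant" ones distorts inference.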
“The widespread use of 'statistical significance' (generally interpreted as 'p ≤ 0.05') as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process.”
Wasserstein, Ronald L. and Nicole A. Lazar (2016), "The ASA's Statement on p-Values: Context, Process, and Purpose," The American Statistician 70(2): 129-133
“No economist has achieved scientific success as a result of a statistically significant coefficient. Massed observations, clever common sense, elegant theorems, new policies, sagacious economic reasoning, historical perspective, relevant accounting, these have all led to scientific success. Statistical significance has not,” (p.112).
McCloskey, Deirdre N. and Stephen Ziliak, 1996, The Cult of Statistical Significance
❌ p is the probability that the alternative hypothesis is false
❌ p is the probability that the null hypothesis is true
❌ p is the probability that our observed effects were produced purely by random chance
❌ p tells us how significant our finding is
Again, p-value is the probability that, if the null hypothesis were true, we obtain (by pure random chance) a test statistic at least as extreme as the one we estimated for our sample
A low p-value means either (and we can't distinguish which):
The null hypothesis is false (the parameter is truly not the hypothesized value), or
The null hypothesis is true, but sampling variability gave us an improbable estimate by random chance
|  | Test Score |
|---|---|
| Intercept | 698.93 *** |
|  | (9.47) |
| STR | -2.28 *** |
|  | (0.48) |
| N | 420 |
| R-Squared | 0.05 |
| SER | 18.58 |

*** p < 0.001; ** p < 0.01; * p < 0.05.
Statistical significance is shown by asterisks, common (but not always!) standard:
Rare, but sometimes regression tables include p-values for estimates
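As an illustration of that convention, here is a tiny helper (in Python; my own naming, not from any package) mapping a p-value to the stars shown in the table above:

```python
def significance_stars(p):
    """Map a p-value to the common (but not universal!) star convention."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not statistically significant at conventional levels

print(significance_stars(2.78e-06))  # *** -> matches the stars on str above
print(significance_stars(0.03))      # *
print(significance_stars(0.2))       # (empty: not significant)
```

Note the thresholds are conventions, not laws; some tables use different cutoffs, which is why the legend matters.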