The statistics profession is obstinate that we cannot say anything about causality
But you have to! It's how the human brain works!
We can’t conceive of (spurious) correlation without some causation
Source: British Medical Journal
“Correlation does not imply causation”
“Correlation implies causation”
“Correlation plus exogeneity is causation.”
Correlation: two variables tend to move together (an association in the data)
Causation: changing one variable brings about a change in the other
We will seek to understand what causality is and how we can approach finding it
We will also explore the different common research designs meant to identify causal relationships
These skills, more than supply & demand, constrained optimization models, IS-LM, etc., are the tools and comparative advantage of a modern research economist
Simultaneous “credibility revolution” in econometrics (c.1990s—2000s)
Use clever research designs to approximate natural experiments
Note: major disagreements between Pearl & Angrist/Imbens, etc.!
Example
If X is a light switch, and Y is a light: flipping the switch (X) changes whether the light is on (Y), so X causes Y
Example
The sine qua non of causal claims is counterfactuals: what would Y have been if X had been different?
It is impossible to make a counterfactual claim from data alone!
Need a (theoretical) causal model of the data-generating process!
Again, RCTs are invoked as the gold standard for their ability to make counterfactual claims:
Treatment/intervention (X) is randomly assigned to individuals
If person i who received treatment had not received the treatment, we can predict what his outcome would have been
If person j who did not receive treatment had received treatment, we can predict what her outcome would have been
RCTs are but the best-known method of a large, growing science of causal inference
We need a causal model to describe the data-generating process (DGP)
Requires us to make some assumptions
A visual model of the data-generating process, encodes our understanding of the causal relationships
Requires some common sense/economic intuition
Remember, all models are wrong, we just need them to be useful!
Suppose we have data on three variables:
IP: how much a firm spends on IP lawsuits
tech: whether a firm is in the tech industry
profit: firm profits
They are all correlated with each other, but what are the causal relationships?
We need our own causal model (from theory, intuition, etc.) to sort them out
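For illustration, here is a sketch of how one candidate model for these three variables could be encoded in R with the ggdag package (the arrows chosen here, tech affecting both IP and profit, and IP affecting profit, are just one assumption; your theory may differ):

# one candidate causal model (an assumption, not the only possibility):
# tech -> IP, tech -> profit, IP -> profit
library(ggdag)

ip_dag <- dagify(IP ~ tech,
                 profit ~ tech + IP,
                 exposure = "IP",      # treat IP spending as the X of interest
                 outcome  = "profit")  # and profit as the Y

ggdag(ip_dag) + theme_dag()

Under this particular DAG, tech would be a confounder of IP → profit and should be controlled for; drawing different arrows would change that conclusion.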
Consider all the variables likely to be important to the data-generating process (including variables we can't observe!)
For simplicity, combine some similar ones together or prune those that aren't very important
Consider which variables are likely to affect others, and draw arrows connecting them
Test some testable implications of the model (to see if we have a correct one!)
Drawing an arrow requires a direction - making a statement about causality!
Omitting an arrow makes an equally important statement too!
If two variables are correlated, but neither causes the other, likely they are both caused by another (perhaps unobserved) variable - add it!
There should be no cycles or loops (if so, there’s probably another missing variable, such as time)
Example: what is the effect of education on wages?
Education (X, “treatment” or “exposure”)
Wages (Y, “outcome” or “response”)
In social science and complex systems, 1000s of variables could plausibly be in the DAG!
So simplify:
Background, year of birth, location, and compulsory schooling laws all cause education
Background, year of birth, location, and job connections probably cause wages
Job connections, in fact, are probably caused by education!
Location and background are probably both caused by an unobserved factor (u1)
This is messy, but we have a causal model!
Makes our assumptions explicit, and many of them are testable
DAG suggests certain relationships that will not exist:
The only paths between laws and conx go through educ
So, controlling for educ, cor(laws, conx) should be zero!
Dagitty.net is a great tool to make these and give you testable implications
Click Model -> New Model
Name your "exposure" variable (X of interest) and "outcome" variable (Y)
Click and drag to move nodes around
Add a new variable by double-clicking
Add an arrow by double-clicking one variable and then double-clicking on the target (do again to remove arrow)
Dagitty reports the minimal sufficient adjustment set for estimating the total effect of educ on wage: {background, location, year}
Tells you some testable implications of your model
These are independencies or conditional independencies:
X⊥Y|Z
“X is independent of Y, given Z”
Example: look at the last one listed:
job_connections ⊥ year | educ
“Job connections are independent of year, controlling for education”
Controlling for educ, there should be no correlation between job_connections and year — can test this with data!
By controlling for background, location, and year, we can identify the causal effect of educ → wage.
The ggdag package lets you draw and analyze DAGs in R
In dagify(), the formula Y ~ X + Z means “Y is caused by X and Z”
# install.packages("ggdag")
library(ggdag)
dagify(wage ~ educ + conx + year + bckg + loc,
       educ ~ bckg + year + loc + laws,
       conx ~ educ,
       bckg ~ u1,
       loc ~ u1,
       exposure = "educ", # optional: define X
       outcome = "wage"   # optional: define Y
       ) %>%
  ggdag() +
  theme_dag()
You can also copy the code for a DAG you made on dagitty.net! Use dagitty() from the dagitty package, and paste the code in quotes
library(dagitty)
dagitty('dag {
bb="0,0,1,1"
background [pos="0.413,0.335"]
compulsory_schooling_laws [pos="0.544,0.076"]
educ [exposure,pos="0.185,0.121"]
job_connections [pos="0.302,0.510"]
location [pos="0.571,0.431"]
u1 [pos="0.539,0.206"]
wage [outcome,pos="0.552,0.761"]
year [pos="0.197,0.697"]
background -> educ
background -> wage
compulsory_schooling_laws -> educ
educ -> job_connections
educ -> wage
job_connections -> wage
location -> educ
location -> wage
u1 -> background
u1 -> location
year -> educ
year -> wage
}') %>%
  ggdag() +
  theme_dag()
Adding text = FALSE, use_labels = "name" inside ggdag() makes it easier to read
dagitty('dag {
bb="0,0,1,1"
background [pos="0.413,0.335"]
compulsory_schooling_laws [pos="0.544,0.076"]
educ [exposure,pos="0.185,0.121"]
job_connections [pos="0.302,0.510"]
location [pos="0.571,0.431"]
u1 [pos="0.539,0.206"]
wage [outcome,pos="0.552,0.761"]
year [pos="0.197,0.697"]
background -> educ
background -> wage
compulsory_schooling_laws -> educ
educ -> job_connections
educ -> wage
job_connections -> wage
location -> educ
location -> wage
u1 -> background
u1 -> location
year -> educ
year -> wage
}') %>%
  ggdag(., text = FALSE, use_labels = "name") +
  theme_dag()
If you have defined X (exposure) and Y (outcome), you can use ggdag_paths() to have it show all possible paths between X and Y!
dagify(wage ~ educ + conx + year + bckg + loc,
       educ ~ bckg + year + loc + laws,
       conx ~ educ,
       bckg ~ u1,
       loc ~ u1,
       exposure = "educ",
       outcome = "wage"
       ) %>%
  tidy_dagitty(seed = 2) %>%
  ggdag_paths() +
  theme_dag()
If you have defined X (exposure) and Y (outcome), you can use ggdag_adjustment_set() to have it show you what you need to control for in order to identify X→Y!
dagify(wage ~ educ + conx + year + bckg + loc,
       educ ~ bckg + year + loc + laws,
       conx ~ educ,
       bckg ~ u1,
       loc ~ u1,
       exposure = "educ",
       outcome = "wage"
       ) %>%
  ggdag_adjustment_set(shadow = TRUE) +
  theme_dag()
Use impliedConditionalIndependencies() from the dagitty package to have it show the testable implications (as on dagitty.net)
library(dagitty)
dagify(wage ~ educ + conx + year + bckg + loc,
       educ ~ bckg + year + loc + laws,
       conx ~ educ,
       bckg ~ u1,
       loc ~ u1,
       exposure = "educ",
       outcome = "wage"
       ) %>%
  impliedConditionalIndependencies()
## bckg _||_ conx | educ
## bckg _||_ laws
## bckg _||_ loc | u1
## bckg _||_ year
## conx _||_ laws | educ
## conx _||_ loc | educ
## conx _||_ u1 | bckg, loc
## conx _||_ u1 | educ
## conx _||_ year | educ
## educ _||_ u1 | bckg, loc
## laws _||_ loc
## laws _||_ u1
## laws _||_ wage | bckg, educ, loc, year
## laws _||_ year
## loc _||_ year
## u1 _||_ wage | bckg, loc
## u1 _||_ year
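As a sketch of how to take one of these implications to data (simulated data here purely for illustration; with real data you would use your own data frame), consider conx _||_ year | educ: controlling for educ, year should add nothing for predicting conx. The dagitty package can also run all such checks at once with localTests().

set.seed(42)

# simulate data roughly consistent with the DAG (illustration only)
n    <- 5000
bckg <- rnorm(n); year <- rnorm(n); loc <- rnorm(n); laws <- rnorm(n)
educ <- 0.5*bckg + 0.3*year + 0.4*loc + 0.6*laws + rnorm(n)
conx <- 0.8*educ + rnorm(n)
wage <- 0.7*educ + 0.5*conx + 0.4*bckg + 0.2*year + 0.3*loc + rnorm(n)

# test conx _||_ year | educ: conditional on educ, the coefficient on year
# should be statistically indistinguishable from zero
summary(lm(conx ~ year + educ))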
How do dagitty.net and ggdag
know how to identify effects, or what to control for, or what implications are testable?
Comes from fancy math called “do-calculus”
Typical notation:
X is the independent variable of interest
Y is the dependent or "response" variable
Other variables use other letters
You can of course use words instead of letters!
Arrows indicate causal effect (& direction)
Two types of causal effect:
Direct effects: X→Y
Indirect effects: X→M→Y
You of course might have both!
Z is a “confounder” of X→Y: it causes both X and Y
cor(X,Y) is made up of two parts:
Failing to control for Z will bias our estimate of the causal effect of X→Y!
Y_i = β_0 + β_1 X_i
By leaving out Z_i, this regression is biased
The estimate β̂_1 picks up both:
A causal “front-door” path: X→Y
A non-causal “back-door” path: X←Z→Y
† Regardless of the directions of the arrows!
Ideally, if we ran a randomized controlled trial and randomly assigned different values of X to different individuals, this would delete the arrow between Z and X
This would only leave the front-door, X→Y
But we can rarely run an ideal RCT
Instead of an RCT, if we can just “adjust for” or “control for” Z, we can block the back-door path X←Z→Y
This would only leave the front-door path open, X→Y
“As good as” an RCT!
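A minimal simulation sketch of this logic (all names and coefficient values are invented): Z causes both X and Y, the short regression of Y on X picks up the back-door path, and adding Z as a control closes it.

set.seed(1)
n <- 10000
Z <- rnorm(n)              # confounder
X <- 1.5*Z + rnorm(n)      # Z -> X
Y <- 2*X + 3*Z + rnorm(n)  # X -> Y (true causal effect = 2), Z -> Y

coef(lm(Y ~ X))["X"]       # biased: front-door plus back-door (well above 2)
coef(lm(Y ~ X + Z))["X"]   # back door blocked by controlling for Z: about 2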
Using our terminology from last class, we have an outcome (Y) and some treatment
But there are other unobserved factors (u)
Y_i = β_0 + β_1 Treatment_i + u_i
For β_1 to be causal, we need cor(treatment, u) = 0
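A quick sketch of why random assignment delivers this condition (simulated, with an assumed true effect of 2): a coin-flip treatment is unrelated to u by construction, so the simple regression recovers the causal effect without any controls.

set.seed(2)
n <- 10000
u <- rnorm(n)                         # unobserved factors
treatment <- rbinom(n, 1, 0.5)        # randomly assigned, so independent of u
Y <- 5 + 2*treatment + u              # assumed true treatment effect = 2

cor(treatment, u)                     # approximately 0 by design
coef(lm(Y ~ treatment))["treatment"]  # approximately 2, no controls needed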
Controlling for a single variable along a long causal path is sufficient to block that path!
Causal path: X→Y
Backdoor path: X←A→B→C→Y
It is sufficient to block this back-door by controlling for either A or B or C!
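A sketch checking this claim with the dagitty package (the DAG below is the one just described, X ← A → B → C → Y plus X → Y): adjustmentSets() should report {A}, {B}, and {C} as alternative minimal sufficient sets.

library(dagitty)

g <- dagitty("dag {
  A -> X
  A -> B
  B -> C
  C -> Y
  X -> Y
}")

# minimal sets of controls that close the back-door path X <- A -> B -> C -> Y
adjustmentSets(g, exposure = "X", outcome = "Y")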
To identify the causal effect of X→Y:
“Back-door criterion”: control for the minimal amount of variables sufficient to ensure that no open back-door exists between X and Y
Example: in this DAG, control for Z
1) You only need to control for the variables that keep a back-door open, not all other variables!
Example:
X←A→B→Y (back-door)
Need only control for A or B to block the back-door path
2) Exception: the case of a “collider”
Example:
Example: Are you less likely to get the flu if you are hit by a bus?
Hos: being in the hospital
Both Flu and Bus send you to Hos (arrows)
Conditional on being in Hos, negative correlation between Flu and Bus (spurious!)
In the NBA, players’ height has no relationship to points scored
Naturally, taller people score more points in a basketball game, but if you only look at NBA players, that relationship goes away
A person being in the NBA is a collider! Colliders are another way to see selection bias
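A simulation sketch of the flu/bus collider (all probabilities invented): flu and bus accidents are unrelated in the population, but among hospital patients they become negatively correlated.

set.seed(3)
n   <- 100000
flu <- rbinom(n, 1, 0.10)               # 10% catch the flu
bus <- rbinom(n, 1, 0.01)               # 1% get hit by a bus, independent of flu
hos <- as.numeric(flu == 1 | bus == 1)  # either condition sends you to the hospital

cor(flu, bus)                           # roughly 0 in the whole population
cor(flu[hos == 1], bus[hos == 1])       # negative among hospital patients: collider bias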
Example:
X→M→Y (front-door), X←B→Y (back-door)
Should we control for M?
If we control for M, we would block the front-door!
If we can estimate X→M and M→Y (note, no back-doors to either of these!), we can estimate X→Y
The tobacco industry claimed that cor(smoking, cancer) could be spurious due to a confounding gene that affects both!
gene is unobservable
Suppose smoking causes tar buildup in lungs, which causes cancer
We should not control for tar, it's on the front-door path
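A simulation sketch of the front-door logic for this story (all coefficients invented): even with gene unobserved, chaining the smoking → tar effect with the tar → cancer effect (controlling for smoking) recovers the causal effect of smoking on cancer.

set.seed(4)
n       <- 10000
gene    <- rnorm(n)                     # unobserved confounder
smoking <- 0.8*gene + rnorm(n)          # gene -> smoking
tar     <- 1.5*smoking + rnorm(n)       # smoking -> tar (front-door path)
cancer  <- 2*tar + 1.2*gene + rnorm(n)  # tar -> cancer, gene -> cancer
# true causal effect of smoking on cancer = 1.5 * 2 = 3

coef(lm(cancer ~ smoking))["smoking"]                 # naive estimate: biased by gene
b_sm_tar <- coef(lm(tar ~ smoking))["smoking"]        # smoking -> tar: no open back door
b_tar_ca <- coef(lm(cancer ~ tar + smoking))["tar"]   # tar -> cancer, back door blocked by controlling for smoking
b_sm_tar * b_tar_ca                                   # about 3: the front-door estimate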
Thus, to achieve causal identification, control for the minimal set of variables such that:
No back-door path remains open
No front-door path is closed