4.1 — Panel Data and Fixed Effects

ECON 480 • Econometrics • Fall 2021

Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsF21
metricsF21.classes.ryansafner.com

Outline

Pooled Regression Model

Fixed Effects Model

Least Squares Dummy Variable Approach

De-Meaned Approach

Two-Way Fixed Effects

Pooled Regression Model

Types of Data I

Cross-sectional data: compare different individual $i$'s at same time $\bar{t}$

state <fct>	year <fct>	deaths <dbl>
Alabama	2012	13.316056
Alaska	2012	12.311976
Arizona	2012	13.720419
Arkansas	2012	16.466730
California	2012	8.756507
Colorado	2012	10.092204

Time-series data: track same individual $\bar{i}$ over different times $t$

state <fct>	year <fct>	deaths <dbl>
Maryland	2007	10.866679
Maryland	2008	10.740963
Maryland	2009	9.892754
Maryland	2010	8.783883
Maryland	2011	8.626745
Maryland	2012	8.941916

Types of Data I

Cross-sectional data: compare different individual $i$'s at same time $\bar{t}$

Time-series data: track same individual $\bar{i}$ over different times $t$

Panel data: combines these dimensions: compare all individual $i$'s over all times $t$
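
The three data shapes can be sketched by slicing one small panel. This is a minimal illustration in base R; the data frame `panel` is a hypothetical toy object whose six values are copied from the Alabama and Alaska rows of the slides' table.

```r
# Toy panel: 2 states (i) observed in 3 years (t)
panel <- data.frame(
  state  = rep(c("Alabama", "Alaska"), each = 3),
  year   = rep(2007:2009, times = 2),
  deaths = c(18.075232, 16.289227, 13.833678,
             16.301184, 12.744090, 12.973849)
)

# Cross-section: every individual i, one fixed year
cross_section <- panel[panel$year == 2007, ]

# Time series: one fixed individual, every year t
time_series <- panel[panel$state == "Alabama", ]

nrow(panel)          # N * T rows
nrow(cross_section)  # N rows
nrow(time_series)    # T rows
```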

Panel Data I

Panel Data II

state <fct>	year <fct>	deaths <dbl>
Alabama	2007	18.075232
Alabama	2008	16.289227
Alabama	2009	13.833678
Alabama	2010	13.434084
Alabama	2011	13.771989
Alabama	2012	13.316056
Alaska	2007	16.301184
Alaska	2008	12.744090
Alaska	2009	12.973849
Alaska	2010	11.670893

Panel or Longitudinal data contains
- repeated observations
- on multiple individuals
Thus, our regression equation looks like:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$$

for individual $i$ in time $t$.

Panel Data: Our Motivating Example

state <fct>	year <fct>	deaths <dbl>
Alabama	2007	18.075232
Alabama	2008	16.289227
Alabama	2009	13.833678
Alabama	2010	13.434084
Alabama	2011	13.771989
Alabama	2012	13.316056
Alaska	2007	16.301184
Alaska	2008	12.744090
Alaska	2009	12.973849
Alaska	2010	11.670893

Example: Do cell phones cause more traffic fatalities?

No measure of cell phones used while driving
- cell_plans as a proxy for cell phone usage
State-level data over 6 years

The Data I

glimpse(phones)

## Rows: 306
## Columns: 8
## $ year          <fct> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 20…
## $ state         <fct> Alabama, Alaska, Arizona, Arkansas, California, Colorado…
## $ urban_percent <dbl> 30, 55, 45, 21, 54, 34, 84, 31, 100, 53, 39, 45, 11, 56,…
## $ cell_plans    <dbl> 8135.525, 6730.282, 7572.465, 8071.125, 8821.933, 8162.0…
## $ cell_ban      <fct> 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ text_ban      <fct> 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ deaths        <dbl> 18.075232, 16.301184, 16.930578, 19.595430, 12.104340, 1…
## $ year_num      <dbl> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 20…

The Data II

phones %>%
  count(state)

state <fct>	n <int>
Alabama	6
Alaska	6
Arizona	6
Arkansas	6
California	6
Colorado	6
Connecticut	6
Delaware	6
District of Columbia	6
Florida	6

phones %>%
  count(year)

year <fct>	n <int>
2007	51
2008	51
2009	51
2010	51
2011	51
2012	51

The Data III

phones %>%
  distinct(state)

state <fct>
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
District of Columbia
Florida

phones %>%
  distinct(year)

year <fct>
2007
2008
2009
2010
2011
2012

The Data IV

phones %>%
  summarize(States = n_distinct(state),
            Years = n_distinct(year))

States <int>	Years <int>
51	6
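
With 51 states and 6 years, a balanced panel has 51 × 6 = 306 rows, which matches the row count reported by glimpse(). A minimal base R sketch of that check follows; `toy` is a hypothetical stand-in for `phones`, not the course data.

```r
# Hypothetical stand-in for `phones`: 3 states x 4 years, one row per pair
toy <- expand.grid(state = c("A", "B", "C"), year = 2007:2010)

n_states <- length(unique(toy$state))
n_years  <- length(unique(toy$year))

# Balanced panel: every state-year combination appears exactly once
cell_counts <- table(toy$state, toy$year)
is_balanced <- all(cell_counts == 1) && nrow(toy) == n_states * n_years
is_balanced

# Dropping a single row leaves one state-year cell empty (unbalanced)
toy_unbalanced <- toy[-1, ]
still_balanced <- all(table(toy_unbalanced$state, toy_unbalanced$year) == 1)
still_balanced
```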

Pooled Regression I

What if we just ran a standard regression:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$$

$N$: number of groups $i$ (e.g. U.S. States)
$T$: number of periods $t$ (e.g. years)

This is a pooled regression model: it treats all $N \times T$ observations as independent

Pooled Regression II

pooled <- lm(deaths ~ cell_plans, data = phones)
pooled %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
(Intercept)	17.3371034167	0.975384504	17.774635	5.821724e-49
cell_plans	-0.0005666385	0.000106975	-5.296926	2.264086e-07

Pooled Regression III

ggplot(data = phones)+
  aes(x = cell_plans,
      y = deaths)+
  geom_point()+
  geom_smooth(method = "lm", color = "red")+
  labs(x = "Cell Phones Per 10,000 People",
       y = "Deaths Per Billion Miles Driven")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=14)

Recap: Assumptions about Errors

Recall the 4 critical assumptions about $u$:

1) The expected value of the errors is 0: $E[u] = 0$
2) The variance of the errors over $X$ is constant: $var(u|X) = \sigma_u^2$
3) Errors are not correlated across observations: $cor(u_i, u_j) = 0 \quad \forall i \neq j$
4) There is no correlation between $X$ and the error term: $cor(X, u) = 0$ or $E[u|X] = 0$

Biases of Pooled Regression

Assumption 3: $cor(u_i, u_j) = 0 \quad \forall i \neq j$
Pooled regression model is biased because it ignores:
- Multiple observations from same group $i$
- Multiple observations from same time $t$
Thus, errors are serially or auto-correlated: $cor(u_i, u_j) \neq 0$ within same $i$ and within same $t$

Biases of Pooled Regression: Our Example

Multiple observations from same state $i$
- Probably similarities among $u$ for obs in same state
- Residuals on observations from same state are likely correlated
Multiple observations from same year $t$
- Probably similarities among $u$ for obs in same year
- Residuals on observations from same year are likely correlated
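
One way to see this clustering is to simulate it. The sketch below uses made-up data (not `phones`): each hypothetical group gets its own time-invariant shock, a pooled regression is run, and we then ask how much of the residual variance group membership alone explains. All names and parameter values are assumptions chosen for illustration.

```r
set.seed(480)
n_groups  <- 20
n_periods <- 6
group <- factor(rep(seq_len(n_groups), each = n_periods))

# Each group has a time-invariant shock that the pooled model leaves in u
alpha <- rnorm(n_groups, sd = 3)[as.integer(group)]
x <- rnorm(n_groups * n_periods)
y <- 1 + 0.5 * x + alpha + rnorm(n_groups * n_periods)

pooled_toy <- lm(y ~ x)
res <- resid(pooled_toy)

# Share of pooled-residual variance explained by group membership alone;
# it would be close to 0 if residuals were independent across observations
r2_by_group <- summary(lm(res ~ group))$r.squared
r2_by_group
```

A large share here means residuals from the same group move together, which is exactly what assumption 3 rules out.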

Example: Consider Just 5 States

phones %>%
  filter(state %in% c("District of Columbia",
                      "Maryland", "Texas",
                      "California", "Kansas")) %>%
ggplot(data = .)+
  aes(x = cell_plans,
      y = deaths,
      color = state)+
  geom_point()+ 
  geom_smooth(method = "lm")+
  labs(x = "Cell Phones Per 10,000 People",
       y = "Deaths Per Billion Miles Driven",
       color = NULL)+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=14)+
  theme(legend.position = "top")

Example: Consider Just 5 States

phones %>%
  filter(state %in% c("District of Columbia",
                      "Maryland", "Texas",
                      "California", "Kansas")) %>%
ggplot(data = .)+
  aes(x = cell_plans,
      y = deaths,
      color = state)+ 
  geom_point()+ 
  geom_smooth(method = "lm")+ 
  labs(x = "Cell Phones Per 10,000 People",
       y = "Deaths Per Billion Miles Driven",
       color = NULL)+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=14)+
  theme(legend.position = "none")+
  facet_wrap(~state, ncol=3)

Look at All States

ggplot(data = phones)+
  aes(x = cell_plans,
      y = deaths,
      color = state)+ 
  geom_point()+ 
  geom_smooth(method = "lm")+ 
  labs(x = "Cell Phones Per 10,000 People",
       y = "Deaths Per Billion Miles Driven",
       color = NULL)+
  theme_bw(base_family = "Fira Sans Condensed")+
  theme(legend.position = "none")+
  facet_wrap(~state, ncol=7)

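The Bias in our Pooled Regression

$$\text{Deaths}_{it} = \beta_0 + \beta_1 \, \text{Cell phones}_{it} + u_{it}$$

$\text{Cell phones}_{it}$ is endogenous: $cor(u_{it}, \text{Cell phones}_{it}) \neq 0$, so $E[u_{it} | \text{Cell phones}_{it}] \neq 0$

Things in $u_{it}$ correlated with $\text{Cell phones}_{it}$:
- infrastructure spending, population, urban vs. rural, more/less cautious citizens, cultural attitudes towards driving, texting, etc

A lot of these things vary systematically by State!
- $cor(u_{it_1}, u_{it_2}) \neq 0$
  - Error in State $i$ during $t_1$ correlates with error in State $i$ during $t_2$
  - things in State $i$ that don't change over time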
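
A small simulation can make the direction of this bias concrete. In the sketch below (made-up data, hypothetical parameter values), a time-invariant group factor `alpha` raises both `x` and `y`, while the true effect of `x` on `y` is -1. The pooled regression leaves `alpha` in the error term.

```r
set.seed(480)
n_groups  <- 50
n_periods <- 6
group <- rep(seq_len(n_groups), each = n_periods)

# Time-invariant group factor that drives both x and y
alpha <- rnorm(n_groups, sd = 2)[group]
x <- 2 * alpha + rnorm(length(group))
y <- 1 - 1 * x + 3 * alpha + rnorm(length(group))  # true beta_1 = -1

# Pooled regression: alpha is omitted, so it sits in u and correlates with x
pooled_sim   <- lm(y ~ x)
pooled_slope <- unname(coef(pooled_sim)["x"])
pooled_slope
```

Under these assumed values the omitted factor is strong enough that the pooled slope should come out positive even though the true effect is negative.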

Fixed Effects Model

Fixed Effects: DAG

A simple pooled model likely contains lots of omitted variable bias
Many (often unobservable) factors that determine both Phones & Deaths
- Culture, infrastructure, population, geography, institutions, etc
But the beauty of this is that most of these factors systematically vary by U.S. State and are stable over time!
We can simply “control for State” to safely remove the influence of all of these factors!

Fixed Effects: Decomposing $u_{it}$

Much of the endogeneity in $X_{it}$ can be explained by systematic differences across $i$ (groups)
Exploit the systematic variation across groups with a fixed effects model
Decompose the model error term into two parts:

$$u_{it} = \alpha_i + \epsilon_{it}$$

Fixed Effects: $\alpha_i$

Decompose the model error term into two parts:

$$u_{it} = \alpha_i + \epsilon_{it}$$

$\alpha_i$ are group-specific fixed effects
- group $i$ tends to have higher or lower $Y$ than other groups given regressor(s) $X_{it}$
- estimate a separate $\alpha_i$ for each group $i$
- essentially, estimate a separate constant (intercept) for each group
- notice this is stable over time within each group (subscript only $i$, no $t$)
This includes all factors that do not change within group $i$ over time

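Fixed Effects: $\epsilon_{it}$

Decompose the model error term into two parts:

$$u_{it} = \alpha_i + \epsilon_{it}$$

$\epsilon_{it}$ is the remaining random error
- As usual in OLS, assume the 4 typical assumptions about this error:
- $E[\epsilon_{it}] = 0$, $var[\epsilon_{it}] = \sigma_\epsilon^2$, $cor(\epsilon_{it}, \epsilon_{jt}) = 0$, $cor(\epsilon_{it}, X_{it}) = 0$
$\epsilon_{it}$ includes all other factors affecting $Y_{it}$ not contained in group effect $\alpha_i$
- i.e. differences within each group that change over time
- Be careful: $X_{it}$ can still be endogenous due to other factors!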

Fixed Effects: New Regression Equation

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \alpha_i + \epsilon_{it}$$

We've pulled $\alpha_i$ out of the original error term into the regression
Essentially we’ll estimate an intercept for each group (minus one, which is $\beta_0$)
- avoiding the dummy variable trap
Must have multiple observations (over time) for each group (i.e. panel data)

Fixed Effects: Our Example

$$\text{Deaths}_{it} = \beta_0 + \beta_1 \, \text{Cell phones}_{it} + \alpha_i + \epsilon_{it}$$

$\alpha_i$ is the State fixed effect
- Captures everything unique about each state that does not change over time
- culture, institutions, history, geography, climate, etc!
There could still be factors in $\epsilon_{it}$ that are correlated with $\text{Cell phones}_{it}$!
- things that do change over time within States
- perhaps individual States have cell phone bans for some years in our data

Estimating Fixed Effects Models

Two methods to estimate fixed effects models:

Least Squares Dummy Variable (LSDV) approach
De-meaned data approach

Least Squares Dummy Variable Approach

Least Squares Dummy Variable Approach

A dummy variable $D_i = \{0,1\}$ for each possible group
- $D_i = 1$ if observation $it$ is from group $i$, otherwise $D_i = 0$

If there are $N$ groups:
- Include $N-1$ dummies (to avoid dummy variable trap) and $\beta_0$ is the reference category^†
- So we are estimating a different intercept for each group

Sounds like a lot of work, but it is automatic in R

^† If we do not estimate $\beta_0$, we could include all $N$ dummies. In either case, $\beta_0$ takes the place of one category-dummy.
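
Written out, the dummy-variable regression might look like the following. This is a sketch under one assumed labelling: group 1 is the reference category, and $D_{j,i}$ (an indexing introduced here for illustration) equals 1 when observation $it$ belongs to group $j$.

```latex
Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 D_{2,i} + \beta_3 D_{3,i} + \cdots + \beta_N D_{N,i} + \epsilon_{it}
```

Under this labelling the intercept for the reference group is $\beta_0$, and the intercept for any other group $j$ is $\beta_0 + \beta_j$, so each dummy coefficient measures that group's fixed effect relative to the reference group.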

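Least Squares Dummy Variable Approach: Our Example

Example:

$$\text{Deaths}_{it} = \beta_0 + \beta_1 \, \text{Cell phones}_{it} + \beta_2 \, \text{Alaska}_i + \cdots + \beta_{51} \, \text{Wyoming}_i + \epsilon_{it}$$

Let Alabama be the reference category ($\beta_0$), include dummies for all other States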

Our Example in R I

If state is a factor variable, just include it in the regression
R automatically creates dummy variables and includes them in the regression
- Keeps intercept and leaves out first group dummy
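
To see what R builds behind the scenes, model.matrix() shows the design matrix that lm() would use. The sketch below uses a hypothetical three-state toy factor rather than `phones`; relevel() is one way to pick a different reference category.

```r
toy <- data.frame(
  state = factor(c("Alabama", "Alaska", "Arizona",
                   "Alabama", "Alaska", "Arizona"))
)

# Intercept plus N - 1 dummies; the first level (Alabama) is the reference
X <- model.matrix(~ state, data = toy)
colnames(X)   # "(Intercept)" "stateAlaska" "stateArizona"

# Make Arizona the reference category instead
toy$state  <- relevel(toy$state, ref = "Arizona")
X_releveled <- model.matrix(~ state, data = toy)
colnames(X_releveled)   # "(Intercept)" "stateAlabama" "stateAlaska"
```

Changing the reference category changes the intercept and the dummy coefficients, but not the slope on the other regressors.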

Our Example in R II

fe_reg_1 <- lm(deaths ~ cell_plans + state, data = phones)
fe_reg_1 %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
(Intercept)	25.507679925	1.0176400289	25.06552337	1.241581e-70
cell_plans	-0.001203742	0.0001013125	-11.88147584	3.483442e-26
stateAlaska	-2.484164783	0.6745076282	-3.68293060	2.816972e-04
stateArizona	-1.510577383	0.6704569688	-2.25305643	2.510925e-02
stateArkansas	3.192662931	0.6664383936	4.79063476	2.829319e-06
stateCalifornia	-4.978668651	0.6655467951	-7.48056889	1.206933e-12
stateColorado	-4.344553493	0.6654735335	-6.52851432	3.588784e-10
stateConnecticut	-6.595185530	0.6654428902	-9.91097152	8.698802e-20
stateDelaware	-2.098393628	0.6666483193	-3.14767707	1.842218e-03
stateDistrict of Columbia	6.355790010	1.2897172620	4.92804911	1.499627e-06

De-meaned Approach

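De-meaned Approach I

Alternatively, we can control our regression for group fixed effects without directly estimating them
We simply de-mean the data for each group to remove the group fixed-effect
For each group $i$, find the means (over time, $t$):

$$\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + \bar{\alpha}_i + \bar{\epsilon}_i$$

Where:
- $\bar{Y}_i$: average value of $Y_{it}$ for group $i$
- $\bar{X}_i$: average value of $X_{it}$ for group $i$
- $\bar{\alpha}_i$: average value of $\alpha_i$ for group $i$ ($= \alpha_i$)
- $\bar{\epsilon}_i = 0$, by assumption 1 about errors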

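De-meaned Approach II

Subtract the means equation from the fixed effects regression equation to get:

$$\begin{aligned}
Y_{it} - \bar{Y}_i &= \beta_1 (X_{it} - \bar{X}_i) + \tilde{\epsilon}_{it} \\
\tilde{Y}_{it} &= \beta_1 \tilde{X}_{it} + \tilde{\epsilon}_{it}
\end{aligned}$$

$\beta_0$ and $\alpha_i$ cancel because they are the same in both equations ($\bar{\alpha}_i = \alpha_i$)
Within each group $i$, the de-meaned variables $\tilde{Y}_{it}$ and $\tilde{X}_{it}$'s all have a mean of 0^†
Variables that don't change over time will drop out of analysis altogether
Removes any source of variation across groups (all now have mean of 0) to only work with variation within each group

^† Recall Rule 4 from the 2.3 class notes on the Summation Operator: $\sum (X_i - \bar{X}) = 0$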
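
The two claims above (group means of zero, and time-invariant variables dropping out) can be checked on a toy example. Everything here is hypothetical: `coastal` stands in for any characteristic that never changes within a group, and `demean()` is a helper written for this sketch.

```r
toy <- data.frame(
  group   = rep(c("a", "b", "c"), each = 4),
  x       = c(1, 2, 3, 4,  2, 4, 6, 8,  5, 5, 6, 8),
  coastal = rep(c(1, 0, 1), each = 4)   # constant within each group
)

# Subtract each group's own mean from its observations
demean <- function(v, g) v - ave(v, g)

toy$x_tilde       <- demean(toy$x, toy$group)
toy$coastal_tilde <- demean(toy$coastal, toy$group)

# Every group's de-meaned x averages to 0
group_means <- tapply(toy$x_tilde, toy$group, mean)
group_means

# The time-invariant variable is wiped out entirely
all(toy$coastal_tilde == 0)
```

Since the de-meaned `coastal` has no variation left, no coefficient can be estimated for it; the group fixed effect has already absorbed it.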

De-meaned Approach III

Yields identical results to dummy variable approach
More useful when we have many groups (would be many dummies)
Demonstrates intuition behind fixed effects:
- Converts all data to deviations from the mean of each group
- All groups are “centered” at 0, no variation across groups
- Fixed effects are often called the “within” estimators because they exploit variation within groups, not across groups
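
The equivalence of the two approaches can be verified by hand. The sketch below simulates a small panel (made-up data, hypothetical parameter values, true slope of -1) and estimates the slope both ways in base R.

```r
set.seed(480)
n_groups  <- 30
n_periods <- 6
group <- factor(rep(seq_len(n_groups), each = n_periods))

alpha <- rnorm(n_groups, sd = 2)[as.integer(group)]
x <- 2 * alpha + rnorm(n_groups * n_periods)
y <- 1 - 1 * x + 3 * alpha + rnorm(n_groups * n_periods)

# (1) LSDV: one dummy per group (minus the reference group)
lsdv_fit <- lm(y ~ x + group)

# (2) De-meaned: subtract group means, then regress with no intercept
x_tilde <- x - ave(x, group)
y_tilde <- y - ave(y, group)
within_fit <- lm(y_tilde ~ 0 + x_tilde)

slopes <- c(lsdv   = unname(coef(lsdv_fit)["x"]),
            within = unname(coef(within_fit)["x_tilde"]))
slopes
```

The two slopes agree up to floating-point error. One caveat: standard errors from the hand-rolled de-meaned lm() are not directly usable, because lm() does not know that one mean per group was already estimated; dedicated packages adjust the degrees of freedom.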

De-meaned Approach IV

We are basically comparing groups to themselves over time
- apples to apples comparison
- e.g. Maryland in 2000 vs. Maryland in 2005
Ignore all differences between groups, only look at differences within groups over time

De-Meaning the Data in R I

# get means of Y and X by state
means_state <- phones %>%
  group_by(state) %>%
  summarize(avg_deaths = mean(deaths),
            avg_phones = mean(cell_plans))
# look at it
means_state

state <fct>	avg_deaths <dbl>	avg_phones <dbl>
Alabama	14.786711	8906.370
Alaska	13.612953	7817.759
Arizona	14.249825	8097.482
Arkansas	17.543881	9268.153
California	9.659712	9029.594
Colorado	10.351405	8981.762
Connecticut	8.141739	8947.729
Delaware	12.209610	9304.052
District of Columbia	8.015895	19811.205
Florida	13.544635	9078.592

De-Meaning the Data in R II

ggplot(data = means_state)+
  aes(x = fct_reorder(state, avg_deaths),
      y = avg_deaths,
      color = state)+
  geom_point()+
  geom_segment(aes(y = 0,
                   yend = avg_deaths,
                   x = state,
                   xend = state))+
  coord_flip()+
  labs(x = "State",
       y = "Deaths Per Billion Miles Driven",
       color = NULL)+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=10)+
  theme(legend.position = "none")

Visualizing “Within Group” Estimates for the 5 States

Visualizing “Within Group” Estimates for All 51 States

De-meaned Approach in R I

The fixest package is designed for running regressions with fixed effects
feols() function is just like lm(), with some additional arguments:

#install.packages("fixest")
library(fixest)
fe_reg_1_alt <- feols(deaths ~ cell_plans | state,
                      data = phones)

De-meaned Approach in R II

fe_reg_1_alt %>% summary()

## OLS estimation, Dep. Var.: deaths
## Observations: 306 
## Fixed-effects: state: 51
## Standard-errors: Clustered (state) 
##             Estimate Std. Error  t value  Pr(>|t|)    
## cell_plans -0.001204   0.000143 -8.41708 3.792e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 1.05007     Adj. R2: 0.886524
##                 Within R2: 0.357238

# or using broom's tidy()
fe_reg_1_alt %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
cell_plans	-0.001203742	0.0001430118	-8.417077	3.791955e-11

Two-Way Fixed Effects

Two-Way Fixed Effects

State fixed effect controls for all factors that vary by state but are stable over time
But there are still other (often unobservable) factors that affect both Phones and Deaths, that don’t vary by State
- The country’s macroeconomic performance, federal laws, etc
If these factors systematically vary over time, but are the same by State, then we can “control for Year” to safely remove the influence of all of these factors!

Two-Way Fixed Effects

A one-way fixed effects model estimates a fixed effect for groups
Two-way fixed effects model estimates fixed effects for both groups and time periods

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \alpha_i + \theta_t + \nu_{it}$$

$\alpha_i$: group fixed effects
- accounts for time-invariant differences across groups
$\theta_t$: time fixed effects
- accounts for group-invariant differences over time
$\nu_{it}$: remaining random error
- all remaining factors that affect $Y_{it}$ that vary by state and change over time

Two-Way Fixed Effects: Our Example

$$\text{Deaths}_{it} = \beta_0 + \beta_1 \, \text{Cell phones}_{it} + \alpha_i + \theta_t + \nu_{it}$$

$\alpha_i$: State fixed effects
- differences across states that are stable over time (note subscript $i$ only)
- e.g. geography, culture, (unchanging) state laws
$\theta_t$: Year fixed effects
- differences over time that are stable across states (note subscript $t$ only)
- e.g. economy-wide macroeconomic changes, federal laws passed

Visualizing Year Effects I

# find averages for years
means_year <- phones %>%
  group_by(year) %>%
  summarize(avg_deaths = mean(deaths),
            avg_phones = mean(cell_plans))
means_year

year <fct>	avg_deaths <dbl>	avg_phones <dbl>
2007	14.00751	8064.531
2008	12.87156	8482.903
2009	12.08632	8859.706
2010	11.61487	9134.592
2011	11.36431	9485.238
2012	11.65666	9660.474

Visualizing Year Effects II

ggplot(data = phones)+
  aes(x = year,
      y = deaths)+
  geom_point(aes(color = year))+
  # Add the yearly means as black points
  geom_point(data = means_year,
             aes(x = year,
                 y = avg_deaths),
             size = 3,
             color = "black")+
  # connect the means with a line
  geom_line(data = means_year,
            aes(x = as.numeric(year),
                y = avg_deaths),
            color = "black",
            size = 1)+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 14)+
  theme(legend.position = "none")

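Estimating Two-Way Fixed Effects

As before, several equivalent ways to estimate two-way fixed effects models:

1) Least Squares Dummy Variable (LSDV) Approach: add dummies for both groups and time periods (separate intercepts for groups and times)

2) Fully De-meaned data:

$$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{\nu}_{it}$$

where for each variable (in a balanced panel): $\widetilde{var}_{it} = var_{it} - \overline{var}_i - \overline{var}_t + \overline{var}$, with $\overline{var}$ the overall mean

3) Hybrid: de-mean for one effect (groups or years) and add dummies for the other effect (years or groups)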
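
The three routes can be compared on a simulated balanced panel. This is a sketch with made-up data and hypothetical names; the de-meaning subtracts group means and period means and adds back the overall mean, which is the form that matches LSDV exactly when every group is observed in every period.

```r
set.seed(480)
n_groups  <- 20
n_periods <- 5
d <- expand.grid(period = seq_len(n_periods), group = seq_len(n_groups))

alpha <- rnorm(n_groups)[d$group]     # group effects
theta <- rnorm(n_periods)[d$period]   # period effects
d$x <- alpha + theta + rnorm(nrow(d))
d$y <- 2 + 0.5 * d$x + 2 * alpha - 1.5 * theta + rnorm(nrow(d))

# 1) LSDV: dummies for both groups and periods
lsdv2 <- lm(y ~ x + factor(group) + factor(period), data = d)

# 2) Fully de-meaned (balanced panel): v - group mean - period mean + overall mean
twoway_demean <- function(v, g, p) v - ave(v, g) - ave(v, p) + mean(v)
d$x_tilde <- twoway_demean(d$x, d$group, d$period)
d$y_tilde <- twoway_demean(d$y, d$group, d$period)
within2 <- lm(y_tilde ~ 0 + x_tilde, data = d)

# 3) Hybrid: de-mean by group, add period dummies
d$x_g <- d$x - ave(d$x, d$group)
d$y_g <- d$y - ave(d$y, d$group)
hybrid <- lm(y_g ~ x_g + factor(period), data = d)

slopes2 <- c(lsdv   = unname(coef(lsdv2)["x"]),
             within = unname(coef(within2)["x_tilde"]),
             hybrid = unname(coef(hybrid)["x_g"]))
slopes2
```

All three slopes coincide up to floating-point error. With an unbalanced panel the simple de-meaning formula no longer matches LSDV, which is one reason to rely on a dedicated package in practice.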

LSDV Method

fe2_reg_1 <- lm(deaths ~ cell_plans + state + year,
                data = phones)
fe2_reg_1 %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
(Intercept)	18.9304707399	1.4511323962	13.0453092	5.427406e-30
cell_plans	-0.0002995294	0.0001723149	-1.7382677	8.339982e-02
stateAlaska	-1.4998292482	0.6241082951	-2.4031554	1.698648e-02
stateArizona	-0.7791714713	0.6113519094	-1.2745057	2.036724e-01
stateArkansas	2.8655344756	0.5985062952	4.7878101	2.895040e-06
stateCalifornia	-5.0900897113	0.5956293282	-8.5457338	1.299236e-15
stateColorado	-4.4127241692	0.5953924847	-7.4114543	1.945083e-12
stateConnecticut	-6.6325834801	0.5952933996	-11.1417051	1.169797e-23
stateDelaware	-2.4579829953	0.5991822226	-4.1022295	5.546475e-05
stateDistrict of Columbia	-3.5044963616	1.9710939218	-1.7779449	7.663326e-02

With fixest

fe2_reg_2 <- feols(deaths ~ cell_plans | state + year,
                 data = phones)
fe2_reg_2 %>% summary()

## OLS estimation, Dep. Var.: deaths
## Observations: 306 
## Fixed-effects: state: 51,  year: 6
## Standard-errors: Clustered (state) 
##            Estimate Std. Error   t value Pr(>|t|) 
## cell_plans   -3e-04   0.000305 -0.980739  0.33144 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.930036     Adj. R2: 0.909197
##                  Within R2: 0.011989

fe2_reg_2 %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
cell_plans	-0.0002995294	0.0003054118	-0.9807394	0.3314431

Adding Covariates

State fixed effect absorbs all unobserved factors that vary by state, but are constant over time
Year fixed effect absorbs all unobserved factors that vary by year, but are constant over States
But there are still other (often unobservable) factors that affect both Phones and Deaths, that vary by State and change over time!
- Some States change their laws during the time period
- State urbanization rates change over the time period
We will also need to control for these variables (not picked up by fixed effects!)
- Add them to the regression

Adding Covariates I

Can still add covariates to remove endogeneity not soaked up by fixed effects
- factors that change within groups over time
- e.g. some states pass bans over the time period in data (some years before, some years after)
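
A simulation shows why such a covariate still matters after two-way fixed effects. In the sketch below (made-up data, hypothetical parameter values), half of the groups adopt a ban from period 4 onward; the ban shifts both `x` and `y`, while the true effect of `x` on `y` is 0.

```r
set.seed(480)
n_groups  <- 100
n_periods <- 6
d <- expand.grid(period = seq_len(n_periods), group = seq_len(n_groups))

alpha <- rnorm(n_groups)[d$group]
theta <- rnorm(n_periods)[d$period]

# Hypothetical policy: groups 1-50 adopt a ban in periods 4-6
d$ban <- as.integer(d$group <= n_groups / 2 & d$period >= 4)

d$x <- alpha + theta + 3 * d$ban + rnorm(nrow(d))
d$y <- 1 + 0 * d$x - 4 * d$ban + alpha + theta + rnorm(nrow(d))  # true beta_1 = 0

# Two-way fixed effects without and with the time-varying covariate
no_control   <- lm(y ~ x + factor(group) + factor(period), data = d)
with_control <- lm(y ~ x + ban + factor(group) + factor(period), data = d)

b_no_control   <- unname(coef(no_control)["x"])
b_with_control <- unname(coef(with_control)["x"])
c(no_control = b_no_control, with_control = b_with_control)
```

The ban varies both across groups and over time, so neither set of fixed effects absorbs it. Leaving it out should push the slope on `x` well below zero, while including it should bring the estimate back near the true value of 0.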

Adding Covariates II

fe2_controls_reg <- feols(deaths ~ cell_plans + text_ban + urban_percent + cell_ban | state + year,
                          data = phones) 
fe2_controls_reg %>% summary()

## OLS estimation, Dep. Var.: deaths
## Observations: 306 
## Fixed-effects: state: 51,  year: 6
## Standard-errors: Clustered (state) 
##                Estimate Std. Error  t value Pr(>|t|)    
## cell_plans    -0.000340   0.000277 -1.22780 0.225269    
## text_ban1      0.255926   0.243444  1.05127 0.298188    
## urban_percent  0.013135   0.009815  1.33822 0.186878    
## cell_ban1     -0.679796   0.335655 -2.02528 0.048194 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.920123     Adj. R2: 0.910039
##                  Within R2: 0.032939

fe2_controls_reg %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
cell_plans	-0.0003403735	0.0002772212	-1.227805	0.22526919
text_ban1	0.2559261569	0.2434442111	1.051272	0.29818803
urban_percent	0.0131347657	0.0098150705	1.338224	0.18687751
cell_ban1	-0.6797956522	0.3356553662	-2.025279	0.04819377

Comparing Models

library(huxtable)
huxreg("Pooled" = pooled,
       "State Effects" = fe_reg_1,
       "State & Year Effects" = fe2_reg_1,
       "With Controls" = fe2_controls_reg,
       coefs = c("Intercept" = "(Intercept)",
                 "Cell phones" = "cell_plans",
                 "Cell Ban" = "cell_ban1",
                 "Texting Ban" = "text_ban1",
                 "Urbanization Rate" = "urban_percent"),
       statistics = c("N" = "nobs",
                      "R-Squared" = "r.squared",
                      "SER" = "sigma"),
       number_format = 4)

	Pooled	State Effects	State & Year Effects	With Controls
Intercept	17.3371 ***	25.5077 ***	18.9305 ***	
	(0.9754)	(1.0176)	(1.4511)	
Cell phones	-0.0006 ***	-0.0012 ***	-0.0003	-0.0003
	(0.0001)	(0.0001)	(0.0002)	(0.0003)
Cell Ban				-0.6798 *
				(0.3357)
Texting Ban				0.2559
				(0.2434)
Urbanization Rate				0.0131
				(0.0098)
N	306	306	306	306
R-Squared	0.0845	0.9055	0.9259	0.9274
SER	3.2791	1.1526	1.0310	1.0262
*** p < 0.001; ** p < 0.01; * p < 0.05.

Help

Keyboard shortcuts

↑, ←, Pg Up, k

Go to previous slide

↓, →, Pg Dn, Space, j

Go to next slide

Home

Go to first slide

End

Go to last slide

Number + Return

Go to specific slide

b / m / f

Toggle blackout / mirrored / fullscreen mode

Clone slideshow

Toggle presenter mode

Restart the presentation timer

?, h

Toggle this help

Tile View: Overview of Slides

4.1 — Panel Data and Fixed Effects

ECON 480 • Econometrics • Fall 2021

Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsF21
metricsF21.classes.ryansafner.com

Outline

Pooled Regression Model

Fixed Effects Model

Least Squares Dummy Variable Approach

De-Meaned Approach

Two-Way Fixed Effects

Pooled Regression Model

Types of Data ICross-sectional data: compare different individual i’s at same time ˉt
ABCDEFGHIJ0123456789
state
<fct>
year
<fct>
deaths
<dbl>
cell_plans
<dbl>
Alabama201213.3160569433.800
Alaska201212.3119768872.799
Arizona201213.7204198810.889
Arkansas201216.46673010047.027
California20128.7565079362.424
Colorado201210.0922049403.225
6 rows

  

state <fct>	year <fct>	deaths <dbl>	cell_plans <dbl>
Alabama	2012	13.316056	9433.800
Alaska	2012	12.311976	8872.799
Arizona	2012	13.720419	8810.889
Arkansas	2012	16.466730	10047.027
California	2012	8.756507	9362.424
Colorado	2012	10.092204	9403.225

Types of Data ICross-sectional data: compare different individual i’s at same time ˉt
ABCDEFGHIJ0123456789
state
<fct>
year
<fct>
deaths
<dbl>
cell_plans
<dbl>
Alabama201213.3160569433.800
Alaska201212.3119768872.799
Arizona201213.7204198810.889
Arkansas201216.46673010047.027
California20128.7565079362.424
Colorado201210.0922049403.225
6 rows
Time-series data: track same individual ˉi over different times t
ABCDEFGHIJ0123456789
state
<fct>
year
<fct>
deaths
<dbl>
cell_plans
<dbl>
Maryland200710.8666798942.137
Maryland200810.7409639290.689
Maryland20099.8927549339.452
Maryland20108.7838839630.120
Maryland20118.62674510335.795
Maryland20128.94191610393.295
6 rows

  

state <fct>	year <fct>	deaths <dbl>	cell_plans <dbl>
Alabama	2012	13.316056	9433.800
Alaska	2012	12.311976	8872.799
Arizona	2012	13.720419	8810.889
Arkansas	2012	16.466730	10047.027
California	2012	8.756507	9362.424
Colorado	2012	10.092204	9403.225

Types of Data I

Cross-sectional data: compare different individual ’s at same time

Time-series data: track same individual over different times

Types of Data I

Cross-sectional data: compare different individual ’s at same time

Time-series data: track same individual over different times

Panel data: combines these dimensions: compare all individual ’s over all time ’s

Panel Data I

Panel Data II

state <fct>	year <fct>	deaths <dbl>	cell_plans <dbl>
Alabama	2007	18.075232	8135.525
Alabama	2008	16.289227	8494.391
Alabama	2009	13.833678	8979.108
Alabama	2010	13.434084	9054.894
Alabama	2011	13.771989	9340.501
Alabama	2012	13.316056	9433.800
Alaska	2007	16.301184	6730.282
Alaska	2008	12.744090	5580.707
Alaska	2009	12.973849	8389.730
Alaska	2010	11.670893	8560.595

Panel or Longitudinal data contains
- repeated observations
- on multiple individuals
Thus, our regression equation looks like:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$$

for individual $i$ in time $t$.

Panel Data: Our Motivating Example

state <fct>	year <fct>	deaths <dbl>	cell_plans <dbl>
Alabama	2007	18.075232	8135.525
Alabama	2008	16.289227	8494.391
Alabama	2009	13.833678	8979.108
Alabama	2010	13.434084	9054.894
Alabama	2011	13.771989	9340.501
Alabama	2012	13.316056	9433.800
Alaska	2007	16.301184	6730.282
Alaska	2008	12.744090	5580.707
Alaska	2009	12.973849	8389.730
Alaska	2010	11.670893	8560.595

Example: Do cell phones cause more traffic fatalities?

We have no direct measure of cell phone use while driving
- cell_plans serves as a proxy for cell phone usage
State-level data over 6 years

The Data I

glimpse(phones)

## Rows: 306
## Columns: 8
## $ year          <fct> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 20…
## $ state         <fct> Alabama, Alaska, Arizona, Arkansas, California, Colorado…
## $ urban_percent <dbl> 30, 55, 45, 21, 54, 34, 84, 31, 100, 53, 39, 45, 11, 56,…
## $ cell_plans    <dbl> 8135.525, 6730.282, 7572.465, 8071.125, 8821.933, 8162.0…
## $ cell_ban      <fct> 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ text_ban      <fct> 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ deaths        <dbl> 18.075232, 16.301184, 16.930578, 19.595430, 12.104340, 1…
## $ year_num      <dbl> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 20…

The Data II

phones %>%
  count(state)

state <fct>	n <int>
Alabama	6
Alaska	6
Arizona	6
Arkansas	6
California	6
Colorado	6
Connecticut	6
Delaware	6
District of Columbia	6
Florida	6

phones %>%
  count(year)

year <fct>	n <int>
2007	51
2008	51
2009	51
2010	51
2011	51
2012	51
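The two counts above (6 rows per state, 51 rows per year) are what a balanced panel looks like. A minimal sketch of that check, assuming a small made-up data frame called `toy` (hypothetical, not the `phones` data) and base R only:

```r
# Hypothetical toy panel: 3 groups, each observed in 4 years
toy <- expand.grid(state = c("A", "B", "C"),
                   year  = 2007:2010)

# Cross-tabulate group by period: a balanced panel has exactly
# one observation in every group-period cell
cells <- table(toy$state, toy$year)
is_balanced <- all(cells == 1)

n_groups  <- length(unique(toy$state))  # N
n_periods <- length(unique(toy$year))   # T

is_balanced                             # TRUE when every cell holds one row
nrow(toy) == n_groups * n_periods       # rows should equal N x T
```

The same cross-tabulation applied to `phones$state` and `phones$year` would be one way to confirm the 51 x 6 = 306 structure.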

The Data III

phones %>%
  distinct(state)

state <fct>
Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
District of Columbia
Florida

phones %>%
  distinct(year)

year <fct>
2007
2008
2009
2010
2011
2012

The Data IV

phones %>%
  summarize(States = n_distinct(state),
            Years = n_distinct(year))

States <int>	Years <int>
51	6

  

Pooled Regression I

What if we just ran a standard regression:

$$Y_{it} = \beta_0 + \beta_1 X_{it} + u_{it}$$

$N$: number of $i$ groups (e.g. U.S. States)
$T$: number of $t$ periods (e.g. years)

This is a pooled regression model: it treats all observations as independent

Pooled Regression II

pooled <- lm(deaths ~ cell_plans, data = phones)
pooled %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
(Intercept)	17.3371034167	0.975384504	17.774635	5.821724e-49
cell_plans	-0.0005666385	0.000106975	-5.296926	2.264086e-07

  

Pooled Regression III

ggplot(data = phones)+
  aes(x = cell_plans,
      y = deaths)+
  geom_point()+
  geom_smooth(method = "lm", color = "red")+
  labs(x = "Cell Phones Per 10,000 People",
       y = "Deaths Per Billion Miles Driven")+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=14)

Recap: Assumptions about Errors

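Recall the 4 critical assumptions about $u$:

1) The expected value of the errors is 0: $E[u] = 0$
2) The variance of the errors over $X$ is constant: $var(u|X) = \sigma^2_u$
3) Errors are not correlated across observations: $cor(u_i, u_j) = 0 \quad \forall \, i \neq j$
4) There is no correlation between $X$ and the error term: $cor(X, u) = 0$ or $E[u|X] = 0$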

Biases of Pooled Regression

Assumption 3: $cor(u_i, u_j) = 0 \quad \forall \, i \neq j$
Pooled regression model is biased because it ignores:
- Multiple observations from same group $i$
- Multiple observations from same time $t$
Thus, errors are serially or auto-correlated: $cor(u_i, u_j) \neq 0$ within the same $i$ and within the same $t$

Biases of Pooled Regression: Our Example

Multiple observations from same state $i$
- Probably similarities among $u$ for obs in same state
- Residuals on observations from same state are likely correlated
Multiple observations from same year $t$
- Probably similarities among $u$ for obs in same year
- Residuals on observations from same year are likely correlated
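One way to see this clustering is with simulated data. The sketch below is an illustration only: the data are made up (not the `phones` data), the names `g`, `alpha`, `u_hat`, and `between_share` are hypothetical, and the regressor is deliberately unrelated to the group effect so that only the correlation in the residuals is on display.

```r
set.seed(480)
n_groups  <- 50
n_periods <- 6
g <- rep(seq_len(n_groups), each = n_periods)   # group id for each row

alpha <- rnorm(n_groups, sd = 2)                # time-invariant group effect
x <- rnorm(n_groups * n_periods)                # regressor, unrelated to alpha here
y <- 1 + 0.5 * x + alpha[g] + rnorm(n_groups * n_periods)

# Pooled regression: the group effect is left in the error term
pooled_sim <- lm(y ~ x)
u_hat <- resid(pooled_sim)

# Share of residual variance that sits in the group means of the residuals.
# With independent errors this share would be small (roughly 1 / n_periods);
# a large share means residuals from the same group move together.
group_mean_resid <- ave(u_hat, g)
between_share <- var(group_mean_resid) / var(u_hat)
between_share
```

A share far above 1/6 indicates that observations from the same group carry similar residuals, which is the violation of assumption 3 described above.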

Example: Consider Just 5 States

phones %>%
  filter(state %in% c("District of Columbia",
                      "Maryland", "Texas",
                      "California", "Kansas")) %>%
ggplot(data = .)+
  aes(x = cell_plans,
      y = deaths,
      color = state)+
  geom_point()+ 
  geom_smooth(method = "lm")+
  labs(x = "Cell Phones Per 10,000 People",
       y = "Deaths Per Billion Miles Driven",
       color = NULL)+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=14)+
  theme(legend.position = "top")

Example: Consider Just 5 States

phones %>%
  filter(state %in% c("District of Columbia",
                      "Maryland", "Texas",
                      "California", "Kansas")) %>%
ggplot(data = .)+
  aes(x = cell_plans,
      y = deaths,
      color = state)+ 
  geom_point()+ 
  geom_smooth(method = "lm")+ 
  labs(x = "Cell Phones Per 10,000 People",
       y = "Deaths Per Billion Miles Driven",
       color = NULL)+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=14)+
  theme(legend.position = "none")+
  facet_wrap(~state, ncol=3)

Look at All States

ggplot(data = phones)+
  aes(x = cell_plans,
      y = deaths,
      color = state)+ 
  geom_point()+ 
  geom_smooth(method = "lm")+ 
  labs(x = "Cell Phones Per 10,000 People",
       y = "Deaths Per Billion Miles Driven",
       color = NULL)+
  theme_bw(base_family = "Fira Sans Condensed")+
  theme(legend.position = "none")+
  facet_wrap(~state, ncol=7)

The Bias in our Pooled Regression

$$\text{Deaths}_{it} = \beta_0 + \beta_1 \, \text{Cell phones}_{it} + u_{it}$$

$\text{Cell phones}_{it}$ is endogenous: $cor(u_{it}, \text{Cell phones}_{it}) \neq 0$ and $E[u_{it} | \text{Cell phones}_{it}] \neq 0$

Things in $u_{it}$ correlated with $\text{Cell phones}_{it}$:
- infrastructure spending, population, urban vs. rural, more/less cautious citizens, cultural attitudes towards driving, texting, etc.

A lot of these things vary systematically by State!
- $cor(u_{it_1}, u_{it_2}) \neq 0$
  - Error in State $i$ during $t_1$ correlates with error in State $i$ during $t_2$
  - things in State $i$ that don’t change over time

Fixed Effects Model

Fixed Effects: DAG

A simple pooled model likely contains lots of omitted variable bias
Many (often unobservable) factors that determine both Phones & Deaths
- Culture, infrastructure, population, geography, institutions, etc
But the beauty of this is that most of these factors systematically vary by U.S. State and are stable over time!
We can simply “control for State” to safely remove the influence of all of these factors!
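The idea of "controlling for State" can be sketched with simulated data in which the true effect is known. Everything here is an assumption for illustration: the data are made up, `true_beta`, `alpha`, and `g` are hypothetical names, and the group effect is built to be correlated with the regressor so that the pooled model suffers omitted variable bias.

```r
set.seed(480)
n_groups  <- 50
n_periods <- 6
g <- rep(seq_len(n_groups), each = n_periods)   # group id for each row

alpha <- rnorm(n_groups, sd = 2)                # unobserved, stable over time
x <- alpha[g] + rnorm(n_groups * n_periods)     # regressor depends on the group effect
true_beta <- -1
y <- 2 + true_beta * x + 3 * alpha[g] + rnorm(n_groups * n_periods)

# Pooled: alpha stays in the error term and is correlated with x
pooled_sim <- lm(y ~ x)

# Controlling for group: one dummy per group absorbs alpha
by_group <- lm(y ~ x + factor(g))

coef(pooled_sim)["x"]   # pulled away from true_beta by the omitted group effect
coef(by_group)["x"]     # should land near true_beta
```

In this setup the pooled slope even takes the wrong sign, while the slope from the model with group dummies stays close to the true value of -1.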

Fixed Effects: Decomposing $u_{it}$

Much of the endogeneity in $X_{it}$ can be explained by systematic differences across $i$ (groups)
Exploit the systematic variation across groups with a fixed effects model
Decompose the model error term into two parts:

$$u_{it} = \alpha_i + \epsilon_{it}$$

Fixed Effects: $\alpha_i$

Decompose the model error term into two parts:

$$u_{it} = \alpha_i + \epsilon_{it}$$

$\alpha_i$ are group-specific fixed effects
- group $i$ tends to have higher or lower $Y$ than other groups given regressor(s) $X_{it}$
- estimate a separate $\alpha_i$ for each group $i$
- essentially, estimate a separate constant (intercept) for each group
- notice this is stable over time within each group (subscript only $i$, no $t$)
This includes all factors that do not change within group $i$ over time

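Fixed Effects: $\epsilon_{it}$

Decompose the model error term into two parts:

$$u_{it} = \alpha_i + \epsilon_{it}$$

$\epsilon_{it}$ is the remaining random error
- As usual in OLS, assume the 4 typical assumptions about this error:
- $E[\epsilon_{it}] = 0$, $var[\epsilon_{it}] = \sigma^2_{\epsilon}$, $cor(\epsilon_{it}, \epsilon_{jt}) = 0$, $cor(\epsilon_{it}, X_{it}) = 0$
$\epsilon_{it}$ includes all other factors affecting $Y_{it}$ not contained in group effect $\alpha_i$
- i.e. differences within each group that change over time
- Be careful: $X_{it}$ can still be endogenous due to other factors!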

Fixed Effects: New Regression Equation

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \alpha_i + \epsilon_{it}$$

We've pulled $\alpha_i$ out of the original error term into the regression
Essentially we’ll estimate an intercept for each group (minus one, which is $\beta_0$)
- avoiding the dummy variable trap
Must have multiple observations (over time) for each group (i.e. panel data)

Fixed Effects: Our Example

$$\text{Deaths}_{it} = \beta_0 + \beta_1 \, \text{Cell phones}_{it} + \alpha_i + \epsilon_{it}$$

$\alpha_i$ is the State fixed effect
- Captures everything unique about each state that does not change over time
- culture, institutions, history, geography, climate, etc!
There could still be factors in $\epsilon_{it}$ that are correlated with $\text{Cell phones}_{it}$!
- things that do change over time within States
- perhaps individual States have cell phone bans for some years in our data

Estimating Fixed Effects Models

Two methods to estimate fixed effects models:

1) Least Squares Dummy Variable (LSDV) approach
2) De-meaned data approach

Least Squares Dummy Variable Approach

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \beta_2 D_{1i} + \beta_3 D_{2i} + \cdots + \beta_N D_{(N-1)i} + \epsilon_{it}$$

A dummy variable $D_i = \{0, 1\}$ for each possible group
- $D_i = 1$ if observation $it$ is from group $i$, otherwise $D_i = 0$

If there are $N$ groups:
- Include $N-1$ dummies (to avoid dummy variable trap) and $\beta_0$ is the reference category†
- So we are estimating a different intercept for each group

Sounds like a lot of work, but it is automatic in R

† If we do not estimate $\beta_0$, we could include all $N$ dummies. In either case, $\beta_0$ takes the place of one category-dummy.

Least Squares Dummy Variable Approach: Our Example

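Example:

$$\text{Deaths}_{it} = \beta_0 + \beta_1 \, \text{Cell phones}_{it} + \beta_2 \, \text{Alaska}_i + \cdots + \beta_{51} \, \text{Wyoming}_i + \epsilon_{it}$$

Let Alabama be the reference category ($\beta_0$), include all other States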

Our Example in R I

If state is a factor variable, just include it in the regression
R automatically creates dummy variables and includes them in the regression
- Keeps intercept and leaves out first group dummy
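To see the dummies R builds behind the scenes, `model.matrix()` can be called on a formula directly. This is a small sketch on a made-up data frame (`toy` is hypothetical, with only four states), not on `phones`:

```r
# Hypothetical toy data: a factor with 4 groups, 2 observations each
toy <- data.frame(
  state = factor(rep(c("Alabama", "Alaska", "Arizona", "Arkansas"), each = 2))
)

# With an intercept: R drops the first level (the reference category)
# and keeps N - 1 dummies
X_with_intercept <- model.matrix(~ state, data = toy)
colnames(X_with_intercept)

# Without an intercept: all N dummies are included instead
X_no_intercept <- model.matrix(~ state - 1, data = toy)
colnames(X_no_intercept)
```

Both design matrices have four columns. In the first, the intercept column stands in for Alabama; in the second, Alabama gets its own dummy and there is no intercept.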

Our Example in R II

fe_reg_1 <- lm(deaths ~ cell_plans + state, data = phones)
fe_reg_1 %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
(Intercept)	25.507679925	1.0176400289	25.06552337	1.241581e-70
cell_plans	-0.001203742	0.0001013125	-11.88147584	3.483442e-26
stateAlaska	-2.484164783	0.6745076282	-3.68293060	2.816972e-04
stateArizona	-1.510577383	0.6704569688	-2.25305643	2.510925e-02
stateArkansas	3.192662931	0.6664383936	4.79063476	2.829319e-06
stateCalifornia	-4.978668651	0.6655467951	-7.48056889	1.206933e-12
stateColorado	-4.344553493	0.6654735335	-6.52851432	3.588784e-10
stateConnecticut	-6.595185530	0.6654428902	-9.91097152	8.698802e-20
stateDelaware	-2.098393628	0.6666483193	-3.14767707	1.842218e-03
stateDistrict of Columbia	6.355790010	1.2897172620	4.92804911	1.499627e-06
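As a worked reading of the output above: because Alabama is the omitted reference category, the intercept is Alabama's own intercept, and each state coefficient is that state's difference from Alabama. Using the reported estimates rounded to four decimals (the subscripted coefficient names are just labels for the `stateAlaska` and `stateCalifornia` rows):

```latex
\begin{align*}
\text{Alabama:}    \quad & \hat{\beta}_0 = 25.5077 \\
\text{Alaska:}     \quad & \hat{\beta}_0 + \hat{\beta}_{\text{Alaska}} = 25.5077 - 2.4842 = 23.0235 \\
\text{California:} \quad & \hat{\beta}_0 + \hat{\beta}_{\text{California}} = 25.5077 - 4.9787 = 20.5290
\end{align*}
```

Every state shares the same estimated slope on cell_plans (about -0.0012); only the intercepts differ.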

De-meaned Approach

De-meaned Approach I

Alternatively, we can control our regression for group fixed effects without directly estimating them
We simply de-mean the data for each group to remove the group fixed-effect
For each group $i$, find the means (over time, $t$):

$$\bar{Y}_i = \beta_0 + \beta_1 \bar{X}_i + \bar{\alpha}_i + \bar{\epsilon}_i$$

Where:
- $\bar{Y}_i$: average value of $Y_{it}$ for group $i$
- $\bar{X}_i$: average value of $X_{it}$ for group $i$
- $\bar{\alpha}_i$: average value of $\alpha_i$ for group $i$ $(= \alpha_i)$
- $\bar{\epsilon}_i = 0$, by assumption 1 about errors

De-meaned Approach II

Subtract the means equation from the pooled equation to get:

$$Y_{it} - \bar{Y}_i = \beta_1 (X_{it} - \bar{X}_i) + (\epsilon_{it} - \bar{\epsilon}_i)$$
$$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{\epsilon}_{it}$$

Within each group $i$, the de-meaned variables $\tilde{Y}_{it}$ and $\tilde{X}_{it}$ all have a mean of 0†
Variables that don't change over time will drop out of analysis altogether
Removes any source of variation across groups (all now have mean of 0) to only work with variation within each group

† Recall Rule 4 from the 2.3 class notes on the Summation Operator: $\sum (X_i - \bar{X}) = 0$

De-meaned Approach III

Yields identical results to dummy variable approach
More useful when we have many groups (would be many dummies)
Demonstrates intuition behind fixed effects:
- Converts all data to deviations from the mean of each group
- All groups are “centered” at 0, no variation across groups
- Fixed effects are often called the “within” estimators because they exploit variation within groups, not across groups
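The claim that de-meaning gives the same slope as the dummy variable approach can be checked on simulated data. This is a sketch under assumptions: the data are made up, the names `lsdv`, `within_reg`, `y_tilde`, and `x_tilde` are hypothetical, and `ave()` from base R is used to compute each group's mean.

```r
set.seed(480)
n_groups  <- 20
n_periods <- 6
g <- factor(rep(seq_len(n_groups), each = n_periods))   # group id for each row

alpha <- rnorm(n_groups, sd = 2)                         # group fixed effects
x <- alpha[as.integer(g)] + rnorm(n_groups * n_periods)
y <- 2 - 1 * x + alpha[as.integer(g)] + rnorm(n_groups * n_periods)

# LSDV: one dummy per group (minus the reference group)
lsdv <- lm(y ~ x + g)

# De-meaned: subtract each group's own mean from Y and X
y_tilde <- y - ave(y, g)
x_tilde <- x - ave(x, g)
# No intercept needed: the de-meaned data are centered at 0 within every group
within_reg <- lm(y_tilde ~ x_tilde - 1)

c(lsdv   = unname(coef(lsdv)["x"]),
  within = unname(coef(within_reg)["x_tilde"]))
```

The two slopes agree up to floating-point error. The standard errors printed by the plain `lm()` on de-meaned data are not adjusted for the estimated group means, so only the coefficient should be compared here.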

De-meaned Approach IV

We are basically comparing groups to themselves over time
- apples to apples comparison
- e.g. Maryland in 2000 vs. Maryland in 2005
Ignore all differences between groups, only look at differences within groups over time

De-Meaning the Data in R I

# get means of Y and X by state
means_state <- phones %>%
  group_by(state) %>%
  summarize(avg_deaths = mean(deaths),
            avg_phones = mean(cell_plans))
# look at it
means_state

state <fct>	avg_deaths <dbl>	avg_phones <dbl>
Alabama	14.786711	8906.370
Alaska	13.612953	7817.759
Arizona	14.249825	8097.482
Arkansas	17.543881	9268.153
California	9.659712	9029.594
Colorado	10.351405	8981.762
Connecticut	8.141739	8947.729
Delaware	12.209610	9304.052
District of Columbia	8.015895	19811.205
Florida	13.544635	9078.592
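As a worked example of what de-meaning does to a single observation, take Alabama in 2012 from the panel shown earlier (deaths of 13.316056 and cell_plans of 9433.800) and subtract Alabama's means from the table above:

```latex
\begin{align*}
\widetilde{\text{deaths}}_{\text{AL},2012} &= 13.316056 - 14.786711 = -1.470655 \\
\widetilde{\text{cell\_plans}}_{\text{AL},2012} &= 9433.800 - 8906.370 = 527.430
\end{align*}
```

So in 2012 Alabama had fewer deaths and more cell plans than its own six-year average; the fixed effects regression uses only deviations like these.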

De-Meaning the Data in R II

ggplot(data = means_state)+
  aes(x = fct_reorder(state, avg_deaths),
      y = avg_deaths,
      color = state)+
  geom_point()+
  # draw a segment from 0 out to each state's average
  geom_segment(aes(y = 0,
                   yend = avg_deaths,
                   x = fct_reorder(state, avg_deaths),
                   xend = fct_reorder(state, avg_deaths)))+
  coord_flip()+
  # x is the state axis (drawn vertically after coord_flip())
  labs(x = "State",
       y = "Average Deaths Per Billion Miles Driven",
       color = NULL)+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size=10)+
  theme(legend.position = "none")

Visualizing “Within Group” Estimates for the 5 States

Visualizing “Within Group” Estimates for All 51 States

De-meaned Approach in R I

The fixest package is designed for running regressions with fixed effects
feols() function is just like lm(), with some additional arguments:

#install.packages("fixest")
library(fixest)
fe_reg_1_alt <- feols(deaths ~ cell_plans | state,
                      data = phones)

De-meaned Approach in R II

fe_reg_1_alt %>% summary()

## OLS estimation, Dep. Var.: deaths
## Observations: 306 
## Fixed-effects: state: 51
## Standard-errors: Clustered (state) 
##             Estimate Std. Error  t value  Pr(>|t|)    
## cell_plans -0.001204   0.000143 -8.41708 3.792e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 1.05007     Adj. R2: 0.886524
##                 Within R2: 0.357238

# or using broom's tidy()
fe_reg_1_alt %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
cell_plans	-0.001203742	0.0001430118	-8.417077	3.791955e-11

  

Two-Way Fixed Effects

State fixed effect controls for all factors that vary by state but are stable over time
But there are still other (often unobservable) factors that affect both Phones and Deaths, that don’t vary by State
- The country’s macroeconomic performance, federal laws, etc
If these factors systematically vary over time, but are the same by State, then we can “control for Year” to safely remove the influence of all of these factors!

Two-Way Fixed Effects

A one-way fixed effects model estimates a fixed effect for groups
Two-way fixed effects model estimates fixed effects for both groups and time periods

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \alpha_i + \theta_t + \nu_{it}$$

$\alpha_i$: group fixed effects
- accounts for time-invariant differences across groups
$\theta_t$: time fixed effects
- accounts for group-invariant differences over time
$\nu_{it}$: remaining random error
- all remaining factors that affect $Y_{it}$ that vary by group and change over time
Two-Way Fixed Effects: Our Example

$$\text{Deaths}_{it} = \beta_0 + \beta_1 \, \text{Cell phones}_{it} + \alpha_i + \theta_t + \nu_{it}$$

$\alpha_i$: State fixed effects
- differences across states that are stable over time (note subscript $i$ only)
- e.g. geography, culture, (unchanging) state laws
$\theta_t$: Year fixed effects
- differences over time that are stable across states (note subscript $t$ only)
- e.g. economy-wide macroeconomic changes, federal laws passed

Visualizing Year Effects I

# find averages for years
means_year <- phones %>%
  group_by(year) %>%
  summarize(avg_deaths = mean(deaths),
            avg_phones = mean(cell_plans))
means_year

year <fct>	avg_deaths <dbl>	avg_phones <dbl>
2007	14.00751	8064.531
2008	12.87156	8482.903
2009	12.08632	8859.706
2010	11.61487	9134.592
2011	11.36431	9485.238
2012	11.65666	9660.474

  

Visualizing Year Effects II

ggplot(data = phones)+
  aes(x = year,
      y = deaths)+
  geom_point(aes(color = year))+
  # Add the yearly means as black points
  geom_point(data = means_year,
             aes(x = year,
                 y = avg_deaths),
             size = 3,
             color = "black")+
  # connect the means with a line
  geom_line(data = means_year,
            aes(x = as.numeric(year),
                y = avg_deaths),
            color = "black",
            size = 1)+
  theme_bw(base_family = "Fira Sans Condensed",
           base_size = 14)+
  theme(legend.position = "none")

Estimating Two-Way Fixed Effects

As before, several equivalent ways to estimate two-way fixed effects models:

1) Least Squares Dummy Variable (LSDV) Approach: add dummies for both groups and time periods (separate intercepts for groups and times)

2) Fully De-meaned data:

$$\tilde{Y}_{it} = \beta_1 \tilde{X}_{it} + \tilde{\nu}_{it}$$

where for each variable (in a balanced panel):

$$\widetilde{var}_{it} = var_{it} - \overline{var}_i - \overline{var}_t + \overline{var}$$

with $\overline{var}_i$ the group mean, $\overline{var}_t$ the period mean, and $\overline{var}$ the overall mean

3) Hybrid: de-mean for one effect (groups or years) and add dummies for the other effect (years or groups)
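The equivalence of the three approaches can be checked on a simulated balanced panel. This is a sketch under assumptions: the data are made up, all object names (`d`, `demean2`, `lsdv2`, `demeaned2`, `hybrid2`) are hypothetical, and the full de-meaning subtracts group and period means and adds back the overall mean, which is the form that matches LSDV exactly when the panel is balanced.

```r
set.seed(480)
n_groups  <- 20
n_periods <- 6
d <- expand.grid(yr = factor(seq_len(n_periods)),
                 g  = factor(seq_len(n_groups)))

alpha <- rnorm(n_groups)[as.integer(d$g)]     # group effects
theta <- rnorm(n_periods)[as.integer(d$yr)]   # period effects
d$x <- alpha + theta + rnorm(nrow(d))
d$y <- 2 - 1 * d$x + alpha + theta + rnorm(nrow(d))

# 1) LSDV: dummies for both groups and periods
lsdv2 <- lm(y ~ x + g + yr, data = d)

# 2) Fully de-meaned: remove group means and period means, add back grand mean
demean2 <- function(v, g, yr) v - ave(v, g) - ave(v, yr) + mean(v)
d$y_tilde <- demean2(d$y, d$g, d$yr)
d$x_tilde <- demean2(d$x, d$g, d$yr)
demeaned2 <- lm(y_tilde ~ x_tilde - 1, data = d)

# 3) Hybrid: de-mean by group, keep dummies for periods
d$y_g <- d$y - ave(d$y, d$g)
d$x_g <- d$x - ave(d$x, d$g)
hybrid2 <- lm(y_g ~ x_g + yr, data = d)

c(lsdv     = unname(coef(lsdv2)["x"]),
  demeaned = unname(coef(demeaned2)["x_tilde"]),
  hybrid   = unname(coef(hybrid2)["x_g"]))
```

All three slopes coincide up to floating-point error. With an unbalanced panel the simple subtraction in `demean2()` would no longer reproduce LSDV, which is one reason to rely on a dedicated routine such as `feols()` in practice.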

LSDV Method

fe2_reg_1 <- lm(deaths ~ cell_plans + state + year,
                data = phones)
fe2_reg_1 %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
(Intercept)	18.9304707399	1.4511323962	13.0453092	5.427406e-30
cell_plans	-0.0002995294	0.0001723149	-1.7382677	8.339982e-02
stateAlaska	-1.4998292482	0.6241082951	-2.4031554	1.698648e-02
stateArizona	-0.7791714713	0.6113519094	-1.2745057	2.036724e-01
stateArkansas	2.8655344756	0.5985062952	4.7878101	2.895040e-06
stateCalifornia	-5.0900897113	0.5956293282	-8.5457338	1.299236e-15
stateColorado	-4.4127241692	0.5953924847	-7.4114543	1.945083e-12
stateConnecticut	-6.6325834801	0.5952933996	-11.1417051	1.169797e-23
stateDelaware	-2.4579829953	0.5991822226	-4.1022295	5.546475e-05
stateDistrict of Columbia	-3.5044963616	1.9710939218	-1.7779449	7.663326e-02

With fixest

fe2_reg_2 <- feols(deaths ~ cell_plans | state + year,
                 data = phones)
fe2_reg_2 %>% summary()

## OLS estimation, Dep. Var.: deaths
## Observations: 306 
## Fixed-effects: state: 51,  year: 6
## Standard-errors: Clustered (state) 
##            Estimate Std. Error   t value Pr(>|t|) 
## cell_plans   -3e-04   0.000305 -0.980739  0.33144 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.930036     Adj. R2: 0.909197
##                  Within R2: 0.011989

fe2_reg_2 %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
cell_plans	-0.0002995294	0.0003054118	-0.9807394	0.3314431

  

Adding Covariates

State fixed effect absorbs all unobserved factors that vary by state, but are constant over time
Year fixed effect absorbs all unobserved factors that vary by year, but are constant over States
But there are still other (often unobservable) factors that affect both Phones and Deaths, that vary by State and change over time!
- Some States change their laws during the time period
- State urbanization rates change over the time period
We will also need to control for these variables (not picked up by fixed effects!)
- Add them to the regression

Adding Covariates I

Can still add covariates to remove endogeneity not soaked up by fixed effects
- factors that change within groups over time
- e.g. some states pass bans over the time period in data (some years before, some years after)
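Why fixed effects alone are not enough here can be sketched with a simulation. Everything below is assumed for illustration: the data are made up, `ban` is a hypothetical policy that half of the groups adopt from period 4 onward, and `true_beta` is the known effect of `x`. Because `ban` varies both across groups and over time, neither the group nor the period dummies absorb it.

```r
set.seed(480)
n_groups  <- 50
n_periods <- 6
d <- expand.grid(yr = seq_len(n_periods), g = seq_len(n_groups))

alpha <- rnorm(n_groups)[d$g]     # group effects
theta <- rnorm(n_periods)[d$yr]   # period effects

# Hypothetical policy: first half of the groups adopt a ban from period 4 on
d$ban <- as.numeric(d$g <= n_groups / 2 & d$yr >= 4)

d$x <- alpha + theta + 2 * d$ban + rnorm(nrow(d))   # the ban also shifts x
true_beta <- -1
d$y <- 2 + true_beta * d$x - 5 * d$ban + alpha + theta + rnorm(nrow(d))

# Two-way fixed effects without the time-varying control
twfe_short <- lm(y ~ x + factor(g) + factor(yr), data = d)

# Two-way fixed effects with the control added
twfe_long <- lm(y ~ x + ban + factor(g) + factor(yr), data = d)

coef(twfe_short)["x"]   # still biased: ban is left in the error term
coef(twfe_long)["x"]    # should be close to true_beta
```

Adding the time-varying covariate moves the estimate back toward the true effect, which is the role `cell_ban`, `text_ban`, and `urban_percent` play in the `phones` example.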

Adding Covariates II

fe2_controls_reg <- feols(deaths ~ cell_plans + text_ban + urban_percent + cell_ban | state + year,
                          data = phones) 
fe2_controls_reg %>% summary()

## OLS estimation, Dep. Var.: deaths
## Observations: 306 
## Fixed-effects: state: 51,  year: 6
## Standard-errors: Clustered (state) 
##                Estimate Std. Error  t value Pr(>|t|)    
## cell_plans    -0.000340   0.000277 -1.22780 0.225269    
## text_ban1      0.255926   0.243444  1.05127 0.298188    
## urban_percent  0.013135   0.009815  1.33822 0.186878    
## cell_ban1     -0.679796   0.335655 -2.02528 0.048194 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 0.920123     Adj. R2: 0.910039
##                  Within R2: 0.032939

fe2_controls_reg %>% tidy()

term <chr>	estimate <dbl>	std.error <dbl>	statistic <dbl>	p.value <dbl>
cell_plans	-0.0003403735	0.0002772212	-1.227805	0.22526919
text_ban1	0.2559261569	0.2434442111	1.051272	0.29818803
urban_percent	0.0131347657	0.0098150705	1.338224	0.18687751
cell_ban1	-0.6797956522	0.3356553662	-2.025279	0.04819377

  

Comparing Models

library(huxtable)
huxreg("Pooled" = pooled,
       "State Effects" = fe_reg_1,
       "State & Year Effects" = fe2_reg_1,
       "With Controls" = fe2_controls_reg,
       coefs = c("Intercept" = "(Intercept)",
                 "Cell phones" = "cell_plans",
                 "Cell Ban" = "cell_ban1",
                 "Texting Ban" = "text_ban1",
                 "Urbanization Rate" = "urban_percent"),
       statistics = c("N" = "nobs",
                      "R-Squared" = "r.squared",
                      "SER" = "sigma"),
       number_format = 4)

	Pooled	State Effects	State & Year Effects	With Controls
Intercept	17.3371 ***	25.5077 ***	18.9305 ***	
	(0.9754)	(1.0176)	(1.4511)	
Cell phones	-0.0006 ***	-0.0012 ***	-0.0003	-0.0003
	(0.0001)	(0.0001)	(0.0002)	(0.0003)
Cell Ban				-0.6798 *
				(0.3357)
Texting Ban				0.2559
				(0.2434)
Urbanization Rate				0.0131
				(0.0098)
N	306	306	306	306
R-Squared	0.0845	0.9055	0.9259	0.9274
SER	3.2791	1.1526	1.0310	1.0262

*** p < 0.001; ** p < 0.01; * p < 0.05. Standard errors in parentheses.
