2.1 — Random Variables & Distributions

ECON 480 • Econometrics • Fall 2021

Ryan Safner
Assistant Professor of Economics
safner@hood.edu
ryansafner/metricsF21
metricsF21.classes.ryansafner.com

Random Variables

Experiments

  • An experiment is any procedure that can (in principle) be repeated infinitely and has a well-defined set of outcomes

Example: flip a coin 10 times

Random Variables

  • A random variable (RV) takes on values that are unknown in advance, but determined by an experiment

  • A numerical summary of a random outcome

Example: the number of heads from 10 coin flips

Random Variables: Notation

  • Random variable $X$ takes on individual values ($x_i$) from a set of possible values

  • Capital letters usually denote RVs

    • lowercase letters denote individual values

Example: Let $X$ be the number of Heads from 10 coin flips. $x_i \in \{0, 1, 2, \ldots, 10\}$
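
To make the link between an experiment and a random variable concrete, the coin-flip example can be simulated. This is only a sketch: it assumes a fair coin (the slides do not state the probability of heads), and the seed and object names are arbitrary.

```r
# Hypothetical illustration: simulate the "flip a coin 10 times" experiment
set.seed(480) # arbitrary seed so the draws are reproducible

# one realization of X = number of heads in 10 flips of a (assumed) fair coin
one_draw <- rbinom(n = 1, size = 10, prob = 0.5)
one_draw

# repeat the experiment 1,000 times to see which values X takes on
many_draws <- rbinom(n = 1000, size = 10, prob = 0.5)
table(many_draws)
```

Each repetition of the experiment yields one value between 0 and 10, which is unknown in advance.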

Discrete Random Variables

Discrete Random Variables

  • A discrete random variable: takes on a finite/countable set of possible values

Example: Let $X$ be the number of times your computer crashes this semester¹, $x_i \in \{0, 1, 2, 3, 4\}$

¹ Please, back up your files!

Discrete Random Variables: Probability Distribution

  • The probability distribution of an RV $X$ lists all the possible values of $X$ and their associated probabilities

Example:

x_i   P(X = x_i)
0     0.80
1     0.10
2     0.06
3     0.03
4     0.01

Discrete Random Variables: pdf

A probability distribution function (pdf) summarizes the possible outcomes of $X$ and their probabilities

  • Notation: $f_X$ is the pdf of $X$:

$$f_X(x_i) = p_i, \quad i = 1, 2, \ldots, k$$

  • For any real number $x_i$, $f(x_i)$ is the probability that $X = x_i$

Example:

x_i   P(X = x_i)
0     0.80
1     0.10
2     0.06
3     0.03
4     0.01
  • What is f(0)?

  • What is f(3)?

Discrete Random Variables: pdf Graph

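library(tidyverse) # provides tibble(), ggplot(), and %>%

# the pdf of X (number of crashes) as a data frame
crashes <- tibble(number = c(0, 1, 2, 3, 4),
                  prob = c(0.80, 0.10, 0.06, 0.03, 0.01))

# plot each value's probability as a column
ggplot(data = crashes)+
  aes(x = number,
      y = prob)+
  geom_col(fill = "#0072B2")+
  labs(x = "Number of Crashes",
       y = "Probability")+
  theme_classic(base_family = "Fira Sans Condensed",
                base_size = 20)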

Discrete Random Variables: cdf

The cumulative distribution function (cdf) lists the probability that $X$ will be at most (less than or equal to) a given value $x_i$

  • Notation: $F_X(x_i) = P(X \leq x_i)$

Example:

x_i   f(x)   F(x)
0     0.80   0.80
1     0.10   0.90
2     0.06   0.96
3     0.03   0.99
4     0.01   1.00
  • What is the probability your computer will crash at most once, F(1)?

  • What about three times, F(3)?

Discrete Random Variables: cdf Graph

crashes <- crashes %>%
  mutate(cum_prob = cumsum(prob))
crashes
## # A tibble: 5 × 3
##   number  prob cum_prob
##    <dbl> <dbl>    <dbl>
## 1      0  0.8      0.8
## 2      1  0.1      0.9
## 3      2  0.06     0.96
## 4      3  0.03     0.99
## 5      4  0.01     1
ggplot(data = crashes)+
  aes(x = number,
      y = cum_prob)+
  geom_col(fill = "#0072B2")+
  labs(x = "Number of Crashes",
       y = "Probability")+
  theme_classic(base_family = "Fira Sans Condensed",
                base_size = 20)

Expected Value and Variance

Expected Value of a Random Variable

  • The expected value of a random variable $X$, written $E(X)$ (and sometimes $\mu$), is the long-run average value of $X$ "expected" after many repetitions

$$E(X) = \sum_{i=1}^{k} p_i x_i$$

  • $E(X) = p_1 x_1 + p_2 x_2 + \cdots + p_k x_k$

  • A probability-weighted average of $X$, with each $x_i$ weighted by its associated probability $p_i$

  • Also called the "mean" or "expectation" of $X$, denoted either $E(X)$ or $\mu_X$

Expected Value: Example I

Example: Suppose you lend your friend $100 at 10% interest. If the loan is repaid, you receive $110. You estimate that your friend is 99% likely to repay, but there is a default risk of 1% where you get nothing. What is the expected value of repayment?
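
One way to work this example is to apply the probability-weighted average directly. The sketch below uses only the numbers stated in the example; the object names are illustrative.

```r
# Outcomes and probabilities come from the example; names are illustrative
repayment <- c(110, 0)  # repaid with interest, or default (get nothing)
prob <- c(0.99, 0.01)   # estimated probability of each outcome

# E(X) = sum of each outcome times its probability
expected_repayment <- sum(repayment * prob)
expected_repayment
```

The result is $108.90, which is $0.99 \times 110 + 0.01 \times 0$.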

Expected Value: Example II

Example:

Let X be a random variable that is described by the following pdf:

x_i   P(X = x_i)
1     0.50
2     0.25
3     0.15
4     0.10

Calculate E(X).

The Steps to Calculate E(X), Coded

# Make a Random Variable called X
X <- tibble(x_i = c(1, 2, 3, 4), # values of X
            p_i = c(0.50, 0.25, 0.15, 0.10)) # probabilities
X %>%
  summarize(expected_value = sum(x_i * p_i))
## # A tibble: 1 × 1
##   expected_value
##            <dbl>
## 1           1.85

Variance of a Random Variable

  • The variance of a random variable $X$, denoted $var(X)$ or $\sigma_X^2$, is:

$$\sigma_X^2 = E\left[(x_i - \mu_X)^2\right] = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i$$

  • This is the expected value of the squared deviations from the mean
    • i.e. the probability-weighted average of the squared deviations

Standard Deviation of a Random Variable

  • The standard deviation of a random variable $X$, denoted $sd(X)$ or $\sigma_X$, is:

$$\sigma_X = \sqrt{\sigma_X^2}$$

Standard Deviation: Example I

Example: What is the standard deviation of computer crashes?

x_i   P(X = x_i)
0     0.80
1     0.10
2     0.06
3     0.03
4     0.01

The Steps to Calculate sd(X), Coded I

# get the expected value
crashes %>%
  summarize(expected_value = sum(number * prob))
## # A tibble: 1 × 1
##   expected_value
##            <dbl>
## 1           0.35
# save this for quick use
exp_value <- 0.35
crashes_2 <- crashes %>%
  select(-cum_prob) %>% # we don't need the cdf
  # create new columns
  mutate(deviations = number - exp_value, # deviations from exp_value
         deviations_sq = deviations^2, # squared deviations
         weighted_devs_sq = prob * deviations^2) # squared deviations weighted by probability

The Steps to Calculate sd(X), Coded II

# look at what we made
crashes_2
## # A tibble: 5 × 5
##   number  prob deviations deviations_sq weighted_devs_sq
##    <dbl> <dbl>      <dbl>         <dbl>            <dbl>
## 1      0  0.8       -0.35         0.122           0.098
## 2      1  0.1        0.65         0.423           0.0423
## 3      2  0.06       1.65         2.72            0.163
## 4      3  0.03       2.65         7.02            0.211
## 5      4  0.01       3.65        13.3             0.133

The Steps to Calculate sd(X), Coded III

# now we want to take the expected value of the squared deviations to get variance
crashes_2 %>%
  summarize(variance = sum(weighted_devs_sq), # variance
            sd = sqrt(variance)) # sd is square root
## # A tibble: 1 × 2
##   variance    sd
##      <dbl> <dbl>
## 1    0.648 0.805

Standard Deviation: Example II

Example: What is the standard deviation of the random variable we saw before?

x_i   P(X = x_i)
1     0.50
2     0.25
3     0.15
4     0.10

Hint: you already found its expected value.
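
If you want to check your answer, the same steps (mean, probability-weighted squared deviations, square root) can be sketched in base R. The object names here are illustrative and not part of the original slides.

```r
# A sketch of one way to check the answer; names are illustrative
x_i <- c(1, 2, 3, 4)              # values of X
p_i <- c(0.50, 0.25, 0.15, 0.10)  # probabilities

mu <- sum(x_i * p_i)                 # expected value (1.85, found earlier)
variance <- sum((x_i - mu)^2 * p_i)  # probability-weighted squared deviations
std_dev <- sqrt(variance)            # standard deviation is the square root

c(mean = mu, variance = variance, sd = std_dev)
```

The variance works out to 1.0275, so the standard deviation is roughly 1.01.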

Continuous Random Variables

Continuous Random Variables

  • Continuous random variables can take on an uncountable (infinite) number of values

  • So many values that the probability of any specific value is infinitely small: $P(X = x_i) \rightarrow 0$

  • Instead, we focus on a range of values it might take on

Continuous Random Variables: pdf I

The probability density function (pdf) of a continuous variable represents the probability between two values as the area under a curve

  • The total area under the curve is 1

  • Since $P(X = a) = 0$ and $P(X = b) = 0$, $P(a < X < b) = P(a \leq X \leq b)$

Example: $P(0 \leq X \leq 2)$

Continuous Random Variables: pdf II

  • FYI using calculus:

$$P(a \leq X \leq b) = \int_a^b f(x) \, dx$$

  • Complicated: use software or (old-fashioned!) probability tables to calculate
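
As a rough illustration of "area under the curve", the integral can be approximated numerically in R. This sketch assumes $X$ is standard normal (the slides do not say which distribution the example uses); `dnorm()` is R's normal pdf, `pnorm()` its cdf, and `integrate()` does the numerical integration.

```r
# Assumption: X is standard normal, so its pdf is dnorm() with mean = 0, sd = 1
area <- integrate(dnorm, lower = 0, upper = 2)$value # numerical integral of the pdf

# the same probability, computed as a difference of two cdf values
cdf_diff <- pnorm(2) - pnorm(0)

c(area = area, cdf_diff = cdf_diff)
```

Both approaches give the same probability (up to numerical error), which is why software that knows the cdf saves us from doing the calculus by hand.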

Continuous Random Variables: cdf I

  • The cumulative distribution function (cdf) describes the area under the pdf for all values less than or equal to (i.e. to the left of) a given value, $k$

$$P(X \leq k)$$

Example: $P(X \leq 2)$

Continuous Random Variables: cdf II

  • Note: to find the probability of values greater than or equal to (to the right of) a given value $k$:

$$P(X \geq k) = 1 - P(X \leq k)$$

Example: $P(X \geq 2) = 1 - P(X \leq 2)$

$P(X \geq 2) =$ area under the curve to the right of 2

The Normal Distribution

The Normal Distribution I

  • The Gaussian or normal distribution is the most useful type of probability distribution

$$X \sim N(\mu, \sigma)$$

  • Continuous, symmetric, unimodal, with mean μ and standard deviation σ

The Normal Distribution II

  • FYI: The pdf of $X \sim N(\mu, \sigma)$ is $f(k) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2}\left(\frac{k - \mu}{\sigma}\right)^2}$

  • Do not try to learn this; we have software (and, previously, tables) to calculate pdfs and cdfs
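
For the curious, the formula can be typed out and compared against R's built-in `dnorm()`. This is only a sketch: `normal_pdf` is a made-up helper name, and the values of $k$, $\mu$, and $\sigma$ are arbitrary.

```r
# Hand-coded normal pdf; normal_pdf is an illustrative name, not a built-in
normal_pdf <- function(k, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-0.5 * ((k - mu) / sigma)^2)
}

# Arbitrary example values (assumptions, not from the slides)
k <- c(-1, 0, 2.5)
by_hand <- normal_pdf(k, mu = 1, sigma = 2)
built_in <- dnorm(k, mean = 1, sd = 2)

all.equal(by_hand, built_in) # TRUE if the two agree up to rounding error
```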

The 68-95-99.7 Rule

  • 68-95-99.7% empirical rule: for a normal distribution:

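  • $P(\mu - 1\sigma \leq X \leq \mu + 1\sigma) \approx 68\%$

  • $P(\mu - 2\sigma \leq X \leq \mu + 2\sigma) \approx 95\%$

  • $P(\mu - 3\sigma \leq X \leq \mu + 3\sigma) \approx 99.7\%$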

  • 68/95/99.7% of observations fall within 1/2/3 standard deviations of the mean
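
The rule can be checked with R's normal cdf, `pnorm()`. The sketch below uses the standard normal (mean 0, standard deviation 1, which are `pnorm()`'s defaults), so the cutoffs are simply 1, 2, and 3.

```r
# Probability of falling within k standard deviations of the mean, for k = 1, 2, 3
k <- 1:3
within_k_sd <- pnorm(k) - pnorm(-k) # area between -k and +k under the standard normal
round(within_k_sd, 4)
## [1] 0.6827 0.9545 0.9973
```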

The Standard Normal Distribution

  • The standard normal distribution (often referred to as Z) has mean 0 and standard deviation 1

$$Z \sim N(0, 1)$$

The Standard Normal cdf

  • The standard normal cdf

$$\Phi(k) = P(Z \leq k)$$

Standardizing Variables

  • We can take any normal distribution (for any $\mu, \sigma$) and standardize it to the standard normal distribution by taking the Z-score of any value, $x_i$:

$$Z = \frac{x_i - \mu}{\sigma}$$

  • Subtract the distribution's mean from any value, then divide by the standard deviation

  • $Z$: the number of standard deviations the value $x_i$ is away from the mean

Standardizing Variables: Example

Example: On August 8, 2011, the Dow dropped 634.8 points, sending shock waves through the financial community. Assume that during mid-2011 to mid-2012 the daily change for the Dow is normally distributed, with a mean daily change of 1.87 points and a standard deviation of 155.28 points. What is the Z-score?

$$Z = \frac{X - \mu}{\sigma}$$

$$Z = \frac{-634.8 - 1.87}{155.28}$$

$$Z = -4.1$$

This is 4.1 standard deviations ($\sigma$) below the mean, an extremely low-probability event.
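
The same arithmetic can be sketched in R, along with how unlikely such a drop is under the stated normality assumption. The object names are illustrative; `pnorm()` is R's normal cdf.

```r
# Values from the example; a drop of 634.8 points is a change of -634.8
change <- -634.8
mu <- 1.87      # mean daily change
sigma <- 155.28 # standard deviation of daily change

z <- (change - mu) / sigma # Z-score, about -4.1
z

# probability of a daily change this low or lower, if changes really are normal
pnorm(z)
```

The probability is well below 0.01%, which is what "extremely low-probability" means here.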

Standardizing Variables: From X to Z I

Example: In the last quarter of 2015, a group of 64 mutual funds had a mean return of 2.4% with a standard deviation of 5.6%. These returns can be approximated by a normal distribution.

What percent of the funds would you expect to be earning between -3.2% and 8.0% returns?

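Convert to standard normal to find Z-scores for 8 and -3.2.

$$P(-3.2 < X < 8)$$

$$P\left(\frac{-3.2 - 2.4}{5.6} < \frac{X - 2.4}{5.6} < \frac{8 - 2.4}{5.6}\right)$$

$$P(-1 < Z < 1)$$

$$P(\mu - 1\sigma < X < \mu + 1\sigma) \approx 0.68$$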

Standardizing Variables: From X to Z III

You Try: In the last quarter of 2015, a group of 64 mutual funds had a mean return of 2.4% with a standard deviation of 5.6%. These returns can be approximated by a normal distribution.

  1. What percent of the funds would you expect to be earning between -3.2% and 8.0% returns?

  2. What percent of the funds would you expect to be earning 2.4% or less?

  3. What percent of the funds would you expect to be earning between -8.8% and 13.6%?

  4. What percent of the funds would you expect to be earning returns greater than 13.6%?
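
After trying these by hand with the 68-95-99.7 rule, the answers can be checked in R. This sketch uses `pnorm()`, R's normal cdf, with the mean and standard deviation from the example; the names `p1` through `p4` are illustrative.

```r
mu <- 2.4    # mean return (%)
sigma <- 5.6 # standard deviation of returns (%)

# 1. between -3.2% and 8.0% (one sd either side of the mean)
p1 <- pnorm(8.0, mean = mu, sd = sigma) - pnorm(-3.2, mean = mu, sd = sigma)
# 2. 2.4% or less (everything to the left of the mean)
p2 <- pnorm(2.4, mean = mu, sd = sigma)
# 3. between -8.8% and 13.6% (two sds either side of the mean)
p3 <- pnorm(13.6, mean = mu, sd = sigma) - pnorm(-8.8, mean = mu, sd = sigma)
# 4. greater than 13.6% (right tail beyond two sds)
p4 <- pnorm(13.6, mean = mu, sd = sigma, lower.tail = FALSE)

round(c(p1, p2, p3, p4), 4)
```

The results should be close to 68%, 50%, 95%, and 2.5%, matching what the empirical rule suggests.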

Finding Z-score Probabilities I

  • How do we actually find the probabilities for Z-scores?

Finding Z-score Probabilities II

Probability to the left of $z_i$

$$P(Z \leq z_i) = \underbrace{\Phi(z_i)}_{\text{cdf of } z_i}$$

Probability to the right of $z_i$

$$P(Z \geq z_i) = 1 - \underbrace{\Phi(z_i)}_{\text{cdf of } z_i}$$

Finding Z-score Probabilities III

Probability between $z_1$ and $z_2$

$$P(z_1 \leq Z \leq z_2) = \underbrace{\Phi(z_2)}_{\text{cdf of } z_2} - \underbrace{\Phi(z_1)}_{\text{cdf of } z_1}$$
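
These three cases map directly onto R's `pnorm()`, which returns $\Phi(z)$ for the standard normal when called with only a value. A minimal sketch, with $z_1$ and $z_2$ chosen arbitrarily for illustration:

```r
# Illustrative z-values (assumptions, not from the slides)
z1 <- -1
z2 <- 1.5

left_of_z2 <- pnorm(z2)          # P(Z <= z2) = Phi(z2)
right_of_z2 <- 1 - pnorm(z2)     # P(Z >= z2) = 1 - Phi(z2)
between <- pnorm(z2) - pnorm(z1) # P(z1 <= Z <= z2) = Phi(z2) - Phi(z1)

c(left = left_of_z2, right = right_of_z2, between = between)
```

The left and right probabilities always sum to 1, and `1 - pnorm(z2)` gives the same answer as `pnorm(z2, lower.tail = FALSE)`.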

Finding Z-score Probabilities IV

  • pnorm() calculates probabilities with a normal distribution, with arguments:
    • q = the value (the first argument, usually passed without its name)
    • mean = the mean
    • sd = the standard deviation
    • lower.tail =
      • TRUE if looking at area to LEFT of value
      • FALSE if looking at area to RIGHT of value

Finding Z-score Probabilities IV

Example: Let the distribution of grades be normal, with mean 75 and standard deviation 10.

  • Probability a student gets at least an 80
pnorm(80,
      mean = 75,
      sd = 10,
      lower.tail = FALSE) # looking to right
## [1] 0.3085375

Finding Z-score Probabilities V

Example: Let the distribution of grades be normal, with mean 75 and standard deviation 10.

  • Probability a student gets at most an 80
pnorm(80,
      mean = 75,
      sd = 10,
      lower.tail = TRUE) # looking to left
## [1] 0.6914625

Finding Z-score Probabilities VI

Example: Let the distribution of grades be normal, with mean 75 and standard deviation 10.

  • Probability a student gets between a 65 and 85
# subtract two left tails!
pnorm(85, # larger number first!
      mean = 75,
      sd = 10,
      lower.tail = TRUE) - # looking to left, & SUBTRACT
  pnorm(65, # smaller number second!
        mean = 75,
        sd = 10,
        lower.tail = TRUE) # looking to left
## [1] 0.6826895
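
To tie this back to Z-scores: standardizing the cutoffs by hand and then using the standard normal cdf should give the same probability. A short sketch, with illustrative object names:

```r
mu <- 75    # mean grade
sigma <- 10 # standard deviation of grades

# standardize the cutoffs, then use the standard normal cdf (pnorm's defaults)
z_low <- (65 - mu) / sigma  # -1
z_high <- (85 - mu) / sigma #  1
via_z <- pnorm(z_high) - pnorm(z_low)

# the direct calculation, passing mean and sd to pnorm()
direct <- pnorm(85, mean = mu, sd = sigma) - pnorm(65, mean = mu, sd = sigma)

all.equal(via_z, direct) # TRUE if the two approaches agree
```

Since 65 and 85 are one standard deviation either side of the mean, this is also the 68% from the 68-95-99.7 rule.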
