You go into data analysis with the tools you know, not the tools you need
The next 2-3 weeks are all about giving you the tools you need
We will extend them as we learn specific models


Free and open source
A very large community
R first
Can handle virtually any data format
Makes replication easy
Can integrate into documents (with R markdown)
R is a language so it can do everything

library("gapminder")
library("tidyverse")
ggplot(data = gapminder, aes(x = gdpPercap, y = lifeExp, color = continent))+
  geom_point(alpha=0.3)+
  geom_smooth(method = "lm")+
  scale_x_log10(breaks=c(1000,10000, 100000), labels=scales::dollar)+
  labs(x = "GDP/Capita", y = "Life Expectancy (Years)")+
  facet_wrap(~continent)+
  guides(color = "none")+ # hide the color legend (color = F is deprecated)
  theme_light()

The average GDP per capita is $`r round(mean(gapminder$gdpPercap),2)` with a standard deviation of $`r round(sd(gapminder$gdpPercap),2)`.
The average GDP per capita is $7215.33 with a standard deviation of $9857.45.
R is the programming language that executes commands
R Studio is an integrated development environment (IDE) that makes your coding life a lot easier
R is like your car's engine, R Studio is the dashboard
You will do everything in R Studio
R itself is just a command language (you could run it in your computer's shell/terminal/command prompt)

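R Studio has 4 window panes:
Source†: a text editor for scripts and documents
Console: where you type commands and see their output
Environment: the objects you have created
Files/Plots/Help: browse files and view plots and documentation
†May not be immediately visible until you create new files.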
You don't “learn R”, you learn how to do things in R
In order to learn this, you need to learn how to search for what you want to do
My #rstats learning path:
1. Install R
2. Install RStudio
3. Google "How do I [THING I WANT TO DO] in R?"
Repeat step 3 ad infinitum.
- Jesse Mostipak (@kierisi), August 18, 2017

Type individual commands into the console window
Great for testing individual commands to see what happens
Not saved! Not reproducible! Not recommended!
2+2
## [1] 4
summary(mpg$hwy) # mpg is a dataset that comes with ggplot2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   12.00   18.00   24.00   23.44   27.00   44.00

Source pane is a text-editor
Make .R files: all input commands in a single script
Comment with #
Can run any or all of script at once
Can save, reproduce, and send to others!

A later lecture: R Markdown, a simple markup language to write documents in
Can integrate text, R code, figures, citations & bibliographies in a single plain-text file & output into a variety of formats: PDF, webpage, slides, Word doc, etc.

Practicing typing at the Command line/Console
Learning different commands and objects relevant for data analysis
Saving and running .R scripts
Later: R markdown, literate programming, workflow management
Today may seem a bit overwhelming
R assumes a default (often inconvenient) "working directory" on your computer
This is the first place R looks to open or save files
Find out where this is with getwd()
Change it with setwd("path/to/folder")†
Soon I'll show you better ways where you won't ever have to worry about this
† Note the path is OS-specific. For Windows it might be C:/Documents/. For Mac it is often your username folder.
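A minimal sketch of checking and changing the working directory. It assumes nothing about your folders: it uses the temporary folder R creates for each session (tempdir()) as a stand-in for path/to/folder, and the file name example.txt is made up for illustration.

```r
old_wd <- getwd()                    # remember where R currently looks for files
scratch <- tempdir()                 # a temporary folder that exists in every R session
setwd(scratch)                       # change the working directory
writeLines("hello", "example.txt")   # a relative path is resolved against the working directory
saved_here <- file.exists(file.path(scratch, "example.txt"))
saved_here                           # TRUE if the file landed in the new working directory
setwd(old_wd)                        # restore the original working directory
```

Restoring the old directory at the end keeps the rest of a script from silently reading and writing in the wrong place.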

Hadley Wickham
Chief Scientist, R Studio
"There’s an implied contract between you and R: it will do the tedious computation for you, but in return, you must be completely precise in your instructions. Typos matter. Case matters." - R for Data Science, Ch. 4


help(function_name) or ?function_name to get documentation on a function
From Kieran Healy's excellent (free online!) book on Data Visualization.

# starts a comment, R will ignore everything on the rest of that line
# Run regression of y on x, save as reg1
reg1 <- lm(y ~ x, data = data) # runs regression
summary(reg1)$coefficients # prints coefficients
I follow this style guide (you are not required to)†
Naming objects and files will become important‡
Avoid spaces in names: a file called "my webpage in html" gets turned into http://my%20webpage%20in%20html.html
i_use_underscores
some.people.use.snake.case
othersUseCamelCase
† Also described in today's course notes page and the course resources.
‡ Consider your folders on your computer as well...
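To see why spaces in names cause trouble, here is a small sketch using base R only. The string is the example from the slide; the variable names messy and clean are made up for illustration.

```r
messy <- "my webpage in html"
encoded <- utils::URLencode(messy)      # what a browser does to the spaces
encoded
clean <- gsub(" ", "_", messy)          # replace every space with an underscore
clean
syntactic <- make.names("my object")    # base R's own fix for an invalid object name
syntactic
```

Picking one convention (underscores here) and sticking to it saves this kind of repair work later.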
You'll have to get used to the fact that you are coding in commands to execute
Start with the easiest: simple math operators and calculations:
> 2+2
## [1] 4
R asks for input with > and gives you output starting with ## [1]
2^3
## [1] 8
sqrt(25)
## [1] 5
log(6)
## [1] 1.791759
pi/2
## [1] 1.570796
Load a package with library():
library("package_name")
If you don't have a package yet, you must first install it with install.packages():
install.packages("package_name")
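One common pattern, sketched here under assumptions, is to install a package only when it is missing and then load it. The helper name load_or_install is hypothetical (it is not from the slides or from any package), and "tools" is used only because it ships with R, so the example runs without downloading anything.

```r
# Hypothetical helper: install a package only if it is missing, then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # downloads from CRAN; needs an internet connection
  }
  library(pkg, character.only = TRUE)   # character.only lets us pass the name as a string
}

load_or_install("tools")                # "tools" ships with R, so nothing is downloaded
attached <- "package:tools" %in% search()
attached                                # TRUE once the package is loaded
```

You install a package once per computer, but you need library() in every new R session.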
Creating objects: assign values to an object with = (or <-)
Running functions on objects: function_name(object_name)
# make an object
my_object = c(1,2,3,4,5)
# look at it
my_object
## [1] 1 2 3 4 5
# find the sum
sum(my_object)
## [1] 15
# find the mean
mean(my_object)
## [1] 3
Functions have "arguments," the input(s)
Some functions may have multiple inputs
The argument of a function can be another function!
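A short sketch of these three points using base R functions only. The numbers are arbitrary, and the names named, by_position, steps and nested are just labels chosen for this example.

```r
my_object <- c(1, 2, 3, 4, 5)

# One function, two arguments: x is the input, digits is an option
named <- round(x = 3.14159, digits = 2)
# Arguments can also be matched by position instead of by name
by_position <- round(3.14159, 2)
# seq() takes several arguments at once
steps <- seq(from = 0, to = 10, by = 5)
# The argument of a function can be another function: evaluated inside-out
nested <- round(sqrt(sum(my_object)), 1)

named
steps
nested
```

Nested calls are read from the inside out: sum() runs first, then sqrt(), then round().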
# find the sd
sd(my_object)
## [1] 1.581139
# round everything in my object to two decimals
round(my_object,2)
## [1] 1 2 3 4 5
# round the sd to two decimals
round(sd(my_object),2)
## [1] 1.58
Numeric objects are just numbers†
Can be mathematically manipulated
x = 2
y = 3
x+y
## [1] 5
x*y
## [1] 6
† R may call these integer, or double if there are decimal values.
Character objects are “strings” of text held inside quote marks
Can contain spaces, so long as contained within quote marks
name = "Ryan Safner"
address = "Washington D.C."
name
## [1] "Ryan Safner"
address
## [1] "Washington D.C."
Logical objects are TRUE or FALSE indicators
>, <: greater than, less than
>=, <=: greater than or equal to, less than or equal to
==, !=: is equal to, is not equal to†
%in%: is a member of the set of (∈)
&: "AND"
|: "OR"
† One = assigns a value (like <-).
Two == evaluates a conditional statement.
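The %in% operator from the list above, sketched with arbitrary values. All object names here are made up for illustration.

```r
z <- 10
in_set <- z %in% c(2, 4, 6, 8, 10)             # is z a member of this set?
not_in_set <- "red" %in% c("blue", "green")    # works for text too
big <- c(1, 5, 12) > 4                         # comparisons run on every element of a vector
both <- z > 1 & z %in% c(5, 10)                # conditions can be combined with & and |

in_set
not_in_set
big
both
```

A test on a vector returns one TRUE or FALSE per element, which is what makes conditional subsetting of data possible.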
z = 10 # set z equal to 10
z==10 # is z equal to 10?
## [1] TRUE
"red"=="blue" # is red equal to blue?
## [1] FALSE
z > 1 & z < 12 # is z > 1 AND < 12?
## [1] TRUE
z <= 1 | z==10 # is z <= 1 OR equal to 10?
## [1] TRUE
Factor objects contain categorical data - membership in mutually exclusive groups
Look like strings, behave more like logicals, but with more than two options
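The slide's factor output does not come with the code that produced it. A minimal sketch of one way to build factors like these, assuming the ten values were entered as plain text first; the object names standing, lvls, standing_f and standing_o are hypothetical.

```r
# Class standing for ten students, entered as plain character strings
standing <- c("senior", "junior", "freshman", "junior", "sophomore",
              "sophomore", "sophomore", "senior", "senior", "sophomore")
lvls <- c("freshman", "sophomore", "junior", "senior")

standing_f <- factor(standing, levels = lvls)                   # unordered categories
standing_o <- factor(standing, levels = lvls, ordered = TRUE)   # categories with a ranking

standing_f
standing_o
counts <- table(standing_f)                   # how many students in each group
counts
below_junior <- sum(standing_o < "junior")    # comparisons only make sense when ordered
below_junior
```

Setting levels explicitly controls their order. Without it, R sorts the levels alphabetically.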
# an unordered factor
##  [1] senior    junior    freshman  junior    sophomore sophomore sophomore
##  [8] senior    senior    sophomore
## Levels: freshman sophomore junior senior
# an ordered factor
##  [1] senior    junior    freshman  junior    sophomore sophomore sophomore
##  [8] senior    senior    sophomore
## Levels: freshman < sophomore < junior < senior
Vector: the simplest type of object, just a collection of elements
Make a vector using the combine c() function
# create a vector named vec
vec = c(1,"orange", 83.5, pi)
# look at vec
vec
## [1] "1"                "orange"           "83.5"             "3.14159265358979"
Data.frame: what we'll be using almost always
Think like a “spreadsheet”
Each column is a vector (variable)
Each row is an observation (pair of values for all variables)
library("ggplot2")
diamonds
## # A tibble: 53,940 × 10
##    carat cut       color clarity depth table price     x     y     z
##    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
##  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
##  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
##  3  0.23 Good      E     VS1      56.9    65   327  4.05  4.07  2.31
##  4  0.29 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
##  5  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
##  6  0.24 Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
##  7  0.24 Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
##  8  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
##  9  0.22 Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
## 10  0.23 Very Good H     VS1      59.4    61   338  4     4.05  2.39
## # … with 53,930 more rows
Dataframes are really just combinations of (column) vectors
You can make data frames by combining named vectors with data.frame() or creating each column/vector in each argument
# make two vectors
fruits = c("apple","orange","pear","kiwi","pineapple")
numbers = c(3.3,2.0,6.1,7.5,4.2)
# combine into dataframe
df = data.frame(fruits,numbers)
# do it all in one step (note: arguments inside data.frame() must use =, not <-)
df = data.frame(fruits=c("apple","orange","pear","kiwi","pineapple"),
                numbers=c(3.3,2.0,6.1,7.5,4.2))
# look at it
df
##      fruits numbers
## 1     apple     3.3
## 2    orange     2.0
## 3      pear     6.1
## 4      kiwi     7.5
## 5 pineapple     4.2
Objects are created (and overwritten) with the assignment operator = or <-
my_vector = c(1,2,3,4,5)
my_vector
## [1] 1 2 3 4 5
my_vector = c(2,7,9,1,5) # assigning again overwrites the old values
my_vector
## [1] 2 7 9 1 5
Check what class an object is with class()
class("six")
## [1] "character"
class(6)
## [1] "numeric"
Test whether an object is a particular class with the is.*() functions
is.numeric("six")
## [1] FALSE
is.character("six")
## [1] TRUE
Convert an object to another class with as.object_class(): as.character(), as.numeric(), etc!
as.character(6)
## [1] "6"
as.numeric("six") # returns NA (with a warning): "six" is not a number
## [1] NA
mixed_vector = c(pi, 12, "apple", 6.32) # mixing types coerces everything to character
class(mixed_vector)
## [1] "character"
mixed_vector
## [1] "3.14159265358979" "12"               "apple"            "6.32"
df
##      fruits numbers
## 1     apple     3.3
## 2    orange     2.0
## 3      pear     6.1
## 4      kiwi     7.5
## 5 pineapple     4.2
class(df$fruits)
## [1] "character"
class(df$numbers)
## [1] "numeric"
†Remember each column in a data frame is a vector!
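A sketch of checking every column's class at once and converting between classes, using the same fruit data. Here stringsAsFactors = FALSE is set explicitly as an assumption, so the text column stays character on older versions of R as well; the names col_classes, numbers_chr, numbers_back and not_a_number are made up for this example.

```r
df <- data.frame(fruits = c("apple", "orange", "pear", "kiwi", "pineapple"),
                 numbers = c(3.3, 2.0, 6.1, 7.5, 4.2),
                 stringsAsFactors = FALSE)

col_classes <- sapply(df, class)            # apply class() to each column in turn
col_classes

numbers_chr <- as.character(df$numbers)     # numbers stored as text
numbers_back <- as.numeric(numbers_chr)     # and converted back to numbers
numbers_back

# Text that is not a number cannot be converted: the result is NA
not_a_number <- suppressWarnings(as.numeric("six"))
is.na(not_a_number)
```

suppressWarnings() is used only to keep the output quiet. In normal work the warning is useful, because it signals that a conversion produced missing values.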
Use the str() command to view an object's structure
class(df)
## [1] "data.frame"
str(df)
## 'data.frame':    5 obs. of  2 variables:
##  $ fruits : chr  "apple" "orange" "pear" "kiwi" ...
##  $ numbers: num  3.3 2 6.1 7.5 4.2
Look at the first few (n) rows with head()
head(df)
##      fruits numbers
## 1     apple     3.3
## 2    orange     2.0
## 3      pear     6.1
## 4      kiwi     7.5
## 5 pineapple     4.2
head(df, n=2)
##   fruits numbers
## 1  apple     3.3
## 2 orange     2.0
Get summary statistics† for each column with summary()
summary(df)
##     fruits             numbers
##  Length:5           Min.   :2.00
##  Class :character   1st Qu.:3.30
##  Mode  :character   Median :4.20
##                     Mean   :4.62
##                     3rd Qu.:6.10
##                     Max.   :7.50
† For numeric data only; factor data gets a count of each level, and character data shows only its length and class (as above)
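How summary() treats text depends on whether it is stored as character or as a factor. A small sketch with made-up data; the object names are hypothetical.

```r
fruits_chr <- c("apple", "pear", "apple", "kiwi")   # character vector
fruits_fct <- factor(fruits_chr)                    # same values stored as a factor

chr_summary <- summary(fruits_chr)   # only length, class, and mode
fct_summary <- summary(fruits_fct)   # a count for each level
freq <- table(fruits_chr)            # table() gives counts for character data too

chr_summary
fct_summary
freq
```

If you want counts for a text column, either convert it with factor() first or call table() on it directly.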

data.frame objects can be viewed in their own panel by clicking on the name of the object in the environment pane
my_vector = c(2,4,5,10) # create object called my_vector
my_vector # look at it
## [1]  2  4  5 10
my_vector+4 # add 4 to all elements of my_vector
## [1]  6  8  9 14
my_vector^2 # square all elements of my_vector
## [1]   4  16  25 100
length(my_vector) # how many elements?
## [1] 4
sum(my_vector) # add all elements together
## [1] 21
max(my_vector) # find largest element
## [1] 10
min(my_vector) # find smallest element
## [1] 2
mean(my_vector) # mean of all elements
## [1] 5.25
median(my_vector) # median of all elements
## [1] 4.5
var(my_vector) # variance of object
## [1] 11.58333
sd(my_vector) # standard deviation of object
## [1] 3.40343
If you leave a command unfinished (e.g. forget to close a parenthesis), the console shows a + sign waiting for you to finish the command
> 2+(2*3
+
Either finish the command (e.g. add the closing parenthesis) or hit Esc to cancel
mtcars # the output here shows only the first 12 rows and 7 columns
##                    mpg cyl  disp  hp drat    wt  qsec
## Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46
## Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02
## Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61
## Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44
## Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02
## Valiant           18.1   6 225.0 105 2.76 3.460 20.22
## Duster 360        14.3   8 360.0 245 3.21 3.570 15.84
## Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00
## Merc 230          22.8   4 140.8  95 3.92 3.150 22.90
## Merc 280          19.2   6 167.6 123 3.92 3.440 18.30
## Merc 280C         17.8   6 167.6 123 3.92 3.440 18.90
## Merc 450SE        16.4   8 275.8 180 3.07 4.070 17.40
The mtcars dataset is automatically built in with R.
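The printed table is smaller than the full dataset. A quick sketch of checking the real dimensions and of recreating a slice that matches the rows and columns shown; the name cars12 is made up for this example.

```r
full_dim <- dim(mtcars)        # rows and columns of the full built-in dataset
full_dim

cars12 <- mtcars[1:12, 1:7]    # first 12 rows and first 7 columns, as in the printed table
dim(cars12)
names(cars12)
```

Running the subsetting examples on cars12 rather than on the full mtcars should give output of the same shape as the tables shown in these slides.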
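Subset a data frame by position with df[r,c] (r for rows, c for columns)
Leaving r or c blank selects all rows or columns
Select multiple values with c()†
Select a range of values with :
You can use these for both r and c!
† You can also "negate" values, selecting everything except for values with a - in front of them.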
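Negative positions, mentioned in the footnote, sketched on a small slice of mtcars. The slice and the object names (cars, no_first_row, and so on) are arbitrary choices to keep the output short.

```r
cars <- mtcars[1:5, 1:4]              # a small slice: 5 rows, 4 columns

no_first_row <- cars[-1, ]            # everything except the first row
no_first_cols <- cars[, -c(1, 2)]     # everything except columns 1 and 2
last_two_rows <- cars[-(1:3), ]       # drop rows 1 through 3

no_first_row
no_first_cols
last_two_rows
```

Note the parentheses in -(1:3): without them, -1:3 means the sequence from -1 to 3, which mixes negative and positive positions and causes an error.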
# outputs shown are for the first 12 rows and 7 columns of mtcars
mtcars[1,] # first row
##           mpg cyl disp  hp drat   wt  qsec
## Mazda RX4  21   6  160 110  3.9 2.62 16.46
mtcars[c(1,3,4),] # first, third, and fourth rows
##                 mpg cyl disp  hp drat    wt  qsec
## Mazda RX4      21.0   6  160 110 3.90 2.620 16.46
## Datsun 710     22.8   4  108  93 3.85 2.320 18.61
## Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44
mtcars[1:3,] # first through third rows
##                mpg cyl disp  hp drat    wt  qsec
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61
mtcars[,6] # select column 6
## [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
mtcars[,2:4] # select columns 2 through 4
##                   cyl  disp  hp
## Mazda RX4           6 160.0 110
## Mazda RX4 Wag       6 160.0 110
## Datsun 710          4 108.0  93
## Hornet 4 Drive      6 258.0 110
## Hornet Sportabout   8 360.0 175
## Valiant             6 225.0 105
## Duster 360          8 360.0 245
## Merc 240D           4 146.7  62
## Merc 230            4 140.8  95
## Merc 280            6 167.6 123
## Merc 280C           6 167.6 123
## Merc 450SE          8 275.8 180
[[]] selects a column by position
mtcars[[6]] # select sixth column (wt)
## [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
$ selects a column by name
mtcars$wt # does the same thing!
## [1] 2.620 2.875 2.320 3.215 3.440 3.460 3.570 3.190 3.150 3.440 3.440 4.070
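Columns can also be picked by name inside the brackets, which is often safer than counting positions. A sketch on the built-in mtcars data; the object names are made up for this example.

```r
by_position <- mtcars[[6]]               # sixth column
by_dollar <- mtcars$wt                   # same column, by name with $
by_name <- mtcars[["wt"]]                # same column, by name with [[ ]]
two_cols <- mtcars[, c("mpg", "wt")]     # several columns by name: still a data frame

identical(by_position, by_dollar)
identical(by_name, by_dollar)
head(two_cols, n = 3)
```

Selecting by name keeps working if columns are later reordered, whereas a position like 6 would silently point at a different variable.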
# outputs shown are for the first 12 rows and 7 columns of mtcars
mtcars[mtcars$wt>4,] # select all obs with wt>4
##             mpg cyl  disp  hp drat   wt qsec
## Merc 450SE 16.4   8 275.8 180 3.07 4.07 17.4
mtcars[mtcars$cyl==6,] # select all obs with exactly 6 cyl
##                 mpg cyl  disp  hp drat    wt  qsec
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44
## Valiant        18.1   6 225.0 105 2.76 3.460 20.22
## Merc 280       19.2   6 167.6 123 3.92 3.440 18.30
## Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90
mtcars[mtcars$wt>2 & mtcars$wt<3,] # obs where 2<wt<3
##                mpg cyl disp  hp drat    wt  qsec
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61
mtcars[mtcars$cyl==4 | mtcars$cyl==6,] # obs with 4 OR 6 cyl
##                 mpg cyl  disp  hp drat    wt  qsec
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61
## Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44
## Valiant        18.1   6 225.0 105 2.76 3.460 20.22
## Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00
## Merc 230       22.8   4 140.8  95 3.92 3.150 22.90
## Merc 280       19.2   6 167.6 123 3.92 3.440 18.30
## Merc 280C      17.8   6 167.6 123 3.92 3.440 18.90
Next class: data visualization with ggplot2
And then: data wrangling with tidyverse
And then: literate programming and workflow management with R Markdown, R Projects, maybe git
Finally: onto statistics and econometric theory!
A surprisingly large part of having expertise in a topic is not so much knowing everything about it but learning the language and sources well enough to be extremely efficient in google searches.
— Katie Mack (@AstroKatie) December 8, 2018
