Getting Set Up
Before we begin, start a new file with File
\(\rightarrow\) New File
\(\rightarrow\) R Script
. As you work through this sheet in the console in R
, also add (copy/paste) your commands that work into this new file. At the end, save it, and run to execute all of your commands at once.
“Our Plot” from Class
# load ggplot2 package
library(ggplot2)
# make plot
ggplot(data = mpg)+ # set data source to mpg (included in ggplot2)
aes(x = displ, # x is displacement
y = hwy)+ # y is hwy mpg
geom_point(aes(color = class))+ # color points by car class
geom_smooth()+ # add regression line
facet_wrap(~year)+ # separate plots by year
labs(x = "Engine Displacement (Liters)",
y = "Highway MPG",
title = "Car Mileage and Displacement",
subtitle = "More Displacement Lowers Highway MPG",
caption = "Source: EPA",
color = "Vehicle Class")+
scale_color_viridis_d()+ # change color scale
theme_minimal()+ # change theme
theme(text = element_text(family = "Fira Sans")) # change font
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Download and run in R Studio on your computer (or open the file in our R Studio cloud project and run it there) to see our plot from class.
Exploring the Data
Question 1
We will look at GDP per Capita and Life Expectancy using some data from the gapminder project. There is a handy package called gapminder
that uses a small snippet of this data for exploratory analysis. Install and load the package gapminder
. Type ?gapminder
and hit enter to see a description of the data.
Question 2
Let’s get a quick look at gapminder
to see what we’re dealing with.
- Get the
str
ucture of thegapminder
data. - What variables are there?
- Look at the
head
of the dataset to get an idea of what the data looks like. - Get
summary
statistics of all variables.
Simple Plots in Base R
Question 3
Let’s make sure you can do some basic plots before we get into the gg
. Use base R
’s hist()
function to plot a histogram of gdpPercap
.
Question 4
Use base R
’s boxplot()
function to plot a boxplot of gdpPercap
.
Question 5
Now make it a boxplot by continent
. Hint: use formula notation with ~
.
Question 6
Now make a scatterplot of gdpPercap
on the \(x\)-axis and LifeExp
on the \(y\)-axis.
Plots with ggplot2
Question 7
Load the package ggplot2
(you should have installed it previously. If not, install first with install.packages("ggplot2")
).
Question 8
Let’s first make a bar
graph to see how many countries are in each continent. The only aes
thetic you need is to map continent
to x
. Bar graphs are great for representing categories, but not quantitative data.
Question 9
For quantitative data, we want a histogram
to visualize the distribution of a variable. Make a histogram
of gdpPercap
. Your only aes
thetic here is to map gdpPercap
to x
.
Question 10
Now let’s try adding some color, specifically, add an aes
thetic that maps continent
to fill.
(In general, color
refers to the outside borders of a geom
(except points), fill
is the interior of an object.)
Question 11
Instead of a histogram
, change the geom
to make it a density
graph. To avoid overplotting, add alpha=0.4
to the geom
argument (alpha changes the transparency of a fill
).
Question 12
Redo your plot from 11 for lifeExp
instead of gdpPercap
.
Question 13
Now let’s try a scatterplot for lifeExp
(as y
) on gdpPercap
(as x
). You’ll need both for aes
thetics. The geom
here is geom_point()
.
Question 14
Add some color by mapping continent
to color
in your aes
thetics.
Question 15
Now let’s try adding a regression line with geom_smooth()
. Add this layer on top of your geom_point()
layer.
Question 16
Did you notice that you got multiple regression lines (colored by continent)? That’s because we set a global
aes
thetic of mapping continent
to color
. If we want just one regression line, we need to instead move the color = continent
inside the aes
of geom_point
. This will only map continent
to color
for points, not for anything else.
Question 17
Now add an aes
thetic to your point
s to map pop
to size
.
Question 18
Change the color of the regression line to "black"
. Try first by putting this inside an aes()
in your geom_smooth
, and try a second time by just putting it inside geom_smooth
without an aes()
. What’s the difference, and why?
Question 19
Another way to separate out continents is with facet
ing. Add +facet_wrap(~continent)
to create subplots by continent
.
Question 20
Remove the facet
layer. The scale
is quite annoying for the x
-axis, a lot of points are clustered on the lower level. Let’s try changing the scale by adding a layer: +scale_x_log10()
.
Question 21
Now let’s fix the labels by adding +labs()
. Inside labs
, make proper axes titles for x
, y
, and a title
to the plot. If you want to change the name of the legends (continent color), add one for color
and size
.
Question 22
Now let’s try subsetting by looking only at North America. Take the gapminder
dataframe and subset it to only look at continent=="Americas"
). Assign this to a new dataframe object (call it something like america
.) Now, use this as your data
, and redo the graph from question 17. (You might want to take a look at your new dataframe to make sure it worked first!)
Question 23
Try this again for the whole world, but just for observations in the year 2002.