Getting Set Up
Before we begin, start a new file with File > New File > R Script. As you work through this sheet in the R console, also add (copy/paste) the commands that work into this new file. At the end, save it, and run it to execute all of your commands at once.
“Our Plot” from Class
# load ggplot2 package
library(ggplot2)
# make plot
ggplot(data = mpg) + # set data source to mpg (included in ggplot2)
  aes(x = displ, # x is displacement
      y = hwy) + # y is hwy mpg
  geom_point(aes(color = class)) + # color points by car class
  geom_smooth() + # add smoothed trend line (loess by default here)
  facet_wrap(~year) + # separate plots by year
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() + # change color scale
  theme_minimal() + # change theme
  theme(text = element_text(family = "Fira Sans")) # change font (Fira Sans must be installed)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Download this code and run it in RStudio on your computer (or open the file in our RStudio Cloud project and run it there) to see our plot from class.
Exploring the Data
Question 1
We will look at GDP per Capita and Life Expectancy using some data from the gapminder project. There is a handy package called gapminder that uses a small snippet of this data for exploratory analysis. Install and load the package gapminder. Type ?gapminder and hit enter to see a description of the data.
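As a minimal sketch of this step (assuming you have an internet connection for the one-time install), the commands look something like this:

```r
# Install once; comment this line out after the first run
# install.packages("gapminder")

# Load the package so the gapminder dataframe is available
library(gapminder)

# Quick look at what we are working with
str(gapminder)   # variable names and types
head(gapminder)  # first six rows
```

Typing ?gapminder afterwards opens the help page describing each variable.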
Simple Plots in Base R
Plots with ggplot2
Question 7
Load the package ggplot2. You should have installed it previously; if not, install it first with install.packages("ggplot2").
Question 8
Let’s first make a bar graph to see how many countries are in each continent. The only aesthetic you need is to map continent to x. Bar graphs are great for representing categories, but not quantitative data.
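One way this might look, written in the same layered style as our plot from class (the object name bar_plot is just an illustrative choice):

```r
library(ggplot2)
library(gapminder)

# Map continent to x; geom_bar() counts the rows in each category
bar_plot <- ggplot(data = gapminder) +
  aes(x = continent) +
  geom_bar()

bar_plot
```

Keep in mind that geom_bar() counts rows, and each country appears once per survey year, so the bar heights are country-year observations rather than distinct countries.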
Question 9
For quantitative data, we want a histogram to visualize the distribution of a variable. Make a histogram of gdpPercap. Your only aesthetic here is to map gdpPercap to x.
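A possible sketch of the histogram (hist_plot is an illustrative name; the number of bins is left at the default, which is why R prints a message suggesting you pick a binwidth):

```r
library(ggplot2)
library(gapminder)

# Map gdpPercap to x; geom_histogram() bins the values and counts them
hist_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap) +
  geom_histogram()

hist_plot
```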
Question 10
Now let’s try adding some color. Specifically, add an aesthetic that maps continent to fill. (In general, color refers to the outside border of a geom (except for points), while fill is the interior of an object.)
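Adding the fill aesthetic could look like the sketch below (fill_plot is an illustrative name). By default the bars for each continent are stacked on top of one another.

```r
library(ggplot2)
library(gapminder)

# Same histogram as before, with the interior of each bar colored by continent
fill_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      fill = continent) +
  geom_histogram()

fill_plot
```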
Question 11
Instead of a histogram, change the geom to make it a density graph. To avoid overplotting, add alpha = 0.4 as an argument to the geom (alpha changes the transparency of a fill).
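A sketch of the density version (density_plot is an illustrative name). Note that alpha goes directly in the geom, not inside aes(), because it is a fixed setting rather than something mapped from the data.

```r
library(ggplot2)
library(gapminder)

# Swap geom_histogram() for geom_density(); alpha = 0.4 makes fills see-through
density_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      fill = continent) +
  geom_density(alpha = 0.4)

density_plot
```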
Question 13
Now let’s try a scatterplot of lifeExp (as y) on gdpPercap (as x). You’ll need both as aesthetics. The geom here is geom_point().
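The scatterplot might be written like this (scatter_plot is an illustrative name):

```r
library(ggplot2)
library(gapminder)

# One point per country-year: GDP per capita on x, life expectancy on y
scatter_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      y = lifeExp) +
  geom_point()

scatter_plot
```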
Question 15
Now let’s try adding a regression line with geom_smooth(). Add this layer on top of your geom_point() layer.
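Here is a sketch of adding the layer. It assumes your scatterplot already maps continent to color in the global aes(), a step not shown in this sheet; smooth_plot is an illustrative name.

```r
library(ggplot2)
library(gapminder)

# Assumption: color = continent is set globally, so every layer inherits it
smooth_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      y = lifeExp,
      color = continent) +
  geom_point() +
  geom_smooth()   # drawn on top of the points

smooth_plot
```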
Question 16
Did you notice that you got multiple regression lines (colored by continent)? That’s because we set a global aesthetic of mapping continent to color. If we want just one regression line, we need to instead move the color = continent inside the aes of geom_point. This will only map continent to color for points, not for anything else.
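A sketch of the change described above (one_line_plot is an illustrative name): the x and y mappings stay global, while the color mapping moves into geom_point() only.

```r
library(ggplot2)
library(gapminder)

# x and y are global (shared by all layers);
# color = continent applies to the points layer alone
one_line_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      y = lifeExp) +
  geom_point(aes(color = continent)) +
  geom_smooth()   # no color grouping here, so a single line is fit

one_line_plot
```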
Question 18
Change the color of the regression line to "black". Try first by putting this inside an aes() in your geom_smooth, and try a second time by just putting it inside geom_smooth without an aes(). What’s the difference, and why?
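The two attempts could be set up side by side like this (base_plot, mapped_plot, and fixed_plot are illustrative names). Run both and compare the line color and the legend before reading the comments too closely.

```r
library(ggplot2)
library(gapminder)

# Shared starting point: points colored by continent
base_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      y = lifeExp) +
  geom_point(aes(color = continent))

# Attempt 1: inside aes(), "black" is treated as a data value (a category),
# so it gets a color from the scale and an entry in the legend
mapped_plot <- base_plot + geom_smooth(aes(color = "black"))

# Attempt 2: outside aes(), "black" is taken literally as the line color
fixed_plot <- base_plot + geom_smooth(color = "black")

mapped_plot
fixed_plot
```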
Question 19
Another way to separate out continents is with faceting. Add +facet_wrap(~continent) to create subplots by continent.
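A sketch with the facet layer added (facet_plot is an illustrative name), which should produce five panels, one per continent:

```r
library(ggplot2)
library(gapminder)

# Same scatterplot with a single smoothed line, split into one panel per continent
facet_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      y = lifeExp) +
  geom_point(aes(color = continent)) +
  geom_smooth(color = "black") +
  facet_wrap(~continent)

facet_plot
```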
Question 20
Remove the facet layer. The scale on the x-axis is quite annoying: a lot of points are clustered at the low end. Let’s try changing the scale by adding a layer: +scale_x_log10().
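With the facet layer removed and the log scale added, the plot might look like this sketch (log_plot is an illustrative name). Each step along the x-axis now represents a tenfold increase in GDP per capita, which spreads out the low-income observations.

```r
library(ggplot2)
library(gapminder)

# scale_x_log10() transforms x before plotting and before the smooth is fit
log_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      y = lifeExp) +
  geom_point(aes(color = continent)) +
  geom_smooth(color = "black") +
  scale_x_log10()

log_plot
```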
Question 21
Now let’s fix the labels by adding +labs(). Inside labs, write proper axis titles for x and y, and a title for the plot. If you want to change the names of the legends (continent color), add entries for color and size.
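A sketch with labels added. The label wording is only a suggestion, and labeled_plot is an illustrative name. A size entry in labs() only has an effect if your plot maps something to size, which this sketch does not.

```r
library(ggplot2)
library(gapminder)

labeled_plot <- ggplot(data = gapminder) +
  aes(x = gdpPercap,
      y = lifeExp) +
  geom_point(aes(color = continent)) +
  geom_smooth(color = "black") +
  scale_x_log10() +
  labs(x = "GDP per Capita (log scale)",   # suggested wording, change as you like
       y = "Life Expectancy (Years)",
       title = "Income and Life Expectancy",
       color = "Continent")                # renames the color legend

labeled_plot
```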
Question 22
Now let’s try subsetting by looking only at the Americas. Take the gapminder dataframe and subset it to only the rows where continent == "Americas". Assign this to a new dataframe object (call it something like america). Now, use this as your data, and redo the graph from question 17. (You might want to take a look at your new dataframe first to make sure it worked!)
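One way to do the subsetting in base R is sketched below. Since the graph from question 17 is not reproduced in this sheet, the plot here is only a plain scatterplot stand-in (america_plot is an illustrative name); swap in your own layers.

```r
library(ggplot2)
library(gapminder)

# Keep only the rows where continent is "Americas" (all columns kept)
america <- gapminder[gapminder$continent == "Americas", ]

# Check that it worked: only one continent should have a nonzero count
table(america$continent)
head(america)

# Stand-in plot using the new dataframe as the data source
america_plot <- ggplot(data = america) +
  aes(x = gdpPercap,
      y = lifeExp) +
  geom_point()

america_plot
```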