1.3 — Data Visualization with ggplot2 — Practice

Getting Set Up

Before we begin, start a new file with File \(\rightarrow\) New File \(\rightarrow\) R Script. As you work through this sheet in the R console, copy each command that works into this new file. At the end, save the file and run it to execute all of your commands at once.

“Our Plot” from Class

# load ggplot2 package
library(ggplot2)

# make plot
ggplot(data = mpg)+ # set data source to mpg (included in ggplot2)
  aes(x = displ, # x is displacement
      y = hwy)+ # y is hwy mpg
  geom_point(aes(color = class))+ # color points by car class
  geom_smooth()+ # add smoothed trend line (loess by default here)
  facet_wrap(~year)+ # separate plots by year
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")+
  scale_color_viridis_d()+ # change color scale
  theme_minimal()+ # change theme
  theme(text = element_text(family = "Fira Sans")) # change font
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Download this script and run it in RStudio on your computer (or open the file in our RStudio Cloud project and run it there) to see our plot from class.

Exploring the Data

Question 1

We will look at GDP per capita and life expectancy using some data from the Gapminder project. There is a handy package called gapminder that contains a small excerpt of this data for exploratory analysis. Install and load the package gapminder. Type ?gapminder and hit enter to see a description of the data.

Question 2

Let’s get a quick look at gapminder to see what we’re dealing with.

  1. Get the structure of the gapminder data.
  2. What variables are there?
  3. Look at the head of the dataset to get an idea of what the data looks like.
  4. Get summary statistics of all variables.
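
As a rough sketch of the functions involved, here are the same four steps applied to a different data frame: mpg, the dataset from the class plot above (using mpg here is a choice made for illustration; the question itself is about gapminder).

```r
library(ggplot2) # provides the mpg example data

str(mpg)     # structure: dimensions, variable names, and types
names(mpg)   # just the variable names
head(mpg)    # first six rows
summary(mpg) # summary statistics for every variable
```

Swapping gapminder in for mpg should give the corresponding output for this question.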

Simple Plots in Base R

Question 3

Let’s make sure you can do some basic plots before we get into the gg. Use base R’s hist() function to plot a histogram of gdpPercap.

Question 4

Use base R’s boxplot() function to plot a boxplot of gdpPercap.

Question 5

Now make it a boxplot by continent. Hint: use formula notation with ~.
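
To illustrate the formula notation, here is a sketch on the mpg data from the class plot (an assumed stand-in, not the gapminder answer). The variable on the left of ~ is plotted, and the variable on the right defines the groups.

```r
# mpg ships with ggplot2; pull it out without loading the whole package
mpg <- ggplot2::mpg

# a single boxplot of one numeric variable
boxplot(mpg$hwy)

# one box per vehicle class: numeric ~ grouping variable
boxplot(hwy ~ class, data = mpg)
```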

Question 6

Now make a scatterplot of gdpPercap on the \(x\)-axis and lifeExp on the \(y\)-axis.

Plots with ggplot2

Question 7

Load the package ggplot2. (You should have installed it previously; if not, install it first with install.packages("ggplot2").)

Question 8

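Let’s first make a bar graph to see how many observations are in each continent. (Each row of gapminder is one country in one year, so the bars count country-year rows, not distinct countries.) The only aesthetic you need is to map continent to x. Bar graphs are great for representing categories, but not quantitative data.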
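
A minimal sketch of a bar graph, using mpg from the class plot as an assumed stand-in: map a categorical variable to x, and geom_bar() counts the rows in each category.

```r
library(ggplot2)

# count the cars in each vehicle class
p <- ggplot(data = mpg)+
  aes(x = class)+ # categorical variable on x
  geom_bar()      # bar height = number of rows per category
p
```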

Question 9

For quantitative data, we want a histogram to visualize the distribution of a variable. Make a histogram of gdpPercap. Your only aesthetic here is to map gdpPercap to x.

Question 10

Now let’s try adding some color. Specifically, add an aesthetic that maps continent to fill. (In general, color refers to the outside border of a geom, except for points, while fill refers to its interior.)
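
To see the difference between the two aesthetics, here is a sketch on the mpg data (an assumed stand-in; drv is the drive-train variable in that dataset). The two plots differ only in whether drv is mapped to fill or to color.

```r
library(ggplot2)

# fill: the interior of each bar is colored by drive train
p_fill <- ggplot(data = mpg)+
  aes(x = hwy, fill = drv)+
  geom_histogram(bins = 20)
p_fill

# color: only the border of each bar is colored by drive train
p_color <- ggplot(data = mpg)+
  aes(x = hwy, color = drv)+
  geom_histogram(bins = 20)
p_color
```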

Question 11

Instead of a histogram, change the geom to make it a density graph. To avoid overplotting, add alpha=0.4 as an argument to the geom (alpha changes the transparency of a fill).
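
As a sketch of where the alpha argument goes, again on mpg as an assumed stand-in: alpha is a fixed setting rather than a mapping to a variable, so it sits inside the geom but outside any aes().

```r
library(ggplot2)

# overlapping densities by drive train, made semi-transparent
p <- ggplot(data = mpg)+
  aes(x = hwy, fill = drv)+
  geom_density(alpha = 0.4) # fixed transparency for every density
p
```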

Question 12

Redo your plot from 11 for lifeExp instead of gdpPercap.

Question 13

Now let’s try a scatterplot for lifeExp (as y) on gdpPercap (as x). You’ll need both as aesthetics. The geom here is geom_point().

Question 14

Add some color by mapping continent to color in your aesthetics.

Question 15

Now let’s try adding a regression line with geom_smooth(). (By default this draws a smoothed curve; pass method = "lm" if you want a straight line.) Add this layer on top of your geom_point() layer.

Question 16

Did you notice that you got multiple regression lines (colored by continent)? That’s because we set a global aesthetic of mapping continent to color. If we want just one regression line, we need to instead move the color = continent inside the aes of geom_point. This will only map continent to color for points, not for anything else.
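
The same global-versus-local distinction can be sketched on the mpg data (an assumed stand-in for gapminder). In the first plot the color mapping is global, so every layer inherits it. In the second it lives inside geom_point() only.

```r
library(ggplot2)

# global mapping: points AND smooth lines are split by drive train
p_global <- ggplot(data = mpg)+
  aes(x = displ, y = hwy, color = drv)+
  geom_point()+
  geom_smooth()
p_global

# local mapping: only the points are colored; one smooth line overall
p_local <- ggplot(data = mpg)+
  aes(x = displ, y = hwy)+
  geom_point(aes(color = drv))+
  geom_smooth()
p_local
```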

Question 17

Now add an aesthetic to your points to map pop to size.

Question 18

Change the color of the regression line to "black". Try first by putting this inside an aes() in your geom_smooth, and try a second time by just putting it inside geom_smooth without an aes(). What’s the difference, and why?

Question 19

Another way to separate out continents is with faceting. Add +facet_wrap(~continent) to create subplots by continent.

Question 20

Remove the facet layer. The scale of the x-axis is quite annoying: a lot of points are clustered at the low end. Let’s try changing the scale by adding a layer: +scale_x_log10().
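
To show what a log scale does, here is a sketch on diamonds, another dataset bundled with ggplot2 (chosen here only as an illustration, since its carat variable is also bunched up at the low end).

```r
library(ggplot2)

# carat is right-skewed, so a log10 x-axis spreads out the small values
p <- ggplot(data = diamonds)+
  aes(x = carat, y = price)+
  geom_point(alpha = 0.1)+ # transparency to reduce overplotting
  scale_x_log10()          # positions are log10(carat); axis labels stay in carats
p
```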

Question 21

Now let’s fix the labels by adding +labs(). Inside labs(), give proper titles to the x and y axes, and add a title for the plot. To rename the legends, also set color and size (for the continent and population legends).

Question 22

Now let’s try subsetting by looking only at the Americas. Take the gapminder dataframe and subset it to only the rows where continent=="Americas". Assign this to a new dataframe object (call it something like america). Now, use this as your data, and redo the graph from Question 17. (You might want to take a look at your new dataframe first to make sure it worked!)
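
As a sketch of subsetting in base R, here are two equivalent ways to keep only some rows of a data frame, shown on mpg as an assumed stand-in (the object name suv is made up for this example).

```r
library(ggplot2) # provides the mpg example data

# option 1: subset() with a logical condition
suv <- subset(mpg, class == "suv")

# option 2: bracket indexing; rows where the condition is TRUE, all columns
suv2 <- mpg[mpg$class == "suv", ]

# check that it worked
head(suv)
table(suv$class)
```

The subsetted data frame can then be passed to ggplot(data = ...) in place of the full one.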

Question 23

Try this again for the whole world, but just for observations in the year 2002.
