Admittedly, we still need to cover basic descriptive statistics and data fundamentals
All of this is coming in 2 weeks as we return to statistics and econometric theory
But let's start with the fun stuff right away, even if you don't fully know the reasons: data visualization
The mpg data from the ggplot2 library:
library(ggplot2)
head(mpg)
## # A tibble: 6 × 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr>
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## 6 audi         a4      2.8  1999     6 manual(m5) f        18    26 p     compa…
Base R is very powerful and intuitive for plotting, but not very sexy
Basic syntax for most types of plots:
plot_type(my_df$variable)
For plotting functions that accept a formula (such as plot() and boxplot()), you can avoid the $ by just typing the variable names in a formula and then, in another argument to the plotting function, specifying data = my_df
plot_type(variable1 ~ variable2, data = my_df)
With the mpg data, plotting a histogram of hwy:
hist(mpg$hwy)
With the mpg data, plotting a boxplot of hwy:
boxplot(mpg$hwy)
With the mpg data, plotting a boxplot of hwy by class:
boxplot(mpg$hwy ~ mpg$class)
# second method
boxplot(hwy ~ class, data = mpg)
~ is part of R's "formula notation": y ~ x + z means "y is explained by x and z"
With the mpg data, plotting a scatterplot of hwy against displ:
plot(mpg$hwy ~ mpg$displ)
# second method
plot(hwy ~ displ, data = mpg)
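To make the formula notation concrete, here is a minimal sketch in base R. A formula is an ordinary R object, so it can be stored and inspected before being handed to a plotting function. The object names f and g are illustrative only, and the last line assumes ggplot2 is installed so that the mpg data is available.

```r
library(ggplot2)  # only needed here for the mpg data

# "y is explained by x and z": left side, tilde, right side
f <- y ~ x + z
class(f)      # the object is of class "formula"
all.vars(f)   # the variable names the formula refers to

# A formula can be stored first and passed to a plotting function later
g <- hwy ~ class
boxplot(g, data = mpg)
```

Nothing is evaluated when the formula is created; the variable names are only looked up in `data` when the plotting function runs.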
"The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.
Largely (but not only) created by Hadley Wickham
We will look at this much more extensively next week!
This "flavor" of R will make your coding life so much easier!
ggplot2 is perhaps the most popular package in R and a core element of the tidyverse
gg stands for a grammar of graphics
Very powerful and beautiful graphics, very customizable and reproducible, but requires a bit of a learning curve
All those "cool graphics" you've seen in the New York Times, FiveThirtyEight, the Economist, Vox, etc. use the grammar of graphics
Hadley Wickham
Chief Scientist, R Studio
"The transferrable skills from ggplot2 are not the idiosyncracies of plotting syntax, but a powerful way of thinking about visualisation, as a way of mapping between variables and the visual properties of geometric objects that you can perceive."
This is a true grammar
We don’t talk about specific chart types
Instead we talk about specific chart components
Any graphic can be built from the same components:
data: to be drawn from
aes: aesthetic mappings from data to some visual marking
geom: geometric objects on the plot
scale: define the range of values
coord: coordinates to organize location
labels: describe the scale and markings
facet: group into subplots
theme: style the plot elements
Not every plot needs every component, but all plots must have the first 3!
Produces plot output in viewer
Does not save plot: save it with the Export menu in viewer
Adding layers requires whole code for new plot
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point() +
  geom_smooth()
Saves your plot as an R object
Does not show in viewer
Can add layers by calling the original plot name
# make and save plot
p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

p # view plot

# add a layer
p + geom_smooth() # shows new plot
p <- p + geom_smooth() # saves and overwrites p
p2 <- p + geom_smooth() # saves as different object
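Once a plot is stored as an object, it can also be written to a file from code instead of through the Export menu. The sketch below uses ggplot2's ggsave(). The file name, the dimensions, and the temporary directory are illustrative assumptions, not something from the slides.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# Hypothetical output location: a temporary folder, so no project file is overwritten
out_file <- file.path(tempdir(), "mpg_scatter.pdf")

# ggsave() chooses the graphics device from the file extension;
# width and height are in inches by default
ggsave(filename = out_file, plot = p, width = 6, height = 4)

file.exists(out_file)
```

Passing `plot = p` explicitly avoids relying on whichever plot happened to be displayed last.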
ggplot(data = mpg)
The data layer is the source of our data. As part of the tidyverse, ggplot2 requires data to be "tidy" [1]:
Each variable forms a column
Each observation forms a row
Each observational unit forms a table
[1] Data "tidiness" is the core element of all tidyverse packages. Much more on all of this next class.
Add a layer with + at the end of a line (never at the beginning!)
Style recommendation: start a new line after each + to improve legibility!
We will build a plot layer-by-layer
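The rule about where + goes can be sketched as follows. The commented-out lines show the pattern to avoid. Specifying `method` and `formula` in geom_smooth() is only a choice made here to keep the console quiet.

```r
library(ggplot2)

# Correct: every line ends with +, so R knows the expression continues
p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# To avoid (left commented out): R treats the first line as a complete
# expression, then raises an error on the line that starts with +
# p <- ggplot(data = mpg)
#   + aes(x = displ, y = hwy)

length(p$layers)  # how many geom layers have been added to p
```

Only geoms count as layers here; `aes()` changes the mappings of the plot rather than adding a layer.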
+ aes()
Aesthetics map data to visual elements or parameters
displ → x
hwy → y
class → shape, size, color, etc.
aes(x = displ, y = hwy, color = class)
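As a rough illustration of mapping a variable to different visual properties, the sketch below keeps x and y fixed and changes only which aesthetic a third variable is mapped to. The object names are illustrative. Using drv for shape and cyl for size is an assumption made here, because class has more categories than the default shape palette handles without a warning.

```r
library(ggplot2)

p_base <- ggplot(data = mpg) +
  aes(x = displ, y = hwy)

# Same data, same x and y; only the visual property being mapped changes
p_color <- p_base + geom_point(aes(color = class))  # categories as colors
p_shape <- p_base + geom_point(aes(shape = drv))    # few categories as shapes
p_size  <- p_base + geom_point(aes(size = cyl))     # a number as point size

p_color
```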
+ geom_*()
Geometric objects displayed on the plot
Which geoms you should use depends on what you want to show:
Type | geom |
---|---|
Point | geom_point() |
Line | geom_line(), geom_path() |
Bar | geom_bar(), geom_col() |
Histogram | geom_histogram() |
Regression | geom_smooth() |
Boxplot | geom_boxplot() |
Text | geom_text() |
Density | geom_density() |
##  [1] "geom_abline"     "geom_area"       "geom_bar"        "geom_bin2d"
##  [5] "geom_blank"      "geom_boxplot"    "geom_col"        "geom_contour"
##  [9] "geom_count"      "geom_crossbar"   "geom_curve"      "geom_density"
## [13] "geom_density_2d" "geom_density2d"  "geom_dotplot"    "geom_errorbar"
## [17] "geom_errorbarh"  "geom_freqpoly"   "geom_hex"        "geom_histogram"
## [21] "geom_hline"      "geom_jitter"     "geom_label"      "geom_line"
## [25] "geom_linerange"  "geom_map"        "geom_path"       "geom_point"
## [29] "geom_pointrange" "geom_polygon"    "geom_qq"         "geom_qq_line"
## [33] "geom_quantile"   "geom_raster"     "geom_rect"       "geom_ribbon"
## [37] "geom_rug"        "geom_segment"    "geom_sf"         "geom_sf_label"
## [41] "geom_sf_text"    "geom_smooth"     "geom_spoke"      "geom_step"
## [45] "geom_text"       "geom_tile"       "geom_violin"     "geom_vline"
See http://ggplot2.tidyverse.org/reference for many more options
Or just start typing geom_ in RStudio!
ggplot(data = mpg)

ggplot(data = mpg) +
  aes(x = displ, y = hwy)

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class))

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth()
+ geom_*()
geom_*(aes, data, stat, position)
data: geoms can have their own data
aes: geoms can have their own aesthetics
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth()
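The example above gives a geom its own aesthetics. A geom having its own data can be sketched in the same way: below, a second point layer draws only a subset of the cars on top of the full scatterplot. The subset two_seaters and the chosen colors are hypothetical, used only for illustration.

```r
library(ggplot2)

# Hypothetical subset for illustration: only the two-seater cars
two_seaters <- subset(mpg, class == "2seater")

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(color = "gray60") +      # uses the plot's data and aesthetics
  geom_point(data = two_seaters,      # this layer brings its own data
             color = "red", size = 3)

p
```

The second layer still inherits the x and y mappings from the plot; only its data source differs.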
stat: some geoms statistically transform data
geom_histogram() uses stat_bin() to group observations into bins
position: some adjust location of objects
dodge, stack, jitter
ggplot(data = mpg) +
  aes(x = class, y = hwy) +
  geom_boxplot()

ggplot(data = mpg) +
  aes(x = class) +
  geom_bar()

ggplot(data = mpg) +
  aes(x = class, fill = drv) +
  geom_bar()

ggplot(data = mpg) +
  aes(x = class, fill = drv) +
  geom_bar(position = "dodge")
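The bar charts above show stack (the default) and dodge. A minimal sketch of the remaining two ideas, binning by a stat and the jitter position, might look like this. The binwidth of 2 and the choice of cty against hwy are assumptions made for the illustration.

```r
library(ggplot2)

# stat: geom_histogram() bins hwy through stat_bin(); binwidth sets the bin size
p_hist <- ggplot(data = mpg) +
  aes(x = hwy) +
  geom_histogram(binwidth = 2)

# position: many cars share the same (cty, hwy) pair, so points overlap;
# "jitter" adds a little random noise to pull them apart
p_jitter <- ggplot(data = mpg) +
  aes(x = cty, y = hwy) +
  geom_point(position = "jitter")

# The counts computed by the stat add up to the number of rows in mpg
sum(layer_data(p_hist, 1)$count)
```

`layer_data()` returns the data after the stat has been applied, which is a handy way to see what a geom actually draws.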
p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth()

p # show plot
+ facet_wrap()
+ facet_grid()
p + facet_wrap(~year)
p + facet_grid(cyl~year)
+ labs()
p + facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class")
+ scale_*_*()
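Function names follow the pattern scale + _ + <aes> + _ + <type> + ()
<aes>: parameter you want to adjust
<type>: type of parameter
I want to change my discrete x-axis: scale_x_discrete()
I want to change my continuous y-axis: scale_y_continuous()
I want to put my x-axis on a log scale: scale_x_log10()
I want to change my fill or color palette: scale_fill_discrete(), scale_color_manual()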
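The naming pattern can be tried out with a small sketch. The breaks and the three colors below are arbitrary choices for illustration. The names "4", "f", and "r" are the values of drv in the mpg data.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy, color = drv) +
  geom_point() +
  # scale + _ + x + _ + continuous: set the tick marks on the continuous x-axis
  scale_x_continuous(breaks = 2:7) +
  # scale + _ + color + _ + manual: pick the color for each category yourself
  scale_color_manual(values = c("4" = "darkorange",
                                "f" = "steelblue",
                                "r" = "forestgreen"))

p
```

Naming the values ties each color to a category explicitly, so the result does not depend on the order of the levels.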
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d()
+ theme_*()
Theme changes appearance of plot decorations (things not mapped to data)
Some themes that come with ggplot2:
+ theme_bw()
+ theme_dark()
+ theme_gray()
+ theme_minimal()
+ theme_light()
+ theme_classic()
Many parameters we could change
Global options: line, rect, text, title
axis: x-, y-, or other axis title, ticks, lines
legend: plot legends for fill or color
panel: actual plot area
plot: whole image
strip: facet labels
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_bw()
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_minimal()

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(text = element_text(family = "Fira Sans"))

ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_minimal() +
  theme(text = element_text(family = "Fira Sans"),
        legend.position = "bottom")
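To connect the theme() calls above with the parameter families (axis, legend, panel, plot), here is a small sketch that touches one element from each. The specific settings are arbitrary illustrations rather than recommendations.

```r
library(ggplot2)

p <- ggplot(data = mpg) +
  aes(x = displ, y = hwy, color = class) +
  geom_point() +
  labs(title = "Car Mileage and Displacement") +
  theme_minimal() +                            # start from a complete theme
  theme(
    axis.title = element_text(face = "bold"),  # axis: both axis titles
    legend.position = "bottom",                # legend: where the legend sits
    panel.grid.minor = element_blank(),        # panel: drop minor grid lines
    plot.title = element_text(size = 16)       # plot: title of the whole image
  )

p
```

The order matters: a complete theme such as theme_minimal() replaces earlier settings, so individual tweaks with theme() go after it.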
+ theme_*()
The ggthemes package adds some other nice themes
# install if you don't have it
# install.packages("ggthemes")
library("ggthemes") # load package

library("ggthemes")
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_economist() +
  theme(text = element_text(family = "Fira Sans"),
        legend.position = "bottom")

library("ggthemes")
ggplot(data = mpg) +
  aes(x = displ, y = hwy) +
  geom_point(aes(color = class)) +
  geom_smooth() +
  facet_wrap(~year) +
  labs(x = "Engine Displacement (Liters)",
       y = "Highway MPG",
       title = "Car Mileage and Displacement",
       subtitle = "More Displacement Lowers Highway MPG",
       caption = "Source: EPA",
       color = "Vehicle Class") +
  scale_color_viridis_d() +
  theme_fivethirtyeight() +
  theme(text = element_text(family = "Fira Sans"),
        legend.position = "bottom")
aes() can go in base (data) layer and/or in individual geom() layers
geoms will inherit global aes from data layer unless overridden
# ALL GEOMS will map data to colors
ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  geom_smooth()

# ONLY points will map data to colors
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()
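The "unless overridden" part can be sketched too. One way, among several, is to switch inheritance off for a single layer with inherit.aes = FALSE and give that layer its own mappings. The black line color and the explicit method and formula are illustrative choices.

```r
library(ggplot2)

p <- ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +                          # inherits x, y, and color
  geom_smooth(aes(x = displ, y = hwy),    # this layer declares its own aes ...
              inherit.aes = FALSE,        # ... and ignores the global ones
              method = "loess", formula = y ~ x,
              color = "black")

p
```

Because the smooth layer no longer sees color = class, it fits one curve through all cars instead of one curve per class.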
aesthetics such as size and color can be mapped from data or set to a single value
Map inside of aes(), set outside of aes()
# Point colors are mapped from class data
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth()

# Point colors are all set to red, smooth line is set to blue
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "red") +
  geom_smooth(color = "blue")
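A common slip with this rule is putting a fixed color inside aes(). The sketch below contrasts the two. The object names are illustrative, and layer_data() is used only to peek at the colors ggplot2 ends up drawing.

```r
library(ggplot2)

# Set: a fixed value, written outside aes()
p_set <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Slip: inside aes(), "blue" is treated as data (a category with one level),
# so the points get a default palette color and a legend labeled "blue"
p_slip <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = "blue"))

unique(layer_data(p_set, 1)$colour)   # the literal color that was set
unique(layer_data(p_slip, 1)$colour)  # a palette color, not "blue"
```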
# I did some (hidden) data work before this!
ggplot(data = county_full,
       mapping = aes(x = long, y = lat, fill = pop_dens, group = group)) +
  geom_polygon(color = "gray90", size = 0.05) +
  coord_equal() +
  scale_fill_brewer(palette = "Blues",
                    labels = c("0-10", "10-50", "50-100", "100-500",
                               "500-1,000", "1,000-5,000", ">5,000")) +
  labs(fill = "Population per\nsquare mile") +
  theme_map() +
  guides(fill = guide_legend(nrow = 1)) +
  theme(legend.position = "bottom")
library("gapminder")
library("gganimate")
ggplot(gapminder) +
  aes(x = gdpPercap, y = lifeExp, size = pop, color = country) +
  geom_point() +
  # "none" hides a legend; passing FALSE here is deprecated in recent ggplot2
  guides(color = "none", size = "none") +
  scale_x_log10(breaks = c(10^3, 10^4, 10^5),
                labels = c("$1k", "$10k", "$100k")) +
  scale_color_manual(values = gapminder::country_colors) +
  scale_size(range = c(0.5, 12)) +
  labs(x = "GDP per capita",
       y = "Life Expectancy",
       caption = "Source: Hans Rosling's gapminder.org") +
  theme_minimal(14, base_family = "Fira Sans") +
  theme(strip.text = element_text(size = 16, face = "bold"),
        panel.border = element_rect(fill = NA, color = "grey40"),
        panel.grid.minor = element_blank()) +
  transition_states(year, 1, 0) +
  ggtitle("Income and Life Expectancy - {closest_state}")
We will return to various graphics as we cover descriptive statistics and regression
I hope to cover some basic principles of good graphic design for figures and plots
Remember:
"Shoot me"
Less is More:
New York Times: "How Stable Are Democracies? ‘Warning Signs Are Flashing Red’", Nov 29, 2016
On ggplot2
ggplot2's website reference section
On data visualization