Data Visualization
esquisse and ggplot2
2/52
Why learn ggplot2?
More customization:
· branding
· making plots interactive
· combining plots
Easier plot automation (creating plots in scripts)
Faster (eventually)
3/52
ggplot2
· A package for producing graphics - gg = Grammar of Graphics
· Created by Hadley Wickham in 2005
· Belongs to “Tidyverse” family of packages
· “Make a ggplot” = Make a plot with the use of ggplot2 package
Resources:
· [Link]
· [Link]
4/52
ggplot2
Based on the idea of:
layering
plot objects are placed on top of each other with +
📉
➕
📈
5/52
ggplot2
· Pros: extremely powerful/flexible – allows combining multiple plot elements
together, allows high customization of a look, many resources online
· Cons: ggplot2-specific “grammar of graphic” of constructing a plot
· ggplot2 gallery
6/52
Tidy data
To make graphics using ggplot2, our data needs to be in a tidy format
Tidy data:
1. Each variable forms a column.
2. Each observation forms a row.
Messy data:
· Column headers are values, not variable names.
· Multiple variables are stored in one column.
· Variables are stored in both rows and columns.
7/52
Tidy data: example
Each variable forms a column. Each observation forms a row.
8/52
Messy data: example
Column headers are values, not variable names
Read more about tidy data and see other examples: Tidy Data tutorial by Hadley
Wickham
It’s also helpful to have data in long format!!!
9/52
Making data to plot
[Link](3)
var_1 <- seq(from = 1, to = 30)
var_2 <- rnorm(30)
my_data = tibble(var_1, var_2)
my_data
# A tibble: 30 × 2
var_1 var_2
<int> <dbl>
1 1 -0.962
2 2 -0.293
3 3 0.259
4 4 -1.15
5 5 0.196
6 6 0.0301
7 7 0.0854
8 8 1.12
9 9 -1.22
10 10 1.27
# … with 20 more rows
10/52
First plot with ggplot2 package
First layer of code with ggplot2 package
Will set up the plot - it will be empty!
· Aesthetic mapping (mapping= aes(x= , y =)) describes how variables in
our data are mapped to elements of the plot
library(ggplot2) # don't forget to load ggplot2
# This is not code but shows the general format
ggplot({data_to plot}, mapping = aes(x = {var in data to plot},
y = {var in data to plot}))
ggplot(my_data, mapping = aes(x = var_1, y = var_2))
12/52
Next layer code with ggplot2 package
There are many to choose from, to list just a few:
· geom_point() – points (we have seen)
· geom_line() – lines to connect observations
· geom_boxplot()
· geom_histogram()
· geom_bar()
· geom_col()
· geom_errorbar()
· geom_density()
· geom_tile() – blocks filled with color
13/52
Next layer code with ggplot2 package
Need the + sign to add the next layer to specify the type of plot
ggplot({data_to plot}, mapping = aes(x = {var in data to plot},
y = {var in data to plot})) +
geom_{type of plot}</div>
ggplot(my_data, mapping = aes(x = var_1, y = var_2)) +
geom_point()
Read as: add points to the plot (use data as provided by the aesthetic mapping)
14/52
Specifying plot layers: examples
plt1 <-
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point()
plt2 <-
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_line()
plt1; plt2 # to have 2 plots printed next to each other on a slide
Also check out the patchwork package
15/52
Specifying plot layers: combining multiple layers
Layer a plot on top of another plot with +
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point() +
geom_line()
16/52
Customize the look of the plot
You can change look of each layer separately.
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
geom_line(size = 0.8, color = "black", linetype = 2)
17/52
Customize the look of the plot
You can change the look of whole plot using theme_*() functions.
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
geom_line(size = 0.8, color = "brown", linetype = 2) +
theme_dark()
18/52
Customize the look of the plot
You can change the look of whole plot - specific elements, too - like changing
font and font size - or even more fonts
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
geom_line(size = 0.8, color = "brown", linetype = 2) +
theme_bw(base_size = 20, base_family = "Comic Sans MS")
19/52
Adding labels
The labs() function can help you add or modify titles on your plot.
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
geom_line(size = 0.8, color = "brown", linetype = 2) +
labs(title = "My plot of var1 vs var2",
x = "Variable 1",
y = "Variable 2")
20/52
Changing axis
xlim() and ylim() can specify the limits for each axis
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
geom_line(size = 0.8, color = "brown", linetype = 2) +
labs(title = "My plot of var1 vs var2") +
xlim(0,40)
21/52
Changing axis
scale_x_continuous() and scale_y_continuous() can change how the axis is plotted. Can use
the breaks argument to specify how you want the axis ticks to be.
seq(from = 0, to = 30, by = 5)
[1] 0 5 10 15 20 25 30
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
geom_line(size = 0.8, color = "brown", linetype = 2) +
scale_x_continuous(breaks = seq(from = 0, to = 30, by = 5))
22/52
Lab 1
Lab document
23/52
theme() function
The theme() function can help you modify various elements of your plot. Here
we will adjust the horizontal justification (hjust) of the plot title.
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
geom_line(size = 0.8, color = "brown", linetype = 2) +
labs(title = "My plot of var1 vs var2") +
theme([Link] = element_text(hjust = 0.5, size = 20))
24/52
theme() function
The theme() function always takes:
1. an object to change (use ?theme() to see - [Link], [Link],
[Link] etc.)
2. the aspect you are changing about this: element_text(), element_line(),
element_rect(), element_blank()
3. what you are changing:
· text: size, color, fill, face, alpha, angle
· position: "top", "bottom", "right", "left", "none"
· rectangle: size, color, fill, linetype
· line: size, color, linetype
25/52
theme() function
ggplot(my_data, aes(x = var_1, y = var_2)) +
geom_point(size = 5, color = "red", alpha = 0.5) +
labs(title = "My plot of var1 vs var2", x = "Variable 1") +
theme([Link] = element_text(hjust = 0.5, size = 20),
[Link].x = element_text(size = 16))
26/52
theme() function
head(Orange, 3)
Tree age circumference
1 1 118 30
2 1 484 58
3 1 664 87
If specifying position - use: “top”, “bottom”, “right”, “left”, “none”
ggplot(Orange, aes(x = Tree, y = circumference, fill = Tree)) +
geom_boxplot() +
theme([Link] = "none")
27/52
Can make your own theme to use on plots!
Guide on how to: [Link]
28/52
Group and/or color by variable’s values
First, we will generate some data frame for the purpose of demonstration.
· 2 different categories (e.g. pasta, rice)
· 4 different items (e.g. 2 of each category)
· 10 price values changes collected over time for each item
# create 4 vectors: 2x character class and 2x numeric class
item_categ <- rep(c("pasta", "rice"),each = 20)
item_ID <- rep(seq(from = 1, to = 4), each = 10)
item_ID <- paste0("ID_", item_ID)
observation_time <- rep(seq(from = 1, to = 10), times = 4)
item_price_change <- c(sample(0.5:2.5, size = 10, replace = TRUE),
sample(0:1, size = 10, replace = TRUE),
sample(2:5, size = 10, replace = TRUE),
sample(6:9, size = 10, replace = TRUE))
# use 4 vectors to create data frame with 4 columns
food <- tibble(item_ID, item_categ, observation_time, item_price_change)
29/52
Group and/or color by variable’s values
food
# A tibble: 40 × 4
item_ID item_categ observation_time item_price_change
<chr> <chr> <int> <dbl>
1 ID_1 pasta 1 0.5
2 ID_1 pasta 2 2.5
3 ID_1 pasta 3 1.5
4 ID_1 pasta 4 0.5
5 ID_1 pasta 5 0.5
6 ID_1 pasta 6 0.5
7 ID_1 pasta 7 0.5
8 ID_1 pasta 8 0.5
9 ID_1 pasta 9 1.5
10 ID_1 pasta 10 1.5
# … with 30 more rows
30/52
Starting a plot
ggplot(food, aes(x = observation_time,
y = item_price_change)) +
geom_line()
31/52
Using group in plots
You can use group element in a mapping to indicate that each item_ID will have
a separate price line.
ggplot(food, aes(x = observation_time,
y = item_price_change,
group = item_ID)) +
geom_line()
32/52
Adding color will automatically group the data
ggplot(food, aes(x = observation_time,
y = item_price_change,
color = item_ID)) +
geom_line()
33/52
Adding color will automatically group the data
ggplot(food, aes(x = observation_time,
y = item_price_change,
color = item_categ)) +
geom_line()
34/52
Sometimes you need group and color
ggplot(food, aes(x = observation_time,
y = item_price_change,
group = item_ID,
color = item_categ)) +
geom_line()
35/52
Adding a facet can help make it easier to see what is happening
Two options: facet_grid()- creates a grid shape facet_wrap() -more flexible
Need to specify how you are faceting with the ~ sign.
ggplot(food, aes(x = observation_time,
y = item_price_change,
color = item_ID)) +
geom_line() +
facet_grid( ~ item_categ)
36/52
facet_wrap()
· more flexible - arguments ncol and nrow can specify layout
· can have different scales for axes using scales = "free_x", scales =
"free_y", or scales = "free"
rp_fac_plot <- ggplot(food, aes(x = observation_time,
y = item_price_change,
color = item_ID)) +
geom_line() +
geom_point() +
facet_wrap( ~ item_categ, ncol = 1, scales = "free")
rp_fac_plot
37/52
Tips - Color vs Fill
NOTE: color is needed for points and lines, fill generally needed for boxes and
bars
ggplot(food, aes(x = item_ID,
y = item_price_change,
color = item_categ)) +
geom_boxplot()
38/52
Tips - Color vs Fill
NOTE: color is needed for points and lines, fill generally needed for boxes and
bars
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ)) +
geom_boxplot()
39/52
Tips - plus sign + can’t come at start of a new line
This will not work! Also don’t use pipes instead of +!
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ))
+ geom_boxplot()
40/52
Tip - Good idea to add jitter to top of box plots
Can add width argument to make the jitter more narrow.
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ)) +
geom_boxplot() +
geom_jitter(width = .06)
41/52
Tip - be careful about colors for colorblindess
scale_fill_viridis_d() for discrete /categorical data
scale_fill_viridis_c() for continuous data
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ)) +
geom_boxplot() +
geom_jitter(width = .06) +
scale_fill_viridis_d()
42/52
Tip - can pipe data after wrangling into ggplot()
food_bar <-food %>%
group_by(item_categ) %>%
summarize("max_price_change" = max(item_price_change)) %>%
ggplot(aes(x = item_categ,
y = max_price_change,
fill = item_categ)) +
scale_fill_viridis_d()+
geom_col() +
theme([Link] = "none")
food_bar
43/52
Tip - color outside of aes()
Can be used to add an outline around column/bar plots.
food_bar +
geom_col(color = "black")
44/52
Tip - col vs bar
geom_bar() can only one aes mapping & geom_col() can have two
👀May not be plotting what you think you are…
ggplot(food, aes(x = item_ID,
y = item_price_change,
fill = item_categ)) +
geom_col()
45/52
What did we plot?
head(food)
# A tibble: 6 × 4
item_ID item_categ observation_time item_price_change
<chr> <chr> <int> <dbl>
1 ID_1 pasta 1 0.5
2 ID_1 pasta 2 2.5
3 ID_1 pasta 3 1.5
4 ID_1 pasta 4 0.5
5 ID_1 pasta 5 0.5
6 ID_1 pasta 6 0.5
food %>% group_by(item_ID) %>%
summarize(sum = sum(item_price_change))
# A tibble: 4 × 2
item_ID sum
<chr> <dbl>
1 ID_1 10
2 ID_2 5
3 ID_3 32
4 ID_4 75
46/52
Tip - make sure labels aren’t too small
food_bar +
theme(text = element_text(size = 20))
47/52
Extensions
directlabels package
Great for adding labels directly onto plots
[Link]
#[Link]("directlabels")
library(directlabels)
[Link](rp_fac_plot, method = list("[Link]"))
49/52
plotly
#[Link]("plotly")
library("plotly")
ggplotly(rp_fac_plot)
pasta
2.5
item_ID
2.0
ID_1
1.5
ID_2
1.0
ID_3
item_price_change
0.5
0.0 ID_4
2.5 5.0 7.5 10.0
rice
2
2.5 5.0 7.5 10.0
observation_time
Also check out the ggiraph package
50/52
Saving a ggplot to file
A few options:
· RStudio > Plots > Export > Save as image / Save as PDF
· RStudio > Plots > Zoom > [right mouse click on the plot] > Save image as
· In the code
ggsave(filename = "saved_plot.png", # will save in working directory
plot = rp_fac_plot,
width = 6, height = 3.5) # by default in inch
51/52
Lab 2
Lab document
52/52