Enhancing ggplot2 Visualizations
A ggplot2 Primer
Ehssan Ghashim1 , Patrick Boily1,2,3,4
Abstract
R has become one of the world’s leading languages for statistical and data analysis. While the base R
installation does support simple visualizations, its plots are rarely of high-enough quality for publication.
Enter Hadley Wickham’s ggplot2, an aesthetically pleasing and logical approach to data visualization. In this short
report, we introduce ggplot2’s graphic grammar elements, and present a number of examples.
Keywords
R, ggplot2, data visualization
1 Centre for Quantitative Analysis and Decision Support, Carleton University, Ottawa
2 Sprott School of Business, Carleton University, Ottawa
3 Department of Mathematics and Statistics, University of Ottawa, Ottawa
4 Idlewyld Analytics and Consulting Services, Wakefield, Canada
Email: [Link]@[Link]
3. Basics of ggplot2 Grammar

Let’s look at some illustrative ggplot2 code:

library("ggplot2")
theme_set(theme_bw()) # use the black and white theme throughout
# artificial data:
d <- data.frame(x = c(1:8, 1:8), y = runif(16),
  group1 = rep(gl(2, 4, labels = c("a", "b")), 2),
  group2 = gl(2, 8))
head(d)

## R output
##   x         y group1 group2
## 1 1 0.8683116      a      1
## 2 2 0.1934542      a      1
## 3 3 0.1131743      a      1
## 4 4 0.9260514      a      1
## 5 5 0.9476787      b      1
## 6 6 0.2949107      b      1

ggplot(data = d) + geom_point(aes(x, y, colour = group1)) +
  facet_grid(~group2)

A basic display call contains the following elements:

ggplot(): start an object and specify the data
geom_point(): we want a scatter plot; this is called a “geom”
aes(): specifies the “aesthetic” elements; a legend is automatically created
facet_grid(): specifies the “faceting” or panel layout

Other components include statistics, scales, and annotation options. At a bare minimum, charts require a dataset, some aesthetics, and a geom, combined, as above, with “+” symbols. This non-standard approach has the advantage of allowing ggplot2 plots to be proper R objects, which can be modified, inspected, and re-used.

ggplot2’s main plotting functions are qplot() and ggplot(); qplot() is short for “quick plot” and is meant to mimic the format of base R’s plot(). It requires less syntax for many common tasks, but has limitations – it’s essentially a wrapper for ggplot(), which is not itself that complicated to use.

We will focus on this latter function.
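Because a ggplot2 plot is an ordinary R object, it can be stored, extended, and inspected before anything is drawn. The following is a minimal sketch of that idea; the object names (d_demo, p_base, p_points, p_facets) are illustrative and do not come from the original, and layer_data() is used here only as one convenient way to look inside a plot.

```r
library(ggplot2)

# Artificial data with the same structure as the example above
# (d_demo is an illustrative name, not from the original)
set.seed(1)
d_demo <- data.frame(
  x = c(1:8, 1:8),
  y = runif(16),
  group1 = rep(gl(2, 4, labels = c("a", "b")), 2),
  group2 = gl(2, 8)
)

# Store the data and the aesthetic mappings once, without drawing anything
p_base <- ggplot(data = d_demo, aes(x = x, y = y, colour = group1))

# Re-use the stored object: each "+" returns a new plot object
p_points <- p_base + geom_point()
p_facets <- p_points + facet_grid(~group2)

# Inspect what the point layer will draw (one row per observation)
pts <- layer_data(p_points, 1)
nrow(pts)

# Printing the object is what actually renders the chart
print(p_facets)
```

Keeping the base object separate from its layers makes it easy to try several geoms or facet layouts on the same data and mappings.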
4. Specifying Plot Types with geoms

Whereas ggplot() specifies the data source and variables to be plotted, the various geom functions specify how these variables are to be visually represented (using points, bars, lines, and shaded regions). There are currently 37 available geoms. Table 1 lists the more common ones, along with frequently used options (most of the graphs shown in this report can be created using those geoms).

For example, the next bit of code produces a histogram of the heights of singers in the 1979 edition of the New York Choral Society (Figure 4), and a display of height by voice part for the same data (Figure 5).

library("ggplot2")
data(singer, package="lattice")
ggplot(singer, aes(x=height)) +
  geom_histogram()
ggplot(singer, aes(x=voice.part, y=height)) +
  geom_boxplot()

From Figure 5, it appears that basses tend to be taller and sopranos tend to be shorter. Although the singers’ gender was not recorded, it probably accounts for much of the variation seen in the diagram.

Note that only the x variable (height) was specified when creating the histogram, but that both the x (voice part) and the y (height) variables were specified for the box plot – indeed, geom_histogram() defaults to counts on the y-axis when no y variable is specified (each function’s documentation contains details and additional examples, but there’s a lot of value to be found in playing around with data in order to determine their behaviour).

Let’s examine the use of some of these options using the Salaries dataset (from package “car”). The data frame contains information on the salaries of university professors collected during the 2008–2009 academic year. Variables include rank (AsstProf, AssocProf, Prof), sex (Female, Male), yrs.since.phd (years since Ph.D.), yrs.service (years of service), and salary (nine-month salary in dollars). The next code produces the plot in Figure 3.

# note: in recent versions of car, the Salaries dataset lives in the carData package
data(Salaries, package="car")
library(ggplot2)
ggplot(Salaries, aes(x=rank, y=salary)) +
  geom_boxplot(fill="cornflowerblue", color="black", notch=TRUE) + # notched box plots
  geom_point(position="jitter", color="blue", alpha=.5) +          # jittered, semi-transparent points
  geom_rug(sides="l", color="black")                               # rug on the left (vertical) axis
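The default behaviour of geom_histogram() (counts on the y-axis when only x is mapped) can be checked directly. The sketch below uses the built-in mtcars data rather than the singer data so that it runs without the lattice package; the choice of 10 bins and the object names are illustrative assumptions.

```r
library(ggplot2)

# mtcars ships with base R; it has 32 rows
h <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)   # explicit number of bins instead of the default

# layer_data() returns the values computed by the histogram statistic
bins <- layer_data(h, 1)
bins[, c("xmin", "xmax", "count")]

# The bin counts add up to the number of observations
total <- sum(bins$count)
total
```

Setting bins (or binwidth) explicitly is usually worthwhile, since the shape of a histogram can change noticeably with the bin width.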
Figure 3. Notched box plots with superimposed points describing the salaries of college professors by rank. A rug plot is
provided on the vertical axis.
Figure 4. Histogram of singer heights Figure 5. Box plot of singer heights by voice part
Figure 6. Visualizations of the mpg dataset – with aes() on the left, without on the right.
Figure 7. Faceted graph showing the distribution (histogram) of singer heights by voice part
Figure 8. Scatterplot of years since graduation and salary. Academic rank is represented by color and shape, and sex is
faceted.
9. Tidy Data: Getting Data into the Right Format

ggplot2 is compatible with what is generally referred to as the tidyverse [22]. Social scientists will likely be familiar with the distinction between data in wide format and in long format:

in a long format table, every column represents a variable, and every row an observation,
whereas in a wide format table, some variables are spread out across columns, perhaps along some other characteristic such as the year, say.

The plots that have been produced so far were simple to create because the data points were given in the format of one observation per row, which we call a "tall" format. But many datasets come in a "wide" format, i.e. there is more than one observation – more than one point on the scatterplot – in each row.

Consider, for instance, the WorldPhones dataset, one of R’s built-in datasets:

data("WorldPhones")

This dataset records the number of telephones, in thousands, on each continent for several years in the 1950s (see Table 2).

Each column represents a different continent, and each row represents a different year. This wide format seems like a reasonable way to store data, but suppose that we want to compare increases in phone usage between continents, with time on the horizontal axis. In that case, each point on the plot is going to represent a continent during one year – there are seven observations in each row, which makes it very difficult to plot using ggplot2.

Fortunately, the tidyverse provides an easy way to convert this wide dataset into a tall dataset, by melting the data. This can be achieved by loading a third-party package called reshape2. The WorldPhones dataset can now be melted from a wide to a tall dataset with the melt() function. Let’s assign the new, melted data to an object called WorldPhones.m, where the m reminds us that the data has been melted.

library(reshape2)
WorldPhones.m = melt(WorldPhones)

The new, melted data looks like:

head(WorldPhones.m)

##   Var1   Var2 value
## 1 1951 N.Amer 45939
## 2 1956 N.Amer 60423
## 3 1957 N.Amer 64721
## ...

Notice that while there were originally seven columns, there are now only three: Var1, Var2, and value; Var1 represents the year, Var2 the continents, and value the number of phones. Every data cell – every observation – every number of phones per year per continent – in the original dataset now has its own row in the melted dataset.

In 1951, in North America, for instance, there were 45,939,000 phones, which is the same value as in the original unmelted data – the data has not changed, it just got reshaped.

Changing the column names might make the data more intuitive to read:

colnames(WorldPhones.m) = c("Year", "Continent", "Phones")
head(WorldPhones.m)

##   Year Continent Phones
## 1 1951    N.Amer  45939
## 2 1956    N.Amer  60423
## 3 1957    N.Amer  64721
## ...

Now that the data has been melted into a tall dataset, it is easy to create a plot with ggplot2, with the usual steps of a ggplot() call, but with WorldPhones.m instead of WorldPhones:

ggplot(WorldPhones.m, aes(x=Year, y=Phones, color=Continent)) +
  geom_point()

We place the Year on the x-axis, in order to see how the numbers change over time, while the number of Phones (the variable of interest) is displayed on the y-axis. The Continent factor will be represented with colour. A scatterplot is obtained by adding a geom_point() layer.

Scatterplots can also be used to show trends over time, by drawing lines between points for each continent. This only requires a change to a geom_line() layer.

ggplot(WorldPhones.m, aes(x=Year, y=Phones, color=Continent)) +
  geom_line()

The result is shown in Figure 11. Incidentally, one might expect the number of phones to increase exponentially over time, rather than linearly (a fair number of observations are clustered at the bottom of the chart).

When that’s the case, it’s a good idea to plot the vertical axis on a log scale. This can be done by adding a logarithmic scale to the chart.

ggplot(WorldPhones.m, aes(x=Year, y=Phones, color=Continent)) +
  geom_line() + scale_y_log10()
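The wide-to-tall reshaping of WorldPhones can also be sketched without reshape2. The block below swaps in base R's as.data.frame(as.table(...)) in place of melt(); this is an alternative technique, not the one used in the text, and the object names (wp_long, p_phones) are illustrative.

```r
library(ggplot2)

data("WorldPhones")   # built-in 7 x 7 matrix: years in rows, continents in columns

# Base R alternative to melt(): one row per (year, continent) cell
wp_long <- as.data.frame(as.table(WorldPhones), stringsAsFactors = FALSE)
names(wp_long) <- c("Year", "Continent", "Phones")

# The years arrive as text (they were row names), so convert them to integers
wp_long$Year <- as.integer(wp_long$Year)

# Same plot as in the text: one line per continent, log scale on the vertical axis
p_phones <- ggplot(wp_long, aes(x = Year, y = Phones, colour = Continent)) +
  geom_line() +
  scale_y_log10()
print(p_phones)
```

Whichever reshaping tool is used, the result has the same shape: one row for every combination of year and continent.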
Now each of the phone trends looks linear, and the lower values are spotted more easily; for example, it is now clear that Africa has overtaken Central America by 1956 (see Figure 12).

Notice how easy it was to build this plot once the data was in the tall format: one row for every point – that’s every combination of year and continent – on the graph.

10. Saving Graphs

Plots might look great on the screen, but they typically have to be embedded in other documents (Markdown, LaTeX, Word, etc.). In order to do so, they must first be saved in an appropriate format, with a specific resolution and size. Default size settings can be saved within the .Rmd document by declaring them in the first chunk of code. For instance, this would tell knitr to produce 8 in. × 5 in. charts:

knitr::opts_chunk$set(fig.width=8, fig.height=5)

A convenience function named ggsave() can be particularly useful. Options include which plot to save, where to save it, and in what format. For example,

myplot <- ggplot(data=mtcars, aes(x=mpg)) + geom_histogram()
ggsave(file="[Link]", plot=myplot, width=5, height=4)

saves the myplot object as a 5-inch by 4-inch .png file named [Link] in the current working directory. The available formats include .ps, .tex, .jpeg, .pdf, .jpg, .tiff, .png, .bmp, .svg, or .wmf (the latter only being available on Windows machines).

Without the plot= option, the most recently created graph is saved. For instance, the following bit of code would also save the mtcars plot (the latest plot) to the current working directory (see the ggsave() help file for additional details):

ggplot(data=mtcars, aes(x=mpg)) +
  geom_histogram()
ggsave(file="[Link]")

Within RStudio, an alternative is to click on Export, then “Save Plot As Image” to open a GUI.

11. Summary

The first 10 sections reviewed the ggplot2 package, which provides advanced graphical methods based on a comprehensive grammar of graphics. The package is designed to provide the user with a complete and comprehensive alternative to the native graphics provided with R. It offers methods for creating attractive and meaningful data visualizations that are difficult to generate in other ways.

It does come with some drawbacks, however: the ggplot2 and tidyverse design teams have fairly strong opinions about how data should be visualized and processed. As a result, it can sometimes be difficult to produce charts that go against their design ideals. In the same vein, the various package updates do not always preserve the functionality of working code, sending analysts scurrying to figure out how the new functions work, which can cause problems with legacy code. Still, the versatility and overall simplicity of ggplot2 cannot be overstated.

A list of all ggplot2 functions, along with examples, can be found at [Link]. The theory underlying ggplot2 is explained in great detail in [2]; useful examples and starting points can also be found in [1, 5].

The ggplot2 action flow is always the same: start with data in a table, map the display variables to various aesthetics (position, colour, shape, etc.), and select one or more geoms to draw the graph. This is accomplished in the code by first creating an object with the basic data and mappings information, and then by adding or layering additional information as needed.

Once this general way of thinking about plots is understood (especially the aesthetic mapping part), the drawing process is simplified significantly. There is no need to think about how to draw particular shapes or colours in the chart; the many (self-explanatory) geom_ functions do all the heavy lifting.

Similarly, learning how to use new geoms is easier when they are viewed as ways to display specific aesthetic mappings.
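The action flow (data, aesthetic mappings, geoms) and the ggsave() step from Section 10 can be combined in one short sketch. The output path below is a hypothetical temporary file chosen so that the example does not write into the working directory, and PDF is used only because that device is available on most installations; neither choice comes from the original.

```r
library(ggplot2)

# 1. data in a table, 2. aesthetic mappings, 3. a geom layered on top
p_hist <- ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Save the stored plot object at an explicit size (in inches);
# out_file is an illustrative temporary location
out_file <- file.path(tempdir(), "mpg_histogram.pdf")
ggsave(filename = out_file, plot = p_hist, width = 5, height = 4, units = "in")

file.exists(out_file)
```

Passing the plot explicitly through plot= avoids relying on whichever chart happened to be drawn last.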
Figure 11. WorldPhones.m plots, using geom_point() (above) and geom_line() (below).
email_campaign_funnel.csv")
# X axis breaks and labels
brks <- seq(-15000000, 15000000, 5000000)
lbls <- paste0(as.character(c(seq(15, 0, -5), seq(5, 15, 5))), "m")

# Plot
ggplot(email_campaign_funnel, aes(x = Stage, y = Users, fill = Gender)) + # fill column
  geom_bar(stat = "identity", width = .6) + # draw the bars
  scale_y_continuous(breaks = brks, labels = lbls) + # breaks and labels
  coord_flip() + # flip axes
  labs(title="Email Campaign Funnel") +
  theme_tufte() + # Tufte theme from ggthemes
  theme(plot.title = element_text(hjust = .5), axis.ticks = element_blank()) + # centre plot title
  scale_fill_brewer(palette = "Dark2") # colour palette

Example 9 (Calendar Heatmap)
The calendar heat map is a great tool to see the daily variation (especially the highs and lows) of a variable like stock price, as it emphasizes the variation over time rather than the actual value itself. It can (with a fair amount of data preparation) be produced with geom_tile.

[Link]
[Link]
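As a rough illustration of the geom_tile() approach to a calendar heat map, the sketch below lays out one tile per day, with weeks on the horizontal axis and weekdays on the vertical axis. The daily series is artificial (a random walk), and all object and column names are assumptions made for this example rather than part of the original.

```r
library(ggplot2)

set.seed(1)
days <- seq(as.Date("2021-01-01"), as.Date("2021-03-31"), by = "day")

cal <- data.frame(
  date  = days,
  value = cumsum(rnorm(length(days))),       # artificial daily series
  week  = as.integer(format(days, "%W")),    # week of the year, weeks start on Monday
  wday  = factor(format(days, "%u"),         # weekday number, 1 = Monday
                 levels = as.character(7:1), # reversed so that Monday is drawn at the top
                 labels = c("Sun", "Sat", "Fri", "Thu", "Wed", "Tue", "Mon")),
  month = format(days, "%m")                 # "01", "02", "03"
)

# One tile per day, one panel per month
p_cal <- ggplot(cal, aes(x = week, y = wday, fill = value)) +
  geom_tile(colour = "white") +
  facet_wrap(~month, scales = "free_x") +
  scale_fill_gradient(low = "steelblue", high = "firebrick") +
  labs(x = "Week of the year", y = NULL, fill = "Value")
print(p_cal)
```

Most of the work lies in deriving the week and weekday columns; once they exist, the plot itself is a single geom_tile() layer.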
Figure 22. Population pyramid for the email campaign funnel dataset.
Figure 27. Graph of Mad Men characters who are linked by a romantic relationship.
Example 16 (Time Series Plot For a Monthly Time Series)
In order to select specific breaks on the x-axis, consider the functionality offered by scale_x_date() (plot shown in Figure 30).

library(ggplot2)
library(lubridate)
theme_set(theme_bw())

economics_m <- economics[1:24, ]

# labels and breaks for X axis text
lbls <- paste0(month.abb[month(economics_m$date)], " ",
  lubridate::year(economics_m$date))
brks <- economics_m$date

# plot
ggplot(economics_m, aes(x=date)) +
  geom_line(aes(y=pce)) +
  labs(title="Monthly Time Series",
    subtitle="Personal consumption expenditures, in billions of dollars",
    caption="Source: Economics",
    y="pce") + # title and caption
  scale_x_date(labels = lbls, breaks = brks) + # change to monthly ticks and labels
  theme(axis.text.x = element_text(angle = 90, vjust=0.5), # rotate x axis text
    panel.grid.minor = element_blank()) # turn off minor grid

Example 17 (Time Series Plot For a Yearly Time Series)
Here’s the same, but with a yearly breakdown (plot shown in Figure 31).

library(ggplot2)
library(lubridate)
theme_set(theme_bw())

economics_y <- economics[1:90, ]

# labels and breaks for X axis text
brks <- economics_y$date[seq(1, length(economics_y$date), 12)]
lbls <- lubridate::year(brks)

# plot
ggplot(economics_y, aes(x=date)) +
  geom_line(aes(y=psavert)) +
  labs(title="Yearly Time Series",
    subtitle="Personal savings rate",
    caption="Source: Economics",
    y="psavert") + # title and caption
  scale_x_date(labels = lbls, breaks = brks) + # change to yearly ticks and labels
  theme(axis.text.x = element_text(angle = 90, vjust=0.5), # rotate x axis text
    panel.grid.minor = element_blank()) # turn off minor grid

Example 18 (Time Series Plot From Long Data Format)
In this example, we construct the plot from a long data format (i.e. the column names and respective values of all the columns are stacked in only 2 variables – variable and value, respectively). In the wide format, the data would take the appearance of the economics dataset. Below, the geom_line objects are drawn using value and aes(col) is set to variable. In this way, multiple coloured lines are plotted (one for each unique variable level) with a single call; scale_x_date() changes the x-axis breaks and labels, while the line colours are changed by scale_color_manual.

data(economics_long, package = "ggplot2")
# head(economics_long)
library(ggplot2)
library(lubridate)
theme_set(theme_bw())

df <- economics_long[economics_long$variable %in% c("psavert", "uempmed"), ]
df <- df[lubridate::year(df$date) %in% c(1967:1981), ]

# labels and breaks for X axis text
brks <- df$date[seq(1, length(df$date), 12)]
lbls <- lubridate::year(brks)

# plot
ggplot(df, aes(x=date)) +
  geom_line(aes(y=value, col=variable)) +
  labs(title="Time Series of Returns Percentage",
    subtitle="Drawn from Long Data format",
    caption="Source: Economics",
    y="Returns %",
    color=NULL) + # title and caption
  scale_x_date(labels = lbls, breaks = brks) + # change to yearly ticks and labels
  scale_color_manual(labels = c("psavert", "uempmed"),
    values = c("psavert"="#00ba38", "uempmed"="#f8766d")) # line color
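The examples above build the break and label vectors by hand. As a point of comparison, scale_x_date() also accepts the date_breaks and date_labels arguments, which generate regularly spaced ticks from a specification. The sketch below shows that alternative; it is not the approach used in the examples, and the tick positions it produces may differ slightly from the hand-picked ones (for instance, yearly ticks fall on the start of each year).

```r
library(ggplot2)

# First 24 monthly observations of the economics dataset shipped with ggplot2
economics_m <- economics[1:24, ]

# Monthly ticks with "Jan 1968"-style labels
p_monthly <- ggplot(economics_m, aes(x = date, y = pce)) +
  geom_line() +
  scale_x_date(date_breaks = "1 month", date_labels = "%b %Y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.grid.minor = element_blank())

# Yearly ticks on a longer stretch of the same dataset
economics_y <- economics[1:90, ]
p_yearly <- ggplot(economics_y, aes(x = date, y = psavert)) +
  geom_line() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y")

print(p_monthly)
print(p_yearly)
```

Hand-built vectors remain useful when the ticks must sit on specific observations; the specification-based arguments are shorter when regular spacing is enough.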
Figure 32. Time series from long data format for the economics dataset.

the contribution from individual components, the area has to be stacked on top of the previous component, rather than relative to the floor of the plot. All the bottom layers have to be added to the y value of a new area.

In the example below (and in Figure 33), the top layer is y=psavert+uempmed. However nice the plot might look, keep in mind that it can be difficult to interpret.

library(ggplot2)
library(lubridate)
theme_set(theme_bw())

df <- economics[, c("date", "psavert", "uempmed")]
df <- df[lubridate::year(df$date) %in% c(1967:1981), ]

# labels and breaks for X axis text
brks <- df$date[seq(1, length(df$date), 12)]
lbls <- lubridate::year(brks)

# plot
ggplot(df, aes(x=date)) +
  geom_area(aes(y=psavert+uempmed, fill="psavert")) +
  geom_area(aes(y=uempmed, fill="uempmed")) +
  labs(title="Area Chart of Returns Percentage",
    subtitle="From Wide Data format",
    caption="Source: Economics",
    y="Returns %") + # title and caption
  scale_x_date(labels = lbls, breaks = brks) + # change to yearly ticks and labels
  scale_fill_manual(name="",
    values = c("psavert"="#00ba38", "uempmed"="#f8766d")) + # fill colours
  theme(panel.grid.minor = element_blank()) # turn off minor grid
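The manual stacking above (adding the lower layer to the y value of the upper one) is needed because the data is in wide format. With the data in long format, geom_area() can do the stacking itself through its position argument. The sketch below illustrates that alternative under the assumption that the economics_long dataset from ggplot2 is used; object names are illustrative.

```r
library(ggplot2)

# Long-format version of the two series, restricted to 1967-1981
df_long <- as.data.frame(economics_long)
df_long <- df_long[df_long$variable %in% c("psavert", "uempmed"), ]
df_long <- df_long[format(df_long$date, "%Y") %in% as.character(1967:1981), ]
df_long$variable <- factor(as.character(df_long$variable),
                           levels = c("psavert", "uempmed"))

# position = "stack" places each area on top of the previous one
p_area <- ggplot(df_long, aes(x = date, y = value, fill = variable)) +
  geom_area(position = "stack") +
  scale_fill_manual(name = "",
                    values = c("psavert" = "#00ba38", "uempmed" = "#f8766d")) +
  labs(title = "Stacked area chart from long data", y = "Returns %")
print(p_area)
```

In this version the top edge of the chart is still the sum of the two series, but each coloured band shows only its own contribution.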
Figure 33. Stacked area chart for the economics dataset (green area is the sum of psavert and uempmed).
Example 21 (Parallel Coordinate Plots)
Parallel coordinate plots are useful to visualize multivariate data. As a practical example, assume that a survey has been conducted, with a variety of questions. Each question is asked three times – in a different context – and is answered on a discrete scale from 1 to 7. Consequently, each question has three “dimensions”. The distribution of answers across the three dimensions should be displayed for each question. Because the three dimensions have the same unit and scale, they can easily be compared on parallel coordinates (it would be possible to display more than three dimensions, of course).

library(triangle)
set.seed(0)
q1_d1 <- round(rtriangle(1000, 1, 7, 5))
q1_d2 <- round(rtriangle(1000, 1, 7, 6))
q1_d3 <- round(rtriangle(1000, 1, 7, 2))
df <- data.frame(q1_d1 = factor(q1_d1),
  q1_d2 = factor(q1_d2), q1_d3 = factor(q1_d3))

We are using the triangular distribution to get random integers r ∈ [1, 7], around a different mode c for each dimension (5, 6 and 2). To plot the main “answer paths” (i.e. the most frequent answer combinations across the three dimensions), we need to group by all dimensions, and then to count the frequency of each unique answer combination. This can be done with the dplyr package.

library(dplyr)

library(reshape2)
library(ggplot2)
# create long format
df_pcp <- melt(df_grouped, id.vars = c('id', 'freq'))
df_pcp$value <- factor(df_pcp$value)

We can then specify what levels should be drawn on the y-axis (1 to 7). In the ggplot() function we define an aesthetic that uses the “variable” column for the x-axis and the “value” column for the y-axis. We also specify that the values should be grouped by using the id column. This is required, as the connections between the three dimensions won’t be drawn otherwise. We use geom_path() to draw the connection lines and make the width and colour of the connection dependent on the freq and id columns, respectively,

y_levels <- levels(factor(1:7))
ggplot(df_pcp, aes(x = variable, y = value, group = id)) + # group = id is important!
  geom_path(aes(size = freq, color = id), alpha = 0.5,
    lineend = 'round', linejoin = 'round') +
  scale_y_discrete(limits = y_levels, expand = c(0.5, 0)) +
  scale_size(breaks = NULL, range = c(1, 7))
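The grouping and counting step that produces df_grouped is not shown in this excerpt. The following is a hedged sketch of how it could be done with dplyr, assuming only what the text states: one row per unique answer combination, with a freq column and an id column. The stand-in data uses sample() rather than rtriangle() so that the sketch needs no extra package; the data frame name df_answers and the sampling probabilities are illustrative assumptions.

```r
library(dplyr)

# Stand-in survey answers on a 1-7 scale (artificial; not the rtriangle() data)
set.seed(0)
answer_probs <- c(1, 2, 3, 4, 6, 3, 1)
df_answers <- data.frame(
  q1_d1 = factor(sample(1:7, 1000, replace = TRUE, prob = answer_probs)),
  q1_d2 = factor(sample(1:7, 1000, replace = TRUE, prob = answer_probs)),
  q1_d3 = factor(sample(1:7, 1000, replace = TRUE, prob = rev(answer_probs)))
)

# Count each unique answer combination, most frequent first,
# and give every combination an identifier for the group aesthetic
df_grouped <- df_answers %>%
  count(q1_d1, q1_d2, q1_d3, name = "freq") %>%
  arrange(desc(freq)) %>%
  mutate(id = factor(row_number()))

head(df_grouped)
```

In practice one would probably keep only the most frequent combinations (for example with head() or a filter on freq) before plotting, so that the main answer paths are not buried under rare ones.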
Figure 34. Seasonal plots for the AirPassengers dataset (above) and the nottem dataset (below).
ggsave(file="[Link]",
plot=clustering, width=5, height=4)
Figure 36. Clusters in the iris dataset, projected on the first 2 principal components.
Figure 38. Slope chart for the cancer survival rates dataset.
Figure 40. Density plot for the mpg dataset; simultaneous (top), faceted (bottom).
Figure 44. Word cloud for The Green Mile (English subtitles).
Figure 45. Bar chart for The Green Mile (English subtitles).
References
[1] Chang, W. [2013], R Graphics Cookbook, O’Reilly.
[2] Wickham, H. [2009], ggplot2: Elegant Graphics for Data Analysis, Springer.
[3] Wickham, H. [2009], A Layered Grammar of Graphics, Journal of Computational and Graphical Statistics 19:3–28.
[4] Horton, N.J., Kleinman, K. [2016], Using R and RStudio for Data Management, Statistical Analysis, and Graphics, 2nd ed., CRC Press.
[5] Healy, K. [2018], Data Visualization: A Practical Introduction.
[6] Kabacoff, R.I. [2011], R in Action, Second Edition: Data analysis and graphics with R, Live.
[7] Maindonald, J.H. [2008], Using R for Data Analysis and Graphics: Introduction, Code and Commentary.
[8] Tyner, S., Briatte, F., Hofmann, H. [2017], Network Visualization with ggplot2, The R Journal, vol. 9(1).
[9] Broman, K. [2016], Data Visualization with ggplot2.
[10] ggplot2 Extensions.
[11] R Graph Gallery.
[12] Anderson, S.C. [2015], An Introduction to ggplot2.
[13] Prabhakaran, S., Top-50 ggplot2 Visualization (with Master List R Code).
[14] Konrad, M. [2016], Parallel Coordinate Plots for Discrete and Categorical Data in R: A Comparison.
[15] Wilkins, D., treemapify github repository.
[16] Wilkins, D., treemapify R package (v. 0.2.1).
[17] STHDA, Beautiful Dendrogram Visualizations in R: 5+ must-know methods - Unsupervised Machine Learning.
[18] Text Mining and Word Cloud Fundamentals in R: 5 simple steps you should know, on Easy Guides.
[19] Harvard tutorial notes, R graphics with ggplot2 workshop.
[20] Robinson, D., Visualizing Data Using ggplot2, on [Link].
[21] Manipulating, analyzing and exporting data with tidyverse, on [Link].
[22] Wickham, H. [2014], Tidy Data, Journal of Statistical Software, v59, n10.
In ggplot2, aesthetics map data onto visual attributes such as size, shape, and colour, and an appropriate legend is generated automatically. These mappings are declared with the aes() function. When an attribute is specified inside aes(), it is mapped to a variable, so that its appearance changes with the values in the dataset. Conversely, specifying it outside aes() sets a constant value that is the same for every data point. For example, specifying color inside aes() assigns different colours to different values of a variable, whereas specifying it outside results in a uniform colour across points.
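The difference between mapping and setting an aesthetic can be seen in a short sketch using the mpg dataset that ships with ggplot2. The variable choices and object names below are illustrative.

```r
library(ggplot2)

# Mapped: colour inside aes() varies with the class variable and produces a legend
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(colour = class))

# Set: colour outside aes() is a constant applied to every point, with no legend
p_fixed <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(colour = "blue")

# Compare the colours each layer will actually draw
n_mapped_colours <- length(unique(layer_data(p_mapped, 1)$colour))
fixed_colours <- unique(layer_data(p_fixed, 1)$colour)
n_mapped_colours
fixed_colours
```

A common mistake is to write aes(colour = "blue"), which maps colour to a one-level constant and lets the scale pick the actual colour; a fixed colour belongs outside aes().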
grid.arrange() from the gridExtra package is used with ggplot2 to display multiple independent plots on a single canvas, without the shared variables that faceting requires. Unlike faceting, which arranges panels according to one or more categorical variables, grid.arrange() allows completely independent graphs to be assembled into a single figure. This gives greater flexibility in arranging plots built from different datasets or of different types into a customized layout, for instance by combining a bar plot and a scatter plot in one figure.
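A minimal sketch of this use of grid.arrange() follows, assuming the gridExtra package is installed. The two plots deliberately use different datasets and geoms; the object names are illustrative.

```r
library(ggplot2)
library(gridExtra)

# Two unrelated plots: different datasets, different geoms
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()
p_scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Draw them side by side; the arranged layout is also returned invisibly
combined <- grid.arrange(p_bar, p_scatter, ncol = 2)
```

The ncol and nrow arguments control the layout, and the returned object can be passed to ggsave() through its plot argument.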
In ggplot2, notches in box plots give an informal, approximate visual check of whether the medians of different groups differ. If the notches of two boxes do not overlap, this suggests that the two medians differ significantly; it is a rough guide rather than a formal test. In the example based on the Salaries dataset, the notches on the box plots of salary by academic rank do not overlap for assistant, associate, and full professors, which suggests that their median salaries differ.
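The width of a notch follows a simple rule of thumb: the median plus or minus 1.58 times the interquartile range divided by the square root of the sample size. The sketch below checks that rule against base R's boxplot.stats(), using the built-in mtcars data; the ggplot2 documentation describes the same rule for notched box plots, although the exact quartile definition it uses may differ slightly from the hinges used here.

```r
# Fuel economy of the four-cylinder cars in mtcars (ships with base R)
x <- mtcars$mpg[mtcars$cyl == 4]

# Five-number summary: min, lower hinge, median, upper hinge, max
five <- fivenum(x)

# Notch half-width: 1.58 * IQR / sqrt(n), with the IQR taken between the hinges
half_width <- 1.58 * (five[4] - five[2]) / sqrt(length(x))
notch_manual <- five[3] + c(-1, 1) * half_width

# boxplot.stats() reports the same notch limits in its conf component
notch_base <- boxplot.stats(x)$conf

notch_manual
all.equal(notch_manual, notch_base)
```

Because the half-width shrinks with the square root of the sample size, small groups produce wide notches, which is one reason non-overlap should be read as a rough indication only.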
The primary difference between facet_wrap() and facet_grid() in ggplot2 lies in their layout. facet_wrap() lays panels out in a sequence, wrapping onto new rows or columns; it is well suited to a single faceting variable, or to cases where the number of rows or columns should be chosen freely. facet_grid() organizes panels into a fixed grid defined by a row variable and a column variable, which suits structured comparisons that require explicit alignment of categories. facet_wrap() is preferred when displaying many levels of a single variable, whereas facet_grid() excels when examining the interaction between two categorical variables.
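The two layouts can be compared on the same base plot. The sketch below uses the mpg dataset that ships with ggplot2; the faceting variables are chosen only for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One faceting variable, panels wrapped onto two rows
p_wrap <- p + facet_wrap(~class, nrow = 2)

# Two faceting variables: drive train in rows, number of cylinders in columns
p_grid <- p + facet_grid(drv ~ cyl)

# Number of panels that actually contain data in each layout
n_wrap <- length(unique(layer_data(p_wrap, 1)$PANEL))
n_grid <- length(unique(layer_data(p_grid, 1)$PANEL))
n_wrap
n_grid

print(p_wrap)
print(p_grid)
```

facet_grid() draws a panel for every row-by-column combination, including combinations with no observations, which is why some of its panels can be empty.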
The gridExtra package in R helps organize multiple ggplot2 plots by providing the grid.arrange() function, which allows plots created independently to be displayed together within a single graphic output. The base R tools for this task, such as par(mfrow = ...) or layout(), apply only to base graphics and have no effect on ggplot2 output, which is drawn with the grid system. gridExtra accommodates varying plot types and layouts, allowing diverse ggplot2 charts to coexist within one figure while each keeps its own scales, legends, and theme.
The aes() function in ggplot2 makes plotting flexible by allowing users to map variables to aesthetic attributes such as colour, size, and shape. Because the mapping is driven by the data, the visual representation updates automatically when the input data changes, which is useful for exploring trends and relationships. For instance, applying aes(color=class) to a scatter plot colours the points by class, with colours assigned automatically, making it easier to distinguish the groups and to recognize patterns among them.
Parallel coordinate plots in ggplot2 enable the visualization of multivariate data by drawing lines that connect points across multiple dimensions, each represented as a parallel axis. Each line represents an observation (or a group of identical observations) across all dimensions, so that lines converging or diverging between axes point to patterns, correlations, or clusters. A practical example is survey analysis, where the same question is answered in different contexts on a common scale from 1 to 7; the plot shows how the answers vary across those contexts.
Stacked area charts in ggplot2 represent cumulative values over time, showing how individual components contribute to the total. This chart type is useful for displaying changes in composition among categories. However, these charts can be difficult to interpret: only the bottom layer sits on a flat baseline, while every other layer sits on top of the layers below it, so its own trend has to be read as a thickness rather than a height. For example, in the stacked area chart with psavert and uempmed, the top edge is the sum of the two series, and care is needed to avoid reading it as one of the series on its own.
Faceting functions like facet_wrap() and facet_grid() in ggplot2 plot subsets of the data in separate panels, which facilitates comparison across groups. facet_wrap() places panels in a sequence, wrapping to the next row or column, which is useful when dealing with a single categorical variable. facet_grid(), however, lays out panels in a grid defined by row and column variables. In the choral example from the source, facet_wrap(~voice.part, nrow=4) creates one histogram of singer heights per voice part, allowing easy comparison of the height distributions among voice parts.
Themes in ggplot2 control the non-data appearance of plots, which affects both presentation and readability. Users can alter a variety of plot elements, such as the plot title, axis titles, axis text, background, and gridlines. Themes can be customized for a single plot or stored and applied across multiple plots to maintain a consistent style. For instance, the theme() function can change the typography, colour, and size of the plot and axis titles, while setting an element to element_blank() removes it, as is done with the minor gridlines in the time series examples, which use theme_bw() as their base theme.
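A stored theme can be sketched as follows. The particular settings (bold centred title, rotated x-axis text, no minor gridlines) and the name my_theme are illustrative choices, loosely modelled on the time series examples rather than taken from them.

```r
library(ggplot2)

# A reusable theme: start from theme_bw() and override a few elements
my_theme <- theme_bw() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
        axis.title = element_text(colour = "grey30"),
        axis.text.x = element_text(angle = 90, vjust = 0.5),
        panel.grid.minor = element_blank())   # element_blank() removes the element

# The same theme object applied to two unrelated plots
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Weight and fuel economy") +
  my_theme

p2 <- ggplot(economics[1:24, ], aes(x = date, y = pce)) +
  geom_line() +
  labs(title = "Personal consumption expenditures") +
  my_theme

print(p1)
print(p2)
```

To apply a theme to every plot in a session instead, theme_set() can be called once, as the examples in this report do with theme_set(theme_bw()).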