
Data Visualization Using Ggplot

The document provides an overview of data visualization using the ggplot2 package in R, detailing its components such as data layers, aesthetics, geometric layers, facets, statistics, themes, and coordinates. It also includes practical examples using the mtcars dataset to demonstrate how to create various types of plots, including scatter plots and line charts, while explaining the significance of each layer in the visualization process. Additionally, it covers how to export visualizations for use in presentations.

Data Visualization Using ggplot
Data visualization
• ggplot2 is a free, open-source, and easy-to-use visualization package that is
widely used in the R programming language. It implements the Grammar of
Graphics.
• It was written by Hadley Wickham and is one of the most powerful
visualization packages available in R.
• A plot is built from several layers. The layers are as follows:
• Building blocks of layers with the grammar of graphics
1. Data: The data set itself.
2. Aesthetics: The data is mapped onto aesthetic attributes (they define
how the data is visually represented, such as x-axis, y-axis, color, fill, size,
labels, alpha, shape, line width, line type).
3. Geometrics: How the data is displayed, using point, line, histogram,
bar, boxplot.
Data visualization
• Facets: Display subsets of the data in columns and rows (a way to
create small multiples, or separate plots for different subsets of the data).
• Statistics: Binning, smoothing, descriptive, intermediate (these process
raw data into a summarized format before it is visualized on a plot).
• Coordinates: The space between data and display, using Cartesian, fixed,
polar, limits. Scales are closely related: they control the mapping between
data values and aesthetic properties (e.g., what range the x-axis covers or
which colors are used).
• Themes: Non-data elements (control the non-data aspects of the plot,
such as background color, font size, and legend position).
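As a rough illustration of how the layers listed above stack up, the sketch below touches each of them once in a single plot. The choice of variables (wt, mpg, cyl, am) and the y-axis window are assumptions made for this example only.

```r
library(ggplot2)

# One plot that uses each layer of the grammar once (illustrative sketch)
p_layers <- ggplot(data = mtcars,                      # data
                   aes(x = wt, y = mpg)) +             # aesthetics
  geom_point(aes(colour = factor(cyl))) +              # geometry
  facet_grid(. ~ am) +                                 # facets
  stat_smooth(method = "lm", se = FALSE) +             # statistics
  coord_cartesian(ylim = c(10, 35)) +                  # coordinates
  theme_minimal()                                      # theme

p_layers
```

Each `+` adds one component, so any layer can be removed or swapped without touching the others.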
Dataset Used

• mtcars (Motor Trend Car Road Tests) comprises fuel consumption and 10
aspects of automobile design and performance for 32 automobiles. It ships
with base R in the datasets package, so no extra package is needed to use it.
# Installing the package (one-time step; mtcars itself does not need it)
install.packages("dplyr")
# Loading package
library(dplyr)
# Print the first 6 records of the dataset
head(mtcars)
# Summary of dataset
summary(mtcars)
Data Layer
• ggplot2 in R
• We build visualizations on the mtcars dataset, which includes 32 car models
and 11 attributes, using ggplot2 layers.
• Data Layer:
• In the data layer we define the source of the information to be
visualized. Here we use the mtcars dataset that ships with R.
library(ggplot2)
library(dplyr)
# Only the data layer is defined, so this draws an empty panel with a title
ggplot(data = mtcars) + labs(title = "MTCars Data Plot")
Aesthetic Layer

• Aesthetic Layer
• Here we map variables of the dataset onto certain aesthetics.
# Aesthetic Layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  labs(title = "MTCars Data Plot")
aes: Short for "aesthetics", this function binds variables in your data to visual properties
(aesthetics) of the plot.
x = hp: The variable hp (horsepower) is mapped to the x-axis position.
y = mpg: The variable mpg (miles per gallon) is mapped to the y-axis position.
col = disp: The variable disp (displacement) is mapped to the color of the plot elements (e.g., points
in a scatter plot).

This mapping means that the plot will:


Use hp values to determine the horizontal position of points (or other geometric objects).
Use mpg values to determine the vertical position.
Color the elements based on their corresponding disp values, using a continuous color gradient
(since disp is typically a continuous numeric variable).
To produce an actual plot, this aes() mapping must be combined with a geometric object function,
such as geom_point(), to specify the type of plot:
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) + geom_point()
This code creates a scatter plot where each point's color changes according to its disp value.
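The difference between mapping an aesthetic inside aes() and setting it to a fixed value outside aes() is easy to miss, so here is a small sketch of both. The fixed colour "steelblue" is an arbitrary choice for illustration.

```r
library(ggplot2)

# Mapped: colour varies with disp, so ggplot2 adds a colour legend
p_mapped <- ggplot(mtcars, aes(x = hp, y = mpg, colour = disp)) +
  geom_point()

# Set: every point gets the same fixed colour and no legend is added
p_fixed <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(colour = "steelblue")

p_mapped
p_fixed
```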
Geometric layer

• Geometric layer:
• The geometric layer controls the essential elements: how our data is
displayed using point, line, histogram, bar, boxplot.
# Geometric layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp)) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
Geometric layer
• Geometric layer: adding size, color, and shape, and then plotting the
histogram.
# Adding size
ggplot(data = mtcars, aes(x = hp, y = mpg, size = disp)) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
Geometric layer
# Adding shape and color
ggplot(data = mtcars, aes(x = hp, y = mpg, col = factor(cyl), shape = factor(am))) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
# Histogram plot
ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 5) +
  labs(title = "Histogram of Horsepower", x = "Horsepower", y = "Count")
aes(x = hp, y = mpg, col = factor(cyl), shape = factor(am)): The aes() function defines the aesthetic
mappings, which link variables in the data to visual properties of the plot.
x = hp: Maps the hp (Horsepower) variable to the x-axis.
y = mpg: Maps the mpg (Miles per Gallon) variable to the y-axis.
col = factor(cyl): Maps the cyl (number of cylinders) variable to the color of the points, treating it as a
categorical variable (factor()).
shape = factor(am): Maps the am (transmission type: 0 for automatic, 1 for manual) variable to the
shape of the points, also treating it as a categorical variable.
+: This operator is used to add new layers or components to the existing plot object.
geom_point(): This geometry function adds a layer of points (a scatter plot) to the visualization, using
the aesthetic mappings defined in the ggplot() call.
labs(title = "Miles per Gallon vs Horsepower", x = "Horsepower", y = "Miles per Gallon"): This function
customizes the labels of the plot for better readability.
title = "Miles per Gallon vs Horsepower": Sets the main title of the plot.
x = "Horsepower": Sets the label for the x-axis.
y = "Miles per Gallon": Sets the label for the y-axis.
The resulting plot will be a scatter plot that displays how miles per gallon (mpg) relates to horsepower
(hp), with different colors for the number of cylinders and different shapes for the transmission type,
all in one graphic.
ggplot(data = mtcars, aes(x = hp)): This initializes the plot and defines the
basic data and aesthetic mappings.
data = mtcars: Specifies that the mtcars data frame will be used for the plot.
aes(x = hp): Maps the hp (horsepower) column to the x-axis aesthetic.
+ geom_histogram(binwidth = 5): This adds a layer that defines the geometric
object to be a histogram.
geom_histogram(): Instructs ggplot to display the data as a histogram, which
represents the distribution of a continuous variable.
binwidth = 5: Sets the width of each bar (bin) to 5 units of horsepower. This
controls the granularity of the distribution visualization.
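To see what a given binwidth means in practice, the sketch below estimates how many bins a width of 5 produces over the range of hp, and shows two alternatives. The values binwidth = 25 and bins = 10 are illustrative assumptions, not recommendations.

```r
library(ggplot2)

# Roughly how many bins does binwidth = 5 create over the hp range?
hp_range <- range(mtcars$hp)
approx_bins <- ceiling(diff(hp_range) / 5)
approx_bins

# Wider bins give a smoother picture for a dataset of only 32 cars
p_bw <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 25)

# Alternatively, fix the number of bins and let ggplot2 derive the width
p_bins <- ggplot(mtcars, aes(x = hp)) +
  geom_histogram(bins = 10)
```

With so few observations, a narrow binwidth leaves many bins empty, which is why trying a few widths is usually worthwhile.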
Facet Layer
The facet layer splits the data into subsets and draws each subset in its own panel of the same
figure. Here we separate rows according to transmission type and separate columns according to
cylinders.
# Facet Layer
# Base scatter plot shared by both facet examples
p <- ggplot(data = mtcars, aes(x = hp, y = mpg, shape = factor(cyl))) +
  geom_point()
# Separate rows according to transmission type
p + facet_grid(am ~ .) +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
# Separate columns according to cylinders
p + facet_grid(. ~ cyl) +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")
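The row and column splits can also be combined in one formula. The following is a minimal sketch; using label_both for the strip labels is an optional choice made here so that each panel shows the variable name as well as its value.

```r
library(ggplot2)

# Rows by transmission type (am), columns by number of cylinders (cyl)
p_grid <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(am ~ cyl, labeller = label_both) +
  labs(title = "Miles per Gallon vs Horsepower",
       x = "Horsepower",
       y = "Miles per Gallon")

p_grid
```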
Statistics layer

• Statistics layer
• In this layer, we transform our data using binning, smoothing,
descriptive, and intermediate statistics.
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "red") +
  labs(title = "Miles per Gallon vs Horsepower")
ggplot(data = mtcars, aes(x = hp, y = mpg)): This initializes the plot, specifying the mtcars data
frame to use. The aes() function (aesthetics) maps the hp (horsepower) variable to the x-axis
and the mpg (miles per gallon) variable to the y-axis.
+ geom_point(): This adds a layer of points to the plot, creating a scatter plot of the data.
Each point on the graph represents one car from the dataset.
+ stat_smooth(method = lm, col = "red"): This adds a smoothed line (and a confidence
interval by default) to the scatter plot.
method = lm specifies that a linear model (lm) should be used for the smoothing.
col = "red" sets the color of the fitted line to red.
+ labs(title = "Miles per Gallon vs Horsepower"): This adds a title to the plot for better
interpretation.
In essence, the command creates a scatter plot showing how miles per gallon changes with
horsepower, overlays a red linear regression line to highlight the trend, and labels the plot
with a descriptive title.
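The red line drawn by stat_smooth(method = lm) corresponds to an ordinary linear regression of mpg on hp. As a sketch, the same model can be fitted directly with base R's lm() to inspect the numbers behind the line; the object name fit is just a placeholder.

```r
# Fit the same straight-line model that stat_smooth(method = lm) draws
fit <- lm(mpg ~ hp, data = mtcars)

# Intercept and slope of the fitted line
coef(fit)

# Share of the variance in mpg explained by hp
summary(fit)$r.squared
```

A negative slope here matches the downward trend seen in the plot.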
Theme Layer

• Theme Layer:
• The theme layer controls the finer points of display like the font size
and background color properties.
• Example 1: Theme layer – element_rect() function
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  facet_grid(. ~ cyl) +
  theme(plot.background = element_rect(fill = "blue", colour = "gray")) +
  labs(title = "Miles per Gallon vs Horsepower")
data = mtcars specifies that the data comes from the mtcars dataframe.
aes(x = hp, y = mpg) defines the aesthetic mapping, mapping the hp (horsepower)
variable to the x-axis and the mpg (miles per gallon) variable to the y-axis.
+ geom_point(): This adds a layer of geometric objects, specifically points, to the plot,
resulting in a scatter plot. Each point represents a car from the dataset.
+ facet_grid(. ~ cyl): This uses faceting to create a grid of subplots.
The formula . ~ cyl means the data is split by the cyl (number of cylinders) variable
across different columns; the dot (.) on the left means no split across rows. You get
three separate plots: one for 4-cylinder, one for 6-cylinder, and one for 8-cylinder cars.
+ theme(plot.background = element_rect(fill = "blue", colour = "gray")): This
customizes the non-data elements of the plot.
plot.background = element_rect() sets the entire plot's background fill color to blue
and its border color to gray.
+ labs(title = "Miles per Gallon vs Horsepower"): This adds labels and titles to the plot.
title = "Miles per Gallon vs Horsepower" sets the main title of the entire graphic.
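Other non-data elements are adjusted in the same way through theme(). The sketch below changes the title text, the panel background, and the legend position; the specific sizes, colours, and the "bottom" position are arbitrary values chosen for illustration.

```r
library(ggplot2)

p_theme <- ggplot(mtcars, aes(x = hp, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(title = "Miles per Gallon vs Horsepower", colour = "Cylinders") +
  theme(
    plot.title = element_text(size = 16, face = "bold"),     # title font
    panel.background = element_rect(fill = "white",
                                    colour = "gray"),        # panel area
    legend.position = "bottom"                               # legend placement
  )

p_theme
```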
Coordinates layer
• Coordinates layer:
• In this layer, data coordinates are mapped onto the plane of the graphic.
We adjust the axes and change the spacing of the displayed data to
control the plot dimensions.
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  stat_smooth(method = lm, col = "red") +
  scale_y_continuous("Miles per Gallon", limits = c(2, 35), expand = c(0, 0)) +
  scale_x_continuous("Weight", limits = c(0, 25), expand = c(0, 0)) +
  coord_equal() +
  labs(title = "Miles per Gallon vs Weight", x = "Weight", y = "Miles per Gallon")
+ geom_point():
Adds a layer of points to the plot, creating a scatter plot.
+ stat_smooth(method = lm, col = "red"):
Adds a smoothed line (a statistical transformation layer).
method = lm specifies that a linear model (linear regression) should be used for the line of best fit.
col = "red" sets the color of this line to red.
+ scale_y_continuous("Miles per Gallon", limits = c(2, 35), expand = c(0, 0)):
Controls the properties of the y-axis scale.
Sets the axis label to "Miles per Gallon" and the display limits from 2 to 35.
expand = c(0, 0) ensures no extra space is added between the data points and the axis limits, causing the plot
to start exactly at the specified limits.
+ scale_x_continuous("Weight", limits = c(0, 25), expand = c(0, 0)):
Controls the properties of the x-axis scale, similar to the y-axis function.
Sets the axis label to "Weight", limits from 0 to 25, and removes expansion space.
+ coord_equal():
Sets a fixed aspect ratio for the plot, ensuring that one unit on the x-axis has the same visual length as one
unit on the y-axis.
+ labs(title = "Miles per Gallon vs Weight", x = "Weight", y = "Miles per Gallon"):
Adds labels and a title to the plot.
Specifically sets the main title, and explicitly sets the x and y axis labels (though this is somewhat redundant
given the scale_x_continuous and scale_y_continuous calls, it is a common practice).
coord_cartesian() to zoom in properly:
# Add coord_cartesian() to zoom in
ggplot(data = mtcars, aes(x = wt, y = hp, col = am)) +
  geom_point() +
  geom_smooth() +
  coord_cartesian(xlim = c(3, 6))
• This code creates a scatter plot with a superimposed smooth trend
line using the ggplot2 package. It uses the built-in mtcars dataset to
explore the relationship between car weight (wt) and horsepower (hp),
with points colored according to the transmission type (am).
coord_cartesian() zooms the view to x values between 3 and 6 without
discarding the data outside that window, so the trend line is still
fitted to all cars.
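The reason coord_cartesian() counts as a "proper" zoom is that setting limits on the scale instead removes the out-of-range rows before any statistics are computed. The sketch below contrasts the two approaches; method = "lm" is used here only to keep the comparison simple and is an assumption of this example.

```r
library(ggplot2)

# How many cars fall outside the 3 to 6 weight window?
in_window <- mtcars$wt >= 3 & mtcars$wt <= 6
sum(!in_window)

base_plot <- ggplot(mtcars, aes(x = wt, y = hp)) +
  geom_point() +
  geom_smooth(method = "lm")

# Zoom only: the line is fitted to all 32 cars, then the view is cropped
p_zoom <- base_plot + coord_cartesian(xlim = c(3, 6))

# Scale limits: cars outside the window are dropped before fitting
p_clip <- base_plot + scale_x_continuous(limits = c(3, 6))
```

Printing p_clip produces a warning about removed rows, and its fitted line can differ from the one in p_zoom.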
line chart
• To create a line chart using ggplot2 in R and display it in a PowerPoint
(PPT) presentation, you first generate the plot in R and then export it
as an image to insert into your slides.
• Part 1: Create a Line Chart in R using ggplot2
• You need the ggplot2 package installed (install.packages("ggplot2")).
The following example uses the built-in mtcars dataset with a row index
on the x-axis.
# Load the ggplot2 library
library(ggplot2)
line chart
install.packages("ggplot2") # Install the ggplot2 package; this is a one-time step on your own PC
library(ggplot2) # Load the ggplot2 package; you have to do this in every session
• Now see the code:
mtcars$Index <- 1:nrow(mtcars) # Create an index from the row order
ggplot(mtcars, aes(x = Index, y = mpg)) +
  geom_line(color = "blue") +
  geom_point(color = "red") +
  theme_minimal() +
  labs(title = "Simulated Trend of MPG Over 'Time'",
       x = "Car Index",
       y = "Miles Per Gallon")
line chart
• Part 2: Export the Chart
• A reliable way to get the chart into PowerPoint is to save it as a
high-resolution image file (like PNG or JPEG).
# Save the plot to a file (units must be one of "in", "cm", "mm", "px")
ggsave("mpg_line_chart.png", plot = last_plot(), width = 8,
       height = 5, units = "in", dpi = 300)
• This saves the most recently created plot (last_plot()) to a file named
mpg_line_chart.png in your current working directory.
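Relying on last_plot() can save the wrong chart if another plot was drawn in between. A slightly more explicit sketch is to store the plot in a variable and pass it to ggsave(); the names cars_indexed, p_line, and out_file, and the use of a temporary directory, are assumptions of this example.

```r
library(ggplot2)

# Work on a copy so the original mtcars is left unchanged
cars_indexed <- mtcars
cars_indexed$Index <- seq_len(nrow(cars_indexed))

p_line <- ggplot(cars_indexed, aes(x = Index, y = mpg)) +
  geom_line(color = "blue") +
  geom_point(color = "red") +
  theme_minimal()

# Save this specific plot object to a known location
out_file <- file.path(tempdir(), "mpg_line_chart.png")
ggsave(out_file, plot = p_line, width = 8, height = 5, units = "in", dpi = 300)

file.exists(out_file)
```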
Scatter Plot
• Scatter plots display values for two variables for a set of data so that we can get an idea of the trend or correlation.

• Scatter plots can be useful for identifying correlations, trends, and outliers in data. They visually demonstrate the strength and
direction of a relationship between two variables, making them indispensable for exploratory data analysis and regression
analysis. Note that while scatter plots are excellent for bivariate analysis, they can become cluttered and less effective with
large datasets. Also, a relationship seen in a scatter plot does not by itself imply causation.

• The base R example graph:
# Base R
# Scatter plot of MPG vs Weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles Per Gallon",
     main = "MPG vs. Car Weight",
     pch = 19, col = "blue")
Scatter Plot

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  theme_minimal() +
  labs(title = "Scatter plot of MPG vs Car Weight",
       x = "Weight (1000 lbs)",
       y = "Miles per Gallon")
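The strength and direction of the relationship shown in the scatter plot can also be summarized with a single number. As a small sketch, base R's cor() gives the Pearson correlation between weight and fuel economy; the variable name wt_mpg_cor is a placeholder.

```r
# Pearson correlation between car weight and miles per gallon
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
wt_mpg_cor

# A negative value means heavier cars tend to have lower mpg
wt_mpg_cor < 0
```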
histogram
• A histogram is a type of graph used in statistics to represent the distribution of
numerical data by showing the number of data points that fall within a range of
values, known as bins. Note that it is similar to a bar chart but differs in that it
groups ranges of data into bins and displays the frequency of data points within
each bin, making it ideal for showing the shape and spread of continuous data.
We often plot the histogram of one continuous variable to get a rough idea
of its distribution (for example, whether it looks normal or not).

• Base R example:
# Histogram of MPG
hist(mtcars$mpg, breaks = 10, col = "skyblue",
     xlab = "Miles Per Gallon", main = "Histogram of MPG")
histogram
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5, fill = "yellow", color = "black") +
  theme_minimal() +
  labs(title = "Histogram of MPG",
       x = "Miles per Gallon",
       y = "Frequency")
Box plot
• Box plots (or box-and-whisker plots) summarize data using a five-number summary: minimum, first quartile (Q1), median (not
mean), third quartile (Q3), and maximum. They also highlight outliers. Box plots are highly efficient in depicting the distribution
of data, providing insights into the central tendency, variability, and skewness. They are particularly useful for comparing
distributions across different categories. Note that they do not convey the exact distribution shape as precisely as density plots
or histograms and can be misleading if misinterpreted by those unfamiliar with their construction.

• Base R example:
# Box plot for MPG across different numbers of cylinders
# Convert cyl to a factor if it's not already
mtcars$cyl <- as.factor(mtcars$cyl)
# Box plot of MPG by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of Cylinders", ylab = "Miles Per Gallon",
        main = "MPG by Number of Cylinders",
        col = rainbow(length(unique(mtcars$cyl))))
Box plot

ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Box Plot of MPG by Cylinder Count",
       x = "Number of Cylinders",
       y = "Miles per Gallon")
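The five-number summary behind each box can be computed directly, which helps when reading the plot. The sketch below uses base R's quantile() per cylinder group; the name five_num is a placeholder. Note that geom_boxplot() by default draws whiskers only up to 1.5 times the interquartile range and shows points beyond that as outliers, so the whisker ends may not coincide with the minimum and maximum.

```r
# Minimum, Q1, median, Q3, and maximum of mpg for each cylinder group
five_num <- sapply(split(mtcars$mpg, mtcars$cyl), quantile)
five_num

# The median row is what the thick line inside each box shows
five_num["50%", ]
```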
Pie Chart

• A pie chart is employed to display the relative proportions of various
categories within a dataset. Each category is depicted as a slice of the pie,
with its size corresponding to the proportion it holds in relation to the
entire dataset. Pie charts are commonly used to display the composition of
a whole or to compare the sizes of different parts in relation to the entire
dataset. They can sometimes be misleading when slices are close in size, as
it's harder to make accurate comparisons with the human eye.
• Pie charts are commonly used for tasks such as:
• Showing the composition of a whole (e.g., sales distribution by product,
budget allocation by category).
• Comparing the proportions of different groups within a dataset.
• Highlighting the most significant category within a set.
Pie Chart
# Sample data
data <- data.frame(category = c("Category 1", "Category 2", "Category 3", "Category 4"),
                   values = c(30, 20, 15, 35))
# Calculate percentage for each category
data$percentage <- (data$values / sum(data$values)) * 100
Pie Chart
# Create a pie chart-like visualization
ggplot(data, aes(x = "", y = percentage, fill = category)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar(theta = "y") +
  labs(title = "Pie Chart-like Visualization",
       fill = "Category",
       x = NULL,
       y = NULL) +
  theme_void() +
  theme(legend.position = "right")
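Because slices of similar size are hard to compare by eye, it can help to print the percentage on each slice. The following is a sketch under the same sample values; the data frame name pie_data and the label format are assumptions made for this example.

```r
library(ggplot2)

# Same sample values as above, rebuilt so the example is self-contained
pie_data <- data.frame(category = paste("Category", 1:4),
                       values = c(30, 20, 15, 35))
pie_data$percentage <- pie_data$values / sum(pie_data$values) * 100

p_pie <- ggplot(pie_data, aes(x = "", y = percentage, fill = category)) +
  geom_col(width = 1) +
  # Place each label in the middle of its slice
  geom_text(aes(label = paste0(round(percentage), "%")),
            position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  labs(fill = "Category") +
  theme_void()

p_pie
```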
Bar Chart
• You can enhance your bar charts with various options:
• Colors: Use the fill aesthetic inside aes() to color bars by a variable, or
set a static fill color outside aes().
• Labels: Add data labels on top of bars using geom_text().
• Grouped/Stacked Bars: For multiple categorical variables, use position
= "dodge" for grouped bars or position = "stack" (default) for stacked
bars within geom_bar() or geom_col().
• Themes and Titles: Customize the appearance with functions like
theme_minimal() and add labels with labs().
• Horizontal Bars: Swap the x and y axes using coord_flip()
# Sample data
df <- data.frame(dose = c("D0.5", "D1", "D2"), len = c(4.2, 10, 29.5))
head(df)
# Create barplots
library(ggplot2)
# Basic barplot
p <- ggplot(data = df, aes(x = dose, y = len)) +
  geom_bar(stat = "identity")
p
# Horizontal bar plot
p + coord_flip()
Change the width and the color of bars :
# Change the width of bars
ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", width=0.5)
# Change colors
ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", color="blue", fill="white")
# Minimal theme + blue fill color
p<-ggplot(data=df, aes(x=dose, y=len)) +
geom_bar(stat="identity", fill="steelblue")+
theme_minimal()
p
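The grouped-bar and data-label options mentioned earlier are not shown in the examples above, so here is a minimal sketch of both. The data frame df2, its supp column, and all of its values are made up for illustration.

```r
library(ggplot2)

# Hypothetical data: two groups (supp) measured at three doses
df2 <- data.frame(supp = rep(c("VC", "OJ"), each = 3),
                  dose = rep(c("D0.5", "D1", "D2"), times = 2),
                  len = c(6.8, 15, 33, 4.2, 10, 29.5))

p_grouped <- ggplot(df2, aes(x = dose, y = len, fill = supp)) +
  # Side-by-side bars instead of the default stacking
  geom_col(position = position_dodge(width = 0.9)) +
  # Value labels just above each bar, dodged the same way as the bars
  geom_text(aes(label = len),
            position = position_dodge(width = 0.9),
            vjust = -0.3) +
  theme_minimal() +
  labs(x = "Dose", y = "Length", fill = "Group")

p_grouped
```

Using the same dodge width for the bars and the labels keeps each label centered over its own bar.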
