
R Data Visualization

The document discusses data visualization in R, highlighting its importance for gaining insights from data through various packages like ggplot2, plotly, and others. It covers the creation of different types of visualizations, including pie charts and bar charts, along with their advantages and disadvantages. Additionally, it explains the grammar of graphics, essential elements for creating effective visualizations, and provides examples of code for generating these charts.

Uploaded by

Nishanth N
Copyright
© All Rights Reserved

R Data Visualization

 In R, we can create visually appealing data visualizations by writing a few lines of code. For this
purpose, we use the diverse functionalities of R.
 Data visualization is an efficient technique for gaining insight into data through a visual
medium. With the help of visualization techniques, a person can easily spot hidden patterns
in data that might otherwise be overlooked.

By using data visualization techniques, we can work with large datasets and efficiently obtain key
insights from them.

R Visualization Packages
R provides a series of packages for data visualization. These packages are as follows:

1) plotly

 The plotly package provides online, interactive, high-quality graphs. This package builds on
the JavaScript library plotly.js.

2) ggplot2

 R allows us to create graphics declaratively. R provides the ggplot2 package for this purpose.
 This package is known for its elegant, high-quality graphs, which set it apart from other
visualization packages.
3) tidyquant

 The tidyquant package is a financial package that is used for carrying out quantitative financial
analysis.
 It fits into the tidyverse as a financial package that is used for importing,
analyzing, and visualizing data.

4) taucharts

 Data plays an important role in taucharts. The library provides a declarative interface for rapid
mapping of data fields to visual properties.

5) ggiraph

 It is a tool that allows us to create dynamic ggplot graphs. This package allows us to add
tooltips, JavaScript actions, and animations to the graphics.

6) geofacet

 This package provides geofaceting functionality for 'ggplot2'. Geofaceting arranges a
sequence of plots for different geographical entities into a grid that preserves some of the
geographical orientation.

7) googleVis

 googleVis provides an interface between R and Google's charts tools. With the help of this
package, we can create web pages with interactive charts based on R data frames.

8) RColorBrewer

 This package provides color schemes for maps and other graphics, which are designed by
Cynthia Brewer.

9) dygraphs

 The dygraphs package is an R interface to the dygraphs JavaScript charting library. It provides
rich features for charting time-series data in R.

10) shiny

 R allows us to develop interactive and aesthetically pleasing web apps by providing
the shiny package. This package provides various extensions with HTML widgets, CSS, and
JavaScript.

R Graphics
 Graphics play an important role in bringing out the important features of the data.
 Graphics are used to examine marginal distributions, relationships between variables, and
summaries of very large data sets.
 They are a very important complement to many statistical and computational techniques.
Standard Graphics
R standard graphics are available through the graphics package, which includes several functions that
provide statistical plots, like:

o Scatterplots
o Piecharts
o Boxplots
o Barplots etc.

Each of the above graphs can typically be produced with a single function call.

Graphics Devices
A graphics device is something on which we can make a plot appear.

A graphics device is a window on your computer (screen device), a PDF file (file device), a Scalable
Vector Graphics (SVG) file (file device), or a PNG or JPEG file (file device).

The following points are essential to understand:

o The output that graphics functions produce depends on the active graphics device.
o A screen is the default and most frequently used device.
o File devices such as the PDF device and the JPEG device are also available.
o We just need to open the graphics output device that we want; R then takes care of producing
the type of output that the device requires.
o To produce a given plot on the screen or in a graphics file such as a PNG, the R code is exactly the
same. We only need to open the target output device beforehand.
o Several devices can be open at the same time, but only one device is active.
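
The points above can be sketched with base R's device functions. This is a minimal illustration, not a prescribed workflow: the temporary file paths and variable names are assumptions chosen for the example, and PDF devices are used only because they are available in every R installation.

```r
# Open a PDF file device; it becomes the active device.
pdf_path_1 <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(file = pdf_path_1)
first_dev <- dev.cur()

# Open a second PDF device; the newest device becomes the active one.
pdf_path_2 <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(file = pdf_path_2)
second_dev <- dev.cur()

# Both devices are open, but only one of them is active.
open_devs <- dev.list()
print(open_devs)

# Switch back to the first device and draw on it.
invisible(dev.set(first_dev))
plot(1:10, main = "Drawn on the first device")

# Close both devices so that the output is written to disk.
invisible(dev.off(first_dev))
invisible(dev.off(second_dev))
```

The plotting call itself does not mention the file type; only the device that was opened beforehand decides where the output goes.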

The basics of the grammar of graphics


There are some key elements of a statistical graphic. These elements are the basics of the grammar
of graphics. Let's discuss each of the elements one by one to gain the basic knowledge of graphics.

1) Data

 Data is the most crucial element: it is what the graphic processes to generate its output.

2) Aesthetic Mappings

 Aesthetic mappings are one of the most important elements of a statistical graphic.
 They control the relation between graphics variables and data variables.
 In a scatter plot, for example, an aesthetic mapping maps the temperature variable of a data set to
the X variable. Another mapping can map the species of a plant to the color of the dots.

3) Geometric Objects

 Geometric objects are the shapes used to draw each observation, such as points, lines, or bars.
In a scatter plot, each observation is expressed by a point, and the aesthetic mappings place it by
mapping two variables of the data set to the x,y variables of the plot.

4) Statistical Transformations
 Statistical transformations allow us to compute statistical summaries of the data in the plot.
 For example, a statistical transformation can approximate the data with a regression
line through the x,y coordinates, or count the occurrences of certain values.

5) Scales

 Scales are used to map the data values into values present in the coordinate system of the graphics
device.

6) Coordinate system

The coordinate system determines how the data positions are laid out in the plot. Common coordinate
systems are:

o Cartesian
o Polar

7) Faceting

Faceting is used to split the data into subgroups and draw sub-graphs for each group.
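
As a rough sketch of how the seven elements fit together, the following builds one plot with ggplot2, one of the packages listed earlier. The choice of the built-in mtcars dataset, its columns, and the axis label are assumptions made only for illustration.

```r
library(ggplot2)

grammar_plot <-
  ggplot(data = mtcars,                          # 1) data
         aes(x = wt, y = mpg)) +                 # 2) aesthetic mappings
  geom_point(aes(color = factor(cyl))) +         # 3) geometric object (points)
  stat_smooth(method = "lm", formula = y ~ x,    # 4) statistical transformation
              se = FALSE) +                      #    (a fitted regression line)
  scale_x_continuous(name = "Weight") +          # 5) scale for the x values
  coord_cartesian(ylim = c(10, 35)) +            # 6) coordinate system
  facet_wrap(~ gear)                             # 7) faceting: one panel per gear

# Printing the object draws it on the active graphics device:
# print(grammar_plot)
```

Each element is added as its own layer, so any one of them can be swapped (for example, a different geom or coordinate system) without rewriting the rest.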

Advantages of Data Visualization in R


1. Understanding

Business information can be more attractive to look at, and is easier to understand, in graphics
and charts than in a written document full of text and numbers. Visualization can therefore reach a
wider audience, and it promotes the broader use of business insights to make better
decisions.

2. Efficiency

Visualization lets us display a lot of information in a small space. Although the decision-making
process in business is inherently complex and multifunctional, displaying evaluation findings
in a graph allows companies to organize a lot of interrelated information in useful ways.

3. Location

Visualizations that use features such as geographic maps and GIS can be particularly relevant to the
wider business when location is an important factor. We can use maps to show business insights from
various locations, and to consider the seriousness of the issues, the reasons behind them, and the
working groups that address them.

Disadvantages of Data Visualization in R


1. Cost

Developing visualization applications in R can cost a good amount of money. Small companies in
particular may not be able to spend that many resources on them. To generate reports, many
companies employ professionals to create charts, which can increase costs. Small enterprises
often operate in resource-limited settings, yet timely evaluation results are often of high
importance to them.

2. Distraction

At times, data visualization apps create highly complex, graphics-rich reports and
charts, which may entice users to focus more on form than on function. If visual appeal is put
first, the overall value of the graphic representation will be minimal. In resource-limited settings,
it is necessary to understand how resources can best be used, and not to get caught up in the graphics
trend without a clear purpose.

R Pie Charts
 R programming language has several libraries for creating charts and graphs.
 A pie-chart is a representation of values in the form of slices of a circle with different colors.
 Slices are labeled with a description, and the numbers corresponding to each slice are also
shown in the chart.
 However, pie charts are not recommended in the R documentation, and their features
are limited.
 The documentation recommends a bar chart or dot chart over a pie chart because people are able
to judge length more accurately than area.
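
To illustrate the recommended alternatives, here is a small sketch that draws the same kind of values as a dot chart and as a horizontal bar chart with base R. The values and country names mirror the examples in this section; the temporary PDF output path is an assumption for the sake of a self-contained example.

```r
# Named vector: the names become the labels of the dots and bars.
x <- c(India = 20, America = 65, "Shri Lanka" = 15, Nepal = 50)

# Sorting makes the ranking readable at a glance.
sorted_x <- sort(x)

out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(file = out_file)

# Dot chart: values are compared by position along a common axis.
dotchart(sorted_x, xlab = "Value", main = "Country Dot Chart")

# Horizontal bar chart: values are compared by bar length.
barplot(sorted_x, horiz = TRUE, las = 1, xlab = "Value",
        main = "Country Bar Chart")

invisible(dev.off())
```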

Pie charts are created with the help of the pie() function, which takes a vector of non-negative
numbers as input. Additional parameters are used to control labels, colors, titles, etc.

The pie() function has the following syntax:

pie(x, labels, radius, main, col, clockwise)

Here,

1. x is a vector that contains the numeric values used in the pie chart.
2. labels is used to give a description to the slices.
3. radius describes the radius of the pie chart.
4. main describes the title of the chart.
5. col defines the color palette.
6. clockwise is a logical value that indicates whether the slices are drawn clockwise or
anti-clockwise.
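
The following sketch passes every parameter from the list by name, which makes it easier to see what each one controls. The specific radius, colors, and temporary output path are illustrative assumptions, not values from the original examples.

```r
x <- c(20, 65, 15, 50)
slice_labels <- c("India", "America", "Shri Lanka", "Nepal")

out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(file = out_file)

pie(x,
    labels    = slice_labels,                               # slice descriptions
    radius    = 0.9,                                        # size of the pie
    main      = "Country Pie Chart",                        # chart title
    col       = c("tomato", "steelblue", "gold", "seagreen"), # one color per slice
    clockwise = TRUE)                                       # draw slices clockwise

invisible(dev.off())
```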

Example
# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Shri Lanka", "Nepal")
# Giving the chart file a name.
png(file = "[Link]")
# Plotting the chart.
pie(x,labels)
# Saving the file.
dev.off()

Output:

Title and color


 A pie chart has several more features that we can use by adding more parameters to the pie()
function.
 We can give a title to our pie chart by passing the main parameter, which supplies the title
to the pie() function. Apart from this, we can use a rainbow color palette while drawing
the chart by passing the col parameter.

Note: The length of the palette must be the same as the number of values that we have for the chart.
For that, we use the length() function.

Let's see an example to understand how these methods work in creating an attractive pie chart with
title and color.

Example
# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Shri Lanka", "Nepal")
# Giving the chart file a name.
png(file = "title_color.png")
# Plotting the chart.
pie(x,labels,main="Country Pie chart",col=rainbow(length(x)))
# Saving the file.
dev.off()

Output:

Slice Percentage & Chart Legend


 The pie chart has two additional properties: slice percentage and chart legend.
 We can show the data as percentages, and we can add legends to plots in R
by using the legend() function. The legend() function has the following syntax:

legend(x,y=NULL,legend,fill,col,bg)

Here,


o x and y are the coordinates used to position the legend.
o legend is the text of the legend.
o fill is the color used for filling the boxes beside the legend text.
o col defines the color of the lines and points beside the legend text.
o bg is the background color of the legend box.

Example
# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Shri Lanka", "Nepal")
pie_percent<- round(100*x/sum(x), 1)
# Giving the chart file a name.
png(file = "per_pie.png")
# Plotting the chart.
pie(x, labels = pie_percent, main = "Country Pie Chart",col = rainbow(length(x)))
legend("topright", c("India", "America", "Shri Lanka", "Nepal"), cex = 0.8,
fill = rainbow(length(x)))
# Saving the file.
dev.off()

Output:

3 Dimensional Pie Chart


In R, we can also create a three-dimensional pie chart. For this purpose, the plotrix package
provides the pie3D() function, which is used to create a 3D pie chart. The main parameters of
pie3D() are used in the same way as those of pie(). Let's see an example to understand how a 3D
pie chart is created with the help of this function.

Example
# Getting the library.
library(plotrix)
# Creating data for the graph.
x <- c(20, 65, 15, 50,45)
labels <- c("India", "America", "Shri Lanka", "Nepal","Bhutan")
# Give the chart file a name.
png(file = "3d_pie_chart1.png")
# Plot the chart.
pie3D(x,labels = labels,explode = 0.1, main = "Country Pie Chart")
# Save the file.
dev.off()

Output:

Example
# Getting the library.
library(plotrix)
# Creating data for the graph.
x <- c(20, 65, 15, 50,45)
labels <- c("India", "America", "Shri Lanka", "Nepal","Bhutan")
pie_percent<- round(100*x/sum(x), 1)
# Giving the chart file a name.
png(file = "three_D_pie.png")
# Plotting the chart.
pie3D(x, labels = pie_percent, main = "Country Pie Chart",col = rainbow(length(x)))
legend("topright", c("India", "America", "Shri Lanka", "Nepal","Bhutan"), cex = 0.8,
fill = rainbow(length(x)))
# Saving the file.
dev.off()

Output:

R Bar Charts
 A bar chart is a pictorial representation in which numerical values of variables are represented
by length or height of lines or rectangles of equal width.
 A bar chart is used for summarizing a set of categorical data. In a bar chart, the data is shown
through rectangular bars, with the length of each bar proportional to the value of the
variable.

In R, we can create a bar chart to visualize the data in an efficient manner. For this purpose, R provides
the barplot() function, which has the following syntax:

barplot(H, xlab, ylab, main, names.arg, col)

S.No.  Parameter   Description
1.     H           A vector or matrix which contains numeric values used in the bar chart.
2.     xlab        A label for the x-axis.
3.     ylab        A label for the y-axis.
4.     main        A title of the bar chart.
5.     names.arg   A vector of names that appear under each bar.
6.     col         It is used to give colors to the bars in the graph.

Example
# Creating the data for Bar chart
H<- c(12,35,54,3,41)
# Giving the chart file a name
png(file = "bar_chart.png")
# Plotting the bar chart
barplot(H)
# Saving the file
dev.off()

Output:

Labels, Title & Colors


Like pie charts, we can add more functionality to the bar chart by passing more arguments to
the barplot() function. We can add a title to our bar chart or add colors to the bars with the
main and col parameters, respectively. We can add another parameter, names.arg, which is a vector
with the same number of values as the input vector and describes the meaning of each bar.

Let's see an example to understand how labels, titles, and colors are added in our bar chart.

Example
# Creating the data for Bar chart
H <- c(12,35,54,3,41)
M <- c("Feb","Mar","Apr","May","Jun")
# Giving the chart file a name
png(file = "bar_properties.png")
# Plotting the bar chart
barplot(H,names.arg=M,xlab="Month",ylab="Revenue",col="Green",
main="Revenue Bar chart",border="red")
# Saving the file
dev.off()

Output:

Group Bar Chart & Stacked Bar Chart


We can create bar charts with groups of bars and stacks using matrices as input values in each bar.
One or more variables are represented as a matrix that is used to construct group bar charts and
stacked bar charts.

Let's see an example to understand how these charts are created.


Example
library(RColorBrewer)
months <- c("Jan","Feb","Mar","Apr","May")
regions <- c("West","North","South")
# Creating the matrix of the values.
Values <- matrix(c(21,32,33,14,95,46,67,78,39,11,22,23,94,15,16), nrow = 3, ncol = 5, byrow = TRUE)
# Giving the chart file a name
png(file = "stacked_chart.png")
# Creating the bar chart
barplot(Values, main = "Total Revenue", names.arg = months, xlab = "Month", ylab = "Revenue",
col = c("cadetblue3","deeppink2","goldenrod1"))
# Adding the legend to the chart
legend("topleft", regions, cex = 1.3, fill = c("cadetblue3","deeppink2","goldenrod1"))
# Saving the file
dev.off()
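
The example above draws the stacked form, which is barplot()'s default for a matrix. As a sketch of the grouped form, the same matrix can be passed with beside = TRUE, so that each region gets its own bar within every month. The temporary PDF path and helper variable names are assumptions for illustration.

```r
months  <- c("Jan", "Feb", "Mar", "Apr", "May")
regions <- c("West", "North", "South")
# One row per region, one column per month.
Values <- matrix(c(21, 32, 33, 14, 95,
                   46, 67, 78, 39, 11,
                   22, 23, 94, 15, 16),
                 nrow = 3, ncol = 5, byrow = TRUE)
bar_colors <- c("cadetblue3", "deeppink2", "goldenrod1")

out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(file = out_file)

# beside = TRUE places the bars of each column side by side instead of stacking.
bar_mids <- barplot(Values, beside = TRUE, names.arg = months,
                    main = "Revenue by Region", xlab = "Month",
                    ylab = "Revenue", col = bar_colors)
legend("topleft", regions, cex = 0.8, fill = bar_colors)

invisible(dev.off())

# The height of each stacked bar corresponds to the column total.
monthly_totals <- colSums(Values)
print(monthly_totals)
```

Grouped bars make it easier to compare regions within a month, while stacked bars emphasize the monthly total.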

Output:

R Boxplot
 Boxplots show how the data in a data set is distributed.
 Three quartiles divide the data set into four parts.
 The graph represents the minimum, maximum, median, first quartile, and third quartile of
the data set. Boxplots are also useful for comparing the distribution of data across data sets by
drawing a boxplot for each of them.
R provides a boxplot() function to create a boxplot. There is the following syntax of boxplot()
function:

boxplot(x, data, notch, varwidth, names, main)

Here,

S.No.  Parameter   Description
1.     x           It is a vector or a formula.
2.     data        It is the data frame.
3.     notch       It is a logical value; set it to TRUE to draw a notch.
4.     varwidth    It is also a logical value; set it to TRUE to draw each box with a width
                   proportional to the square root of the sample size.
5.     names       It is the group of labels that will be printed under each boxplot.
6.     main        It is used to give a title to the graph.

Let's see an example to understand how we can create a boxplot in R. In the example below, we will
use the "mtcars" dataset present in the R environment. We will use only two of its columns, "mpg"
and "cyl". The example creates a boxplot of the relation between mpg and cyl, i.e.,
miles per gallon and number of cylinders, respectively.

Example
# Giving a name to the chart file.
png(file = "[Link]")
# Plotting the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Quantity of Cylinders",
ylab = "Miles Per Gallon", main = "R Boxplot Example")
# Save the file.
dev.off()
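
The numbers behind such a boxplot can also be inspected directly. The sketch below asks boxplot() for its statistics without drawing anything (plot = FALSE); the row labels and variable names are added here for readability and are not part of the returned object.

```r
# Compute the boxplot statistics for mpg by cyl without drawing a plot.
box_stats <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per group; the five rows are the lower whisker, lower hinge
# (roughly the first quartile), median, upper hinge (roughly the third
# quartile), and upper whisker.
summary_table <- box_stats$stats
colnames(summary_table) <- box_stats$names
rownames(summary_table) <- c("lower whisker", "lower hinge", "median",
                             "upper hinge", "upper whisker")
print(summary_table)

# Number of observations in each group, and any points flagged as outliers.
print(box_stats$n)
print(box_stats$out)
```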

Output:
Boxplot using notch
In R, we can draw a boxplot with notches. Notches help us find out how the medians of different data
groups compare with each other. Let's see an example to understand how a boxplot is created
using a notch for each of the groups.

In the example below, we will use the same dataset, "mtcars".

Example
# Giving a name to our chart.
png(file = "boxplot_using_notch.png")
# Plotting the chart.
boxplot(mpg ~ cyl, data = mtcars,
xlab = "Quantity of Cylinders",
ylab = "Miles Per Gallon",
main = "Boxplot Example",
notch = TRUE,
varwidth = TRUE,
col = c("green","yellow","red"),
names = c("High","Medium","Low")
)
# Saving the file.
dev.off()

Output:

Violin Plots
R provides an additional plotting scheme which is created with the combination of a boxplot and
a kernel density plot. The violin plots are created with the help of vioplot() function present in the
vioplot package.

Let's see an example to understand the creation of the violin plot.

Example
# Loading the vioplot package
library(vioplot)
# Giving a name to our chart.
png(file = "[Link]")
#Creating data for vioplot function
x1 <- mtcars$mpg[mtcars$cyl==4]
x2 <- mtcars$mpg[mtcars$cyl==6]
x3 <- mtcars$mpg[mtcars$cyl==8]
#Creating vioplot function
vioplot(x1, x2, x3, names=c("4 cyl", "6 cyl", "8 cyl"),
col="green")
#Setting title
title("Violin plot example")
# Saving the file.
dev.off()

Output:

Bagplot- 2-Dimensional Boxplot Extension


The bagplot(x, y) function in the aplpack package provides a bivariate version of the univariate
boxplot. The bag contains 50% of all points. The bivariate median is approximated. The fence
separates the points inside it from the points outside, and the outliers are displayed.

Let's see an example to understand how we can create a two-dimensional boxplot extension in R.
Example
# Loading aplpack package
library(aplpack)
# Giving a name to our chart.
png(file = "[Link]")
# Creating the bagplot; the columns are referenced directly instead of using attach().
bagplot(mtcars$wt, mtcars$mpg, xlab="Car Weight", ylab="Miles Per Gallon",
main="2D Boxplot Extension")
# Saving the file.
dev.off()

Output:

R Histogram
A histogram is a type of bar chart that shows how many values fall into each of a set of value
ranges. A histogram is used to show a distribution, whereas a bar chart
is used for comparing different entities. In a histogram, the height of each bar represents the
number of values present in the given range.

For creating a histogram, R provides the hist() function, which takes a vector as input and accepts
further parameters to add more functionality. The hist() function has the following syntax:

hist(v,main,xlab,ylab,xlim,ylim,breaks,col,border)

Here,

S.No.  Parameter   Description
1.     v           It is a vector that contains numeric values.
2.     main        It indicates the title of the chart.
3.     col         It is used to set the color of the bars.
4.     border      It is used to set the border color of each bar.
5.     xlab        It is used to describe the x-axis.
6.     ylab        It is used to describe the y-axis.
7.     xlim        It is used to specify the range of values on the x-axis.
8.     ylim        It is used to specify the range of values on the y-axis.
9.     breaks      It is used to control the bars: either a suggested number of bars or a
                   vector of break points between them.

Let's see an example in which we create a simple histogram with the help of parameters
like v, main, col, etc.

Example
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60)
# Giving a name to the chart file.
png(file = "histogram_chart.png")
# Creating the histogram.
hist(v,xlab = "Weight",ylab="Frequency",col = "green",border = "red")
# Saving the file.
dev.off()

Output:

Let's see some more examples in which we use different parameters of the hist() function to add
more functionality or to create a more attractive chart.

Example: Use of xlim & ylim parameter


# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60)
# Giving a name to the chart file.
png(file = "histogram_chart_lim.png")
# Creating the histogram.
hist(v,xlab = "Weight",ylab="Frequency",col = "green",border = "red",xlim = c(0,40), ylim = c(0,3),
breaks = 5)
# Saving the file.
dev.off()

Output:
Example: Finding return value of hist()
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60)
# Giving a name to the chart file.
png(file = "histogram_chart_lim.png")
# Creating the histogram and storing its return value.
m<-hist(v)
# Printing the returned object.
m
# Saving the file.
dev.off()
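
As a small sketch of what the returned object contains, the following computes the histogram without drawing it (plot = FALSE) and arranges the bin boundaries, midpoints, and counts in a data frame. The data frame and its column names are assumptions introduced only for readability.

```r
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

# Compute the histogram without drawing it.
h <- hist(v, plot = FALSE)

# Each bin lies between two consecutive break points.
bins <- data.frame(
  from  = head(h$breaks, -1),   # lower boundary of each bin
  to    = tail(h$breaks, -1),   # upper boundary of each bin
  mid   = h$mids,               # midpoint, useful for placing labels
  count = h$counts              # number of values that fall in the bin
)
print(bins)
```

The mids and counts components are the ones a later example in this section passes to text() to label the bars.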

Output:
Example: Using histogram return values for labels using text()
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60,120,40,70,90)
# Giving a name to the chart file.
png(file = "histogram_return.png")

# Creating the histogram.


m<-hist(v,xlab = "Weight",ylab="Frequency",col = "darkmagenta",border = "pink", breaks = 5)
#Setting labels
text(m$mids,m$counts,labels=m$counts, adj=c(0.5, -0.5))
# Saving the file.
dev.off()

Output:

Example: Histogram using non-uniform width
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60,120,40,70,90)
# Giving a name to the chart file.
png(file = "histogram_non_uniform.png")
# Creating the histogram.
hist(v,xlab = "Weight",ylab="Frequency",xlim=c(50,100),col = "darkmagenta",border = "pink",
breaks=c(10,55,60,70,75,80,100,120))
# Saving the file.
dev.off()

Output:

R Line Graphs
 A line graph is a pictorial representation of information which changes continuously over
time.
 A line graph can also be referred to as a line chart. Within a line graph, there are points
connecting the data to show the continuous change.
 The lines in a line graph can move up and down based on the data. We can use a line graph
to compare different events, information, and situations.

A line chart is used to connect a series of points by drawing line segments between them. Line charts
are used in identifying the trends in data. For line graph construction, R provides plot() function,
which has the following syntax:

plot(v,type,col,xlab,ylab)
Here,

S.No.  Parameter   Description
1.     v           It is a vector which contains the numeric values.
2.     type        This parameter takes the value "l" to draw only the lines, "p" to
                   draw only the points, and "o" to draw both lines and points.
3.     xlab        It is the label for the x-axis.
4.     ylab        It is the label for the y-axis.
5.     main        It is the title of the chart.
6.     col         It is used to give the color for both the points and lines.

Let's see a basic example to understand how the plot() function is used to create a line graph:

Example
# Creating the data for the chart.
v <- c(13,22,28,7,31)
# Giving a name to the chart file.
png(file = "line_graph.png")
# Plotting the line chart.
plot(v,type = "o")
# Saving the file.
dev.off()

Output:

Line Chart Title, Color, and Labels


Like other graphs and charts, we can add more features to a line chart by adding more parameters.
We can add colors to the lines and points, add labels to the axes, and give a title to the chart.
Let's see an example to understand how these parameters are used in the plot() function to create
an attractive line graph.

Example
# Creating the data for the chart.
v <- c(13,22,28,7,31)
# Giving a name to the chart file.
png(file = "line_graph_feature.png")
# Plotting the line chart.
plot(v,type = "o",col="green",xlab="Month",ylab="Temperature")
# Saving the file.
dev.off()

Output:
Line Charts Containing Multiple Lines
In our previous examples, we created line graphs containing only one line in each graph. R allows us
to create a line graph containing multiple lines. R provides the lines() function to add a line to
an existing line graph.

The lines() function takes an additional input vector for creating a line. Let's see an example to
understand how this function is used:

Example
# Creating the data for the chart.
v <- c(13,22,28,7,31)
w <- c(11,13,32,6,35)
x <- c(12,22,15,34,35)
# Giving a name to the chart file.
png(file = "multi_line_graph.png")
# Plotting the line chart; ylim covers all three series so that no line is clipped.
plot(v,type = "o",col="green",xlab="Month",ylab="Temperature",ylim=range(v,w,x))
lines(w, type = "o", col = "red")
lines(x, type = "o", col = "blue")
# Saving the file.
dev.off()
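
As an alternative sketch, base R's matplot() can draw several series in one call when they are stored as the columns of a matrix, and it sizes the axes to fit all of them. The matrix name, legend placement, and temporary output path below are assumptions for illustration.

```r
v <- c(13, 22, 28, 7, 31)
w <- c(11, 13, 32, 6, 35)
x <- c(12, 22, 15, 34, 35)

# Each column of the matrix becomes one line in the chart.
temps <- cbind(v, w, x)
line_colors <- c("green", "red", "blue")

out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(file = out_file)

# type = "o" draws both lines and points; lty = 1 keeps every line solid.
matplot(temps, type = "o", pch = 1, lty = 1, col = line_colors,
        xlab = "Month", ylab = "Temperature")
legend("topleft", legend = colnames(temps), col = line_colors,
       lty = 1, pch = 1)

invisible(dev.off())
```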

Output:
Line Graph using ggplot2
In R, there is another way to create a line graph: the ggplot2 package. The ggplot2 package
provides the geom_line(), geom_step(), and geom_path() functions to create line graphs. To use these
functions, we first have to install the ggplot2 package and then load it into the current
session.

Let's see an example to understand how ggplot2 is used to create a line graph. In the example below,
we will use a small data frame of doses and tooth lengths, in the spirit of the predefined
ToothGrowth dataset, which describes the effect of vitamin C on tooth growth in guinea pigs.

Example
library(ggplot2)
#Creating data for the graph
data_frame<- data.frame(dose=c("D0.5", "D1", "D2"),
len=c(4.2, 10, 29.5))
head(data_frame)
png(file = "multi_line_graph2.png")
# Basic line plot with points
ggplot(data=data_frame, aes(x=dose, y=len, group=1)) +geom_line()+geom_point()
# Change the line type
ggplot(data=data_frame, aes(x=dose, y=len, group=1)) +geom_line(linetype = "dashed")+geom_point()
# Change the color
ggplot(data=data_frame, aes(x=dose, y=len, group=1)) +geom_line(color="red")+geom_point()
dev.off()
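
When several plots are printed to a single png() device with a fixed file name, each new plot replaces the previous page, so only the last one remains in the file. One possible way to keep every variant is to store the plots in a list and save each with ggplot2's ggsave(). The sketch below assumes temporary PDF files and hypothetical variable and file names.

```r
library(ggplot2)

dose_data <- data.frame(dose = c("D0.5", "D1", "D2"),
                        len  = c(4.2, 10, 29.5))

# Shared base: data, aesthetic mappings, and the points.
base_plot <- ggplot(dose_data, aes(x = dose, y = len, group = 1)) +
  geom_point()

# Three variants that differ only in the line layer.
variants <- list(
  basic  = base_plot + geom_line(),
  dashed = base_plot + geom_line(linetype = "dashed"),
  red    = base_plot + geom_line(color = "red")
)

# Save each variant to its own file (hypothetical names in a temp directory).
saved_files <- character(0)
for (variant_name in names(variants)) {
  path <- file.path(tempdir(), paste0("line_graph_", variant_name, ".pdf"))
  ggsave(filename = path, plot = variants[[variant_name]],
         width = 5, height = 4)
  saved_files <- c(saved_files, path)
}
print(basename(saved_files))
```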
Output:

R Scatterplots
 The scatter plots are used to compare variables.
 A comparison between variables is required when we need to determine how much one variable
is affected by another variable.
 In a scatterplot, the data is represented as a collection of points.
 Each point on the scatterplot defines the values of the two variables.
 One variable is selected for the vertical axis and the other for the horizontal axis.
 In R, there are two ways of creating a scatterplot: using the plot() function and using the
ggplot2 package's functions.

There is the following syntax for creating scatterplot in R:

plot(x, y, main, xlab, ylab, xlim, ylim, axes)

Here,

S.No.  Parameter   Description
1.     x           It is the dataset whose values are the horizontal coordinates.
2.     y           It is the dataset whose values are the vertical coordinates.
3.     main        It is the title of the graph.
4.     xlab        It is the label on the horizontal axis.
5.     ylab        It is the label on the vertical axis.
6.     xlim        It is the limits of the x values used for plotting.
7.     ylim        It is the limits of the y values used for plotting.
8.     axes        It indicates whether both axes should be drawn on the plot.

Let's see an example to understand how we can construct a scatterplot using the plot function. In
our example, we will use the dataset "mtcars", which is the predefined dataset available in the R
environment.

Example
#Fetching two columns from mtcars
data <-mtcars[,c('wt','mpg')]
# Giving a name to the chart file.
png(file = "[Link]")
# Plotting the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = data$wt,y = data$mpg, xlab = "Weight", ylab = "Mileage", xlim = c(2.5,5), ylim = c(15,30),
main = "Weight vs Mileage")
# Saving the file.
dev.off()
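
A scatterplot shows the relationship visually; to put a number on how strongly one variable relates to another, one option is to compute the correlation and overlay a fitted line. The sketch below uses base R's cor(), lm(), and abline() on the same two mtcars columns; the variable names and the temporary output path are assumptions for illustration.

```r
# Strength and direction of the linear relationship between weight and mileage.
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
print(wt_mpg_cor)

# Simple linear regression of mileage on weight.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(file = out_file)

plot(mtcars$wt, mtcars$mpg, xlab = "Weight", ylab = "Mileage",
     main = "Weight vs Mileage")
# Draw the fitted regression line on top of the points.
abline(fit, col = "red")

invisible(dev.off())
```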

Output:
Scatterplot using ggplot2
In R, there is another way to create a scatterplot: with the help of the ggplot2 package.

The ggplot2 package provides the ggplot() and geom_point() functions for creating a scatterplot. The
ggplot() function takes a series of inputs. The first parameter is the input data frame, and the
second is the aes() function, in which we map the x-axis and y-axis.

Let's start understanding how the ggplot2 package is used with the help of an example where we
have used the familiar dataset "mtcars".

Example
#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot_ggplot.png")
# Plotting the chart using ggplot() and geom_point() functions.
ggplot(mtcars, aes(x = drat, y = mpg)) +geom_point()
# Saving the file.
dev.off()

Output:

We can also add more features to make scatter plots more attractive. Below are some examples
in which different parameters are added.

Example 1: Scatterplot with groups


#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
# Plotting the chart using ggplot() and geom_point() functions.
#The aes() function inside the geom_point() function controls the color of the group.
ggplot(mtcars, aes(x = drat, y = mpg)) +
geom_point(aes(color=factor(gear)))
# Saving the file.
dev.off()

Output:

Example 2: Changes in axis


#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
# Plotting the chart using ggplot() and geom_point() functions.
#The aes() function inside the geom_point() function controls the color of the group.
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +geom_point(aes(color=factor(gear)))
# Saving the file.
dev.off()

Output:

Example 3: Scatterplot with fitted values


#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
#Creating scatterplot with fitted values.
# An additional function, stat_smooth, is used for linear regression.
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +geom_point(aes(color = factor(gear))) +
stat_smooth(method = "lm",col = "#C42126",se = FALSE,size = 1)
#In the above example, lm is used for linear regression and se stands for standard error.
# Saving the file.
dev.off()

Output:
Adding information to the graph
Example 4: Adding title
#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
#Creating scatterplot with fitted values.
# An additional function, stat_smooth, is used for linear regression.
new_graph<-
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +geom_point(aes(color = factor(gear))) +
stat_smooth(method = "lm",col = "#C42126",se = FALSE,size = 1)
#In the above example, lm is used for linear regression and se stands for standard error.
new_graph+
labs(
title = "Scatterplot with more information"
)
# Saving the file.
dev.off()
Output:

Example 5: Adding title with dynamic name


#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
#Creating scatterplot with fitted values.
# An additional function, stat_smooth, is used for linear regression.
new_graph<-
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +geom_point(aes(color = factor(gear))) +
stat_smooth(method = "lm",col = "#C42126",se = FALSE,size = 1)
#In the above example, lm is used for linear regression and se stands for standard error.
#Finding mean of mpg
mean_mpg<- mean(mtcars$mpg)
#Adding title with dynamic name
new_graph + labs(
title = paste("Adding additional information. Average mpg is", mean_mpg)
)
# Saving the file.
dev.off()

Output:

Example 6: Adding a sub-title


#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
#Creating scatterplot with fitted values.
# An additional function, stat_smooth, is used for linear regression.
new_graph<-
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +geom_point(aes(color = factor(gear))) +
stat_smooth(method = "lm",col = "#C42126",se = FALSE,size = 1)
#In the above example, lm is used for linear regression and se stands for standard error.
#Adding title, subtitle, and caption
new_graph + labs(
title =
"Relation between miles per gallon and drat",
subtitle =
"Relationship broken down by gear class",
caption = "Author's own computation"
)
# Saving the file.
dev.off()

Output:

Example 7: Changing name of x-axis and y-axis


#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
#Creating scatterplot with fitted values.
# An additional function, stat_smooth, is used for linear regression.
new_graph<-
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +geom_point(aes(color = factor(gear))) +
stat_smooth(method = "lm",col = "#C42126",se = FALSE,size = 1)
#In the above example, lm is used for linear regression and se stands for standard error.
#Changing the axis and legend names; x is log(mpg) and y is log(drat)
new_graph + labs(
x = "Miles per gallon",
y = "Drat",
color = "Gear",
title = "Relation between miles per gallon and drat",
subtitle = "Relationship broken down by gear class",
caption = "Author's own computation"
)
# Saving the file.
dev.off()

Output:

Example 8: Adding theme


#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
#Creating scatterplot with fitted values.
# An additional function, stat_smooth, is used for linear regression.
new_graph<-
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) +geom_point(aes(color = factor(gear))) +
stat_smooth(method = "lm",col = "#C42126",se = FALSE,size = 1)
#In the above example, lm is used for linear regression and se stands for standard error.
#Adding a theme together with the labels; x is log(mpg) and y is log(drat)
new_graph+
theme_dark() +
labs(
x = "Miles per gallon, in log",
y = "Drat, in log",
color = "Gear",
title = "Relation between miles per gallon and drat",
subtitle = "Relationship broken down by gear class",
caption = "Author's own computation"
)
# Saving the file.
dev.off()

Output:
