R Data Visualization
In R, we can create visually appealing data visualizations by writing a few lines of code. For this
purpose, we use the diverse graphics functionality of R and its packages.
Data visualization is an efficient technique for gaining insight about data through a visual
medium. With the help of visualization techniques, a human can easily obtain information
about hidden patterns in data that might be neglected.
By using data visualization techniques, we can work with large datasets and efficiently obtain key
insights about them.
R Visualization Packages
R provides a series of packages for data visualization. These packages are as follows:
1) plotly
The plotly package provides interactive, high-quality graphs for the web. This package builds on
the JavaScript library plotly.js.
2) ggplot2
The ggplot2 package allows us to create graphics declaratively.
This package is famous for its elegant, high-quality graphs, which set it apart from other
visualization packages.
3) tidyquant
tidyquant is a financial package that is used for carrying out quantitative financial
analysis.
It fits into the tidyverse as a package for importing, analyzing, and visualizing
financial data.
4) taucharts
Data plays an important role in taucharts. The library provides a declarative interface for rapid
mapping of data fields to visual properties.
5) ggiraph
It is a tool that allows us to create dynamic ggplot graphs. This package allows us to add
tooltips, JavaScript actions, and animations to the graphics.
6) geofacets
7) googleVis
googleVis provides an interface between R and Google's charts tools. With the help of this
package, we can create web pages with interactive charts based on R data frames.
8) RColorBrewer
This package provides color schemes for maps and other graphics, which are designed by
Cynthia Brewer.
9) dygraphs
The dygraphs package is an R interface to the dygraphs JavaScript charting library. It provides
rich features for charting time-series data in R.
10) shiny
R Graphics
Graphics play an important role in bringing out the important features of the data.
Graphics are used to examine marginal distributions, relationships between variables, and
summaries of very large data sets.
They are a very important complement to many statistical and computational techniques.
Standard Graphics
R standard graphics are available through the graphics package, which includes several functions that provide
statistical plots, such as:
o Scatterplots
o Pie charts
o Boxplots
o Bar plots, etc.
Each of these graphs can typically be produced with a single function call.
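As a minimal sketch of the point above, each of the standard plot types can be drawn with one call from the graphics package. The data values and names below are made up for illustration, and the plots are written to a temporary PDF file (an assumption of this sketch) so that the code runs without a screen device.

```r
# Illustrative data (assumed values, chosen only for this sketch).
sales  <- c(North = 20, South = 65, East = 15, West = 50)
height <- c(150, 160, 165, 172, 180)
weight <- c(52, 58, 63, 70, 79)

# Send the plots to a temporary PDF so no screen device is needed.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

plot(height, weight)   # scatterplot
pie(sales)             # pie chart
boxplot(weight)        # boxplot
barplot(sales)         # bar plot

# Close the device to finish writing the file.
dev.off()
```

Each call starts a new page in the PDF, so the file ends up with four pages, one per plot type.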
Graphics Devices
A graphics device is the place where a plot appears.
It can be a window on your computer (screen device), a PDF file (file device), a Scalable
Vector Graphics (SVG) file (file device), or a PNG or JPEG file (file device).
The following points are essential to understand:
o Plotting functions produce their output on the active graphics device.
o The screen is the default and most frequently used device.
o File devices such as the PDF device, the JPEG device, etc. are used to save plots to files.
o We only need to open the graphics output device that we want; R takes care of producing
the type of output required by that device.
o The R code for producing a certain plot on the screen or in a graphics file is exactly the
same. We only need to open the target output device beforehand.
o Several devices can be open at the same time, but only one of them is the active device.
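The points above can be sketched with the base functions dev.cur(), dev.set(), and dev.off(). The two temporary PDF files and the variable names are assumptions made for this illustration; any other file device could be opened in the same way.

```r
# Two temporary output files (hypothetical names for this sketch).
first_file  <- tempfile(fileext = ".pdf")
second_file <- tempfile(fileext = ".pdf")

# Open two file devices; the most recently opened one becomes active.
pdf(first_file)
first_dev <- dev.cur()
pdf(second_file)
second_dev <- dev.cur()

# The plotting code is the same for any device; it draws on the active one.
plot(1:10, main = "Drawn on the second device")

# Switch the active device and draw there instead.
dev.set(first_dev)
active_after_switch <- dev.cur()
plot(10:1, main = "Drawn on the first device")

# Close both devices so that the files are written.
dev.off(first_dev)
dev.off(second_dev)
```

Only the call that opens the device changes when the output target changes; the plot() calls stay the same.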
1) Data
Data is the most crucial element: it is what gets processed to generate the output.
2) Aesthetic Mappings
Aesthetic mappings are one of the most important elements of a statistical graphic.
They control the relation between graphics variables and data variables.
In a scatter plot, for example, an aesthetic mapping maps the temperature variable of a data set to the x variable.
Similarly, it can map the species of a plant to the color of the dots.
3) Geometric Objects
Geometric objects are used to express each observation by a point using the aesthetic
mappings. It maps two variables in the data set into the x,y variables of the plot.
4) Statistical Transformations
Statistical transformations allow us to compute statistical summaries of the data in the plot.
A statistical transformation takes the data and, for example, approximates it with a regression
line through the x, y coordinates, or counts the occurrences of certain values.
5) Scales
Scales are used to map the data values into values present in the coordinate system of the graphics
device.
6) Coordinate system
The coordinate system plays an important role in the plotting of the data.
o Cartesian
o Polar
7) Faceting
Faceting is used to split the data into subgroups and draw sub-graphs for each group.
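The seven components listed above can be seen together in a single ggplot2 call. This is only a sketch: the choice of the mtcars columns, the axis name, and the temporary PDF output are assumptions made for illustration.

```r
library(ggplot2)

p <- ggplot(data = mtcars,                                       # 1) data
            aes(x = wt, y = mpg)) +                              # 2) aesthetic mappings
  geom_point(aes(color = factor(cyl))) +                         # 3) geometric object
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +      # 4) statistical transformation
  scale_x_continuous(name = "Weight (1000 lbs)") +               # 5) scale
  coord_cartesian() +                                            # 6) coordinate system
  facet_wrap(~ gear)                                             # 7) faceting

# Printing the plot object draws it on the active graphics device.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
print(p)
dev.off()
```

Each component is added as a separate layer with +, so any one of them can be swapped without touching the others.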
Business information is more attractive to look at in visual form, and graphics and charts are easier
to understand than a written document full of text and numbers. Visualization can therefore reach a
wider audience, and it promotes the widespread use of business insights for making better
decisions.
2. Efficiency
Visualization allows us to display a lot of information in a small space. Although the decision-making
process in business is inherently complex and multifunctional, displaying evaluation findings
in a graph can allow companies to organize a lot of interrelated information in useful ways.
3. Location
Visualization applications that use features such as geographic maps and GIS can be particularly
relevant to the wider business when location is an important factor. We can use maps to show business
insights from various locations, and also to consider the seriousness of the issues, the reasons behind
them, and the working groups that address them.
Developing R applications can cost a good amount of money. Small companies in particular may not be
able to spend many resources on purchasing them. To generate reports, many companies employ
professionals to create charts, which can increase costs. Small enterprises often operate in
resource-limited settings, yet receiving evaluation results in a timely manner can be of high
importance to them.
2. Distraction
At times, data visualization apps create highly complex, fancy, graphics-rich reports and
charts, which may entice users to focus more on the form than on the function. If visual appeal is
put first, the overall value of the graphic representation will be minimal. In resource-limited
settings, it is necessary to understand how resources can best be used and to avoid getting caught up
in the graphics trend without a clear purpose.
R Pie Charts
R programming language has several libraries for creating charts and graphs.
A pie-chart is a representation of values in the form of slices of a circle with different colors.
Slices are labeled with a description, and the numbers corresponding to each slice are also
shown in the chart.
However, pie charts are not recommended in the R documentation, and their features
are limited.
The authors recommend a bar chart or dot chart over a pie chart because people are able to judge
length more accurately than area.
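As a small illustration of that recommendation, the same values used in the pie chart examples of this section can be drawn with barplot() or dotchart() instead. The chart titles and the temporary PDF output are assumptions of this sketch.

```r
# Same values as in the pie chart examples, stored as a named vector.
x <- c(20, 65, 15, 50)
names(x) <- c("India", "America", "Sri Lanka", "Nepal")

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Bar chart: values are compared by bar length.
barplot(x, main = "Country Bar Chart")

# Dot chart: values are compared by position along a common axis.
dotchart(x, main = "Country Dot Chart")

dev.off()
```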
Pie charts are created with the help of the pie() function, which takes a vector of non-negative numbers as
input. Additional parameters are used to control labels, colors, titles, etc.
Here,
x is a vector that contains the numeric values used in the pie chart.
Example
# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Sri Lanka", "Nepal")
# Giving the chart file a name.
png(file = "[Link]")
# Plotting the chart.
pie(x, labels)
# Saving the file.
dev.off()
Output:
Note: The color palette must have as many colors as there are values in the chart, so we use the
length() function to set the palette size.
Let's see an example to understand how a title and colors are added to create an attractive pie
chart.
Example
# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Sri Lanka", "Nepal")
# Giving the chart file a name.
png(file = "title_color.png")
# Plotting the chart with a title and one rainbow color per value.
pie(x, labels, main = "Country Pie chart", col = rainbow(length(x)))
# Saving the file.
dev.off()
Output:
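The legend() function adds a legend to the chart. It has the following syntax:
legend(x,y=NULL,legend,fill,col,bg)
Here,
x and y are the coordinates used to position the legend; x can also be a keyword such as "topright".
legend is the text of the legend.
fill gives the colors used to fill the boxes beside the legend text.
col gives the color of the points or lines that appear in the legend.
bg is the background color of the legend box.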
Example
# Creating data for the graph.
x <- c(20, 65, 15, 50)
labels <- c("India", "America", "Sri Lanka", "Nepal")
# Converting each value to its percentage of the total.
pie_percent <- round(100 * x / sum(x), 1)
# Giving the chart file a name.
png(file = "per_pie.png")
# Plotting the chart with percentages as slice labels.
pie(x, labels = pie_percent, main = "Country Pie Chart", col = rainbow(length(x)))
# Adding a legend that maps each color to its country.
legend("topright", labels, cex = 0.8,
fill = rainbow(length(x)))
# Saving the file.
dev.off()
Output:
Example
# Getting the library.
library(plotrix)
# Creating data for the graph.
x <- c(20, 65, 15, 50, 45)
labels <- c("India", "America", "Sri Lanka", "Nepal", "Bhutan")
# Give the chart file a name.
png(file = "3d_pie_chart1.png")
# Plot the chart.
pie3D(x, labels = labels, explode = 0.1, main = "Country Pie Chart")
# Save the file.
dev.off()
Output:
Example
# Getting the library.
library(plotrix)
# Creating data for the graph.
x <- c(20, 65, 15, 50, 45)
labels <- c("India", "America", "Sri Lanka", "Nepal", "Bhutan")
# Converting each value to its percentage of the total.
pie_percent <- round(100 * x / sum(x), 1)
# Giving the chart file a name.
png(file = "three_D_pie.png")
# Plotting the chart.
pie3D(x, labels = pie_percent, main = "Country Pie Chart", col = rainbow(length(x)))
legend("topright", labels, cex = 0.8,
fill = rainbow(length(x)))
# Saving the file.
dev.off()
Output:
R Bar Charts
A bar chart is a pictorial representation in which numerical values of variables are represented
by the length or height of lines or rectangles of equal width.
A bar chart is used for summarizing a set of categorical data. In a bar chart, the data is shown
through rectangular bars whose length is proportional to the value of the
variable.
In R, we can create a bar chart to visualize the data in an efficient manner. For this purpose, R provides
the barplot() function, which has the following syntax:
barplot(H, xlab, ylab, main, names.arg, col)
Here,
1. H: A vector or matrix which contains the numeric values used in the bar chart.
Example
# Creating the data for Bar chart
H<- c(12,35,54,3,41)
# Giving the chart file a name
png(file = "bar_chart.png")
# Plotting the bar chart
barplot(H)
# Saving the file
dev.off()
Output:
Let's see an example to understand how labels, titles, and colors are added in our bar chart.
Example
# Creating the data for Bar chart
H <- c(12,35,54,3,41)
M<- c("Feb","Mar","Apr","May","Jun")
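The listing above stops after defining the data. A minimal sketch of how labels, a title, and colors might be added with barplot() follows; the axis labels, title, color, and temporary PDF output are assumptions chosen for illustration and are not taken from the original example.

```r
# Data from the example above.
H <- c(12, 35, 54, 3, 41)
M <- c("Feb", "Mar", "Apr", "May", "Jun")

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# names.arg labels each bar; barplot() returns the bar midpoints.
mids <- barplot(H, names.arg = M, xlab = "Month", ylab = "Value",
                col = "steelblue", main = "Monthly Values")

dev.off()
```

The returned midpoints can be reused, for example with text(), to place annotations above the bars.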
R Boxplot
A boxplot shows how the values in a data set are distributed.
It splits the data set at three quartiles.
The graph represents the minimum, maximum, median, first quartile, and third quartile of
the data set. Boxplots are also useful for comparing the distribution of data across several groups by
drawing a boxplot for each of them.
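The five values described above can be inspected directly. The sketch below uses fivenum() and the plot = FALSE option of boxplot(), both from base R; the choice of the mtcars columns is an assumption made for illustration.

```r
# Five-number summary of miles per gallon in the built-in mtcars data.
mpg <- mtcars$mpg
five <- fivenum(mpg)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
print(five)

# boxplot() computes the statistics it draws; plot = FALSE returns them
# without drawing anything.
stats <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# One column per group; rows are lower whisker, lower hinge, median,
# upper hinge, and upper whisker.
print(stats$stats)

# Number of observations in each group.
print(stats$n)
```

Note that the whiskers drawn by boxplot() stop at the most extreme points within 1.5 times the box length, so they can differ from the raw minimum and maximum when outliers are present.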
R provides the boxplot() function to create a boxplot. Its main parameters are:
1. x: It is a vector or a formula.
2. data: It is the data frame that contains the variables used in the formula.
3. notch: It is a logical value; set it to TRUE to draw a notch on each box.
4. varwidth: It is also a logical value; set it to TRUE to draw box widths proportional to the square roots of
the sample sizes.
5. names: It is the group of labels that will be printed under each boxplot.
Let's see an example to understand how we can create a boxplot in R. In the example below, we will
use the "mtcars" dataset that ships with R. We will use only two of its columns, "mpg"
and "cyl". The example creates a boxplot of the relation between mpg and cyl, i.e.,
miles per gallon and number of cylinders, respectively.
Example
# Giving a name to the chart file.
png(file = "[Link]")
# Plotting the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Quantity of Cylinders",
ylab = "Miles Per Gallon", main = "R Boxplot Example")
# Saving the file.
dev.off()
Output:
Boxplot using notch
In R, we can draw a boxplot using a notch. It helps us to find out how the medians of different data
groups match with each other. Let's see an example to understand how a boxplot graph is created
using notch for each of the groups.
Example
# Giving a name to our chart.
png(file = "boxplot_using_notch.png")
# Plotting the chart.
boxplot(mpg ~ cyl, data = mtcars,
xlab = "Quantity of Cylinders",
ylab = "Miles Per Gallon",
main = "Boxplot Example",
notch = TRUE,
varwidth = TRUE,
col = c("green","yellow","red"),
names = c("High","Medium","Low")
)
# Saving the file.
dev.off()
Output:
Violin Plots
A violin plot is an additional plot type that combines a boxplot and
a kernel density plot. Violin plots are created with the help of the vioplot() function from the
vioplot package.
Example
# Loading the vioplot package
library(vioplot)
# Giving a name to our chart.
png(file = "[Link]")
#Creating data for vioplot function
x1 <- mtcars$mpg[mtcars$cyl==4]
x2 <- mtcars$mpg[mtcars$cyl==6]
x3 <- mtcars$mpg[mtcars$cyl==8]
#Creating vioplot function
vioplot(x1, x2, x3, names=c("4 cyl", "6 cyl", "8 cyl"),
col="green")
#Setting title
title("Violin plot example")
# Saving the file.
dev.off()
Output:
Let's see an example to understand how we can create a two-dimensional boxplot extension (a bagplot) in R.
Example
# Loading aplpack package
library(aplpack)
# Giving a name to our chart.
png(file = "[Link]")
# Drawing the bagplot of car weight against miles per gallon.
bagplot(mtcars$wt, mtcars$mpg, xlab="Car Weight", ylab="Miles Per Gallon",
main="2D Boxplot Extension")
# Saving the file.
dev.off()
Output:
R Histogram
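A histogram is a type of bar chart which shows how many values fall into each of a set of
value ranges. A histogram is used to show a distribution, whereas a bar chart
is used for comparing different entities. In a histogram, the height of each bar represents the
number of values present in the given range.
For creating a histogram, R provides the hist() function, which takes a vector as input and accepts further
parameters that add more functionality. The hist() function has the following syntax:
hist(v,main,xlab,ylab,xlim,ylim,breaks,col,border)
Here,
1. v: It is a vector that contains the numeric values.
2. main: It is the title of the chart.
3. xlab, ylab: They are the labels for the x-axis and y-axis.
4. xlim, ylim: They specify the range of values on the x-axis and y-axis.
5. breaks: It controls the break points between the bars, for example a suggested number of bins.
6. col: It sets the color of the bars.
7. border: It sets the border color of the bars.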
Let's see an example in which we create a simple histogram with the help of required parameters
like v, main, col, etc.
Example
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60)
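The listing above stops after defining the data. A minimal sketch of the remaining steps might look as follows; the title, axis label, colors, and temporary PDF output are assumptions chosen for illustration.

```r
# Data from the example above.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Drawing the histogram with a title, an axis label, and colors.
h <- hist(v, main = "Histogram of v", xlab = "Value",
          col = "green", border = "red")

dev.off()
```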
Output:
Let's see some more examples in which we have used different parameters of the hist() function to add
more functionality or to create a more attractive chart.
Output:
Example: Finding return value of hist()
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60)
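This listing also ends after the data. As a hedged sketch of what inspecting the return value could look like, hist() can be called with plot = FALSE so that it only returns the computed object; the variable name m is an assumption of this sketch.

```r
# Data from the example above.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60)

# plot = FALSE computes the bins without drawing anything.
m <- hist(v, plot = FALSE)

# The result is a list of class "histogram".
print(class(m))
print(m$breaks)   # boundaries of the bins
print(m$counts)   # number of values in each bin
print(m$mids)     # midpoint of each bin
```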
Output:
Example: Using histogram return values for labels using text()
# Creating data for the graph.
v <- c(12,24,16,38,21,13,55,17,39,10,60,120,40,70,90)
# Giving a name to the chart file.
png(file = "histogram_return.png")
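The listing above is cut off after opening the device. The sketch below shows one way the returned mids and counts might be used with text() to label the bars; it is self-contained and writes to a temporary PDF, and the title, axis label, and color are assumptions chosen for illustration.

```r
# Data from the example above.
v <- c(12, 24, 16, 38, 21, 13, 55, 17, 39, 10, 60, 120, 40, 70, 90)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Compute the bins first so the y-axis can leave room for the labels.
m <- hist(v, plot = FALSE)
plot(m, ylim = c(0, max(m$counts) + 1), main = "Histogram with counts",
     xlab = "Value", col = "lightblue")

# Write each count just above the middle of its bar.
text(m$mids, m$counts, labels = m$counts, adj = c(0.5, -0.5))

dev.off()
```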
Output:
R Line Graphs
A line graph is a pictorial representation of information which changes continuously over
time.
A line graph can also be referred to as a line chart. Within a line graph, there are points
connecting the data to show the continuous change.
The lines in a line graph can move up and down based on the data. We can use a line graph
to compare different events, information, and situations.
A line chart is used to connect a series of points by drawing line segments between them. Line charts
are used in identifying the trends in data. For line graph construction, R provides plot() function,
which has the following syntax:
plot(v,type,col,xlab,ylab)
Here,
1. v: It is a vector which contains the numeric values.
2. type: This parameter takes the value "l" to draw only the lines, "p" to
draw only the points, and "o" to draw both lines and points.
3. xlab: It is the label for the x-axis.
4. ylab: It is the label for the y-axis.
5. col: It is used to give the color for both the points and lines.
Let's see a basic example to understand how the plot() function is used to create a line graph:
Example
# Creating the data for the chart.
v <- c(13,22,28,7,31)
# Giving a name to the chart file.
png(file = "line_graph.png")
# Plotting the line chart.
plot(v, type = "o")
# Saving the file.
dev.off()
Output:
Example
# Creating the data for the chart.
v <- c(13,22,28,7,31)
# Giving a name to the chart file.
png(file = "line_graph_feature.png")
# Plotting the line chart with a color and axis labels.
plot(v, type = "o", col = "green", xlab = "Month", ylab = "Temperature")
# Saving the file.
dev.off()
Output:
Line Charts Containing Multiple Lines
In our previous examples, we created line graphs containing only one line in each graph. R allows us
to create a line graph containing multiple lines. R provides lines() function to create a line in the line
graph.
The lines() function takes an additional input vector for creating a line. Let's see an example to
understand how this function is used:
Example
# Creating the data for the chart.
v <- c(13,22,28,7,31)
w <- c(11,13,32,6,35)
x <- c(12,22,15,34,35)
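# Giving a name to the chart file.
png(file = "multi_line_graph.png")
# Plotting the first line; ylim covers all three series so that no line is clipped.
plot(v, type = "o", col = "green", xlab = "Month", ylab = "Temperature", ylim = range(v, w, x))
# Adding the other two lines to the same plot.
lines(w, type = "o", col = "red")
lines(x, type = "o", col = "blue")
# Saving the file.
dev.off()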
Output:
Line Graph using ggplot2
In R, there is another way to create a line graph, i.e., the use of the ggplot2 package. The ggplot2 package
provides the geom_line(), geom_step(), and geom_path() functions to create line graphs. To use these
functions, we first have to install the ggplot2 package and then load it into the current
session.
Let's see an example to understand how ggplot2 is used to create a line graph. In the example below,
we will use a small data frame derived from the predefined ToothGrowth dataset, which describes the
effect of vitamin C on tooth growth in guinea pigs.
Example
library(ggplot2)
# Creating data for the graph.
data_frame <- data.frame(dose = c("D0.5", "D1", "D2"),
len = c(4.2, 10, 29.5))
head(data_frame)
png(file = "multi_line_graph2.png")
# Basic line plot with points
ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) + geom_line() + geom_point()
# Change the line type
ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) + geom_line(linetype = "dashed") + geom_point()
# Change the color
ggplot(data = data_frame, aes(x = dose, y = len, group = 1)) + geom_line(color = "red") + geom_point()
# Saving the file.
dev.off()
Output:
R Scatterplots
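Scatter plots are used to compare variables.
A comparison between variables is required when we need to determine how much one variable
is affected by another variable.
In a scatterplot, the data is represented as a collection of points.
Each point on the scatterplot defines the values of the two variables.
One variable is selected for the vertical axis and the other for the horizontal axis.
In R, there are two ways of creating a scatterplot, i.e., using the plot() function and using the
functions of the ggplot2 package.
The plot() function takes, among others, the following parameters:
x and y: They are the data for the horizontal and vertical axes.
main: It is the title of the graph.
xlab, ylab: They are the labels for the x-axis and y-axis.
xlim, ylim: They are the ranges of values shown on the x-axis and y-axis.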
Let's see an example to understand how we can construct a scatterplot using the plot function. In
our example, we will use the dataset "mtcars", which is the predefined dataset available in the R
environment.
Example
#Fetching two columns from mtcars
data <-mtcars[,c('wt','mpg')]
# Giving a name to the chart file.
png(file = "[Link]")
# Plotting the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = data$wt, y = data$mpg, xlab = "Weight", ylab = "Mileage", xlim = c(2.5,5), ylim = c(15,30),
main = "Weight v/s Mileage")
# Saving the file.
dev.off()
Output:
Scatterplot using ggplot2
In R, there is another way of creating a scatterplot, i.e., with the help of the ggplot2 package.
The ggplot2 package provides the ggplot() and geom_point() functions for creating a scatterplot. The
ggplot() function takes a series of inputs. The first parameter is the input data frame, and the
second is the aes() function, in which we specify the variables for the x-axis and y-axis.
Let's start understanding how the ggplot2 package is used with the help of an example where we
have used the familiar dataset "mtcars".
Example
#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "scatterplot_ggplot.png")
# Plotting the chart using ggplot() and geom_point() functions.
ggplot(mtcars, aes(x = drat, y = mpg)) +geom_point()
# Saving the file.
dev.off()
Output:
We can also add more features to make a more attractive scatter plot. Below are some examples
in which different parameters are added.
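As one hedged illustration of such additions, the sketch below colors the points by the number of gears and sets readable axis and legend labels. The label text, point size, and temporary PDF output are assumptions made for this sketch.

```r
library(ggplot2)

# Points colored by the number of gears, with explicit labels.
p <- ggplot(mtcars, aes(x = drat, y = mpg)) +
  geom_point(aes(color = factor(gear)), size = 2) +
  labs(x = "Rear axle ratio", y = "Miles per gallon", color = "Gears")

# Printing the plot object draws it on the active graphics device.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
print(p)
dev.off()
```

Wrapping gear in factor() makes ggplot2 treat it as a category, so each gear count gets its own distinct color instead of a continuous gradient.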
Adding information to the graph
Example 4: Adding title
#Loading ggplot2 package
library(ggplot2)
# Giving a name to the chart file.
png(file = "[Link]")
# Creating a scatterplot with fitted values.
# An additional function, stat_smooth(), is used to add a linear regression line.
new_graph <-
ggplot(mtcars, aes(x = log(mpg), y = log(drat))) + geom_point(aes(color = factor(gear))) +
stat_smooth(method = "lm", col = "#C42126", se = FALSE, size = 1)
# In the above example, "lm" requests a linear model fit, and se = FALSE hides the standard error band.
new_graph +
labs(
title = "Scatterplot with more information"
)
# Saving the file.
dev.off()
Output: