Graphics with R
There are two types of graphical functions: high-level and low-level. A high-level
function creates a new graph, whereas a low-level function adds elements to an
existing graph. Graphs are drawn according to graphical parameters, which have default
values and can be modified with the par() function.
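As a minimal sketch of how par() is typically used: par() returns the previous values of the parameters it changes, so they can be saved and restored afterwards. The panel layout, margin values and the name old_par are illustrative choices, not taken from the text above.

```r
# Save the old settings while switching to a 1 x 2 panel layout.
# mfrow and mar are standard graphical parameters; the values are illustrative.
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

x <- seq(0, 5, by = 0.5)
plot(x, x^2, main = "Left panel")       # high-level call: starts a new graph
plot(x, sqrt(x), main = "Right panel")  # second high-level call: next panel

# Restore the settings that were in force before the change.
par(old_par)
```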
The following are certain functions used for standard plots.
Function       Name of the plot
plot()         Scatterplot
hist()         Histogram
boxplot()      Box-and-whisker plot
stripchart()   Stripchart
barplot()      Bar diagram
stem()         Stem-and-leaf plot
The plot(), hist() and boxplot() functions take extra arguments that control the
graphic. Some of the possible arguments are:
Argument   Explanation
main=      Title
xlab=      Label for X-axis
ylab=      Label for Y-axis
xlim=      Specifies x-limits
ylim=      Specifies y-limits
type=      Type of plot: "p" for points, "l" for lines, "o" for lines through points
pch=       The style of points (plotting character). It can be a number between 0 and 25:
           0 is a square, 1 is a circle, 2 is a triangle, 3 is a plus, and so on
lty=       The style of lines plotted. Line type (lty) can be specified using either text
           ("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash") or
           the corresponding number (0, 1, 2, 3, 4, 5, 6)
cex=       Magnification factor. cex stands for "character expansion". It scales the size
           of text and plotting symbols relative to the default value of 1.
Some of the low-level plotting functions are:
Function   Explanation
lines()    Lines
abline()   Line given by intercept and slope
points()   Points
text()     Text in the plot
legend()   Legend (key to the symbols, lines and colours used)
Example:
x = seq(0, 5, by = 0.5)
y = x
plot(x, y, type = "o", pch = 1, lty = 1, xlab = "x", ylab = "y", xlim = c(0, 5), ylim = c(0, max(y)), cex = 0.7)
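The low-level functions listed above act only on a graph that a high-level call has already created. The following sketch adds elements to such a graph one at a time. The second series y2, the line drawn by abline() and the text position are assumptions made purely for illustration.

```r
x <- seq(0, 5, by = 0.5)
y <- x
y2 <- x^2 / 5                         # illustrative second series

# High-level call: creates the graph.
plot(x, y, type = "o", pch = 1, lty = 1, xlab = "x", ylab = "y",
     xlim = c(0, 5), ylim = c(0, 5))

# Low-level calls: each one adds to the existing graph.
lines(x, y2, lty = 2)                 # dashed curve through (x, y2)
points(x, y2, pch = 2)                # triangles on that curve
abline(a = 1, b = 0.5, lty = 3)       # dotted line with intercept 1, slope 0.5
text(2, 4.5, "y = x and y = x^2 / 5") # text centred at the point (2, 4.5)
legend("bottomright", legend = c("y = x", "y = x^2 / 5"),
       lty = c(1, 2), pch = c(1, 2))
```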
Simple Bar Plot in R
A simple bar plot is used to represent categorical data with rectangular bars, where the height (or
length) of each bar is proportional to the corresponding value.
In R, bar plots are drawn using the barplot() function.
Syntax
barplot(values, names.arg = , main = , xlab = , ylab = , col = , horiz = )
Argument    Explanation
values      Numeric vector representing the values (mandatory)
names.arg   Names or labels for each bar
main        Title of the bar plot
xlab        Label for X-axis
ylab        Label for Y-axis
col         Color of bars
horiz       Logical value; FALSE for vertical (default), TRUE for horizontal
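A short sketch of the horiz argument in use. barplot() returns the midpoints of the bars, which is convenient for adding the values with the low-level text() function. The book counts, the colour and the names books, days and mids are made up for this illustration.

```r
# Hypothetical data: number of books borrowed on each weekday.
books <- c(12, 18, 9, 15, 21)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri")

# horiz = TRUE draws horizontal bars, so the values run along the x-axis.
# xlim is widened so that the value labels fit inside the plot.
mids <- barplot(books, names.arg = days, horiz = TRUE, col = "skyblue",
                main = "Books Borrowed per Day",
                xlab = "Number of books", ylab = "Day", xlim = c(0, 25))

# With horizontal bars, mids holds the y-coordinate of the centre of each bar.
text(books + 1, mids, labels = books)
```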
Example I
The following data shows the number of students in different classes
Class A B C D
No. of students 40 35 50 45
R Program
students <- c(40, 35, 50, 45)
classes <- c("Class A", "Class B", "Class C", "Class D")
barplot(students, names.arg = classes, main = "Number of Students in Each Class",
        xlab = "Classes", ylab = "Number of Students", col = "green", horiz = FALSE)
Sub-Divided Bar Chart (Stacked Bar Chart) in R
A sub-divided bar chart is a bar diagram in which each bar is divided into segments, showing the
composition of a total across different categories. Each bar represents a group, and the sub-
divisions represent components of that group.
Uses
To compare total values and their internal composition
To show distribution of parts within a whole
Data Structure in R
In R, sub-divided bar charts are drawn from a matrix (a data frame must first be converted with as.matrix()), where:
Rows → sub-categories (components)
Columns → main categories (groups)
Syntax
barplot(values,
        main = "Title of Chart",
        xlab = "X-axis label",
        ylab = "Y-axis label",
        col = colors,
        beside = FALSE,
        legend.text = TRUE)
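Argument      Explanation
values        Matrix containing component values
main          Title of the bar chart
xlab          Label for x-axis
ylab          Label for y-axis
col           Colors for different components
beside        FALSE (default) stacks the components in one bar; TRUE draws them side by side
legend.text   TRUE displays a legend built from the row names of the matrix; a vector of
              labels may be given instead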
Note: Instead of using legend.text, we can place the legend with the locator function:
legend(locator(1), legend = c("names"), fill = c("colours"), horiz = TRUE)
The locator function lets us choose the location of the legend by clicking on the plot. The
argument 1 means that one point is to be located. The argument legend in the legend function
is the text for the legend, and the argument fill specifies the colours of the boxes next to the
legend text. The argument horiz places the legend horizontally when TRUE; the default is vertical.
Example:
Marks obtained by students in three subjects are as follows. Represent it by a subdivided bar
diagram
Subject     Student A   Student B   Student C
Maths       45          35          30
Physics     25          30          35
Chemistry   20          25          30
R Program
marks <- matrix(c(45, 35, 30, 25, 30, 35, 20, 25, 30), nrow = 3, byrow = TRUE)
rownames(marks) <- c("Maths", "Physics", "Chemistry")
colnames(marks) <- c("Student A", "Student B", "Student C")
marks
barplot(marks,
        main = "Marks of Students",
        xlab = "Students",
        ylab = "Marks",
        col = c("red", "blue", "green"),
        beside = FALSE,
        legend.text = rownames(marks))
Aliter
marks <- matrix(c(45, 35, 30, 25, 30, 35, 20, 25, 30), nrow = 3, byrow = TRUE)
rownames(marks) <- c("Maths", "Physics", "Chemistry")
colnames(marks) <- c("Student A", "Student B", "Student C")
marks
barplot(marks,
        main = "Marks of Students",
        xlab = "Students",
        ylab = "Marks",
        col = c("red", "blue", "green"), beside = FALSE)
legend(locator(1), legend = rownames(marks), fill = c("red", "blue", "green"), horiz = TRUE)
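locator() waits for a mouse click, so it works only on an interactive graphics device. When the code is run as a script, one option is to give legend() a position keyword such as "top" instead. The sketch below assumes that choice and uses the marks from the table above; the extra ylim headroom and the names cols and totals are only there for this illustration.

```r
marks <- matrix(c(45, 35, 30, 25, 30, 35, 20, 25, 30), nrow = 3, byrow = TRUE)
rownames(marks) <- c("Maths", "Physics", "Chemistry")
colnames(marks) <- c("Student A", "Student B", "Student C")
cols <- c("red", "blue", "green")

# In a stacked chart the height of each bar is the column total.
totals <- colSums(marks)

# ylim leaves headroom above the tallest bar so the legend does not cover it.
barplot(marks, main = "Marks of Students", xlab = "Students", ylab = "Marks",
        col = cols, beside = FALSE, ylim = c(0, max(totals) * 1.3))

# A position keyword replaces locator(1); no mouse click is needed.
legend("top", legend = rownames(marks), fill = cols, horiz = TRUE)
```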
Percentage Bar Diagram
It is a modification of the component bar diagram. Here a simple bar diagram is drawn with all
bars of equal length, and each bar is then subdivided into component parts whose lengths are
proportional to their percentages of the respective total. This diagram is used when comparison
of the components is more important than comparison of the totals. That is, a percentage bar
diagram shows the relative percentage of the different components within each category. It is
useful for comparing proportions instead of actual values.
Example:
Marks obtained by students in three subjects are as follows. Represent it by a percentage bar
diagram
Subject     Student A   Student B   Student C
Maths       45          35          30
Physics     25          30          35
Chemistry   20          25          30
R Program
marks <- matrix(c(45, 35, 30, 25, 30, 35, 20, 25, 30), nrow = 3, byrow = TRUE)
rownames(marks) <- c("Maths", "Physics", "Chemistry")
colnames(marks) <- c("Student A", "Student B", "Student C")
marks
colSums(marks)
percent_marks <- t(marks) / colSums(marks) * 100
percent_marks
barplot(t(percent_marks),
        main = "Marks of Students",
        xlab = "Students",
        ylab = "%Marks",
        col = c("red", "blue", "green"), beside = FALSE,
        legend.text = rownames(marks))
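The same percentages can also be obtained with prop.table(), which divides each entry of a matrix by its row or column total. This is offered as an alternative sketch rather than as the method used above; the name pct and the widened xlim (which leaves room for the legend) are choices made for illustration.

```r
marks <- matrix(c(45, 35, 30, 25, 30, 35, 20, 25, 30), nrow = 3, byrow = TRUE)
rownames(marks) <- c("Maths", "Physics", "Chemistry")
colnames(marks) <- c("Student A", "Student B", "Student C")

# margin = 2 divides each entry by the total of its column (student).
pct <- prop.table(marks, margin = 2) * 100
round(pct, 1)

# Each column of percentages adds up to 100, so all bars have equal height.
colSums(pct)

barplot(pct, main = "Marks of Students", xlab = "Students", ylab = "%Marks",
        col = c("red", "blue", "green"), beside = FALSE,
        xlim = c(0, 5.5), legend.text = TRUE)
```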
Multiple Bar Diagram
In this type of diagram, two or more bars are constructed adjoining each other. These bars
represent either different variables or various components of the same variable. The bars of a
given set are drawn adjoining each other, and an identical gap is left between the bars of
different sets. As in simple bar diagrams, the lengths of the bars are proportional to the
magnitudes of the given values, and all bars have the same width. This diagram facilitates
comparison of the values of different variables within a set, and also comparison of the values
of the same variable over a period of time. To make comparison easy, the different bars of a set
may be coloured or shaded differently, but the colour or shade of the bars representing the same
variable or component in different sets should be the same.
Example
The following table shows the sales of products over various years. Represent it by a multiple
bar diagram
Year Product A Product B
2021 40 35
2022 55 50
2023 65 60
R programme:
sales <- matrix(c(40,35,55,50,65,60),nrow = 3, byrow = TRUE)
rownames(sales) <-c("2021","2022","2023")
colnames(sales) <-c("Product A", "Product B")
sales
barplot(sales,
        main = "Yearly Sales",
        xlab = "Product",
        ylab = "Sales",
        col = c("red", "blue", "green"), beside = TRUE)
legend(locator(1), legend = rownames(sales), fill = c("red", "blue", "green"), horiz = FALSE)
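barplot() forms one group of bars for each column of the matrix. If the groups are to be the years rather than the products, one option is to transpose the matrix with t() first. A sketch under that assumption, with the legend placed by keyword rather than with locator(); the names by_year and mids are illustrative.

```r
sales <- matrix(c(40, 35, 55, 50, 65, 60), nrow = 3, byrow = TRUE)
rownames(sales) <- c("2021", "2022", "2023")
colnames(sales) <- c("Product A", "Product B")

# After transposing, rows are products and columns are years,
# so each year becomes one group containing a bar for each product.
by_year <- t(sales)

# With beside = TRUE the returned midpoints form a matrix:
# one row per product, one column per year.
mids <- barplot(by_year, main = "Yearly Sales", xlab = "Year", ylab = "Sales",
                col = c("red", "blue"), beside = TRUE)
legend("topleft", legend = rownames(by_year), fill = c("red", "blue"))
```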
Pie Chart
It is a two-dimensional diagram in which area represents the magnitude of the data. A pie chart
is a circle partitioned into segments, where each segment represents a category. The size of
each segment depends on the value of the corresponding category and is determined by the angle
at the centre. A pie chart represents different values as slices of a circle with different colours
or shadings. In R, the pie chart is created using the pie() function, which takes a vector of
non-negative numbers as input. Additional parameters control the labels, colours, title, etc.
Syntax:
pie(x, labels = , main = , col = , clockwise = )
where
x - vector containing the numeric values used in the pie chart
labels - describes the slices
main - title of the chart
col - colours of the slices
clockwise - logical value indicating whether slices are drawn clockwise (TRUE) or anticlockwise (FALSE, the default)
Example 1
The following table gives the turnover of 4 companies. Represent it by a pie diagram
Tata Birla Ambani Mahindra
100 125 140 95
R program
turn=c(100, 125, 140, 95)
comp=c("Tata","Birla","Ambani","Mahindra")
pie(turn,comp,col=c("red","green","yellow","blue"), main="Turnover")
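The angle of each slice is its share of the total multiplied by 360 degrees. The sketch below works this out for the turnover data and puts the percentage share in each label; the variable names share, angle and lab are illustrative.

```r
turn <- c(100, 125, 140, 95)
comp <- c("Tata", "Birla", "Ambani", "Mahindra")

share <- turn / sum(turn)   # proportion of the total turnover (total = 460)
angle <- 360 * share        # central angle of each slice, in degrees

# Labels of the form "Name (xx.x%)".
lab <- paste0(comp, " (", round(100 * share, 1), "%)")

pie(turn, labels = lab, col = c("red", "green", "yellow", "blue"),
    main = "Turnover")
```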
Example 2
The following table gives the monthly expenditure of a family. Represent it by a pie diagram
Housing Food Clothes Entertainment Others
600 400 200 150 80
R programme:
expenditure=c(600, 400, 200, 150, 80)
item=c("housing","food","clothes","entertainment","others")
pie(expenditure,item,col=c("red","green","yellow","blue","orange"),main="Monthly Expenditure")
Histogram
It is the most commonly used graphical method for presentation of frequency distributions. A
histogram is a graphical representation used in statistics to show the distribution of continuous
numerical data. The data is grouped into class intervals (bins), and the height of each bar shows
the frequency of values or frequency density. A histogram helps us understand the shape of the
data, such as peaks, spread, and whether the data is symmetric or skewed.
The syntax is hist(x, breaks=, xlab=, ylab=, main=, col=, border=)
where
x - vector containing numeric values
breaks - optional; gives the break points between class intervals as a numeric vector, or a
single number suggesting how many classes to use
xlab - description of x axis
ylab - description of y axis
main - title of the chart
col - colour of bars
border - border colour of each bar
Example:
Plot a histogram to the following data
1,5,3,2,7,2,3,4,6,8,25,12,15,11,19,21,23,16,14,10,22,14,13,11,12,18,24,25,21,20,20,8,1,4,2,3
R program:
x=c(1,5,3,2,7,2,3,4,6,8,25,12,15,11,19,21,23,16,14,10,22,14,13,11,12,18,24,25,21,20,20,8,1,4,2,
3)
hist(x,xlab="observations",ylab="frequency",main="histogram",col="blue")
Example:
Plot a histogram to the following data
class       0-20   20-40   40-60   60-80   80-100
frequency   7      12      15      8       5
R program
midx=seq(10,90,20)
midx
freq=c(7,12,15,8,5)
y=rep(midx,freq)
y
breaks=seq(0,100,20)
hist(y, breaks = breaks, xlab = "class", ylab = "frequency", main = "Histogram", col = "white")
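With plot = FALSE, hist() returns the computed classes without drawing anything, which gives a quick check that the data rebuilt from the mid-points reproduce the frequency table. A minimal sketch; the name h is illustrative.

```r
midx <- seq(10, 90, 20)      # class mid-points: 10, 30, 50, 70, 90
freq <- c(7, 12, 15, 8, 5)
y <- rep(midx, freq)         # each mid-point repeated as often as its frequency
breaks <- seq(0, 100, 20)    # class boundaries: 0, 20, ..., 100

# plot = FALSE returns the histogram object instead of drawing it.
h <- hist(y, breaks = breaks, plot = FALSE)
h$counts   # class frequencies recovered from y
h$mids     # class mid-points
```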
Scatter Plot
A scatter plot is used to display bivariate data using characters or symbols, generally dots. It is
mainly used to show the relationship between two quantitative variables for a set of data, i.e., to
find the relationship between two given variables. So, using scatter plot we can easily check
whether variables are correlated or not. In R, a scatter plot is created using the plot() function.
Most importantly, if the type argument of the plot() function is not specified, then by default it
creates a scatter plot.
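A minimal sketch of a scatter plot. The height and weight figures are made-up values used only for illustration.

```r
# Hypothetical data: heights (cm) and weights (kg) of eight people.
height <- c(150, 155, 160, 165, 170, 175, 180, 185)
weight <- c(52, 55, 60, 62, 68, 72, 75, 80)

# The type argument is not given, so plot() draws points by default.
plot(height, weight, pch = 16,
     xlab = "Height (cm)", ylab = "Weight (kg)",
     main = "Scatter Plot of Weight against Height")
```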
Correlation and Regression Analysis
Correlation analysis pertains to the study of interdependence or co-variation of two variables. If
large (small) values of one variable are associated with large (small) values of the other variable,
the correlation is said to be positive (direct) correlation. If large (small) values of one variable
are associated with small (large) values of the other variable, the correlation is said to be
negative (inverse) correlation.
Coefficient of correlation: Scatter diagram gives us only a rough idea about the relationship
between two variables but does not give a numerical measure. Karl Pearson introduced a
numerical measure to evaluate the degree of linear relationship between two variables and it is
called Karl Pearson’s or product moment correlation coefficient. It is defined by
$$ r = \frac{\mathrm{Cov}(x, y)}{\mathrm{SD}(x) \times \mathrm{SD}(y)} $$
The value of r always lies between -1 and 1.
If r =1, then there is a perfect positive linear relationship between variables
If r = -1, then there is a perfect negative linear relationship between variables
If 0.7 ≤ r <1, then there is a strong positive linear relationship between variables
If -1 < r ≤ -0.7, then there is a strong negative linear relationship between variables
If r = 0, then there is no linear relationship between variables.
The R command for computing Karl Pearson’s coefficient of correlation is cor(x,y),
where the vector x stores the values of the first variable and the vector y stores the
values of the second variable.
The function cov(x,y) is used to compute the covariance between the
variables x and y.
Rank Correlation: Spearman suggested a measure of correlation based on ranks of
observations. It is usually used when data is ordinal. The R command is
cor(x,y,method="spearman").
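The commands above can be tried on a small data set. The sketch below uses made-up height and weight values and checks that cor() agrees with the defining formula Cov(x, y) / (SD(x) × SD(y)); the names r, r_formula and rho are illustrative.

```r
# Hypothetical data: heights (cm) and weights (kg) of eight people.
height <- c(150, 155, 160, 165, 170, 175, 180, 185)
weight <- c(52, 55, 60, 62, 68, 72, 75, 80)

r <- cor(height, weight)    # Karl Pearson's coefficient of correlation
r_formula <- cov(height, weight) / (sd(height) * sd(weight))
all.equal(r, r_formula)     # the two computations agree

# Rank correlation. In this data weight increases whenever height increases,
# so the two sets of ranks are identical and the coefficient is 1.
rho <- cor(height, weight, method = "spearman")
rho
```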
Regression Analysis: From the value of correlation coefficient and associated scatter
diagram, one can decide whether two variables are linearly related or not. If there exists a
linear relationship between the variables x and y, we have to establish the linear equation
that connects one variable with the other. Simple linear regression analysis considers a
single independent variable and a single dependent variable. There are situations in which
the choice of dependent or independent variable is not clear. Thus we consider two lines
of regression: y on x and x on y. The line of regression of y on x is used to predict y for
known values of x or in other words, y is the dependent variable and x is the independent
variable. The line of regression of x on y is used to predict the values of x for known
values of y.
The line of regression of y on x is of the form y = ax + b, where a and b are
constants called the slope coefficient and the intercept respectively. The R command to
fit the line of regression of y on x is lm(y~x).
The line of regression of x on y is of the form x = cy + d, where c and d are constants
called the slope coefficient and the intercept respectively. The R command to fit the line of
regression of x on y is lm(x~y).
In R, the predict() function is used to generate predicted values from a fitted model. The
R syntax is predict(object, newdata), where object is the fitted model and newdata is a
data frame containing the new values of the independent variable.
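A short sketch that fits both regression lines and predicts y for new values of x. The data are made up for illustration, and the names fit_yx, fit_xy, new_x and pred are assumptions of this example.

```r
# Hypothetical data for illustration.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

fit_yx <- lm(y ~ x)    # line of regression of y on x
fit_xy <- lm(x ~ y)    # line of regression of x on y
coef(fit_yx)           # intercept first, then the slope coefficient of x

# newdata must be a data frame whose column name matches the predictor.
new_x <- data.frame(x = c(6, 7))
pred <- predict(fit_yx, newdata = new_x)
pred

# The product of the two slope coefficients equals the square of r.
coef(fit_yx)["x"] * coef(fit_xy)["y"]
cor(x, y)^2
```

Note that coef() lists the intercept before the slope, which is the reverse of the order in which a and b appear in y = ax + b.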