R Notes

The document provides an overview of graphical functions in R, distinguishing between high-level and low-level functions for creating and modifying graphs. It covers various plotting functions such as scatterplots, histograms, bar plots, and pie charts, along with their syntax and examples. Additionally, it discusses correlation and regression analysis, including the calculation of correlation coefficients and the establishment of linear relationships between variables.

Uploaded by

akshayaammuzz224
Copyright
© All Rights Reserved

Graphics with R

There are two types of graphical functions: high-level functions and low-level functions. A
high-level function creates a new graph, whereas a low-level function adds elements to an
existing graph. Graphs are drawn according to graphical parameters, which have default values
and can be modified with the par() function.
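
As a minimal sketch of how par() can be used (the data values are made up for illustration), the current settings can be saved, changed for a pair of plots, and then restored:

```r
# par() returns the previous values of the parameters it changes,
# so they can be restored later.
old_par <- par(mfrow = c(1, 2), cex = 0.8)  # two plots side by side, smaller text

scores <- c(12, 15, 9, 20, 17, 14, 11, 18)  # hypothetical data
plot(scores, main = "Scatterplot")
hist(scores, main = "Histogram")

par(old_par)  # restore the earlier settings
```

Saving the value returned by par() avoids having to remember what the defaults were.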

The following are certain functions used for standard plots.

Function        Name of the plot
plot()          Scatterplot
hist()          Histogram
boxplot()       Box-and-whisker plot
stripchart()    Stripchart
barplot()       Bar diagram
stem()          Stem-and-leaf plot

The plot(), hist() and boxplot() functions take extra arguments that control the
graphic. Some of the possible arguments are:

Argument    Explanation
main=       Title
xlab=       Label for X-axis
ylab=       Label for Y-axis
xlim=       Specifies x-limits
ylim=       Specifies y-limits
type=       Type of plot: "p" for points, "l" for lines, "o" for lines through points
pch=        The style of points (plotting character). It can be a number between 0 and 25:
            0 is a square, 1 is a circle, 2 is a triangle, 3 is a plus, and so on
lty=        The style of lines plotted. Line type (lty) can be specified using either text
            ("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash") or
            a number (0, 1, 2, 3, 4, 5, 6)
cex=        Magnification factor. cex stands for "character expansion"; it adjusts the
            size of text and plotting symbols relative to the default

Some low-level plotting functions are:

Function    Explanation
lines()     Lines
abline()    Line given by intercept and slope
points()    Points
text()      Text in the plot
legend()    Legend (list of symbols)

Example:

x <- seq(0, 5, by = 0.5)

y <- x

plot(x, y, type = "o", pch = 1, lty = 1, xlab = "x", ylab = "y", xlim = c(0, 5), ylim = c(0, max(y)), cex = 0.7)
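
The low-level functions listed above can be tried on the same kind of graph. The following is only an illustrative sketch: the extra curve, line, point and label are arbitrary choices and are not part of the original example.

```r
x <- seq(0, 5, by = 0.5)
y <- x

# High-level call: creates the graph
plot(x, y, type = "p", pch = 1, xlab = "x", ylab = "y",
     main = "Adding elements to a plot")

# Low-level calls: each one adds to the existing graph
lines(x, x^2 / 5, lty = 2)             # dashed curve y = x^2 / 5
abline(a = 1, b = 0.5, lty = 3)        # dotted line with intercept 1 and slope 0.5
points(2.5, 2.5, pch = 19, cex = 1.5)  # one filled point
text(2.5, 2.9, "midpoint")             # text label at (2.5, 2.9)
legend("topleft",
       legend = c("y = x", "y = x^2 / 5", "y = 1 + 0.5x"),
       pch = c(1, NA, NA), lty = c(NA, 2, 3))
```

The low-level calls only work after a high-level call such as plot() has opened a graph.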

Simple Bar Plot in R

A simple bar plot is used to represent categorical data with rectangular bars, where the height (or
length) of each bar is proportional to the corresponding value.

In R, bar plots are drawn using the barplot() function.

Syntax

barplot(values, names.arg = , main = , xlab = , ylab = , col = , horiz = )

Argument     Explanation
values       Numeric vector representing the values (mandatory)
names.arg    Names or labels for each bar
main         Title of the bar plot
xlab         Label for X-axis
ylab         Label for Y-axis
col          Color of bars
horiz        Logical value; FALSE for vertical (default), TRUE for horizontal

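As a small sketch of the names.arg and horiz arguments (the rainfall figures are hypothetical), a horizontal bar plot can be drawn as follows. barplot() returns the mid-points of the bars, and text() is used here to write each value next to its bar.

```r
# Hypothetical data: rainfall (mm) in four months
rain <- c(30, 55, 80, 45)
months <- c("Jan", "Feb", "Mar", "Apr")

# barplot() invisibly returns the mid-points of the bars
mid <- barplot(rain, names.arg = months, main = "Monthly Rainfall",
               xlab = "Rainfall (mm)", ylab = "Month",
               col = "skyblue", horiz = TRUE, xlim = c(0, 100))

# Write each value just beyond the end of its bar
text(x = rain, y = mid, labels = rain, pos = 4)
```
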
Example I

The following data shows the number of students in different classes

Class A B C D
No. of students 40 35 50 45

R Program

students <- c(40, 35, 50, 45)

classes <- c("Class A", "Class B", "Class C", "Class D")

barplot(students, names.arg = classes, main = "Number of Students in Each Class",
xlab = "Classes", ylab = "Number of Students", col = "green", horiz = FALSE)

Sub-Divided Bar Chart (Stacked Bar Chart) in R

A sub-divided bar chart is a bar diagram in which each bar is divided into segments, showing the
composition of a total across different categories. Each bar represents a group, and the sub-
divisions represent components of that group.

Uses

• To compare total values and their internal composition

• To show distribution of parts within a whole

Data Structure in R

In R, sub-divided bar charts are usually drawn using a matrix or data frame, where:

• Rows → sub-categories (components)

• Columns → main categories (groups)

Syntax

barplot(values,

main = "Title of Chart",

xlab = "X-axis label",

ylab = "Y-axis label",

col = colors,

beside = FALSE,

legend.text = TRUE)

Argument       Explanation
values         Matrix containing component values
main           Title of the bar chart
xlab           Label for x-axis
ylab           Label for y-axis
col            Colors for different components
beside         FALSE (default) stacks the components; TRUE draws them side by side
legend.text    Displays legend for components

Note: Instead of using legend.text, we can call the legend() function together with locator():

legend(locator(1), legend = c("names"), fill = c("colours"), horiz = TRUE)

The locator() function lets us choose the location of the legend by clicking on the plot; the
argument 1 means that one point is to be located. The legend argument of legend() gives the
text of the legend, and the fill argument specifies the colours of the boxes next to the legend
text. The horiz argument places the legend entries horizontally when TRUE; the default is
vertical.
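
locator() needs an interactive graphics window, so it cannot be used when a script is run without one. As a sketch of one alternative, legend() also accepts a position keyword such as "topright". The matrix below is hypothetical and serves only to produce a chart.

```r
# Hypothetical data: two components (rows) in three groups (columns)
parts <- matrix(c(10, 20, 30,
                  15, 25, 35),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Part 1", "Part 2"), c("G1", "G2", "G3")))
cols <- c("red", "blue")

# ylim leaves some empty space above the bars for the legend
barplot(parts, col = cols, ylim = c(0, 90), main = "Legend placed by keyword")
legend("topright", legend = rownames(parts), fill = cols, horiz = TRUE)
```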

Example:

Marks obtained by students in three subjects are as follows. Represent it by a subdivided bar
diagram

Subject      Student A   Student B   Student C
Maths        45          35          30
Physics      25          30          35
Chemistry    20          25          30

R Program

# Rows are subjects and columns are students, as in the table above
marks <- matrix(c(45, 35, 30, 25, 30, 35, 20, 25, 30), nrow = 3, byrow = TRUE)

rownames(marks) <- c("Maths", "Physics", "Chemistry")

colnames(marks) <- c("Student A", "Student B", "Student C")

marks

barplot(marks,

main = "Marks of Students",

xlab = "Students",

ylab = "Marks",

col = c("red", "blue", "green"),

beside = FALSE,

legend.text = rownames(marks))

Aliter (alternative method)

marks <- matrix(c(45, 35, 30, 25, 30, 35, 20, 25, 30), nrow = 3, byrow = TRUE)

rownames(marks) <- c("Maths", "Physics", "Chemistry")

colnames(marks) <- c("Student A", "Student B", "Student C")

marks

barplot(marks,

main = "Marks of Students",

xlab = "Students",

ylab = "Marks",

col = c("red", "blue", "green"), beside = FALSE)

# Click on the plot to choose where the legend is placed
legend(locator(1), legend = rownames(marks), fill = c("red", "blue", "green"), horiz = TRUE)

Percentage Bar Diagram

It is a modification of the component bar diagram. Here a simple bar diagram is drawn with all
bars of equal length, and each bar is then subdivided into component parts whose lengths are
proportional to their percentages of the respective total. This diagram is used when comparison
of the components is more important than comparison of the totals. That is, a percentage bar
diagram shows the relative percentage of the different components within each category. It is
very useful for comparing proportions instead of actual values.

Example:

Marks obtained by students in three subjects are as follows. Represent it by a percentage bar
diagram

Subject      Student A   Student B   Student C
Maths        45          35          30
Physics      25          30          35
Chemistry    20          25          30

R Program

marks <- matrix(c(45, 35, 30, 25, 30, 35, 20, 25, 30), nrow = 3, byrow = TRUE)

rownames(marks) <- c("Maths", "Physics", "Chemistry")

colnames(marks) <- c("Student A", "Student B", "Student C")

marks

colSums(marks)

# Each student's marks as a percentage of that student's total (rows = students here)
percent_marks <- t(marks) / colSums(marks) * 100

percent_marks

# Transpose back so that each bar (column) is one student
barplot(t(percent_marks),

main = "Marks of Students",

xlab = "Students",

ylab = "% Marks",

col = c("red", "blue", "green"), beside = FALSE,

legend.text = rownames(marks))
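
The same percentages can also be obtained with the base R function prop.table(). This is offered as an alternative sketch, not as the method of these notes; it uses the marks from the table above.

```r
# Marks from the table above (rows = subjects, columns = students)
marks <- matrix(c(45, 35, 30,
                  25, 30, 35,
                  20, 25, 30), nrow = 3, byrow = TRUE,
                dimnames = list(c("Maths", "Physics", "Chemistry"),
                                c("Student A", "Student B", "Student C")))

# margin = 2 divides every entry by its column total
percent_marks <- prop.table(marks, margin = 2) * 100
round(percent_marks, 1)

colSums(percent_marks)  # each column should add up to 100

barplot(percent_marks, main = "Marks of Students", xlab = "Students",
        ylab = "% Marks", col = c("red", "blue", "green"),
        legend.text = rownames(marks))
```

Because prop.table() keeps the rows and columns in place, no transposing is needed before calling barplot().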

Multiple Bar Diagram

In this type of diagram two or more bars are constructed adjoining each other. These bars
represent either different variables or various components of the same variable. The bars of a
given set are drawn adjoining each other, and an identical gap is left between the bars of
different sets. Just as in simple bar diagrams, the lengths of the bars are proportional to the
magnitudes of the given values, and all bars have the same width. This diagram facilitates
comparison of the values of different variables within a set, and also comparison of the values
of the same variable over a period of time. To make comparison easy, the different bars of a
set may be coloured or shaded differently, but the colour or shade of the bars representing the
same variable or component in different sets should be the same.

Example

The following table shows the sales of products over various years. Represent it by a multiple
bar diagram

Year    Product A   Product B
2021    40          35
2022    55          50
2023    65          60

R program:

sales <- matrix(c(40, 35, 55, 50, 65, 60), nrow = 3, byrow = TRUE)

rownames(sales) <- c("2021", "2022", "2023")

colnames(sales) <- c("Product A", "Product B")

sales

# Bars are grouped by column (product); the three bars in each group are the years
barplot(sales,

main = "Yearly Sales",

xlab = "Product",

ylab = "Sales",

col = c("red", "blue", "green"), beside = TRUE)

# Click on the plot to choose where the legend is placed
legend(locator(1), legend = rownames(sales), fill = c("red", "blue", "green"), horiz = FALSE)
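
barplot() draws one group of bars for each column of the matrix. If one group per year is wanted instead, the matrix can be transposed first. A minimal sketch, using the sales figures from the table above and a keyword legend position in place of locator():

```r
sales <- matrix(c(40, 35, 55, 50, 65, 60), nrow = 3, byrow = TRUE,
                dimnames = list(c("2021", "2022", "2023"),
                                c("Product A", "Product B")))

# t() swaps rows and columns, so each year becomes one group of bars
sales_by_year <- t(sales)

barplot(sales_by_year, main = "Yearly Sales", xlab = "Year", ylab = "Sales",
        col = c("red", "blue"), beside = TRUE, ylim = c(0, 80),
        legend.text = rownames(sales_by_year),
        args.legend = list(x = "topleft"))
```

With this layout the two products are compared within each year, and the same colour identifies the same product in every group.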

Pie Chart

It is a two-dimensional diagram in which area represents the magnitude of the data. A pie chart
is a circle partitioned into segments, where each segment represents a category. The size of
each segment depends on the value of the corresponding category and is determined by the angle
at the centre. A pie chart represents different values as slices of a circle with different
colours or shadings. In R, a pie chart is created using the pie() function, which takes a vector
of non-negative numbers as input. Additional parameters control the labels, colours, title, etc.

Syntax:

pie(x, labels = , main = , col = , clockwise = )

where x - vector containing numeric values used in the pie chart

labels - describes slices

main - indicates title of the chart

col - indicates the colour palette

clockwise - a logical value indicating whether slices are drawn clockwise (TRUE) or
anticlockwise (FALSE, the default)

Example 1

The following table gives the turnover of 4 companies. Represent it by a pie diagram

Company     Tata   Birla   Ambani   Mahindra
Turnover    100    125     140      95

R program

turn=c(100, 125, 140, 95)

comp=c("Tata","Birla","Ambani","Mahindra")
pie(turn,comp,col=c("red","green","yellow","blue"), main="Turnover")
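
The angle of each slice is the category's share of the total multiplied by 360 degrees. The sketch below works this out for the turnover data and puts the percentage share in each label; the label format is an illustrative choice.

```r
turn <- c(100, 125, 140, 95)
comp <- c("Tata", "Birla", "Ambani", "Mahindra")

share <- turn / sum(turn)         # proportion of the total for each company
angles <- share * 360             # central angle of each slice, in degrees
percent <- round(share * 100, 1)  # percentage share

data.frame(company = comp, turnover = turn, angle = round(angles, 1), percent)

# Show the percentage next to each company name
pie(turn, labels = paste0(comp, " (", percent, "%)"),
    col = c("red", "green", "yellow", "blue"), main = "Turnover")
```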

Example 2

The following table gives the monthly expenditure of a family. Represent it by a pie diagram

Item           Housing   Food   Clothes   Entertainment   Others
Expenditure    600       400    200       150             80

R program:

expenditure=c(600, 400, 200, 150, 80)

item=c("housing","food","clothes","entertainment","others")

pie(expenditure,item,col=c("red","green","yellow","blue","orange"),main="Monthly Expenditure")

Histogram

It is the most commonly used graphical method for presenting frequency distributions. A
histogram is a graphical representation used in statistics to show the distribution of continuous
numerical data. The data is grouped into class intervals (bins), and the height of each bar shows
the frequency (or the frequency density) of the values in that class. A histogram helps us
understand the shape of the data, such as peaks, spread, and whether the data is symmetric or
skewed.

The syntax is hist(x, breaks = , xlab = , ylab = , main = , col = , border = )

where

x - vector containing numeric values

breaks - optional; gives the class boundaries (a vector of break points) or the number of
classes, and so controls the width of each bar

xlab - description of x axis

ylab - description of y axis

main - title of the chart

col - colour of bars

border - border colour of each bar

Example:

Plot a histogram to the following data

1,5,3,2,7,2,3,4,6,8,25,12,15,11,19,21,23,16,14,10,22,14,13,11,12,18,24,25,21,20,20,8,1,4,2,3

R program:

x=c(1,5,3,2,7,2,3,4,6,8,25,12,15,11,19,21,23,16,14,10,22,14,13,11,12,18,24,25,21,20,20,8,1,4,2,
3)

hist(x,xlab="observations",ylab="frequency",main="histogram",col="blue")
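
To choose the classes explicitly, a vector of class boundaries can be passed as breaks. The sketch below uses the same data with classes of width 5 (an illustrative choice) and inspects the object that hist() returns.

```r
x <- c(1, 5, 3, 2, 7, 2, 3, 4, 6, 8, 25, 12, 15, 11, 19, 21, 23, 16, 14, 10,
       22, 14, 13, 11, 12, 18, 24, 25, 21, 20, 20, 8, 1, 4, 2, 3)

# Class intervals 0-5, 5-10, ..., 20-25; by default each class includes its upper limit
h <- hist(x, breaks = seq(0, 25, by = 5), xlab = "observations",
          ylab = "frequency", main = "Histogram with 5 classes", col = "blue")

h$counts  # frequency in each class
h$mids    # mid-point of each class
```

The counts add up to the number of observations, which is a quick check that the breaks cover the whole range of the data.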

Example:

Plot a histogram to the following data

Class        0-20   20-40   40-60   60-80   80-100
Frequency    7      12      15      8       5

R program

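midx=seq(10,90,20)      # mid-points of the classes
midx
freq=c(7,12,15,8,5)
y=rep(midx,freq)        # repeat each mid-point as many times as its frequency
y
breaks=seq(0,100,20)    # class boundaries
hist(y,breaks=breaks,xlab="class",ylab="frequency",main="Histogram",col="white")

Scatter Plot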

A scatter plot is used to display bivariate data using characters or symbols, generally dots. It
is mainly used to show the relationship between two quantitative variables for a set of data.
Using a scatter plot, we can easily check whether the variables appear to be correlated or not.
In R, a scatter plot is created using the plot() function. If the type argument of plot() is not
specified, it creates a scatter plot by default.
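
A minimal sketch of a scatter plot follows. The data (hours studied and marks obtained) are hypothetical and are used only to show the call.

```r
# Hypothetical data: hours studied and marks obtained by 8 students
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)
marks <- c(35, 42, 50, 48, 60, 66, 71, 80)

# type is not given, so plot() draws points (a scatter plot)
plot(hours, marks, main = "Marks against Hours Studied",
     xlab = "Hours studied", ylab = "Marks", pch = 19, col = "blue")
```

Points that rise from left to right, as here, suggest a positive relationship between the two variables.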

Correlation and Regression Analysis

Correlation analysis pertains to the study of interdependence or co-variation of two variables.
If large (small) values of one variable are associated with large (small) values of the other
variable, the correlation is said to be positive (direct) correlation. If large (small) values of
one variable are associated with small (large) values of the other variable, the correlation is
said to be negative (inverse) correlation.

Coefficient of correlation: A scatter diagram gives only a rough idea of the relationship
between two variables; it does not give a numerical measure. Karl Pearson introduced a
numerical measure of the degree of linear relationship between two variables, called Karl
Pearson's or the product moment correlation coefficient. It is defined by

$$r = \frac{\mathrm{Cov}(x, y)}{\mathrm{SD}(x) \times \mathrm{SD}(y)}$$

• The value of r always lies between -1 and 1.
• If r = 1, then there is a perfect positive linear relationship between the variables.
• If r = -1, then there is a perfect negative linear relationship between the variables.
• If 0.7 ≤ r < 1, then there is a strong positive linear relationship between the variables.
• If -1 < r ≤ -0.7, then there is a strong negative linear relationship between the variables.
• If r = 0, then there is no linear relationship between the variables.

The R command for computing Karl Pearson's coefficient of correlation is cor(x,y), where x is
the vector that stores the values of the first variable and y is the vector that stores the
values of the second variable.

The function cov(x,y) computes the covariance between the variables x and y.

Rank Correlation: Spearman suggested a measure of correlation based on the ranks of the
observations. It is usually used when the data are ordinal. The R command is
cor(x,y,method="spearman").
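
The commands above can be checked against the definition of r. In the sketch below the data are hypothetical. cov() and sd() both use the divisor n - 1, which cancels in the ratio.

```r
# Hypothetical data
x <- c(10, 12, 15, 18, 20, 24, 27)
y <- c(22, 25, 27, 35, 36, 45, 50)

r_formula <- cov(x, y) / (sd(x) * sd(y))  # Cov(x, y) / (SD(x) * SD(y))
r_builtin <- cor(x, y)
all.equal(r_formula, r_builtin)  # both routes give the same value

# Spearman's coefficient is Pearson's coefficient applied to the ranks
rho <- cor(x, y, method = "spearman")
all.equal(rho, cor(rank(x), rank(y)))
```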

Regression Analysis: From the value of the correlation coefficient and the associated scatter
diagram, one can decide whether two variables are linearly related or not. If there is a linear
relationship between the variables x and y, we have to establish the linear equation that
connects one variable with the other. Simple linear regression analysis considers a single
independent variable and a single dependent variable. There are situations in which the choice
of dependent or independent variable is not clear. Thus we consider two lines of regression: y
on x and x on y. The line of regression of y on x is used to predict y for known values of x; in
other words, y is the dependent variable and x is the independent variable. The line of
regression of x on y is used to predict the values of x for known values of y.
The line of regression of y on x is of the form y = ax + b, where a and b are constants called
the slope coefficient and the intercept respectively. The R command to fit the line of
regression of y on x is lm(y~x).
The line of regression of x on y is of the form x = cy + d, where c and d are constants called
the slope coefficient and the intercept respectively. The R command to fit the line of
regression of x on y is lm(x~y).
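
As a sketch with hypothetical data, both lines of regression can be fitted and their coefficients read off with coef(). The product of the two slope coefficients equals the square of the correlation coefficient, which gives a convenient check.

```r
# Hypothetical data
x <- c(10, 12, 15, 18, 20, 24, 27)
y <- c(22, 25, 27, 35, 36, 45, 50)

fit_yx <- lm(y ~ x)  # y on x: y = a*x + b
fit_xy <- lm(x ~ y)  # x on y: x = c*y + d

coef(fit_yx)  # "(Intercept)" is b, "x" is the slope a
coef(fit_xy)  # "(Intercept)" is d, "y" is the slope c

a <- unname(coef(fit_yx)["x"])
c_slope <- unname(coef(fit_xy)["y"])

# The product of the two slopes equals the square of r
all.equal(a * c_slope, cor(x, y)^2)

# Scatter diagram with the fitted line of y on x
plot(x, y, main = "Line of regression of y on x")
abline(fit_yx)
```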

In R, the predict() function is used to generate predicted values from a fitted model. The
R syntax is predict(object, newdata), where object is the fitted model and newdata is a
data frame containing the new values of the independent variable.
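
A minimal sketch of predict() with hypothetical data follows. The column of newdata must have the same name as the independent variable in the formula (x here). The prediction is also recomputed from the coefficients as a check.

```r
# Hypothetical data
x <- c(10, 12, 15, 18, 20, 24, 27)
y <- c(22, 25, 27, 35, 36, 45, 50)

fit <- lm(y ~ x)  # line of regression of y on x

new_x <- data.frame(x = c(16, 30))  # new values of x
pred <- predict(fit, newdata = new_x)
pred

# The same values computed directly as intercept + slope * x
manual <- coef(fit)[1] + coef(fit)[2] * new_x$x
all.equal(unname(pred), unname(manual))
```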
