Graphing Data in R: Bar and Pie Charts

The document provides an overview of data analytics using R, focusing on variable selection, filtering data, and creating various types of plots including bar charts, pie charts, box plots, and scatter plots. It includes examples of how to use functions like select(), filter(), barplot(), pie(), and boxplot() to manipulate and visualize data effectively. Additionally, it explains the syntax and parameters for each plotting function to customize the visual output.

Uploaded by

Sathya Bhat

Data analytics Using R Unit - IV

Example 4 : Selecting Variables that contain 'I' in their names


mydata4 = select(mydata, contains("I"))

filter( ) Function
It is used to subset data with matching logical conditions.
syntax : filter(data , ....)
data : Data Frame
.... : Logical Condition

Example 1 : Filter Rows


Suppose you need to subset data: you want to filter rows and retain only those rows in which
Index is equal to A.
mydata7 = filter(mydata, Index == "A")
Index State Y2002 Y2003 Y2004 Y2005 Y2006 Y2007 Y2008 Y2009
1 A Alabama 1296530 1317711 1118631 1492583 1107408 1440134 1945229 1944173
2 A Alaska 1170302 1960378 1818085 1447852 1861639 1465841 1551826 1436541
3 A Arizona 1742027 1968140 1377583 1782199 1102568 1109382 1752886 1554330
4 A Arkansas 1485531 1994927 1119299 1947979 1669191 1801213 1188104 1628980

Y2010 Y2011 Y2012 Y2013 Y2014 Y2015


1 1237582 1440756 1186741 1852841 1558906 1916661
2 1629616 1230866 1512804 1985302 1580394 1979143
3 1300521 1130709 1907284 1363279 1525866 1647724
4 1669295 1928238 1216675 1591896 1360959 1329341

Example 2 : Multiple Selection Criteria


The %in% operator can be used to match against several values at once. In the following program,
we are telling R to select the rows whose value in column 'Index' is either 'A' or 'C'.
mydata7 = filter(mydata6, Index %in% c("A", "C"))
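
The select() and filter() calls above can be tried on a small stand-in table. The sketch below assumes the dplyr package is installed; the data frame toy and its values are hypothetical, chosen only to mimic the Index/State layout of mydata.

```r
library(dplyr)

# Hypothetical stand-in for mydata: an Index letter, a State and one year column
toy <- data.frame(
  Index = c("A", "A", "C", "D"),
  State = c("Alabama", "Alaska", "California", "Delaware"),
  Y2002 = c(10, 20, 30, 40)
)

# Keep only the rows whose Index equals "A"
rows_a <- filter(toy, Index == "A")

# Keep the rows whose Index is either "A" or "C"
rows_ac <- filter(toy, Index %in% c("A", "C"))

# Keep only the columns whose names contain "I" (matching ignores case by default)
cols_i <- select(toy, contains("I"))

print(rows_a)
print(rows_ac)
print(names(cols_i))
```

Because %in% tests membership in a vector, adding a third letter to c("A", "C") is enough to widen the selection.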

Graphics in R:
Graphical facilities are an important and extremely versatile component of the R environment.
It is possible to use the facilities to display a wide variety of statistical graphs and also to
build entirely new types of graph.
R is capable of creating high quality graphics. Graphs are typically created using a series of high-
level and low-level plotting commands. High-level functions create new plots and low-level
functions add information to an existing plot. Customize graphs (line style, symbols, color, etc)
by specifying graphical parameters. Specify graphic options using the par() function.


Once the device driver is running, R plotting commands can be used to produce a variety of
graphical displays and to create entirely new kinds of display. Plotting commands are divided
into two basic groups.

• High-level plotting commands: High Level plotting functions create a new plot on the
graphics device, possibly with axes, labels, titles and so on. High-level plotting functions are
designed to generate a complete plot of the data passed as arguments to the function. Where
appropriate, axes, labels and titles are automatically generated (unless you request otherwise.)
High-level plotting commands always start a new plot, erasing the current plot if necessary.

plot() Scatter plot
hist() Histogram
boxplot() Boxplot
qqplot(), qqnorm(), qqline() Quantile plots
interaction.plot() Interaction plot
sunflowerplot() Sunflower scatter plot
pairs() Scatter plot matrix
symbols() Draw symbols on a plot
dotchart(), barplot(), pie() Dot chart, bar chart, pie chart
curve() Draw a curve from a given function
image() Create a grid of colored rectangles with colors based on the values of a third variable

• Low-level plotting commands: Low-level plotting functions add more information to an
existing plot, such as extra points, lines and labels. Sometimes the high-level plotting
functions don't produce exactly the kind of plot you desire. In this case, low-level plotting
commands can be used to add extra information (such as points, lines or text) to the current
plot.
points() Add points to a figure
lines() Add lines to a figure
text() Insert text in the plot region
mtext() Insert text in the figure and outer margins
title() Add figure title or outer title
legend() Insert legend
axis(), [Link]() Customize axes
abline() Add horizontal and vertical lines or a single line
box() Draw a box around the current plot
polygon() Draw a polygon
rect() Draw a rectangle
arrows() Draw arrows
segments() Draw line segments
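
The division of labour between the two groups can be sketched with the built-in mtcars data set. The choice of data set and all styling values are illustrative assumptions: one high-level call starts the plot, and low-level calls then add to it.

```r
# High-level call: starts a new plot with axes and labels
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "High-level plot with low-level additions")

# Low-level calls: each one adds to the existing plot
fit <- lm(mpg ~ wt, data = mtcars)        # straight-line fit to overlay
abline(fit, col = "red")                  # add the fitted line
abline(h = mean(mtcars$mpg), lty = 2)     # add a horizontal reference line
heaviest <- which.max(mtcars$wt)          # row index of the heaviest car
points(mtcars$wt[heaviest], mtcars$mpg[heaviest], pch = 19, col = "blue")
text(mtcars$wt[heaviest], mtcars$mpg[heaviest],
     labels = rownames(mtcars)[heaviest], pos = 2)
legend("topright", legend = c("Linear fit", "Mean mpg"),
       col = c("red", "black"), lty = c(1, 2))
```

Calling plot() again at this point would discard everything drawn so far, which is why the additions are made with abline(), points(), text() and legend().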

Bar Chart:


A bar chart represents data as rectangular bars whose length is proportional to the value of
the variable. R uses the function barplot() to create bar charts. R can draw both vertical and
horizontal bars, and each bar can be given a different color.
Syntax
The basic syntax to create a bar-chart in R is −
barplot(H, xlab, ylab, main, names.arg, col)
Following is the description of the parameters used −

• H is a vector or matrix containing numeric values used in bar chart.
• xlab is the label for x axis.
• ylab is the label for y axis.
• main is the title of the bar chart.
• names.arg is a vector of names appearing under each bar.
• col is used to give colors to the bars in the graph.
Example
A simple bar chart is created using just the input vector and the name of each bar. The below
script will create and save the bar chart in the current R working directory.
# Create the data for the chart
H <- c(7,12,28,3,41)
# Give the chart file a name
png(file = "[Link]")
# Plot the bar chart
barplot(H)
# Save the file
dev.off()
When we execute above code, it produces following result −


Bar Chart Labels, Title and Colors


The features of the bar chart can be expanded by adding more parameters. The main parameter
is used to add a title. The col parameter is used to add colors to the bars. The names.arg parameter is a
vector with the same number of values as the input vector, describing the meaning of each bar.
Example
The below script will create and save the bar chart in the current R working directory.

# Create the data for the chart
H <- c(7,12,28,3,41)
M <- c("Mar","Apr","May","Jun","Jul")

# Give the chart file a name
png(file = "barchart_months_revenue.png")

# Plot the bar chart
barplot(H,names.arg=M,xlab="Month",ylab="Revenue",col="blue",
main="Revenue chart",border="red")

# Save the file
dev.off()

We can also plot bars horizontally by providing the argument horiz = TRUE.
# barchart with added parameters
barplot([Link], main = "Maximum Temperatures in a Week",
xlab = "Degree Celsius", ylab = "Day",
names.arg = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
col = "darkred", horiz = TRUE)
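
A self-contained version of the horizontal chart can be sketched as follows. The temperature values are made up for illustration, and the vector name max_temp is hypothetical.

```r
# Hypothetical daily maximum temperatures (degrees Celsius), one per weekday
max_temp <- c(22, 27, 26, 24, 23, 26, 28)
days <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# horiz = TRUE draws the bars horizontally, so the values run along the x axis.
# barplot() returns the midpoints of the bars, which helps when annotating them.
mids <- barplot(max_temp,
                main = "Maximum Temperatures in a Week",
                xlab = "Degree Celsius", ylab = "Day",
                names.arg = days, col = "darkred", horiz = TRUE)

# Low-level addition: write each value just inside the end of its bar
text(x = max_temp, y = mids, labels = max_temp, pos = 2, col = "white")
```

Note that with horiz = TRUE the xlab text labels the value axis and ylab labels the category axis.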


How to plot barplot with matrix?


As mentioned before, the barplot() function can take a matrix as well as a vector. If the input is a
matrix, a stacked bar chart is plotted: each column of the matrix is represented by one stacked bar.
Let us consider the following matrix, which is derived from the Titanic dataset.
> [Link]
Class
Survival 1st 2nd 3rd Crew
No 122 167 528 673
Yes 203 118 178 212
This data is plotted as follows.
barplot([Link], main = "Survival of Each Class", xlab = "Class",col = c("red","green") )
legend("topleft", c("Not survived","Survived"), fill = c("red","green") )

Instead of stacked bars, we can draw the elements of each column as separate bars placed next to
each other by specifying the parameter beside = TRUE.
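
The matrix above can be rebuilt from R's built-in Titanic table, which makes it possible to sketch both layouts in runnable form. The object name survival is an assumption; apply() here sums the four-way table over sex and age.

```r
# Titanic is a 4-way table: Class x Sex x Age x Survived.
# Summing over Sex and Age leaves a Survived-by-Class matrix.
survival <- apply(Titanic, c(4, 1), sum)
print(survival)

# Two panels side by side: stacked bars (default) and grouped bars (beside = TRUE)
old_par <- par(mfrow = c(1, 2))
barplot(survival, main = "Stacked", xlab = "Class",
        col = c("red", "green"))
legend("topleft", c("Not survived", "Survived"), fill = c("red", "green"))
barplot(survival, main = "Grouped (beside = TRUE)", xlab = "Class",
        col = c("red", "green"), beside = TRUE)
par(old_par)  # restore the previous layout
```

The stacked form makes the total per class easy to compare, while the grouped form makes it easier to compare survivors with non-survivors inside one class.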


Pie Chart:
A pie chart represents values as slices of a circle, each drawn in a different color. The slices are
labeled, and the number corresponding to each slice can also be shown in the chart. In R the pie
chart is created using the pie() function, which takes positive numbers as a vector input. The
additional parameters are used to control labels, color, title etc.
Syntax:
pie(x, labels, radius, main, col, clockwise)
Following is the description of the parameters used −
• x is a vector containing the numeric values used in the pie chart.
• labels is used to give description to the slices.
• radius indicates the radius of the circle of the pie chart (value between −1 and +1).
• main indicates the title of the chart.
• col indicates the color palette.
• clockwise is a logical value indicating whether the slices are drawn clockwise or
anticlockwise.
Example:
A very simple pie chart is created using just the input vector and labels. The below script will
create and save the pie chart in the current R working directory.

# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Give the chart file a name.
png(file = "[Link]")

# Plot the chart.
pie(x,labels)

# Save the file.
dev.off()


Pie Chart Title and Colors


We can expand the features of the chart by adding more parameters to the function. We will use
the parameter main to add a title to the chart, and the parameter col, which here makes use of the
rainbow color palette while drawing the chart. The length of the palette should be the same as the
number of values we have for the chart. Hence we use length(x).
Example
The below script will create and save the pie chart in the current R working directory.

# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Give the chart file a name (a .png extension, to match the png() device).
png(file = "city_title_colours.png")

# Plot the chart with title and rainbow color palette.
pie(x, labels, main = "City pie chart", col = rainbow(length(x)))

# Save the file.
dev.off()


Example 2: Pie chart with additional parameters


pie(expenditure, labels=[Link](expenditure), main="Monthly Expenditure Breakdown",
col=c("red","orange","yellow","blue","green"), border="brown", clockwise=TRUE )
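
Slice sizes are hard to read off a pie, so a common refinement is to put the percentage in each label. A minimal sketch, reusing the city values from the earlier example (the label format and the vector names cities, pct and slice_labels are illustrative choices):

```r
x <- c(21, 62, 10, 53)
cities <- c("London", "New York", "Singapore", "Mumbai")

# Share of each slice as a percentage of the total, rounded to one decimal
pct <- round(100 * x / sum(x), 1)

# Build labels such as "London (14.4%)"
slice_labels <- paste0(cities, " (", pct, "%)")

pie(x, labels = slice_labels, main = "City pie chart",
    col = rainbow(length(x)), clockwise = TRUE)
```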

Box Plots:
A boxplot shows how the values in a data set are distributed. It is built from the three quartiles,
which divide the data into four parts. The graph represents the minimum, maximum, median, first
quartile and third quartile of the data set. It is also useful for comparing the distribution of data
across data sets by drawing a boxplot for each of them. The boxplot() function takes in any number of
numeric vectors, drawing a boxplot for each vector. You can also pass in a list (or data frame)
with numeric vectors as its components. Boxplots are created in R by using
the boxplot() function.
Syntax
The basic syntax to create a boxplot in R is −
boxplot(x, data, notch, varwidth, names, main)
Following is the description of the parameters used −
• x is a vector or a formula.
• data is the data frame.
• notch is a logical value. Set as TRUE to draw a notch.
• varwidth is a logical value. Set as TRUE to draw the width of each box in proportion to the
square root of its sample size.
• names are the group labels which will be printed under each boxplot.
• main is used to give a title to the graph.
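
The formula form of x, used together with data, can be sketched with the built-in airquality data set. The month labels, title and axis labels below are illustrative choices.

```r
# Ozone ~ Month draws one box of Ozone readings for each value of Month
bp <- boxplot(Ozone ~ Month, data = airquality,
              varwidth = TRUE,   # wider boxes for groups with more observations
              names = c("May", "Jun", "Jul", "Aug", "Sep"),
              main = "Ozone by month",
              xlab = "Month", ylab = "Ozone (ppb)")

# boxplot() invisibly returns the statistics it drew; n is the size of each group
print(bp$n)
```

Rows with a missing Ozone value are dropped before the boxes are computed, so the group sizes in bp$n add up to fewer than the 153 rows of the data frame.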

Ex1:
Let us use the built-in dataset airquality which has “Daily air quality measurements in New
York, May to September 1973.”-R documentation.
> str(airquality)
'data.frame': 153 obs. of 6 variables:
$ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
$ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
$ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
$ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
$ Month : int 5 5 5 5 5 5 5 5 5 5 ...
$ Day : int 1 2 3 4 5 6 7 8 9 10 ...

Let us make a boxplot for the ozone readings.


>boxplot(airquality$Ozone)

We can see that data above the median is more dispersed. We can also notice two outliers at the
higher extreme.
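
The numbers behind this picture can be checked with boxplot.stats(), which returns the five values the box is drawn from and any points plotted as outliers. A short sketch:

```r
# Missing values are ignored when the statistics are computed
s <- boxplot.stats(airquality$Ozone)

# stats: lower whisker, lower hinge, median, upper hinge, upper whisker
print(s$stats)

# out: observations beyond the whiskers, drawn as individual points
print(s$out)
```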
Ex2:
We can pass in additional parameters to control the way our plot looks. Some of the frequently
used ones are main to give the title, xlab and ylab to provide labels for the axes, and col to define
the color. Additionally, with the argument horizontal = TRUE we can plot it horizontally and
with notch = TRUE we can add a notch to the box.

boxplot(airquality$Ozone,
main = "Mean ozone in parts per billion at Roosevelt Island",
xlab = "Parts Per Billion",
ylab = "Ozone",
col = "orange",
border = "brown",
horizontal = TRUE,
notch = TRUE
)


Multiple Boxplots
We can draw multiple boxplots in a single plot, by passing in a list, data frame or multiple
vectors.
Let us consider the Ozone and Temp fields of the airquality dataset. Let us also generate normally
distributed samples with the same mean and standard deviation and plot them side by side for
comparison.
# prepare the data
ozone <- airquality$Ozone
temp <- airquality$Temp
# generate normal distribution with same mean and sd
ozone_norm <- rnorm(200, mean=mean(ozone, na.rm=TRUE), sd=sd(ozone, na.rm=TRUE))
temp_norm <- rnorm(200, mean=mean(temp, na.rm=TRUE), sd=sd(temp, na.rm=TRUE))
➢ rnorm(n, mean, sd) generates n random values from the normal distribution. runif generates
random values from the uniform distribution.
Now let us make 4 boxplots with this data. We use the arguments at and names to set the
position and label of each box.
boxplot(ozone, ozone_norm, temp, temp_norm,
main = "Multiple boxplots for comparison",
at = c(1,2,4,5),
names = c("ozone", "normal", "temp", "normal"),
las = 2,
col = c("orange","red"),
border = "brown",
horizontal = TRUE,
notch = TRUE
)


Scatter Plot:
A scatter plot is very useful for visualizing the relationship between two sets
of data. The data is displayed as a collection of points whose pattern shows how the two
data sets are related, for example whether the relation is roughly linear. If we want to
visualize Age against Weight, for instance, we can use a scatter plot.

Scatterplots show many points plotted in the Cartesian plane. Each point represents the values
of two variables. One variable is chosen in the horizontal axis and another in the vertical axis.
The simple scatterplot is created using the plot() function.
Syntax
The basic syntax for creating scatterplot in R is −
plot(x, y, type,main, xlab, ylab, xlim, ylim, axes)
Following is the description of the parameters used −
• x is the data set whose values are the horizontal coordinates.
• y is the data set whose values are the vertical coordinates.
• type specifies what type of plot to draw:
o To draw points, use type = "p"
o To draw lines, use type = "l"
o Use type = "h" for histogram-like vertical lines
o Use type = "s" for stair steps
o To draw points and lines over-plotted, use type = "o"
• main is the title of the graph.
• xlab is the label in the horizontal axis.
• ylab is the label in the vertical axis.
• xlim is the limits of the values of x used for plotting.
• ylim is the limits of the values of y used for plotting.
• axes indicates whether both axes should be drawn on the plot.


Example
We use the data set "mtcars" available in the R environment to create a basic scatterplot. Let's
use the columns "wt" and "mpg" in mtcars.

input<- mtcars[,c('wt','mpg')]

print(head(input))
When we execute the above code, it produces the following result −
wt mpg
Mazda RX4 2.620 21.0
Mazda RX4 Wag 2.875 21.0
Datsun 710 2.320 22.8
Hornet 4 Drive 3.215 21.4
Hornet Sportabout 3.440 18.7
Valiant 3.460 18.1

Creating the Scatterplot


The below script will create a scatterplot graph for the relation between wt (weight) and
mpg (miles per gallon).

# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Give the chart file a name.
png(file = "[Link]")
# Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg,
xlab = "Weight",
ylab = "Mileage",
xlim = c(2.5,5),
ylim = c(15,30),
main = "Weight vs Mileage" )
# Save the file.
dev.off()


# How to create a Scatter Plot in R: Example
# Print the built-in faithful data set (eruption duration and waiting time)
faithful

# Finding the Correlation
cor(faithful$eruptions, faithful$waiting)

# Drawing Scatter Plot
plot(faithful$eruptions, faithful$waiting)

The cor() statement above finds the correlation between eruptions and waiting.
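
To see how the correlation coefficient relates to the picture, the scatter plot can be overlaid with a least-squares line. This is a sketch only: using lm() and abline() here is an addition to the example above, and the axis labels are illustrative.

```r
# Pearson correlation between eruption duration and waiting time
r <- cor(faithful$eruptions, faithful$waiting)
print(round(r, 3))

# Scatter plot with the fitted straight line waiting = a + b * eruptions
plot(faithful$eruptions, faithful$waiting,
     xlab = "Eruption duration (min)", ylab = "Waiting time (min)",
     main = "Old Faithful eruptions")
fit <- lm(waiting ~ eruptions, data = faithful)
abline(fit, col = "red")

# For a straight-line fit with one predictor, R-squared equals the squared correlation
print(all.equal(summary(fit)$r.squared, r^2))
```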

Change Colors of Scatter plot in R


In this example we will show you how to change the scatter plot color using the col argument, and
the size of the character that represents each point using the cex (character expansion) argument.

• col: specifies the color to use for the points of the scatter plot.
• cex: specifies the size of the point(s).
R CODE
# R Scatter Plot - Changing Color, Dot Size Example
# Drawing Scatter Plot
plot(faithful$eruptions, faithful$waiting,
col = "chocolate",
cex = 1.2,
main = "R Scatter Plot",
xlab = "Eruptions",
ylab = "Waiting",
las = 1)

Change Shapes and Axis limits of Scatter Plot in R


In this example we will show you how to change the point shape using the pch argument and how to
set the axis limits.

• xlim: specifies the limits for the X-axis.
• ylim: specifies the limits for the Y-axis.
R CODE
# R Scatter Plot - Changing X, Y Limits, Dot Shape Example
faithful

# Drawing Scatter Plot
plot(faithful$eruptions, faithful$waiting,
col = "chocolate",
pch = 8,
main = "R Scatter Plot",
xlab = "Eruptions",
ylab = "Waiting",
las = 1,
xlim = c(1.5, 5.5),
ylim = c(40, 100))

Low-Level Graphics:


The plot( ) function in R:


The most used plotting function in R programming is the plot() function. It is a generic function,
meaning that it has many methods, which are called according to the type of object passed to plot().
In the simplest case, we can pass in a single vector and get a scatter plot of value against index.
More commonly, we pass in two vectors and a scatter plot of the corresponding points is drawn.
For example, the command plot(c(1,2),c(3,5)) would plot the points (1,3) and (2,5).

Adding shapes to Graphs:


➢ Adding Titles and Labeling Axes
We can add a title to our plot with the parameter main. Similarly, xlab and ylab can be used to
label the x-axis and y-axis respectively.

x <- seq(-pi, pi, 0.1) # assumed input: x values from -pi to pi
plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)")

➢ Changing Color and Plot Type


We can see above that the plot is drawn with open circular points in black. These are the defaults.
We can change the plot type with the argument type. It accepts the following strings and has the
given effect.

"p" - points
"l" - lines
"b" - both points and lines
"c" - empty points joined by lines
"o" - over plotted points and lines
"s" and "S" - stair steps
"h" - histogram-like vertical lines
"n" - does not produce any points or lines

Ex:

plot(x, sin(x),
main="The Sine Function",
ylab="sin(x)",
type="l",
col="blue")

➢ Overlaying Plots Using legend() and lines() functions:

Calling plot() again draws the new graph in the same
window, replacing the previous one. However, sometimes we wish to overlay the plots in order to
compare the results. This is made possible with the functions lines() and points(), which add lines and
points respectively to the existing plot.

plot(x, sin(x),
main="Overlaying Graphs",
ylab="",
type="l",
col="blue")
lines(x,cos(x), col="red")
legend("topleft",
c("sin(x)","cos(x)"),
fill=c("blue","red")
)


➢ Adding Other Shapes to a Plot


Using the following functions, we can add the extra graphical objects in plots:

• rect – For plotting rectangles – rect(xleft, ybottom, xright, ytop)

Ex: > plot(c(100, 250), c(300, 450), type = "n", xlab = "", ylab = "",
+ main = "2 x 11 rectangles")
> i <- 4*(0:10) # assumed offsets for the 11 rectangles
> rect(100+i, 300+i, 150+i, 380+i, col = rainbow(11, start = 0.7, end = 0.1))
> rect(100, 400, 125, 450, col = "green", border = "blue")

Using the locator function, we can obtain the coordinates of the corners of the rectangle. But the
rect function does not accept the value returned by locator directly as its argument.

• arrows – For plotting arrows and headed bars – The arrows function
draws a line from the point (x0, y0) to the point (x1, y1) with the arrowhead, by default, at
the "second" end (x1, y1).
arrows(x0, y0, x1, y1)
Adding code=3 produces a double-headed arrow. For example, the following draws a horizontal
double-headed arrow from (1,9) to (5,9):
arrows(1,9,5,9,code=3)
Ex:
> x <- runif(12); y <- rnorm(12) # assumed sample data: 12 random points
> plot(x, y, main = "arrows and segments")
> ## draw arrows from point to point:
> s <- seq(length(x)-1) # one shorter than data
> arrows(x[s], y[s], x[s+1], y[s+1], col = 1:3)

• polygon – For plotting more complicated filled shapes, including objects with curved sides.
To draw a polygon in R, click six points on the current plot and save their coordinates in a list
called locations by using the following command:
locations<-locator(6)
Now you can draw a lavender-colored polygon by using the following command:
polygon(locations,col="lavender")

Ex: > n <- 100 # assumed number of steps
> xx <- c(0:n, n:0)
> yy <- c(c(0,cumsum(rnorm(n))), rev(c(0,cumsum(rnorm(n)))))
> plot(xx, yy, type="n", xlab="Time", ylab="Distance")
> polygon(xx, yy, col="gray", border = "red")


• Plot symbols in R: Different plotting symbols are available in R. The graphical argument
used to specify point shapes is pch. By default pch = 1 (an open circle). The commonly used point
symbols are numbered from 0 to 25.


Example:
x<-c(2.2, 3, 3.8, 4.5, 7, 8.5, 6.7, 5.5)
y<-c(4, 5.5, 4.5, 9, 11, 15.2, 13.3, 10.5)

# Plot points
plot(x, y)

# Change plotting symbol
# Use solid circle
plot(x, y, pch = 19)
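
The symbol codes are easier to remember once seen, so the following sketch draws the codes 0 to 25 in a grid, each labelled with its number. The grid layout, sizes and colors are arbitrary choices.

```r
codes <- 0:25

# Arrange the 26 symbols in rows of 7, first row at the top
col_pos <- codes %% 7
row_pos <- 3 - codes %/% 7

plot(col_pos, row_pos, pch = codes, cex = 2,
     col = "blue", bg = "yellow",     # bg fills symbols 21 to 25 only
     xlim = c(-0.5, 6.5), ylim = c(-0.5, 3.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "Plot symbols (pch = 0 to 25)")

# Write each code slightly below its symbol
text(col_pos, row_pos - 0.3, labels = codes, cex = 0.8)
```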


Saving Graphs to Files:


If you want to publish your results, you have to save your plot to a file in R and then import this
graphics file into another document. Much of the time however, you may simply want to use R
graphics in an interactive way to explore your data.
To save a plot to an image file, you have to do three things in sequence:
1. Open a graphics device.
➢ The default graphics device in R is your computer screen. To save a plot to an image file, you
need to tell R to open a new type of device: in this case, a graphics file of a specific type,
such as PNG, PDF, or JPG.
➢ The R function to create a PNG device is png(). Similarly, you create a PDF device
with pdf() and a JPG device with jpeg().
➢ The first step in deciding how to save plots is to decide on the output format that you want to
use. The following table lists some of the available formats, along with guidance as to when
they may be useful.
Format      Driver        Notes
JPG         jpeg          Can be used anywhere, but doesn't resize
PNG         png           Can be used anywhere, but doesn't resize
WMF         win.metafile  Windows only; best choice with Word; easily resizable
PDF         pdf           Best choice with pdflatex; easily resizable
Postscript  postscript    Best choice with latex and Open Office; easily resizable

2. Create the plot.

Methods to Save Graphs to Files in R


Below are the methods to save graphs to files in R.
i. A General Method
Here’s a general method that will work on any computer with R, regardless of operating system
or the way that we are connecting.
For Example:
To save a plot as a JPG file, we use the jpeg driver. If we want to save a jpg file
called “[Link]” containing a plot of x and y, we would type the following commands:

• Save as Jpeg image
> jpeg('[Link]')
> plot(x,y)
> dev.off()
• Save as png image
> png(file="C:/Datamentor/R-tutorial/saving_plot2.png", width=600, height=350)
> hist(Temperature, col="gold")
> dev.off()

ii. Another Approach


In R, the dev.copy command is used to copy the contents of the graph window to a file without
having to re-enter the commands.
For Example:
To create a png file called [Link] from a graph that is displayed by R, type
> dev.copy(png,'[Link]')
> dev.off()

3. Close the graphics device.


You do this with the dev.off() function.
Put this in action by saving a plot of faithful to the home folder on your computer. First set your
working directory to your home folder (or to any other folder you prefer).
Now you can check your file system to see whether the file [Link] exists. (It should!) The
result is a graphics file of type PNG that you can insert into a presentation, document, or website.

To save a plot as a jpeg image we would perform the following steps. Please note that we need to
call the function dev.off() after all the plotting, to save the file and return control to the screen.
Ex: jpeg(file="saving_plot1.jpeg")
hist(Temperature, col="darkgreen")
dev.off()
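
The three steps (open a device, create the plot, close the device) can be sketched end to end. To avoid assuming anything about the working directory, this version writes to a temporary file created with tempfile(). It uses pdf() because that device is available in every R installation; png() and jpeg() are used the same way but take their width and height in pixels. The file name, plot title and page size are illustrative.

```r
# 1. Open a PDF graphics device that writes to a temporary file
out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file, width = 7, height = 5)   # size in inches

# 2. Create the plot; it is drawn into the file, not on the screen
plot(faithful, main = "Old Faithful")

# 3. Close the device so that the file is written completely
dev.off()

# Check that the file exists and is not empty
print(file.exists(out_file))
print(file.info(out_file)$size > 0)
```

If dev.off() is left out, later plotting commands keep going into the file and the file may be incomplete when it is opened.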

Descriptive Statistics:
Statistical analysis in R is performed by using many in-built functions. Most of these functions
are part of the R base package. These functions take R vector as an input along with the
arguments and give the result.
The functions we are discussing in this chapter are mean, median and mode.

Mean
It is calculated by taking the sum of the values and dividing by the number of values in a data
series.
The function mean() is used to calculate this in R.
Syntax:
mean(x, trim = 0, na.rm = FALSE, ...)
Following is the description of the parameters used −
• x is the input vector.
• trim is the fraction (0 to 0.5) of observations to drop from each end of the sorted vector.
• na.rm is used to remove the missing values from the input vector.
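
The effect of trim and na.rm is easiest to see on a small vector. The values below are made up for illustration.

```r
x <- c(2, 4, 6, 8, 100)

# Plain mean: (2 + 4 + 6 + 8 + 100) / 5
print(mean(x))                 # 24

# trim = 0.2 drops 20% of the sorted values from each end (here one value
# per end), so the mean is taken over 4, 6 and 8
print(mean(x, trim = 0.2))     # 6

# A single missing value makes the result NA unless na.rm = TRUE
y <- c(x, NA)
print(mean(y))                 # NA
print(mean(y, na.rm = TRUE))   # 24
```

The trimmed mean is far less affected by the extreme value 100 than the plain mean is.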