DATA
VISUALIZATION
GRAPHICS IN R
Agenda:
◦Scatter Plot
◦Histogram
◦Box Plot
◦Customizing Plots
Example Data
To view available datasets, type:
>data()
To load a dataset into memory, type
data(name of data set)
>data(ChickWeight)
To view the loaded dataset type its name:
>ChickWeight
The Data File
Weight Time Chick Diet
1 42 0 1 1
2 51 2 1 1
3 59 4 1 1
4 64 6 1 1
5 76 8 1 1
6 93 10 1 1
7 106 12 1 1
8 125 14 1 1
9 149 16 1 1
10 171 18 1 1
Summary of the Data File
>summary(ChickWeight)
Weight Time
Min. : 35 Min. : 0.00
1st Qu.: 63 1st Qu.: 4.00
Median :103 Median :10.00
Mean :121 Mean :10.72
3rd Qu.:163 3rd Qu.:16.00
Max. :373 Max. :21.00
Graphics in R
Plot() is the main graphing
function
Automatically produces simple
plots for vectors, functions or
data frames
Many useful customization
options…
Plotting a Vector
plot(v) will print the elements of
the vector v according to their
index
# Plot weight for each observation
> plot(ChickWeight$Weight)
# Plot weight against their ranks
> plot(sort(ChickWeight$Weight))
plot(ChickWeight$Weight)plot(sort(ChickWeight$Weight))
Common Parameters for
plot()
Specifying labels:
main – provides a title
xlab – label for the x axis
ylab – label for the y axis
Specifying range limits:
ylim – 2-element vector gives range for y
axis
xlim – 2-element vector gives range for x
axis
A Labeled Plot
>plot(ChickWeight$weight, ylim =
c(50,200), ylab = "Weight", xlab = "Rank",
main = "Distribution of Weights")
Plotting Two Vectors
plot() can pair elements from 2 vectors to produce x-y coordinates
You can exclude vectors using:
> plot(datasetname[-c(1,2)])
Plotting Two Vectors
>plot(ChickWeight$Diet,
ChickWeight$weight, xlab = "diet", ylab =
"Weight", main = "Type of Diet Effect on
Weight”, col = "blue”)
Plotting Contents of a Dataset
> plot(ChickWeight)
Plotting Contents of a Dataset
> plot(ChickWeight[-c(1)])
Histograms
A diagram consisting of rectangles whose area is
proportional to the frequency of a variable
Histograms effectively only work with one variable at a
time.
The parameter breaks is key:
- Specifies the number of categories to plot
- Specifies the breakpoints for each category
- Controls the number of bars, cells or bins of the histogram.
The xlab, ylab, xlim, ylim options work as expected
Histograms
hist(ChickWeight$weight, col = "lightblue", xlab =
"Weight", main = "Weight Histogram")
Histograms With Breaks
hist(ChickWeight$weight, col = "lightblue", xlab =
"Weight", main = "Weight Histogram", breaks =
seq (0,400, by=10) )
Boxplots
Generated by the boxplot()
function
Draws plot summarizing
- Median
◦Quartiles (Q1, Q3)*
◦Outliers – by default,
observations more than 1.5 * (Q1
– Q3) distant from nearest
quartile
Boxplot
boxplot(ChickWeight, col = rainbow(6), ylab = "ChickWeight
Boxplot")
Outliers
Boxplot for Weight
rug() can add a tick for each
observation to the side of a
boxplot() and other plots.
The side parameter specifies
where tick marks are drawn.
> boxplot(ChickWeight$weight, col = rainbow(6), ylab =
"ChickWeight Boxplot")
> rug(ChickWeight$weight,side=2)
Customizing Plots
R provides a series of functions for adding
text, lines and points to a plot
We will illustrate some useful ones, but
look at demo(graphics) for more examples
Type <Return> or Press enter for more
Drawing on a plot
To add additional data use
- points(x,y)
- lines(x,y)
For freehand drawing use
- polygon()
- rect()
Text Drawing
Two commonly used functions:
- text() – writes inside the plot region, could be used to label datapoints
- mtext() – writes on the margins
Plotting Two Data Series
> x <- seq(0,2*pi, by = 0.1)
> y <- sin(x)
> plot(x,y, col = "green", type = "l", lwd = 3)
> y1 <- cos(x)
> lines(x,y1, col = "red", lwd = 3)
> mtext("Sine and Cosine Plot", side = 3, line = 1)
Adding a Label & Rectangle
> rect(0,-1,2,0.5)
> text(1,0.6, "label here")
Multiple Plots on a Page
Set the mfrow or mfcol options
Take 2 dimensional vector as an argument
- The first value specifies the number of rows
- The second specifies the number of columns
The 2 options differ in the order individual plots are
printed
Multiple Plots on a Page
>par(mfcol = c(3,1))
>hist(ChickWeight $weight*1000,
breaks = 10, main = "Weight (in
mg)", xlab = "Weight")
>hist(ChickWeight$weight, breaks
= 10, main = "Weight (in g)", xlab
= "Weight")
>
hist(ChickWeight$weight/1000,bre
aks = 10, main = "Weight (in kg)",
xlab = "Weight")
Saving R Plots
R usually generates output to the screen
R can also save its graphics output in a file that you
can distribute or include in a document prepared with
Word or LATEX . From File -> Save As
Hands On
Plot two graphs as follows:
◦Plot1: Histogram of ChickWeight weight vector (in kgs instead
of grams). Having main label “Weight in Kgs” and 8 breaks.
◦Plot 2: the box plot of the Weight vector in Kgs in red color.
◦Two plots in the same row
Solution
> par(mfrow=c(1,2))
> hist(ChickWeight$weight/1000, breaks=8, main = "Weight (in
kg)", xlab = "Weight")
> boxplot(ChickWeight$weight/1000, col = "red")
Practice
Note:
Cex=n plots a figure n times the default size
Pch denotes plot symbol
Practice
>f1<- function(){
x1<- rep(1:5, times=5)
#1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,
5
y1<- rep(1:5,each=5)
#1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,
5
plot(x1,y1,pch=1:25, cex =3, bg= "red", main="Data symbols 1:25")}
>f1()
Note:
Cex=n plots a figure n times the default size
Pch denotes plot symbol