Introduction to R Programming Basics

Uploaded by

steve huffle

INTRODUCTION TO R PROGRAMMING

R is a programming language and software environment for statistical analysis, graphics
representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand, and is currently developed by the R Development
Core Team.

The core of R is an interpreted computer language which allows branching and looping as
well as modular programming using functions. R allows integration with the procedures
written in the C, C++, .Net, Python or FORTRAN languages for efficiency.

R is freely available under the GNU General Public License, and pre-compiled binary
versions are provided for various operating systems like Linux, Windows and Mac.

R is free software distributed under a GNU-style copyleft, and an official part of the GNU
project called GNU S.

Evolution of R
R was initially written by Ross Ihaka and Robert Gentleman at the Department of
Statistics of the University of Auckland in Auckland, New Zealand. R made its first
appearance in 1993.
• A large group of individuals has contributed to R by sending code and bug reports.
• Since mid-1997 there has been a core group (the "R Core Team") who can modify
the R source code archive.

Features of R
As stated earlier, R is a programming language and software environment for statistical
analysis, graphics representation and reporting.

The following are the important features of R −

• R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
• R has an effective data handling and storage facility.
• R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
• R provides a large, coherent and integrated collection of tools for data analysis.
• R provides graphical facilities for data analysis and display, either directly on the
computer screen or printed on paper.
In conclusion, R is one of the most widely used statistical programming languages.

R - Basic Syntax
As a convention, we will start learning R programming by writing a "Hello, World!" program.
Depending on the needs, you can program either at R command prompt or you can use an
R script file to write your program. Let's check both one by one.
R Command Prompt
Once you have the R environment set up, you can start the R command prompt by
typing the following command at your command prompt −

$ R
This will launch the R interpreter and you will get a prompt > where you can start typing your
program as follows −

> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"

Here the first statement defines a string variable myString and assigns the string "Hello,
World!" to it. The next statement uses print() to print the value stored in the variable
myString.

R Script File
Usually, you will write your programs in script files and then
execute those scripts at your command prompt with the help of the R interpreter
called Rscript. So let's start by writing the following code in a text file called test.R −

# My first program in R Programming
myString <- "Hello, World!"
print(myString)

Save the above code in a file test.R and execute it at the Linux command prompt as given
below. Even if you are using Windows or another system, the syntax will remain the same.

$ Rscript test.R
When we run the above program, it produces the following result.

[1] "Hello, World!"

R - Data Types
Variables are nothing but reserved memory locations to store values. This means that,
when you create a variable you reserve some space in memory.

You may like to store information of various data types like character, wide character,
integer, floating point, double floating point, Boolean etc. Based on the data type of a
variable, the operating system allocates memory and decides what can be stored in the
reserved memory.
The variables are assigned with R-Objects and the data type of the R-object becomes the
data type of the variable. There are many types of R-objects. The frequently used ones
are−
• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames
The simplest of these objects is the vector object and there are six data types of these
atomic vectors, also termed as six classes of vectors. The other R-Objects are built upon
the atomic vectors.
Data Type: Logical
Example: TRUE, FALSE
Verify:
v <- TRUE
print(class(v))
it produces the following result −
[1] "logical"

Data Type: Numeric
Example: 12.3, 5, 999
Verify:
v <- 23.5
print(class(v))
it produces the following result −
[1] "numeric"

Data Type: Integer
Example: 2L, 34L, 0L
Verify:
v <- 2L
print(class(v))
it produces the following result −
[1] "integer"

Data Type: Complex
Example: 3 + 2i
Verify:
v <- 2+5i
print(class(v))
it produces the following result −
[1] "complex"

Data Type: Character
Example: 'a', "good", "TRUE", '23.4'
Verify:
v <- "TRUE"
print(class(v))
it produces the following result −
[1] "character"

Data Type: Raw
Example: "Hello" is stored as 48 65 6c 6c 6f
Verify:
v <- charToRaw("Hello")
print(class(v))
it produces the following result −
[1] "raw"

In R programming, the very basic data types are the R-objects called vectors which hold
elements of different classes as shown above. Please note in R the number of classes is
not confined to only the above six types. For example, we can use many atomic vectors
and create an array whose class will become array.
Vectors
When you want to create vector with more than one element, you should use c() function
which means to combine the elements into a vector.
# Create a vector.
apple <- c('red','green',"yellow")
print(apple)
# Get the class of the vector.
print(class(apple))

When we execute the above code, it produces the following result −


[1] "red" "green" "yellow"
[1] "character"

Lists
A list is an R-object which can contain many different types of elements inside it like
vectors, functions and even another list inside it.
# Create a list.
list1 <- list(c(2,5,3),21.3,sin)
# Print the list.
print(list1)

When we execute the above code, it produces the following result −


[[1]]
[1] 2 5 3
[[2]]
[1] 21.3
[[3]]
function (x) .Primitive("sin")
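
Because a list can hold elements of different types, it helps to see how individual elements are pulled back out. The sketch below reuses list1 from the example above; the names first_vec, sub_list and sin_of_zero are illustrative only.

```r
# Rebuild the list from the example above.
list1 <- list(c(2, 5, 3), 21.3, sin)

# [[ ]] extracts the element itself; [ ] returns a shorter list.
first_vec <- list1[[1]]
sub_list <- list1[1]

# The third element is the sin function, so it can be called directly.
sin_of_zero <- list1[[3]](0)

print(first_vec)
print(class(sub_list))
print(sin_of_zero)
```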

Matrices
A matrix is a two-dimensional rectangular data set. It can be created using a vector input to
the matrix function.
# Create a matrix.
M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"
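
The byrow argument controls how the input vector is laid out. As a small sketch (the names values, by_row and by_col are illustrative), the same vector fills the matrix column by column when byrow is left at its default of FALSE:

```r
values <- c('a', 'a', 'b', 'c', 'b', 'a')

# Fill row by row, as in the example above.
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)

# Fill column by column (the default, byrow = FALSE).
by_col <- matrix(values, nrow = 2, ncol = 3)

print(by_row)
print(by_col)
print(dim(by_col))
```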
Arrays
While matrices are confined to two dimensions, arrays can have any number of
dimensions. The array function takes a dim argument which sets the size of each
dimension. In the example below we create an array made of two 3x3
matrices.

# Create an array.
a <- array(c('green','yellow'),dim = c(3,3,2))
print(a)

When we execute the above code, it produces the following result −

,,1
[,1] [,2] [,3]
[1,] "green" "yellow" "green"
[2,] "yellow" "green" "yellow"
[3,] "green" "yellow" "green"

,,2
[,1] [,2] [,3]
[1,] "yellow" "green" "yellow"
[2,] "green" "yellow" "green"
[3,] "yellow" "green" "yellow"
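
Elements of an array are addressed with one index per dimension. The following sketch rebuilds the array from the example above and picks out a single element and a whole slice; the names one_cell and second_slice are illustrative.

```r
a <- array(c('green', 'yellow'), dim = c(3, 3, 2))

# Row 1, column 2 of the first matrix.
one_cell <- a[1, 2, 1]

# Leaving an index empty selects everything along that dimension,
# so this is the whole second 3x3 matrix.
second_slice <- a[, , 2]

print(dim(a))
print(one_cell)
print(dim(second_slice))
```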

Factors
Factors are R-objects which are created using a vector. A factor stores the vector along with
the distinct values of its elements as labels. The labels are always character,
irrespective of whether the input vector is numeric, character, Boolean, etc. Factors
are useful in statistical modeling.

Factors are created using the factor() function. The nlevels() function gives the count of
levels.

# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')
# Create a factor object.
factor_apple <- factor(apple_colors)
# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))

When we execute the above code, it produces the following result −


[1] green green yellow red red red green
Levels: green red yellow
[1] 3
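
To see what a factor stores internally, the sketch below inspects the same factor with levels(), as.integer() and table(). The variable names level_names, level_codes and color_counts are illustrative.

```r
apple_colors <- c('green', 'green', 'yellow', 'red', 'red', 'red', 'green')
factor_apple <- factor(apple_colors)

# The distinct labels, sorted alphabetically by default.
level_names <- levels(factor_apple)

# Each element is stored as an integer code pointing at a level.
level_codes <- as.integer(factor_apple)

# Count how often each level occurs.
color_counts <- table(factor_apple)

print(level_names)
print(level_codes)
print(color_counts)
```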

Data Frames
Data frames are tabular data objects. Unlike in a matrix, each column of a data frame can
contain a different mode of data. The first column can be numeric while the second column
can be character and the third column can be logical. A data frame is a list of vectors of
equal length. Data frames are created using the data.frame() function.
# Create the data frame.
BMI <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)
print(BMI)

When we execute the above code, it produces the following result −


gender height weight Age
1 Male 152.0 81 42
2 Male 171.5 93 38
3 Female 165.0 78 26
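
Once a data frame exists, columns are reached with $ and rows can be filtered with a logical condition. This is a minimal sketch on the BMI data frame from above; heights and over_30 are illustrative names.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)

# Extract one column as a vector.
heights <- BMI$height

# Keep only the rows where Age is above 30.
over_30 <- BMI[BMI$Age > 30, ]

print(dim(BMI))
print(mean(heights))
print(over_30)
```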

R – Variables

A variable provides us with named storage that our programs can manipulate. A variable in
R can store an atomic vector, a group of atomic vectors or a combination of many R-objects.
A valid variable name consists of letters, numbers and the dot or underscore characters. The
variable name starts with a letter, or with a dot that is not followed by a number.

Variable Name         Validity   Reason
var_name2.            valid      Has letters, numbers, dot and underscore.
var_name%             invalid    Has the character '%'. Only dot (.) and underscore are allowed.
2var_name             invalid    Starts with a number.
.var_name, var.name   valid      Can start with a dot (.), but the dot must not be followed by a number.
.2var_name            invalid    The starting dot is followed by a number, making it invalid.
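
One way to check these rules programmatically is to compare each candidate with the result of base R's make.names(), which rewrites a string into a syntactically valid name. This is only a sketch: names_to_check and is_valid are illustrative, and the comparison also flags reserved words such as "if" as invalid.

```r
names_to_check <- c("var_name2.", "var_name%", "2var_name",
                    ".var_name", "var.name", ".2var_name")

# A name is already valid if make.names() leaves it unchanged.
is_valid <- names_to_check == make.names(names_to_check)

print(data.frame(name = names_to_check, valid = is_valid))
```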


Variable Assignment
The variables can be assigned values using leftward, rightward and equal to
operator. The values of the variables can be printed using print() or cat() function.
The cat() function combines multiple items into a continuous print output.

# Assignment using equal operator.


var.1 = c(0,1,2,3)
# Assignment using leftward operator.
var.2 <- c("learn","R")
# Assignment using rightward operator.
c(TRUE,1) -> var.3
print(var.1)
cat ("var.1 is ", var.1 ,"\n")
cat ("var.2 is ", var.2 ,"\n")
cat ("var.3 is ", var.3 ,"\n")

When we execute the above code, it produces the following result −

[1] 0 1 2 3
var.1 is 0 1 2 3
var.2 is learn R
var.3 is 1 1
Note − The vector c(TRUE,1) has a mix of logical and numeric class. So logical class is
coerced to numeric class making TRUE as 1.
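
The same coercion applies to other mixtures: c() converts all elements to the most general type present. A short sketch, with illustrative variable names:

```r
# logical + numeric -> numeric
logical_numeric <- c(TRUE, 1)

# integer + double -> numeric (double)
integer_double <- c(2L, 3.5)

# anything + character -> character
with_character <- c(TRUE, 1, "a")

print(class(logical_numeric))
print(class(integer_double))
print(with_character)
```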

Data Type of a Variable


In R, a variable itself is not declared with any data type; rather, it gets the data type of the
R-object assigned to it. So R is called a dynamically typed language, which means that we
can change the data type of the same variable again and again when using it in a
program.

var_x <- "Hello"


cat("The class of var_x is ",class(var_x),"\n")
var_x <- 34.5
cat(" Now the class of var_x is ",class(var_x),"\n")
var_x <- 27L
cat(" Next the class of var_x becomes ",class(var_x),"\n")

When we execute the above code, it produces the following result −


The class of var_x is character
Now the class of var_x is numeric
Next the class of var_x becomes integer

Finding Variables
To list all the variables currently available in the workspace we use the ls() function. The
ls() function can also use patterns to match the variable names.

print(ls())

When we execute the above code, it produces the following result −

[1] "my var" "my_new_var" "my_var" "var.1"
[5] "var.2" "var.3" "var.name" "var_name2."
[9] "var_x" "varname"

Note − This is sample output; the actual result depends on which variables are defined in
your environment.

# List the variables whose names contain the pattern "var".
print(ls(pattern = "var"))

When we execute the above code, it produces the following result −

[1] "my var" "my_new_var" "my_var" "var.1"
[5] "var.2" "var.3" "var.name" "var_name2."
[9] "var_x" "varname"
The variables starting with dot (.) are hidden; they can be listed using the "all.names = TRUE"
argument to the ls() function.

print(ls(all.names = TRUE))

When we execute the above code, it produces the following result −

[1] ".cars" ".[Link]" ".var_name" ".varname" ".varname2"
[6] "my var" "my_new_var" "my_var" "var.1" "var.2"
[11] "var.3" "var.name" "var_name2." "var_x"
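
The pattern argument is a regular expression, so it matches anywhere in the name unless it is anchored. The sketch below uses a separate environment (demo_env, an illustrative name) so that the result does not depend on what is in your workspace.

```r
# A scratch environment with three known variables.
demo_env <- new.env()
assign("var.1", 1, envir = demo_env)
assign("my_var", 2, envir = demo_env)
assign(".hidden_var", 3, envir = demo_env)

# Names containing "var" (hidden names are skipped by default).
containing_var <- ls(demo_env, pattern = "var")

# Names starting with "var": anchor the pattern with ^.
starting_var <- ls(demo_env, pattern = "^var")

# Include names that start with a dot.
all_vars <- ls(demo_env, all.names = TRUE)

print(containing_var)
print(starting_var)
print(all_vars)
```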

Deleting Variables
Variables can be deleted by using the rm() function. Below we delete the variable var.3.
Printing the variable afterwards throws an error.

rm(var.3)
print(var.3)
When we execute the above code, it produces the following result −
Error in print(var.3) : object 'var.3' not found
All the variables can be deleted by using the rm() and ls() function together.
rm(list = ls())
print(ls())
When we execute the above code, it produces the following result −
character(0)

R - Operators
An operator is a symbol that tells the interpreter to perform specific mathematical or logical
manipulations. The R language is rich in built-in operators and provides the following types of
operators.

Types of Operators
We have the following types of operators in R programming −
• Arithmetic Operators
• Relational Operators
• Logical Operators
• Assignment Operators
• Miscellaneous Operators

Arithmetic Operators
Following table shows the arithmetic operators supported by R language. The operators
act on each element of the vector.

Operator: +
Description: Adds two vectors.
Example:
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
print(v + t)
it produces the following result −
[1] 10.0 8.5 10.0

Operator: -
Description: Subtracts the second vector from the first.
Example:
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
print(v - t)
it produces the following result −
[1] -6.0 2.5 2.0

Operator: *
Description: Multiplies both vectors.
Example:
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
print(v * t)
it produces the following result −
[1] 16.0 16.5 24.0

Operator: /
Description: Divides the first vector by the second.
Example:
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
print(v / t)
it produces the following result −
[1] 0.250000 1.833333 1.500000

Operator: %%
Description: Gives the remainder of the first vector divided by the second.
Example:
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
print(v %% t)
it produces the following result −
[1] 2.0 2.5 2.0

Operator: %/%
Description: Gives the result of integer division of the first vector by the second (quotient).
Example:
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
print(v %/% t)
it produces the following result −
[1] 0 1 1

Operator: ^
Description: Raises the first vector to the exponent of the second vector.
Example:
v <- c(2, 5.5, 6)
t <- c(8, 3, 4)
print(v ^ t)
it produces the following result −
[1] 256.000 166.375 1296.000
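
The examples above use vectors of equal length. When the lengths differ, R recycles the shorter vector; the sketch below illustrates this with made-up vectors long_vec and short_vec. If the longer length is not a multiple of the shorter one, R still computes a result but issues a warning.

```r
long_vec <- c(1, 2, 3, 4)
short_vec <- c(10, 20)

# short_vec is reused as 10, 20, 10, 20.
recycled_sum <- long_vec + short_vec

# A single number is recycled across the whole vector.
doubled <- long_vec * 2

print(recycled_sum)
print(doubled)
```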

Relational Operators
Following table shows the relational operators supported by R language. Each element of
the first vector is compared with the corresponding element of the second vector. The
result of comparison is a Boolean value.
Operator: >
Description: Checks if each element of the first vector is greater than the corresponding
element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v > t)
it produces the following result −
[1] FALSE TRUE FALSE FALSE

Operator: <
Description: Checks if each element of the first vector is less than the corresponding
element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v < t)
it produces the following result −
[1] TRUE FALSE TRUE FALSE

Operator: ==
Description: Checks if each element of the first vector is equal to the corresponding
element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v == t)
it produces the following result −
[1] FALSE FALSE FALSE TRUE

Operator: <=
Description: Checks if each element of the first vector is less than or equal to the
corresponding element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v <= t)
it produces the following result −
[1] TRUE FALSE TRUE TRUE

Operator: >=
Description: Checks if each element of the first vector is greater than or equal to the
corresponding element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v >= t)
it produces the following result −
[1] FALSE TRUE FALSE TRUE

Operator: !=
Description: Checks if each element of the first vector is unequal to the corresponding
element of the second vector.
Example:
v <- c(2,5.5,6,9)
t <- c(8,2.5,14,9)
print(v != t)
it produces the following result −
[1] TRUE TRUE TRUE FALSE
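
One caution when using == on numeric values: results of floating point arithmetic may differ from the expected value by a tiny rounding error. The sketch below shows the usual workaround with all.equal(), which compares within a tolerance; the variable names are illustrative.

```r
computed <- sqrt(2)^2

# Exact comparison fails because of rounding error.
exact_equal <- computed == 2

# all.equal() allows a small tolerance; wrap it in isTRUE() for a clean logical.
nearly_equal <- isTRUE(all.equal(computed, 2))

print(exact_equal)
print(nearly_equal)
```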

Logical Operators
Following table shows the logical operators supported by R language. They are applicable only
to vectors of type logical, numeric or complex. Zero is treated as FALSE and every nonzero
number is treated as the logical value TRUE.

Each element of the first vector is compared with the corresponding element of the second
vector. The result of comparison is a Boolean value.
Operator: &
Description: It is called Element-wise Logical AND operator. It combines each element of the
first vector with the corresponding element of the second vector and gives an
output TRUE if both the elements are TRUE.
Example:
v <- c(3,1,TRUE,2+3i)
t <- c(4,1,FALSE,2+3i)
print(v & t)
it produces the following result −
[1] TRUE TRUE FALSE TRUE

Operator: |
Description: It is called Element-wise Logical OR operator. It combines each element of the
first vector with the corresponding element of the second vector and gives an
output TRUE if at least one of the elements is TRUE.
Example:
v <- c(3,0,TRUE,2+2i)
t <- c(4,0,FALSE,2+3i)
print(v | t)
it produces the following result −
[1] TRUE FALSE TRUE TRUE

Operator: !
Description: It is called Logical NOT operator. Takes each element of the vector and gives the
opposite logical value.
Example:
v <- c(3,0,TRUE,2+2i)
print(!v)
it produces the following result −
[1] FALSE TRUE FALSE FALSE
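
How numbers map to logical values can be checked directly with as.logical(). The sketch below also shows any() and all(), which reduce a logical vector to a single value; the variable names are illustrative.

```r
numbers <- c(0, 0.5, -2, 3)

# Only zero becomes FALSE; every nonzero number becomes TRUE.
as_flags <- as.logical(numbers)

# Is at least one element TRUE? Are all elements TRUE?
any_true <- any(as_flags)
all_true <- all(as_flags)

print(as_flags)
print(any_true)
print(all_true)
```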

The logical operators && and || work on single (length-one) values and give a single
logical value as output. Older versions of R silently used only the first element of longer
vectors; since R 4.3.0, passing a vector with more than one element is an error, so the
examples below select the first element explicitly.

Operator: &&
Description: Called Logical AND operator. Takes a single value on each side and gives
TRUE only if both are TRUE.
Example:
v <- c(3,0,TRUE,2+2i)
t <- c(1,3,TRUE,2+3i)
print(v[1] && t[1])
it produces the following result −
[1] TRUE

Operator: ||
Description: Called Logical OR operator. Takes a single value on each side and gives
TRUE if at least one of them is TRUE.
Example:
v <- c(0,0,TRUE,2+2i)
t <- c(0,3,TRUE,2+3i)
print(v[1] || t[1])
it produces the following result −
[1] FALSE

Assignment Operators
These operators are used to assign values to vectors.
Operator: <- or = or <<-
Description: Called Left Assignment.
Example:
v1 <- c(3,1,TRUE,2+3i)
v2 <<- c(3,1,TRUE,2+3i)
v3 = c(3,1,TRUE,2+3i)
print(v1)
print(v2)
print(v3)
it produces the following result −
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i

Operator: -> or ->>
Description: Called Right Assignment.
Example:
c(3,1,TRUE,2+3i) -> v1
c(3,1,TRUE,2+3i) ->> v2
print(v1)
print(v2)
it produces the following result −
[1] 3+0i 1+0i 1+0i 2+3i
[1] 3+0i 1+0i 1+0i 2+3i
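
At the top level <- and <<- behave the same, which is why the outputs above are identical. The difference shows up inside a function: <- creates a local variable, while <<- assigns to a variable in an enclosing environment. A small sketch with illustrative names:

```r
counter <- 0

# <- creates a local copy; the outer counter is untouched.
increment_local <- function() {
  counter <- counter + 1
  invisible(counter)
}

# <<- modifies the counter defined outside the function.
increment_outer <- function() {
  counter <<- counter + 1
  invisible(counter)
}

increment_local()
after_local <- counter

increment_outer()
after_outer <- counter

print(after_local)
print(after_outer)
```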
Miscellaneous Operators
These operators are used for specific purposes and not for general mathematical or logical
computation.

Operator: :
Description: Colon operator. It creates a series of numbers in sequence for a vector.
Example:
v <- 2:8
print(v)
it produces the following result −
[1] 2 3 4 5 6 7 8

Operator: %in%
Description: This operator is used to identify if an element belongs to a vector.
Example:
v1 <- 8
v2 <- 12
t <- 1:10
print(v1 %in% t)
print(v2 %in% t)
it produces the following result −
[1] TRUE
[1] FALSE

Operator: %*%
Description: This is the matrix multiplication operator. In this example it is used to multiply
a matrix with its transpose.
Example:
M = matrix( c(2,6,5,1,10,4), nrow = 2,ncol = 3,byrow = TRUE)
t = M %*% t(M)
print(t)
it produces the following result −
     [,1] [,2]
[1,]   65   82
[2,]   82  117
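
Both %in% and %*% also work beyond the cases shown above. As a sketch with illustrative names, %in% can test several values at once, and %*% can multiply a matrix by a vector as long as the dimensions are compatible:

```r
# Test several candidates against a vector in one call.
membership <- c(8, 12) %in% 1:10

# A 2x3 matrix times a length-3 vector gives a 2x1 matrix.
M <- matrix(c(2, 6, 5, 1, 10, 4), nrow = 2, ncol = 3, byrow = TRUE)
product <- M %*% c(1, 0, 1)

print(membership)
print(product)
```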

R - Decision making
Decision making structures require the programmer to specify one or more conditions to
be evaluated or tested by the program, along with a statement or statements to be
executed if the condition is determined to be true, and optionally, other statements to be
executed if the condition is determined to be false.

R provides the following types of decision making statements.

Sr.No. Statement & Description
1 if statement
An if statement consists of a Boolean expression followed by one or more statements.
Syntax
The basic syntax for creating an if statement in R is −
if(boolean_expression) {
   # statement(s) will execute if the boolean expression is true.
}
Example
x <- 30L
if(is.integer(x)) {
   print("X is an Integer")
}
When the above code is executed, it produces the following result −
[1] "X is an Integer"
2 if...else statement
An if statement can be followed by an optional else statement, which executes when the
Boolean expression is false.
Syntax
The basic syntax for creating an if...else statement in R is −
if(boolean_expression) {
   # statement(s) will execute if the boolean expression is true.
} else {
   # statement(s) will execute if the boolean expression is false.
}
Example
x <- c("what","is","truth")
if("Truth" %in% x) {
   print("Truth is found")
} else {
   print("Truth is not found")
}
When the above code is executed, it produces the following result −
[1] "Truth is not found"
The if...else if...else Statement
Syntax
The basic syntax for creating an if...else if...else statement in R is −
if(boolean_expression_1) {
   # Executes when boolean_expression_1 is true.
} else if(boolean_expression_2) {
   # Executes when boolean_expression_2 is true.
} else if(boolean_expression_3) {
   # Executes when boolean_expression_3 is true.
} else {
   # Executes when none of the above conditions is true.
}
Example
x <- c("what","is","truth")
if("Truth" %in% x) {
   print("Truth is found the first time")
} else if ("truth" %in% x) {
   print("truth is found the second time")
} else {
   print("No truth found")
}
When the above code is executed, it produces the following result −
[1] "truth is found the second time"
3 switch statement
A switch statement allows a variable to be tested for equality against a list of values. Each
value is called a case, and the variable being switched on is checked for each case.
Syntax
The basic syntax for creating a switch statement in R is −
switch(expression, case1, case2, case3....)

Example
x <- switch(
3,
"first",
"second",
"third",
"fourth"
)
print(x)
When the above code is executed, it produces the following result −
[1] "third"
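
The example above selects a case by position. switch() can also match a character string against named cases, with an unnamed final value acting as the default. The function describe_operator below is a hypothetical helper written only for this sketch.

```r
describe_operator <- function(op) {
  switch(op,
    "+" = "addition",
    "-" = "subtraction",
    "unknown operator"   # default when no name matches
  )
}

plus_label <- describe_operator("+")
other_label <- describe_operator("^")

# With a numeric selector that is out of range, switch() returns NULL.
out_of_range <- switch(5, "first", "second")

print(plus_label)
print(other_label)
print(is.null(out_of_range))
```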

R - Loops
A loop statement allows us to execute a statement or group of statements multiple times.
R programming language provides the following kinds of loop to handle looping
requirements.
Sr.No. Loop Type & Description
1 repeat loop
The repeat loop executes the same code again and again until a stop condition
is met.
Syntax
repeat {
commands
if(condition) {
break
}
}
Example
v <- c("Hello","loop")
cnt <- 2
repeat {
print(v)
cnt <- cnt+1

if(cnt > 5) {
break
}
}
When the above code is executed, it produces the following result −
[1] "Hello" "loop"
[1] "Hello" "loop"
[1] "Hello" "loop"
[1] "Hello" "loop"
2 while loop
The while loop executes the same code again and again as long as its condition remains true. It tests
the condition before executing the loop body.
Syntax
while (test_expression) {
statement
}
Example
v <- c("Hello","while loop")
cnt <- 2

while (cnt < 7) {
   print(v)
   cnt = cnt + 1
}
When the above code is executed, it produces the following result −
[1] "Hello" "while loop"
[1] "Hello" "while loop"
[1] "Hello" "while loop"
[1] "Hello" "while loop"
[1] "Hello" "while loop"
3 for loop
A for loop is a repetition control structure that executes the loop body once for each
element of a vector or list, so the number of iterations is known before the loop starts.
Syntax
for (value in vector) {
statements
}
Example
v <- LETTERS[1:4]
for ( i in v) {
print(i)
}
When the above code is executed, it produces the following result −
[1] "A"
[1] "B"
[1] "C"
[1] "D"

Loop Control Statements


Loop control statements change execution from its normal sequence.

R supports the following control statements.
Sr.No. Control Statement & Description
1 break statement
Terminates the loop statement and transfers execution to the statement immediately
following the loop.
2 next statement
Skips the remainder of the current iteration and starts the next iteration of the loop.
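
A minimal sketch of both control statements in one loop: next skips the even numbers and break stops the loop once the value exceeds 4. The vector kept_values is an illustrative name used to collect what the loop keeps.

```r
kept_values <- c()

for (i in 1:6) {
  if (i %% 2 == 0) {
    next    # skip even numbers and go to the next iteration
  }
  if (i > 4) {
    break   # leave the loop entirely
  }
  kept_values <- c(kept_values, i)
}

print(kept_values)
```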

R – Functions
A function is a set of statements organized together to perform a specific task. R has a
large number of in-built functions and the user can create their own functions.

In R, a function is an object so the R interpreter is able to pass control to the function,
along with arguments that may be necessary for the function to accomplish the actions.

The function in turn performs its task and returns control to the interpreter as well as any
result which may be stored in other objects.

Function Definition
An R function is created by using the keyword function. The basic syntax of an R function
definition is as follows −

function_name <- function(arg_1, arg_2, ...) {
   # Function body
}
Function Components
The different parts of a function are −

• Function Name − This is the actual name of the function. It is stored in the R
environment as an object with this name.
• Arguments − An argument is a placeholder. When a function is invoked, you pass a
value to the argument. Arguments are optional; that is, a function may contain no
arguments. Also arguments can have default values.
• Function Body − The function body contains a collection of statements that defines
what the function does.
• Return Value − The return value of a function is the last expression in the function body to
be evaluated.
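
To illustrate the return value rule, the sketch below defines two hypothetical functions: add_squares relies on the last evaluated expression, while sign_label uses an explicit return() to leave early.

```r
# The value of the last expression is returned automatically.
add_squares <- function(a, b = 2) {
  a^2 + b^2
}

# return() exits the function immediately with the given value.
sign_label <- function(x) {
  if (x < 0) {
    return("negative")
  }
  "non-negative"
}

print(add_squares(3))
print(add_squares(3, b = 4))
print(sign_label(-1))
print(sign_label(5))
```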
R has many in-built functions which can be directly called in the program without defining
them first. We can also create and use our own functions referred as user
defined functions.

Built-in Function
Simple examples of in-built functions are seq(), mean(), max(), sum() and paste().
They are called directly by user-written programs.

# Create a sequence of numbers from 32 to 44.
print(seq(32,44))
# Find mean of numbers from 25 to 82.
print(mean(25:82))
# Find sum of numbers from 41 to 68.
print(sum(41:68))

When we execute the above code, it produces the following result −

[1] 32 33 34 35 36 37 38 39 40 41 42 43 44
[1] 53.5
[1] 1526

User-defined Function
We can create user-defined functions in R. They are specific to what a user wants and
once created they can be used like the built-in functions. Below is an example of how a
function is created and used.

# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
   for(i in 1:a) {
      b <- i^2
      print(b)
   }
}
# Call the function new.function supplying 6 as an argument.
new.function(6)

When we execute the above code, it produces the following result −


[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36

Calling a Function without an Argument


# Create a function without an argument.
new.function <- function() {
   for(i in 1:5) {
      print(i^2)
   }
}
# Call the function without supplying an argument.
new.function()

When we execute the above code, it produces the following result −

[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
Calling a Function with Argument Values (by position and by name)
The arguments to a function call can be supplied in the same sequence as defined in the
function or they can be supplied in a different sequence but assigned to the names of the
arguments.

# Create a function with arguments.


new.function <- function(a,b,c) {
   result <- a * b + c
   print(result)
}
# Call the function by position of arguments.
new.function(5,3,11)
# Call the function by names of the arguments.
new.function(a = 11, b = 5, c = 3)

When we execute the above code, it produces the following result −

[1] 26
[1] 58

Calling a Function with Default Argument


We can define the value of the arguments in the function definition and call the function
without supplying any argument to get the default result. But we can also call such
functions by supplying new values of the argument and get non default result.

# Create a function with arguments.


new.function <- function(a = 3, b = 6) {
   result <- a * b
   print(result)
}
# Call the function without giving any argument.
new.function()
# Call the function with new values for the arguments.
new.function(9,5)

When we execute the above code, it produces the following result −

[1] 18
[1] 45

Lazy Evaluation of Function


Arguments to functions are evaluated lazily, which means they are evaluated only when
needed by the function body.

# Create a function with arguments.
new.function <- function(a, b) {
   print(a^2)
   print(a)
   print(b)
}
# Evaluate the function without supplying one of the arguments.
new.function(6)

When we execute the above code, it produces the following result −

[1] 36
[1] 6
Error in print(b) : argument "b" is missing, with no default
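
Two further consequences of lazy evaluation can be sketched as follows. The functions lazy_default and ignore_second are hypothetical: the first shows that a default value is computed only when the argument is first used, the second that an argument which is never used is never evaluated at all.

```r
# b's default refers to a, and is evaluated when b is first used,
# by which time a has already been changed.
lazy_default <- function(a, b = a * 2) {
  a <- a + 1
  b
}

# The second argument is never used, so stop() is never run.
ignore_second <- function(a, b) {
  a
}

default_result <- lazy_default(1)
ignored_result <- ignore_second(5, stop("this is never evaluated"))

print(default_result)
print(ignored_result)
```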
Experiment No. 1 Date:

Aim: Load the 'iris.csv' file and display the names and type of each column. Find
statistics such as min, max, range, mean, median, variance, standard deviation for
each column of data.

Loading Iris Data Set in R

Loading Data using data()
The iris data set is included with R by default. We can load it by using the data() function:
data() - It is used to load specified data sets
Syntax:
> data("iris")

This loads the iris data set into the R workspace.

If we want to export this iris data set as a CSV file we can do it as below.

Export to CSV in R

data("iris")
# Export as CSV file in R
write.csv(iris, file = "iris.csv")
# This gets saved in your working directory as iris.csv

To know your current working directory


Syntax:
getwd()

If you want to set your working directory to a folder of your choice use
Syntax:
setwd()
Example:
> setwd("c:/x/")

Reading Data From a CSV File

Now suppose we have the iris data set in a CSV file named iris.csv.

We can import this data set from the CSV file by using the read.csv() function:
Syntax:
?read.csv
It opens the help window of the read.csv function.
Syntax:
read.csv() - It is used to read CSV files and create a data frame from them.
We import the iris data by giving the path of the data file "iris.csv".
Example:
iris <- read.csv("C:\\x\\iris.csv")
Display the names of each column of iris data:

> colnames(iris)
[1] "X" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

The first column, X, holds the row names that write.csv() stored in the file.

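Type of each column of iris data:

> typeof(iris$Sepal.Length)
[1] "double"
> typeof(iris$Sepal.Width)
[1] "double"
> typeof(iris$Petal.Length)
[1] "double"
> typeof(iris$Petal.Width)
[1] "double"
> typeof(iris$Species)
[1] "integer"

Note − Species reports "integer" here because it was read as a factor, which is the default in
R versions before 4.0.0. From R 4.0.0 onward read.csv() keeps strings as character unless
stringsAsFactors = TRUE is supplied, so typeof(iris$Species) returns "character".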

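Minimum value of each column of iris data:

> apply(iris,2,min)
X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
" 1" "4.3" "2.0" "1.0" "0.1" "setosa"

Note − apply() first converts the data frame to a matrix. Because Species is not numeric,
every column becomes character, which is why the results are quoted strings.

Maximum value of each column of iris data:

> apply(iris,2,max)
X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
"150" "7.9" "4.4" "6.9" "2.5" "virginica"

Range of each column of iris data:

The range of an observation variable is the difference of its largest and smallest data
values. It is a measure of how far apart the entire data spreads in value. The range()
function in R returns the smallest and largest values rather than their difference.

> apply(iris,2,range)
X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
[1,] " 1" "4.3" "2.0" "1.0" "0.1" "setosa"
[2,] "150" "7.9" "4.4" "6.9" "2.5" "virginica"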
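
To get numeric rather than character results, one option is to restrict the calculation to the numeric columns. The sketch below is an alternative to the apply() calls above, not the method used in this experiment; it assumes the copy of iris that ships with R (datasets::iris, which has no X column), and the names iris_builtin, numeric_cols, col_min, col_max and col_range are illustrative.

```r
# Use the data set bundled with R so the example does not depend on a CSV file.
iris_builtin <- datasets::iris

# Keep only the numeric columns (drops Species).
numeric_cols <- iris_builtin[sapply(iris_builtin, is.numeric)]

# Column-wise minimum, maximum and range (largest minus smallest).
col_min <- sapply(numeric_cols, min)
col_max <- sapply(numeric_cols, max)
col_range <- col_max - col_min

print(rbind(min = col_min, max = col_max, range = col_range))
```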

Mean
The mean of an observation variable is a numerical measure of the central location of the
data values. It is the sum of the data values divided by the number of values.
Hence, for a data sample of size n, its sample mean is defined as follows:
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
Similarly, for a data population of size N, the population mean is:
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i

The function mean() is used to calculate this in R.

Syntax
The basic syntax for calculating mean in R is −
mean(x, trim = 0, na.rm = FALSE, ...)
Following is the description of the parameters used −
• x is the input vector.
• trim is the fraction of observations to drop from each end of the sorted vector.
• na.rm is used to remove the missing values from the input vector.
Example:
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x)
print(result.mean)

When we execute the above code, it produces the following result −


[1] 8.22
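The trim and na.rm parameters described above can be illustrated with the same vector. This is a small sketch; the variable names x_na, mean_with_na, mean_no_na and trimmed are chosen only for illustration.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# A missing value makes the plain mean NA.
x_na <- c(x, NA)
mean_with_na <- mean(x_na)

# na.rm = TRUE drops the NA before averaging.
mean_no_na <- mean(x_na, na.rm = TRUE)

# trim = 0.1 drops 10% of the observations (here one value)
# from each end of the sorted vector, so -21 and 54 are ignored.
trimmed <- mean(x, trim = 0.1)

print(mean_with_na)
print(mean_no_na)
print(trimmed)
```

A trimmed mean is less sensitive to extreme values such as 54 and -21 than the ordinary mean.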

Mean of each column of iris data:


> print(mean(iris$X))
[1] 75.5
> print(mean(iris$Sepal.Length))
[1] 5.843333
> print(mean(iris$Sepal.Width))
[1] 3.057333
> print(mean(iris$Petal.Length))
[1] 3.758
> print(mean(iris$Petal.Width))
[1] 1.199333
> print(mean(iris$Species))
[1] NA
Warning message:
In mean.default(iris$Species) :
argument is not numeric or logical: returning NA
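The warning above arises because Species is not numeric. One way to avoid it, and to compute all the means in a single call, is to select the numeric columns first. The sketch assumes R's built-in iris data set (without the "X" column of the CSV file); numeric_iris and col_means are illustrative names.

```r
# Uses R's built-in iris data set, not the CSV file read earlier.
data(iris)

# Keep only the numeric columns, which leaves out Species.
numeric_iris <- iris[, sapply(iris, is.numeric)]

# colMeans() returns one mean per column as a named numeric vector.
col_means <- colMeans(numeric_iris)
print(col_means)
```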

Median
The middle value of a sorted data series is called the median. When the series has an even
number of values, the median is the average of the two middle values. The median() function
is used in R to calculate this value.
Syntax
The basic syntax for calculating median in R is −
median(x, na.rm = FALSE)
Following is the description of the parameters used −
• x is the input vector.
• na.rm is used to remove the missing values from the input vector.
Example
# Create the vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find the median.
median.result <- median(x)
print(median.result)

When we execute the above code, it produces the following result −


[1] 5.6

Median of each column of iris data:


> print(median(iris$X))
[1] 75.5
> print(median(iris$Sepal.Length))
[1] 5.8
> print(median(iris$Sepal.Width))
[1] 3
> print(median(iris$Petal.Length))
[1] 4.35
> print(median(iris$Petal.Width))
[1] 1.3

Variance
The variance is a numerical measure of how the data values are dispersed around the mean.
In particular, the sample variance is defined as:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

Similarly, the population variance is defined in terms of the population mean μ and
population size N:
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$

The function var() is used to calculate the sample variance in R.


Syntax:
var(x, y = NULL, na.rm = FALSE, use)
Following is the description of the parameters used −
• x is a numeric vector, matrix or data frame.
• y is NULL (default) or a vector, matrix or data frame with compatible dimensions to
x. The default is equivalent to y = x (but more efficient).
• na.rm is logical. Should missing values be removed?
• use is an optional character string giving a method for computing covariances in the
presence of missing values.

Variance of each column of iris data:


> print(var(iris$X))
[1] 1887.5
> print(var(iris$Sepal.Length))
[1] 0.6856935
> print(var(iris$Sepal.Width))
[1] 0.1899794
> print(var(iris$Petal.Length))
[1] 3.116278
> print(var(iris$Petal.Width))
[1] 0.5810063
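The difference between the sample and population definitions can be checked by hand. The sketch below reuses the example vector from the Mean section and compares a manual calculation with var(), which divides by n - 1; the names sample_var and pop_var are illustrative.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)
n <- length(x)

# Sum of squared deviations from the mean.
ss <- sum((x - mean(x))^2)

# Sample variance divides by n - 1; this is what var() computes.
sample_var <- ss / (n - 1)

# Population variance divides by n.
pop_var <- ss / n

print(c(var = var(x), sample = sample_var, population = pop_var))

# The sample standard deviation is the square root of the sample variance.
print(c(sd = sd(x), sqrt_var = sqrt(sample_var)))
```

Base R has no separate function for the population variance; it can be obtained by rescaling var(x) with (n - 1) / n.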

Standard Deviation
The standard deviation of an observation variable is the square root of its variance. The
sd() function computes the standard deviation of the values in x. If na.rm is TRUE then
missing values are removed before computation proceeds.

POPULATION STANDARD DEVIATION


The standard deviation of a population is the square root of the population variance. The
symbol for the population standard deviation is σ (sigma). Its formula is
$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$

SAMPLE STANDARD DEVIATION


The sample standard deviation, an estimate of the standard deviation of a population, is the
square root of the sample variance. Its symbol is s and its formula is
$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Syntax:
sd(x, na.rm = FALSE)
Following is the description of the parameters used −
• x is a numeric vector or an R object which is coercible to one by as.double(x).
• na.rm is logical. Should missing values be removed?

Standard Deviation of each column of iris data:


> print(sd(iris$X))
[1] 43.44537
> print(sd(iris$Sepal.Length))
[1] 0.8280661
> print(sd(iris$Sepal.Width))
[1] 0.4358663
> print(sd(iris$Petal.Length))
[1] 1.765298
> print(sd(iris$Petal.Width))
[1] 0.7622377
Experiment No. 2 Date:

Aim: Write R program to normalize the variables into 0 to 1 scale using min-max
normalization.

Normalization
Normalization or scaling refers to bringing all the columns into the same range.

Min-Max normalization:
It is a simple way of scaling the values in a column: the smallest value maps to 0, the largest
maps to 1, and every other value falls proportionally in between. Here is the formula
$z = \frac{x - \min(x)}{\max(x) - \min(x)}$

Converting it into R can be pretty simple as follows

> z <- (value-min(column))/(max(column)-min(column))

a) R program to normalize the variables. Note that the function below standardizes each
column to zero mean and unit standard deviation (z-score) rather than rescaling it to the
0 to 1 range, which is why the output contains negative values:
Program:
setwd("c:\\x")
mydf <- read.csv("c:\\x\\iris.csv")
print(mydf)

normalize <- function(df,cols)
{
  result <- df # make a copy of the input data frame
  for (j in cols) { # each specified col
    m <- mean(df[,j]) # column mean
    std <- sd(df[,j]) # column (sample) sd
    for (i in 1:nrow(result)) { # each row of cur col
      result[i,j] <- (result[i,j] - m) / std
    }
  }
  return(result)
}

cols <- c(2,3,4,5)

print(normalize(mydf,cols))

Output:
X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 1 -0.89767388 1.01560199 -1.33575163 -1.3110521482 setosa
2 2 -1.13920048 -0.13153881 -1.33575163 -1.3110521482 setosa
3 3 -1.38072709 0.32731751 -1.39239929 -1.3110521482 setosa
4 4 -1.50149039 0.09788935 -1.27910398 -1.3110521482 setosa
5 5 -1.01843718 1.24503015 -1.33575163 -1.3110521482 setosa
6 6 -0.53538397 1.93331463 -1.16580868 -1.0486667950 setosa
7 7 -1.50149039 0.78617383 -1.33575163 -1.1798594716 setosa
8 8 -1.01843718 0.78617383 -1.27910398 -1.3110521482 setosa
9 9 -1.74301699 -0.36096697 -1.33575163 -1.3110521482 setosa
10 10 -1.13920048 0.09788935 -1.27910398 -1.4422448248 setosa
11 11 -0.53538397 1.47445831 -1.27910398 -1.3110521482 setosa
12 12 -1.25996379 0.78617383 -1.22245633 -1.3110521482 setosa
13 13 -1.25996379 -0.13153881 -1.33575163 -1.4422448248 setosa
14 14 -1.86378030 -0.13153881 -1.50569459 -1.4422448248 setosa
15 15 -0.05233076 2.16274279 -1.44904694 -1.3110521482 setosa
16 16 -0.17309407 3.08045544 -1.27910398 -1.0486667950 setosa
17 17 -0.53538397 1.93331463 -1.39239929 -1.0486667950 setosa
18 18 -0.89767388 1.01560199 -1.33575163 -1.1798594716 setosa
19 19 -0.17309407 1.70388647 -1.16580868 -1.1798594716 setosa
20 20 -0.89767388 1.70388647 -1.27910398 -1.1798594716 setosa
21 21 -0.53538397 0.78617383 -1.16580868 -1.3110521482 setosa
22 22 -0.89767388 1.47445831 -1.27910398 -1.0486667950 setosa
23 23 -1.50149039 1.24503015 -1.56234224 -1.3110521482 setosa
24 24 -0.89767388 0.55674567 -1.16580868 -0.9174741184 setosa
25 25 -1.25996379 0.78617383 -1.05251337 -1.3110521482 setosa
26 26 -1.01843718 -0.13153881 -1.22245633 -1.3110521482 setosa
27 27 -1.01843718 0.78617383 -1.22245633 -1.0486667950 setosa
28 28 -0.77691058 1.01560199 -1.27910398 -1.3110521482 setosa
29 29 -0.77691058 0.78617383 -1.33575163 -1.3110521482 setosa
30 30 -1.38072709 0.32731751 -1.22245633 -1.3110521482 setosa
31 31 -1.25996379 0.09788935 -1.22245633 -1.3110521482 setosa
32 32 -0.53538397 0.78617383 -1.27910398 -1.0486667950 setosa
33 33 -0.77691058 2.39217095 -1.27910398 -1.4422448248 setosa
34 34 -0.41462067 2.62159911 -1.33575163 -1.3110521482 setosa
35 35 -1.13920048 0.09788935 -1.27910398 -1.3110521482 setosa
36 36 -1.01843718 0.32731751 -1.44904694 -1.3110521482 setosa
37 37 -0.41462067 1.01560199 -1.39239929 -1.3110521482 setosa
38 38 -1.13920048 1.24503015 -1.33575163 -1.4422448248 setosa
39 39 -1.74301699 -0.13153881 -1.39239929 -1.3110521482 setosa
40 40 -0.89767388 0.78617383 -1.27910398 -1.3110521482 setosa
41 41 -1.01843718 1.01560199 -1.39239929 -1.1798594716 setosa
42 42 -1.62225369 -1.73753594 -1.39239929 -1.1798594716 setosa
43 43 -1.74301699 0.32731751 -1.39239929 -1.3110521482 setosa
44 44 -1.01843718 1.01560199 -1.22245633 -0.7862814418 setosa
45 45 -0.89767388 1.70388647 -1.05251337 -1.0486667950 setosa
46 46 -1.25996379 -0.13153881 -1.33575163 -1.1798594716 setosa
47 47 -0.89767388 1.70388647 -1.22245633 -1.3110521482 setosa
48 48 -1.50149039 0.32731751 -1.33575163 -1.3110521482 setosa
49 49 -0.65614727 1.47445831 -1.27910398 -1.3110521482 setosa
50 50 -1.01843718 0.55674567 -1.33575163 -1.3110521482 setosa
51 51 1.39682886 0.32731751 0.53362088 0.2632599711 versicolor
52 52 0.67224905 0.32731751 0.42032558 0.3944526477 versicolor
53 53 1.27606556 0.09788935 0.64691619 0.3944526477 versicolor
54 54 -0.41462067 -1.73753594 0.13708732 0.1320672944 versicolor
55 55 0.79301235 -0.59039513 0.47697323 0.3944526477 versicolor
56 56 -0.17309407 -0.59039513 0.42032558 0.1320672944 versicolor
57 57 0.55148575 0.55674567 0.53362088 0.5256453243 versicolor
58 58 -1.13920048 -1.50810778 -0.25944625 -0.2615107354 versicolor
59 59 0.91377565 -0.36096697 0.47697323 0.1320672944 versicolor
60 60 -0.77691058 -0.81982329 0.08043967 0.2632599711 versicolor
61 61 -1.01843718 -2.42582042 -0.14615094 -0.2615107354 versicolor
62 62 0.06843254 -0.13153881 0.25038262 0.3944526477 versicolor
63 63 0.18919584 -1.96696410 0.13708732 -0.2615107354 versicolor
64 64 0.30995914 -0.36096697 0.53362088 0.2632599711 versicolor
65 65 -0.29385737 -0.36096697 -0.08950329 0.1320672944 versicolor
66 66 1.03453895 0.09788935 0.36367793 0.2632599711 versicolor
67 67 -0.29385737 -0.13153881 0.42032558 0.3944526477 versicolor
68 68 -0.05233076 -0.81982329 0.19373497 -0.2615107354 versicolor
69 69 0.43072244 -1.96696410 0.42032558 0.3944526477 versicolor
70 70 -0.29385737 -1.27867961 0.08043967 -0.1303180588 versicolor
71 71 0.06843254 0.32731751 0.59026853 0.7880306775 versicolor
72 72 0.30995914 -0.59039513 0.13708732 0.1320672944 versicolor
73 73 0.55148575 -1.27867961 0.64691619 0.3944526477 versicolor
74 74 0.30995914 -0.59039513 0.53362088 0.0008746178 versicolor
75 75 0.67224905 -0.36096697 0.30703027 0.1320672944 versicolor
76 76 0.91377565 -0.13153881 0.36367793 0.2632599711 versicolor
77 77 1.15530226 -0.59039513 0.59026853 0.2632599711 versicolor
78 78 1.03453895 -0.13153881 0.70356384 0.6568380009 versicolor
79 79 0.18919584 -0.36096697 0.42032558 0.3944526477 versicolor
80 80 -0.17309407 -1.04925145 -0.14615094 -0.2615107354 versicolor
81 81 -0.41462067 -1.50810778 0.02379201 -0.1303180588 versicolor
82 82 -0.41462067 -1.50810778 -0.03285564 -0.2615107354 versicolor
83 83 -0.05233076 -0.81982329 0.08043967 0.0008746178 versicolor
84 84 0.18919584 -0.81982329 0.76021149 0.5256453243 versicolor
85 85 -0.53538397 -0.13153881 0.42032558 0.3944526477 versicolor
86 86 0.18919584 0.78617383 0.42032558 0.5256453243 versicolor
87 87 1.03453895 0.09788935 0.53362088 0.3944526477 versicolor
88 88 0.55148575 -1.73753594 0.36367793 0.1320672944 versicolor
89 89 -0.29385737 -0.13153881 0.19373497 0.1320672944 versicolor
90 90 -0.41462067 -1.27867961 0.13708732 0.1320672944 versicolor
91 91 -0.41462067 -1.04925145 0.36367793 0.0008746178 versicolor
92 92 0.30995914 -0.13153881 0.47697323 0.2632599711 versicolor
93 93 -0.05233076 -1.04925145 0.13708732 0.0008746178 versicolor
94 94 -1.01843718 -1.73753594 -0.25944625 -0.2615107354 versicolor
95 95 -0.29385737 -0.81982329 0.25038262 0.1320672944 versicolor
96 96 -0.17309407 -0.13153881 0.25038262 0.0008746178 versicolor
97 97 -0.17309407 -0.36096697 0.25038262 0.1320672944 versicolor
98 98 0.43072244 -0.36096697 0.30703027 0.1320672944 versicolor
99 99 -0.89767388 -1.27867961 -0.42938920 -0.1303180588 versicolor
100 100 -0.17309407 -0.59039513 0.19373497 0.1320672944 versicolor
101 101 0.55148575 0.55674567 1.27004036 1.7063794137 virginica
102 102 -0.05233076 -0.81982329 0.76021149 0.9192233541 virginica
103 103 1.51759216 -0.13153881 1.21339271 1.1816087073 virginica
104 104 0.55148575 -0.36096697 1.04344975 0.7880306775 virginica
105 105 0.79301235 -0.13153881 1.15674505 1.3128013839 virginica
106 106 2.12140867 -0.13153881 1.60992627 1.1816087073 virginica
107 107 -1.13920048 -1.27867961 0.42032558 0.6568380009 virginica
108 108 1.75911877 -0.36096697 1.43998331 0.7880306775 virginica
109 109 1.03453895 -1.27867961 1.15674505 0.7880306775 virginica
110 110 1.63835547 1.24503015 1.32668801 1.7063794137 virginica
111 111 0.79301235 0.32731751 0.76021149 1.0504160307 virginica
112 112 0.67224905 -0.81982329 0.87350679 0.9192233541 virginica
113 113 1.15530226 -0.13153881 0.98680210 1.1816087073 virginica
114 114 -0.17309407 -1.27867961 0.70356384 1.0504160307 virginica
115 115 -0.05233076 -0.59039513 0.76021149 1.5751867371 virginica
116 116 0.67224905 0.32731751 0.87350679 1.4439940605 virginica
117 117 0.79301235 -0.13153881 0.98680210 0.7880306775 virginica
118 118 2.24217198 1.70388647 1.66657392 1.3128013839 virginica
119 119 2.24217198 -1.04925145 1.77986923 1.4439940605 virginica
120 120 0.18919584 -1.96696410 0.70356384 0.3944526477 virginica
121 121 1.27606556 0.32731751 1.10009740 1.4439940605 virginica
122 122 -0.29385737 -0.59039513 0.64691619 1.0504160307 virginica
123 123 2.24217198 -0.59039513 1.66657392 1.0504160307 virginica
124 124 0.55148575 -0.81982329 0.64691619 0.7880306775 virginica
125 125 1.03453895 0.55674567 1.10009740 1.1816087073 virginica
126 126 1.63835547 0.32731751 1.27004036 0.7880306775 virginica
127 127 0.43072244 -0.59039513 0.59026853 0.7880306775 virginica
128 128 0.30995914 -0.13153881 0.64691619 0.7880306775 virginica
129 129 0.67224905 -0.59039513 1.04344975 1.1816087073 virginica
130 130 1.63835547 -0.13153881 1.15674505 0.5256453243 virginica
131 131 1.87988207 -0.59039513 1.32668801 0.9192233541 virginica
132 132 2.48369858 1.70388647 1.49663097 1.0504160307 virginica
133 133 0.67224905 -0.59039513 1.04344975 1.3128013839 virginica
134 134 0.55148575 -0.59039513 0.76021149 0.3944526477 virginica
135 135 0.30995914 -1.04925145 1.04344975 0.2632599711 virginica
136 136 2.24217198 -0.13153881 1.32668801 1.4439940605 virginica
137 137 0.55148575 0.78617383 1.04344975 1.5751867371 virginica
138 138 0.67224905 0.09788935 0.98680210 0.7880306775 virginica
139 139 0.18919584 -0.13153881 0.59026853 0.7880306775 virginica
140 140 1.27606556 0.09788935 0.93015445 1.1816087073 virginica
141 141 1.03453895 0.09788935 1.04344975 1.5751867371 virginica
142 142 1.27606556 0.09788935 0.76021149 1.4439940605 virginica
143 143 -0.05233076 -0.81982329 0.76021149 0.9192233541 virginica
144 144 1.15530226 0.32731751 1.21339271 1.4439940605 virginica
145 145 1.03453895 0.55674567 1.10009740 1.7063794137 virginica
146 146 1.03453895 -0.13153881 0.81685914 1.4439940605 virginica
147 147 0.55148575 -1.27867961 0.70356384 0.9192233541 virginica
148 148 0.79301235 -0.13153881 0.81685914 1.0504160307 virginica
149 149 0.43072244 0.78617383 0.93015445 1.4439940605 virginica
150 150 0.06843254 -0.13153881 0.76021149 0.7880306775 virginica
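The values printed above are centered on the column mean and scaled by the standard deviation, so they are not confined to the 0 to 1 range. A min-max version of the same function might look like the sketch below. It assumes R's built-in iris data set, where the measurements are columns 1 to 4 (they are columns 2 to 5 in the CSV file, which has an extra "X" column); min_max_normalize is an illustrative name.

```r
# Rescale the selected columns of a data frame to the 0 to 1 range.
# Assumes every selected column is numeric and not constant
# (a constant column would cause a division by zero).
min_max_normalize <- function(df, cols) {
  result <- df                     # copy of the input data frame
  for (j in cols) {
    lo <- min(df[[j]])             # column minimum
    hi <- max(df[[j]])             # column maximum
    result[[j]] <- (df[[j]] - lo) / (hi - lo)
  }
  result
}

# Uses R's built-in iris data set, not the CSV file read earlier.
data(iris)
scaled <- min_max_normalize(iris, 1:4)

print(head(scaled))
# Each scaled column should now run from 0 to 1.
print(sapply(scaled[, 1:4], range))
```

The whole column is rescaled in one vectorized assignment, so no loop over rows is needed.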
b) By applying min-max normalization to a column (here Sepal.Length)
Example:

> z <- (iris$Sepal.Length-min(iris$Sepal.Length))/(max(iris$Sepal.Length)-min(iris$Sepal.Length))
> print(z)

Output:
[1] 0.22222222 0.16666667 0.11111111 0.08333333 0.19444444 0.30555556 0.08333333
0.19444444
[9] 0.02777778 0.16666667 0.30555556 0.13888889 0.13888889 0.00000000 0.41666667
0.38888889
[17] 0.30555556 0.22222222 0.38888889 0.22222222 0.30555556 0.22222222
0.08333333 0.22222222
[25] 0.13888889 0.19444444 0.19444444 0.25000000 0.25000000 0.11111111
0.13888889 0.30555556
[33] 0.25000000 0.33333333 0.16666667 0.19444444 0.33333333 0.16666667
0.02777778 0.22222222
[41] 0.19444444 0.05555556 0.02777778 0.19444444 0.22222222 0.13888889
0.22222222 0.08333333
[49] 0.27777778 0.19444444 0.75000000 0.58333333 0.72222222 0.33333333
0.61111111 0.38888889
[57] 0.55555556 0.16666667 0.63888889 0.25000000 0.19444444 0.44444444
0.47222222 0.50000000
[65] 0.36111111 0.66666667 0.36111111 0.41666667 0.52777778 0.36111111
0.44444444 0.50000000
[73] 0.55555556 0.50000000 0.58333333 0.63888889 0.69444444 0.66666667
0.47222222 0.38888889
[81] 0.33333333 0.33333333 0.41666667 0.47222222 0.30555556 0.47222222
0.66666667 0.55555556
[89] 0.36111111 0.33333333 0.33333333 0.50000000 0.41666667 0.19444444
0.36111111 0.38888889
[97] 0.38888889 0.52777778 0.22222222 0.38888889 0.55555556 0.41666667
0.77777778 0.55555556
[105] 0.61111111 0.91666667 0.16666667 0.83333333 0.66666667 0.80555556
0.61111111 0.58333333
[113] 0.69444444 0.38888889 0.41666667 0.58333333 0.61111111 0.94444444
0.94444444 0.47222222
[121] 0.72222222 0.36111111 0.94444444 0.55555556 0.66666667 0.80555556
0.52777778 0.50000000
[129] 0.58333333 0.80555556 0.86111111 1.00000000 0.58333333 0.55555556
0.50000000 0.94444444
[137] 0.55555556 0.58333333 0.47222222 0.72222222 0.66666667 0.72222222
0.41666667 0.69444444
[145] 0.66666667 0.66666667 0.55555556 0.61111111 0.52777778 0.44444444
Experiment No. 3 Date:

Aim: Generate histograms for any one variable (sepal length/ sepal width/ petal
length/ petal width) and generate scatter plots for every pair of variables showing
each species in different color

R – Histograms
A histogram represents the frequencies of values of a variable bucketed into ranges.
A histogram is similar to a bar chart, but the difference is that it groups the values into
continuous ranges. The height of each bar in a histogram represents the number of values
present in that range.

R creates histogram using hist() function. This function takes a vector as an input and
uses some more parameters to plot histograms.

Syntax
The basic syntax for creating a histogram using R is −

hist(v,main,xlab,xlim,ylim,breaks,col,border)
Following is the description of the parameters used −
• v is a vector containing numeric values used in histogram.
• main indicates title of the chart.
• col is used to set color of the bars.
• border is used to set border color of each bar.
• xlab is used to give description of x-axis.
• xlim is used to specify the range of values on the x-axis.
• ylim is used to specify the range of values on the y-axis.
• breaks is used to specify the breakpoints between bars (or the suggested number of bars),
which determines the width of each bar.
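The effect of the breaks parameter can be seen without drawing anything, because hist() returns the bin counts when called with plot = FALSE. The sketch below assumes R's built-in iris data set; coarse and fine are illustrative names.

```r
# Uses R's built-in iris data set.
data(iris)

# Four bins of width 1 covering the range 4 to 8.
coarse <- hist(iris$Sepal.Length, breaks = seq(4, 8, by = 1), plot = FALSE)

# Eight bins of width 0.5 over the same range.
fine <- hist(iris$Sepal.Length, breaks = seq(4, 8, by = 0.5), plot = FALSE)

# The counts differ per bin, but both sum to the number of observations.
print(coarse$counts)
print(fine$counts)
```

Narrower bins show more detail in the distribution, while wider bins give a smoother overall shape.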

Example
A simple histogram is created using input vector, label, col and border parameters.

The script given below will create and save the histogram in the current R working
directory.
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Give the chart file a name.
png(file = "histogram.png")
# Create the histogram.
hist(v,xlab = "Weight",col = "yellow",border = "blue")
# Save the file.
dev.off()
When we execute the above code, it produces the following result −

Histogram for Sepal Length Variable of Iris data


Syntax:
> hist(iris$Sepal.Length)

Output:
R - Scatterplots

Scatterplots show many points plotted in the Cartesian plane. Each point represents the
values of two variables. One variable is chosen for the horizontal axis and another for the
vertical axis.

The simple scatterplot is created using the plot() function.

Syntax
The basic syntax for creating scatterplot in R is −

plot(x, y, main, xlab, ylab, xlim, ylim, axes)


Following is the description of the parameters used −

• x is the data set whose values are the horizontal coordinates.

• y is the data set whose values are the vertical coordinates.

• main is the title of the graph.

• xlab is the label in the horizontal axis.

• ylab is the label in the vertical axis.

• xlim is the limits of the values of x used for plotting.

• ylim is the limits of the values of y used for plotting.

• axes indicates whether both axes should be drawn on the plot.

Example
We use the data set "mtcars" available in the R environment to create a basic scatterplot.
Let's use the columns "wt" and "mpg" in mtcars.

input <- mtcars[,c('wt','mpg')]

print(head(input))

When we execute the above code, it produces the following result −

                     wt  mpg
Mazda RX4         2.620 21.0
Mazda RX4 Wag     2.875 21.0
Datsun 710        2.320 22.8
Hornet 4 Drive    3.215 21.4
Hornet Sportabout 3.440 18.7
Valiant           3.460 18.1
Creating the Scatterplot
The below script will create a scatterplot graph for the relation between wt(weight) and
mpg(miles per gallon).

# Get the input values.
input <- mtcars[,c('wt','mpg')]
# Give the chart file a name.
png(file = "scatterplot.png")
# Plot the chart for cars with weight between 2.5 to 5 and mileage between 15 and 30.
plot(x = input$wt,y = input$mpg,
   xlab = "Weight",
   ylab = "Mileage",
   xlim = c(2.5,5),
   ylim = c(15,30),
   main = "Weight vs Mileage"
)
# Save the file.
dev.off()

When we execute the above code, it produces the following result −


Scatterplot Matrices
When we have more than two variables and we want to examine the relationship between each
variable and the remaining ones, we use a scatterplot matrix. We use the pairs() function
to create matrices of scatterplots.

Syntax
The basic syntax for creating scatterplot matrices in R is −

pairs(formula, data)
Following is the description of the parameters used −

• formula represents the series of variables used in pairs.

• data represents the data set from which the variables will be taken.

Example
Each variable is paired up with each of the remaining variables. A scatterplot is plotted for
each pair.

# Give the chart file a name.
png(file = "scatterplot_matrices.png")
# Plot the matrices between 4 variables giving 12 plots.
# One variable with 3 others and total 4 variables.
pairs(~wt+mpg+disp+cyl,data = mtcars,
   main = "Scatterplot Matrix")
# Save the file.
dev.off()

When the above code is executed we get the following output.


Scatter plots for every pair of variables showing each species in different color:
Scatter plots Generic X-Y Plotting
Description
Generic function for plotting of R objects. For simple scatter plots, plot.default will be used.
However, there are plot methods for many R objects, including functions, data.frames,
density objects, etc. Use methods(plot) and the documentation for these.
Syntax:
plot(x, y, ...)

Attributes:
x the coordinates of points in the plot. Alternatively, a single plotting structure, function
or any R object with a plot method can be provided.
y the y coordinates of points in the plot, optional if x is an appropriate structure.
... Arguments to be passed to methods, such as graphical parameters (see par). Many
methods will accept the following arguments:
type
what type of plot should be drawn. Possible types are
• "p" for points,
• "l" for lines,
• "b" for both,
• "c" for the lines part alone of "b",
• "o" for both ‘overplotted’,
• "h" for ‘histogram’ like (or ‘high-density’) vertical lines,
• "s" for stair steps,
• "S" for other steps,
• "n" for no plotting.
All other types give a warning or an error; using, e.g., type = "punkte" being
equivalent to type = "p" for S compatibility. Note that some methods, e.g.
plot.factor, do not accept this.
main
an overall title for the plot
sub
a sub title for the plot
xlab
a title for the x axis
ylab
a title for the y axis
asp
the y/x aspect ratio
1) plot(iris$[Link], iris$[Link], pch=23, bg=c("red","green3","blue")
[unclass(iris$Species)], main="Iris Data")

2) plot(iris$[Link], iris$[Link], pch=23, bg=c("red","green3","blue")


[unclass(iris$Species)], main="Iris Data")
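To cover every pair of variables in one figure, the pairs() function introduced earlier can be combined with the same colour-by-species idea. The sketch below assumes R's built-in iris data set, in which Species is a factor; species_cols and point_cols are illustrative names.

```r
# Uses R's built-in iris data set, where Species is a factor.
data(iris)

# One fill colour per species, in the order of the factor levels.
species_cols <- c("red", "green3", "blue")

# Map each observation to the colour of its species.
point_cols <- species_cols[as.integer(iris$Species)]

# Scatterplot matrix of the four measurements, coloured by species.
pairs(iris[, 1:4], main = "Iris Data", pch = 21, bg = point_cols)
```

With pch = 21 the points are filled circles, so the bg argument controls the fill colour of each point.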
Experiment No. 4 Date:

Aim: Generate box plots for each of the numerical attributes. Identify the attribute
with the highest variance.

R - Boxplots
Boxplots show how the data in a data set are distributed. A boxplot divides the data set
into four parts using three quartiles. The graph represents the minimum, maximum, median,
first quartile and third quartile of the data set. It is also useful for comparing the
distribution of data across data sets by drawing boxplots for each of them.

Boxplots are created in R by using the boxplot() function.

Syntax
The basic syntax to create a boxplot in R is −

boxplot(x, data, notch, varwidth, names, main)


Following is the description of the parameters used −
• x is a vector or a formula.
• data is the data frame.
• notch is a logical value. Set as TRUE to draw a notch.
• varwidth is a logical value. Set as TRUE to draw the width of each box in proportion to
the sample size.
• names are the group labels which will be printed under each boxplot.
• main is used to give a title to the graph.

Example
We use the data set "mtcars" available in the R environment to create a basic boxplot.
Let's look at the columns "mpg" and "cyl" in mtcars.

input <- mtcars[,c('mpg','cyl')]


print(head(input))

When we execute above code, it produces following result −

                   mpg cyl
Mazda RX4         21.0   6
Mazda RX4 Wag     21.0   6
Datsun 710        22.8   4
Hornet 4 Drive    21.4   6
Hornet Sportabout 18.7   8
Valiant           18.1   6
Creating the Boxplot
The below script will create a boxplot graph for the relation between mpg (miles per gallon)
and cyl (number of cylinders).

# Give the chart file a name.
png(file = "boxplot.png")
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders",
   ylab = "Miles Per Gallon", main = "Mileage Data")
# Save the file.
dev.off()

When we execute the above code, it produces the following result −

Boxplot with Notch


We can draw boxplot with notch to find out how the medians of different data groups match
with each other.
The below script will create a boxplot graph with notch for each of the data group.

# Give the chart file a name.
png(file = "boxplot_with_notch.png")
# Plot the chart.
boxplot(mpg ~ cyl, data = mtcars,
   xlab = "Number of Cylinders",
   ylab = "Miles Per Gallon",
   main = "Mileage Data",
   notch = TRUE,
   varwidth = TRUE,
   col = c("green","yellow","purple"),
   names = c("High","Medium","Low")
)
# Save the file.
dev.off()

When we execute the above code, it produces the following result −


Generating box plots for each of the numerical attributes of Iris data, grouped by
Species

> boxplot(Sepal.Length~Species, data=iris, ylab="Sepal length (cm)", main="Iris sepal length by species")

> boxplot(Sepal.Width~Species, data=iris, ylab="Sepal width (cm)", main="Iris sepal width by species")

> boxplot(Petal.Length~Species, data=iris, ylab="Petal length (cm)", main="Iris petal length by species")

> boxplot(Petal.Width~Species, data=iris, ylab="Petal width (cm)", main="Iris petal width by species")
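The second part of the aim, identifying the attribute with the highest variance, can be answered by computing the variance of each numeric column. The sketch below assumes R's built-in iris data set, where the four measurements are columns 1 to 4; variances and highest are illustrative names.

```r
# Uses R's built-in iris data set.
data(iris)

# Box plots of all four numerical attributes side by side.
boxplot(iris[, 1:4], main = "Iris numerical attributes")

# Sample variance of each numerical attribute.
variances <- sapply(iris[, 1:4], var)
print(variances)

# Name of the attribute with the largest variance.
highest <- names(which.max(variances))
print(highest)
```

This agrees with the variance values listed in Experiment 1, where Petal.Length has the largest variance (3.116278); it also has the tallest box in the side-by-side plot.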


Experiment No. 5 Date:

Aim: Study of homogeneous and heterogeneous data structures such as vector,
matrix, array, list, data frame in R.

R - Vectors
Vectors are the most basic R data objects and there are six types of atomic vectors. They
are logical, integer, double, complex, character and raw.

Vector Creation
Single Element Vector
Even when you write just one value in R, it becomes a vector of length 1 and belongs to
one of the above vector types.

# Atomic vector of type character.
print("abc")
# Atomic vector of type double.
print(12.5)
# Atomic vector of type integer.
print(63L)
# Atomic vector of type logical.
print(TRUE)
# Atomic vector of type complex.
print(2+3i)
# Atomic vector of type raw.
print(charToRaw('hello'))

When we execute the above code, it produces the following result −


[1] "abc"
[1] 12.5
[1] 63
[1] TRUE
[1] 2+3i
[1] 68 65 6c 6c 6f

Multiple Elements Vector


Using colon operator with numeric data

# Creating a sequence from 5 to 13.


v <- 5:13
print(v)
# Creating a sequence from 6.6 to 12.6.
v <- 6.6:12.6
print(v)
# If the final element specified does not belong to the sequence then it is discarded.
v <- 3.8:11.4
print(v)

When we execute the above code, it produces the following result −

[1] 5 6 7 8 9 10 11 12 13
[1] 6.6 7.6 8.6 9.6 10.6 11.6 12.6
[1] 3.8 4.8 5.8 6.8 7.8 8.8 9.8 10.8

Using the seq() function

# Create vector with elements from 5 to 9 incrementing by 0.4.


print(seq(5, 9, by = 0.4))

When we execute the above code, it produces the following result −

[1] 5.0 5.4 5.8 6.2 6.6 7.0 7.4 7.8 8.2 8.6 9.0

Using the c() function

The non-character values are coerced to character type if one of the elements is a
character.

# The logical and numeric values are converted to characters.


s <- c('apple','red',5,TRUE)
print(s)

When we execute the above code, it produces the following result −

[1] "apple" "red" "5" "TRUE"

Accessing Vector Elements


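Elements of a vector are accessed using indexing. The [ ] brackets are used for indexing.
Indexing starts at position 1. A negative value in the index drops that element from the
result. Logical values (TRUE, FALSE) can also be used for indexing. In a numeric index, a 0
selects nothing, so c(0,0,0,0,0,0,1) returns only the first element rather than acting as a mask.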

# Accessing vector elements using position.


t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
u <- t[c(2,3,6)]
print(u)
# Accessing vector elements using logical indexing.
v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
print(v)
# Accessing vector elements using negative indexing.
x <- t[c(-2,-5)]
print(x)
# Accessing vector elements using 0/1 indexing.
y <- t[c(0,0,0,0,0,0,1)]
print(y)

When we execute the above code, it produces the following result −

[1] "Mon" "Tue" "Fri"


[1] "Sun" "Fri"
[1] "Sun" "Tue" "Wed" "Fri" "Sat"
[1] "Sun"

Vector Manipulation
Vector arithmetic
Two vectors of same length can be added, subtracted, multiplied or divided giving the
result as a vector output.
# Create two vectors.
v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11,0,8,1,2)
# Vector addition.
add.result <- v1+v2
print(add.result)
# Vector subtraction.
sub.result <- v1-v2
print(sub.result)
# Vector multiplication.
multi.result <- v1*v2
print(multi.result)
# Vector division.
divi.result <- v1/v2
print(divi.result)

When we execute the above code, it produces the following result −


[1] 7 19 4 13 1 13
[1] -1 -3 4 -3 -1 9
[1] 12 88 0 40 0 22
[1] 0.7500000 0.7272727 Inf 0.6250000 0.0000000 5.5000000

Vector Element Recycling


If we apply arithmetic operations to two vectors of unequal length, then the elements of the
shorter vector are recycled to complete the operations.

v1 <- c(3,8,4,5,0,11)
v2 <- c(4,11)
# v2 becomes c(4,11,4,11,4,11)
add.result <- v1+v2
print(add.result)
sub.result <- v1-v2
print(sub.result)

When we execute the above code, it produces the following result −

[1] 7 19 8 16 4 22
[1] -1 -3 0 -6 -4 0
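Recycling also happens when the longer length is not a multiple of the shorter one, but R then issues a warning. The sketch below records whether the warning was raised; v3, warned and result are illustrative names.

```r
v1 <- c(3, 8, 4, 5, 0, 11)
v3 <- c(1, 2, 3, 4)        # length 4 does not divide length 6

# Record whether the addition raised a warning, then continue.
warned <- FALSE
result <- withCallingHandlers(
  v1 + v3,                 # v3 is recycled as c(1, 2, 3, 4, 1, 2)
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

print(result)
print(warned)
```

The result is still computed, so this warning is worth checking: it often indicates that two vectors were expected to have the same length.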

Vector Element Sorting


Elements in a vector can be sorted using the sort() function.

v <- c(3,8,4,5,0,11, -9, 304)
# Sort the elements of the vector.
sort.result <- sort(v)
print(sort.result)
# Sort the elements in the reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)
# Sorting character vectors.
v <- c("Red","Blue","yellow","violet")
sort.result <- sort(v)
print(sort.result)
# Sorting character vectors in reverse order.
revsort.result <- sort(v, decreasing = TRUE)
print(revsort.result)

When we execute the above code, it produces the following result −

[1] -9 0 3 4 5 8 11 304
[1] 304 11 8 5 4 3 0 -9
[1] "Blue" "Red" "violet" "yellow"
[1] "yellow" "violet" "Red" "Blue"

R - Lists
Lists are R objects that can contain elements of different types, such as numbers, strings,
vectors and even other lists. A list can also contain a matrix or a function as an
element. A list is created using the list() function.

Creating a List
Following is an example to create a list containing strings, numbers, vectors and logical
values.

# Create a list containing strings, numbers, vectors and a logical value.
list_data <- list("Red", "Green", c(21,32,11), TRUE, 51.23, 119.1)
print(list_data)

When we execute the above code, it produces the following result −

[[1]]
[1] "Red"
[[2]]
[1] "Green"
[[3]]
[1] 21 32 11
[[4]]
[1] TRUE
[[5]]
[1] 51.23
[[6]]
[1] 119.1

Naming List Elements


The list elements can be given names and they can be accessed using these names.

# Create a list containing a vector, a matrix and a list.


list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Show the list.
print(list_data)

When we execute the above code, it produces the following result −

$`1st Quarter`
[1] "Jan" "Feb" "Mar"

$A_Matrix
     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8

$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3

Accessing List Elements


Elements of the list can be accessed by the index of the element in the list. In case of
named lists it can also be accessed using the names.

We continue to use the list in the above example −

# Create a list containing a vector, a matrix and a list.


list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Access the first element of the list.
print(list_data[1])
# Access the third element. As it is also a list, all its elements will be printed.
print(list_data[3])
# Access the list element using the name of the element.
print(list_data$A_Matrix)

When we execute the above code, it produces the following result −

$`1st Quarter`
[1] "Jan" "Feb" "Mar"

$`A Inner list`
$`A Inner list`[[1]]
[1] "green"

$`A Inner list`[[2]]
[1] 12.3

     [,1] [,2] [,3]
[1,]    3    5   -2
[2,]    9    1    8

Manipulating List Elements


We can add, delete and update list elements as shown below. Elements can be deleted or
updated at any position; the example adds and deletes an element at the end of the list.

# Create a list containing a vector, a matrix and a list.


list_data <- list(c("Jan","Feb","Mar"), matrix(c(3,9,5,1,-2,8), nrow = 2),
list("green",12.3))
# Give names to the elements in the list.
names(list_data) <- c("1st Quarter", "A_Matrix", "A Inner list")
# Add element at the end of the list.
list_data[4] <- "New element"
print(list_data[4])
# Remove the last element.
list_data[4] <- NULL
# Print the 4th Element.
print(list_data[4])
# Update the 3rd Element.
list_data[3] <- "updated element"
print(list_data[3])

When we execute the above code, it produces the following result −

[[1]]
[1] "New element"
$<NA>
NULL
$`A Inner list`
[1] "updated element"
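
Elements can also be added or removed at positions other than the end of a list. The sketch below uses a small throwaway list (its contents are made up for illustration) with the base function `append()` and its `after` argument, and removes an element from the middle by assigning `NULL`.

```r
list_data <- list("a", "b", "c")

# Insert a new element after position 1 with append().
list_data <- append(list_data, list("new"), after = 1)
print(length(list_data))   # 4

# Delete the second element by assigning NULL; later elements shift down.
list_data[2] <- NULL
print(length(list_data))   # 3
print(unlist(list_data))
```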

Merging Lists
You can merge many lists into one list by placing all the lists inside one c() function.

# Create two lists.


list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")
# Merge the two lists.
merged.list <- c(list1,list2)
# Print the merged list.
print(merged.list)

When we execute the above code, it produces the following result −


[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] "Sun"
[[5]]
[1] "Mon"
[[6]]
[1] "Tue"
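
To see why `c()` is the function to use for merging, the following sketch compares it with `list()`. The names `merged` and `nested` are illustrative. Wrapping two lists in `list()` does not flatten them; it produces a list of two lists.

```r
list1 <- list(1,2,3)
list2 <- list("Sun","Mon","Tue")

# c() concatenates the elements into one flat list.
merged <- c(list1, list2)
# list() keeps the two inputs as separate nested lists.
nested <- list(list1, list2)

print(length(merged))   # 6
print(length(nested))   # 2
```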

Converting List to Vector


A list can be converted to a vector so that the elements of the vector can be used for
further manipulation. All the arithmetic operations on vectors can be applied after the list is
converted into vectors. To do this conversion, we use the unlist() function. It takes the list
as input and produces a vector.

# Create lists.
list1 <- list(1:5)
print(list1)
list2 <-list(10:14)
print(list2)
# Convert the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)
print(v1)
print(v2)
# Now add the vectors
result <- v1+v2
print(result)

When we execute the above code, it produces the following result −

[[1]]
[1] 1 2 3 4 5

[[1]]
[1] 10 11 12 13 14

[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19
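
One point worth keeping in mind is that a vector holds a single type, so unlist() coerces mixed elements to a common type. A small sketch with a made-up list (`mixed` is an illustrative name):

```r
# A list can mix numbers, strings and logicals.
mixed <- list(1, "two", TRUE)

# unlist() coerces everything to the most general type, here character.
v <- unlist(mixed)
print(class(v))   # "character"
print(v)
```

Arithmetic on `v` would fail in this case, so the conversion is most useful when all list elements are numeric.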

R - Matrices
Matrices are the R objects in which the elements are arranged in a two-dimensional
rectangular layout. They contain elements of the same atomic types. Though we can
create a matrix containing only characters or only logical values, they are not of much use.
We use matrices containing numeric elements to be used in mathematical calculations.

A Matrix is created using the matrix() function.

Syntax:
The basic syntax for creating a matrix in R is −

matrix(data, nrow, ncol, byrow, dimnames)


Following is the description of the parameters used −
• data is the input vector which becomes the data elements of the matrix.
• nrow is the number of rows to be created.
• ncol is the number of columns to be created.
• byrow is a logical value. If TRUE then the input vector elements are arranged by row.
• dimnames is a list of the names assigned to the rows and columns.

Example
Create a matrix taking a vector of numbers as input.

# Elements are arranged sequentially by row.


M <- matrix(c(3:14), nrow = 4, byrow = TRUE)
print(M)
# Elements are arranged sequentially by column.
N <- matrix(c(3:14), nrow = 4, byrow = FALSE)
print(N)
# Define the column and row names.
rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
print(P)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3    4    5
[2,]    6    7    8
[3,]    9   10   11
[4,]   12   13   14
     [,1] [,2] [,3]
[1,]    3    7   11
[2,]    4    8   12
[3,]    5    9   13
[4,]    6   10   14
     col1 col2 col3
row1    3    4    5
row2    6    7    8
row3    9   10   11
row4   12   13   14

Accessing Elements of a Matrix


Elements of a matrix can be accessed by using the column and row index of the element.
We consider the matrix P above to find the specific elements below.

# Define the column and row names.


rownames = c("row1", "row2", "row3", "row4")
colnames = c("col1", "col2", "col3")
# Create the matrix.
P <- matrix(c(3:14), nrow = 4, byrow = TRUE, dimnames = list(rownames, colnames))
# Access the element at 3rd column and 1st row.
print(P[1,3])
# Access the element at 2nd column and 4th row.
print(P[4,2])
# Access only the 2nd row.
print(P[2,])
# Access only the 3rd column.
print(P[,3])

When we execute the above code, it produces the following result −

[1] 5
[1] 13
col1 col2 col3
   6    7    8
row1 row2 row3 row4
   5    8   11   14
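
Because the matrix has row and column names, elements can also be selected by name. The sketch below additionally shows the `drop = FALSE` argument of the `[` operator, which keeps a single row as a one-row matrix instead of a vector. It rebuilds the same matrix P so that it runs on its own.

```r
P <- matrix(c(3:14), nrow = 4, byrow = TRUE,
   dimnames = list(c("row1","row2","row3","row4"), c("col1","col2","col3")))

# Index by name instead of position.
print(P["row2", "col3"])           # 8

# A single row is returned as a plain vector by default.
print(is.matrix(P[2, ]))           # FALSE

# With drop = FALSE the result stays a 1 x 3 matrix.
print(dim(P[2, , drop = FALSE]))   # 1 3
```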

Matrix Computations
Various mathematical operations are performed on matrices using the R arithmetic operators.
These operators work element by element, and the result of the operation is also a matrix.

The dimensions (number of rows and columns) should be same for the matrices involved
in the operation.

Matrix Addition & Subtraction


# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Add the matrices.
result <- matrix1 + matrix2
cat("Result of addition","\n")
print(result)
# Subtract the matrices
result <- matrix1 - matrix2
cat("Result of subtraction","\n")
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
     [,1] [,2] [,3]
[1,]    5    0    3
[2,]    2    9    4
Result of addition
     [,1] [,2] [,3]
[1,]    8   -1    5
[2,]   11   13   10
Result of subtraction
     [,1] [,2] [,3]
[1,]   -2   -1   -1
[2,]    7   -5    2

Matrix Multiplication & Division


# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)
print(matrix2)
# Multiply the matrices.
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result)
# Divide the matrices
result <- matrix1 / matrix2
cat("Result of division","\n")
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    3   -1    2
[2,]    9    4    6
     [,1] [,2] [,3]
[1,]    5    0    3
[2,]    2    9    4
Result of multiplication
     [,1] [,2] [,3]
[1,]   15    0    6
[2,]   18   36   24
Result of division
     [,1]      [,2]      [,3]
[1,]  0.6      -Inf 0.6666667
[2,]  4.5 0.4444444 1.5000000
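
Note that `*` multiplies the matrices element by element. The matrix product from linear algebra uses the `%*%` operator instead, which requires the number of columns of the first matrix to equal the number of rows of the second. The sketch below reuses the two matrices from the example; transposing matrix2 with `t()` is only one way to make the shapes compatible, chosen here for illustration.

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

# Element-wise product: same shape as the inputs (2 x 3).
elementwise <- matrix1 * matrix2
print(dim(elementwise))

# Matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 result.
product <- matrix1 %*% t(matrix2)
print(product)
```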

R - Arrays
Arrays are the R data objects which can store data in more than two dimensions. For
example − If we create an array of dimension (2, 3, 4) then it creates 4 rectangular
matrices each with 2 rows and 3 columns. An array can store values of only one data type.

An array is created using the array() function. It takes vectors as input and uses the values
in the dim parameter to create an array.

Example
The following example creates an array of two 3x3 matrices each with 3 rows and 3
columns.

# Create two vectors of different lengths.


vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2))
print(result)

When we execute the above code, it produces the following result −

, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15
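
The two matrices in the output are identical because the input holds only 9 values while the array needs 3 x 3 x 2 = 18, so array() recycles the values to fill the remaining positions. A short sketch that checks this (the name `values` is illustrative):

```r
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
values <- c(vector1, vector2)
print(length(values))   # 9

# The 9 values are reused to fill all 18 positions.
result <- array(values, dim = c(3,3,2))
print(length(result))   # 18
print(identical(result[,,1], result[,,2]))   # TRUE
```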

Naming Columns and Rows


We can give names to the rows, columns and matrices in the array by using
the dimnames parameter.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),
   dim = c(3,3,2),
   dimnames = list(row.names,column.names,matrix.names))
print(result)

When we execute the above code, it produces the following result −

, , Matrix1

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

, , Matrix2

     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Accessing Array Elements


# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
column.names <- c("COL1","COL2","COL3")
row.names <- c("ROW1","ROW2","ROW3")
matrix.names <- c("Matrix1","Matrix2")
# Take these vectors as input to the array.
result <- array(c(vector1,vector2),dim = c(3,3,2),
   dimnames = list(row.names,column.names,matrix.names))
# Print the third row of the second matrix of the array.
print(result[3,,2])
# Print the element in the 1st row and 3rd column of the 1st matrix.
print(result[1,3,1])
# Print the 2nd Matrix.
print(result[,,2])

When we execute the above code, it produces the following result −


COL1 COL2 COL3
   3   12   15
[1] 13
     COL1 COL2 COL3
ROW1    5   10   13
ROW2    9   11   14
ROW3    3   12   15

Manipulating Array Elements


As an array is made up of matrices in multiple dimensions, operations on the elements of an
array are carried out by accessing the elements of its matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
array1 <- array(c(vector1,vector2),dim = c(3,3,2))
# Create two more vectors of different lengths.
vector3 <- c(9,1,0)
vector4 <- c(6,0,11,3,14,1,2,6,9)
# Build the second array from vector3 and vector4.
array2 <- array(c(vector3,vector4),dim = c(3,3,2))
# Create matrices from these arrays.
matrix1 <- array1[,,2]
matrix2 <- array2[,,2]
# Add the matrices.
result <- matrix1+matrix2
print(result)

When we execute the above code, it produces the following result −

     [,1] [,2] [,3]
[1,]    7   19   19
[2,]   15   12   14
[3,]   12   12   26

Calculations Across Array Elements


We can do calculations across the elements in an array using the apply() function.

Syntax
apply(x, margin, fun)

Following is the description of the parameters used −

• x is an array.
• margin gives the dimension(s) over which the function is applied, for example 1 for rows
and 2 for columns.
• fun is the function to be applied across the elements of the array.

Example
We use the apply() function below to calculate the sum of the elements in the rows of an
array across all the matrices.

# Create two vectors of different lengths.
vector1 <- c(5,9,3)
vector2 <- c(10,11,12,13,14,15)
# Take these vectors as input to the array.
new.array <- array(c(vector1,vector2),dim = c(3,3,2))
print(new.array)
# Use apply to calculate the sum of the rows across all the matrices.
result <- apply(new.array, c(1), sum)
print(result)

When we execute the above code, it produces the following result −

, , 1

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

, , 2

     [,1] [,2] [,3]
[1,]    5   10   13
[2,]    9   11   14
[3,]    3   12   15

[1] 56 68 60
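
The margin argument can also name other dimensions or a combination of them. The sketch below rebuilds the same array under the illustrative name `arr` and applies sum() over the columns (margin 2) and over each row within each matrix (margin c(1, 3)).

```r
arr <- array(c(5,9,3,10,11,12,13,14,15), dim = c(3,3,2))

# Margin 2: one sum per column, taken across all rows and both matrices.
col_sums <- apply(arr, 2, sum)
print(col_sums)   # 34 66 84

# Margin c(1, 3): one sum per row of each matrix, returned as a 3 x 2 matrix.
row_sums_by_matrix <- apply(arr, c(1, 3), sum)
print(row_sums_by_matrix)
```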

R - Data Frames
A data frame is a table or a two-dimensional array-like structure in which each column
contains values of one variable and each row contains one set of values from each
column.

Following are the characteristics of a data frame.

• The column names should be non-empty.
• The row names should be unique.
• The data stored in a data frame can be of numeric, factor or character type.
• Each column should contain the same number of data items.

Create Data Frame
# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date
1      1     Rick 623.30 2012-01-01
2      2      Dan 515.20 2013-09-23
3      3 Michelle 611.00 2014-11-15
4      4     Ryan 729.00 2014-05-11
5      5     Gary 843.25 2015-03-27

Get the Structure of the Data Frame


The structure of the data frame can be seen by using the str() function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
   stringsAsFactors = FALSE
)
# Get the structure of the data frame.
str(emp.data)

When we execute the above code, it produces the following result −

'data.frame': 5 obs. of 4 variables:
 $ emp_id    : int 1 2 3 4 5
 $ emp_name  : chr "Rick" "Dan" "Michelle" "Ryan" ...
 $ salary    : num 623 515 611 729 843
 $ start_date: Date, format: "2012-01-01" "2013-09-23" "2014-11-15" "2014-05-11" ...

Summary of Data in Data Frame


The statistical summary and nature of the data can be obtained by applying the summary()
function.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
   stringsAsFactors = FALSE
)
# Print the summary.
print(summary(emp.data))

When we execute the above code, it produces the following result −

     emp_id    emp_name             salary        start_date
 Min.   :1   Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2   Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
 Median :3   Mode  :character   Median :623.3   Median :2014-05-11
 Mean   :3                      Mean   :664.4   Mean   :2014-01-14
 3rd Qu.:4                      3rd Qu.:729.0   3rd Qu.:2014-11-15
 Max.   :5                      Max.   :843.2   Max.   :2015-03-27

Extract Data from Data Frame


Extract specific columns from a data frame using the column names.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11","2015-03-27")),
   stringsAsFactors = FALSE
)
# Extract specific columns.
result <- data.frame(emp.data$emp_name,emp.data$salary)
print(result)

When we execute the above code, it produces the following result −

  emp.data.emp_name emp.data.salary
1              Rick          623.30
2               Dan          515.20
3          Michelle          611.00
4              Ryan          729.00
5              Gary          843.25

Extract the first two rows with all columns

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
   stringsAsFactors = FALSE
)
# Extract first two rows.
result <- emp.data[1:2,]
print(result)

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date
1      1     Rick  623.3 2012-01-01
2      2      Dan  515.2 2013-09-23

Extract 3rd and 5th row with 2nd and 4th column

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11", "2015-03-27")),
   stringsAsFactors = FALSE
)
# Extract 3rd and 5th row with 2nd and 4th column.
result <- emp.data[c(3,5),c(2,4)]
print(result)

When we execute the above code, it produces the following result −

  emp_name start_date
3 Michelle 2014-11-15
5     Gary 2015-03-27
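
Two further extraction patterns are commonly useful: selecting columns by a vector of names, which keeps the original column names, and selecting rows with a logical condition. The sketch below uses a reduced copy of the employee data under the illustrative name `emp_df`; the salary threshold of 700 is made up for the example.

```r
emp_df <- data.frame(
   emp_id = 1:5,
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   stringsAsFactors = FALSE
)

# Select columns by name; the column names are preserved.
result <- emp_df[, c("emp_name", "salary")]
print(names(result))   # "emp_name" "salary"

# Select the rows that satisfy a condition.
high <- emp_df[emp_df$salary > 700, ]
print(high$emp_name)   # "Ryan" "Gary"
```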

Expand Data Frame


A data frame can be expanded by adding columns and rows.

Add Column
Just add the column vector using a new column name.

# Create the data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
   stringsAsFactors = FALSE
)
# Add the "dept" column.
emp.data$dept <- c("IT","Operations","IT","HR","Finance")
v <- emp.data
print(v)

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date       dept
1      1     Rick 623.30 2012-01-01         IT
2      2      Dan 515.20 2013-09-23 Operations
3      3 Michelle 611.00 2014-11-15         IT
4      4     Ryan 729.00 2014-05-11         HR
5      5     Gary 843.25 2015-03-27    Finance

Add Row
To add more rows permanently to an existing data frame, we need to bring in the new
rows in the same structure as the existing data frame and use the rbind() function.

In the example below we create a data frame with new rows and merge it with the existing
data frame to create the final data frame.

# Create the first data frame.
emp.data <- data.frame(
   emp_id = c(1:5),
   emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
   salary = c(623.3,515.2,611.0,729.0,843.25),
   start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11","2015-03-27")),
   dept = c("IT","Operations","IT","HR","Finance"),
   stringsAsFactors = FALSE
)
# Create the second data frame.
emp.newdata <- data.frame(
   emp_id = c(6:8),
   emp_name = c("Rasmi","Pranab","Tusar"),
   salary = c(578.0,722.5,632.8),
   start_date = as.Date(c("2013-05-21","2013-07-30","2014-06-17")),
   dept = c("IT","Operations","Finance"),
   stringsAsFactors = FALSE
)
# Bind the two data frames.
emp.finaldata <- rbind(emp.data,emp.newdata)
print(emp.finaldata)

When we execute the above code, it produces the following result −

  emp_id emp_name salary start_date       dept
1      1     Rick 623.30 2012-01-01         IT
2      2      Dan 515.20 2013-09-23 Operations
3      3 Michelle 611.00 2014-11-15         IT
4      4     Ryan 729.00 2014-05-11         HR
5      5     Gary 843.25 2015-03-27    Finance
6      6    Rasmi 578.00 2013-05-21         IT
7      7   Pranab 722.50 2013-07-30 Operations
8      8    Tusar 632.80 2014-06-17    Finance

Experiment No. 6 Date:

Aim: Write an R program using the ‘apply’ group of functions to create and apply a
normalization function on each of the numeric variables/columns of the iris dataset to
transform them into values centered around 0 with z-score normalization.
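
One possible starting point for this experiment is sketched below. It is not the only approach: it assumes the built-in iris dataset and uses sapply() and lapply() from the apply family, and the names `z_score`, `numeric_cols` and `iris_norm` are illustrative.

```r
# z-score: subtract the mean, then divide by the standard deviation.
z_score <- function(x) (x - mean(x)) / sd(x)

# Find the numeric columns (Species is a factor and is left unchanged).
numeric_cols <- sapply(iris, is.numeric)

# Apply the normalization to each numeric column.
iris_norm <- iris
iris_norm[numeric_cols] <- lapply(iris[numeric_cols], z_score)

# Each normalized column should have mean close to 0 and standard deviation 1.
print(round(colMeans(iris_norm[numeric_cols]), 10))
print(sapply(iris_norm[numeric_cols], sd))
```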

Experiment No. 7 Date:

Aim:
a) Use R to apply linear regression to predict evaporation coefficient in terms of air
velocity using the data given below:

Air Velocity (cm/sec)                 20    60    100   140   180   220   260   300   340   380
Evaporation Coefficient (sq mm/sec)   0.18  0.37  0.35  0.78  0.56  0.75  1.18  1.36  1.17  1.65

b) Analyze the significance of residual standard-error value, R-squared value, F-statistic.
Find the correlation coefficient for this data and analyze the significance of
the correlation value.

c) Perform a log transformation on the ‘Air Velocity’ column, perform linear
regression again, and analyze all the relevant values.
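
A minimal sketch of how the three parts could be set up with the base functions lm(), summary(), cor() and log() follows. The variable names are illustrative, and interpreting the printed values is left to the experiment itself.

```r
air_velocity <- c(20,60,100,140,180,220,260,300,340,380)
evap_coeff <- c(0.18,0.37,0.35,0.78,0.56,0.75,1.18,1.36,1.17,1.65)

# a) Fit a simple linear regression of evaporation coefficient on air velocity.
fit <- lm(evap_coeff ~ air_velocity)

# b) The summary reports the residual standard error, R-squared and F-statistic.
print(summary(fit))
r <- cor(air_velocity, evap_coeff)
print(r)

# c) Refit with a log-transformed predictor and compare the summaries.
fit_log <- lm(evap_coeff ~ log(air_velocity))
print(summary(fit_log))
```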

Experiment No. 8 Date:

Aim:
a) Create an ARFF (Attribute-Relation File Format) file and read it in WEKA.
b) Explore the purpose of each button under the preprocess panel after loading
the ARFF file.
c) Also, try to interpret using a different ARFF file, weather.arff, provided with
WEKA.

Program:

a) Create an ARFF (Attribute-Relation File Format) file and read it in WEKA.

Experiment No. 9 Date:

Aim: Performing data preprocessing in WEKA

Study Unsupervised Attribute Filters such as Replace Missing Values to replace
missing values in the given dataset, Add to add the new attribute Average, Discretize
to discretize the attributes into bins. Explore Normalize and Standardize options on a
dataset with numerical attributes.

Experiment No. 10 Date:

Aim: Classification using the WEKA toolkit

a) Demonstration of classification process using ID3 algorithm on categorical
dataset (‘weather’).
b) Demonstration of classification process using naïve Bayes algorithm on
categorical dataset (‘vote’).
c) Demonstration of classification process using Random Forest algorithm on
datasets containing large number of attributes.

Experiment No. 11 Date:

Aim: Classification using the WEKA toolkit – Part 2

a) Demonstration of classification process using J48 algorithm on mixed type of
dataset after discretizing numeric attributes.
b) Perform cross-validation strategy with various fold levels. Compare the
accuracy of the results.

Experiment No. 12 Date:

Aim: Performing clustering in WEKA

Apply hierarchical clustering algorithm on numeric dataset and estimate cluster
quality. Apply DBSCAN algorithm on numeric dataset and estimate cluster quality.

Experiment No. 13 Date:

Aim: Association rule analysis in WEKA

a) Demonstration of Association Rule Mining on supermarket dataset using
Apriori Algorithm with different support and confidence thresholds.
b) Demonstration of Association Rule Mining on supermarket dataset using
FP-Growth Algorithm with different support and confidence thresholds.

Experiment No. 14 Date:

Aim: Implement AI problem solving through Rule based forward chaining inference
using public domain software tool like CLIPS.

Experiment No. 15 Date:

Aim: Implement AI problem solving through Rule based Backward chaining
inference using PROLOG.
