Module-1 R
What is R Programming
R is an open-source programming language that is widely used for statistical computing and
data analysis. R ships with a command-line interface and is available on widely used
platforms such as Windows, Linux, and macOS. It remains one of the most widely used
tools for data analysis.
"R is an interpreted computer programming language which was created by Ross Ihaka
and Robert Gentleman at the University of Auckland, New Zealand." The R
Development Core Team currently develops R. It is also a software environment used for
statistical analysis, graphical representation, reporting, and data modelling.
R is an implementation of the S programming language, combined with lexical
scoping semantics.
R not only allows branching and looping but also supports modular
programming using functions.
R allows integration with procedures written in the C, C++, .Net, Python, and
FORTRAN languages to improve efficiency.
R is one of the most important tools used by researchers, data analysts, statisticians,
and marketers for retrieving, cleaning, analyzing, visualizing, and presenting data.
History of R Programming
The history of R goes back to the early 1990s. R was developed by Ross Ihaka and
Robert Gentleman at the University of Auckland, New Zealand, and the R Development Core
Team currently develops it. The name of the language is taken from the first letter of both
developers' first names. The project was conceived in 1992. The initial version was released in
1995, and in 2000 the first stable version (1.0.0) was released.
The following table shows the version, release date, and description of selected R releases:
Version   Release date   Description
0.49      1997-04-23     R's source code was released for the first time, and CRAN (Comprehensive R Archive Network) was started.
0.65.1    1999-10-07     update.packages() and install.packages() were both included.
2.13      2011-04-14     Added a function that rapidly converts code to byte code.
Console Window   Lower-left   The location where commands are entered and output is printed.
Installation of Rstudio
R is maintained by an international team of developers who make the language available
through the web page of The Comprehensive R Archive Network. The top of the web page
provides three links for downloading R. Follow the link that describes your operating system:
Windows, Mac, or Linux.
Windows
To install R on Windows, click the “Download R for Windows” link. Then click the “base”
link. Next, click the first link at the top of the new page. This link should say something like
“Download R 3.0.3 for Windows,” except the 3.0.3 will be replaced by the most current
version of R. The link downloads an installer program, which installs the most up-to-date
version of R for Windows. Run this program and step through the installation wizard that
appears. The wizard will install R into your program files folders and place a shortcut in your
Start menu. Note that you’ll need to have all of the appropriate administration privileges to
install new software on your machine.
Mac
To install R on a Mac, click the “Download R for Mac” link. Next, click on the R-3.0.3
package link (or the package link for the most current release of R). An installer will
download to guide you through the installation process, which is very easy. The installer lets
you customize your installation, but the defaults will be suitable for most users. I’ve never
found a reason to change them. If your computer requires a password before installing new
programs, you’ll need it here.
Binaries Versus Source
R can be installed from precompiled binaries or built from source on any operating system.
For Windows and Mac machines, installing R from binaries is extremely easy. The binary
comes preloaded in its own installer. Although you can build R from source on these
platforms, the process is much more complicated and won’t provide much benefit for most
users. For Linux systems, the opposite is true. Precompiled binaries can be found for some
systems, but it is much more common to build R from source files when installing on Linux.
The download pages on CRAN’s website provide information about building R from source
for the Windows, Mac, and Linux platforms.
Linux
R comes preinstalled on many Linux systems, but you’ll want the newest version of R if
yours is out of date. The CRAN website provides files to build R from source on Debian,
Redhat, SUSE, and Ubuntu systems under the link “Download R for Linux.” Click the link
and then follow the directory trail to the version of Linux you wish to install on. The exact
installation procedure will vary depending on the Linux system you use. CRAN guides the
process by grouping each set of source files with documentation or README files that
explain how to install on your system.
Introduction to RStudio
Use the plus button just below the File tab and choose R Script to open a new R script
file.
Once you open an R script file, RStudio shows four panels. The console,
environment/history, and files/plots panels are already there, and the script file opens in
a new panel at the top left. Now you are ready to write a script or program in RStudio.
Different ways to run an R program
There are several ways to run an R program:
Method 1: Using the command prompt or terminal
Write your code in Notepad or any text editor and save it as “helloworld.r”.
Then run it in the command prompt or terminal with the command “Rscript helloworld.r”.
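As a minimal sketch, the file helloworld.r could contain something like the following. The variable name greeting is only an illustration and is not part of the original notes.

```r
# helloworld.r: a minimal script to run with "Rscript helloworld.r"
greeting <- "Hello, World!"
print(greeting)   # prints [1] "Hello, World!"
```

Running Rscript helloworld.r from the folder that holds the file should print the greeting in the terminal.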
Method 2: Using an online IDE
There are many online IDEs available. We can use them without
installing or downloading anything.
Method 3: Using an IDE like RStudio, RTVS, or StatET
You can download and install these IDEs on your system and write and run
programs there. RStudio and StatET (an Eclipse plug-in) are available for Windows,
Mac, and Linux. RTVS is presently available only on Windows.
RPE: the R Productivity Environment for Windows
The R Productivity Environment, or RPE, is a brand new interface that is designed to
make creating R programs easier and more reliable. It's available now to subscription
customers of REvolution R Enterprise on Windows.
Features:
Enhanced Script Editor with hover-over help, word completion, find-across-files
variety of analyses. Tooltip help gives guidance in filling out the snippet. R
function authors can write their own code snippets to share with other users.
Object Browser allowing users to see all the data and function objects that are
and step-out capability, allowing users to inspect and modify R objects as they are
debugging.
A Visual Solution Explorer for organizing, viewing, adding, removing, and
rearranging, and deploying R scripts. Users can create their own Project Templates
for automatic creation of a set of customized scripts for a new R project.
Dockable, Floating, and Tabbed Tool Windows allowing for personally
customized workspaces.
Enhanced Help including complete search capabilities and hover-over tooltips for
programming: you use a menu to select a standard task (like importing a file or
creating a chart), but instead of just performing the action, the RPE inserts R code
to do it.
You use the TAB key to skip between placeholder sections in the code, and
pop-up help tells you what each placeholder does, so it's almost as easy as
using a dialog.
The debugger really streamlines the process of finding and fixing mistakes in R
code. You no longer have to edit a function to insert a browser() call, or use
the trace() or debug() functions. Instead, all you need to do to create a breakpoint
in a script or a function is click once on the line where you want the breakpoint to
go.
You can set as many breakpoints as you want, and then switch the RPE to Debug
mode. Now when your code runs, it will stop at each breakpoint where you can
inspect or change variables with the command line or Object Browser.
An efficient way to install and load R packages
What is an R package and how to use it?
Only fundamental functionality comes with R by default. Users can install
“extensions” to perform further analyses. These extensions, which are collections of functions
and datasets developed and published by R users, are called packages.
Packages extend existing base R functionalities by adding new ones. R is open
source, so everyone can write code and publish it as a package, and everyone can
install a package and start using the functions or datasets built inside the package, all
for free.
Install packages with install.packages("name_of_package") (do not forget the "" around the
name of the package, otherwise R will look for an object saved under that name!).
Once the package is installed, you must load it with library(name_of_package), and only
after it has been loaded can you use all the functions and datasets it contains.
Note that packages need to be installed only once (until you update R, after which you have to
install them again), whereas packages must be loaded every time you open R.
More efficient way
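# Package names
packages <- c("ggplot2", "readxl", "dplyr", "tidyr", "ggfortify", "DT", "reshape2", "knitr",
"lubridate", "pwr", "psy", "car", "doBy", "imputeMissings", "RcmdrMisc", "questionr",
"vcd", "multcomp", "KappaGUI", "rcompanion", "FactoMineR")
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(installed_packages == FALSE)) {
  install.packages(packages[!installed_packages])
}
# Packages loading
invisible(lapply(packages, library, character.only = TRUE))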
This code for installing and loading R packages is more efficient in several ways:
1. The function install.packages() accepts a vector as argument, so what used to take one
line of code per package now takes one line for all packages.
2. The second part of the code checks whether each package is already installed
and then installs only the missing ones.
3. Regarding the package loading (the last part of the code), the lapply() function is
used to call the library() function on all packages at once, which makes the code more
concise.
4. The output when loading a package is rarely useful. The invisible() function removes
this output.
R tools
Building R packages requires a tool chain that must be in place before you begin
developing. If you are developing packages that contain only R code, then the tools you
need come with R and RStudio.
R Packages
The objectives of this section are:
Recognize the basic structure and purpose of an R package
Recognize the key directives in a NAMESPACE file
An R package is a mechanism for extending the basic functionality of R. It is the natural
extension of writing functions that each do a specific thing well. In the previous chapter,
we discussed how writing functions abstracts the behavior of a set of R expressions by
providing a defined interface, with inputs (i.e. function arguments) and outputs (i.e. return
values).
Once one has developed many functions, it becomes natural to group them into
collections of functions that are aimed at achieving an overall goal. This collection of
functions can be assembled into an R package.
R packages represent another level of abstraction, where the interface presented to the user
is a set of user-facing functions. These functions provide access to the underlying
functionality of the package and simplify the user experience because one does not
need to be concerned with the many other helper functions that are required.
Basic Structure of an R Package
An R package begins life as a directory on your computer. This directory has a specific
layout with specific files and sub-directories. The two required sub-directories are
R, which contains all of your R code files
man, which contains your documentation files.
At the top level of your package directory you will have a DESCRIPTION file and
a NAMESPACE file. This represents the minimal requirements for an R package. Other
files and sub-directories can be added, and we will discuss how and why in the sections below.
Note: While RStudio is not required to build R packages, it contains a number of convenient
features that make the development process easier and faster.
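The minimal layout described above can be sketched with base R file functions. This is only an illustration: the package name mypkg is hypothetical, the skeleton is created in a temporary directory, and the DESCRIPTION and NAMESPACE files are left empty, so the result is a layout rather than an installable package.

```r
# Create a minimal package skeleton in a temporary directory.
# "mypkg" is a hypothetical package name used only for illustration.
pkg_dir <- file.path(tempdir(), "mypkg")

# The two required sub-directories
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, "man"), showWarnings = FALSE)

# The two required top-level files (empty placeholders here)
file.create(file.path(pkg_dir, "DESCRIPTION"))
file.create(file.path(pkg_dir, "NAMESPACE"))

# List what was created
list.files(pkg_dir)
```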
DESCRIPTION File
The DESCRIPTION file is an essential part of an R package because it contains key
metadata for the package that is used by repositories like CRAN and by R itself. In
particular, this file contains the package name, the version number, the author and
maintainer contact information, the license information, as well as any dependencies on
other packages.
As an example, here is the DESCRIPTION file for the mvtsplot package on CRAN. This
package provides a function for plotting multivariate time series data.
Package: mvtsplot
Version: 1.0-3
Date: 2016-05-13
Depends: R (>= 3.0.0)
Imports: splines, graphics, grDevices, stats, RColorBrewer
Title: Multivariate Time Series Plot
Author: Roger D. Peng <rpeng@[Link]>
Maintainer: Roger D. Peng <rpeng@[Link]>
Description: A function for plotting multivariate time series data.
License: GPL (>= 2)
URL: [Link]
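Because a DESCRIPTION file is a set of "Field: value" lines, base R can read it with read.dcf(). The sketch below writes a few of the fields shown above to a temporary file and reads them back. The temporary file is an assumption made for illustration; a real package keeps this file at the top level of its directory.

```r
# Write a few DESCRIPTION fields to a temporary file and read them back
desc_path <- tempfile("DESCRIPTION")
writeLines(c(
  "Package: mvtsplot",
  "Version: 1.0-3",
  "Depends: R (>= 3.0.0)",
  "License: GPL (>= 2)"
), desc_path)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_path)
desc[1, "Package"]   # "mvtsplot"
desc[1, "Version"]   # "1.0-3"
```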
NAMESPACE File
The NAMESPACE file specifies the interface to the package that is presented to the user.
This is done via a series of export() statements, which indicate which functions in the
package are exported to the user. Functions that are not exported cannot be called directly
by the user (although see below). In addition to exports, the NAMESPACE file also
specifies what functions or packages are imported by the package.
export("mvtsplot")
import(splines)
import(RColorBrewer)
importFrom("grDevices", "colorRampPalette", "gray")
importFrom("graphics", "abline", "axis", "box", "image", "layout",
"lines", "par", "plot", "points", "segments", "strwidth",
"text", "Axis")
Here we can see that only a single function is exported from the package
(the mvtsplot() function). There are two types of import statements:
import(), simply takes a package name as an argument, and the interpretation is that
all exported functions from that external package will be accessible to your
package
importFrom(), takes a package and a series of function names as arguments. This
directive allows you to specify exactly which function you need from an external
package. For example, this package imports
the colorRampPalette() and gray() functions from the grDevices package.
Namespace Function Notation
As you start to use many packages in R, the likelihood of two functions having the same
name increases. For example, the commonly used dplyr package has a function
named filter(), which is also the name of a function in the stats package. If one has both
packages loaded (a more than likely scenario), how can one specify exactly
which filter() function to call? The answer is to use the function's full name, which
includes the package namespace:
<package name>::<exported function name>
For example, the filter() function from the dplyr package can be referenced
as dplyr::filter().
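A small sketch of the :: notation, using only packages that ship with R so that nothing needs to be installed. The vector x is an arbitrary example.

```r
x <- c(1, 2, 3, 4, 5)

# Call median() from the stats package by its full name
stats::median(x)   # 3

# Call filter() from the stats package explicitly, so that it cannot be
# confused with dplyr::filter(); this computes a 3-point moving average
smoothed <- stats::filter(x, rep(1 / 3, 3))
smoothed
```

Writing the package name in front of the function makes the call unambiguous even when another attached package defines a function with the same name.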
Loading and Attaching a Package Namespace
When dealing with R packages, it’s useful to understand the distinction between loading a
package namespace and attaching it. When package A imports the namespace of
package B, package A loads the namespace of package B in order to gain access to the
exported functions of package B.
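The difference can be observed from the R console. The sketch below uses the tools package, which ships with R; it assumes tools has not been attached earlier in the session.

```r
# Loading a namespace makes its exported functions reachable via ::
invisible(loadNamespace("tools"))
isNamespaceLoaded("tools")       # TRUE
"package:tools" %in% search()    # FALSE unless tools was attached earlier

# Attaching additionally puts the package on the search path,
# so its functions can be called without the tools:: prefix
library(tools)
"package:tools" %in% search()    # TRUE
```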
The R Sub-directory
The R sub-directory contains all of your R code, either in a single file, or in multiple files.
For larger packages it’s usually best to split code up into multiple files that logically group
functions together. The names of the R code files do not matter, but generally it’s not a
good idea to have spaces in the file names.
The man Sub-directory
The man sub-directory contains the documentation files for all of the exported objects of a
package. With older versions of R one had to write the documentation of R objects directly
into the man directory using a LaTeX-style notation. However, with the development of
the roxygen2 package, we no longer need to do that and can write the documentation
directly into the R code files.
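As a rough illustration, roxygen2 documentation is written as special #' comments directly above a function. The function add_numbers() below is hypothetical; the tags @param, @return, and @export are standard roxygen2 tags. The comments have no effect when the code runs, and roxygen2 is only needed when generating the files in the man directory.

```r
#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector.
#' @return The element-wise sum of x and y.
#' @export
add_numbers <- function(x, y) {
  x + y
}

add_numbers(2, 3)   # 5
```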
The devtools Package
The objective of this section is
Create a simple R package skeleton using the devtools package
R package development has become substantially easier in recent years with the
introduction of a package by Hadley Wickham called devtools. As the package name
suggests, this includes a variety of functions that facilitate software development in R.
Here are some of the key functions included in devtools and what they do, roughly in the
order you are likely to use them as you develop an R package:
Function Use
Creating a package
As an alternative to create(), you can also initialize an R package in RStudio by selecting
“File” -> “New Project” -> “New Directory” -> “R Package.”
[Figure: RStudio panes when you open an R project such as a package directory.]
[Figure: Example of the directory contents of the initial package structure created with devtools.]
Building and installing an R package
Open a terminal window
Go to the directory that contains your package directory.
Type
R CMD build brocolors
(Replace brocolors with the name of your package directory, which hopefully is
also the name of your package.)
Installing an R package
To install the package, type (at the command line)
R CMD INSTALL brocolors_0.[Link]
Then start R and type library(brocolors) to see that it was indeed installed, and then try out
one of the functions. For my package, I’d try
brocolors()
plot_crayons()
What are Repositories?
A repository is a place where packages are stored so that you can install R packages
from it. Organizations and developers may keep a local repository, but repositories are
typically online and accessible to everyone. Some of the most popular repositories for R
packages are:
CRAN: The Comprehensive R Archive Network (CRAN) is the official repository. It
is a network of FTP and web servers maintained by the R community around the
world. The R community coordinates it, and for a package to be published on
CRAN, the package needs to pass several tests to ensure that it
follows CRAN policies.
Bioconductor: Bioconductor is a topic-specific repository, intended for open
source software for bioinformatics. Similar to CRAN, it has its own submission
and review processes, and its community is very active, holding several
conferences and meetings per year in order to maintain quality.
GitHub: GitHub is the most popular repository for open-source projects. Its
popularity comes from the unlimited space for open source, the integration
with git, a version control software, and the ease of sharing and collaborating with
others.
Install an R Package
There are multiple ways to install an R package. Some of them are:
Installing R Packages From CRAN: To install an R package from CRAN, we
need the name of the package and the following command:
install.packages("package name")
Installing a package from CRAN is the most common and easiest way, as it takes
only one command. To install more than one package at a time,
we write them as a character vector in the first argument of
the install.packages() function:
Example:
install.packages(c("vioplot", "MASS"))
Installing Bioconductor Packages: R version 3.5 or greater is not compatible
with the biocLite.R script for installing
Bioconductor packages.
Instead, you should use the BiocManager package to install and manage Bioconductor
packages. Here’s how to install the BiocManager package, after which Bioconductor
packages can be installed with BiocManager::install():
install.packages("BiocManager")
Update, Remove and Check Installed Packages in R
To check what packages are installed on your computer, type this command:
installed.packages()
To update all the packages, type this command:
update.packages()
To update a specific package, type this command:
install.packages("PACKAGE NAME")
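A short sketch of how the installed-package listing can be used in a script to test for specific packages before relying on them. The helper name is_installed() is hypothetical; stats and utils are used because they ship with R.

```r
# Check whether packages are installed, without loading or installing anything.
# is_installed() is a hypothetical helper name used for illustration.
is_installed <- function(pkgs) {
  pkgs %in% rownames(installed.packages())
}

is_installed(c("stats", "utils"))   # TRUE TRUE, since both ship with R
```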
Installing Packages Using RStudio UI
In RStudio, go to Tools -> Install Packages, and a pop-up window will appear where you can
type the package you want to install.
Under Packages, type the name of the package you want to install and then click
the Install button.
How to Load Packages in R Programming Language?
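Once an R package is installed, we are ready to use its functionality. If we only need
occasional use of a few functions or datasets inside a package, we can access them with the
package::name notation without loading the package. To make everything in the package
available, load it with library() or require():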
# Load a package using the library function
library(dplyr)
# Load a package using the require function
require(dplyr)
Difference Between a Package and a Library
People often confuse a package with a library, and we frequently find packages being
called libraries.
library(): It is the command used to load a package. A library is the place
where packages are stored, usually a folder on our computer.
Package: It is a collection of functions bundled conveniently. The package is an
appropriate way to organize our own work and share it with others.
Load More Than One Package at a Time
We can pass a vector of names to the install.packages() function to install several R
packages, but in the case of the library() function, this is not possible. We can load a set of
packages one at a time, or use one of the many workarounds developed by R
users.
A common workaround is to store the package names in a character vector and call
library() on each name with lapply(), setting character.only = TRUE.
Here’s an example:
# Load multiple packages at once
packages <- c("caret", "dplyr", "ggplot2")
invisible(lapply(packages, library, character.only = TRUE))
This code loads the caret, dplyr, and ggplot2 packages, provided they are installed.
To install an R package from GitHub, use the devtools package.
First, you need to install devtools by running the following code:
install.packages("devtools")
Then call devtools::install_github() with the repository given as "user/repository".
R BASICS
Simple Math
In R, you can use operators to perform common mathematical operations on numbers.
The + operator is used to add together two values:
Example
10 + 5
And the - operator is used for subtraction:
Example
10 - 5
sqrt()
The sqrt() function returns the square root of a number:
Example
sqrt(16)
abs()
The abs() function returns the absolute (positive) value of a number:
Example
abs(-4.7)
floor()
The floor() function rounds a number down to the nearest integer:
Example
floor(1.4)
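For reference, the expressions from this section can be run together. The comments show the value each one returns.

```r
10 + 5       # 15
10 - 5       # 5
sqrt(16)     # 4
abs(-4.7)    # 4.7
floor(1.4)   # 1
```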
R Variables
A variable is a named block of memory allocated to store specific data. The name associated
with the variable, known as the variable name, is used to refer to this reserved block.
Usually a single variable stores only data belonging to a certain data type. It is called a
variable because its value can change while the program executes.
Variables in R
Variables are used to store the information to be manipulated and referenced in the R
program. The R variable can store an atomic vector, a group of atomic vectors, or a
combination of many R objects.
Languages like C++ are statically typed, but R is dynamically typed, which means it checks
the data type when the statement is run. A valid variable name contains letters, numbers, dots,
and underscore characters. A variable name should start with a letter, or with a dot not
followed by a number.
Variable name        Validity   Reason
var_name, var.name   Valid      A name can start with a letter or a dot, but a dot must not be followed by a number; otherwise the name is invalid.
var_name%            Invalid    In R, we can't use any special character in the variable name except dot and underscore.
# Initialization of variables
# using equal to operator
var1 = "hello"
print(var1)
# using leftward operator
var2 <- "hello"
print(var2)
# using rightward operator
"hello" -> var3
print(var3)
Output
[1] "hello"
[1] "hello"
[1] "hello"
Rules of R Variables
The following rules need to be kept in mind while naming an R variable:
A valid variable name consists of a combination of letters, numbers, dot (.), and
underscore (_) characters. Example: var.1_ is valid.
Apart from the dot and underscore characters, no other special character is allowed.
Example: var$1 and var#1 are both invalid.
Variables can start with a letter or a dot. Example: .var and var are valid.
The variable should not start with a number or an underscore. Example: 2var and _var
are invalid.
If a variable starts with a dot, the next character after the dot cannot be a number.
Example: .3var is invalid.
The variable name should not be a reserved keyword in R. Example: TRUE,
FALSE, etc.
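These rules can be checked programmatically. The sketch below relies on the base function make.names(), which rewrites any string into a syntactically valid name; a name that comes back unchanged already follows the rules. The helper is_valid_name() is a hypothetical name introduced for this example.

```r
# A name is syntactically valid if make.names() leaves it unchanged.
# is_valid_name() is a hypothetical helper used for illustration.
is_valid_name <- function(x) {
  make.names(x) == x
}

candidates <- c("var.1_", ".var", "2var", "_var", ".3var", "var$1", "TRUE")
validity <- is_valid_name(candidates)

# Only the first two names pass; the reserved word TRUE is rejected as well
data.frame(name = candidates, valid = validity)
```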
Important Methods for R Variables
R provides some useful methods to perform operations on variables. These methods are used
to determine the data type of the variable, finding a variable, deleting a variable, etc.
Following are some of the methods used to work on variables:
class() function
This built-in function is used to determine the data type of the variable provided to it. The R
variable to be checked is passed to it as an argument, and it returns the data type.
Syntax
class(variable)
Example
var1 = "hello"
print(class(var1))
Output
[1] "character"
ls() function
This built-in function lists all the variables currently present in the workspace. This is
generally helpful when dealing with a large number of variables at once and helps prevent
overwriting any of them.
Syntax
ls()
Example
print(ls())
Output (with var1, var2, and var3 from the earlier example defined):
[1] "var1" "var2" "var3"
rm() function
This is again a built-in function used to delete an unwanted variable within your workspace.
This helps clear the memory space allocated to certain variables that are not in use thereby
creating more space for others. The name of the variable to be deleted is passed as an
argument to it.
Syntax
rm(variable)
Example
rm(var3)
print(var3)
Output
Error in print(var3) : object 'var3' not found
Execution halted
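The effect of rm() can also be checked without triggering an error. In this sketch a separate environment, here called ws (a name chosen only for illustration), stands in for the workspace so that the listing is predictable; in an interactive session the same calls work without the envir argument.

```r
# A separate environment stands in for the workspace
ws <- new.env()
assign("var1", "hello", envir = ws)
assign("var2", "hello", envir = ws)
assign("var3", "hello", envir = ws)

ls(ws)                                         # "var1" "var2" "var3"

# Delete var3 and confirm that it is gone
rm("var3", envir = ws)
exists("var3", envir = ws, inherits = FALSE)   # FALSE
ls(ws)                                         # "var1" "var2"
```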
Scope of Variables in R programming
The location where we can find a variable and also access it if required is called the scope of
a variable. There are mainly two types of variable scopes:
Global Variables
Global variables are those variables that exist throughout the execution of a program. They
can be changed and accessed from any part of the program, and they remain available for
the lifetime of the program.
Declaring global variables
Global variables are usually declared outside of all functions and blocks. They can be
accessed from any portion of the program.
# R program to illustrate
# usage of global variables
# global variable
global = 5
# global variable accessed from
# within a function
display = function(){
  print(global)
}
display()
# changing value of global variable
global = 10
display()
Output
[1] 5
[1] 10
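One detail worth illustrating: a plain assignment inside a function creates a new local variable rather than changing the global one. To change a global variable from inside a function, R provides the superassignment operator <<-. The names counter, set_local(), and increment() below are hypothetical.

```r
counter <- 0   # global variable

# Plain assignment inside a function only creates a local variable
set_local <- function() {
  counter <- 100
  invisible(NULL)
}

# The superassignment operator <<- changes the global variable
increment <- function() {
  counter <<- counter + 1
  invisible(NULL)
}

set_local()
counter_after_local <- counter   # still 0
increment()
counter                          # now 1
```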
Local Variables
Local variables are those variables that exist only within a certain part of a program like a
function and are released when the function call ends. Local variables do not exist outside the
block in which they are declared, i.e. they cannot be accessed or used outside that block.
Declaring local variables
Local variables are declared inside a block.
# R program to illustrate
# usage of local variables
func = function(){
  # this variable is local to the
  # function func() and cannot be
  # accessed outside this function
  age = 18
  print(age)
}
cat("Age is:\n")
func()
Output
Age is:
[1] 18
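To confirm that a local variable is gone once the function returns, the sketch below uses exists(). The names show_age() and local_age are hypothetical, and the check assumes no variable called local_age has been defined elsewhere in the session.

```r
show_age <- function() {
  local_age <- 18   # exists only while show_age() is running
  local_age
}

show_age()            # 18
exists("local_age")   # FALSE: the variable is not visible outside the function
```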
Difference between local and global variables in R
1. Scope: A global variable is defined outside of any function and may be accessed
from anywhere in the program, whereas a local variable can be accessed only inside
the function in which it is defined.
2. Lifetime: A local variable’s lifetime is constrained by the function in which it is
defined. The local variable is destroyed once the function has finished running. A
global variable, on the other hand, doesn’t leave memory until the program is
finished running or the variable is explicitly deleted.
3. Naming conflicts: Because a global variable can be accessed from anywhere
in the program, naming conflicts may occur if the same variable name is used in
different portions of the program. Local variables, in contrast, apply only to the
function in which they are defined, reducing the likelihood of naming conflicts.
4. Memory usage: Because global variables are kept in memory throughout program
execution, they can use more memory than local variables. Local variables, on
the other hand, are created and destroyed only when necessary, so they
normally use less memory.
Data types of variable
R programming is a dynamically typed language, which means that we can change the data
type of the same variable again and again in our program. Because of its dynamic nature, a
variable is not declared of any data type. It gets the data type from the R-object, which is to
be assigned to the variable.
We can check the data type of the variable with the help of the class() function. Let's see an
example:
variable_y<- 124
cat("The data type of variable_y is ",class(variable_y),"\n")
variable_y<- "Learn R Programming"
cat(" Now the data type of variable_y is ",class(variable_y),"\n")
variable_y<- 133L
cat(" Next the data type of variable_y becomes ",class(variable_y),"\n")
R Vector
A vector is a basic data structure which plays an important role in R programming.
In R, a sequence of elements which share the same data type is known as a vector. A vector
supports the logical, integer, double, character, complex, and raw data types. The elements
contained in a vector are known as components of the vector. We can check the type of a vector
with the help of the typeof() function.
The length is an important property of a vector. A vector's length is the number of
elements in the vector, and it is obtained with the help of the length() function.
Vectors are classified into two kinds, i.e., atomic vectors and lists. They have three common
properties: a type (typeof()), a length (length()), and attributes (attributes()).
There is one main difference between atomic vectors and lists. In an atomic vector, all the
elements are of the same type, but in a list, the elements can be of different data types.
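The three common properties and the difference between the two kinds can be seen in a short sketch. The variable names atomic_vec and mixed_list are chosen only for illustration.

```r
atomic_vec <- c(1.5, 2.5, 3.5)         # all elements share one type
mixed_list <- list(1L, "two", TRUE)    # elements may differ in type

typeof(atomic_vec)       # "double"
typeof(mixed_list)       # "list"
length(atomic_vec)       # 3
length(mixed_list)       # 3
attributes(atomic_vec)   # NULL, because no attributes have been set

# Each list element keeps its own type
sapply(mixed_list, typeof)   # "integer" "character" "logical"
```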
How to create a vector in R?
In R, we use the c() function to create a vector. This function returns a one-dimensional array,
or simply a vector. The c() function is a generic function which combines its arguments. All
arguments are coerced to a common data type, which is the type of the returned value.
There are various other ways to create a vector in R, which are as follows:
1) Using the colon(:) operator
We can create a vector with the help of the colon operator, using the following
syntax:
z<-x:y
This operator creates a vector with elements from x to y and assigns it to z.
Example:
a<-4:-10
a
Output
[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
2) Using the seq() function
In R, we can create a vector with the help of the seq() function. The seq() function creates a
sequence of elements as a vector. It is used in two ways, i.e., by setting the step
size with the 'by' argument or by specifying the length of the vector with the 'length.out' argument.
Example:
seq_vec<-seq(1,4,by=0.5)
seq_vec
class(seq_vec)
Output
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
[1] "numeric"
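The second way of calling seq(), with length.out instead of by, can be sketched as follows. The variable name seq_len_vec is chosen only for illustration.

```r
# Ask for 4 equally spaced values between 1 and 4
seq_len_vec <- seq(1, 4, length.out = 4)
seq_len_vec           # 1 2 3 4
length(seq_len_vec)   # 4
```

Here R works out the step size itself from the start value, the end value, and the requested length.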
Atomic vectors in R
In R, there are four commonly used types of atomic vectors. Atomic vectors play an important
role in Data Science. Atomic vectors are created with the help of the c() function. These atomic
vectors are as follows:
Numeric vector
The decimal values are known as numeric data types in R. If we assign a decimal value to
any variable d, then this d variable will become a numeric type. A vector which contains
numeric elements is known as a numeric vector.
Example:
d<-45.5
num_vec<-c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output
[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"
Integer vector
A numeric value with no fractional part can be stored as integer data. R stores integers as
32-bit (4-byte) values. There are two ways to assign an
integer value to a variable, i.e., by using the as.integer() function or by appending L to the value.
A vector which contains integer elements is known as an integer vector.
Example:
d<-as.integer(5)
e<-5L
int_vec<-c(1,2,3,4,5)
int_vec<-as.integer(int_vec)
int_vec1<-c(1L,2L,3L,4L,5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
Character vector
Character data holds text strings. In R, there are two different ways to
create a character data type value, i.e., using the as.character() function or by typing a string
between double quotes("") or single quotes('').
A vector which contains character elements is known as a character vector.
Example:
d<-'shubham'
e<-"Arpita"
f<-65
f<-as.character(f)
d
e
f
char_vec<-c(1,2,3,4,5)
char_vec<-as.character(char_vec)
char_vec1<-c("shubham","arpita","nishka","vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)
Output
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
Logical vector
The logical data type has only two values, TRUE and FALSE. These values are based on
which condition is satisfied. A vector which contains Boolean values is known as a logical
vector.
Example:
d<-as.integer(5)
e<-as.integer(6)
f<-as.integer(7)
g<-d>e
h<-e<f
g
h
log_vec<-c(d<e, d<f, e<d,e<f,f<d,f<e)
log_vec
class(g)
class(h)
class(log_vec)
Output
[1] FALSE
[1] TRUE
[1] TRUE TRUE FALSE TRUE FALSE FALSE
[1] "logical"
[1] "logical"
[1] "logical"
Accessing elements of vectors
We can access the elements of a vector with the help of vector indexing. An index denotes the
position at which a value is stored in a vector; in R, positions start at 1. Indexing can be
performed with an integer, character, or logical index vector.
# Create a vector of fruits (illustrative values)
fruits <- c("banana", "apple", "orange")
# Print fruits
fruits
1) Combining vectors
The c() function is used not only to create a vector but also to combine two or more
vectors. Combining vectors forms a new vector which contains all the
elements of each vector. If the vectors have different types, the elements are coerced to a
common type; in the example below, the numbers become character strings. Let us see an
example of how the c() function combines vectors.
Example:
p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r<-c(p,q)
r
Output
[1] "1" "2" "4" "5" "7" "8"
[7] "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"
2) Arithmetic operations
We can perform all the arithmetic operations on vectors. The arithmetic operations are
performed member-by-member on vectors. We can add, subtract, multiply, or divide two
vectors. Let us see an example of how arithmetic operations are performed on
vectors.
Example:
a<-c(1,3,5,7)
b<-c(2,4,6,8)
a+b
a-b
a/b
a%%b
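Run in an R session, these operations give the following element-by-element results. This is a short sketch using the same vectors a and b; a * b is added here for completeness.

```r
a <- c(1, 3, 5, 7)
b <- c(2, 4, 6, 8)

a + b    # element-wise sum
# [1]  3  7 11 15
a - b    # element-wise difference
# [1] -1 -1 -1 -1
a * b    # element-wise product
# [1]  2 12 30 56
a / b    # element-wise quotient
# [1] 0.5000000 0.7500000 0.8333333 0.8750000
a %% b   # element-wise remainder
# [1] 1 3 5 7
```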
3) Logical Index vector
With the help of a logical index vector in R, we can form a new vector from a given vector.
The logical index vector has the same length as the original vector. Its members are TRUE
where the corresponding members of the original vector are to be included in the slice, and
FALSE otherwise. Let us see an example of how a new vector is formed with the help of a
logical index vector.
Example:
a<-c("Shubham","Arpita","Nishka","Vaishali","Sumit","Gunjan")
b<-c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
a[b]
Output
[1] "Shubham" "Nishka" "Vaishali"
4) Numeric Index
In R, we specify the index between square brackets [ ] to select an element by its numeric
position. If the index is negative, R returns all the values except the one at the position we
have specified. For example, specifying [-3] returns every element except the third. An index
beyond the length of the vector returns NA.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[2]
q[-4]
q[15]
Output
[1] "arpita"
[1] "shubham" "arpita" "nishka" "vaishali" "sumit"
[1] NA
5) Duplicate Index
An index vector allows duplicate values, which means we can access one element twice in
one operation. Let us see an example of how a duplicate index works.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,4,4,3)]
Output
[1] "arpita" "gunjan" "gunjan" "nishka"
6) Range Indexes
A range index is used to slice a vector to form a new vector. For slicing, we use the colon (:)
operator. Range indexes are very helpful when working with a large vector. Let us see
an example of how slicing is done with the help of the colon operator to form a
new vector.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
b<-q[2:5]
b
Output
[1] "arpita" "nishka" "gunjan" "vaishali"
7) Out-of-order Indexes
In R, the index vector can be out of order. Below is an example of a vector slice in which
the order of the first and second values is reversed.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,1,3,4,5,6)]
Output
[1] "arpita" "shubham" "nishka" "gunjan" "vaishali" "sumit"
8) Named vector members
We first create our vector of characters as:
z=c("TensorFlow","PyTorch")
z
Output
[1] "TensorFlow" "PyTorch"
Once our vector of characters is created, we name the first vector member as "Start" and the
second member as "End" as:
names(z)=c("Start","End")
z
Output
Start End
"TensorFlow" "PyTorch"
We retrieve the first member by its name as follows:
z["Start"]
Output
Start
"TensorFlow"
We can reverse the order with the help of a character string index vector.
z[c("End","Start")]
Output
End Start
"PyTorch" "TensorFlow"
Applications of vectors
1. Vectors are used in machine learning for principal component analysis. They are
extended to eigenvalues and eigenvectors and then used for performing decomposition
in vector spaces.
2. The inputs which are provided to the deep learning model are in the form of vectors.
These vectors consist of standardized data which is supplied to the input layer of the
neural network.
3. In the development of support vector machine algorithms, vectors are used.
4. Vector operations are utilized in neural networks for various operations like image
recognition and text processing.
R Functions
A set of statements which are organized together to perform a specific task is known as a
function. R provides a series of in-built functions, and it allows the user to create their own
functions. Functions are used to perform tasks in the modular approach.
Functions are used to avoid repeating the same task and to reduce complexity. To understand
and maintain our code, we logically break it into smaller parts using functions. A function
1. Is written to carry out a specified task.
2. May or may not have arguments.
3. Contains a body in which our code is written.
4. May or may not return one or more output values.
"An R function is created by using the keyword function." The syntax
of an R function is as follows:
func_name <- function(arg_1, arg_2, ...) {
  # Function body
}
Components of Functions
There are four components of function, which are as follows:
Function Name
The function name is the actual name of the function. In R, the function is stored as an object
with its name.
Arguments
In R, an argument is a placeholder. Arguments are optional: a function may
or may not have arguments, and these arguments can also have default values. We pass a
value to the argument when the function is invoked.
Function Body
The function body contains a set of statements which defines what the function does.
Return value
The return value is the last expression evaluated in the function body, unless return() is
called explicitly.
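The four components can be seen together in one small function. This is only a sketch: the name rect_area and its arguments width and height are hypothetical and chosen for illustration.

```r
# Function name: rect_area (hypothetical)
# Arguments: width, and height with a default value of 1
rect_area <- function(width, height = 1) {
  # Function body: the statements that do the work
  area <- width * height
  # Return value: the last evaluated expression
  area
}

rect_area(3)       # height falls back to its default
# [1] 3
rect_area(3, 4)    # both arguments supplied
# [1] 12

# formals() lists the arguments of a function
names(formals(rect_area))
```

Because the function is stored as an ordinary object under its name, it can be inspected with formals() and body() like any other R object.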
Function Types
Similar to the other languages, R also has two types of function, i.e. Built-in
Function and User-defined Function. In R, there are lots of built-in functions which we can
directly call in the program without defining them. R also allows us to create our own
functions.
Built-in function
The functions which are already created or defined in the programming framework are
known as built-in functions. The user doesn't need to create these functions, because they
are built into the language. End users can access these functions by simply calling
them. R has many built-in functions, such as seq(), mean(), max(), and sum().
# Creating sequence of numbers from 32 to 46.
print(seq(32,46))
# Finding the mean of numbers from 22 to 80.
print(mean(22:80))
# Finding the sum of numbers from 41 to 70.
print(sum(41:70))
Output:
[1] 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
[1] 51
[1] 1665
User-defined function
R allows us to create our own functions in our program. A user writes a user-defined function
to fulfil their own requirements. Once these functions are created, we can use them
like built-in functions.
# Creating a function without an argument.
new.function <- function() {
  for(i in 1:5) {
    print(i^2)
  }
}
new.function()
Output:
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
R – handling Missing Values
As the name indicates, missing values are those elements which are not known. NA and NaN
are reserved words in the R programming language: NA indicates a missing value, and NaN
("Not a Number") is the result of arithmetical operations that are undefined.
Missing values occur in practice. For example, some cells in spreadsheets are empty. If an
undefined or impossible arithmetic operation such as 0/0 is attempted, then NaN occurs.
Dealing Missing Values in R
Missing values in R are handled with the use of some predefined functions:
is.na() Function for Finding Missing values:
This function returns a logical vector of the same length as its input. Each element
is TRUE if the corresponding input element is NA and FALSE otherwise.
x<- c(NA, 3, 4, NA, NA, NA)
is.na(x)
Output:
[1] TRUE FALSE FALSE TRUE TRUE TRUE
is.nan() Function for Finding Missing values:
This function returns a logical vector that indicates which elements are NaN. Each element
is TRUE if the corresponding input element is NaN and FALSE otherwise.
Output:
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE
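The difference between the two functions is easiest to see on one vector that holds both kinds of value. The sketch below uses a hypothetical vector y, not the vector behind the output above; 0/0 is used to produce a NaN.

```r
# y holds one ordinary value, one NA, and one NaN (from 0/0)
y <- c(1, NA, 0/0)

is.na(y)    # TRUE for both NA and NaN
# [1] FALSE  TRUE  TRUE
is.nan(y)   # TRUE only for NaN
# [1] FALSE FALSE  TRUE
```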
Properties of Missing Values:
For testing objects that are NA, use is.na()
For testing objects that are NaN, use is.nan()
NA values have a class. The integer class has an integer type
NA (NA_integer_), the character class has a character type NA (NA_character_), etc.
A NaN value is also counted as NA, but the reverse is not true.
The creation of a vector with one or multiple NAs is also possible.
x<- c(NA, 3, 4, NA, NA, NA)
x
Output:
[1] NA 3 4 NA NA NA
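The point about NA having a class can be checked directly. This is a small sketch using R's built-in typed NA constants.

```r
class(NA)               # a bare NA is logical
# [1] "logical"
class(NA_integer_)
# [1] "integer"
class(NA_real_)
# [1] "numeric"
class(NA_character_)
# [1] "character"

# Inside a vector, NA takes on the type of the other elements
class(c(1L, NA))
# [1] "integer"
```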
Removing NA or NaN values
There are two ways to remove missing values:
Extracting values except for NA or NaN values:
Example 1:
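A minimal sketch of extracting the non-missing values, assuming the vector x from the is.na() example. The na.rm = TRUE argument is shown as one further common option; it is not necessarily the second method the text refers to.

```r
x <- c(NA, 3, 4, NA, NA, NA)

# Keep only the elements for which is.na() is FALSE
# (is.na() is also TRUE for NaN, so NaN values are dropped as well)
x[!is.na(x)]
# [1] 3 4

# Many summary functions can skip missing values with na.rm = TRUE
sum(x, na.rm = TRUE)
# [1] 7
```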
Vectors
A vector is the basic data structure in R, or we can say vectors are the most basic R data
objects. There are six types of atomic vectors: logical, integer, double, complex, character, and
raw. "A vector is a collection of elements which is most commonly of mode character,
integer, logical or numeric." A vector can be one of the following two types:
1. Atomic vector
2. Lists
List
In R, the list is a container. Unlike an atomic vector, a list is not restricted to a single
mode. A list can contain a mixture of data types. Lists are also known as generic vectors
because the elements of a list can be any type of R object. "A list is a special type of
vector in which each element can be a different type."
We can create a list with the help of list() or as.list(). We can use vector() to create an empty
list of a required length.
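The three ways of creating a list mentioned above can be sketched as follows. The list name info and its contents are illustrative only.

```r
# list(): elements of different types in one container
info <- list(name = "Shubham", marks = c(80, 92), passed = TRUE)
class(info)
# [1] "list"
info$marks               # access an element by name
# [1] 80 92
class(info[["name"]])
# [1] "character"

# as.list(): converts a vector into a list, one element per value
length(as.list(1:3))
# [1] 3

# vector(): creates an empty list of a required length
empty <- vector("list", 2)
length(empty)
# [1] 2
```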
Arrays
Arrays are another type of data object; they can store data in more than two dimensions.
"An array is a collection of a similar data type with contiguous memory
allocation." For example, if we create an array of dimension (2, 3, 4), then it creates four
rectangular matrices of two rows and three columns.
In R, an array is created with the help of the array() function. This function takes a vector as
input and uses the values in the dim parameter to create the array.
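The (2, 3, 4) case described above can be sketched like this; the input values 1:24 are arbitrary and chosen so that every cell is filled exactly once.

```r
# 24 values arranged as four 2 x 3 matrices
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)
# [1] 2 3 4

# The first of the four matrices (2 rows, 3 columns)
arr[, , 1]

# Row 2, column 3 of the fourth matrix
arr[2, 3, 4]
# [1] 24
```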
Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional
rectangular layout. All elements of a matrix are of the same atomic type. For
mathematical calculations, we use a matrix containing numeric elements. A matrix is
created with the help of the matrix() function in R.
Syntax
The basic syntax of creating a matrix is as follows:
matrix(data, nrow, ncol, byrow, dimnames)
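As a short sketch of this syntax, the call below uses the argument names that R's matrix() function actually accepts (nrow, ncol, byrow, dimnames). The row and column labels are illustrative.

```r
# Fill a 2 x 3 matrix row by row and label its rows and columns
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
m
#    c1 c2 c3
# r1  1  2  3
# r2  4  5  6

m["r2", "c3"]   # access by row and column name
# [1] 6
dim(m)
# [1] 2 3
```

With byrow = FALSE, which is the default, the same data would be filled column by column instead.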
Data Frames
A data frame is a two-dimensional array-like structure, or we can say it is a table in which
each column contains the values of one variable, and each row contains one set of values from each
column.
A data frame has the following characteristics:
1. The column names should be non-empty.
2. The row names should be unique.
3. The data stored in a data frame is typically of numeric, factor, or character type.
4. Each column should contain the same number of data items.
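A minimal sketch of creating a data frame with data.frame() follows. The column names and values are made up for illustration.

```r
# Each argument becomes one column; all columns have the same length
df <- data.frame(
  name = c("A", "B", "C"),
  age = c(40, 49, 48),
  stringsAsFactors = FALSE   # keep the text column as character
)

nrow(df)         # number of rows
# [1] 3
ncol(df)         # number of columns
# [1] 2
df$age           # one column, returned as a vector
# [1] 40 49 48
class(df$name)
# [1] "character"
```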
Factors
# Creating a vector
x <-c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor
# named gender
gender <-factor(x)
print(gender)
Output
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
Modification of a Factor in R
After a factor is formed, its components can be modified, but the new values which are
assigned must be among the predefined levels.
Example
Output
[1] female female male female
Levels: female male
To select all the elements of the factor gender except the ith element, gender[-i] should be
used. If you want to modify a factor and add a value outside the predefined levels, then first
modify the levels.
Example
Output
[1] female male other female
Levels: female male other
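Both kinds of modification can be sketched on the gender factor. This is an illustration under the assumption that the factor starts from the four values shown earlier; it is not necessarily the code behind the outputs above.

```r
gender <- factor(c("female", "male", "male", "female"))

# Assigning a value that is already a level works directly
gender[2] <- "female"

# A value outside the levels must be added to levels() first
levels(gender) <- c(levels(gender), "other")
gender[3] <- "other"

nlevels(gender)
# [1] 3
print(gender)
```

Assigning "other" without extending the levels first would store NA in that position and raise a warning.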
Factors in Data Frame
A data frame is similar to a 2D array, with each column containing all the values of one
variable and each row having one set of values from every column. There are four things to
remember about data frames:
Column names are compulsory and cannot be empty.
Unique names should be assigned to each row.
The data frame’s data is typically of three types: factor, numeric, and character
type.
The same number of data items must be present in each column.
In R, when we create a data frame with a column of categorical character data, that column
can be stored as an R factor. R versions before 4.0.0 created the factor automatically; from
R 4.0.0 onward, stringsAsFactors = TRUE must be passed to data.frame().
We can create a data frame and check if its column is a factor.
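A short sketch of such a check, using a small made-up data frame. Passing stringsAsFactors = TRUE explicitly makes the behaviour the same across R versions.

```r
df <- data.frame(
  age = c(40, 49, 48),
  gender = c("male", "male", "female"),
  stringsAsFactors = TRUE   # convert character columns to factors
)

is.factor(df$gender)   # the text column became a factor
# [1] TRUE
is.factor(df$age)      # numeric columns are left unchanged
# [1] FALSE
nlevels(df$gender)
# [1] 2
```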
Example
Output
age salary gender
1 40 103200 male
2 49 106200 male
3 48 150200 transgender
4 40 10606 female
5 67 10390 male
6 52 14070 female
7 53 10220 transgender