
Module-1 R

R is an open-source programming language widely used for statistical software and data analysis, developed by Ross Ihaka and Robert Gentleman. It supports various platforms, integrates with other programming languages, and is favored by researchers and data analysts for its powerful data manipulation capabilities. RStudio serves as an integrated development environment for R, enhancing user experience with features like script editing, environment management, and plotting.

MODULE 1: INTRODUCTION

What is R Programming
R is an open-source programming language that is widely used for statistical computing and
data analysis. R ships with a command-line interface and is available on widely used
platforms such as Windows, Linux, and macOS.
R is an interpreted programming language created by Ross Ihaka and Robert Gentleman at
the University of Auckland, New Zealand. The R Development Core Team currently develops R.
It is also a software environment used for statistical analysis, graphical representation,
reporting, and data modelling.
R is an implementation of the S programming language, combined with lexical scoping
semantics.
• R supports branching and looping, and also allows modular programming using
functions.
• R allows integration with procedures written in C, C++, .NET, Python, and
Fortran to improve efficiency.
R is one of the most important tools used by researchers, data analysts, statisticians,
and marketers for retrieving, cleaning, analyzing, visualizing, and presenting data.
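As a small illustration of the branching, looping, and function features mentioned above, here is a minimal sketch. The function name classify and the sample values are hypothetical and chosen only for this example.

```r
# Hypothetical helper: branching with if / else if / else.
classify <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

# Looping: call the function once for each element of a vector.
values <- c(-2, 0, 3)
labels <- character(length(values))
for (i in seq_along(values)) {
  labels[i] <- classify(values[i])
}

print(labels)
```

The loop is written out explicitly to show the syntax; in everyday R code the same result is often obtained with sapply(values, classify).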
History of R Programming
The history of R goes back roughly three decades. R was developed by Ross Ihaka and
Robert Gentleman at the University of Auckland, New Zealand, and the R Development Core
Team currently develops it. The language's name comes from the first letter of both
developers' first names. The project was conceived in 1992. The initial version was released
in 1995, and the first stable version (1.0) was released in 2000.
The following table shows the version, release date, and description of major R releases:

Version   Release date   Description
0.49      1997-04-23     R's source code is released for the first time, and CRAN (Comprehensive R Archive Network) is started.
0.60      1997-12-05     R officially becomes part of the GNU Project.
0.65.1    1999-10-07     update.packages() and install.packages() are both included.
1.0       2000-02-29     The first production-ready version is released.
1.4       2001-12-19     The first version for Mac OS is made available.
2.0       2004-10-04     Lazy loading is introduced, which enables fast loading of data with minimal use of system memory.
2.1       2005-04-18     Support is added for UTF-8 encoding, internationalization, localization, etc.
2.11      2010-04-22     Support is added for Windows 64-bit systems.
2.13      2011-04-14     A compiler function is added that speeds up functions by converting them to byte code.
2.14      2011-10-31     Added some new packages.
2.15      2012-03-30     Improved serialization speed for long vectors.
3.0       2013-04-03     Support for numeric index values of 2^31 and larger on 64-bit systems.
3.4       2017-04-21     Just-in-time (JIT) compilation is enabled by default.
3.5       2018-04-23     New features such as a compact internal representation of integer sequences and a new serialization format.

Key Differences between 32-bit and 64-bit R:


1. Memory Access:
   o 32-bit R: Can only address up to 4 GB of RAM, limiting the size of datasets you
     can work with.
   o 64-bit R: Can address much more memory (theoretically up to 16 exabytes, but
     practically limited by your system's RAM). This allows for working with larger
     datasets and performing more memory-intensive computations.
2. Performance:
   o 32-bit R: Might perform faster on systems with little memory (e.g., under 4 GB)
     for smaller tasks because of lower memory overhead.
   o 64-bit R: Generally better suited for tasks requiring large data structures and
     computations, as it can use the system's full RAM.
3. Package Compatibility:
   o 32-bit R: Some older R packages may only work in 32-bit environments,
     though this is increasingly rare.
   o 64-bit R: Most modern R packages are built for 64-bit systems, and a 64-bit
     build is required for larger datasets.
4. Installation:
   o On Windows, R versions before 4.2.0 were typically installed with both 32-bit and
     64-bit builds, and you could select which one to use when starting an R session.
     From R 4.2.0 onward, only 64-bit builds are provided for Windows.
   o On Mac and Linux, 64-bit versions of R are generally the default.
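To find out whether a running R session is a 32-bit or 64-bit build, one option is to inspect the pointer size that R reports. This is a minimal sketch; the variable names pointer_bytes and bits are chosen only for this example.

```r
# Pointer size in bytes: 8 on a 64-bit build, 4 on a 32-bit build.
pointer_bytes <- .Machine$sizeof.pointer
bits <- pointer_bytes * 8
cat("This R session is a", bits, "bit build\n")

# Architecture string reported by R (its value depends on the machine).
cat("Architecture:", R.version$arch, "\n")
```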
Why should you learn R?
• Several tools are available for data analysis, and learning a new language takes
time. Two excellent choices for a data scientist are R and Python.
• An important task in data science is how we handle the data: importing, cleaning,
feature engineering, and feature selection. This should be our primary focus. A data
scientist's job is to understand the data, manipulate it, and find the best approach.
For machine learning, many of the best algorithms are available in R. The keras and
tensorflow packages let us build advanced machine learning models, and the xgboost
package provides XGBoost, one of the most successful algorithms in Kaggle
competitions.
• R communicates with other languages and can call Python, Java, and C++. The
big data world is also accessible to R: we can connect R to platforms such as
Spark or Hadoop.
• R is a great tool for investigating and exploring data. Elaborate analyses such as
clustering, correlation, and data reduction can be done in R.
• R is open source, which means that it is constantly being updated and improved by
collaborating developers around the world.
RStudio is split into 4 quadrants:
• Script (top left): where commands are written, executed, and saved
• Environment (top right): lists the data, variables, and functions that are currently
in the workspace
• Console (bottom left): for quickly testing code and where commands and outputs
are displayed, except plots
• Plot (bottom right): where graphics are displayed
Features of R programming
R is a domain-specific programming language aimed at data analysis. It has some
unique features which make it very powerful, the most important arguably being the notion
of vectors. Vectors allow us to perform a complex operation on a set of values in a
single command. R programming has the following features:
1. It is a simple and effective programming language which has been well developed.
2. It is data analysis software.
3. It is a well-designed, easy, and effective language which includes user-defined
functions, loops, conditionals, and various I/O facilities.
4. It has a consistent and integrated set of tools for data analysis.
5. It contains a suite of operators for different types of calculation on arrays, lists,
and vectors.
6. It provides effective data handling and storage facilities.
7. It is open-source, powerful, and highly extensible software.
8. It provides highly extensible graphical techniques.
9. It allows us to perform multiple calculations using vectors.
10. R is an interpreted language.
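The vector feature described above can be sketched as follows. The data values and variable names (prices, quantities, totals) are made up for illustration.

```r
# Two numeric vectors of the same length (illustrative data).
prices <- c(10, 20, 30)
quantities <- c(1, 2, 3)

# One command multiplies the vectors element by element; no loop is needed.
totals <- prices * quantities
print(totals)        # 10 40 90

# A single value is recycled across the whole vector.
discounted <- totals * 0.9

# Summary functions also operate on the whole vector at once.
print(sum(totals))   # 140
```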
Applications of R
R is used in many real-world settings. Some well-known organizations that use R are as
follows:
o Facebook
o Google
o Twitter
o Sunlight Foundation
o RealClimate
RStudio IDE
RStudio is an integrated development environment which allows us to interact with R more
readily. RStudio is similar to the standard RGui, but it is considered more user-friendly. This
IDE has various drop-down menus, windows with multiple tabs, and many
customization options.
The first time we open RStudio, we will see three windows. The fourth window is
hidden by default. We can open this hidden window by clicking the File drop-down menu,
then New File, and then R Script.

RStudio Windows/Tabs   Location      Description
Console Window         Lower-left    The location where commands are entered and output is printed.
Source Tabs            Upper-left    Built-in text editor.
Environment Tab        Upper-right   An interactive list of loaded R objects.
History Tab            Upper-right   List of commands entered into the console.
Files Tab              Lower-right   File explorer to navigate folders on your computer.
Plots Tab              Lower-right   Output location for plots.
Packages Tab           Lower-right   List of installed packages.
Viewer Tab             Lower-right   Advanced tab for local web content.

Installation of R
R is maintained by an international team of developers who make the language available
through the web page of The Comprehensive R Archive Network. The top of the web page
provides three links for downloading R. Follow the link that describes your operating system:
Windows, Mac, or Linux.
Windows
To install R on Windows, click the “Download R for Windows” link. Then click the “base”
link. Next, click the first link at the top of the new page. This link should say something like
“Download R 3.0.3 for Windows,” except the 3.0.3 will be replaced by the most current
version of R. The link downloads an installer program, which installs the most up-to-date
version of R for Windows. Run this program and step through the installation wizard that
appears. The wizard will install R into your program files folders and place a shortcut in your
Start menu. Note that you’ll need to have all of the appropriate administration privileges to
install new software on your machine.
Mac
To install R on a Mac, click the “Download R for Mac” link. Next, click on the R-3.0.3
package link (or the package link for the most current release of R). An installer will
download to guide you through the installation process, which is very easy. The installer lets
you customize your installation, but the defaults will be suitable for most users. I’ve never
found a reason to change them. If your computer requires a password before installing new
programs, you’ll need it here.
Binaries Versus Source
R can be installed from precompiled binaries or built from source on any operating system.
For Windows and Mac machines, installing R from binaries is extremely easy. The binary
comes preloaded in its own installer. Although you can build R from source on these
platforms, the process is much more complicated and won’t provide much benefit for most
users. For Linux systems, the opposite is true. Precompiled binaries can be found for some
systems, but it is much more common to build R from source files when installing on Linux.
The download pages on CRAN’s website provide information about building R from source
for the Windows, Mac, and Linux platforms.
Linux
R comes preinstalled on many Linux systems, but you’ll want the newest version of R if
yours is out of date. The CRAN website provides files to build R from source on Debian,
Redhat, SUSE, and Ubuntu systems under the link “Download R for Linux.” Click the link
and then follow the directory trail to the version of Linux you wish to install on. The exact
installation procedure will vary depending on the Linux system you use. CRAN guides the
process by grouping each set of source files with documentation or README files that
explain how to install on your system.
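To see whether an existing installation is out of date, you can ask R which version is running. The sketch below assumes a hypothetical minimum version of 3.0.0 purely for illustration; substitute whatever version your work requires.

```r
# Full version string of the running R installation.
cat(R.version.string, "\n")

# getRversion() returns a version object that can be compared with a string.
minimum_version <- "3.0.0"   # hypothetical threshold for this example
is_recent_enough <- getRversion() >= minimum_version

if (is_recent_enough) {
  cat("R is at least version", minimum_version, "\n")
} else {
  cat("Consider installing a newer version of R from CRAN\n")
}
```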

Introduction to RStudio

RStudio is an integrated development environment (IDE) for R. An IDE is a GUI where you
can write your code, see the results, and also see the variables that are generated during the
course of programming.
• RStudio is available as both open-source and commercial software.
• RStudio is also available in both Desktop and Server versions.
• RStudio is also available for various platforms such as Windows, Linux, and
macOS.
Introduction to RStudio for beginners:
RStudio is an open-source tool that provides an IDE for the R language, along with
enterprise-ready professional software that lets data science teams develop and share their work.
• The Console panel (left panel) is where R waits for you to tell it what to
do, and where you see the results generated by the commands you type.
• At the top right is the Environment/History panel. It contains 2 tabs:
   o Environment tab: shows the variables that are generated during the course of
     programming, held in a temporary workspace.
   o History tab: shows all the commands that have been used so far since you
     started using RStudio.
• At the bottom right is another panel, which contains multiple tabs: Files, Plots,
Packages, Help, and Viewer.
   o The Files tab shows the files and directories that are available within R's
     working directory.
   o The Plots tab shows the plots that are generated during the course of programming.
   o The Packages tab shows which packages are already installed and also gives a
     user interface for installing new packages.
   o The Help tab is the most important one. Here you can get help from the R
     documentation on R's built-in functions.
   o The last tab is the Viewer tab, which can be used to see local web
     content generated using R.
How to Install R Studio on Windows and Linux?
R is a programming language and free software environment, available under the
GNU General Public License and supported by the R Foundation for Statistical Computing. The
language is most widely known for its powerful statistical and data interpretation capabilities.
To use R, you need the R environment installed on your machine, plus an IDE
(integrated development environment) in which to write and run code. R can also be run from
CMD on Windows or a terminal on Linux.
Why use RStudio?
• It is a powerful IDE built specifically for the R language.
• It provides literate programming tools, which combine R scripts, outputs, text,
and images into reports, Word documents, and even HTML files.
• The Shiny package (an open-source R package) allows us to create interactive content
in reports and presentations.
Creation and Execution of R File in RStudio
RStudio is an integrated development environment (IDE) for R. An IDE is a GUI where you can
write your code, see the results, and also see the variables that are generated during the
course of programming. RStudio is available as open-source software in both desktop (client)
and server versions.
Creating an R file
There are two ways to create an R file in RStudio:
• Click the File menu, choose New File from the drop-down menu, and then select
R Script. A new file opens.
• Use the plus button just below the File menu and choose R Script to open a new
R script file.
Once you open an R script file, RStudio shows four panels. The Console,
Environment/History, and Files/Plots panels are there as before, and a new window at the top
left now holds the open script file. You are now ready to write a script or program in RStudio.
Different ways to run an R program
There are many ways to run an R program:
• Method 1: Using the command prompt or terminal
Write your code in Notepad or any text editor and save it as “helloworld.r”.
Run it in the command prompt or terminal using the command “Rscript helloworld.r”.
• Method 2: Using an online IDE
Many online IDEs are available. We can use them without installing or
downloading anything.
• Method 3: Using an IDE such as RStudio, RTVS, or StatET
You can download and install one of these IDEs on your system and write and run the
program there. RStudio and StatET (an Eclipse-based tool) are available for Windows,
Mac, and Linux. RTVS is presently available only on Windows.
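For Method 1, the file helloworld.r could contain something as small as the sketch below. The variable name message_text is chosen only for this example.

```r
# Contents of helloworld.r
message_text <- "Hello, World!"
print(message_text)   # prints: [1] "Hello, World!"
```

After saving the file, running Rscript helloworld.r from the directory that contains it executes the script and prints the message in the terminal.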
RPE: the R Productivity Environment for Windows
The R Productivity Environment, or RPE, is an interface designed to
make creating R programs easier and more reliable. It is available to subscription
customers of REvolution R Enterprise on Windows.
Features:
• Enhanced Script Editor with hover-over help, word completion, find-across-files
capability, automatic syntax checking, bookmarks, and navigation buttons.
• R Code Snippets to automatically generate fill-in-the-blank sections of R code for a
variety of analyses. Tooltip help gives guidance in filling out the snippet. R
function authors can write their own code snippets to share with other users.
• Object Browser allowing users to see all the data and function objects that are
available, including those in loaded and installed R packages. Context menus
provide the capability to quickly edit and plot data or load a package.
• Full-featured Visual Debugger for debugging R scripts, with step-in, step-over,
and step-out capability, allowing users to inspect and modify R objects as they are
debugging.
• A Visual Solution Explorer for organizing, viewing, adding, removing,
rearranging, and deploying R scripts. Users can create their own Project Templates
for automatic creation of a set of customized scripts for a new R project.
• Dockable, Floating, and Tabbed Tool Windows allowing for personally
customized workspaces.
• Enhanced Help including complete search capabilities and hover-over tooltips for
functions and data objects.
• Snippets give you the best of both worlds of menu/dialog systems and
programming: you use a menu to select a standard task (like importing a file or
creating a chart), but instead of just performing the action, the RPE inserts R code
to do it.
   o You use the TAB key to skip between placeholder sections in the code, and
     pop-up help tells you what each placeholder does, so it's almost as easy as
     using a dialog.
• The debugger streamlines the process of finding and fixing mistakes in R
code. You no longer have to edit a function to insert a browser() call, or use
the trace() or debug() functions. Instead, all you need to do to create a breakpoint
in a script or a function is click once on the line where you want the breakpoint to
go.
• You can set as many breakpoints as you want, and then switch the RPE to Debug
mode. Now when your code runs, it will stop at each breakpoint, where you can
inspect or change variables with the command line or Object Browser.
An efficient way to install and load R packages
What is an R package and how to use it?
Only fundamental functionality comes with R by default. Users can install
“extensions” to perform their analyses. These extensions, which are collections of functions
and datasets developed and published by R users, are called packages.
• Packages extend base R by adding new functionality. R is open
source, so anyone can write code and publish it as a package, and anyone can
install a package and start using the functions or datasets it contains, all
for free.
• Install a package with install.packages("name_of_package") (do not forget the "" around the
name of the package, otherwise R will look for an object saved under that name!).
Once the package is installed, you must load it with library(name_of_package), and only
after it has been loaded can you use the functions and datasets it contains.
Note that a package needs to be installed only once (until you update R, after which you have
to install it again), whereas it must be loaded every time you open R.
More efficient way
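# Package names
packages <- c("ggplot2", "readxl", "dplyr", "tidyr", "ggfortify", "DT", "reshape2", "knitr",
  "lubridate", "pwr", "psy", "car", "doBy", "imputeMissings", "RcmdrMisc", "questionr",
  "vcd", "multcomp", "KappaGUI", "rcompanion", "FactoMineR")
# Install packages not yet installed
installed_packages <- packages %in% rownames(installed.packages())
if (any(!installed_packages)) {
  install.packages(packages[!installed_packages])
}
# Packages loading
invisible(lapply(packages, library, character.only = TRUE))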
This code for installing and loading R packages is more efficient in several ways:
1. The function install.packages() accepts a vector as its argument, so what used to be
one line of code per package becomes a single line covering all packages.
2. The second part of the code checks whether each package is already installed
and then installs only the missing ones.
3. For loading (the last part of the code), the lapply() function calls library() on
every package in the vector, which makes the code more concise.
4. The list that lapply() returns after loading the packages is rarely useful. The
invisible() function keeps it from being printed.
R tools
Building R packages requires a tool chain that must be in place before you begin
developing. If you are developing packages that contain only R code, then the tools you
need come with R and RStudio.
R Packages
The objectives of this section are:
• Recognize the basic structure and purpose of an R package
• Recognize the key directives in a NAMESPACE file
An R package is a mechanism for extending the basic functionality of R. It is the natural
extension of writing functions that each do a specific thing well. In the previous chapter,
we discussed how writing functions abstracts the behavior of a set of R expressions by
providing a defined interface, with inputs (i.e. function arguments) and outputs (i.e. return
values).
Once one has developed many functions, it becomes natural to group them into
collections of functions that are aimed at achieving an overall goal. This collection of
functions can be assembled into an R package.
R packages represent another level of abstraction, where the interface presented to the user
is a set of user-facing functions. These functions provide access to the underlying
functionality of the package and simplify the user experience because the user does not
need to be concerned with the many other helper functions that are required.
Basic Structure of an R Package
An R package begins life as a directory on your computer. This directory has a specific
layout with specific files and sub-directories. The two required sub-directories are
• R, which contains all of your R code files
• man, which contains your documentation files.
At the top level of your package directory you will have a DESCRIPTION file and
a NAMESPACE file. This represents the minimal requirements for an R package. Other
files and sub-directories can be added, and we will discuss how and why in the sections below.
Note: While RStudio is not required to build R packages, it contains a number of convenient
features that make the development process easier and faster.
DESCRIPTION File
The DESCRIPTION file is an essential part of an R package because it contains key
metadata for the package that is used by repositories like CRAN and by R itself. In
particular, this file contains the package name, the version number, the author and
maintainer contact information, the license information, as well as any dependencies on
other packages.
As an example, here is the DESCRIPTION file for the mvtsplot package on CRAN. This
package provides a function for plotting multivariate time series data.
Package: mvtsplot
Version: 1.0-3
Date: 2016-05-13
Depends: R (>= 3.0.0)
Imports: splines, graphics, grDevices, stats, RColorBrewer
Title: Multivariate Time Series Plot
Author: Roger D. Peng <rpeng@[Link]>
Maintainer: Roger D. Peng <rpeng@[Link]>
Description: A function for plotting multivariate time series data.
License: GPL (>= 2)
URL: [Link]
NAMESPACE File
The NAMESPACE file specifies the interface to the package that is presented to the user.
This is done via a series of export() statements, which indicate which functions in the
package are exported to the user. Functions that are not exported cannot be called directly
by the user (although see below). In addition to exports, the NAMESPACE file also
specifies what functions or packages are imported by the package.
export("mvtsplot")

import(splines)
import(RColorBrewer)
importFrom("grDevices", "colorRampPalette", "gray")
importFrom("graphics", "abline", "axis", "box", "image", "layout",
"lines", "par", "plot", "points", "segments", "strwidth",
"text", "Axis")
Here we can see that only a single function is exported from the package
(the mvtsplot() function). There are two types of import statements:
• import() simply takes a package name as an argument, and the interpretation is that
all exported functions from that external package will be accessible to your
package.
• importFrom() takes a package and a series of function names as arguments. This
directive allows you to specify exactly which functions you need from an external
package. For example, this package imports
the colorRampPalette() and gray() functions from the grDevices package.
Namespace Function Notation
As you start to use many packages in R, the likelihood of two functions having the same
name increases. For example, the commonly used dplyr package has a function
named filter(), which is also the name of a function in the stats package. If one has both
packages loaded (a more than likely scenario), how can one specify exactly
which filter() function to call? The answer is the :: operator, written as
<package name>::<exported function name>
For example, the filter() function from the dplyr package can be referenced
as dplyr::filter().
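The sketch below shows the :: notation resolving a name clash without requiring any extra packages. Instead of dplyr, it defines a hypothetical local function called filter() so that the name collides with stats::filter(), which ships with R.

```r
# Hypothetical local function whose name clashes with stats::filter().
filter <- function(x) x[x > 2]

x <- c(1, 2, 3, 4)

# An unqualified call finds the local definition first.
local_result <- filter(x)

# The package-qualified call always reaches the stats version,
# here used as a two-point moving average that returns a time series.
stats_result <- stats::filter(x, rep(1 / 2, 2))

print(local_result)
print(class(stats_result))
```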
Loading and Attaching a Package Namespace
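When dealing with R packages, it’s useful to understand the distinction between loading a
package namespace and attaching it. When package A imports the namespace of
package B, package A loads the namespace of package B in order to gain access to the
exported functions of package B. A namespace that is only loaded is not placed on the search
path, so it is not visible to the user. Attaching a package, which is what library() does,
places it on the search path so that its exported functions can be called by name.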
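The difference between a loaded namespace and an attached package can be observed from the R console. This is a minimal sketch using the splines package, chosen only because it is distributed with R; the variable names are illustrative.

```r
# Load the namespace without attaching the package.
ns <- loadNamespace("splines")
loaded_only <- isNamespaceLoaded("splines")

# search() lists attached packages; record whether splines is among them.
attached_before <- "package:splines" %in% search()

# library() attaches the package, which places it on the search path.
library(splines)
attached_after <- "package:splines" %in% search()

cat("Loaded:", loaded_only, "\n")
cat("Attached before library():", attached_before, "\n")
cat("Attached after library():", attached_after, "\n")
```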
The R Sub-directory
The R sub-directory contains all of your R code, either in a single file, or in multiple files.
For larger packages it’s usually best to split code up into multiple files that logically group
functions together. The names of the R code files do not matter, but generally it’s not a
good idea to have spaces in the file names.
The man Sub-directory
The man sub-directory contains the documentation files for all of the exported objects of a
package. With older versions of R one had to write the documentation of R objects directly
into the man directory using a LaTeX-style notation. However, with the development of
the roxygen2 package, we no longer need to do that and can write the documentation
directly into the R code files.
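A roxygen2-style documentation block written above a function might look like the sketch below. The function add_two is hypothetical; the tags shown (@param, @return, @examples, @export) are standard roxygen2 tags, and the lines starting with #' are ordinary comments as far as R itself is concerned.

```r
#' Add two to a number
#'
#' @param x A numeric vector.
#' @return A numeric vector with 2 added to each element of `x`.
#' @examples
#' add_two(3)
#' @export
add_two <- function(x) {
  x + 2
}
```

When the package is documented, roxygen2 turns these comments into a file in the man sub-directory and an export directive in NAMESPACE.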
The devtools Package
The objective of this section is
• Create a simple R package skeleton using the devtools package
R package development has become substantially easier in recent years with the
introduction of a package by Hadley Wickham called devtools. As the package name
suggests, this includes a variety of functions that facilitate software development in R.
Here are some of the key functions included in devtools and what they do, roughly in the
order you are likely to use them as you develop an R package:

Function       Use
create         Create the file structure for a new package
load_all       Load the code for all functions in the package
document       Create man documentation files and the “NAMESPACE” file from roxygen2 code
use_data       Save an object in your R session as a dataset in the package
use_package    Add a package you’re using to the DESCRIPTION file
use_vignette   Set up the package to include a vignette

Creating a package
As an alternative to create(), you can also initialize an R package in RStudio by selecting
“File” -> “New Project” -> “New Directory” -> “R Package.”
[Figure: RStudio panes when you open an R project such as a package directory.]
[Figure: Example of the directory contents of the initial package structure created with devtools.]
Building and installing an R package
• Open a terminal window.
• Go to the directory that contains your package directory.
• Type
R CMD build brocolors

(Replace brocolors with the name of your package directory, which hopefully is
also the name of your package.)
Installing an R package
To install the package, type (at the command line)
R CMD INSTALL brocolors_0.[Link]
Then start R and type library(brocolors) to see that it was indeed installed, and then try out
one of the functions. For my package, I’d try
brocolors()
plot_crayons()
What are Repositories?
A repository is a place where packages are stored so that you can install R packages
from it. Organizations and developers may have a local repository, but typically repositories
are online and accessible to everyone. Some of the most popular repositories for R packages are:
• CRAN: The Comprehensive R Archive Network (CRAN) is the official repository. It
is a network of FTP and web servers maintained by the R community around the
world. The R community coordinates it, and for a package to be published on
CRAN, the package needs to pass several tests to ensure that it
follows CRAN policies.
• Bioconductor: Bioconductor is a topic-specific repository, intended for open
source software for bioinformatics. Similar to CRAN, it has its own submission
and review processes, and its community is very active, holding several
conferences and meetings per year in order to maintain quality.
• GitHub: GitHub is the most popular repository for open-source projects. Its
popularity comes from the unlimited space for open source, the integration
with git (a version control software), and the ease of sharing and collaborating with
others.
Installing R Packages
There are multiple ways to install an R package. Some of them are:
• Installing R packages from CRAN: To install a package from CRAN, we
need the name of the package, which we pass to the following command:
install.packages("package name")
Installing a package from CRAN is the most common and easiest way, as it takes
only one command. To install more than one package at a time,
we write their names as a character vector in the first argument of
the install.packages() function:
Example:
install.packages(c("vioplot", "MASS"))
• Installing Bioconductor packages: Bioconductor releases for R version 3.5 or
greater are not compatible with the biocLite.R script for installing
Bioconductor packages.
Instead, you should use the BiocManager package to install and manage Bioconductor
packages. Here’s an example of how to install the BiocManager package and use it to
install a Bioconductor package:
install.packages("BiocManager")
BiocManager::install("package name")
Update, Remove and Check Installed Packages in R

To check what packages are installed on your computer, type this command:
installed.packages()
To update all the packages, type this command:
update.packages()
To update a specific package, type this command:
install.packages("PACKAGE NAME")
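Because installed.packages() returns a matrix with one row per package, it can also be used to check for a single package. The sketch below uses the stats package as the example only because it is always present in an R installation; the variable names are illustrative.

```r
# Row names of the matrix returned by installed.packages() are package names.
installed <- rownames(installed.packages())

# Check whether a particular package is installed.
is_installed <- "stats" %in% installed
cat("stats installed:", is_installed, "\n")

# packageVersion() reports the installed version of a package.
stats_version <- packageVersion("stats")
cat("stats version:", as.character(stats_version), "\n")
```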
Installing Packages Using RStudio UI
In RStudio, go to Tools -> Install Packages. A pop-up window appears in which to type
the name of the package you want to install.
[Figure: Packages in R Programming]
Under Packages, type the name of the package to install, and then click
the Install button.
How to Load Packages in R Programming Language?
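Once an R package is installed, we must load it before using its functionality. Either
library() or require() loads a package for the rest of the session. If we need only sporadic
use of a few functions or datasets inside a package, we can instead access them with the
package::name notation without loading the package.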
# Load a package using the library function
library(dplyr)
# Load a package using the require function
require(dplyr)
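The two loading functions behave differently when a package is missing: library() stops with an error, while require() returns FALSE with a warning. The sketch below demonstrates this with a deliberately non-existent, hypothetical package name so that it runs without installing anything.

```r
# Hypothetical package name that is assumed not to exist on any system.
missing_pkg <- "notARealPackage123"

# require() returns FALSE (with a warning) when the package cannot be loaded.
require_result <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

# library() raises an error instead, which is caught here.
library_result <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)

cat("require() returned:", require_result, "\n")
cat("library() outcome:", library_result, "\n")
```

This is why require() is sometimes used inside if statements, while library() is the usual choice at the top of a script.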
Difference Between a Package and a Library
The terms package and library are often confused, and people frequently refer to
packages as libraries.
• library(): The command used to load a package. A library is the place
where packages are stored, usually a folder on our computer.
• Package: A collection of functions bundled conveniently. A package is an
appropriate way to organize our own work and share it with others.
Load More Than One Package at a Time
We can pass a vector of names to the install.packages() function to install several R packages at once, but this is not possible with the library() function, which loads one package per call. We can load a set of packages one at a time or, if you prefer, use one of the many workarounds developed by R users.
A common workaround is to store the package names in a character vector and call library() on each name with character.only = TRUE. Here's an example (the packages must already be installed):
# Load multiple packages, one library() call per package
pkgs <- c("caret", "dplyr", "ggplot2")
invisible(lapply(pkgs, library, character.only = TRUE))
This code loads the caret, dplyr, and ggplot2 packages in turn.
To install an R package from GitHub, use the devtools package.
First, you need to install devtools by running the following code:

install.packages("devtools")

R BASICS
Simple Math
In R, you can use operators to perform common mathematical operations on numbers.
The + operator is used to add together two values:
Example
10 + 5
And the - operator is used for subtraction:
Example
10 - 5

Built-in Math Functions


R also has many built-in math functions that allow you to perform mathematical tasks on
numbers.
For example, the min() and max() functions can be used to find the lowest or highest number
in a set:
Example
max(5, 10, 15)

min(5, 10, 15)

sqrt()
The sqrt() function returns the square root of a number:
Example
sqrt(16)

abs()
The abs() function returns the absolute (positive) value of a number:
Example
abs(-4.7)

ceiling() and floor()


The ceiling() function rounds a number upwards to its nearest integer, and the floor() function
rounds a number downwards to its nearest integer, and returns the result:
Example
ceiling(1.4)

floor(1.4)
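For reference, the examples in this section evaluate as shown in the comments below. The last two lines are an added illustration of how rounding behaves for negative numbers.

```r
10 + 5            # 15
10 - 5            # 5
max(5, 10, 15)    # 15
min(5, 10, 15)    # 5
sqrt(16)          # 4
abs(-4.7)         # 4.7
ceiling(1.4)      # 2
floor(1.4)        # 1

# For negative numbers, "upwards" means towards zero and "downwards" away from it
ceiling(-1.4)     # -1
floor(-1.4)       # -2
```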
R Variables

A variable is a block of memory allocated for storing specific data, and the name associated with the variable is used to work with this reserved block. The name given to a variable is known as its variable name. Usually a single variable stores only data belonging to a certain data type. They are called variables because the value they hold can change while the program executes.
Variables in R
Variables are used to store the information to be manipulated and referenced in the R
program. The R variable can store an atomic vector, a group of atomic vectors, or a
combination of many R objects.
A language like C++ is statically typed, but R is dynamically typed, which means it checks the type of the data when the statement is run. A valid variable name consists of letters, numbers, dot, and underscore characters. A variable name should start with a letter, or with a dot that is not followed by a number.

Name of variable     | Validity | Reason for valid and invalid
_var_name            | Invalid  | A variable name can't start with an underscore (_).
var_name, var.name   | Valid    | A variable name can start with a letter or a dot, but the dot should not be followed by a number. In that case, the variable will be invalid.
var_name%            | Invalid  | In R, we can't use any special character in the variable name except dot and underscore.
2var_name            | Invalid  | A variable name can't start with a numeric digit.

Declaring and Initializing Variables in R Language


R supports three ways of variable assignment:
• Using the equal operator: the value on the right of the equal sign is assigned to the variable on the left.
• Using the leftward operator: data is copied from right to left.
• Using the rightward operator: data is copied from left to right.
R Variables Syntax
Types of Variable Creation in R:
• Using the equal to operator
variable_name = value
• Using the leftward operator
variable_name <- value
• Using the rightward operator
value -> variable_name
Creating Variables in R

# Initialization of variables
# using equal to operator
var1 = "hello"
print(var1)
# using leftward operator
var2 <- "hello"
print(var2)
# using rightward operator
"hello" -> var3
print(var3)

Output
[1] "hello"
[1] "hello"
[1] "hello"
Rules of R Variables
The following rules need to be kept in mind while naming an R variable:
• A valid variable name consists of a combination of letters, numbers, dot (.), and underscore (_) characters. Example: var.1_ is valid
• Apart from the dot and underscore characters, no other special character is allowed. Example: var$1 and var#1 are both invalid
• Variable names can start with a letter or a dot. Example: .var or var is valid
• A variable name should not start with a number or an underscore. Example: 2var or _var is invalid.
• If a variable name starts with a dot, the next character after the dot cannot be a number. Example: .3var is invalid
• The variable name should not be a reserved keyword in R. Example: TRUE, FALSE, etc.
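The rules above can be checked programmatically. The sketch below relies on the base function make.names(), which rewrites a string into a syntactically valid name; a string is a valid variable name exactly when make.names() leaves it unchanged. The names candidates and is_valid are chosen for this illustration only.

```r
# Candidate names taken from the rules above
candidates <- c("var.1_", ".var", "var", "2var", "_var", ".3var", "var$1", "TRUE")

# A name is syntactically valid when make.names() returns it unchanged
is_valid <- make.names(candidates) == candidates

data.frame(name = candidates, valid = is_valid)
```

Only the first three candidates come out as valid; the reserved word TRUE is rejected as well, because make.names() appends a dot to it.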
Important Methods for R Variables
R provides some useful functions for working with variables. These functions are used to determine the data type of a variable, list the existing variables, delete a variable, etc.
Following are some of the functions used to work on variables:
class() function
This built-in function is used to determine the data type of the variable provided to it. The R variable to be checked is passed to it as an argument, and it returns the data type.
Syntax
class(variable)
Example

var1 = "hello"
print(class(var1))

Output
[1] "character"
ls() function
This built-in function is used to list all the variables present in the workspace. This is generally helpful when dealing with a large number of variables at once and helps prevent overwriting any of them.
Syntax
ls()
Example

# using equal to operator


var1 = "hello"
# using leftward operator
var2 <- "hello"
# using rightward operator
"hello" -> var3
print(ls())

Output:
[1] "var1" "var2" "var3"
rm() function
This is again a built-in function used to delete an unwanted variable within your workspace.
This helps clear the memory space allocated to certain variables that are not in use thereby
creating more space for others. The name of the variable to be deleted is passed as an
argument to it.
Syntax
rm(variable)
Example

# using equal to operator


var1 = "hello"
# using leftward operator
var2 <- "hello"
# using rightward operator
"hello" -> var3
# Removing variable
rm(var3)
print(var3)

Output
Error in print(var3) : object 'var3' not found
Execution halted
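To avoid the error above, we can test whether a variable is defined before using it. The sketch below uses the base function exists(); the names tmp_a, tmp_b and still_there are hypothetical.

```r
var3 <- "hello"
exists("var3")        # TRUE: the variable is defined

rm(var3)
exists("var3")        # FALSE: the variable has been removed

# Several variables can be removed in one call
tmp_a <- 1
tmp_b <- 2
rm(tmp_a, tmp_b)
still_there <- c(exists("tmp_a"), exists("tmp_b"))
still_there           # FALSE FALSE
```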
Scope of Variables in R programming

The scope of a variable is the part of the program in which the variable can be found and accessed. There are mainly two types of variable scope:
Global Variables
Global variables exist throughout the execution of a program. They can be accessed from any part of the program; to change a global variable from inside a function, the <<- operator is required.
• They are available throughout the lifetime of a program.
• They are declared anywhere in the program outside all of the functions or blocks.
Declaring global variables
Global variables are usually declared outside of all of the functions and blocks. They can be accessed from any portion of the program.

# R program to illustrate
# usage of global variables
# global variable
global = 5
# global variable accessed from
# within a function
display = function(){
print(global)
}
display()
# changing value of global variable
global = 10
display()

Output
[1] 5
[1] 10
Local Variables
Local variables are those variables that exist only within a certain part of a program like a
function and are released when the function call ends. Local variables do not exist outside the
block in which they are declared, i.e. they cannot be accessed or used outside that block.
Declaring local variables
Local variables are declared inside a block.

# R program to illustrate
# usage of local variables
func = function(){
# this variable is local to the
# function func() and cannot be
# accessed outside this function
age = 18
print(age)}
cat("Age is:\n")
func()

Output
Age is:
[1] 18
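The interaction between local and global variables can be sketched as follows. An ordinary assignment inside a function creates a local variable, even if a global variable of the same name exists; the <<- operator assigns to the variable in the enclosing environment instead. The names counter, local_update and global_update are hypothetical.

```r
counter <- 0   # global variable

local_update <- function() {
  counter <- 100   # creates a new local variable; the global one is untouched
  counter
}

global_update <- function() {
  counter <<- counter + 1   # <<- assigns to the variable in the enclosing environment
  invisible(NULL)
}

local_update()    # returns 100
counter           # still 0
global_update()
counter           # now 1
```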
Difference between local and global variables in R

1. Scope: A global variable is defined outside of any function and may be accessed from anywhere in the program, whereas a local variable can be accessed only inside the function in which it is defined.
2. Lifetime: A local variable's lifetime is constrained by the function in which it is defined. The local variable is destroyed once the function has finished running. A global variable, on the other hand, doesn't leave memory until the program has finished running or the variable is explicitly deleted.
3. Naming conflicts: Because a global variable can be accessed from anywhere in the program, naming conflicts may occur if the same variable name is used in different portions of the program. Local variables, in contrast, apply only to the function in which they are defined, which reduces the likelihood of naming conflicts.
4. Memory usage: Because global variables are kept in memory throughout program execution, they can use more memory than local variables. Local variables, on the other hand, are created and destroyed only when necessary, so they normally use less memory.
Data types of variable
R programming is a dynamically typed language, which means that we can change the data
type of the same variable again and again in our program. Because of its dynamic nature, a
variable is not declared of any data type. It gets the data type from the R-object, which is to
be assigned to the variable.
We can check the data type of the variable with the help of the class() function. Let's see an
example:
variable_y<- 124
cat("The data type of variable_y is ",class(variable_y),"\n")
variable_y<- "Learn R Programming"
cat(" Now the data type of variable_y is ",class(variable_y),"\n")
variable_y<- 133L
cat(" Next the data type of variable_y becomes ",class(variable_y),"\n")
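The same reassignments can be followed step by step. This short sketch also calls typeof(), which reports the internal storage type and shows that a plain number such as 124 is stored as a double unless the L suffix is used.

```r
variable_y <- 124
class(variable_y)    # "numeric"
typeof(variable_y)   # "double"

variable_y <- "Learn R Programming"
class(variable_y)    # "character"

variable_y <- 133L
class(variable_y)    # "integer"
typeof(variable_y)   # "integer"
```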
R Vector
A vector is a basic data structure which plays an important role in R programming.
In R, a sequence of elements which share the same data type is known as a vector. A vector supports the logical, integer, double, character, complex, and raw data types. The elements contained in a vector are known as its components. We can check the type of a vector with the help of the typeof() function.

The length is an important property of a vector. The length of a vector is the number of elements in it, and it is obtained with the help of the length() function.
Vectors are classified into two kinds, i.e., atomic vectors and lists. They have three common properties: type (returned by typeof()), length (returned by length()), and attributes (returned by attributes()).
The main difference between atomic vectors and lists is that in an atomic vector all the elements are of the same type, whereas in a list the elements can be of different data types.
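The three common properties can be inspected on both kinds of vector. A small sketch, with atomic_vec and mixed_list as hypothetical example objects:

```r
atomic_vec <- c(a = 1.5, b = 2.5, c = 3.5)     # all elements are doubles
mixed_list <- list(1L, "two", TRUE, c(4, 5))   # elements of different types

typeof(atomic_vec)       # "double"
length(atomic_vec)       # 3
attributes(atomic_vec)   # $names: "a" "b" "c"

typeof(mixed_list)       # "list"
length(mixed_list)       # 4
element_types <- sapply(mixed_list, typeof)
element_types            # "integer" "character" "logical" "double"
```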
How to create a vector in R?
In R, we use the c() function to create a vector. This function returns a one-dimensional array, or simply a vector. The c() function is a generic function which combines its arguments. All arguments are coerced to a common data type, which is the type of the returned value.
There are various other ways to create a vector in R, which are as follows:
1) Using the colon(:) operator
We can create a vector with the help of the colon operator. The syntax of the colon operator is:
z<-x:y
This operator creates a vector with elements from x to y and assigns it to z.
Example:
a<-4:-10
a
Output
[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
2) Using the seq() function
In R, we can create a vector with the help of the seq() function. The sequence function creates a sequence of elements as a vector. The seq() function is used in two ways, i.e., by setting the step size with the 'by' parameter or by specifying the length of the vector with the 'length.out' parameter.
Example:
seq_vec<-seq(1,4,by=0.5)
seq_vec
class(seq_vec)
Output
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
[1] "numeric"
Atomic vectors in R
This section covers the four most commonly used types of atomic vectors. Atomic vectors play an important role in Data Science. Atomic vectors are created with the help of the c() function. These atomic vectors are as follows:

Numeric vector
The decimal values are known as numeric data types in R. If we assign a decimal value to
any variable d, then this d variable will become a numeric type. A vector which contains
numeric elements is known as a numeric vector.
Example:
d<-45.5
num_vec<-c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output
[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"
Integer vector
A numeric value without a fractional part can be stored as integer data. R stores integers as 32-bit (4-byte) values. There are two ways to assign an integer value to a variable, i.e., by using the as.integer() function or by appending L to the value.
A vector which contains integer elements is known as an integer vector.
Example:
d<-as.integer(5)
e<-5L
int_vec<-c(1,2,3,4,5)
int_vec<-as.integer(int_vec)
int_vec1<-c(1L,2L,3L,4L,5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
Character vector
Character data holds text, stored as strings. In R, there are two different ways to create a character data type value, i.e., using the as.character() function or typing a string between double quotes ("") or single quotes ('').
A vector which contains character elements is known as a character vector.
Example:
d<-'shubham'
e<-"Arpita"
f<-65
f<-as.character(f)
d
e
f
char_vec<-c(1,2,3,4,5)
char_vec<-as.character(char_vec)
char_vec1<-c("shubham","arpita","nishka","vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)
Output
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
Logical vector
The logical data type has only two values, i.e., TRUE or FALSE. These values indicate whether a condition is satisfied. A vector which contains Boolean values is known as a logical vector.
Example:
d<-as.integer(5)
e<-as.integer(6)
f<-as.integer(7)
g<-d>e
h<-e<f
g
h
log_vec<-c(d<e, d<f, e<d,e<f,f<d,f<e)
log_vec
class(g)
class(h)
class(log_vec)
Output
[1] FALSE
[1] TRUE
[1] TRUE TRUE FALSE TRUE FALSE FALSE
[1] "logical"
[1] "logical"
[1] "logical"
Accessing elements of vectors
We can access the elements of a vector with the help of vector indexing. An index denotes the position where a value is stored in a vector. Indexing can be performed with an integer, character, or logical vector.

1) Indexing with integer vector


Indexing with an integer vector is performed in the same way as in C, C++, and Java. There is only one difference, i.e., in C, C++, and Java the indexing starts from 0, but in R the indexing starts from 1. As in other programming languages, we perform indexing by specifying an integer value in square brackets [] next to our vector.
Example:
seq_vec<-seq(1,4,length.out=6)
seq_vec
seq_vec[2]
Output
[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] 1.6
2) Indexing with a character vector
In character vector indexing, we assign a unique key (name) to each element of the vector. Because each key identifies exactly one element, the elements can be accessed very easily by name. Let's see an example to understand how it is performed.
Example:
char_vec<-c("shubham"=22,"arpita"=23,"vaishali"=25)
char_vec
char_vec["arpita"]
Output
shubham arpita vaishali
22 23 25
arpita
23
3) Indexing with a logical vector
Logical indexing returns the values at those positions where the corresponding element of the logical vector is TRUE. Let's see an example to understand how it is performed on vectors.
Example:
a<-c(1,2,3,4,5,6)
a[c(TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)]
Output
[1] 1 3 4 6
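In practice the logical vector is rarely typed by hand; it is usually produced by a condition on the vector itself. A minimal sketch (is_even is a hypothetical name):

```r
a <- c(1, 2, 3, 4, 5, 6)

is_even <- a %% 2 == 0   # FALSE TRUE FALSE TRUE FALSE TRUE
a[is_even]               # 2 4 6

a[a > 3]                 # 4 5 6
```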
Vector Operation
A vector is simply a sequence of items that are of the same type.
To combine items into a vector, use the c() function and separate the items by a comma.
In the example below, we create a vector variable called fruits that combines strings:
# Vector of strings
fruits <- c("banana", "apple", "orange")

# Print fruits
fruits
1) Combining vectors
The c() function is not only used to create a vector; it is also used to combine vectors. Combining two or more vectors forms a new vector which contains all the elements of each vector. If the vectors have different types, the elements are coerced to a common type; in the example below, the numbers become character strings. Let's see an example of how the c() function combines vectors.
Example:
p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r<-c(p,q)
r
Output
[1] "1" "2" "4" "5" "7" "8"
[7] "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"
2) Arithmetic operations
We can perform all the arithmetic operations on vectors. The arithmetic operations are performed member-by-member on vectors. We can add, subtract, multiply, or divide two vectors. Let's see an example to understand how arithmetic operations are performed on vectors.
Example:
a<-c(1,3,5,7)
b<-c(2,4,6,8)
a+b
a-b
a/b
a%%b
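The results of these member-by-member operations are shown in the comments below. The sketch also adds multiplication and a case where the vectors differ in length, in which R recycles the shorter vector.

```r
a <- c(1, 3, 5, 7)
b <- c(2, 4, 6, 8)

a + b     # 3 7 11 15
a - b     # -1 -1 -1 -1
a * b     # 2 12 30 56
a / b     # 0.5000000 0.7500000 0.8333333 0.8750000
a %% b    # 1 3 5 7 (remainder of each division)

# A shorter vector is recycled to match the longer one
a * 2           # 2 6 10 14
a + c(10, 20)   # 11 23 15 27
```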
3) Logical Index vector
With the help of a logical index vector in R, we can form a new vector from a given vector. The logical index vector has the same length as the original vector. Its members are TRUE only where the corresponding members of the original vector are to be included in the slice; otherwise, they are FALSE. Let's see an example to understand how a new vector is formed with the help of a logical index vector.
Example:
a<-c("Shubham","Arpita","Nishka","Vaishali","Sumit","Gunjan")
b<-c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
a[b]
Output
[1] "Shubham" "Nishka" "Vaishali"
4) Numeric Index
In R, we specify a numeric index between square brackets [ ]. If our index is negative, R returns all the values except the one at the position we have specified. For example, specifying [-3] prompts R to take the absolute value of -3 and exclude the value which occupies that index. An index beyond the length of the vector returns NA.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[2]
q[-4]
q[15]
Output
[1] "arpita"
[1] "shubham" "arpita" "nishka" "vaishali" "sumit"
[1] NA
5) Duplicate Index
An index vector allows duplicate values, which means we can access one element twice in one operation. Let's see an example to understand how a duplicate index works.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,4,4,3)]
Output
[1] "arpita" "gunjan" "gunjan" "nishka"
6) Range Indexes
A range index is used to slice our vector to form a new vector. For slicing, we use the colon(:) operator. Range indexes are very helpful in situations involving a large vector. Let's see an example to understand how slicing is done with the help of the colon operator to form a new vector.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
b<-q[2:5]
b
Output
[1] "arpita" "nishka" "gunjan" "vaishali"
7) Out-of-order Indexes
In R, the index vector can be out-of-order. Below is an example of a vector slice in which the order of the first and second values is reversed.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,1,3,4,5,6)]
Output
[1] "arpita" "shubham" "nishka" "gunjan" "vaishali" "sumit"
8) Named vectors members
We first create our vector of characters as:
z=c("TensorFlow","PyTorch")
z
Output
[1] "TensorFlow" "PyTorch"
Once our vector of characters is created, we name the first vector member as "Start" and the
second member as "End" as:
names(z)=c("Start","End")
z
Output
Start End
"TensorFlow" "PyTorch"
We retrieve the first member by its name as follows:
z["Start"]
Output
Start
"TensorFlow"
We can reverse the order with the help of a character string index vector.
z[c("End","Start")]
Output
End Start
"PyTorch" "TensorFlow"
Applications of vectors
1. In machine learning, vectors are used for principal component analysis. They are extended to eigenvalues and eigenvectors and then used for performing decomposition in vector spaces.
2. The inputs which are provided to the deep learning model are in the form of vectors.
These vectors consist of standardized data which is supplied to the input layer of the
neural network.
3. In the development of support vector machine algorithms, vectors are used.
4. Vector operations are utilized in neural networks for various operations like image
recognition and text processing.
R Functions
A set of statements organized together to perform a specific task is known as a function. R provides a series of built-in functions, and it also allows users to create their own functions. Functions let us perform tasks in a modular way.
Functions are used to avoid repeating the same task and to reduce complexity. To make our code easier to understand and maintain, we logically break it into smaller parts using functions. A function:
1. Is written to carry out a specified task.
2. May or may not have arguments.
3. Contains a body in which our code is written.
4. May or may not return one or more output values.
"An R function is created by using the keyword function." The syntax of an R function is as follows:
func_name <- function(arg_1, arg_2, ...) {
  # function body
}
Components of Functions
There are four components of function, which are as follows:

Function Name
The function name is the actual name of the function. In R, the function is stored as an object
with its name.
Arguments
In R, an argument is a placeholder. Arguments are optional, which means a function may or may not have arguments, and arguments can also have default values. We pass a value to the argument when the function is invoked.
Function Body
The function body contains a set of statements which defines what the function does.
Return value
It is the value of the last expression evaluated in the function body, unless return() is called explicitly.
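The four components can be seen in one small function. In this sketch, rectangle_area is a hypothetical function written for illustration; formals() and body() are base R functions that expose the arguments and the body of a function object.

```r
# Function name: rectangle_area
rectangle_area <- function(width, height = 1) {   # arguments; height has a default value
  area <- width * height                          # function body
  area                                            # last expression = return value
}

names(formals(rectangle_area))   # "width" "height"
body(rectangle_area)             # the statements between the braces
rectangle_area(4)                # 4, using the default height
rectangle_area(4, 2.5)           # 10
```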
Function Types
Similar to the other languages, R also has two types of function, i.e. Built-in
Function and User-defined Function. In R, there are lots of built-in functions which we can
directly call in the program without defining them. R also allows us to create our own
functions.
Built-in function
The functions which are already created or defined in the programming framework are known as built-in functions. Users don't need to create these functions, because they are built into R and can be used by simply calling them. R has different types of built-in functions such as seq(), mean(), max(), and sum().
# Creating sequence of numbers from 32 to 46.
print(seq(32,46))
# Finding the mean of numbers from 22 to 80.
print(mean(22:80))
# Finding the sum of numbers from 41 to 70.
print(sum(41:70))
Output:
[1] 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
[1] 51
[1] 1665
User-defined function
R allows us to create our own functions in our program. A user-defined function is written by the user to fulfil a specific requirement. Once these functions are created, we can use them like built-in functions.
# Creating a function without an argument.
new.function <- function() {
  for(i in 1:5) {
    print(i^2)
  }
}
new.function()
Output:
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
Function calling with an argument


We can call a function by passing an appropriate argument to it. Let's see an example of how a function is called.
# Creating a function to print squares of numbers in sequence.
new.function <- function(a) {
  for(i in 1:a) {
    b <- i^2
    print(b)
  }
}
# Calling the function new.function supplying 10 as an argument.
new.function(10)
Output:
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
[1] 49
[1] 64
[1] 81
[1] 100
Function calling with no argument


In R, we can call a function without an argument in the following way:
# Creating a function to print squares of numbers in sequence.
new.function <- function() {
  for(i in 1:5) {
    a <- i^2
    print(a)
  }
}
# Calling the function new.function with no argument.
new.function()
Output:
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
Function calling with Argument Values


We can supply the arguments to a function call in the same sequence as defined in the function, or we can supply them in a different sequence by assigning them to the names of the arguments.
# Creating a function with arguments.
new.function <- function(x,y,z) {
  result <- x * y + z
  print(result)
}
# Calling the function by position of arguments.
new.function(11,13,9)
# Calling the function by names of the arguments.
new.function(x = 2, y = 5, z = 3)
Output:
[1] 152
[1] 13
Function calling with default arguments


To get a default result, we assign default values to the arguments in the function definition, and then we call the function without supplying any argument. If we pass an argument in the function call, the passed value replaces the default value of that argument.
# Creating a function with arguments.
new.function <- function(x = 11, y = 24) {
  result <- x * y
  print(result)
}
# Calling the function without giving any argument.
new.function()
# Calling the function with new values of the arguments.
new.function(4,6)
Functions Documentation
The output of our check tells us that we are missing documentation for
the make_shades function. Writing this kind of documentation is another part of package
development that has been made much easier by modern packages, in this case one
called roxygen2.
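As a rough sketch of what roxygen2-style documentation looks like: special comments starting with #' are placed directly above the function, and tags such as @param and @return describe its interface. The function scale_to_unit below is hypothetical and is not the make_shades function mentioned above.

```r
#' Rescale a numeric vector to the range 0 to 1
#'
#' @param x A numeric vector.
#' @return A numeric vector of the same length as `x`.
#' @examples
#' scale_to_unit(c(2, 4, 6))
#' @export
scale_to_unit <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

scale_to_unit(c(2, 4, 6))   # 0.0 0.5 1.0
```

The #' lines are ordinary comments to R itself, so the file runs unchanged; roxygen2 reads them to generate the help files when the package is built.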

Missing Data in R Programming

Handling Missing Values in R Programming

As the name indicates, missing values are those elements whose values are not known. NA and NaN are reserved words in the R programming language: NA indicates a missing value, and NaN indicates the result of an arithmetical operation that is undefined.
R – handling Missing Values
Missing values are common in practice. For example, some cells in spreadsheets are empty. If a meaningless or impossible arithmetic operation is attempted, such as dividing zero by zero, NaN occurs.
Dealing Missing Values in R
Missing values in R are handled with the use of some pre-defined functions:
is.na() Function for Finding Missing values:
This function returns a logical vector that indicates all the NA values present. For each element of the vector, it returns TRUE if the element is NA and FALSE otherwise.
x<- c(NA, 3, 4, NA, NA, NA)
is.na(x)

Output:
[1] TRUE FALSE FALSE TRUE TRUE TRUE
is.nan() Function for Finding Missing values:
This function returns a logical vector that indicates all the NaN values present. For each element of the vector, it returns TRUE if the element is NaN and FALSE otherwise.

x<- c(NA, 3, 4, NA, NA, 0 / 0, 0 / 0)
is.nan(x)

Output:
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE
Properties of Missing Values:
• For testing objects that are NA use is.na()
• For testing objects that are NaN use is.nan()
• NA exists for each class. Hence the integer class has an integer type NA, the character class has a character type NA, etc.
• A NaN value is counted as NA (is.na(NaN) is TRUE), but the reverse is not valid (is.nan(NA) is FALSE).
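These properties can be verified directly in the console. The typed NA constants used below (NA_integer_, NA_real_, NA_character_) are part of base R.

```r
typeof(NA)              # "logical" (the default NA)
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_character_)   # "character"

# NaN counts as missing for is.na(), but NA is not NaN
is.na(NaN)    # TRUE
is.nan(NA)    # FALSE

# An NA inside a vector takes on the vector's type
typeof(c(1L, NA))   # "integer"
```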
The creation of a vector with one or multiple NAs is also possible.

x<- c(NA, 3, 4, NA, NA, NA)


x

Output:
[1] NA 3 4 NA NA NA
Removing NA or NaN values
There are two ways to remove missing values:
Extracting values except for NA or NaN values:
Example 1:

x <- c(1, 2, NA, 3, NA, 4)
d <- is.na(x)
x[! d]
Output:
[1] 1 2 3 4
A function called complete.cases() can also be used. This function also works on data frames.
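A short sketch of removing missing values from a data frame, using complete.cases() and na.omit(), both part of base R. The data frame df is hypothetical. The example also shows the na.rm argument, which many summary functions accept to skip missing values.

```r
x <- c(1, 2, NA, 3, NA, 4)
mean(x)                  # NA, because missing values propagate
mean(x, na.rm = TRUE)    # 2.5

# df has a missing entry in row 2 and in row 3
df <- data.frame(id = c(1, 2, NA, 4), name = c("A", NA, "C", "D"))

complete.cases(df)          # TRUE FALSE FALSE TRUE
df[complete.cases(df), ]    # keeps rows 1 and 4
na.omit(df)                 # same rows, selected by na.omit()
```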
Missing Value Filter Functions
The modeling functions in R accept an na.action argument, which tells the function how to respond when it encounters NA values.
The function then calls one of the missing value filter functions. A missing value filter function takes the data set and returns a new data set in which the NA values have been handled.
The default missing value filter function is na.omit. It omits every row containing even one NA. The missing value filter functions are:
• na.omit – omits every row containing even one NA
• na.fail – halts and does not proceed if NA is encountered
• na.exclude – excludes every row containing even one NA but keeps a record of their original position
• na.pass – it just ignores NA and passes the data through unchanged
Output (from calling na.fail() on a data frame df that contains missing values):
[1] "A" "B" "C"
Error in na.fail.default(df) : missing values in object
Calls: na.fail -> na.fail.default
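The four filter functions na.omit(), na.fail(), na.exclude() and na.pass() can be compared on a small example. This is a sketch under assumed data: df is a hypothetical data frame, and lm() is used only as a convenient modeling function that accepts na.action.

```r
# df is a hypothetical data frame; row 2 has a missing y
df <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, NA, 6.2, 7.9))

nrow(na.omit(df))   # 3: the row with NA is dropped
nrow(na.pass(df))   # 4: the data is returned unchanged

# na.fail() stops with an error when any NA is present
result <- tryCatch(na.fail(df), error = function(e) conditionMessage(e))
result              # "missing values in object"

# Model functions receive the filter through the na.action argument
fit_omit    <- lm(y ~ x, data = df, na.action = na.omit)
fit_exclude <- lm(y ~ x, data = df, na.action = na.exclude)
length(residuals(fit_omit))      # 3
length(residuals(fit_exclude))   # 4, with NA kept at the original position
```

The difference between na.omit and na.exclude only shows up afterwards: residuals and predictions from the na.exclude fit are padded with NA so that they line up with the original rows.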
Special Cases
There are two special cases where NA is denoted or presented differently:
• Factor Vectors – <NA> is the symbol displayed in factor vectors for missing values.
• NaN – This is a special case of NA only. It is displayed when an arithmetic operation yields a result which is not a number. For example, dividing zero by zero produces NaN.
Advanced Data Structures in R Programming
Data structures are very important to understand. Data structures are the objects which we manipulate on a day-to-day basis in R. Dealing with object conversions is one of the most common sources of frustration for beginners. We can say that everything in R is an object.
R has many data structures, which include:
1. Atomic vector
2. List
3. Array
4. Matrices
5. Data Frame
6. Factors

Vectors
A vector is the basic data structure in R, or we can say vectors are the most basic R data objects. There are six types of atomic vectors: logical, integer, double, character, complex, and raw. "A vector is a collection of elements which is most commonly of mode character, integer, logical or numeric." A vector can be one of the following two types:
1. Atomic vector
2. Lists
List
In R, the list is a container. Unlike an atomic vector, a list is not restricted to a single mode. A list can contain a mixture of data types. Lists are also known as generic vectors because the elements of a list can be any type of R object. "A list is a special type of vector in which each element can be a different type."
We can create a list with the help of list() or as.list(). We can use vector() to create an empty list of a required length.
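A brief sketch of the three ways of creating a list mentioned above. The object names and values are hypothetical.

```r
# A list can mix types: a string, a number, and a logical vector
person <- list(name = "Asha", age = 30, scores = c(TRUE, FALSE))
str(person)

from_vector <- as.list(1:3)   # converts a vector to a list of three elements
length(from_vector)           # 3

empty <- vector("list", 3)    # empty list of length 3; each element is NULL
length(empty)                 # 3
```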
Arrays
Arrays are another type of data object, which can store data in more than two dimensions. "An array is a collection of a similar data type with contiguous memory allocation." For example, if we create an array of dimension (2, 3, 4), it creates four rectangular matrices, each with two rows and three columns.
In R, an array is created with the help of the array() function. This function takes a vector as input and uses the values in the dim parameter to create an array.
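The (2, 3, 4) example described above can be written out as follows; arr is a hypothetical name.

```r
arr <- array(1:24, dim = c(2, 3, 4))

dim(arr)        # 2 3 4
arr[, , 1]      # the first of the four 2 x 3 matrices
arr[2, 3, 4]    # 24, the last element
```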
Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional rectangular layout. A matrix contains elements of the same atomic type. For mathematical calculations, we use a matrix containing numeric elements. A matrix is created with the help of the matrix() function in R.
Syntax
The basic syntax of creating a matrix is as follows:
matrix(data, nrow, ncol, byrow, dimnames)
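A small sketch of a call to matrix() using its arguments data, nrow, ncol, byrow and dimnames. The row and column names are hypothetical.

```r
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))
m

m["r2", "b"]   # 5: with byrow = TRUE the second row is 4 5 6
dim(m)         # 2 3
t(m)           # transpose: 3 rows, 2 columns
```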
Data Frames
A data frame is a two-dimensional array-like structure, or we can say it is a table in which each column contains the values of one variable, and each row contains one set of values from each column.
A data frame has the following characteristics:
1. The column names must be non-empty.
2. The row names must be unique.
3. A data frame stores numeric, factor, or character type data.
4. Each column must contain the same number of data items.
Factors

Factors in R Programming Language are data structures that are implemented to categorize the data or represent categorical data and store it on multiple levels.
They are stored as integers with a corresponding label for every unique integer. Although R factors may look similar to character vectors, they are integers, and care must be taken when using them as strings. An R factor accepts only a restricted number of distinct values. For example, a data field such as gender may contain values only from female, male, or transgender.
Attributes of Factors in R Language
• x: It is the vector that needs to be converted into a factor.
• levels: It is the set of distinct values which the input vector x can take.
• labels: It is a character vector of labels for the levels, one per level.
• exclude: It lists the values you want to exclude.
• ordered: This logical attribute decides whether the levels are ordered.
• nmax: It sets the upper limit for the maximum number of levels.
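The effect of the levels, labels and ordered arguments of factor() can be sketched as follows; the data in sizes is hypothetical.

```r
sizes <- c("small", "large", "medium", "small")

size_factor <- factor(sizes,
                      levels  = c("small", "medium", "large"),
                      labels  = c("S", "M", "L"),
                      ordered = TRUE)

size_factor                # S L M S, with Levels: S < M < L
as.integer(size_factor)    # 1 3 2 1, the integer codes behind the labels
size_factor < "L"          # TRUE FALSE TRUE TRUE, possible because the factor is ordered
```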

Creating a Factor in R Programming Language


The command used to create or modify a factor in R language is factor(), with a vector as input.
The two steps to creating an R factor:
• Creating a vector
• Converting the vector created into a factor using the function factor()
Examples: Let us create a factor gender with levels female and male.

# Creating a vector
x <-c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor
# named gender
gender <-factor(x)
print(gender)
Output
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
Modification of a Factor in R
After a factor is formed, its components can be modified, but the new values which are assigned must be among the predefined levels.
Example

gender <- factor(c("female", "male", "male", "female" ));


gender[2]<-"female"
gender

Output
[1] female female male female
Levels: female male
To select all the elements of the factor gender except the ith element, gender[-i] should be used. If you want to modify a factor and add a value outside the predefined levels, first modify the levels.
Example

gender <- factor(c("female", "male", "male", "female" ));


# add new level
levels(gender) <- c(levels(gender), "other")
gender[3] <- "other"
gender

Output
[1] female male other female
Levels: female male other
Factors in Data Frame
A data frame is similar to a 2D array, with the columns containing all the values of one variable and the rows having one set of values from every column. There are four things to remember about data frames:
• Column names are compulsory and cannot be empty.
• Unique names should be assigned to each row.
• The data frame's data can be only of three types: factor, numeric, and character type.
• The same number of data items must be present in each column.

When a data frame is created with stringsAsFactors = TRUE, its character columns are treated as categorical data, and hence an R factor is automatically created for them. This was the default behaviour before R 4.0.0; in later versions the argument must be set explicitly.
We can create a data frame and check if its column is a factor.
Example

age <- c(40, 49, 48, 40, 67, 52, 53)
salary <- c(103200, 106200, 150200,
            10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender",
            "female", "male", "female", "transgender")
employee <- data.frame(age, salary, gender, stringsAsFactors = TRUE)
print(employee)
print(is.factor(employee$gender))

Output
  age salary      gender
1  40 103200        male
2  49 106200        male
3  48 150200 transgender
4  40  10606      female
5  67  10390        male
6  52  14070      female
7  53  10220 transgender
[1] TRUE
