Unit 1: Introduction to the R Language
R is an interpreted language that’s strictly case- and character-sensitive, which means
that you enter instructions that follow the specific syntactic rules of the language into a
console or command-line interface. The software then interprets and executes your code
and returns any results.
NOTE: R is what’s known as a high-level programming language. Level refers to the
level of abstraction away from the fundamental details of computer execution. That is, a
low-level language will require you to do things such as manually manage the machine’s
memory allotments, but with a high-level language like R, you’re fortunately spared
these technicalities.
Why R Programming Language?
R programming is used as a leading tool for machine learning, statistics, and data
analysis. Objects, functions, and packages can easily be created in R.
It’s a platform-independent language. This means it runs on all major
operating systems.
It’s an open-source free language. That means anyone can install it in any
organization without purchasing a license.
The R programming language is not only a statistical package but also integrates
with other languages (C, C++). Thus, you can easily interact with many
data sources and statistical packages.
The R programming language has a vast community of users and it’s growing day
by day.
R is currently one of the most requested programming languages in the Data
Science job market, which makes it a popular skill to learn.
Features of R Programming Language
Statistical Features of R:
Basic Statistics: The most common basic statistics terms are the mean, mode, and
median. These are all known as “Measures of Central Tendency.” So, using the R language
we can measure central tendency very easily.
Static graphics: R is rich with facilities for creating and developing interesting static
graphics. R contains functionality for many plot types including graphic maps, mosaic
plots, biplots, and the list goes on.
Probability distributions: Probability distributions play a vital role in statistics and by
using R we can easily handle various types of probability distribution such as Binomial
Distribution, Normal Distribution, Chi-squared Distribution and many more.
Data analysis: It provides a large, coherent and integrated collection of tools for data
analysis.
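As a brief illustration of the statistical features listed above, the following sketch computes the three measures of central tendency and evaluates two probability distributions. The data vector scores and the helper stat_mode are hypothetical names introduced here. Base R has no built-in function for the statistical mode, so one simple approach, assumed here, is to tabulate the values and pick the most frequent one.

```r
# Hypothetical sample data (not from the original text)
scores <- c(2, 4, 4, 5, 7, 9, 4)

mean(scores)    # arithmetic mean: 5
median(scores)  # middle value of the sorted data: 4

# Base R's mode() reports the storage type, not the statistical mode,
# so tabulate the values and take the most frequent one.
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[which.max(counts)])
}
stat_mode(scores)  # 4

# Distribution functions: d = density/mass, p = cumulative probability
dbinom(2, size = 4, prob = 0.5)  # P(X = 2) for Binomial(4, 0.5): 0.375
pnorm(0)                         # P(Z <= 0) for a standard normal: 0.5
```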
Programming Features of R:
R Packages: One of the major features of R is the wide availability of libraries. R has
CRAN (Comprehensive R Archive Network), which is a repository holding more than
10,000 packages.
Distributed Computing: Distributed computing is a model in which components of a
software system are shared among multiple computers to improve efficiency and
performance. Two packages, ddR and multidplyr, used for distributed programming
in R, were released in November 2015.
Programming in R:
Since R is syntactically similar to other widely used languages, it is easy to learn and
code in. Programs can be written in any of the widely used IDEs, such as RStudio,
Rattle, Tinn-R, etc. After writing the program, save the file with the extension .r. To run
the program, use the following command on the command line:
Rscript file_name.r
Example:
# R program to print Welcome to GFG!
# Below line will print "Welcome to GFG!"
cat("Welcome to GFG!")
Output:
Welcome to GFG!
Advantages of R:
R is one of the most comprehensive statistical analysis packages, and new
technology and concepts often appear first in R.
R is open source, so you can run it anywhere
and at any time.
R is suitable for GNU/Linux and Windows operating
systems.
R is cross-platform and runs on all major operating systems.
In R, everyone is welcome to provide new packages, bug fixes, and code
enhancements.
Disadvantages of R:
In the R programming language, the standard of some packages is less than
perfect.
R commands give little thought to memory management, so R
may consume all available memory.
In R, there is basically nobody to complain to if something doesn’t work.
R can be much slower than other programming languages
such as Python and MATLAB.
Applications of R:
We use R for Data Science. It gives us a broad variety of libraries related to
statistics. It also provides an environment for statistical computing and design.
R is used by many quantitative analysts as their programming tool, as it helps with
data importing and cleaning.
R is one of the most prevalent languages among data analysts and research
programmers. Hence, it is used as a fundamental tool for finance.
Tech giants like Google, Facebook, Bing, Twitter, Accenture, Wipro, and many
more use R nowadays.
NUMERICS, ARITHMETIC, ASSIGNMENT, AND VECTORS
R for Basic Math
All common arithmetic operations and mathematical functionality are ready to use at
the console prompt. You can perform addition, subtraction, multiplication, and division
with the symbols +, -, *, and /, respectively. You can create exponents (also referred to as
powers or indices) using ^, and you control the order of the calculations in a single
command using parentheses, ( ).
Arithmetic
In R, standard mathematical rules apply throughout and follow the usual left-to-right
order of operations: parentheses, exponents, multiplication, division, addition,
subtraction (PEMDAS). Here’s an example in the console:
R> 2+3
[1] 5
R> 14/6
[1] 2.333333
R> 14/6+5
[1] 7.333333
R> 14/(6+5)
[1] 1.272727
R> 3^2
[1] 9
R> 2^3
[1] 8
Assigning Objects
R has simply displayed the results of the example calculations by printing them to the
console. If you want to save the results and perform further operations, you need to be
able to assign the results of a given computation to an object in the current workspace.
You can specify an assignment in R in two ways: using arrow notation (<-) and using a
single equal sign (=).
Both methods are shown here:
R> x <- -5
R> x
[1] -5
R> x = x + 1 # this overwrites the previous value of x
R> x
[1] -4
R> mynumber = 45.2
R> y <- mynumber*x
R> y
[1] -180.8
R> ls()
[1] "mynumber" "x" "y"
As you can see from these examples, R will display the value assigned to an object when
you enter the name of the object into the console. When you use the object in
subsequent operations, R will substitute the value you assigned to it.
Vectors
The vector is the essential building block for handling multiple items in R.
Creating a Vector
In a numeric sense, you can think of a vector as a collection of observations or
measurements concerning a single variable, for example, the heights of 50 people or the
number of coffees you drink daily. More complicated data structures may consist of
several vectors.
The function for creating a vector is the single letter c, with the desired entries in
parentheses separated by commas.
R> myvec <- c(1,3,1,42)
R> myvec
[1] 1 3 1 42
Vector entries can be calculations or previously stored items (including vectors
themselves).
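A minimal sketch of such a vector follows. Only foo and its value 32.1 are taken from the description below; the remaining entries are assumed values chosen to mix plain numbers with arithmetic expressions.

```r
# foo and its value come from the text; all other entries are assumptions
foo <- 32.1
myvec2 <- c(3, -3, 2, 3.45, 1e+03, 64^0.5, 2 + (3 - 1.1) / 9.44, foo)
myvec2  # the expressions are evaluated first, e.g. 64^0.5 is stored as 8
```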
This code created a new vector assigned to the object myvec2. Some of the entries are
defined as arithmetic expressions, and it’s the result of the expression that’s stored in
the vector. The last element, foo, is an existing numeric object defined as 32.1.
Let’s look at another example.
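As a sketch, and again assuming illustrative entries for myvec2 (only myvec and foo are given in the text), appending two vectors could look like this:

```r
myvec  <- c(1, 3, 1, 42)  # as defined earlier in the text
foo    <- 32.1
myvec2 <- c(3, -3, 2, 3.45, 1e+03, 64^0.5, 2 + (3 - 1.1) / 9.44, foo)  # assumed entries

# Passing vectors to c() appends their entries in the order given
myvec3 <- c(myvec, myvec2)
myvec3
length(myvec3)  # 4 + 8 = 12
```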
This code creates and stores yet another vector, myvec3, which contains the entries of
myvec and myvec2 appended together in that order.
Sequences, Repetition, Sorting, and Lengths
This section covers some common and useful functions associated with R vectors: seq, rep, sort, and length.
Let’s create an equally spaced sequence of increasing or decreasing numeric values. The
easiest way to create such a sequence, with numeric values separated by intervals of 1,
is to use the colon operator.
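A short sketch of the colon operator, where the object names up and down are hypothetical:

```r
up <- 3:27    # increasing sequence 3, 4, 5, ..., 27
up
down <- 27:3  # the colon operator also counts downward
down
length(up)    # 25
```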
The example 3:27 should be read as “from 3 to 27 (by 1).”
Sequences with seq
You can also use the seq command, which allows for more flexible creation of
sequences. This ready-to-use function takes in a from value, a to value, and a by value,
and it returns the corresponding sequence as a numeric vector.
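For instance, a call along these lines (the object name by3 is hypothetical) reuses the endpoints 3 and 27 with a step of 3:

```r
by3 <- seq(from = 3, to = 27, by = 3)
by3  # 3 6 9 12 15 18 21 24 27
```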
This gives you a sequence with intervals of 3 rather than 1.
Repetition with rep
Sequences are extremely useful, but sometimes you may want simply to repeat a certain
value. You do this using rep.
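A sketch of four such calls follows; the values being repeated are assumptions chosen for illustration.

```r
rep(x = 1, times = 4)                        # 1 1 1 1
rep(x = c(3, 62, 8.3), times = 3)            # the whole vector, three times over
rep(x = c(3, 62, 8.3), each = 2)             # 3 3 62 62 8.3 8.3
rep(x = c(3, 62, 8.3), times = 3, each = 2)  # each element twice, then all of that three times
```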
The rep function is given a single value or a vector of values as its argument x, as well as
values for the arguments times and each. The value for times provides the number of
times to repeat x, and each provides the number of times to repeat each element of x. In
the first line directly above, you simply repeat a single value four times. The other
examples first use rep and times on a vector to repeat the entire vector, then use each to
repeat each member of the vector, and finally use both times and each to do both at
once.
Sorting with sort
Sorting a vector in increasing or decreasing order of its elements is another simple
operation that crops up in everyday tasks. The conveniently named sort function does
just that.
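A minimal sketch, using a hypothetical vector vals:

```r
vals <- c(2.5, -1, -10, 3.44)       # hypothetical values

sort(x = vals, decreasing = FALSE)  # smallest to largest: -10, -1, 2.5, 3.44
sort(x = vals, decreasing = TRUE)   # largest to smallest: 3.44, 2.5, -1, -10
```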
You supply a vector to the function as the argument x, and a second argument,
decreasing, indicates the order in which you want to sort. This argument takes a type of
value you have not yet met: one of the all-important logical values. A logical value can be
only one of two specific, case-sensitive values: TRUE or FALSE. Generally speaking,
logicals are used to indicate the satisfaction or failure of a certain condition, and they
form an integral part of all programming languages.
For now, with regard to sort, you set decreasing=FALSE to sort from smallest to largest,
and decreasing=TRUE sorts from largest to smallest.
Finding a Vector Length with length
I’ll round off this section with the length function, which determines how many entries
exist in a vector given as the argument x.
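Two quick illustrations, with assumed input values:

```r
length(x = c(3, 2, 8, 1))  # 4
length(x = 5:13)           # 9, since the sequence runs 5, 6, ..., 13
```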
MATRICES AND ARRAYS
A matrix is simply several vectors stored together. Whereas the size of a vector is
described by its length, the size of a matrix is specified by a number of rows and a
number of columns. You can also create higher dimensional structures that are referred
to as arrays.
Defining a Matrix
The matrix is an important mathematical construct, and it’s essential to many statistical
methods. You typically describe a matrix A as an m × n matrix; that is, A will have exactly
m rows and n columns. This means A will have a total of mn entries, with each entry a_{i,j}
having a unique position given by its specific row (i = 1, 2, ..., m) and column (j = 1,
2, ..., n).
To create a matrix in R, use the aptly named matrix command, providing the entries of
the matrix to the data argument as a vector:
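A minimal sketch of a 2 × 2 matrix, reusing the four entries that appear later in this section. The object name A and the choice of two rows and two columns are assumptions.

```r
A <- matrix(data = c(-3, 2, 893, 0.17), nrow = 2, ncol = 2)
A       # by default the entries fill the matrix column by column
dim(A)  # 2 2
A[1, 2] # row 1, column 2 holds the third entry, 893
```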
You must make sure that the length of this vector matches exactly with the number of
desired rows (nrow) and columns (ncol). You can elect not to supply nrow and ncol
when calling matrix, in which case R’s default behavior is to return a single-column
matrix of the entries in data. For example, matrix(data=c(-3,2,893,0.17)) would be
identical to matrix(data=c(-3,2,893,0.17),nrow=4,ncol=1).
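The equivalence of these two calls can be checked directly. In this sketch, entries, m1, and m2 are hypothetical object names:

```r
entries <- c(-3, 2, 893, 0.17)

m1 <- matrix(data = entries)                      # nrow and ncol omitted
m2 <- matrix(data = entries, nrow = 4, ncol = 1)  # dimensions given explicitly

dim(m1)            # 4 1
identical(m1, m2)  # TRUE
```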