Unit-1 Statistical Computing & R Programming
Introduction to R Programming
1. Introduction:
Ross Ihaka and Robert Gentleman of the University of Auckland, New Zealand, developed the open-source
programming language R.
R is platform independent; it runs on various platforms, including Windows, Linux and macOS. R is mainly
used for organizing data, statistical computing, data analysis and visualization. R is an interpreted language
and supports both procedural and object-oriented programming. R provides various machine learning
operations such as clustering, association rule mining, classification and regression. The CRAN repository
hosts over 10,000 R packages, and the number is constantly increasing. The language was named R partly
after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka), and partly as a
play on the name of the Bell Labs language S.
One option for downloading and installing R is its official website [Link]. After downloading
and installing R, we can run it from the command prompt or from any IDE (integrated development environment).
Why Use R Language? (Features of R Programming Language)
The R Language is a powerful tool widely used for data analysis, statistical computing, and machine learning.
Here are several reasons why professionals across various fields prefer R:
1. Comprehensive Statistical Analysis:
R provides a wide array of statistical techniques, including linear and nonlinear modeling, classical
statistical tests, time-series analysis, classification, and clustering.
2. Advanced Data Visualization:
With packages like ggplot2, plotly, and lattice, R excels at creating complex and aesthetically pleasing data
visualizations, including plots, graphs, and charts.
3. Extensive Packages and Libraries:
The Comprehensive R Archive Network (CRAN) hosts thousands of packages that extend R’s capabilities in
areas such as machine learning, data manipulation, bioinformatics, and more.
SESHADRIPURAM COLLEGE TUMKUR Page 1
4. Open Source and Free:
R is free to download and use, making it accessible to everyone. Its open-source nature encourages
community contributions and continuous improvement.
5. Platform Independence:
R is platform-independent, running on various operating systems, including Windows, macOS, and Linux,
which ensures flexibility and ease of use across different environments.
6. Integration with Other Languages:
R language can integrate with other programming languages such as C, C++, Python, Java, and SQL, allowing
for seamless interaction with various data sources and computational processes.
7. Powerful Data Handling and Storage:
R efficiently handles and stores data, supporting various data types and structures, including vectors,
matrices, data frames, and lists.
8. Robust Community and Support:
R has a vibrant and active community that provides extensive support through forums, mailing lists, and
online resources, contributing to its rich ecosystem of packages and documentation.
9. Interactive Development Environment (IDE):
RStudio, the most popular IDE for R, offers a user-friendly interface with features like syntax highlighting,
code completion, and integrated tools for plotting, history, and debugging.
10. Reproducible Research:
R supports reproducible research practices with tools like R Markdown and Knitr, enabling users to create
dynamic reports, presentations, and documents that combine code, text, and visualizations.
11. Strong Data Visualization Capabilities:
R language excels in data visualization, offering powerful tools like ggplot2 and plotly, which enable the
creation of detailed and aesthetically pleasing graphs and plots.
12. Growing Community and Support:
R language has a large and active community of users and developers who contribute to its continuous
improvement and provide extensive support through forums, mailing lists, and online resources.
13. High Demand in Data Science:
R is one of the most requested programming languages in the Data Science job market, making it a valuable
skill for professionals looking to advance their careers in this field.
Advantages of R language
R is one of the most comprehensive statistical analysis packages, and new technology and concepts often
appear first in R.
R is open source, so you can run R anywhere and at any time.
R runs on GNU/Linux and Windows operating systems.
R is cross-platform and runs on all major operating systems.
Anyone is welcome to contribute new packages, bug fixes, and code enhancements to R.
Disadvantages of R language
The quality of some R packages is less than perfect.
R gives the programmer little control over memory management, and because R keeps its objects in
memory, it may consume all available memory.
R is developed by a volunteer community, so there is basically nobody to complain to if something doesn’t work.
R can be much slower than other languages such as Python and MATLAB for some tasks.
Applications of R language
1. Data Analysis and Visualization
Statistical Analysis: R is used for performing descriptive statistics, hypothesis testing, regression analysis,
and predictive modeling. Researchers and data analysts rely on R to analyze and interpret complex
datasets.
Data Visualization: With packages like ggplot2, plotly, and lattice, R can create detailed and customized
plots, graphs, and charts. It is extensively used in exploratory data analysis (EDA) to reveal patterns,
trends, and insights in data.
2. Machine Learning
Supervised Learning: R provides multiple packages like caret, randomForest, and e1071 for building
models such as decision trees, random forests, linear regression, and support vector machines (SVM).
Unsupervised Learning: Techniques like clustering (e.g., k-means, hierarchical clustering) and
dimensionality reduction (e.g., PCA) are easily implemented in R.
Model Evaluation: R offers tools for evaluating machine learning models by computing metrics like
accuracy, precision, recall, and F1-score, as well as generating confusion matrices and ROC curves.
3. Bioinformatics
R is heavily used in the field of bioinformatics for analyzing genomic and proteomic data. It is applied in
analyzing DNA sequences, protein structures, gene expression, and more.
Popular bioinformatics packages include Bioconductor, which offers tools for analyzing biological data,
such as RNA sequencing and microarray data.
4. Finance and Economics
Risk Analysis and Portfolio Management: R is widely used in quantitative finance for tasks such as
financial modeling, risk assessment, and portfolio optimization.
Time Series Analysis: Financial analysts use R for analyzing time series data, forecasting stock prices, and
assessing market trends with packages like xts, zoo, and forecast.
Econometrics: Economists use R for regression analysis, econometric modeling, and simulations to
understand economic trends and predict future outcomes.
5. Healthcare and Clinical Research
Medical Statistics: R is used in clinical trials for survival analysis, drug efficacy testing, and biostatistics. It
helps researchers analyze patient data and make informed medical decisions.
Epidemiology: R plays a crucial role in tracking the spread of diseases, predicting outbreaks, and
performing epidemiological modeling.
Genomics and Proteomics: R is frequently used for genetic data analysis, including next-generation
sequencing, microarray data, and genome-wide association studies (GWAS).
6. Social Media Analysis
Text Mining: R is applied in text mining and sentiment analysis, allowing companies to extract insights
from social media platforms like Twitter, Facebook, and Instagram using packages like tm and text2vec.
Natural Language Processing (NLP): With packages like quanteda and tidytext, R enables the analysis of
textual data to identify trends, themes, and public sentiments.
7. E-commerce and Business Analytics
Customer Segmentation
Sales Forecasting
Churn Prediction
8. Geographical Data Analysis
Spatial Data Analysis: R is equipped with packages like sp, rgeos, and sf that allow users to work with
geographic data, perform spatial analysis, and visualize maps.
Geostatistics: R can be used for geospatial modeling, analyzing geographical patterns, and creating
interactive maps for location-based services.
R V/S Python
Feature | Python | R
Primary Purpose | General-purpose programming language | Primarily used for statistical analysis
Ease of Learning | Easy to learn with clear syntax | Moderate learning curve, especially for non-statistical users
Libraries | Extensive libraries for machine learning and data science (e.g., NumPy, Pandas, Scikit-learn) | Specialized libraries for statistical computing and visualization (e.g., ggplot2, dplyr)
Data Handling | Excellent for structured data (Pandas) | Exceptional for statistical analysis of large datasets
Visualization | Good visualization (Matplotlib, Seaborn) | Excellent visualization tools (ggplot2, lattice)
Machine Learning Support | Strong support (TensorFlow, PyTorch) | Limited machine learning libraries but strong in statistical modeling
Community Support | Large community, suitable for a wide range of applications | Strong community in statistics, bioinformatics, and data science
Performance | Faster in many general-purpose tasks | Slower for large tasks, but great for statistical operations
Development Environment | Commonly used in Jupyter, PyCharm | Commonly used in RStudio
Object-Oriented Support | Strong object-oriented programming | Less object-oriented focus
Statistical Analysis | Moderate statistical capabilities | Best suited for statistical analysis
Scripting Language | More general-purpose scripting | Focused on data analysis scripting
Flexibility | More flexible for various domains like web development and automation | Specialized for data manipulation, analysis, and visualization
Popularity | Widely used across many domains | More specialized but popular in academia and research
Deployment | Easier to deploy for production environments | Mainly used for research and statistical reporting
Comments, variables, keywords, operators and data types in R
Types of mode in R:
1) Interactive mode: A command line shell which gives immediate feedback for each statement.
2) Script mode: A text file, containing R commands.
Tokens in R: Tokens are the smallest individual elements of a program. They include:
• Keywords
• Variables
• Identifiers
• Operators
Comments: Comments are statements that the R interpreter ignores completely; they are used to improve the
readability of the program. R supports only single-line comments: any text from "#" to the end of the line is a
comment.
Comments are generic English sentences, mostly written in a program to explain what it does or what a
piece of code is supposed to do. Comments are generally used for the following purposes:
• Code Readability
• Explanation of the code or Metadata of the project
• Preventing execution of code
• Including resources
Types of Comments
Programming languages generally support three types of comments:
Single-Line Comments: comments that need only one line (# comment statement).
Multi-line Comments: comments that span more than one line.
Documentation Comments: comments written for quick documentation look-up.
Note: R has no dedicated syntax for multi-line or documentation comments; every comment line must start with #.
Example: # R programming
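A short sketch of the comment styles described above, with illustrative variable names: a full-line comment, an inline comment after code, and a group of lines switched off by starting each one with #, since R has no block-comment syntax.

```r
# Full-line comment: R ignores this entire line
total <- 10 + 5  # Inline comment: the code before # still runs

# R has no multi-line comment syntax, so each line of a longer
# explanation starts with its own #.

# "Commenting out" a line stops it from running:
# total <- total * 100

print(total)  # [1] 15
```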
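Reserved Words (Keywords): Reserved words in R programming have special meaning and cannot be used
as identifiers (variable names, function names, etc.). Sample reserved words in R programming are: if, else,
repeat, while, function, for, in, next, break, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_,
NA_character_ and NA_complex_.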
Identifiers in R
In R programming, identifiers are names used to identify variables, functions, or objects. R has specific rules and
best practices for naming these identifiers. Here’s an overview of the guidelines for creating identifiers in R:
Rules for Identifiers in R:
1. Allowed Characters:
o Identifiers can include letters (both uppercase and lowercase), numbers, and dots (.) or
underscores (_).
o Example: data_set1, [Link], my_data, value2, score_1
2. First Character:
o Identifiers must begin with a letter or a dot (.).
o If a dot is the first character, it cannot be followed by a number.
o Example: .variableName is valid, but .123variable is not valid.
3. Case Sensitivity:
o R is case-sensitive, meaning Variable and variable are considered different identifiers.
o Example: Score and score would refer to different objects.
4. Reserved Keywords:
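o Certain words are reserved in R and cannot be used as identifiers. These include control-flow
keywords and special constants. Names of built-in functions (such as mean or sum) are not reserved,
although reusing them is discouraged.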
o Examples of reserved words: if, else, repeat, while, function, TRUE, FALSE, NULL, Inf, NA.
5. Length:
o There is no strict limit on the length of an identifier in R, but for readability, it is a best practice to
use concise, descriptive names.
6. Avoid Special Characters:
o Identifiers cannot include special characters such as spaces, commas, hyphens, or mathematical
operators.
o Example: data-set or my data would not be valid.
Best Practices for Identifiers in R:
1. Descriptive Names:
o Use meaningful and descriptive names for your variables and functions. This helps improve the
readability of the code.
o Example: Instead of using x, use total_sales, customer_count, etc.
2. Consistent Naming Conventions:
o Choose a consistent style for naming, such as snake_case or camelCase:
snake_case: my_variable_name
camelCase: myVariableName
3. Avoid Starting with a Dot:
o Even though identifiers can start with a dot, it is generally avoided unless creating hidden
objects. Objects that start with a dot are treated as hidden and will not show up in typical R
environment listings (e.g., with ls()).
4. Reserved Names:
o Avoid naming variables after commonly used functions or keywords in R to prevent confusion.
o Example: Avoid names like mean, data, sum, c, or matrix for variables.
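The note on hidden objects can be checked with a small sketch (the object names are made up for illustration). It uses local() so that ls() lists only the two objects created inside it; ls(all.names = TRUE) is the base R option that also shows names starting with a dot.

```r
listing <- local({
  visible_total <- 1
  .hidden_total <- 2   # starts with a dot, so ls() hides it by default
  list(default = ls(), all = ls(all.names = TRUE))
})

listing$default  # "visible_total"
listing$all      # ".hidden_total" "visible_total"
```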
Examples of Valid and Invalid Identifiers:
Valid Identifier | Invalid Identifier | Reason
dataFrame | 2ndVariable | Cannot start with a number
sales_2024 | first-variable | Hyphens are not allowed
[Link] | user name | Spaces are not allowed
.hiddenVar | .9hiddenVar | Cannot start with a dot followed by a number
customerData | TRUE | TRUE is a reserved word
score_total | else | else is a reserved word
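One way to test the rules in the table is base R's make.names(), which converts any string into a syntactically valid name. The helper is_valid_name() below is a hypothetical convenience function, not part of R: it treats a name as valid when make.names() leaves it unchanged.

```r
# Hypothetical helper: a name is valid if make.names() does not alter it
is_valid_name <- function(x) make.names(x) == x

candidates <- c("dataFrame", "sales_2024", ".hiddenVar", "customerData",
                "score_total", "2ndVariable", "first-variable",
                "user name", ".9hiddenVar", "TRUE", "else")

# Named logical vector: TRUE for valid identifiers, FALSE for invalid ones
validity <- setNames(is_valid_name(candidates), candidates)
print(validity)

# make.names() also shows how R would repair some invalid names
print(make.names(c("2ndVariable", "first-variable", "user name", "TRUE")))
# "X2ndVariable" "first.variable" "user.name" "TRUE."
```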
Operators: R supports several types of operators; the four main types are described below.
1. Arithmetic Operators
The examples use a <- 5 and b <- 12.
Operator | Description | Example | Output
+ | Addition | a + b | 17
- | Subtraction | a - b | -7
* | Multiplication | a * b | 60
/ | Division | b / a | 2.4
^ | Exponent | b ^ a | 248832
%% | Modulus (remainder from division) | b %% a | 2
%/% | Integer division (integer quotient) | b %/% a | 2
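The table can be reproduced by running the operators directly. The last line, which is not in the table, shows that the same operators also work element-wise on vectors.

```r
a <- 5
b <- 12

a + b     # 17
a - b     # -7
a * b     # 60
b / a     # 2.4
b ^ a     # 248832
b %% a    # 2 (remainder of 12 divided by 5)
b %/% a   # 2 (whole-number quotient of 12 divided by 5)

# The same operators work element-wise on vectors
c(10, 11, 12) %% 5   # 0 1 2
```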
2. Relational Operators
The examples use a <- 5 and b <- 12.
Operator | Description | Example | Output
< | Less than | a < b | TRUE
> | Greater than | a > b | FALSE
<= | Less than or equal to | a <= b | TRUE
>= | Greater than or equal to | a >= b | FALSE
== | Equal to | a == b | FALSE
!= | Not equal to | a != b | TRUE
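Relational operators are also vectorized: comparing a vector with a value returns one TRUE/FALSE per element, which can then be used to select elements. The scores vector is made-up data for illustration.

```r
a <- 5
b <- 12
a < b    # TRUE
a == b   # FALSE

# Comparing a vector with a value gives one result per element
scores <- c(45, 72, 90)
passed <- scores >= 60   # FALSE TRUE TRUE
scores[passed]           # 72 90: logical indexing keeps the TRUE positions
```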
3. Logical Operators
The result of a logical operator is TRUE or FALSE. Zero is treated as FALSE and any non-zero number as
TRUE. The element-wise operators & and | compare each element of the first vector with the corresponding
element of the second vector. The operators && and || work on single values: older versions of R silently
used only the first element of each vector, but in R 4.3.0 and later, passing a vector longer than one to && or
|| is an error. To apply them to vectors, select the elements explicitly (for example a[1]).
Note: To create a vector with more than one element, use the c() function, which combines its arguments
into a vector.

! (Logical NOT): Returns the opposite logical value of each element.
a <- TRUE
print(!a)                  Output: FALSE
a <- c(0, 15, TRUE)
print(!a)                  Output: TRUE FALSE FALSE

|| (Logical OR): Takes one value from each operand and returns TRUE if either of them is TRUE.
a <- TRUE
b <- FALSE
print(a || b)              Output: TRUE
a <- c(0, 5, TRUE)
b <- c(0, 3, FALSE)
print(a[1] || b[1])        Output: FALSE
a <- c(5, 0, TRUE)
b <- c(0, 3, TRUE)
print(a[1] || b[1])        Output: TRUE

&& (Logical AND): Takes one value from each operand and returns TRUE only if both are TRUE.
a <- TRUE
b <- TRUE
print(a && b)              Output: TRUE
a <- c(5, 0, TRUE)
b <- c(0, 3, TRUE)
print(a[1] && b[1])        Output: FALSE
a <- c(5, 0, TRUE)
b <- c(1, 3, TRUE)
print(a[1] && b[1])        Output: TRUE

& (Element-wise logical AND): Combines each element of the first vector with the corresponding element of
the second vector and gives FALSE wherever either element is FALSE.
a <- c(5, 0, TRUE, TRUE)
b <- c(1, 3, TRUE, FALSE)
print(a & b)               Output: TRUE FALSE TRUE FALSE
a <- c(5, 0, TRUE, TRUE)
b <- c(0, 0, TRUE, FALSE)
print(a & b)               Output: FALSE FALSE TRUE FALSE

| (Element-wise logical OR): Combines each element of the first vector with the corresponding element of
the second vector and gives TRUE wherever either element is TRUE.
a <- c(5, 0, TRUE, TRUE)
b <- c(1, 3, TRUE, FALSE)
print(a | b)               Output: TRUE TRUE TRUE TRUE
a <- c(5, 0, TRUE, TRUE)
b <- c(0, 0, TRUE, FALSE)
print(a | b)               Output: TRUE FALSE TRUE TRUE
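The sketch below, assuming R 4.3.0 or later, collects the logical operators in one runnable script. It uses any() and all() to reduce a vector to a single value before applying || or &&, and its last line shows that && skips its right-hand side when the left-hand side is already FALSE (short-circuit evaluation).

```r
a <- c(5, 0, TRUE, TRUE)   # stored as the numbers 5 0 1 1
b <- c(1, 3, TRUE, FALSE)  # stored as the numbers 1 3 1 0

a & b   # element-wise AND: TRUE FALSE TRUE FALSE
a | b   # element-wise OR:  TRUE TRUE TRUE TRUE
!a      # element-wise NOT: FALSE TRUE FALSE FALSE

# && and || need single values: pick elements or summarise first
a[1] && b[1]                 # TRUE
any(a == 0) || all(b > 0)    # TRUE, because a contains a 0

# Short-circuit: when the left side of && is FALSE, the right side
# is never evaluated, so x > 0 is not computed for a NULL x
x <- NULL
is.numeric(x) && x > 0       # FALSE
```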
4. Assignment Operators
Variables: Variables are containers for storing data values. R has no command for declaring variables; a
variable is created when a value is first assigned to it. In R, assignment can be done in three ways:
= (Simple Assignment)
<- (Leftward Assignment) Note: <- is the preferred assignment operator in R
-> (Rightward Assignment)
Example:
Sum = 0 # declares a variable Sum and assigns 0 to Sum
Sum <- 0 # declares a variable Sum and assigns 0 to Sum
0 -> Sum # declares a variable Sum and assigns 0 to Sum
Result <- "Pass" # assigns a string value to the variable Result
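A small sketch confirming that the three assignment forms behave the same way. The last line illustrates one reason <- is preferred: inside a function call, = names an argument rather than creating a variable.

```r
Sum = 0           # simple assignment
Sum <- 0          # leftward assignment (preferred)
0 -> Sum          # rightward assignment
Result <- "Pass"  # a character value

# All three forms store the same value
x1 = 7
x2 <- 7
7 -> x3
identical(x1, x2) && identical(x2, x3)   # TRUE

# Inside a function call, = matches an argument by name;
# here x is mean()'s argument, not a new variable
m <- mean(x = c(2, 4, 6))   # 4
```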
R Basic Data Types
R has five basic data types, listed below with an example of each (a sixth type, raw, stores bytes and is
rarely used).
Data Type | Examples | Remarks
numeric | 16.55, 11 | Set of all real numbers
integer | 5L, 543L (the L suffix declares the value as an integer) | Set of all integers
complex | 10+4i (4i is the imaginary part) | Set of complex numbers
character | "R", 'Plots', "R programming" | Any letters, digits or special characters enclosed in quotes
logical | TRUE or FALSE | The logical values TRUE and FALSE
Note: Use the class() and typeof() functions to check the class and the data type of a variable.
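Applying class() and typeof() to the examples from the table shows the difference between them: class() reports the high-level class, while typeof() reports how R stores the value internally (numeric values are stored as "double").

```r
values <- list(16.55, 11, 5L, 10+4i, "R", TRUE)

sapply(values, class)
# "numeric" "numeric" "integer" "complex" "character" "logical"
sapply(values, typeof)
# "double" "double" "integer" "complex" "character" "logical"
```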
R Data Structures
The R data structures include:
Vectors
Lists
Arrays
Matrices
Factors
Data Frames
Vectors:
Definition: A vector in R is a basic data structure that contains elements of the same data type, such as numeric,
character, or logical. It is one of the simplest and most commonly used structures in R.
Key Points:
1. Homogeneous Elements: A vector can only store elements of the same type (e.g., all elements must be
numeric, character, or logical).
2. Indexing: Vector elements are indexed from 1 in R (unlike languages like Python that use 0-based
indexing).
Syntax:
To create a vector, use the c() function:
vector_name <- c(element1, element2, element3, ...)
Examples of Vector Creation
Numeric Vector: numeric_vector <- c(1, 2, 3, 4, 5)
Character Vector: char_vector <- c("apple", "banana", "cherry")
Logical Vector: logical_vector <- c(TRUE, FALSE, TRUE)
Sequence Vectors: Using : operator or seq() function
o sequence_vector <- 1:10 # Generates a sequence from 1 to 10
o sequence_vector2 <- seq(1, 20, by = 2) # Generates 1, 3, 5, ..., 19 (a step of 2; 20 itself is not reached)
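The creation examples above can be run as a single script. The extra mixed vector, added for illustration, shows what the homogeneity rule means in practice: R coerces all elements to a common type instead of storing different types.

```r
numeric_vector <- c(1, 2, 3, 4, 5)
char_vector    <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)

sequence_vector  <- 1:10                # 1 2 3 ... 10
sequence_vector2 <- seq(1, 20, by = 2)  # 1 3 5 ... 19

# Vectors are homogeneous: mixing types coerces every element
mixed <- c(1, "apple", TRUE)
class(mixed)    # "character"
mixed           # "1" "apple" "TRUE"

# Indexing starts at 1, not 0
char_vector[1]  # "apple"
```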
Operations on Vectors
1. Appending Elements to a Vector: Add elements to an existing vector using the c() function.
Syntax: new_vector <- c(existing_vector, new_element)
Example:
vec <- c(1, 2, 3)
vec <- c(vec, 4) # Appends 4 to the vector
2. Modifying Elements in a Vector: Modify specific elements by referencing their position using the index.
Syntax: vector_name[index] <- new_value
Example:
vec <- c(10, 20, 30)
vec[2] <- 25 # Changes the second element to 25
3. Viewing Elements in a Vector: Access individual elements or subsets of a vector using their indices.
Syntax: vector_name[index]
Example:
vec <- c(10, 20, 30, 40)
vec[2] # Accesses the second element (20)
vec[1:3] # Accesses the first three elements
4. Naming Elements in a Vector: Assign names to the elements of a vector using the names() function.
Syntax: names(vector_name) <- c("name1", "name2", "name3", ...)
Example:
vec <- c(1, 2, 3)
names(vec) <- c("First", "Second", "Third")
vec["Second"] # Accesses the element named "Second"
5. Deleting Elements from a Vector: Remove elements by creating a new vector without them, usually with
negative indices. (Unlike with lists, assigning NULL to a vector element does not delete it; R reports an error.)
Syntax: vector_name <- vector_name[-index]
Example:
vec <- c(10, 20, 30, 40)
vec <- vec[-2] # Removes the second element (20)
6. Combining Vectors: Combine multiple vectors using the c() function.
Syntax: combined_vector <- c(vector1, vector2)
Example:
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
combined_vec <- c(vec1, vec2)
7. Arithmetic Operations on Vectors: R allows element-wise arithmetic operations on vectors.
Syntax:
vector1 + vector2 # Addition
vector1 - vector2 # Subtraction
vector1 * vector2 # Multiplication
vector1 / vector2 # Division
Example:
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result <- vec1 + vec2 # Element-wise addition
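A sketch that chains the seven operations above on one vector, so the effect of each step is visible. The final line, beyond the syntax listed above, shows recycling: a single value is reused for every element.

```r
vec <- c(1, 2, 3)
vec <- c(vec, 4)        # append:  1 2 3 4
vec[2] <- 25            # modify:  1 25 3 4
vec[1:3]                # view:    1 25 3
names(vec) <- c("First", "Second", "Third", "Fourth")
vec["Second"]           # by name: Second = 25
vec <- vec[-2]          # delete the second element
vec                     # First = 1, Third = 3, Fourth = 4

vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
c(vec1, vec2)           # combine: 1 2 3 4 5 6
vec1 + vec2             # 5 7 9
vec1 * vec2             # 4 10 18
vec1 + 10               # 11 12 13 (10 is recycled for every element)
```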
Vector Functions
Length of a Vector: length(vector_name)
Sort a Vector: sort(vector_name)
Find the Maximum and Minimum:
max(vector_name)
min(vector_name)
Sum of Vector Elements: sum(vector_name)
Mean and Median of Vector Elements:
mean(vector_name)
median(vector_name)
Check for Missing Values (NA): [Link](vector_name) # Checks for NA values
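The vector functions applied to a small, made-up vector of marks that contains one missing value. It uses is.na() to locate the missing entry and the na.rm = TRUE argument, without which max(), sum(), mean() and similar functions return NA.

```r
marks <- c(72, 45, NA, 90, 63)

length(marks)                # 5
is.na(marks)                 # FALSE FALSE TRUE FALSE FALSE
sort(marks)                  # 45 63 72 90 (sort() drops NA by default)

max(marks)                   # NA, because one value is missing
max(marks, na.rm = TRUE)     # 90
min(marks, na.rm = TRUE)     # 45
sum(marks, na.rm = TRUE)     # 270
mean(marks, na.rm = TRUE)    # 67.5
median(marks, na.rm = TRUE)  # 67.5
```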
Lists in R Programming
Definition: A list in R is a data structure that can hold elements of different types (numeric, character, logical,
vectors, matrices, data frames, or even other lists). Unlike vectors, lists can contain heterogeneous data.
Key Points:
1. Heterogeneous Elements: Lists can contain elements of different data types, including numbers, strings,
and even other lists or more complex objects.
2. Indexing: List elements can be accessed using double square brackets [[ ]] or the $ sign if they are named.
3. Nested Structure: Lists can be nested, meaning a list can contain other lists as its elements.
4. Flexible: Lists provide great flexibility for storing complex datasets.
Syntax: To create a list, use the list() function
list_name <- list(element1, element2, element3, ...)
Examples of List Creation
Simple List: simple_list <- list(1, "apple", TRUE)
Named List: named_list <- list(number = 1, fruit = "apple", flag = TRUE)
Nested List: nested_list <- list(c(1, 2, 3), list("a", "b", "c"))
Operations on Lists
1. Creating Lists: Create lists using the list() function.
Syntax: my_list <- list(element1, element2, element3)
Example:
my_list <- list(1, "banana", c(TRUE, FALSE), list("a", 1))
2. Appending Elements to a List: Add elements to an existing list by using the append() function.
Syntax: new_list <- append(existing_list, new_element)
Example:
my_list <- list(1, "banana")
my_list <- append(my_list, "apple") # Appends "apple" to the list
3. Modifying Elements in a List: Modify specific elements by referencing them using double square brackets [[ ]].
Syntax: list_name[[index]] <- new_value
Example:
my_list <- list(1, "banana", TRUE)
my_list[[2]] <- "apple" # Changes the second element to "apple"
4. Viewing Elements in a List: Access individual elements by using double square brackets [[ ]] or by name
using the $ operator.
Syntax:
list_name[[index]]
list_name$name
To view a subset of one or more elements (returned as a list), use single square brackets [ ]:
list_name[c(index1, index2)]
Example:
my_list <- list(number = 1, fruit = "banana", flag = TRUE)
my_list[[2]] # Accesses the second element ("banana")
my_list$fruit # Accesses the element with name "fruit"
5. Naming Elements in a List: Name the elements in a list when creating it or after creation using names().
Syntax: names(list_name) <- c("name1", "name2", "name3")
Example:
my_list <- list(1, "banana", TRUE)
names(my_list) <- c("number", "fruit", "flag")
my_list$fruit # Accesses the "fruit" element
6. Deleting Elements from a List: Remove elements from a list by assigning NULL to the specific element.
Syntax: list_name[[index]] <- NULL
Example:
my_list <- list(1, "banana", TRUE)
my_list[[2]] <- NULL # Removes the second element ("banana")
7. Combining Lists: Combine multiple lists using the c() function.
Syntax: combined_list <- c(list1, list2)
Example:
list1 <- list(1, 2, 3)
list2 <- list("a", "b", "c")
combined_list <- c(list1, list2)
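The list operations above, gathered into one sketch with illustrative values. It also shows the difference between [[ ]], which returns the element itself, and [ ], which returns a smaller list.

```r
my_list <- list(number = 1, fruit = "banana", flag = TRUE)

my_list <- append(my_list, list(extra = "apple"))  # append a named element
my_list[["fruit"]] <- "mango"   # modify by name
my_list$fruit                   # "mango"
my_list[[1]]                    # 1: the element itself
my_list[1]                      # a list of length 1 holding number = 1
my_list$flag <- NULL            # delete the flag element
names(my_list)                  # "number" "fruit" "extra"

# Nested lists: chain [[ ]] to reach inner elements
nested_list <- list(c(1, 2, 3), list("a", "b", "c"))
nested_list[[2]][[3]]           # "c"

# Combining two lists gives one longer list
combined_list <- c(list(1, 2, 3), list("a", "b", "c"))
length(combined_list)           # 6
```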
NOTE:
List Arithmetic Operations: Unlike vectors, lists do not support direct arithmetic operations. You need to
extract the numeric elements from the list and operate on those.
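A minimal sketch of working around this restriction, using illustrative values: either extract the numeric elements with [[ ]], or use lapply() (which returns a list) or sapply() (which simplifies to a vector when it can) to apply an operation to every element.

```r
nums <- list(a = 2, b = 4, c = 6)

# nums + 1 fails because arithmetic is not defined for lists
result <- tryCatch(nums + 1, error = function(e) conditionMessage(e))
result   # the error message, as a character string

nums[["a"]] + nums[["b"]]           # 6: extract the numbers first
lapply(nums, function(x) x * 10)    # list: a = 20, b = 40, c = 60
sapply(nums, function(x) x * 10)    # named vector: a = 20, b = 40, c = 60
```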
List Functions
1. Length of a List: Syntax: length(list_name)
Example:
my_list <- list(1, "banana", TRUE)
length(my_list) # Returns 3
2. Checking for NULL Values: Syntax: [Link](list_name[[index]])
3. Converting a List to a Vector: unlist() flattens a list into a vector. If the elements have different types, they
are coerced to a common type (for example, character). Syntax: unlist(list_name)
Example:
my_list <- list(1, 2, 3)
unlist(my_list) # Converts the list to a numeric vector
4. Checking the Type of Elements: Check the data type of list elements using the class() or typeof()
function. Syntax: class(list_name[[index]])
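The list functions in one short, illustrative script. Note that list() keeps NULL as an element, and that unlist() coerces mixed elements to a common type.

```r
my_list <- list(1, "banana", TRUE, NULL)

length(my_list)            # 4 (a NULL created by list() still counts)
is.null(my_list[[4]])      # TRUE
class(my_list[[2]])        # "character"
sapply(my_list, class)     # "numeric" "character" "logical" "NULL"

unlist(list(1, 2, 3))      # 1 2 3 (a numeric vector)
unlist(list(1, "banana"))  # "1" "banana" (mixed types become character)
```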
Arrays in R Programming
Definition: An array in R is a data structure that holds elements of the same data type (like numeric, character, or
logical) and arranges them into multi-dimensional formats. Arrays can have more than two dimensions, making
them a more flexible form of matrices (which are 2D arrays).
Key Points:
1. Homogeneous Elements: All elements in an array must be of the same data type.
2. Multi-dimensional: Arrays can have any number of dimensions (e.g., 1D, 2D, 3D, etc.).
3. Indexing: Arrays are indexed by row, column, and dimension. Indexing starts from 1 in R.
4. Array Creation: Arrays can be created using the array() function, where you specify the data and the
dimensions.
Syntax: To create an array, use the array() function.
array_name <- array(data, dim = c(row_size, col_size, depth_size, ...))
Examples of Array Creation
1D Array (Similar to a Vector): one_d_array <- array(1:5, dim = c(5)) # 1D array
2D Array (Matrix): two_d_array <- array(1:6, dim = c(2, 3)) # 2 rows, 3 columns
3D Array: three_d_array <- array(1:12, dim = c(2, 3, 2)) # 2 rows, 3 columns, 2 layers
Operations on Arrays
1. Creating Arrays: Arrays can be created using the array() function, which takes the data and dimensions as
arguments.
Syntax: array_name <- array(data, dim = c(row_size, col_size, depth_size))
Example: my_array <- array(1:9, dim = c(3, 3)) # Creates a 3x3 array
2. Appending Elements to an Array: In R, arrays have fixed dimensions, so you cannot directly append
elements to an array. However, you can create a new array by concatenating the original array with new
elements and reshaping it.
3. Modifying Elements in an Array: Modify elements in an array using their indices.
Syntax: array_name[row, column, dimension] <- new_value
Example:
my_array <- array(1:9, dim = c(3, 3))
my_array[2, 3] <- 10 # Changes the element at row 2, column 3 to 10
4. Viewing Elements in an Array: Access specific elements or slices of an array using indexing.
Syntax: array_name[row, column, dimension]
Example:
my_array <- array(1:12, dim = c(3, 2, 2))
my_array[1, , ] # Accesses the first row across all layers (returns a 2 x 2 matrix)
5. Deleting Elements from an Array: You cannot delete specific elements from an array in R. You need to
create a new array without the unwanted elements by reshaping the data.
6. Combining Arrays: Combine arrays with the abind() function from the abind package, which can bind
along an existing or a new dimension. For two-dimensional arrays (matrices), the base functions cbind()
(column bind) and rbind() (row bind) also work.
7. Arithmetic Operations on Arrays: Element-wise arithmetic operations can be performed on arrays, as
long as the dimensions are compatible.
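A sketch of the array operations described above on a 3 x 2 x 2 array. Because arrays cannot grow in place, the "append" step builds a new array from the old values plus new ones; the values used are illustrative.

```r
my_array <- array(1:12, dim = c(3, 2, 2))  # 3 rows, 2 columns, 2 layers

my_array[2, 1, 2]      # one element: row 2, column 1, layer 2 (8)
my_array[1, , ]        # row 1 of every layer: a 2 x 2 matrix
my_array[, , 1]        # the whole first layer: a 3 x 2 matrix

my_array[2, 1, 2] <- 100   # modify one element

# "Appending": build a new array from the old values plus new ones
bigger <- array(c(my_array, 13:18), dim = c(3, 2, 3))  # adds a third layer
dim(bigger)            # 3 2 3

# Element-wise arithmetic between arrays of the same dimensions
doubled <- my_array * 2
doubled[2, 1, 2]       # 200
```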
Array Functions
Length of an Array: length(array_name)
Dimensions of an Array: dim(array_name)
Number of Rows and Columns:
nrow(array_name) # Number of rows
ncol(array_name) # Number of columns
Reshaping an Array: Use the dim() function to reshape an array.
dim(array_name) <- c(new_row_size, new_col_size, ...)
Sum of Elements in an Array: sum(array_name)
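The array functions applied to a 2 x 3 x 4 array holding the numbers 1 to 24 (illustrative data), ending with a reshape through dim().

```r
arr <- array(1:24, dim = c(2, 3, 4))

length(arr)   # 24
dim(arr)      # 2 3 4
nrow(arr)     # 2
ncol(arr)     # 3
sum(arr)      # 300

dim(arr) <- c(6, 4)   # reshape: the same 24 values as a 6 x 4 matrix
dim(arr)              # 6 4
```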
Differences between Arrays and Other Data Structures
Feature | Vector | List | Matrix | Array
Homogeneous/Heterogeneous | Homogeneous (same type) | Heterogeneous (different types allowed) | Homogeneous | Homogeneous
Dimensionality | 1D | 1D (can store lists of any dimensions) | 2D | Multi-dimensional
Indexing | Single index (1D) | Double square brackets [[ ]] or $ for named elements | Row, Column | Row, Column, Dimension
Data Types | Numeric, Character, Logical | Numeric, Character, Logical, Vectors, Lists, etc. | Numeric, Character, Logical | Numeric, Character, Logical
Common Uses | Simple data storage | Complex data with mixed types | Tabular data like matrices | Multi-dimensional datasets like tensors
Creation Function | c() | list() | matrix() | array()
Modify Elements | By position | By position or name | By row and column index | By row, column, and dimension index
Flexibility | Fixed to 1D | Can have lists of lists (nested) | Fixed to 2D | Can have more than 2 dimensions
Matrices in R Programming
Definition: A matrix in R is a two-dimensional data structure where all elements are of the same data type
(numeric, character, or logical). Matrices are essentially vectors with a dimension attribute that creates rows and
columns.
Key Points:
1. Homogeneous Elements: All elements in a matrix must be of the same data type (e.g., all numeric, all
character, etc.).
2. Two-Dimensional: A matrix has two dimensions — rows and columns.
3. Indexing: Matrix elements are indexed by both row and column numbers, and indexing starts at 1 in R.
4. Column-major Order: By default, elements are filled by column in R matrices.
5. Matrix Creation: Matrices can be created using the matrix() function or by converting vectors into
matrices.
Syntax: To create a matrix, use the matrix() function.
matrix_name <- matrix(data, nrow = row_number, ncol = column_number, byrow = TRUE/FALSE)
Examples of Matrix Creation
Creating a Numeric Matrix:
Creating a Character Matrix:
Filling a Matrix by Row:
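The three example headings above have no code in this copy of the notes. The following is a minimal sketch of what such examples could look like; the values are illustrative, not the author's originals.

```r
# Numeric matrix: 1:6 filled column by column (the default)
num_matrix <- matrix(1:6, nrow = 2, ncol = 3)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Character matrix
char_matrix <- matrix(c("a", "b", "c", "d"), nrow = 2)

# byrow = TRUE fills the matrix row by row instead
row_matrix <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```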
Operations on Matrices
1. Creating Matrices: Matrices can be created using the matrix() function, which takes data, the number of
rows, and the number of columns.
Syntax: matrix_name <- matrix(data, nrow = row_number, ncol = column_number)
Example: my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
2. Appending Rows or Columns to a Matrix: You cannot append elements directly to a matrix as you can to a list.
However, you can use the rbind() or cbind() functions to append rows or columns to an existing matrix.
Syntax:
new_matrix <- rbind(existing_matrix, new_row) # Append a row
new_matrix <- cbind(existing_matrix, new_column) # Append a column
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
new_matrix <- rbind(my_matrix, c(7, 8, 9)) # Adds a new row
new_matrix <- cbind(my_matrix, c(7, 8)) # Adds a new column
3. Modifying Elements in a Matrix: Modify specific elements by referencing them with their row and
column indices.
Syntax: matrix_name[row, column] <- new_value
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
my_matrix[1, 2] <- 10 # Changes the element at row 1, column 2 to 10
4. Viewing Elements in a Matrix: Access specific elements, rows, or columns using indexing.
Syntax:
matrix_name[row, column]
matrix_name[row, ] # Access a specific row
matrix_name[, column] # Access a specific column
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
my_matrix[1, 2] # Accesses the element at row 1, column 2
my_matrix[1, ] # Accesses the entire first row
my_matrix[, 2] # Accesses the entire second column
5. Naming Rows and Columns: Name the rows and columns of a matrix using the rownames() and
colnames() functions.
Syntax:
rownames(matrix_name) <- c("row1", "row2", ...)
colnames(matrix_name) <- c("col1", "col2", ...)
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
rownames(my_matrix) <- c("row1", "row2")
colnames(my_matrix) <- c("col1", "col2", "col3")
6. Deleting Rows or Columns from a Matrix: Remove rows or columns from a matrix by specifying negative
indices.
Syntax:
new_matrix <- matrix_name[-row_index, ] # Remove a row
new_matrix <- matrix_name[, -column_index] # Remove a column
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
my_matrix <- my_matrix[-1, , drop = FALSE] # Removes the first row; drop = FALSE keeps a 1 x 3 matrix instead of a vector
7. Combining Matrices: Combine matrices using the rbind() and cbind() functions to add rows or columns.
Syntax:
combined_matrix <- rbind(matrix1, matrix2) # Combine by rows
combined_matrix <- cbind(matrix1, matrix2) # Combine by columns
Example:
mat1 <- matrix(1:4, nrow = 2, ncol = 2)
mat2 <- matrix(5:8, nrow = 2, ncol = 2)
combined_matrix <- rbind(mat1, mat2) # Combine mat1 and mat2 by rows
Matrix Functions:
Sum of Matrix Elements: sum(matrix_name)
Row and Column Sums:
rowSums(matrix_name)
colSums(matrix_name)
Row and Column Means:
rowMeans(matrix_name)
colMeans(matrix_name)
Matrix Dimensions: dim(matrix_name)
Matrix Determinant and Inverse (for square matrices; the inverse exists only when the determinant is non-zero):
det(matrix_name) # Determinant
solve(matrix_name) # Inverse
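The functions above can be tried on a small 2 x 2 matrix. This is an illustrative sketch; the matrix m is a hypothetical example chosen so that its determinant is non-zero and solve() can return an inverse.

```r
m <- matrix(1:4, nrow = 2, ncol = 2)   # columns (1, 2) and (3, 4)
sum(m)          # 10
rowSums(m)      # 4 6
colSums(m)      # 3 7
rowMeans(m)     # 2 3
colMeans(m)     # 1.5 3.5
dim(m)          # 2 2
det(m)          # -2, i.e. 1*4 - 3*2
m_inv <- solve(m)
m_inv           # rows (-2, 1.5) and (1, -0.5)
m %*% m_inv     # the 2 x 2 identity matrix (up to rounding)
```

Multiplying a matrix by its inverse is a quick way to confirm that solve() produced the right result.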
Factors in R Programming
Definition: A factor in R is a data structure used to represent categorical data. Factors are stored as integers,
where each integer represents a level (a unique category or value) and a label corresponds to these levels. Factors
are particularly useful in statistical modeling when you need to distinguish between discrete categories.
Key Points:
1. Categorical Data: Factors are designed to handle categorical data (nominal or ordinal).
2. Levels: Factors have a fixed set of unique values called levels. These levels can be ordered or unordered.
3. Efficiency: Internally, factors are stored as integers, making them more memory-efficient than storing the
raw values (especially for repeated categorical values).
4. Nominal vs Ordinal: Factors can be nominal (unordered categories) or ordinal (ordered categories). R
treats these types differently in statistical modeling.
5. Used in Modeling: Factors are commonly used in data frames and for statistical modeling purposes such
as linear regression, where categorical predictors are needed.
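The points about levels, integer storage, and ordering can be checked directly. This sketch uses a hypothetical sizes vector; typeof() reports the underlying storage type, and the comparison at the end works only because the second factor is ordered.

```r
# Nominal factor with an explicit level order (hypothetical data)
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))
levels(sizes)      # "small" "medium" "large"
typeof(sizes)      # "integer": the values are stored as integer codes
as.integer(sizes)  # 1 3 1 2: each code is a position in levels(sizes)

# Ordinal factor: same data, but the levels now have an order
sizes_ord <- factor(c("small", "large", "small", "medium"),
                    levels = c("small", "medium", "large"), ordered = TRUE)
sizes_ord[1] < sizes_ord[2]   # TRUE: "small" comes before "large"
```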
Syntax: To create a factor, use the factor() function.
factor_name <- factor(vector, levels = c("level1", "level2", ...), ordered = TRUE/FALSE)
Examples of Factor Creation
1. Creating a Simple Factor (Nominal):
Example: colors <- factor(c("red", "blue", "green", "blue", "red"))
Output:
[1] red blue green blue red
Levels: blue green red
2. Creating an Ordered Factor (Ordinal):
Example: temperature <- factor(c("cold", "warm", "hot", "cold"), levels = c("cold", "warm", "hot"),
ordered = TRUE)
Output:
[1] cold warm hot cold
Levels: cold < warm < hot
3. Specifying Levels:
Example: fruits <- factor(c("apple", "banana", "cherry"), levels = c("banana", "apple", "cherry"))
Output:
[1] apple banana cherry
Levels: banana apple cherry
Operations on Factors
1. Creating Factors: Factors can be created using the factor() function by passing a vector of values and optionally
specifying the levels and order.
Syntax: factor_name <- factor(vector, levels = c("level1", "level2", ...), ordered = TRUE/FALSE)
Example: genders <- factor(c("male", "female", "male"), levels = c("female", "male"))
2. Appending Values to a Factor: Factors have fixed levels, so appending new values to a factor is not
straightforward. To add new values, you should first convert the factor to a character vector or specify the levels
beforehand.
Syntax:
# Convert factor to character, append, and create a new factor
new_factor <- factor(c(as.character(factor_name), "new_value"))
Example:
fruits <- factor(c("apple", "banana"))
new_fruits <- factor(c(as.character(fruits), "cherry"))
3. Modifying Factor Levels: Modify the levels of a factor using the levels() function.
Syntax: levels(factor_name) <- c("new_level1", "new_level2", ...)
Example:
genders <- factor(c("male", "female"))
levels(genders) <- c("woman", "man") # levels are "female", "male" (alphabetical), so female becomes woman and male becomes man
4. Viewing Factors: View the levels of a factor and its internal representation (as integers).
Syntax:
levels(factor_name) # View the levels of the factor
as.integer(factor_name) # View the internal integer representation
Example:
genders <- factor(c("male", "female", "male"))
levels(genders) # Output: "female", "male"
as.integer(genders) # Output: 2 1 2
5. Renaming Levels of a Factor: To rename the levels of a factor, use the levels() function to assign new names to
existing levels.
Syntax: levels(factor_name) <- c("new_level1", "new_level2", ...)
Example:
fruits <- factor(c("apple", "banana", "apple"))
levels(fruits) <- c("A", "B")
6. Deleting Levels of a Factor: If a factor has unused levels, you can remove them using the droplevels() function.
Syntax: factor_name <- droplevels(factor_name)
Example:
fruits <- factor(c("apple", "banana", "apple"), levels = c("apple", "banana", "cherry"))
fruits <- droplevels(fruits) # Removes "cherry" from the levels
7. Adding New Levels to a Factor: To add new levels, you can use the levels() function.
Syntax: levels(factor_name) <- c(levels(factor_name), "new_level")
Example:
fruits <- factor(c("apple", "banana"))
levels(fruits) <- c(levels(fruits), "cherry")
8. Ordering Levels in a Factor: To create an ordered factor (ordinal factor), set the ordered argument to TRUE
and specify the level order.
Syntax: factor_name <- factor(vector, levels = c("low", "medium", "high"), ordered = TRUE)
Example:
priority <- factor(c("low", "medium", "high"), levels = c("low", "medium", "high"), ordered = TRUE)
9. Checking if a Variable is a Factor: Check whether a variable is a factor using the is.factor() function.
Syntax: is.factor(variable)
Example: is.factor(fruits) # Returns TRUE if it's a factor
10. Converting a Factor to Character or Numeric: Use as.character() or as.numeric().
Syntax:
as.character(factor_name) # Convert to character
as.numeric(factor_name) # Convert to numeric (using the underlying integer codes)
Example:
fruits <- factor(c("apple", "banana"))
as.character(fruits) # Returns c("apple", "banana")
as.numeric(fruits) # Returns 1 2 (because "apple" is the 1st level, "banana" the 2nd)
Factor Functions
1. Check Levels of a Factor: levels(factor_name)
2. Summary of a Factor: summary(factor_name)
3. Convert Factor to Numeric (Using Actual Values): To convert factors to their actual numeric values
(instead of their underlying integer codes), first convert them to character and then to numeric.
as.numeric(as.character(factor_name))
4. Length of a Factor: length(factor_name)
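A small sketch of the factor functions above, using a hypothetical factor of numbers stored as text. It shows why the as.character() step matters: as.numeric() on its own returns the level codes, not the values.

```r
codes <- factor(c("10", "20", "10", "30"))
levels(codes)                     # "10" "20" "30"
summary(codes)                    # count per level: 10 -> 2, 20 -> 1, 30 -> 1
length(codes)                     # 4
as.numeric(codes)                 # 1 2 1 3: the level codes, not the values
as.numeric(as.character(codes))   # 10 20 10 30: the actual values
```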
Data Frames in R Programming
Definition: A data frame in R is a two-dimensional, table-like data structure that can hold different data types in
different columns (e.g., numeric, character, factor, etc.). Data frames are one of the most common ways to store
data in R, especially when handling datasets where different variables have different types.
Key Points:
1. Tabular Structure: Data frames are structured in a tabular format with rows and columns.
2. Heterogeneous Data: Unlike matrices (which require all elements to be of the same type), data frames
allow different columns to hold different types of data.
3. Row and Column Indexing: Data frames are indexed by both row and column, and indexing starts at 1 in
R.
4. Column Names: Columns can be given names, making it easy to reference them by name rather than by
index.
5. Row Names: Row names can also be set, although they are less commonly used in modern data analysis.
6. Flexible Data Storage: Data frames are flexible in that you can append new rows or columns, rename,
modify, and even delete existing data.
Syntax: To create a data frame, use the data.frame() function.
data_frame_name <- [Link](column1 = c(data), column2 = c(data), ...)
Examples of Data Frame Creation
1. Creating a Simple Data Frame:
Example:
df <- data.frame(Name = c("John", "Sarah", "Mike"), Age = c(23, 25, 22), Gender = c("M", "F", "M"))
print(df)
Output:
   Name Age Gender
1  John  23      M
2 Sarah  25      F
3  Mike  22      M
2. Creating a Data Frame with Factors:
Example: df <- data.frame(Name = c("John", "Sarah", "Mike"), Age = c(23, 25, 22), Gender = factor(c("M", "F", "M")))
3. Creating an Empty Data Frame:
Example: df <- data.frame()
Operations on Data Frames
1. Creating Data Frames: Data frames are created using the [Link]() function by specifying column
names and their respective values.
Syntax: df <- data.frame(column1 = c(value1, value2), column2 = c(value1, value2))
Example: student_data <- data.frame(Name = c("John", "Sara"), Score = c(85, 90))
2. Viewing Data Frames: View the contents of a data frame using the following functions.
Print the Entire Data Frame: print(df)
View the Structure: The str() function gives a compact display of the structure of the data frame.
str(df)
View Summary: Use the summary() function to see descriptive statistics of numeric columns and
the counts of each level in factor columns.
summary(df)
Viewing Top/Bottom Rows:
o head(df) displays the first few rows.
o tail(df) displays the last few rows.
3. Accessing Data from Data Frames:
By Column Name:
Syntax: df$column_name
Example: student_data$Name # Outputs the "Name" column
By Index: Access specific rows and columns using the row and column index
Syntax: df[row_index, column_index]
Example:
student_data[1, 2] # Accesses the element in the 1st row, 2nd column
student_data[1, ] # Accesses the entire 1st row
student_data[, 2] # Accesses the entire 2nd column
Subset Columns/Rows: You can subset specific columns or rows using the subset() function.
Syntax: subset(df, select = c(column1, column2))
Example: subset(student_data, select = c(Name, Score))
4. Modifying Data in Data Frames:
Modify Specific Elements: Modify specific elements of a data frame by indexing.
Syntax: df[row, column] <- new_value
Example: student_data[1, 2] <- 95 # Change the 1st student's score
Modify Entire Columns:
Syntax: df$column_name <- new_data
Example: student_data$Score <- c(88, 92) # Replace the entire "Score" column (one value per row)
5. Adding New Columns:
Adding a New Column Directly: Append a new column by assigning a new vector to a new
column name.
Syntax: df$new_column <- c(data)
Example: student_data$Grade <- c("A", "A") # One value for each existing row
Using cbind() to Add a Column: df <- cbind(df, new_column_name = c(data))
6. Adding New Rows: Use rbind() to add a row. The new row must be a data frame with the same column names as the existing one.
Syntax:
new_row <- data.frame(Name = "David", Score = 88, Grade = "B")
df <- rbind(df, new_row)
7. Renaming Columns: Rename the columns of a data frame by assigning new names with the names() function.
8. Deleting Columns/Rows:
Removing Columns: remove a column by setting it to NULL.
Syntax: df$column_name <- NULL
Example: student_data$Grade <- NULL # Removes the "Grade" column
Removing Rows: remove rows by using negative indices.
Syntax: df <- df[-row_index, ]
Example: student_data <- student_data[-2, ] # Removes the 2nd row
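The operations above can be chained on a single data frame. This is a minimal sketch: student_data follows the example in these notes, while the added values (the grades, the row for "David", and the new column name "Marks") are illustrative assumptions.

```r
student_data <- data.frame(Name = c("John", "Sara"), Score = c(85, 90))

# Viewing
str(student_data)                                # 2 obs. of 2 variables
head(student_data, 1)                            # first row only

# Accessing
student_data$Name                                # the Name column
student_data[1, 2]                               # 85: row 1, column 2 (Score)
subset(student_data, Score > 85, select = Name)  # Name of rows with Score above 85

# Modifying one element
student_data[1, 2] <- 95                         # John's score becomes 95

# Adding a column: supply exactly one value per existing row
student_data$Grade <- c("A", "A")

# Adding a row: a one-row data frame with the same column names
new_row <- data.frame(Name = "David", Score = 88, Grade = "B")
student_data <- rbind(student_data, new_row)

# Renaming the 2nd column
names(student_data)[2] <- "Marks"

# Deleting a column, then a row
student_data$Grade <- NULL                       # drop the Grade column
student_data <- student_data[-2, ]               # drop the 2nd row (Sara)
student_data                                     # John (95) and David (88) remain
```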
Data Frame Functions
dim(df): Returns the dimensions of the data frame (number of rows and columns).
nrow(df): Returns the number of rows.
ncol(df): Returns the number of columns.
colnames(df): Returns or sets the column names of the data frame.
rownames(df): Returns or sets the row names of the data frame.
merge(df1, df2): Combines two data frames based on common columns or row names.
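A sketch of these functions on two small hypothetical tables, scores and info, that share an ID column. By default merge() joins on all columns with the same name and keeps only the rows whose key appears in both tables.

```r
scores <- data.frame(ID = c(1, 2, 3), Score = c(85, 90, 78))
info   <- data.frame(ID = c(1, 2, 4), Name = c("John", "Sara", "Mike"))

dim(scores)        # 3 2
nrow(scores)       # 3
ncol(scores)       # 2
colnames(scores)   # "ID" "Score"
rownames(scores)   # "1" "2" "3"

merged <- merge(scores, info)   # joins on the shared column ID
merged             # two rows: ID 1 (85, John) and ID 2 (90, Sara)
```

IDs 3 and 4 appear in only one table each, so they are dropped from the result.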
Difference Between Data Frames and Other Data Structures
Feature | Data Frame | Matrix | List | Vector
Dimensionality | 2D (rows and columns) | 2D (rows and columns) | 1D, can hold heterogeneous data | 1D
Data Types | Heterogeneous (different columns can hold different data types) | Homogeneous (all elements must be the same type) | Heterogeneous (can hold different types of data) | Homogeneous (all elements must be the same type)
Indexing | Rows and columns | Rows and columns | By position (or by name) | By position (or by name)
Column Names | Yes | Optional, set with colnames() | Not applicable (elements can be named with names()) | Not applicable (elements can be named with names())
Row Names | Yes | Optional, set with rownames() | No | No
Modifiability | Flexible | Less flexible, restricted by homogeneity | Very flexible | Less flexible, cannot hold different data types
Special values in R
NA, NaN, Inf, -Inf and NULL are special values in R.
a) NA: NA stands for ‘not available’ and is used to represent missing values. If you expand the size of a
vector (or matrix or array) beyond the size where values were defined, the new positions will have the value
NA. The is.na() function is used to check whether a value is NA.
is.na(x) returns a logical vector of the same size as x, with the value TRUE if and only if the
corresponding element in x is NA (or NaN). If a numeric vector contains NA values, statistical operations
such as sum() and mean() return NA. The na.rm = TRUE argument removes the NA values before the
calculation is performed, as shown in the example below.
# create vector x with two missing values denoted by NA
x <- c(5, 6, 7, NA, 10, 11, NA)
print(x)
5 6 7 NA 10 11 NA
# find the sum of the elements of x
sum(x) # shows NA, since the vector x has missing values
NA
# find the sum after removing NA using na.rm = TRUE
sum(x, na.rm = TRUE)
39 # displays sum = 39
# find the mean of x; shows NA, since x has missing values
mean(x)
NA
# find the mean after removing NA using na.rm = TRUE
mean(x, na.rm = TRUE)
7.8 # displays mean = 7.8
# create vector
x <- c(1, NA, 12, NA, 50, 30)
# remove NA values from x and copy to x_new
x_new <- x[!is.na(x)]
x_new # print output after removing NA
1 12 50 30
# replace NA by some constant numeric value
x[is.na(x)] <- 0
x # display x with NA replaced by 0
1 0 12 0 50 30
b) NaN stands for Not a Number and applies to numeric values, as well as to the real and imaginary parts of
complex values, but not to the values of an integer vector. NaN usually appears when you divide 0 by 0, take
the logarithm of a negative number, etc.
x <- 0
y <- 0
print(x / y) # displays NaN, indicating Not a Number
x <- -5
print(log(x)) # displays NaN (with a warning), indicating Not a Number
# create vector with NaN values
x <-c(21, 12, NaN, NaN, 19, NaN, -25)
print(x) # displays 21 12 NaN NaN 19 NaN -25
# count the number of NaN values in x using sum(is.nan(x))
sum(is.nan(x)) # prints 3, the total number of NaN values in x
# display the positions of NaN values using which(is.nan(x))
print(which(is.nan(x))) # prints 3 4 6, the positions of NaN in x
# display the vector after removing NaN
x_new <- x[!is.nan(x)]
x_new # displays 21 12 19 -25
# replace NaN values with some value
x[is.nan(x)] <- 5 # replace NaN with 5
x # prints 21 12 5 5 19 5 -25
c) Inf and -Inf stand for positive and negative infinity. They result from a calculation whose value is too large
to store or from dividing a nonzero number by zero. An infinite value is neither missing (NA) nor NaN.
x <- 5
y <- 0
print(x / y) # displays Inf, indicating infinity
x <- -5
y <- 0
print(x / y) # displays -Inf, indicating negative infinity
print(x^2000) # displays Inf, since (-5)^2000 is too large to store
d) NULL: NULL is used to indicate that an object is absent.
Example: Calling c() with no arguments creates an empty object, which is NULL.
x <- c() # create an empty vector
print(x) # displays NULL
NULL
# check whether x is NULL using the is.null() function
is.null(x)
TRUE # output is TRUE
# create a vector with three elements and check whether it is NULL using is.null()
x <- c(4, 3, 7)
is.null(x)
FALSE # output is FALSE
print(x)
4 3 7
Example: Consider a list L with two vectors X and Y. Asking for an element beyond the end of the list, such as the 3rd element, returns NULL.
X <- c(4, 5, 6) # create numeric vector
Y <- c('a', 'b', 'c') # create character vector
L <- list(X, Y) # create list with X and Y as elements
length(L) # displays 2, the number of elements in L
L[3] # request the 3rd element
[[1]]
NULL # output is NULL, since L doesn't have a 3rd element
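The four special values behave differently under R's test functions and when combined into vectors. This sketch uses illustrative values only; note that is.na() also returns TRUE for NaN, while is.nan() is TRUE only for NaN.

```r
x <- c(5, NA, 0 / 0, 5 / 0, -5 / 0)   # a number, NA, NaN, Inf, -Inf
is.na(x)        # FALSE  TRUE  TRUE FALSE FALSE: TRUE for NA and NaN
is.nan(x)       # FALSE FALSE  TRUE FALSE FALSE: TRUE only for NaN
is.infinite(x)  # FALSE FALSE FALSE  TRUE  TRUE: Inf and -Inf
is.finite(x)    #  TRUE FALSE FALSE FALSE FALSE: only the ordinary number

length(NA)      # 1: NA occupies a position in a vector
length(NULL)    # 0: NULL is the absence of an object
c(1, NA, 2)     # 1 NA 2: NA is kept
c(1, NULL, 2)   # 1 2: NULL disappears when combined
```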
Coercion: Converting an object from one type (class) to another is known as coercion. The common coercion functions are:
1. as.logical(x): Converts the value to logical type. Zero becomes FALSE and any nonzero value becomes TRUE.
x <- c(5, 1, 0, 8)
as.logical(x) # output TRUE TRUE FALSE TRUE
2. as.character(x): Converts the object to character type.
x <- c(5, 1, 8) # numeric vector
as.character(x) # output "5" "1" "8"
x <- TRUE # logical
as.character(x) # output "TRUE"
3. as.integer(x): Converts the object to integer type.
4. as.double(x): Converts the object to double precision type.
x <- c(5, 1, 8)
as.double(x) # output 5 1 8
5. as.complex(x): Converts the object to complex type.
x <- c(5, 1, 8)
as.complex(x) # output 5+0i 1+0i 8+0i
6. as.list(x): Converts the object, for example a vector, to a list with one list element per vector element.
x <- c(88, 99)
as.list(x)
# output
[[1]]
[1] 88
[[2]]
[1] 99
7. as.numeric(x): Converts character or logical values to numeric type.
x <- FALSE
as.numeric(x) # output is 0; 0 represents FALSE
x <- TRUE
as.numeric(x) # output is 1; 1 represents TRUE
x <- "543"
as.numeric(x) # output 543
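A quick check of the coercion functions, including two cases the list above does not show: as.integer() drops the fractional part, and as.numeric() returns NA for text that is not a number. The input values are illustrative.

```r
as.logical(c(5, 1, 0, 8))        # TRUE TRUE FALSE TRUE
as.character(c(5, 1, 8))         # "5" "1" "8"
as.integer(c(5.9, -2.3))         # 5 -2: the fractional part is dropped
as.double(c(5L, 1L, 8L))         # 5 1 8, now stored as double
as.complex(c(5, 1, 8))           # 5+0i 1+0i 8+0i
as.list(c(88, 99))               # a list with two elements, 88 and 99
as.numeric(c(FALSE, TRUE))       # 0 1
as.numeric("543")                # 543
# "abc" is not a number, so the result is NA (R normally also warns)
suppressWarnings(as.numeric("abc"))   # NA
```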
Matrix to vector: use as.vector(matrix_name).
Vectors to matrix: can be done in two ways:
(i) rbind(vector1, vector2, vector3, ..., vectorN) # each vector becomes a row
(ii) cbind(vector1, vector2, vector3, ..., vectorN) # each vector becomes a column
Note: Similarly, a matrix can be converted to a data frame with as.data.frame(), and a data frame to a matrix with as.matrix().
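A sketch of the structure conversions described above, using hypothetical vectors v1 and v2 and a 2 x 3 matrix m. When the matrix has no column names, as.data.frame() names the columns V1, V2, and so on.

```r
m <- matrix(1:6, nrow = 2)   # 2 x 3, filled column by column
as.vector(m)                 # 1 2 3 4 5 6: matrix to vector in column-major order

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
rbind(v1, v2)                # 2 x 3 matrix: each vector becomes a row
cbind(v1, v2)                # 3 x 2 matrix: each vector becomes a column

df <- as.data.frame(m)       # matrix to data frame with columns V1, V2, V3
names(df)                    # "V1" "V2" "V3"
as.matrix(df)                # data frame back to a 2 x 3 matrix
```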
Sample questions:
1. What is R programming? Who developed it?
2. List the differences between a vector and a list.
3. Mention two applications of R programming.
4. Explain NA and NaN with examples.
5. Explain different types of operators in R.
6. What is a data frame? Explain any two operations on data frame.
7. Explain different matrix operations functions in R.
8. Explain different data structures in R in detail.
9. Explain features of R programming.
10. Explain special values in R.
Topics covered
1) Introduction, Comments, tokens (keywords, identifiers, operators, data types)
2) R data structure
3) Special values
4) Coercion