
R Programming: Features and Applications

The document provides an introduction to R programming, detailing its development, features, and applications in various fields such as data analysis, machine learning, bioinformatics, and finance. It highlights the advantages and disadvantages of R, compares it with Python, and discusses the structure of R code, including comments, identifiers, and operators. Additionally, it outlines best practices for naming conventions and provides examples of valid and invalid identifiers.

Uploaded by

hegdep333

Unit-1 Statistical Computing & R Programming

Introduction to R Programming

1. Introduction:

Ross Ihaka and Robert Gentleman of the University of Auckland, New Zealand, developed the open-source
programming language R.

R is platform independent; it runs on Windows, Linux, and macOS, among other platforms. R is a
programming language used mainly for organizing data, statistical computing, data analysis, and
visualization. R is an interpreted language and supports both procedural and object-oriented
programming. R provides various machine learning operations such as clustering, association rule mining,
classification, and regression. R has over 10,000 packages in the CRAN repository, and the number is constantly
increasing. The language was named R partly after the first letter of the first names of its two
authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of the Bell Labs language S.

One option for downloading and installing R is its official website [Link]. After downloading
and installing R, we can run it from the command prompt or from any IDE (integrated development environment).

Why Use R Language? (Features of R Programming Language)

The R Language is a powerful tool widely used for data analysis, statistical computing, and machine learning.
Here are several reasons why professionals across various fields prefer R:
1. Comprehensive Statistical Analysis:
R provides a wide array of statistical techniques, including linear and nonlinear modeling, classical
statistical tests, time-series analysis, classification, and clustering.
2. Advanced Data Visualization:
With packages like ggplot2, plotly, and lattice, R excels at creating complex and aesthetically pleasing data
visualizations, including plots, graphs, and charts.
3. Extensive Packages and Libraries:
The Comprehensive R Archive Network (CRAN) hosts thousands of packages that extend R’s capabilities in areas
such as machine learning, data manipulation, bioinformatics, and more.

SESHADRIPURAM COLLEGE TUMKUR Page 1



4. Open Source and Free:


R is free to download and use, making it accessible to everyone. Its open-source nature encourages
community contributions and continuous improvement.
5. Platform Independence:
R is platform-independent, running on various operating systems, including Windows, macOS, and Linux,
which ensures flexibility and ease of use across different environments.
6. Integration with Other Languages:
R language can integrate with other programming languages such as C, C++, Python, Java, and SQL, allowing
for seamless interaction with various data sources and computational processes.
7. Powerful Data Handling and Storage:
R efficiently handles and stores data, supporting various data types and structures, including vectors,
matrices, data frames, and lists.
8. Robust Community and Support:
R has a vibrant and active community that provides extensive support through forums, mailing lists, and
online resources, contributing to its rich ecosystem of packages and documentation.
9. Interactive Development Environment (IDE):
RStudio, the most popular IDE for R, offers a user-friendly interface with features like syntax highlighting,
code completion, and integrated tools for plotting, history, and debugging.
10. Reproducible Research:
R supports reproducible research practices with tools like R Markdown and Knitr, enabling users to create
dynamic reports, presentations, and documents that combine code, text, and visualizations.
11. Strong Data Visualization Capabilities:
R language excels in data visualization, offering powerful tools like ggplot2 and plotly, which enable the
creation of detailed and aesthetically pleasing graphs and plots.
12. Growing Community and Support:
R language has a large and active community of users and developers who contribute to its continuous
improvement and provide extensive support through forums, mailing lists, and online resources.
13. High Demand in Data Science:
R is one of the most requested programming languages in the Data Science job market, making it a valuable
skill for professionals looking to advance their careers in this field.
Advantages of R language
• R is one of the most comprehensive statistical analysis packages; new techniques and concepts often appear
first in R.
• R is open source, so you can run it anywhere and at any time.
• R runs on GNU/Linux and Windows operating systems.

• R is cross-platform and runs on all major operating systems.
• In R, everyone is welcome to provide new packages, bug fixes, and code enhancements.
Disadvantages of R language
• The quality of some R packages is less than perfect.
• R commands give little attention to memory management, so R may consume all available memory.
• There is no dedicated support channel, so there is nobody to complain to if something does not work.
• R is often slower than other programming languages such as Python and MATLAB.
Applications of R language

1. Data Analysis and Visualization

• Statistical Analysis: R is used for performing descriptive statistics, hypothesis testing, regression analysis,
and predictive modeling. Researchers and data analysts rely on R to analyze and interpret complex
datasets.
• Data Visualization: With packages like ggplot2, plotly, and lattice, R can create detailed and customized
plots, graphs, and charts. It is extensively used in exploratory data analysis (EDA) to reveal patterns,
trends, and insights in data.

2. Machine Learning

• Supervised Learning: R provides multiple packages like caret, randomForest, and e1071 for building
models such as decision trees, random forests, linear regression, and support vector machines (SVM).
• Unsupervised Learning: Techniques like clustering (e.g., k-means, hierarchical clustering) and
dimensionality reduction (e.g., PCA) are easily implemented in R.
• Model Evaluation: R offers tools for evaluating machine learning models by computing metrics like
accuracy, precision, recall, and F1-score, as well as generating confusion matrices and ROC curves.

3. Bioinformatics

• R is heavily used in the field of bioinformatics for analyzing genomic and proteomic data. It is applied in
analyzing DNA sequences, protein structures, gene expression, and more.

• The Bioconductor project is a popular source of bioinformatics packages, offering tools for analyzing
biological data such as RNA sequencing and microarray data.

4. Finance and Economics

• Risk Analysis and Portfolio Management: R is widely used in quantitative finance for tasks such as
financial modeling, risk assessment, and portfolio optimization.
• Time Series Analysis: Financial analysts use R for analyzing time series data, forecasting stock prices, and
assessing market trends with packages like xts, zoo, and forecast.
• Econometrics: Economists use R for regression analysis, econometric modeling, and simulations to
understand economic trends and predict future outcomes.

5. Healthcare and Clinical Research

• Medical Statistics: R is used in clinical trials for survival analysis, drug efficacy testing, and biostatistics. It
helps researchers analyze patient data and make informed medical decisions.
• Epidemiology: R plays a crucial role in tracking the spread of diseases, predicting outbreaks, and
performing epidemiological modeling.
• Genomics and Proteomics: R is frequently used for genetic data analysis, including next-generation
sequencing, microarray data, and genome-wide association studies (GWAS).

6. Social Media Analysis

• Text Mining: R is applied in text mining and sentiment analysis, allowing companies to extract insights
from social media platforms like Twitter, Facebook, and Instagram using packages like tm and text2vec.
• Natural Language Processing (NLP): With packages like quanteda and tidytext, R enables the analysis of
textual data to identify trends, themes, and public sentiments.

7. E-commerce and Business Analytics

• Customer Segmentation
• Sales Forecasting
• Churn Prediction

8. Geographical Data Analysis

• Spatial Data Analysis: R is equipped with packages like sp, rgeos, and sf that allow users to work with
geographic data, perform spatial analysis, and visualize maps.
• Geostatistics: R can be used for geospatial modeling, analyzing geographical patterns, and creating
interactive maps for location-based services.


R vs. Python

Feature | Python | R
Primary Purpose | General-purpose programming language | Primarily used for statistical analysis
Ease of Learning | Easy to learn with clear syntax | Moderate learning curve, especially for non-statistical users
Libraries | Extensive libraries for machine learning, data science (e.g., NumPy, Pandas, Scikit-learn) | Specialized libraries for statistical computing and visualization (e.g., ggplot2, dplyr)
Data Handling | Excellent for structured data (Pandas) | Exceptional for statistical analysis of large datasets
Visualization | Good visualization (Matplotlib, Seaborn) | Excellent visualization tools (ggplot2, lattice)
Machine Learning Support | Strong support (TensorFlow, PyTorch) | Limited machine learning libraries but strong in statistical modeling
Community Support | Large community, suitable for a wide range of applications | Strong community in statistics, bioinformatics, and data science
Performance | Faster in many general-purpose tasks | Slower for large tasks, but great for statistical operations
Development Environment | Commonly used in Jupyter, PyCharm | Commonly used in RStudio
Object-Oriented Support | Strong object-oriented programming | Less object-oriented focus
Statistical Analysis | Moderate statistical capabilities | Best suited for statistical analysis
Scripting Language | More general-purpose scripting | Focused on data analysis scripting
Flexibility | More flexible for various domains like web development, automation | Specialized for data manipulation, analysis, and visualization
Popularity | Widely used across many domains | More specialized but popular in academia and research
Deployment | Easier to deploy for production environments | Mainly used for research and statistical reporting

Comments, variables, keywords, operators and data types in R

Modes of running R:

1) Interactive mode: A command-line shell that gives immediate feedback for each statement.
2) Script mode: A text file containing R commands that are run together.

Tokens in R: A token is the smallest individual element of a program.


• Keywords
• Variables
• Identifiers
• Operators

Comments: Comments are statements that are completely ignored by the interpreter and are used to improve the
readability of the program. R supports only single-line comments. Any text from “#” to the end of the line is a
comment in R.


Comments are plain English sentences, mostly written in a program to explain what it does or what a
piece of code is supposed to do. Comments are generally used for the following purposes:
• Code readability
• Explanation of the code or metadata of the project
• Preventing execution of code
• Including resources
Types of Comments
Programming languages generally support three types of comments:
Single-Line Comments: Comments that need only one line (# comment statement).
Multi-Line Comments: Comments that span more than one line.
Documentation Comments: Comments that are drafted for quick documentation look-up.
Note: R doesn’t support multi-line and documentation comments. To comment out several lines, start each line with #.
Example: # R programming
Reserved Words (Keywords): Reserved words in R programming have special meaning and cannot be used
as identifiers (variable names, function names, etc.). Sample reserved words in R programming are if, else,
repeat, while, function, for, next, break, TRUE, FALSE, NULL, Inf, NaN, and NA.
Identifiers in R

In R programming, identifiers are names used to identify variables, functions, or objects. R has specific rules and
best practices for naming these identifiers. Here’s an overview of the guidelines for creating identifiers in R:

Rules for Identifiers in R:

1. Allowed Characters:
o Identifiers can include letters (both uppercase and lowercase), numbers, and dots (.) or
underscores (_).
o Example: data_set1, [Link], my_data, value2, score_1
2. First Character:
o Identifiers must begin with a letter or a dot (.); they cannot begin with a number or an underscore (_).
o If a dot is the first character, it cannot be followed by a number.
o Example: .variableName is valid, but .123variable is not valid.
3. Case Sensitivity:
o R is case-sensitive, meaning Variable and variable are considered different identifiers.
o Example: Score and score would refer to different objects.
4. Reserved Keywords:
o Certain words are reserved in R and cannot be used as identifiers. These include control-flow
keywords and special constants. Built-in function names are not reserved, although reusing them is discouraged.
o Examples of reserved words: if, else, repeat, while, function, TRUE, FALSE, NULL, Inf, NA.
5. Length:


o There is no strict limit on the length of an identifier in R, but for readability, it is a best practice to
use concise, descriptive names.
6. Avoid Special Characters:
o Identifiers cannot include special characters such as spaces, commas, hyphens, or mathematical
operators.
o Example: data-set or my data would not be valid.

Best Practices for Identifiers in R:


1. Descriptive Names:
o Use meaningful and descriptive names for your variables and functions. This helps improve the
readability of the code.
o Example: Instead of using x, use total_sales, customer_count, etc.
2. Consistent Naming Conventions:
o Choose a consistent style for naming, such as snake_case or camelCase:
 snake_case: my_variable_name
 camelCase: myVariableName
3. Avoid Starting with a Dot:
o Even though identifiers can start with a dot, it is generally avoided unless creating hidden
objects. Objects that start with a dot are treated as hidden and will not show up in typical R
environment listings (e.g., with ls()).
4. Reserved Names:
o Avoid naming variables after commonly used functions or keywords in R to prevent confusion.
o Example: Avoid names like mean, data, sum, c, or matrix for variables.

Examples of Valid and Invalid Identifiers:

Valid Identifier | Invalid Identifier | Reason the Identifier Is Invalid
dataFrame | 2ndVariable | Cannot start with a number
sales_2024 | first-variable | Hyphens are not allowed
[Link] | user name | Spaces are not allowed
.hiddenVar | .9hiddenVar | Cannot start with a dot followed by a number
customerData | TRUE | TRUE is a reserved word
score_total | else | else is a reserved word
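The rules above can be checked programmatically. The following is a minimal sketch, assuming that a name is syntactically valid exactly when base R's make.names() returns it unchanged; the vector name candidates and the variable is_valid are illustrative and not part of the original notes.

```r
# Candidate names taken from the table above (the [Link] entry is omitted).
candidates <- c("dataFrame", "sales_2024", ".hiddenVar", "customerData",
                "score_total", "2ndVariable", "first-variable",
                "user name", ".9hiddenVar", "TRUE", "else")

# make.names() rewrites a string into a syntactically valid name,
# so a name that comes back unchanged was already valid.
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```

The first five names come back as valid and the remaining six as invalid, matching the table.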

Operators: R supports 4 types of operators


1. Arithmetic Operators

All examples use a <- 5 and b <- 12.

Operator | Description | Example | Output
+ | Addition | a + b | 17
- | Subtraction | a - b | -7
* | Multiplication | a * b | 60
/ | Division | b / a | 2.4
^ | Exponent | b ^ a | 248832
%% | Modulus (remainder from division) | b %% a | 2
%/% | Integer division (integer quotient) | b %/% a | 2
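As a quick check of the arithmetic operators, the sketch below runs each one with the same values, a <- 5 and b <- 12. The last line is an added illustration (not from the original notes) of how integer division and modulus fit together.

```r
a <- 5
b <- 12

a + b     # 17
a - b     # -7
a * b     # 60
b / a     # 2.4
b ^ a     # 248832
b %% a    # 2  (remainder)
b %/% a   # 2  (integer quotient)

# Quotient and remainder together rebuild the original value.
rebuilt <- (b %/% a) * a + (b %% a)   # 12
```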


2. Relational Operators

All examples use a <- 5 and b <- 12.

Operator | Description | Example | Output
< | Less than | a < b | TRUE
> | Greater than | a > b | FALSE
<= | Less than or equal to | a <= b | TRUE
>= | Greater than or equal to | a >= b | FALSE
== | Equal to | a == b | FALSE
!= | Not equal to | a != b | TRUE

3. Logical Operators
The outcome of the logical operators is TRUE or FALSE. Zero is considered FALSE and non-zero numbers are
taken as TRUE. The logical operators || and && work on single values: in R versions before 4.3.0 they use only
the first element of each vector, and from R 4.3.0 onward a vector with more than one element is an error.
The element-wise logical operators | and & return results by comparing each element of the first vector with
the corresponding element of the second vector.

Note: To create a vector with more than one element, use the c() function, which combines its
arguments into a vector.

Operator, description, examples, and output:

! (Logical NOT)
    a <- TRUE
    print(!a)                      # Output: FALSE
    a <- c(0,15,TRUE)
    print(!a)                      # Output: TRUE FALSE FALSE

|| (Logical OR): Takes the first element of both vectors and gives TRUE if one of them is TRUE.
    a <- TRUE
    b <- FALSE
    print(a||b)                    # Output: TRUE
    a <- c(0,5,TRUE)
    b <- c(0,3,FALSE)
    print(a||b)                    # Output: FALSE (R before 4.3.0; an error from R 4.3.0 onward)
    a <- c(5,0,TRUE)
    b <- c(0,3,TRUE)
    print(a||b)                    # Output: TRUE (R before 4.3.0; an error from R 4.3.0 onward)

&& (Logical AND): Returns TRUE if the first elements of both operands are TRUE.
    a <- TRUE
    b <- TRUE
    print(a&&b)                    # Output: TRUE
    a <- c(5,0,TRUE)
    b <- c(0,3,TRUE)
    print(a&&b)                    # Output: FALSE (R before 4.3.0; an error from R 4.3.0 onward)
    a <- c(5,0,TRUE)
    b <- c(1,3,TRUE)
    print(a&&b)                    # Output: TRUE (R before 4.3.0; an error from R 4.3.0 onward)

& (Element-wise logical AND): Combines each element of the first vector with the corresponding element of
the second vector and gives FALSE if either element is FALSE.
    a <- c(5,0,TRUE,TRUE)
    b <- c(1,3,TRUE,FALSE)
    print(a&b)                     # Output: TRUE FALSE TRUE FALSE
    a <- c(5,0,TRUE,TRUE)
    b <- c(0,0,TRUE,FALSE)
    print(a&b)                     # Output: FALSE FALSE TRUE FALSE

| (Element-wise logical OR): Combines each element of the first vector with the corresponding element of
the second vector and gives TRUE if either element is TRUE.
    a <- c(5,0,TRUE,TRUE)
    b <- c(1,3,TRUE,FALSE)
    print(a|b)                     # Output: TRUE TRUE TRUE TRUE
    a <- c(5,0,TRUE,TRUE)
    b <- c(0,0,TRUE,FALSE)
    print(a|b)                     # Output: TRUE FALSE TRUE TRUE
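The sketch below shows a way of using the logical operators that should behave the same in old and new R versions: element-wise & and | for vectors, && only for single values, and any() or all() to reduce a vector to one logical value. The variable names (x, in_range, and so on) are illustrative assumptions.

```r
a <- c(5, 0, TRUE, TRUE)
b <- c(0, 0, TRUE, FALSE)

and_result <- a & b    # FALSE FALSE TRUE FALSE
or_result  <- a | b    # TRUE FALSE TRUE TRUE
not_result <- !a       # FALSE TRUE FALSE FALSE

# && and || are intended for single values, for example in an if condition.
x <- 7
in_range <- (x > 0) && (x < 10)   # TRUE

# any() and all() collapse a logical vector to a single TRUE or FALSE.
any_both   <- any(a & b)   # TRUE
all_either <- all(a | b)   # FALSE
```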

4. Assignment Operators

Variables: Variables are containers for storing data values. R has no command for declaring
variables; a variable is created when a value is assigned to it. In R, assignment can be done in three ways.

= (Simple Assignment)

<- (Leftward Assignment) Note: <- is the preferred assignment operator in R

-> (Rightward Assignment)

Example:

Sum = 0 # creates a variable Sum and assigns 0 to it
Sum <- 0 # creates a variable Sum and assigns 0 to it
0 -> Sum # creates a variable Sum and assigns 0 to it
Result <- "Pass" # assigns a string value to the variable Result
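A short sketch of the three assignment forms used together follows; the variable names total, bonus, and result are illustrative. Note that strings must be written with straight quotes (" or '), since typographic quotes copied from a document cause a syntax error.

```r
total = 0             # simple assignment creates the variable
total <- total + 5    # leftward assignment (the preferred style)
10 -> bonus           # rightward assignment
result <- "Pass"      # character value in straight double quotes

print(total + bonus)  # 15
```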

R Basic Data Types: R has 5 commonly used basic data types, listed below with examples of each.

R Basic Data Type | Examples | Remarks
numeric | 16.55, 11 | Set of all real numbers
integer | 5L, 543L | Set of all integers; the L suffix declares the value as an integer
complex | 10+4i | Set of complex numbers; i marks the imaginary part
character | "R", 'Plots', " R programming" | Any letter, number, or special character enclosed in quotes
logical | TRUE or FALSE | TRUE and FALSE values

Note: Use the class() and typeof() functions to check the class and data type of a variable.
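To see the difference between class() and typeof(), the sketch below applies both to one value of each basic type. The list name values is an assumption made for illustration.

```r
# One example value for each of the five basic types.
values <- list(16.55, 5L, 10+4i, "R", TRUE)

classes <- sapply(values, class)    # "numeric" "integer" "complex" "character" "logical"
types   <- sapply(values, typeof)   # "double"  "integer" "complex" "character" "logical"

print(data.frame(class = classes, typeof = types))
```

The only difference here is for numeric values: class() reports "numeric" while typeof() reports the internal storage type "double".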

R Data Structures


The R data structures include:

• Vectors
• Lists
• Arrays
• Matrices
• Factors
• Data Frames

Vectors:

Definition: A vector in R is a basic data structure that contains elements of the same data type, such as numeric,
character, or logical. It is one of the simplest and most commonly used structures in R.

Key Points:

1. Homogeneous Elements: A vector can only store elements of the same type (e.g., all elements must be
numeric, character, or logical).
2. Indexing: Vector elements are indexed from 1 in R (unlike languages like Python that use 0-based
indexing).

Syntax:

To create a vector, use the c() function:

vector_name <- c(element1, element2, element3, ...)

Examples of Vector Creation

• Numeric Vector: numeric_vector <- c(1, 2, 3, 4, 5)
• Character Vector: char_vector <- c("apple", "banana", "cherry")
• Logical Vector: logical_vector <- c(TRUE, FALSE, TRUE)
• Sequence Vectors: Using the : operator or the seq() function
o sequence_vector <- 1:10 # Generates a sequence from 1 to 10
o sequence_vector2 <- seq(1, 20, by = 2) # Generates 1, 3, 5, ..., 19 (step of 2, not exceeding 20)

Operations on Vectors

1. Appending Elements to a Vector: Add elements to an existing vector using the c() function.
Syntax: new_vector <- c(existing_vector, new_element)
Example:
vec <- c(1, 2, 3)
vec <- c(vec, 4) # Appends 4 to the vector

2. Modifying Elements in a Vector: Modify specific elements by referencing their position using the index.
Syntax: vector_name[index] <- new_value
Example:
vec <- c(10, 20, 30)
vec[2] <- 25 # Changes the second element to 25


3. Viewing Elements in a Vector: Access individual elements or subsets of a vector using their indices.
Syntax: vector_name[index]
Example:
vec <- c(10, 20, 30, 40)
vec[2] # Accesses the second element (20)
vec[1:3] # Accesses the first three elements

4. Naming Elements in a Vector: Assign names to the elements of a vector using the names() function.
Syntax: names(vector_name) <- c("name1", "name2", "name3", ...)
Example:
vec <- c(1, 2, 3)
names(vec) <- c("First", "Second", "Third")
vec["Second"] # Accesses the element named "Second"

5. Deleting Elements from a Vector: Remove elements by creating a new vector without the elements you
wish to delete, using a negative index. (Assigning NULL to a vector element does not work; that applies to lists.)
Syntax: vector_name <- vector_name[-index]
Example:
vec <- c(10, 20, 30, 40)
vec <- vec[-2] # Removes the second element (20)

6. Combining Vectors: Combine multiple vectors using the c() function.


Syntax: combined_vector <- c(vector1, vector2)
Example:
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
combined_vec <- c(vec1, vec2)

7. Arithmetic Operations on Vectors: R allows element-wise arithmetic operations on vectors.


Syntax:
vector1 + vector2 # Addition
vector1 - vector2 # Subtraction
vector1 * vector2 # Multiplication
vector1 / vector2 # Division
Example:
vec1 <- c(1, 2, 3)
vec2 <- c(4, 5, 6)
result <- vec1 + vec2 # Element-wise addition
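The vector operations above can be chained on a single vector. This is a minimal sketch; the element names "a" to "d" and the variable doubled are illustrative.

```r
vec <- c(10, 20, 30)
vec <- c(vec, 40)                     # append:  10 20 30 40
vec[2] <- 25                          # modify:  10 25 30 40
names(vec) <- c("a", "b", "c", "d")   # name the elements
third <- vec["c"]                     # view by name: 30
vec <- vec[-1]                        # delete the first element: 25 30 40
doubled <- vec * 2                    # element-wise arithmetic: 50 60 80

print(vec)
print(doubled)
```

Names travel with the elements, so after the deletion the remaining elements are still named "b", "c", and "d".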

Vector Functions

• Length of a Vector: length(vector_name)
• Sort a Vector: sort(vector_name)
• Find the Maximum and Minimum:
max(vector_name)
min(vector_name)
• Sum of Vector Elements: sum(vector_name)
• Mean and Median of Vector Elements:
mean(vector_name)
median(vector_name)
• Check for Missing Values (NA): is.na(vector_name) # Checks for NA values
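Missing values change how the summary functions behave, so a small worked example may help. The sketch below uses a hypothetical vector scores containing one NA; the na.rm = TRUE argument is a standard base R option that tells the function to skip missing values.

```r
scores <- c(72, 85, NA, 90, 64)

n         <- length(scores)                 # 5 (NA still counts as an element)
missing   <- is.na(scores)                  # FALSE FALSE TRUE FALSE FALSE
raw_sum   <- sum(scores)                    # NA, because one value is missing
clean_sum <- sum(scores, na.rm = TRUE)      # 311
avg       <- mean(scores, na.rm = TRUE)     # 77.75
mid       <- median(scores, na.rm = TRUE)   # 78.5
ordered   <- sort(scores)                   # 64 72 85 90 (sort drops NA by default)
highest   <- max(scores, na.rm = TRUE)      # 90
lowest    <- min(scores, na.rm = TRUE)      # 64
```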


Lists in R Programming

Definition: A list in R is a data structure that can hold elements of different types (numeric, character, logical,
vectors, matrices, data frames, or even other lists). Unlike vectors, lists can contain heterogeneous data.

Key Points:

1. Heterogeneous Elements: Lists can contain elements of different data types, including numbers, strings,
and even other lists or more complex objects.
2. Indexing: List elements can be accessed using double square brackets [[ ]] or the $ sign if they are named.
3. Nested Structure: Lists can be nested, meaning a list can contain other lists as its elements.
4. Flexible: Lists provide great flexibility for storing complex datasets.

Syntax: To create a list, use the list() function

list_name <- list(element1, element2, element3, ...)

Examples of List Creation

• Simple List: simple_list <- list(1, "apple", TRUE)
• Named List: named_list <- list(number = 1, fruit = "apple", flag = TRUE)
• Nested List: nested_list <- list(c(1, 2, 3), list("a", "b", "c"))

Operations on Lists

1. Creating Lists: Create lists using the list() function.

Syntax: my_list <- list(element1, element2, element3)

Example:

my_list <- list(1, "banana", c(TRUE, FALSE), list("a", 1))

2. Appending Elements to a List: Add elements to an existing list by using the append() function.

Syntax: new_list <- append(existing_list, new_element)

Example:
my_list <- list(1, "banana")
my_list <- append(my_list, "apple") # Appends "apple" to the list

3. Modifying Elements in a List: Modify specific elements by referencing them using double square brackets
[[ ]].
Syntax: list_name[[index]] <- new_value
Example:
my_list <- list(1, "banana", TRUE)
my_list[[2]] <- "apple" # Changes the second element to "apple"

4. Viewing Elements in a List: Access individual elements by using double square brackets [[ ]] or by name
using the $ operator.


Syntax:
list_name[[index]]
list_name$name

To view a subset or multiple elements, use single square brackets [ ], which return a list:

list_name[c(index1, index2)]

Example:
my_list <- list(number = 1, fruit = "banana", flag = TRUE)
my_list[[2]] # Accesses the second element ("banana")
my_list$fruit # Accesses the element with name "fruit"

5. Naming Elements in a List: Name the elements in a list when creating it or after creation using names().
Syntax: names(list_name) <- c("name1", "name2", "name3")
Example:
my_list <- list(1, "banana", TRUE)
names(my_list) <- c("number", "fruit", "flag")
my_list$fruit # Accesses the "fruit" element

6. Deleting Elements from a List: Remove elements from a list by assigning NULL to the specific element.

Syntax: list_name[[index]] <- NULL

Example:
my_list <- list(1, "banana", TRUE)
my_list[[2]] <- NULL # Removes the second element ("banana")

7. Combining Lists: Combine multiple lists using the c() function.

Syntax: combined_list <- c(list1, list2)

Example:
list1 <- list(1, 2, 3)
list2 <- list("a", "b", "c")
combined_list <- c(list1, list2)

NOTE:
List Arithmetic Operations: Unlike vectors, lists do not support direct arithmetic operations. You need to
extract the numeric elements of the list and operate on them.
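The list operations above, including the workaround for arithmetic, can be sketched as follows. The element names and the use of lapply() and unlist() for arithmetic are one possible approach, not the only one.

```r
my_list <- list(number = 1, fruit = "banana")
my_list <- append(my_list, list(flag = TRUE))   # append a named element
my_list[["fruit"]] <- "apple"                   # modify by name
subset_list <- my_list[c("number", "flag")]     # single brackets return a list
my_list[["flag"]] <- NULL                       # delete an element
combined <- c(my_list, list(3, "x"))            # combine two lists

# Arithmetic must be applied to the numeric elements, not to the list itself.
nums    <- list(1, 2, 3)
doubled <- lapply(nums, function(x) x * 2)      # list(2, 4, 6)
total   <- sum(unlist(nums))                    # 6
```

Writing nums * 2 directly would stop with an error, because the * operator is not defined for lists.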

List Functions

1. Length of a List: Syntax: length(list_name)

Example:
my_list <- list(1, "banana", TRUE)
length(my_list) # Returns 3

2. Checking for NULL Values: Syntax: is.null(list_name[[index]])

3. Converting a List to a Vector: If all elements are of the same type, you can unlist the list. Syntax:
unlist(list_name)


Example:
my_list <- list(1, 2, 3)
unlist(my_list) # Converts the list to a numeric vector

4. Checking the Type of Elements: Check the data type of list elements using the class() or typeof()
function.
Syntax: class(list_name[[index]])
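The list functions can be tried together on a small list. This sketch also shows what unlist() does when the elements are not all of the same type: the values are coerced to a common type. The variable names are illustrative.

```r
my_list <- list(1, "banana", TRUE, NULL)

n_items    <- length(my_list)           # 4 (the NULL element is counted)
fourth_nul <- is.null(my_list[[4]])     # TRUE
second_cls <- class(my_list[[2]])       # "character"
first_type <- typeof(my_list[[1]])      # "double"

same_type  <- unlist(list(1, 2, 3))         # numeric vector 1 2 3
mixed_type <- unlist(list(1, "a", TRUE))    # character vector "1" "a" "TRUE"
```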
Arrays in R Programming

Definition: An array in R is a data structure that holds elements of the same data type (like numeric, character, or
logical) and arranges them into multi-dimensional formats. Arrays can have more than two dimensions, making
them a more flexible form of matrices (which are 2D arrays).

Key Points:

1. Homogeneous Elements: All elements in an array must be of the same data type.
2. Multi-dimensional: Arrays can have any number of dimensions (e.g., 1D, 2D, 3D, etc.).
3. Indexing: Arrays are indexed by row, column, and dimension. Indexing starts from 1 in R.
4. Array Creation: Arrays can be created using the array() function, where you specify the data and the
dimensions.

Syntax: To create an array, use the array() function.

array_name <- array(data, dim = c(row_size, col_size, depth_size, ...))

Examples of Array Creation

• 1D Array (Similar to a Vector): one_d_array <- array(1:5, dim = c(5)) # 1D array
• 2D Array (Matrix): two_d_array <- array(1:6, dim = c(2, 3)) # 2 rows, 3 columns
• 3D Array: three_d_array <- array(1:12, dim = c(2, 3, 2)) # 2 rows, 3 columns, 2 layers

Operations on Arrays

1. Creating Arrays: Arrays can be created using the array() function, which takes the data and dimensions as
arguments.
Syntax: array_name <- array(data, dim = c(row_size, col_size, depth_size))
Example: my_array <- array(1:9, dim = c(3, 3)) # Creates a 3x3 array
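It can help to see how the data fills a three-dimensional array. The sketch below, with the illustrative name three_d, shows that values are placed column by column within each layer, and layer by layer after that.

```r
three_d <- array(1:12, dim = c(2, 3, 2))   # 2 rows, 3 columns, 2 layers

shape   <- dim(three_d)      # 2 3 2
n_elems <- length(three_d)   # 12

# Layer 1 holds 1..6 and layer 2 holds 7..12, each filled by column.
last_of_layer1  <- three_d[2, 3, 1]   # 6
first_of_layer2 <- three_d[1, 1, 2]   # 7

print(three_d)
```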

2. Appending Elements to an Array: In R, arrays have fixed dimensions, so you cannot directly append
elements to an array. However, you can create a new array by concatenating the original array with new
elements and reshaping it.

3. Modifying Elements in an Array: Modify elements in an array using their indices.


Syntax: array_name[row, column, dimension] <- new_value
Example:
my_array <- array(1:9, dim = c(3, 3))
my_array[2, 3] <- 10 # Changes the element at row 2, column 3 to 10
4. Viewing Elements in an Array: Access specific elements or slices of an array using indexing.
Syntax: array_name[row, column, dimension]
Example:
my_array <- array(1:12, dim = c(3, 2, 2))


my_array[1, , ] # Accesses the first row across all columns and layers

5. Deleting Elements from an Array: You cannot delete specific elements from an array in R. Instead,
create a new array without the unwanted elements by reshaping the data.
6. Combining Arrays: Combine arrays along a new dimension using the abind() function from the abind
package, or along existing dimensions using functions like cbind() (column bind) and rbind() (row bind).
7. Arithmetic Operations on Arrays: Element-wise arithmetic operations can be performed on arrays, as
long as the dimensions are compatible.
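The array operations above can be sketched with base R only (the abind package is not used here). The names first_row, extended, and reshaped are illustrative; "appending" is shown as building a new, larger array from the old data plus new values.

```r
my_array <- array(1:12, dim = c(3, 2, 2))   # 3 rows, 2 columns, 2 layers

my_array[2, 1, 1] <- 100L          # modify one element
first_row <- my_array[1, , ]       # 2 x 2 slice: columns by layers

doubled <- my_array * 2            # element-wise arithmetic
summed  <- my_array + my_array     # dimensions match, so this is allowed

# "Appending" a third layer: build a new array from old data plus new values.
extended <- array(c(my_array, 13:18), dim = c(3, 2, 3))

# Reshaping keeps the data and changes only the dimensions.
reshaped <- my_array
dim(reshaped) <- c(6, 2)
```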
Array Functions

• Length of an Array: length(array_name)
• Dimensions of an Array: dim(array_name)
• Number of Rows and Columns:
nrow(array_name) # Number of rows
ncol(array_name) # Number of columns
• Reshaping an Array: Use the dim() function to reshape an array.
dim(array_name) <- c(new_row_size, new_col_size, ...)
• Sum of Elements in an Array: sum(array_name)

Differences between Arrays and Other Data Structures

Feature | Vector | List | Matrix | Array
Homogeneous/Heterogeneous | Homogeneous (same type) | Heterogeneous (different types allowed) | Homogeneous | Homogeneous
Dimensionality | 1D | 1D (can store lists of any dimensions) | 2D | Multi-dimensional
Indexing | Single index (1D) | Double square brackets [[ ]] or $ for named elements | Row, Column | Row, Column, Dimension
Data Types | Numeric, Character, Logical | Numeric, Character, Logical, Vectors, Lists, etc. | Numeric, Character, Logical | Numeric, Character, Logical
Common Uses | Simple data storage | Complex data with mixed types | Tabular data like matrices | Multi-dimensional datasets like tensors
Creation Function | c() | list() | matrix() | array()
Modify Elements | By position | By position or name | By row and column index | By row, column, and dimension index
Dimensionality | Fixed to 1D | Can have lists of lists (nested) | Fixed to 2D | Can have more than 2 dimensions

Matrices in R Programming

Definition: A matrix in R is a two-dimensional data structure where all elements are of the same data type
(numeric, character, or logical). Matrices are essentially vectors with a dimension attribute that creates rows and
columns.

Key Points:

1. Homogeneous Elements: All elements in a matrix must be of the same data type (e.g., all numeric, all
character, etc.).


2. Two-Dimensional: A matrix has two dimensions — rows and columns.


3. Indexing: Matrix elements are indexed by both row and column numbers, and indexing starts at 1 in R.
4. Column-major Order: By default, elements are filled by column in R matrices.
5. Matrix Creation: Matrices can be created using the matrix() function or by converting vectors into
matrices.

Syntax: To create a matrix, use the matrix() function.

matrix_name <- matrix(data, nrow = row_number, ncol = column_number, byrow = TRUE/FALSE)

Examples of Matrix Creation

 Creating a Numeric Matrix:

 Creating a Character Matrix:

 Filling a Matrix by Row:
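The three cases named above can be illustrated as follows. This is a minimal sketch; the variable names and values are assumptions made for illustration and are not taken from the original notes.

```r
# Numeric matrix, filled column by column (the default)
num_mat <- matrix(1:6, nrow = 2, ncol = 3)
print(num_mat)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# Character matrix
chr_mat <- matrix(c("a", "b", "c", "d"), nrow = 2, ncol = 2)
print(chr_mat)
#      [,1] [,2]
# [1,] "a"  "c"
# [2,] "b"  "d"

# Same data as num_mat, but filled row by row
row_mat <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(row_mat)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```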

Operations on Matrices

1. Creating Matrices: Matrices can be created using the matrix() function, which takes data, the number of
rows, and the number of columns.
Syntax: matrix_name <- matrix(data, nrow = row_number, ncol = column_number)
Example: my_matrix <- matrix(1:9, nrow = 3, ncol = 3)

2. Appending Rows or Columns to a Matrix: You cannot append elements directly to a matrix as you can to a list.
Instead, use the rbind() or cbind() functions to append rows or columns to an existing matrix.
Syntax:
new_matrix <- rbind(existing_matrix, new_row) # Append a row
new_matrix <- cbind(existing_matrix, new_column) # Append a column
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)


new_matrix <- rbind(my_matrix, c(7, 8, 9)) # Adds a new row


new_matrix <- cbind(my_matrix, c(7, 8)) # Adds a new column

3. Modifying Elements in a Matrix: Modify specific elements by referencing them with their row and
column indices.
Syntax: matrix_name[row, column] <- new_value
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
my_matrix[1, 2] <- 10 # Changes the element at row 1, column 2 to 10

4. Viewing Elements in a Matrix: Access specific elements, rows, or columns using indexing.
Syntax:
matrix_name[row, column]
matrix_name[row, ] # Access a specific row
matrix_name[, column] # Access a specific column
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
my_matrix[1, 2] # Accesses the element at row 1, column 2
my_matrix[1, ] # Accesses the entire first row
my_matrix[, 2] # Accesses the entire second column
5. Naming Rows and Columns: Name the rows and columns of a matrix using the rownames() and
colnames() functions.
Syntax:
rownames(matrix_name) <- c("row1", "row2", ...)
colnames(matrix_name) <- c("col1", "col2", ...)
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
rownames(my_matrix) <- c("row1", "row2")
colnames(my_matrix) <- c("col1", "col2", "col3")

6. Deleting Rows or Columns from a Matrix: Remove rows or columns from a matrix by specifying negative
indices.
Syntax:
new_matrix <- matrix_name[-row_index, ] # Remove a row
new_matrix <- matrix_name[, -column_index] # Remove a column
Example:
my_matrix <- matrix(1:6, nrow = 2, ncol = 3)
my_matrix <- my_matrix[-1, ] # Removes the first row

7. Combining Matrices: Combine matrices using the rbind() function (stack by rows) or the cbind() function (join by columns).
Syntax:
combined_matrix <- rbind(matrix1, matrix2) # Combine by rows
combined_matrix <- cbind(matrix1, matrix2) # Combine by columns
Example:
mat1 <- matrix(1:4, nrow = 2, ncol = 2)
mat2 <- matrix(5:8, nrow = 2, ncol = 2)


combined_matrix <- rbind(mat1, mat2) # Combine mat1 and mat2 by rows
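The following sketch checks the shapes produced by rbind() and cbind(), and shows one detail of deleting rows: when only one row is left, R simplifies the result to a plain vector unless drop = FALSE is supplied. The names by_rows, by_cols, as_vector and as_matrix are assumptions used only for illustration.

```r
mat1 <- matrix(1:4, nrow = 2, ncol = 2)
mat2 <- matrix(5:8, nrow = 2, ncol = 2)

by_rows <- rbind(mat1, mat2)   # stacks the rows: 4 x 2
by_cols <- cbind(mat1, mat2)   # joins the columns: 2 x 4
dim(by_rows)  # 4 2
dim(by_cols)  # 2 4

# Removing a row from a two-row matrix leaves a single row, which R
# simplifies to a plain vector unless drop = FALSE is given.
m <- matrix(1:6, nrow = 2, ncol = 3)
as_vector <- m[-1, ]                 # vector: 2 4 6
as_matrix <- m[-1, , drop = FALSE]   # still a 1 x 3 matrix
is.matrix(as_vector)  # FALSE
dim(as_matrix)        # 1 3
```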

Matrix Functions:
 Sum of Matrix Elements: sum(matrix_name)
 Row and Column Sums:
rowSums(matrix_name)
colSums(matrix_name)
 Row and Column Means:
rowMeans(matrix_name)
colMeans(matrix_name)
 Matrix Dimensions: dim(matrix_name)
 Matrix Determinant and Inverse (for square matrices):
det(matrix_name) # Determinant
solve(matrix_name) # Inverse
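As a quick check of the functions above, here is a minimal sketch on a small square matrix. The matrix values are assumptions chosen so that the determinant is non-zero. The %*% operator (matrix multiplication) is used only to verify the inverse.

```r
m <- matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2)
#      [,1] [,2]
# [1,]    2    1
# [2,]    1    3

sum(m)        # 7
rowSums(m)    # 3 4
colSums(m)    # 3 4
rowMeans(m)   # 1.5 2.0
det(m)        # 5

m_inv <- solve(m)   # the inverse exists because det(m) is non-zero
m %*% m_inv         # approximately the 2 x 2 identity matrix
```

If det() is zero, the matrix has no inverse and solve() stops with an error.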

Factors in R Programming

Definition: A factor in R is a data structure used to represent categorical data. Factors are stored as integers,
where each integer represents a level (a unique category or value) and a label corresponds to these levels. Factors
are particularly useful in statistical modeling when you need to distinguish between discrete categories.

Key Points:

1. Categorical Data: Factors are designed to handle categorical data (nominal or ordinal).
2. Levels: Factors have a fixed set of unique values called levels. These levels can be ordered or unordered.
3. Efficiency: Internally, factors are stored as integers, making them more memory-efficient than storing the
raw values (especially for repeated categorical values).
4. Nominal vs Ordinal: Factors can be nominal (unordered categories) or ordinal (ordered categories). R
treats these types differently in statistical modeling.
5. Used in Modeling: Factors are commonly used in data frames and for statistical modeling purposes such
as linear regression, where categorical predictors are needed.

Syntax: To create a factor, use the factor() function.

factor_name <- factor(vector, levels = c("level1", "level2", ...), ordered = TRUE/FALSE)

Examples of Factor Creation

1. Creating a Simple Factor (Nominal):

Example: colors <- factor(c("red", "blue", "green", "blue", "red"))


Output:

[1] red blue green blue red

Levels: blue green red

2. Creating an Ordered Factor (Ordinal):


Example: temperature <- factor(c("cold", "warm", "hot", "cold"), levels = c("cold", "warm", "hot"),
ordered = TRUE)
Output:
[1] cold warm hot cold
Levels: cold < warm < hot
3. Specifying Levels:
Example: fruits <- factor(c("apple", "banana", "cherry"), levels = c("banana", "apple", "cherry"))
Output:
[1] apple banana cherry
Levels: banana apple cherry
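One practical difference between nominal and ordinal factors is that ordered factors support comparisons. The sketch below assumes a hypothetical factor named sizes. It is an illustration, not part of the original examples.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

sizes < "large"      # TRUE FALSE TRUE TRUE
max(sizes)           # large
table(sizes)         # counts in level order: small 2, medium 1, large 1
is.ordered(sizes)    # TRUE
```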

Operations on Factors

1. Creating Factors: Factors can be created using the factor() function by passing a vector of values and optionally
specifying the levels and order.
Syntax: factor_name <- factor(vector, levels = c("level1", "level2", ...), ordered = TRUE/FALSE)
Example: genders <- factor(c("male", "female", "male"), levels = c("female", "male"))

2. Appending Values to a Factor: Factors have fixed levels, so appending new values to a factor is not
straightforward. To add new values, you should first convert the factor to a character vector or specify the levels
beforehand.
Syntax:
# Convert factor to character, append, and create a new factor
new_factor <- factor(c(as.character(factor_name), "new_value"))
Example:
fruits <- factor(c("apple", "banana"))
new_fruits <- factor(c(as.character(fruits), "cherry"))

3. Modifying Factor Levels: Modify the levels of a factor using the levels() function.

Syntax: levels(factor_name) <- c("new_level1", "new_level2", ...)


Example:
genders <- factor(c("male", "female"))
levels(genders) <- c("woman", "man")

4. Viewing Factors: View the levels of a factor and its internal representation (as integers).

Syntax:
levels(factor_name) # View the levels of the factor
as.integer(factor_name) # View the internal integer representation
Example:
genders <- factor(c("male", "female", "male"))
levels(genders) # Output: "female" "male"
as.integer(genders) # Output: 2 1 2

5. Renaming Levels of a Factor: To rename the levels of a factor, use the levels() function to assign new names to
existing levels.


Syntax: levels(factor_name) <- c("new_level1", "new_level2", ...)


Example:
fruits <- factor(c("apple", "banana", "apple"))
levels(fruits) <- c("A", "B")

6. Deleting Levels of a Factor: If a factor has unused levels, you can remove them using the droplevels() function.

Syntax: factor_name <- droplevels(factor_name)


Example:
fruits <- factor(c("apple", "banana", "apple"), levels = c("apple", "banana", "cherry"))
fruits <- droplevels(fruits) # Removes "cherry" from the levels

7. Adding New Levels to a Factor: To add new levels, you can use the levels() function.

Syntax: levels(factor_name) <- c(levels(factor_name), "new_level")


Example:
fruits <- factor(c("apple", "banana"))
levels(fruits) <- c(levels(fruits), "cherry")

8. Ordering Levels in a Factor: To create an ordered factor (ordinal factor), set the ordered argument to TRUE
and specify the level order.

Syntax: factor_name <- factor(vector, levels = c("low", "medium", "high"), ordered = TRUE)
Example:
priority <- factor(c("low", "medium", "high"), levels = c("low", "medium", "high"), ordered = TRUE)

9. Checking if a Variable is a Factor: Check if a variable is a factor using the is.factor() function.

Syntax: is.factor(variable)
Example: is.factor(fruits) # Returns TRUE if it's a factor

10. Converting Factor to Character or Numeric: A factor can be converted to a character or numeric vector.

Syntax:
as.character(factor_name) # Convert to character
as.numeric(factor_name) # Convert to numeric (using the underlying integer codes)
Example:
fruits <- factor(c("apple", "banana"))
as.character(fruits) # Returns c("apple", "banana")
as.numeric(fruits) # Returns 1 2 (because "apple" is the 1st level, "banana" the 2nd)
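Several of the operations above can be combined in one short session. The sketch below reuses the fruits factor from the examples and is only an illustration of how adding a level, assigning a value and dropping unused levels fit together.

```r
fruits <- factor(c("apple", "banana"))

# Assigning a value that is not an existing level would give NA (with a
# warning), so the new level is added first.
levels(fruits) <- c(levels(fruits), "cherry")
fruits[3] <- "cherry"
as.character(fruits)   # "apple" "banana" "cherry"

# Subsetting keeps all levels, even ones that are no longer used
fruits <- fruits[fruits != "banana"]
levels(fruits)               # "apple" "banana" "cherry"
levels(droplevels(fruits))   # "apple" "cherry"
```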

Factor Functions

1. Check Levels of a Factor: levels(factor_name)


2. Summary of a Factor: summary(factor_name)
3. Convert Factor to Numeric (Using Actual Values): To convert factors to their actual numeric values
(instead of their underlying integer codes), first convert them to character and then to numeric.
as.numeric(as.character(factor_name))


4. Length of a Factor: length(factor_name)
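The difference between integer codes and actual values matters most when the labels of a factor are themselves numbers. The sketch below assumes a hypothetical factor named readings to show this.

```r
# Hypothetical factor whose labels happen to be numbers
readings <- factor(c("10", "20", "10", "30"))

levels(readings)                     # "10" "20" "30"
as.integer(readings)                 # 1 2 1 3      (internal codes)
as.numeric(as.character(readings))   # 10 20 10 30  (actual values)
summary(readings)                    # counts per level: 2 1 1
length(readings)                     # 4
```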

Data Frames in R Programming

Definition: A data frame in R is a two-dimensional, table-like data structure that can hold different data types in
different columns (e.g., numeric, character, factor, etc.). Data frames are one of the most common ways to store
data in R, especially when handling datasets where different variables have different types.

Key Points:

1. Tabular Structure: Data frames are structured in a tabular format with rows and columns.
2. Heterogeneous Data: Unlike matrices (which require all elements to be of the same type), data frames
allow different columns to hold different types of data.
3. Row and Column Indexing: Data frames are indexed by both row and column, and indexing starts at 1 in
R.
4. Column Names: Columns can be given names, making it easy to reference them by name rather than by
index.
5. Row Names: Row names can also be set, although they are less commonly used in modern data analysis.
6. Flexible Data Storage: Data frames are flexible in that you can append new rows or columns, rename,
modify, and even delete existing data.

Syntax: To create a data frame, use the data.frame() function.

data_frame_name <- data.frame(column1 = c(data), column2 = c(data), ...)

Examples of Data Frame Creation

1. Creating a Simple Data Frame:

Example:

df <- data.frame(Name = c("John", "Sarah", "Mike"), Age = c(23, 25, 22), Gender = c("M", "F", "M"))

print(df)

Output:

   Name Age Gender
1  John  23      M
2 Sarah  25      F
3  Mike  22      M

2. Creating a Data Frame with Factors:


Example: df <- data.frame(Name = c("John", "Sarah", "Mike"), Age = c(23, 25, 22), Gender = factor(c("M",
"F", "M")))
3. Creating an Empty Data Frame:
Example: df <- data.frame()


Operations on Data Frames

1. Creating Data Frames: Data frames are created using the data.frame() function by specifying column
names and their respective values.
Syntax: df <- data.frame(column1 = c(value1, value2), column2 = c(value1, value2))

Example: student_data <- data.frame(Name = c("John", "Sara"), Score = c(85, 90))

2. Viewing Data Frames: View the contents of a data frame using various functions
 Print the Entire Data Frame: print(df)
 View the Structure: The str() function gives a compact display of the structure of the data frame.
str(df)
 View Summary: Use the summary() function to see descriptive statistics of numeric columns and
the distribution of categorical columns. summary(df)
 Viewing Top/Bottom Rows:
o head(df) displays the first few rows.
o tail(df) displays the last few rows.
3. Accessing Data from Data Frames:
 By Column Name:
Syntax: df$column_name
Example: student_data$Name # Outputs the "Name" column
 By Index: Access specific rows and columns using the row and column index
Syntax: df[row_index, column_index]
Example:
student_data[1, 2] # Accesses the element in the 1st row, 2nd column
student_data[1, ] # Accesses the entire 1st row
student_data[, 2] # Accesses the entire 2nd column
 Subset Columns/Rows: You can subset specific columns or rows using the subset() function.
Syntax: subset(df, select = c(column1, column2))
Example: subset(student_data, select = c(Name, Score))
4. Modifying Data in Data Frames:
 Modify Specific Elements: Modify specific elements of a data frame by indexing.
Syntax: df[row, column] <- new_value
Example: student_data[1, 2] <- 95 # Change the 1st student's score
 Modify Entire Columns:
Syntax: df$column_name <- new_data
Example: student_data$Score <- c(88, 92) # Modify the entire "Score" column (one value per row)
5. Adding New Columns:
 Adding a New Column Directly: Append a new column by assigning a new vector to a new
column name.
Syntax: df$new_column <- c(data)
Example: student_data$Grade <- c("A", "B") # One value per row of student_data
 Using cbind() to Add a Column: df <- cbind(df, new_column_name = c(data))
6. Adding New Rows: Use rbind() to add a row. The new row must have the same column names as the data frame.
Syntax:
new_row <- data.frame(Name = "David", Score = 88, Grade = "B")
df <- rbind(df, new_row)
7. Renaming Columns: Rename the columns of a data frame by assigning a new character vector to names(df).
8. Deleting Columns/Rows:
 Removing Columns: remove a column by setting it to NULL.
Syntax: df$column_name <- NULL
Example: student_data$Grade <- NULL # Removes the "Grade" column


 Removing Rows: remove rows by using negative indices.


Syntax: df <- df[-row_index, ]
Example: student_data <- student_data[-2, ] # Removes the 2nd row
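The operations above can be chained on the student_data example. This is a minimal sketch; the new column names "Student" and "Marks" are assumptions used only to demonstrate renaming.

```r
student_data <- data.frame(Name = c("John", "Sara"), Score = c(85, 90))

# Add a column, then a row (the new row must use the same column names)
student_data$Grade <- c("B", "A")
new_row <- data.frame(Name = "David", Score = 88, Grade = "B")
student_data <- rbind(student_data, new_row)

# Rename all columns with names()
names(student_data) <- c("Student", "Marks", "Grade")

# Remove the "Grade" column and the 2nd row
student_data$Grade <- NULL
student_data <- student_data[-2, ]

print(student_data)
#   Student Marks
# 1    John    85
# 3   David    88
```

Note that the row labels are not renumbered after a row is removed, which is why the remaining rows are labelled 1 and 3.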

Data Frame Functions

 dim(df): Returns the dimensions of the data frame (number of rows and columns).
 nrow(df): Returns the number of rows.
 ncol(df): Returns the number of columns.
 colnames(df): Returns or sets the column names of the data frame.
 rownames(df): Returns or sets the row names of the data frame.
 merge(df1, df2): Combines two data frames based on common columns or row names.
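A short sketch of merge() and the dimension functions follows. The two tables and their shared "ID" column are assumptions made up for illustration.

```r
# Hypothetical tables sharing an "ID" column
students <- data.frame(ID = c(1, 2, 3), Name = c("John", "Sara", "Mike"))
scores   <- data.frame(ID = c(1, 2, 4), Score = c(85, 90, 70))

# Default merge: only IDs present in both tables are kept
inner <- merge(students, scores, by = "ID")
dim(inner)        # 2 3
colnames(inner)   # "ID" "Name" "Score"

# all.x = TRUE keeps every row of the first table, filling gaps with NA
left <- merge(students, scores, by = "ID", all.x = TRUE)
nrow(left)        # 3
left$Score        # 85 90 NA
```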

Difference Between Data Frames and Other Data Structures

Feature | Data Frame | Matrix | List | Vector
--- | --- | --- | --- | ---
Dimensionality | 2D (rows and columns) | 2D (rows and columns) | 1D, can hold heterogeneous data | 1D
Data Types | Heterogeneous (different columns can hold different data types) | Homogeneous (all elements must be the same type) | Heterogeneous (can hold different types of data) | Homogeneous (all elements must be the same type)
Indexing | Rows and columns | Rows and columns | Indexed by position | Indexed by position
Column Names | Yes | Optional (set with colnames()) | No | No
Row Names | Yes | Optional (set with rownames()) | No | No
Modifiability | Flexible | Less flexible, restricted by homogeneity | Very flexible | Fixed-size, can't hold different data types


Special values in R

NA, NaN, Inf, -Inf and NULL are special values in R.

a) NA: NA stands for 'Not Available' and represents a missing value. If you extend a vector (or matrix or
array) beyond the positions where values were defined, the new positions are filled with NA.
The is.na() function checks for missing values: is.na(x) returns a logical vector of the same length as x,
with TRUE wherever the corresponding element of x is NA. If a numeric vector contains NA, functions such
as sum() and mean() return NA by default. Passing the argument na.rm = TRUE makes them ignore the NA
values, as the example below shows.

# create vector x with two missing values, denoted by NA
x <- c(5, 6, 7, NA, 10, 11, NA)
print(x)
# Output: 5 6 7 NA 10 11 NA

# sum of the elements of x: the result is NA because x has missing values
sum(x)
# Output: NA

# sum after removing the NA values using na.rm = TRUE
sum(x, na.rm = TRUE)
# Output: 39

# mean of x: the result is NA because x has missing values
mean(x)
# Output: NA

# mean after removing the NA values using na.rm = TRUE
mean(x, na.rm = TRUE)
# Output: 7.8

# create a new vector
x <- c(1, NA, 12, NA, 50, 30)

# remove NA values from x and copy the result to x_new
x_new <- x[!is.na(x)]


x_new # print x_new after removing NA
# Output: 1 12 50 30

# replace NA with a constant numeric value
x[is.na(x)] <- 0
x # display x with NA replaced by 0
# Output: 1 0 12 0 50 30
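The statement that extending a vector beyond its defined positions creates NA values can be checked directly. The vector v below is a hypothetical example. The sketch also shows how to count and locate missing values.

```r
v <- c(10, 20, 30)
v[6] <- 60          # positions 4 and 5 were never assigned
v                   # 10 20 30 NA NA 60

is.na(v)            # FALSE FALSE FALSE TRUE TRUE FALSE
sum(is.na(v))       # 2 missing values
which(is.na(v))     # positions 4 and 5
mean(v, na.rm = TRUE)   # 30
```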

b) NaN stands for 'Not a Number'. It applies to numeric values, and to the real and imaginary parts of
complex values, but not to the elements of integer vectors. NaN usually appears when a calculation has no
defined result, such as dividing 0 by 0 or taking the log of a negative number.

x <- 0
y <- 0
print(x / y) # displays NaN (Not a Number)
x <- -5
print(log(x)) # displays NaN, with a warning
# create a vector containing NaN values
x <- c(21, 12, NaN, NaN, 19, NaN, -25)
print(x) # displays 21 12 NaN NaN 19 NaN -25
# count the NaN values in x using sum(is.nan(x))
sum(is.nan(x)) # prints 3, the total number of NaN values in x
# find the positions that hold NaN using which(is.nan(x))
print(which(is.nan(x))) # prints 3 4 6, the positions of NaN in x
# remove the NaN values
x_new <- x[!is.nan(x)]
x_new # displays 21 12 19 -25
# replace NaN with another value
x[is.nan(x)] <- 5 # replace NaN by 5
x # prints 21 12 5 5 19 5 -25
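NA and NaN are easy to confuse, so it helps to see how is.na() and is.nan() treat each of them. The vector vals below is an assumption used only for illustration.

```r
vals <- c(1, NA, NaN)

is.na(vals)    # FALSE TRUE TRUE   (is.na() is also TRUE for NaN)
is.nan(vals)   # FALSE FALSE TRUE  (NA is not NaN)

0 / 0          # NaN
```

This means is.na() can be used to catch both kinds of value at once, while is.nan() singles out undefined numeric results.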

c) Inf and -Inf stand for positive and negative infinity. They result from a calculation whose value is too
large to store, or from dividing a non-zero number by zero. An infinite value is neither missing (NA) nor NaN.
x <- 5
y <- 0
print(x / y) # displays Inf (infinity)
x <- -5
y <- 0
print(x / y) # displays -Inf (negative infinity)
print(x^2000) # displays Inf: (-5)^2000 is positive and too large to store
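Infinite values can be detected with is.finite() and is.infinite(). The following is a small sketch; the variable name big is an assumption.

```r
big <- 2^1024          # too large for double precision, so it becomes Inf
big                    # Inf
-1 / 0                 # -Inf

is.finite(c(1, Inf, -Inf, NA, NaN))     # TRUE FALSE FALSE FALSE FALSE
is.infinite(c(1, Inf, -Inf, NA, NaN))   # FALSE TRUE TRUE FALSE FALSE

Inf - Inf              # NaN: the result is undefined
Inf > 1e308            # TRUE
```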


d) NULL: NULL is used to indicate or specify that an object is absent.

Example: An empty vector created with c() is displayed as NULL.

x <- c() # create an empty vector
print(x) # displays NULL
# Output: NULL

# check whether x is NULL using the is.null() function
is.null(x)
# Output: TRUE

# create a vector with three elements and check whether it is NULL using is.null()
x <- c(4, 3, 7)
is.null(x)
# Output: FALSE

print(x)
# Output: 4 3 7

Example: A list with two vectors, X and Y. If you try to access a 3rd element of the list, the result is NULL.

X <- c(4, 5, 6) # create numeric vector
Y <- c("a", "b", "c") # create character vector
L <- list(X, Y) # create list with X and Y as elements
length(L) # displays 2, since L has two elements, X and Y
L[3] # request the 3rd element
# Output: [[1]] NULL (a one-element list holding NULL), since L doesn't have a 3rd element
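A few more properties of NULL are sketched below. The list named is a hypothetical example, and the distinction between L[3] and L[3][[1]] is standard R list indexing behaviour.

```r
L <- list(c(4, 5, 6), c("a", "b", "c"))

length(NULL)         # 0
is.null(L[3])        # FALSE: L[3] is a list of length 1 holding NULL
is.null(L[3][[1]])   # TRUE
c(1, NULL, 2)        # 1 2  (NULL contributes nothing)

# For a named list, asking for a missing name returns NULL directly
named <- list(x = 1, y = 2)
is.null(named$z)     # TRUE
```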


Coercion: Converting an object from one type or class to another is known as coercion.

Function | Description | Example
--- | --- | ---
as.logical(x) | Converts the value to logical type. Zero becomes FALSE and any nonzero value becomes TRUE. | x <- c(5, 1, 0, 8); as.logical(x) # output: TRUE TRUE FALSE TRUE
as.character(x) | Converts the object to character type. | x <- c(5, 1, 8); as.character(x) # output: "5" "1" "8"
 | | x <- TRUE; as.character(x) # output: "TRUE"
as.integer(x) | Converts the object to integer type. |
as.double(x) | Converts the object to double-precision type. | x <- c(5, 1, 8); as.double(x) # output: 5 1 8
as.complex(x) | Converts the object to complex type. | x <- c(5, 1, 8); as.complex(x) # output: 5+0i 1+0i 8+0i
as.list(x) | Converts a vector (or similar object) to a list with one element per entry. | x <- c(88, 99); as.list(x) # output: a list whose elements are [[1]] 88 and [[2]] 99
as.numeric(x) | Converts character or logical values to numeric type. | x <- FALSE; as.numeric(x) # output: 0 (0 represents FALSE)
 | | x <- TRUE; as.numeric(x) # output: 1 (1 represents TRUE)
 | | x <- "543"; as.numeric(x) # output: 543

Matrix to vector: use as.vector(matrix_name)

Vectors to matrix: this can be done in two ways

(i) rbind(vector1, vector2, vector3, ..., vectorN) # row-wise matrix

(ii) cbind(vector1, vector2, vector3, ..., vectorN) # column-wise matrix

Note: Similarly, a matrix can be converted to a data frame with as.data.frame(), and a data frame to a matrix with as.matrix().
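The conversions between structures can be sketched as follows. The variable names are assumptions. The last two lines illustrate implicit coercion, where R silently promotes mixed values in c() to the most general type.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)
as.vector(m)               # 1 2 3 4 5 6 (read column by column)

v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)
rbind(v1, v2)              # 2 x 3 matrix, one vector per row
cbind(v1, v2)              # 3 x 2 matrix, one vector per column

df <- as.data.frame(m)     # matrix -> data frame (columns V1, V2, V3)
back <- as.matrix(df)      # data frame -> matrix
dim(back)                  # 2 3

# Implicit coercion: mixing types in c() promotes to the most general type
class(c(1, TRUE))          # "numeric"
class(c(1, "a", TRUE))     # "character"
```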


Sample questions:
1. What is R programming? Who developed it?
2. List the differences between a vector and a list.
3. Mention two applications of R programming.
4. Explain NA and NaN with examples.
5. Explain different types of operators in R.
6. What is a data frame? Explain any two operations on data frame.
7. Explain different matrix operations functions in R.
8. Explain different data structures in R in detail.
9. Explain features of R programming.
10. Explain special values in R.

Topics covered

1) Introduction, Comments, tokens (keywords, identifiers, operators, data types)


2) R data structure
3) Special values
4) Coercion

