SAGE University Indore
LAB MANUAL
Subject code :- ACTDSRPR001P
SUBJECT NAME :- R Programming Lab
SEMESTER: VI SEM III YEAR
Session: Dec - May 2023
Institute of Advance Computing
Submitted to :- Prof. Bhupendra Mandloi
Submitted by :-
Name :- Pranav Patidar
Enrollment No :- 21ADV3CSE0117
INDEX
S.No. | Name of the Experiment | Date of Experiment | Page no. | Remark
1. | FUNDAMENTALS OF R | | |
2. | VECTORS | | |
3. | CONTROL STATEMENTS | | |
4. | FUNCTIONS IN R | | |
5. | MATRICES | | |
6. | STRINGS | | |
7. | LISTS | | |
8. | ARRAYS IN R | | |
9. | R FACTORS | | |
10. | DATA FRAMES IN R | | |
Experiment 1
FUNDAMENTALS OF R
What is R Programming Language?
● R is a leading tool for machine learning, statistics, and data
analysis. Objects, functions, and packages can easily be created
in R.
● It’s a platform-independent language, which means it can be
used on all major operating systems.
● It’s a free, open-source language, so anyone can install it in
any organization without purchasing a license.
● R is not only a statistics package; it also integrates with other
languages (C, C++), so you can easily interact with many data
sources and statistical packages.
● The R programming language has a vast community of users and
it’s growing day by day.
● R is currently one of the most requested programming languages
in the Data Science job market.
● It was designed by Ross Ihaka and Robert Gentleman at the
University of Auckland, New Zealand, and is currently developed
by the R Core Team.
● R is an implementation of the S programming language, combined
with lexical scoping semantics inspired by Scheme. The project
was conceived in 1992, with an initial version released in 1995
and a stable beta version in 2000.
Why Use R?
● Statistical Analysis: R is designed for data analysis and provides
an extensive collection of graphical and statistical techniques,
making it a preferred choice for statisticians and data analysts.
● Open Source: R is open-source software, which means it is
freely available to anyone and is supported by a vibrant
community of users and developers.
● Data Visualization: R offers libraries such as ggplot2 that
enable the creation of high-quality, customizable data
visualizations.
● Data Manipulation: R offers tools for data manipulation and
transformation that simplify filtering, summarizing, and
transforming data.
● Integration: R integrates easily with other programming
languages and data sources. It has connectors to various
databases and can be used in conjunction with Python, SQL, and
other tools.
● Community and Packages: R has a vast ecosystem of packages
that extend its functionality and cover most analytics needs.
Features of R Programming Language
● R Packages: One of the major features of R is its wide
availability of libraries. CRAN (Comprehensive R Archive
Network) is a repository holding more than 10,000 packages.
● Distributed Computing: Distributed computing is a model in
which components of a software system are shared among
multiple computers to improve efficiency and performance. Two
packages for distributed programming in R, ddR and multidplyr,
were released in November 2015.
Statistical Features of R
● Basic Statistics: The most common basic statistics terms are the
mean, mode, and median. These are all known as “Measures of
Central Tendency.” So using the R language we can measure
central tendency very easily.
● Static graphics: R is rich with facilities for creating and
developing interesting static graphics. R contains functionality for
many plot types including graphic maps, mosaic plots, biplots,
and the list goes on.
● Probability distributions: Probability distributions play a vital
role in statistics and by using R we can easily handle various
types of probability distributions such as Binomial Distribution,
Normal Distribution, Chi-squared Distribution, and many more.
● Data analysis: It provides a large, coherent, and integrated
collection of tools for data analysis.
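The measures of central tendency and the probability distributions listed above can be tried with base R alone. The following is a minimal sketch: the data vector `scores` and the helper `stat_mode` are made up for illustration. Base R has no built-in function for the statistical mode (its `mode()` function reports an object's storage type instead), so the helper is an assumption of this example.

```r
# Hypothetical sample data, chosen only for illustration
scores <- c(4, 8, 8, 5, 3, 8, 5)

mean_score   <- mean(scores)     # arithmetic mean
median_score <- median(scores)   # middle value of the sorted data

# Helper (not part of base R): the most frequent value in a vector
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
mode_score <- stat_mode(scores)

# Distribution functions follow a d/p/q/r naming scheme
p_heads  <- dbinom(3, size = 5, prob = 0.5)  # P(exactly 3 heads in 5 fair tosses)
p_below  <- pnorm(0)                         # P(Z <= 0) for a standard normal
chi_crit <- qchisq(0.95, df = 1)             # 95th percentile of chi-squared, 1 df
```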
Basic R program
Because R is syntactically similar to other widely used languages, it is
easy to learn and write. R programs can be written in any of the commonly
used environments such as RStudio, Rattle, or Tinn-R. After writing the
program, save the file with the extension .r (or .R). To run the program,
use the following command on the command line:
Rscript file_name.r
# R program to print Welcome to GFG!
# The line below prints "Welcome to GFG!"
cat("Welcome to GFG!")
Output:
Welcome to GFG!
Advantages of R
● R is one of the most comprehensive statistical analysis packages,
and new statistical techniques and concepts often appear first in R.
● R is open source, so you can run it anywhere and at any time.
● R is cross-platform: it runs on GNU/Linux, Windows, and macOS.
● In R, everyone is welcome to provide new packages, bug fixes,
and code enhancements.
Disadvantages of R
● The quality of some R packages is less than perfect.
● R gives the programmer little control over memory management,
and objects are held in memory, so a large job may consume all
available memory.
● As a community-driven open-source project, R has no vendor to
complain to if something doesn’t work.
● R code can be slower than equivalent code in languages such as
Python and MATLAB, especially when loops are not vectorized.
Applications of R
● R is used for Data Science. It offers a broad variety of libraries
related to statistics and provides an environment for statistical
computing and design.
● Many quantitative analysts use R as their programming tool,
where it helps with data importing and cleaning.
● R is widely used by data analysts and research programmers,
and it serves as a fundamental tool in finance.
● Tech companies such as Google, Facebook, Bing, Twitter,
Accenture, Wipro, and many more use R.
Experiment 2
VECTORS
Vectors
A vector is simply a list of items that are of the same type.
To combine the list of items to a vector, use the c() function and separate
the items by a comma.
In the example below, we create a vector variable called fruits that combines
strings:
Example
# Vector of strings
fruits <- c("banana", "apple", "orange")
# Print fruits
fruits
In this example, we create a vector that combines numerical values:
Example
# Vector of numerical values
numbers <- c(1, 2, 3)
# Print numbers
numbers
To create a vector with numerical values in a sequence, use the : operator:
Example
# Vector with numerical values in a sequence
numbers <- 1:10
numbers
You can also create numerical values with decimals in a sequence, but note
that if the last element does not belong to the sequence, it is not used:
Example
# Vector with numerical decimals in a sequence
numbers1 <- 1.5:6.5
numbers1
# Vector with numerical decimals in a sequence where the last element is not used
numbers2 <- 1.5:6.3
numbers2
Result:
[1] 1.5 2.5 3.5 4.5 5.5 6.5
[1] 1.5 2.5 3.5 4.5 5.5
In the example below, we create a vector of logical values:
Example
# Vector of logical values
log_values <- c(TRUE, FALSE, TRUE, FALSE)
log_values
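Because every item in a vector must have the same type, c() converts mixed inputs to one common type. A minimal sketch of that behaviour follows; the variable names mixed_chr and mixed_num are made up for illustration:

```r
mixed_chr <- c(1, "apple", TRUE)   # any string forces every item to character
mixed_num <- c(1.5, TRUE, FALSE)   # logicals become 1 and 0 among numbers

class(mixed_chr)  # "character"
class(mixed_num)  # "numeric"
mixed_num         # 1.5 1.0 0.0
```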
Vector Length
To find out how many items a vector has, use the length() function:
Example
fruits <- c("banana", "apple", "orange")
length(fruits)
Sort a Vector
To sort items in a vector alphabetically or numerically, use the sort()
function:
Example
fruits <- c("banana", "apple", "orange", "mango", "lemon")
numbers <- c(13, 3, 5, 7, 20, 2)
sort(fruits) # Sort a string
sort(numbers) # Sort numbers
Access Vectors
You can access the vector items by referring to its index number inside
brackets []. The first item has index 1, the second item has index 2, and so
on:
Example
fruits <- c("banana", "apple", "orange")
# Access the first item (banana)
fruits[1]
You can also access multiple elements by referring to different index
positions with the c() function:
Example
fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Access the first and third item (banana and orange)
fruits[c(1, 3)]
You can also use negative index numbers to access all items except the ones
specified:
Example
fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Access all items except for the first item
fruits[c(-1)]
Change an Item
To change the value of a specific item, refer to the index number:
Example
fruits <- c("banana", "apple", "orange", "mango", "lemon")
# Change "banana" to "pear"
fruits[1] <- "pear"
# Print fruits
fruits
Repeat Vectors
To repeat vectors, use the rep() function:
Example
Repeat each value:
repeat_each <- rep(c(1,2,3), each = 3)
repeat_each
Example
Repeat the sequence of the vector:
repeat_times <- rep(c(1,2,3), times = 3)
repeat_times
Example
Repeat each value independently:
repeat_indepent <- rep(c(1,2,3), times = c(5,2,1))
repeat_indepent
Generating Sequenced Vectors
One of the earlier examples showed how to create a vector with
numerical values in a sequence with the : operator:
Example
numbers <- 1:10
numbers
To make bigger or smaller steps in a sequence, use the seq() function:
Example
numbers <- seq(from = 0, to = 100, by = 20)
numbers
Experiment 3
CONTROL STATEMENTS
In R, control statements are used to control the flow of execution in a program. The main
control statements in R include:
1. **if-else:** It is used to execute a block of code if a specified condition is true, and
another block of code if the condition is false.
```r
if (condition) {
  # code to be executed if condition is true
} else {
  # code to be executed if condition is false
}
```
2. **if-else if-else:** It is an extension of the if-else statement to test multiple conditions.
```r
if (condition1) {
  # code to be executed if condition1 is true
} else if (condition2) {
  # code to be executed if condition2 is true
} else {
  # code to be executed if none of the conditions are true
}
```
3. **for:** It is used to iterate over a sequence (like a vector, list, or data frame) and execute
a block of code for each element in the sequence.
```r
for (variable in sequence) {
  # code to be executed
}
```
4. **while:** It is used to repeatedly execute a block of code as long as a specified condition
is true.
```r
while (condition) {
  # code to be executed
}
```
5. **repeat:** It is used to repeatedly execute a block of code indefinitely until a break
statement is encountered.
```r
repeat {
# code to be executed
if (condition) {
break
}
}
```
6. **break:** It is used to terminate the execution of a loop (for, while, or repeat)
prematurely.
7. **next:** It is used to skip the rest of the current iteration of a loop and move to the next
iteration.
8. **switch:** It is used to select one of several blocks of code to execute based on the value
of an expression.
```r
switch(expression,
  value1 = {
    # code to be executed if expression equals "value1"
  },
  value2 = {
    # code to be executed if expression equals "value2"
  },
  # ... further named cases ...
  {
    # default: an unnamed final argument runs if no name matches
  }
)
```
These control statements allow you to implement conditional logic, looping, and branching in
R programs, enabling you to write more complex and flexible code.
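The templates above can be combined into one small runnable sketch. All names here (`classify`, `odd_sum`, `halvings`, `counter`, `day_type`) are made up for illustration. The `switch()` call relies on two documented behaviours: an empty case falls through to the next one, and an unnamed final argument acts as the default.

```r
# if / else if / else: classify a number
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

# for with next and break: sum the odd numbers below 8
odd_sum <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next   # skip even numbers
  if (i > 8) break        # stop once i passes 8
  odd_sum <- odd_sum + i
}

# while: count how many halvings bring 20 below 1
value <- 20
halvings <- 0
while (value >= 1) {
  value <- value / 2
  halvings <- halvings + 1
}

# repeat: loops until break is reached
counter <- 0
repeat {
  counter <- counter + 1
  if (counter == 3) break
}

# switch on a string; "Sat" falls through to "Sun"
day_type <- function(day) {
  switch(day,
    "Sat" = ,
    "Sun" = "weekend",
    "weekday"
  )
}
```

Calling `classify(-2)`, `day_type("Sat")`, and `day_type("Mon")` returns "negative", "weekend", and "weekday" respectively.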
Experiment 4
FUNCTIONS IN R
In R, functions are blocks of code that perform a specific task and can be reused throughout your
script or session. They take input arguments, perform operations, and return output values. Here's how
you define and use functions in R:
### Defining Functions:
You can define functions in R using the `function` keyword. The basic syntax is as follows:
```r
function_name <- function(arg1, arg2, ...) {
  # Function body: code to be executed
  # Return statement (optional)
  return(output)
}
```
- `function_name`: Name of the function.
- `arg1`, `arg2`, ...: Input arguments to the function.
- `output`: Value to be returned by the function.
### Example:
Let's define a simple function that calculates the sum of two numbers:
```r
sum_numbers <- function(a, b) {
result <- a + b
return(result)
}
```
### Calling Functions:
You can call a function by its name and pass arguments to it:
```r
result <- sum_numbers(5, 3)
print(result) # Output: 8
```
### Function Arguments:
Functions in R accept both positional and named arguments. Named arguments are matched by name
rather than by position, so you can pass them in any order:
```r
result <- sum_numbers(b = 5, a = 3)
print(result) # Output: 8
```
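Because `a + b` gives the same result in either order, `sum_numbers` does not show why argument names matter. A sketch with a hypothetical `divide()` function, where order does change the result, makes the difference visible:

```r
# Hypothetical helper: division is not commutative, so argument order matters
divide <- function(numerator, denominator) {
  numerator / denominator
}

by_position <- divide(10, 2)                            # 5
by_name     <- divide(denominator = 2, numerator = 10)  # 5, names override position
swapped     <- divide(2, 10)                            # 0.2
```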
### Default Arguments:
You can assign default values to function arguments, allowing users to omit them if desired:
```r
sum_numbers <- function(a, b = 0) {
result <- a + b
return(result)
}
result <- sum_numbers(5)
print(result) # Output: 5
```
### Variable-length Arguments:
Functions can accept a variable number of arguments using `...`:
```r
mean_value <- function(...) {
  values <- c(...)
  result <- mean(values)
  return(result)
}
result <- mean_value(1, 2, 3, 4, 5)
print(result) # Output: 3
```
### Returning Values:
Functions return values using the `return()` statement. If no `return()` statement is present, the
function returns the last evaluated expression:
```r
add_numbers <- function(a, b) {
  a + b
}
result <- add_numbers(5, 3)
print(result) # Output: 8
```
### Documentation:
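Document your functions using comments. The `#'` syntax shown below is the roxygen2 convention; when
the function is part of a package whose documentation has been generated, you can then use
`?function_name` or `help(function_name)` to access it.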
```r
#' Calculate the sum of two numbers
#'
#' This function calculates the sum of two numbers.
#' @param a First number
#' @param b Second number
#' @return Sum of a and b
sum_numbers <- function(a, b) {
  result <- a + b
  return(result)
}
```
Functions are essential for structuring your code, promoting code reuse, and making your scripts more
readable and maintainable.
Experiment 5
MATRICES
In R, matrices are two-dimensional arrays that contain elements of the same data type. They are useful
for storing and manipulating data in a tabular format, such as in mathematical operations, statistical
analysis, and data manipulation tasks. Here's how you work with matrices in R:
### Creating Matrices:
You can create matrices in R using the `matrix()` function. The basic syntax is as follows:
```r
matrix(data, nrow, ncol, byrow, dimnames)
```
- `data`: A vector containing the elements of the matrix.
- `nrow`: Number of rows in the matrix.
- `ncol`: Number of columns in the matrix.
- `byrow`: A logical value indicating whether the matrix should be filled by rows (`TRUE`) or by
columns (`FALSE`). Default is `FALSE`.
- `dimnames`: Optional list providing names for the rows and columns.
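The effect of `byrow` and `dimnames` is easiest to see side by side. This is a small sketch; the names `by_col`, `by_row`, and `named` are made up for illustration:

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: fill columns first
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill rows first

by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# dimnames takes a list: row names first, then column names
named <- matrix(1:4, nrow = 2,
                dimnames = list(c("r1", "r2"), c("c1", "c2")))
named["r2", "c1"]  # 2
```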
### Example:
Let's create a simple 3x3 matrix:
```r
# Creating a matrix using matrix() function
mat <- matrix(1:9, nrow = 3, ncol = 3)
print(mat)
```
### Accessing Elements:
You can access elements of a matrix using square brackets `[ ]` with row and column indices:
```r
# Accessing elements of a matrix
element <- mat[2, 3]
print(element) # Output: 8
```
### Operations:
Matrices support various arithmetic and matrix operations, such as addition, subtraction,
multiplication, and transposition:
```r
# Matrix operations
mat1 <- matrix(1:6, nrow = 2)
mat2 <- matrix(7:12, nrow = 2)
# Addition
result_add <- mat1 + mat2
# Subtraction
result_sub <- mat1 - mat2
# Element-wise multiplication
result_mul <- mat1 * mat2
# Matrix multiplication
result_matmul <- mat1 %*% t(mat2) # mat1 and mat2 are both 2x3, so one is transposed to make the inner dimensions match
# Transposition
result_transpose <- t(mat1)
```
### Row and Column Operations:
You can perform operations on rows and columns using functions like `rowSums()`, `colSums()`,
`rowMeans()`, and `colMeans()`:
```r
# Row and column operations
row_sum <- rowSums(mat)
col_sum <- colSums(mat)
row_mean <- rowMeans(mat)
col_mean <- colMeans(mat)
```
### Binding Matrices:
You can bind matrices together horizontally or vertically using `cbind()` and `rbind()` functions:
```r
# Binding matrices
mat3 <- matrix(10:12, nrow = 1)
mat_combined <- rbind(mat, mat3) # Combine matrices vertically
```
### Manipulating Matrices:
You can manipulate matrices using functions like `dim()`, `rownames()`, and `colnames()`:
```r
# Manipulating matrices
dim(mat) # Get dimensions of the matrix
rownames(mat) # Get row names
colnames(mat) # Get column names
dimnames(mat) <- list(c("row1", "row2", "row3"), c("col1", "col2", "col3")) # Set row and column names
```
Matrices are fundamental data structures in R and are widely used in various statistical and
computational tasks. Understanding how to create, manipulate, and perform operations on matrices is
essential for data analysis and manipulation in R.
Experiment 6
STRINGS
In R, "strings" typically refer to character vectors, which are sequences of characters. Here's a brief
overview of how you can work with strings in R:
1. **Creating Strings**: You can create strings using single quotes (`'`) or double quotes (`"`). For
example:
```R
string1 <- "Hello, World!"
string2 <- 'This is a string.'
```
2. **Concatenating Strings**: You can concatenate strings using the `paste()` function or the
`paste0()` function:
```R
string3 <- paste(string1, string2)
string4 <- paste0(string1, string2)
```
3. **String Manipulation**: There are several functions for manipulating strings in R, such as:
- `nchar()`: Get the number of characters in a string.
- `tolower()` and `toupper()`: Convert strings to lowercase or uppercase.
- `substring()`: Extract parts of a string.
- `gsub()`: Replace patterns in a string.
- `strsplit()`: Split a string based on a delimiter.
- `sprintf()`: Format strings with placeholders.
4. **Regular Expressions**: R supports regular expressions for advanced string manipulation tasks.
Functions like `grep()`, `grepl()`, `regexpr()`, and `sub()` allow you to work with regular expressions.
5. **String Comparison**: You can compare strings using operators like `==`, `!=`, `<`, `>`, etc. Keep
in mind that string comparison is case-sensitive by default.
6. **String Searching**: Functions like `grep()` and `grepl()` can be used for searching strings and
finding patterns within them.
7. **String Encoding**: R supports different character encodings, and you can convert between them
using functions like `iconv()`.
8. **String Operations**: Base R provides `trimws()` for trimming whitespace. The `stringr` package
adds functions for padding strings (`str_pad()`) and for finding substrings (`str_detect()`,
`str_subset()`, `str_extract()`, etc.).
Here's a simple example demonstrating some of these operations:
```R
# Create strings
string1 <- "Hello"
string2 <- "World"
# Concatenate strings
combined <- paste(string1, string2)
# String manipulation
uppercase <- toupper(combined)
first_five <- substr(combined, start = 1, stop = 5)
# Regular expressions
pattern <- "W"
contains_W <- grepl(pattern, combined)
# Output
print(combined)
print(uppercase)
print(first_five)
print(contains_W)
```
This code will output the concatenated string, the uppercase version of it, a substring of the first 5
characters, and whether the string contains the letter "W".
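Several functions from the list above (`nchar()`, `strsplit()`, `gsub()`, `sprintf()`) do not appear in that example. A minimal base R sketch of them follows; the sample sentence and variable names are made up for illustration:

```r
sentence <- "R makes text handling simple"

n_chars  <- nchar(sentence)                    # number of characters, spaces included
words    <- strsplit(sentence, " ")[[1]]       # strsplit returns a list, so take element 1
replaced <- gsub("simple", "easy", sentence)   # replace every match of the pattern
summary_text <- sprintf("%d words, %d characters", length(words), n_chars)

print(words)
print(replaced)
print(summary_text)   # "5 words, 28 characters"
```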
Experiment 7
LISTS
In R, a list is a data structure that can hold elements of different types such as vectors, matrices, data
frames, or even other lists. Lists are quite flexible and useful for organizing and storing heterogeneous
data. Here's an overview of how to work with lists in R:
### Creating Lists:
You can create a list using the `list()` function, specifying the elements you want to include:
```R
# Create a list
my_list <- list(name = "John", age = 30, grades = c(85, 90, 95))
```
### Accessing Elements:
You can access elements of a list using double brackets `[[ ]]` or by using the dollar sign `$` followed
by the element name:
```R
# Access elements
name <- my_list$name
grades <- my_list[["grades"]]
```
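Single brackets `[ ]` also work on lists, but they return a smaller list rather than the element itself. A short sketch of the difference, using a hypothetical `person` list so the block runs on its own:

```r
person <- list(name = "John", age = 30, grades = c(85, 90, 95))

as_list  <- person["grades"]     # single brackets: a list of length 1
as_value <- person[["grades"]]   # double brackets: the numeric vector itself

class(as_list)   # "list"
class(as_value)  # "numeric"
mean(as_value)   # 90
```

Functions such as `mean()` need the extracted vector, so `[[ ]]` or `$` is usually the right choice when you want to compute with an element.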
### Adding Elements:
You can add elements to a list using the list subsetting notation or by using the `append()` function:
```R
# Add new elements
my_list[["city"]] <- "New York"
my_list <- append(my_list, list(gender = "Male"))
```
### Length and Structure:
You can find the length of a list using the `length()` function and view its structure using the `str()`
function:
```R
# Length of list
length_of_list <- length(my_list)
# Structure of list
str(my_list)
```
### List Manipulation:
You can manipulate lists in various ways, such as extracting subsets, combining lists, or removing
elements:
```R
# Extracting subset of list
subset_list <- my_list[c("name", "age")]
# Combining lists
new_list <- list(occupation = "Engineer", hobbies = c("Reading", "Hiking"))
combined_list <- c(my_list, new_list)
# Removing elements
updated_list <- my_list[-2] # Removes the second element
```
### Nested Lists:
You can create nested lists, which are lists that contain other lists:
```R
# Nested lists
nested_list <- list(
  person1 = list(name = "Alice", age = 25),
  person2 = list(name = "Bob", age = 30)
)
# Accessing elements of nested list
nested_name <- nested_list$person1$name
```
Lists are powerful data structures in R, especially when dealing with complex and heterogeneous data.
They provide flexibility and ease of manipulation for various analytical tasks.
Experiment 8
ARRAYS IN R
In R, arrays are multidimensional data structures that can hold elements of the same data type. Arrays
can have one or more dimensions, and they are particularly useful for representing data in multiple
dimensions, such as matrices or higher-dimensional data. Here's an overview of how to work with
arrays in R:
### Creating Arrays:
You can create arrays using the `array()` function, specifying the data elements and dimensions:
```R
# Create a 3x3 array
my_array <- array(1:9, dim = c(3, 3))
```
You can also create arrays from vectors using the `dim()` function:
```R
# Create a 2x2x2 array from a vector
my_vector <- 1:8
dim(my_vector) <- c(2, 2, 2)
```
### Accessing Elements:
You can access elements of an array using indexing for each dimension:
```R
# Access elements
element <- my_array[1, 2] # Accesses element in the first row, second column
```
### Dimensions and Attributes:
You can find the dimensions of an array using the `dim()` function and view its structure using the
`str()` function:
```R
# Dimensions of array
dimensions <- dim(my_array)
# Structure of array
str(my_array)
```
### Array Manipulation:
You can manipulate arrays in various ways, such as reshaping, transposing, or combining them:
```R
# Reshaping array
reshaped_array <- array(1:12, dim = c(3, 4)) # Create a 3x4 array
dim(reshaped_array) <- c(2, 6) # Reshape to a 2x6 array
# Transposing array
transposed_array <- t(my_array)
# Combining arrays
combined_array <- cbind(my_array, my_array) # Combine arrays column-wise
```
### Array Operations:
You can perform various operations on arrays, such as element-wise arithmetic operations:
```R
# Element-wise arithmetic
result_array <- my_array * 2 # Multiply each element by 2
```
### Higher-Dimensional Arrays:
You can create arrays with more than two dimensions, which are often used in applications like image
processing or simulations:
```R
# Create a 3D array
my_3d_array <- array(1:24, dim = c(2, 3, 4))
```
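Indexing a higher-dimensional array takes one index per dimension, and leaving an index empty selects everything along that dimension. A minimal sketch follows; the name `cube` is made up for illustration:

```r
cube <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers

single_value <- cube[1, 2, 3]           # row 1, column 2, layer 3
first_layer  <- cube[, , 1]             # the whole first layer, a 2x3 matrix
layer_totals <- apply(cube, 3, sum)     # one sum per layer (third dimension)

single_value   # 15
dim(first_layer)   # 2 3
layer_totals   # 21 57 93 129
```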
Arrays are versatile data structures in R, particularly useful for handling multidimensional data sets.
They provide efficient storage and manipulation capabilities for complex data analysis tasks.
Experiment 9
R FACTORS
In R, a factor is a data type used for categorical variables. Factors are useful for representing data that
have a fixed number of unique values or levels, such as "male" and "female" for gender or "low",
"medium", and "high" for levels of education.
Here's an overview of working with factors in R:
### Creating Factors:
You can create factors using the `factor()` function:
```R
# Create a factor
gender <- factor(c("male", "female", "male", "female"))
```
By default, R will assign levels to the factor based on the unique values in the data. You can also
specify the levels explicitly:
```R
# Specify levels explicitly
education <- factor(c("low", "high", "medium"), levels = c("low", "medium", "high"))
```
### Viewing Factor Levels:
You can view the levels of a factor using the `levels()` function:
```R
# View levels
education_levels <- levels(education)
```
### Summary of Factor:
You can get a summary of the factor using the `summary()` function:
```R
# Summary of factor
summary(gender)
```
This will display the count of each level in the factor.
### Converting to Factor:
You can convert character vectors to factors using the `as.factor()` function:
```R
# Convert to factor
age_group <- c("young", "middle-aged", "senior", "young")
age_group_factor <- as.factor(age_group)
```
### Operations with Factors:
Factors can be used in various operations like subsetting, merging, or modeling.
### Changing Factor Levels:
You can change the levels of a factor using the `levels()` function:
```R
# Change factor levels (levels are stored alphabetically: "middle-aged", "senior", "young")
levels(age_group_factor) <- c("Middle", "Senior", "Junior")
```
### Removing Levels:
You can remove unused levels from a factor using the `droplevels()` function:
```R
# Remove unused levels
gender <- droplevels(gender)
```
This can be useful when subsetting data and you want to remove levels that are not present in the
subset.
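A sketch of that subsetting case, using a hypothetical `sizes` factor: the subset keeps the original levels until `droplevels()` is applied.

```r
sizes <- factor(c("small", "large", "medium", "small"),
                levels = c("small", "medium", "large"))

no_large <- sizes[sizes != "large"]   # subsetting keeps all three levels
levels(no_large)                      # "small" "medium" "large"
table(no_large)                       # "large" still appears, with count 0

trimmed <- droplevels(no_large)       # drop levels that no longer occur
levels(trimmed)                       # "small" "medium"
```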
Factors play an essential role in statistical modeling and data analysis in R, particularly for
representing categorical variables in a structured and efficient way.
Experiment 10
DATA FRAMES IN R
In R, a data frame is a two-dimensional data structure similar to a table or spreadsheet in which data is
organized into rows and columns. Data frames are one of the most commonly used data structures in
R for handling and manipulating structured data. Here's an overview of working with data frames in
R:
### Creating Data Frames:
You can create a data frame using the `data.frame()` function, specifying the columns as vectors:
```R
# Create a data frame
df <- data.frame(
  name = c("John", "Alice", "Bob"),
  age = c(30, 25, 35),
  gender = c("Male", "Female", "Male")
)
```
### Viewing Data Frames:
You can view the contents of a data frame using the `print()` function or simply by typing its name:
```R
# View data frame
print(df)
```
### Accessing Elements:
You can access elements of a data frame using row and column indexes or by column names:
```R
# Access elements
first_row <- df[1, ] # Access the first row
ages <- df$age # Access the "age" column
```
### Summary of Data Frame:
You can get a summary of the data frame using the `summary()` function:
```R
# Summary of data frame
summary(df)
```
This will provide a summary of each column in the data frame, including statistics such as mean,
median, min, max, and quartiles for numeric variables.
### Subsetting Data Frames:
You can subset data frames to extract specific rows or columns using indexing:
```R
# Subset data frame
subset_df <- df[df$age > 30, ] # Select rows where age is greater than 30
```
### Adding Columns:
You can add new columns to a data frame using assignment:
```R
# Add new column
df$profession <- c("Engineer", "Doctor", "Teacher")
```
### Removing Columns:
You can remove columns from a data frame using the `subset()` function:
```R
# Remove column
subset_df <- subset(df, select = -gender)
```
### Manipulating Data Frames:
You can perform various operations on data frames such as merging, sorting, and reshaping using
functions like `merge()`, `order()`, and `reshape2` package functions like `melt()` and `dcast()`.
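Merging and sorting can be sketched with base R alone. The data frames `people` and `cities` and their values are made up for illustration, and the reshape2 functions are left out so the block needs no extra packages:

```r
people <- data.frame(
  name = c("John", "Alice", "Bob"),
  age  = c(30, 25, 35)
)
cities <- data.frame(
  name = c("Alice", "Bob", "John"),
  city = c("Pune", "Indore", "Delhi")   # hypothetical values
)

merged <- merge(people, cities, by = "name")    # join on the shared "name" column
sorted <- merged[order(merged$age), ]           # ascending by age
oldest_first <- merged[order(-merged$age), ]    # descending by age

print(sorted)
```

`order()` returns row positions rather than sorted values, which is why it is used inside the row index of the data frame.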
### Working with Factors in Data Frames:
Factors are often used to represent categorical variables in data frames, as discussed in
Experiment 9.
Data frames are fundamental for data manipulation, exploration, and analysis in R, and mastering
their usage is essential for effective data handling and manipulation in R programming.