
Unit-1 (R Programming)

R is a programming language designed for statistical computing and data analysis, developed in the early 1990s, and is open-source with extensive package support. It is widely used across various industries for data science, finance, healthcare, and academia due to its comprehensive statistical tools and visualization capabilities. R has a rich history tied to the S programming language and has evolved significantly, with modern tools like RStudio and the Tidyverse enhancing its usability.

Uploaded by

tannutannu3849
Copyright
© All Rights Reserved

Unit-1

Introduction to R:-
R is a programming language designed for statistical computing, data analysis and visualization.
Developed in the early 1990s by Ross Ihaka and Robert Gentleman, it provides a flexible
environment for working primarily with structured (tabular) data; handling unstructured data
typically requires additional packages.
• Specifically built for statistical analysis and data modeling
• Open-source and freely available to everyone
• Supported by thousands of packages via the Comprehensive R Archive Network (CRAN)
• Widely used for data analysis and decision-making across industries
Why Choose R Programming:-
R is a unique language that offers a wide range of features for data analysis, making it an
essential tool for professionals in various fields. Here’s why R is preferred:
• Free and Open-Source: R is open to everyone, so users can modify, share and
distribute their work freely.
• Designed for Data: R is built for data analysis, offering a comprehensive set of tools for
statistical computing and graphics.
• Large Package Repository: The Comprehensive R Archive Network (CRAN) offers
thousands of add-on packages for specialized tasks.
• Cross-Platform Compatibility: R runs on Windows, macOS and Linux.
• Great for Visualization: With packages like ggplot2, R makes it easy to create informative,
publication-quality charts and plots; packages such as plotly add interactivity.

Key Features of R:-


• Cross-Platform Support: R works on multiple operating systems, making it versatile for
different environments.
• Interactive Development: R allows users to interactively experiment with data and see the
results immediately.
• Data Wrangling: Tools like dplyr and tidyr help simplify data cleaning and transformation.
• Statistical Modeling: R has built-in support for various statistical models like regression,
time-series analysis and clustering.
• Reproducible Research: With R Markdown, users can combine code, output and narrative
in one document, ensuring their analysis is reproducible.
Example Program in R:-
• We first create a vector, data, that contains numerical values.
• We use the mean() function to calculate the mean of the dataset.
• The sd() function calculates the sample standard deviation.

data <- c(5, 10, 15, 20, 25, 30, 35, 40, 45, 50)
mean_data <- mean(data)

print(paste("Mean:", mean_data))  # [1] "Mean: 27.5"

std_dev <- sd(data)

print(paste("Standard Deviation:", std_dev))


Applications of R:-
R is used in a variety of fields, including:
• Data Science and Machine Learning: R is widely used for data analysis, statistical
modeling and machine learning tasks.
• Finance: Financial analysts use R for quantitative modeling and risk analysis.
• Healthcare: In clinical research, R helps analyze medical data and test hypotheses.
• Academia: Researchers and statisticians use R for data analysis and publishing reproducible
research.
Advantages of R Programming:-
• Comprehensive Statistical Tools: R includes many statistical functions and models,
making it a strong choice for data analysis.
• Customizable Visualizations: R's visualization tools can be customized for anything from a
simple bar chart to a detailed heatmap.
• Extensive Community Support: R has a large user base, with countless resources,
forums and tutorials available.
• Highly Extendable: With over 15,000 packages available on CRAN, R's functionality can be
extended to suit almost any project or need.
Limitations of R Programming:-
• Can consume a lot of memory with very large datasets, since data is usually held in memory
• Slower execution speed for large-scale computations, especially with explicit loops
• Syntax may be challenging for beginners
• Error handling is less structured compared to some modern languages

History and Evolution of R:-

1. Roots in S (1970s)

The development of R is closely tied to the S programming language, created by John
Chambers and his colleagues at Bell Laboratories in 1976.
S laid the foundation for modern statistical computing and influenced R’s design and syntax.
2. Origins of R (Early 1990s)

R was developed in 1991 by Ross Ihaka and Robert Gentleman at the University of Auckland.
The name "R" reflects both:
• A play on the name of the language S
• The first letter of its creators' first names (Ross and Robert)

3. Public Release (1993–1995)

• 1993: First announced publicly via StatLib.
• 1995: With encouragement from Martin Mächler, R was released under the GNU
General Public License.
This made R free and open-source software, accelerating its adoption.

4. Infrastructure Development (1997)

• Formation of the R Core Team to maintain and develop R.
• Creation of the Comprehensive R Archive Network (CRAN), a central repository for R
packages and distributions.

5. Stable Release (2000)

• R version 1.0.0 was officially released on February 29, 2000, marking R as a mature
and stable platform for statistical computing.

6. Modern Era (2011–Present)

• 2011: Launch of RStudio, a powerful integrated development environment that made R
more user-friendly.
• 2016: Introduction of the Tidyverse, a set of packages that simplified data manipulation,
visualization, and analysis.

Today, R is widely used in:

• Data science
• Machine learning
• Academic research
• Business analytics

R and RStudio Setup:-

1. Install R (first, always)

R is the core engine—you must install it before RStudio.


1. Go to the official site:
👉 [Link]
2. Choose your OS (Windows / macOS / Linux)
3. Download and install the latest version
4. Use default settings unless you have specific needs

💻 2. Install RStudio

RStudio is the interface that makes R easier to use.

1. Go to:
👉 [Link]
2. Download RStudio Desktop (free version)
3. Install it like any normal software

🚀 3. Verify Installation

1. Open RStudio
2. You should see:
o Console (bottom left)
o Script editor (top left)
3. Test with this command:

print("Hello World")

If it runs without error, you’re set 👍

📦 4. Install Useful Packages

Run this in the console:

install.packages("tidyverse")
install.packages("ggplot2") # already included in tidyverse; needed only for a standalone install

Load them:

library(tidyverse)

⚙️5. Optional Setup Tips

• Set CRAN mirror (RStudio usually prompts you)
• Change theme:
Tools → Global Options → Appearance
• Enable autosave for scripts
🛠️ Common Issues

• R not detected in RStudio → reinstall R first, then RStudio
• Permission errors → run installer as admin
• Package install fails → check your internet connection or CRAN mirror

Basic syntax in R:-

x <- 10

name <- "John"
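A slightly fuller sketch of basic R syntax using only base R; the variable names are illustrative. It shows comments (#), assignment with <-, a vector, function calls and printing:

```r
# Comments start with # and run to the end of the line
x <- 10               # numeric value
name <- "John"        # character string
scores <- c(8, 9, 7)  # c() combines values into a vector

# Calling built-in functions
total <- sum(scores) + x
message_text <- paste("Hello,", name)

print(total)          # [1] 34
print(message_text)   # [1] "Hello, John"
cat("Total for", name, "is", total, "\n")
```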

Running R Programs:-

1. Run Code in Console (Quick Way)

You can directly type commands in the Console:

print("Hello World")

Press Enter → it runs immediately.

2. Run a Script File (Recommended)

Step 1: Create a script

• In RStudio: File → New File → R Script

Step 2: Write code


x <- 5
y <- 10
print(x + y)
Step 3: Run code

You have 3 options:

• Click Run button (top right of script editor)
• Press Ctrl + Enter (runs the current line or selection; Cmd + Enter on macOS)
• Select multiple lines → Ctrl + Enter

3. Run Entire Script

To run the whole file:


• Click Source button
OR
• Press Ctrl + Shift + Enter (Cmd + Shift + Enter on macOS)

4. Run an External .R File

If you already have a file:

source("file_name.R")

Example:

source("my_script.R")
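A self-contained sketch of source(): it writes a tiny script to a temporary file (the file contents and the name script_result are made up for illustration) and then runs it. In practice you would source an existing .R file from your working directory.

```r
# Write a small script to a temporary .R file
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "a <- 5",
  "b <- 10",
  "script_result <- a + b"
), script_path)

# Run every line of the file; the variables it creates appear in the workspace
source(script_path)
print(script_result)  # [1] 15

# echo = TRUE also prints each expression as it runs
source(script_path, echo = TRUE)

unlink(script_path)   # delete the temporary file
```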

5. Set Working Directory

R resolves relative file paths, such as the one passed to source(), against the current working directory:

getwd() # check current folder


setwd("C:/Users/YourName/Documents")

Or in RStudio:

• Session → Set Working Directory → Choose Directory

6. Example Full Program


# Simple program
numbers <- c(1, 2, 3, 4, 5)

result <- sum(numbers)

print(result)

Run it using Source or Ctrl + Enter

Data Types in R:-



Data types in R specify the type of information stored in a variable and determine how it
behaves during calculations and analysis. They define how data is represented in memory and
how functions interpret it. R is a dynamically typed language, so the data type is assigned
automatically when a value is created.
• Choosing the correct data type improves program performance and memory efficiency.
• Proper data types ensure accurate mathematical, logical and statistical computations.
• Selecting the right data type simplifies data processing and improves overall code clarity.

1. Numeric Data Type:-

In R, numbers such as 5.6, and also whole numbers such as 5 written without an L suffix, are stored
as numeric. This is the default data type for numbers and it is used to store real values for
calculations. Numeric values are stored using double-precision floating-point format, which
represents decimal numbers accurately to about 15 significant digits.

# Numeric variable with decimal
x <- 5.6
print(class(x))  # [1] "numeric"
print(typeof(x)) # [1] "double"

# Integer-like number without decimal
y <- 5
print(class(y))  # [1] "numeric"
print(typeof(y)) # [1] "double"

# Check if y is an integer
print(is.integer(y)) # [1] FALSE

2. Integer Data Type

The integer data type is used for storing numbers without decimal points. Integers can be created
using the as.integer() function or by adding the suffix L to a number. This explicitly tells R to
store the value as an integer rather than the default double type.
• Integer values in R are stored as 32-bit signed integers, ranging from -2147483647 to 2147483647,
that is ±(2^31 - 1); the remaining bit pattern, -2^31, is reserved for NA.
• Integers are used when exact whole numbers are required, such as counts, indexes or
categorical numeric codes.
# Creating integer using as.integer()
x <- as.integer(5)
print(class(x))  # [1] "integer"
print(typeof(x)) # [1] "integer"
# Creating integer using L suffix
y <- 5L
print(class(y))  # [1] "integer"
print(typeof(y)) # [1] "integer"
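A short sketch of the integer range in practice, using base R's .Machine$integer.max. Arithmetic that goes past the limit does not wrap around: R returns NA with a warning, while the same calculation done with a double works.

```r
big <- .Machine$integer.max
print(big)            # [1] 2147483647
print(class(big))     # [1] "integer"

# Integer overflow gives NA (with a warning) instead of wrapping around
overflow <- suppressWarnings(big + 1L)
print(overflow)       # [1] NA

# Adding a double (1 instead of 1L) avoids the overflow
print(big + 1)        # [1] 2147483648

# as.integer() truncates the decimal part
print(as.integer(7.9)) # [1] 7
```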

3. Logical Data Type

Logical data types in R represent Boolean values as TRUE or FALSE. Logical values are often
created using comparisons between variables or by directly assigning Boolean values. This data
type is used in decision-making, conditional statements and filtering data.
# Creating a logical value using comparison
x <- 4
y <- 3
z <- x > y
print(z)         # [1] TRUE
print(class(z))  # [1] "logical"
print(typeof(z)) # [1] "logical"

# Creating a logical value using direct assignment
logi <- FALSE
print(class(logi))  # [1] "logical"
print(typeof(logi)) # [1] "logical"
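Because TRUE and FALSE behave as 1 and 0 in arithmetic, a logical vector can be counted or averaged directly. A minimal sketch with made-up marks:

```r
marks <- c(45, 72, 88, 30, 61)
passed <- marks >= 50             # logical vector
print(passed)                     # [1] FALSE  TRUE  TRUE FALSE  TRUE
print(sum(passed))                # [1] 3   (number of TRUE values)
print(mean(passed))               # [1] 0.6 (proportion of TRUE values)
print(as.numeric(c(TRUE, FALSE))) # [1] 1 0
```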

4. Complex Data Type

Complex data types are used to store numbers with both real and imaginary components. The
imaginary part is denoted using the suffix i. Complex numbers are useful in scientific
computations, signal processing and mathematical modeling where imaginary numbers are
required.

# Creating a complex number
x <- 4 + 3i
print(class(x))  # [1] "complex"
print(typeof(x)) # [1] "complex"
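Base R provides helper functions for the parts of a complex number. A small sketch using the same value:

```r
x <- 4 + 3i
print(Re(x))    # [1] 4    (real part)
print(Im(x))    # [1] 3    (imaginary part)
print(Mod(x))   # [1] 5    (modulus: sqrt(4^2 + 3^2))
print(Conj(x))  # [1] 4-3i (complex conjugate)

# sqrt(-1) on a plain number gives NaN; converting to complex first works
print(sqrt(as.complex(-1)))  # [1] 0+1i
```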
5. Character Data Type in R

Character data types are used to store text, including letters, digits and special symbols.
Character values, also called strings, are enclosed in single (') or double (") quotes. This data
type is commonly used for names, labels, messages and textual information in datasets.

# Creating a character variable
char <- "Geeksforgeeks"
print(class(char))  # [1] "character"
print(typeof(char)) # [1] "character"
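A few base R string functions, shown on the same value, illustrate how character data is typically handled:

```r
char <- "Geeksforgeeks"
print(nchar(char))           # [1] 13 (number of characters)
print(toupper(char))         # [1] "GEEKSFORGEEKS"
print(substr(char, 1, 5))    # [1] "Geeks"
print(paste("Hello", char))  # [1] "Hello Geeksforgeeks"
print(paste0(char, "!"))     # [1] "Geeksforgeeks!"

# A number in quotes is text, not a number
print(class("42"))           # [1] "character"
```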

6. Raw Data Type in R

The raw data type in R is used to store and manipulate data at the byte level. It represents
unprocessed binary values, making it useful for low-level operations such as working with files,
network data or binary protocols. Raw vectors consist of elements in the range 00 to FF
(hexadecimal notation).

# Creating a raw vector
x <- as.raw(c(0x1, 0x2, 0x3, 0x4, 0x5))
print(x) # [1] 01 02 03 04 05

Variables:-
A variable is a name that refers to a value or object in R. It lets you store and manipulate data,
and its name lets you access the stored value later. It acts as an identifier for a
memory block, which can hold values of different data types during the program’s execution.
• Variables store values in memory that can be accessed or updated later.
• R variables are dynamically typed: their type is determined by the assigned value.
• The assignment operator <- is the standard way to assign values, though = can also be used.
How to Create Variables in R
In R a variable is created by assigning a value to a name. R supports three ways to assign values
to variables:
1. Using the Equal Operator (=)

While = can be used for assignment, <- is the preferred and widely used operator in R as it
clearly indicates assignment and avoids confusion with other uses.
# Assign a string using equal operator
var1 = "Hello Geeks"
print(var1)

Output

[1] "Hello Geeks"

2. Using the Leftward Operator (<-)

The leftward operator <- assigns a value from the right side to the variable on the left. It is
widely used in R because it makes the direction of assignment clear and helps distinguish
assignment from comparison.
# Assign a string using leftward operator
var2 <- "Ready to code"
print(var2)

Output

[1] "Ready to code"

3. Using the Rightward Operator (->)

The rightward operator -> assigns a value from the left side to the variable on the right. It works
the same way as <-, but the direction of assignment is reversed.
# Assign a string using rightward operator
"Byte-by-Byte" -> var3
print(var3)

Output

[1] "Byte-by-Byte"
Rules for Naming Variables:-

1. Start with a letter or a dot (.); a leading dot must not be followed by a digit

✔️ x, name, .value
❌ 1name, .2value

2. Use only letters, numbers, _ or .

✔️marks_1, [Link]
❌ marks-1, total@score

3. Don’t use spaces

✔️student_name
❌ student name

4. R is case-sensitive

age ≠ Age ≠ AGE

5. Don’t use reserved words

❌ if, else, for, while, function, TRUE, FALSE, NULL, NA

Example:- age <- 20

student_name <- "Rahul"

total_marks <- 95
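Base R's make.names() applies these rules: it turns arbitrary strings into syntactically valid names, which is a quick way to see why a name is rejected. A sketch with illustrative candidate names:

```r
candidates <- c("1name", "student name", "marks-1", "if", "total_marks")

# Invalid starts get an "X" prefix, invalid characters become ".",
# and reserved words get a "." appended
fixed <- make.names(candidates)
print(fixed)
# returns "X1name" "student.name" "marks.1" "if." "total_marks"

# A name is valid when make.names() leaves it unchanged
print(candidates == fixed)  # only "total_marks" is TRUE
```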

Scope of Variables in R programming:-


The scope of a variable determines where it can be accessed or used in a program. Understanding
variable scope helps prevent errors and manage data effectively. There are mainly two types of
variable scope:

1. Global Variables

Global variables are defined outside any function (in the global environment) and can be read
from anywhere in the program. A plain <- inside a function creates a new local variable instead
of changing the global one; modifying a global variable from inside a function requires the
superassignment operator <<-.
• Remain in memory for the whole R session, which may increase memory usage.
• Can cause naming conflicts if multiple parts of the program use the same name.
• Exist until the session ends or the variable is deleted with rm().
global <- 5

display <- function() {
  global <- 20  # creates a local variable that shadows the global one
  print(global) # [1] 20
}

display()
print(global)   # [1] 5 (the global value is unchanged)
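To change a global variable from inside a function, R provides the superassignment operator <<-. A minimal sketch; the names counter and increment are illustrative:

```r
counter <- 0

increment <- function() {
  counter <<- counter + 1  # updates the global 'counter' instead of creating a local copy
  invisible(counter)
}

increment()
increment()
print(counter)  # [1] 2
```

Because <<- silently changes state outside the function, returning a value and assigning it with <- is usually the clearer choice.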

2. Local Variables

Local variables are created inside a function and can only be used within that function. They
exist only while the function is running and are removed from memory once the function finishes.
Note that in R only functions create a new local scope: variables assigned inside if, for or while
blocks remain visible after the block ends.
• Defined inside a function and accessible only within that function.
• Exist only during the function’s execution and are destroyed after the function ends.
• Help avoid conflicts with variables in other parts of the program.
• Use memory only when needed and release it afterward.

my_function <- function() {
  local_var <- 10  # This is a local variable
  print(local_var) # [1] 10
}

my_function()
print(local_var)   # fails with an error: object 'local_var' not found
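The sketch below shows that braces used with if or for do not create a new scope, while a function does. exists() is used to check visibility without stopping on an error; all names are illustrative.

```r
if (TRUE) {
  inside_if <- "visible"  # assigned in the current (global) environment
}
print(exists("inside_if")) # [1] TRUE

for (i in 1:3) {
  loop_total <- i * 10
}
print(loop_total)          # [1] 30 (variables set in the loop survive it)
print(i)                   # [1] 3  (so does the loop variable)

scoped <- function() {
  hidden <- 1
  exists("hidden")         # TRUE inside the function
}
print(scoped())            # [1] TRUE
print(exists("hidden"))    # FALSE, unless 'hidden' also exists globally
```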

Important Methods for R Variables


R provides several built-in functions to work with variables. Understanding these functions
makes managing variables easier, especially in large programs.

1. class() Function

This built-in function returns the class of an object, which for basic values matches its data
type (typeof() gives the internal storage type). The R variable to be checked is passed to it as an argument.
Syntax:
class(variable)
Example:
var1 <- "HI Geeks 001"
print(class(var1))

Output

[1] "character"

2. ls() function

This built-in function lists the names of all variables in the current workspace. This is
generally helpful when dealing with a large number of variables at once and helps prevent
overwriting any of them.
Syntax:
ls()
Example:
var1 <- "hello"
var2 <- 20
var3 <- TRUE

print(ls())

Output

[1] "var1" "var2" "var3"

3. rm() function

rm() is a built-in function used to delete an unwanted variable from your workspace. This frees
the memory used by variables that are no longer needed. The name of the variable to be deleted
is passed as an argument to it.
Syntax:
rm(variable)
Example:
var3 <- "hello"
# Remove var3
rm(var3)
print(var3)
Output:
Error: object 'var3' not found
4. exists() Function

The exists() function checks whether a variable exists in the workspace. It returns TRUE if the
variable exists, otherwise FALSE.
Syntax:
exists("variable_name")
Example:
var1 <- 10
print(exists("var1"))
print(exists("varX"))

Output

[1] TRUE
[1] FALSE
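The output of ls() depends on everything already in your workspace, so the sketch below uses a separate environment created with new.env() to keep results predictable; the environment name work is illustrative. It combines ls(), rm() and exists() through their envir arguments.

```r
work <- new.env()  # an empty, separate workspace
assign("var1", "hello", envir = work)
assign("var2", 20, envir = work)
assign("var3", TRUE, envir = work)

print(ls(envir = work))  # [1] "var1" "var2" "var3"

rm("var3", envir = work) # remove one variable by name
print(exists("var3", envir = work, inherits = FALSE))  # [1] FALSE

rm(list = ls(envir = work), envir = work)  # remove everything in 'work'
print(length(ls(envir = work)))            # [1] 0
```

In the global workspace the same idiom, rm(list = ls()), deletes every variable, so it should be used with care.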

R Operators:-

Operators in R are symbols that perform operations on variables and values (operands). They
allow you to carry out mathematical calculations, logical comparisons, assignments and other
operations efficiently.

Arithmetic Operators
Arithmetic operators perform mathematical operations on numeric values or vectors. In R, these
operations are applied element-wise when working with vectors.

1. Addition (+)

The values at the corresponding positions of both operands are added.


a <- c(1, 0.1)
b <- c(2.33, 4)
print(a + b)

Output

[1] 3.33 4.10


2. Subtraction (-)

The second operand values are subtracted from the first.


a <- 6
b <- 8.4
print(a - b)

Output

[1] -2.4
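3. Multiplication (*)

The values at the corresponding positions of both operands are multiplied. A short sketch in the same style as the other examples, with illustrative values:

```r
a <- c(2, 0.5)
b <- c(3, 8)
print(a * b)   # [1] 6 4

# A single number multiplies every element
print(a * 10)  # [1] 20  5
```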

4. Division (/)

The first operand is divided by the second operand with the use of the '/' operator.
a <- 10
b <- 5
print(a / b)

Output

[1] 2

5. Power (^)

The first operand is raised to the power of the second operand.


a <- 4
b <- 5
print(a^b)

Output

[1] 1024

6. Modulo (%%)

It returns the remainder after dividing the first operand by the second operand.
a <- c(2, 22)
b <- c(2, 4)
print(a %% b)

Output

[1] 0 2

Logical Operators
Logical operators in R perform element-wise Boolean operations on their operands, and each result
is either TRUE or FALSE. Any non-zero number, whether integer, real or complex, is treated as TRUE,
and zero is treated as FALSE.

1. Element-wise AND (&)

Returns TRUE at each position where both operands are TRUE.


a <- c(TRUE, 0.1)
b <- c(0,4+3i)
print(a & b)

Output

[1] FALSE TRUE

2. Element-wise OR (|)

Returns TRUE at each position where at least one operand is TRUE.


a <- c(TRUE, 0.1)
b <- c(0,4+3i)
print(a|b)

Output

[1] TRUE TRUE

3. NOT (!)

A unary operator that negates the status of the elements of the operand.
a <- c(0,FALSE)
print(!a)
Output

[1] TRUE TRUE

4. Short-circuit AND (&&)

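Works on single values: returns TRUE only if both are TRUE, and evaluates the right operand only
when the left one is TRUE. Since R 4.3.0, using && on a vector longer than one is an error, which
is why the example compares only the first elements.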
a <- c(TRUE, 0.1)
b <- c(0,4+3i)
print(a[1] && b[1])

Output

[1] FALSE

5. Short-circuit OR (||)

Works on single values: returns TRUE if either is TRUE, and evaluates the right operand only when
the left one is FALSE. Like &&, it requires length-one operands.


a <- c(TRUE, 0.1)
b <- c(0,4+3i)
print(a[1]||b[1])

Output

[1] TRUE
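The practical difference between & and && is short-circuiting: && and || stop as soon as the answer is known, so the right-hand side may never run. A small sketch; stop() would raise an error if it were ever evaluated:

```r
print(FALSE && stop("never evaluated"))  # [1] FALSE
print(TRUE || stop("never evaluated"))   # [1] TRUE

# Common use: guard a check that would fail on missing input
x <- NULL
print(!is.null(x) && x > 0)              # [1] FALSE

# & and | are vectorized and always evaluate both sides
print(c(TRUE, FALSE) & c(TRUE, TRUE))    # [1]  TRUE FALSE
```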

Relational Operators
The relational operators in R compare the corresponding elements of the operands. Each comparison
returns TRUE if the first operand satisfies the relation with the second, otherwise FALSE. In
numeric comparisons, TRUE is treated as 1 and FALSE as 0. When a vector mixes types, R first
coerces all elements to a common type: in the examples below the "apple" and "bat" strings turn
every element into character, so values are compared as text (alphabetically, which can depend
on the locale).

1. Less than (<)

Returns TRUE if the corresponding element of the first operand is less than that of the second
operand. Else returns FALSE.
a <- c(TRUE, 0.1,"apple")
b <- c(0,0.1,"bat")
print(a<b)

Output

[1] FALSE FALSE TRUE

2. Less than or equal to (<=)

Returns TRUE if the corresponding element of the first operand is less than or equal to that of
the second operand. Else returns FALSE.
a <- c(TRUE, 0.1, "apple")
b <- c(TRUE, 0.1, "bat")

# The vectors are already character after coercion; as.character() makes that explicit
c <- as.character(a)
d <- as.character(b)

print(c <= d)

Output

[1] TRUE TRUE TRUE

3. Greater than (>)

Returns TRUE if the corresponding element of the first operand is greater than that of the second
operand. Else returns FALSE.
a <- c(TRUE, 0.1, "apple")
b <- c(TRUE, 0.1, "bat")
print(a > b)

Output

[1] FALSE FALSE FALSE

4. Greater than or equal to (>=)

Returns TRUE if the corresponding element of the first operand is greater or equal to that of the
second operand. Else returns FALSE.
a <- c(TRUE, 0.1, "apple")
b <- c(TRUE, 0.1, "bat")
print(a >= b)

Output

[1] TRUE TRUE FALSE

5. Not equal to (!=)

Returns TRUE if the corresponding element of the first operand is not equal to the second
operand. Else returns FALSE.
When different data types are combined in a vector, R performs type coercion by converting all
elements to a common type (usually the most flexible type, such as character).
a <- c(TRUE, 0.1,'apple')
b <- c(0,0.1,"bat")
print(a!=b)

Output

[1] TRUE FALSE TRUE
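The examples above mix types, so every element is compared as text. With plain numeric vectors the same operators compare values, and == tests equality. A sketch with illustrative vectors, including why numbers stored as text and floating-point results need care:

```r
x <- c(1, 5, 10)
y <- c(2, 5, 8)

print(x < y)    # [1]  TRUE FALSE FALSE
print(x == y)   # [1] FALSE  TRUE FALSE
print(x >= y)   # [1] FALSE  TRUE  TRUE

# Text is compared character by character, not by numeric value
print("10" < "9")  # [1] TRUE
print(10 < 9)      # [1] FALSE

# Floating-point results are safer to compare with a tolerance
print(0.1 + 0.2 == 0.3)                  # [1] FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3))) # [1] TRUE
```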

Assignment Operators
Assignment operators in R are used to assign values to data objects such as integers, vectors or
functions. The values can then be accessed through the assigned variable names.

1. Left Assignment (<- , <<- , =)

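Assigns the value on the right to the name on the left. Inside a function, <<- assigns in an
enclosing environment (usually the global one) instead of creating a local variable.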


vec1 = c("ab", TRUE)
print (vec1)

Output

[1] "ab" "TRUE"

2. Right Assignment (-> , ->>)

Assigns the value on the left to the name on the right; ->> is the superassignment counterpart of <<-.


c("ab", TRUE) ->> vec1
print (vec1)

Output

[1] "ab" "TRUE"

Miscellaneous Operators
Miscellaneous operators in R are special-purpose operators used for tasks such as membership
checking (%in%) and matrix multiplication (%*%).

1. %in% Operator

Checks whether each value on the left is present in the vector on the right, returning TRUE if it
is and FALSE otherwise. Here a is a character vector, so 0.1 is matched as the string "0.1".
val <- 0.1
a <- c(TRUE, 0.1,"apple")
print (val %in% a)

Output

[1] TRUE

2. %*% Operator (Matrix Multiplication)

The %*% operator performs matrix multiplication.
• The number of columns of the first matrix must equal the number of rows of the second.
• If A is (m × n) and B is (n × p), the result is (m × p).
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(mat)
print(t(mat))
pro <- mat %*% t(mat)  # (2 × 3) times (3 × 2) gives a 2 × 2 result
print(pro)
#      [,1] [,2]
# [1,]   35   44
# [2,]   44   56

Subsetting in R Programming:-

In the R programming language, subsetting allows the user to access elements of an object by
extracting a portion of it based on a position, name or condition. There are four main ways of
subsetting in R, and the right one depends on the task and the type of object. For example, if a
data frame has columns such as state, country and population and the user wants to extract only
the states, subsetting is used to do this. The sections below discuss each type of subsetting in R.

🔹 1. Definition

Subsetting is the process of extracting or selecting a portion of data from a larger data structure
such as a vector, matrix, list, or data frame. It allows the user to access specific elements, rows,
columns, or components based on position, logical conditions, or names.

🔹 2. Importance of Subsetting

Subsetting is a fundamental concept in R because:

• It helps in data analysis and manipulation
• It allows filtering of data
• It is used in data cleaning and transformation
• It supports efficient handling of large datasets

🔹 3. Data Structures Used in Subsetting

Subsetting can be applied to:

• Vectors (1-dimensional)
• Matrices (2-dimensional)
• Arrays (multi-dimensional)
• Lists (heterogeneous elements)
• Data Frames (tabular data)

🔹 4. Operators Used in Subsetting

📌 (1) Square Brackets [ ]

• Used for general subsetting
• Usually returns the same type of object (a vector from a vector, a list from a list)

📌 (2) Double Brackets [[ ]]

• Used to extract a single element
• Mainly used for lists

📌 (3) Dollar $

• Used to access elements by name
• Works with lists and data frames
🔹 5. Methods of Subsetting

🔸 (A) Subsetting by Position (Indexing)

📘 Theory

In this method, elements are selected using their index (position).

• Index in R starts from 1
• Can select single or multiple elements

📌 Types:

1. Positive indexing → selects elements
2. Negative indexing → excludes elements

📌 Example
x <- c(10, 20, 30, 40)

x[2] # selects 20
x[c(1,3)] # selects 10 and 30
x[-2] # removes 20

🔸 (B) Subsetting by Logical Conditions

📘 Theory

In this method, elements are selected based on TRUE or FALSE conditions.

• Logical expressions return TRUE/FALSE
• Only elements at TRUE positions are selected

📌 Example
x <- c(5, 10, 15, 20)

x > 10 # FALSE FALSE TRUE TRUE


x[x > 10] # returns 15 and 20

👉 This method is widely used in filtering datasets.

🔸 (C) Subsetting by Names


📘 Theory

Elements can be accessed using their assigned names instead of position.

📌 Example
x <- c(a=10, b=20, c=30)

x["b"] # returns 20

👉 Useful when working with labeled data.

🔸 (D) Subsetting Using Functions

📌 1. subset() Function

📘 Theory

Used to extract subsets of data frames based on conditions.

subset(data, condition, select)

📌 Example
df <- data.frame(x = 1:5, y = 6:10)

subset(df, x > 2)
subset(df, x > 2, select = y)

📌 2. which() Function

📘 Theory

Returns the index positions of TRUE values.

which(x > 10)

📌 3. %in% Operator

📘 Theory

Used to match elements with a given set.

x[x %in% c(10, 30)]


📌 4. is.na() Function

📘 Theory

Used to detect missing values (NA); negating it with ! keeps only the non-missing elements.

x[!is.na(x)]
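The helper functions above can be combined on one vector. A sketch with an illustrative vector that contains a missing value:

```r
x <- c(5, 10, NA, 20, 30)

print(which(x > 10))        # [1] 4 5 (positions; NA is skipped)
print(x[x %in% c(10, 30)])  # [1] 10 30
print(x[!is.na(x)])         # [1]  5 10 20 30

# Plain logical indexing keeps an NA for the unknown comparison; which() does not
print(x[x > 10])            # [1] NA 20 30
print(x[which(x > 10)])     # [1] 20 30
```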

🔹 6. Subsetting Different Data Structures

🔸 (1) Vector

• Uses a single index (or vector of indices) inside [ ]

x[1]
x[x > 5]

🔸 (2) Matrix

📘 Theory

Matrices use row and column indexing

m[row, column]

📌 Example
m[1,2] # element
m[ ,2] # column
m[2, ] # row
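A runnable version of the matrix indexing above, with an illustrative matrix m, plus the drop = FALSE option:

```r
m <- matrix(1:6, nrow = 2)  # filled column by column
print(m)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

print(m[1, 2])  # [1] 3     (single element)
print(m[, 2])   # [1] 3 4   (column 2, returned as a vector)
print(m[2, ])   # [1] 2 4 6 (row 2)

# drop = FALSE keeps the result as a one-column matrix
col2 <- m[, 2, drop = FALSE]
print(dim(col2)) # [1] 2 1
```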

🔸 (3) List

📘 Theory

Lists require special operators:

• [ → returns a sublist
• [[ → returns a single element
• $ → extracts an element by name

lst[[1]]
lst$name
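A short sketch of the three list operators on an illustrative list, showing that [ returns a list while [[ and $ return the element itself:

```r
lst <- list(name = "Asha", marks = c(80, 90))

print(class(lst[1]))     # [1] "list"      (sublist)
print(class(lst[[1]]))   # [1] "character" (the element itself)
print(lst[[1]])          # [1] "Asha"
print(lst$marks)         # [1] 80 90
print(lst[["marks"]][2]) # [1] 90
```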
🔸 (4) Data Frame

📘 Theory

Data frame is a table (rows + columns)

df[row, column]

📌 Example
df[1,2]
df[df$x > 2, ]
df$y
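Using the same small data frame as the subset() example, the sketch below shows that bracket indexing and subset() select the same rows:

```r
df <- data.frame(x = 1:5, y = 6:10)

print(df[1, 2])  # [1] 6
print(df$y)      # [1]  6  7  8  9 10

rows_bracket <- df[df$x > 2, ]
rows_subset  <- subset(df, x > 2)
print(identical(rows_bracket$y, rows_subset$y))  # [1] TRUE

# Selecting one column with drop = FALSE keeps a data frame
print(class(df[, "y", drop = FALSE]))            # [1] "data.frame"
```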

🔹 7. Important Rules

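• Indexing starts from 1
• Positive and negative indices cannot be mixed in the same subscript
• Logical index vectors should match the object's length; shorter ones are recycled, which is rarely intended
• drop = FALSE preserves the matrix structure when selecting a single row or column
• [[ ]] extracts a single value, while [ ] keeps the structure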

🔹 8. Advantages of Subsetting

• Efficient data handling
• Flexible data selection
• Essential for data analysis
• Supports conditional filtering

Vectorized Operations in R

🔹 1. Definition

Vectorized operations refer to performing operations on an entire vector or array at once, rather
than processing each element individually using loops.

In R, most operators and functions are vectorized by default, meaning they automatically apply
to all elements of a vector.

🔹 2. Key Characteristics
📌 (1) High Performance

Vectorized operations are much faster than loops because:

• Computation is handled internally in optimized C/Fortran code
• Avoids repeated interpretation by R

📌 (2) Conciseness

• Eliminates long for loops
• Makes code shorter and easier to read

👉 Example:

x + y # instead of loop

📌 (3) Recycling Principle

When vectors have different lengths:

• The shorter vector is repeated (recycled) to match the length of the longer one

x <- c(1,2,3,4)
y <- c(10,20)

x + y # 11 22 13 24

⚠️ If the longer length is not a multiple of the shorter length, R gives a warning but still returns the recycled result.
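A sketch of the warning case. tryCatch() captures the warning message so it can be inspected instead of printed:

```r
x <- 1:5
y <- 1:2

# The operation still runs, recycling y as 1 2 1 2 1
result <- suppressWarnings(x + y)
print(result)  # [1] 2 4 4 6 6

# Capture the warning message
msg <- tryCatch(x + y, warning = function(w) conditionMessage(w))
print(msg)
# [1] "longer object length is not a multiple of shorter object length"
```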

🔹 3. Types of Vectorized Operations

🔸 (A) Arithmetic Operations

Operators work element-wise:

x <- c(1, 2, 3)
y <- c(10, 20, 30)

x + y # 11 22 33
x * y # 10 40 90
🔸 (B) Logical Comparisons

Return a logical vector (TRUE/FALSE):

ages <- c(15, 25, 30, 10)

ages > 18
# FALSE TRUE TRUE FALSE

🔸 (C) Mathematical Functions

Functions operate on each element:

x <- c(1, 4, 9)

sqrt(x) # 1 2 3
log(x)
exp(x)

🔸 (D) Conditional Operations (ifelse())

Vectorized alternative to if-else:

marks <- c(40, 60, 80)

result <- ifelse(marks >= 50, "Pass", "Fail")

👉 Output: "Fail" "Pass" "Pass"

🔹 4. Scalar and Vector Operations

Scalars are automatically applied to all elements:

radii <- c(1, 2, 3)

2 * pi * radii

👉 Here, 2 and pi are recycled across all elements.

🔹 5. Comparison with Loops


❌ Using Loop
result <- numeric(3)
for (i in 1:3) {
  result[i] <- x[i] + y[i]
}
✅ Vectorized Approach
x + y

👉 Vectorized code is:

• Faster
• Simpler
• Less error-prone
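A quick check that the loop and the vectorized version give the same result, using the x and y from the arithmetic example. The loop uses seq_along() so it adapts to the vector length:

```r
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Loop version: preallocate, then fill one element at a time
loop_result <- numeric(length(x))
for (i in seq_along(x)) {
  loop_result[i] <- x[i] + y[i]
}

# Vectorized version: one expression
vec_result <- x + y

print(loop_result)                        # [1] 11 22 33
print(identical(loop_result, vec_result)) # [1] TRUE
```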

🔹 6. Advantages

• Efficient computation
• Cleaner and shorter code
• Better readability
• Core feature of R programming

🔹 7. Important Points

• Operations are element-wise
• Works on vectors, matrices, and data frames
• Recycling rule applies
• Avoids explicit loops

🔹 8. Conclusion

Vectorized operations are a fundamental feature of R that enable efficient and concise data
processing. By operating on entire data structures at once, they improve performance and
simplify coding, making them essential for statistical computing and data analysis.

NA and NULL Values in R

🔹 1. Introduction
In data analysis, datasets often contain incomplete or missing information. R provides special
objects to represent such situations:

• NA (Not Available) → represents missing or undefined data
• NULL → represents absence of any value or object

These two are fundamentally different and are used in different contexts during data handling
and computation.

🔷 2. NA (Not Available)

🔹 2.1 Definition

NA (Not Available) is a special value used to denote missing or unknown data in R.

👉 It indicates that:

A value exists in the dataset, but it is not currently available.

🔹 2.2 Nature of NA

• NA is a logical constant, but it can be coerced into other types
• It occupies one position in a vector
• It propagates through operations (i.e., results remain NA)

🔹 2.3 Types of NA

R provides typed missing values to maintain consistency:

• NA_integer_ → integer missing value
• NA_real_ → numeric missing value
• NA_character_ → character missing value
• NA_complex_ → complex missing value

👉 This ensures type safety in computations.

🔹 2.4 Properties of NA

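1. Length: NA has length 1

2. Propagation: Most operations involving NA return NA; the exception is when the result does not
depend on the missing value (for example, NA & FALSE is FALSE)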

3. Comparison:

NA == NA # returns NA, not TRUE


👉 Because the value is unknown

🔹 2.5 Operations with NA

📌 Arithmetic
x <- c(10, 20, NA)
x+5

👉 Result: 15 25 NA

📌 Logical
x > 15

👉 Result: FALSE TRUE NA

🔹 2.6 Detection of NA

is.na(x)

• Returns TRUE where values are NA
• Essential for identifying missing data

🔹 2.7 Handling NA Values

📌 Removing NA
x[!is.na(x)]

📌 Ignoring NA in functions
mean(x, na.rm = TRUE)
sum(x, na.rm = TRUE)

👉 na.rm = TRUE removes NA before computation.
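A consolidated sketch of the NA behaviour described above, including the typed NA constants and the cases where NA does not propagate:

```r
x <- c(10, 20, NA)

print(x + 5)                  # [1] 15 25 NA
print(mean(x))                # [1] NA
print(mean(x, na.rm = TRUE))  # [1] 15
print(sum(x, na.rm = TRUE))   # [1] 30
print(sum(is.na(x)))          # [1] 1 (count of missing values)

# NA does not propagate when the answer is already determined
print(NA & FALSE)             # [1] FALSE
print(NA | TRUE)              # [1] TRUE

# Typed missing values
print(typeof(NA))             # [1] "logical"
print(typeof(NA_integer_))    # [1] "integer"
print(typeof(NA_character_))  # [1] "character"
```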

🔹 2.8 Importance of NA

• Essential for real-world datasets
• Helps maintain data integrity
• Used in statistical modeling and analysis

🔷 3. NULL

🔹 3.1 Definition
NULL represents the absence of any object or value in R.

👉 It means:

No data exists at all.

🔹 3.2 Nature of NULL

• NULL is a special object, not a value
• It has length 0
• Represents an empty structure

🔹 3.3 Properties of NULL

1. Length:

length(NULL) # 0

2. No value; its type is simply "NULL" (typeof(NULL) returns "NULL")
3. Not stored in atomic vectors

🔹 3.4 Behavior of NULL

📌 In vectors
c(1, 2, NULL, 3)

👉 Output: 1 2 3
👉 NULL is ignored

📌 In lists
lst <- list(a=1, b=2)
lst$b <- NULL

👉 Removes element b

🔹 3.5 Detection of NULL

is.null(x)

• Returns TRUE if the object is NULL

🔹 3.6 Uses of NULL

• Initialize empty objects
• Remove elements from lists
• Represent absence of output in functions
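A short sketch of the NULL behaviour and uses listed above. The names squares and f are illustrative:

```r
print(length(NULL))      # [1] 0
print(is.null(NULL))     # [1] TRUE
print(c(1, 2, NULL, 3))  # [1] 1 2 3 (NULL is dropped)

lst <- list(a = 1, b = 2)
lst$b <- NULL            # removes element b
print(names(lst))        # [1] "a"

# Initialize an empty object and grow it
squares <- NULL
for (i in 1:3) {
  squares <- c(squares, i^2)
}
print(squares)           # [1] 1 4 9

# An if without else returns NULL when the condition is FALSE
f <- function(x) if (x > 0) "positive"
print(is.null(f(-1)))    # [1] TRUE
```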

Differences Between NA and NULL

Feature          | NA (Not Available)              | NULL
-----------------|---------------------------------|---------------------
Meaning          | Missing value                   | No value / no object
Existence        | Value exists but is unknown     | Value does not exist
Length           | 1                               | 0
Data structures  | Vectors, matrices, data frames  | Lists, objects
Behavior         | Propagates in operations        | Ignored in vectors
Detection        | is.na()                         | is.null()

Coding Standards in R

🔹 1. Introduction
Coding standards are a set of rules and guidelines used to write clean, readable, and
maintainable code.
In R, following coding standards ensures that code is:

• Easy to understand
• Easy to debug
• Easy to maintain and reuse

🔹 2. Importance of Coding Standards


Coding standards are important because they:

• Improve code readability
• Enhance consistency
• Reduce errors and bugs
• Help in team collaboration
• Make code easier to maintain and update

🔷 3. General Coding Guidelines in R

🔸 (1) Naming Conventions


📘 Theory

Use meaningful and descriptive names for variables and functions.

📌 Rules

• Use lowercase letters
• Separate words using _ (snake_case)
• Avoid spaces and special characters

📌 Example
student_marks <- 85
total_sum <- sum(x)

🔸 (2) Assignment Operator


📘 Theory

Use <- for assignment instead of = (recommended in R style guides).

📌 Example
x <- 10

🔸 (3) Spacing
📘 Theory

Proper spacing improves readability.

📌 Rules

• Add spaces around operators (<-, =, +, -)
• No unnecessary spaces inside parentheses

📌 Example
x <- a + b

🔸 (4) Indentation
📘 Theory

Indent code blocks properly for clarity.

📌 Example
if (x > 10) {
  print("Greater")
}

🔸 (5) Line Length


📘 Theory

Keep lines short (usually ≤ 80 characters).

👉 Long lines should be broken into multiple lines.

🔸 (6) Comments
📘 Theory

Use comments to explain code logic.

📌 Example
# Calculate average marks
mean_marks <- mean(x)

👉 Comments should be:


• Clear
• Short
• Meaningful

🔸 (7) Function Writing Style


📘 Theory

Functions should be well-structured and readable.

📌 Example
calculate_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

🔸 (8) Avoid Hard Coding


📘 Theory

Avoid directly using values in code; use variables instead.

📌 Example
threshold <- 50
if (marks > threshold) {
  print("Pass")
}
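One way to go further is to make the threshold a function argument with a default value, so it is defined once and can be changed without editing the code. A sketch; the function name check_result is hypothetical:

```r
# Threshold defined once, as a named default argument
check_result <- function(marks, threshold = 50) {
  ifelse(marks > threshold, "Pass", "Fail")
}

print(check_result(c(40, 60, 80)))                  # [1] "Fail" "Pass" "Pass"
print(check_result(c(40, 60, 80), threshold = 70))  # [1] "Fail" "Fail" "Pass"
```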

🔸 (9) Consistent Style


📘 Theory

Maintain the same style throughout the program.

• Same naming pattern
• Same indentation
• Same spacing

🔸 (10) Use of Built-in Functions


📘 Theory

Prefer built-in vectorized functions instead of loops for efficiency.


🔷 4. Popular R Style Guides

Some widely followed standards:

• Google R Style Guide
• Tidyverse Style Guide

👉 These guides define best practices for writing clean R code.

🔷 5. Example of Good vs Bad Code


❌ Bad Code
x=1:10
y=mean(x)
print(y)

✅ Good Code
numbers <- 1:10
average <- mean(numbers)

print(average)

🔷 6. Advantages of Following Coding Standards

• Improves readability
• Makes debugging easier
• Enhances collaboration
• Produces professional-quality code
