Introduction to R Programming Basics
UNIT-1
Basics of R Programming
1. Introduction to R
R provides a rich ecosystem of packages and functions for handling complex data analysis tasks
efficiently.
Key points:
2. Features of R
R has several distinct features that make it popular among statisticians, researchers, and data
analysts:
a) Open Source
e) Graphical Capabilities
CRAN (Comprehensive R Archive Network) hosts over 20,000 packages for diverse tasks:
o Data manipulation: dplyr, tidyr
o Visualization: ggplot2, plotly
o Machine learning: caret, randomForest
i) Community Support
R is a flexible, powerful, and versatile tool for data analysis and visualization. Its open-source
nature, rich package ecosystem, strong statistical capabilities, and graphical tools make it one of the
most preferred languages for statisticians, data scientists, and researchers.
Applications of R
R is a powerful tool for data analysis, statistical computing, and visualization, widely used in
various fields. Its versatility and strong statistical capabilities make it indispensable for researchers,
analysts, and data scientists.
1. Statistical Analysis
R provides built-in functions and packages for various statistical operations, such as:
o Descriptive statistics: mean, median, variance, standard deviation
o Inferential statistics: t-tests, chi-square tests, ANOVA
o Regression analysis: linear, logistic, multiple regression
o Time series analysis: forecasting, ARIMA models
Example: Analyzing exam scores of students to find mean performance and trends.
2. Data Visualization
4. Bioinformatics
6. Business Analytics
Conclusion
R is a versatile tool with applications across statistics, data science, finance, business, research,
and bioinformatics. Its open-source nature, strong statistical functions, and visualization
capabilities make it one of the most preferred tools for data analysis and decision-making.
Variables in R
In R, a variable is a name that stores data or a value. Variables allow you to store, manipulate,
and retrieve data easily in your programs.
1. Creating Variables in R
Variables are created using the assignment operator: <- (most common), = or ->.
# Using <-
x <- 10
y <- "Hello R"
# Using =
z = TRUE
x <- 10
y <- "Hello"
z <- TRUE
mode(x) # "numeric"
mode(y) # "character"
4. Changing Variable Type (Type Casting)
x <- "100"
x_num <- as.numeric(x) # Converts string to numeric
y <- 1
y_char <- as.character(y) # Converts numeric to string
5. Constants vs Variables
6. Removing Variables
x <- 10
rm(x) # x is deleted
# Character variable
name <- "Alice"
print(name)
# Output: [1] "Alice"
# Logical variable
is_student <- TRUE
print(is_student)
# Output: [1] TRUE
# Factor variable
gender <- factor(c("Male","Female","Female"))
print(gender)
# Output: [1] Male Female Female
# Levels: Female Male
OUTPUT:
[1] 25
[1] 5.9
[1] "Alice"
[1] TRUE
[1] Male Female Female
Levels: Female Male
[1] 30.9
Summary
In-Built Functions in R
1. Definition
Key points:
A) Mathematical Functions
B) Statistical Functions
x <- c(2,4,6,8)
mean(x) # Average → 5
sum(x) # Total → 20
sd(x) # Standard deviation → 2.581989
var(x) # Variance → 6.666667
median(x) # Median → 5
C) Character/String Functions
E) Factor Functions
print(paste("Total:", total))
print(paste("Average:", average))
print(paste("Max:", max_score))
print(paste("Min:", min_score))
print(paste("SD:", round(sd_score,2)))
Output:
6. Conclusion
In-built functions are core components of R. They allow programmers to perform operations
efficiently, handle data of different types, and create analysis-ready outputs. Knowing and using
these functions effectively is key to mastering R programming.
In-Built Functions in R
R provides hundreds of in-built functions for performing various tasks, saving time and effort.
These functions are predefined and ready to use. Below are the main categories with examples.
1. Mathematical Functions
Example:
x <- 9
sqrt(x) # 3
abs(-12) # 12
round(3.567,2) # 3.57
2. Trigonometric Functions
Example:
sin(pi/6) # 0.5
cos(pi) # -1
tan(pi/4) # 1
Example:
log(2.718) # 0.9999 (approximately 1, since 2.718 is close to e)
log10(1000) # 3
exp(1) # 2.718282
Example:
5. Sequence Functions
Example:
seq(1,10,2) # 1 3 5 7 9
rep("R",3) # "R" "R" "R"
Example:
paste() concatenates strings and variables into a single string: "Name: Alice Score: 90"
print() displays the string and adds quotes around it and prints the index [1]
Output:
Output:
R variables can store numeric, character, logical, factor, complex, or raw data, each suitable for
different types of computations and analysis.
Example:
# Numeric variables
age <- 25 # Integer-like numeric
height <- 5.9 # Decimal numeric
print(age) # 25
print(height) # 5.9
Example:
# Integer variable
count <- 10L
print(count) # 10
class(count) # "integer"
Example:
# Character variable
name <- "Alice"
city <- 'New York'
print(name) # "Alice"
print(city) # "New York"
Example:
# Logical variable
is_student <- TRUE
passed_exam <- FALSE
print(is_student) # TRUE
print(passed_exam) # FALSE
Usage in condition:
if(is_student){
print("Student discount applied")
}
Example:
# Factor variable
gender <- factor(c("Male","Female","Female"))
print(gender)
# Output: Male Female Female
# Levels: Female Male
Example:
# Complex number
z <- 2 + 3i
print(z) # 2+3i
class(z) # "complex"
Example:
# Raw data
r <- charToRaw("Hello")
print(r) # 48 65 6c 6c 6f
class(r) # "raw"
x <- NULL
print(x) # NULL
y <- NA
print(y) # NA
x <- 10
class(x) # "numeric"
typeof(x) # "double"
num <- 10
as.character(num) # "10"
as.numeric("20") # 20
as.factor(c("A","B","A")) # Factor variable
as.logical(1) # TRUE
A vector is a one-dimensional data structure in R that stores elements of the same data
type (numeric, character, logical, or complex).
Vectors are the most basic and commonly used data structure in R.
They are used for storing, manipulating, and analyzing data.
2. Types of Vectors
3. Creating Vectors
# Numeric vector
numbers <- c(10, 20, 30, 40, 50)
print(numbers)
class(numbers)
Output:
[1] 10 20 30 40 50
[1] "numeric"
# Character vector
fruits <- c("Apple", "Banana", "Cherry")
print(fruits)
class(fruits)
Output:
# Logical vector
bool_vec <- c(TRUE, FALSE, TRUE, FALSE)
print(bool_vec)
class(bool_vec)
Output:
# Complex vector
comp_vec <- c(2+3i, 4+5i, 1+2i)
print(comp_vec)
class(comp_vec)
Output:
# Access a range
numbers[2:4] # 20 30 40
# Exclude elements
numbers[-3] # 10 20 40 50
5. Vector Operations
vec1 <- c(10, 20, 30)
vec2 <- c(1, 2, 3)
# Arithmetic operations
vec1 + vec2 # 11 22 33
vec1 - vec2 # 9 18 27
vec1 * vec2 # 10 40 90
vec1 / vec2 # 10 10 10
# Comparison
vec1 > 15 # FALSE TRUE TRUE
# Logical operations
vec1 > 15 & vec2 < 3 # FALSE TRUE FALSE
6. Vector Functions in R
Example:
7. Modifying Vectors
numbers <- c(10, 20, 30, 40, 50)
# Modify an element (3rd)
numbers[3] <- 35
# Add elements
numbers <- c(numbers, 60)
print(numbers) # 10 20 35 40 50 60
# Remove elements (exclude 2nd)
numbers <- numbers[-2]
print(numbers) # 10 35 40 50 60
nums <- c(5, 10, 15, 20, 25)
# Vector operations
nums_plus_5 <- nums + 5
nums_square <- nums^2
# Access elements
first_num <- nums[1]
last_num <- nums[length(nums)]
# Built-in functions
total <- sum(nums)
average <- mean(nums)
max_num <- max(nums)
min_num <- min(nums)
cum_sum <- cumsum(nums)
# Print outputs
print(nums_plus_5)
print(nums_square)
print(first_num)
print(last_num)
print(total)
print(average)
print(max_num)
print(min_num)
print(cum_sum)
Output:
[1] 10 15 20 25 30
[1] 25 100 225 400 625
[1] 5
[1] 25
[1] 75
[1] 15
[1] 25
[1] 5
[1] 5 15 30 50 75
9. Key Points
1. Vectors store elements of the same data type.
2. Created using c() function, but can also be created with seq() or rep().
3. Supports indexing, slicing, and logical selection.
4. Supports arithmetic and comparison operations.
5. Many built-in functions are available for analysis and manipulation.
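The key points above can be illustrated with a short sketch. The vector names (`evens`, `tags`, `scores`) are illustrative and do not come from the notes.

```r
# Creating vectors with seq() and rep()
evens <- seq(2, 10, by = 2)     # 2 4 6 8 10
tags <- rep("R", times = 3)     # "R" "R" "R"

# Logical selection: keep only the elements that satisfy a condition
scores <- c(45, 82, 67, 91)
high <- scores[scores > 60]     # 82 67 91

# Arithmetic is applied element by element
doubled <- evens * 2            # 4 8 12 16 20

print(high)
print(doubled)
```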
MATRICES IN R
1. Definition of Matrix in R
A matrix in R is a two-dimensional data structure that stores elements in rows and columns.
It can store only one type of data, such as numeric, character, or logical.
Matrices are widely used in data analysis, statistics, and mathematical computations.
2. Creating Matrices in R
Syntax
matrix(data, nrow, ncol, byrow = FALSE)
Definition of Parameters
Output
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
Definition
Example
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m2)
Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
Example
a <- c(1, 2, 3)
b <- c(4, 5, 6)
m3 <- cbind(a, b)
print(m3)
Output
a b
[1,] 1 4
[2,] 2 5
[3,] 3 6
Example
x <- c(10, 20, 30)
y <- c(40, 50, 60)
m4 <- rbind(x, y)
print(m4)
Output
[,1] [,2] [,3]
x 10 20 30
y 40 50 60
Definition
Matrix elements are accessed using indexing:
matrix[row, column].
Example
m <- matrix(1:9, nrow=3)
m[1, 2] # element at row 1 column 2
m[ , 3] # entire column 3
m[2, ] # entire row 2
Output
[1] 4
[1] 7 8 9
[1] 2 5 8
5. Matrix Operations in R
Example
A <- matrix(1:4, 2, 2)
B <- matrix(5:8, 2, 2)
A + B
A - B
Output
A + B:
[,1] [,2]
[1,] 6 10
[2,] 8 12
A - B:
[,1] [,2]
[1,] -4 -4
[2,] -4 -4
Example
A * B
Output
[,1] [,2]
[1,] 5 21
[2,] 12 32
Output
[,1] [,2]
[1,] 19 43
[2,] 22 50
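True matrix multiplication uses the `%*%` operator, and the order of the operands matters. Assuming the same `A` and `B` as in the earlier examples, the output shown above corresponds to `B %*% A`. A minimal sketch of both orders:

```r
A <- matrix(1:4, 2, 2)   # columns: (1, 2) and (3, 4)
B <- matrix(5:8, 2, 2)   # columns: (5, 6) and (7, 8)

AB <- A %*% B
BA <- B %*% A

print(AB)
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46

print(BA)
#      [,1] [,2]
# [1,]   19   43
# [2,]   22   50
```

Unlike `A * B`, which multiplies element by element, `%*%` requires the number of columns of the left matrix to equal the number of rows of the right matrix.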
Example
t(A)
Output
[,1] [,2]
[1,] 1 2
[2,] 3 4
Example
solve(A)
Output
[,1] [,2]
[1,] -2 1.5
[2,] 1 -0.5
6. Matrix Functions in R
Example
A <- matrix(1:6, 2, 3)
rowSums(A)
colSums(A)
Output
rowSums: 9 12
colSums: 3 7 11
Definition
R provides functions to check whether an object is a matrix.
Example
is.matrix(A)
class(A)
Output
[1] TRUE
[1] "matrix" "array"
8. Converting Other Objects to Matrix
Definition
as.matrix() converts vectors, data frames, or lists into matrices.
Example
v <- 1:6
m <- as.matrix(v)
print(m)
Output
[,1]
[1,] 1
[2,] 2
[3,] 3
[4,] 4
[5,] 5
[6,] 6
1. Definition of Array in R
An array in R is a multi-dimensional data structure that stores elements in two or more
dimensions.
Arrays can store only one type of data (numeric, character, or logical).
They are useful for complex computations in statistics, mathematics, scientific simulations, and
multi-level data operations.
Example:
A 2D array = matrix
A 3D array = multiple matrices stacked
A 4D array = multiple 3D arrays
2. Creating Arrays in R
Syntax
array(data, dim)
Parameter Definitions
Output
, , 1
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
, , 2
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
Definition
Array elements are accessed using square brackets:
Example
arr[1, 2, 1] # 1st row, 2nd column, 1st matrix
arr[ , , 2] # entire 2nd matrix
arr[3, , ] # 3rd row across all matrices
Output
[1] 4
[,1] [,2]
[1,] 7 10
[2,] 8 11
[3,] 9 12
[,1] [,2]
[1,] 3 9
[2,] 6 12
Definition
You can assign names to rows, columns, and matrices using:
dimnames()
or by passing names inside array().
Example
row_names <- c("R1", "R2", "R3")
col_names <- c("C1", "C2")
mat_names <- c("M1", "M2")
Output
, , M1
C1 C2
R1 1 4
R2 2 5
R3 3 6
, , M2
C1 C2
R1 7 10
R2 8 11
R3 9 12
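The call that builds the named array is not shown with the output above. A minimal sketch that would produce it, assuming the data are 1:12 with dimensions 3 × 2 × 2 (as the printed values suggest):

```r
row_names <- c("R1", "R2", "R3")
col_names <- c("C1", "C2")
mat_names <- c("M1", "M2")

# Pass the names as a list, one character vector per dimension
arr <- array(1:12, dim = c(3, 2, 2),
             dimnames = list(row_names, col_names, mat_names))
print(arr)

# Named elements can be accessed by name instead of position
arr["R2", "C1", "M2"]   # 8
```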
5. Array Operations
A + B
A - B
Output
A + B:
[,1] [,2]
[1,] 7 7
[2,] 7 7
[3,] 7 7
A - B:
[,1] [,2]
[1,] -5 3
[2,] -3 1
[3,] -1 -1
5.2 Multiplication
A * B
Output
[,1] [,2]
[1,] 6 5
[2,] 8 8
[3,] 10 3
Definition
apply() is used to perform operations across rows, columns, or layers.
Syntax
apply(array, margin, function)
Margins
1 = rows
2 = columns
3 = layers
# arr is the 3 x 2 x 2 array holding 1:12 shown earlier
# Sum of rows
apply(arr, 1, sum)
# Sum of columns
apply(arr, 2, sum)
# Sum of layers
apply(arr, 3, sum)
Output
Rows: 22 26 30
Columns: 30 48
Layers: 21 57
7. Combining Arrays
Definition
Arrays can be combined using:
Example
A <- array(1:6, dim=c(3,2,1))
B <- array(7:12, dim=c(3,2,1))
cbind(A, B)
Output
A B
[1,] 1 7
[2,] 2 8
[3,] 3 9
[4,] 4 10
[5,] 5 11
[6,] 6 12
Definition
To verify if an object is an array:
Example
is.array(arr)
class(arr)
Output
[1] TRUE
[1] "array"
Definition
as.array() converts vectors, matrices, and lists into arrays.
Example
v <- 1:8
arr <- as.array(v)
print(arr)
Output
[1] 1 2 3 4 5 6 7 8
(1D array)
1. Definition of List in R
A list in R is a heterogeneous data structure that can store different types of data elements
together, such as:
Numbers
Strings
Vectors
Matrices
Arrays
Data frames
Other lists
Lists are flexible and are one of the most important data structures in R.
Example:
A single list can contain a number, a vector, a matrix, and a string all at once.
2. Creating Lists
Syntax
list(item1, item2, item3, ...)
Output
[[1]]
[1] 10
[[2]]
[1] "Hello"
[[3]]
[1] TRUE
Example 2: List with Named Components
student <- list(Name="Alice", Age=21, Marks=c(90,85,88))
print(student)
Output
$Name
[1] "Alice"
$Age
[1] 21
$Marks
[1] 90 85 88
Example
student$Name
student[[1]]
student[["Age"]]
Output
[1] "Alice"
[1] "Alice"
[1] 21
Example
student[1]
Output
$Name
[1] "Alice"
Example
student$Marks
Output
[1] 90 85 88
Definition
Lists are mutable; elements can be added, updated, or removed.
Output
[1] 22
Output
$Name
[1] "Alice"
$Age
[1] 22
$Marks
[1] 90 85 88
$Grade
[1] "A"
Output
$Name
[1] "Alice"
$Age
[1] 22
$Grade
[1] "A"
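The statements that produced the three outputs above are not shown. A minimal sketch of updating, adding, and removing list elements, assuming the `student` list defined earlier:

```r
student <- list(Name = "Alice", Age = 21, Marks = c(90, 85, 88))

# Update an existing element
student$Age <- 22
print(student$Age)        # 22

# Add a new element
student$Grade <- "A"

# Remove an element by assigning NULL
student$Marks <- NULL

print(names(student))     # "Name" "Age" "Grade"
```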
5. Nested Lists
Definition
A list that contains another list inside it is called a nested list.
Example
nested <- list(
person=list(Name="John", Age=25),
scores=list(Math=90, Science=85)
)
print(nested)
Output
$person
$person$Name
[1] "John"
$person$Age
[1] 25
$scores
$scores$Math
[1] 90
$scores$Science
[1] 85
6. List Operations
Output
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] 4
Output
[1] 3
Output
(Example)
Definition
Use lapply() and sapply() to apply functions to each list element.
Output
$a
[1] 15
$b
[1] 40
Output
a b
15 40
Definition
Lists can be converted to vectors, matrices, or data frames.
Example
df <- as.data.frame(student)
print(df)
Output
Name Age Grade
1 Alice 22 A
A list is a heterogeneous data structure that can store multiple data types.
Created using the list() function.
Elements accessed using [], [[]], and $.
Lists are mutable and support adding, updating, and deleting elements.
Lists can store complex objects such as matrices, arrays, functions, and nested lists.
lapply() and sapply() allow applying functions to list components.
Lists are widely used in data manipulation, statistical modeling, and return values of
functions.
1. Definition
A Factor in R is a data structure used to store categorical data (data that can be divided into
groups).
It stores values as levels, which are unique categories.
3. Creating Factors
Output
[1] Male Female Female Male
Levels: Female Male
Explanation:
R identifies the categories ("Male", "Female") and arranges them alphabetically as levels.
4. Levels of a Factor
Example
levels(gender)
Output
[1] "Female" "Male"
5. Ordered Factors
Output
[1] Good Poor Excellent Good
Levels: Poor < Good < Excellent
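An ordered factor like the one printed above can be created by listing the levels from lowest to highest and setting `ordered = TRUE`. The variable name `rating` is an assumption; the values are taken from the output.

```r
rating <- factor(c("Good", "Poor", "Excellent", "Good"),
                 levels = c("Poor", "Good", "Excellent"),
                 ordered = TRUE)
print(rating)

# Ordered factors support comparisons between levels
rating[1] > rating[2]     # TRUE, because Good ranks above Poor
```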
Example
class(gender)
str(gender)
Output
[1] "factor"
7. Converting Factors
as.character(gender)
Output
[1] MALE FEMALE FEMALE MALE
Levels: FEMALE MALE
Output
[1] MALE FEMALE FEMALE MALE
Levels: FEMALE MALE OTHER
Output
FEMALE MALE OTHER
2 2 0
11. Example Program – Full Demonstration
Program
# Creating a factor
city <- factor(c("Delhi", "Mumbai", "Kolkata", "Mumbai", "Delhi"))
# Print factor
print(city)
# Print levels
print(levels(city))
# Rename levels
levels(city) <- c("DELHI", "KOLKATA", "MUMBAI")
print(city)
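# Create an ordered factor
grades <- factor(c("B", "A", "C", "A"), levels = c("A", "B", "C"), ordered = TRUE)
print(grades)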
# Summary of factor
summary(grades)
Output
[1] Delhi Mumbai Kolkata Mumbai Delhi
Levels: Delhi Kolkata Mumbai
[1] B A C A
Levels: A < B < C
A B C
2 1 1
1. Memory-efficient
2. Makes analysis of categorical data easy
3. Required for statistical modeling
4. Provides ordering capability
5. Prevents invalid category entries
13. Applications of Factors
Factors are used to store categorical data as levels. They are essential for statistical modeling.
Factors can be ordered or unordered. R uses factors to efficiently handle attributes like
gender, grades, categories, groups, etc. Operations such as creating factors, modifying levels,
ordering levels, and converting between data types are commonly used. Example programs
demonstrate factor creation, manipulation, and usage with outputs.
1. Definition
A Data Frame in R is a two-dimensional data structure used to store tabular data in rows and
columns, similar to an Excel sheet or SQL table.
Each column can contain different data types (numeric, character, factor, logical, etc.).
Each row represents an observation, and each column represents a variable.
Data frames are one of the most important data structures for data analysis and statistics in R.
2. Features of Data Frames
1. Two-dimensional structure
Data is arranged in rows and columns.
2. Heterogeneous data
Each column can have a different data type.
3. Column names and row names
Data frames support naming both rows and columns.
4. Easy data manipulation
Subsetting, filtering, merging, adding/removing columns is simple.
5. Great for statistical modeling
Functions like lm(), glm(), t.test() accept data frames directly.
6. Conversion available
You can convert vectors, lists, and matrices into data frames.
Output
name age score
1 Alice 22 89
2 Bob 25 92
3 Charlie 30 85
Output
[1] 22 25 30
Output
[1] 22 25 30
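The code that created the data frame and extracted the `age` column is not shown with the outputs above. A minimal sketch, using the name `student_df` that appears in the later examples:

```r
student_df <- data.frame(
  name  = c("Alice", "Bob", "Charlie"),
  age   = c(22, 25, 30),
  score = c(89, 92, 85)
)
print(student_df)

# Two equivalent ways to extract a column as a vector
student_df$age        # 22 25 30
student_df[["age"]]   # 22 25 30

# Row and column indexing: row 2, column "score"
student_df[2, "score"]   # 92
```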
Output
Example
student_df$grade <- c("A", "A+", "B")
print(student_df)
Output
name age score grade
1 Alice 22 89 A
2 Bob 25 92 A+
3 Charlie 30 85 B
Example
new_row <- data.frame(name="David", age=28, score=90, grade="A")
student_df <- rbind(student_df, new_row)
print(student_df)
Output
name age score grade
1 Alice 22 89 A
2 Bob 25 92 A+
3 Charlie 30 85 B
4 David 28 90 A
Remove Column
student_df$grade <- NULL
Remove Row
student_df <- student_df[-2, ]
Example
summary(student_df)
str(student_df)
Output
name age score
Alice :1 Min. :22.00 Min. :85.00
Bob :1 1st Qu.:23.75 1st Qu.:87.00
Charlie:1 Median :25.00 Median :89.00
Mean :25.67 Mean :88.67
Max. :30.00 Max. :92.00
Output
name age score
"character" "numeric" "numeric"
10. Filtering Rows
Example
student_df[student_df$score > 88, ]
Output
name age score
1 Alice 22 89
2 Bob 25 92
Example
df1 <- data.frame(ID=1:3, Name=c("A","B","C"))
df2 <- data.frame(ID=1:3, Score=c(80,90,85))
merge(df1, df2, by="ID")
Output
ID Name Score
1 1 A 80
2 2 B 90
3 3 C 85
From Matrix
m <- matrix(1:6, nrow=2)
df <- as.data.frame(m)
| Feature | Vectors | Matrices | Arrays | Lists | Factors | Data Frames |
|---|---|---|---|---|---|---|
| Dimensions | 1 | 2 (rows × columns) | 2 or more (e.g., 3D, 4D) | 1 | 1 | 2 (rows × columns) |
| Data Types Allowed | Same type only | Same type only | Same type only | Different types allowed | Same type (categories) | Different types (mixed) |
| Best Used For | Simple numeric/char sequences | Mathematical operations, linear algebra | Scientific, multidimensional data | Storing mixed objects | Storing categorical variables | Statistical datasets, tables |
| Homogeneous / Heterogeneous | Homogeneous | Homogeneous | Homogeneous | Heterogeneous | Homogeneous categories | Heterogeneous |
| Typical Operations | Sorting, arithmetic | Matrix multiplication, transpose | Multidimensional ops | Accessing components | Level manipulation | Filtering, merging, summarizing |
| Creation Function | c() | matrix() | array() | list() | factor() | data.frame() |
| Support for Row/Column Names | No | Yes | Yes | Yes (named list elements) | Yes (factor levels) | Yes |
| Storage Type | Atomic vector | Atomic vector with dimensions | Atomic vector with attributes | Recursive structure | Integer codes + levels | List of equal-length vectors |
| Suitable For Modeling | No | Sometimes | No | Sometimes | Yes | Yes (most used) |
Data Frame: df <- data.frame(x=1:3, y=c("A","B","C")) (table with 2 columns)
R provides multiple data structures for different purposes. Vectors, matrices, and arrays store
homogeneous data, whereas lists and data frames store heterogeneous data. Factors are special
structures for categorical variables. Together, these six structures support powerful data
manipulation, statistical modeling, and scientific computation in R.
⭐DECISION-MAKING STRUCTURES IN R (DETAILED – 15 MARKS)
1. if statement
2. if–else statement
3. if–else if ladder
4. Nested if
5. switch statement
6. ifelse() function (vectorized decision)
🔵 1. IF Statement
✔Definition
The if statement executes a block of code only when a given condition is TRUE.
If the condition is FALSE, R simply skips the block.
✔Syntax
if (condition) {
statements
}
✔Flowchart (text-based)
Condition?
|
+---+---+
| |
TRUE FALSE
| |
Execute Skip
Block Block
✔Example
x <- 10
if (x > 5) {
print("x is greater than 5")
}
✔Output
[1] "x is greater than 5"
🔵 2. IF–ELSE Statement
✔Definition
Used when two alternative actions are required:
✔Syntax
if (condition) {
statements
} else {
other_statements
}
✔Flowchart
Condition?
|
+-----+-----+
| |
TRUE FALSE
| |
Execute Execute
IF part ELSE part
✔Example
age <- 17
if (age >= 18) {
print("Adult")
} else {
print("Minor")
}
✔Output
[1] "Minor"
✔Definition
Used when multiple conditions need to be tested one after another.
The first TRUE condition gets executed.
✔Syntax
if (condition1) {
statements1
} else if (condition2) {
statements2
} else if (condition3) {
statements3
} else {
default_statements
}
✔Flow Explanation
Conditions are checked from top to bottom.
As soon as one condition is TRUE, the remaining conditions are not evaluated.
✔Output
[1] "Grade B"
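The example that produced this output is not shown. A sketch of an if–else if ladder that prints "Grade B", where the variable `marks`, its value, and the grade boundaries are all assumptions made for illustration:

```r
marks <- 72   # assumed value; any score from 60 to 74 gives "Grade B" here

if (marks >= 90) {
  grade <- "Grade A+"
} else if (marks >= 75) {
  grade <- "Grade A"
} else if (marks >= 60) {
  grade <- "Grade B"
} else {
  grade <- "Fail"
}
print(grade)
```

Because the conditions are checked from top to bottom, each branch only needs to test the lower bound of its range.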
🔵 4. Nested IF Statement
✔Definition
When an if statement is placed inside another if, it is called a nested-if.
Used for multi-level decision making.
✔Syntax
if(condition1) {
if(condition2) {
statements
} else {
statements
}
} else {
statements
}
num <- 8 # example value
if (num > 0) {
if (num %% 2 == 0) {
print("Positive Even Number")
} else {
print("Positive Odd Number")
}
} else {
print("Negative Number")
}
✔Output
[1] "Positive Even Number"
🔵 5. SWITCH Statement
✔Definition
The switch() statement selects and executes one option from multiple choices.
Useful for menu-driven applications.
✔Syntax
switch(expression,
case1 = value1,
case2 = value2,
case3 = value3,
...
)
✔Output
[1] "Blue"
✔Output
[1] "Wednesday"
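The calls behind the two outputs above are not shown. A sketch of the two ways `switch()` selects an option, by position and by name; the option lists here are illustrative assumptions:

```r
# Numeric expression: switch() returns the option at that position
colour <- switch(2, "Red", "Blue", "Green")
print(colour)    # "Blue"

# Character expression: switch() matches the option by name
day_name <- switch("wed",
                   mon = "Monday",
                   tue = "Tuesday",
                   wed = "Wednesday",
                   "Unknown day")   # unnamed last value acts as the default
print(day_name)  # "Wednesday"
```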
🔵 6. ifelse() Function
✔Definition
ifelse() is a vectorized decision-making function.
It checks a condition over an entire vector and returns outputs element-wise.
✔Syntax
ifelse(test, value_if_true, value_if_false)
✔Example
x <- c(5, 12, 8, 20)
result <- ifelse(x > 10, "Greater", "Smaller")
print(result)
✔Output
[1] "Smaller" "Greater" "Smaller" "Greater"
Introduction
Loops in R are control structures that allow repetitive execution of a block of code until a specified
condition is met. They help automate tasks, reduce redundancy, and simplify complex iterative
processes. R provides three main types of loops:
✔for loop
✔while loop
✔repeat loop
1. for Loop
Definition
A for loop in R is used when the number of iterations is already known. It iterates over elements of
a vector, list, sequence, or any iterable object.
Syntax
for (variable in sequence) {
# statements
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Output
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
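The loops that produced the two outputs above are not shown. A minimal sketch that would print them; the `squares` vector is an addition for illustration only:

```r
# Print the numbers 1 to 5
for (i in 1:5) {
  print(i)
}

# Print the square of each number
squares <- numeric(0)
for (i in 1:5) {
  print(i^2)
  squares <- c(squares, i^2)   # also collect the results for later use
}
```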
2. while Loop
Definition
A while loop repeatedly executes a block of code as long as the condition is TRUE.
Used when number of iterations is unknown but depends on a condition.
Syntax
while (condition) {
# statements
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
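A while loop that would print this output could look like the sketch below; the counter name `i` is an assumption.

```r
i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1   # without this update the condition would never become FALSE
}
```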
3. repeat Loop
Definition
The repeat loop is an infinite loop that continues until a break statement is encountered.
Used when the number of iterations is unknown and depends on logic inside the loop.
Syntax
repeat {
# statements
if (condition) {
break
}
}
x <- 1
repeat {
print(x)
x <- x + 1
if (x > 5) {
break
}
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
4. Control Statements inside Loops
Example
for (i in 1:10) {
if (i == 6) {
break
}
print(i)
}
Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
Example
for (i in 1:5) {
if (i == 3) {
next
}
print(i)
}
Output
[1] 1
[1] 2
[1] 4
[1] 5
(Note: 3 is skipped)
Comparison of Loops in R
| Feature | for Loop | while Loop | repeat Loop |
|---|---|---|---|
| Number of iterations known? | Yes | No | No |
Conclusion
These loops are essential for data processing, automation, simulations, and iterative computations in
R.
1. Introduction
A function in R is a block of organized, reusable code designed to perform a specific task.
R provides many built-in functions, but users can also create their own functions, known as User-
Defined Functions (UDFs).
Reduce repetition
Make code modular
Improve readability
Enhance reusability
2. Definition
return(value)
}
Components
greet()
Output
[1] "Hello, welcome to R programming!"
square(5)
Output
[1] 25
add_numbers(4, 7)
Output
[1] 11
multiply(3, 4)
Output
[1] 12
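The definitions of the four functions called above are not shown. One possible set of definitions that reproduces those outputs, offered as a sketch rather than the original code:

```r
# No arguments: the function only prints a message
greet <- function() {
  print("Hello, welcome to R programming!")
}

# One argument
square <- function(x) {
  return(x^2)
}

# Two arguments
add_numbers <- function(a, b) {
  return(a + b)
}

# The last evaluated expression is returned even without return()
multiply <- function(a, b) {
  a * b
}

greet()
square(5)           # 25
add_numbers(4, 7)   # 11
multiply(3, 4)      # 12
```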
7. Default Arguments
Example 5
power <- function(x, p = 2) {
return(x^p)
}
power(5)
power(5, 3)
Output
[1] 25
[1] 125
Types of Scope:
demo()
print(x)
Output
[1] 5 # Local variable inside function
[1] 10 # Global variable outside function
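The definition of `demo()` is not shown. A sketch consistent with the output above, assuming a global `x` of 10 and a local `x` of 5:

```r
x <- 10            # global variable

demo <- function() {
  x <- 5           # local variable: exists only inside demo()
  print(x)
}

demo()             # prints 5
print(x)           # prints 10; the global x is unchanged
```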
Example 7
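calculate <- function(a, b) {
sum <- a + b
diff <- a - b
prod <- a * b
# return several results together as a named list
return(list(SUM = sum, DIFFERENCE = diff, PRODUCT = prod))
}
calculate(10, 4)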
Output
$SUM
[1] 14
$DIFFERENCE
[1] 6
$PRODUCT
[1] 40
Example 8
(result <- (function(x, y) { x + y })(6, 4))
Output
[1] 10
Example 9: Factorial
factorial_r <- function(n) {
if (n == 0)
return(1)
else
return(n * factorial_r(n - 1))
}
factorial_r(5)
Output
[1] 120
✔Data processing
✔Statistical calculations
✔Custom mathematical models
✔Data cleaning functions
✔Machine learning preprocessing
✔Automated reporting
student_result <- function(name, marks) {
# determining grade
if (marks >= 90) {
grade <- "A+"
} else if (marks >= 75) {
grade <- "A"
} else if (marks >= 60) {
grade <- "B"
} else {
grade <- "Fail"
}
# returning as list
return(list(Name = name, Marks = marks, Grade = grade))
}
student_result("Alice", 88)
Output
$Name
[1] "Alice"
$Marks
[1] 88
$Grade
[1] "A"
Conclusion
User-defined functions in R play an essential role in building efficient programs. They help
structure code, facilitate reuse, enhance readability, and support complex computing operations. R
provides high flexibility in designing functions with arguments, return values, default parameters,
recursion, and multiple outputs.
User-Defined Functions in R
A User-Defined Function (UDF) is a function created by the user to perform a specific task.
Functions help in:
Reusing code
Improving modularity
Making programs readable
Avoiding repetition
Definition
A function that does not require any input is called a function without arguments.
It performs its task independently and uses only the statements inside the function body.
Syntax
function_name <- function() {
# Statements
}
greet()
Output
[1] "Welcome to R Programming!"
show_date()
Output (Example)
[1] "2025-11-23"
hello()
Output
[1] "Hello Students!"
Definition
A function that takes one or more inputs is known as a function with arguments.
These inputs (parameters) help the function perform calculations or processing.
Syntax
function_name <- function(arg1, arg2, ...) {
# Statements
}
Mandatory
Optional with default values
add(10, 20)
Output
[1] 30
Output
[1] 25
[1] 125
area_circle(7)
Output
[1] 153.86
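The definition of `area_circle()` is not shown. The output 153.86 for a radius of 7 suggests that π was approximated as 3.14; under that assumption, a sketch would be:

```r
area_circle <- function(r) {
  return(3.14 * r^2)   # assumption: pi approximated as 3.14 to match the output above
}
area_circle(7)   # 153.86
```

Using the built-in constant `pi` instead would give a slightly larger, more precise result.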
Output
[1] 50
# Without Arguments
welcome <- function() {
print("Hello! This is an R Program.")
}
# With Arguments
multiply <- function(x, y) {
return(x * y)
}
welcome()
multiply(4, 5)
Output
[1] "Hello! This is an R Program."
[1] 20
Conclusion
1. Introduction
R is widely known for its strong package ecosystem. Packages extend the functionality of R by
providing additional functions, datasets, and tools.
While thousands of packages exist on CRAN, users often need to create their own packages for:
Custom functions
Reusable code
Research projects
Sharing tools with others
Automation of common tasks
A User-Defined Package allows users to bundle their functions, documentation, and data into a
structured form that can be installed and used just like any other R package.
2. Definition
Explanation of Components
Component Description
Package: mypackage
Type: Package
Title: My First User Defined Package
Version: 0.1.0
Author: Your Name
Maintainer: Your Name <youremail@[Link]>
Description: This package contains custom functions created by the user.
License: GPL-3
Encoding: UTF-8
Step 3: Create NAMESPACE file
export(add_two)
export(square_num)
File 1: add_two.R
add_two <- function(x) {
return(x + 2)
}
File 2: square_num.R
square_num <- function(n) {
return(n * n)
}
add_two(5)
square_num(4)
Output
[1] 7
[1] 16
Run:
devtools::document()
Output
[1] "Hello Aishman"
6. Adding Data to Package
1. Reusability
2. Collaboration
3. Organization
4. Maintenance
5. Distribution
8. Real-Life Applications
calculate_grade()
percentage()
Function 1: percentage()
percentage <- function(marks, total = 100) {
return((marks / total) * 100)
}
Function 2: calculate_grade()
calculate_grade <- function(percentage) {
if (percentage >= 90) return("A+")
else if (percentage >= 75) return("A")
else if (percentage >= 60) return("B")
else return("Fail")
}
p <- percentage(88)
calculate_grade(p)
Output
[1] "A"
Conclusion
User-Defined Packages in R are essential tools that allow programmers and data scientists to extend
R’s capabilities. They help structure code professionally, promote reusability, and enable sharing
with the research or developer community. Creating packages enhances efficiency and supports
large-scale scientific and analytical projects.
1. Introduction
R Markdown (Rmd) is a powerful tool in R used to create dynamic, reproducible reports.
It allows users to combine:
All in a single document that can be converted into multiple formats such as:
✔HTML
✔PDF
✔Word
✔Slides
R Markdown is widely used in data analysis, reporting, research papers, statistical summaries,
dashboards, and presentations.
2. What is R Markdown?
R Markdown is a file format with extension .Rmd that blends markdown syntax with R code
chunks.
It follows the concept of literate programming, where analysis and documentation exist together in
one report.
3. Structure of an R Markdown Document
Example
---
title: "Sales Report 2025"
author: "Aishman Rai"
date: "23 November 2025"
output: html_document
---
Example
## Introduction
This report shows the monthly sales performance.
Example
```{r}
x <- 1:5
mean(x)
```
When the report is knitted, the result will appear in the output.
---
In RStudio:
File → New File → R Markdown
HTML
PDF
Word
Headings
# Heading 1
## Heading 2
### Heading 3
Lists
- Item 1
- Item 2
Tables
| Name | Marks |
|------|--------|
| Riya | 89 |
| Sam | 78 |
6. R Code Chunk Options
```{r}
...
```
### Example
```{r, echo=FALSE, message=FALSE}
library(ggplot2)
```
---
````markdown
---
title: "Student Performance Report"
author: "Aishman Rai"
output: word_document
---
# Introduction
This report analyses the marks of students.
# Data
```{r}
marks <- c(80, 75, 92, 60, 88)
marks
```
# Average Marks
```{r}
mean(marks)
```
# Plot of Marks
# Conclusion
````
---
# **10. Advantages**
✔ Clean and professional reports
✔ Combines text, code, and graphs
✔ Promotes reproducibility
✔ Supports multiple output formats
✔ Easy to maintain
✔ Excellent for research and education
---
# **Conclusion**
R Markdown is a powerful tool for producing **dynamic, reproducible, and
professional reports** that combine analysis, interpretation, and visualization
in a single document. Its ability to integrate code and narrative makes it
essential for data scientists, researchers, and students.
1. Introduction to R Markdown
R Markdown is a powerful document format in R that allows users to combine code, text, and
output in a single file.
It is widely used for creating:
Reports
Assignments
Presentations
Dashboards
Websites
Books
Example:
---
title: "Student Report"
author: "Aishman Rai"
date: "`r Sys.Date()`"
output: html_document
---
Example:
## Introduction
This report analyses the performance of students.
Example:
```{r}
x <- c(10, 20, 30)
mean(x)
```
---
### **Definition**
Direct rendering refers to **rendering (knitting) the R Markdown file directly**
from the RStudio interface or by using the knit button.
- HTML report
- PDF report
- Word report
### **Example**
```{r}
marks <- c(85, 90, 78, 92)
mean(marks)
```
---
### **Definition**
Indirect rendering means **rendering an R Markdown file programmatically using R
scripts**, not by manually clicking “Knit”.
```r
rmarkdown::render("[Link]")
```
How it Works
1. Write an .Rmd file
2. Use an R script (.R file) to call rmarkdown::render()
3. The .Rmd is processed and converted into HTML/PDF/Word automatically
Example
```{r}
x <- rnorm(5)
x
```
```r
source("render_script.R")
```
```{r}
plot(marks, type="b", main="Student Marks", xlab="Student", ylab="Marks")
```
---
# **5. Conclusion**
Both methods together make R Markdown ideal for academic, industrial, and
research-oriented reporting.
UNIT-2
Import and Export of Data in R (Detailed Notes)
Data import and export are essential operations in R because most real-world data comes from
external sources such as text files, CSV files, Excel, databases, or web APIs. R provides a rich set of
built-in functions and packages for reading (importing) and writing (exporting) data in different
formats.
1. IMPORTING DATA IN R
Import means bringing external data into R so that it can be analyzed, manipulated, or visualized.
Function:
read.csv("[Link]")
Example:
data <- read.csv("[Link]")
print(data)
Output (example):
Name Age Marks
1 Ram 18 85
2 Sita 19 90
3 Ravi 20 78
Function:
read.table("[Link]", header = TRUE, sep = "\t")
Example:
data <- read.table("[Link]", header = TRUE)
Function:
library(readxl)
data <- read_excel("[Link]")
Example:
print(data)
load("[Link]")
library(jsonlite)
data <- fromJSON("[Link]")
2. EXPORTING DATA IN R
Exporting means writing or saving data from R into external files so they can be used in other
applications like Excel, Python, SPSS, etc.
2.1 Exporting CSV Files
Function:
write.csv(data, "[Link]", row.names = FALSE)
Example:
write.csv(iris, "iris_output.csv", row.names = FALSE)
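A quick way to check an export is to read the file back in and compare it with the original. The sketch below is only an illustration: it assumes a temporary file created with tempfile() (a choice made here so that nothing is left behind) and uses the built-in iris dataset.

```r
# Write the built-in iris data to a temporary CSV file (path is illustrative)
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)

# Read the file back; stringsAsFactors = TRUE turns Species into a factor again
iris_back <- read.csv(csv_path, stringsAsFactors = TRUE)

# Compare the shape and the column names with the original data
dim(iris_back)
identical(names(iris_back), names(iris))
```

If row.names = FALSE were left out, the file would gain an extra column of row numbers, which is usually not wanted when the file is opened in other tools.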
library(openxlsx)
write.xlsx(data, "[Link]")
Reading Again
d <- readRDS("[Link]")
Connecting to MySQL
library(RMySQL)
conn <- dbConnect(MySQL(), dbname="school", host="localhost",
user="root", password="1234")
Import Query
data <- dbGetQuery(conn, "SELECT * FROM students")
Export Query
dbWriteTable(conn, "new_table", data)
# Displaying data
print(students)
# Performing operation
students$Percentage <- (students$Marks / 100) * 100
6. Summary Table
Data Format Import Function Export Function
Conclusion
1. Introduction
Data visualization is the process of converting raw data into graphical or pictorial formats. It helps
in identifying trends, patterns, relationships, and anomalies within data.
R programming provides a powerful environment for visualizing data using built-in functions,
graphical parameters, and advanced libraries.
Syntax
plot(x, y, main="Scatter Plot", xlab="X", ylab="Y")
Example
x <- c(1,2,3,4,5)
y <- c(3,4,6,8,10)
plot(x, y, type="p", main="Scatter Plot", xlab="X Values", ylab="Y Values")
Syntax
plot(x, y, type="l")
Example
years <- 2015:2020
sales <- c(50, 60, 65, 70, 90, 110)
plot(years, sales, type="l", main="Sales Trend", xlab="Year", ylab="Sales")
2.3 Bar Chart
Represents categorical data using rectangular bars.
Syntax
barplot(values)
Example
marks <- c(80, 90, 75, 60)
names(marks) <- c("A","B","C","D")
barplot(marks, col="blue", main="Student Marks")
2.4 Histogram
Used to show distribution of continuous data.
Syntax
hist(data)
Example
values <- c(5,7,8,9,10,12,12,13,15)
hist(values, main="Histogram of Values")
2.5 Boxplot
Shows distribution and outliers using quartiles.
Syntax
boxplot(data)
Example
heights <- c(150, 155, 160, 162, 170, 180)
boxplot(heights, main="Boxplot of Heights")
Syntax
pie(values)
Example
slices <- c(40, 30, 20, 10)
labels <- c("A", "B", "C", "D")
pie(slices, labels, main="Pie Chart Example")
Features of ggplot2
Layered graphics
Highly customizable
Supports color, themes, facets
Example
library(lattice)
xyplot(mpg ~ wt | factor(cyl), data = mtcars)
3.3 Heatmaps
Used to show intensity values using color.
Example
heatmap(as.matrix(mtcars))
pairs(iris[,1:4])
Example:
persp(volcano)
Important parameters:
col – color
pch – point shape
lty – line type
lwd – line width
cex – character expansion (size)
xlab, ylab, main – labels and titles
Example:
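The parameters listed above can be combined in a single plot() call. This is a minimal sketch; the student numbers and marks are made-up values used only for illustration.

```r
# Illustrative data (assumed values, not taken from a real dataset)
student <- 1:4
marks <- c(85, 90, 78, 92)

plot(student, marks,
     type = "b",      # points joined by lines
     col  = "blue",   # colour of points and lines
     pch  = 19,       # point shape: solid circle
     lty  = 2,        # line type: dashed
     lwd  = 2,        # line width
     cex  = 1.5,      # point size multiplier
     main = "Student Marks", xlab = "Student", ylab = "Marks")
```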
6. Applications of Visualization in R
Business analytics
Scientific research
Financial analysis
Machine learning model evaluation
Health and medical data
Academic research
Social science statistics
8. Conclusion
Basic Visualization in R
Basic visualization refers to the simple plotting techniques available in Base R (without using any
advanced libraries like ggplot2). These visualizations help in understanding the distribution, trends,
and relationships in data.
Definition:
A scatter plot shows the relationship between two continuous variables using points.
Syntax:
plot(x, y)
Example:
x <- c(1,2,3,4,5)
y <- c(3,4,6,8,10)
plot(x, y, main="Scatter Plot", xlab="X", ylab="Y")
2. Line Plot
Definition:
A line plot connects data points with lines; used for time-series and trend analysis.
Syntax:
plot(x, y, type="l")
Example:
months <- 1:6
sales <- c(50,60,65,70,80,90)
plot(months, sales, type="l", main="Sales Trend", xlab="Month", ylab="Sales")
3. Bar Plot
Definition:
Represents categorical data with bars.
Syntax:
barplot(values)
Example:
marks <- c(80,70,90)
names(marks) <- c("A","B","C")
barplot(marks, main="Marks of Students")
4. Histogram
Definition:
Syntax:
hist(data)
Example:
ages <- c(18,19,20,22,23,24,25,26)
hist(ages, main="Age Distribution")
5. Boxplot
Definition:
Displays the minimum, first quartile, median, third quartile, and maximum.
Helps detect outliers.
Syntax:
boxplot(data)
Example:
heights <- c(150,160,170,180,175,165)
boxplot(heights, main="Heights of Students")
6. Pie Chart
Definition:
Syntax:
pie(values)
Example:
sizes <- c(40,30,20,10)
labels <- c("A","B","C","D")
pie(sizes, labels, main="Category Share")
Summary Table
Plot Type Purpose Function
Basic visualization in R is simple, powerful, and useful for quick data exploration. Using functions
like plot(), barplot(), hist(), and boxplot(), we can easily understand trends, patterns, and
distributions in datasets.
⭐ADVANCED VISUALIZATION IN R
✔ggplot2
✔lattice
✔Heatmaps
✔Density plots
✔Pair plots
✔3D plots
Program
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() + labs(title = "MPG vs Weight")
Output (Representation)
MPG vs Weight
mpg |
35 | *
30 | * *
25 | * *
20 | * * *
15 | * * *
10 |*
+-------------------------
2 3 4 5 weight
Program
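library(ggplot2)
df <- data.frame(
year = 2015:2020,
sales = c(50,60,75,80,95,110)
)
ggplot(df, aes(x = year, y = sales)) + geom_line() + geom_point() + labs(title = "Sales Growth")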
Output
Sales Growth
110 | *
100 | *
90 | *
80 | *
70 | *
60 | *
50 | *
+-----------------------------
2015 16 17 18 19 20
Program
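library(ggplot2)
df <- data.frame(
student = c("A","B","C"),
marks = c(80,90,70)
)
ggplot(df, aes(x = student, y = marks)) + geom_col() + labs(title = "Marks of Students")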
Output
Marks of Students
90 | ██████
80 | ██████
70 | █████
+--------------------
A B C
2. LATTICE GRAPHICS
Program
library(lattice)
xyplot(mpg ~ wt | factor(cyl), data = mtcars, main = "MPG vs Weight by Cylinders")
Output
MPG vs Weight by Cylinders
3. HEATMAP
Program
data_matrix <- as.matrix(mtcars[,1:5])
heatmap(data_matrix, main="Heatmap of mtcars")
Output
Heatmap of mtcars
Cyan = Low Values → Red = High Values
4. DENSITY PLOT
Program
plot(density(mtcars$mpg), main="Density Plot of MPG")
Output
Density Plot of MPG
density |
0.25 | /\
0.20 | / \
0.15 | / \
0.10 | / \
0.05 | / \
+---------------------------
10 20 30 40 mpg
5. PAIR PLOT (Correlation Matrix Plot)
Program
pairs(iris[,1:4], main="Pair Plot of Iris Dataset")
Output
Pair Plot of Iris Dataset
Sepal L. * * * * *
* * * *
Sepal W. * * * *
Petal L. * *
Petal W. **
(Scatter plots for every pair of variables)
6. 3D Visualization
Output
3D Surface Plot
________
/ /|
/________/ |
| | /
|________|/
(A 3D box-like surface representation)
SUMMARY TABLE (ADVANCED VISUALIZATIONS)
Technique Package Purpose Example Function
CONCLUSION
1. Introduction
Statistical analysis is the process of collecting, organizing, analyzing, interpreting, and presenting
data to support decision-making.
R is one of the most powerful languages for statistics due to its built-in functions, packages,
visualization tools, and wide support in academic research.
Example Program
data <- c(10, 20, 30, 40, 50)
mean_value <- mean(data)
median_value <- median(data)
sd_value <- sd(data)
var_value <- var(data)
mean_value
median_value
sd_value
var_value
Output
[1] 30
[1] 30
[1] 15.81139
[1] 250
Interpretation
Interpretation
Output (Summary)
Shows:
Coefficients
R-squared
p-value
Residuals
Example
x <- c(10,20,30,40,50)
y <- c(5,10,20,25,30)
cor(x, y)
Output
[1] 0.9912407
Summary Statistics
summary(df)
Boxplot
boxplot(df$income, main="Income Distribution")
Histogram
hist(df$income, main="Income Histogram")
output:
Importance of components:
PC1 PC2 PC3 PC4
Standard deviation 1.7084 0.9560 0.38309 0.14393
Proportion of Variance 0.7296 0.2285 0.03669 0.00518
Cumulative Proportion 0.7296 0.9581 0.99482 1.00000
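The figures above are consistent with a principal component analysis of the four numeric iris columns after standardizing them. The call that produced the output is not shown in these notes, so the sketch below is a reconstruction under that assumption, using prcomp() from base R.

```r
# PCA on the four numeric iris measurements, scaled to unit variance
pca_result <- prcomp(iris[, 1:4], scale. = TRUE)

# Table of standard deviation, proportion of variance and cumulative proportion
pca_summary <- summary(pca_result)
pca_summary$importance

# The proportion of variance can also be computed by hand from the
# component standard deviations
prop_var <- pca_result$sdev^2 / sum(pca_result$sdev^2)
round(prop_var, 4)
```

The first two components together account for most of the variance, which is why PCA plots of iris usually show only PC1 and PC2.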
(b) Clustering
set.seed(123)
kmeans_result <- kmeans(iris[,1:4], centers = 3)
kmeans_result$cluster
Statistical analysis in R helps in understanding data, drawing conclusions, predicting outcomes, and
making informed decisions.
Its rich set of statistical functions, models, and visualization tools makes R one of the most preferred
languages for academic and research-oriented statistical analysis.
R supports all major statistical operations such as descriptive statistics, inferential statistics,
regression modelling, correlation, hypothesis testing, and multivariate analysis.
Mean
Median
Mode
Range
Variance
Standard Deviation
Quartiles
Minimum and Maximum values
Example Program
data <- c(10, 20, 30, 40, 50)
mean(data)
median(data)
var(data)
sd(data)
summary(data)
Expected Output
[1] 30
[1] 30
[1] 250
[1] 15.81139
Min. 1st Qu. Median Mean 3rd Qu. Max.
10 20 30 30 40 50
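The program above does not cover mode, range, or quartiles from the list. A possible sketch follows; note that base R has no function for the statistical mode, so stat_mode() is a small helper defined here for illustration, and the data vector is an assumed example.

```r
data <- c(10, 20, 30, 40, 50, 50)

# Range: difference between the largest and smallest value
data_range <- max(data) - min(data)

# Quartiles, using R's default quantile method (type 7)
quartiles <- quantile(data, probs = c(0.25, 0.5, 0.75))

# Mode: the most frequent value(s); helper defined here, not part of base R
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
mode_value <- stat_mode(data)

data_range
quartiles
mode_value
```

Other quantile definitions (the type argument of quantile()) can give slightly different quartiles for small samples.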
Major Techniques:
Hypothesis testing
t-tests
ANOVA
Chi-square tests
Confidence intervals
F-tests
Interpretation
Interpretation
Interpretation
Regression analysis is used to study the relationship between a dependent variable and one or more
independent variables.
Example
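x <- c(1,2,3,4,5)
y <- c(2,4,5,4,5)
model <- lm(y ~ x)
summary(model)
Interpretation
Coefficients show how y changes with x
R-squared shows the proportion of variance in y explained by the model
p-value tests whether the relationship is statistically significant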
Example
df <- data.frame(
income = c(50,60,70,80,100),
experience = c(1,2,3,4,5),
sales = c(3,4,5,6,8)
)
Interpretation
Correlation measures strength and direction of a linear relationship between two numerical
variables.
Example
x <- c(10,20,30,40,50)
y <- c(5,10,20,25,30)
cor(x, y)
Output
[1] 0.9912407
Interpretation
summary()
str()
head(), tail()
hist()
boxplot()
plot()
Example
df <- data.frame(
marks = c(45,55,60,70,80,85,90)
)
set.seed(123)
clusters <- kmeans(iris[,1:4], centers = 3)
clusters$cluster
⭐7. Advantages of Using R for Statistical Analysis
⭐8. Conclusion
Statistical analysis in R is an essential part of data science, research, and data-driven decision-
making.
R provides an extensive collection of statistical methods including descriptive analysis, hypothesis
testing, regression modelling, correlation, ANOVA, PCA, and clustering. Its powerful visualization
tools and open-source availability make it one of the most preferred platforms for statistical
computing in academia and industry.
1. Introduction
Basic statistics deals with the methods used to collect, organize, summarize, analyze, and interpret
numerical data.
It helps convert raw data into meaningful information and supports decision-making in fields like
business, education, healthcare, economics, and research.
Includes:
1. Measures of Central Tendency
o Mean
o Median
o Mode
2. Measures of Dispersion
o Range
o Variance
o Standard deviation
o Interquartile range (IQR)
3. Measures of Shape
o Skewness
o Kurtosis
4. Tabular and Graphical representation
o Tables
o Bar charts
o Histograms
o Boxplots
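Measures of shape are not built into base R, so they are usually computed from the central moments or taken from an add-on package. The sketch below uses the moment-based definitions; the function names moment_skewness() and moment_kurtosis() are assumptions made for this example.

```r
# Moment-based skewness: third central moment divided by (variance)^(3/2)
moment_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based kurtosis: fourth central moment divided by (variance)^2
# (a normal distribution has a value of about 3)
moment_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

values <- c(5, 7, 8, 9, 10, 12, 12, 13, 15)
moment_skewness(values)
moment_kurtosis(values)
```

A skewness near 0 suggests a roughly symmetric distribution; positive values indicate a longer right tail and negative values a longer left tail.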
Includes:
Hypothesis testing
t-test
chi-square test
ANOVA
Confidence intervals
Correlation
Regression analysis
Types:
1. Quantitative Variables
o Numeric
Example: height, weight, marks
2. Qualitative Variables
o Categories
Example: gender, blood group
Example:
Data: 10, 20, 30
Mean = (10+20+30) / 3 = 20
4.2 Median
Middle value when data is arranged in order.
Example:
5, 9, 12 → Median = 9
4.3 Mode
Value that occurs most frequently.
Example:
2, 4, 4, 6 → Mode = 4
5.1 Range
Difference between maximum and minimum values.
Example:
Data = 5, 8, 12, 15
Range = 15 − 5 = 10
5.2 Variance
Average of squared deviations from mean.
Example:
Data: 10, 20, 30
Mean = 20
Variance = ((10 − 20)² + (20 − 20)² + (30 − 20)²) / 3 = 200 / 3 ≈ 66.67
SD = √66.67 ≈ 8.16
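The value 8.16 is the population standard deviation (dividing by n). R's var() and sd() divide by n − 1 instead, so they return the sample versions. A short sketch of the difference, using the same three values:

```r
data <- c(10, 20, 30)
n <- length(data)

# Population variance and SD: divide by n
pop_var <- sum((data - mean(data))^2) / n
pop_sd  <- sqrt(pop_var)

# Sample variance and SD: var() and sd() divide by n - 1
sample_var <- var(data)
sample_sd  <- sd(data)

c(pop_var = pop_var, pop_sd = pop_sd,
  sample_var = sample_var, sample_sd = sample_sd)
```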
⭐6. Correlation
Correlation measures the strength and direction of a relationship between two variables.
+1 → perfect positive
–1 → perfect negative
0 → no correlation
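The three reference values can be reproduced with cor(). The vectors below are constructed for illustration only; the last case also shows that a correlation of 0 means no linear relationship, even though y depends on x.

```r
x <- c(1, 2, 3, 4, 5)

r_pos  <- cor(x, 2 * x + 1)   # y rises exactly in step with x
r_neg  <- cor(x, -3 * x)      # y falls exactly as x rises
r_zero <- cor(x, (x - 3)^2)   # symmetric U-shape: no linear relationship

c(r_pos, r_neg, r_zero)
```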
7.1 Experiment
7.2 Event
A set of outcomes.
Example: getting a head.
(B) Histogram
Shows frequency distribution of numerical data.
(D) Boxplot
Shows median, quartiles, and outliers.
Example Data
data <- c(10, 20, 30, 40, 50)
Mean
mean(data)
Median
median(data)
Standard Deviation
sd(data)
Summary
summary(data)
Output
Min. 1st Qu. Median Mean 3rd Qu. Max.
10 20 30 30 40 50
⭐11. Conclusion
Basic statistics forms the foundation for advanced statistical analysis and data science.
It enables understanding of data behavior, identifies patterns, measures relationships, and supports
decision-making.
With tools like R, statistical analysis becomes faster, more accurate, and easier to visualize.
Program
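data <- c(10, 20, 30, 40, 50, 50)
mean_value <- mean(data)
median_value <- median(data)
mode_value <- names(which.max(table(data)))  # most frequent value, returned as text
print(mean_value)
print(median_value)
print(mode_value)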
Output
[1] 33.33333
[1] 35
[1] "50"
Program
data <- c(5, 10, 15, 20, 25)
variance <- var(data)
sd_value <- sd(data)
print(variance)
print(sd_value)
Output
[1] 62.5
[1] 7.905694
Program
values <- c(12, 18, 25, 30, 45, 50)
summary(values)
Output
Min. 1st Qu. Median Mean 3rd Qu. Max.
12.00 19.75 27.50 30.00 41.25 50.00
⭐4. Correlation
Program
x <- c(1,2,3,4,5)
y <- c(2,4,5,4,5)
cor(x, y)
Output
[1] 0.7745967
⭐5. Covariance
Program
x <- c(10,20,30,40)
y <- c(15,25,35,45)
cov(x, y)
Output
[1] 166.6667
Program
data <- c(1,2,2,3,3,3,4,4,5)
table(data)
Output
data
1 2 3 4 5
1 2 3 2 1
Program
marks <- c(45, 50, 55, 60, 70, 80, 90)
hist(marks, main="Marks Distribution", xlab="Marks")
Output (Description):
A histogram showing frequency bars of marks distribution.
⭐8. Boxplot
Program
boxplot(marks, main="Boxplot of Marks")
Output (Description):
Median line
Box showing Q1–Q3
Possible outliers
Program
Probability of getting head in 10 coin tosses:
Output
toss
H T
6 4
prop.table(table(toss))
H T
0.6 0.4
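The code that generated the tosses is not shown above. One way to simulate it is with sample(); this is a sketch under that assumption, and the seed value is arbitrary, so the counts will generally differ from the 6 and 4 shown.

```r
set.seed(42)  # arbitrary seed so the run can be repeated

# Simulate 10 tosses of a fair coin
toss <- sample(c("H", "T"), size = 10, replace = TRUE)

counts <- table(toss)        # number of heads and tails
props  <- prop.table(counts) # relative frequencies (estimated probabilities)

counts
props
```

With more tosses (for example size = 10000) the proportions move closer to the theoretical value of 0.5.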
Program
x <- c(1,2,3,4,5)
y <- c(3,4,5,6,8)
plot(x, y, main="Scatter Plot", xlab="x", ylab="y")
Output (Description):
Scatter plot showing relationship between x and y.
Parametric and Non-Parametric Techniques (Detailed 20-Marks Answer)
Statistical analysis broadly uses two types of techniques to draw conclusions from data: parametric
and non-parametric techniques. Both approaches differ mainly in their assumptions about the
population, type of data, and method of analysis. Understanding these two categories is essential for
selecting the appropriate statistical test in research.
1. Parametric Techniques
Definition
Parametric techniques are statistical methods that assume the data comes from a population that
follows a specific probability distribution, usually a normal distribution.
They also assume homogeneity of variance and interval/ratio-scale data.
Key Characteristics
1. Assume normal distribution
Data must be approximately bell-shaped.
2. Based on fixed parameters
Mean (μ) and standard deviation (σ) describe the population.
3. Used for quantitative (numeric) data
Interval and ratio scale measurements.
4. More powerful tests
Because of strong assumptions, results are more reliable if assumptions are met.
5. Require larger sample size
Usually n > 30.
3. Z-Test
4. Pearson Correlation
5. Linear Regression
Assumptions:
2. Non-Parametric Techniques
Definition
Non-parametric techniques are statistical methods that do NOT assume any particular probability
distribution of the population.
They are useful when the data is not normally distributed, when the sample size is small, or when
the data is ordinal or nominal.
Key Characteristics
1. No distribution assumptions
Works with skewed or non-normal data.
2. Used for ordinal, nominal, or ranked data
Example: satisfaction rating, ranks.
3. Useful for small sample sizes
Can be used even when n < 30.
4. Less powerful compared to parametric tests but more flexible.
5. Useful for analysis of median or ranks rather than mean.
2. Mann–Whitney U Test
4. Kruskal–Wallis Test
6. Friedman Test
Use of mean/median: parametric tests use the mean; non-parametric tests use the median or ranks.
Data is numeric
Sample size is large
Data is normally distributed
Variances are equal
Conclusion
Both parametric and non-parametric techniques play a crucial role in statistical analysis.
Parametric tests offer more power and precision when assumptions are met, while non-parametric
tests provide flexibility and robustness when assumptions are violated.
A researcher must choose the appropriate method depending on data type, distribution, and sample
size.
Statistical techniques used in data analysis can be broadly classified into parametric and non-
parametric methods. These two groups differ mainly in the assumptions they make about the
population and the type of data they handle.
1. Parametric Techniques
1.1 Definition
Parametric techniques are statistical methods that assume that the underlying population follows a
known distribution, usually a normal distribution.
They use numerical parameters like mean (µ) and standard deviation (σ) to describe the
population.
$\bar{x} = \frac{60+72+75+65+68+80+78+70}{8} = 71$
Step 3: Hypothesis
H₀: µ = 70
H₁: µ ≠ 70
Step 4: t-statistic
Step 5: Decision
At 5% level and df = 7,
critical t = 2.365
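The t-statistic in Step 4 follows from t = (x̄ − µ₀) / (s / √n). The sketch below works it out for the eight values used in the mean calculation; the variable names (marks, mu0) are assumptions made for this example, and the result is cross-checked against t.test().

```r
marks <- c(60, 72, 75, 65, 68, 80, 78, 70)
mu0 <- 70                      # hypothesized mean under H0

n     <- length(marks)
x_bar <- mean(marks)           # sample mean
s     <- sd(marks)             # sample standard deviation (n - 1 divisor)

t_stat <- (x_bar - mu0) / (s / sqrt(n))
t_crit <- qt(0.975, df = n - 1)   # two-sided critical value at the 5% level

reject_h0 <- abs(t_stat) > t_crit

# Cross-check with the built-in test
result <- t.test(marks, mu = mu0)

round(c(t_stat = t_stat, t_crit = t_crit), 3)
reject_h0
```

Here |t| is well below the critical value, so H₀ (µ = 70) is not rejected at the 5% level.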
Hypothesis:
2. Non-Parametric Techniques
2.1 Definition
Non-parametric techniques are statistical methods that do NOT assume any specific distribution
of the population.
They are useful for ordinal, nominal, or non-normal quantitative data.
2.2 Characteristics
1. Distribution-free – does not require normality
2. Works with small samples
3. Suitable for ranked, ordinal, nominal data
4. Less powerful but more flexible
5. Based on medians, frequencies, or ranks
Male 12 18
Female 10 10
Step 3: Decision
Score Rank
68 1
70 2
72 3
74 4
75 5
76 6
78 7
80 8
82 9
Score Rank
85 10
88 11
90 12
Sum of ranks:
A = 7+8+9+10+11+12 = 57
B = 1+2+3+4+5+6 = 21
Step 2: Mann–Whitney U
Step 3: Critical U
For n1 = n2 = 6 → critical U = 5
Since 0 < 5,
Reject H₀ → Method A is significantly better.
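The rank sums and the U statistic can be checked in R using U = R − n(n + 1)/2 for each group. The group membership below is an assumption: it takes the six highest scores as Method A and the six lowest as Method B, which is what the rank sums 57 and 21 imply.

```r
A <- c(78, 82, 85, 90, 88, 80)   # Method A scores (assumed grouping)
B <- c(70, 75, 72, 68, 74, 76)   # Method B scores (assumed grouping)

# Rank all 12 scores together, then split the ranks by group
ranks <- rank(c(A, B))
rank_sum_A <- sum(ranks[seq_along(A)])
rank_sum_B <- sum(ranks[-seq_along(A)])

n1 <- length(A)
n2 <- length(B)

# U for each group; the test statistic is the smaller of the two
U_A <- rank_sum_A - n1 * (n1 + 1) / 2
U_B <- rank_sum_B - n2 * (n2 + 1) / 2
U <- min(U_A, U_B)

c(rank_sum_A = rank_sum_A, rank_sum_B = rank_sum_B, U = U)
```

As a check, U_A + U_B always equals n1 × n2, which is 36 here.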
Data is numerical
Data is normal
Variances are equal
Sample size is large
Conclusion
R Program
weights <- c(68, 72, 75, 70, 69, 71, 73, 74)
t.test(weights, mu = 70)
OUTPUT
One Sample t-test
data: weights
t = 1.73, df = 7, p-value = 0.127
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
69.45 73.55
sample mean = 71.5
Conclusion
R Program
groupA <- c(85, 88, 90, 82, 87)
groupB <- c(78, 80, 76, 75, 79)
t.test(groupA, groupB)
OUTPUT
Welch Two Sample t-test
Conclusion
R Program
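MethodA <- c(60, 65, 70)
MethodB <- c(55, 58, 54)
MethodC <- c(72, 75, 78)
scores <- c(MethodA, MethodB, MethodC)
group <- factor(rep(c("A", "B", "C"), each = 3))
summary(aov(scores ~ group))
OUTPUT
Df Sum Sq Mean Sq F value Pr(>F)
group 2 560.9 280.44 21.95 0.00174 **
Residuals 6 76.7 12.78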
Conclusion
R Program
data <- matrix(c(12, 18,
10, 10),
nrow = 2, byrow = TRUE)
chisq.test(data, correct = FALSE)
OUTPUT
Pearson's Chi-squared test
data: data
X-squared = 0.487, df = 1, p-value = 0.485
Conclusion
p-value = 0.49 > 0.05 ⇒ No significant association between gender & drink choice.
R Program
A <- c(78, 82, 85, 90, 88, 80)
B <- c(70, 75, 72, 68, 74, 76)
wilcox.test(A, B)
OUTPUT
Wilcoxon rank sum test with continuity correction
Conclusion
R Program
before <- c(140, 150, 138, 145, 155)
after <- c(135, 148, 132, 140, 150)
wilcox.test(before, after, paired = TRUE)
OUTPUT
Wilcoxon signed rank test
Conclusion
R Program
g1 <- c(12, 14, 15)
g2 <- c(18, 20, 19)
g3 <- c(25, 23, 22)
values <- c(g1, g2, g3)
groups <- factor(rep(c("g1", "g2", "g3"), each = 3))
kruskal.test(values ~ groups)
OUTPUT
Kruskal-Wallis rank sum test
Conclusion
R Program
x <- c(10, 20, 30, 40, 50)
y <- c(15, 22, 32, 45, 48)
cor.test(x, y, method = "spearman")
OUTPUT
Spearman's rank correlation rho
Conclusion
Statistical analysis in R is crucial for decision-making and research because it allows for comprehensive data examination, modeling, and prediction. R supports various statistical techniques such as descriptive and inferential statistics, regression, correlation, and multivariate analysis, which are vital in disciplines like economics, healthcare, and social sciences. Its capability to handle large datasets and provide insightful visualizations aids in drawing reliable conclusions and informing strategic decisions.
R differentiates between local and global variable scopes. Local variables exist within function boundaries, and changes to them do not affect variables outside those boundaries. Global variables, however, are accessible throughout the program. The scope type impacts how values are stored and accessed during function execution, influencing code behavior and potential errors, like unintentional variable masking or modification.
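The difference between local and global scope can be seen in a short sketch. The variable and function names here (counter, change_local, change_global) are made up for illustration.

```r
counter <- 10   # global variable

change_local <- function() {
  counter <- 99            # creates a separate local variable; the global is untouched
  counter
}

change_global <- function() {
  counter <<- counter + 1  # <<- modifies the variable in the enclosing (here global) scope
  counter
}

local_result  <- change_local()   # value seen inside the function
after_local   <- counter          # global value after the local assignment
global_result <- change_global()  # value after the global update
after_global  <- counter          # global value has now changed

c(local_result, after_local, global_result, after_global)
```

Relying on <<- makes functions harder to reason about, so returning a value and assigning it outside the function is usually the safer pattern.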
In-built functions in R are predefined and optimized for repeated use, which enhances efficiency by obviating the need for writing code from scratch. They are documented and categorized, which improves reliability through consistent performance in mathematical, statistical, and data manipulation tasks. Their efficient execution compared to custom functions aids in faster processing, particularly in large datasets.
Recursive functions in R call themselves with modified arguments, effectively solving complex problems by breaking them into simpler sub-problems. They offer advantages like code simplicity and elegance for problems like factorials or binary search trees. However, drawbacks include potential for increased memory use and the risk of stack overflow with excessive recursive depth if not properly managed.
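A factorial is the usual first example of recursion. The function name factorial_rec() is chosen here to avoid clashing with R's built-in factorial(); this is a minimal sketch that assumes a non-negative whole number as input.

```r
# Recursive factorial: n! = n * (n - 1)!, with 0! = 1! = 1
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)                 # base case stops the recursion
  }
  n * factorial_rec(n - 1)    # recursive case works on a smaller problem
}

factorial_rec(5)
```

Each call waits for the next one to return, so very large n would exhaust the call stack; a loop or the built-in factorial() avoids that.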
User-defined functions in R allow users to encapsulate logic for specific tasks, contributing to code modularity by breaking large programs into smaller, manageable units. They improve reusability, enabling the same code logic to be applied in different contexts. This enhances readability and maintainability of programs by avoiding code repetition and making it easier to debug.
Naming rules in R require variable names to start with a letter (or a dot that is not followed by a digit); after that they may contain letters, numbers, underscores, or dots. Names cannot begin with a number or an underscore, cannot contain spaces, and are case-sensitive (e.g., 'Age' is different from 'age'). These rules affect program execution by ensuring variable names are valid and error-free, maintaining clarity and preventing conflicts in variable referencing.
Default arguments in R simplify function usage by providing predefined values for parameters, reducing the need for users to specify all arguments each time the function is called. This enhances usability by making functions easier to use and understand, particularly when dealing with functions that have multiple parameters where only a few may change regularly.
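A small sketch of default arguments follows. The function percentage() and its parameter names are hypothetical, written only to illustrate the idea.

```r
# max_marks and digits have defaults, so callers only supply them when needed
percentage <- function(marks, max_marks = 100, digits = 1) {
  round(marks / max_marks * 100, digits)
}

percentage(45)                  # uses both defaults
percentage(45, max_marks = 50)  # overrides one default by name
percentage(1, 3, digits = 2)    # overrides both
```

Naming the overridden argument, as in the second call, keeps the call readable when a function has several optional parameters.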
EDA is crucial for discovering patterns, anomalies, and relationships in data, forming the basis for subsequent statistical analyses. R offers several tools for EDA, including functions like summary(), hist(), boxplot(), and plot() that provide descriptive statistics and visualizations, helping users to understand data distributions and identify potential issues or insights prior to more formal analysis.
PCA reduces data dimensionality by transforming original variables into a new set of orthogonal features (principal components), preserving as much variance as possible in fewer dimensions. This is important because it simplifies data, reduces computational costs, and helps in identifying the most influential features in large datasets, improving model performance and interpretation.
Parametric techniques assume the data follows a specific probability distribution (often normal) and require interval/ratio data, making them suitable for larger samples and yielding more powerful tests when assumptions are met. Non-parametric techniques, by contrast, make no distribution assumptions and can handle ordinal, nominal, or skewed data. They are suitable for smaller samples and are useful when data does not meet parametric assumptions, though they are generally less powerful.