
Introduction to R Programming Basics

Q: Discuss the importance of statistical analysis in R for decision-making and research in various fields.

Statistical analysis in R is crucial for decision-making and research because it supports comprehensive data examination, modeling, and prediction. R provides descriptive and inferential statistics, regression, correlation, and multivariate analysis, techniques that are central to disciplines such as economics, healthcare, and the social sciences. Its ability to handle large datasets (limited mainly by available memory) and to produce informative visualizations helps analysts draw reliable conclusions and inform strategic decisions.

Q: Describe how R handles variable scope and the potential implications of scope types on function execution.

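R distinguishes between local and global scope. Variables created inside a function are local: they exist only while the function runs, and assigning to them does not change variables of the same name outside the function. Global variables are defined at the top level (the global environment) and are visible inside functions unless a local variable with the same name masks them. To modify a global variable from inside a function, R provides the superassignment operator <<-. Scope therefore determines which value a name refers to during function execution, and misunderstanding it can cause errors such as unintentional variable masking or modification.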
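The difference between local assignment and the superassignment operator <<- can be seen in a short sketch. The names counter, increment_local(), and increment_global() are hypothetical and used only for illustration.

```r
counter <- 0                       # global variable

increment_local <- function() {
  counter <- counter + 1           # creates a new local 'counter'; the global one is untouched
  counter
}

increment_global <- function() {
  counter <<- counter + 1          # '<<-' assigns in the enclosing environment (here: global)
  counter
}

local_result <- increment_local()
print(c(local = local_result, global = counter))   # local = 1, global = 0

increment_global()
print(counter)                     # 1: the global variable was modified
```

Prefer returning values from functions over using <<-. Hidden modification of global state is exactly the kind of scope-related bug described above.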

Q: How do in-built functions in R enhance efficiency and reliability in data analysis tasks?

In-built functions in R are predefined and tested, so common tasks do not have to be coded from scratch. They are documented and grouped by purpose, which improves reliability through consistent behavior in mathematical, statistical, and data-manipulation tasks. Many of them are vectorized and implemented in C. They therefore usually run faster than equivalent hand-written R loops, which matters most for large datasets.

Q: Evaluate how recursive functions operate in R and their advantages and potential drawbacks.

Recursive functions in R call themselves with modified arguments. They solve a problem by reducing it to simpler sub-problems until a base case is reached, which keeps code short and elegant for naturally recursive problems such as factorials or traversing tree structures (e.g. binary search trees). Drawbacks include higher memory use, because every pending call stays on the stack. Errors such as "evaluation nested too deeply" or a C stack overflow can also occur when the recursion depth is too large or the base case is missing.
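Below is a minimal recursive factorial, assuming a non-negative whole-number input. fact_rec() is a hypothetical name, and base R's factorial() is shown for comparison. The very deep call is wrapped in tryCatch() because the exact depth at which R stops depends on options(expressions) and the C stack size.

```r
# Recursive factorial: base case n <= 1, recursive case n * (n - 1)!
fact_rec <- function(n) {
  if (n <= 1) return(1)            # base case stops the recursion
  n * fact_rec(n - 1)              # recursive call on a smaller problem
}

print(fact_rec(5))                 # 120
print(factorial(5))                # built-in equivalent: 120

# Excessive depth raises an error instead of returning a value
deep <- tryCatch(fact_rec(1e5), error = function(e) "recursion limit reached")
print(deep)
```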

Q: Explain the role of user-defined functions in R and how they contribute to code modularity and reusability.

User-defined functions in R allow users to encapsulate logic for specific tasks, contributing to code modularity by breaking large programs into smaller, manageable units. They improve reusability, enabling the same code logic to be applied in different contexts. This enhances readability and maintainability of programs by avoiding code repetition and making it easier to debug.

Q: What are the rules for naming variables in R and how do these rules affect program execution?

Naming rules in R require a variable name to start with a letter, or with a dot that is not followed by a digit. The rest of the name may contain letters, digits, underscores, and dots. Names cannot begin with a digit or an underscore, cannot contain spaces, and cannot be reserved words such as if, function, or TRUE. Names are case-sensitive (e.g. 'Age' is different from 'age'). An invalid name causes a syntax error before the code runs, so following these rules keeps code valid and prevents conflicts when referring to variables.

Q: In what ways does the use of default arguments in user-defined functions improve code usability in R?

Default arguments in R simplify function calls by supplying predefined values for parameters, so callers do not need to specify every argument each time. This makes functions easier to use and understand, particularly functions with many parameters of which only a few change regularly.
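Here is a small sketch of default arguments. describe_scores() and its parameters are hypothetical, chosen to show that callers can omit arguments that have defaults and override them by name.

```r
# Hypothetical helper: 'digits' and 'prefix' have default values
describe_scores <- function(x, digits = 2, prefix = "Mean:") {
  paste(prefix, round(mean(x), digits))
}

scores <- c(80, 90, 85, 96)
print(describe_scores(scores))                       # both defaults: "Mean: 87.75"
print(describe_scores(scores, digits = 0))           # one default overridden: "Mean: 88"
print(describe_scores(scores, prefix = "Average:"))  # "Average: 87.75"
```

Naming the overridden argument (digits = 0) keeps the call readable and leaves the other defaults untouched.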

Q: What is the significance of Exploratory Data Analysis (EDA) in R programming, and which tools does R offer for EDA?

EDA is crucial for discovering patterns, anomalies, and relationships in data, and it forms the basis for subsequent statistical analysis. R offers several EDA tools, including summary(), hist(), boxplot(), and plot(). These provide descriptive statistics and visualizations that help users understand data distributions and spot potential issues or insights before more formal analysis.
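A minimal EDA pass might look like the sketch below. It assumes the built-in mtcars data set as example data, which is not part of the original text. The plotting calls are commented out so the script also runs where no graphics device is available.

```r
df <- mtcars                       # built-in example data set shipped with R

str(df)                            # structure: column types and first values
print(summary(df$mpg))             # min, quartiles, median, mean, max
print(colSums(is.na(df)))          # number of missing values per column
print(cor(df$mpg, df$wt))          # linear association between two variables

# Graphical EDA (uncomment to draw the plots):
# hist(df$mpg)
# boxplot(mpg ~ cyl, data = df)
# plot(df$wt, df$mpg)
```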

Q: How does Principal Component Analysis (PCA) contribute to data dimensionality reduction in R, and why is this important?

PCA reduces dimensionality by transforming the original variables into a new set of uncorrelated, orthogonal variables called principal components. The components are ordered so that the first few preserve as much of the total variance as possible. This matters because it simplifies the data, reduces computational cost, and shows which original variables contribute most to the variation in large datasets, which can improve model performance and interpretation.
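Below is a minimal sketch using base R's prcomp() on the built-in iris data. The data set is an illustrative choice, not from the original text. Setting scale. = TRUE standardizes each variable so that no single variable dominates because of its range.

```r
# PCA on the four numeric measurement columns of the built-in iris data
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Share of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Keep only the first two components as a reduced representation
reduced <- pca$x[, 1:2]
print(dim(reduced))                # 150 2
```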

Q: How do parametric and non-parametric techniques differ in their assumptions and applications in statistical analysis?

Parametric techniques assume that the data follow a specific probability distribution (often the normal distribution) and typically require interval or ratio data. When these assumptions hold, they give more powerful tests, and they are well suited to larger samples. Non-parametric techniques make fewer distributional assumptions (many work with ranks instead of raw values) and can handle ordinal or skewed data; some, such as the chi-square test, work with nominal data. They are useful for smaller samples or when data do not meet parametric assumptions, though they are generally less powerful when those assumptions are met.
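The sketch below compares a parametric test with its usual non-parametric counterpart on simulated data. The group sizes, means, and seed are arbitrary assumptions for illustration. Both t.test() and wilcox.test() are in the stats package that ships with R.

```r
set.seed(42)                                # fixed seed so the simulation is reproducible
group_a <- rnorm(20, mean = 10, sd = 2)     # simulated, approximately normal data
group_b <- rnorm(20, mean = 12, sd = 2)

# Parametric: Welch two-sample t-test (assumes roughly normal data)
t_res <- t.test(group_a, group_b)

# Non-parametric: Wilcoxon rank-sum (Mann-Whitney) test, based on ranks
w_res <- wilcox.test(group_a, group_b)

print(c(t_test_p = t_res$p.value, wilcoxon_p = w_res$p.value))
```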

The document provides an overview of R programming, highlighting its features, applications, and variable management. R is an open-source language widely used for statistical computing, data analysis, and visualization, with a rich ecosystem of packages. It supports various data types and in-built functions and is applied in fields like data science, finance, and bioinformatics.

Uploaded by sahilbrawed4



R PROGRAMMING

UNIT-1
Basics of R Programming
1. Introduction to R

R is a powerful programming language and software environment used for statistical computing, data analysis, and graphical representation. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in the early 1990s. R is open-source and freely available under the GNU General Public License.

R is widely used in:

• Data Science and Analytics
• Statistical Modeling
• Machine Learning
• Bioinformatics
• Research and Academic Studies

It provides a rich ecosystem of packages and functions for handling complex data analysis tasks
efficiently.

Key points:

• R is a statistical programming language.
• It supports data manipulation, statistical modeling, and visualization.
• It can run on Windows, macOS, and Linux.
• R has a large community, which ensures continuous updates and new packages.

2. Features of R

R has several distinct features that make it popular among statisticians, researchers, and data
analysts:

a) Open Source

• R is completely free to use.
• It allows users to modify and distribute the software.
• No licensing cost is required.

b) Platform Independent

• R runs on multiple platforms: Windows, macOS, Linux.
• Scripts written in R are portable across platforms.

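c) Data Handling and Storage

• R can handle large volumes of data efficiently, although by default it keeps data in memory, so the size of a dataset is limited by the available RAM.
• Supports vectors, matrices, lists, data frames, and more complex data structures.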

d) Statistical and Mathematical Functions

• R provides a wide variety of statistical techniques:
o Descriptive statistics (mean, median, mode; note that base R has no built-in function for the statistical mode, and mode() returns an object's storage mode instead)
o Inferential statistics (t-test, chi-square)
o Regression analysis (linear, logistic)
o Time series analysis
• It also supports mathematical operations like linear algebra, calculus, and optimization.

e) Graphical Capabilities

• R can create high-quality plots and charts.
• Supports histograms, bar charts, scatter plots, line charts, boxplots, and 3D plots.
• Packages like ggplot2 enhance the visualization capabilities.

f) Rich Package Ecosystem

• CRAN (Comprehensive R Archive Network) hosts over 20,000 packages for diverse tasks:
o Data manipulation: dplyr, tidyr
o Visualization: ggplot2, plotly
o Machine learning: caret, randomForest

g) Integration with Other Languages

• R can integrate with C, C++, Python, Java, and SQL.
• Allows the user to leverage libraries from other languages for advanced tasks.

h) Reproducibility and Reporting

• R supports reproducible research and reporting with packages like R Markdown and Shiny.
• R Markdown creates dynamic reports that combine code, results, and text; Shiny builds interactive dashboards.

i) Community Support

• R has a large and active community.
• Extensive documentation, tutorials, and forums are available for support.

3. Conclusion

R is a flexible, powerful, and versatile tool for data analysis and visualization. Its open-source
nature, rich package ecosystem, strong statistical capabilities, and graphical tools make it one of the
most preferred languages for statisticians, data scientists, and researchers.

Applications of R
R is a powerful tool for data analysis, statistical computing, and visualization, widely used in
various fields. Its versatility and strong statistical capabilities make it indispensable for researchers,
analysts, and data scientists.

1. Statistical Analysis

• R provides built-in functions and packages for various statistical operations, such as:
o Descriptive statistics: mean, median, variance, standard deviation
o Inferential statistics: t-tests, chi-square tests, ANOVA
o Regression analysis: linear, logistic, multiple regression
o Time series analysis: forecasting, ARIMA models
• Example: Analyzing exam scores of students to find mean performance and trends.

2. Data Visualization

• R is well-known for its graphical capabilities, which include:
o Scatter plots, bar charts, histograms, line charts, boxplots
o Advanced visualizations with packages like ggplot2 and plotly
• Example: Plotting sales trends over months to identify peak periods.

3. Data Science and Machine Learning

• R is widely used in data science for predictive analytics:
o Supervised learning: regression, classification
o Unsupervised learning: clustering (K-means, hierarchical), PCA
o Model evaluation and optimization
• Example: Predicting customer churn based on historical data using logistic regression.

4. Bioinformatics

• R is popular in bioinformatics and genomics for:
o Analyzing gene expression data
o Protein structure prediction
o Sequence analysis
• Example: Visualizing RNA-seq data to find gene expression differences.

5. Finance and Economics

• R is used in financial analytics and econometrics:
o Risk analysis and portfolio management
o Time series forecasting for stock prices
o Economic modeling and simulations
• Example: Predicting stock market trends using ARIMA models.

6. Business Analytics

• Companies use R for decision-making and business insights:
o Customer segmentation
o Sales forecasting
o Marketing campaign analysis
• Example: Identifying the most profitable customer segments using clustering.

7. Machine Learning and Artificial Intelligence

• R has packages like caret, randomForest, and xgboost, which support:
o Predictive modeling
o Classification and regression tasks
o Model validation and performance evaluation
• Example: Building a model to classify emails as spam or non-spam.

8. Reporting and Dashboard Creation

• R supports reproducible research and reporting using:
o R Markdown for dynamic reports
o Shiny for interactive web dashboards
• Example: Creating an interactive dashboard to monitor real-time sales performance.

9. Research and Academia

• R is extensively used in research for:
o Data analysis in experiments
o Statistical modeling in scientific studies
o Publication-quality graphics for journals and papers
• Example: Analyzing survey data to publish research findings.

10. Other Applications

• Integration with big data tools like Hadoop and Spark
• Text mining and natural language processing (NLP)
• Geospatial data analysis with packages like sf and ggmap

Conclusion

R is a versatile tool with applications across statistics, data science, finance, business, research,
and bioinformatics. Its open-source nature, strong statistical functions, and visualization
capabilities make it one of the most preferred tools for data analysis and decision-making.

Variables in R
In R, a variable is a name that stores data or a value. Variables allow you to store, manipulate,
and retrieve data easily in your programs.

1. Creating Variables in R

• Variables are created using an assignment operator: <- (most common), =, or ->.

# Using <-
x <- 10
y <- "Hello R"

# Using =
z = TRUE

# Using -> (assigns the value on the left to the variable on the right)
100 -> a

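Rules for naming variables:

1. Names must start with a letter (a-z, A-Z) or a dot; a leading dot must not be followed by a digit (.x is valid, .2x is not).
2. Names can contain letters, digits, underscores (_), and dots (.).
3. Names cannot start with a digit or an underscore, and cannot contain spaces.
4. Reserved words such as if, else, function, TRUE, and NULL cannot be used as names.
5. Names are case-sensitive: Age ≠ age.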
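One way to check these rules is base R's make.names(), which returns a syntactically valid version of each name. The helper is_valid_name() below is hypothetical; it simply tests whether make.names() leaves a name unchanged.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(x) make.names(x) == x

candidates <- c("age", "Age", ".hidden", "total_2", "2nd_score", "my var", "if")
print(setNames(is_valid_name(candidates), candidates))
# age, Age, .hidden, total_2 are valid; 2nd_score, my var, if are not

# make.names() also shows how R would repair an invalid name
print(make.names(c("2nd_score", "my var", "if")))   # "X2nd_score" "my.var" "if."
```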

2. Data Types of Variables

R variables can store different types of data:

Data Type   Description                    Example
Numeric     Numbers (decimal or integer)   x <- 10.5
Integer     Whole numbers                  y <- 5L
Character   Text or string                 name <- "Alice"
Logical     TRUE or FALSE                  flag <- TRUE
Factor      Categorical data               group <- factor(c("A","B"))
Complex     Complex numbers                z <- 2 + 3i

3. Checking Variable Type

• Use class() to check the type of a variable:

x <- 10
y <- "Hello"
z <- TRUE

class(x) # Output: "numeric"

class(y) # Output: "character"
class(z) # Output: "logical"

• Check the mode of a variable:

mode(x) # "numeric"
mode(y) # "character"

4. Changing Variable Type (Type Casting)

• Convert variable types using functions such as as.numeric(), as.character(), as.integer(), and as.logical().

x <- "100"
x_num <- as.numeric(x) # Converts string to numeric → 100
y <- 1
y_char <- as.character(y) # Converts numeric to string → "1"

5. Constants vs Variables

• Variables: their value can change during execution.
• Constants: have a fixed value. Other languages often declare them with a keyword such as const. R has no such keyword, so by convention we simply avoid reassigning the variable.
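If a value must not change, base R's lockBinding() can make a variable's binding read-only. Here is a minimal sketch with MAX_SCORE as a hypothetical constant.

```r
MAX_SCORE <- 100                           # intended to act as a constant
lockBinding("MAX_SCORE", environment())    # make the binding read-only

attempt <- tryCatch({
  MAX_SCORE <- 200                         # reassigning a locked binding raises an error
  "reassigned"
}, error = function(e) conditionMessage(e))

print(attempt)     # message mentioning the locked binding
print(MAX_SCORE)   # still 100

unlockBinding("MAX_SCORE", environment())  # restore normal behaviour if needed
```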

6. Removing Variables

• Use rm() to remove variables from memory:

x <- 10
rm(x) # x is deleted

• Remove all variables:

rm(list = ls()) # Removes all (non-hidden) variables from the current environment

7. Examples of Variable Usage

# Numeric variables
age <- 25
height <- 5.9
print(age)
# Output: [1] 25
print(height)
# Output: [1] 5.9

# Character variable
name <- "Alice"
print(name)
# Output: [1] "Alice"

# Logical variable
is_student <- TRUE
print(is_student)
# Output: [1] TRUE

# Factor variable
gender <- factor(c("Male","Female","Female"))
print(gender)
# Output: [1] Male Female Female
# Levels: Female Male

# Using variables in calculations

total <- age + height
print(total)
# Output: [1] 30.9

OUTPUT:

[1] 25
[1] 5.9
[1] "Alice"
[1] TRUE
[1] Male Female Female
Levels: Female Male
[1] 30.9

Summary

• Variables in R store data and allow data manipulation.
• R supports multiple data types: numeric, integer, character, logical, factor, complex.
• Variables are case-sensitive, and their type can be checked or converted.
• Always follow naming rules to avoid errors.

In-Built Functions in R
1. Definition

In R, an in-built function (also called a pre-defined or system function) is a function that is already provided by the R programming language. These functions are ready to use and allow programmers to perform common tasks without writing the code from scratch.

Key points:

• In-built functions come from the packages that ship with R (such as base, stats, and utils) or from additional installed packages.
• They perform specific tasks like mathematical calculations, statistical operations, data manipulation, or string handling.
• Examples include sum(), mean(), sqrt(), paste(), factor(), length(), etc.

2. Characteristics of In-Built Functions

1. Predefined in R: Already available; no need to define them.
2. Reusable: Can be called multiple times in different programs.
3. Optimized: Many are vectorized and implemented in C, so they are usually faster than equivalent code written by hand in R.
4. Documented: Every function has a help page; e.g., ?sum gives details.
5. Categorized: Functions are grouped by purpose: mathematical, statistical, character, logical, or date/time operations.

3. Categories of In-Built Functions with Examples

A) Mathematical Functions

• Used for arithmetic and mathematical calculations.

sqrt(16) # Square root → 4
abs(-5) # Absolute value → 5
round(3.1415,2) # Round to 2 decimals → 3.14
ceiling(4.2) # Round up → 5
floor(4.7) # Round down → 4

B) Statistical Functions

• Used to calculate statistical measures.

x <- c(2,4,6,8)
mean(x) # Average → 5
sum(x) # Total → 20
sd(x) # Standard deviation → 2.581989
var(x) # Variance → 6.666667
median(x) # Median → 5

C) Character/String Functions

• For manipulating and analyzing text data.

name <- "Alice"
nchar(name) # Number of characters → 5
toupper(name) # Convert to uppercase → "ALICE"
tolower("HELLO") # Convert to lowercase → "hello"
substr(name,2,4) # Extract substring → "lic"
paste("Hello", name) # Concatenate strings → "Hello Alice"

D) Logical and Comparison Functions

• For checking conditions or types.

is.numeric(10) # Check numeric → TRUE
is.character("R") # Check character → TRUE
any(c(FALSE, TRUE)) # Any TRUE? → TRUE
all(c(TRUE, TRUE)) # All TRUE? → TRUE
length(c(1,2,3)) # Length of vector → 3

E) Factor Functions

• Used for categorical data analysis.

gender <- factor(c("Male","Female","Female"))
levels(gender) # Shows categories → "Female" "Male"

F) Date and Time Functions

• Handling date/time objects.

Sys.Date() # Current date, e.g. "2025-11-22"
Sys.time() # Current date and time, e.g. "2025-11-22 16:00:00"
as.Date("2025-11-22") # Convert string to Date

4. Example Combining Multiple In-Built Functions

scores <- c(80, 90, 85, 95)

total <- sum(scores) # Sum of scores

average <- mean(scores) # Average
max_score <- max(scores) # Maximum
min_score <- min(scores) # Minimum
sd_score <- sd(scores) # Standard deviation

print(paste("Total:", total))
print(paste("Average:", average))
print(paste("Max:", max_score))
print(paste("Min:", min_score))
print(paste("SD:", round(sd_score,2)))

Output:

[1] "Total: 350"

[1] "Average: 87.5"
[1] "Max: 95"
[1] "Min: 80"
[1] "SD: 6.45"

5. Importance of In-Built Functions

1. Time-saving: No need to write code from scratch.
2. Efficiency: Optimized for speed and performance.
3. Accuracy: Reduces errors in calculations and operations.
4. Ease of Use: Simple syntax for complex operations.
5. Reusability: Can be used across programs or projects.
6. Foundation for Advanced Analysis: Essential for data science, statistical modeling, and
graphical visualization.

6. Conclusion

In-built functions are core components of R. They allow programmers to perform operations
efficiently, handle data of different types, and create analysis-ready outputs. Knowing and using
these functions effectively is key to mastering R programming.

In-Built Functions in R
R provides hundreds of in-built functions for performing various tasks, saving time and effort.
These functions are predefined and ready to use. Below are the main categories with examples.

1. Mathematical Functions

Used for arithmetic and numeric calculations.

Function       Description                 Example            Output
sqrt(x)        Square root                 sqrt(16)           4
abs(x)         Absolute value              abs(-5)            5
round(x, n)    Round to n decimal places   round(3.1415,2)    3.14
ceiling(x)     Round up                    ceiling(4.2)       5
floor(x)       Round down                  floor(4.7)         4
factorial(x)   Factorial                   factorial(5)       120
sum(x)         Sum of numbers              sum(c(1,2,3))      6
prod(x)        Product of numbers          prod(c(2,3,4))     24

Example:

x <- 9
sqrt(x) # 3
abs(-12) # 12
round(3.567,2) # 3.57

2. Trigonometric Functions

Used for angles and trigonometry.

Function   Description                   Example     Output
sin(x)     Sine of x (x in radians)      sin(pi/2)   1
cos(x)     Cosine of x (x in radians)    cos(0)      1
tan(x)     Tangent of x (x in radians)   tan(pi/4)   1
asin(x)    Inverse sine (in radians)     asin(1)     1.570796
acos(x)    Inverse cosine (in radians)   acos(1)     0
atan(x)    Inverse tangent (in radians)  atan(1)     0.7853982

Example:

sin(pi/6) # 0.5
cos(pi) # -1
tan(pi/4) # 1

3. Logarithmic and Exponential Functions

Used for logarithms and exponentials.

Function    Description              Example       Output
log(x)      Natural logarithm (ln)   log(2.718)    0.9998963 (≈ 1)
log10(x)    Base-10 logarithm        log10(100)    2
exp(x)      Exponential e^x          exp(2)        7.389056

Example:

log(2.718) # 0.9998963, close to 1 because 2.718 ≈ e
log10(1000) # 3
exp(1) # 2.718282

4. Date and Time Functions

Used for handling dates and times.

Function                   Description              Example                          Output
Sys.Date()                 Current system date      Sys.Date()                       "2025-11-22"
Sys.time()                 Current date and time    Sys.time()                       "2025-11-22 16:00"
as.Date(x)                 Convert string to date   as.Date("2025-11-22")            "2025-11-22"
format(date, "%d/%m/%Y")   Format date              format(Sys.Date(), "%d/%m/%Y")   "22/11/2025"

(Sample outputs assume the code is run on 22 November 2025.)

Example:

today <- Sys.Date()
now <- Sys.time()
formatted <- format(today, "%d-%b-%Y") # e.g. "22-Nov-2025"; month abbreviation depends on locale
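Because Sys.Date() changes every day, the sketch below uses a fixed date so the results are reproducible; the specific dates are illustrative assumptions. It shows that adding a number to a Date adds days and that subtracting two Dates gives a difference in days.

```r
d <- as.Date("2025-11-22")                    # fixed date for reproducible results

print(format(d, "%d/%m/%Y"))                  # "22/11/2025"
print(d + 30)                                 # "2025-12-22": Date + number adds days
days_left <- as.numeric(as.Date("2025-12-25") - d)
print(days_left)                              # 33
print(seq(d, by = "month", length.out = 3))   # "2025-11-22" "2025-12-22" "2026-01-22"
```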

5. Sequence Functions

Used to generate sequences of numbers.

Function            Description          Example            Output
seq(from, to, by)   Generate sequence    seq(1,5,1)         1 2 3 4 5
rep(x, times)       Repeat elements      rep(1:3, 2)        1 2 3 1 2 3
length(x)           Number of elements   length(c(1,2,3))   3

Example:

seq(1,10,2) # 1 3 5 7 9
rep("R",3) # "R" "R" "R"
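seq() and rep() accept more arguments than the table shows. The sketch below covers a few commonly used ones (length.out, each, and a vector-valued times), plus seq_len() and the colon operator.

```r
print(seq(0, 1, length.out = 5))           # 0.00 0.25 0.50 0.75 1.00
print(seq_len(4))                          # 1 2 3 4
print(rep(1:3, each = 2))                  # 1 1 2 2 3 3
print(rep(c("a", "b"), times = c(2, 1)))   # "a" "a" "b"
print(10:6)                                # the colon operator also counts down: 10 9 8 7 6
```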

6. Input/Output (I/O) Functions

Used to read and write data.

Function                Description                 Example
read.csv(file)          Read a CSV file             data <- read.csv("file.csv")
write.csv(data, file)   Write data frame to CSV     write.csv(df, "file.csv")
scan()                  Read input from console     numbers <- scan()
print()                 Print output to console     print("Hello R")
cat()                   Concatenate and print       cat("Total =", 100)
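Here is a minimal round trip through write.csv() and read.csv(). It assumes a small hypothetical data frame and a temporary file from tempfile(), so nothing is written to the working directory.

```r
# Small hypothetical data frame
df <- data.frame(name = c("Alice", "Bob"), score = c(90, 85.5))

path <- tempfile(fileext = ".csv")        # temporary file path
write.csv(df, path, row.names = FALSE)    # row.names = FALSE avoids an extra index column
df_back <- read.csv(path)                 # read the file back as a data frame

print(df_back)
str(df_back)                              # name is character, score is numeric
```

row.names = FALSE is used because write.csv() otherwise writes the row names as an extra first column, which read.csv() would then read back as a new column.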

Example:

name <- "Alice"

score <- 90
print(paste("Name:", name, "Score:", score))
cat("Name:", name, "Score:", score)

Step 1: Using print()

print(paste("Name:", name, "Score:", score))

• paste() concatenates strings and variables into a single string, separated by spaces: "Name: Alice Score: 90"
• print() displays the string in quotes, preceded by the index [1]

Output:

[1] "Name: Alice Score: 90"

Step 2: Using cat()

cat("Name:", name, "Score:", score)

• cat() concatenates and prints its arguments without quotes or an index
• Produces clean output suitable for reports or console display

Output:

Name: Alice Score: 90

Key Difference Between print() and cat()

Feature                       print()                          cat()
Adds quotes around strings?   Yes                              No
Adds index [1]?               Yes                              No
Best suited for               Inspecting objects, debugging    Clean, readable output for reports or console messages
Handles several arguments?    Prints one object at a time      Yes, concatenates them (separated by spaces by default)
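Two details are easy to miss: cat() does not end the line unless "\n" is supplied, and sprintf() is often a tidier way to build a formatted message. The sketch below uses the same name and score values as the example above.

```r
name <- "Alice"
score <- 90

# cat() does not add a newline by itself: "\n" ends the line,
# and sep = "" removes the default space between arguments
cat("Name: ", name, ", Score: ", score, "\n", sep = "")

# sprintf() builds one formatted string; %s inserts text, %d a whole number
msg <- sprintf("Name: %s, Score: %d", name, score)
cat(msg, "\n", sep = "")

# print() on the same string shows quotes and the [1] index
print(msg)
```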

Summary Table of Functions

Category             Example Functions                         Purpose
Mathematical         sqrt(), abs(), factorial()                Numeric calculations
Trigonometric        sin(), cos(), tan()                       Angle calculations
Logarithmic          log(), log10(), exp()                     Logarithms & exponentials
Date & Time          Sys.Date(), Sys.time(), as.Date()         Date and time operations
Sequence             seq(), rep(), length()                    Generate sequences
Input/Output (I/O)   print(), cat(), read.csv(), write.csv()   Data input/output

Data Types in R (In Detail)

In R, data types define the type of values a variable can store. R is a dynamically typed
language, meaning you don’t need to declare types explicitly; R automatically assigns a type
based on the value assigned to a variable.

R variables can store numeric (double), integer, character, logical, complex, or raw data. Factors, which are built on integer codes, store categorical data. Each type suits different kinds of computation and analysis.

1. Numeric Data Type

• Definition: Stores numbers, either integers or decimals (floating-point numbers).
• Usage: Used in mathematical calculations.

Example:

# Numeric variables
age <- 25 # Integer-like numeric
height <- 5.9 # Decimal numeric

print(age) # 25
print(height) # 5.9

# Check data type

class(age) # "numeric"
class(height) # "numeric"

2. Integer Data Type

• Definition: Stores whole numbers only.
• Usage: Use the L suffix to create an integer explicitly.

Example:

# Integer variable
count <- 10L

print(count) # 10
class(count) # "integer"

3. Character Data Type

• Definition: Stores text or string values.
• Usage: Used for names, labels, or textual data.

Example:

# Character variable
name <- "Alice"
city <- 'New York'

print(name) # "Alice"
print(city) # "New York"

# Check data type

class(name) # "character"

4. Logical Data Type

• Definition: Stores Boolean values: TRUE or FALSE.
• Usage: Used in conditions, comparisons, and loops.

Example:

# Logical variable
is_student <- TRUE
passed_exam <- FALSE

print(is_student) # TRUE
print(passed_exam) # FALSE

# Check data type

class(is_student) # "logical"

Usage in condition:

if(is_student){
print("Student discount applied")
}

5. Factor Data Type

• Definition: Stores categorical data.
• Usage: Useful for grouping, statistical modeling, and analysis.
• Internally stored as integer codes, but displayed as category labels (levels).

Example:

# Factor variable
gender <- factor(c("Male","Female","Female"))

print(gender)
# Output: [1] Male   Female Female
# Levels: Female Male

levels(gender) # "Female" "Male"

class(gender) # "factor"
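Because a factor stores integer codes plus a set of levels, it can be inspected with as.integer() and table(). The sketch also shows how supplying levels controls their order instead of the default alphabetical sorting. The variable gender2 is a hypothetical name.

```r
gender <- factor(c("Male", "Female", "Female"))

print(as.integer(gender))   # internal codes: 2 1 1 (levels sorted: Female = 1, Male = 2)
print(table(gender))        # counts per level: Female 2, Male 1

# Levels can be ordered explicitly instead of alphabetically
gender2 <- factor(c("Male", "Female", "Female"), levels = c("Male", "Female"))
print(as.integer(gender2))  # 1 2 2
```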

6. Complex Data Type

• Definition: Stores complex numbers with real and imaginary parts.
• Usage: Used in scientific and engineering calculations.

Example:

# Complex number
z <- 2 + 3i

print(z) # 2+3i
class(z) # "complex"

7. Raw Data Type

• Definition: Stores raw bytes, used for binary data manipulation.
• Usage: Rarely used, mostly for low-level data operations.

Example:

# Raw data
r <- charToRaw("Hello")
print(r) # 48 65 6c 6c 6f
class(r) # "raw"

8. Special Data Types / NULL

• NULL: Represents the absence of a value.

x <- NULL
print(x) # NULL

• NA: Represents missing data.

y <- NA
print(y) # NA
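NULL and NA behave differently in practice. NULL has length zero and disappears when combined into a vector. NA, by contrast, occupies a position and propagates through calculations unless it is removed. A short sketch:

```r
print(length(NULL))             # 0: NULL is "nothing"
print(length(NA))               # 1: NA is one missing value
print(c(1, NULL, 3))            # NULL disappears when combined: 1 3

x <- c(4, NA, 8)
print(is.na(x))                 # FALSE TRUE FALSE
print(mean(x))                  # NA: missing values propagate
print(mean(x, na.rm = TRUE))    # 6: NA removed before averaging
```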

9. Checking and Converting Data Types

• Check data type:

x <- 10
class(x) # "numeric"
typeof(x) # "double"

• Convert data type (Type Casting):

num <- 10
as.character(num) # "10"
as.numeric("20") # 20
as.factor(c("A","B","A")) # Factor with levels "A" "B"
as.logical(1) # TRUE
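Two conversion pitfalls are worth knowing; the values below are illustrative. First, converting non-numeric text produces NA with a warning. Second, converting a factor directly with as.numeric() returns its internal codes rather than the numbers shown as labels.

```r
# Text that is not a number becomes NA (the warning is suppressed here)
bad <- suppressWarnings(as.numeric(c("12", "abc")))
print(bad)                           # 12 NA

# A factor converted straight to numeric gives its codes, not its labels
f <- factor(c("10", "5", "20"))      # levels sorted as text: "10" "20" "5"
print(as.numeric(f))                 # 1 3 2  (codes)
print(as.numeric(as.character(f)))   # 10 5 20 (values)
```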

10. Summary Table of R Data Types

Data Type   Description                 Example                        Class/Output
Numeric     Numbers (integer/decimal)   x <- 5, y <- 3.14              "numeric"
Integer     Whole numbers               a <- 5L                        "integer"
Character   Text/strings                name <- "Alice"                "character"
Logical     TRUE or FALSE               flag <- TRUE                   "logical"
Factor      Categorical data            gender <- factor(c("M","F"))   "factor"
Complex     Complex numbers             z <- 2+3i                      "complex"
Raw         Raw bytes                   r <- charToRaw("Hi")           "raw"
NULL / NA   No value / missing value    x <- NULL, y <- NA             "NULL" / "logical" (NA is a logical constant)

11. Key Points

1. R automatically assigns data types when a value is assigned.
2. Choosing the correct data type is important for mathematical, statistical, or textual operations.
3. Data types can be checked with class() or typeof().
4. Data types can be converted using as.numeric(), as.character(), as.logical(), etc.

Vectors in R
1. Definition

• A vector is a one-dimensional data structure in R that stores elements of the same data type (numeric, character, logical, or complex).
• Vectors are the most basic and commonly used data structure in R.
• They are used for storing, manipulating, and analyzing data.

2. Types of Vectors

1. Numeric vector – stores numbers.
2. Character vector – stores text or strings.
3. Logical vector – stores TRUE or FALSE values.
4. Integer vector – stores whole numbers explicitly.
5. Complex vector – stores complex numbers.

3. Creating Vectors

• Use the c() function (combine) to create vectors.

Example 1: Numeric Vector

# Numeric vector
numbers <- c(10, 20, 30, 40, 50)
print(numbers)
class(numbers)

Output:

[1] 10 20 30 40 50
[1] "numeric"

Example 2: Character Vector

# Character vector
fruits <- c("Apple", "Banana", "Cherry")
print(fruits)
class(fruits)

Output:

[1] "Apple" "Banana" "Cherry"

[1] "character"

Example 3: Logical Vector

# Logical vector
bool_vec <- c(TRUE, FALSE, TRUE, FALSE)
print(bool_vec)
class(bool_vec)

Output:

[1] TRUE FALSE TRUE FALSE

[1] "logical"

Example 4: Complex Vector

# Complex vector
comp_vec <- c(2+3i, 4+5i, 1+2i)
print(comp_vec)
class(comp_vec)

Output:

[1] 2+3i 4+5i 1+2i

[1] "complex"
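Since a vector holds a single type, c() coerces mixed inputs to a common type (logical → numeric → character). A short sketch:

```r
mixed <- c(1, "two", TRUE)
print(mixed)          # "1" "two" "TRUE": everything becomes character
print(class(mixed))   # "character"

nums <- c(1, TRUE, FALSE)
print(nums)           # 1 1 0: logical values become numeric
```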

4. Accessing Elements of a Vector

• Use indexing (indices start from 1).

numbers <- c(10, 20, 30, 40, 50)

# Access first element

numbers[1] # 10

# Access multiple elements

numbers[c(2,4)] # 20 40

# Access a range
numbers[2:4] # 20 30 40

# Exclude elements
numbers[-3] # 10 20 40 50
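Elements can also be selected with a logical vector or by name. A short sketch (the names a to e are arbitrary):

```r
numbers <- c(10, 20, 30, 40, 50)

print(numbers[numbers > 25])       # logical condition: 30 40 50
print(numbers[c(TRUE, FALSE)])     # logical index is recycled: 10 30 50

names(numbers) <- c("a", "b", "c", "d", "e")
print(numbers["c"])                # access by name: c = 30
```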

5. Vector Operations

• Vectors support arithmetic, comparison, and logical operations.

vec1 <- c(10, 20, 30)

vec2 <- c(1, 2, 3)

# Arithmetic operations
vec1 + vec2 # 11 22 33
vec1 - vec2 # 9 18 27
vec1 * vec2 # 10 40 90
vec1 / vec2 # 10 10 10

# Comparison
vec1 > 15 # FALSE TRUE TRUE

# Logical operations
vec1 > 15 & vec2 < 3 # FALSE TRUE FALSE
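When two vectors have different lengths, R recycles the shorter one. If the longer length is not a multiple of the shorter, R still computes a result but issues a warning. The sketch below catches and reports that warning.

```r
print(c(1, 2, 3, 4) + c(10, 20))   # shorter vector is recycled: 11 22 13 24
print(c(1, 2, 3) * 2)              # a single value is recycled: 2 4 6

# Lengths 3 and 2: result is computed, but R warns
res <- withCallingHandlers(
  c(1, 2, 3) + c(10, 20),
  warning = function(w) {
    message("Warning caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")   # continue without printing the warning again
  }
)
print(res)                          # 11 22 13
```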

6. Vector Functions in R

• R provides built-in functions to manipulate vectors.

Function        Description                              Example               Output
length(x)       Number of elements                       length(numbers)       5
sum(x)          Sum of elements                          sum(numbers)          150
mean(x)         Mean of elements                         mean(numbers)         30
sort(x)         Sort in ascending order                  sort(numbers)         10 20 30 40 50
rev(x)          Reverse vector                           rev(numbers)          50 40 30 20 10
unique(x)       Unique elements                          unique(c(1,2,2,3))    1 2 3
cumsum(x)       Cumulative sum                           cumsum(numbers)       10 30 60 100 150
which(x > 25)   Indices of elements matching condition   which(numbers > 25)   3 4 5

Example:

numbers <- c(10, 20, 30, 40, 50)

print(length(numbers)) # 5
print(sum(numbers)) # 150
print(mean(numbers)) # 30
print(sort(numbers, decreasing = TRUE)) # 50 40 30 20 10

7. Modifying Vectors
numbers <- c(10, 20, 30, 40, 50)

# Change the 3rd element

numbers[3] <- 35
print(numbers) # 10 20 35 40 50

# Add elements
numbers <- c(numbers, 60)
print(numbers) # 10 20 35 40 50 60
# Remove elements (exclude 2nd)
numbers <- numbers[-2]
print(numbers) # 10 35 40 50 60

8. Example Program Combining Everything

# Creating vectors
nums <- c(5, 10, 15, 20, 25)
fruits <- c("Apple", "Banana", "Cherry")
flags <- c(TRUE, FALSE, TRUE)

# Vector operations
nums_plus_5 <- nums + 5
nums_square <- nums^2

# Access elements
first_num <- nums[1]
last_num <- nums[length(nums)]

# Built-in functions
total <- sum(nums)
average <- mean(nums)
max_num <- max(nums)
min_num <- min(nums)
cum_sum <- cumsum(nums)

# Print outputs
print(nums_plus_5)
print(nums_square)
print(first_num)
print(last_num)
print(total)
print(average)
print(max_num)
print(min_num)
print(cum_sum)

Output:

[1] 10 15 20 25 30
[1] 25 100 225 400 625
[1] 5
[1] 25
[1] 75
[1] 15
[1] 25
[1] 5
[1] 5 15 30 50 75

9. Key Points
1. Vectors store elements of the same data type.
2. Created using c() function, but can also be created with seq() or rep().
3. Supports indexing, slicing, and logical selection.
4. Supports arithmetic and comparison operations.
5. Many built-in functions are available for analysis and manipulation.

MATRICES IN R

1. Definition of Matrix in R
A matrix in R is a two-dimensional data structure that stores elements in rows and columns.
It can store only one type of data, such as numeric, character, or logical.
Matrices are widely used in data analysis, statistics, and mathematical computations.

2. Creating Matrices in R

2.1 Definition of matrix() Function

The matrix() function is the primary method to create matrices in R.
It arranges data into rows and columns.

Syntax
matrix(data, nrow, ncol, byrow = FALSE)

Definition of Parameters

• data → vector of values
• nrow → number of rows
• ncol → number of columns
• byrow → fills matrix by row when TRUE (default is FALSE)

Example 1: Create a Matrix

m <- matrix(1:6, nrow = 2, ncol = 3)
print(m)

Output
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6

(Default filling column-wise)

2.2 Creating Matrix Using byrow = TRUE

Definition

byrow=TRUE fills the matrix row-by-row.

Example
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)
print(m2)

Output
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6

3. Creating Matrices Using Binding Functions

3.1 Definition of cbind()

cbind() (column bind) combines vectors as columns to form a matrix.

Example
a <- c(1, 2, 3)
b <- c(4, 5, 6)
m3 <- cbind(a, b)
print(m3)

Output
a b
[1,] 1 4
[2,] 2 5
[3,] 3 6

3.2 Definition of rbind()

rbind() (row bind) combines vectors as rows to form a matrix.

Example
x <- c(10, 20, 30)
y <- c(40, 50, 60)
m4 <- rbind(x, y)
print(m4)

Output
  [,1] [,2] [,3]
x   10   20   30
y   40   50   60

4. Accessing Matrix Elements

Definition
Matrix elements are accessed using indexing:
matrix[row, column].

Example
m <- matrix(1:9, nrow = 3)
m[1, 2]   # element at row 1, column 2
m[ , 3]   # entire column 3
m[2, ]    # entire row 2

Output
[1] 4
[1] 7 8 9
[1] 2 5 8
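Two indexing details often matter in practice, sketched here on the same 3 × 3 matrix. First, drop = FALSE (an argument of R's [ operator) keeps a single column as a matrix. Second, a logical condition can select or replace elements.

```r
m <- matrix(1:9, nrow = 3)

col3     <- m[, 3]                # a single column drops to a plain vector: 7 8 9
col3_mat <- m[, 3, drop = FALSE]  # drop = FALSE keeps it as a 3 x 1 matrix
print(col3)
print(dim(col3_mat))              # 3 1

# Logical indexing returns the matching elements in column-major order
print(m[m > 5])                   # 6 7 8 9

# The same condition can be used to replace values
m[m > 5] <- 0
print(m)
```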
5. Matrix Operations in R

5.1 Definition of Matrix Addition and Subtraction

These operations are performed element-wise on matrices of the same dimension.

Example
A <- matrix(1:4, 2, 2)
B <- matrix(5:8, 2, 2)

A + B
A - B

Output
A + B:
     [,1] [,2]
[1,]    6   10
[2,]    8   12

A - B:
     [,1] [,2]
[1,]   -4   -4
[2,]   -4   -4

5.2 Definition of Element-wise Multiplication

Multiplies each element of matrix A by the corresponding element of matrix B.
Operator: *

Example
A * B

Output
     [,1] [,2]
[1,]    5   21
[2,]   12   32

5.3 Definition of Matrix Multiplication

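Matrix multiplication uses the %*% operator.
It follows the rules of linear algebra. Entry [i, j] of the result is the dot product of row i of A and column j of B, so the number of columns of A must equal the number of rows of B.

Example
A %*% B

Output
     [,1] [,2]
[1,]   23   31
[2,]   34   46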
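The row × column rule can be checked by computing each entry with a loop and comparing the result with %*%. This is an illustrative sketch only, and the name manual is hypothetical. In real code, %*% is the idiomatic and much faster choice.

```r
A <- matrix(1:4, nrow = 2)   # columns (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns (5, 6) and (7, 8)

# Build the product by hand: entry [i, j] is the dot product
# of row i of A with column j of B
manual <- matrix(0, nrow = nrow(A), ncol = ncol(B))
for (i in seq_len(nrow(A))) {
  for (j in seq_len(ncol(B))) {
    manual[i, j] <- sum(A[i, ] * B[, j])
  }
}

print(manual)
print(all.equal(manual, A %*% B))   # TRUE: the loop matches %*%

# %*% refuses matrices whose inner dimensions differ (2 columns vs 3 rows)
res <- tryCatch(A %*% matrix(1:6, nrow = 3), error = function(e) "non-conformable")
print(res)
```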

5.4 Definition of Transpose

The transpose of a matrix is obtained by flipping rows and columns.
Function: t()

Example
t(A)

Output
     [,1] [,2]
[1,]    1    2
[2,]    3    4

5.5 Definition of Matrix Inverse

solve() returns the inverse of a square matrix (if it is non-singular).

Example
solve(A)

Output
     [,1] [,2]
[1,]   -2  1.5
[2,]    1 -0.5
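A quick sanity check for solve() follows, assuming A is the same 2 × 2 matrix used above. Multiplying a matrix by its inverse should give the identity matrix diag(2), up to floating-point rounding. A singular matrix (determinant 0) makes solve() stop with an error. The matrix S is a made-up singular example.

```r
A <- matrix(1:4, nrow = 2)      # columns (1, 2) and (3, 4)
A_inv <- solve(A)

# A matrix times its inverse should give the identity matrix (up to rounding)
check <- A %*% A_inv
print(round(check, 10))
print(all.equal(check, diag(2)))   # TRUE

# S is singular (second column = 2 * first column), so it has no inverse
S <- matrix(c(1, 2, 2, 4), nrow = 2)
print(det(S))                      # 0
msg <- tryCatch(solve(S), error = function(e) "singular matrix: no inverse")
print(msg)
```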

6. Matrix Functions in R

Definition of Common Functions

Function Meaning

dim(A) Returns dimensions (rows, columns)

nrow(A) Number of rows

ncol(A) Number of columns

rowSums(A) Sum of each row

colSums(A) Sum of each column

rowMeans(A) Average of each row

colMeans(A) Average of each column

Example
A <- matrix(1:6, 2, 3)
rowSums(A)
colSums(A)

Output
rowSums: 9 12
colSums: 3 7 11
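The remaining functions in the table can be tried on the same matrix. The last line compares rowSums() with the general apply() approach. R's documentation describes rowSums() and related functions as equivalent to apply() with sum or mean, but faster.

```r
A <- matrix(1:6, nrow = 2, ncol = 3)   # rows (1, 3, 5) and (2, 4, 6)

print(dim(A))        # 2 3
print(nrow(A))       # 2
print(ncol(A))       # 3
print(rowMeans(A))   # 3 4
print(colMeans(A))   # 1.5 3.5 5.5

# rowSums() gives the same values as apply() with MARGIN = 1 and FUN = sum
print(all(rowSums(A) == apply(A, 1, sum)))   # TRUE
```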

7. Checking Matrix Type

Definition
R provides functions to check whether an object is a matrix.

Example
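is.matrix(A)
class(A)

Output
[1] TRUE
[1] "matrix" "array"

(In R 4.0.0 and later, class() of a matrix returns both "matrix" and "array"; older versions return only "matrix".)
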
8. Converting Other Objects to Matrix

Definition
as.matrix() converts vectors, data frames, or lists into matrices.

Example
v <- 1:6
m <- as.matrix(v)
print(m)

Output
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
[5,]    5
[6,]    6

FINAL SUMMARY (Perfect for 15 Marks)

• A matrix is a 2D homogeneous data structure.
• It is created using matrix(), cbind(), and rbind().
• Matrix elements are accessed using [row, column].
• Supports mathematical operations such as addition, subtraction, transpose, inverse, and matrix multiplication.
• R provides several useful functions like rowSums(), colMeans(), t(), and solve().
• Matrices are widely used in statistics, machine learning, linear algebra, simulations, etc.

ARRAYS IN R – Detailed Explanation (15 Marks)

1. Definition of Array in R
An array in R is a multi-dimensional data structure that stores elements in two or more
dimensions.
Arrays can store only one type of data (numeric, character, or logical).
They are useful for complex computations in statistics, mathematics, scientific simulations, and
multi-level data operations.

Example:
A 2D array = matrix
A 3D array = multiple matrices stacked
A 4D array = multiple 3D arrays

2. Creating Arrays in R

2.1 Definition of array() Function

The array() function is used to create arrays of any number of dimensions.

Syntax
array(data, dim)

Parameter Definitions

• data → vector of values
• dim → dimension vector (specifies rows, columns, layers)

Example 1: Create a 3D Array

arr <- array(1:12, dim = c(3, 2, 2))
print(arr)

Output
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

This means we have 2 matrices (layers), each of size 3 × 2.

3. Accessing Array Elements

Definition
Array elements are accessed using square brackets:

array[row, column, layer]

Example
arr[1, 2, 1]   # 1st row, 2nd column, 1st matrix
arr[ , , 2]    # entire 2nd matrix
arr[3, , ]     # 3rd row across all matrices (columns become rows, layers become columns)

Output
[1] 4

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

     [,1] [,2]
[1,]    3    9
[2,]    6   12

4. Naming Array Dimensions

Definition
You can assign names to rows, columns, and matrices either with dimnames() or by passing a dimnames list to array().
Example
row_names <- c("R1", "R2", "R3")
col_names <- c("C1", "C2")
mat_names <- c("M1", "M2")

arr2 <- array(1:12, dim = c(3, 2, 2),
              dimnames = list(row_names, col_names, mat_names))
print(arr2)

Output
, , M1

   C1 C2
R1  1  4
R2  2  5
R3  3  6

, , M2

   C1 C2
R1  7 10
R2  8 11
R3  9 12

5. Array Operations

Arrays support element-wise operations.

5.1 Addition and Subtraction

A <- array(1:6, dim=c(3,2,1))
B <- array(6:1, dim=c(3,2,1))

A + B
A - B

Output
A + B:
, , 1

     [,1] [,2]
[1,]    7    7
[2,]    7    7
[3,]    7    7

A - B:
, , 1

     [,1] [,2]
[1,]   -5    1
[2,]   -3    3
[3,]   -1    5

5.2 Multiplication
A * B

Output
, , 1

     [,1] [,2]
[1,]    6   12
[2,]   10   10
[3,]   12    6

6. Applying Functions Across Dimensions

Definition
apply() is used to perform operations across rows, columns, or layers.

Syntax
apply(array, margin, function)

Margins

• 1 = rows
• 2 = columns
• 3 = layers

Example: Sum of Rows and Columns

arr <- array(1:12, dim = c(3, 2, 2))

# Sum of rows
apply(arr, 1, sum)

# Sum of columns
apply(arr, 2, sum)

# Sum of each layer
apply(arr, 3, sum)

Output
Rows: 22 26 30
Columns: 30 48
Layers: 21 57
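As a check, the same totals can be recomputed on this array. MARGIN also accepts a vector of dimensions. The sketch below (variable names are illustrative) uses c(1, 3) to sum over the columns only, which gives one total per row within each layer.

```r
arr <- array(1:12, dim = c(3, 2, 2))

row_totals   <- apply(arr, 1, sum)   # 22 26 30
col_totals   <- apply(arr, 2, sum)   # 30 48
layer_totals <- apply(arr, 3, sum)   # 21 57

# MARGIN = c(1, 3): keep rows and layers, sum over the columns
row_by_layer <- apply(arr, c(1, 3), sum)
print(row_by_layer)                  # 3 x 2 matrix: rows (5, 17), (7, 19), (9, 21)
```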

7. Combining Arrays

Definition
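Arrays can be combined using:

• cbind() → combine column-wise
• rbind() → combine row-wise

These functions treat an array with more than two dimensions as a plain vector. The result is therefore an ordinary matrix, and the layer structure is lost.

Example
A <- array(1:6, dim=c(3,2,1))
B <- array(7:12, dim=c(3,2,1))

cbind(A, B)

Output
     A  B
[1,] 1  7
[2,] 2  8
[3,] 3  9
[4,] 4 10
[5,] 5 11
[6,] 6 12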
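The sketch below confirms the flattening and shows one base-R way to stack two equally sized 3 × 2 layers into a single 3-D array. It assumes both arrays have the same dimensions, since array() simply refills the combined values column by column.

```r
A <- array(1:6, dim = c(3, 2, 1))
B <- array(7:12, dim = c(3, 2, 1))

combined <- cbind(A, B)
print(dim(combined))      # 6 2: an ordinary matrix, the third dimension is gone

# Keep the layers instead: put all values into a new 3 x 2 x 2 array
stacked <- array(c(A, B), dim = c(3, 2, 2))
print(stacked[, , 1])     # the values of A
print(stacked[, , 2])     # the values of B
```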

8. Checking Array Type

Definition
To verify if an object is an array:

Example
is.array(arr)
class(arr)

Output
[1] TRUE
[1] "array"

9. Converting Other Objects to Array

Definition
as.array() converts vectors, matrices, and lists into arrays.

Example
v <- 1:8
arr <- as.array(v)
print(arr)

Output
[1] 1 2 3 4 5 6 7 8

(1D array)

FINAL SUMMARY (Perfect for 15 Marks)

• An array is a multi-dimensional homogeneous data structure in R.
• Created using the array() function with defined dimensions.
• Supports unlimited dimensions (2D, 3D, 4D, etc.).
• Elements accessed using [row, column, layer].
• Arrays support mathematical, logical, and element-wise operations.
• R provides powerful tools like apply() for performing operations across dimensions.
• Arrays are useful in scientific computing, simulations, graphics, and statistical modeling.

LISTS IN R – Detailed Explanation (15 Marks)

1. Definition of List in R
A list in R is a heterogeneous data structure that can store different types of data elements
together, such as:

• Numbers
• Strings
• Vectors
• Matrices
• Arrays
• Data frames
• Other lists

Lists are flexible and are one of the most important data structures in R.

Example:
A single list can contain a number, a vector, a matrix, and a string all at once.

2. Creating Lists

2.1 Definition of list() Function

The list() function is used to create lists.

Syntax
list(item1, item2, item3, ...)

Example 1: Simple List

my_list <- list(10, "Hello", TRUE)
print(my_list)

Output
[[1]]
[1] 10

[[2]]
[1] "Hello"

[[3]]
[1] TRUE
Example 2: List with Named Components
student <- list(Name="Alice", Age=21, Marks=c(90,85,88))
print(student)

Output
$Name
[1] "Alice"

$Age
[1] 21

$Marks
[1] 90 85 88

3. Accessing List Elements

3.1 Access Using Double Brackets [[]]

Used to extract a single element from a list.

Example
student$Name
student[[1]]
student[["Age"]]

Output
[1] "Alice"
[1] "Alice"
[1] 21

3.2 Access Using Single Brackets []

Returns a sub-list, not the element itself.

Example
student[1]

Output
$Name
[1] "Alice"

3.3 Access Using $ Operator

Used to access named elements.

Example
student$Marks

Output
[1] 90 85 88
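The difference between [ ] and [[ ]] matters as soon as the extracted value is used in a calculation. Here is a minimal sketch that rebuilds the student list from Example 2 (Age = 21):

```r
student <- list(Name = "Alice", Age = 21, Marks = c(90, 85, 88))

sub  <- student["Marks"]     # single brackets: a list that contains Marks
elem <- student[["Marks"]]   # double brackets: the numeric vector itself

print(class(sub))            # "list"
print(class(elem))           # "numeric"
print(mean(elem))            # 87.66667
# mean(sub) would return NA with a warning, because sub is a list, not numbers

# Indexing can be chained: the second mark
print(student[["Marks"]][2]) # 85
```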

4. Modifying List Elements

Definition
Lists are mutable; elements can be added, updated, or removed.

Example 1: Changing an Element

student$Age <- 22
print(student$Age)

Output
[1] 22

Example 2: Adding New Element

student$Grade <- "A"
print(student)

Output
$Name
[1] "Alice"

$Age
[1] 22

$Marks
[1] 90 85 88

$Grade
[1] "A"

Example 3: Removing Element

student$Marks <- NULL
print(student)

Output
$Name
[1] "Alice"

$Age
[1] 22

$Grade
[1] "A"

5. Nested Lists

Definition
A list that contains another list inside it is called a nested list.

Example
nested <- list(
  person = list(Name = "John", Age = 25),
  scores = list(Math = 90, Science = 85)
)
print(nested)

Output
$person
$person$Name
[1] "John"

$person$Age
[1] 25


$scores
$scores$Math
[1] 90

$scores$Science
[1] 85
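Nested elements can be reached by chaining $ or [[ ]]. The [[ ]] operator also accepts a character vector as a path. Here is a short sketch on the same nested list; the English entry is an illustrative addition.

```r
nested <- list(person = list(Name = "John", Age = 25),
               scores = list(Math = 90, Science = 85))

print(nested$person$Name)                # "John"
print(nested[["scores"]][["Math"]])      # 90
print(nested[[c("scores", "Science")]])  # 85: a character vector in [[ ]] is used as a path

# Adding an element at the inner level
nested$scores$English <- 78
print(names(nested$scores))              # "Math" "Science" "English"
```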

6. List Operations

6.1 Concatenating Lists

list1 <- list(1,2)
list2 <- list(3,4)
merged <- c(list1, list2)
print(merged)

Output
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

6.2 List Length

length(student)

Output
[1] 3

6.3 Converting List to Vector

unlist(student)

Output
   Name     Age   Grade
"Alice"    "22"     "A"

7. Applying Functions to List Elements

Definition
Use lapply() and sapply() to apply functions to each list element.

Example Using lapply()

num_list <- list(a=1:5, b=6:10)
lapply(num_list, sum)

Output
$a
[1] 15

$b
[1] 40

Example Using sapply()

sapply(num_list, sum)

Output
a b
15 40
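sapply() also accepts an anonymous function. vapply() is a stricter variant that checks every result against a declared template (FUN.VALUE). Here is a minimal sketch on the same num_list:

```r
num_list <- list(a = 1:5, b = 6:10)

# An anonymous function applied to each element
means <- sapply(num_list, function(v) mean(v))
print(means)    # a = 3, b = 8

# vapply() insists that each result is a single number
totals <- vapply(num_list, sum, FUN.VALUE = numeric(1))
print(totals)   # a = 15, b = 40
```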

8. Converting Lists to Other Structures

Definition
Lists can be converted to vectors, matrices, or data frames.
Example
df <- as.data.frame(student)
print(df)

Output
Name Age Grade
1 Alice 22 A

FINAL SUMMARY (Perfect for 15 Marks)

• A list is a heterogeneous data structure that can store multiple data types.
• Created using the list() function.
• Elements accessed using [], [[]], and $.
• Lists are mutable and support adding, updating, and deleting elements.
• Lists can store complex objects such as matrices, arrays, functions, and nested lists.
• lapply() and sapply() allow applying functions to list components.
• Lists are widely used in data manipulation, statistical modeling, and as return values of functions.

FACTORS IN R (DETAILED – 15 MARKS)

1. Definition

A Factor in R is a data structure used to store categorical data (data that can be divided into
groups).
It stores values as levels, which are unique categories.

Examples of categorical data:

• Gender: Male, Female
• Colors: Red, Blue, Green
• Grades: A, B, C
• Yes/No
Factors are important because many statistical models in R (like regression, ANOVA) require
categorical data to be stored as factors.

2. Why Use Factors? (Importance)

1. Efficient storage of categorical data
2. Maintains category levels even if some categories do not appear
3. Useful for statistical modeling
4. Prevents invalid values: assigning a value that is not one of the levels produces NA (with a warning)
5. Allows ordering of categories (e.g., Low < Medium < High)

3. Creating Factors

We use the factor() function.

Example 1: Creating a simple factor

gender <- factor(c("Male", "Female", "Female", "Male"))
print(gender)

Output
[1] Male Female Female Male
Levels: Female Male

Explanation:
R identifies the categories ("Male", "Female") and arranges them alphabetically as levels.

4. Levels of a Factor

Levels represent the distinct categories.

Example
levels(gender)

Output
[1] "Female" "Male"

5. Ordered Factors

Ordered factors represent data with natural order such as:

• Low < Medium < High
• Poor < Average < Good < Excellent
• Small < Medium < Large

Example: Ordered Factor

rating <- factor(c("Good", "Poor", "Excellent", "Good"),
                 ordered = TRUE,
                 levels = c("Poor", "Good", "Excellent"))
print(rating)

Output
[1] Good Poor Excellent Good
Levels: Poor < Good < Excellent
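Because the levels are ordered, comparisons and functions such as max() respect that order. Neither works on an unordered factor. Here is a short sketch on the same rating factor:

```r
rating <- factor(c("Good", "Poor", "Excellent", "Good"),
                 ordered = TRUE,
                 levels = c("Poor", "Good", "Excellent"))

print(rating > "Poor")   # TRUE FALSE TRUE TRUE
print(max(rating))       # Excellent
print(table(rating))     # counts listed in level order: Poor 1, Good 2, Excellent 1
```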

6. Checking Data Type

Use class() and str().

Example
class(gender)
str(gender)

Output
[1] "factor"

Factor w/ 2 levels "Female","Male": 2 1 1 2

7. Converting Factors

(a) Factor → Character

as.character(gender)

(b) Factor → Numeric

(Returns the integer level codes, not the original labels)

as.numeric(gender)
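The sketch below shows the outputs of both conversions and a common pitfall. The sizes factor is a hypothetical example whose labels look like numbers. For such a factor, as.numeric() returns the level codes, so the labels must be converted to character first.

```r
gender <- factor(c("Male", "Female", "Female", "Male"))
print(as.character(gender))   # "Male" "Female" "Female" "Male"
print(as.numeric(gender))     # 2 1 1 2  (level codes: Female = 1, Male = 2)

# Pitfall: levels are sorted as text, so "10" < "20" < "5"
sizes <- factor(c("10", "5", "20"))
print(as.numeric(sizes))                 # 1 3 2: codes, not the values
print(as.numeric(as.character(sizes)))   # 10 5 20: the actual values
```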

8. Modifying Factor Levels

Example: Renaming Levels

levels(gender) <- c("FEMALE", "MALE")
print(gender)

Output
[1] MALE FEMALE FEMALE MALE
Levels: FEMALE MALE

9. Adding New Levels

gender <- factor(gender, levels = c("FEMALE", "MALE", "OTHER"))
print(gender)

Output
[1] MALE FEMALE FEMALE MALE
Levels: FEMALE MALE OTHER

10. Summary of a Factor

summary(gender)

Output
FEMALE MALE OTHER
2 2 0
11. Example Program – Full Demonstration

Program
# Creating a factor
city <- factor(c("Delhi", "Mumbai", "Kolkata", "Mumbai", "Delhi"))

# Print factor
print(city)

# Print levels
print(levels(city))

# Rename levels
levels(city) <- c("DELHI", "KOLKATA", "MUMBAI")
print(city)

# Creating ordered factor

grades <- factor(c("B", "A", "C", "A"),
                 ordered = TRUE,
                 levels = c("A", "B", "C"))

print(grades)

# Summary of factor
summary(grades)

Output
[1] Delhi Mumbai Kolkata Mumbai Delhi
Levels: Delhi Kolkata Mumbai

[1] "Delhi" "Kolkata" "Mumbai"

[1] DELHI MUMBAI KOLKATA MUMBAI DELHI
Levels: DELHI KOLKATA MUMBAI

[1] B A C A
Levels: A < B < C

A B C
2 1 1

12. Advantages of Factors

1. Memory-efficient
2. Makes analysis of categorical data easy
3. Required for statistical modeling
4. Provides ordering capability
5. Prevents invalid category entries
13. Applications of Factors

• Statistical modeling (ANOVA, regression)
• Classification tasks
• Data preprocessing
• Survey analysis
• Grouping and summarizing data

⭐Perfect 15-Marks Answer Summary

Factors are used to store categorical data as levels. They are essential for statistical modeling.
Factors can be ordered or unordered. R uses factors to efficiently handle attributes like
gender, grades, categories, groups, etc. Operations such as creating factors, modifying levels,
ordering levels, and converting between data types are commonly used. Example programs
demonstrate factor creation, manipulation, and usage with outputs.

✅DATA FRAMES IN R (DETAILED – 15 MARKS)

1. Definition

A Data Frame in R is a two-dimensional data structure used to store tabular data in rows and
columns, similar to an Excel sheet or SQL table.

• Each column can contain different data types (numeric, character, factor, logical, etc.).
• Each row represents an observation, and each column represents a variable.

Data frames are one of the most important data structures for data analysis and statistics in R.
2. Features of Data Frames

1. Two-dimensional structure
Data is arranged in rows and columns.
2. Heterogeneous data
Each column can have a different data type.
3. Column names and row names
Data frames support naming both rows and columns.
4. Easy data manipulation
Subsetting, filtering, merging, adding/removing columns is simple.
5. Great for statistical modeling
Modeling functions such as lm() and glm() accept a data frame directly through their data argument.
6. Conversion available
You can convert vectors, lists, and matrices into data frames.

3. Creating a Data Frame

Example 1: Simple Data Frame

# Creating vectors
name <- c("Alice", "Bob", "Charlie")
age <- c(22, 25, 30)
score <- c(89, 92, 85)

# Creating data frame
student_df <- data.frame(name, age, score)
print(student_df)

Output
name age score
1 Alice 22 89
2 Bob 25 92
3 Charlie 30 85

4. Accessing Data Frame Elements

(a) Access Column using $

student_df$age

Output
[1] 22 25 30

(b) Access by column index

student_df[ , 2]

Output
[1] 22 25 30

(c) Access by row

student_df[1, ]

Output
   name age score
1 Alice  22    89

5. Adding New Columns

Example
student_df$grade <- c("A", "A+", "B")
print(student_df)

Output
name age score grade
1 Alice 22 89 A
2 Bob 25 92 A+
3 Charlie 30 85 B

6. Adding New Rows

Example
new_row <- data.frame(name="David", age=28, score=90, grade="A")
student_df <- rbind(student_df, new_row)
print(student_df)

Output
name age score grade
1 Alice 22 89 A
2 Bob 25 92 A+
3 Charlie 30 85 B
4 David 28 90 A

7. Removing Columns and Rows

Remove Column
student_df$grade <- NULL

Remove Row
student_df <- student_df[-2, ]

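8. Summary and Structure of a Data Frame

Note: The outputs in Sections 8 to 10 are for the original three-row student_df created in Section 3, before the columns and rows were added and removed above.

Example
summary(student_df)
str(student_df)

Output
     name                age            score
 Length:3           Min.   :22.00   Min.   :85.00
 Class :character   1st Qu.:23.50   1st Qu.:87.00
 Mode  :character   Median :25.00   Median :89.00
                    Mean   :25.67   Mean   :88.67
                    3rd Qu.:27.50   3rd Qu.:90.50
                    Max.   :30.00   Max.   :92.00

'data.frame': 3 obs. of 3 variables:
 $ name : chr  "Alice" "Bob" "Charlie"
 $ age  : num  22 25 30
 $ score: num  89 92 85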

9. Checking Data Types of Columns

sapply(student_df, class)

Output
       name         age       score
"character"   "numeric"   "numeric"

10. Filtering Rows

Example (original three-row student_df)
student_df[student_df$score > 88, ]

Output
   name age score
1 Alice  22    89
2   Bob  25    92
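There are two other common ways to filter and reorder rows, sketched here on a fresh copy of the original three-row data frame. The sketch assumes R 4.0 or later, where character columns are not converted to factors. subset() filters rows and picks columns in one call, and order() sorts rows.

```r
student_df <- data.frame(name  = c("Alice", "Bob", "Charlie"),
                         age   = c(22, 25, 30),
                         score = c(89, 92, 85))

# subset(): rows with score > 88, keeping only two columns
high <- subset(student_df, score > 88, select = c(name, score))
print(high)

# order(): sort rows by score, highest first
ranked <- student_df[order(-student_df$score), ]
print(ranked$name)   # "Bob" "Alice" "Charlie"
```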

11. Merging Data Frames

Example
df1 <- data.frame(ID=1:3, Name=c("A","B","C"))
df2 <- data.frame(ID=1:3, Score=c(80,90,85))

merge(df1, df2, by="ID")

Output
ID Name Score
1 1 A 80
2 2 B 90
3 3 C 85
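By default, merge() keeps only the IDs found in both data frames. The sketch below uses a hypothetical df2 whose IDs only partly overlap, to show the difference that all = TRUE makes: it performs a full outer join, with NA filling the gaps.

```r
df1 <- data.frame(ID = 1:3, Name = c("A", "B", "C"))
df2 <- data.frame(ID = c(2, 3, 4), Score = c(90, 85, 70))

inner <- merge(df1, df2, by = "ID")              # only IDs 2 and 3 appear in both
outer <- merge(df1, df2, by = "ID", all = TRUE)  # IDs 1 to 4, NA where a value is missing

print(inner)
print(outer)
```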

12. Converting to Data Frame

From Matrix
m <- matrix(1:6, nrow=2)
df <- as.data.frame(m)

⭐Perfect 15-Marks Summary Answer

A Data Frame is a two-dimensional, table-like structure in R capable of storing heterogeneous data
types across columns. It supports row and column names, easy data manipulation, filtering, and
merging, and it is widely used in statistical analysis. Data frames can be created using vectors, lists, or
matrices. Common operations include accessing elements, adding/removing rows and columns,
summarizing data, and merging multiple data frames. Examples demonstrate creation, modification,
and analysis with outputs, making data frames one of R’s most essential structures.

⭐ COMPARISON TABLE OF 6 DATA TYPES IN R

Feature | Vectors | Matrices | Arrays | Lists | Factors | Data Frames
Definition | 1-D homogeneous data structure | 2-D homogeneous rectangular data structure | Multi-dimensional homogeneous data structure | 1-D heterogeneous data structure | Categorical data stored as levels | 2-D heterogeneous table-like data structure
Dimensions | 1 | 2 (rows × columns) | 2 or more (e.g., 3D, 4D) | 1 | 1 | 2 (rows × columns)
Data Types Allowed | Same type only | Same type only | Same type only | Different types allowed | Same type (categories) | Different types across columns
Examples of Values | c(1, 2, 3) | matrix(1:6, nrow=2) | array(1:12, dim=c(2,2,3)) | list(1, "A", TRUE) | factor(c("Male", "Female")) | data.frame(Name, Age)
Indexing | Single index [i] | Two indices [i,j] | Multi-index [i,j,k] | Single index [i] or [[i]] | Single index [i] | Two indices [i,j]
Best Used For | Simple numeric/character sequences | Mathematical operations, linear algebra | Scientific, multidimensional data | Storing mixed objects | Storing categorical variables | Statistical datasets, tables
Homogeneous / Heterogeneous | Homogeneous | Homogeneous | Homogeneous | Heterogeneous | Homogeneous categories | Heterogeneous
Typical Operations | Sorting, arithmetic | Matrix multiplication, transpose | Multidimensional operations | Accessing components | Level manipulation | Filtering, merging, summarizing
Creation Function | c() | matrix() | array() | list() | factor() | data.frame()
Support for Row/Column Names | No | Yes | Yes | Yes (named list elements) | Yes (factor levels) | Yes
Storage Type | Atomic vector | Atomic vector with dimensions | Atomic vector with attributes | Recursive structure | Integer codes + levels | List of equal-length vectors
Suitable For Modeling | No | Sometimes | No | Sometimes | Yes | Yes (most used)

⭐Simple Example Summary Table

Type | R Example | Output Structure
Vector | v <- c(1,2,3) | 1 2 3
Matrix | m <- matrix(1:6, nrow=2) | 2×3 matrix
Array | a <- array(1:8, c(2,2,2)) | 3-D array
List | lst <- list(1,"A",TRUE) | Mixed elements
Factor | f <- factor(c("A","B","A")) | Levels: A, B
Data Frame | df <- data.frame(x=1:3, y=c("A","B","C")) | Table with 2 columns

⭐Superb 4–5 Lines Conclusion (Useful for 15 Marks)

R provides multiple data structures for different purposes. Vectors, matrices, and arrays store
homogeneous data, whereas lists and data frames store heterogeneous data. Factors are special
structures for categorical variables. Together, these six structures support powerful data
manipulation, statistical modeling, and scientific computation in R.
⭐DECISION-MAKING STRUCTURES IN R (DETAILED – 15 MARKS)

Decision-making structures allow a program to choose different actions based on conditions.

They help control the flow of execution in R programs.

R evaluates conditions as TRUE or FALSE, and executes statements accordingly.

R provides the following decision-making structures:

1. if statement
2. if–else statement
3. if–else if ladder
4. Nested if
5. switch statement
6. ifelse() function (vectorized decision)

🔵 1. IF Statement

✔Definition
The if statement executes a block of code only when a given condition is TRUE.
If the condition is FALSE, R simply skips the block.

✔Syntax
if (condition) {
  statements
}

✔Flowchart (text-based)
       Condition?
           |
       +---+---+
       |       |
     TRUE    FALSE
       |       |
    Execute   Skip
     Block    Block

✔Example
x <- 10
if (x > 5) {
  print("x is greater than 5")
}

✔Output
[1] "x is greater than 5"

🔵 2. IF–ELSE Statement

✔Definition
Used when two alternative actions are required:

• one when the condition is TRUE
• another when the condition is FALSE

✔Syntax
if (condition) {
  statements
} else {
  other_statements
}

✔Example
age <- 17
if (age >= 18) {
  print("Adult")
} else {
  print("Minor")
}

✔Output
[1] "Minor"

🔵 3. IF – ELSE IF – ELSE LADDER

✔Definition
Used when multiple conditions need to be tested one after another.
Only the block belonging to the first TRUE condition is executed.

✔Syntax
if (condition1) {
  statements1
} else if (condition2) {
  statements2
} else if (condition3) {
  statements3
} else {
  default_statements
}

✔Flow Explanation
• Conditions are checked from top to bottom.
• As soon as one condition is TRUE, the remaining conditions are not evaluated.

✔Example: Grading System

marks <- 85

if (marks >= 90) {
  print("Grade A")
} else if (marks >= 75) {
  print("Grade B")
} else if (marks >= 50) {
  print("Grade C")
} else {
  print("Fail")
}

✔Output
[1] "Grade B"

🔵 4. Nested IF Statement

✔Definition
When an if statement is placed inside another if, it is called a nested-if.
Used for multi-level decision making.

✔Syntax
if (condition1) {
  if (condition2) {
    statements
  } else {
    statements
  }
} else {
  statements
}

✔Example: Check positive & odd/even

num <- 12

if (num > 0) {
  if (num %% 2 == 0) {
    print("Positive Even Number")
  } else {
    print("Positive Odd Number")
  }
} else {
  print("Zero or Negative Number")
}

✔Output
[1] "Positive Even Number"

🔵 5. SWITCH Statement

✔Definition
The switch() function selects and evaluates one option from multiple choices.
With a numeric expression, it returns the argument at that position; with a character expression, it returns the value whose name matches.
Useful for menu-driven applications.

✔Syntax
switch(expression,
       case1 = value1,
       case2 = value2,
       case3 = value3,
       ...
)

✔Example 1: Using numeric expression

choice <- 3
color <- switch(choice,
                "Red",    # 1
                "Green",  # 2
                "Blue"    # 3
)
print(color)

✔Output
[1] "Blue"

✔Example 2: Using character expression

day <- "Wed"

result <- switch(day,
                 "Mon" = "Monday",
                 "Tue" = "Tuesday",
                 "Wed" = "Wednesday",
                 "Invalid Day"   # unnamed last argument: the default
)
print(result)

✔Output
[1] "Wednesday"
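switch() also supports fall-through: a case with an empty value uses the next non-empty value. Here is a minimal sketch with a hypothetical helper, day_type(), that groups day abbreviations:

```r
day_type <- function(day) {
  switch(day,
         "Sat" = ,                     # empty value: falls through to "Sun"
         "Sun" = "Weekend",
         "Mon" = , "Tue" = , "Wed" = , "Thu" = ,
         "Fri" = "Weekday",
         "Unknown")                    # unnamed last argument: the default
}

print(day_type("Sat"))   # "Weekend"
print(day_type("Wed"))   # "Weekday"
print(day_type("Xyz"))   # "Unknown"
```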

🔵 6. ifelse() Function

✔Definition
ifelse() is a vectorized decision-making function.
It checks a condition over an entire vector and returns outputs element-wise.

✔Syntax
ifelse(test, value_if_true, value_if_false)

✔Example
x <- c(5, 12, 8, 20)
result <- ifelse(x > 10, "Greater", "Smaller")
print(result)

✔Output
[1] "Smaller" "Greater" "Smaller" "Greater"
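The sketch below shows two practical details of ifelse() on a made-up vector with a missing value. First, an NA in the condition yields NA in the result. Second, ifelse() is a common way to replace missing values. if() itself needs a single TRUE or FALSE, and a longer condition is an error from R 4.2.0 on. Vector checks are therefore usually wrapped in any() or all().

```r
x <- c(5, 12, NA, 20)

# An NA in the condition gives NA in the result
print(ifelse(x > 10, "Greater", "Smaller"))   # "Smaller" "Greater" NA "Greater"

# Replacing missing values, a common data-cleaning step
clean <- ifelse(is.na(x), 0, x)
print(clean)                                  # 5 12 0 20

# if() needs one TRUE/FALSE value, so summarise the vector first
if (any(x > 10, na.rm = TRUE)) {
  print("At least one value is greater than 10")
}
```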

⭐Practical Use of Decision-Making in Real-Life Programs

1. Grading systems (marks → grade)

2. Tax calculation (income → tax slab)
3. Login validation (username/password)
4. Menu-based programs (switch case)
5. Data cleaning (replace NA with values using ifelse)
6. Categorizing values (e.g., age group, BMI class)

⭐Comparison Table of Decision-Making Structures

Structure | Checks | Best Used For | Example
if | Single condition | Simple checks | if (x > 0)
if-else | One condition, two outcomes | Dual decisions | Pass/Fail
else-if ladder | Multiple conditions | Grading, multi-level checks | Marks → grade
nested if | Multi-level decision | Combined conditions | Positive & even
switch | Multiple choices | Menu, day names | switch(choice)
ifelse() | Vectorized condition | Data cleaning, categorizing | ifelse(x > 10, ...)

⭐Conclusion (2–3 Marks)

Decision-making structures in R control the logical flow of a program.

They enable execution of code based on TRUE/FALSE conditions and support simple, multiple,
and vectorized decisions.
Structures such as if, if–else, else-if ladder, nested-if, switch, and ifelse() are essential for writing
efficient and intelligent R programs in data analysis, automation, and statistical computing.

Loops in R (Detailed Answer for 15 Marks)

Introduction
Loops in R are control structures that repeat a block of code, either once for each element of a
sequence or until a specified condition is met. They help automate tasks, reduce redundancy, and
simplify complex iterative processes. R provides three main types of loops:
✔for loop
✔while loop
✔repeat loop

1. for Loop

Definition
A for loop in R is used when the number of iterations is already known. It iterates over elements of
a vector, list, sequence, or any iterable object.

Syntax
for (variable in sequence) {
  # statements
}

Example 1: Printing numbers 1 to 5

for (i in 1:5) {
  print(i)
}

Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Example 2: Calculating squares

for (x in 1:5) {
  print(x^2)
}

Output
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
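When looping over the positions of a vector, seq_along() is safer than 1:length(). The reason is that 1:length(x) counts down to 0 when x is empty. Here is a small sketch with illustrative data:

```r
scores <- c(70, 85, 90)
total <- 0
for (i in seq_along(scores)) {   # i takes the values 1, 2, 3
  total <- total + scores[i]
}
print(total)         # 245
print(sum(scores))   # 245: the vectorized equivalent, usually preferred

empty <- numeric(0)
print(1:length(empty))    # 1 0: a loop over this would run twice on an empty vector
print(seq_along(empty))   # integer(0): a loop over this runs zero times
```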

2. while Loop

Definition
A while loop repeatedly executes a block of code as long as the condition is TRUE.
Used when number of iterations is unknown but depends on a condition.

Syntax
while (condition) {
  # statements
}

Example: Print numbers from 1 to 5

i <- 1
while (i <= 5) {
  print(i)
  i <- i + 1
}

Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Example: Sum of numbers until value exceeds 15

total <- 0   # named total rather than sum, to avoid masking R's sum() function
num <- 1

while (total <= 15) {
  total <- total + num
  num <- num + 1
}
print(total)

Output
[1] 21

3. repeat Loop

Definition
The repeat loop is an infinite loop that continues until a break statement is encountered.
Used when the number of iterations is unknown and depends on logic inside the loop.

Syntax
repeat {
  # statements
  if (condition) {
    break
  }
}

Example: Print numbers from 1 to 5

x <- 1

repeat {
  print(x)
  x <- x + 1

  if (x > 5) {
    break
  }
}

Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
4. Control Statements inside Loops

Control statements help modify the normal execution flow of loops.

(a) break Statement

Used to terminate a loop immediately.

Example
for (i in 1:10) {
  if (i == 6) {
    break
  }
  print(i)
}

Output
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

(b) next Statement

Skips the current iteration and moves to the next.

Example
for (i in 1:5) {
  if (i == 3) {
    next
  }
  print(i)
}

Output
[1] 1
[1] 2
[1] 4
[1] 5
(Note: 3 is skipped)

Comparison of Loops in R
Feature | for Loop | while Loop | repeat Loop
Number of iterations known? | Yes | No | No
Uses condition? | No | Yes (checked at the top) | Yes (break inside)
Can become infinite? | No | Yes | Yes (by default)
Easy to use with a sequence? | Yes | No | No
Best use | Iterating over vectors | Condition-based repetition | Custom-controlled infinite loops

Conclusion

Loops in R provide flexible mechanisms to perform repetitive tasks efficiently.

• for loop is best for fixed iterations,
• while loop is best for condition-dependent tasks, and
• repeat loop is best for infinite loops requiring explicit termination.

These loops are essential for data processing, automation, simulations, and iterative computations in
R.

User-Defined Functions in R (20 Marks – Detailed Answer)

1. Introduction
A function in R is a block of organized, reusable code designed to perform a specific task.
R provides many built-in functions, but users can also create their own functions, known as User-
Defined Functions (UDFs).

User-defined functions help:

• Reduce repetition
• Make code modular
• Improve readability
• Enhance reusability

2. Definition

A User-Defined Function (UDF) in R is a function created by the user to perform custom
operations that are not available through built-in functions.
It is defined using the function() keyword and can take parameters, execute statements, and return
values.

3. Syntax of User-Defined Function

function_name <- function(argument1, argument2, ...) {
  # body: statements to be executed
  return(value)
}

Components

1. function_name → Name given by the user

2. arguments → Input parameters
3. body → Computation
4. return() → Outputs the result (optional)

4. Creating and Calling a Function

Example 1: A simple function

greet <- function() {
  print("Hello, welcome to R programming!")
}

greet()

Output
[1] "Hello, welcome to R programming!"

5. Functions with Arguments

Functions may receive inputs (parameters) to operate on.

Example 2: Function with one argument

square <- function(x) {
  result <- x * x
  return(result)
}

square(5)

Output
[1] 25

Example 3: Function with multiple arguments

add_numbers <- function(a, b) {
  total <- a + b
  return(total)
}

add_numbers(4, 7)

Output
[1] 11

6. Function Without Return Statement

If no return() is used, R automatically returns the last evaluated expression.

Example 4:
multiply <- function(x, y) {
  x * y
}

multiply(3, 4)

Output
[1] 12

7. Default Arguments

R allows default values for function parameters.

Example 5
power <- function(x, p = 2) {
  return(x^p)
}

power(5)
power(5, 3)

Output
[1] 25
[1] 125

8. Variable Scope in Functions

Scope determines where variables can be accessed.

Types of Scope:

1. Local Scope – Variables created inside a function

2. Global Scope – Variables created outside a function

Example 6: Local vs Global Variables

x <- 10          # Global variable
demo <- function() {
  x <- 5         # Local variable
  print(x)
}

demo()
print(x)

Output
[1] 5 # Local variable inside function
[1] 10 # Global variable outside function
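A function can read a global variable, but ordinary assignment inside it creates a local copy. The superassignment operator <<- instead modifies the variable in an enclosing environment. Here is a minimal sketch with hypothetical functions add_local() and add_global(). In practice, <<- is best used sparingly because it makes functions harder to reason about.

```r
counter <- 0                   # global variable

add_local <- function() {
  counter <- counter + 1       # reads the global, then creates a local copy
  counter                      # returns the local value; the global is unchanged
}

add_global <- function() {
  counter <<- counter + 1      # <<- assigns in the enclosing (here: global) environment
  invisible(counter)
}

print(add_local())   # 1
print(counter)       # 0
add_global()
print(counter)       # 1
```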

9. Returning Multiple Values

A function can return multiple values using list().

Example 7
calculate <- function(a, b) {
  sum <- a + b
  diff <- a - b
  prod <- a * b

  return(list(SUM = sum, DIFFERENCE = diff, PRODUCT = prod))
}

calculate(10, 4)

Output
$SUM
[1] 14

$DIFFERENCE
[1] 6

$PRODUCT
[1] 40

10. Anonymous Functions

Functions without names are called anonymous or inline functions.

Example 8
(result <- (function(x, y) { x + y })(6, 4))

Output
[1] 10
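Anonymous functions are most often passed directly to functions such as sapply(). From R 4.1.0 onward, \(x) is a shorthand for function(x). The sketch assumes that version or later.

```r
# Anonymous function written out in full
squares <- sapply(1:5, function(x) x^2)   # 1 4 9 16 25

# The same idea with the R 4.1+ shorthand
cubes <- sapply(1:5, \(x) x^3)            # 1 8 27 64 125

print(squares)
print(cubes)
```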

11. Recursive Functions

A recursive function calls itself.

Example 9: Factorial
factorial_r <- function(n) {
  if (n == 0)
    return(1)
  else
    return(n * factorial_r(n - 1))
}

factorial_r(5)

Output
[1] 120
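The recursive version above never reaches n == 0 for negative or fractional input. It therefore keeps calling itself until R stops with an error. The sketch below shows one way to guard against that, and it is also checked against base R's factorial().

```r
factorial_r <- function(n) {
  # Guard: reject input that would never reach the base case
  if (n < 0 || n != round(n)) stop("n must be a non-negative whole number")
  if (n == 0) return(1)          # base case
  n * factorial_r(n - 1)         # recursive case
}

print(factorial_r(5))                               # 120
print(all.equal(factorial_r(10), factorial(10)))    # TRUE
res <- tryCatch(factorial_r(-1), error = function(e) conditionMessage(e))
print(res)                                          # the guard's error message
```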

12. Advantages of User-Defined Functions

• Modularity: Break large programs into smaller units
• Reusability: Use the same function in multiple programs
• Readability: Makes code easy to understand
• Debugging ease: Errors can be tracked easily
• Customization: Allows defining logic specific to user requirements

13. Applications of User-Defined Functions

✔Data processing
✔Statistical calculations
✔Custom mathematical models
✔Data cleaning functions
✔Machine learning preprocessing
✔Automated reporting

14. Full Example Program (Combine Everything)

student_result <- function(name, marks) {

  # determining grade
  if (marks >= 90) {
    grade <- "A+"
  } else if (marks >= 75) {
    grade <- "A"
  } else if (marks >= 60) {
    grade <- "B"
  } else {
    grade <- "Fail"
  }

  # returning as list
  return(list(Name = name, Marks = marks, Grade = grade))
}

student_result("Alice", 88)

Output
$Name
[1] "Alice"

$Marks
[1] 88

$Grade
[1] "A"

Conclusion

User-defined functions in R play an essential role in building efficient programs. They help
structure code, facilitate reuse, enhance readability, and support complex computing operations. R
provides high flexibility in designing functions with arguments, return values, default parameters,
recursion, and multiple outputs.
User-Defined Functions in R

A User-Defined Function (UDF) is a function created by the user to perform a specific task.
Functions help in:

• Reusing code
• Improving modularity
• Making programs readable
• Avoiding repetition

In R, functions are created using the function() keyword.

TYPES OF USER-DEFINED FUNCTIONS

There are two main types:

1. Functions without arguments

2. Functions with arguments

1. Functions Without Arguments

Definition
A function that does not require any input is called a function without arguments.
It performs its task independently and uses only the statements inside the function body.

Syntax
function_name <- function() {
  # Statements
}

Example 1: Function to Print a Message

greet <- function() {
  print("Welcome to R Programming!")
}

greet()

Output
[1] "Welcome to R Programming!"

Example 2: Function to Display Current Date

show_date <- function() {
  print(Sys.Date())
}

show_date()

Output (Example)
[1] "2025-11-23"

Example 3: Function Without Return Statement

hello <- function() {
  "Hello Students!"
}

hello()

Output
[1] "Hello Students!"
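Example 3 works because an R function returns the value of the last expression it evaluates; return() is only needed to leave a function early. A minimal sketch (the function name sign_label is hypothetical):

```r
# Returns a label describing the sign of a single number
sign_label <- function(x) {
  if (x < 0) {
    return("negative")  # explicit return() exits the function immediately
  }
  # No return() here: the value of this last expression is returned
  if (x == 0) "zero" else "positive"
}

sign_label(-3)  # "negative"
sign_label(0)   # "zero"
sign_label(7)   # "positive"
```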

2. Functions With Arguments

Definition
A function that takes one or more inputs is known as a function with arguments.
These inputs (parameters) help the function perform calculations or processing.

Syntax
function_name <- function(arg1, arg2, ...) {
# Statements
}

Arguments can be:

• Mandatory
• Optional with default values

Examples of Functions With Arguments

Example 1: Adding Two Numbers

add <- function(a, b) {
total <- a + b   # 'total' avoids masking the built-in sum() function
return(total)
}

add(10, 20)

Output
[1] 30

Example 2: Function With Default Argument

power <- function(x, p = 2) {
return(x ^ p)
}

power(5) # uses default p = 2

power(5, 3) # user-defined p

Output
[1] 25
[1] 125
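Arguments can also be matched by name, in any order, and leaving out a mandatory argument that the body uses raises an error. A small sketch reusing power() from Example 2:

```r
power <- function(x, p = 2) {
  x ^ p
}

power(5)             # 25: p falls back to its default value 2
power(5, 3)          # 125: arguments matched by position
power(p = 3, x = 2)  # 8: arguments matched by name, order does not matter

# x has no default, so calling power() without it fails when x is used
err_msg <- tryCatch(power(), error = function(e) conditionMessage(e))
err_msg  # 'argument "x" is missing, with no default'
```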

Example 3: Calculate Area of Circle

area_circle <- function(r) {
area <- 3.14 * r * r
return(area)
}

area_circle(7)

Output
[1] 153.86

Example 4: Find Maximum of Three Numbers

maximum <- function(a, b, c) {
max_val <- max(a, b, c)
return(max_val)
}

maximum(10, 50, 30)

Output
[1] 50
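The `...` in the general syntax above is also real R syntax: a parameter named `...` collects any number of extra arguments. A sketch of a variadic version of Example 4 (maximum_of is a hypothetical name):

```r
# Accepts any number of numeric arguments and returns the largest one
maximum_of <- function(...) {
  values <- c(...)  # combine every supplied argument into one vector
  max(values)
}

maximum_of(10, 50, 30)            # 50
maximum_of(4, 8, 15, 16, 23, 42)  # 42
```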

Difference Between With Argument & Without Argument Functions

| Feature | Without Argument | With Argument |
|---------|------------------|---------------|
| Input | No input required | Input must be provided |
| Flexibility | Limited | Highly flexible |
| Example | greet() | add(3, 4) |
| Usage | Fixed output | Output depends on inputs |
| Parameters | No parameters | One or more parameters |
| Real Use | Display messages, fixed tasks | Calculations, data processing |

Combined Example (Both Types)

# Without Argument
welcome <- function() {
print("Hello! This is an R Program.")
}

# With Arguments
multiply <- function(x, y) {
return(x * y)
}

welcome()
multiply(4, 5)

Output
[1] "Hello! This is an R Program."
[1] 20

Conclusion

User-defined functions are powerful tools in R:

• Functions without arguments perform fixed tasks.
• Functions with arguments accept inputs to produce customized results.

Together, they make R programming flexible, organized, and efficient.

USER-DEFINED PACKAGE IN R (20 MARKS – DETAILED ANSWER)

1. Introduction
R is widely known for its strong package ecosystem. Packages extend the functionality of R by
providing additional functions, datasets, and tools.
While thousands of packages exist on CRAN, users often need to create their own packages for:

• Custom functions
• Reusable code
• Research projects
• Sharing tools with others
• Automation of common tasks

A User-Defined Package allows users to bundle their functions, documentation, and data into a
structured form that can be installed and used just like any other R package.
2. Definition

A User-Defined Package in R is a collection of R functions, documentation, and datasets created
by the user and arranged in a standard directory structure defined by R. Once created, the package
can be installed, loaded, distributed, and reused.

3. Need for User-Defined Packages

• To reuse functions across different projects
• To collaborate and share code with others
• To simplify complex operations by grouping functions
• To maintain consistency and avoid rewriting code
• For academic, research, or industry-level applications
• To publish work on CRAN or GitHub

4. Basic Structure of an R Package

mypackage/
│
├── DESCRIPTION
├── NAMESPACE
│
├── R/
│ ├── function1.R
│ ├── function2.R
│
├── man/
│ ├── function1.Rd
│ ├── function2.Rd
│
├── data/
│ ├── mydata.rda
│
└── README.md

Explanation of Components

| Component | Description |
|-----------|-------------|
| DESCRIPTION | Contains package metadata (title, version, author, etc.) |
| NAMESPACE | Controls which functions are exported |
| R/ | Contains all user-defined function files |
| man/ | Documentation (.Rd) files, often generated automatically with roxygen2 |
| data/ | Stores datasets |
| README.md | Overview of package usage |

5. Steps to Create a User-Defined Package in R

There are two methods:

(A) Manual creation
(B) Using packages like devtools (recommended)

A. Creating R Package Manually

Step 1: Create a folder for the package

mkdir mypackage

Step 2: Create DESCRIPTION file

Open a file named DESCRIPTION and add:

Package: mypackage
Type: Package
Title: My First User Defined Package
Version: 0.1.0
Author: Your Name
Maintainer: Your Name <youremail@example.com>
Description: This package contains custom functions created by the user.
License: GPL-3
Encoding: UTF-8

Step 3: Create NAMESPACE file

Create a file named NAMESPACE that lists the functions to export:

export(add_two)
export(square_num)

Step 4: Create R folder and add functions

Inside mypackage/R, create files:

File 1: add_two.R
add_two <- function(x) {
return(x + 2)
}

File 2: square_num.R
square_num <- function(n) {
return(n * n)
}

Step 5: Build and Install Package

Open R with the working directory set to the folder that contains mypackage, and run:

install.packages("mypackage", repos = NULL, type = "source")

Step 6: Load and use the package

library(mypackage)

add_two(5)
square_num(4)

Output
[1] 7
[1] 16
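Base R also includes utils::package.skeleton(), which writes the same DESCRIPTION, NAMESPACE, R/ and man/ layout from functions already defined in the session. A minimal sketch that writes into a temporary folder (the folder name skeleton_demo is hypothetical); the generated DESCRIPTION and .Rd files still need to be edited before the package is installed:

```r
# Functions to package (same as Step 4)
add_two <- function(x) x + 2
square_num <- function(n) n * n

# Create the package inside a temporary directory so no real files are touched
pkg_root <- file.path(tempdir(), "skeleton_demo")
dir.create(pkg_root, showWarnings = FALSE)

utils::package.skeleton(
  name = "mypackage",
  list = c("add_two", "square_num"),  # objects copied into R/ and documented in man/
  environment = environment(),        # where those objects are looked up
  path = pkg_root,
  force = TRUE                        # overwrite an earlier run of this demo
)

# Show the generated files (DESCRIPTION, NAMESPACE, R/*.R, man/*.Rd, ...)
list.files(file.path(pkg_root, "mypackage"), recursive = TRUE)
```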

B. Creating R Package Using devtools (Easy Method)

Step 1: Install devtools

install.packages("devtools")
library(devtools)

Step 2: Create package skeleton

usethis::create_package("mypackage")   # usethis is installed as a dependency of devtools

This automatically creates the package structure.

Step 3: Add functions

Inside mypackage/R, create functions:

hello <- function(name) {

paste("Hello", name)
}

Step 4: Document functions using roxygen2

#' Say Hello
#' @param name A character string
#' @return A greeting message
#' @export
hello <- function(name) {
paste("Hello", name)
}

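Run this from the folder that contains mypackage; it generates the man/*.Rd files and the
NAMESPACE entries from the roxygen2 comments:

devtools::document("mypackage")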

Step 5: Install package

devtools::install("mypackage")

Step 6: Use your package

library(mypackage)
hello("Aishman")

Output
[1] "Hello Aishman"
6. Adding Data to Package

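• Save the dataset as an .rda file in the package's data/ folder:

save(mydata, file = "data/mydata.rda")

• After installation, the dataset can be loaded with data(mydata); if DESCRIPTION contains the
line LazyData: true, it is available as soon as the package is loaded.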

7. Advantages of User-Defined Packages

1. Reusability

Functions can be used across multiple projects.

2. Collaboration

Share package with teams, students, or the public.

3. Organization

Keeps related functions and datasets grouped together.

4. Maintenance

Easy to update and debug a well-structured package.

5. Distribution

Can be uploaded to CRAN or GitHub.

8. Real-Life Applications

• Academic research tools
• Statistical modeling functions
• Machine Learning pipelines
• Data cleaning utilities
• Visualization packages
• Domain-specific analysis tools

9. Example Package Demonstration

Example: "studentTools" package

Functions included:

• calculate_grade()
• percentage()

Function 1: percentage()
percentage <- function(marks, total = 100) {
return((marks / total) * 100)
}

Function 2: calculate_grade()
calculate_grade <- function(percentage) {
if (percentage >= 90) return("A+")
else if (percentage >= 75) return("A")
else if (percentage >= 60) return("B")
else return("Fail")
}

Using package after installation

library(studentTools)

p <- percentage(88)
calculate_grade(p)

Output
[1] "A"
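calculate_grade() handles one value at a time: passing a vector of percentages to if() is an error in R 4.2.0 and later. A vectorised sketch using base R's cut() (the name calculate_grade_vec is hypothetical) applies the same thresholds to a whole vector:

```r
# Vectorised grading with the same cut-offs as calculate_grade()
calculate_grade_vec <- function(percentage) {
  grades <- cut(
    percentage,
    breaks = c(-Inf, 60, 75, 90, Inf),
    labels = c("Fail", "B", "A", "A+"),
    right = FALSE  # intervals closed on the left: [60, 75), [75, 90), [90, Inf)
  )
  as.character(grades)
}

calculate_grade_vec(c(95, 88, 60, 42))  # "A+" "A" "B" "Fail"
calculate_grade_vec(c(90, 75))          # "A+" "A" (boundary values)
```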

Conclusion

User-Defined Packages in R are essential tools that allow programmers and data scientists to extend
R’s capabilities. They help structure code professionally, promote reusability, and enable sharing
with the research or developer community. Creating packages enhances efficiency and supports
large-scale scientific and analytical projects.

REPORTS USING R MARKDOWN

1. Introduction
R Markdown (Rmd) is a powerful tool in R used to create dynamic, reproducible reports.
It allows users to combine:

• Text (written in Markdown)
• Code (R or other languages)
• Output (tables, plots, results)

All in a single document that can be converted into multiple formats such as:

✔HTML
✔PDF
✔Word
✔Slides

R Markdown is widely used in data analysis, reporting, research papers, statistical summaries,
dashboards, and presentations.

2. What is R Markdown?

R Markdown is a file format with extension .Rmd that blends markdown syntax with R code
chunks.

It follows the concept of literate programming, where analysis and documentation exist together in
one report.
3. Structure of an R Markdown Document

An R Markdown file contains three main parts:

(A) YAML Header

Starts and ends with --- and contains document metadata.

Example
---
title: "Sales Report 2025"
author: "Aishman Rai"
date: "23 November 2025"
output: html_document
---

(B) Markdown Text

Used to write headings, paragraphs, bullet lists, etc.

Example
## Introduction
This report shows the monthly sales performance.

(C) R Code Chunks

Code that R executes when the report is generated.

Example
```{r}
x <- 1:5
mean(x)
```

When the report is knitted, the code runs and its result ([1] 3) appears in the output below the chunk.

4. Creating an R Markdown Report

Step 1 – Install R Markdown

install.packages("rmarkdown")

Step 2 – Create a New R Markdown File

In RStudio:
File → New File → R Markdown

Step 3 – Write Text and Code

Add code chunks and descriptive content.

Step 4 – Knit the Document

Click "Knit" to generate output in desired format:

• HTML
• PDF
• Word

5. Markdown Writing Basics

Headings
# Heading 1
## Heading 2
### Heading 3

Bold and Italics

**bold text**
*italic text*

Lists
- Item 1
- Item 2

Tables
| Name | Marks |
|------|--------|
| Riya | 89 |
| Sam | 78 |
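
6. R Code Chunk Options

A code chunk begins with ```{r} and ends with ```; chunk options are written inside the braces
after r, separated by commas:

```{r}
...
```

Common Options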

| Option | Meaning |
|--------|---------|
| echo=FALSE | hides R code |
| message=FALSE | hides package messages |
| warning=FALSE | hides warnings |
| eval=FALSE | does not run code |
| include=FALSE | runs code but hides both code and output |

Example

```{r, echo=FALSE, message=FALSE}
library(ggplot2)
```

7. Example of a Complete R Markdown Report

---
title: "Student Performance Report"
author: "Aishman Rai"
output: word_document
---

# Introduction
This report analyses the marks of students.

# Data

```{r}
marks <- c(80, 75, 92, 60, 88)
marks
```

# Average Marks

```{r}
mean(marks)
```

# Plot of Marks

```{r}
plot(marks, type = "b", main = "Student Marks", xlab = "Student", ylab = "Marks")
```

# Conclusion

The average performance is satisfactory.

8. Features of R Markdown in Reporting

1. Dynamic Reporting

Reports are regenerated from the current data and code every time the document is knitted, so they
stay up to date.

2. Reproducibility

The same code run on the same data produces the same results, which is important in research.

3. Integration

Supports code chunks in several languages:
- R
- Python
- SQL
- JavaScript

4. Multiple Output Formats

HTML, PDF, Word, PowerPoint, dashboard reports, websites, and e-books.

5. Easy to Learn

Markdown is simple text-based formatting.

9. Applications of R Markdown Reports

- Academic project reports
- Data analysis summaries
- Business analytics presentations
- Statistical modelling reports
- Research documentation
- Automated weekly/monthly reports
- Machine learning model reports
- Dashboards (with flexdashboard / Shiny)

10. Advantages
✔ Clean and professional reports
✔ Combines text, code, and graphs
✔ Promotes reproducibility
✔ Supports multiple output formats
✔ Easy to maintain
✔ Excellent for research and education

Conclusion
R Markdown is a powerful tool for producing dynamic, reproducible, and
professional reports that combine analysis, interpretation, and visualization
in a single document. Its ability to integrate code and narrative makes it
essential for data scientists, researchers, and students.

Reports Using R Markdown (Direct Rendering and Indirect Rendering)

1. Introduction to R Markdown

R Markdown is a powerful document format in R that allows users to combine code, text, and
output in a single file.
It is widely used for creating:

• Reports
• Assignments
• Presentations
• Dashboards
• Websites
• Books

An R Markdown file has the extension .Rmd and contains:

1. YAML Header → Title, author, date, output format
2. Markdown Text → Headings, lists, tables
3. R Code Chunks → Code that runs and displays output

2. Components of an R Markdown Document

(a) YAML Header

Placed at the top of the file inside ---.

Example:

---
title: "Student Report"
author: "Aishman Rai"
date: "`r Sys.Date()`"
output: html_document
---

(b) Markdown Section

Used for writing text, headings, and explanations.

Example:

## Introduction
This report analyses the performance of students.

(c) R Code Chunks

Used to execute R code.

Example:

```{r}
x <- c(10, 20, 30)
mean(x)
```

3. Rendering Methods in R Markdown

R Markdown supports two types of rendering:

A. Direct Rendering
B. Indirect Rendering

⭐ A. Direct Rendering

Definition
Direct rendering means rendering (knitting) the R Markdown file directly from the RStudio
interface by clicking the Knit button.

The .Rmd file is first processed by knitr, which runs the code chunks and writes a Markdown file;
Pandoc then converts that file into:

- HTML report
- PDF report
- Word report

How it Works

1. Open the .Rmd file in RStudio
2. Click the Knit button
3. Select the output format
4. RStudio runs all the R code, then generates the final report

Example

Example file: report.Rmd

---
title: "Direct Rendering Report"
output: html_document
---
## Student Data Summary

```{r}
marks <- c(85, 90, 78, 92)
mean(marks)
```

Direct Rendering Output

- A report automatically opens showing:
  - Title
  - Text
  - Output of the R code
  - Graphs, tables, etc.

Advantages of Direct Rendering

- Simple and fast
- No need to write additional R code for rendering
- Ideal for routine reports
- User-friendly for beginners

⭐ B. Indirect Rendering

Definition
Indirect rendering means rendering an R Markdown file programmatically from an R script,
instead of manually clicking "Knit".

This is done using:

rmarkdown::render("report.Rmd")

How it Works
1. Write an .Rmd file
2. Use an R script (.R file) to call rmarkdown::render()
3. The .Rmd is processed and converted into HTML/PDF/Word automatically

Example

Step 1: Create report.Rmd

---
title: "Indirect Rendering Example"
output: pdf_document
---
## Summary

```{r}
x <- rnorm(5)
x
```

Step 2: Create render_script.R

library(rmarkdown)
# output_format overrides the pdf_document format set in the YAML header
render("report.Rmd", output_format = "html_document")

Run the script from R:

source("render_script.R")

or from a terminal with: Rscript render_script.R

Advantages of Indirect Rendering

• Useful for automation:
  o Monthly reports
  o Weekly sales reports
  o Daily data analysis
• Can generate multiple reports at once
• Works without opening RStudio
• Useful for servers, batch runs, scheduling
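The batch-processing advantage can be sketched as a loop over rmarkdown::render() that passes a different parameter each time. This assumes the rmarkdown package and Pandoc are installed (the loop is skipped otherwise); the template name marks_template.Rmd, the parameter marks and the class data are hypothetical, and everything is written to a temporary folder:

```r
# Hypothetical parameterised template; `r ...` is inline R code run when knitting
rmd_path <- file.path(tempdir(), "marks_template.Rmd")
writeLines(c(
  "---",
  "title: \"Marks Report\"",
  "output: html_document",
  "params:",
  "  marks: [70, 85, 90]",
  "---",
  "",
  "Average mark: `r mean(params$marks)`"
), rmd_path)

# Hypothetical data: one report per class
classes <- list(classA = c(70, 85, 90), classB = c(60, 75, 80))
outputs <- character(0)

# Only render when rmarkdown and Pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  for (name in names(classes)) {
    out <- rmarkdown::render(
      rmd_path,
      params = list(marks = classes[[name]]),  # replaces the YAML default
      output_file = paste0(name, ".html"),
      output_dir = tempdir(),
      envir = new.env(),                       # fresh environment per report
      quiet = TRUE
    )
    outputs <- c(outputs, out)
  }
}

outputs  # paths of the generated HTML files (empty if rendering was skipped)
```

The same script can be run unattended, for example by a scheduler calling Rscript.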

Direct vs Indirect Rendering (Comparison Table)

| Feature | Direct Rendering | Indirect Rendering |
|---------|------------------|--------------------|
| Definition | Rendering directly using Knit button | Rendering using rmarkdown::render() |
| User Interface | Requires RStudio | Works in any R environment |
| Automation | Not suitable | Highly suitable |
| Ease of Use | Very easy for beginners | Requires coding knowledge |
| Batch Processing | Not possible | Easily possible |
| Use Case | Assignments, small reports | Large production pipelines |

4. Example: A Small R Markdown Report

---
title: "Student Marks Report"
output: html_document
---
## Marks Summary
This section shows student marks.

```{r}
marks <- c(70, 85, 90, 76)
summary(marks)
```

```{r}
plot(marks, type = "b", main = "Student Marks", xlab = "Student", ylab = "Marks")
```

5. Conclusion

R Markdown is a powerful reporting tool that integrates documentation and code.

It supports two rendering approaches:

1. Direct Rendering – simple, interactive, suitable for small reports

2. Indirect Rendering – automated, script-based, suitable for large systems

Both methods together make R Markdown ideal for academic, industrial, and
research-oriented reporting.
UNIT-2
Import and Export of Data in R (Detailed Notes)

Data import and export are essential operations in R because most real-world data comes from
external sources such as text files, CSV files, Excel, databases, or web APIs. R provides a rich set of
built-in functions and packages for reading (importing) and writing (exporting) data in different
formats.

1. IMPORTING DATA IN R

Import means bringing external data into R so that it can be analyzed, manipulated, or visualized.

R supports importing data from:

• Text files (.txt)
• CSV files (.csv)
• Excel files (.xlsx)
• R's native files (.RData, .rds)
• Databases (MySQL, SQL Server, PostgreSQL)
• Web data (JSON, XML)
• Statistical software files (SPSS, SAS, Stata)

1.1 Importing CSV Files

Function:
read.csv("data.csv")

Example:
data <- read.csv("data.csv")
print(data)

Output (example):
Name Age Marks
1 Ram 18 85
2 Sita 19 90
3 Ravi 20 78

1.2 Importing Text Files (.txt)

Function:
read.table("data.txt", header = TRUE, sep = "\t")

Example:
data <- read.table("data.txt", header = TRUE)   # default sep = "" splits on any white space
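The sep argument matters when values contain spaces: the default sep = "" splits on any white space, while sep = "\t" splits only on tabs. A small sketch that reads tab-separated text directly through read.table()'s text argument (the sample data are made up for illustration):

```r
# Tab-separated text; the city "New Delhi" contains a space
tsv_text <- "Name\tCity\nRam\tNew Delhi\nSita\tPune"

tab_data <- read.table(text = tsv_text, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)  # already the default since R 4.0.0
tab_data$City  # "New Delhi" "Pune": the space inside the value is kept
```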

1.3 Importing Excel Files (.xlsx)

Requires the package readxl.

Function:
library(readxl)
data <- read_excel("data.xlsx")

Example:
print(data)

1.4 Importing R Native Files

(a) .RData File

Loads multiple objects.

load("data.RData")

(b) .RDS File

Loads a single object.

data <- readRDS("data.rds")

1.5 Importing JSON Files

Requires jsonlite package.

library(jsonlite)
data <- fromJSON("data.json")

1.6 Importing Data from the Web

url <- "[Link]
data <- read.csv(url)   # read.csv() also accepts a URL pointing to a CSV file

2. EXPORTING DATA IN R

Exporting means writing or saving data from R into external files so they can be used in other
applications like Excel, Python, SPSS, etc.
2.1 Exporting CSV Files

Function:
write.csv(data, "output.csv", row.names = FALSE)

Example:
write.csv(iris, "iris_output.csv", row.names = FALSE)
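row.names = FALSE is worth keeping: without it, write.csv() stores the row numbers as an extra unnamed column, which read.csv() later reads back as a column called X. A round-trip sketch in a temporary folder (students_demo.csv and the sample data are hypothetical):

```r
students <- data.frame(
  Name  = c("Ram", "Sita", "Ravi"),
  Age   = c(18, 19, 20),
  Marks = c(85, 90, 78)
)
csv_path <- file.path(tempdir(), "students_demo.csv")

# Export without row names, then import again
write.csv(students, csv_path, row.names = FALSE)
students_back <- read.csv(csv_path)
names(students_back)  # "Name" "Age" "Marks"

# Export with the default row names: an extra column "X" appears on import
write.csv(students, csv_path)
names(read.csv(csv_path))  # "X" "Name" "Age" "Marks"
```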

2.2 Exporting Text Files

write.table(data, "output.txt", sep = "\t", row.names = FALSE)

2.3 Exporting Excel Files (.xlsx)

Requires openxlsx package.

library(openxlsx)
write.xlsx(data, "output.xlsx")

2.4 Exporting R Native Files

(a) Save Multiple Objects

save(data1, data2, file = "data.RData")

(b) Save One Object

saveRDS(data, "data.rds")

Reading Again
d <- readRDS("data.rds")
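The two native formats differ in how objects come back: load() restores every saved object under its original name (and returns those names invisibly), while readRDS() returns a single object that you assign to any name. A sketch using temporary files (all object and file names are illustrative):

```r
scores <- c(80, 75, 92)
grades <- c("A", "B", "A+")

rdata_path <- file.path(tempdir(), "demo.RData")
rds_path   <- file.path(tempdir(), "demo.rds")

save(scores, grades, file = rdata_path)  # several objects, names stored in the file
saveRDS(scores, rds_path)                # one object, no name stored

rm(scores, grades)                       # remove them from the session

loaded_names <- load(rdata_path)  # recreates `scores` and `grades`
loaded_names                      # "scores" "grades"

my_scores <- readRDS(rds_path)    # the caller chooses the new name
my_scores                         # 80 75 92
```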

2.5 Exporting JSON Files

library(jsonlite)
write_json(data, "data.json")

3. Importing and Exporting Data from Databases

Connecting to MySQL
library(RMySQL)
conn <- dbConnect(MySQL(), dbname="school", host="localhost",
user="root", password="1234")

Import Query
data <- dbGetQuery(conn, "SELECT * FROM students")

Export Query
dbWriteTable(conn, "new_table", data)

dbDisconnect(conn)   # close the connection when finished

4. Example Program (Full Demonstration)

# Importing CSV file
students <- read.csv("students.csv")

# Displaying data
print(students)

# Performing operation
students$Percentage <- (students$Marks / 100) * 100

# Exporting new data to CSV

[Link](students, "students_output.csv", [Link] = FALSE)

5. Advantages of R Import/Export Functions

1. Supports multiple data formats
2. Fast and efficient
3. Can automate data pipelines
4. Allows integration with databases
5. Helps in reproducible research

6. Summary Table
| Data Format | Import Function | Export Function |
|-------------|-----------------|-----------------|
| CSV | read.csv() | write.csv() |
| TXT | read.table() | write.table() |
| Excel | read_excel() (readxl) | write.xlsx() (openxlsx) |
| RData | load() | save() |
| RDS | readRDS() | saveRDS() |
| JSON | fromJSON() | write_json() |
| SQL Database | dbGetQuery() | dbWriteTable() |

Conclusion

Importing and exporting data is a fundamental part of data analysis in R.

R provides a wide variety of functions and packages that allow easy reading and writing of nearly
every data format used in real-world applications. Understanding these functions is essential for
efficient data analysis, research, and reporting.

DATA VISUALIZATION TECHNIQUES IN R (20 Marks Detailed Answer)

1. Introduction
Data visualization is the process of converting raw data into graphical or pictorial formats. It helps
in identifying trends, patterns, relationships, and anomalies within data.
R programming provides a powerful environment for visualizing data using built-in functions,
graphical parameters, and advanced libraries.

Visualization is essential for:

• Summarizing large datasets
• Detecting patterns and trends
• Presenting results in an understandable way
• Supporting decision-making

R supports both basic plots (base graphics) and advanced graphics (ggplot2, lattice).

2. Types of Visualization Techniques in R

R offers the following major visualization techniques:

(A) BASIC VISUALIZATION TECHNIQUES (Base R)

2.1 Scatter Plot

A scatter plot displays the relationship between two continuous variables.

Use: Trend analysis, correlation.

Syntax
plot(x, y, main="Scatter Plot", xlab="X", ylab="Y")

Example
x <- c(1,2,3,4,5)
y <- c(3,4,6,8,10)
plot(x, y, type="p", main="Scatter Plot", xlab="X Values", ylab="Y Values")

2.2 Line Chart

Used for trend analysis over time.

Syntax
plot(x, y, type="l")

Example
years <- 2015:2020
sales <- c(50, 60, 65, 70, 90, 110)
plot(years, sales, type="l", main="Sales Trend", xlab="Year", ylab="Sales")
2.3 Bar Chart
Represents categorical data using rectangular bars.

Syntax
barplot(values)

Example
marks <- c(80, 90, 75, 60)
names(marks) <- c("A","B","C","D")
barplot(marks, col="blue", main="Student Marks")

2.4 Histogram
Used to show distribution of continuous data.

Syntax
hist(data)

Example
values <- c(5,7,8,9,10,12,12,13,15)
hist(values, main="Histogram of Values")

2.5 Boxplot
Shows distribution and outliers using quartiles.

Syntax
boxplot(data)

Example
heights <- c(150, 155, 160, 162, 170, 180)
boxplot(heights, main="Boxplot of Heights")
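boxplot() decides which points to draw as outliers with boxplot.stats(): values lying more than 1.5 times the box length (the distance between the lower and upper hinges) beyond either hinge. A sketch with an extra illustrative value of 230 added to the heights:

```r
heights_out <- c(150, 155, 160, 162, 170, 180, 230)  # 230 added as an illustrative outlier

bstats <- boxplot.stats(heights_out)
bstats$stats  # 150.0 157.5 162.0 175.0 180.0: whisker, hinge, median, hinge, whisker
bstats$out    # 230: beyond the upper limit 175 + 1.5 * (175 - 157.5) = 201.25
```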

2.6 Pie Chart

Represents parts of a whole.

Syntax
pie(values)

Example
slices <- c(40, 30, 20, 10)
labels <- c("A", "B", "C", "D")
pie(slices, labels, main="Pie Chart Example")

(B) ADVANCED VISUALIZATION TECHNIQUES

3.1 ggplot2 (Grammar of Graphics)

The most powerful visualization library in R.

Example – Scatter Plot

library(ggplot2)
ggplot(mtcars, aes(x=wt, y=mpg)) +
geom_point() +
ggtitle("MPG vs Weight")

Features of ggplot2

• Layered graphics
• Highly customizable
• Supports color, themes, facets

3.2 Lattice Graphics

Used for multi-panel plots.

Example
library(lattice)
xyplot(mpg ~ wt | factor(cyl), data = mtcars)
3.3 Heatmaps
Used to show intensity values using color.

Example
heatmap(as.matrix(mtcars))

3.4 Density Plot

Smooth representation of data distribution.

plot(density(mtcars$mpg), main="Density of MPG")

3.5 Pair Plot / Matrix Plot

Shows relationships between multiple variables.

pairs(iris[,1:4])

3.6 3D Plots (persp() in base R; scatter3D() from the plot3D package)

Used for multi-dimensional data.

Example:

persp(volcano)

4. Components of a Plot (Graphical Parameters)

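Graphical parameters control the appearance of a plot. They can be passed directly to plotting
functions such as plot(), or set with par() so that they apply to all following plots.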

Important parameters:

• col – color
• pch – point shape
• lty – line type
• lwd – line width
• cex – character expansion (size)
• xlab, ylab, main – labels and titles

Example:

plot(x, y, col="red", pch=19, main="Customized Plot")
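par() is mainly useful for settings that apply to a whole figure, such as drawing several plots side by side with mfrow. A sketch that writes two customised plots into a temporary PDF file (the file name side_by_side.pdf is illustrative):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 4, 6, 8, 10)

pdf_path <- file.path(tempdir(), "side_by_side.pdf")
pdf(pdf_path, width = 8, height = 4)

old_par <- par(mfrow = c(1, 2))  # 1 row, 2 columns; par() returns the old settings
plot(x, y, col = "red", pch = 19, cex = 1.5, main = "Points")
plot(x, y, type = "l", lty = 2, lwd = 2, main = "Dashed line")
par(old_par)                     # restore the previous layout

dev.off()                        # close the PDF device and write the file
```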

5. Advantages of Data Visualization in R

1. Handles large datasets
2. Highly customizable graphs
3. Supports both static and interactive graphics
4. Publication-quality visuals
5. Integrates with statistical analysis
6. Wide range of libraries (ggplot2, plotly, lattice, shiny)
7. Ideal for research, analytics, and presentations

6. Applications of Visualization in R

• Business analytics
• Scientific research
• Financial analysis
• Machine learning model evaluation
• Health and medical data
• Academic research
• Social science statistics

7. Summary Table of Visualization Types

| Visualization | Purpose | Example Function |
|---------------|---------|------------------|
| Scatter Plot | Relationship between two variables | plot() |
| Line Chart | Time-series / trends | plot(type="l") |
| Bar Plot | Compare categories | barplot() |
| Histogram | Data distribution | hist() |
| Boxplot | Summary + outliers | boxplot() |
| Pie Chart | Percentage breakdown | pie() |
| Density Plot | Smooth distribution | plot(density()) |
| Heatmap | Intensity values | heatmap() |
| Pair Plot | Multi-variable relationship | pairs() |
| ggplot2 Plots | Professional graphics | ggplot() |

8. Conclusion

Visualization in R plays a crucial role in analyzing, summarizing, and presenting data.

R offers both simple base graphics and advanced tools like ggplot2 and lattice, making it suitable
for academic research, business intelligence, machine learning, and statistical analysis.
Its powerful plotting ecosystem ensures high-quality visual outputs that help in better interpretation
and decision-making.

Basic Visualization in R

Basic visualization refers to the simple plotting techniques available in Base R (without using any
advanced libraries like ggplot2). These visualizations help in understanding the distribution, trends,
and relationships in data.

R provides the following basic plotting functions:

1. Scatter Plot

Definition:

A scatter plot shows the relationship between two continuous variables using points.

Syntax:
plot(x, y)

Example:
x <- c(1,2,3,4,5)
y <- c(3,4,6,8,10)
plot(x, y, main="Scatter Plot", xlab="X", ylab="Y")

2. Line Plot

Definition:

A line plot connects data points with lines; used for time-series and trend analysis.

Syntax:
plot(x, y, type="l")

Example:
months <- 1:6
sales <- c(50,60,65,70,80,90)
plot(months, sales, type="l", main="Sales Trend", xlab="Month", ylab="Sales")


3. Bar Plot

Definition:
Represents categorical data with bars.

Syntax:
barplot(values)

Example:
marks <- c(80,70,90)
names(marks) <- c("A","B","C")
barplot(marks, main="Marks of Students")


4. Histogram

Definition:

Shows the distribution of continuous data by grouping values into bins.

Syntax:
hist(data)

Example:
ages <- c(18,19,20,22,23,24,25,26)
hist(ages, main="Age Distribution")


5. Boxplot

Definition:

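Displays the median and the first and third quartiles (the box), with whiskers extending to the most
extreme values that lie within 1.5 × IQR of the box. Points beyond the whiskers are drawn
individually, which helps detect outliers.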

Syntax:
boxplot(data)

Example:
heights <- c(150,160,170,180,175,165)
boxplot(heights, main="Heights of Students")

6. Pie Chart

Definition:

Shows percentage (part-to-whole) relationships.

Syntax:
pie(values)

Example:
sizes <- c(40,30,20,10)
labels <- c("A","B","C","D")
pie(sizes, labels, main="Category Share")

Summary Table
| Plot Type | Purpose | Function |
|-----------|---------|----------|
| Scatter Plot | Relationship between variables | plot() |
| Line Plot | Trend analysis | plot(type="l") |
| Bar Plot | Categorical comparison | barplot() |
| Histogram | Distribution of data | hist() |
| Boxplot | Identify quartiles & outliers | boxplot() |
| Pie Chart | Percent share | pie() |

Conclusion

Basic visualization in R is simple, powerful, and useful for quick data exploration. Using functions
like plot(), barplot(), hist(), and boxplot(), we can easily understand trends, patterns, and
distributions in datasets.

⭐ADVANCED VISUALIZATION IN R

Advanced visualization refers to high-quality, customizable, multi-layered graphical techniques
used for deep data analysis.
R provides many advanced plotting systems:

✔ggplot2
✔lattice
✔Heatmaps
✔Density plots
✔Pair plots
✔3D plots

Below are programs and outputs.

1. ggplot2 – Grammar of Graphics

1.1 Scatter Plot (with ggplot2)

Program
library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +

geom_point(color = "blue") +
ggtitle("MPG vs Weight")

Output: a scatter plot titled "MPG vs Weight", with weight (wt) on the x-axis, mpg on the y-axis,
and the points drawn in blue.

1.2 Line Plot (ggplot2)

Program
df <- data.frame(
year = 2015:2020,
sales = c(50,60,75,80,95,110)
)

ggplot(df, aes(x=year, y=sales)) +

geom_line(color = "red") +
geom_point() +
ggtitle("Sales Growth")

Output: a red line with a point for each year, titled "Sales Growth", showing sales rising from 50 in
2015 to 110 in 2020.

1.3 Bar Plot (ggplot2)

Program
df <- data.frame(
student = c("A","B","C"),
marks = c(80,90,70)
)

ggplot(df, aes(x=student, y=marks)) +

geom_bar(stat="identity", fill="purple") +
ggtitle("Marks of Students")

Output: a bar chart titled "Marks of Students" with purple bars for students A, B and C of heights
80, 90 and 70.

2. LATTICE GRAPHICS

2.1 xyplot() – multi-panel

Program
library(lattice)

xyplot(mpg ~ wt | factor(cyl), data = mtcars,

main="MPG vs Weight by Cylinders")

Output: a figure titled "MPG vs Weight by Cylinders" made of three side-by-side panels, one for
each cylinder count (4, 6 and 8), each plotting mpg against weight.

3. HEATMAP

Program
data_matrix <- as.matrix(mtcars[,1:5])
heatmap(data_matrix, main="Heatmap of mtcars")

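Output: a heatmap titled "Heatmap of mtcars" with the 32 car models as rows and the columns mpg,
cyl, disp, hp and drat. Colour encodes the values (scaled within each row by default), and
dendrograms show the hierarchical clustering of the rows and columns.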

4. DENSITY PLOT

Program
plot(density(mtcars$mpg), main="Density Plot of MPG")

Output: a smooth density curve of the mpg values, titled "Density Plot of MPG".

5. PAIR PLOT (Correlation Matrix Plot)

Program
pairs(iris[,1:4], main="Pair Plot of Iris Dataset")

Output: a 4 × 4 scatterplot matrix titled "Pair Plot of Iris Dataset", with one scatter plot for
every pair of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width.

6. 3D Visualization

Program (Perspective Plot)

x <- 1:10
y <- 1:10
z <- outer(x, y, function(a,b) a*b)

persp(x, y, z, main="3D Surface Plot")

Output: a 3D perspective (wireframe) surface of z = x × y over the 10 × 10 grid, titled
"3D Surface Plot".

SUMMARY TABLE (ADVANCED VISUALIZATIONS)
| Technique | Package | Purpose | Example Function |
|-----------|---------|---------|------------------|
| Scatter, Line, Bar | ggplot2 | High-quality graphs | geom_point(), geom_line() |
| Multi-panel plots | lattice | Conditioning plots | xyplot() |
| Heatmap | base R | Intensity visualization | heatmap() |
| Density Plot | base R | Distribution smooth curve | density() |
| Pair Plot | base R | Relationship among variables | pairs() |
| 3D Plot | base R | Surface visualization | persp() |

CONCLUSION

Advanced visualization in R provides powerful tools to create professional, customizable,
multi-layered graphical presentations. Libraries like ggplot2 and lattice significantly enhance visual
quality and analytical capability. These techniques help in deeper understanding of data patterns,
correlations, trends, and distributions.

⭐STATISTICAL ANALYSIS IN R (20 Marks – Detailed Answer)

1. Introduction
Statistical analysis is the process of collecting, organizing, analyzing, interpreting, and presenting
data to support decision-making.
R is one of the most powerful languages for statistics due to its built-in functions, packages,
visualization tools, and wide support in academic research.

⭐2. Types of Statistical Analysis

R supports the following major types:

(a) Descriptive Statistics

Helps summarize and describe features of a dataset.

Examples: mean, median, mode, variance, standard deviation.

(b) Inferential Statistics

Helps draw conclusions from a sample to the population.

Techniques: hypothesis testing, t-test, chi-square test, ANOVA.

(c) Predictive Statistics

Predicting future values using statistical models.

Examples: Linear regression, time series forecasting.

(d) Exploratory Data Analysis (EDA)

Finding patterns, outliers, and relationships using plots.

(e) Multivariate Statistics

Analysis of several variables at the same time.
Examples: PCA (Principal Component Analysis), Clustering.

⭐3. Descriptive Statistics in R

Example Program
data <- c(10, 20, 30, 40, 50)

mean_value <- mean(data)

median_value <- median(data)
sd_value <- sd(data)
var_value <- var(data)

mean_value
median_value
sd_value
var_value

Output
[1] 30
[1] 30
[1] 15.81139
[1] 250
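Base R has no function for the statistical mode listed above: mode() returns an object's storage mode instead. A small helper (stat_mode is a hypothetical name) counts values with table() and returns every value tied for the highest count; range() and quantile() cover two of the other common measures:

```r
# Most frequent value(s) of a vector; ties return all tied values
stat_mode <- function(x) {
  counts <- table(x)
  modes <- names(counts)[counts == max(counts)]
  type.convert(modes, as.is = TRUE)  # table names are strings; convert back
}

marks <- c(70, 85, 85, 90, 70, 85)
stat_mode(marks)  # 85
mode(marks)       # "numeric": the storage mode, not the statistical mode
range(marks)      # 70 90
quantile(marks)   # minimum, quartiles and maximum
```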

⭐4. Inferential Statistics in R

(a) One Sample t-test

Example
x <- c(12, 15, 14, 17, 20, 22, 25)
t.test(x, mu = 18)

Output (Short Explanation)

t ≈ -0.081, df = 6, p-value ≈ 0.94
95% CI: 13.54 to 22.18 (sample mean ≈ 17.86)

Interpretation: The mean is not significantly different from 18 (p > 0.05).
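The one-sample t statistic is t = (x̄ − μ₀) / (s / √n), with n − 1 degrees of freedom. A sketch that computes it by hand for the same data and checks it against t.test():

```r
x <- c(12, 15, 14, 17, 20, 22, 25)
mu0 <- 18
n <- length(x)

t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(n))  # test statistic
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)   # two-sided p-value

res <- t.test(x, mu = mu0)
c(manual = t_manual, t.test = unname(res$statistic))  # both about -0.081
c(manual = p_manual, t.test = res$p.value)            # both about 0.94
```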

(b) Chi-Square Test
Example
data <- matrix(c(20,30,50,40), nrow=2)
chisq.test(data)

Interpretation

Checks whether two categorical variables are independent.
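The test compares the observed counts with the counts expected under independence (row total × column total / grand total). A sketch on the same counts, with hypothetical row and column labels added:

```r
data <- matrix(c(20, 30, 50, 40), nrow = 2,
               dimnames = list(Group = c("G1", "G2"),        # hypothetical labels
                               Response = c("Yes", "No")))

result <- chisq.test(data)  # 2 x 2 table: Yates' continuity correction by default
result$expected             # 25 (Yes) and 45 (No) in each row
result$statistic            # X-squared = 2.52
result$p.value              # about 0.11, so independence is not rejected at 5%
```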

(c) ANOVA (Analysis of Variance)

Example
height <- c(150,152,148,160,162,158,170,172,168)
group <- factor(c("A","A","A","B","B","B","C","C","C"))

anova_result <- aov(height ~ group)

summary(anova_result)

Interpretation

Tells whether height differs significantly across groups.

⭐5. Regression Analysis in R

(a) Simple Linear Regression

Example
x <- c(1,2,3,4,5)
y <- c(2,4,5,4,5)

model <- lm(y ~ x)

summary(model)

Output (Summary)

Shows:
• Coefficients
• R-squared
• p-value
• Residuals

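Interpretation: The slope coefficient estimates how much y changes for a one-unit increase in x,
and its p-value tests whether that relationship is significant.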
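For simple linear regression the least-squares coefficients have a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. A sketch that checks this against lm() for the same data and predicts a new value (x = 6 is an illustrative input):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

slope     <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
intercept <- mean(y) - slope * mean(x)
c(intercept = intercept, slope = slope)  # 2.2 and 0.6

model <- lm(y ~ x)
coef(model)                                  # same coefficients from lm()
predict(model, newdata = data.frame(x = 6))  # 2.2 + 0.6 * 6 = 5.8
```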

(b) Multiple Linear Regression

df <- data.frame(
income = c(50,60,70,80,100),
experience = c(1,2,3,4,5),
sales = c(3,4,5,6,8)
)

model <- lm(sales ~ income + experience, data=df)

summary(model)

Shows joint effect of predictors on sales.

⭐6. Correlation Analysis

Example
x <- c(10,20,30,40,50)
y <- c(5,10,20,25,30)

cor(x, y)

Output
[1] 0.9912407

Interpretation: Very strong positive correlation.
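Pearson's r divides the sum of cross-products of deviations by the square root of the product of the sums of squared deviations. A sketch that reproduces cor(x, y) from that formula and, for comparison, computes Spearman's rank correlation, which is 1 here because both variables increase strictly together:

```r
x <- c(10, 20, 30, 40, 50)
y <- c(5, 10, 20, 25, 30)

dx <- x - mean(x)
dy <- y - mean(y)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual                        # 0.9912407, matching cor(x, y)

cor(x, y, method = "spearman")  # 1: the ranks agree perfectly
cor.test(x, y)$p.value          # p-value for H0: true correlation = 0
```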

⭐7. Exploratory Data Analysis (EDA)

Summary Statistics
summary(df)

Boxplot
boxplot(df$income, main="Income Distribution")

Histogram
hist(df$income, main="Income Histogram")

⭐8. Multivariate Statistical Analysis

(a) Principal Component Analysis (PCA)

data <- iris[,1:4]
pca_result <- prcomp(data, scale = TRUE)
summary(pca_result)

Output
Importance of components:
                          PC1    PC2     PC3     PC4
Standard deviation     1.7084 0.9560 0.38309 0.14393
Proportion of Variance 0.7296 0.2285 0.03669 0.00518
Cumulative Proportion  0.7296 0.9581 0.99482 1.00000

(b) Clustering
set.seed(123)  # makes the random starting centres reproducible
kmeans_result <- kmeans(iris[,1:4], centers = 3)
kmeans_result$cluster

⭐9. Advantages of Statistical Analysis in R

 Open-source and free

 Large collection of statistical packages
 Easy visualization with ggplot2
 Supports machine learning and data mining
 Widely used in research and academics
 Powerful for big data & analytics
⭐10. Conclusion

Statistical analysis in R helps in understanding data, drawing conclusions, predicting outcomes, and
making informed decisions.
Its rich set of statistical functions, models, and visualization tools makes R one of the most preferred
languages for academic and research-oriented statistical analysis.

⭐STATISTICAL ANALYSIS IN R – 20 MARKS (DETAILED ANSWER)

1. Introduction to Statistical Analysis

Statistical analysis refers to the methods and techniques used to collect, organize, summarize,
interpret, and draw conclusions from data. It plays a crucial role in decision-making, scientific
research, economics, business analytics, healthcare, and social sciences.
R is one of the most powerful statistical programming languages, widely used because it is free,
open-source, and contains extensive built-in statistical functions and packages.

R supports all major statistical operations such as descriptive statistics, inferential statistics,
regression modelling, correlation, hypothesis testing, and multivariate analysis.

⭐2. Types of Statistical Analysis in R

2.1 Descriptive Statistics

Descriptive statistics summarize and present data in a meaningful way.
They help us understand the central tendency and distribution of data.

Common Descriptive Measures:

 Mean
 Median
 Mode
 Range
 Variance
 Standard Deviation
 Quartiles
 Minimum and Maximum values
Example Program
data <- c(10, 20, 30, 40, 50)

mean(data)
median(data)
var(data)
sd(data)
summary(data)

Expected Output
[1] 30
[1] 30
[1] 250
[1] 15.81139
Min. 1st Qu. Median Mean 3rd Qu. Max.
10 20 30 30 40 50

2.2 Inferential Statistics

Inferential statistics help draw conclusions about a population based on a sample.
It uses probability theory to test hypotheses or estimate parameters.

Major Techniques:

 Hypothesis testing
 t-tests
 ANOVA
 Chi-square tests
 Confidence intervals
 F-tests

(a) t-test Example

x <- c(12,15,14,17,20,22,25)
t.test(x, mu = 18)

Interpretation

Compares the sample mean with the hypothesized population mean (μ = 18).

If p-value < 0.05, difference is significant.
(b) Chi-Square Test Example
tbl <- matrix(c(20, 30, 50, 40), nrow=2)
chisq.test(tbl)

Interpretation

Tests independence between two categorical variables.

(c) One-way ANOVA Example

height <- c(150,152,148,160,162,158,170,172,168)
group <- factor(c("A","A","A","B","B","B","C","C","C"))

anova_result <- aov(height ~ group)

summary(anova_result)

Interpretation

Checks if the means of three or more groups differ significantly.

⭐3. Regression Analysis in R

Regression analysis is used to study the relationship between a dependent variable and one or more
independent variables.

3.1 Simple Linear Regression

Shows relationship between two variables.

Example
x <- c(1,2,3,4,5)
y <- c(2,4,5,4,5)

model <- lm(y ~ x)

summary(model)

Interpretation
 Coefficients show how y changes with x
 R-squared shows the proportion of variation in y explained by x
 p-value tests significance

3.2 Multiple Linear Regression

Used when more than one predictor influences the output.

Example
df <- data.frame(
income = c(50,60,70,80,100),
experience = c(1,2,3,4,5),
sales = c(3,4,5,6,8)
)

model <- lm(sales ~ income + experience, data=df)

summary(model)

Interpretation

Shows influence of income and experience on sales.

⭐4. Correlation Analysis

Correlation measures strength and direction of a linear relationship between two numerical
variables.

Example
x <- c(10,20,30,40,50)
y <- c(5,10,20,25,30)

cor(x, y)

Output
[1] 0.9912407

Interpretation

Strong positive correlation (close to +1).
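Pearson's r can also be computed straight from its definition, r = cov(x, y) / (sd(x) · sd(y)), as a check on `cor()`. This sketch reuses the x and y above; both routes should agree (about 0.9912).

```r
x <- c(10, 20, 30, 40, 50)
y <- c(5, 10, 20, 25, 30)

# Pearson r from its definition: covariance scaled by both standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Same quantity from the built-in function
r_builtin <- cor(x, y)

print(r_manual)
print(r_builtin)
```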

⭐5. Exploratory Data Analysis (EDA)

EDA helps visually explore patterns, outliers, and relationships.

Common EDA tools in R:

 summary()
 str()
 head(), tail()
 hist()
 boxplot()
 plot()

Example
df <- data.frame(
marks = c(45,55,60,70,80,85,90)
)

hist(df$marks, main="Marks Distribution")

boxplot(df$marks, main="Boxplot of Marks")

⭐6. Multivariate Statistical Analysis in R

6.1 Principal Component Analysis (PCA)

Used for dimensionality reduction.

data <- iris[,1:4]

pca <- prcomp(data, scale = TRUE)
summary(pca)

6.2 Cluster Analysis

K-means Clustering Example

set.seed(123)  # makes the random starting centres reproducible
clusters <- kmeans(iris[,1:4], centers = 3)
clusters$cluster
⭐7. Advantages of Using R for Statistical Analysis

1. Free and Open Source

2. Extensive library support (like stats, ggplot2, dplyr, MASS)
3. Highly accurate and reliable statistical functions
4. Excellent visualization capabilities
5. Used widely in research, academics, and industries
6. Supports machine learning and advanced statistical modelling
7. Easy to integrate with Python, SQL, and Hadoop

⭐8. Conclusion

Statistical analysis in R is an essential part of data science, research, and data-driven decision-
making.
R provides an extensive collection of statistical methods including descriptive analysis, hypothesis
testing, regression modelling, correlation, ANOVA, PCA, and clustering. Its powerful visualization
tools and open-source availability make it one of the most preferred platforms for statistical
computing in academia and industry.

⭐BASIC STATISTICS – DETAILED ANSWER

1. Introduction
Basic statistics deals with the methods used to collect, organize, summarize, analyze, and interpret
numerical data.
It helps convert raw data into meaningful information and supports decision-making in fields like
business, education, healthcare, economics, and research.

⭐2. Types of Statistics

(A) Descriptive Statistics

These are methods used to describe and summarize the main features of a dataset.

Includes:
1. Measures of Central Tendency
o Mean
o Median
o Mode
2. Measures of Dispersion
o Range
o Variance
o Standard deviation
o Interquartile range (IQR)
3. Measures of Shape
o Skewness
o Kurtosis
4. Tabular and Graphical representation
o Tables
o Bar charts
o Histograms
o Boxplots

(B) Inferential Statistics

Used to make predictions or generalizations about a population based on a sample.

Includes:

 Hypothesis testing
 t-test
 chi-square test
 ANOVA
 Confidence intervals
 Correlation
 Regression analysis

⭐3. Basic Statistical Concepts

3.1 Population and Sample

 Population: Entire group of individuals/items.
Example: All students in a college.
 Sample: A subset of the population.
Example: 50 students selected from the college.
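In R, drawing a simple random sample from a population takes one call to `sample()`. The population here is a hypothetical list of 500 student ID numbers, chosen only to mirror the college example; `set.seed()` just makes the draw repeatable.

```r
# Hypothetical population: ID numbers of 500 students in a college
population <- 1:500

set.seed(42)  # makes the random draw reproducible
# Simple random sample of 50 students, drawn without replacement
sample_ids <- sample(population, size = 50)

length(sample_ids)
head(sample_ids)
```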
3.2 Variables
A variable is a characteristic that varies from one individual to another.

Types:

1. Quantitative Variables
o Numeric
Example: height, weight, marks
2. Qualitative Variables
o Categories
Example: gender, blood group
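The distinction matters in R because the two kinds of variable are stored differently: quantitative data as numeric vectors, qualitative data usually as factors. The small vectors below are made-up illustrations.

```r
# Quantitative variable: a numeric vector
height <- c(160, 172, 155, 181)

# Qualitative variable: stored as a factor with a fixed set of categories
blood_group <- factor(c("A", "B", "O", "A"))

class(height)        # "numeric"
class(blood_group)   # "factor"
levels(blood_group)  # "A" "B" "O"

# Numeric summaries suit quantitative data; counts suit categorical data
mean(height)
table(blood_group)
```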

⭐4. Measures of Central Tendency

4.1 Mean (Average)

Sum of values / number of values.

Example:
Data: 10, 20, 30
Mean = (10+20+30) / 3 = 20

4.2 Median
Middle value when data is arranged in order.

Example:
5, 9, 12 → Median = 9

4.3 Mode
Value that occurs most frequently.
Example:
2, 4, 4, 6 → Mode = 4

⭐5. Measures of Dispersion

5.1 Range
Difference between maximum and minimum values.

Example:
Data = 5, 8, 12, 15
Range = 15 − 5 = 10

5.2 Variance
Average of squared deviations from the mean (divide by n for a population, by n − 1 for a sample).

5.3 Standard Deviation (SD)

Square root of variance; indicates spread of data.

Example:
Data: 10, 20, 30
Mean = 20
SD ≈ 8.16 (population SD, dividing by n); the sample SD (dividing by n − 1, as R's sd() does) is 10
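The SD of 8.16 in this example uses the population formula (divide by n). R's `var()` and `sd()` use the sample formula (divide by n − 1), so they return 100 and 10 for the same data. A minimal sketch showing range and both versions:

```r
data <- c(10, 20, 30)
n <- length(data)

range_value <- diff(range(data))        # max - min = 20

# Population variance and SD: divide the sum of squared deviations by n
pop_var <- sum((data - mean(data))^2) / n
pop_sd  <- sqrt(pop_var)                # about 8.16

# Sample variance and SD: divide by n - 1 (what var() and sd() return)
samp_var <- var(data)                   # 100
samp_sd  <- sd(data)                    # 10

c(range = range_value, pop_sd = pop_sd, sample_sd = samp_sd)
```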

⭐6. Correlation

Correlation measures the strength and direction of a relationship between two variables.

 Positive correlation: Both increase

 Negative correlation: One increases, one decreases
 Zero correlation: No linear relationship
Correlation coefficient (r):

 +1 → perfect positive
 –1 → perfect negative
 0 → no correlation
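A quick illustration of the three cases in R, using made-up vectors. The last pair is a reminder that r measures only linear association: u² depends completely on u, yet r = 0.

```r
x <- 1:5

r_pos <- cor(x, c(2, 4, 6, 8, 10))   # +1: perfect positive
r_neg <- cor(x, c(10, 8, 6, 4, 2))   # -1: perfect negative

# r can be 0 even when a (non-linear) relationship exists
u <- c(-2, -1, 0, 1, 2)
r_zero <- cor(u, u^2)                # 0: no linear relationship

c(positive = r_pos, negative = r_neg, zero = r_zero)
```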

⭐7. Basic Probability Concepts

7.1 Experiment

Any activity that results in outcomes.

Example: tossing a coin.

7.2 Event

A set of outcomes.
Example: getting a head.

7.3 Probability Formula

$$P(E) = \frac{\text{Number of favourable outcomes}}{\text{Total number of outcomes}}$$

Example: Probability of rolling a “3” on a fair die = 1/6.
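The formula can be applied in R by counting favourable outcomes in the sample space. As a sanity check, the sketch below also simulates rolls (the seed and the 10,000 rolls are arbitrary choices) and shows that the relative frequency of 3s lands near 1/6.

```r
# Sample space for one roll of a fair die
die <- 1:6

# Classical probability: favourable outcomes / total outcomes
p_three <- sum(die == 3) / length(die)   # 1/6

# Empirical check: relative frequency of 3s in many simulated rolls
set.seed(1)
rolls <- sample(die, size = 10000, replace = TRUE)
p_est <- mean(rolls == 3)                # close to 1/6 (about 0.167)

c(classical = p_three, simulated = p_est)
```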

⭐8. Basic Statistical Graphs

(A) Bar Chart

Shows frequency of categorical data.

(B) Histogram
Shows frequency distribution of numerical data.

(C) Pie Chart

Shows proportion of categories.

(D) Boxplot
Shows median, quartiles, and outliers.
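Each of these graphs has a base R function: `barplot()`, `hist()`, `pie()` and `boxplot()`. The subject and marks data below are hypothetical, used only to produce one example of each chart.

```r
# Hypothetical categorical data: favourite subject of 10 students
subject <- c("Maths", "Science", "Maths", "English", "Science",
             "Maths", "English", "Maths", "Science", "Maths")
counts <- table(subject)   # frequency of each category

barplot(counts, main = "Bar Chart of Favourite Subject")  # frequencies
pie(counts, main = "Pie Chart of Favourite Subject")      # proportions

# Hypothetical numeric data for the histogram and boxplot
marks <- c(45, 50, 55, 60, 62, 70, 80, 90)
hist(marks, main = "Histogram of Marks", xlab = "Marks")  # distribution
boxplot(marks, main = "Boxplot of Marks")                 # median, quartiles, outliers
```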

⭐9. Basic Statistics in R

Example Data
data <- c(10, 20, 30, 40, 50)

Mean
mean(data)

Median
median(data)

Standard Deviation
sd(data)

Summary
summary(data)

Output
Min. 1st Qu. Median Mean 3rd Qu. Max.
10 20 30 30 40 50

⭐10. Applications of Basic Statistics

 Business decision making

 Quality control
 Medical research
 Government planning
 Economics and finance
 Psychology and sociology
 Education performance analysis

⭐11. Conclusion

Basic statistics forms the foundation for advanced statistical analysis and data science.
It enables understanding of data behavior, identifies patterns, measures relationships, and supports
decision-making.
With tools like R, statistical analysis becomes faster, more accurate, and easier to visualize.

⭐1. Mean, Median, Mode

Program
data <- c(10, 20, 30, 40, 50, 50)

mean_value <- mean(data)

median_value <- median(data)

# Mode function (R has no built-in mode)

mode_value <- names(sort(table(data), decreasing = TRUE))[1]

print(mean_value)
print(median_value)
print(mode_value)

Output
[1] 33.33333
[1] 35
[1] "50"

⭐2. Variance and Standard Deviation

Program
data <- c(5, 10, 15, 20, 25)

variance <- var(data)

sd_value <- sd(data)

print(variance)
print(sd_value)
Output
[1] 62.5
[1] 7.905694

⭐3. Summary Statistics

Program
values <- c(12, 18, 25, 30, 45, 50)
summary(values)

Output
Min. 1st Qu. Median Mean 3rd Qu. Max.
12 18 27.5 30 45 50

⭐4. Correlation

Program
x <- c(1,2,3,4,5)
y <- c(2,4,5,4,5)

correlation_value <- cor(x, y)

correlation_value

Output
[1] 0.8944272

(Strong positive correlation)

⭐5. Covariance

Program
x <- c(10,20,30,40)
y <- c(15,25,35,45)

cov(x, y)

Output
[1] 166.6667

⭐6. Frequency Table

Program
data <- c(1,2,2,3,3,3,4,4,5)
table(data)

Output
data
1 2 3 4 5
1 2 3 2 1

⭐7. Histogram (Basic Visualization)

Program
marks <- c(45, 50, 55, 60, 70, 80, 90)
hist(marks, main="Marks Distribution", xlab="Marks")

Output (Description):
A histogram showing frequency bars of marks distribution.

⭐8. Boxplot

Program
boxplot(marks, main="Boxplot of Marks")
Output (Description):
 Median line
 Box showing Q1–Q3
 Possible outliers

⭐9. Basic Probability

Program
Empirical probability (relative frequency) of heads in 10 coin tosses:

toss <- c("H","T","H","T","H","H","T","H","T","H")

table(toss)
prop.table(table(toss))

Output
toss
H T
6 4

prop.table(table(toss))
H T
0.6 0.4

⭐10. Basic Scatter Plot

Program
x <- c(1,2,3,4,5)
y <- c(3,4,5,6,8)

plot(x, y, main="Scatter Plot", xlab="X", ylab="Y", pch=19)

Output (Description):
Scatter plot showing relationship between x and y.
Parametric and Non-Parametric Techniques (Detailed 20-Marks Answer)

Statistical analysis broadly uses two types of techniques to draw conclusions from data: parametric
and non-parametric techniques. Both approaches differ mainly in their assumptions about the
population, type of data, and method of analysis. Understanding these two categories is essential for
selecting the appropriate statistical test in research.

1. Parametric Techniques

Definition
Parametric techniques are statistical methods that assume the data comes from a population that
follows a specific probability distribution, usually a normal distribution.
They typically also assume homogeneity of variance (equal variances across groups) and interval- or ratio-scale data.

Key Characteristics
1. Assume normal distribution
Data must be approximately bell-shaped.
2. Based on fixed parameters
Mean (μ) and standard deviation (σ) describe the population.
3. Used for quantitative (numeric) data
Interval and ratio scale measurements.
4. More powerful tests
Because of strong assumptions, results are more reliable if assumptions are met.
5. Often need larger samples
A common rule of thumb is n > 30; with smaller samples the normality assumption matters more.

Common Parametric Techniques

1. t-Test

Used to compare means.

 One-sample t-test: sample mean vs. known value

 Independent t-test: two separate groups
 Paired t-test: before-after measurement on same subjects
2. ANOVA (Analysis of Variance)

Compares means of more than two groups.

3. Z-Test

Used to compare a mean with a hypothesized value when the population variance is known (or, approximately, when the sample is large).

4. Pearson Correlation

Measures linear relationship between two continuous variables.

5. Linear Regression

Predicts value of one variable based on another.
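Base R's stats package has no built-in z-test function, so a one-sample z-test is easiest to compute by hand from z = (x̄ − μ₀) / (σ / √n). In this sketch the sample is simulated and the population SD (σ = 10) is an assumed known value, not taken from the text.

```r
# Hypothetical data: 40 simulated package weights (g)
set.seed(7)
x     <- rnorm(40, mean = 502, sd = 10)
mu0   <- 500   # hypothesised population mean
sigma <- 10    # population SD, assumed known

n <- length(x)
z <- (mean(x) - mu0) / (sigma / sqrt(n))   # z statistic
p_value <- 2 * pnorm(-abs(z))              # two-sided p-value from the standard normal

c(z = z, p_value = p_value)
```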

Example of Parametric Technique

Independent t-test Example

Comparing average marks of boys and girls in a class.

Assumptions:

 marks are normally distributed

 variances are equal

If p-value < 0.05 → significant difference.
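A minimal sketch of the boys-vs-girls comparison in R, checking both assumptions first. The marks are invented for illustration; `shapiro.test()` checks normality, `var.test()` checks equal variances, and `var.equal = TRUE` requests the classical Student t-test (R's `t.test()` defaults to Welch's version, which drops the equal-variance assumption).

```r
# Hypothetical marks (illustrative data, not from the text)
boys  <- c(62, 70, 68, 75, 71, 66, 73, 69)
girls <- c(74, 78, 72, 80, 77, 75, 79, 73)

# Normality in each group (Shapiro-Wilk; a small p suggests non-normality)
shapiro.test(boys)$p.value
shapiro.test(girls)$p.value

# Equality of variances (F-test)
var.test(boys, girls)$p.value

# Student's independent t-test assuming equal variances
res <- t.test(boys, girls, var.equal = TRUE)
res$statistic
res$p.value
```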

2. Non-Parametric Techniques

Definition
Non-parametric techniques are statistical methods that do NOT assume any particular probability
distribution of the population.
They are useful when the data are not normally distributed, when the sample size is small, or when the data
are ordinal or nominal.
Key Characteristics
1. No distribution assumptions
Works with skewed or non-normal data.
2. Used for ordinal, nominal, or ranked data
Example: satisfaction rating, ranks.
3. Useful for small sample sizes
Can be used even when n < 30.
4. Less powerful compared to parametric tests but more flexible.
5. Useful for analysis of median or ranks rather than mean.

Common Non-Parametric Techniques

1. Chi-Square Test

Used for categorical data (nominal).

Examples:

 Gender vs. Preference

 Observed vs. Expected frequency

2. Mann–Whitney U Test

Equivalent of independent t-test for ranked/ordinal data.

3. Wilcoxon Signed-Rank Test

Equivalent of paired t-test for non-normal data.

4. Kruskal–Wallis Test

Equivalent of ANOVA for ordinal/ranked data.

5. Spearman Rank Correlation

Measures association between two ranked variables.

6. Friedman Test

Non-parametric equivalent of repeated-measures ANOVA.
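The Friedman test can be run with base R's `friedman.test()`, which takes a matrix whose rows are blocks (here, judges) and columns are treatments (here, recipes). The ratings below are hypothetical; with these numbers the rank sums are 11, 5 and 14, giving a Friedman statistic of 8.4 on 2 df.

```r
# Hypothetical ratings: 5 judges (blocks) each rate 3 recipes (treatments)
ratings <- matrix(c(7, 5, 8,
                    6, 4, 9,
                    8, 6, 7,
                    7, 5, 9,
                    6, 5, 8),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(paste0("Judge", 1:5), c("R1", "R2", "R3")))

# Values are ranked within each row, then rank sums are compared across columns
res <- friedman.test(ratings)
res$statistic   # Friedman chi-squared
res$p.value
```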

Example of Non-Parametric Technique
Chi-Square Test Example

Testing whether gender (Male/Female) is associated with preferred drink (Tea/Coffee).

If p < 0.05 → variables are associated.

3. Differences Between Parametric and Non-Parametric Techniques

Basis | Parametric Techniques | Non-Parametric Techniques
--- | --- | ---
Assumptions | Require strong assumptions (normality, equal variance) | No strict assumptions
Type of Data | Interval or ratio | Nominal, ordinal, ranked
Distribution | Follows known distribution | Distribution-free
Sample Size | Large sample required | Works with small samples
Accuracy / Power | More powerful | Less powerful
Example Tests | t-test, ANOVA, Z-test, Pearson correlation | Chi-square, Mann-Whitney, Kruskal-Wallis, Spearman correlation
Use of Mean/Median | Mean is used | Median/ranks are used

4. When to Use Parametric vs. Non-Parametric?

Use Parametric When:

 Data is numeric
 Sample size is large
 Data is normally distributed
 Variances are equal

Use Non-Parametric When:

 Data is categorical (nominal/ordinal)
 Sample size is small
 Data is skewed or has outliers
 Normality is not satisfied
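The checklist above can be turned into a rough decision rule in code. `compare_two_groups()` is a hypothetical helper written for this illustration, not part of R: it runs a Shapiro-Wilk normality check on each group and falls back to the Wilcoxon rank sum (Mann-Whitney) test if either check fails. Normality tests have little power in small samples, so plots and knowledge of the data should inform the choice too.

```r
# Hypothetical helper: pick a parametric or non-parametric two-group test
compare_two_groups <- function(a, b, alpha = 0.05) {
  normal_a <- shapiro.test(a)$p.value > alpha
  normal_b <- shapiro.test(b)$p.value > alpha
  if (normal_a && normal_b) {
    t.test(a, b)        # parametric: Welch two-sample t-test
  } else {
    wilcox.test(a, b)   # non-parametric: Wilcoxon rank sum (Mann-Whitney)
  }
}

A <- c(78, 82, 85, 90, 88, 80)
B <- c(70, 75, 72, 68, 74, 76)

result <- compare_two_groups(A, B)
result$method    # which test was chosen
result$p.value
```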

Conclusion

Both parametric and non-parametric techniques play a crucial role in statistical analysis.
Parametric tests offer more power and precision when assumptions are met, while non-parametric
tests provide flexibility and robustness when assumptions are violated.
A researcher must choose the appropriate method depending on data type, distribution, and sample
size.

Parametric and Non-Parametric Techniques (Detailed with Examples)

Statistical techniques used in data analysis can be broadly classified into parametric and non-
parametric methods. These two groups differ mainly in the assumptions they make about the
population and the type of data they handle.

1. Parametric Techniques

1.1 Definition
Parametric techniques are statistical methods that assume that the underlying population follows a
known distribution, usually a normal distribution.
They use numerical parameters like mean (µ) and standard deviation (σ) to describe the
population.

1.2 Assumptions of Parametric Tests

1. Data follows normal distribution (bell-shaped curve).
2. Homogeneity of variance – equal variance among groups.
3. Data is quantitative (interval or ratio scale).
4. Random sampling.
5. Sample size should be moderately large (n > 30 recommended).

1.3 Common Parametric Techniques

1. t-test – Compare means of one or two groups
2. Z-test – Large sample mean comparison
3. ANOVA – Compare means of 3 or more groups
4. Pearson correlation – Linear relationship
5. Regression analysis – Prediction based on linear relationships

1.4 Example 1: One-Sample t-Test (Solved)

A teacher claims that the average marks of students in statistics is 70.
A sample of 8 students gives marks:

60, 72, 75, 65, 68, 80, 78, 70

Test whether the claim is true.

Step 1: Calculate sample mean

$$\bar{x} = \frac{60+72+75+65+68+80+78+70}{8} = \frac{568}{8} = 71$$

Step 2: Sample standard deviation

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{314}{7}} \approx 6.70$$

Step 3: Hypothesis

 H₀: µ = 70
 H₁: µ ≠ 70

Step 4: t-statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{71 - 70}{6.70/\sqrt{8}} \approx 0.42$$

Step 5: Decision
At 5% level and df = 7,
critical t = 2.365

Since 0.42 < 2.365,

Fail to reject H₀ → The data give no evidence against the claim that the mean is 70.
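The hand calculation can be checked with `t.test()`; `qt()` gives the two-sided 5% critical value used in Step 5.

```r
marks <- c(60, 72, 75, 65, 68, 80, 78, 70)

mean(marks)   # 71
sd(marks)     # about 6.70

res <- t.test(marks, mu = 70)
res$statistic          # t, about 0.42
res$parameter          # df = 7
qt(0.975, df = 7)      # critical t, about 2.365
```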

Example 2: ANOVA (3 Groups)

Marks of three teaching methods:

Method A: 60, 65, 70
Method B: 55, 58, 54
Method C: 72, 75, 78

ANOVA compares the means of the three groups.

Hypothesis:

 H₀: All means are equal

 H₁: At least one mean differs

Using the ANOVA steps → F ≈ 21.95 with df = (2, 6); critical value F(0.05; 2, 6) ≈ 5.14

Since 21.95 > 5.14, we reject H₀.

Conclusion: Teaching methods differ significantly.
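The ANOVA steps can be written out explicitly. This sketch computes the between- and within-group sums of squares, the F ratio, and the critical value from `qf()` for the same three teaching methods; the F value (about 21.95) is the same one `aov()` reports for these data.

```r
groups <- list(A = c(60, 65, 70), B = c(55, 58, 54), C = c(72, 75, 78))
all_marks <- unlist(groups)

grand_mean <- mean(all_marks)
k <- length(groups)      # number of groups
N <- length(all_marks)   # total number of observations

# Between-group SS: group size times squared distance of group mean from grand mean
ss_between <- sum(sapply(groups, function(g) length(g) * (mean(g) - grand_mean)^2))
# Within-group SS: squared deviations of each value from its own group mean
ss_within  <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
F_value    <- ms_between / ms_within                   # about 21.95

F_crit  <- qf(0.95, df1 = k - 1, df2 = N - k)          # about 5.14
p_value <- pf(F_value, k - 1, N - k, lower.tail = FALSE)

c(F = F_value, critical = F_crit, p = p_value)
```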

2. Non-Parametric Techniques

2.1 Definition
Non-parametric techniques are statistical methods that do NOT assume any specific distribution
of the population.
They are useful for ordinal, nominal, or non-normal quantitative data.

2.2 Characteristics
1. Distribution-free – does not require normality
2. Works with small samples
3. Suitable for ranked, ordinal, nominal data
4. Less powerful but more flexible
5. Based on medians, frequencies, or ranks

2.3 Common Non-Parametric Techniques

1. Chi-square test (categorical data)
2. Mann–Whitney U test – two independent groups
3. Wilcoxon Signed Rank test – paired samples
4. Kruskal–Wallis Test – >2 independent groups
5. Spearman correlation – ranked data

Example 1: Chi-Square Test (Solved)

A survey asks 50 people their preferred drink:

Gender / Drink | Tea | Coffee
--- | --- | ---
Male | 12 | 18
Female | 10 | 10

Test if gender and drink preference are related.

Step 1: Expected values

Total Tea = 22, Coffee = 28

Total Male = 30, Female = 20

Expected Male-Tea = (30×22)/50 = 13.2

Expected Male-Coffee = (30×28)/50 = 16.8
Expected Female-Tea = (20×22)/50 = 8.8
Expected Female-Coffee = (20×28)/50 = 11.2

Step 2: Chi-square formula

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Compute:

$$\chi^2 = 0.1091 + 0.0857 + 0.1636 + 0.1286 \approx 0.487$$

Step 3: Decision

df = 1, critical value = 3.84

Since 0.487 < 3.84,

Fail to reject H₀ → No significant association.
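The same steps in R: `outer()` builds the expected counts from the row and column totals, and `qchisq()` gives the critical value. `chisq.test()` is called with `correct = FALSE` because by default it applies Yates' continuity correction to 2×2 tables, which would give a smaller statistic than the plain Pearson formula.

```r
observed <- matrix(c(12, 18,
                     10, 10),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Male", "Female"), c("Tea", "Coffee")))

# Expected count = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

chi_sq <- sum((observed - expected)^2 / expected)  # about 0.487
dof    <- (nrow(observed) - 1) * (ncol(observed) - 1)
crit   <- qchisq(0.95, dof)                        # about 3.84

# Same statistic from chisq.test() without Yates' correction
chisq.test(observed, correct = FALSE)$statistic
```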

Example 2: Mann-Whitney U Test

Two teaching methods, each tested on a group of 6 students:

Method A: 78, 82, 85, 90, 88, 80

Method B: 70, 75, 72, 68, 74, 76

Non-parametric equivalent of independent t-test.

Step 1: Combine & Rank Scores

Score: 68 70 72 74 75 76 78 80 82 85 88 90
Rank:   1  2  3  4  5  6  7  8  9 10 11 12

Sum of ranks:
A = 7+8+9+10+11+12 = 57
B = 1+2+3+4+5+6 = 21

Step 2: Mann–Whitney U

$$U_A = n_A n_B + \frac{n_A(n_A+1)}{2} - R_A$$

$$U_A = 6 \times 6 + \frac{6 \times 7}{2} - 57 = 36 + 21 - 57 = 0$$

Step 3: Critical U

For n1 = n2 = 6 (two-tailed, α = 0.05) → critical U = 5

Since U = 0 ≤ 5,
Reject H₀ → Method A is significantly better.
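The ranking and the U calculation can be reproduced with `rank()`. Note that `wilcox.test()` reports W = R_A − n_A(n_A + 1)/2, which here equals the complementary U (36) rather than U_A = 0; both describe the same complete separation of the groups.

```r
A <- c(78, 82, 85, 90, 88, 80)
B <- c(70, 75, 72, 68, 74, 76)
nA <- length(A); nB <- length(B)

# Rank all 12 scores together (tied values, if any, would get average ranks)
ranks <- rank(c(A, B))
R_A <- sum(ranks[seq_len(nA)])    # 57
R_B <- sum(ranks[-seq_len(nA)])   # 21

U_A <- nA * nB + nA * (nA + 1) / 2 - R_A   # 0
U_B <- nA * nB - U_A                       # 36
U   <- min(U_A, U_B)                       # compare with the critical U (5)

# wilcox.test() reports W = R_A - nA * (nA + 1) / 2, i.e. U_B here
wilcox.test(A, B)$statistic                # W = 36
```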

3. Comparison Table of Parametric vs. Non-Parametric

Feature | Parametric | Non-Parametric
--- | --- | ---
Distribution | Normal | No specific distribution
Data Type | Continuous (Interval/Ratio) | Nominal, Ordinal, Ranks
Sample Size | Requires large n | Works with small n
Measure | Mean-based | Median/Rank-based
Tests | t-test, ANOVA, Pearson | Chi-square, Mann-Whitney, Spearman
Power | More powerful | Less powerful

4. When to Use Which?

Choose Parametric When

 Data is numerical
 Data is normal
 Variances are equal
 Sample size is large

Choose Non-Parametric When

 Data is ordinal or nominal

 Data is skewed
 Outliers are present
 Small sample size

Conclusion

Parametric and non-parametric techniques are essential tools in statistical analysis.

Parametric tests provide more accurate results when assumptions are met, while non-parametric
tests offer flexibility when dealing with non-normal, qualitative, or small-sample data.
Researchers must choose techniques based on data type, distribution, and sample size.

PARAMETRIC TECHNIQUES – R PROGRAMS WITH OUTPUT

1. t-Test in R (Parametric Test)

Example: One-sample t-test

Test if the average weight is 70 kg.

R Program
weights <- c(68, 72, 75, 70, 69, 71, 73, 74)

t_test_result <- t.test(weights, mu = 70)

print(t_test_result)
OUTPUT
One Sample t-test

data: weights
t = 1.7321, df = 7, p-value = 0.1269
alternative hypothesis: true mean is not equal to 70
95 percent confidence interval:
69.45218 73.54782
sample estimates:
mean of x
71.5

Conclusion

p-value ≈ 0.127 > 0.05 ⇒ Fail to reject H₀

The mean weight is NOT significantly different from 70.

2. Two-sample t-test (Independent t-test)

R Program
groupA <- c(85, 88, 90, 82, 87)
groupB <- c(78, 80, 76, 75, 79)

result <- t.test(groupA, groupB)

print(result)

OUTPUT (values rounded)
Welch Two Sample t-test

t ≈ 5.34, df ≈ 7.05, p-value ≈ 0.001

alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
about 4.91 to 12.69
sample estimates:
mean of x = 86.4
mean of y = 77.6

Conclusion

p-value < 0.05 ⇒ Reject H₀

Group A performs significantly better than Group B.
3. ANOVA (PARAMETRIC)

R Program
MethodA <- c(60, 65, 70)
MethodB <- c(55, 58, 54)
MethodC <- c(72, 75, 78)

marks <- c(MethodA, MethodB, MethodC)

group <- factor(rep(c("A","B","C"), each = 3))

model <- aov(marks ~ group)

summary(model)

OUTPUT (values rounded)
Df Sum Sq Mean Sq F value Pr(>F)
group 2 560.9 280.4 21.95 0.0017 **
Residuals 6 76.7 12.8

Conclusion

p-value ≈ 0.0017 < 0.05 ⇒ Teaching methods differ significantly.

NON-PARAMETRIC TECHNIQUES – R PROGRAMS WITH OUTPUT

1. Chi-Square Test (Categorical Data)

Example: Gender vs. Drink Preference

R Program
data <- matrix(c(12, 18,
10, 10),
nrow = 2, byrow = TRUE)

colnames(data) <- c("Tea", "Coffee")

rownames(data) <- c("Male", "Female")

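# correct = FALSE turns off Yates' continuity correction (applied to 2x2 tables by default),
# so the statistic is the plain Pearson chi-square, sum((O - E)^2 / E)
chisq.test(data, correct = FALSE)

OUTPUT
Pearson's Chi-squared test

data: data
X-squared = 0.48701, df = 1, p-value = 0.4853

Conclusion

p-value ≈ 0.49 > 0.05 ⇒ No relationship between gender & drink choice.
(With the default Yates correction, X-squared ≈ 0.166 and p ≈ 0.68; the conclusion is the same.)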

2. Mann–Whitney U Test (Wilcoxon Rank Sum Test)

Equivalent of independent t-test for non-normal data.

R Program
A <- c(78, 82, 85, 90, 88, 80)
B <- c(70, 75, 72, 68, 74, 76)

result <- wilcox.test(A, B)

print(result)

OUTPUT
Wilcoxon rank sum exact test

W = 36, p-value = 0.0022

alternative hypothesis: true location shift is not equal to 0

Conclusion

p-value = 0.0022 < 0.05 ⇒

Method A performs significantly better than Method B.

3. Wilcoxon Signed Rank Test (Paired Samples)

Equivalent of paired t-test.

Example: Before vs After Treatment

R Program
before <- c(140, 150, 138, 145, 155)
after <- c(135, 148, 132, 140, 150)

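wilcox.test(before, after, paired = TRUE)

OUTPUT
Wilcoxon signed rank test with continuity correction

V = 15, p-value = 0.05447

alternative hypothesis: true location shift is not equal to 0

(R also warns that it cannot compute an exact p-value, because three of the differences are tied at 5.)

Conclusion

p-value ≈ 0.054 > 0.05 ⇒ Fail to reject H₀ at the 5% level.

Although every value decreased after treatment, with only 5 pairs the two-sided test is not significant; a one-sided test (alternative = "greater") gives p ≈ 0.027.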

4. Kruskal-Wallis Test (Non-parametric ANOVA)

R Program
g1 <- c(12, 14, 15)
g2 <- c(18, 20, 19)
g3 <- c(25, 23, 22)

values <- c(g1, g2, g3)

groups <- factor(rep(c("G1","G2","G3"), each = 3))

kruskal.test(values ~ groups)

OUTPUT
Kruskal-Wallis rank sum test

data: values by groups
Kruskal-Wallis chi-squared = 7.2, df = 2, p-value = 0.02732

Conclusion

p-value ≈ 0.027 < 0.05 ⇒

At least one group differs from the others.
5. Spearman Rank Correlation (Non-parametric correlation)

R Program
x <- c(10, 20, 30, 40, 50)
y <- c(15, 22, 32, 45, 48)

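cor.test(x, y, method = "spearman")

OUTPUT
Spearman's rank correlation rho

data: x and y
S = 0, p-value = 0.01667
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
1

Conclusion

Perfect positive rank (monotonic) correlation: y increases every time x increases.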
