
R Programming

The document provides an introduction to R programming, highlighting its key features, basic concepts, and applications in data analysis and machine learning. It includes installation instructions for R and RStudio, as well as guidance on writing and running R code in Visual Studio Code. Additionally, it covers variables, data types, operators, and decision-making statements in R, along with practical examples and practice questions.

Uploaded by

rajurizzgod
Copyright
© All Rights Reserved

R Programming

Chapter 1

Introduction to R Programming
R is a powerful programming language and environment widely used for statistical computing, data analysis, and
graphical representation. Developed by Ross Ihaka and Robert Gentleman in the early 1990s, R has become a
standard tool in many fields such as data science, research, and academia.

Key Features of R:

1. Open Source: R is free and open-source, which means anyone can download and modify it.

2. Statistical Techniques: It provides a variety of statistical techniques such as linear and nonlinear
modeling, time-series analysis, classification, and clustering.

3. Data Visualization: R excels in creating detailed and high-quality graphs and plots.

4. Extensibility: Users can extend the functionality of R by installing packages, which are user-contributed
libraries for specialized tasks.

5. Large Community: R has a large and active community, so there are plenty of tutorials, forums, and
resources available for learning.

Basic Concepts in R Programming:

• Variables and Data Types: In R, you can store values in variables and use basic data types like numeric,
integer, character (text), and logical (TRUE/FALSE).

• Vectors: A core data structure in R that holds an ordered set of elements of the same type (e.g., numbers
or text).

• Data Frames: A table-like structure where columns can be of different types, making it ideal for datasets.

• Functions: R has built-in functions for tasks like calculating the mean, sum, or generating plots. You can
also create custom functions.
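
The concepts above can be tried out in a few lines. This is only a minimal sketch: the names scores, students, and above_average are made up for illustration and are not part of R itself.

```r
# A numeric vector: every element has the same type.
scores <- c(72, 85, 91)

# A data frame: columns may have different types.
students <- data.frame(
  name = c("John", "Asha", "Ravi"),
  score = scores
)

# A custom function built on the built-in mean().
above_average <- function(values) {
  values[values > mean(values)]
}

print(mean(scores))
print(students)
print(above_average(scores))
```

Here the vector feeds both the data frame column and the custom function, which is a common pattern in small analyses.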

Applications of R:

• Data Analysis: R is used in industries like finance, healthcare, and academia for analyzing large datasets.

• Machine Learning: R provides libraries for implementing machine learning algorithms like regression,
decision trees, and clustering.

• Research and Academia: It's commonly used for statistical analysis in scientific research, allowing
researchers to analyze and visualize data efficiently.

Here’s how you can install R and run it, either in RStudio or in Visual Studio Code (VSCode):
Installation

Step 1: Install R

1. Go to the official R website: [Link]

2. Click on your operating system:

o Windows → Download R for Windows → base → Download [Link]

o macOS → Download R for (Mac) OS X

o Linux → Choose your distribution and follow the listed commands.

3. Run the downloaded installer and follow the setup wizard (default settings are fine).

Step 2: Install RStudio (Optional but Recommended)

RStudio provides a user-friendly interface for R.

1. Go to [Link]

2. Download the free version of RStudio Desktop.

3. Install it — it will automatically detect your R installation.

Step 3: Open R or RStudio

• Option 1: Open R GUI (installed with R)

• Option 2: Open RStudio — easier and recommended for beginners.

Step 4: Write and Run R Code

You can type commands directly in the Console or write them in a Script file.

Example 1 – Simple Print

print("Hello, R World!")

Example 2 – Basic Calculation


x <- 10

y <- 5

sum <- x + y

print(sum)

Example 3 – Create and Display a Data Frame

data <- data.frame(Name = c("John", "Asha", "Ravi"),
                   Age = c(25, 28, 30))

print(data)

Step 5: Run a Script File

In RStudio:

1. Create a new file → File → New File → R Script

2. Write your R code.

3. Save it as program.R

4. Run it using:

o Ctrl + Enter → runs the current line or selection

o Ctrl + Shift + S → runs the whole script

In R Console:

source("C:/path/to/your/program.R")

Step 6: Install Packages (if needed)

R uses packages to add extra functionality.

Example – install and load ggplot2:

install.packages("ggplot2")
library(ggplot2)

You’re Ready!

Now you can write, save, and run R programs easily.


Option 2

Step 1: Install R

1. Download and Install R:

o Go to CRAN R Project.

o Download and install the appropriate version of R for your operating system (Windows, macOS,
or Linux).

Step 2: Install VSCode

1. Download VSCode:

o Visit Visual Studio Code.

o Download the appropriate version of VSCode for your operating system.

o Install and launch VSCode.

Step 3: Install R Extension for VSCode

1. Install R Extension:

o Open VSCode.

o Go to the Extensions view by clicking the Extensions icon on the sidebar or pressing Ctrl + Shift +
X.

o Search for "R" and install the R Extension for Visual Studio Code.

Step 4: Configure R Path in VSCode

1. Configure the Path:

o Open VSCode’s command palette (press Ctrl + Shift + P).

o Type "Preferences: Open Settings (JSON)" and open it.

o Add the following configuration to the settings file to let VSCode know where R is installed:

"r.rterm.windows": "C:/Program Files/R/R-4.1.0/bin/R.exe",
"r.rterm.option": [
  "--no-save",
  "--no-restore"
],

o Replace "C:/Program Files/R/R-4.1.0/bin/R.exe" with the actual path where R is installed on your system.

Step 5: Run R in VSCode

1. Run R:

o Open the integrated terminal in VSCode (Ctrl + `).

o Type R and press Enter to start the R terminal.

o You can now run R code directly in the terminal.

2. Run R Scripts:

o Open or create a new .R file in VSCode.

o Run the code by right-clicking the file and selecting Run Source File.

Additional Tips:

• R Language Server: You can also install the R language server to enable IntelliSense, auto-completion,
and R-specific features in VSCode.

• Install languageserver package in R:

o Run this command in R:

install.packages("languageserver")

After following these steps, you will be able to write and execute R code in Visual Studio Code, enhancing your
workflow with a powerful editor.
Chapter 2

Variables & Data Types

In programming, data types refer to the classification of data, determining the type of values that variables can
store and how the program can manipulate these values. Here's a simple breakdown:

1. Primitive Data Types:

• Integer: Represents whole numbers without decimals, like 5, -23.

• Float: Represents numbers with decimal points, like 3.14, -0.01.

• Boolean: Holds only two possible values, True or False.

• Character: Represents a single character, like 'a', '$'.

• String: A sequence of characters, like "Hello" or "123".

2. Non-Primitive Data Types:

• Array: A collection of elements, all of the same type (like [1, 2, 3]).

• List (Python): A flexible collection that can store elements of different types, e.g., [5, "apple", True].

• Tuple (Python): Similar to a list but immutable (can’t be modified), e.g., (1, 2, 3).

• Dictionary (Python): A collection of key-value pairs, like {"name": "Alice", "age": 25}.

Variables
Variables in R Programming: Complete Concept

1. What is a Variable?

• In R, a variable is used to store data values. You can think of a variable as a label or container that holds
information, such as numbers, text, or more complex data structures like vectors or data frames.

• Example:

x <- 10

y <- "Hello"

2. Assigning Values to Variables:

• Variables are assigned using the assignment operator <- or =.


a <- 5 # Assigning the value 5 to variable a

b = 7 # Assigning the value 7 to variable b

3. Types of Variables:

• Numerical: Stores numbers (both integer and float types).

num <- 25

• Character (String): Stores text data.

name <- "R Programming"

• Logical (Boolean): Stores TRUE or FALSE values.

is_valid <- TRUE

• Factor: Used for categorical data (e.g., categories like "Male" or "Female").

gender <- factor(c("Male", "Female"))

4. Naming Rules for Variables:

• Start with a letter, or with a dot that is not followed by a digit.

• No spaces or special characters; use underscores or periods to separate words.

• Case-sensitive (e.g., Var and var are different).

5. Dynamic Typing in R:

• R allows dynamic typing, meaning the type of data stored in a variable can change during program execution.

x <- 10 # Numeric (write 10L for an integer)

x <- "Text" # Now a character string

6. Removing Variables:

• You can remove variables using the rm() function.

rm(x) # Removes the variable x

7. Data Types and Variables: Variables can hold different data types. These include:

• Numeric: For numbers.

• Character: For text.


• Logical: TRUE or FALSE.

• Complex: For complex numbers.

• Factors: Used for categorical variables.

8. Scope of Variables:

• Variables in R have a scope, meaning where they are accessible. There are global variables (available throughout the program) and local variables (defined within a function).

global_var <- 10 # Global Variable

my_function <- function() {
  local_var <- 5 # Local Variable
  return(local_var)
}

9. Reassigning Variables: You can change the value stored in a variable at any time:

x <- 3

x <- 8 # Now x holds the value 8

10. Checking the Type of a Variable: Use class() to check the data type of a variable:

x <- 5

class(x) # Output: "numeric"
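
To see how class() distinguishes the basic types, the short sketch below may help. The variable names are arbitrary, and the L suffix for integers is standard R syntax rather than something introduced earlier in this chapter.

```r
x <- 5        # a plain number is stored as "numeric" (double)
y <- 5L       # the L suffix requests an integer
z <- "5"      # quotes make it character data

print(class(x))  # "numeric"
print(class(y))  # "integer"
print(class(z))  # "character"

# Converting the character value back to a number.
z_num <- as.numeric(z)
print(class(z_num))       # "numeric"
print(is.logical(TRUE))   # TRUE
```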

Comments in R
1. What are Comments?

• Comments are lines in the code that are not executed by the R interpreter. They are used to explain the
code, make it more readable, and provide context for future reference.

2. How to Write Comments in R?

• To create a comment, use the # symbol. Any text after this symbol on the same line is considered a
comment and will not affect your program.

# This is a comment

x <- 5 # Assigning value 5 to variable x


3. Multi-Line Comments

• R does not have a special syntax for multi-line comments, but you can simply use # at the beginning of
each line.

# This is a multi-line comment

# Each line starts with a #

y <- 10 # Assign value 10 to y

4. Why Use Comments?

• Clarification: Comments make code easier to understand.

• Debugging: You can comment out certain parts of the code temporarily to troubleshoot.

• Collaboration: Helpful when working in teams to make sure others understand your thought process.

Comments are essential for making your R code readable and maintainable!
Practice Questions

Data Types

1. What are the four primary data types in R? Provide an example for each.

2. How can you check the data type of a variable in R? Give an example.

3. What is the difference between numeric and integer data types in R?

4. How would you convert a character variable to numeric in R?

5. What is the function used to create a logical value in R?

Variables

6. What is the purpose of a variable in R, and how do you create one?

7. List at least three rules for naming variables in R.

8. What happens if you try to create a variable name that starts with a number?

9. How can you assign multiple values to a variable in R?

10. Write a line of code to declare a variable for your favorite color.

Comments

11. How do you write a single-line comment in R? Provide an example.

12. How can you write a multi-line comment in R?

13. Why are comments important in programming?

14. Provide an example of how comments can clarify the purpose of code.

15. What is the best practice for commenting your code in R?


Chapter 3 – Operators & Decision-Making Statements in R Programming

Introduction

In every programming language, operators and decision-making statements are essential for performing calculations and controlling the flow of logic.

• Operators allow you to perform mathematical, relational, or logical operations.

• Decision-making statements allow your R programs to choose different actions based on conditions.

These tools help R programmers make programs more intelligent, flexible, and responsive to user input or data.

1. Operators in R Programming

An operator is a symbol that performs a specific operation on one or more values or variables.

For example, in the expression 10 + 5, the symbol + is an operator, and 10 and 5 are operands.

Types of Operators in R

1. Arithmetic Operators

2. Relational Operators

3. Logical Operators

4. Assignment Operators
5. Miscellaneous Operators (: and %in%)

1. Arithmetic Operators

Arithmetic operators perform basic mathematical calculations like addition, subtraction, multiplication, division, etc.

Operator   Description               Example    Output
+          Addition                  5 + 3      8
-          Subtraction               10 - 4     6
*          Multiplication            6 * 2      12
/          Division                  8 / 2      4
%%         Modulus (Remainder)       10 %% 3    1
^          Exponentiation (Power)    2 ^ 3      8
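
The table can be checked directly in the console. The sketch below uses made-up values (a, b, prices) and also shows that the same operators work element by element on vectors, a standard R behaviour that the table does not spell out.

```r
a <- 10
b <- 3

print(a + b)   # 13
print(a - b)   # 7
print(a * b)   # 30
print(a / b)   # 3.333333
print(a %% b)  # 1
print(a ^ b)   # 1000

# The same operators work element by element on vectors.
prices <- c(100, 250, 400)
print(prices * 2)      # 200 500 800
print(prices %% 3)     # 1 1 1
```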

Real-World Examples – Arithmetic Operators

Example 1 – Calculate Total Marks

math <- 85

science <- 90

english <- 80

total <- math + science + english

print(total)

Example 2 – Calculate Simple Interest


P <- 10000

R <- 5

T <- 2

SI <- (P * R * T) / 100

print(SI)

Example 3 – Calculate Discounted Price

price <- 1200

discount <- 10

final_price <- price - (price * discount / 100)

print(final_price)

Example 4 – Average Marks

marks <- c(70, 80, 90)

average <- sum(marks) / length(marks)

print(average)

2. Relational Operators

Relational operators are used to compare two values and return a TRUE or FALSE
result.

Operator   Description                 Example   Output
==         Equal to                    5 == 5    TRUE
!=         Not equal to                5 != 3    TRUE
>          Greater than                7 > 2     TRUE
<          Less than                   3 < 5     TRUE
>=         Greater than or equal to    5 >= 5    TRUE
<=         Less than or equal to       4 <= 3    FALSE

Real-World Examples – Relational Operators

Example 1 – Age Check for Voting

age <- 18

if (age >= 18) {
  print("Eligible to Vote")
}

Example 2 – Temperature Check

temperature <- 35

if (temperature > 30) {
  print("It’s Hot Outside!")
}

Example 3 – Marks Comparison

marks <- 60

if (marks >= 40) {
  print("Pass")
} else {
  print("Fail")
}

Example 4 – Comparing Two Numbers

a <- 25

b <- 30

print(a > b)

print(a < b)

3. Logical Operators

Logical operators are used to combine multiple conditions.

Operator   Description                                Example             Output
&          AND – TRUE if both conditions are TRUE     (5 > 3 & 10 > 8)    TRUE
|          OR – TRUE if any one condition is TRUE     (5 > 3 | 5 > 10)    TRUE
!          NOT – reverses logical value               !(5 > 10)           TRUE
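
The operators in the table also work on whole vectors. The sketch below uses made-up vectors (passed, attended) to show this, and additionally mentions && and ||, which are standard R operators not covered in this chapter; they expect single values, so they are a common choice inside if () conditions.

```r
passed    <- c(TRUE, TRUE, FALSE)
attended  <- c(TRUE, FALSE, FALSE)

# & and | compare vectors element by element.
print(passed & attended)   # TRUE FALSE FALSE
print(passed | attended)   # TRUE TRUE FALSE
print(!passed)             # FALSE FALSE TRUE

# && and || expect single values and stop evaluating as soon as the
# result is known, which suits if () conditions.
marks <- 55
attendance <- 80
if (marks >= 40 && attendance >= 75) {
  print("Eligible for Exam")
}
```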

Real-World Examples – Logical Operators

Example 1 – Exam Eligibility

attendance <- 80

marks <- 55
if (attendance >= 75 & marks >= 40) {
  print("Eligible for Exam")
} else {
  print("Not Eligible")
}

Example 2 – Loan Approval

income <- 40000

credit_score <- 700

if (income > 30000 | credit_score > 650) {
  print("Loan Approved")
}

Example 3 – Electricity Bill Check

units <- 250

paid <- TRUE

if (units > 200 & paid == TRUE) {
  print("Customer eligible for rebate")
}

4. Assignment Operators
Assignment operators are used to assign values to variables.

Operator   Description                                                     Example
<-         Assigns value (most common)                                     x <- 10
=          Also used to assign values                                      x = 20
<<-        Assigns value in the enclosing (usually global) environment     x <<- 30

Examples

x <- 10

y = 20

z <- x + y

print(z)

5. Miscellaneous Operators

Operator   Description                              Example                            Output
:          Creates a sequence of numbers            1:5                                1 2 3 4 5
%in%       Checks if element belongs to a vector    "apple" %in% c("apple","mango")    TRUE
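
A short sketch of both operators in use follows. The vectors days and fruits are made up for illustration.

```r
days <- 1:7                      # 1 2 3 4 5 6 7
print(days)
print(length(10:20))             # 11 values, both ends included

fruits <- c("apple", "mango", "banana")
print("mango" %in% fruits)       # TRUE
print("grape" %in% fruits)       # FALSE

# %in% is vectorized over its left-hand side.
print(c("apple", "kiwi") %in% fruits)   # TRUE FALSE
```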

Decision-Making Statements in R

Decision-making statements control the flow of execution based on conditions. They help you write programs that respond dynamically.
1. If Statement

The if statement executes code only when a condition is TRUE.

Syntax:

if (condition) {
  # code
}

Example 1:

x <- 10

if (x > 0) {
  print("Positive Number")
}

Example 2 – Check Temperature

temp <- 5

if (temp < 10) {
  print("It's cold outside!")
}

Example 3 – Sales Bonus

sales <- 15000

if (sales > 10000) {

print("Bonus Granted")
}

2. If-Else Statement

When a condition is TRUE, one block runs; when FALSE, another block runs.

Syntax:

if (condition) {
  # true block
} else {
  # false block
}

Example 1:

x <- -2

if (x > 0) {
  print("Positive")
} else {
  print("Negative or Zero")
}

Example 2 – Even or Odd

num <- 7

if (num %% 2 == 0) {
  print("Even Number")
} else {
  print("Odd Number")
}

Example 3 – Attendance

attendance <- 70

if (attendance >= 75) {
  print("Allowed for Exam")
} else {
  print("Not Allowed")
}

3. Else If Statement

Used to check multiple conditions in sequence.

Syntax:

if (condition1) {
  # code 1
} else if (condition2) {
  # code 2
} else {
  # default code
}

Example 1 – Check Number Type

x <- 0
if (x > 0) {
  print("Positive")
} else if (x < 0) {
  print("Negative")
} else {
  print("Zero")
}

Example 2 – Grade Calculation

marks <- 82

if (marks >= 90) {
  print("Grade A+")
} else if (marks >= 75) {
  print("Grade A")
} else if (marks >= 50) {
  print("Grade B")
} else {
  print("Fail")
}

4. Nested If Statement

An if inside another if creates a nested structure.

Example 1:
x <- 10

if (x >= 0) {
  print("Non-negative")
  if (x == 0) {
    print("Zero")
  } else {
    print("Positive Number")
  }
} else {
  print("Negative Number")
}

Example 2 – Banking Example

balance <- 5000

pin <- 1234

input_pin <- 1234

if (input_pin == pin) {
  if (balance >= 1000) {
    print("Transaction Successful")
  } else {
    print("Insufficient Balance")
  }
} else {
  print("Incorrect PIN")
}

Summary Table

Statement Type   Description                             Use Case
if               Executes code when condition is TRUE    Simple checks
if-else          Executes one of two code blocks         Pass/Fail, Even/Odd
else if          Tests multiple conditions               Grading, Age group
nested if        One if inside another                   Complex logic like banking/login

Practice Questions (Without Answers)

Section A – Operators

1. Write an R expression to calculate the total and average of 3 subject marks.

2. Find the remainder when 47 is divided by 5 using the modulus operator.

3. Compare two numbers and check which is greater using relational operators.

4. Use logical operators to check if a student’s attendance is above 75% and marks above 40.

5. Assign a value to a variable and perform addition and multiplication using assignment operators.

6. Check if “mango” exists in a vector of fruits using %in%.

7. Create a sequence of numbers from 10 to 20 using the : operator.

8. Write an expression that checks if a number is not equal to 10.

Section B – Decision-Making Statements

1. Write an R program to check if a number is positive or negative.

2. Write an R program to check whether a given number is even or odd.

3. Write a program that prints “Pass” if marks are above 40, otherwise “Fail”.

4. Write a program that classifies temperature as “Cold”, “Warm”, or “Hot”.

5. Write a program that calculates a grade based on marks (A, B, C, Fail).

6. Write a nested if program to simulate an ATM PIN check and balance verification.

7. Write an R program to find the largest among three numbers using if-else.

8. Write a program that checks if a person is eligible for a driving license (age
≥ 18).

9. Write a program that checks whether a number is divisible by both 2 and 3.

10. Write an R program that prints “Welcome” if both username and password match predefined values.
Chapter 4 – Taking Input & Loops in R Programming

Introduction

In R programming, we often need to:

1. Take input from users (for example, their name, age, or any value).

2. Repeat certain actions multiple times (like printing numbers, calculating totals, etc.) — this is
done using loops.

This chapter covers both — how to take user input and how to use different types of loops in R.

Taking Input in R

The function used to take user input in R is readline(). It always takes input as a string (text), even if the user types numbers. To convert this input to a numeric value, we use as.numeric().

1. Basic Input

You can prompt the user for input by giving a message inside the prompt argument.

Example:

name <- readline(prompt = "Enter your name: ")

print(paste("Hello,", name))

Output Example:

Enter your name: Riya

[1] "Hello, Riya"

2. Numeric Input

By default, readline() returns a string, so we convert it using as.numeric() to perform mathematical operations.

Example:

age <- as.numeric(readline(prompt = "Enter your age: "))

print(paste("You are", age, "years old."))

Output Example:

Enter your age: 20

[1] "You are 20 years old."

3. Handling Invalid Input

Sometimes users enter invalid data (like text instead of numbers).


We can check it using is.na(), which tests for NA ("Not Available"), the value as.numeric() returns when the text is not a valid number. R also shows a warning that NAs were introduced by coercion.

Example:

age <- as.numeric(readline(prompt = "Enter your age: "))

if (is.na(age)) {
  print("Please enter a valid number.")
} else {
  print(paste("You are", age, "years old."))
}

Output Example:

Enter your age: abc

[1] "Please enter a valid number."
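
The same check can be wrapped in small helper functions so that it is easy to reuse and to try out without typing input each time. This is only a sketch: parse_number and describe_age are hypothetical names, and suppressWarnings() is used here simply to hide the coercion warning.

```r
# Hypothetical helper: turns typed text into a number, or NA if invalid.
parse_number <- function(text) {
  # suppressWarnings() hides the "NAs introduced by coercion" warning.
  suppressWarnings(as.numeric(text))
}

# Hypothetical helper: builds the message for one piece of input.
describe_age <- function(text) {
  age <- parse_number(text)
  if (is.na(age)) {
    "Please enter a valid number."
  } else {
    paste("You are", age, "years old.")
  }
}

print(describe_age("20"))   # "You are 20 years old."
print(describe_age("abc"))  # "Please enter a valid number."

# In an interactive session the text would come from readline():
# print(describe_age(readline(prompt = "Enter your age: ")))
```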

4. Taking Multiple Inputs

You can take multiple inputs by using readline() more than once.

Example:

name <- readline(prompt = "Enter your name: ")

age <- as.numeric(readline(prompt = "Enter your age: "))

print(paste("Hello,", name, "you are", age, "years old."))

Output Example:

Enter your name: Aman

Enter your age: 25

[1] "Hello, Aman you are 25 years old."

5. Real-World Example – Billing System

item <- readline(prompt = "Enter item name: ")

price <- as.numeric(readline(prompt = "Enter price: "))

qty <- as.numeric(readline(prompt = "Enter quantity: "))

total <- price * qty

print(paste("You bought", qty, item, "for a total of", total, "rupees."))

Loops in R Programming

Loops are used to repeat tasks automatically without writing the same code many times.

They are very useful in cases such as:

• Printing sequences,

• Processing lists of data,

• Repeating tasks until a condition is met.

Types of Loops in R

1. For Loop

2. While Loop
3. Repeat Loop

1. For Loop

A for loop is used when you know how many times you want to repeat the code.

Syntax:

for (variable in sequence) {
  # Code to execute
}

Example 1 – Print Numbers from 1 to 5

for (i in 1:5) {
  print(paste("This is iteration number", i))
}

Output:

[1] "This is iteration number 1"
[1] "This is iteration number 2"
[1] "This is iteration number 3"
[1] "This is iteration number 4"
[1] "This is iteration number 5"

Example 2 – Print All Elements in a Vector

fruits <- c("Apple", "Banana", "Mango")

for (item in fruits) {

print(paste("I like", item))

}
Example 3 – Calculate Squares of Numbers

for (num in 1:5) {
  print(paste("Square of", num, "is", num^2))
}

Real-World Example – Student Roll Numbers

for (roll in 1:3) {
  print(paste("Processing Roll Number:", roll))
}
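
Two further for-loop patterns come up often: building a running total, and looping over positions instead of values. The sketch below assumes a made-up marks vector; seq_along() is a standard R function that returns the positions 1, 2, 3, and so on.

```r
marks <- c(70, 80, 90)

# Accumulate a running total, one element per iteration.
total <- 0
for (m in marks) {
  total <- total + m
}
print(total)   # 240

# seq_along() gives the positions 1, 2, 3, which helps when the
# index is needed as well as the value.
for (i in seq_along(marks)) {
  print(paste("Subject", i, "marks:", marks[i]))
}
```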

2. While Loop

A while loop runs as long as the condition is TRUE.


It is useful when we don’t know beforehand how many times we need to loop.

Syntax:

while (condition) {
  # Code to execute
}

Example 1 – Count from 1 to 5

count <- 1

while (count <= 5) {

print(paste("Count is:", count))

count <- count + 1

}
Example 2 – Print Even Numbers Up to 10

num <- 2

while (num <= 10) {
  print(num)
  num <- num + 2
}

Real-World Example – Countdown Timer

n <- as.numeric(readline(prompt = "Enter a number: "))

while (n >= 0) {
  print(n)
  n <- n - 1
}

3. Repeat Loop

A repeat loop keeps executing code indefinitely until a break statement stops it.
It’s similar to a while loop, but without an initial condition.

Syntax:

repeat {
  # Code to execute
  if (condition) {
    break
  }
}

Example 1 – Print Numbers from 1 to 5


count <- 1

repeat {
  print(paste("Count is:", count))
  count <- count + 1
  if (count > 5) {
    break
  }
}

Example 2 – Ask for Password

repeat {
  password <- readline(prompt = "Enter password: ")
  if (password == "admin123") {
    print("Access Granted")
    break
  } else {
    print("Wrong password. Try again.")
  }
}

Real-World Example – Sum Until “stop” Entered

sum <- 0

repeat {
  input <- readline(prompt = "Enter a number or 'stop': ")
  if (input == "stop") {
    break
  }
  sum <- sum + as.numeric(input)
}

print(paste("Total Sum:", sum))

Summary of Loops

Loop Type     Description                                When to Use
For Loop      Executes for each element in a sequence    When you know the exact number of iterations
While Loop    Executes while a condition is true         When the number of iterations is unknown
Repeat Loop   Executes indefinitely until break          When you want to stop manually based on a condition

When to Use Which Loop

Situation                                   Best Loop
Known range or list                         For loop
Unknown number of times, condition-based    While loop
Run indefinitely until break                Repeat loop
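
To compare the three loops side by side, here is one small task (adding the numbers 1 to 5) written in each style. It is a sketch with made-up variable names, meant only to show how the stopping rule moves around.

```r
# for: the range is known in advance.
total_for <- 0
for (i in 1:5) {
  total_for <- total_for + i
}

# while: keep going while the condition holds.
total_while <- 0
i <- 1
while (i <= 5) {
  total_while <- total_while + i
  i <- i + 1
}

# repeat: loop forever, then leave with break.
total_repeat <- 0
i <- 1
repeat {
  total_repeat <- total_repeat + i
  i <- i + 1
  if (i > 5) {
    break
  }
}

print(c(total_for, total_while, total_repeat))   # 15 15 15
```

When the range is known, the for version is the shortest because R manages the counter itself.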

Practice Questions (Without Answers)

Section A – Input

1. Write an R program that asks the user for their name and greets them.

2. Take two numbers as input and display their sum.

3. Ask the user for their age and check if they are an adult (age ≥ 18).
4. Write an R program that asks for a product name and price, then displays a billing message.

5. Take input for 3 subject marks and print the total and average marks.

6. Ask the user to enter a temperature and print whether it’s cold, warm, or hot.

7. Write an R program that checks if the user entered a valid numeric value.

8. Take multiple inputs: user’s name, city, and age, and print them together.

Section B – Loops

1. Write a for loop to print numbers from 1 to 10.

2. Create a for loop that prints the square of numbers from 1 to 5.

3. Use a while loop to print numbers from 10 down to 1.

4. Use a while loop to print even numbers from 2 to 20.

5. Create a repeat loop that prints numbers until the user enters “stop”.

6. Write a for loop that prints each element of a vector c("R", "Python", "C++").

7. Create a countdown timer using while loop that starts from a user-input number.

8. Ask the user for a number and print its multiplication table using a for loop.

9. Use repeat loop to repeatedly ask for names until the user types “done”.

10. Write a program using while loop that prints the sum of numbers from 1 to 50.
Chapter 5 – Functions in R Programming

Introduction

A function in R is a block of reusable code designed to perform a specific task.


You can think of a function as a mini-program inside your main program.

Functions make your code:

• Easier to manage,

• Easier to reuse, and

• Easier to understand.

In real-world data analysis, functions help automate repetitive tasks, saving time and reducing errors.

What is a Function?

A function in R is a set of instructions that takes input, processes it, and returns output.

In simple words:

A function is a tool that performs a specific job whenever you call it.

For example:

• The function sum() adds numbers.

• The function mean() calculates the average.

You can also create your own functions to perform custom tasks.

Why Use Functions?

1. Reusability – Once created, a function can be reused multiple times without rewriting the same code.

2. Modularity – Functions divide a program into small, manageable sections, making it easier to debug.

3. Organization – Functions keep related code grouped together and your program more structured.

4. Abstraction – Complex operations can be hidden behind a simple function name.

How to Create a Function in R

You can define a new function using the function() keyword.

Syntax:

function_name <- function(parameters) {
  # code to execute
  return(output)
}

Explanation:

• function_name → the name you give to the function.

• parameters → the input values (optional).

• return() → specifies the result that the function should output.
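
As a sketch of the syntax, the two hypothetical functions below (cube and is_even) follow this pattern. The second one leaves out return(): R returns the value of the last evaluated expression, so an explicit return() is optional.

```r
# Hypothetical example: explicit return().
cube <- function(x) {
  result <- x^3
  return(result)
}

# Hypothetical example: the last expression is returned automatically.
is_even <- function(n) {
  n %% 2 == 0
}

print(cube(3))      # 27
print(is_even(4))   # TRUE
print(is_even(7))   # FALSE
```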

Example – Simple Function

add_numbers <- function(x, y) {
  sum <- x + y
  return(sum)
}

Here:

• The function is named add_numbers.

• It takes two inputs (x and y).

• It adds them and returns their sum.

Calling the Function


result <- add_numbers(5, 10)

print(result)

Output:

[1] 15

Functions with Default Parameters

Sometimes you want your function to work even if the user doesn’t give all inputs.
In that case, you can provide default values for parameters.

Example – Default Parameter

greet <- function(name = "Guest") {
  message <- paste("Hello,", name)
  return(message)
}

print(greet()) # Output: "Hello, Guest"

print(greet("Alice")) # Output: "Hello, Alice"

Here, if no name is provided, "Guest" is used by default.
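
Defaults become more useful when a function has several parameters, because arguments can then be passed by name. The function make_greeting below is a hypothetical illustration, not something defined earlier in this chapter.

```r
# Hypothetical function with two defaults.
make_greeting <- function(name = "Guest", greeting = "Hello") {
  paste0(greeting, ", ", name, "!")
}

print(make_greeting())                       # "Hello, Guest!"
print(make_greeting("Alice"))                # "Hello, Alice!"
# Naming an argument lets you skip the ones before it.
print(make_greeting(greeting = "Welcome"))   # "Welcome, Guest!"
print(make_greeting(greeting = "Hi", name = "Ravi"))  # "Hi, Ravi!"
```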

Scope of Variables in Functions

Variables defined inside a function exist only within that function — this is called local scope.
Variables defined outside a function are called global variables.

Example – Local and Global Scope

x <- 5 # Global variable

my_function <- function() {

y <- 10 # Local variable

return(x + y)

}
print(my_function()) # 15

print(x) # 5

# print(y) # Error: object 'y' not found (y exists only inside the function)
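
A related point is what happens when a function assigns to a name that also exists globally. The sketch below uses a made-up counter variable and two hypothetical functions to contrast ordinary assignment with the <<- operator.

```r
counter <- 0   # global variable

# A local assignment creates a new variable inside the function;
# the global counter is left untouched.
increment_local <- function() {
  counter <- counter + 1
  counter
}

# <<- assigns to the variable in the enclosing (here global) environment.
increment_global <- function() {
  counter <<- counter + 1
  counter
}

print(increment_local())   # 1
print(counter)             # 0

print(increment_global())  # 1
print(counter)             # 1
```

Relying on <<- makes code harder to follow, so returning a value and reassigning it is usually the clearer design.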

How to Call a Function

You can use a function by typing its name followed by parentheses () containing the required arguments.

Example

result <- add_numbers(8, 12)

print(paste("Sum is:", result))

Output:

[1] "Sum is: 20"

Returning Multiple Values from a Function

You can return multiple values from a function in R using a list.

Example – Return Multiple Values

calculate_values <- function(x, y) {
  sum <- x + y
  diff <- x - y
  prod <- x * y
  return(list(Sum = sum, Difference = diff, Product = prod))
}

result <- calculate_values(10, 5)

print(result)

Output:

$Sum
[1] 15

$Difference

[1] 5

$Product

[1] 50
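
Once a list is returned, individual values can be picked out by name. The sketch below uses a hypothetical function, divide_with_remainder, together with the integer-division operator %/%, which is standard R but not covered in the operator chapter.

```r
# Hypothetical function returning two named values in a list.
divide_with_remainder <- function(a, b) {
  list(quotient = a %/% b, remainder = a %% b)
}

result <- divide_with_remainder(17, 5)

print(result$quotient)        # 3
print(result[["remainder"]])  # 2
print(names(result))          # "quotient" "remainder"
```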

Built-in Functions in R

R comes with many predefined (built-in) functions that perform common tasks.
You don’t need to write them yourself — just call them.

1. Mathematical Functions

Function   Description        Example            Output
sum()      Adds all numbers   sum(c(1, 2, 3))    6
sqrt()     Square root        sqrt(16)           4
abs()      Absolute value     abs(-10)           10
round()    Rounds numbers     round(3.567, 2)    3.57

2. Statistical Functions

Function   Description          Example               Output
mean()     Average value        mean(c(2, 4, 6))      4
median()   Middle value         median(c(1, 3, 5))    3
sd()       Standard deviation   sd(c(1, 2, 3, 4))     1.29
var()      Variance             var(c(1, 2, 3, 4))    1.67


3. Character Functions

Function    Description            Example                     Output
nchar()     Count characters       nchar("Hello")              5
toupper()   Convert to uppercase   toupper("hello")            "HELLO"
tolower()   Convert to lowercase   tolower("WORLD")            "world"
substr()    Extract substring      substr("Rprogram", 1, 3)    "Rpr"

4. Data Frame Functions

Function    Description                   Example
head()      Displays first few rows       head(mtcars)
tail()      Displays last few rows        tail(mtcars)
summary()   Gives summary of dataset      summary(mtcars)
nrow()      Counts rows in a data frame   nrow(mtcars)

5. Logical Functions

Function   Description                     Example                Output
any()      Checks if any value is TRUE     any(c(FALSE, TRUE))    TRUE
all()      Checks if all values are TRUE   all(c(TRUE, TRUE))     TRUE
is.na()    Checks for missing values       is.na(c(1, NA, 2))     FALSE TRUE FALSE
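
Built-in functions are often combined. The sketch below uses a made-up readings vector to show is.na() together with any(), sum(), and mean(); the na.rm argument is a standard option of mean() that is not listed in the tables above.

```r
readings <- c(12.5, NA, 15.0, 14.5)

print(is.na(readings))        # FALSE TRUE FALSE FALSE
print(any(is.na(readings)))   # TRUE
print(sum(is.na(readings)))   # 1 missing value

# mean() returns NA when a value is missing ...
print(mean(readings))                 # NA
# ... unless na.rm = TRUE asks it to drop missing values first.
print(round(mean(readings, na.rm = TRUE), 2))   # 14
```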

Real-World Examples Using Functions

Example 1 – Salary Bonus Function

calculate_bonus <- function(salary, bonus_percent = 10) {
  bonus <- salary * bonus_percent / 100
  total <- salary + bonus
  return(total)
}

print(calculate_bonus(50000))

print(calculate_bonus(50000, 20))

Example 2 – Temperature Conversion Function

convert_temp <- function(celsius) {
  fahrenheit <- (celsius * 9/5) + 32
  return(fahrenheit)
}

print(convert_temp(25))

Example 3 – Student Grade Function

student_grade <- function(marks) {

if (marks >= 90) {

return("A+")

} else if (marks >= 75) {

return("A")

} else if (marks >= 50) {

return("B")

} else {

return("Fail")

print(student_grade(82))
Example 4 – Mean and Median Together

mean_median <- function(numbers) {

mean_value <- mean(numbers)

median_value <- median(numbers)

return(list(mean = mean_value, median = median_value))

data <- c(10, 20, 30, 40, 50)

print(mean_median(data))
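A function that returns a list, as in Example 4, hands back several values at once. The sketch below shows one way to pull the individual values out by name. The function is redefined here so the block runs on its own, and the names `marks` and `result` are illustrative.

```r
# Same idea as Example 4: return two statistics in a named list
mean_median <- function(numbers) {
  list(mean = mean(numbers), median = median(numbers))
}

marks <- c(10, 20, 30, 40, 50)
result <- mean_median(marks)

# Access each returned value by its name
avg <- result$mean          # 30
mid <- result[["median"]]   # 30

print(avg)
print(mid)
```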

Summary

Concept Description

Function A block of reusable code that performs a task

Parameters Inputs to a function

Return The output of a function

Scope Determines where a variable can be accessed

Built-in Functions Predefined functions like sum(), mean(), nchar(), etc.

Default Parameters Provide default values if no input is given
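The Scope row in the summary can be illustrated with a short sketch. All names here (`counter`, `add_one`, `show_local`, `local_only`) are hypothetical and chosen only for the demonstration.

```r
counter <- 10   # global variable, visible everywhere

add_one <- function() {
  # Reads the global counter, then creates a separate local variable
  counter <- counter + 1
  counter
}

inside <- add_one()   # 11, the value of the local copy
print(inside)
print(counter)        # still 10, the global variable is unchanged

show_local <- function() {
  local_only <- "created inside the function"
  local_only
}

show_local()
# The variable does not exist once the function has finished
local_visible <- exists("local_only")   # FALSE
print(local_visible)
```

Variables created inside a function disappear when the function returns, which is why functions should pass results back with return() rather than rely on side effects.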

Practice Questions (Without Answers)

Section A – Basic Function Creation

1. Write a function called square that takes one numeric input and returns its square.

2. Create a function called add_three that adds three numbers and returns the result.

3. Write a function named difference that returns the absolute difference between two numbers.

4. Write a function that takes a temperature in Celsius and returns it in Fahrenheit.


Section B – Function with Multiple Parameters

1. Create a function named calculate_area that takes length and width as inputs and returns the area of a
rectangle.

2. Write a function calculate_interest that takes principal, rate, and time and returns simple interest.

3. Write a function that calculates the perimeter of a rectangle using two inputs.

Section C – Default Parameters

1. Write a function called greet_user that takes a name and prints a greeting. If no name is given, it should
default to “Guest”.

2. Create a function discount_price with a default discount of 5%.

3. Write a function that prints the message “Welcome to R Programming!” if no name is given.

Section D – Returning Multiple Values

1. Create a function min_max that returns both the minimum and maximum of a numeric vector.

2. Write a function stats_summary that returns sum, mean, and median of a numeric vector.

3. Write a function that takes two numbers and returns their sum, difference, and product as a list.

Section E – Using Built-in Functions

1. Write a function mean_median that takes a vector and returns both mean and median.

2. Write a function that counts the number of characters in a user-input string.

3. Write a function that prints the first 5 rows of a dataset using head().

4. Write a function that checks if any element in a vector is negative.

5. Write a function that uses summary() to print details of a given dataset.


Chapter 6

Join & DPLYR


Introduction

When working with large datasets, it’s common to have information split across multiple tables or data frames.
For example:

• One data frame may contain student names.

• Another may contain their marks or ages.

To combine them, R provides join operations through the dplyr package.

What is Joining in R?

Joining means merging two data frames based on a common column (key) — usually an ID, name, or code.
It helps combine related data stored in different places.

Example:

• df1: Student names and IDs

• df2: Student IDs and marks

You can merge them using a join on the ID column.

Types of Joins in R (using dplyr)

The dplyr package provides simple and powerful join functions.


Below are the most commonly used ones:

Join Type Description

inner_join() Returns only rows that have matching values in both data frames.

left_join() Returns all rows from the left data frame, and matching rows from the right one. Unmatched rows get NA.

right_join() Returns all rows from the right data frame, and matching rows from the left one.

full_join() Returns all rows from both data frames. Missing matches get NA.
Before You Start

Install and load the dplyr package:

install.packages("dplyr")

library(dplyr)

Example Data Frames

df1 <- data.frame(ID = c(1, 2, 3),

Name = c("Alice", "Bob", "Charlie"))

df2 <- data.frame(ID = c(2, 3, 4),

Age = c(25, 30, 22))

1. Inner Join

Definition:
An inner join returns only those rows that have matching values in both data frames.

Example

result <- inner_join(df1, df2, by = "ID")

print(result)

Output:

ID Name Age

1 2 Bob 25

2 3 Charlie 30

Explanation:
Only IDs 2 and 3 exist in both df1 and df2, so those rows are returned.

2. Left Join

Definition:
A left join returns all rows from the left data frame, and the matching rows from the right.
If there is no match, it fills with NA.
Example

result <- left_join(df1, df2, by = "ID")

print(result)

Output:

ID Name Age

1 1 Alice NA

2 2 Bob 25

3 3 Charlie 30

Explanation:

• All students from df1 are shown.

• Alice has no match in df2, so Age = NA.

3. Right Join

Definition:
A right join returns all rows from the right data frame and matching rows from the left.

Example

result <- right_join(df1, df2, by = "ID")

print(result)

Output:

ID Name Age

1 2 Bob 25

2 3 Charlie 30

3 4 <NA> 22

Explanation:

• All rows from df2 are included.

• The record with ID = 4 has no matching name, so Name = NA.


4. Full Join

Definition:
A full join combines all rows from both data frames.
If there’s no match, missing values are filled with NA.

Example

result <- full_join(df1, df2, by = "ID")

print(result)

Output:

ID Name Age

1 1 Alice NA

2 2 Bob 25

3 3 Charlie 30

4 4 <NA> 22

Explanation:
All rows from both data frames appear, with NA where data is missing.

Real-World Example – Employee Database

employees <- data.frame(EmpID = c(1, 2, 3),

Name = c("John", "Sara", "Mike"))

salaries <- data.frame(EmpID = c(2, 3, 4),

Salary = c(50000, 60000, 55000))

# Merge employee and salary data

result <- full_join(employees, salaries, by = "EmpID")

print(result)

Output:

EmpID Name Salary


1 1 John NA

2 2 Sara 50000

3 3 Mike 60000

4 4 <NA> 55000
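The joins above rely on both data frames using the same name for the key column. When the names differ, dplyr accepts a named vector in `by`. The sketch below assumes a hypothetical `StaffID` column in the salary table, which does not appear in the original example.

```r
library(dplyr)

employees <- data.frame(EmpID = c(1, 2, 3),
                        Name = c("John", "Sara", "Mike"))

# Hypothetical variant of the salary table whose key column is named StaffID
salaries <- data.frame(StaffID = c(2, 3, 4),
                       Salary = c(50000, 60000, 55000))

# "EmpID" in the left table is matched against "StaffID" in the right table
result <- left_join(employees, salaries, by = c("EmpID" = "StaffID"))

print(result)
```

The result keeps the key name from the left data frame (EmpID), and John's Salary is NA because ID 1 has no match.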

Summary of Join Types

Join Type Includes from Left Includes from Right Unmatched Filled with NA

inner_join Yes (only matches) Yes (only matches) No

left_join All Matches only Yes

right_join Matches only All Yes

full_join All All Yes
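For comparison, the same four joins can be sketched with base R's merge(), which this chapter does not cover. The `all`, `all.x`, and `all.y` arguments play the role of the join type. This is offered as an optional alternative, not as the chapter's method.

```r
df1 <- data.frame(ID = c(1, 2, 3),
                  Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = c(2, 3, 4),
                  Age = c(25, 30, 22))

inner <- merge(df1, df2, by = "ID")                 # like inner_join()
left  <- merge(df1, df2, by = "ID", all.x = TRUE)   # like left_join()
right <- merge(df1, df2, by = "ID", all.y = TRUE)   # like right_join()
full  <- merge(df1, df2, by = "ID", all = TRUE)     # like full_join()

print(full)
```

Unlike the dplyr joins, merge() sorts the result by the key column by default.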

Introduction to DPLYR Package

The dplyr package in R is used for data manipulation — cleaning, transforming, and summarizing data frames
easily.
It provides simple, human-readable functions to perform tasks that would otherwise need complex code.

Installing and Loading dplyr

install.packages("dplyr")

library(dplyr)

Key Functions of DPLYR

Function Description Example

filter() Selects rows based on conditions filter(df, Age > 25)

select() Chooses specific columns select(df, Name, Age)

mutate() Adds or modifies columns mutate(df, AgePlusOne = Age + 1)


arrange() Sorts rows arrange(df, Age)

summarize() Creates summary statistics summarize(df, avg_age = mean(Age))

group_by() Groups data for aggregation group_by(df, Name)

1. filter() – Select Rows Based on a Condition

df <- data.frame(Name = c("Alice", "Bob", "Charlie"),

Age = c(25, 30, 22))

filter(df, Age > 25)

Output:

Name Age

1 Bob 30

2. select() – Choose Columns

select(df, Name)

Output:

Name

1 Alice

2 Bob

3 Charlie

3. mutate() – Add or Modify Columns

mutate(df, AgePlusOne = Age + 1)

Output:

Name Age AgePlusOne


1 Alice 25 26

2 Bob 30 31

3 Charlie 22 23

4. arrange() – Sort Data

arrange(df, Age)

Output:

Name Age

1 Charlie 22

2 Alice 25

3 Bob 30

5. summarize() – Summary Statistics

summarize(df, avg_age = mean(Age))

Output:

avg_age

1 25.66667

6. group_by() – Grouped Summary

students <- data.frame(Name = c("Alice", "Bob", "Alice", "Bob"),

Subject = c("Math", "Math", "Science", "Science"),

Marks = c(90, 85, 80, 88))

grouped <- group_by(students, Name)

summarize(grouped, avg_marks = mean(Marks))

Output:

Name avg_marks
1 Alice 85

2 Bob 86.5
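A grouped summary is not limited to a single statistic. The sketch below computes several values per student in one summarize() call; the output column names `best` and `n_subjects` are chosen for illustration.

```r
library(dplyr)

students <- data.frame(Name = c("Alice", "Bob", "Alice", "Bob"),
                       Subject = c("Math", "Math", "Science", "Science"),
                       Marks = c(90, 85, 80, 88))

by_student <- summarize(group_by(students, Name),
                        avg_marks = mean(Marks),   # average per student
                        best = max(Marks),         # highest mark per student
                        n_subjects = n())          # number of rows in each group

print(by_student)
```

The result is a tibble with one row per group, so it prints with a header line and column types rather than in the plain data frame layout.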

Real-World Example – Combining Joins and DPLYR

students <- data.frame(ID = c(1, 2, 3),

Name = c("Alice", "Bob", "Charlie"))

marks <- data.frame(ID = c(2, 3, 4),

Marks = c(85, 90, 78))

# Step 1: Left join to include all students

data_combined <- left_join(students, marks, by = "ID")

# Step 2: Filter only students with marks > 80

filtered_data <- filter(data_combined, Marks > 80)

# Step 3: Select only Name and Marks columns

final_data <- select(filtered_data, Name, Marks)

print(final_data)

Output:

Name Marks

1 Bob 85

2 Charlie 90
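The three steps in this example can also be chained with the pipe operator `%>%`, which dplyr makes available and which passes the result of one step as the first argument of the next. The chapter itself does not introduce the pipe, so treat this as an optional sketch of the same workflow.

```r
library(dplyr)

students <- data.frame(ID = c(1, 2, 3),
                       Name = c("Alice", "Bob", "Charlie"))
marks <- data.frame(ID = c(2, 3, 4),
                    Marks = c(85, 90, 78))

final_data <- students %>%
  left_join(marks, by = "ID") %>%   # Step 1: keep all students
  filter(Marks > 80) %>%            # Step 2: rows with NA marks are dropped too
  select(Name, Marks)               # Step 3: keep only two columns

print(final_data)
```

Note that filter() removes Alice, because her Marks value is NA and a comparison with NA is never TRUE.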

Summary – Why Use DPLYR?

• Easy to use, with clear syntax.

• Works well with data frames and tibbles.

• Handles large datasets efficiently.

• Integrates with other R packages (like ggplot2).

Practice Questions (Without Answers)

Section A – Joins

1. Create two data frames students and marks as shown below and perform an inner join to find students
who have marks listed.

students <- data.frame(ID = c(1, 2, 3), Name = c("Alice", "Bob", "Charlie"))

marks <- data.frame(ID = c(2, 3, 4), Marks = c(85, 90, 78))

2. Use a left join to display all students along with their marks (if available).

3. Perform a right join to include all entries from the marks data frame even if some students’ names are
missing.

4. Combine both data frames using a full join to show all students and marks (including missing ones).

5. After performing a left join, use filter() to select only those students who have marks greater than 80, and
display only their Name and Marks columns.

Section B – DPLYR Functions

1. Create a data frame with students’ names and ages, and use filter() to select students older than 25.

2. Use select() to choose only specific columns (e.g., Name and Marks).

3. Use mutate() to create a new column showing Marks + 5 bonus points.

4. Sort the data in ascending order of Marks using arrange().

5. Use summarize() to calculate the average marks from a data frame.

6. Use group_by() and summarize() to calculate the average marks of each student.

7. Combine a left_join() with filter() and select() to get students with marks > 80 and display only their
names.
Chapter 7 – Data Visualization with ggplot2 in R

1. Introduction to ggplot2

ggplot2 is one of the most powerful and flexible libraries in R for creating high-quality, visually
appealing graphs.
It follows the Grammar of Graphics, which means every graph is built by combining small,
independent components (or layers).

A ggplot2 plot consists of three essential elements:

Component Description

Data The dataset you want to visualize

Aesthetics (aes) Defines how variables are mapped (x-axis, y-axis, color, size, etc.)

Geometries (geom) The visual representation (bars, points, lines, etc.)

Key Idea:

In ggplot2, you build a graph layer by layer — starting with data, then aesthetics, then
geometries.

2. Basic Structure of ggplot2

Every plot in ggplot2 follows this general structure:

ggplot(data = <data>, aes(x = <x-axis>, y = <y-axis>)) +

geom_<plot type>()

Example – Simple Bar Plot

library(ggplot2)

df <- data.frame(Name = c("Alice", "Bob", "Charlie"),

Age = c(25, 30, 22))


ggplot(df, aes(x = Name, y = Age)) +

geom_bar(stat = "identity")

Explanation:

• data → specifies the dataset.

• aes() → defines mappings (x = Name, y = Age).

• geom_bar() → creates bars, stat = "identity" uses actual values.
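When the bar heights are already stored in the data, ggplot2 also offers geom_col(), which behaves like geom_bar(stat = "identity"). The sketch below redraws the same bar plot this way; the variable name `p` is illustrative.

```r
library(ggplot2)

df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 22))

# geom_col() uses the y values as bar heights directly
p <- ggplot(df, aes(x = Name, y = Age)) +
  geom_col()

print(p)
```

Plain geom_bar() without stat = "identity" counts rows per category instead, so it is the better fit when the data has one row per observation rather than one row per bar.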

3. Geometries (Geoms)

Geometries define what kind of plot you are creating.

Function Plot type Description

geom_point() Scatter plot Dots showing relationship

geom_line() Line plot Lines showing trend

geom_bar() Bar chart Bars showing comparison

geom_histogram() Histogram Shows frequency distribution

Example – Scatter Plot

df <- data.frame(Height = c(150, 160, 170, 180),

Weight = c(55, 65, 70, 80))

ggplot(df, aes(x = Height, y = Weight)) +

geom_point()

Explanation:
Each point represents a (Height, Weight) pair.
4. Aesthetics (aes)

Aesthetics define how your data looks — such as position, color, size, or shape.

Aesthetic Purpose Example

aes(x, y) Defines variables on x and y axes aes(x = Height, y = Weight)

aes(color) Colors data points or lines aes(color = Gender)

aes(size) Adjusts size of points aes(size = Age)

aes(fill) Fills bars/areas with color aes(fill = Category)

Example – Color by Group

df <- data.frame(Name = c("Alice", "Bob", "Charlie"),

Age = c(25, 30, 22),

Gender = c("Female", "Male", "Male"))

ggplot(df, aes(x = Name, y = Age, fill = Gender)) +

geom_bar(stat = "identity")

Explanation:
Bars are colored based on the Gender variable.

5. Facets (Creating Subplots)

Faceting allows you to split your data into multiple subplots — one for each category or group.

Function Description

facet_wrap(~variable) Creates one subplot per level of a variable


facet_grid(rows ~ columns) Creates a grid of plots

Example – Facet by Gender

df <- data.frame(Age = c(20, 25, 30, 35, 40, 45),

Gender = c("Male", "Female", "Male", "Female", "Male", "Female"))

ggplot(df, aes(x = Age)) +

geom_histogram(binwidth = 5, fill = "skyblue") +

facet_wrap(~Gender)

Explanation:
This creates separate histograms for males and females.

6. Labels and Titles

Labels make your plot more readable and informative.

Function Purpose

ggtitle() Adds a title

xlab() Label for x-axis

ylab() Label for y-axis

Example – Add Labels and Title

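# Use the Name/Age data again (the facet example overwrote df and has no Name column)

df <- data.frame(Name = c("Alice", "Bob", "Charlie"), Age = c(25, 30, 22))

ggplot(df, aes(x = Age, y = Name)) +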

geom_point() +
ggtitle("Age of Individuals") +

xlab("Age in Years") +

ylab("Name of Person")

Explanation:
The plot now includes descriptive titles and axis labels.

7. Themes (Customizing the Appearance)

Themes define the style and overall look of your plot.

Function Description

theme() Manually customize text, gridlines, colors

theme_minimal() Clean minimal design

theme_bw() Black and white theme

theme_classic() Classic simple look

Example – Using Themes

ggplot(df, aes(x = Age, y = Name)) +

geom_point() +

theme_minimal()

Explanation:
theme_minimal() removes the grey background and panel border while keeping light gridlines, giving a neat modern style.

8. Saving Plots

You can save your plots to files (like PNG, JPG, or PDF) using ggsave().

Function Description

ggsave("plot.png") Saves the last created plot

Example – Saving a Plot

ggplot(df, aes(x = Age, y = Name)) +

geom_point()

ggsave("plot.png")

Explanation:
The last created plot is saved as plot.png in your working directory.
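Relying on "the last created plot" can be fragile in longer scripts. A slightly more explicit sketch stores the plot in a variable and passes it to ggsave() together with a size. The file name `age_plot.png` and the use of a temporary folder are assumptions made for this illustration.

```r
library(ggplot2)

df <- data.frame(Name = c("Alice", "Bob", "Charlie"),
                 Age = c(25, 30, 22))

p <- ggplot(df, aes(x = Age, y = Name)) +
  geom_point()

# Hypothetical output location; replace with any path you prefer
out_file <- file.path(tempdir(), "age_plot.png")

# Width and height are in inches by default; dpi controls the resolution
ggsave(out_file, plot = p, width = 6, height = 4, dpi = 300)

saved <- file.exists(out_file)
print(saved)
```

ggsave() picks the file format from the extension, so changing the name to end in .pdf or .jpg switches the output type.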

9. Modifying Axes

You can control and format your axes using scale_x_continuous() and scale_y_continuous().

Function Purpose

scale_x_continuous(limits = c(a, b)) Set range of x-axis

scale_y_continuous(limits = c(a, b)) Set range of y-axis

Example – Adjust Axis Limits

ggplot(df, aes(x = Age, y = Name)) +

geom_point() +

scale_x_continuous(limits = c(20, 40))

Explanation:
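The x-axis runs from 20 to 40. Any point whose age falls outside that range is dropped from the plot, and ggplot2 prints a warning about the removed rows.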
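Setting limits on a scale removes data outside the range, which matters when later layers compute statistics. To zoom in without discarding anything, ggplot2 provides coord_cartesian(). The sketch below contrasts the two; the extra person "Dana" is added only to have a value outside the limits.

```r
library(ggplot2)

# Dana (age 45) is a hypothetical extra row that lies outside the 20-40 range
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "Dana"),
                 Age = c(25, 30, 22, 45))

# Scale limits: Dana's row is treated as missing and removed with a warning
p_limits <- ggplot(df, aes(x = Age, y = Name)) +
  geom_point() +
  scale_x_continuous(limits = c(20, 40))

# Coordinate zoom: all rows are kept, only the visible window changes
p_zoom <- ggplot(df, aes(x = Age, y = Name)) +
  geom_point() +
  coord_cartesian(xlim = c(20, 40))
```

For a simple scatter plot the two look alike, but with layers such as geom_smooth() the fitted line differs, because the scale version fits only the points that remain.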

10. Combining Multiple Layers


One of ggplot2’s greatest strengths is its ability to combine multiple layers on the same plot.

You can mix geoms such as points, lines, and smooth curves.

Example – Points + Trend Line

df <- data.frame(Age = c(20, 25, 30, 35, 40),

Height = c(150, 160, 170, 175, 180))

ggplot(df, aes(x = Age, y = Height)) +

geom_point(color = "blue") +

geom_smooth(method = "lm", se = FALSE)

Explanation:
The geom_point() adds dots, and geom_smooth() adds a trend line using linear regression.

11. Plotting with Groups

Grouping allows you to display different categories in one plot using different colors or shapes.

Aesthetic Purpose

aes(group = variable) Groups data by a variable

aes(color = variable) Colors groups differently

aes(shape = variable) Uses different shapes

Example – Group by Gender

df <- data.frame(Age = c(20, 25, 30, 35, 40),

Height = c(150, 160, 170, 175, 180),

Gender = c("Male", "Female", "Male", "Female", "Male"))


ggplot(df, aes(x = Age, y = Height, color = Gender)) +

geom_point(size = 3)

Explanation:
Male and Female data points appear in different colors.

Summary of ggplot2 Concepts

Concept Description

Data & Aesthetics (aes) Define what to plot and how to map data to visuals

Geoms Choose the type of plot (scatter, bar, line, etc.)

Facets Create subplots for different categories

Labels Add titles, captions, and axis labels

Themes Customize plot design and appearance

Layers Combine multiple visual layers (e.g., points + lines)

Saving Plots Export your plots for reports or presentations

Practice Questions (Without Answers)

1. Simple Scatter Plot

Create a scatter plot showing the relationship between height and weight from the given data.

df <- data.frame(Height = c(150, 160, 170, 180),

Weight = c(55, 65, 70, 80))

# Practice: Create a scatter plot using geom_point().


2. Bar Plot with Colors

Create a bar plot of students' scores, and color the bars based on their subject.

df <- data.frame(Name = c("Alice", "Bob", "Charlie"),

Score = c(90, 85, 88),

Subject = c("Math", "Science", "English"))

# Practice: Create a bar plot using geom_bar() with fill = Subject.

3. Add Titles and Labels

Create a line plot showing the monthly sales data. Add a title and labels for the x and y axes.

df <- data.frame(Month = c("Jan", "Feb", "Mar", "Apr"),

Sales = c(200, 250, 300, 400))

# Practice: Create a line plot using geom_line() and add ggtitle(), xlab(), and ylab().

# Hint: Month is text, so add group = 1 inside aes() to connect the points with a single line.

4. Histogram

Create a histogram of ages in the following data, and set the bin width to 5.

df <- data.frame(Age = c(21, 25, 30, 35, 40, 45, 50))

# Practice: Create a histogram using geom_histogram() with binwidth = 5.

5. Faceting

Use faceting to split the scatter plot of Height and Weight by gender.

df <- data.frame(Height = c(150, 160, 170, 180),

Weight = c(55, 65, 70, 80),

Gender = c("Male", "Female", "Male", "Female"))

# Practice: Create a scatter plot using geom_point() and use facet_wrap(~Gender).


Chapter 8

Array

1. Introduction to Arrays in R

An array in R is a multidimensional data structure that can store elements of the same data type — such as
numeric, character, or logical.

You can think of:

• A vector as a 1-dimensional array,

• A matrix as a 2-dimensional array,

• An array as an extension of a matrix into three or more dimensions.

In simple words:

Arrays in R are like a collection of tables (or matrices) stacked one over another.

2. Creating an Array

To create an array, use the array() function.


You need to provide:

• The data (values to store)

• The dimensions (size in each direction — rows, columns, layers, etc.)

Syntax:

array(data, dim = c(row, column, layer, ...))

Example – 2D Array (Matrix)

# Create a 2x3 array

my_array <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 3))

print(my_array)

Output:

[,1] [,2] [,3]

[1,] 1 3 5
[2,] 2 4 6

Explanation:

• The array has 2 rows and 3 columns.

• The data fills column-wise by default (R fills columns first, not rows).
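Because array() always fills column by column, data that was typed row by row needs a different approach. One option, sketched below, is matrix() with byrow = TRUE; the variable names are illustrative.

```r
# Column-wise fill (the default behaviour of array())
by_column <- array(1:6, dim = c(2, 3))

# Row-wise fill using matrix() and byrow = TRUE
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_column)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```

A matrix created this way is still a 2D array, so everything in this chapter applies to it as well.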

3. Accessing Elements in an Array

You can access array elements using square brackets [ ].

Syntax:

array_name[row, column, layer]

If it’s a 2D array (matrix), just use row and column.

Example

# Access element in row 1, column 2

my_array[1, 2]

Output:

[1] 3

Explanation:
Row = 1, Column = 2 → Value = 3

Modify an Element

my_array[1, 2] <- 10

print(my_array)

Output:

[,1] [,2] [,3]

[1,] 1 10 5

[2,] 2 4 6

The value in row 1, column 2 is changed to 10.


4. Creating a 3D Array

A 3D array can be visualized as multiple 2D matrices (layers) stacked together.

Example – 3D Array

# Create a 2x3x2 array

my_3d_array <- array(c(1:12), dim = c(2, 3, 2))

print(my_3d_array)

Output:

, , 1

[,1] [,2] [,3]

[1,] 1 3 5

[2,] 2 4 6

, , 2

[,1] [,2] [,3]

[1,] 7 9 11

[2,] 8 10 12

Explanation:

• 2 rows

• 3 columns

• 2 layers
→ Total elements = 2 × 3 × 2 = 12

Layer 1 contains elements 1–6, and layer 2 contains 7–12.
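Indexing a 3D array works like the 2D case with one extra position for the layer. The sketch below shows a few common patterns on the same 2x3x2 array; the result variable names are illustrative.

```r
my_3d_array <- array(1:12, dim = c(2, 3, 2))

# Single element: row 1, column 2, layer 2
one_value <- my_3d_array[1, 2, 2]      # 9

# Whole layer: leave row and column blank
first_layer <- my_3d_array[, , 1]      # a 2x3 matrix holding 1 to 6

# One row across all layers: a 3x2 matrix (columns x layers)
row_two <- my_3d_array[2, , ]

print(one_value)
print(first_layer)
print(dim(row_two))                    # 3 2
```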

5. Common Array Operations

R provides many built-in functions for arrays.


Operation Function Example Description

Sum of all elements sum() sum(my_array) Adds up all elements

Mean of elements mean() mean(my_array) Calculates average

Sum along rows apply() apply(my_array, 1, sum) Sums row-wise

Sum along columns apply() apply(my_array, 2, sum) Sums column-wise

Modify element [ ] <- my_array[1,2] <- 10 Change a value

Example – Using apply()

# Create array

my_array <- array(c(1:6), dim = c(2, 3))

# Sum along rows

apply(my_array, 1, sum)

# Sum along columns

apply(my_array, 2, sum)

Output:

Row sums: [1] 9 12

Column sums: [1] 3 7 11

Explanation:

• apply(my_array, 1, sum) → 1 means apply across rows.

• apply(my_array, 2, sum) → 2 means apply across columns.

6. Naming Rows, Columns, and Layers

You can add meaningful names to rows, columns, and layers using the dimnames parameter.

Example
my_array <- array(c(1:12),

dim = c(2, 3, 2),

dimnames = list(

c("Row1", "Row2"),

c("Col1", "Col2", "Col3"),

c("Matrix1", "Matrix2")

))

print(my_array)

Output (simplified):

, , Matrix1

Col1 Col2 Col3

Row1 1 3 5

Row2 2 4 6

Explanation:
Adding names helps make the output more readable — especially for multi-dimensional data.

7. Real-World Example – Sales Data

Imagine you have monthly sales data for two products across two regions.

Example

sales <- array(c(120, 130, 140, 150, 160, 170, 180, 190),

dim = c(2, 2, 2),

dimnames = list(

c("Product1", "Product2"),

c("Region1", "Region2"),

c("Jan", "Feb")

))
print(sales)

Output:

, , Jan

Region1 Region2

Product1 120 140

Product2 130 150

, , Feb

Region1 Region2

Product1 160 180

Product2 170 190

Explanation:

• Each layer (Jan, Feb) represents a month.

• Each cell represents sales for a product in a region.

You can use:

apply(sales, 3, sum)

to get total sales per month.
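The second argument of apply() selects which dimension to keep, so the same sales array can be totalled in several ways. The sketch below rebuilds the array so it runs on its own; the result variable names are illustrative.

```r
sales <- array(c(120, 130, 140, 150, 160, 170, 180, 190),
               dim = c(2, 2, 2),
               dimnames = list(
                 c("Product1", "Product2"),
                 c("Region1", "Region2"),
                 c("Jan", "Feb")
               ))

# Keep dimension 3 (months): total over products and regions
per_month <- apply(sales, 3, sum)              # Jan 540, Feb 700

# Keep dimension 1 (products): total over regions and months
per_product <- apply(sales, 1, sum)            # Product1 600, Product2 640

# Keep dimensions 1 and 2: total over months only
per_product_region <- apply(sales, c(1, 2), sum)

print(per_month)
print(per_product)
print(per_product_region)
```

Because the array has dimnames, each result is labelled, which makes the totals easy to read.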

8. Summary of Key Points

Concept Description

Array Stores elements of same data type in multiple dimensions

array() Creates an array

dim Defines size of each dimension

[] Used to access or modify elements

apply() Applies functions across rows/columns/layers


dimnames Adds names to rows, columns, or layers

sum(), mean() Common operations on array elements

Practice Questions (Without Answers)

1. Basic Array Creation

Create a 3×3 array containing numbers from 1 to 9. Print the array.

2. Element Access

Create a 2×3 array and access:

• The element in row 2, column 3

• The entire first row

• The entire second column

3. Modify Elements

Create a 2×2 array and change the element at (1,2) to 50.

4. 3D Array

Create a 3D array with dimensions (2×3×2) filled with numbers 1 to 12. Print the array.

5. Using apply()

For the 2×3 array:

• Find the sum of each row

• Find the sum of each column

6. Named Dimensions
Create an array with rows as "A", "B", columns as "X", "Y", "Z", and layers as "Sheet1" and "Sheet2".

7. Real-World Scenario

Create a 3D array to store sales data of 2 products across 3 stores for 2 months.
Then, find total sales for each month using apply().

8. Array Operations

Create a numeric array and perform:

• sum(), mean(), and max() operations.

9. Replace Elements

Replace all values greater than 5 in an array with 0.

10. Combining apply() and dimnames

Create a named 3D array (students × subjects × terms) and use apply() to calculate total marks per term.
Other Practice Questions

1. Creating a 1D Array
Create a 1D array containing the first five positive integers.

# Practice: Create a 1D array with integers 1 to 5.

2. Creating a 2D Array
Create a 2D array (3x3) with the following values:

1 2 3

4 5 6

7 8 9

# Practice: Create a 3x3 array with the values above.

3. Accessing Elements
Given the following 2D array, access the element in the second row and third column.

my_array <- array(c(1:9), dim = c(3, 3))

# Practice: Access the element in row 2, column 3.

4. Modifying an Element
Modify the element in the first row and first column of the following array to 99.

my_array <- array(c(1:6), dim = c(2, 3))

# Practice: Change the value at row 1, column 1 to 99.

5. Creating a 3D Array
Create a 3D array (2x2x3) with numbers 1 to 12.

# Practice: Create a 2x2x3 array with numbers from 1 to 12.


Chapter 9 – String Programming in R

1. Introduction

String programming in R deals with text data manipulation — creating, combining, formatting, searching, and
transforming strings.

Strings are sequences of characters such as letters, words, or sentences, and are often used for working with:

• Names, addresses, and text data

• Data cleaning

• File names or messages

• Report formatting

In R, strings are enclosed in either single (' ') or double (" ") quotes.

2. Creating Strings

Strings can be created using single or double quotes. Both work the same way.

Example

str1 <- "Hello, World!"

str2 <- 'This is R programming.'

print(str1)

print(str2)
Output:

[1] "Hello, World!"

[1] "This is R programming."

Tip:
Always keep strings in quotes. Using double quotes is most common in R.

3. Finding String Length

To find the number of characters in a string (including spaces and punctuation), use the nchar() function.

Example

str1 <- "Hello, World!"

length_str <- nchar(str1)

print(length_str)

Output:

[1] 13

The length includes spaces and symbols as characters.

4. Extracting Substrings

To extract part of a string, use substr() or substring().

Function Description

substr(string, start, end) Extracts substring from start to end position

substring(string, start, end) Similar, but end is optional and defaults to the end of the string

Example

str1 <- "Hello, World!"

substring_example <- substr(str1, 1, 5)

print(substring_example)

Output:
[1] "Hello"

Explanation:
Extracts characters from position 1 to 5.
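A few more substring patterns are sketched below: extracting to the end of a string, working on a whole vector at once, and replacing characters in place. The vector `names_vec` and the other variable names are made up for illustration.

```r
str1 <- "Hello, World!"

# substring() without an end position runs to the end of the string
tail_part <- substring(str1, 8)     # "World!"

# substr() with explicit start and end positions
word <- substr(str1, 8, 12)         # "World"

# Both functions work element by element on a vector
names_vec <- c("Alice", "Bob", "Charlie")
initials <- substr(names_vec, 1, 1) # "A" "B" "C"

# substr() can also appear on the left side to overwrite characters
substr(str1, 1, 5) <- "HELLO"
print(str1)                         # "HELLO, World!"
```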

5. String Concatenation

You can combine (join) strings using:

• paste() → adds a space by default

• paste0() → joins without any space

Example

str1 <- "Hello"

str2 <- "World"

str3 <- paste(str1, str2) # Adds a space

str4 <- paste0(str1, str2) # No space

print(str3)

print(str4)

Output:

[1] "Hello World"

[1] "HelloWorld"

You can also add custom separators:

paste(str1, str2, sep = "-")

→ "Hello-World"

6. Changing Case

Convert strings to uppercase or lowercase with toupper() and tolower().

Example

str1 <- "Hello, World!"

str2 <- "r programming"


upper_str <- toupper(str1)

lower_str <- tolower(str2)

print(upper_str)

print(lower_str)

Output:

[1] "HELLO, WORLD!"

[1] "r programming"

7. Replacing Substrings

To replace text inside a string, use:

• sub() → replaces the first occurrence

• gsub() → replaces all occurrences

Example

str1 <- "Learning R is fun! R is great!"

replaced_str <- gsub("R", "Python", str1)

print(replaced_str)

Output:

[1] "Learning Python is fun! Python is great!"
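The difference between sub() and gsub() is easiest to see side by side. The sketch below also shows two optional arguments, ignore.case and fixed, which this section does not cover; the variable names are illustrative.

```r
str1 <- "Learning R is fun! R is great!"

first_only <- sub("R", "Python", str1)    # replaces only the first "R"
all_matches <- gsub("R", "Python", str1)  # replaces every "R"

print(first_only)    # "Learning Python is fun! R is great!"
print(all_matches)   # "Learning Python is fun! Python is great!"

# Matching is case-sensitive unless ignore.case = TRUE
any_case <- gsub("r", "_", "R and r", ignore.case = TRUE)   # "_ and _"

# The pattern is a regular expression; fixed = TRUE treats it as plain text
literal <- gsub(".", "-", "a.b.c", fixed = TRUE)            # "a-b-c"
```

Without fixed = TRUE, the pattern "." would match every character, because a dot means "any character" in a regular expression.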

8. Splitting Strings

Split a string into multiple parts using strsplit(), based on a delimiter (like a comma or space).

Example

split_str <- strsplit("R,Python,Java", ",")

print(split_str)
Output:

[[1]]

[1] "R" "Python" "Java"

The result is a list, so to get the first element:

split_str[[1]]

9. Finding Substrings

Use grep(), grepl(), or regexpr() to search for specific words inside a string.

Function Description

grep() Returns the index of each vector element that contains a match

grepl() Returns TRUE/FALSE if match found

regexpr() Returns starting position of match

Example

str1 <- "Hello, World!"

position <- grep("World", str1)

found <- grepl("World", str1)

print(position)

print(found)

Output:

[1] 1

[1] TRUE
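With a single string, grep() can only ever return 1, because it reports which elements of a vector match, not where the match sits inside the text. A vector makes the three functions easier to tell apart. The `fruits` vector below is a made-up example.

```r
fruits <- c("banana", "cherry", "mango")

# Which elements contain "an"? Returns their indices
idx <- grep("an", fruits)                    # 1 3

# Same question, answered with TRUE/FALSE for every element
hits <- grepl("an", fruits)                  # TRUE FALSE TRUE

# value = TRUE returns the matching strings instead of indices
matched <- grep("an", fruits, value = TRUE)  # "banana" "mango"

# Character position of the first match in each string, -1 if none
pos <- as.integer(regexpr("an", fruits))     # 2 -1 2

print(idx)
print(hits)
print(matched)
print(pos)
```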

10. Trimming Whitespace

Use trimws() to remove unwanted leading and trailing spaces.

Example
str5 <- " Hello, R! "

trimmed_str <- trimws(str5)

print(trimmed_str)

Output:

[1] "Hello, R!"

Tip: Useful when cleaning text data with extra spaces.

11. String Formatting

Use sprintf() to create formatted strings, like inserting values into text templates.

Example

name <- "R"

age <- 25

formatted_str <- sprintf("I am learning %s at the age of %d.", name, age)

print(formatted_str)

Output:

[1] "I am learning R at the age of 25."

Similar to string formatting in other languages like C or Python.
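Beyond %s and %d, sprintf() supports format codes for decimals, padding, and literal percent signs. The sketch below shows a few that are commonly needed in reports; the names and scores are made-up sample data.

```r
# Fixed number of decimal places
pi_text <- sprintf("%.2f", 3.14159)       # "3.14"

# Pad an integer to a width of 5 characters, or with leading zeros
padded <- sprintf("%5d", 42L)             # "   42"
zero_padded <- sprintf("%03d", 7L)        # "007"

# sprintf() is vectorized; %% prints a literal percent sign
report <- sprintf("%s scored %.1f%%", c("Alice", "Bob"), c(90.5, 85))

print(pi_text)
print(padded)
print(zero_padded)
print(report)   # "Alice scored 90.5%" "Bob scored 85.0%"
```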

12. Full Example – String Manipulation in R

# Original string

original_str <- " R Programming is fun! "

# 1. Trim whitespace

trimmed_str <- trimws(original_str)

# 2. Convert to uppercase

upper_str <- toupper(trimmed_str)


# 3. Replace "FUN" with "AWESOME" (the string is already uppercase, and gsub() is case-sensitive)

replaced_str <- gsub("FUN", "AWESOME", upper_str)

# 4. Split into words

words <- strsplit(replaced_str, " ")[[1]]

# 5. Concatenate back to a single string

final_str <- paste(words, collapse = " ")

print(final_str)

Output:

[1] "R PROGRAMMING IS AWESOME!"

Explanation:

1. Removed extra spaces

2. Converted to uppercase

3. Replaced a word

4. Split and joined words back together

13. Summary of String Functions

Function Purpose Example

nchar() Count characters nchar("R")

substr() Extract substring substr("Hello", 1, 3)

paste() Combine strings with space paste("R", "Language")

paste0() Combine strings without space paste0("R", "Language")


toupper() Convert to uppercase toupper("hello")

tolower() Convert to lowercase tolower("HELLO")

sub() / gsub() Replace text gsub("R", "Python", "R is fun")

strsplit() Split string strsplit("a,b,c", ",")

grep() Find pattern grep("R", c("R", "Python"))

trimws() Trim whitespace trimws(" R ")

sprintf() Format string sprintf("Age: %d", 25)

Practice Questions (Without Answers)

1. Creating Strings

Create a string variable that contains your favorite quote.

# Practice: Create a string variable with your favorite quote.

2. Finding String Length

Given the string "Learning R is fun!", find its length using nchar().

my_string <- "Learning R is fun!"

# Practice: Use nchar() to find the length of the string.

3. Extracting Substrings

From the string "Data Science with R", extract the substring "Science".

my_string <- "Data Science with R"

# Practice: Use substr() to extract "Science".

4. String Concatenation

Concatenate "Hello" and "World" with a space in between.


str1 <- "Hello"

str2 <- "World"

# Practice: Use paste() to combine the two strings.

5. Changing Case

Convert "r programming" to uppercase.

my_string <- "r programming"

# Practice: Use toupper() to convert to uppercase.

6. Replacing Substrings

Replace "fun" with "exciting" in "Learning R is fun!".

my_string <- "Learning R is fun!"

# Practice: Use gsub() to replace "fun" with "exciting".

7. Splitting Strings

Split "apple,banana,cherry" into individual fruits.

fruit_string <- "apple,banana,cherry"

# Practice: Use strsplit() to split the string by commas.

8. Trimming Whitespace

Trim the extra spaces from " Welcome to R! ".

text <- " Welcome to R! "

# Practice: Use trimws() to remove spaces.

9. Searching for Substrings

Check if "Data" exists in "Data Science is powerful".

sentence <- "Data Science is powerful"


# Practice: Use grepl() to check if "Data" is found.

10. String Formatting

Use sprintf() to print a formatted sentence like:


"My name is John and I am 22 years old."

# Practice: Use sprintf() for formatted output.


Chapter 10 – Data Frames in R

1. Introduction

A Data Frame in R is one of the most commonly used and powerful data structures for storing datasets.
It is a two-dimensional table-like structure where:

• Each column can contain different data types (numeric, character, logical, factor, etc.).

• Each row represents a single observation or record.

You can think of a data frame like a spreadsheet or an Excel table, where columns are variables and rows are
entries.

2. Creating a Data Frame

The data.frame() function is used to create a data frame.


You can assign column names and fill them with data.

Example – Creating a Simple Data Frame

# Create a simple data frame

df <- data.frame(

Name = c("Alice", "Bob", "Charlie"),

Age = c(25, 30, 35),

Score = c(90.5, 85.0, 88.5)

)

print(df)

Output:

Name Age Score

1 Alice 25 90.5

2 Bob 30 85.0

3 Charlie 35 88.5

Explanation:

• Each column contains a vector.


• All vectors must be of equal length.

• R automatically assigns row numbers (1, 2, 3, …).
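After creating a data frame, it is worth checking its size and column types before going further. The sketch below uses base R functions for this; stringsAsFactors = FALSE is set explicitly so that Name stays a character column on older R versions too.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90.5, 85.0, 88.5),
  stringsAsFactors = FALSE
)

n_rows <- nrow(df)               # 3
n_cols <- ncol(df)               # 3
col_names <- names(df)           # "Name" "Age" "Score"
col_types <- sapply(df, class)   # class of each column

str(df)                          # compact overview of the structure
print(col_types)
```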

3. Accessing Data in a Data Frame

You can access specific columns, rows, or elements using different methods.

(a) Access Columns

Columns can be accessed in multiple ways:

df$Name # Using the $ symbol

df[["Name"]] # Using double brackets

df[, "Name"] # Using [row, column] indexing

Output:

[1] "Alice" "Bob" "Charlie"

df$Name returns the entire "Name" column as a vector.

(b) Access Rows

You can access rows using square brackets [ ].

df[2, ] # Returns the second row

Output:

Name Age Score

2 Bob 30 85

Syntax: [row, column]

• Leaving column blank (df[2, ]) means "all columns of row 2."

(c) Access a Specific Element

To access a particular element, specify both row and column.


df[1, 3] # Element at row 1, column 3

Output:

[1] 90.5

This extracts the Score of the first person.
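The same [row, column] syntax also accepts conditions and column names, which is how rows are usually filtered in base R. The sketch below rebuilds the data frame so it runs on its own; the result variable names are illustrative.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Score = c(90.5, 85.0, 88.5),
  stringsAsFactors = FALSE
)

# Rows where a condition is TRUE, all columns
older <- df[df$Age > 28, ]                            # Bob and Charlie

# All rows, selected columns by name
names_scores <- df[, c("Name", "Score")]

# Combine conditions with & (and) or | (or), then pick one column
young_high <- df[df$Score > 86 & df$Age < 30, "Name"] # "Alice"

print(older)
print(names_scores)
print(young_high)
```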

4. Adding Columns and Rows

You can add new columns or rows to an existing data frame.

Add a Column

df$Pass <- df$Score > 85

print(df)

Output:

Name Age Score Pass

1 Alice 25 90.5 TRUE

2 Bob 30 85.0 FALSE

3 Charlie 35 88.5 TRUE

Adds a logical column (TRUE/FALSE) based on a condition.

Add a Row

# The new row must have the same columns as df, so Pass is set to NA
new_row <- data.frame(Name = "David", Age = 28, Score = 82, Pass = NA)

df <- rbind(df, new_row)

print(df)

Output:

Name Age Score Pass

1 Alice 25 90.5 TRUE

2 Bob 30 85.0 FALSE


3 Charlie 35 88.5 TRUE

4 David 28 82.0 NA

Adds a new record (row) at the end.

5. Modifying Data Frame Elements

You can modify data in any cell directly using indexing.

Example

df[2, "Score"] <- 89

print(df)

Output:

Name Age Score Pass

1 Alice 25 90.5 TRUE

2 Bob 30 89.0 FALSE

3 Charlie 35 88.5 TRUE

4 David 28 82.0 NA

Updates Bob’s score to 89.

6. Data Reshaping

Data reshaping means changing the structure of your dataset to make it suitable for analysis or visualization.

The tidyr package (part of tidyverse) provides functions for reshaping:

• Wide → Long format (using pivot_longer())

• Long → Wide format (using pivot_wider())

(a) Wide to Long Format

library(tidyr)
wide_df <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(90, 80),
  Science = c(85, 90)
)

# Convert to long format

long_df <- pivot_longer(wide_df,

cols = c(Math, Science),

names_to = "Subject",

values_to = "Score")

print(long_df)

Output:

# A tibble: 4 × 3

Name Subject Score

<chr> <chr> <dbl>

1 Alice Math 90

2 Alice Science 85

3 Bob Math 80

4 Bob Science 90

Converts column names into a new “Subject” column and their values into “Score.”

(b) Long to Wide Format

wide_df_again <- pivot_wider(long_df,

names_from = Subject,

values_from = Score)
print(wide_df_again)

Output:

# A tibble: 2 × 3

Name Math Science

<chr> <dbl> <dbl>

1 Alice 90 85

2 Bob 80 90

Converts back to the original structure.
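To make the reshaping idea concrete, here is a rough base R sketch of what the wide-to-long step does, without tidyr. It only illustrates the idea and is not how pivot_longer() is implemented; the variable subjects is introduced for the example.

```r
wide_df <- data.frame(
  Name = c("Alice", "Bob"),
  Math = c(90, 80),
  Science = c(85, 90)
)

subjects <- c("Math", "Science")   # columns to stack

long_df <- data.frame(
  # Each name is repeated once per subject
  Name = rep(wide_df$Name, each = length(subjects)),
  # The subject names cycle for every student
  Subject = rep(subjects, times = nrow(wide_df)),
  # Read the scores row by row: transpose, then flatten column-wise
  Score = as.vector(t(as.matrix(wide_df[subjects])))
)

long_df
```

Every student-subject pair becomes its own row, which is the layout that grouping and plotting functions usually expect.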

7. Common Data Frame Operations

The dplyr package makes data frame manipulation easy and readable.

Before using it:

library(dplyr)

1. Adding a Column

df <- df %>% mutate(Pass = Score > 85)

print(df)

Adds (or recomputes) a logical column showing whether each student passed. This is the dplyr equivalent of df$Pass <- df$Score > 85.

2. Filtering Rows

Use filter() to extract rows based on a condition.

filtered_df <- df %>% filter(Age > 28)

print(filtered_df)

Shows only students older than 28.


3. Sorting Data

Use arrange() to sort by a column.

sorted_df <- df %>% arrange(desc(Score))

print(sorted_df)

Sorts by Score in descending order.

4. Summarizing Data

Use summarise() to compute statistics like mean, sum, or count.

summary_df <- df %>%

summarise(Average_Score = mean(Score))

print(summary_df)

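Output:

  Average_Score
1          87.5

Calculates the average of the Score column. The result is 87.5 here because df now has four rows and Bob's score was updated to 89 in Section 5.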
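For comparison, the same three operations can be sketched in base R without loading dplyr. The data frame is rebuilt here so the example is self-contained; the variable names older, by_score, and avg_score are only for illustration.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 28),
  Score = c(90.5, 89.0, 88.5, 82.0)
)

older <- df[df$Age > 28, ]                             # like filter(Age > 28)
by_score <- df[order(df$Score, decreasing = TRUE), ]   # like arrange(desc(Score))
avg_score <- mean(df$Score)                            # like summarise(mean(Score))

older$Name      # "Bob" "Charlie"
by_score$Name   # "Alice" "Bob" "Charlie" "David"
avg_score       # 87.5
```

The dplyr versions read more like sentences and chain naturally with %>%, which is the main reason to prefer them once several steps are combined.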

8. Summary

Concept       Description
Data Frame    Two-dimensional table for mixed data types
Create        data.frame()
Access        $, [ ], [[ ]]
Add Column    df$new <- values
Add Row       rbind()
Reshape       pivot_longer(), pivot_wider()
Filter        filter()
Sort          arrange()
Summarize     summarise()

Practice Questions (Without Answers)

1. Creating a Data Frame

Create a data frame that contains the following information about students: Name, Age, and Grade.
Use at least three students in your data frame.

# Practice: Create a data frame for students.

2. Accessing Columns

Given the following data frame:

df <- data.frame(
  Name = c("John", "Mary", "Alice"),
  Age = c(22, 25, 20),
  Score = c(88, 92, 85)
)

Access the Score column from the data frame.

# Practice: Access the Score column from df.

3. Filtering Rows

Using the data frame from Question 2, filter the rows to find students with a score greater than 85.

# Practice: Filter students with scores greater than 85.

4. Adding a Column
Add a new column named Pass to the data frame from Question 2 that indicates whether each student has passed
(Score > 90).

# Practice: Add a Pass column based on the Score.

5. Sorting the Data Frame

Sort the data frame from Question 2 by the Age column in ascending order.

# Practice: Sort the data frame by Age.

6. Reshaping (Wide → Long)

Create a data frame with columns Name, Math, and Science, then convert it into long format using pivot_longer().

# Practice: Use pivot_longer() to reshape data.

7. Reshaping (Long → Wide)

Convert the long data frame back into wide format using pivot_wider().

# Practice: Use pivot_wider() to reshape data.

8. Summarizing Data

Using a student data frame, calculate the average Score using summarise().

# Practice: Use summarise() to calculate average Score.

9. Adding and Filtering

Add a new column called “Grade” and filter only those with Grade “A”.

# Practice: Combine mutate() and filter() to find Grade A students.

10. Combined Operation

Using dplyr:

1. Filter students with Score > 80


2. Sort by Score descending

3. Calculate average Age of remaining students

# Practice: Combine filter(), arrange(), and summarise().

Chapter -11

Chapter – File Handling in R

1. Introduction

File Handling in R refers to the process of reading data from files and writing data to files such as CSV, Excel, or
text files.
This is one of the most important skills for working with datasets stored outside R, especially in data analysis and
machine learning projects.
R provides various built-in and package-based functions for file input/output operations, including:

• CSV Files (Comma-Separated Values)

• Excel Files

• Text Files

• RData / RDS Files

2. Working with CSV Files

CSV (Comma-Separated Values) is the most common format for datasets.

Reading a CSV File

You can use the read.csv() function to import data from a CSV file into a data frame.

# Read CSV file
data <- read.csv("[Link]")

# Display the first few rows
head(data)

Explanation:

• read.csv() reads a CSV file into a data frame.

• You can use head() to view the first few rows.

Customizing CSV Reading

You can specify additional parameters like separators or missing values:

data <- read.csv("[Link]", sep = ",", header = TRUE, na.strings = "NA")

Parameters:

• sep → Defines the separator (, for CSV, ; for semicolon-separated)

• header = TRUE → Treats the first row as column names

• na.strings → Defines which strings in the file are treated as missing values (NA)

Writing a CSV File

You can write data frames into CSV files using write.csv().

# Example data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(85, 90, 88)
)

# Write to CSV
write.csv(df, "students_output.csv", row.names = FALSE)

Explanation:

• The file students_output.csv will be created in your working directory.

• row.names = FALSE avoids writing row numbers.
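A simple way to check that writing and reading agree is a round trip: write a data frame, read it back, and compare. The sketch below uses tempfile() so it does not touch your working directory; the path and the made-up scores are assumptions for the example.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Score = c(85.5, 90, NA)   # one missing value on purpose
)

csv_path <- tempfile(fileext = ".csv")   # hypothetical temporary file
write.csv(df, csv_path, row.names = FALSE)

raw_lines <- readLines(csv_path)   # header line, then one line per row
raw_lines

df_back <- read.csv(csv_path, header = TRUE, na.strings = "NA")
str(df_back)
all.equal(df, df_back)             # TRUE if the round trip preserved the data
```

Looking at the raw lines first is a useful habit: it shows exactly which separator and which missing-value marker the file uses before you choose the read.csv() arguments.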

Finding or Setting Working Directory

To check or change your working directory:

getwd() # Get current directory

setwd("C:/Users/YourName/Documents") # Set new directory

3. Working with Text Files

You can read and write plain text files (like .txt).

Reading a Text File

# Read all lines of a text file

lines <- readLines("[Link]")


print(lines)

Output Example:

[1] "Name: John" "Age: 25" "Course: Data Science"

Tip:
Each line of the file is stored as an element of a character vector.

Writing to a Text File

# Write to a text file

writeLines(c("Welcome to R", "File Handling Chapter", "Enjoy Learning!"), "[Link]")

This creates a text file named [Link] with three lines of text.

Appending Text

You can append text to an existing file:

write("Additional Line", "[Link]", append = TRUE)
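The write, append, and read steps can be tried together in one short sketch. It writes to a temporary file (an assumption for the example) and also shows cat() as an alternative way to append.

```r
txt_path <- tempfile(fileext = ".txt")   # hypothetical temporary file

# Create the file with two lines
writeLines(c("Welcome to R", "File Handling Chapter"), txt_path)

# Append a third line; write() ends it with a newline
write("Additional Line", txt_path, append = TRUE)

# cat() can append too, but the newline must be written explicitly
cat("Last Line\n", file = txt_path, append = TRUE)

lines <- readLines(txt_path)
length(lines)   # 4
lines
```

Without append = TRUE, both write() and cat() overwrite the existing file, so the argument is worth double-checking before running the code on real data.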

4. Working with Excel Files

To read and write Excel files, you can use the readxl package (part of the tidyverse) and the writexl package.

Installing Required Packages

install.packages("readxl")

install.packages("writexl")

library(readxl)

library(writexl)

Reading an Excel File

# Read Excel file

data <- read_excel("[Link]")


# Display data

print(data)

Explanation:

• read_excel() reads the first sheet by default and guesses each column's data type.

You can also specify a particular sheet:

data <- read_excel("[Link]", sheet = "Sheet2")

Writing to an Excel File

# Write data frame to Excel file

write_xlsx(df, "students_data.xlsx")

This saves the df data frame into an Excel file.

5. Reading and Writing R Data Files

R has two special formats for saving and loading data efficiently:
.RData and .rds files.

Saving and Loading .RData Files

# Save multiple objects

save(df, data, file = "my_data.RData")

# Load them back

load("my_data.RData")

This stores and restores multiple R objects (like data frames or variables).

Saving and Loading .rds Files

# Save one object


saveRDS(df, "[Link]")

# Load it back

df_new <- readRDS("[Link]")

.rds files are useful when you want to store a single R object.
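The practical difference between the two formats is how objects come back: readRDS() returns the object so you choose its name, while load() restores objects under the names they were saved with. A minimal sketch, using temporary files and a made-up threshold variable:

```r
df <- data.frame(Name = c("Alice", "Bob"), Score = c(85, 90))
rds_path <- tempfile(fileext = ".rds")       # hypothetical temporary files
rdata_path <- tempfile(fileext = ".RData")

# .rds: one object, and the caller chooses the name when reading it back
saveRDS(df, rds_path)
df_new <- readRDS(rds_path)
identical(df, df_new)          # TRUE

# .RData: objects are restored under their original names
threshold <- 85
save(df, threshold, file = rdata_path)
rm(df, threshold)              # remove them to show that load() brings them back
restored <- load(rdata_path)   # load() returns the names it restored
restored
```

Because load() silently overwrites existing objects with the same names, .rds files are often the safer choice for a single data set.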

6. Checking and Managing Files

R provides several functions for file management.

Function         Description                       Example
file.exists()    Checks if a file exists           file.exists("[Link]")
file.remove()    Deletes a file                    file.remove("[Link]")
list.files()     Lists all files in a directory    list.files()
dir.create()     Creates a folder                  dir.create("output")
unlink()         Deletes files or directories      unlink("output", recursive = TRUE)
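These functions are usually combined, for example checking that a file exists before deleting it. The sketch below works inside R's temporary directory; the folder name output and the file name scores.csv are made up for the example.

```r
work_dir <- file.path(tempdir(), "output")      # hypothetical folder
dir.create(work_dir, showWarnings = FALSE)

csv_path <- file.path(work_dir, "scores.csv")   # hypothetical file name
write.csv(data.frame(Score = c(85, 90)), csv_path, row.names = FALSE)

file.exists(csv_path)                                    # TRUE
csv_files <- list.files(work_dir, pattern = "\\.csv$")   # only files ending in .csv
csv_files

# Remove the file only if it is really there
if (file.exists(csv_path)) {
  file.remove(csv_path)
}

unlink(work_dir, recursive = TRUE)   # delete the folder and its contents
dir.exists(work_dir)                 # FALSE
```

file.path() builds paths with the correct separator for the operating system, which avoids hand-typed slashes.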

7. Summary

File Type   Read Function   Write Function   Package
CSV         read.csv()      write.csv()      Base R
TXT         readLines()     writeLines()     Base R
Excel       read_excel()    write_xlsx()     readxl, writexl
RData       load()          save()           Base R
RDS         readRDS()       saveRDS()        Base R

Practice Questions (Without Answers)


1. Reading CSV Files

Create a CSV file named [Link] with columns: Name, Age, and Marks.
Write R code to read this file and display its contents.

# Practice: Use read.csv() to load and print the CSV file.

2. Writing CSV Files

Create a data frame for employee details (Name, Department, Salary) and save it as [Link].

# Practice: Use write.csv() to save the data frame.

3. Reading Text Files

Write R code to read a text file named [Link] and print each line.

# Practice: Use readLines() to read text data.

4. Writing Text Files

Write R code to create a text file with three motivational lines about learning R.

# Practice: Use writeLines() to write to a text file.

5. Reading Excel Files

Read an Excel file named [Link] and print only the first 5 rows.

# Practice: Use read_excel() to read the data.

6. Writing Excel Files

Save a data frame with product details (ID, Product, Price) into an Excel file named [Link].

# Practice: Use write_xlsx() to save data.

7. Working Directory

Write R commands to check your current working directory and set it to a folder named C:/R_Projects.
# Practice: Use getwd() and setwd().

8. Managing Files

Write R code to:

1. Check if [Link] exists.

2. List all .csv files in the directory.

3. Delete a file named old_data.csv.

# Practice: Use file.exists(), list.files(), and file.remove().

9. Saving and Loading R Objects

Save a data frame named df into a .RData file and load it back.

# Practice: Use save() and load().

10. Saving Single Object

Save a single object called sales_data as an .rds file and read it again.

# Practice: Use saveRDS() and readRDS().

Chapter -12

Chapter – Data Cleaning and Preprocessing in R

1. Introduction

Data Cleaning and Preprocessing are crucial steps in any data analysis workflow.
Before analyzing data, it must be clean, consistent, and structured properly.

Data often contains:

• Missing values
• Duplicates

• Incorrect data types

• Outliers

• Inconsistent formats

R provides powerful tools and packages like dplyr, tidyr, and stringr to handle such issues easily.

2. Importing Data

Before cleaning, first load the dataset into R. You can use:

# Import CSV file

data <- read.csv("[Link]")

# View first few rows

head(data)

Tip: Always inspect your data after loading using:

str(data) # Check structure

summary(data) # Summary statistics

3. Checking for Missing Values

Missing values can cause incorrect results during analysis.


Use is.na() to identify them.

Example

# Count missing values in each column
colSums(is.na(data))

Output Example:

Name Age Salary
   0   2      1

Tip: You can check total missing values:

sum(is.na(data))

Handling Missing Values

(a) Remove Missing Values

# Remove rows with any missing values

clean_data <- na.omit(data)

(b) Replace Missing Values

You can replace missing values using logical indexing, ifelse(), or mutate():

# Replace NA in Age with the mean of Age
data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)

Explanation:

• na.rm = TRUE tells R to ignore missing values when computing the mean.
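The following sketch runs the checks and the mean replacement on a small made-up data frame, so the numbers can be verified by hand.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, NA, 35, NA),
  Salary = c(50000, 60000, NA, 55000)
)

colSums(is.na(data))    # Name 0, Age 2, Salary 1
sum(is.na(data))        # 3 missing values in total
nrow(na.omit(data))     # 1: only Alice's row is complete

# Mean imputation: the mean of the observed ages (25 and 35) is 30
data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)
data$Age                # 25 30 35 30
```

Note how much data na.omit() would discard here: three of four rows. Replacing values keeps the rows but changes the distribution, so the choice depends on the analysis.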

4. Checking and Removing Duplicates

Duplicate rows can lead to incorrect analysis.

Example

# Check duplicates

duplicated(data)

# Remove duplicates

data <- data[!duplicated(data), ]

Tip:
Use distinct() from the dplyr package for cleaner syntax:

library(dplyr)

data <- distinct(data)
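It helps to see what duplicated() actually returns before removing anything. The sketch below uses made-up rows; the rule "one row per Name" in the last step is a hypothetical example of checking duplicates on a single column.

```r
data <- data.frame(
  Name = c("Alice", "Bob", "Alice", "Bob"),
  Age = c(25, 30, 25, 31)
)

duplicated(data)        # FALSE FALSE TRUE FALSE: only row 3 repeats an earlier row
sum(duplicated(data))   # 1 duplicate row

deduped <- data[!duplicated(data), ]
nrow(deduped)           # 3

# Duplicates judged on one column only (hypothetical rule: one row per Name)
by_name <- data[!duplicated(data$Name), ]
nrow(by_name)           # 2
```

The first occurrence is always kept; only later repeats are marked TRUE.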

5. Fixing Incorrect Data Types


Sometimes, numeric data may be read as characters or factors.
You can convert them easily using:

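# Convert character to numeric
# (for a factor column, use as.numeric(as.character(data$Salary)) instead;
# as.numeric() on a factor returns the internal level codes)
data$Salary <- as.numeric(data$Salary)

# Convert numeric to character
data$ID <- as.character(data$ID)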

Check the data type:

str(data)
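Two pitfalls are worth trying once on small vectors: text that cannot be parsed becomes NA, and converting a factor directly gives level codes rather than the original numbers. The vectors below are made up for the example.

```r
salary_chr <- c("50000", "60000", "n/a")
salary_num <- suppressWarnings(as.numeric(salary_chr))
salary_num                      # 50000 60000 NA: unparseable text becomes NA

salary_fct <- factor(c("50000", "60000", "50000"))
codes <- as.numeric(salary_fct)                   # 1 2 1: level codes, not salaries
values <- as.numeric(as.character(salary_fct))    # 50000 60000 50000

class(salary_num)               # "numeric"
```

suppressWarnings() is used only to keep the output quiet; in real cleaning work the warning is a useful signal that some values need attention.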

6. Renaming Columns

Rename columns for better readability.

Example

# Rename columns using dplyr

library(dplyr)

data <- rename(data, Employee_Name = Name, Employee_Age = Age)

Output:

Employee_Name Employee_Age Salary

Alice 25 50000

7. Handling Outliers

Outliers are extreme values that can affect results.

Detect Outliers using Boxplot

boxplot(data$Salary, main = "Salary Boxplot")

Remove Outliers (Optional):

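Q1 <- quantile(data$Salary, 0.25, na.rm = TRUE)
Q3 <- quantile(data$Salary, 0.75, na.rm = TRUE)
iqr <- Q3 - Q1   # named iqr to avoid masking the built-in IQR() function

# Keep rows within 1.5 * IQR of the quartiles
data <- subset(data, Salary >= (Q1 - 1.5 * iqr) & Salary <= (Q3 + 1.5 * iqr))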
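A worked example with made-up numbers shows how the limits are computed. The values are small on purpose so each step can be checked by hand.

```r
salary <- c(30, 32, 35, 36, 38, 40, 200)   # made-up values; 200 looks suspicious

q1 <- unname(quantile(salary, 0.25))   # 33.5
q3 <- unname(quantile(salary, 0.75))   # 39
iqr <- q3 - q1                         # 5.5, same as IQR(salary)

lower <- q1 - 1.5 * iqr                # 25.25
upper <- q3 + 1.5 * iqr                # 47.25

flagged <- salary[salary < lower | salary > upper]
flagged                                # 200

kept <- salary[salary >= lower & salary <= upper]
length(kept)                           # 6
```

The 1.5 multiplier is a convention rather than a law; a flagged value should be inspected before it is removed, since it may be a genuine observation.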

8. Standardizing and Normalizing Data

Normalization

Scales values between 0 and 1.

data$Normalized_Salary <- (data$Salary - min(data$Salary)) /

(max(data$Salary) - min(data$Salary))

Standardization (Z-Score)

Centers data around mean = 0, SD = 1.

# scale() returns a one-column matrix, so convert it to a plain numeric vector
data$Standardized_Salary <- as.numeric(scale(data$Salary))
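Both formulas are easy to verify on three made-up values, where the results can be worked out by hand.

```r
salary <- c(10, 20, 30)   # made-up values

# Min-max normalization: (x - min) / (max - min)
normalized <- (salary - min(salary)) / (max(salary) - min(salary))
normalized                # 0.0 0.5 1.0

# Z-score standardization: (x - mean) / sd, here mean = 20 and sd = 10
standardized <- (salary - mean(salary)) / sd(salary)
standardized              # -1 0 1

# scale() computes the same z-scores
same <- all.equal(as.numeric(scale(salary)), standardized)
same                      # TRUE
```

Normalization is sensitive to extreme values because a single large maximum squeezes everything else toward 0, so it is usually applied after outliers have been handled.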

9. Dealing with Inconsistent Data

Sometimes, text data can contain inconsistencies like spacing or case issues.

Example

library(stringr)

# Trim whitespace

data$Name <- str_trim(data$Name)

# Convert all names to title case

data$Name <- str_to_title(data$Name)

Example:

Before: " alice "

After: "Alice"
10. Splitting and Combining Columns

Use functions from tidyr to split or combine columns.

Splitting a Column

library(tidyr)

data <- separate(data, col = "Full_Name", into = c("First_Name", "Last_Name"), sep = " ")

Combining Columns

data <- unite(data, "Full_Name", First_Name, Last_Name, sep = " ")

11. Removing Unnecessary Columns

To delete unwanted columns:

data <- subset(data, select = -c(UnnecessaryColumn))

Or using dplyr:

data <- select(data, -UnnecessaryColumn)

12. Encoding Categorical Data

Convert categorical text into numeric values for analysis or modeling.

Example

data$Gender_Code <- ifelse(data$Gender == "Male", 1, 0)

Converts:

Male → 1

Female → 0
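The ifelse() approach maps every value other than "Male" to 0, which is fine for two categories but hides additional ones. As a sketch of an alternative, factor() assigns one integer code per category. The gender vector below is made up and includes a missing value.

```r
gender <- c("Male", "Female", "Female", NA, "Male")

# ifelse(): everything that is not "Male" becomes 0, and NA stays NA
code_ifelse <- ifelse(gender == "Male", 1, 0)
code_ifelse                 # 1 0 0 NA 1

# factor(): one integer code per category, levels in alphabetical order
gender_fct <- factor(gender)
levels(gender_fct)          # "Female" "Male"
code_factor <- as.integer(gender_fct)
code_factor                 # 2 1 1 NA 2
```

The factor codes depend on the level order, so it is worth printing levels() and recording which number stands for which category.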

13. Summarizing and Inspecting Cleaned Data

After cleaning, always check your data:


summary(data)

str(data)

head(data)

You can use:

library(dplyr)

data %>%
  summarise(
    Mean_Age = mean(Age, na.rm = TRUE),
    Max_Salary = max(Salary, na.rm = TRUE)
  )

14. Example: Complete Data Cleaning Workflow

library(dplyr)   # provides distinct() and rename()

# Step 1: Import data
data <- read.csv("[Link]")

# Step 2: Check structure
str(data)

# Step 3: Handle missing values
data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)

# Step 4: Remove duplicates
data <- distinct(data)

# Step 5: Convert data types
data$Salary <- as.numeric(data$Salary)

# Step 6: Rename columns
data <- rename(data, Employee_Name = Name, Employee_Age = Age)

# Step 7: Normalize salary (na.rm = TRUE in case the conversion produced NA)
data$Normalized_Salary <- (data$Salary - min(data$Salary, na.rm = TRUE)) /
  (max(data$Salary, na.rm = TRUE) - min(data$Salary, na.rm = TRUE))

# Step 8: Save cleaned data
write.csv(data, "cleaned_employees.csv", row.names = FALSE)

Cleaned data ready for analysis!

15. Summary

Task                    Function / Package         Example
Import Data             read.csv()                 read.csv("[Link]")
Handle Missing Values   is.na(), na.omit()         data$Age[is.na(data$Age)] <- mean(data$Age, na.rm = TRUE)
Remove Duplicates       duplicated(), distinct()   data <- distinct(data)
Rename Columns          rename()                   rename(data, New = Old)
Handle Outliers         boxplot(), subset()        Remove extreme values
Normalize / Scale       scale()                    scale(data$Salary)
Split Columns           separate()                 Split full name
Combine Columns         unite()                    Combine columns
Encoding                ifelse()                   Convert text to numbers
Save Clean Data         write.csv()                Save cleaned file

Practice Questions (Without Answers)


1. Checking Missing Values

Load a CSV file named [Link] and find how many missing values are present in each column.

# Practice: Use colSums(is.na()) to check missing values.

2. Replacing Missing Values

Replace missing Age values in the [Link] dataset with the average age.

# Practice: Replace missing values using mean().

3. Removing Duplicates

Remove any duplicate rows from a data frame named data.

# Practice: Use distinct() or duplicated().

4. Handling Outliers

Detect and remove outliers from the Salary column in the data frame employees.

# Practice: Use IQR method to remove outliers.

5. Converting Data Types

Convert the Age column to numeric and Name column to character in a dataset.

# Practice: Use as.numeric() and as.character().

6. Renaming Columns

Rename the columns Name to Employee_Name and Salary to Employee_Salary.

# Practice: Use rename() function.

7. Normalization

Normalize the Marks column between 0 and 1 in a data frame.


# Practice: Apply normalization formula.

8. String Cleaning

Trim extra spaces and convert names to title case in the Name column.

# Practice: Use str_trim() and str_to_title().

9. Splitting and Combining Columns

Split a column named Full_Name into First_Name and Last_Name, then combine them again.

# Practice: Use separate() and unite().

10. Save Clean Data

After cleaning a dataset named data, save it as cleaned_data.csv.

# Practice: Use write.csv() to export data.

Chapter -13

Chapter – Data Visualization in R

1. Introduction

Data Visualization is the graphical representation of data to make it easier to understand patterns, relationships,
and insights.
R is one of the most powerful languages for visualization, offering both basic plotting functions and advanced
libraries like ggplot2.

There are two main approaches:

1. Base R Graphics – Simple and built-in plotting functions

2. ggplot2 – A grammar-based, layered approach for professional-quality visuals


2. Base R Plotting System

The base plotting functions are easy to use and ideal for quick visual exploration.

(a) The plot() Function

The plot() function creates a basic scatter plot or line plot depending on the data.

# Example Data

x <- c(1, 2, 3, 4, 5)

y <- c(2, 4, 6, 8, 10)

# Simple scatter plot

plot(x, y, main = "Simple Scatter Plot", xlab = "X Values", ylab = "Y Values", col = "blue", pch = 16)

Explanation:

• main → Title of the graph

• xlab, ylab → Axis labels

• col → Color

• pch → Point style

(b) Line Plot

Use type = "l" for line plots.

plot(x, y, type = "l", col = "red", main = "Line Plot", xlab = "X", ylab = "Y")

(c) Bar Plot

Used for categorical data.

scores <- c(85, 90, 88, 75)

names <- c("Alice", "Bob", "Charlie", "David")


barplot(scores, names.arg = names, col = "lightblue", main = "Student Scores", ylab = "Marks")

(d) Histogram

Used to show data distribution.

marks <- c(45, 56, 67, 78, 89, 90, 91, 73, 65, 80)

hist(marks, col = "orange", main = "Histogram of Marks", xlab = "Marks")

(e) Pie Chart

Used for percentage data.

slices <- c(30, 20, 25, 25)

labels <- c("Math", "Science", "English", "History")

pie(slices, labels, main = "Subject Distribution", col = rainbow(length(slices)))

(f) Boxplot

Shows data spread and outliers.

scores <- c(65, 70, 75, 80, 85, 90, 95)

boxplot(scores, main = "Boxplot of Scores", col = "green")

Interpretation:

• Box shows the interquartile range (IQR).

• Line inside the box = median.

• Dots = outliers.
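The numbers behind a boxplot can be inspected without drawing it, using boxplot.stats() from base R. In the sketch below, 150 is a made-up extreme value added to the scores so that one outlier appears.

```r
scores <- c(65, 70, 75, 80, 85, 90, 95, 150)   # 150 added as a made-up extreme value

stats <- boxplot.stats(scores)
stats$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out     # 150: the value that would be drawn as a separate dot
```

The whiskers extend to the most extreme data points that lie within 1.5 times the box height from the box; anything beyond is drawn as a dot.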

(g) Multiple Plots in One Window

Use par(mfrow = c(rows, cols)) to display multiple plots together.

par(mfrow = c(2, 2)) # 2x2 grid


plot(x, y)

barplot(scores)

hist(marks)

boxplot(scores)
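Since par(mfrow = ...) stays in effect for later plots, it is common to restore the previous layout afterwards. The sketch below also sends the grid to a PDF file so it can be run without a display; the temporary path and the plot titles are assumptions for the example.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
marks <- c(45, 56, 67, 78, 89, 90, 91, 73, 65, 80)
scores <- c(65, 70, 75, 80, 85, 90, 95)

plot_path <- tempfile(fileext = ".pdf")   # hypothetical output file
pdf(plot_path, width = 8, height = 8)     # open a file-based graphics device

old_par <- par(mfrow = c(2, 2))           # par() returns the previous settings
plot(x, y, main = "Scatter")
barplot(scores, main = "Bars")
hist(marks, main = "Histogram")
boxplot(scores, main = "Boxplot")
par(old_par)                              # restore the single-plot layout

dev.off()                                 # close the device so the file is written
file.exists(plot_path)
```

In an interactive session the pdf() and dev.off() lines can be left out, and the four plots appear in the plotting window instead.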

3. Advanced Visualization Using ggplot2

The ggplot2 package is part of the tidyverse and follows the Grammar of Graphics — where each plot is built layer
by layer.

Installing and Loading ggplot2

install.packages("ggplot2")

library(ggplot2)

4. Structure of ggplot2

ggplot(data = <data_frame>, aes(x = <x_var>, y = <y_var>)) +

geom_<plot_type>() +

other_layers()

5. Basic ggplot2 Plots

(a) Scatter Plot

df <- data.frame(Height = c(150, 160, 170, 180),
                 Weight = c(55, 60, 70, 80))

ggplot(df, aes(x = Height, y = Weight)) +

geom_point(color = "blue", size = 3) +

ggtitle("Height vs Weight") +

xlab("Height (cm)") + ylab("Weight (kg)")


Explanation:

• geom_point() creates scatter plots

• ggtitle() adds title

• xlab(), ylab() add axis labels

(b) Line Plot

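# Month is stored as a factor with explicit levels so the x-axis keeps
# calendar order instead of alphabetical order
sales <- data.frame(Month = factor(c("Jan", "Feb", "Mar", "Apr"),
                                   levels = c("Jan", "Feb", "Mar", "Apr")),
                    Revenue = c(200, 250, 300, 400))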

ggplot(sales, aes(x = Month, y = Revenue, group = 1)) +

geom_line(color = "red") +

geom_point(size = 3) +

ggtitle("Monthly Sales Trend")
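ggplot2 places character values on a discrete axis in alphabetical order, so month names are usually stored as a factor with explicit levels. The base R sketch below shows the difference in level order; month.abb is a built-in constant with the twelve abbreviated month names.

```r
months <- c("Jan", "Feb", "Mar", "Apr")

default_levels <- levels(factor(months))
default_levels    # "Apr" "Feb" "Jan" "Mar" (alphabetical)

ordered_levels <- levels(factor(months, levels = months))
ordered_levels    # "Jan" "Feb" "Mar" "Apr" (calendar order)

# Using month.abb as levels works for any subset of months
sales_month <- factor(c("Mar", "Jan", "Apr", "Feb"), levels = month.abb)
sorted_months <- as.character(sort(sales_month))
sorted_months     # "Jan" "Feb" "Mar" "Apr"
```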

(c) Bar Plot

data <- data.frame(
  Subject = c("Math", "Science", "English", "History"),
  Score = c(85, 90, 78, 88)
)

ggplot(data, aes(x = Subject, y = Score, fill = Subject)) +

geom_bar(stat = "identity") +

ggtitle("Subject-wise Scores")

stat = "identity" → uses actual data values instead of counts.

(d) Histogram

data <- data.frame(Marks = c(50, 60, 70, 80, 90, 100, 65, 85, 75))

ggplot(data, aes(x = Marks)) +

geom_histogram(binwidth = 10, fill = "purple", color = "black") +

ggtitle("Distribution of Marks")

(e) Boxplot

data <- data.frame(Subject = rep(c("Math", "Science"), each = 5),
                   Marks = c(85, 90, 88, 92, 95, 75, 78, 80, 82, 85))

ggplot(data, aes(x = Subject, y = Marks, fill = Subject)) +

geom_boxplot() +

ggtitle("Boxplot of Marks by Subject")

(f) Pie Chart Using ggplot2

ggplot2 has no dedicated pie-chart geom, but you can create a pie chart by drawing a stacked bar chart in polar coordinates.

data <- data.frame(
  Category = c("A", "B", "C", "D"),
  Value = c(30, 25, 20, 25)
)

ggplot(data, aes(x = "", y = Value, fill = Category)) +

geom_bar(stat = "identity") +

coord_polar("y", start = 0) +

ggtitle("Pie Chart Example")

6. Adding Colors and Themes

You can customize your plots with colors and themes.

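# data must again be the Subject/Score data frame from Section 5(c)
data <- data.frame(Subject = c("Math", "Science", "English", "History"),
                   Score = c(85, 90, 78, 88))

ggplot(data, aes(x = Subject, y = Score, fill = Subject)) +
  geom_bar(stat = "identity") +
  ggtitle("Custom Bar Plot") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))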

Common Themes:

• theme_minimal()

• theme_classic()

• theme_bw()

• theme_dark()

7. Facets (Subplots)

Faceting splits data into multiple smaller plots.

df <- data.frame(
  Gender = rep(c("Male", "Female"), each = 4),
  Age = c(20, 25, 30, 35, 22, 28, 32, 40),
  Height = c(170, 175, 180, 185, 160, 165, 170, 172)
)

ggplot(df, aes(x = Age, y = Height, color = Gender)) +

geom_point(size = 3) +

facet_wrap(~Gender) +

ggtitle("Faceted Scatter Plot by Gender")

8. Exporting Plots

Use ggsave() to save your plot.

ggplot(df, aes(x = Age, y = Height)) +

geom_point() +
ggtitle("Height vs Age")

ggsave("height_plot.png", width = 6, height = 4)

9. Combining Multiple Layers

You can overlay different types of plots in one figure.

ggplot(df, aes(x = Age, y = Height)) +

geom_point(color = "blue") +

geom_smooth(method = "lm", color = "red") +

ggtitle("Scatter Plot with Trend Line")

Explanation:

• geom_point() → plots points

• geom_smooth(method = "lm") → adds regression line
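The regression line corresponds to an ordinary least-squares fit, which can be computed directly with lm() in base R. The sketch below uses the Age and Height values from the faceting example; the 30-year-old in the prediction is a hypothetical input.

```r
df <- data.frame(
  Age = c(20, 25, 30, 35, 22, 28, 32, 40),
  Height = c(170, 175, 180, 185, 160, 165, 170, 172)
)

fit <- lm(Height ~ Age, data = df)   # least-squares line: Height = a + b * Age
coef(fit)                            # intercept a and slope b

# Predicted height for a hypothetical 30-year-old
predicted <- predict(fit, newdata = data.frame(Age = 30))
predicted
```

Inspecting the coefficients is a useful complement to the plot, since the chart shows the trend while coef() gives the actual slope.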

10. Summary

Plot Type      Base R Function     ggplot2 Function
Scatter Plot   plot()              geom_point()
Line Plot      plot(type = "l")    geom_line()
Bar Chart      barplot()           geom_bar()
Histogram      hist()              geom_histogram()
Pie Chart      pie()               geom_bar() + coord_polar()
Boxplot        boxplot()           geom_boxplot()

Practice Questions (Without Answers)

1. Simple Scatter Plot


Create a scatter plot of Height vs Weight using base R’s plot() function.

# Practice: Use plot() to visualize Height vs Weight.

2. Bar Chart

Create a bar chart showing the number of students in each subject.

# Practice: Use barplot() or ggplot() with geom_bar().

3. Histogram

Create a histogram to show the distribution of student marks.

# Practice: Use hist() or geom_histogram().

4. Pie Chart

Draw a pie chart showing the percentage of sales in different regions.

# Practice: Use pie() or coord_polar().

5. Line Chart

Create a line plot showing monthly sales using ggplot2.

# Practice: Use geom_line() to plot monthly trends.

6. Boxplot

Use ggplot2 to compare salaries between different departments.

# Practice: Use geom_boxplot().

7. Faceting

Create faceted scatter plots to compare data for different genders.

# Practice: Use facet_wrap(~Gender).


8. Adding Themes

Customize a ggplot chart using theme_minimal() and rotate x-axis labels.

# Practice: Use theme_minimal() and element_text().

9. Multiple Layers

Add a regression trend line to a scatter plot using geom_smooth().

# Practice: Combine geom_point() and geom_smooth().

10. Saving Plots

Save your ggplot chart as report_plot.png with size 6x4 inches.

# Practice: Use ggsave() to export the chart.
