R Programming and Bioinformatics Guide

R is a programming language used for statistical computing and graphics, widely utilized in data analysis and bioinformatics. Bioinformatics combines biology with computational tools to manage and analyze biological data, playing a crucial role in research and applications like genomics and clinical bioinformatics. The document also covers data manipulation in R using packages like dplyr, data structures like vectors, matrices, and data frames, and statistical methods including t-tests.

Uploaded by

santoshkumar130378

What is R?

• R is a programming language for statistical computing and graphics.
• It is open-source and widely used in data analysis and bioinformatics.
• It supports statistical modeling, data manipulation, and visualization.
• 1. Download R: [Link]
• 2. Download RStudio: [Link]rstudio-desktop/
• RStudio gives you a script editor, a console, and an environment viewer.
• RStudio makes it easy to write, run, and debug R code.
What is Bioinformatics?

Definition: Bioinformatics is the application of computational tools and techniques to understand and manage biological data. It combines biology, computer science, mathematics, and statistics.

Key Components:
• Data Storage – DNA, RNA, Protein sequences
• Data Analysis – Alignment, Annotation, Structure prediction
• Visualization – Graphs, Heatmaps, Molecular Structures
Why is Bioinformatics Important?
Importance of Bioinformatics in Biotechnology
• Manages Big Biological Data
– Human Genome: 3 billion base pairs!
– Need computers to store and analyze this data.
• Accelerates Research
– Drug discovery, personalized medicine, vaccine development
– Virtual experiments reduce lab costs and time.
• Bridges Lab and Data Science
– Helps biologists make sense of complex datasets.
Applications of Bioinformatics

Field                    Bioinformatics Application
Genomics                 DNA sequencing, Genome assembly
Transcriptomics          RNA-Seq, Gene expression analysis
Proteomics               Protein structure prediction, Mass spectrometry data analysis
Metagenomics             Study of microbiomes, 16S rRNA analysis
Structural Biology       Protein docking, Molecular simulations
Clinical Bioinformatics  Cancer genomics, Biomarker discovery

Reading Data in R
• Use [Link]() or [Link]() to import data from CSV files.
• Example: data <- [Link]('gene_expression.csv')
• Check data with head(), str(), and summary() functions.
Importing & Cleaning Real Datasets

• Import Data:
– data <- [Link]("gene_expression.csv")
• Clean Data:
– View first few rows: head(data)
– Check structure: str(data)
– Summary of columns: summary(data)
– Remove missing values: data <- [Link](data)
– Convert types (pass the column, not the whole data frame): data$Group <- [Link](data$Group)
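The import and cleaning steps above can be sketched end to end as follows. The function names in the slides were lost in extraction, so this sketch assumes they were read.csv(), na.omit(), and as.factor(), which are the base R functions that match each description. The file contents are hypothetical and written to a temporary file so the example runs on its own.

```r
# Hypothetical example data, written to a temporary CSV so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Gene,Group,Expression",
  "GeneA,Control,25.4",
  "GeneB,Treated,30.1",
  "GeneC,Control,NA",
  "GeneD,Treated,28.9"
), csv_path)

# Import (assumption: the elided call in the slides is read.csv)
data <- read.csv(csv_path)

# Inspect
head(data)     # first rows
str(data)      # column types
summary(data)  # per-column summary

# Clean (assumptions: na.omit and as.factor are the elided calls)
data <- na.omit(data)                # drop rows containing missing values
data$Group <- as.factor(data$Group)  # convert one column to a factor
```

Note that na.omit() drops every row with any missing value; when only some columns matter, filtering on those columns may be preferable.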
dplyr

What is dplyr?
• dplyr is one of the most powerful and popular R packages for data manipulation. It is part of the tidyverse collection.

Why use dplyr?
• It makes data wrangling easier, faster, and more readable
• Works great on data frames and tibbles
• Uses a pipeline (%>%) to chain commands together
Data Manipulation with dplyr

• dplyr is a grammar of data manipulation.
• Key functions: filter(), select(), mutate(), arrange(), summarise()
• Example: data %>% filter(condition == 'treated') %>% summarise(mean(Expression))
Common dplyr Functions

Function     Purpose                             Example
filter()     Select rows that match a condition  filter(Group == "Control")
select()     Pick specific columns               select(Gene, Expression)
mutate()     Create or modify columns            mutate(LogExpr = log(Expression))
arrange()    Sort rows                           arrange(desc(Expression))
summarise()  Create summary statistics           summarise(Mean = mean(Expression))
group_by()   Group data before summarizing       group_by(Group)
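A minimal sketch of how these verbs chain together in one pipeline. It assumes the dplyr package is installed, and the data frame and its values are hypothetical.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical expression table
data <- data.frame(
  Gene       = c("GeneA", "GeneB", "GeneC", "GeneD"),
  Group      = c("Control", "Control", "Treated", "Treated"),
  Expression = c(10, 20, 40, 80)
)

result <- data %>%
  filter(Expression > 10) %>%                 # keep rows matching a condition
  mutate(LogExpr = log(Expression)) %>%       # add a derived column
  group_by(Group) %>%                         # group before summarizing
  summarise(Mean = mean(Expression), .groups = "drop") %>%
  arrange(desc(Mean))                         # sort groups by mean, highest first

print(result)
```

Each step receives the output of the previous one, so the pipeline reads top to bottom in the order the operations are applied.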

What is a Vector?

A vector in R is a sequence of elements that are all of the same data type. It is the most basic and fundamental data structure in R.

R is a vectorized language, which means most operations work on vectors without the need for loops.
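As a brief illustration of vectorization, the sketch below applies arithmetic, a function, and a comparison to a whole vector at once. The variable names other than gene_expr are chosen here for illustration.

```r
gene_expr <- c(25.4, 30.1, 45.2, 28.9, 35.6)

# Each operation is applied to every element without an explicit loop
log_expr <- log2(gene_expr)               # element-wise log transform
centered <- gene_expr - mean(gene_expr)   # subtract the mean from each value
is_high  <- gene_expr > 30                # element-wise comparison gives a logical vector

print(is_high)
```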
Key Characteristics:

Feature           Explanation
Homogeneous       All elements must be of the same type
One-dimensional   Only one row of elements (no columns)
Indexable         Elements can be accessed using position numbers
Optionally named  You can assign names to vector elements

How to Create a Vector
# Numeric vector
gene_expr <- c(25.4, 30.1, 45.2, 28.9, 35.6)

# Character vector
genes <- c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE")

# Logical vector
results <- c(TRUE, FALSE, TRUE)
Types of Vectors in R

Type       Example              Use Case
Numeric    c(1, 2, 3.5)         Gene expression values
Integer    c(1L, 2L, 3L)        Sample counts, IDs
Character  c("GeneA", "GeneB")  Gene names, sample labels
Logical    c(TRUE, FALSE)       Quality control pass/fail flags
Complex    c(1+2i, 3+4i)        Rarely used in biology
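The types in the table can be checked with class(), and the "same type" rule can be seen by mixing types in one call to c(). This is a small sketch; the variable names are illustrative.

```r
num_class <- class(c(1, 2, 3.5))       # "numeric"
int_class <- class(c(1L, 2L, 3L))      # "integer"
chr_class <- class(c("GeneA", "GeneB"))  # "character"
lgl_class <- class(c(TRUE, FALSE))     # "logical"

# Mixing types does not fail: R coerces every element to one common type
mixed <- c(1, "GeneA", TRUE)
print(class(mixed))
print(mixed)
```

Because of this silent coercion, a single stray text value in a numeric column can turn the whole vector into character, which is worth checking with str() after import.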

Accessing Vector Elements (Indexing)
gene_expr[1]        # First element (25.4)
gene_expr[2:4]      # 2nd to 4th elements
gene_expr[c(1, 3)]  # First and third elements
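Beyond positions, vectors can also be indexed by name, by a logical condition, or with negative positions to exclude elements. The sketch below assumes the gene names from the character vector example are used as element names.

```r
gene_expr <- c(25.4, 30.1, 45.2, 28.9, 35.6)
names(gene_expr) <- c("GeneA", "GeneB", "GeneC", "GeneD", "GeneE")

by_name    <- gene_expr["GeneC"]          # select by element name
above_30   <- gene_expr[gene_expr > 30]   # select by logical condition
drop_first <- gene_expr[-1]               # negative index excludes a position

print(above_30)
```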
Matrix
• A matrix in R is a 2-dimensional data structure that stores elements in rows and columns.
• Unlike data frames, all elements in a matrix must be of the same type (numeric, character, or logical).
• It’s like a table or spreadsheet with a fixed data type.
Key Characteristics:

Property          Explanation
2D Structure      Organized in rows and columns
Homogeneous       All elements must be of the same type
Indexed           Access elements using [row, column]
Numeric-friendly  Supports matrix algebra (addition, multiplication, etc.)
How to Create a Matrix
expr_matrix <- matrix(
c(10, 12, 14, 15, 11, 13), # Data
nrow = 2, # Number of rows
ncol = 3, # Number of columns
byrow = TRUE # Fill row-wise
)
Common Matrix Operations

Operation              Code Example
Add a constant         expr_matrix + 1
Multiply by scalar     expr_matrix * 2
Row-wise mean          rowMeans(expr_matrix)
Column-wise sum        colSums(expr_matrix)
Matrix multiplication  A %*% B (matrix algebra)
Transpose matrix       t(expr_matrix)
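The operations in the table can be tried on the 2 x 3 matrix created earlier. In this sketch the row and column names (Gene1, Gene2, S1 to S3) are hypothetical labels added for readability.

```r
expr_matrix <- matrix(
  c(10, 12, 14, 15, 11, 13),
  nrow = 2, ncol = 3, byrow = TRUE,
  dimnames = list(c("Gene1", "Gene2"), c("S1", "S2", "S3"))  # hypothetical labels
)

shifted    <- expr_matrix + 1          # add a constant to every element
doubled    <- expr_matrix * 2          # multiply every element by a scalar
row_means  <- rowMeans(expr_matrix)    # one mean per gene (row)
col_sums   <- colSums(expr_matrix)     # one sum per sample (column)
transposed <- t(expr_matrix)           # 3 x 2 matrix
product    <- expr_matrix %*% transposed  # 2 x 2 matrix product

print(row_means)
print(product)
```

Note that `*` multiplies element by element, while `%*%` performs matrix multiplication and requires the inner dimensions to match.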
When to Use Matrices in Bioinformatics

Application                       Description
Gene Expression Data              Rows = genes, Columns = samples
Methylation / SNP Intensity Matrix  For omics arrays (values are all numeric)
Image Data                        Convert pixel intensities to matrix
Distance or Correlation Matrices  Pairwise distances between genes or samples
What is a Data Frame?

A data frame is a 2-dimensional tabular data structure in R that:
• Looks like an Excel spreadsheet
• Can store columns of different data types (numeric, character, logical, etc.)
• Is the most commonly used structure for datasets in R
Key Characteristics

Feature        Description
2D structure   Rows and columns
Mixed types    Columns can be numeric, character, logical, etc.
Column access  Use $, indexing, or column names
Row access     Use row numbers
Ideal for      Real-world data, survey results, lab measurements
How to Create a Data Frame
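The original slide content for this heading did not survive extraction. As a minimal sketch, a data frame can be built with the base R function data.frame(); the column names and values below are hypothetical.

```r
# Hypothetical example: one row per gene, columns of different types
expr_df <- data.frame(
  Gene        = c("GeneA", "GeneB", "GeneC"),
  Expression  = c(25.4, 30.1, 45.2),
  Significant = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep text columns as character
)

expr_col   <- expr_df$Expression                   # column access with $
second_row <- expr_df[2, ]                         # row access by number
sig_genes  <- expr_df[expr_df$Significant, "Gene"] # rows by condition, one column

str(expr_df)
```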

mean(): Calculate the Average

• The mean() function computes the arithmetic average of a numeric vector.
expression <- c(25.4, 30.1, 45.2, 28.9, 35.6)
mean(expression)
Use in Biology
• Average gene expression level across samples
• Average metabolite concentration in a group
• Average age of patients in a clinical study
Standard Deviation

• Standard Deviation (SD) tells how much the values in a dataset vary (spread out) from the mean.
• Low SD → values are close to the mean (consistent)
• High SD → values are spread out (more variable)
Biological Relevance
• Measuring variability in gene expression across replicates
• Assessing how consistent patient biomarker levels are
• Evaluating experimental repeatability
To Calculate SD

# Expression values for a gene across 5 samples
expression <- c(25.4, 30.1, 45.2, 28.9, 35.6)

# Calculate Standard Deviation
std_dev <- sd(expression)

# Print the result
print(paste("Standard Deviation:", round(std_dev, 2)))
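To make clear what sd() computes, the sketch below reproduces it by hand. R's sd() returns the sample standard deviation, which divides the sum of squared deviations by n - 1 rather than n. The name manual_sd is illustrative.

```r
expression <- c(25.4, 30.1, 45.2, 28.9, 35.6)

n <- length(expression)
deviations <- expression - mean(expression)   # distance of each value from the mean

# Sample standard deviation: square root of the sum of squared deviations over (n - 1)
manual_sd <- sqrt(sum(deviations^2) / (n - 1))

# The manual result should agree with the built-in function
print(all.equal(manual_sd, sd(expression)))
```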
What is summary()
The summary() function provides a quick
statistical overview of data.
• For numeric data, it gives:
– Minimum
– 1st Quartile (25%)
– Median (50%)
– Mean
– 3rd Quartile (75%)
– Maximum
• For categorical/factor data, it gives:
– Frequency count of each category
Syntax
summary(x)
Where x can be:
• A whole data frame
• A column in a data frame
• A vector
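A short sketch of the two behaviours described above, using a numeric vector and a factor. The group labels are hypothetical.

```r
expression <- c(25.4, 30.1, 45.2, 28.9, 35.6)
group <- factor(c("Control", "Treated", "Control", "Treated", "Treated"))  # hypothetical labels

num_summary  <- summary(expression)  # min, quartiles, median, mean, max
group_counts <- summary(group)       # frequency count per category

print(num_summary)
print(group_counts)

# Applied to a data frame, summary() reports each column according to its type
summary(data.frame(expression, group))
```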
What is ggplot2?
• ggplot2 is an R package used for data visualization.
• It allows you to create beautiful and customizable plots based on the Grammar of Graphics.
• It is part of the tidyverse collection of packages.
Why use ggplot2?
• Easy to build complex plots from simple layers
• Ideal for scientific and publication-ready figures
• Highly customizable: themes, labels, colors, legends
• Works seamlessly with data frames and dplyr

ggplot2 - Plot Customization
• Add layers: + labs(), + theme(), + scale_x_continuous()
• Faceting: ggplot(data) + facet_wrap(~Group)
• Export plots with ggsave('[Link]')
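The layering, faceting, and export steps can be combined as in the sketch below. It assumes ggplot2 is installed and that a PNG graphics device is available. The data, labels, and output file name are hypothetical, since the file name in the slide was lost in extraction.

```r
library(ggplot2)  # assumes ggplot2 is installed

# Hypothetical expression data
data <- data.frame(
  Gene       = rep(c("GeneA", "GeneB", "GeneC"), times = 2),
  Group      = rep(c("Control", "Treated"), each = 3),
  Expression = c(25.4, 30.1, 45.2, 32.5, 35.1, 50.3)
)

p <- ggplot(data, aes(x = Gene, y = Expression)) +
  geom_point(size = 3) +                       # one point per gene
  facet_wrap(~Group) +                         # one panel per group
  labs(title = "Expression by gene", x = "Gene", y = "Expression") +
  theme_minimal()                              # a built-in complete theme

# Hypothetical output path in the session's temporary directory
out_file <- file.path(tempdir(), "expression_plot.png")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Passing the plot object explicitly to ggsave() avoids relying on "the last plot displayed", which is easy to get wrong in scripts.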
Example Workflow: Gene Expression
1. Load data using [Link]().
2. Use dplyr to clean and manipulate.
3. Calculate mean and sd of expression levels.
4. Plot expression using ggplot2.
5. Interpret biological significance.
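Steps 1 to 3 of the workflow might look like the sketch below. It assumes the loading function is read.csv() and that dplyr is installed; the file contents are hypothetical and written to a temporary file so the example is self-contained.

```r
library(dplyr)  # assumes dplyr is installed

# Step 1: load data (hypothetical file; read.csv is an assumption)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Sample,Group,Expression",
  "S1,Control,25.4",
  "S2,Control,28.1",
  "S3,Control,29.5",
  "S4,Treated,32.5",
  "S5,Treated,35.1",
  "S6,Treated,NA"
), csv_path)
raw <- read.csv(csv_path)

# Steps 2 and 3: clean, then compute mean and SD per group
expr_stats <- raw %>%
  filter(!is.na(Expression)) %>%     # drop samples with missing expression
  group_by(Group) %>%
  summarise(
    Mean = mean(Expression),
    SD   = sd(Expression),
    N    = n(),
    .groups = "drop"
  )

print(expr_stats)
```

The resulting per-group table is a natural input for step 4, for example as bars with error bars in ggplot2.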

What is a t-test?
• A t-test compares the means of two groups and tells you whether the difference is statistically significant.

Types of t-tests:
Type                      When to use
Independent (two-sample)  Comparing two different groups (e.g., control vs treatment)
Paired                    Comparing same group before & after (e.g., pre- vs post-treatment)
One-sample                Compare one group to a known value
Example: Independent t-test in R
• Suppose you have gene expression data for a gene in control and treatment groups.
# Expression values for Control and Treatment groups
control <- c(25.4, 28.1, 29.5, 30.2, 27.8)
treatment <- c(32.5, 35.1, 33.9, 34.2, 36.0)
[Link](control, treatment)
How to Interpret:
Output Part          Meaning
t =                  The t-statistic
df =                 Degrees of freedom
p-value              Probability of seeing a difference at least this large if the two true means were equal
mean of x            Mean of control group
mean of y            Mean of treatment group
confidence interval  Plausible range for the true difference in means
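The example and the output table above can be tied together by storing the test result and pulling out each part by name. This sketch assumes the elided call in the slide is base R's t.test(), whose printed output matches the parts listed in the table. By default it runs Welch's two-sample test, which does not assume equal variances, and reports a 95% confidence interval.

```r
control   <- c(25.4, 28.1, 29.5, 30.2, 27.8)
treatment <- c(32.5, 35.1, 33.9, 34.2, 36.0)

# Assumption: t.test is the function elided in the slides
result <- t.test(control, treatment)

t_stat   <- result$statistic   # the t-statistic
deg_free <- result$parameter   # degrees of freedom
p_val    <- result$p.value     # p-value
means    <- result$estimate    # "mean of x" and "mean of y"
ci       <- result$conf.int    # confidence interval for the difference in means

print(result)
```

For a paired design, t.test() accepts paired = TRUE, and for a one-sample test the reference value is passed as mu.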
Real-life Bio Use Cases:

Application                  Description
Gene expression comparison   Control vs Treatment
Clinical trial measurements  Drug vs Placebo
Proteomics                   Protein intensity across conditions
Metabolomics                 Metabolite abundance differences
Statistical Summary in R
• summary(): Shows min, max, mean, median, etc.
• mean(), sd(), var(): Useful for descriptive statistics.
• [Link](): Compare two groups.
• cor(): Correlation between variables.
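Since var() and cor() are not demonstrated elsewhere in these slides, here is a brief sketch with two hypothetical expression profiles. cor() returns the Pearson correlation coefficient by default.

```r
# Hypothetical expression values for two genes across five samples
gene_a <- c(1, 2, 3, 4, 5)
gene_b <- c(2.1, 3.9, 6.2, 8.1, 9.8)

variance_a  <- var(gene_a)          # sample variance
sd_a        <- sd(gene_a)           # sample SD is the square root of the variance
correlation <- cor(gene_a, gene_b)  # Pearson correlation by default

print(correlation)
```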
