CS 838: Pairwise Sequence Alignment

This document discusses pairwise sequence alignment using dynamic programming. It introduces the tasks of comparing DNA or protein sequences to find the optimal correspondences between subsequences that maximize similarity. Dynamic programming is used to solve this problem by dividing it into smaller subproblems and storing the solutions in a matrix. The document provides examples of how to initialize the matrix and fill it in using a scoring scheme to find the highest scoring alignment. It analyzes the computational complexity and also discusses extensions like local alignment.

Pairwise Sequence Alignment

CS 838
[Link]/~craven/[Link]
Mark Craven
craven@[Link]
January 2001

Announcements
• New optional, but recommended, reading on the
course web page: Molecular Biology for Computer
Scientists by Larry Hunter

Pairwise Alignment:
Task Definition
• Given
– a pair of sequences (DNA or protein)
– a method for scoring the similarity of a pair of
characters
• Do
– determine the correspondences between
substrings in the sequences such that the
similarity score is maximized

Motivation
• comparing sequences to gain information
about the structure/function of a query
sequence
• putting together a set of sequenced
fragments (fragment assembly)
• comparing a segment sequenced by two
different labs

The Role of Homology
• homology: similarity due to descent from a
common ancestor
• often we can infer homology from similarity
• thus we can sometimes infer
structure/function from sequence similarity

Homology
• homologous sequences can be divided into
two groups
– orthologous sequences: sequences that differ
because they are found in different species
(e.g. human α-globin and mouse α-globin)
– paralogous sequences: sequences that differ
because of a gene duplication event
(e.g. human α-globin and human β-globin,
various versions of both)

Issues in Sequence Alignment
• the sequences we’re comparing probably differ in
length
• there may be only a relatively small region in the
sequences that matches
• we want to allow partial matches (i.e. some amino
acid pairs are more substitutable than others)
• variable length regions may have been
inserted/deleted from the common ancestral
sequence

Gaps
• sequences may have diverged from a
common ancestor through various types of
mutations:
– substitutions (ACGA → AGGA)
– insertions (ACGA → ACCGA)
– deletions (ACGA → AGA)
• the latter two will result in gaps in
alignments

Insertions/Deletions and
Protein Structure

[Figure: protein structure, annotated "loop structures: insertions/deletions here not so significant"]

Example Alignment
GSAQVKGHGKKVADALTNAVAHV---D--DMPNALSALSDLHAHKL
++ ++++H+ KV + +A ++ +L+ L+++H+ K
NNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKG

• gaps depicted with –

• middle line shows matches
– identical matches shown with letters
– similar amino acids shown with +
– dissimilar amino acids/gaps indicated by space

Alignments in the Olden Days:
Dot Plots
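    G A C G G A T T A G
G   * . . * * . . . . *
A   . * . . . * . . * .
T   . . . . . . * * . .
C   . . * . . . . . . .
G   * . . * * . . . . *
G   * . . * * . . . . *
A   . * . . . * . . * .
A   . * . . . * . . * .
T   . . . . . . * * . .
A   . * . . . * . . * .
G   * . . * * . . . . *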
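
A dot plot of this kind can be produced by marking every cell whose row character equals its column character. The following is a minimal sketch in Python; the function name `dot_plot` and the choice of `*` and `.` as cell markers are assumptions made for illustration, not part of the original slides.

```python
def dot_plot(row_seq, col_seq):
    """Return the dot plot as a list of text lines.

    A cell is marked '*' when the row character equals the column
    character, and '.' otherwise (markers chosen for illustration).
    """
    lines = ["  " + " ".join(col_seq)]  # header line with the column sequence
    for r in row_seq:
        marks = ["*" if r == c else "." for c in col_seq]
        lines.append(r + " " + " ".join(marks))
    return lines


# The two sequences from the slide: rows GATCGGAATAG, columns GACGGATTAG.
print("\n".join(dot_plot("GATCGGAATAG", "GACGGATTAG")))
```

Runs of marks along a diagonal correspond to stretches where the two sequences match without gaps.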

Types of Alignment
• global: find best match of both sequences in their
entirety
• local: find best subsequence match
• semi-global: find best match without penalizing
gaps on the ends of the alignment

Pairwise Alignment Via Dynamic
Programming
• Needleman & Wunsch, Journal of Molecular
Biology, 1970
• dynamic programming: solve an instance of a
problem by taking advantage of computed
solutions for smaller subparts of the problem
• determine alignment of two sequences by
determining alignment of all prefixes of the
sequences

Scoring Scheme Components

• substitution matrix
– s(a,b) indicates score of aligning character a
with character b
• gap penalty function
– w(k) indicates cost of a gap of length k

Linear Gap Penalty Function
• different gap penalty functions require
somewhat different DP algorithms
• the simplest case is when a linear gap
function is used

w(k) = gk
where g is a constant
• we’ll start by considering this case
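
The two scoring components can be sketched as a pair of small Python functions. The default values used here (+1 for a match, -1 for a mismatch, g = -2) are assumptions borrowed from a simple DNA-style scheme; a real protein comparison would replace `s` with a lookup in a substitution matrix.

```python
def s(a, b, match=1, mismatch=-1):
    """Substitution score for aligning character a with character b.

    Assumed toy scheme: a fixed reward for identity, a fixed penalty
    otherwise.
    """
    return match if a == b else mismatch


def w(k, g=-2):
    """Linear gap penalty function: a gap of length k costs g * k."""
    return g * k


print(s("A", "A"), s("A", "G"), w(3))
```

With a linear penalty, a gap of length 3 costs the same as three separate gaps of length 1, which is what lets the DP charge g once per gapped position.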

Dynamic Programming Idea

• consider last step in computing alignment of AAAC with
AGC
• three possible options; in each we’ll choose a different
pairing for end of alignment, and add this to best alignment
of previous characters
AAA C      AAAC -      AAA C
AG  C      AG   C      AGC -

in each case, consider the best score of the alignment of these prefixes + the score of aligning this pair

Dynamic Programming Idea
• given an n-character sequence x, and an m-
character sequence y
• construct an (n+1) x (m+1) matrix F
• F [ i, j ] = score of the best alignment of
x[1…i ] with y[1…j ]

Dynamic Programming Idea

F[i-1, j-1]                     F[i, j-1]
           \                        |
            + s(x[i], y[j])         + g
             \                      |
F[i-1, j]  ------- + g ------->  F[i, j]

Dynamic Programming Idea
• in extending an alignment, we have 3 choices:
– align x[1…i-1] with y[1…j-1] and match x[i]
with y[j]
– align x[1…i] with y[1…j-1] and match a gap
with y[j]
– align x[1…i-1] with y[1…j] and match a gap
with x[i]
• choose highest scoring choice to fill in F [ i, j ]

DP Algorithm for Global Alignment

with Linear Gap Penalty
• one way to specify the DP is in terms of its
recurrence relation:

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + g \\ F(i, j-1) + g \end{cases}$$

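
The recurrence translates almost line for line into a nested loop. The sketch below is a hedged illustration, not the course's reference code: the function name `fill_global` is hypothetical, the boundary cells are assumed to hold multiples of g (the cost of aligning a prefix entirely against gaps), and the scoring values in the usage line are placeholders.

```python
def fill_global(x, y, s, g):
    """Fill the (n+1) x (m+1) score matrix F for global alignment.

    F[i][j] is the best score for aligning x[:i] with y[:j] under a
    substitution function s(a, b) and a linear gap penalty g.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # Assumed boundary: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        F[i][0] = i * g
    for j in range(1, m + 1):
        F[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # pair x_i with y_j
                F[i - 1][j] + g,                          # x_i against a gap
                F[i][j - 1] + g,                          # y_j against a gap
            )
    return F


# Placeholder scoring: +1 match, -1 mismatch, g = -2.
F = fill_global("AAAC", "AGC", lambda a, b: 1 if a == b else -1, -2)
print(F[-1][-1])
```

Python strings are 0-indexed, so `x[i - 1]` plays the role of x_i in the recurrence.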

Initializing Matrix: Global
Alignment with Linear Gap Penalty
         A    G    C
    0    g   2g   3g
A   g
A  2g
A  3g
C  4g

DP Algorithm Sketch
• initialize first row and column of matrix
• fill in rest of matrix from top to bottom, left
to right
• for each F [ i, j ], save pointer(s) to cell(s)
that resulted in best score
• F [n, m] holds the optimal alignment score;
trace pointers back from F [n, m] to F [0, 0]
to recover alignment
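
The four steps of the sketch can be put together as follows. This is a minimal illustration under stated assumptions: the function name `needleman_wunsch` and the pointer labels are hypothetical, the default scores (+1, -1, g = -2) are placeholders, rows index x and columns index y, and only one pointer is stored per cell, so ties are broken in favor of the diagonal and a single optimal alignment is returned.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, g=-2):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_x, aligned_y).
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    ptr = [[None] * (m + 1) for _ in range(n + 1)]

    # Step 1: initialize first row and column.
    for i in range(1, n + 1):
        F[i][0], ptr[i][0] = i * g, "up"
    for j in range(1, m + 1):
        F[0][j], ptr[0][j] = j * g, "left"

    # Steps 2 and 3: fill the matrix, saving one pointer per cell.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            candidates = [
                (F[i - 1][j - 1] + sub, "diag"),
                (F[i - 1][j] + g, "up"),
                (F[i][j - 1] + g, "left"),
            ]
            # max returns the first best candidate, so ties favor "diag".
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])

    # Step 4: trace pointers back from the last cell to F[0][0].
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("AAAC", "AGC"))
```

Storing a list of pointers per cell instead of a single one would allow all equally optimal alignments to be enumerated.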

DP Algorithm Example
• suppose we choose the following scoring scheme:
s(x[i], y[j]) =
+1 when x[i] = y[j]
-1 when x[i] <> y[j]
g (penalty for aligning with a gap) = -2

DP Algorithm Example
         A    G    C
    0   -2   -4   -6
A  -2    1   -1   -3
A  -4   -1    0   -2
A  -6   -3   -2   -1
C  -8   -5   -4   -1

one optimal alignment
x: A A A C
y: A G - C
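
The score in the bottom-right cell can be checked by scoring the alignment column by column: A/A is a match (+1), A/G a mismatch (-1), A/- a gap (-2), and C/C a match (+1), giving 1 - 1 - 2 + 1 = -1. A small helper makes the same check mechanical; the name `score_alignment` is hypothetical, and the defaults assume the scoring scheme of this example.

```python
def score_alignment(ax, ay, match=1, mismatch=-1, g=-2):
    """Score two gapped strings of equal length, column by column."""
    if len(ax) != len(ay):
        raise ValueError("aligned strings must have the same length")
    total = 0
    for a, b in zip(ax, ay):
        if a == "-" or b == "-":
            total += g          # a character aligned with a gap
        elif a == b:
            total += match      # identical characters
        else:
            total += mismatch   # substitution
    return total


print(score_alignment("AAAC", "AG-C"))
```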

DP Comments
• works for either DNA or protein sequences,
although the substitution matrices used
differ
• finds an optimal alignment
• the exact algorithm (and computational
complexity) depends on gap penalty
function (we’ll come back to this issue)

Equally Optimal Alignments

• many optimal alignments may exist for a given
pair of sequences
• can use preference ordering over paths when
doing traceback
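[Figure: preference order over the three traceback pointers: highroad ranks them 1, 2, 3 from top to bottom; lowroad ranks them 3, 2, 1]
• highroad and lowroad alignments show the two
most different optimal alignments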

Highroad & Lowroad Alignments
         A    G    C
    0   -2   -4   -6
A  -2    1   -1   -3
A  -4   -1    0   -2
A  -6   -3   -2   -1
C  -8   -5   -4   -1

highroad alignment
x: A A A C
y: A G - C

lowroad alignment
x: A A A C
y: - A G C
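
Both alignments can be recovered from the same score matrix by changing only the order in which the traceback tries the three possible moves. The sketch below assumes rows index x and columns index y, and assumes that "highroad" corresponds to trying up, then diagonal, then left, with "lowroad" the reverse; that mapping is inferred from the two alignments shown here rather than stated in the slides. The function names are hypothetical.

```python
def fill(x, y, match=1, mismatch=-1, g=-2):
    """Global-alignment score matrix with a linear gap penalty."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * g
    for j in range(1, m + 1):
        F[0][j] = j * g
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub, F[i - 1][j] + g, F[i][j - 1] + g)
    return F


def traceback(x, y, order, match=1, mismatch=-1, g=-2):
    """Recover one optimal alignment, trying moves in the given order."""
    F = fill(x, y, match, mismatch, g)
    i, j = len(x), len(y)
    ax, ay = [], []
    while i > 0 or j > 0:
        for move in order:
            if move == "up" and i > 0 and F[i][j] == F[i - 1][j] + g:
                ax.append(x[i - 1]); ay.append("-"); i -= 1
                break
            if move == "left" and j > 0 and F[i][j] == F[i][j - 1] + g:
                ax.append("-"); ay.append(y[j - 1]); j -= 1
                break
            if move == "diag" and i > 0 and j > 0:
                sub = match if x[i - 1] == y[j - 1] else mismatch
                if F[i][j] == F[i - 1][j - 1] + sub:
                    ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
                    break
    return "".join(reversed(ax)), "".join(reversed(ay))


HIGHROAD = ("up", "diag", "left")   # assumed preference order
LOWROAD = ("left", "diag", "up")    # assumed preference order

print(traceback("AAAC", "AGC", HIGHROAD))
print(traceback("AAAC", "AGC", LOWROAD))
```

Rather than storing pointers, this version re-derives the admissible moves from the scores, which keeps every tie available at traceback time.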

Dynamic Programming Analysis

• there are

$$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}}$$

possible alignments of two sequences of length n
• e.g. two sequences of length 1000 have ≈ 10^600
possible alignments
• but the DP approach finds an optimal alignment
efficiently
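
The size of this count is easy to check numerically. The sketch below compares the exact binomial coefficient with the approximation on a log10 scale, using only the Python standard library; the helper names are hypothetical.

```python
import math


def exact_count(n):
    """Exact value of the binomial coefficient C(2n, n)."""
    return math.comb(2 * n, n)


def approx_log10(n):
    """log10 of the approximation 2^(2n) / sqrt(pi * n)."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)


n = 1000
print(math.log10(exact_count(n)), approx_log10(n))
```

For n = 1000 both values come out slightly above 600, in line with the ≈ 10^600 figure, whereas the DP matrix for the same pair of sequences has only about a million cells.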

Computational Complexity
• initialization: O(m), O(n)
• filling in rest of matrix: O(mn)
• traceback: O(m + n)
• hence, if sequences have nearly same
length, the computational complexity is
O(n^2)

Local Alignment
• so far we have discussed global alignment,
where we are looking for best match
between sequences from one end to the
other.
• more commonly, we will want a local
alignment, the best match between
subsequences of x and y.

Local Alignment Motivation
• useful for comparing protein sequences that
share a common domain but differ
elsewhere
• useful for comparing against genomic
sequences (long stretches of
uncharacterized sequence)
• more sensitive when comparing highly
diverged sequences

Local Alignment DP Algorithm

• original formulation: Smith & Waterman,
Journal of Molecular Biology, 1981
• interpretation of array values is somewhat
different
– F [ i, j ] = score of the best alignment of a
suffix of x[1…i ] and a suffix of y[1…j ]

Local Alignment DP Algorithm
• the recurrence relation is slightly different from that of
the global algorithm

$$F(i, j) = \max \begin{cases} F(i-1, j-1) + s(x_i, y_j) \\ F(i-1, j) + g \\ F(i, j-1) + g \\ 0 \end{cases}$$

Local Alignment DP Algorithm

• initialization: first row and first column initialized
with 0’s
• traceback:
– find maximum value of F(i, j); can be anywhere
in matrix
– stop when we get to a cell with value 0
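
The initialization and traceback rules can be sketched together as one function. This is an illustration under stated assumptions rather than a definitive implementation: the name `smith_waterman` is hypothetical, the default scores (+1, -1, g = -2) are placeholders, the first maximum found is used when several cells share the best score, and ties during traceback favor the diagonal.

```python
def smith_waterman(x, y, match=1, mismatch=-1, g=-2):
    """Local alignment with a linear gap penalty.

    Returns (score, aligned_x, aligned_y) for one best local alignment.
    """
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # first row/column stay 0
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + sub, F[i - 1][j] + g, F[i][j - 1] + g, 0)
            if F[i][j] > best:  # remember where the maximum is
                best, bi, bj = F[i][j], i, j

    # Trace back from the maximum until a cell with value 0 is reached.
    ax, ay = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and F[i][j] > 0:
        sub = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + sub:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + g:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))


print(smith_waterman("TTAAG", "AAGA"))
```

Because a positive cell always equals one of the three non-zero candidates, the final `else` branch is reached only when the left move produced the score.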

Local Alignment Example
        A   A   G   A
    0   0   0   0   0
T   0   0   0   0   0
T   0   0   0   0   0
A   0   1   1   0   1
A   0   1   2   0   1
G   0   0   0   3   1

x: A A G
y: A A G
