Global and Local Sequence Alignment Techniques

The document discusses global alignment and local alignment algorithms. It describes the Needleman-Wunsch algorithm as the first algorithm for global sequence alignment using dynamic programming to find the optimal alignment between entire sequences. The Smith-Waterman algorithm is presented as the method for local alignment to find locally similar regions between divergent or variably sized sequences. Key steps of the Needleman-Wunsch algorithm including setting up a scoring matrix and performing a trace-back procedure are outlined.

Uploaded by

Raj Lonkar


Global alignment

• A global alignment spans the entire sequence of each protein or DNA molecule; that is, it tries to align the sequences over their entire length.
• One of the first and most important algorithms for aligning two protein sequences was described by Needleman and Wunsch (1970).
• The Needleman-Wunsch algorithm is an example of dynamic programming.
• In global alignment, the two sequences to be aligned are assumed to be generally similar over their entire length.
• Alignment is carried out from the beginning to the end of both sequences to find the best possible alignment across the entire length of the two sequences.
• This method is most applicable to aligning two closely related sequences of roughly the same length.
• For divergent sequences and sequences of variable lengths, this method may not be able to generate optimal results because it fails to recognize highly similar local regions between the two sequences.
• This algorithm is important because it produces an optimal alignment of protein or DNA sequences, even allowing the introduction of gaps.
• The Needleman-Wunsch approach to global sequence alignment proceeds in three steps:

(1) setting up a matrix.

• The first step is to compare the two sequences in a two-dimensional matrix.
• The first sequence is listed horizontally along the matrix, and the second sequence is listed vertically.
• A matrix of dimensions m + 1 by n + 1 is then built, where m and n are the lengths of the two sequences.
• A perfect alignment between two identical sequences would simply be represented by a diagonal line extending from the top left to the bottom right.
• Any mismatches between the two sequences would still be represented on this diagonal path.
• Gaps are represented in this matrix by horizontal or vertical paths.
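The matrix setup described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `init_matrix` is hypothetical, and the gap penalty of -2 is assumed from the example scoring values given in step (3).

```python
def init_matrix(seq1: str, seq2: str, gap: int = -2) -> list[list[int]]:
    """Build the alignment matrix with one extra row and column.

    seq1 is listed horizontally (columns) and seq2 vertically (rows).
    The gap penalty of -2 is an assumed example value.
    """
    cols, rows = len(seq1) + 1, len(seq2) + 1
    matrix = [[0] * cols for _ in range(rows)]
    # First row: a prefix of seq1 aligned entirely against gaps.
    for j in range(cols):
        matrix[0][j] = j * gap
    # First column: a prefix of seq2 aligned entirely against gaps.
    for i in range(rows):
        matrix[i][0] = i * gap
    return matrix


if __name__ == "__main__":
    for row in init_matrix("GAT", "GT"):
        print(row)
```

The extra row and column are what make the matrix (m + 1) by (n + 1): they represent alignments that begin with one or more gaps.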

(2) scoring the matrix.

• The goal of this algorithm is to identify an optimal alignment, that is, the path through the matrix that maximizes the score.
• There are four possible occurrences at each position:
  - the two residues may be perfectly matched;
  - they may be mismatched;
  - a gap may be introduced into the first sequence;
  - a gap may be introduced into the second sequence.
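The four cases above reduce to taking the best of three candidate scores for each cell, because match and mismatch both arrive along the diagonal. The sketch below assumes the example scores from step (3) (match +1, mismatch -1, gap -2); the name `cell_score` is hypothetical.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # assumed example values; other schemes are possible


def cell_score(diag: int, up: int, left: int, a: str, b: str) -> int:
    """Best score for one cell, given its three already-filled neighbours.

    a is the residue from the horizontally listed sequence,
    b is the residue from the vertically listed sequence.
    """
    substitution = MATCH if a == b else MISMATCH
    return max(
        diag + substitution,  # align a with b (match or mismatch)
        up + GAP,             # vertical step: gap in the horizontal sequence
        left + GAP,           # horizontal step: gap in the vertical sequence
    )


if __name__ == "__main__":
    print(cell_score(0, -2, -2, "G", "G"))
```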

(3) identifying the optimal alignment.

• After the matrix is filled, the alignment is determined by a trace-back procedure.
• The rewards and penalties used here are: match +1, mismatch -1, and gap -2.
• Trace-back starts at the bottom-right cell. If a cell's value is larger than that of its upper-left diagonal neighbour, the step is a match; if the residues are mismatched, the diagonal neighbour's value is larger than the cell's own.
• If there is a match, move diagonally; if not, move to the neighbouring cell with the highest value. A horizontal or vertical move is represented as a gap in the alignment. Strictly, each step follows the neighbour from which the cell's score was derived.
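The three steps can be put together in a short sketch. It assumes the scores stated above (match +1, mismatch -1, gap -2) and a simple linear gap penalty; the function name and the tie-breaking order in the trace-back (diagonal first, then up, then left) are choices made for this illustration, so other equally optimal alignments may exist.

```python
def needleman_wunsch(seq1: str, seq2: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> tuple[int, str, str]:
    """Global alignment; returns (score, aligned seq1, aligned seq2)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]

    # Step 1: set up the matrix borders with cumulative gap penalties.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap

    # Step 2: score every cell from its three neighbours.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[j - 1] == seq2[i - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Step 3: trace back from the bottom-right cell to the top-left cell.
    out1, out2 = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq1[j - 1] == seq2[i - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out1.append(seq1[j - 1])
                out2.append(seq2[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append("-")           # gap in the horizontal sequence
            out2.append(seq2[i - 1])
            i -= 1
        else:
            out1.append(seq1[j - 1])
            out2.append("-")           # gap in the vertical sequence
            j -= 1

    return score[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(total)
    print(top)
    print(bottom)
```

The trace-back here follows the neighbour from which each score was actually derived, which is a stricter rule than "move to the highest neighbour" and is guaranteed to recover an optimal path.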

Local alignment
• Local alignment does not assume that the two sequences in question are similar over their entire length.
• It finds only the local regions with the highest level of similarity between the two sequences and aligns these regions only.
• Stretches of sequence with the highest density of matches are aligned.
• This approach can be used for aligning partially similar, different-length, or more divergent sequences, with the goal of searching for conserved patterns in DNA or protein sequences.
• The two sequences to be aligned can be of different lengths; a substring of the target is aligned with a substring of the query.
• This approach is more appropriate for aligning divergent biological sequences that contain only certain similar modules, which are referred to as domains or motifs.
• The standard local alignment method is the Smith-Waterman algorithm, which is also an example of dynamic programming.
• The Smith-Waterman method is very similar to the Needleman-Wunsch method of global alignment; the main difference is that negative values in the matrix are replaced by zero.
• The traceback step is simpler and more straightforward than in global alignment: start at the highest value in the matrix and move back until a zero is reached. This yields a conserved pattern shared by both sequences.
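A minimal sketch of the local alignment procedure described above follows. The scoring values (match +1, mismatch -1, gap -2) are assumed from the global alignment example rather than prescribed for Smith-Waterman, and the function name is hypothetical. When several cells share the highest score, this sketch keeps the first one found.

```python
def smith_waterman(seq1: str, seq2: str, match: int = 1,
                   mismatch: int = -1, gap: int = -2) -> tuple[int, str, str]:
    """Local alignment; returns (best score, aligned part of seq1, aligned part of seq2)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    score = [[0] * cols for _ in range(rows)]  # borders stay at zero
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[j - 1] == seq2[i - 1] else mismatch
            # Negative values are replaced by zero.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the highest value until a zero is reached.
    out1, out2 = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if seq1[j - 1] == seq2[i - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out1.append(seq1[j - 1])
            out2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out1.append("-")
            out2.append(seq2[i - 1])
            i -= 1
        else:
            out1.append(seq1[j - 1])
            out2.append("-")
            j -= 1

    return best, "".join(reversed(out1)), "".join(reversed(out2))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTAAA", "GGACGTCC"))
```

In the usage example the two sequences differ in length and share only the stretch `ACGT`, which is the kind of conserved pattern local alignment is meant to recover.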
Applications of bioinformatics:

Databases
• A database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria.
• Databases are composed of computer hardware and software for data management.
• The chief objective in developing a database is to organize data in a set of structured records to enable easy retrieval of information.
• To retrieve a particular record from the database, a user specifies a particular piece of information, called a value, to be found in a particular field and expects the computer to retrieve the whole data record. This process is called making a query.
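The idea of a query (a value looked up in a field, returning whole records) can be illustrated with a toy in-memory example. Everything here is hypothetical: the records, the field names, and the `query` function are invented for illustration and do not correspond to any real biological database.

```python
# Hypothetical structured records; accessions and values are made up.
RECORDS = [
    {"accession": "SEQ001", "organism": "Homo sapiens", "molecule": "protein"},
    {"accession": "SEQ002", "organism": "Mus musculus", "molecule": "protein"},
    {"accession": "SEQ003", "organism": "Homo sapiens", "molecule": "DNA"},
]


def query(records: list[dict], field: str, value: str) -> list[dict]:
    """Return every whole record whose given field holds the given value."""
    return [record for record in records if record.get(field) == value]


if __name__ == "__main__":
    for hit in query(RECORDS, "organism", "Homo sapiens"):
        print(hit["accession"], hit["molecule"])
```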

Biological databases:
• A biological database is a collection of biological information or data that is organised so that it can be easily accessed, managed, and updated.
• The kinds of data include DNA sequences of genes or full genomes, protein sequences, and 3D structures of proteins, nucleic acids, and protein-nucleic acid complexes.
• Current biological databases use all three types of database structure: flat files, relational, and object-oriented.
• Based on their contents, biological databases can be roughly divided into three categories: primary databases, secondary databases, and specialized databases.
Similarity and identity
• An important concept in sequence analysis is sequence homology.
• When two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship or to share homology.
• A related but different term is sequence similarity, which is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity.
• To be clear, sequence homology is an inference or a conclusion about a common ancestral relationship drawn from sequence similarity comparison when the two sequences share a high enough degree of similarity.
• Similarity, on the other hand, is a direct result of observation from the sequence alignment.
• Sequence similarity can be quantified using percentages; homology is a qualitative statement.
• In a protein sequence alignment, sequence identity refers to the percentage of matches of the same amino acid residues between two aligned sequences.
• Sequence similarity and sequence identity are synonymous for nucleotide sequences but differ for protein sequences, where identity means the percentage of exact matches between two aligned sequences and similarity means the percentage of aligned residues that share characteristics.
• Both identity and similarity are used to deduce homology. Homology has a specific definition: having a common evolutionary ancestor.
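The difference between identity and similarity for proteins can be made concrete with a small calculation over an existing alignment. Two assumptions are made here and should be read as illustrative only: the residue groups below are a simplified physicochemical grouping chosen for this sketch (alignment tools commonly derive similarity from a substitution matrix instead), and percentages are taken over the full alignment length, including gap columns. Conventions for the denominator vary.

```python
# Simplified physicochemical groups; an assumption for illustration only.
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "C", "G", "P"]
GROUP_OF = {aa: idx for idx, group in enumerate(GROUPS) for aa in group}


def identity_and_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Return (% identity, % similarity) for two aligned sequences with '-' gaps."""
    if len(aln1) != len(aln2) or not aln1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    identical = similar = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue  # gap columns count toward the length but never match
        if a == b:
            identical += 1
            similar += 1  # identical residues are also similar
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            similar += 1
    length = len(aln1)
    return 100 * identical / length, 100 * similar / length


if __name__ == "__main__":
    print(identity_and_similarity("KVLE-", "RVIDG"))
```

In the usage example only one column is an exact match, while three more pair residues from the same group, so similarity is higher than identity. For nucleotide sequences no such grouping applies, which is why the two measures coincide there.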

Homology
• Homologs are two or more sequences that descend from a common ancestral sequence.
• Homologs are the result of divergent evolution.
• Two sequences are homologous if they share a common evolutionary ancestry.
• There are no degrees of homology; sequences are either homologous or not.
• Homologous proteins almost always share a significantly related three-dimensional structure.
• Proteins that are homologous may be orthologous or paralogous.
• Orthologs are homologous sequences in different species that arose from a common ancestral gene during speciation; they are the result of speciation events.
• Paralogs are homologous sequences that arose by a mechanism such as gene duplication; they are the result of gene duplication.
• Xenologs are the result of horizontal gene transfer.
• Gametologs: homologous genes on sex chromosomes that have not recombined.
• Homoeologs: genes that were separated by a speciation event and later brought back together in the same genome by hybridisation.

Common questions

Global alignment methods, such as the Needleman-Wunsch algorithm, assume that two sequences are similar over their entire length and aim to align them from beginning to end. These are suitable for sequences of roughly the same length that are closely related. In contrast, local alignment methods, like the Smith-Waterman algorithm, do not assume overall sequence similarity and instead focus on finding and aligning only the regions with the highest similarity. Local alignments are more appropriate for aligning divergent sequences of different lengths or sequences with only a few similar modules, as they focus on conserved patterns within the sequences.

Sequence similarity and sequence identity are related but distinct concepts in protein sequence analysis. Sequence identity specifically refers to the percentage of exact matches of the same amino acid residues between two aligned sequences. Similarity, however, refers to the percentage of aligned residues that share physicochemical properties. Homology is an inference made based on high sequence similarity, indicating a common evolutionary origin. Although sequence similarity can provide insights into possible homology, it is not a definitive measure; homology is a qualitative assessment asserting a shared ancestry, often deduced when sequence similarity is substantial.

The choice between global and local alignment is influenced by the nature and goal of the sequence analysis. Global alignment is preferred when the sequences are of similar length and are expected to be similar across their entirety, as it attempts to align entire sequences. Local alignment is more appropriate when analyzing sequences that may only share some regions of similarity, such as in cases of divergent sequences or sequences of varying lengths, as it identifies and aligns only the most similar subsequences. The specific research objective, whether to align full sequences or identify conserved patterns, also dictates the alignment method.

The Needleman-Wunsch algorithm, a global alignment method, is not well suited to sequences of differing lengths or sequences that are not closely related because it attempts to produce an alignment over the entire length of both sequences. This approach may overlook significant local similarities or conserved regions, which are more relevant in divergent or variable-length sequences, as it emphasizes alignment across the full sequence length.

Biological databases are composed of computer hardware and software designed for data management, organization, and retrieval. They are categorized based on their content into primary, secondary, and specialized databases, and encompass a range of data types such as DNA sequences, protein sequences, and 3D structures of proteins, nucleic acids, and protein-nucleic acid complexes. These databases usually employ flat files, relational, or object-oriented structures to store the data, allowing for efficient data access and management.

Local alignment enables the alignment of partially similar or divergent biological sequences by concentrating on aligning only the regions with the highest similarity or density of matches within the sequences, rather than attempting to align them entirely. This approach, exemplified by the Smith-Waterman algorithm, is particularly useful for identifying conserved patterns or motifs within sequences of differing lengths or variable similarity, which would be missed by global alignment methods that require overall similarity across full sequence lengths.

Homology in bioinformatics helps in understanding evolutionary relationships by indicating that homologous sequences have descended from a common ancestral sequence. Homologous proteins typically have related three-dimensional structures and may be categorized as orthologs, paralogs, or xenologs, reflecting different evolutionary processes: speciation, gene duplication, and horizontal gene transfer, respectively. Therefore, identifying homologous relationships can provide insights into the evolutionary history and functional similarities of proteins across different species.

Biological databases are critical for sequence alignment tasks as they store and organize vast amounts of biological data, including DNA and protein sequences, which are essential for conducting sequence alignment. They facilitate easy retrieval, management, and updating of sequence information, allowing researchers to efficiently access data needed for alignment tasks. The structured nature of databases enables users to perform precise searches and queries, ensuring the retrieval of accurate and relevant sequences for alignment.

The Needleman-Wunsch algorithm involves three primary steps to determine the optimal alignment: setting up a matrix, scoring the matrix, and identifying the optimal alignment. Initially, a matrix is created with one sequence listed horizontally and the other vertically. Each position in the matrix considers four possibilities: a perfect match, a mismatch, a gap in the first sequence, or a gap in the second sequence, scored with specific rewards or penalties. After scoring the matrix, the optimal alignment is traced back along the highest-scoring path, identifying the best alignment across the entire sequences.

The Smith-Waterman algorithm offers advantages over the Needleman-Wunsch algorithm for aligning divergent sequences because it performs local alignment, which focuses only on the most similar regions between sequences and disregards the rest. This approach is more suitable for sequences of different lengths or when only certain motifs or domains are conserved, allowing for more meaningful alignments in cases where full-length alignment might not capture the evolutionary or functional similarities. Moreover, setting negative values to zero lets a new local alignment start wherever the score would otherwise become negative, which is what allows it to find local similarities within otherwise dissimilar sequences.
