Understanding Multiple Sequence Alignment

The document discusses homologous multiple sequence alignment (MSA), emphasizing the importance of sequence identity and similarity in understanding evolutionary relationships among proteins. It outlines various mechanisms of molecular evolution, alignment methods (including progressive and iterative approaches), and the significance of tools like PAM and BLOSUM matrices in analyzing sequence data. Additionally, it highlights the applications of MSA in phylogenetic analysis, protein structure prediction, and identifying conserved sequences.

04-02-2025

Multiple Sequence Alignment
Dr Perugu Shyam

Homologous
◼ Homology is an inference (sequences are homologous or not).
◼ Identity and similarity are quantities that describe the relatedness of sequences.


◼ The close kinship between human beings and chimpanzees, hinted at by the mutual interest shown by Jane Goodall and a chimpanzee in the photograph, is revealed in the amino acid sequences of myoglobin.
◼ The human sequence (red) differs from the chimpanzee sequence (blue) in only one amino acid in a protein chain of 153 residues.

Sequence similarity to Humans


Similarity
◼ Similarity is a quantitative measure of how two sequences are related to one another.
◼ Similarity is assessed as the total number of identities and conservative substitutions in a pairwise sequence alignment.
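A minimal sketch of how identities and conservative substitutions could be counted in a pairwise alignment. The residue groups in `CONSERVATIVE_GROUPS` are a hypothetical simplification introduced here for illustration; in practice the notion of a conservative substitution is usually taken from a substitution matrix. The two example rows are the CREATE/CREASE rows of the alignment shown later in these slides.

```python
# Hypothetical residue groups treated as "conservative" substitutions.
# This grouping is an assumption for illustration, not part of the slides.
CONSERVATIVE_GROUPS = [set("ILVM"), set("FYW"), set("KRH"),
                       set("DE"), set("ST"), set("NQ")]


def is_conservative(a, b):
    """True if two different residues fall in the same assumed group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(seq1, seq2):
    """Return (identities, identities + conservative substitutions)
    for two aligned sequences of equal length. Gap columns are skipped."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    identities = 0
    conservative = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue
        if a == b:
            identities += 1
        elif is_conservative(a, b):
            conservative += 1
    return identities, identities + conservative


identities, similarity = identity_and_similarity("CREAT--E-", "CREAS--E-")
print(identities, similarity)
```

With the assumed groups, the T/S column counts toward similarity but not identity.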


Identity
◼ A quantitative measure of how related two sequences are to one another.
◼ Identity is assessed as the total number of exact matches in a pairwise sequence alignment.

Mechanisms Involved in Molecular Evolution of Genes/Proteins

Mutation- Stochastic single point changes in the genetic material due to errors in DNA replication during mitosis, radiation exposure, chemical or environmental stressors, or viruses and transposable elements. Slow but constant rate (molecular clock) of 10^-9 to 10^-8 mutations per base per generation. Splicing errors in eukaryotes that retain introns.

Recombination- Exchange of genes or portions of genes between different chromosomes to create new combinations of elements.

Gene duplication- Duplication of a gene or portions of a gene, one of which continues the original function and the other is free to evolve and acquire new functions.

Retrotransposition- Incorporation of mRNA sequences back into DNA, frequently inserting into new locations with different expression patterns.

The mechanisms by which new genes/proteins arise allow for the possibility of sequence analysis to infer functional and structural relationships among different sequences.

PAM (Point Accepted Mutation) matrices

• Are derived from studying global alignments of well-characterized protein families.
• PAM1 = only 1% of residues have changed (i.e., a short evolutionary distance).
• Raise this to the 250th power to get PAM250: 250 accepted changes per 100 residues (a greater evolutionary distance, with some positions changing more than once), which leaves about 20% sequence identity.
• Therefore, a PAM 30 would be used to analyze more closely related proteins, and a PAM 400 is used for finding and analyzing distantly related proteins.
• PAMx = (PAM1)^x

Blocks substitution matrices (BLOSUM)

Are derived from studying local alignments (blocks) of sequences from related proteins that share no more than X% identity.

1) In other words, one might use the portions of aligned sequences from related proteins that have no more than 62% identity (in the portions or blocks) to derive the BLOSUM 62 scoring matrix.

2) One might use only the blocks that have <80% identity to derive the BLOSUM 80 matrix.

3) BLOSUM and PAM substitution matrices have the opposite effects:

a) The higher the number of the BLOSUM matrix (BLOSUM X), the more closely related proteins you are looking for.

b) The higher the number of the PAM matrix (PAM X), the more distantly related proteins you are looking for.
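The relation PAMx = (PAM1)^x can be illustrated with a toy example. The sketch below assumes a made-up three-letter alphabet and an invented one-step mutation probability matrix (`TOY_PAM1`, each row sums to 1 with 99% on the diagonal); it is not the real 20 x 20 Dayhoff matrix, and real PAM scoring matrices are log-odds values derived from such probabilities.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Invented 1-PAM mutation probabilities for a 3-letter toy alphabet:
# each residue stays unchanged with probability 0.99.
TOY_PAM1 = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.006, 0.990],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

The diagonal of the 250-step matrix is much smaller than 0.99, which mirrors the drop in expected identity at larger PAM distances.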

Gap penalties – Intuitively one recognizes that there should be a penalty for introducing (requiring) a gap during identification/alignment of a given sequence. But if two sequences are related, the gaps may well be located in loop regions which are more tolerant of mutational events and probably have little impact on structure. Therefore, a new gap should be penalized, but extending an existing gap should be penalized very little.
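The scheme described above is commonly called an affine gap penalty. A minimal sketch follows; the default values of `gap_open` and `gap_extend` are illustrative assumptions, not values taken from the slides.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5):
    """Penalty for one gap of the given length.

    The first gapped position costs gap_open; every further position
    in the same gap costs only gap_extend. Both defaults are
    illustrative assumptions.
    """
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * (length - 1)


# One gap of length 4 is far cheaper than four separate gaps of length 1.
print(affine_gap_penalty(4))
print(4 * affine_gap_penalty(1))
```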

Filtering – many proteins and nucleotide sequences contain simple repeats or regions of low sequence complexity. These should be masked or excluded from searches and alignments.

Significance of a “hit” during a search - More important than an arbitrary score is an estimate of the likelihood of finding a hit through pure chance (the lower the value, the more confident the match). Ergo the “Expectation value” or E-value. E-values can be as low as 10^-70.
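As a hedged illustration, the E-value for a local alignment score is often written with the Karlin-Altschul formula E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths. The slides do not give this formula; the parameter values `lam` and `k` below are placeholder assumptions, since real values depend on the scoring matrix and gap penalties.

```python
import math


def expected_hits(raw_score, query_len, db_len, lam=0.32, k=0.13):
    """Expected number of chance hits scoring at least raw_score.

    Uses E = K * m * n * exp(-lambda * S). The defaults for lam and k
    are illustrative assumptions only.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# A higher score gives a much smaller E-value for the same search space.
print(expected_hits(50, 300, 10**6))
print(expected_hits(100, 300, 10**6))
```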


Multiple Sequence Alignments
◼ Place residues in columns that are derived from a common ancestral residue
◼ MSA can reveal sequence patterns
◼ Demonstration of homology between >2 sequences
◼ Identification of functionally important sites
◼ Protein function prediction
◼ Structure prediction
◼ Search for weak but significant similarities in databases
◼ Design PCR primers for related gene identification
◼ Genome sequencing: contig assembly

Example: the sequences CREASE, CREATE, RELAPSE and GREASER aligned:

SeqA CREAT--E-
SeqB CREAS--E-
SeqC GREAS--ER
SeqD -RELAPSE-
     123456789

Exhaustive algorithms :
◼ Exhaustive alignment involves examining all possible alignments at once.
◼ A multidimensional search matrix is required to perform multiple sequence alignment using the exhaustive algorithm, similar to the two-dimensional matrix used in dynamic programming for pairwise alignment. This means that to align N sequences, an N-dimensional matrix is required.
◼ Dynamic programming is a powerful method for aligning sequences, but as the number of sequences to be aligned increases, the amount of computational time and memory space also increases. This means that the method becomes computationally impractical for large data sets. As a result, dynamic programming is typically only used for small data sets with fewer than ten short sequences.
◼ Heuristic approaches are typically used for larger data sets to achieve a more efficient alignment.
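To make the cost of the N-dimensional matrix concrete, here is a small sketch that counts its cells, assuming every sequence has the same length and that each dimension carries one extra row for the leading gap, as in the usual pairwise dynamic programming table. The function name `dp_cells` is introduced here for illustration.

```python
def dp_cells(num_sequences, seq_length):
    """Number of cells in an N-dimensional dynamic programming matrix
    for num_sequences sequences, each of length seq_length."""
    return (seq_length + 1) ** num_sequences


# Pairwise alignment of two 100-residue sequences is small...
print(dp_cells(2, 100))
# ...but ten such sequences already need about 10^20 cells.
print(dp_cells(10, 100))
```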

Heuristic algorithm :
◼ i. Progressive method
◼ The progressive method, also known as the tree-based algorithm, is
a step-wise assembly of multiple alignments based on pairwise
similarity. This method is called progressive because it aligns
sequences in a step-wise manner.
◼ First, it performs pairwise alignments of all the sequences using the
Needleman–Wunsch global alignment method and records the
similarity scores.
◼ Then, it converts the scores into evolutionary distances to create a
distance matrix. A guide tree is constructed from the distance matrix
using the neighbor-joining method.
◼ The guide tree is used to direct the realignment of sequences based
on their relative positions on the tree, starting with the two most
closely related sequences and adding more distant sequences one at
a time until all sequences are aligned.
◼ Clustal and T-Coffee are two well-known progressive alignment
programs.
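The first two steps of the progressive method can be sketched as follows. This is a simplified illustration under stated assumptions: the scoring values (`match`, `mismatch`, `gap`) and the score-to-distance conversion are hypothetical choices, the gap penalty is linear rather than affine, and the sketch stops at picking the closest pair instead of building a full neighbor-joining guide tree.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score (no traceback) with a linear gap penalty.
    The scoring values are illustrative assumptions."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def distance_matrix(seqs):
    """Convert pairwise scores into crude distances (assumed conversion:
    1 - score / length of the longer sequence)."""
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            score = needleman_wunsch_score(seqs[i], seqs[j])
            longest = max(len(seqs[i]), len(seqs[j]))
            dist[i][j] = dist[j][i] = 1.0 - score / longest
    return dist


def closest_pair(dist):
    """Indices of the two sequences with the smallest distance."""
    n = len(dist)
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    return min(pairs, key=lambda p: dist[p[0]][p[1]])


seqs = ["CREASE", "CREATE", "RELAPSE", "GREASER"]
dist = distance_matrix(seqs)
first = closest_pair(dist)
print("align first:", seqs[first[0]], seqs[first[1]])
```

A full implementation would go on to build the guide tree from `dist` and add the remaining sequences in tree order.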


◼ ii. Iterative Method
◼ The iterative method involves improving an initial suboptimal solution by repeatedly modifying it until no further improvement is found (a true optimum is not guaranteed).
◼ An initial pairwise alignment is conducted to create a tree that provides weights for creating alignments. Aligned regions with gaps are identified and iteratively adjusted to enhance the alignment score. The highest-scoring alignment is used in a new set of calculations to predict a new tree, new weights, and new alignments. The procedure is repeated until there is no more improvement in the alignment score.
◼ PRRN is a web-based program that uses the iterative method of alignment.

◼ iii. Block-based method
◼ The progressive and iterative alignment methods are based on global alignment and may not be effective in identifying conserved domains and motifs in highly divergent sequences of different lengths.
◼ To align such divergent sequences, a local alignment-based approach is needed.
◼ The block-based method is one such method that identifies a block of ungapped alignment that is shared by all sequences.

◼ Applications of sequence alignment :
◼ Sequence alignment can identify unknown sequences by comparing them with already known sequences in databases.
◼ Sequence alignment is also used to identify conserved sequence patterns and motifs, which helps to characterize the functions of the sequences.
◼ Sequence alignment can also produce phylogenetic trees and obtain information about the evolutionary relationship between the sequences aligned.
◼ Sequence alignment can also predict proteins’ secondary and tertiary structures. It can also predict gene locations and new members of gene families.
◼ Sequence alignment can also be used to develop degenerate PCR primers by analyzing multiple related sequences.
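For the block-based method (iii), a very crude stand-in is to look for the longest exact substring shared by every sequence. This is only a sketch under a strong assumption: real block finders tolerate mismatches inside an ungapped block, while the hypothetical `longest_shared_block` below requires exact matches.

```python
def longest_shared_block(seqs):
    """Return the longest exact substring present in every sequence.
    Ties are resolved in favour of the leftmost candidate in the
    shortest sequence. Returns "" if nothing is shared."""
    reference = min(seqs, key=len)
    for size in range(len(reference), 0, -1):
        for start in range(len(reference) - size + 1):
            candidate = reference[start:start + size]
            if all(candidate in s for s in seqs):
                return candidate
    return ""


print(longest_shared_block(["CREASE", "CREATE", "RELAPSE", "GREASER"]))
```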


Multiple Sequence alignment


MSA with PILEUP

PILEUP is the MSA program that is part of the Genetics Computer Group (GCG) sequence analysis package

Sequences are aligned pairwise using dynamic programming algorithm

The scores are used to produce a phylogenetic tree, which is then used to guide the alignment of the most closely related sequences and groups of sequences

Resulting alignment is a global alignment produced by the Needleman-Wunsch algorithm

MSA with PILEUP

PILEUP drawbacks:

No recent enhancements such as gap modifications or sequence weighting comparable to those introduced for ClustalW

As with other progressive alignment programs, does not guarantee an optimal alignment

Major problem with progressive alignment programs such as ClustalW and PILEUP is the dependence of the final multiple sequence alignment on the initial pairwise alignments

For closely related sequences, ClustalW is designed to provide an adequate alignment of a large number of sequences

Iterative MSA methods

Attempt to correct initial alignment problems by repeatedly aligning subgroups of the sequences and then by aligning these subgroups into a global alignment of all the sequences

MultAlin – recalculates pair-wise scores during the production of the progressive alignment and uses these scores to recalculate the tree

PRRP – initial alignment is made to predict a tree, the tree is used to produce weights where the sequences are analyzed for the presence of aligned regions that include gaps

SAGA – based on genetic algorithm that is a machine-learning algorithm that attempts to produce alignments by the simulations of evolutionary changes in sequences


Editing and formatting alignments

Sequence editors are used for:
- manual alignment/editing of sequences
- visualization of data
- data management
- import/export of data
- graphical enhancement of data for presentations

Examples:
- CINEMA (Color Interactive Editor for Multiple Alignments) web applet [Link]
- GDE (Genetic Data Environment) - UNIX based [Link]
- GeneDoc - MS Windows [Link]
- MACAW - local multiple sequence alignment program and sequence editing tool available by anonymous FTP from [Link]/pub/schuler/macaw
- BioEdit - sequence alignment editor for MS Windows with web access and accessory applications (BLAST, local BLAST, ClustalW, Phylip and more)

Multiple Sequence Alignment

ClustalW can be run on many websites or downloaded

ClustalX is a graphical form of ClustalW which can be downloaded

ClustalW is a global sequence alignment program, therefore sequences may need to be edited before alignment

Examples: Clustal W/X, Pileup (GCG), 3D-Coffee, DIALIGN-2, MUSCLE, PROBCONS, MSA, SALIGN.

ClustalW
◼ Based on phylogenetic analysis.
◼ A phylogenetic (guide) tree is created using a pairwise distance matrix and the neighbor-joining algorithm.
◼ The most closely-related pairs of sequences are aligned using
dynamic programming.
◼ Each of the alignments is analyzed and a profile of it is created.
◼ Alignment profiles are aligned progressively for a total alignment.
◼ W in ClustalW refers to a weighting of scores depending on how
far a sequence is from the root on the phylogenetic tree (See p.
154 of Bioinformatics by Mount.)
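The idea of an alignment profile can be sketched as a table of residue frequencies per column. The sketch below is an unweighted simplification (ClustalW additionally applies sequence weights, which are not modeled here), and `build_profile` is a name introduced for illustration. The input is the four-sequence example alignment from the earlier slide.

```python
from collections import Counter


def build_profile(alignment):
    """Return, for every alignment column, a dict mapping each symbol
    (residue or gap) to its unweighted frequency in that column."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have equal length")
    depth = len(alignment)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({symbol: n / depth for symbol, n in counts.items()})
    return profile


alignment = ["CREAT--E-", "CREAS--E-", "GREAS--ER", "-RELAPSE-"]
profile = build_profile(alignment)
for position, column in enumerate(profile, start=1):
    print(position, column)
```

Columns 2 and 3 (R and E) come out fully conserved, which is the kind of signal a profile-to-profile alignment step relies on.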


Summary MSA

Definition:
A multiple sequence alignment is an alignment of n > 2 sequences obtained by inserting gaps (“-”) into sequences such that the resulting sequences all have length L and can be arranged in a matrix of n rows and L columns where each column represents a homologous position

Why do we need MSA?
- Formulate & test hypotheses about protein 3-D structure
- MSA can help us to reveal biological facts about proteins
- Crucial for genome sequencing
- To establish homology for phylogenetic analyses
- Identify primers and probes to search for homologous sequences in other organisms
- Most pairwise alignment algorithms are too complex to be used for n-wise alignments
- Alignment algorithms need to be optimized
* use structural information
* use phylogenetic information
* use conserved regions

MSA methods
- Progressive global alignment (starts with the most alike sequences)
* e.g., ClustalW, ClustalX, Pileup
- Iterative methods (initial alignment of groups of sequences that are revised)
* MultAlin, PRRP, SAGA
- Alignments based on locally conserved patterns

Sequence editors
- CINEMA, GDE, GeneDoc, MACAW, BioEdit

Approaches:
◼ Optimal Global Alignments - Dynamic programming
  ◼ Build matrices with every possible combination and search for optimal solution
  ◼ Align 10 sequences of 100 aa length
  ◼ Optimal in the mathematical sense
◼ Global Progressive Alignments - Match most similar sequences together
◼ Global Iterative Alignments - Multiple re-building attempts to find best alignment
◼ Local alignments
  ◼ Profiles, Blocks, Patterns

Progressive Methods
◼ Similar to dynamic programming method in that it uses the first step (i.e., it creates a phylogenetic tree, aligns the most-alike pair, and incrementally adds sequences to the alignment in order of “alikeness” as indicated by the tree).
◼ Differs from dynamic programming method for MSA in that it doesn’t refine the “first-cut” MSA by doing a full search through the reduced search space. (This is the computationally expensive part of DP MSA in that, even though we’ve cut down the search space, it’s still big when we have many sequences to align.)

Progressive Method
◼ Generally proceeds as follows:
  ◼ Choose a starting pair of sequences and align them
  ◼ Align each next sequence to those already aligned, one at a time
◼ Heuristic method – doesn’t guarantee an optimal alignment
◼ Details vary in implementation:
  ◼ How to choose the first sequence to align?
  ◼ Align all subsequent sequences cumulatively or in subfamilies?
  ◼ How to score?


Global Progressive Alignment
◼ A heuristic approach that utilizes phylogenetic information to assist in routing the alignment (clustalw/clustalx).
◼ Feng & Doolittle 1987, Higgins and Sharp 1988.
◼ Most alike sequences are aligned together in order of their similarity (tree-based), a consensus is determined and then aligned to next most similar sequence

Worked example (Seq1 VMR, Seq2 VMK, Seq3 GMK, Seq4 GMV):
VMR + VMK = VMR/K
VMR/K + GMK = V/G M R/K
V/G M R/K + GMV = V/G M V/R/K

Problems with Progressive Method
◼ MSA depends on pairwise alignments.
◼ If sequences are very distantly related, much higher likelihood of errors.
◼ Highly sensitive to the choice of initial pair to align. If they aren’t very similar, it throws everything off.
◼ It’s not trivial to come up with a suitable scoring matrix or gap penalties.
◼ Other approaches using Bayesian methods such as hidden Markov models

Iterative Multiple Alignment
◼ “Repeatedly re-align subgroups of sequences into a global alignment to improve alignment score” (Mount, 2001)
◼ Start with a progressive alignment and tree
◼ Recalculate pair-wise scores during progressive alignment, use new scores to rebuild the tree, which is used to improve alignments

[Flow diagram: Initial Progressive Alignment, then Build Tree, then Weight Based On Subgroup Alignments, then Iterate MSA]

Iterative Methods for MSA
◼ Get an alignment.
◼ Refine it.
◼ Repeat until one MSA doesn’t change significantly from the next.
◼ An example is the genetic algorithm approach.


Phylogenetics

Biological Foundations

Evolution is driven by
◼ Inheritance
◼ Variation
  ◼ Mutations
    ◼ Phenotype
    ◼ Genotype
  ◼ Recombination
◼ Nature selects: survival of the fittest
◼ All organisms share a common ancestry

Terminology
◼ Phylogeny
The evolutionary relationships among organisms, based on a common ancestor
◼ Phylogenetics
Area of research concerned with finding the genetic relationships between species
◼ (Greek: phylon = race and genetic = birth)


Applications of phylogenetic trees
◼ Evolution studies
◼ Systematic biology
◼ Medical research and epidemiology
◼ Ecology

Phylogeny
[Figure: tree relating Orangutan, Gorilla, Chimpanzee and Human]

Phylogenetic Trees
◼ A graph representing the evolutionary history of a sequence
◼ Relationship of one sequence to other sequences
◼ Dissect the order of appearance of insertions, deletions, and mutations
◼ Predict function, observe epidemiology, analyze changes in viral strains

Tree Shapes
[Figure: rooted, un-rooted and simple trees drawn for taxa A, B, C and D]
Branches intersect at Nodes
Leaves are the topmost branches


Tree Characteristics
◼ Tree Properties
  ◼ Clade: all the descendants of a common ancestor represented by a node
  ◼ Distance: number of changes that have taken place along a branch
◼ Tree Types
  ◼ Cladogram: shows the branching order of nodes
  ◼ Phylogram: shows branching order and distances

[Figure: phylogram of taxa A, B, C and D with branch lengths .035, .012, .009, .057, .016 and .044]

Tree Building Algorithms
◼ Maximum Parsimony
◼ Distance Methods
  ◼ UPGMA
  ◼ Neighbor Joining
◼ Maximum Likelihood

Maximum Parsimony

Alignment:
Sequence  Site 1 2 3 4 5 6
One            A A G A G T
Two            A G C C G T
Three          A G A T A T
Four           A G A G A T

Informative site: Site 5
[Figure: Informative Trees. The three possible trees for the four sequences (Tree I, Tree II, Tree III) with the site 5 bases mapped onto them. Select Tree I.]

◼ Find the tree that changes one sequence into all of the others by the least number of steps [Focus solely on end product sequences, ignore evolutionary history]
◼ Only informative sites are analyzed (no gaps or conserved positions)
◼ Can be misleading when rates of change vary in different tree branches

Distance Methods
◼ Distance is expressed as the fraction of sites that differ between two sequences in an alignment
◼ Sequences with the smallest number of changes (shortest distance) are “related taxa”
(Li, 1991)

Distance Methods - UPGMA
◼ UPGMA (Unweighted Pair-Group Method with Arithmetic mean)
  ◼ Sequentially find pair of taxa with smallest distance between them, and define branching as midpoint of two
  ◼ Assumes the tree is additive and that rate of change is constant in all of the branches

[Figure: UPGMA worked example joining A and B (D_AB), then C (D_(AB)C), then D (D_(ABC)D)]

Distance Methods - NJ
◼ Neighbor-Joining (NJ): useful when there are different rates of evolution within a tree
  ◼ Each possible pair-wise alignment is examined. Calculate distance from each sequence to every other sequence
  ◼ Choose the pair with the lowest distance value and join them to produce the minimal length tree
  ◼ Update distance matrix where joined node is substituted for two original taxa and then repeat process

[Figure: neighbor-joining worked example on taxa A to H]
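The UPGMA procedure described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions: the distance matrix is a made-up ultrametric example for taxa A to D, and the output format (nested tuples of left subtree, right subtree, node height) is a choice made here, not something from the slides.

```python
def upgma(names, dist):
    """Cluster taxa by UPGMA.

    names: list of taxon labels; dist: symmetric matrix (list of lists).
    Returns a nested tuple (left, right, height), where height is half
    the distance between the two joined clusters (the branching midpoint).
    """
    n = len(names)
    clusters = {i: (names[i], 1) for i in range(n)}  # id -> (subtree, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest distance.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        del d[(a, b)]
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1
    (tree, _size), = clusters.values()
    return tree


names = ["A", "B", "C", "D"]
# Made-up distances, chosen to be consistent with a constant rate of change.
dist = [
    [0, 2, 4, 6],
    [2, 0, 4, 6],
    [4, 4, 0, 6],
    [6, 6, 6, 0],
]
tree = upgma(names, dist)
print(tree)
```

A and B are joined first at height 1, then C at height 2, then D at height 3, mirroring the A, B, then C, then D order in the slide's figure.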

Maximum Likelihood
◼ Best accounts for variation in sequences
◼ Establish a probabilistic model with multiple solutions and determine which is most likely
◼ All possible trees are considered, therefore, only suitable for small number of sequences
◼ Maximizes probability of finding optimal tree

Tree Reliability
◼ Probability that the members of a clade are always members of that clade
◼ Sample by Bootstrapping
  ◼ Sites (columns) of an alignment are sampled at random, with replacement, so as to create a dataset the same size as the original. The same analysis as applied to the original data set is performed on the bootstrap dataset
  ◼ Construct a consensus bootstrap tree and compare to the original tree

Which Method to Use?
◼ Is there strong sequence similarity?
  yes: Maximum Parsimony
  no: Is there clearly recognizable sequence similarity?
    yes: Distance Methods
    no: Maximum Likelihood
(Mount, 2001)

Analysing the aligned sequence matrix
◼ PHYLIP
◼ POY
◼ PAUP, GCG
◼ And many more... (274 software packages described at one website)

PHYLIP (Phylogeny Inference Package)

[Link]

◼ Available free in Windows/MacOS/Linux systems
◼ Parsimony, distance matrix and likelihood
methods (bootstrapping and consensus trees)
◼ Data can be molecular sequences, gene
frequencies, restriction sites and fragments,
distance matrices and discrete characters


Visualising trees
◼ Treeview
◼ You can change the graphic presentation of a
tree (cladogram, rectangular cladogram, radial
tree, phylogram), but not change the structure
of a tree


POY
(Phylogenetic Analysis Using Parsimony)

◼ Cladistic and phylogenetic analysis using sequence and/or morphological data
◼ Finding among all possible trees, those that exhibit
minimal edit costs (minimum number of mutations)
◼ Is able to assess directly the number of DNA
sequence transformations, evolutionary events,
required by a tree topology without the use of
multiple sequence alignment
◼ CSC
