
Multalin Tool for Sequence Alignment

The document provides an index of topics covered in a Bioinformatics course. The topics include using various bioinformatics tools like Multalin, RNAfold, BLAST, EMBOSS, Clustal Omega, KEGG pathways, PDB, SCOP, CATH, tRNAscanSE, Rasmol, and DendroUPGMA. Procedures for performing multiple sequence alignment of 16S rRNA using Multalin, predicting RNA secondary structure using RNAfold, performing BLAST searches, and utilizing other databases and tools are outlined. Screenshots from using the various tools are also included to demonstrate how to access and analyze data using different bioinformatics resources.

(Semester IV, Paper code: MMCB4413)

Roll number: 636


Registration number: A01-1112-0097-21
Subject: Bioinformatics
INDEX

1. Multalin (Prof. AB)
2. RNAfold (Prof. AB)
3. BLAST (Prof. KS)
4. EMBOSS and Clustal Omega (Prof. KS)
5. KEGG pathway (Prof. SSC)
6. PDB (Prof. SSC)
   - The prediction of primary sequence
   - Secondary structure prediction tool
   - Prediction of tertiary structure of protein by comparative/homology modelling
7. SCOP and CATH (Prof. SSC)
8. tRNAscan-SE (Prof. AB)
9. Visualization of the structure of Aspirin using RasMol (Prof. KS)
10. DendroUPGMA (Prof. SSC)


ACCESSION TO MULTALIN
Date:
JOB TITLE: To find the similarities between the 16S rRNA sequences of two species using MultAlin
INTRODUCTION:
16S rRNA
16S ribosomal RNA (or 16S rRNA) is the RNA component of the 30S subunit of a prokaryotic ribosome (SSU rRNA). Homologous small-subunit rRNAs are also found in the ribosomes of eukaryotic mitochondria and chloroplasts. Its 3’ end carries the anti-Shine-Dalgarno sequence, which base-pairs with the Shine-Dalgarno sequence of mRNA during translation initiation.
MultAlin
MultAlin is a multiple sequence alignment tool for protein and nucleic acid sequences, created by Florence Corpet.
Exercise: The following steps are performed to compare the 16S rRNA of two species.
MultAlin for Methanopterin and THF reductase

Step 1: First, we open the NCBI (National Center for Biotechnology Information) webpage.
Step 2: Select Genome from the drop-down box and then press Enter.
Step 3: On the Genome page, scroll down and select the prokaryotic reference genomes option.

Select any two organisms for the multiple sequence alignment; in our case we chose Archaeoglobus fulgidus and Acetobacterium woodii.
Step 4: Scroll down to choose RefSeq (reference sequence). Then customise the view of the webpage to show genes and RNA.
Scroll to the 16S rRNA gene and click on FASTA.
Now go back, copy the FASTA sequence of each organism individually and paste both into the input box of MultAlin.
After pasting the sequences, click “Start MultAlin”.
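Pasting two records by hand is enough for this exercise. If the sequences are saved as a multi-FASTA file instead, a few lines of Python can read them into a dictionary before they are submitted. The sketch below is a minimal parser written for illustration (it is not part of MultAlin or NCBI); it assumes well-formed FASTA text in which every record starts with a `>` header line, and the example records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()  # drop the '>' marker
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical records; real 16S rRNA genes are about 1,500 nucleotides long.
example = """>seq1 hypothetical 16S fragment
ACGTTGCA
GGCA
>seq2 hypothetical 16S fragment
ACGTAGCAGGCA
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```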
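The MultAlin output shows the aligned sequences, with each position coloured red, blue or black according to its level of consensus:
Red: High consensus (highly conserved positions).
Blue: Low consensus (partially conserved positions).
Black: Neutral (positions below the low-consensus level).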
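The colour classes can be reproduced with a simple per-column consensus calculation. The sketch below assumes MultAlin's default consensus levels of 90% (high, red) and 50% (low, blue), counts a gap as not matching the column's most common residue, and uses a hypothetical toy alignment; MultAlin's own rules may differ in detail.

```python
from collections import Counter

HIGH = 0.9  # assumed high-consensus threshold (red)
LOW = 0.5   # assumed low-consensus threshold (blue)


def column_colours(aligned, high=HIGH, low=LOW):
    """Classify each alignment column as 'red', 'blue' or 'black'.

    `aligned` is a list of equal-length aligned sequences using '-' for gaps.
    A column's consensus is the fraction of sequences carrying its most
    common non-gap residue.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(aligned)
    colours = []
    for column in zip(*aligned):
        counts = Counter(c.upper() for c in column if c != "-")
        top = counts.most_common(1)[0][1] if counts else 0
        fraction = top / n
        if fraction >= high:
            colours.append("red")
        elif fraction >= low:
            colours.append("blue")
        else:
            colours.append("black")
    return colours


print(column_colours(["ACGU", "ACGA", "AC-C", "UCGG"]))
```

With only two sequences, as in this exercise, a gap-free column is either identical (100%, red) or mismatched (50%, blue) under these assumed thresholds.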
ACCESSION TO RNAfold
Date:
Introduction
RNA folding is the process by which a linear ribonucleic acid (RNA) molecule acquires secondary
structure through intra-molecular interactions.
Theory
Ribonucleic acid (RNA) is one of the key players in molecular biology and has in the past attracted theoretical and experimental physicists because of its intriguing structural and functional properties. RNA molecules take part in protein synthesis; messenger RNA (mRNA), for example, carries the genetic message to the ribosome. Both DNA and RNA are composed of subunits called nucleotides (often referred to simply as bases). Adjacent nucleotides are linked by phosphodiester bonds between the 3’ hydroxyl group on the sugar of one nucleotide and the 5’ phosphate of the next. As a result, each strand has a 5’ end carrying a free phosphate group and a 3’ end carrying a free hydroxyl group. Prediction of RNA secondary structure is important because many sequences have no experimentally determined structure and no homologues in the databases from which a structure could be derived; for such sequences, computational prediction is the practical option. Moreover, RNA secondary structure prediction has been shown to have applications in the design of nucleic acid probes.
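RNAfold predicts the minimum free energy structure using a nearest-neighbour energy model. A much simpler textbook algorithm, Nussinov base-pair maximisation, shows the same dynamic-programming idea and produces output in the dot-bracket notation that RNAfold also uses. The sketch below is that simplified stand-in, not RNAfold's method: it only maximises the number of nested A-U, G-C and G-U pairs, and the minimum hairpin loop of 3 unpaired bases is an assumption.

```python
# Allowed base pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # assumed minimum number of unpaired bases enclosed by a hairpin


def nussinov(seq, min_loop=MIN_LOOP):
    """Maximise the number of nested base pairs; return (pairs, dot_bracket)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0, ""
    # dp[i][j] = maximum number of nested pairs within seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case 1: base i stays unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in PAIRS:
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + right)
            dp[i][j] = best

    # Traceback: rebuild one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain any pair
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == dp[i + 1][k - 1] + 1 + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return dp[0][n - 1], "".join(structure)


print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Because it ignores stacking energies and loop penalties, this sketch can return structures that differ from RNAfold's MFE prediction for the same sequence.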
Procedure
1. The NCBI website ([Link]) was opened. The Genome option was selected, and then prokaryotic reference genomes were chosen.

2. We searched for the reference sequence; in this case we used Escherichia coli.
3. The RNase P RNA gene was selected.

4. Its FASTA sequence was copied.


5. The DNA sequence was converted to an RNA sequence using the Biomodel transcription and translation tool.
Link: [Link]
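Step 5 amounts to replacing thymine (T) with uracil (U), provided the copied DNA is the coding (sense) strand written 5’ to 3’; if it were the template strand, it would first have to be reverse-complemented. The helper below is a hypothetical sketch of that conversion, not the Biomodel tool's code, and the short test sequences are made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def dna_to_rna(dna, strand="coding"):
    """Return the RNA transcript (5'->3') for a DNA sequence written 5'->3'.

    strand="coding":   the DNA given is the coding (sense) strand, so T -> U.
    strand="template": the DNA given is the template strand, so it is
                       reverse-complemented first and then T -> U.
    """
    dna = "".join(dna.split()).upper()  # remove whitespace/newlines
    if strand == "template":
        dna = "".join(COMPLEMENT[base] for base in reversed(dna))
    elif strand != "coding":
        raise ValueError("strand must be 'coding' or 'template'")
    return dna.replace("T", "U")


print(dna_to_rna("ATGGCTTAA"))              # AUGGCUUAA
print(dna_to_rna("TTAAGCCAT", "template"))  # AUGGCUUAA (same gene, other strand)
```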
6. The RNAfold web server was opened, and the RNA sequence was pasted into its sequence field.

7. The results page ([Link]) shows the minimum free energy (MFE) structure in dot-bracket notation.
An equivalent graphical representation of the minimum free energy structure is also displayed.
ACCESSION TO BLAST - BASIC LOCAL ALIGNMENT SEARCH TOOL
Date:
INTRODUCTION:
BLAST (Basic Local Alignment Search Tool) is an alignment tool that finds sequences in a large database that align significantly with a query sequence. These database sequences are called subject sequences. BLAST is accessed through the National Center for Biotechnology Information (NCBI) website.
The five main BLAST programs are:
● blastn- Nucleotide BLAST (a nucleotide query sequence is compared with nucleotide subject sequences).
● blastp- Protein BLAST (a protein query sequence is compared with protein subject sequences).
● blastx- (a translated nucleotide query sequence is compared against protein subject sequences).
● tblastn- (a protein query sequence is compared against translated nucleotide subject sequences).
● tblastx- (a translated nucleotide query sequence is compared against translated nucleotide subject sequences).

PROCEDURE:
STEP 1: We open the NCBI website in a web browser, select Protein from the drop-down menu, type p53 in the search bar and click Search.

Query sequence: P53 [Cricetulus griseus]

GenBank: AAC53040.1

STEP 2: We select the first result and click on ‘FASTA’ to obtain the FASTA sequence.
STEP 3: We select the entire sequence and copy it.

STEP 4: On a separate tab we open ‘BLAST’ and click on Protein BLAST.


STEP 5: In the space provided under “Enter query sequence” we paste our query sequence.

STEP 6: We scroll down and click on ‘BLAST’ to run BLAST analysis.


OBSERVATION:
● At the top of our result page we see information about our query sequence like its
query ID, molecule type, length etc.

● On scrolling down, the ‘Descriptions’ list shows the 100 subject sequences that aligned most significantly with our query sequence. For each subject it gives the scientific name, the maximum and total alignment scores, the query coverage (how much of the query sequence is covered by the alignment with the subject), the E-value (expect value; here 0, meaning a match this good is extremely unlikely to arise by chance), the percentage identity, and the length and accession number of the subject.

● Clicking on ‘Graphic Summary’, we can see a graphical representation of the alignment between the query and subject sequences. Here, the alignments are coloured red, indicating alignment scores of 200 or more.

● Under ‘Alignments’ we can see each pairwise protein sequence alignment. The query sequence is written on the 1st line and the subject sequence on the 3rd line. On the 2nd line, the one-letter code of the amino acid is written where the two residues are identical, a plus sign (+) where they are different but chemically similar, and a blank space where they are dissimilar or a gap is present.

● Under ‘Taxonomy’ we can see the lineage, taxonomy and the organisms from which the subject proteins were obtained.
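The E-value reported by BLAST follows the Karlin-Altschul statistics, E = K·m·n·e^(−λS), where S is the raw alignment score, m and n are the query and database lengths, and λ and K depend on the scoring matrix and gap costs; the bit score S′ = (λS − ln K)/ln 2 gives the equivalent form E = m·n·2^(−S′). The sketch below evaluates these formulas. The λ and K values are assumed figures for BLOSUM62 with gap costs 11/1, the lengths are hypothetical, and no effective-length correction is made, so the numbers will not match a real BLAST report exactly.

```python
import math

# Assumed Karlin-Altschul parameters (BLOSUM62, gap open 11, gap extend 1).
# Check the lambda and K values printed in your own BLAST output.
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalised bit score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam=LAMBDA, k=K):
    """Expected chance hits with score >= S: E = K * m * n * exp(-lambda * S).

    No effective-length (edge-effect) correction is applied.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Hypothetical 400-residue query searched against a 1e8-residue database.
for score in (50, 100, 400):
    print(score, round(bit_score(score), 1), f"{e_value(score, 400, 1e8):.2e}")
```

The E-value falls exponentially as the score rises, which is why very high-scoring hits such as the p53 matches above are displayed with an E-value of 0.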

CONCLUSION: BLAST is a local alignment tool which helps us find sequences that show
significant alignment to our query sequence, giving us an idea about the query sequence, its
function or its species of origin.
Pairwise sequence alignment using EMBOSS needle
Date:
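EMBOSS Needle performs global pairwise alignment with the Needleman-Wunsch algorithm. The sketch below shows the core dynamic-programming recurrence and traceback under simplified, assumed scoring (match +1, mismatch −1, linear gap penalty −2); Needle itself uses a substitution matrix (such as EBLOSUM62 or EDNAFULL) and separate gap-opening and gap-extension penalties, so its scores differ. The two short sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                           # gap in b
                score[i][j - 1] + gap,                           # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(total)
print(top)
print(bottom)
```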
ACCESSION TO CLUSTAL OMEGA
Date:
Introduction
Clustal Omega is a multiple sequence alignment tool that is particularly useful for aligning divergent sequences and finding relationships among them. It aligns multiple nucleotide or protein sequences efficiently. It uses a progressive alignment method, which aligns the most similar sequences first and works its way down to the least similar sequences until a global alignment is created. Its predecessor, ClustalW, is a matrix-based algorithm, whereas tools like T-Coffee and DIALIGN are consistency-based. ClustalW has a fairly efficient algorithm that competes well against other software. Clustal Omega requires three or more sequences to calculate a multiple alignment; for pairwise sequence alignment (2 sequences), use tools such as EMBOSS Needle or LALIGN.

Steps
● Select the first 10 sequences.
● Download the file.
● Copy its contents and paste them into Clustal Omega.
Red: small hydrophobic residues
Blue: acidic residues
Magenta: basic residues
Asterisk (*): fully conserved position
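The asterisk row under a Clustal alignment can be reproduced with a few lines of code. The sketch below marks only fully conserved, gap-free columns with `*`; it deliberately omits the `:` and `.` symbols Clustal uses for strongly and weakly similar columns, and the three short protein fragments are hypothetical.

```python
def conservation_line(aligned):
    """Mark fully conserved columns with '*' (Clustal-style); others get a space.

    Simplification: the ':' and '.' symbols for strongly and weakly
    similar columns are not computed here.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    marks = []
    for column in zip(*aligned):
        residues = {c.upper() for c in column}
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)


alignment = ["MKV-LA", "MKVQLS", "MRVELA"]
for row in alignment:
    print(row)
print(conservation_line(alignment))
```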

Finally, here is the phylogenetic tree obtained from the Clustal Omega alignment.


Accession to pathway database: KEGG pathway
Date:

a. Metabolism
- Glycolysis
b. Genetic Information Processing
- RNA polymerase
c. Environmental Information Processing
- Bacterial secretion system
d. Cellular Processes
- Endocytosis
e. Organismal Systems
- Neutrophil Extracellular Trap Formation
f. Human Diseases
- Vibrio cholerae infection
g. Drug Development
- Cephalosporins
Accession to the RCSB Protein Data Bank (PDB)
Date:
The prediction/characterization of primary sequence

Open NCBI.
Copy the sequence and paste it.
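One basic characterisation of a primary sequence is its length and amino acid composition. The sketch below computes both for a pasted protein sequence; it assumes only the 20 standard one-letter codes are present, and the ten-residue input is a hypothetical fragment rather than a sequence from this exercise.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def composition(protein):
    """Return (length, {residue: percent}) for a protein sequence."""
    seq = "".join(protein.split()).upper()  # drop whitespace and newlines
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    counts = Counter(seq)
    length = len(seq)
    # Only residues that actually occur are reported.
    percent = {aa: 100 * counts[aa] / length for aa in AMINO_ACIDS if counts[aa]}
    return length, percent


length, percent = composition("MKTAYIAKQR")
print(length)
for aa, pct in sorted(percent.items(), key=lambda item: -item[1]):
    print(f"{aa}: {pct:.1f}%")
```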
Secondary structure prediction tool
Prediction of tertiary structure of protein by comparative/ homology modelling
ACCESSION TO SCOP AND CATH DATABASES
DATE:
1. SCOP
INTRODUCTION: SCOP is a protein classification database. It stands for Structural Classification of Proteins. It provides a detailed description of the structural and evolutionary relationships between all proteins whose structures are known. The levels of the SCOP hierarchy are: class, fold, superfamily, family, protein domain and species.
ACCESSION STEPS
STEP 1: Type [Link] in the browser, or search for SCOP on Google.
The homepage of the SCOP database appears. The ID of our desired protein structure (as obtained from NCBI, 1A6M in this case) is typed in the search box, and the results are displayed.

STEP 2: The ancestry (class, fold, superfamily, family and domain) can be observed. It also shows that 1A6M is myoglobin from the sperm whale, Physeter catodon.
STEP 3: The structure of the myoglobin molecule is observed.
2. CATH
INTRODUCTION: CATH is a protein classification database. It stands for Class, Architecture, Topology and Homologous superfamily. It classifies protein domains according to their structural and evolutionary relationships.
STEP 1: Type [Link] in the browser, or search for CATH on Google. The homepage of the CATH database appears. The ID of our desired protein structure (as obtained from NCBI, 1A6M in this case) is typed in the search box.

STEP 2: The matching results for the submitted ID are displayed.

STEP 3: The first option ([Link]) is selected. A general summary of the superfamily is
shown.
STEP 4: The structure is displayed.

STEP 5: The CATH classification is displayed, showing that the submitted ID is a protein belonging to Class: Mainly Alpha; Architecture: Orthogonal Bundle; Topology: Globin-like; Homologous superfamily: Globins.
STEP 6: The functional families under the superfamily (of 1A6M) can be seen, along with the total number of sequences in each.

STEP 7: The structural neighbourhood of the superfamily is seen. It shows that sequences belonging to the same homologous superfamily have very similar percent identity.
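Each CATH superfamily is identified by a four-number code of the form C.A.T.H, and each prefix of that code names the corresponding level of the hierarchy. The sketch below splits such a code into its Class, Architecture, Topology and Homologous superfamily identifiers; the code used in the example is arbitrary and is not the one for 1A6M.

```python
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_levels(code):
    """Split a CATH superfamily code 'C.A.T.H' into its four nested levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError("expected four dot-separated integers, e.g. '1.2.3.4'")
    # Each level's identifier is the prefix of the full code up to that level.
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


for level, node in cath_levels("3.40.50.300").items():
    print(f"{level:<24} {node}")
```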
ACCESSION TO tRNAscan-SE
Date:
Introduction
tRNAscan-SE has long been the software of choice for predicting transfer RNA (tRNA) genes in genomic sequences. Its users include not only basic researchers but also biologists, database annotators and sequencing centres. One or more sequences can be analysed together, and users are also asked to state the source of the genome, if known. The sequence may be uploaded in FASTA format or typed/pasted into the text box on the website. The tRNAscan-SE web server used here is a convenient, ready-to-use means of identifying tRNA genes in one or more query sequences. Its graphical interface also provides easy navigation to the details of the prediction results and a quick way to learn about the features of the software without requiring familiarity with UNIX-based commands or installation on one’s own computer. However, web-based analysis limits query sequences to a maximum of five million base pairs; the standalone version can be used for larger genomic sequences.
Go to the tRNAscan-SE web server.
[Figure: predicted tRNA secondary structure (Class 1), with the 5’ and 3’ ends marked]

Types of base pairs = 3: AU, GC, GU


In the RNA helices, GC pairs are shown as red dots and AU pairs as blue dots.
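The three base-pair types can be counted directly from a predicted structure. The sketch below assumes the structure is written in dot-bracket notation (as RNAfold produces; other tools use different bracket characters that would need mapping first), and the 12-nucleotide hairpin is a made-up example.

```python
from collections import Counter

PAIR_NAMES = {frozenset("GC"): "GC", frozenset("AU"): "AU", frozenset("GU"): "GU"}


def pair_table(dot_bracket):
    """Return the sorted list of (i, j) base-pair indices in a dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def pair_types(seq, dot_bracket):
    """Count GC, AU and GU (wobble) pairs; anything else is 'non-canonical'."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure lengths differ")
    counts = Counter()
    for i, j in pair_table(dot_bracket):
        key = frozenset(seq[i].upper() + seq[j].upper())
        counts[PAIR_NAMES.get(key, "non-canonical")] += 1
    return counts


print(pair_types("GCAGAAAAUUGC", "((((....))))"))
```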
JOB: Visualization of the structure of Aspirin using RasMol
Date:
ACCESSION TO DendroUPGMA
Date:
Introduction
UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a straightforward approach to constructing a phylogenetic tree from a distance matrix. Of the methods of phylogenetic reconstruction dealt with here, it is the only one that produces rooted trees. The term “unweighted” indicates that all distances contribute equally to each average that is computed; it does not refer to the mathematics by which the averages are obtained.
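The procedure can be sketched in a few lines: repeatedly merge the two closest clusters, place the new node at half their distance, and set the distance from the merged cluster to every other cluster to the size-weighted mean of the old distances, so that each original taxon counts equally. The function below is a minimal illustration of standard UPGMA, not DendroUPGMA's code; the four-taxon distance matrix is hypothetical and chosen to be ultrametric so the result is easy to check.

```python
def upgma(names, dist):
    """Build a rooted UPGMA tree from a symmetric distance matrix.

    Returns the tree as a Newick string with branch lengths.
    """
    # Active clusters: id -> (newick text, number of taxa, node height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        text_a, size_a, h_a = clusters.pop(a)
        text_b, size_b, h_b = clusters.pop(b)
        height = d_ab / 2  # the new node sits halfway between the two clusters
        node = f"({text_a}:{height - h_a:g},{text_b}:{height - h_b:g})"
        del d[(a, b)]
        # Distance to the merged cluster = mean over all original taxa,
        # i.e. each old cluster is weighted by its size.
        for c in clusters:
            d_ac = d.pop((min(a, c), max(a, c)))
            d_bc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        clusters[next_id] = (node, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


taxa = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(taxa, matrix))  # (D:5,(C:3,(A:1,B:1):2):2);
```

Because each new node sits at half the merging distance, every taxon ends up at the same total distance from the root, which is why UPGMA trees are rooted and implicitly assume a constant rate of evolution (a molecular clock).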

[Figures: DendroUPGMA homepage; distance matrix; similarity matrix]
Steps
1. We take the FASTA sequences of the testis-determining factor gene of both human and monkey separately from NCBI and input them as instructed in the dialogue box.

2. We decide upon the parameters on which to base the dendrogram and click Submit.

3. We obtain the different types of matrices based on the parameters we set.
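The source does not describe how DendroUPGMA turns sequences into matrices, so the sketch below shows only one simple possibility: the uncorrected p-distance (the proportion of differing sites between two aligned sequences) with similarity taken as 1 − p. The aligned fragments are hypothetical, not the human and monkey sequences used above, and gap-containing sites are simply skipped.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gap sites skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no gap-free sites to compare")
    return sum(x != y for x, y in sites) / len(sites)


def distance_and_similarity(seqs):
    """Return (names, distance matrix, similarity matrix) with similarity = 1 - p."""
    names = list(seqs)
    dist = [[p_distance(seqs[r], seqs[c]) for c in names] for r in names]
    sim = [[1 - d for d in row] for row in dist]
    return names, dist, sim


# Hypothetical pre-aligned fragments (not real testis-determining factor data).
aligned = {"seq1": "ACGTACGTAC", "seq2": "ACGTACGTTC", "seq3": "ACCTACGATC"}
names, dist, sim = distance_and_similarity(aligned)
for name, row in zip(names, dist):
    print(name, [round(d, 2) for d in row])
```

A distance matrix of this kind is the usual input to UPGMA clustering.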
