BLAST and ORF Analysis in Bioinformatics

The document outlines procedures for pair-wise sequence alignment using BLAST, identifying Open Reading Frames (ORFs) with bioinformatics tools, and predicting protein 3D structures through homology modeling. It emphasizes the importance of sequence similarity in evolutionary relationships, gene prediction, and protein function. The document provides step-by-step instructions for using NCBI BLAST, ORF Finder, and SWISS-MODEL for these analyses.

Uploaded by

takuriino11

Aim: Pair-wise alignment of sequences (BLAST) and interpretation of the output

Materials Required: Computer with internet access, access to the NCBI BLAST tool ([Link]),
sample DNA or protein sequences
Introduction
Pair-wise sequence alignment with BLAST (Basic Local Alignment Search Tool) finds regions of
similarity between a query sequence and a subject sequence. Such similarity can indicate an
evolutionary relationship, a shared function, or structural homology. BLAST performs local
alignment: instead of aligning entire sequences end-to-end (global alignment), it looks for
subsequences with high similarity. This approach is faster and more efficient, especially for
large databases. BLAST uses a heuristic search strategy. It first identifies short "word"
matches ("seeds") between the query and the database sequences, then extends each seed in
both directions to maximize the alignment score. Only extensions that score above a threshold
are retained, which makes the search much faster than exhaustive methods, at the cost of
possibly missing some weak alignments. Protein alignments are scored with a substitution
matrix (such as BLOSUM or PAM), nucleotide alignments with match and mismatch scores, and
both penalize gaps. A high alignment score suggests close similarity, while low scores are
less likely to reflect a functional or evolutionary relationship. The output provides several
key metrics: Percent Identity (percentage of aligned positions that are identical), Query
Coverage (percentage of the query sequence covered by the alignment), Bit Score (a normalized
alignment score that can be compared between searches), and E-value (the number of alignments
with an equal or better score expected by chance; lower values are more significant). From
these metrics one can judge the degree of similarity, the possible function, and the
potential evolutionary relationship. BLAST also shows a graphical overview of the alignments,
with their positions and quality along the query, which allows quick assessment of regions of
similarity and conservation.
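The seed-and-extend idea described above can be sketched in a few lines of Python. This is a toy illustration only, not the BLAST implementation: it uses exact word matches, ungapped extension, and made-up defaults (word_size, match, mismatch, drop_off and min_score are all assumptions chosen for the example).

```python
def find_seeds(query, subject, word_size=3):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    index = {}
    for j in range(len(subject) - word_size + 1):
        index.setdefault(subject[j:j + word_size], []).append(j)
    seeds = []
    for i in range(len(query) - word_size + 1):
        for j in index.get(query[i:i + word_size], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, word_size=3, match=1, mismatch=-2, drop_off=4):
    """Extend a seed without gaps in both directions.

    Extension stops when the running score falls drop_off below the best
    score seen so far. Returns (score, query_start, query_end, subject_start)
    with 0-based, end-exclusive coordinates.
    """
    def walk(qi, sj, step):
        best = score = 0
        best_len = length = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score >= drop_off:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + word_size, j + word_size, 1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    total = word_size * match + left_score + right_score
    return total, i - left_len, i + word_size + right_len, j - left_len


def best_local_hit(query, subject, word_size=3, min_score=5):
    """Keep only extensions scoring at least min_score and return the best one."""
    hits = [extend_seed(query, subject, i, j, word_size)
            for i, j in find_seeds(query, subject, word_size)]
    hits = [h for h in hits if h[0] >= min_score]
    return max(hits, default=None)


print(best_local_hit("GGGACGTACGTCCC", "TTTTACGTACGTAAAA"))
```

In this example the shared region ACGTACGT is recovered as a single ungapped hit, while the flanking mismatches are left out of the alignment. Real BLAST additionally uses neighborhood words for proteins, gapped extension, and statistical evaluation of each hit.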
Procedure:
1. Accessing BLAST:
• Open a web browser and go to the NCBI BLAST website.
2. Choosing the BLAST Program:
• Depending on the sequence type (DNA or protein), select the appropriate BLAST program
(e.g., BLASTn for DNA or BLASTp for protein).
3. Input the Query Sequence:
• Copy the sequence you want to analyze and paste it into the query box.
• Choose a relevant database, such as nr (non-redundant) or RefSeq.
4. Setting Parameters:
• You may adjust parameters such as the Expect threshold (E-value), scoring matrix, and
gap penalties to fine-tune the search.
• Restrict the search to an organism if you are looking for sequences from a specific species.
5. Run BLAST:
• Click on the "BLAST" button to start the search.
6. Review the Results:
• The results include a list of sequences that align with your query, along with
alignment scores and E-values.
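When reviewing the results, it can help to see how the reported numbers relate to an alignment. The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, percent identity is computed over all alignment columns, query coverage is computed for a single alignment (NCBI combines all alignments to a subject), and the E-value uses the standard relation E = m * n * 2^(-S') with raw lengths rather than the effective lengths BLAST uses internally.

```python
def percent_identity(aligned_query, aligned_subject):
    """Identical columns as a percentage of alignment length ('-' marks a gap)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(a == b and a != "-"
                    for a, b in zip(aligned_query, aligned_subject))
    return 100.0 * identical / len(aligned_query)


def query_coverage(q_start, q_end, query_length):
    """Percent of the query inside the alignment (1-based, inclusive coordinates)."""
    return 100.0 * (q_end - q_start + 1) / query_length


def expect_value(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
print(query_coverage(11, 60, 100))
print(expect_value(10, 10, 1024))
```

Because the E-value scales with the database size, the same alignment can be significant in a small database and insignificant in a very large one, which is why the bit score is the better quantity to compare between searches.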

Result: Run BLAST and paste your results here


Discussion/Conclusion:

Aim: Identification of Open Reading Frames (ORFs) using bioinformatics tools
Materials Required: Computer with internet, NCBI ORF Finder, Sample DNA sequence
Introduction:
An Open Reading Frame (ORF) is a stretch of DNA that begins with a start codon (usually ATG,
in both prokaryotes and eukaryotes) and ends with a stop codon (TAA, TAG, or TGA). ORFs
represent potential protein-coding regions, so identifying them is a crucial step in gene
prediction and helps scientists locate genes and infer possible protein functions. A
double-stranded DNA sequence has six reading frames: three on the forward strand and three on
the reverse-complement strand. Each frame can contain start and stop codons, so all six must
be examined to identify all possible ORFs. ORF identification is essential in genome
annotation, both for newly sequenced genomes and for known genomes of various organisms, and
it underpins work in genetic engineering, molecular biology, and biotechnology, where
understanding gene locations and structures is foundational.
Bioinformatic tools automate ORF detection by scanning DNA sequences for regions between
start and stop codons that meet a minimum length requirement, which filters out short ORFs
that are unlikely to be functional.
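The six reading frames can be made concrete with a short Python sketch. The function names and the frame labels (+1 to +3, -1 to -3) are illustrative choices, and the code assumes the sequence contains only the letters A, C, G and T.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frames(dna):
    """Split a sequence into codons for all six reading frames.

    Frames +1..+3 read the given strand from offsets 0, 1 and 2;
    frames -1..-3 do the same on the reverse complement.
    Incomplete trailing codons are dropped.
    """
    frames = {}
    for sign, strand in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            codons = [strand[k:k + 3] for k in range(offset, len(strand) - 2, 3)]
            frames[f"{sign}{offset + 1}"] = codons
    return frames


for label, codons in six_frames("ATGAAATAG").items():
    print(label, codons)
```

Here frame +1 reads ATG AAA TAG, a complete start-to-stop ORF, while the other five frames read the same nucleotides as entirely different codons.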
NCBI's ORF Finder is a commonly used online tool with a simple interface. It identifies ORFs
in all six reading frames and reports their start and stop positions, lengths, and
translations. EMBOSS getorf, part of the EMBOSS suite, is another widely used tool, available
online or locally, that outputs the ORF sequences found in each frame. Commercial software
such as Geneious Prime also offers ORF detection with graphical displays, which simplifies
the analysis. For those with programming knowledge, Biopython is a Python library whose
sequence-handling and translation functions can be used to identify ORFs programmatically,
enabling batch processing of multiple sequences.
The mechanism of ORF identification involves scanning DNA sequences for start codons and
continuing until a stop codon is reached, often with a minimum length filter to eliminate short
ORFs that may not code for functional proteins. After identifying an ORF, tools can translate
it into the corresponding amino acid sequence, offering insights into the protein it may
encode. Interpreting ORF results involves examining start and stop positions, ORF length,
and amino acid sequence. This translated sequence can be further analyzed by comparing it
against databases, such as through a BLAST search, to predict function or identify
homologous sequences. ORF finding is fundamental in gene prediction, protein engineering,
and comparative genomics. It allows scientists to identify, clone, and express proteins for
research or therapeutic applications, compare gene structures across species, and identify
evolutionary patterns.
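The scanning mechanism described above can be sketched in plain Python. This is a minimal illustration, not the algorithm used by NCBI ORF Finder or getorf: it assumes ATG as the only start codon, the standard genetic code, the first ATG in a frame as the ORF start, and a hypothetical min_codons parameter as the length filter. Coordinates are 0-based on the scanned strand, with the end position exclusive and including the stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=3):
    """Return (strand, frame, start, end, protein) for each ATG...stop ORF.

    min_codons counts coding codons, excluding the stop codon.
    """
    orfs = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for k in range(frame, len(seq) - 2, 3):
                codon = seq[k:k + 3]
                if start is None and codon == "ATG":
                    start = k  # open an ORF at the first start codon
                elif start is not None and codon in STOPS:
                    if (k - start) // 3 >= min_codons:
                        protein = "".join(CODON_TABLE.get(seq[p:p + 3], "X")
                                          for p in range(start, k, 3))
                        orfs.append((strand, frame + 1, start, k + 3, protein))
                    start = None  # close the ORF and keep scanning
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))
```

The example sequence contains one ORF in forward frame 3 that translates to MKFG. Raising min_codons above 4 filters it out, which mirrors the minimum-length setting offered by the online tools.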
Procedure:
1: Go to the NCBI ORF Finder website.
2: Input the DNA sequence in FASTA format.
3: Configure any additional settings if needed (e.g., minimum ORF length).
4: Run the tool and interpret the results by reviewing the ORFs and their positions.
5: Download or copy the sequences of interest for further analysis (e.g., BLAST for
homology).
Result: Paste your result
Conclusion:
Aim: Demonstration and prediction of the 3D structure of a protein using bioinformatics tools
Materials and Software Required: Computer with internet access; bioinformatics tools:
UniProt/PDB for sequence retrieval, SWISS-MODEL for homology modeling
Theory:
Proteins are essential macromolecules in all living organisms, playing vital roles in nearly all
cellular processes. Their functions include enzymatic catalysis, transport, signal transduction,
and structural support. The function of a protein is intricately linked to its 3D structure, which
is determined by the sequence of amino acids that make up the protein. Protein structures are
organized into four levels.
1. Primary Structure: The linear sequence of amino acids in a protein chain, linked by
peptide bonds. This sequence determines the way a protein will fold and, ultimately,
its function.
2. Secondary Structure: Localized conformations within the polypeptide chain, formed
through hydrogen bonding between backbone atoms. The two main types are:
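o Alpha-helix: A right-handed coiled structure stabilized by hydrogen bonds
between the backbone C=O of each residue and the N-H of the residue four
positions further along the chain.
o Beta-sheet: A pleated, sheet-like structure in which strands lie side by side
(parallel or antiparallel), with hydrogen bonds between adjacent strands.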
3. Tertiary Structure: The complete 3D arrangement of all atoms within a single
polypeptide chain, formed through interactions among amino acid side chains. This
includes hydrogen bonds, ionic interactions, hydrophobic interactions, and disulfide
bonds. The tertiary structure defines the protein's specific shape and function.
4. Quaternary Structure: The arrangement of multiple polypeptide chains (subunits) in a
multi-subunit protein. Each subunit may have its own tertiary structure, but together
they function as a single unit.
The 3D structure of a protein is crucial for understanding how it interacts with other
molecules, substrates, or ligands. Knowledge of the structure allows researchers to design
drugs, study disease mechanisms, and understand enzyme catalysis and receptor-ligand
interactions. Since experimental methods such as X-ray crystallography and NMR are time-
consuming and expensive, bioinformatics-based structure prediction has become invaluable
in studying protein structures.
Methods of Protein Structure Prediction
There are three primary methods for predicting protein structures computationally:
1. Homology Modeling: This method predicts the 3D structure of a target protein based
on the structure of a homologous protein (template) with a known structure. It works
well when there is significant sequence similarity between the target and template.
Homology modeling relies on the principle that similar sequences have similar
structures. A widely used online tool for homology modeling, SWISS-MODEL
allows users to input a target protein sequence, select a homologous template, and
automatically build a 3D model.
2. Threading (Fold Recognition): Used when no suitable template is available, but the
target protein may have a fold similar to known protein folds. Threading methods
compare the target sequence with a library of known structures to identify compatible
folds, even in cases with low sequence similarity.
3. Ab Initio Prediction: This approach does not rely on template structures and predicts
protein structures from the physical and chemical properties of amino acids. It is
used when no homologs or templates exist and is computationally intensive. Deep
learning tools such as AlphaFold, which learn from known structures and multiple
sequence alignments rather than from physics alone, have greatly improved the
accuracy of prediction for proteins without a close template.

Procedure: Homology modeling using SWISS-MODEL


Step 1: Sequence Retrieval
1. Open the PDB website ([Link]).
2. Enter the name of the protein of interest (e.g., Human Hemoglobin).
3. Download the FASTA format sequence for further analysis.
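A FASTA file is plain text: each record has a header line starting with ">" followed by one or more sequence lines. The sketch below shows one way to read such a file into Python before submitting it for modeling. The function name and the example record are hypothetical and serve only to illustrate the format.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of header -> sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # sequence may span several lines
    return {h: "".join(parts) for h, parts in records.items()}


# Illustrative record only; not a real database entry.
example = """>demo_1 illustrative record
MKTAYIAKQR
QISFVKSHFS
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```

Checking the sequence length and looking for unexpected characters at this point is a quick sanity check before pasting the sequence into a modeling server.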
Step 2: 3D Structure Prediction
1. Go to SWISS-MODEL ([Link]).
2. Input the protein sequence (FASTA format).
3. Run the template search and select a suitable template from the search results for
homology modeling.
4. Run the model-building process, which may take several minutes.
5. Assess the quality of the model using the Ramachandran plot.
6. Download the predicted 3D structure file in PDB format.
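A Ramachandran plot shows the backbone dihedral angles phi and psi of every residue. For residue i, phi is defined by the atoms C(i-1), N(i), CA(i), C(i) and psi by N(i), CA(i), C(i), N(i+1). The sketch below shows how a single dihedral angle can be computed from four atomic coordinates. It is a minimal illustration with hypothetical function names, assuming the coordinates have already been read from the PDB file as (x, y, z) tuples.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from five backbone atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

In a good model most residues fall in the favored regions of the plot (the alpha-helical and beta-sheet regions), so a large fraction of outliers is a sign that the model should be treated with caution.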

Result: Paste a picture of the predicted model along with the Ramachandran plot
Conclusion:
