
Biological Databases

Dr. Himanshu Avashthi

Assistant Professor
Department of Bioinformatics,
Faculty of Engineering & Technology,
Marwadi University, Rajkot
Biological Databases
• Biological databases are organized collections of biological data stored
electronically in a computer system.

• They are essential for storing, managing, and retrieving biological data for
research and analysis.

• They give scientists access to the vast amounts of information generated by
experiments and studies across the field of biology.
Importance of Databases
• Databases act as a storehouse of information.

• Databases are used to store and organize data in such a way that the
information can be easily retrieved through various search criteria.

• They facilitate the discovery of new biological insights from raw data.

Features of an Ideal Biological Database
• An ideal biological database should possess several key features to ensure it is useful,
reliable, and accessible for researchers and scientists. Some essential features are:

1. Comprehensive and Accurate Data

✓ Should cover a wide range of relevant data.
✓ Data should be correct, validated, and up to date to maintain scientific credibility.

2. User-Friendly Interface
✓ A user-friendly interface lets researchers find and use the data efficiently.
✓ Should offer advanced search with filters and easy retrieval options.
3. Regular Updates
✓ Should be updated frequently to incorporate new data and research findings.

4. Accessibility and Availability

✓ Should preferably be free and openly accessible to the scientific community to
maximize its utility.

5. Security and Privacy

✓ Should protect data from unauthorized access and breaches.

6. Non-redundant
✓ Should avoid unnecessary duplication of data.
Biological databases are classified based on:

1. Data sources

2. Data types

1. Based on the source of biological data, databases are of three types:

Primary Databases

Secondary Databases

Composite Databases (OWL, NRDB, UniProt)
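Composite databases such as NRDB merge entries from several primary sources and remove duplicates. The Python sketch below illustrates only the simplest form of that idea: exact duplicate sequences are collapsed into one entry that keeps cross-references to every source. The (source, accession, sequence) record format and the example accessions are hypothetical; real composite databases apply further rules, for example for fragments and near-identical sequences.

```python
def merge_nonredundant(records):
    """Collapse identical sequences drawn from several source databases.

    records: iterable of (source_db, accession, sequence) tuples
             (a hypothetical input format, for illustration only).
    Returns a dict mapping each unique sequence to the list of
    "source_db:accession" cross-references that share it.
    Dicts keep insertion order, so sequences appear in first-seen order.
    """
    merged = {}
    for source_db, accession, sequence in records:
        key = sequence.upper()  # treat case differences as the same sequence
        merged.setdefault(key, []).append(f"{source_db}:{accession}")
    return merged


# Hypothetical records: two sources contain the same sequence.
records = [
    ("SourceA", "A0001", "MKTAYIAKQR"),
    ("SourceB", "B0042", "MKTAYIAKQR"),
    ("SourceB", "B0043", "MSEQNNTEMT"),
]

for sequence, xrefs in merge_nonredundant(records).items():
    print(sequence, xrefs)
```

Keeping the cross-reference list, rather than discarding duplicates outright, means each merged entry can still be traced back to the primary records it came from.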

1. Primary Databases
• Primary databases are also called archival databases.

• They are populated with experimentally derived data such as nucleotide
sequences, protein sequences, or macromolecular structures.

• Researchers submit experimental results directly into the database.

• The entered data are not curated: no modifications are made to them.
• Each record is given an accession number when it is entered into the database.

• Once assigned, the accession number never changes, and the record becomes part
of the scientific record. If the submitter later revises the sequence, the
revision is released under the same accession number with an incremented
version number (for example, .1 becomes .2).

• The accession number identifies each record uniquely, so the same data can
always be retrieved with it.
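Retrieval by accession number can be sketched with NCBI's E-utilities `efetch` endpoint. The Python sketch below assumes INSDC-style identifiers of the form `ACCESSION.VERSION`: it splits an identifier into its parts and builds (but does not send) an `efetch` URL requesting the record as FASTA text. `U49845.1` is used only as an example identifier; consult the E-utilities documentation for usage rules before sending real requests.

```python
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version).

    The version is returned as an int, or None if no '.VERSION' suffix is given.
    """
    accession, sep, version = identifier.partition(".")
    if not accession:
        raise ValueError(f"empty accession in {identifier!r}")
    if sep and not version.isdigit():
        raise ValueError(f"version must be an integer in {identifier!r}")
    return accession, (int(version) if sep else None)


def efetch_url(identifier, db="nuccore", rettype="fasta"):
    """Build an E-utilities efetch URL for one record (no request is sent)."""
    params = {"db": db, "id": identifier, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


print(split_accession("U49845.1"))  # ('U49845', 1)
print(efetch_url("U49845.1"))
```

Requesting a specific version (`U49845.1`) returns that exact revision, whereas the bare accession refers to the latest version of the record.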

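Primary Nucleotide Sequence Databases

• The primary repositories of nucleotide sequences are:

• GenBank (National Center for Biotechnology Information, NCBI, USA)

• EMBL-EBI (European Molecular Biology Laboratory's European Bioinformatics
Institute, UK), which hosts the European Nucleotide Archive (ENA)
• DDBJ (DNA Data Bank of Japan)

• The three databases synchronize daily, and their unique accession numbers are
managed consistently.

• Because researchers submit data directly, the same sequence may be deposited
more than once, so these databases contain considerable redundancy.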

GenBank ([Link])

• GenBank is the NIH genetic sequence database: an annotated collection of all
publicly available DNA sequences.
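EMBL-EBI ([Link])

• The European Nucleotide Archive (ENA) at EMBL-EBI is a repository providing
free and unrestricted access to annotated DNA and RNA sequences.

• Data arrive at the ENA from various sources, such as submitted raw sequencing
data, sequence assembly information, and routine exchange with the other members
of the International Nucleotide Sequence Database Collaboration (INSDC).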
DDBJ ([Link])

• DDBJ collects nucleotide sequence data as a member of INSDC and provides freely
available nucleotide sequence data to support research activities in life sciences.
International Nucleotide Sequence Database Collaboration (INSDC)

• INSDC integrates the information of the NCBI, EMBL-EBI, and DDBJ databases.

• This information is exchanged, updated, and synchronized daily.

Primary Protein Sequence Databases
1. PIR-PSD: (Protein Information Resource-International Protein Sequence Database)

2. SWISS-PROT

3. TrEMBL (Translated EMBL)

1. PIR-PSD ([Link])
• The world's first comprehensive, functionally annotated, and expertly curated
protein sequence database classified by protein superfamily.
2. SWISS-PROT ([Link])
• A manually curated (annotated) protein sequence database that provides high
levels of annotation on each protein's function, domain structure, and
post-translational modifications, with minimal redundancy.
3. TrEMBL ([Link])
• A computationally annotated protein sequence database.
• It contains translations of all coding sequences in the EMBL nucleotide sequence
database that are not yet integrated into Swiss-Prot.
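TrEMBL entries are produced by translating coding sequences from the nucleotide database. A minimal Python sketch of that translation step, assuming the standard genetic code (NCBI translation table 1), a DNA coding sequence that starts in frame, and translation that stops at the first stop codon. Real annotation pipelines also handle alternative genetic codes, ambiguous bases, and annotated reading frames; this sketch raises `KeyError` on any base other than A, C, G, or T.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed in TCAG order:
# codon (b1, b2, b3) maps to AMINO_ACIDS[16*i1 + 4*i2 + i3], '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate an in-frame DNA coding sequence, stopping at the first stop codon.

    Trailing bases that do not form a complete codon are ignored.
    """
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCATTGTAATGGGCCGCTGA"))  # MAIVMGR
```

Building the 64-entry table from a single 64-letter string keeps the code compact; swapping in another translation table only requires replacing `AMINO_ACIDS`.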
• PIR, Swiss-Prot, and TrEMBL were merged into a single resource, UniProt
(Universal Protein Resource), maintained by the UniProt Consortium.
UniProt ([Link])
• It comprises three sub-databases:
1. The UniProt Knowledgebase (UniProtKB): includes Swiss-Prot (manually
annotated) and TrEMBL (automatically annotated).
It is the main source of functional protein information, providing accurate,
consistent, and detailed annotations.

2. The UniProt Archive (UniParc):

A comprehensive, non-redundant archive that stores every protein sequence,
annotated or unannotated, from the major public databases. It is updated daily.

3. The UniProt Reference Clusters (UniRef):

Offers clustered sets of sequences at different identity levels to reduce redundancy
and improve search efficiency across UniProtKB and UniParc.
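The idea behind UniRef's identity levels can be illustrated with a greedy clustering sketch in Python. This is a deliberately simplified stand-in, not UniRef's actual procedure: percent identity is computed position by position over the shorter of two sequences without any alignment, sequences are visited longest first, and each one joins the first cluster whose representative it matches at or above the threshold (for example 0.9 or 0.5, loosely mirroring the 90% and 50% levels). The example sequences are made up.

```python
def simple_identity(a, b):
    """Fraction of identical residues, compared position by position over the
    shorter sequence (no alignment, no gaps: a simplification)."""
    length = min(len(a), len(b))
    if length == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / length


def greedy_cluster(sequences, threshold):
    """Greedy clustering: sequences are visited longest first, and each joins the
    first cluster whose representative it matches at >= threshold identity;
    otherwise it becomes the representative of a new cluster."""
    clusters = []  # list of (representative, members) pairs
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if simple_identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:  # no existing cluster was similar enough
            clusters.append((seq, [seq]))
    return clusters


# Made-up sequences of equal length.
seqs = [
    "MKTAYIAKQRQISFVKSHFS",
    "MKTAYIAKQRQISFVKSHFA",   # 19/20 = 95% identical to the first
    "MKTAYIAKQR" + "A" * 10,  # 10/20 = 50% identical to the first
]

for threshold in (0.9, 0.5):
    clusters = greedy_cluster(seqs, threshold)
    print(threshold, [rep for rep, _ in clusters])
```

Lowering the threshold merges more sequences into each cluster, which is why a lower identity level yields fewer, larger clusters and a smaller set to search.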
2. Secondary Databases
• Secondary databases contain data derived from the analysis of primary data.

• In other words, they store the interpreted or processed results of primary data.

• These are also known as curated or annotated databases.

• Secondary databases often integrate information from multiple sources, including
other databases and the scientific literature.

• They are highly curated, often combining computational tools, manual analysis,
and interpretation to generate new information or knowledge from existing
scientific data.
Secondary Protein Sequence Databases
1. PROSITE

2. Pfam

3. InterPro
1. PROSITE ([Link])
• It is a manually curated database of protein domains, families, and
functional (catalytic) sites.
2. Pfam ([Link])
• A large collection of curated protein families, aiming to provide complete and
accurate family classification.
• Rather than relying on pairwise BLAST searches, Pfam represents each family as a
profile hidden Markov model (HMM), which gives greater weight to matches at
conserved positions and therefore detects distant homologs more sensitively.
3. InterPro ([Link])
• It provides functional analysis of proteins by classifying them into families and
predicting domains and important functional sites.
• To classify proteins, InterPro uses predictive models, known as signatures,
provided by several different databases (referred to as member databases) that
make up the InterPro consortium.
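PROSITE, listed above, describes many functional sites as short sequence patterns. The Python sketch below converts a pattern written in PROSITE syntax into a regular expression and scans a protein sequence with it. It handles only the common elements (single residues, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, repeat counts such as `x(2,4)`, and the `<`/`>` terminal anchors); the example pattern is written in the style of the C2H2 zinc-finger entry, and the scanned sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern (common elements only) into a Python regex."""
    pattern = pattern.strip().rstrip(".")  # patterns end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # '<' anchors the pattern to the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # '>' anchors the pattern to the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as "(3)" or "(2,4)".
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":  # any residue
            regex = "."
        elif core.startswith("[") and core.endswith("]"):  # any listed residue
            regex = core
        elif core.startswith("{") and core.endswith("}"):  # none of the listed residues
            regex = "[^" + core[1:-1] + "]"
        elif len(core) == 1 and core.isalpha():  # one specific residue
            regex = core
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (start, end, text) for each non-overlapping match, 1-based and inclusive."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group()) for m in compiled.finditer(sequence)]


# Illustrative pattern in the style of the C2H2 zinc-finger entry.
ZINC_FINGER_LIKE = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
# Made-up sequence containing one match.
sequence = "MS" + "CAAC" + "GGG" + "F" + "K" * 8 + "H" + "TTT" + "H" + "QE"

print(prosite_to_regex(ZINC_FINGER_LIKE))  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
print(find_sites(ZINC_FINGER_LIKE, sequence))
```

Note that `finditer` reports only non-overlapping matches; a dedicated scanning tool may report overlapping sites as well.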
