Paper ZooMT-604
Unit IV
Topic: Biological databases
Dr. K.M. Mazumder
Department of Zoology
Dhemaji College, Assam
What is Bioinformatics?
• Bioinformatics is the application of Information
technology to store, organize and analyze the
vast amount of biological data which is
available in the form of sequences and
structures of proteins and nucleic acids.
How does NCBI define it?
• “Bioinformatics is the field of science in which
biology, computer science, and information
technology merge into a single discipline.
• There are three important sub-disciplines within
Bioinformatics:
– Development of new algorithms and statistics which
assess relationships among members of large data sets,
– Analysis and interpretation of various types of data
including nucleotide and amino acid sequences, protein
domains, and protein structures; and
– Development and implementation of tools that enable
efficient access and management of different types of
information.”
Biological Databases
• Biological database design, development, and
long-term management are a core area of the
discipline of bioinformatics.
• A biological database is a collection of data
that is organized so that its contents can
easily be accessed, managed, and updated.
The activity of preparing a database can be
divided into:
– Collection of data in a form which can be easily
accessed
–
Types of Databases
• Primary database: contains information on
the sequence or structure alone. E.g. Swiss-Prot
& PIR for protein sequences, GenBank &
DDBJ for nucleotide sequences and the Protein
Data Bank for protein structures.
• Secondary database: contains information
derived from the primary database. E.g.
SCOP, CATH, PROSITE, eMOTIF, etc.
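The primary/secondary split above can be pictured as a small lookup table. The following is only a minimal sketch: the dictionary layout, the short content descriptions and the function name databases_of_type are illustrative assumptions, not part of any database's own interface.

```python
# Classification follows the two bullets above; the table layout and the
# helper name are hypothetical, chosen only for illustration.
DATABASES = {
    "Swiss-Prot": ("primary", "protein sequences"),
    "PIR": ("primary", "protein sequences"),
    "GenBank": ("primary", "nucleotide sequences"),
    "DDBJ": ("primary", "nucleotide sequences"),
    "PDB": ("primary", "protein structures"),
    "SCOP": ("secondary", "structural classification"),
    "CATH": ("secondary", "structural classification"),
    "PROSITE": ("secondary", "sequence patterns and profiles"),
    "eMOTIF": ("secondary", "sequence motifs"),
}


def databases_of_type(kind: str) -> list:
    """Return the names of all databases tagged 'primary' or 'secondary'."""
    return [name for name, (k, _) in DATABASES.items() if k == kind]


print(databases_of_type("secondary"))
```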
Sequence databases
Nucleotide and protein sequence databases represent
the most widely used and some of the best
established biological databases.
These databases serve as repositories for wet lab
results and are the primary source of experimental
results. The major public data banks that hold
DNA and protein sequences are GenBank in the
USA, EMBL (European Molecular Biology
Laboratory) in Europe and DDBJ (DNA Data Bank
of Japan) in Japan.
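As an illustration of how a record in such a repository is addressed, the sketch below builds a request URL for NCBI's E-utilities EFetch service. It only constructs the URL and does not contact the server. The function name genbank_fetch_url is hypothetical, and the accession U49845 is used purely as an example identifier.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_fetch_url(accession: str, rettype: str = "gb") -> str:
    """Build an EFetch URL for one nucleotide record (no network access)."""
    params = {
        "db": "nuccore",      # nucleotide collection
        "id": accession,      # accession number of the record
        "rettype": rettype,   # "gb" for GenBank flat file, "fasta" for FASTA
        "retmode": "text",    # plain-text response
    }
    return EUTILS_EFETCH + "?" + urlencode(params)


# Example accession, chosen only for illustration.
print(genbank_fetch_url("U49845"))
```

Opening the printed URL in a browser, or passing it to any HTTP client, should return the record as text; changing rettype to "fasta" requests only the sequence.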
Structure databases
• Knowledge of protein structures and of molecular
interactions is key to understanding protein functions and
complex regulatory mechanisms underlying many
biological processes.
• Protein Data Bank: PDB is the single worldwide archive
of structural data of biological macromolecules, established
at Brookhaven National Laboratory in 1971. It contains
structural information on macromolecules determined
by X-ray crystallography and NMR methods. PDB is
maintained by the Research Collaboratory for Structural
Bioinformatics (RCSB).
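Each entry in the legacy PDB text format states how the structure was determined in its EXPDTA record. The sketch below reads that record from a file's text. The function name experimental_methods and the two one-line sample entries are assumptions made for illustration; a real entry contains many more records.

```python
def experimental_methods(pdb_text: str) -> list:
    """Collect the technique(s) named in the EXPDTA record(s) of a PDB file."""
    methods = []
    for line in pdb_text.splitlines():
        # The record name occupies columns 1-6 of every line.
        if line[:6].strip() == "EXPDTA":
            # Columns 11-79 hold the technique(s), separated by semicolons.
            for part in line[10:79].split(";"):
                if part.strip():
                    methods.append(part.strip())
    return methods


# Minimal illustrative snippets, not complete PDB entries.
xray_entry = "EXPDTA    X-RAY DIFFRACTION\n"
nmr_entry = "EXPDTA    SOLUTION NMR\n"

print(experimental_methods(xray_entry))
print(experimental_methods(nmr_entry))
```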
Pathway databases
• Development of metabolic databases derived from the
comparative study of metabolic pathways will cater to
industrial needs in a more efficient manner and further the
growth of systems biotechnology. Some examples of
pathway databases are KEGG, BRENDA and BioCyc.
• KEGG: KEGG is the primary resource for the Japanese
GenomeNet service that attempts to define the relationships
between the functional meanings and utilities of the cell or
the organism and its genome information.
• KEGG contains three databases: PATHWAY, GENES, and
LIGAND. The PATHWAY database stores computerized
knowledge on molecular interaction networks.
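KEGG entries can also be addressed programmatically through the KEGG REST service at rest.kegg.jp. The sketch below only assembles request URLs and does not contact the server. The helper name kegg_url is hypothetical, and the organism code hsa and the pathway identifier hsa00010 are used only as examples.

```python
KEGG_REST = "https://rest.kegg.jp"

# Operations offered by the KEGG REST service.
OPERATIONS = {"info", "list", "find", "get", "conv", "link", "ddi"}


def kegg_url(operation: str, *args: str) -> str:
    """Join an operation and its arguments into a KEGG REST URL."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown KEGG operation: {operation}")
    return "/".join([KEGG_REST, operation, *args])


# List the pathways of one organism, then address a single pathway entry.
print(kegg_url("list", "pathway", "hsa"))
print(kegg_url("get", "hsa00010"))
```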
Let's look at the following databases
• NCBI (database of databases)
• GenBank
• PubChem Compounds
• PubMed
• KEGG pathways
• PDB
Thank you
Wish you good health