Technium BioChemMed Vol. 2, Issue 4 pp. 96-105 (2021)
ISSN: 2734-7990
[Link]

Bioinformatics Storing Databases

Raghad M. M. Abed1, Yusra A. Y. Al-Najjar2*

1 Medical Analysis, Al-Balqa’a Applied University, As Salt, Jordan, ORCID ID: [Link]
email: [Link]@[Link]

2 College of Computer Science, Taibah University, Al-Madinah Al-Munawarah, Saudi Arabia,
ORCID ID: [Link] email: yalnajar@[Link]

*Corresponding author: Yusra Al-Najjar, yalnajar@[Link]

Abstract. Genome sequencing projects have recently produced an exceptional class of data that
requires huge databases. Bioinformatics, the field that employs computational approaches to answer
biological questions, plays a major role in using this wealth of sequence information to study
basic medical and biological problems. The challenge facing bioinformatics scientists is to help
discover genes and to design molecular models, site-directed mutagenesis, and other experiments
that reveal unknown relationships between the structure and function of genes and proteins. This
challenge has grown with the huge amount of data generated so far by the Human Genome Project and
other systematic sequencing efforts. Bioinformatics solves biological problems using the available
data; it is concerned with creating databases and with predicting the outcome of laboratory
experiments.

Keywords. Bioinformatics, chromosomes, databases, gene, genome, protein, DNA sequencing

1. Introduction
Bioinformatics integrates biology with computer science and information technology. It comprises
the statistical, mathematical, and computing methods used to solve biological problems using DNA
and amino acid sequences and their related information. See figure 1. In general, bioinformatics
can be described as a management information system for molecular biology, and it has many
practical applications.

Fig. 1: fields involved in Bioinformatics

Bioinformatics differs from computational biology. Bioinformatics concentrates on the structure,
function, and analysis of genes and genomes and their products. It is an interdisciplinary field
concerned with developing new methods and software tools for understanding biological data.
Because bioinformatics is limited to gene and genome sequences and their corresponding products,
it can be considered computational molecular biology. Computational biology, in contrast, includes
all biological fields that require computation and, in a narrower view, is restricted to the
theoretical development of the algorithms used in bioinformatics. Bioinformatics also allows
scientific researchers to access biological databases freely. This paper discusses how computer
science and technology are employed for biology.

2. General Review

2.1. Why bioinformatics?

The goal of bioinformatics is to better understand the living cell and how it works at the
molecular level by analyzing raw molecular sequences and structural data. Bioinformatics studies
could generate a new view of the cell. Cell functions are best understood through the flow of
genetic information described by the central dogma of biology, in which DNA is transcribed into
RNA, which is then translated into protein [1]. See figure 2.
The functions of the cell are performed mainly by proteins, whose capabilities are determined by
their sequences. So, solving problems using sequences and structures has proved to be helpful [2].
Bioinformatics is important not only for basic genomic and molecular biology research; it also has
a great effect on many other fields of biotechnology and biomedical sciences. Its applications
include knowledge-based drug design, forensic DNA analysis, and agricultural biotechnology [2].

Fig. 2: Central Dogma [2]
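
The first step of the central dogma can be sketched in a few lines of Python. This is a minimal illustration, assuming the input is the coding (sense) strand of the DNA, so that the mRNA has the same sequence with T replaced by U; the function name `transcribe` is invented for the example.

```python
def transcribe(dna: str) -> str:
    """Return the mRNA matching a DNA coding strand (T is replaced by U)."""
    dna = dna.upper()
    unknown = set(dna) - set("ACGT")
    if unknown:
        # Reject anything that is not one of the four DNA nucleotides.
        raise ValueError(f"unexpected symbols in DNA: {sorted(unknown)}")
    return dna.replace("T", "U")


mrna = transcribe("ATGGAAGTATTT")
print(mrna)  # AUGGAAGUAUUU
```

The second step, translating the mRNA into protein, needs the genetic code and is treated in section 4.1.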

2.2 Proteins
Proteins are macromolecules made of sequences of smaller units, amino acids, which differ in their
structure and characteristics. Humans need 20 different types of amino acids, which are classified
into 2 groups based on whether the cells can make them. The essential amino acids (histidine,
isoleucine, leucine, lysine, methionine, phenylalanine, threonine, tryptophan, and valine) cannot
be produced by the body and must be obtained from the diet. The non-essential amino acids
(alanine, arginine, asparagine, aspartic acid, cysteine, glutamic acid, glutamine, glycine,
proline, serine, and tyrosine) are those that the cell genome has the genetic recipe to make [3].
See figure 3.

Fig. 3: Types of amino acids [3]

2.3 Amino acids

Amino acids are the monomers of proteins. Each amino acid has a central carbon atom bound to a
carboxyl group, an amine group, a hydrogen atom, and an R side chain that gives the amino acids
their variety. These amino acids are encoded in the human genome, meaning that cells join them
into proteins through transcription and translation. Each amino acid is specified in the DNA
nucleotide code by one or more sequences of 3 nucleotides, each called a codon. See figure 4.

Fig. 4: Amino acid general structure [3]

2.4 Gene, genome, and DNA

The word genome describes the whole genetic information in our cells; this information is coded in
the DNA sequence. DNA uses 4 basic nucleotides to write the information (four types of chemical
building blocks: adenine, thymine, cytosine, and guanine, abbreviated A, T, C, and G). The
arrangement of these nucleotides provides the information about specific functional proteins, and
a specific arrangement of nucleotides is called a gene. During its life, the cell passes through
different stages, called the cell cycle, to develop and multiply. Normally the DNA strands are
found unpacked within the nucleus, but at a certain stage they are packed using specific proteins
(histones) to form defined chromosomes (X-shaped structures) that ease the separation when cells
multiply. Humans have 23 pairs of chromosomes within their cells, representing the genome. See
figure 5.
The information carried within the genome is varied: it holds all the instructions and information
that helped you to develop from a single cell into a full person. It guides growth, helps organs
to do their jobs, and repairs itself when damaged. The genome is unique to each organism [4]. DNA
in the cell is not one continuous long molecule; it is divided into parts of uneven lengths. These
parts are packed into bundles at some points of the cell life cycle; these bundles are the
chromosomes described above. Every species has a specific number of chromosomes, e.g., a human has
46 chromosomes (23 pairs), and rice has 24 chromosomes. See figure 6.

Fig. 5: Genetic arrangement in the cell [4]

Fig. 6: pairs of chromosomes [4]

3. Computer Science Behind Sequencing

In 1950, Alan Turing’s Automatic Computing Engine (ACE) was introduced, three years before the
publication of the structure of DNA. ACE was an early electronic serial stored-program computer.
In 1970, E. F. Codd introduced a data model that proved to be essential for managing huge amounts
of data. Later, in 1977, Frederick Sanger’s DNA sequencing method was published; more than two
decades after that, with the help of computer scientists, the first draft of the human genome was
announced [5].
Proteins are of great interest because many disease-causing mutations occur in them. Sanger was a
chemist who said that biology and chemistry were both needed for the Human Genome Project to
succeed. The third strand now is computer science [6]. Originally, the amino acids coded by a DNA
sequence were worked out by reading the DNA sequence from a gel and then translating it to amino
acids by hand, which was slow and tedious. Computers began to appear in laboratories by 1976 and
were soon used closely by chemists and biologists in producing DNA sequence data. Converting the
triplet DNA code into an amino acid sequence for printing was easily done using computer
software [7].
After some years, labs all over the world started to produce more and more sequence data, and
scientists started to obtain sequences from other labs for comparison. For example, beetle
sequences from different labs were compared to see how closely the beetles were related. Sequence
records used to be printed in journals, but once computers were available, people started to share
the data over networks. Michael Ashburner, a geneticist at Cambridge University, tried in 1980 to
compare his sequence data with data at Stanford University. He used the internet for this, and it
exposed a communication problem: the network protocols used in the UK were different from the
protocols used in the US [8]. A shared warehouse of data was a good solution for such data-sharing
problems. So, in 1981 the “European Molecular Biology Laboratory (EMBL) electronic library” for
nucleotide sequence data was established in Heidelberg. Because this warehouse grew rapidly, a
database management system was needed [9].
Now these databases, and the sequence data in them, are freely available over the internet. Other
databases were founded in the US and Japan, so people can share data from their personal
computers. With the completion of the Human Genome Project in 2003, the sequencing of the human
genome was accomplished for the first time. This sequencing was very expensive: it cost around one
billion dollars and took 13 years to finish. Nowadays, sequencing a human genome can cost around
one thousand dollars and can be accomplished in less than two days. Besides scientists’ knowledge,
the major factor in the progress of DNA sequencing technology was the advancement in computer
science and engineering [10].
The storage capacity achieved by computer engineering made it possible to store the huge amount of
DNA sequencing data, and processing speed has come a long way since 2003. DNA sequencing methods
have also become more advanced. The method used for the Human Genome Project, Sanger sequencing,
reads one small fragment of DNA at a time; these small pieces are then put together to assemble
the full genome [6]. The technique used nowadays is Next-Generation Sequencing (NGS), which works
in parallel. NGS processes many micro-scale reactions simultaneously and generates about 15,000
times more data per day than a Sanger sequencer [10]. See figure 7.

Fig. 7: Sanger versus next-generation sequencing [10]
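
The idea of putting small fragments together to assemble a longer sequence can be illustrated with a toy greedy overlap merge. This is only a sketch under strong simplifying assumptions (error-free reads from one strand, unambiguous overlaps of at least `min_len` bases); it is not the method used by real assemblers, and the function names and fragments are invented for the example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the pair of fragments with the largest overlap."""
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left; the remaining reads cannot be joined
        merged = reads[i] + reads[j][size:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return max(reads, key=len)


fragments = ["ATGGAAGTATT", "GTATTTAAAGCG", "AAGCGCCACC"]
print(greedy_assemble(fragments))  # ATGGAAGTATTTAAAGCGCCACC
```

Comparing every pair of reads is quadratic in the number of reads, which is one reason why assembling millions of reads depends on the advances in processing and storage described in this section.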

Due to this progress in processing, an NGS machine can sequence an entire genome in a day, whereas
a Sanger sequencer required years. More processing requires more storage capacity, since an NGS
machine can produce up to 1 terabyte of data per day. Upgrades in RAM (Random Access Memory) also
played a big role in handling the generated information. Without these fast computational
advances, the current low cost of DNA sequencing would not have been possible.

3.1 Computational Biology

After the genome is sequenced, processing starts. In this operation, millions of data points are
analyzed in search of variations (mutations) within the genome. Analyzing the genome and detecting
mutations can help identify the cause of several diseases.
Doing this task manually would be impossible, whereas with computational biology doctors and
scientists can detect mutations easily. To derive meaning from the sequenced data, computational
biologists use different techniques, including pattern-matching algorithms, mathematical modeling,
and image enhancement. They might also use simulations to find out how biological systems will
behave in different environments. For example, a simulation can show how cancer cells react to
different drug treatments. Such simulations and models might one day lead to new treatments for
several diseases [11].
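
As a rough illustration of searching for variations, the sketch below compares a sample sequence with a reference position by position and reports single-nucleotide differences. It assumes both sequences are already aligned and of equal length, which real variant-detection pipelines do not require; the function name and both sequences are made up for the example.

```python
def point_mutations(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only compares sequences of equal length")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != alt_base
    ]


reference = "ATGGAAGTATTTAAAGCG"
sample = "ATGGACGTATTTAAGGCG"
print(point_mutations(reference, sample))  # [(6, 'A', 'C'), (15, 'A', 'G')]
```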

3.2 Algorithms used in Bioinformatics

• Comparing sequences: comparing long sequences while allowing for insertion, deletion, and
mutation of symbols
• Constructing evolutionary (phylogenetic) trees: comparing the sequences of different organisms
and building trees based on their similarities
• Detecting patterns in sequences: searching for genes in DNA or in sequences of amino acids
• Inferring cell regulation: modelling the cell from the data
• Determining 3D structures: for example, inferring RNA shape from sequences, and protein shape
from amino acid sequences
• Using scripting languages: using a script on the internet to analyze the data [12].
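
The first item in the list, comparing sequences while allowing for insertion, deletion, and mutation, can be sketched with the classic edit (Levenshtein) distance computed by dynamic programming. This is one simple technique chosen for illustration, with every operation costing 1; the source does not name a specific algorithm, and alignment tools used in practice apply more elaborate scoring schemes.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(
                previous[j] + 1,         # delete symbol_a
                current[j - 1] + 1,      # insert symbol_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


print(edit_distance("ATGGAAGTA", "ATGAAGTTA"))  # 2
```

Only two rows of the table are kept at a time, so memory grows with the length of one sequence rather than with the product of both lengths.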

4. Bioinformatics Databases and Cloud Storage

Due to the growth in genetic information, different cloud storage platforms and online databases
have appeared that allow these data to be analysed and accessed. These platforms provide the
storage space required for the DNA of about 1 in 25 Americans to be stored and referenced when
needed. Storing data in cloud-based databases allows global teams to work at the same time on the
same data and to find solutions to the problems that face them.
In the field of law, DNA databases play a big role. In April 2018 in the USA, the Golden State
Killer was arrested thanks to genetic information that one of his relatives had shared on the
GEDMatch genealogy website [13]. Many US states, as well as other countries, collect DNA
information from crime scenes and arrest people for different crimes based on forensic DNA
databases. Despite the role of DNA databases in catching many criminals and sending them behind
bars, there are still many questions concerning privacy [14]. A database is an archive that is
used to store and organize data so that it can easily be retrieved using search criteria.
Databases hold data as records that are structured in a way that makes retrieval easy. The
software that manages databases is called a database management system.
A database that holds biological data, such as data from molecular biology and bioinformatics, is
called a biological database. These databases work like other databases: they allow data to be
indexed and redundancy to be removed. They are the central component of bioinformatics [15].
Biological data differ from other data: they are complicated, vast, and incomplete, and have many
exceptions. So, many databases were established and annotated to make sure that the results are
clear and unambiguous. A good database should contain up-to-date information. Biological databases
allow scientists to retrieve biological sequences, structures, metabolic interactions, functional
relationships, molecular actions, motifs, homologs, and protein families [2]. PubMed is one of the
best-known databases of biomedical literature. It contains the abstracts and the text of articles
from about 4000 journals [2]. Modern biological research, especially genomic studies, requires
databases. There are two types of biological databases: primary and secondary. A primary database
holds the sequence and structural information, whereas a secondary database is derived from
analysing the primary data. Secondary databases are important for characterizing protein
functions [16].
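
The ideas of indexing, removing redundancy, and deriving secondary data from primary records can be sketched with Python's built-in `sqlite3` module. The table names, accession numbers, and sequences below are invented for the illustration and do not reflect the schema of any real biological database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Primary" table: one record per submitted sequence.
conn.execute("""
    CREATE TABLE sequence (
        accession TEXT PRIMARY KEY,   -- unique key rejects redundant records
        organism  TEXT NOT NULL,
        residues  TEXT NOT NULL
    )""")
# An index makes retrieval by a search criterion (here, organism) efficient.
conn.execute("CREATE INDEX idx_sequence_organism ON sequence (organism)")

records = [
    ("DEMO0001", "Homo sapiens", "ATGGAAGTATTTAAAGCG"),
    ("DEMO0002", "Oryza sativa", "ATGCCACCTATTGGGATA"),
    ("DEMO0001", "Homo sapiens", "ATGGAAGTATTTAAAGCG"),  # duplicate submission
]
conn.executemany("INSERT OR IGNORE INTO sequence VALUES (?, ?, ?)", records)

# "Secondary" table: data derived by analysing the primary records.
conn.execute("""
    CREATE TABLE sequence_stats AS
    SELECT accession, LENGTH(residues) AS length FROM sequence""")

rows = conn.execute(
    """SELECT s.accession, t.length
       FROM sequence AS s JOIN sequence_stats AS t ON s.accession = t.accession
       WHERE s.organism = ?""",
    ("Homo sapiens",),
).fetchall()
print(rows)  # [('DEMO0001', 18)]
```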
Examples of primary biological databases are [17]:
• ‘GenBank (Genetic Sequence Databank)’: this bank contains nucleotide sequence data. Its records
are ASCII text files that can be read by humans as well as computers. It is an open-access
database.
• ‘EMBL (European Molecular Biology Laboratory)’: a database of DNA and RNA sequences that are
collected from patent offices and submitted by researchers. This database is maintained by the EBI
(European Bioinformatics Institute).
• Swiss-Prot: a curated protein sequence database.
Examples of secondary biological databases are:
- Motif Databases:
• A protein sequence motif is a set of conserved amino acid residues. These are vital for protein
functions.
• PRINTS: a database of protein fingerprints
- Domain Databases:
• A protein domain is a compact unit that forms a three-dimensional structure
• SMART: a sensitive tool for domain identification
- 3D Structure Databases:
• ‘PDB (Protein Data Bank)’: a database of the 3D structures of biological macromolecules,
determined by X-ray crystallography and other experimental methods
• ‘SCOP (Structural Classification of Proteins database)’: classifies protein 3D structures
- Gene Expression Databases:
• ‘GEO (Gene Expression Omnibus)’: an online repository of gene expression data that can be
browsed, queried, and retrieved.
• ‘GXD (Gene Expression Database)’: a community resource for gene expression information
This is just a selection of the available databases, which keep growing day by day.
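
To make the notion of a sequence motif from the list above concrete, the sketch below searches a protein sequence for a short conserved pattern with a regular expression. The pattern is the N-glycosylation consensus N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline); the protein sequence and the function name are hypothetical, and this is not how any of the listed databases is queried.

```python
import re

# N-{P}-[ST]-{P} written as a regular expression over one-letter amino acid codes.
N_GLYCOSYLATION = re.compile(r"N[^P][ST][^P]")


def find_motif(protein: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start position, matched residues) for each non-overlapping hit."""
    return [(match.start() + 1, match.group()) for match in pattern.finditer(protein)]


protein = "MKNGSAPLNPTQ"  # made-up sequence for the example
print(find_motif(protein, N_GLYCOSYLATION))  # [(3, 'NGSA')]
```

The second asparagine, at position 9, is not reported because it is followed by proline, which the pattern excludes.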

4.1 Turning DNA sequences into protein sequences

The process of turning DNA into mRNA and then into protein can be thought of as decoding the
instructions for making proteins, and it involves mRNA as well as tRNA. Once the sequence of
nucleotides is known, it can be translated into the corresponding protein sequence using the
genetic code. This is the same way that cells generate protein sequences, and it is called
translating DNA into protein. The genetic code table shows how triplets of the 4 nucleotides map
to the set of 20 amino acids; it describes the rules by which the coded data are translated into
proteins. The diagram shows the DNA codon table as a chart [18]. See figure 8.

Fig. 8: Genetic Code Chart [22]
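
The size of the genetic code follows from simple counting: 4 nucleotides in 3 positions give 4 × 4 × 4 = 64 codons, which map to 20 amino acids plus stop signals. The sketch below builds the standard genetic code compactly to check these numbers; the 64-letter string lists the amino acids in T, C, A, G order for each codon position, with `*` marking a stop codon. The variable names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
amino_acid_kinds = set(CODON_TABLE.values()) - {"*"}

print(len(CODON_TABLE))       # 64
print(len(amino_acid_kinds))  # 20
print(stop_codons)            # ['TAA', 'TAG', 'TGA']
```

Because 64 codons encode only 20 amino acids, most amino acids are represented by more than one codon.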

4.1.1 How to translate DNA to Protein using Genetic Code Chart

- Get the DNA string, e.g.:
‘ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG’
- Read the sequence 3 nucleotides (a triplet) at a time, e.g.:
‘ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G...’
- Use the genetic code chart to read the amino acid that corresponds to each triplet (codon).
• Start from the center, which represents the first character of the triplet
• The second circle represents the second character
• The third circle represents the last character
- Now, the protein sequence that corresponds to the DNA sequence is ready, where:
• ATG = M, GAA = E, GTA = V, and so on, so the result will be:
‘M E V F K A P P I G I STOP’
TAA, TAG, and TGA stop the translation process, so they are called termination signals. This
translation can be done in any programming language; the following is a simple Python program
that translates DNA into protein.

4.1.2 DNA to Protein Translation using Python

- Start with a text file containing the DNA sequence mentioned above
- Open the text file
- Read the data stored in it
- Start translating according to the given algorithm

Once the start of the protein-coding region is determined in the DNA sequence, the software starts
generating the corresponding protein sequence. DNA can be processed into a virtual protein
sequence using simple software. See the following code:
# Path of the text file holding the DNA sequence. The file name appears only as
# "[Link]" in the source, so it is kept as-is; replace it with your own file.
DNA_FILE = '[Link]'

# open the file and read the DNA sequence, dropping whitespace and newlines
with open(DNA_FILE, 'r') as file:
    dna = ''.join(file.read().split()).upper()
print("DNA Sequence:", dna)
# DNA translation table
protein = {"TTT" : "F", "CTT" : "L", "ATT" : "I", "GTT" : "V",
           "TTC" : "F", "CTC" : "L", "ATC" : "I", "GTC" : "V",
           "TTA" : "L", "CTA" : "L", "ATA" : "I", "GTA" : "V",
           "TTG" : "L", "CTG" : "L", "ATG" : "M", "GTG" : "V",
           "TCT" : "S", "CCT" : "P", "ACT" : "T", "GCT" : "A",
           "TCC" : "S", "CCC" : "P", "ACC" : "T", "GCC" : "A",
           "TCA" : "S", "CCA" : "P", "ACA" : "T", "GCA" : "A",
           "TCG" : "S", "CCG" : "P", "ACG" : "T", "GCG" : "A",
           "TAT" : "Y", "CAT" : "H", "AAT" : "N", "GAT" : "D",
           "TAC" : "Y", "CAC" : "H", "AAC" : "N", "GAC" : "D",
           "TAA" : "STOP", "CAA" : "Q", "AAA" : "K", "GAA" : "E",
           "TAG" : "STOP", "CAG" : "Q", "AAG" : "K", "GAG" : "E",
           "TGT" : "C", "CGT" : "R", "AGT" : "S", "GGT" : "G",
           "TGC" : "C", "CGC" : "R", "AGC" : "S", "GGC" : "G",
           "TGA" : "STOP", "CGA" : "R", "AGA" : "R", "GGA" : "G",
           "TGG" : "W", "CGG" : "R", "AGG" : "R", "GGG" : "G" }
protein_sequence = ""
# Generate protein sequence: visit every complete codon and stop at a STOP codon
for i in range(0, len(dna) - len(dna) % 3, 3):
    amino_acid = protein[dna[i:i+3]]
    if amino_acid == "STOP":
        break
    protein_sequence += amino_acid
# Print the protein sequence
print("Protein Sequence:", protein_sequence)
# end of the program
The program produces output like the following:
DNA Sequence: ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG
Protein Sequence: MEVFKAPPIGI
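
The program above starts translating at the first nucleotide. As a variation, the sketch below wraps the same steps in a function that first looks for the start codon ATG and translates from there, which matches the remark that translation begins once the start of protein coding is determined. It is an illustrative assumption rather than part of the original program: the function name is invented, the codon table is rebuilt compactly with `*` standing for STOP, and codons containing unknown symbols are reported as `X`.

```python
from itertools import product

# Standard genetic code in T, C, A, G order; "*" marks a stop codon.
CODON_TABLE = dict(zip(
    ("".join(codon) for codon in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def translate_from_start(dna: str) -> str:
    """Translate from the first ATG up to the first stop codon (or the end)."""
    dna = "".join(dna.split()).upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    amino_acids = []
    for i in range(start, len(dna) - 2, 3):  # only complete codons
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        amino_acids.append(amino_acid)
    return "".join(amino_acids)


print(translate_from_start("CCATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"))  # MEVFKAPPIGI
```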
The programming languages used most in bioinformatics are PHP, Java, Perl, C, and C++ for
development, and R and Python for analysis [19].

4.2 Privacy and Security

Any data stored in the cloud is at risk, including sequenced DNA. Data on open cloud platforms can
be accessed freely, which makes this huge amount of data, about a million DNA profiles on
GEDMatch, available to anyone. These DNA profiles concern about half of the USA population.
Since there are many hackers, even private databases such as 23andMe could be hacked nowadays.
People are afraid that, if stolen, their genetic information could be used for a crime or even
cloned. Securing DNA data is a big challenge because of its nature, and meeting it requires the
help of computer science experts [20].

4.3 DNA Future Storage

Technological advances have facilitated the understanding of DNA, but the opposite might also be
true. Data is growing fast, by over 2.5 exabytes per day, where an exabyte is equal to one billion
gigabytes. Work is still going on to find new techniques for storing this huge amount of data, and
some computer scientists have even turned to an unexpected place: DNA. DNA is capable of encoding
and storing information in its sequences. This has encouraged big companies such as Microsoft to
explore DNA storage. In 2016, researchers from Microsoft and the University of Washington managed
to store 200 megabytes of data in synthetic DNA. The storage medium consists of artificial genes
that scientists create in the laboratory [21].
Researchers are now looking to store more data and to find ways of accessing it quickly, and
synthetic DNA is an important field that could make DNA-based data storage available in the near
future. Figure 9 shows synthetic DNA as a storage medium that could store data much more
efficiently in a small volume. This technique can store around 1250 terabytes of data in one cubic
millimeter, whereas a flash drive or hard disk stores around 1.25 gigabytes per cubic millimeter.
This means that DNA can store up to 1,000,000 times more data than other storage devices in the
same volume [21].

Fig. 9: Storing data using synthetic DNA [23]
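
How digital data can be written as a DNA sequence may be sketched with the simplest possible mapping: two bits per nucleotide. The mapping below (00 = A, 01 = C, 10 = G, 11 = T) is an assumption chosen for illustration; it is not the scheme used in the Microsoft and University of Washington work, and practical systems add error correction and avoid sequences that are hard to synthesize or read.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Write each byte as 8 bits, then each pair of bits as one nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the mapping: nucleotides back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"DNA")
print(strand)          # CACACATGCAAC
print(decode(strand))  # b'DNA'
```

Each byte becomes four nucleotides, so the three-byte message is stored in a strand of twelve bases.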

5. Conclusion
With a huge amount of new DNA data appearing every day, typical storage devices have become
insufficient. Bioinformatics is an essential field for the biological sciences that provides
techniques for storing DNA sequences. Information could not be stored or retrieved so quickly and
easily without technology, which is still developing new storage techniques. Bioinformatics
continues to grow to include other topics and areas in biology, chemistry, and beyond, using
computation, statistics, algorithms, and, most importantly, databases.

References
[1] J. J. Li and M. D. Biggin, “Statistics requantitates,” Science, vol. 347, no. 6226, pp. 1066–
1067, 2015.
[2] J. Xiong, Essential Bioinformatics. United Kingdom: Cambridge University Press, 2006.
[3] Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter,
Molecular Biology of the Cell, 4th ed. New York, 2002.
[4] Evelyn Fox Keller, “Genes, Genomes, and Genomics,” Biol. Theory, vol. 6, pp. 132–140, 2012.
[5] B. J. Copeland, Alan Turing’s Automatic Computing Engine: The Master Codebreaker’s Struggle
to build the Modern Computer. 2008.
[6] A. O. Stretton, “The First Sequence: Fred Sanger and Insulin,” Genetics, vol. 162, no. 2, pp. 527–
532, 2002.
[7] H. P. Langtangen and G. K. Sandve, “Illustrating Python via Bioinformatics Examples Basic
Bioinformatics Examples in Python,” Online, pp. 1–46, 2012.
[8] J. Mullins and B. J. M. McKay, “International society for computational biology honors Michael
Ashburner and Olga Troyanskaya with top Bioinformatics/Computational biology awards for
2011,” PLoS Comput. Biol., vol. 7, no. 6, 2011, doi: 10.1371/[Link].1002081.
[9] G. Stoesser, M. A. Moseley, J. Sleep, M. McGowran, M. Garcia-Pastor, and P. Sterk, “The EMBL
nucleotide sequence database,” Nucleic Acids Res., vol. 26, no. 1, pp. 8–15, 1998, doi:
10.1093/nar/26.1.8.
[10] J. K. Kulski, Next-Generation Sequencing — An Overview of the History, Tools, and “Omic”
Applications. 2015.

[11] Gautam B. Singh, Fundamentals of Bioinformatics and Computational Biology. 2015.

[12] C. Jou, “Introduction to Bioinformatics,” 2014.
[13] G. C. Michael (Doc) Edge, “How lucky was the genetic investigation in the Golden State Killer
case?,” 2019, doi: 10.1101/531384.
[14] L. S. Lewis, “THE ROLE GENETIC INFORMATION PLAYS IN THE CRIMINAL JUSTICE
SYSTEM,” 2005.
[15] B. B. Noor Ahmad Shaik, Ramu Elango, Muhummadh Khan, “Introduction to Biological
Databases,” Springer, Cham, vol. 1, pp. 19–27, 2019, doi: [Link]
02634-9_2.
[16] A. Tyagi, “Biological Databases,” 2020.
[17] Pascale Anderle, Manuel Duval, Sorin Draghici, et al., “Gene Expression Databases and Data
Mining,” Biotech. Suppl, vol. 3, pp. 36–44, 2003, doi: 10.2144/mar03anderle.
[18] W. B. Suzanne Clancy, “Translation: DNA to mRNA to Protein,” Scitable by Nat. Educ., vol. 1,
no. 1, 2018, [Online]. Available: [Link]
mrna-to-protein-393/.
[19] M. R. G. Mathieu Fourment, “A comparison of common programming languages used in
bioinformatics,” BMC Bioinformatics, vol. 9, no. 82, 2008, doi: 10.1186/1471-2105-9-82.
[20] G. M. Nina F de Groot, Britta C van Beers, “Commercial DNA tests and police investigations: a
broad bioethical perspective,” J. Med. Ethics, vol. 0, pp. 1–8, 2021, doi: 10.1136/medethics-2021-
107568.
[21] Potomac Institute for Policy Studies, “The Future of DNA Data Storage,” 2018. [Online].
Available: [Link]
[22] M. Mitra, “Elements of RNA , its Techniques and Applications,” no. January, 2019, doi:
10.5281/zenodo.2552818.
[23] Ashutosh Viramgama, “DNA Data Storage – Synthetic DNA – Future Of Storage,” 2018.
[Link]

19 pages
Understanding Heredity Mechanisms
No ratings yet
Understanding Heredity Mechanisms
5 pages
Comparative Genomics
No ratings yet
Comparative Genomics
14 pages
KU MBBS Curriculum Overview 2011
71% (17)
KU MBBS Curriculum Overview 2011
147 pages
Summary of Microbiology Concepts
100% (1)
Summary of Microbiology Concepts
53 pages
Mathematical Analysis of Biological Data
No ratings yet
Mathematical Analysis of Biological Data
2 pages
Mitochondrial DNA in Forensic Odontology
No ratings yet
Mitochondrial DNA in Forensic Odontology
31 pages
Cell Biology Lecture Notes Overview
No ratings yet
Cell Biology Lecture Notes Overview
6 pages
AQA A-Level Biology: Biological Molecules Guide
No ratings yet
AQA A-Level Biology: Biological Molecules Guide
25 pages
BGM1004 Seminar: DNA Analysis & PCR
No ratings yet
BGM1004 Seminar: DNA Analysis & PCR
6 pages
2026 GRADE 12 LFSC Composite Term 1 LEARNER Document - Final
No ratings yet
2026 GRADE 12 LFSC Composite Term 1 LEARNER Document - Final
67 pages
The Silent Murder at Grand Horizon Hotel
No ratings yet
The Silent Murder at Grand Horizon Hotel
9 pages
USMLE Biochemistry Cell Cycle Overview
No ratings yet
USMLE Biochemistry Cell Cycle Overview
12 pages
Microbiology Curriculum Overview 2008-09
No ratings yet
Microbiology Curriculum Overview 2008-09
40 pages
WRKY-GCM1 Zinc Finger Evolution
No ratings yet
WRKY-GCM1 Zinc Finger Evolution
16 pages
UV Radiation Effects on E. coli Growth
No ratings yet
UV Radiation Effects on E. coli Growth
3 pages
Biomolecule Definitions and Comparisons
No ratings yet
Biomolecule Definitions and Comparisons
50 pages
DNA Structure Enhances TF Binding Predictions
No ratings yet
DNA Structure Enhances TF Binding Predictions
3 pages
Photosynthesis and Ecosystem Dynamics
100% (1)
Photosynthesis and Ecosystem Dynamics
3 pages
RNA Synthesis MCQs with Answers
100% (1)
RNA Synthesis MCQs with Answers
13 pages
Recombinant DNA Technology (Siddra Ijaz, Imran Ul Haq) (Z-Library)
100% (1)
Recombinant DNA Technology (Siddra Ijaz, Imran Ul Haq) (Z-Library)
157 pages
DNA Transcription & Translation Lesson Plan
No ratings yet
DNA Transcription & Translation Lesson Plan
11 pages
Introduction to Next-Generation Sequencing
No ratings yet
Introduction to Next-Generation Sequencing
25 pages
Class 12 MCQs on Inheritance Concepts
No ratings yet
Class 12 MCQs on Inheritance Concepts
16 pages
Ribosomes and Protein Synthesis Overview
No ratings yet
Ribosomes and Protein Synthesis Overview
8 pages
Metagenomics For Microbiology 1st Edition Jacques Izard
No ratings yet
Metagenomics For Microbiology 1st Edition Jacques Izard
81 pages

Keywords. Bioinformatics, chromosomes, databases, gene, genome, protein, DNA sequencing

Fig. 1: Fields involved in Bioinformatics

Bioinformatics differs from computational biology where bioinformatics concentrates on the

2.1. Why bioinformatics?

Fig. 2: Central Dogma [2]

Fig. 3: Types of amino acids [3]

2.3 Amino acids

Fig. 4: Amino acid general structure [3]

2.4 Gene, genome, and DNA

Fig. 5: Genetic arrangement in the cell [4]

Fig. 6: Pairs of chromosomes [4]

3. Computer Science Behind and Sequencing

Fig. 7: Sanger versus next-generation sequencing [10]

3.1 Computational Biology

3.2 Algorithms used in Bioinformatics

• Constructing evolutionary (phylogenetic) trees: a comparison between sequence of different

4. Bioinformatics Databases and Cloud Storage

4.1 Turning DNA sequences into protein sequences

Fig. 8: Genetic Code Chart [22]

4.1.1 How to translate DNA to Protein using Genetic Code Chart

4.1.2 DNA to Protein Translation using Python

- Open the text file
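
A minimal Python sketch of this kind of translation is given here for illustration; it is not the authors' original script. It assumes the standard genetic code, reads the sequence in the first reading frame, stops at the first stop codon, and marks any unrecognised codon with "X". The names `translate` and `translate_file`, and the idea of passing a file path, are hypothetical choices made for this example.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
# "*" marks the three stop codons (TAA, TAG, TGA).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string into one-letter amino acid codes.

    Assumptions of this sketch: reading starts at the first base,
    translation stops at the first stop codon, and codons that are
    not in the table (for example, ones containing N) become "X".
    """
    # Remove whitespace and line breaks, and normalise to upper case.
    dna = "".join(dna.split()).upper()
    protein = []
    # Step through the sequence three bases at a time, ignoring any
    # incomplete codon left over at the end.
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def translate_file(path):
    """Open a plain-text file containing a DNA sequence and translate it."""
    with open(path) as handle:
        return translate(handle.read())
```

For example, `translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")` returns `"MAIVMGR"`, because the eighth codon, TGA, is a stop codon. A FASTA file would need its header line removed before being passed to `translate_file`.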

4.2 Privacy and Security

4.3 DNA Future Storage

Fig. 9: Storing data using synthetic DNA [23]

[11] Gautam B. Singh, Fundamentals of Bioinformatics and Computational Biology. 2015.
