Computer Science and Information Technology 5(5): 149-153, 2017
DOI: 10.13189/csit.2017.050501
A Comparison between Characteristics of NoSQL
Databases and Traditional Databases
Mitko Radoev
Department of Information Technologies and Communications, Faculty of Applied Informatics and Statistics,
University of National and World Economy, Sofia, Bulgaria
Copyright©2017 by authors, all rights reserved. Authors agree that this article remains permanently open access under
the terms of the Creative Commons Attribution License 4.0 International License
Abstract With the increasing popularity of NoSQL databases, the question arises whether they have all the characteristics of databases and whether they can be a real alternative to relational databases in all application domains. This paper attempts to systematize the most important characteristics of traditional databases and then to analyze whether NoSQL databases have these characteristics. On this basis it is possible to conclude whether NoSQL databases have the necessary qualities to be called databases, or whether they are rather data stores with limited capabilities. The results of the comparison show that none of the NoSQL DBMSs under consideration covers more than 50% of the characteristics of traditional databases, so the use of the term "database" in respect of any one of them is not fully correct.

Keywords Databases, Relational Databases, NoSQL Databases

1. Introduction

In recent years, so-called NoSQL databases have become increasingly popular. The term NoSQL is interpreted in different ways. Literally, it should refer to databases that do not use the SQL language. There is also an interpretation that it means Not Only SQL. In fact, these are databases that do not use SQL and are not based on the relational data model at all, so the more precise term would be non-relational databases. Moreover, they are neither hierarchical nor network databases (the predecessors of the relational databases). They are an entirely new type of database based on different data models. Further on, the article looks at the different NoSQL databases according to the models on which they are based.

What is the growing popularity of these new databases based on? The reasons remain to be analyzed, but it is a fact that in most database management system rankings, NoSQL databases are already among the Top 10.

According to DB-Engines [1], the Top 10 most popular DBMSs in September 2017 are:
1. Oracle
2. MySQL
3. Microsoft SQL Server
4. PostgreSQL
5. MongoDB
6. DB2
7. Microsoft Access
8. Cassandra
9. Redis
10. Elasticsearch

According to a survey conducted by Stack Overflow [2] among software developers, the 8 most popular DBMSs for 2017 are:
1. MySQL
2. Microsoft SQL Server
3. PostgreSQL
4. SQLite
5. MongoDB
6. Oracle
7. Redis
8. Cassandra

In both rankings, NoSQL database management systems such as MongoDB, Cassandra and Redis are among the first 10 places, and they are constantly improving their positions compared to previous periods. The use of NoSQL databases is increasing not only in the development of new systems: some users of relational databases have also decided to replace them with these new alternatives. Are all users aware of what they are choosing and what the consequences of their choice will be? Because the term database is used, albeit qualified by NoSQL, users often expect to get everything they used to get from databases, along with the benefits that NoSQL DBMSs undoubtedly have. Unfortunately, everything has a price, and each of the benefits of NoSQL databases is paid for by the loss of traditional database capabilities - control of data
redundancy, ensuring data integrity, maintaining transactions, and so on. Ironically, what is most missing from NoSQL databases is SQL [3]. There are developers who have already experienced in practice the deficiencies of the new data stores, and some of them have decided to return to relational databases [4].

To be able to make an informed choice, users must be fully aware of the characteristics of the database systems. Last but not least, they have to be aware of whether NoSQL databases have the necessary qualities to be called databases, or whether they are rather data stores with limited capabilities compared to relational databases. The main purpose of this article is to analyze the characteristics of NoSQL databases in comparison with the characteristics of traditional databases and, on this basis, to answer the question whether NoSQL databases are real databases.

2. Traditional Databases

The first databases that appeared used hierarchical (tree-like) or network structures to store data. Thereafter, relational databases emerged and rapidly gained popularity, and to date they are still the most common databases. These three types of databases will be called traditional.

2.1. Genesis of the Databases

It is necessary to review briefly the appearance and development of databases in order to recall the problems that seem to be forgotten today.

Databases originated in the 1960s in order to replace the existing file organization of data. What are the disadvantages of organizing data in separate files? Here are some of the main drawbacks:

1. Redundant data
Storing the data required by each application program results in serious duplication of data. Data duplication is undesirable, not only because it requires extra storage, but above all because it can lead to inconsistent data.

2. Isolated data
When data is stored in separate files, it is difficult to process it together. The situation is complicated by the use of different file formats. Because the file structure is defined in the application programs, file formats differ according to the programming tools used and the developers' decisions.

3. Program-data dependence
The physical structures of files and data records are defined in the program code. This means that changes to existing structures are difficult to make. If changes to a data structure are necessary, all programs that access the changed file must be modified.

4. No ad hoc queries
In order to get new reports, new programs need to be written to implement them, as users have no means of asking ad hoc queries.

All of these limitations of the file organization of data are due to two major factors:
• The data definitions are embedded in the application programs instead of being stored together with the data;
• There are no data access and processing capabilities other than those provided by the application programs.

The first types of databases created were based on the hierarchical and network data models. They attempted to solve the problems of the existing file organization of data, but they failed to solve all of them because they did not provide sufficient data independence, and the implementation of queries was rather complicated as they were navigationally oriented.

The crucial advance in database development occurred in 1970, when Edgar Frank Codd published his famous work on the relational model of data [5]. By the late 1970s and early 1980s, database management systems based on the relational data model already existed.

Putting the relational model into practice led to the creation of extremely powerful and flexible tools, the relational database management systems, which rapidly became dominant. The subsequent development of these systems expanded their capabilities, for example through the creation of the SQL language, which further reinforced their dominant position, a position they still hold today.

2.2. Characteristics of the Databases

Databases have a large number of characteristics. This part attempts to systematize the most important of them, using information from different sources [6-8].

1. Structured data
Data in the database is structured according to the data model used. On the other hand, the structure of the data has to match the entities of the real world and their essential properties.

2. Related data
In addition to information about the real world entities and their properties, the database also stores information about the essential links between them. In the hierarchical and network models, these links are set explicitly. There is no such requirement in the relational model: links can be realized between any pair of attributes (or sets of attributes) having a common domain. In practice, however, almost all relational databases provide some means of
setting the essential relationships, for example through foreign key constraints.

3. Metadata
Together with the data itself, the database also stores metadata, that is, data about the data. Metadata includes information about data structures as well as data integrity constraints, security information, etc.

4. Data sharing
The database is a common resource within an enterprise or organization and is used by most (ideally all) application programs serving its business.

5. Restriction on data duplication
Databases try to eliminate or minimize data duplication. Some duplication is unavoidable: in the relational model, for example, it is necessary to duplicate the values of the primary keys as foreign keys in order to model the relationships between the entities.

6. Data integrity
Databases usually ensure data integrity by imposing restrictions. These restrictions can apply to the entities and their properties, and to the relationships between entities.

7. Data security
Data security is primarily related to protecting data against unauthorized access. This is done via various mechanisms, such as usernames and passwords. User access to data may also be limited by the type of operation (retrieval, insertion, update, deletion).

8. Data reliability
Databases ensure reliable data storage and provide recovery mechanisms in case the database gets damaged.

9. User views
Databases allow users to define different views of the same data. Each user may have a specific view presented in a form that is familiar to them. The view includes information only about those entities, attributes and relationships from the real world in which the user is interested.

10. Data independence
Databases are based on a multi-level architecture that ensures the independence of the external structure (user views) from the logical structure of the data, as well as the independence of the logical structure from the physical representation of the data.

11. Data storage, retrieval, and update
Databases provide users and application programs with a mechanism for storing, updating and retrieving data from the database through the database management system (DBMS).

12. Ad hoc queries
Data in the database is directly accessible to end users. Many DBMSs provide query languages or report generators that allow users to get the necessary information without writing a program to retrieve it from the database.

13. Transaction support
Databases provide a mechanism to ensure the execution of transactions, i.e. sequences of actions that are logically related and should be executed as one action. Transactions must meet the following requirements:
• Atomicity - All actions are executed successfully, or the entire transaction is canceled;
• Consistency - Once the transaction has been executed, the database must be in a consistent state;
• Isolation - The execution of one transaction should not affect the execution of other transactions;
• Durability - The result of the transaction must be stored reliably.

3. NoSQL Databases

As already mentioned, NoSQL databases are newly emerging non-relational databases. The history of NoSQL databases, the types of NoSQL databases and their characteristics are presented in the following parts.

3.1. Genesis of the NoSQL Databases

This new trend in databases began in the early 21st century with the development of Internet applications and, above all, Google's applications. One of the first publications on the topic dates back to 2003 and is linked to the Google File System [9]. Publications related to other Google systems then appeared, such as MapReduce [10], Chubby [11] and Bigtable [12]. Google's Internet applications were followed by those of Yahoo, Amazon, and later Facebook, Netflix, eBay and many others, which led to many new NoSQL databases.

Here are the most significant reasons for the emergence and rapid development of this new direction:

1. Need to store and process huge volumes of data
The amount of photos, videos, geographic data, etc. stored and processed by different applications increases every day.

2. Need for real-time access
The huge volumes of stored data must be available in real time from anywhere in the world.

3. Need for flexibility
Ability to easily and quickly change the structure of the
data.

4. Need to store unstructured data
Unstructured or semi-structured data should also be stored along with structured data.

5. Need for scalability
It must be easy to extend the scale of applications and stored data.

Traditional databases cannot meet all the new requirements to the necessary extent. This opens up a niche for creating and developing many new systems based on different models.

3.2. Types of NoSQL Databases and Their Characteristics

The following main types of NoSQL databases can be identified, depending on the data model used:

1. Key-value store
This is one of the simplest models for storing data. Key-value pairs are stored with unique keys. Data is accessed by searching on the key values. The model is suitable for storing large volumes of data and provides quick access by key. There are different varieties depending on the memory in which the data is stored, the key sorting and the data consistency. The most popular systems are Redis, Memcached, Berkeley DB and Oracle NoSQL.

2. Document store
As can be seen from their name, these systems are designed specifically for document storage. The formats used are XML, JSON, BSON, and others. Data is semi-structured and consists of attribute name-value pairs. Data is accessed by searching both on key values and on attribute values. Document stores are suitable for storing text and XML documents and other semi-structured data. The best-known products of this type are MongoDB, Amazon DynamoDB, Couchbase, and CouchDB.

3. Column-oriented
Unlike relational databases, which are row-oriented, this type of data structure is column-oriented. This makes it possible to easily expand the data structure by adding new columns. Column-oriented stores are suitable for large volumes of distributed data. Typical representatives are Cassandra, HBase and Google Bigtable.

4. Graph-oriented
The data is represented by a graph-like structure, with the nodes of the graph representing the objects and their sets of attributes, and the edges of the graph representing links between the objects. Graph-oriented stores are suitable for representing data when the links between objects are particularly important for modeling. The best-known products of this type are Neo4j, Titan and Giraph.

There are also products based on two or more different data models, such as OrientDB, ArangoDB and many others.

According to Brewer's CAP Theorem [13], distributed systems cannot have more than two of the following characteristics at the same time:
• Consistency;
• Availability;
• Partition tolerance.

While traditional databases are focused on ensuring data consistency, NoSQL databases in most cases prioritize high availability and partition tolerance at the expense of data consistency. This tendency results in systems known as BASE (Basically Available, Soft-state, Eventually consistent). It cannot be ignored that these systems, while trying to solve some problems, have created other, perhaps more serious, problems.

4. Comparison between NoSQL Databases and Traditional Databases

The main purpose of this article is to compare NoSQL databases with traditional databases and, more specifically, to check which of the characteristics of traditional databases are also characteristics of NoSQL databases.

The variety of products within each type of NoSQL database leads to significant differences in their characteristics. The analysis therefore takes into account the characteristics of the most popular product from each category. These are:
1. Redis for key-value stores;
2. MongoDB for document stores;
3. Apache Cassandra for column-oriented stores;
4. Neo4j for graph-oriented stores.

Data from their web sites and other sources have been used to determine product characteristics.

The results of the comparison between the characteristics of traditional and NoSQL databases are summarized in Table 1.

The final results of the comparison are eloquent - none of the products under consideration covers more than 50% of the characteristics of traditional databases. At the same time, the following should be taken into account before drawing final conclusions:
• Not all database characteristics are equally important in determining the essence of a database;
• The information available for the products is not sufficient to determine categorically the possession or non-possession of a particular characteristic;
• Graph-oriented systems differ from the other NoSQL data stores and, although with borderline values, have the potential to be true databases.
Table 1. Match between characteristics of the traditional and NoSQL databases

Characteristics                      Key-value        Document         Column-oriented   Graph-oriented
                                     (Redis) [14,18]  (MongoDB) [15]   (Cassandra) [16]  (Neo4j) [17]
Structured data                      no               partly (1)       yes               yes
Related data                         no               no               no                yes
Metadata                             no               no               no                no
Data sharing                         yes              yes              yes               yes
Restriction on data duplication      no               no               no                no
Data integrity                       no               no               no                partly (3)
Data security                        no               yes              yes               yes
Data reliability                     yes              yes              yes               no
User views                           no               no               no                no
Data independence                    no               no               no                no
Data storage, retrieval, and update  yes              yes              yes               yes
Ad hoc queries                       no               no               no                no
Transaction support                  yes              no               partly (2)        yes
Total                                31%              35%              42%               50%

(1) MongoDB stores semi-structured data.
(2) Cassandra uses batches which are not a full analogue for transactions.
(3) Neo4j allows the imposition of some data constraints.
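
The paper does not spell out how the Total row of Table 1 is computed. One reading that reproduces the published percentages is to score "yes" as 1, "partly" as 0.5 and "no" as 0, and to divide the sum by the 13 characteristics. The following Python sketch works under that assumption; the names SCORE, PRODUCTS, TABLE_1 and coverage_percent are illustrative and do not come from the paper.

```python
# Assumed scoring rule (not stated in the paper): yes = 1, partly = 0.5, no = 0.
SCORE = {"yes": 1.0, "partly": 0.5, "no": 0.0}

# Column order used in every row of TABLE_1.
PRODUCTS = ("Redis", "MongoDB", "Cassandra", "Neo4j")

# Cell values copied from Table 1, one tuple per characteristic.
TABLE_1 = {
    "Structured data": ("no", "partly", "yes", "yes"),
    "Related data": ("no", "no", "no", "yes"),
    "Metadata": ("no", "no", "no", "no"),
    "Data sharing": ("yes", "yes", "yes", "yes"),
    "Restriction on data duplication": ("no", "no", "no", "no"),
    "Data integrity": ("no", "no", "no", "partly"),
    "Data security": ("no", "yes", "yes", "yes"),
    "Data reliability": ("yes", "yes", "yes", "no"),
    "User views": ("no", "no", "no", "no"),
    "Data independence": ("no", "no", "no", "no"),
    "Data storage, retrieval, and update": ("yes", "yes", "yes", "yes"),
    "Ad hoc queries": ("no", "no", "no", "no"),
    "Transaction support": ("yes", "no", "partly", "yes"),
}


def coverage_percent(table=TABLE_1):
    """Return, per product, the rounded share of characteristics covered."""
    totals = {}
    for column, product in enumerate(PRODUCTS):
        points = sum(SCORE[row[column]] for row in table.values())
        totals[product] = round(100 * points / len(table))
    return totals


if __name__ == "__main__":
    for product, percent in coverage_percent().items():
        print(f"{product}: {percent}%")
```

Under this assumed scoring, Redis collects 4 of 13 points, MongoDB 4.5, Cassandra 5.5 and Neo4j 6.5, which rounds to the 31%, 35%, 42% and 50% shown in the Total row.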
5. Conclusions

Since none of the products in question has more than 50% of the characteristics of traditional databases, the use of the term "database" in respect of any one of them is not entirely correct.

Instead of NoSQL databases, the products of this category would be more accurately called non-relational data stores.

Graph-oriented storage systems most closely match the characteristics of traditional databases and have the potential to develop into complete databases.

REFERENCES

[1] DB-Engines Ranking, Online available from [Link]
[2] Stack Overflow, Online available from [Link]
[3] S. Tiwari, Professional NoSQL, John Wiley & Sons, Indianapolis, 2011.
[4] S. Mei, Why You Should Never Use MongoDB, Online available from [Link]d-never-use-mongodb/.
[5] E. F. Codd, A relational model of data for large shared data banks, Communications of the ACM, Vol. 13, № 6, pp. 377-387, June 1970.
[6] E. F. Codd, Relational database: A practical foundation for productivity, Communications of the ACM, Vol. 25, № 2, pp. 109-117, February 1982.
[7] C. J. Date, An Introduction to Database Systems, 8th edition, Pearson/Addison Wesley, 2004.
[8] T. Connolly, C. Begg, Database Systems: A Practical Approach to Design, Implementation, and Management, 4th edition, Pearson Education Ltd, 2005.
[9] S. Ghemawat, H. Gobioff, S. Leung, The Google File System, 19th ACM Symposium on Operating Systems Principles, Lake George, 2003.
[10] J. Dean, S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 6th Symposium on Operating System Design and Implementation, San Francisco, 2004.
[11] M. Burrows, The Chubby Lock Service for Loosely-Coupled Distributed Systems, 7th Symposium on Operating System Design and Implementation, Seattle, 2006.
[12] F. Chang, J. Dean, S. Ghemawat, W. Hsieh, D. Wallach, M. Burrows, T. Chandra, A. Fikes, R. Gruber, Bigtable: A Distributed Storage System for Structured Data, 7th Symposium on Operating System Design and Implementation, Seattle, 2006.
[13] A. Fox, E. Brewer, Harvest, Yield and Scalable Tolerant Systems, Proc. 7th Workshop Hot Topics in Operating Systems, pp. 174-178, 1999.
[14] Redis Documentation, Online available from [Link]
[15] The MongoDB 3.4 Manual, Online available from [Link]
[16] Apache Cassandra Documentation v4.0, Online available from [Link]
[17] The Neo4j Developer Manual v3.2, Online available from [Link]
[18] A. Moniruzzaman, S. Hossain, NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison, International Journal of Database Theory and Application, Vol. 6, № 4, pp. 1-14, 2013.