
NoSQL Unit 1 Week 1 Notes

NoSQL databases are non-relational systems designed for storing and retrieving large volumes of structured, semi-structured, and unstructured data, offering high scalability and flexibility. They differ from traditional relational databases by not requiring fixed schemas, enabling horizontal scalability, and often sacrificing ACID compliance for performance. The document outlines the history, features, advantages, disadvantages, types, and operational environment of NoSQL databases.

Uploaded by

coupanhub
Copyright
© All Rights Reserved

V20UDS502 – NOSQL

UNIT – 1, WEEK - 1
INTRODUCTION TO NoSQL
The term NoSQL refers to non-SQL or non-relational databases and is commonly expanded as
"Not Only SQL". A NoSQL database provides a mechanism for storing and retrieving data that is
modelled in ways other than the tabular relations used in relational databases. It is
generally used for big data and real-time web applications. The concept of NoSQL databases
became popular with Internet giants such as Google, Facebook, and Amazon, which deal with
huge volumes of data. With an RDBMS, system response time can become slow as data volumes
grow very large. A traditional RDBMS uses SQL to store and retrieve data for further
insights. A NoSQL database system, by contrast, encompasses a wide range of database
technologies that can store structured, semi-structured, and unstructured data.

History of NoSQL:
• 1998 - Carlo Strozzi used the term NoSQL for his lightweight, open-source relational
database
• 2000 - Development of the graph database Neo4j began
• 2004 - Google began developing BigTable
• 2005 - CouchDB was launched
• 2007 - The research paper on Amazon Dynamo was released
• 2008 - Facebook open-sourced the Cassandra project
• 2009 - The term NoSQL was reintroduced

FEATURES OF NoSQL

Non-relational: NoSQL databases do not follow the relational model and do not provide tables
with flat, fixed-column records. They work with self-contained aggregates or BLOBs, and do
not require object-relational mapping or data normalization. Many omit complex features such
as full query languages, query planners, referential integrity, joins, and ACID transactions.

Schema-free: NoSQL databases are either schema-free or have relaxed schemas. They do not
require a schema definition before data is stored, and they allow heterogeneous data
structures in the same domain.

Simple API: NoSQL databases offer easy-to-use interfaces for storing and querying data. Their
APIs allow low-level data manipulation and selection. Text-based protocols are common, mostly
HTTP REST with JSON. There is mostly no standards-based NoSQL query language. Many are
web-enabled databases running as internet-facing services.

Distributed: NoSQL databases can run in a distributed fashion across multiple nodes. ACID
guarantees are often relaxed in exchange for scalability and throughput. A shared-nothing
architecture, in which nodes do not share memory or disk, needs less coordination and allows
wider distribution.

Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to
a database cluster, making them well-suited for handling large amounts of data and high
levels of traffic.

High availability: NoSQL databases are often designed to be highly available and to
automatically handle node failures and data replication across multiple nodes in a database
cluster.

Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and
dynamic manner, with support for multiple data types and changing data structures.

Performance: NoSQL databases are optimized for high performance and can handle a high
volume of reads and writes, making them suitable for big data and real-time applications.

ADVANTAGES AND DISADVANTAGES OF NoSQL

Advantages of NoSQL:

High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means
partitioning the data and placing the partitions on multiple machines; range-based schemes
also preserve the order of the data. Vertical scaling means adding more resources to an
existing machine, whereas horizontal scaling means adding more machines to handle the data.
Because of this, NoSQL can handle a huge amount of data: as the data grows, the cluster
scales out to handle it efficiently.
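
As a rough illustration of sharding, the sketch below assigns keys to shards with a stable hash. Hash-based placement is only one possible scheme and is an assumption here, as are the names `shard_for` and `distribute` and the sample keys; real databases use their own partitioners.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Return the shard index for a key using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


# Hypothetical keys, used only for illustration.
placement = distribute([f"user:{i}" for i in range(1000)], num_shards=4)
for shard_id, shard_keys in placement.items():
    print(f"shard {shard_id}: {len(shard_keys)} keys")
```

Hash-based placement spreads keys evenly but does not keep neighbouring keys together. A range-based scheme would preserve key order at the cost of possible hot spots.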

Flexibility: NoSQL databases are designed to handle unstructured or semi-structured data,
which means that they can accommodate dynamic changes to the data model. This makes
NoSQL databases a good fit for applications that need to handle changing data requirements.

High availability: Automatic replication makes NoSQL databases highly available, because
after a failure the data can be restored from a replica to the last consistent state.

Scalability: NoSQL databases are highly scalable, which means that they can handle large
amounts of data and traffic with ease. This makes them a good fit for applications that need to
handle large amounts of data or traffic.

Performance: NoSQL databases are designed to handle large amounts of data and traffic,
which means that they can offer improved performance compared to traditional relational
databases.

Cost-effectiveness: NoSQL databases are often more cost-effective than traditional relational
databases, as they are typically less complex and do not require expensive hardware or
software.

Disadvantages of NoSQL:

Lack of standardization: There are many different types of NoSQL databases, each with its
own unique strengths and weaknesses. This lack of standardization can make it difficult to
choose the right database for a specific application.

Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which means
that they do not guarantee atomicity, consistency, isolation, and durability for every
operation. This can be a drawback for applications that require strong data consistency
guarantees.

Narrow focus: NoSQL databases have a narrow focus: they are designed mainly for storage and
provide comparatively little other functionality. Relational databases are a better choice
than NoSQL for transaction management.

Open-source: Many NoSQL databases are open-source, and there is no reliable standard for
NoSQL yet. In other words, two NoSQL database systems are likely to differ from each other.

Lack of support for complex queries: NoSQL databases are not designed to handle
complex queries, which means that they are not a good fit for applications that require
complex data analysis or reporting.

Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional
relational databases. This can make them less reliable and less secure than traditional
databases.

Management challenge: The purpose of big data tools is to make the management of a large
amount of data as simple as possible. But it is not so easy. Data management in NoSQL is
much more complex than in a relational database. NoSQL, in particular, has a reputation for
being challenging to install and even more hectic to manage on a daily basis.

GUI is not available: GUI tools for accessing the database are not as widely available in
the market as they are for relational databases.

Backup: Backup is a weak point for some NoSQL databases. MongoDB is often cited as an
example of a system with no straightforward approach for backing up data in a consistent
manner.

Large document size: Some database systems like MongoDB and CouchDB store data in
JSON format. This means that documents are quite large, which matters for big data volumes,
network bandwidth, and speed. Descriptive key names actually hurt here, since they are
repeated in every document and increase the document size.

SQL vs NoSQL

Data Model
    SQL: Relational databases use a structured data model with tables. Data is stored in
    rows and columns.
    NoSQL: NoSQL (non-relational) databases offer various data models such as document,
    key-value, column-family, and graph. They are more flexible and can handle
    semi-structured and unstructured data.

Schema
    SQL: Relational databases require a fixed schema before data is inserted.
    NoSQL: NoSQL databases are schema-less or have flexible schemas.

Query Language
    SQL: SQL databases use the SQL language for querying and manipulating data.
    NoSQL: Each type of NoSQL database may have its own query language or API.

Scalability
    SQL: Vertically scalable
    NoSQL: Horizontally scalable

Performance
    SQL: Relational databases are optimized for complex queries and transactions.
    NoSQL: NoSQL databases can deliver high performance for large volumes of reads and
    writes.

Consistency
    SQL: Follows ACID properties
    NoSQL: Follows the CAP theorem (Consistency, Availability, Partition tolerance)

Example
    SQL: MySQL, PostgreSQL, Oracle, MS-SQL Server
    NoSQL: MongoDB, HBase, Neo4j, Cassandra

NoSQL SCHEMA MODELING

NoSQL databases allow you to store vast amounts of data and access them at any time, from
any location and device. Deciding which data modeling technique best suits your needs is
complex. Fortunately, there is a data modeling technique for every use case.

The primary difference is that NoSQL does not use relational data modelling techniques and
emphasises flexible design. The absence of strict schema requirements makes design a much
simpler and cheaper process. This doesn’t mean that you can’t use a schema at all, but
rather that the schema design is very flexible. Another helpful feature of NoSQL data models
is that they are designed for high efficiency and speed, serving up to millions of queries
per second. This is achieved by keeping the data needed for a query together in one place,
so that expensive JOINs and cross-references are avoided. Relational databases are usually
scaled vertically, whereas NoSQL databases can scale both vertically and horizontally. In
addition, with NoSQL you can add another shard on inexpensive hardware rather than buying a
larger, more expensive machine.

Denormalisation: Denormalisation is a common technique that involves copying data into
multiple tables or documents to simplify querying. Use denormalisation to group all the data
you need for a query in one place. The drawback is that the same data is stored several
times, which can considerably increase the data volume.
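
A minimal sketch of denormalisation, using plain Python dictionaries in place of a real database. The collections, field names, and the `denormalise` helper are hypothetical and exist only to show the idea of copying data so that one read returns everything.

```python
# Normalised form: an order refers to its customer by id, so reading a full
# order needs a second lookup.
customers = {"c1": {"name": "Asha", "city": "Chennai"}}
orders_normalised = [{"order_id": "o1", "customer_id": "c1", "total": 250}]


def denormalise(orders, customers):
    """Copy the customer fields into each order record."""
    result = []
    for order in orders:
        customer = customers[order["customer_id"]]
        result.append(
            {
                **order,
                "customer_name": customer["name"],
                "customer_city": customer["city"],
            }
        )
    return result


orders_denormalised = denormalise(orders_normalised, customers)
print(orders_denormalised[0])
```

The trade-off is visible in the output: the customer's name and city are now repeated in every order, so a change of city has to be written to every copy.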

Aggregates: Aggregates allow users to create nested entities with complex internal
structures and to change them as a unit. Aggregation reduces the need for joins by folding
related records, such as one-to-one relationships, into a single entity. Most NoSQL data
models have some form of this soft schema technique. For example, graph and key-value store
databases can hold values in any format because these data models place no restrictions on
the matter.

Application Side Joins: Since NoSQL databases are query-oriented and joins are resolved at
design time, NoSQL often does not support joins. In relational databases, by contrast, joins
are performed when the query is executed. When related data has not been stored together,
the application must perform the join itself. Naturally, this frequently entails a
performance penalty and is sometimes unavoidable.
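
The following sketch shows what an application-side join can look like when the database itself offers no join. The two in-memory "collections" and the function name `join_posts_with_authors` are assumptions made for illustration.

```python
# Two separate collections, as they might be returned by two separate queries.
authors = {"a1": {"name": "R. Kumar"}, "a2": {"name": "S. Devi"}}
posts = [
    {"title": "Intro to NoSQL", "author_id": "a1"},
    {"title": "CAP basics", "author_id": "a2"},
    {"title": "Orphan post", "author_id": "a9"},  # author record is missing
]


def join_posts_with_authors(posts, authors):
    """Combine each post with its author's name in application code."""
    joined = []
    for post in posts:
        author = authors.get(post["author_id"])  # the extra lookup is the join
        joined.append(
            {
                "title": post["title"],
                "author_name": author["name"] if author else None,
            }
        )
    return joined


for row in join_posts_with_authors(posts, authors):
    print(row)
```

Each post costs one extra lookup, which is the performance penalty mentioned above. The application also has to decide what to do when the referenced record does not exist.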

Four Types of NoSQL Databases

Key-Value Store: One of the most common data models, key-value stores keep data as pairs of
a unique key and a value. The key refers directly to a specific piece of information, which
can be anything you want. You can even use an empty string as the value if you wish,
although there are upper limits to how big the value can be, depending on the database.
Amazon helped get this data model off the ground with Dynamo, the key-value store described
in its 2007 paper, and later offered it as the DynamoDB service. Since Amazon is one of the
largest online marketplaces in the world, you can see how powerful this data model can be.
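
A key-value store can be pictured as a dictionary with a size limit on values. The class below is a toy sketch, not the API of any real product; the class name, method names, and the 1024-byte limit are all assumptions.

```python
class KeyValueStore:
    """Toy in-memory key-value store with an upper limit on value size."""

    def __init__(self, max_value_bytes: int = 1024):
        self._data = {}
        self._max_value_bytes = max_value_bytes

    def put(self, key: str, value: bytes) -> None:
        if len(value) > self._max_value_bytes:
            raise ValueError("value exceeds the store's size limit")
        self._data[key] = value

    def get(self, key: str, default=None):
        return self._data.get(key, default)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", b'{"user": "anu"}')
store.put("flag:seen", b"")  # an empty value is allowed
print(store.get("session:42"))
```

The store never looks inside the value, which is why the value can be anything the application chooses to serialise.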

Document-based Store: In a relational database, XML and JSON data tends to be forced into
tables, which slows down queries and the whole process. NoSQL doesn’t have to use a
relational model, which is where document-based stores come in. Instead of spreading
information across tables, each record is stored as a single document, so there is no need
for cross-referencing. While it is very similar to a key-value store and can sometimes be
grouped under the same umbrella, the difference is that a document-based store generally
keeps its values in some structured encoding, such as XML or JSON.
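
To make the document model concrete, the sketch below keeps a small collection of JSON-like documents with different shapes and queries them by a nested field. The helpers `get_path` and `find` and the dotted-path convention are assumptions for illustration, not the query API of a specific database.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'address.city'; return None if absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, value):
    """Return the documents whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


# Documents in one collection do not need to share the same fields.
students = [
    {"_id": 1, "name": "Anu", "address": {"city": "Salem"}},
    {"_id": 2, "name": "Ravi", "address": {"city": "Erode"}, "clubs": ["chess"]},
    {"_id": 3, "name": "Mala"},  # no address at all
]

print(find(students, "address.city", "Salem"))
```

Unlike a key-value store, the store can look inside the value here, which is what makes queries on nested fields possible.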

Column-based store: This data model stores information in columns rather than in rows, which
are more common with SQL. Columns are grouped into column families, and in some systems
families can contain further nested groups of columns. This allows an almost unlimited
nesting of columns. The advantage is that searches over a column can be very fast compared
to other NoSQL models, because the values of a column are stored together as one continuous
record, so there is no need to jump across rows or different areas where the information is
stored.
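
One common way to picture a column-family layout is as nested maps: row key, then column family, then column, then value. The class below is a simplified sketch under that assumption; `ColumnFamilyStore` and its methods are hypothetical names, and real systems add timestamps, on-disk layout, and much more.

```python
class ColumnFamilyStore:
    """Toy store laid out as row key -> column family -> column -> value."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, family, column, value):
        families = self._rows.setdefault(row_key, {})
        families.setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        """Return a copy of all columns of one family for one row."""
        return dict(self._rows.get(row_key, {}).get(family, {}))


table = ColumnFamilyStore()
table.put("user:1", "profile", "name", "Anu")
table.put("user:1", "profile", "city", "Salem")
table.put("user:1", "activity", "last_login", "2024-01-05")
table.put("user:2", "profile", "name", "Ravi")  # rows may have different columns

print(table.get_family("user:1", "profile"))
```

Reading one family touches only that family's columns, which is the access pattern this model is designed around.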

Graph-based store: Graph data models consider the relationship between two pieces of
information to be as meaningful as the information itself. This data model is made for any
information you would typically draw as a network diagram. It uses nodes and relationships:
the nodes hold the data itself, and the relationships are the connections created between
the nodes.
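
The sketch below models a tiny graph as a list of (source, relationship, target) triples and walks it breadth-first. The sample data and the function names `neighbours` and `reachable` are assumptions; a real graph database would expose this through its own query language.

```python
from collections import deque

# Each edge is (source node, relationship type, target node); edges are directed.
edges = [
    ("Anu", "FRIEND", "Ravi"),
    ("Ravi", "FRIEND", "Mala"),
    ("Mala", "FRIEND", "Kavi"),
    ("Anu", "WORKS_AT", "Acme"),
]


def neighbours(node, rel_type):
    """Nodes reached from `node` by one edge of the given type."""
    return [dst for src, rel, dst in edges if src == node and rel == rel_type]


def reachable(start, rel_type, max_hops):
    """Breadth-first traversal along one relationship type, up to max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbours(node, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                frontier.append((nxt, hops + 1))
    return found


print(reachable("Anu", "FRIEND", max_hops=2))  # friends and friends of friends
```

The relationship type is part of the query, which reflects the point above that connections carry as much meaning as the nodes.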

Schema design for NoSQL

Since NoSQL databases don’t have a set structure, schema development and design usually
focus on the physical data model. That means designing for large, horizontally scaled
environments, which is where NoSQL excels. Therefore, the specific peculiarities and
problems brought about by scalability are in the foreground. So, the first step is to define
the business requirements, because optimising access to data is a must and can only be
achieved if we know what the business wants to do with the data. Schema design should
complement the workflows associated with your use case. There are several ways to choose a
primary key, and the right choice ultimately depends on how users access the data. Some data
may indicate a more efficient scheme, especially regarding how often that data is queried.

NoSQL DATABASE ENVIRONMENT

A NoSQL database environment refers to the ecosystem in which NoSQL databases operate.
It encompasses various components, tools, configurations, and considerations necessary for
the successful deployment, management, and utilization of NoSQL databases. This
environment is designed to address the unique characteristics and requirements of NoSQL
databases, which are distinct from traditional relational databases.

Key Elements of NoSQL Database Environment:

NoSQL Database Engines: These are the core software systems that implement the chosen
NoSQL database model (document, key-value, column-family, graph). Examples include
MongoDB (document-based), Redis (key-value), Cassandra (column-family), Neo4j (graph),
and more. These engines provide data storage, retrieval, and manipulation functionalities.

Data Modeling Tools: Tools and frameworks that assist in designing the schema for NoSQL
databases. They help define the structure of documents, fields, relationships, and indexing
strategies. Visualization tools provide insights into the organization of data.

Deployment Platforms: Cloud Providers: Platforms like AWS, Azure, and Google Cloud
offer managed NoSQL database services, enabling easy deployment and scalability. On-
Premises Servers: Organizations can set up their own servers for running NoSQL databases.
Containerization: Docker and Kubernetes allow containerized deployment for consistent
environments across different stages.

Replication and Sharding: Replication: Copies of data are maintained on multiple nodes for
high availability and fault tolerance. Sharding: Data is split into smaller pieces (shards) and
distributed across nodes for horizontal scalability.
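
A minimal sketch of replication for fault tolerance: every write is copied to all live nodes, and a read is served by the first live node that has the key. Synchronous copying to every node is an assumption made to keep the example short, and `ReplicatedStore` is a hypothetical name.

```python
class ReplicatedStore:
    """Toy store that keeps a full copy of the data on every live node."""

    def __init__(self, replica_count: int = 3):
        self.nodes = [dict() for _ in range(replica_count)]
        self.alive = [True] * replica_count

    def put(self, key, value):
        # Write the value to every node that is currently up.
        for index, node in enumerate(self.nodes):
            if self.alive[index]:
                node[key] = value

    def get(self, key):
        # Serve the read from the first live node that holds the key.
        for index, node in enumerate(self.nodes):
            if self.alive[index] and key in node:
                return node[key]
        raise KeyError(key)

    def fail(self, index: int):
        """Simulate a node failure."""
        self.alive[index] = False


cluster = ReplicatedStore(replica_count=3)
cluster.put("cart:7", ["pen", "notebook"])
cluster.fail(0)  # the first node goes down
print(cluster.get("cart:7"))  # still answered by a surviving replica
```

Sharding and replication are usually combined: each shard holds part of the data, and each shard is itself replicated.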

Monitoring and Management Tools: Monitoring: Tools like Prometheus, Grafana, or native
database monitoring dashboards provide insights into database performance, resource usage,
and query patterns. Management: Tools like MongoDB Compass or management consoles in
cloud platforms help manage and configure databases.

Backup and Recovery: Regular backups: Automated backup solutions ensure data is backed
up at specified intervals. Point-in-time recovery: Ability to restore the database to a specific
point in time.

Consistency and Concurrency Management: Consistency Models: NoSQL databases offer
various consistency levels, from strong to eventual, allowing trade-offs between data
consistency and availability. Concurrency Control: Mechanisms to handle simultaneous read
and write operations to maintain data integrity.
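
One widely used way to tune the consistency level in replicated stores is quorum configuration: with N replicas, W write acknowledgements, and R replicas consulted per read, a read is guaranteed to see the latest acknowledged write when R + W > N. The notes do not name this rule, so it is brought in here as an illustrative assumption, as is the function name `quorums_overlap`.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every read quorum shares at least one replica with every write quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


# Three hypothetical configurations for a 3-replica cluster.
for n, w, r in [(3, 1, 1), (3, 2, 2), (3, 3, 1)]:
    kind = "strong" if quorums_overlap(n, w, r) else "eventual"
    print(f"N={n} W={w} R={r}: {kind} consistency")
```

Small W and R favour availability and latency, while larger values favour consistency. This is the trade-off described above.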

Indexing and Query Optimization: Secondary Indexes: Tools allow creating indexes on
specific fields to speed up query execution. Query Optimization: Analyzing query execution
plans and using indexing strategies to optimize queries.
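
The effect of a secondary index can be sketched with a dictionary that maps a field value to the positions of matching documents, so a lookup no longer scans the whole collection. The data and the helper names `build_index` and `find_by_index` are assumptions for illustration.

```python
from collections import defaultdict

products = [
    {"sku": "p1", "category": "book", "price": 300},
    {"sku": "p2", "category": "pen", "price": 20},
    {"sku": "p3", "category": "book", "price": 450},
]


def build_index(collection, field):
    """Map each value of `field` to the positions of the documents holding it."""
    index = defaultdict(list)
    for position, doc in enumerate(collection):
        if field in doc:
            index[doc[field]].append(position)
    return index


def find_by_index(collection, index, value):
    """Fetch matching documents through the index instead of a full scan."""
    return [collection[position] for position in index.get(value, [])]


category_index = build_index(products, "category")
print(find_by_index(products, category_index, "book"))
```

The index must be updated on every write, which is why indexing speeds up reads at some cost to writes and storage.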

Security and Access Control: Authentication: Users are required to authenticate before
accessing the database. Authorization: Role-based access control determines what operations
users can perform on the database. Encryption: Data at rest and in transit can be encrypted for
security.

Integration with Development Tools: Libraries and SDKs: Libraries and software
development kits for various programming languages to interact with the database. Drivers:
Native drivers enable efficient communication between the application and the database.

Scalability and Load Balancing: Horizontal Scaling: Adding more nodes to the database
cluster to accommodate increased load. Load Balancing: Distributing incoming requests
evenly across nodes to prevent overloading.
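
Round-robin is one simple policy for spreading requests evenly across nodes. The sketch below assumes that policy, and the class name `RoundRobinBalancer` and the node names are hypothetical; real balancers often also weigh node health and load.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Hand out nodes in a fixed rotating order."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._cycle = itertools.cycle(list(nodes))

    def next_node(self):
        return next(self._cycle)


balancer = RoundRobinBalancer(["node-a", "node-b", "node-c"])
assignments = [balancer.next_node() for _ in range(6)]
print(Counter(assignments))  # each node receives the same number of requests
```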

Data Migration and ETL Tools: Data Import/Export: Tools for moving data to and from the
NoSQL database during migrations or integrations. ETL (Extract, Transform, Load): Tools to
extract data from various sources, transform it as needed, and load it into the database.

Documentation and Community: Comprehensive Documentation: Guides, tutorials, and
references provided by the database vendors. Community Support: Online forums, discussion
groups, and communities where developers share experiences and help each other.

Performance Tuning and Optimization: Indexing Strategies: Designing and maintaining
effective indexes to improve query performance. Query Optimization: Analyzing and fine-tuning
queries for better execution speed. Resource Allocation: Proper allocation of memory,
CPU, and disk resources for optimal performance.

HOW NoSQL DATABASE WORKS

NoSQL databases work differently from traditional relational databases. They are designed to
handle large volumes of unstructured or semi-structured data, offer high scalability, and often
provide better performance for certain types of applications. The specific workings of NoSQL
databases can vary depending on the type of database model being used (document-based,
key-value, column-family, graph).

Data Model Flexibility: NoSQL databases allow more flexible data modeling compared to
rigid tabular structures of relational databases. Data can be stored as documents, key-value
pairs, wide columns, or graph structures.

Data Storage: In document-based NoSQL databases (e.g., MongoDB), data is stored as
JSON-like documents with nested fields. In key-value stores (e.g., Redis), data is stored as
simple key-value pairs. In column-family databases (e.g., Cassandra), data is organized into
columns and column families. Graph databases (e.g., Neo4j) use nodes and edges to represent
data and relationships.

Schema Flexibility: NoSQL databases often have a dynamic or schema-less structure,
allowing data to be added or modified without strictly adhering to a predefined schema.

Horizontal Scalability: NoSQL databases are designed for horizontal scalability, meaning
they can distribute data across multiple nodes or servers to handle high loads and
accommodate growing data volumes.

CAP Theorem: The CAP theorem (Consistency, Availability, Partition Tolerance) is a
fundamental concept for distributed NoSQL databases. It states that a distributed data store
cannot guarantee all three properties at once: when a network partition occurs, the system
must give up either consistency or availability. Most NoSQL databases are therefore
described as prioritizing either availability and partition tolerance (AP) or consistency
and partition tolerance (CP).

Eventual Consistency: Many NoSQL databases, particularly those prioritizing high
availability, use the concept of eventual consistency. This means that if no new updates are
made to an item, all replicas will eventually converge to the same value. In the meantime,
some replicas may return stale data, for example while a network partition prevents them
from exchanging updates.
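
The sketch below shows two replicas that diverge and then converge once they exchange state. It assumes a last-write-wins rule based on timestamps, which is only one of several reconciliation strategies; the data and the function name `merge` are hypothetical.

```python
def merge(replica_a, replica_b):
    """Last-write-wins merge: for each key keep the (value, timestamp) pair
    with the newer timestamp. Ties keep the entry from replica_a."""
    merged = dict(replica_a)
    for key, (value, timestamp) in replica_b.items():
        if key not in merged or timestamp > merged[key][1]:
            merged[key] = (value, timestamp)
    return merged


# During a partition, each replica accepts different writes.
replica_1 = {"stock:pen": (10, 100), "stock:book": (4, 105)}
replica_2 = {"stock:pen": (8, 110)}  # a newer write for the same key

# After the partition heals, the replicas exchange state and converge.
state_1 = merge(replica_1, replica_2)
state_2 = merge(replica_2, replica_1)
print(state_1 == state_2, state_1)
```

Before the merge, a read of `stock:pen` returns 10 on one replica and 8 on the other. That temporary disagreement is what "eventual" refers to.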

Querying: NoSQL databases often provide specific query languages or APIs optimized for
their data models. Document-based databases use queries similar to JSON syntax to retrieve
data from documents. Key-value databases allow direct retrieval of values using keys. Graph
databases offer query languages to traverse relationships between nodes.

Indexes: NoSQL databases use indexes to speed up data retrieval. Secondary indexes allow
querying on fields other than the primary key. In some NoSQL databases, indexing is crucial
for efficient query performance.

Data Distribution: In distributed environments, NoSQL databases use techniques like
sharding and replication to distribute data across nodes while ensuring fault tolerance and
availability.

THE VALUE OF RELATIONAL DATABASES

Relational databases have become such an embedded part of our computing culture that it’s
easy to take them for granted. It’s therefore useful to revisit the benefits they provide.

Values are:

• Getting at Persistent Data

• Concurrency

• Integration

• A (Mostly) Standard Model

Getting at Persistent Data: Persistent data means data that is stored reliably over time and
across system restarts. Relational databases are designed to provide persistent and durable
storage for data. This means that data stored in a relational database should be available and
retrievable as long as the database itself is operational. Relational databases offer a structured
way to organize and access data through tables, rows, and columns, making it relatively easy
to retrieve specific data and perform various operations on it. Probably the most obvious
value of a database is keeping large amounts of persistent data. Most computer architectures
have the notion of two areas of memory: a fast volatile “main memory” and a larger but
slower “backing store.” Main memory is both limited in space and loses all data when you
lose power or something bad happens to the operating system. Therefore, to keep data
around, we write it to a backing store, commonly seen as a disk (although these days that disk
can be persistent memory). The backing store can be organized in all sorts of ways. For many
productivity applications (such as word processors), it’s a file in the file system of the
operating system. For most enterprise applications, however, the backing store is a database.
The database allows more flexibility than a file system in storing large amounts of data in a
way that allows an application program to get at small bits of that information quickly and
easily.

Concurrency: Concurrency means simultaneous access and modification of data by multiple
users or applications. Concurrency control is a critical aspect of relational databases. It
ensures that multiple users or processes can access and modify data simultaneously without
causing data inconsistencies or conflicts. Relational databases use mechanisms such as
locking, transactions, and isolation levels to manage concurrency, ensuring that data remains
consistent and reliable even in a multi-user environment. Enterprise applications tend to have
many people looking at the same body of data at once, possibly modifying that data. Most of
the time they are working on different areas of that data, but occasionally they operate on the
same bit of data. As a result, we have to worry about coordinating these interactions to avoid
such things as double booking of hotel rooms. Concurrency is notoriously difficult to get
right, with all sorts of errors that can trap even the most careful programmers. Since
enterprise applications can have lots of users and other systems all working concurrently,
there’s a lot of room for bad things to happen. Relational databases help handle this by
controlling all access to their data through transactions. While this isn’t a cure-all (you still
have to handle a transactional error when you try to book a room that’s just gone), the
transactional mechanism has worked well to contain the complexity of concurrency.
Transactions also play a role in error handling. With transactions, you can make a change,
and if an error occurs during the processing of the change you can roll back the transaction to
clean things up.
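
The hotel-booking example can be sketched with Python's built-in `sqlite3` module. The table layout and the function `book_stay` are assumptions for illustration; the point is that a uniqueness constraint plus a transaction prevents double booking, and a failed booking is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    " room INTEGER NOT NULL,"
    " night TEXT NOT NULL,"
    " guest TEXT NOT NULL,"
    " UNIQUE (room, night))"  # one guest per room per night
)


def book_stay(conn, room, nights, guest):
    """Book all nights in one transaction; roll back everything on a conflict."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            for night in nights:
                conn.execute(
                    "INSERT INTO bookings (room, night, guest) VALUES (?, ?, ?)",
                    (room, night, guest),
                )
        return True
    except sqlite3.IntegrityError:
        return False


print(book_stay(conn, 101, ["2024-01-01", "2024-01-02"], "Anu"))   # True
print(book_stay(conn, 101, ["2024-01-03", "2024-01-02"], "Ravi"))  # False
print(conn.execute("SELECT COUNT(*) FROM bookings").fetchone()[0])  # 2
```

Ravi's booking for 2024-01-03 was inserted first and then undone when the second night conflicted, so no half-finished stay is left behind.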

Integration: Enterprise applications live in a rich ecosystem that requires multiple
applications, written by different teams, to collaborate in order to get things done. This kind
of inter-application collaboration is awkward because it means pushing the human
organizational boundaries. Applications often need to use the same data and updates made
through one application have to be visible to others. A common way to do this is shared
database integration where multiple applications store their data in a single database. Using a
single database allows all the applications to use each others’ data easily, while the database’s
concurrency control handles multiple applications in the same way as it handles multiple
users in a single application. Relational databases provide mechanisms for integrating and
relating data from different tables within the database. This is achieved through the use of
primary and foreign keys, which establish relationships between tables. Integration also refers
to the ability of relational databases to work seamlessly with various programming languages,
frameworks, and tools. This makes it easier to develop applications that interact with the
database.
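
How primary and foreign keys relate tables can be shown with a small `sqlite3` sketch. The `customers` and `orders` tables and the helper `insert_orphan_order` are hypothetical; note that SQLite enforces foreign keys only after the `PRAGMA foreign_keys = ON` statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL)"
)

with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Anu')")
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 250.0)")

# The join relates the two tables through the foreign key.
rows = conn.execute(
    "SELECT o.id, c.name, o.total"
    " FROM orders AS o JOIN customers AS c ON c.id = o.customer_id"
).fetchall()
print(rows)


def insert_orphan_order(conn):
    """Try to insert an order for a customer that does not exist."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (id, customer_id, total) VALUES (11, 99, 10.0)"
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_orphan_order(conn))  # rejected by the foreign key constraint
```

Because the database enforces the relationship, every application sharing it sees the same rule, which is part of what makes shared database integration workable.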

A (Mostly) Standard Model: Relational databases are typically based on a standardized
model, which is the relational model. This model defines the structure of data as tables
(relations) with rows (tuples) and columns (attributes). SQL (Structured Query Language) is a
widely adopted standard for interacting with relational databases. It provides a standardized
way to query, manipulate, and manage data across different database management systems
(DBMS). The use of a standardized model and language promotes interoperability and
portability, allowing applications to work with different relational database systems without
major modifications. Relational databases have succeeded because they provide the core
benefits outlined above (persistence, concurrency, and integration) in a mostly standard
way. As a result, developers and database professionals can learn the basic relational
model and apply it in many projects. Although there are differences between different
relational databases, the core mechanisms remain the same: Different vendors’ SQL dialects
are similar, transactions operate in mostly the same way.
