Introduction To NoSQL

The document provides an overview of NoSQL databases, highlighting their non-relational nature and flexibility in handling various data types without fixed schemas. It compares SQL and NoSQL databases, detailing their differences in data models, scalability, and use cases, while also discussing the CAP theorem and sharding as techniques for managing distributed databases. Additionally, it outlines the steps for migrating from SQL to NoSQL and describes different types of NoSQL databases, including document-based, key-value, column-family, and graph databases.

Uploaded by

vijaylaxmivijjub39

Unit – III

Data storage and manipulation

Introduction to NoSQL

A NoSQL database is a non-relational database that handles
unstructured, semi-structured, and structured data. Unlike
traditional SQL databases, NoSQL databases don't rely on fixed
schemas, making them highly flexible and adaptable. "NoSQL" stands
for "Not Only SQL," indicating their ability to support a variety of
data models beyond just relational data. These databases are optimized
for scalability, performance, and high availability.

NoSQL databases are often used for big data applications, real-time
analytics, and handling large volumes of data that may change over
time. They allow developers to store data without predefined schemas,
providing greater flexibility in handling diverse datasets.
Additionally, NoSQL databases typically scale horizontally by adding
more servers, enabling them to manage increasing data or traffic.

There are several types of NoSQL databases, including document-based,
key-value, column-family, and graph databases, each suited for
specific use cases. While NoSQL databases offer fast read and write
operations, they may sacrifice ACID compliance in favor of eventual
consistency. They are popular in applications like social media, IoT,
and e-commerce, where rapid changes and large datasets are common.
However, developers must carefully consider their use case, as NoSQL
databases may require more complex management and offer limited
querying capabilities compared to traditional relational databases.

SQL vs NoSQL:
SQL databases are best for structured data with complex relationships
and for workloads that require strong consistency. In contrast, NoSQL
databases are ideal for handling large, distributed, and flexible
datasets with varying structures and a need for scalability.

The following table lists a few differences between SQL and NoSQL
databases:

Feature | SQL | NoSQL
--- | --- | ---
Data Model | Relational (tables with rows and columns) | Non-relational (document, key-value, column, graph)
Schema | Fixed schema (must define schema beforehand) | Schema-less (data can be stored without predefined schema)
Scalability | Vertical scaling (requires more powerful hardware) | Horizontal scaling (adding more servers to distribute data)
ACID Compliance | ACID (Atomicity, Consistency, Isolation, Durability) | Many do not fully support ACID; the focus is on eventual consistency
Query Language | SQL (Structured Query Language) | Varies (e.g., MongoDB uses its own query language)
Data Integrity | Strong consistency and data integrity | Eventual consistency (may allow some inconsistencies temporarily)
Transaction Support | Full support for transactions (e.g., updates spanning several rows and tables) | Limited or no support for complex transactions
Use Case | Suitable for structured data with complex relationships (e.g., financial systems) | Best for flexible, large-scale, or distributed data (e.g., social media, real-time analytics)
Example Databases | MySQL, PostgreSQL, Oracle, SQL Server | MongoDB, Cassandra, Redis, CouchDB, Neo4j
Performance | Slower for large-scale applications with complex queries | Faster for big data, real-time analytics, and applications with high throughput
Flexibility | Less flexible due to rigid schema and table structures | Highly flexible, with the ability to store varied data formats (e.g., JSON, key-value pairs)
Joins | Supports complex joins between tables | Limited or no support for joins; data is often denormalized
Normalization | Data is normalized to reduce redundancy | Data is often denormalized to improve performance
Data Redundancy | Minimal redundancy due to normalization | May allow more redundancy to increase performance and scalability
Data Storage | Stores data in tables with rows and columns | Stores data in documents, key-value pairs, columns, or graphs
Consistency Model | Strong consistency; transactions are reliable | Eventually consistent, focusing on availability and partition tolerance (CAP Theorem)
Scalability Model | Typically scales vertically (upgrading a single machine) | Scales horizontally (adding more servers or nodes)

Migrating from SQL to NoSQL database:

Migrating from an SQL to a NoSQL database involves several necessary
steps and considerations:

1. Assess the Need: Evaluate if your application requires more
flexibility and scalability or handles large volumes of
unstructured data, which NoSQL is better suited for.
2. Choose the Right NoSQL Database: Select from various NoSQL types
such as document-based (MongoDB), key-value (Redis), column-
family (Cassandra), or graph databases (Neo4j), depending on
your use case.
3. Analyze the Data Model: SQL uses a structured schema with tables,
while NoSQL is more flexible. Data may need to be denormalized
for NoSQL, and relationships (joins) will be handled
differently.
4. Data Migration: Extract data from SQL, transform it as needed,
and load it into NoSQL. This may involve using ETL tools to
automate the process.
5. Modify Application Code: Replace SQL queries with NoSQL-specific
queries and APIs. Adjust for the NoSQL database's structure and
eventual consistency model.
6. Performance Optimization: Implement indexing, sharding, and
caching to optimize performance in the NoSQL environment.
7. Testing and Monitoring: Thoroughly test the migration, monitor
the database’s performance, and adjust as needed for
scalability.
8. Backup and Future Planning: Ensure you have a solid backup
strategy and regularly review the NoSQL database’s performance
to adapt to future needs.
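
Steps 3 and 4 above (analyzing the data model and migrating the data) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `users` and `orders` tables, their columns, and the sample rows are hypothetical, SQLite stands in for the SQL source, and a plain dictionary stands in for the NoSQL target. A real migration would use the client library of the chosen database.

```python
import json
import sqlite3


def build_sample_sql_db() -> sqlite3.Connection:
    """Create a tiny in-memory relational database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            item TEXT,
            amount REAL
        );
        INSERT INTO users VALUES (1, 'Asha', 'asha@example.com');
        INSERT INTO users VALUES (2, 'Ravi', 'ravi@example.com');
        INSERT INTO orders VALUES (10, 1, 'keyboard', 45.0);
        INSERT INTO orders VALUES (11, 1, 'mouse', 15.5);
        INSERT INTO orders VALUES (12, 2, 'monitor', 120.0);
        """
    )
    return conn


def extract_and_transform(conn: sqlite3.Connection) -> list:
    """Extract rows and denormalize them: one document per user,
    with that user's orders embedded instead of joined at query time."""
    documents = []
    users = conn.execute("SELECT id, name, email FROM users ORDER BY id").fetchall()
    for user_id, name, email in users:
        orders = conn.execute(
            "SELECT id, item, amount FROM orders WHERE user_id = ? ORDER BY id",
            (user_id,),
        ).fetchall()
        documents.append(
            {
                "_id": user_id,
                "name": name,
                "email": email,
                "orders": [
                    {"order_id": oid, "item": item, "amount": amount}
                    for oid, item, amount in orders
                ],
            }
        )
    return documents


def load(documents: list, store: dict) -> None:
    """Load step: the 'NoSQL database' here is just a dict keyed by _id."""
    for doc in documents:
        store[doc["_id"]] = doc


conn = build_sample_sql_db()
document_store = {}
load(extract_and_transform(conn), document_store)
print(json.dumps(document_store[1], indent=2))
```

Embedding the orders inside each user document trades redundancy-free storage for reads that need no join, which is the denormalization mentioned in step 3.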

Different Types of NoSQL Databases:

NoSQL databases are categorized based on their data models, and each
type is suited for specific use cases. Here are the four main types
of NoSQL databases:

1. Document-based NoSQL Databases

• Data Model: Stores data as documents (usually JSON, BSON, or XML
format). Each document contains key-value pairs and can be
nested.

• Use Case: Ideal for semi-structured data, like user profiles,
product catalogs, and content management systems.

• Example Databases:

o MongoDB

o CouchDB

o Couchbase

2. Key-Value Store NoSQL Databases

• Data Model: Stores data as pairs of keys and values. Each key
is unique, and the value can be any data type (e.g., string,
number, object).

• Use Case: Best for applications that require fast access to data
using a unique key, such as caching or session storage.

• Example Databases:

o Redis

o DynamoDB

o Riak

3. Column-family Store NoSQL Databases

• Data Model: Stores data in rows whose columns are grouped into
column families. Rows in the same table do not need to have the
same columns, and the columns of one family are stored together,
optimizing reads and writes that touch only that family.

• Use Case: Suitable for applications that require quick access
to large amounts of data or perform analytical queries on
specific columns (e.g., time-series data, event logging).

• Example Databases:

o Cassandra

o HBase

o ScyllaDB

4. Graph-based NoSQL Databases

• Data Model: Uses graph structures consisting of nodes, edges,
and properties to represent and store data. This is ideal for
managing relationships between entities.

• Use Case: Best for applications like social networks,
recommendation engines, and fraud detection, where relationships
between entities are essential.

• Example Databases:

o Neo4j

o ArangoDB

o OrientDB

Each type of NoSQL database is tailored to specific needs, offering
unique advantages in scalability, performance, and data modeling.
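
To make the four data models concrete, the sketch below represents each one with plain Python structures. It does not use any real database API; all names, keys, and sample values are hypothetical and chosen only to show the shape of the data in each model.

```python
# Document model: a nested, self-describing record.
user_document = {
    "_id": "u1",
    "name": "Asha",
    "addresses": [{"city": "Pune", "type": "home"}],
}

# Key-value model: an opaque value looked up by a unique key.
key_value_store = {"session:42": "user=u1;expires=3600"}

# Column-family model: row key -> column family -> columns.
# Rows do not need to have the same families or columns.
column_family_store = {
    "u1": {
        "profile": {"name": "Asha", "email": "asha@example.com"},
        "activity": {"last_login": "2024-01-01", "login_count": 7},
    },
    "u2": {
        "profile": {"name": "Ravi"},
    },
}

# Graph model: nodes with properties, plus typed edges between them.
nodes = {"u1": {"name": "Asha"}, "u2": {"name": "Ravi"}, "u3": {"name": "Meena"}}
edges = [("u1", "FOLLOWS", "u2"), ("u2", "FOLLOWS", "u3")]


def neighbors(node_id: str, edge_type: str) -> list:
    """Return the nodes reachable from node_id over one edge of edge_type."""
    return [dst for src, etype, dst in edges if src == node_id and etype == edge_type]


def follows_of_follows(node_id: str) -> list:
    """Two-hop traversal, the kind of query graph databases are built for."""
    result = []
    for first_hop in neighbors(node_id, "FOLLOWS"):
        for second_hop in neighbors(first_hop, "FOLLOWS"):
            if second_hop != node_id and second_hop not in result:
                result.append(second_hop)
    return result


print(user_document["addresses"][0]["city"])
print(key_value_store["session:42"])
print(sorted(column_family_store["u1"]))
print(follows_of_follows("u1"))
```

The two-hop traversal is the kind of relationship query that would need repeated self-joins in a relational schema.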

CAP Theorem

The CAP Theorem is a fundamental principle in distributed systems,
proposed by computer scientist Eric Brewer in 2000. It describes the
trade-offs among three key properties of a distributed database
system: Consistency, Availability, and Partition Tolerance. The
theorem states that a distributed system can guarantee at most two of
these three properties at the same time.

The three properties of the CAP theorem are:

1. Consistency

• Definition: Every read request in the system returns the most
recent write. This means that once data is written to the system,
all subsequent reads will reflect that data, no matter which
node the request is directed to.

• Example: In a consistent system, if you update a user's profile
information, every subsequent read (from any part of the system)
will reflect the updated information immediately.

2. Availability

• Definition: Every request (read or write) will receive a
response, regardless of whether the data is up to date. This
means the system remains operational and returns a response even
if some nodes are down.

• Example: In an available system, if a user tries to retrieve
data, the system will still return data, even if it might not
be the latest version or some replicas of the data are
unavailable.

3. Partition Tolerance

• Definition: The system will continue to operate correctly even
if network partitions (communication breakdowns) prevent some
nodes from communicating with each other. A partitioned system
can still perform reads and writes, even if some nodes are
temporarily disconnected from the rest of the system.

• Example: If a network partition occurs between two regions, the
system can still function in both areas, allowing reads and
writes despite the lack of communication between nodes.

The Trade-off (According to CAP Theorem)

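The CAP theorem states that a distributed system can guarantee at most
two of these three properties at any given time. This means a system
must sacrifice one of the properties depending on the use case and
the design priorities. Because network partitions cannot be ruled out
in a distributed system, the practical choice is usually between
consistency and availability while a partition lasts.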

• Consistency + Availability (CA): The system guarantees that all
nodes return the same data (consistency), and every request will
return a response (availability). However, if a network
partition occurs, the system may not function properly because
it sacrifices Partition Tolerance.

o Example: A system like a traditional relational database
where all nodes must be in sync but can fail when there's
a partition.

• Consistency + Partition Tolerance (CP): The system guarantees
that all nodes will have the most recent data (consistency) and
will continue to work even if a network partition occurs.
However, if there's a partition, some requests might fail,
sacrificing Availability.

o Example: ZooKeeper is a CP system that prioritizes
consistent data even during a partition. However, it might
refuse to process requests during a partition to maintain
consistency.

• Availability + Partition Tolerance (AP): The system guarantees
that every request gets a response (availability), and it
continues to work even in the case of network partitioning.
However, it might return stale data or inconsistent results
because Consistency is sacrificed.

o Example: A system like Cassandra, where even if some nodes
are partitioned, the system remains available and
responsive, but you might get outdated or inconsistent
data.
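
The difference between CP and AP behavior during a partition can be illustrated with a toy simulation. This is a deliberately simplified sketch, not how any real database is implemented: the `Replica` class, the two-node setup, and the `set_partition` helper are hypothetical. In particular, real CP systems usually let the side that holds a majority of nodes keep working, whereas this two-node toy simply refuses requests on both sides.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses a request during a partition."""


class Replica:
    def __init__(self, name: str, mode: str):
        self.name = name
        self.mode = mode            # "CP" or "AP"
        self.data = {}
        self.peer = None
        self.partitioned = False    # True while the link to the peer is down

    def write(self, key, value):
        if self.partitioned and self.mode == "CP":
            # CP: refuse rather than let the two replicas diverge.
            raise Unavailable(f"{self.name}: peer unreachable, write rejected")
        self.data[key] = value
        if not self.partitioned:
            # Normal operation: replicate synchronously to the peer.
            self.peer.data[key] = value

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable(f"{self.name}: peer unreachable, read rejected")
        # AP: always answer, even if the local copy may be stale.
        return self.data.get(key)


def make_pair(mode: str):
    a, b = Replica("A", mode), Replica("B", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a: Replica, b: Replica, down: bool) -> None:
    a.partitioned = b.partitioned = down


# AP pair: stays available, but replica B serves stale data.
a, b = make_pair("AP")
a.write("x", 1)
set_partition(a, b, True)
a.write("x", 2)
ap_stale_read = b.read("x")

# CP pair: stays consistent, but rejects the write during the partition.
c, d = make_pair("CP")
c.write("x", 1)
set_partition(c, d, True)
try:
    c.write("x", 2)
    cp_write_accepted = True
except Unavailable:
    cp_write_accepted = False

print("AP read from B during partition:", ap_stale_read)
print("CP write accepted during partition:", cp_write_accepted)
```

In the AP pair, replica B keeps answering with the old value; in the CP pair, the write is rejected so the two copies never disagree.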

Examples of CAP Trade-offs in Real Systems

• CA (Consistency + Availability): In practice, only systems that
do not have to tolerate network partitions, such as a relational
database running on a single node, fit this category. Google
Spanner is sometimes described as CA because of its very high
availability, but it is technically a CP system.

• CP (Consistency + Partition Tolerance): HBase and ZooKeeper are
examples of systems prioritizing consistency and partition
tolerance. In the event of network partitions, these systems may
refuse to serve some requests to maintain data consistency.

• AP (Availability + Partition Tolerance): Cassandra, Couchbase,
and Riak are examples of databases prioritizing availability and
partition tolerance. They remain operational even during network
partitions but might sometimes serve outdated or inconsistent
data.

Beyond the CAP Theorem: BASE vs. ACID

While the CAP theorem focuses on the trade-offs in distributed
systems, databases that prioritize Availability and Partition
Tolerance (AP) often use the BASE (Basically Available, Soft state,
Eventual consistency) model as an alternative to the ACID (Atomicity,
Consistency, Isolation, Durability) properties of traditional
relational databases.

• BASE: Allows for temporary inconsistencies but guarantees that,
once updates stop arriving, all replicas will converge to the
same state over time (eventual consistency).

• ACID: Ensures strong consistency and reliability but may not
scale as efficiently as BASE systems in distributed
environments.
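
Eventual consistency can be illustrated with a small sketch in which replicas accept writes independently and later exchange their state. Everything here is an assumption made for the illustration: the `BaseReplica` class, the shared logical clock, and the last-write-wins merge rule are hypothetical simplifications, and real systems use more careful conflict resolution.

```python
import itertools

# Hypothetical shared logical clock; it only orders writes in this sketch.
_clock = itertools.count(1)


class BaseReplica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value):
        """Accept the write locally without contacting other replicas."""
        self.data[key] = (next(_clock), value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge_from(self, other: "BaseReplica") -> None:
        """Last-write-wins merge: keep the entry with the newer timestamp."""
        for key, (ts, value) in other.data.items():
            if key not in self.data or self.data[key][0] < ts:
                self.data[key] = (ts, value)


def anti_entropy(replicas: list) -> None:
    """Background synchronization: every replica merges from every other."""
    for replica in replicas:
        for other in replicas:
            if replica is not other:
                replica.merge_from(other)


r1, r2, r3 = BaseReplica(), BaseReplica(), BaseReplica()
r1.write("cart", ["book"])
r2.write("cart", ["book", "pen"])  # a later write lands on another replica

before_sync = [r.read("cart") for r in (r1, r2, r3)]
anti_entropy([r1, r2, r3])
after_sync = [r.read("cart") for r in (r1, r2, r3)]

print("before:", before_sync)  # replicas disagree (soft state)
print("after: ", after_sync)   # replicas have converged
```

Before synchronization the three replicas return three different answers; afterwards they all return the most recent write.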

Sharding:
Sharding is a database partitioning technique used to horizontally
scale a database by distributing data across multiple servers (or
nodes). Sharding aims to handle large datasets, ensure high
availability, and improve system performance by distributing the load
and increasing capacity. Sharding helps databases scale out rather
than scaling up (which involves upgrading a single server). It is
beneficial in systems where data grows too large for a single server
to handle efficiently.

How Sharding Works

In sharding, the data in a database is split into smaller chunks,
known as shards, which are distributed across multiple servers (or
nodes). Each shard is a subset of the entire dataset, and each server
holds one or more subsets. These subsets are typically divided by a
shard key, a specific attribute of the data used to determine how the
data is split.

Types of Sharding

1. Horizontal Sharding (Data Partitioning):

o Definition: In horizontal sharding, the rows of a database
table are divided into smaller chunks, and each chunk is
stored on a different server or node. This allows data to
be distributed across multiple machines, improving
performance and scalability.

2. Vertical Sharding:

o Definition: In vertical sharding, different columns of a
table are stored on different servers or nodes. For
example, in a users table, one server may store the columns
UserID, Name, and Email, while another stores PhoneNumber
and Address.

3. Directory-Based Sharding:

o Definition: In directory-based sharding, a lookup table
(or directory) is used to track where each piece of data
is stored. The directory contains the shard key and the
location of the data, so the system knows which shard to
query for specific data.

4. Range-based Sharding:

o Definition: In range-based sharding, data is split into
ranges based on the shard key. For example, if the shard
key is a UserID, the data might be divided into shards
containing ranges of UserID values (e.g., 1-1000,
1001-2000, etc.).

5. Hash-based Sharding:

o Definition: In hash-based sharding, the shard key is passed
through a hash function, and the resulting hash value
determines which shard the data will belong to. This
approach helps to distribute data evenly across shards.

6. Composite Sharding:

o Definition: Composite sharding uses more than one attribute
to determine how data is distributed across shards. It
combines multiple keys or fields in a compound sharding
strategy.
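
The routing logic behind several of these strategies fits in a few lines each. The sketch below is illustrative only: the shard count, the range boundaries, the tenant directory, and the region mapping are all hypothetical values, and each function merely returns the index of the shard that would hold a record.

```python
import bisect
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash-based: a stable hash of the shard key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Range-based: inclusive upper bounds of the first three shards.
# Shard 0 holds 1-1000, shard 1 holds 1001-2000, shard 2 holds 2001-3000,
# and shard 3 holds everything above 3000.
RANGE_UPPER_BOUNDS = [1000, 2000, 3000]


def range_shard(user_id: int) -> int:
    """Range-based: find the first range whose upper bound covers user_id."""
    return bisect.bisect_left(RANGE_UPPER_BOUNDS, user_id)


# Directory-based: an explicit lookup table from shard key to shard.
DIRECTORY = {"tenant-a": 0, "tenant-b": 2}


def directory_shard(tenant: str) -> int:
    return DIRECTORY[tenant]


# Composite: the region selects a group of shards, a hash picks one inside it.
REGION_INDEX = {"eu": 0, "us": 1}
SHARDS_PER_REGION = 2


def composite_shard(region: str, user_id: int) -> int:
    base = REGION_INDEX[region] * SHARDS_PER_REGION
    return base + hash_shard(str(user_id), SHARDS_PER_REGION)


print(range_shard(1), range_shard(1000), range_shard(1001), range_shard(5000))
print(directory_shard("tenant-b"))
print(hash_shard("user-42"), composite_shard("us", 7))
```

Range-based routing keeps neighboring keys together, which helps range scans but can create hot shards; hash-based routing spreads keys evenly but scatters ranges across shards.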

Advantages of Sharding

Scalability:

Sharding allows databases to scale horizontally by adding more servers
to handle growing data and traffic. This enables systems to manage
massive amounts of data and large numbers of concurrent requests.

Improved Performance:

By distributing data across multiple servers, sharding reduces the
load on any single server, leading to faster query responses and
improved performance. Each server handles only a subset of the
data, making operations faster.

Fault Tolerance:

Sharding limits the impact of a failure by storing data across
multiple servers. If one shard becomes unavailable, the other shards
can still serve requests for their own data, so the rest of the system
remains available. Keeping the data of the failed shard available
additionally requires replication.

Load Balancing:

Sharding helps distribute the workload across multiple servers,
preventing one server from becoming a bottleneck. This ensures better
performance even with high traffic volumes.

Examples of Sharded Databases

1. MongoDB: A popular NoSQL database that supports sharding. It
allows data to be partitioned across multiple servers, improving
scalability and performance.

2. Cassandra: A highly scalable NoSQL database that uses a
decentralized approach to sharding and distributing data across
multiple nodes in a cluster.

3. Elasticsearch: A search engine that uses sharding to distribute
data and queries across a cluster of nodes, providing fast search
results.

Sharding is a powerful technique to scale a database horizontally by
distributing data across multiple servers. While it offers scalability
and improved performance, it also introduces challenges like
complexity, managing cross-shard queries, and ensuring consistency.
Choosing the right shard key, managing rebalancing, and handling
complex queries are critical to making sharding work effectively.
Sharding is commonly used in large-scale applications like social
media platforms, e-commerce websites, and cloud-based services where
the volume of data grows rapidly.
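
The rebalancing challenge mentioned above can be made concrete with a short experiment. The sketch assumes the simplest hash-based placement, hash modulo the number of shards; the key names and shard counts are hypothetical. It measures how many keys would have to move when a fifth shard is added to a four-shard cluster.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Naive hash-based placement: stable hash modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


keys = [f"user-{i}" for i in range(10_000)]

# A key has to move if its shard index changes when the cluster grows.
moved = sum(1 for key in keys if shard_for(key, 4) != shard_for(key, 5))
moved_fraction = moved / len(keys)

print(f"{moved_fraction:.1%} of keys change shard when growing from 4 to 5 shards")
```

With modulo placement roughly four out of five keys change shard in this scenario, which is why production systems often prefer schemes that move less data, such as consistent hashing or fixed-size chunks that are reassigned individually.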
