0% found this document useful (0 votes)
24 views18 pages

Database Normalization and Types Explained

The document discusses normalization in database management, detailing its benefits, types (1NF, 2NF, 3NF), and the process of reducing data redundancy and anomalies. It also covers denormalization, an optimization technique that introduces redundancy to improve query performance, along with its advantages and disadvantages. Additionally, comparisons between normalization and denormalization are provided, highlighting their distinct purposes and impacts on database structure and performance.

Uploaded by

chetankosare426
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
24 views18 pages

Database Normalization and Types Explained

The document discusses normalization in database management, detailing its benefits, types (1NF, 2NF, 3NF), and the process of reducing data redundancy and anomalies. It also covers denormalization, an optimization technique that introduces redundancy to improve query performance, along with its advantages and disadvantages. Additionally, comparisons between normalization and denormalization are provided, highlighting their distinct purposes and impacts on database structure and performance.

Uploaded by

chetankosare426
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Database Management System

Unit- III
Normalization: Benefits of Normalization, First, Second and Third Normal Form,
Denormalization, Advantages and Disadvantages of Denormalization
Parallel Databases: Parallel Database Architecture, Advantages and Disadvantage of Parallel
Databases
Distributed Databases: Basic Concepts, Distributed Database Management System (DDBMS),
Advantages and Disadvantages of Distributed Databases, DDBMS Architectures, Comparison
of DBMS and DDBMS, Comparison of Parallel and Distributed Databases.

Normalization
A large database defined as a single relation may result in data duplication. This
repetition of data may result in:
o Making relations very large.
o It isn't easy to maintain and update data as it would involve searching many
records in relation.
o Wastage and poor utilization of disk space and resources.
o The likelihood of errors and inconsistencies increases.

So to handle these problems, we should analyze and decompose the relations with
redundant data into smaller, simpler, and well-structured relations that are satisfy
desirable properties. Normalization is a process of decomposing the relations into
relations with fewer attributes.

What is Normalization?
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate undesirable characteristics like Insertion,
Update, and Deletion Anomalies.
o Normalization divides the larger table into smaller and links them using
relationships.
o The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?


The main reason for normalizing the relations is removing these anomalies. Failure to
eliminate anomalies leads to data redundancy and can cause data integrity and other
problems as the database grows. Normalization consists of a series of guidelines that
helps to guide you in creating a good database structure.

Data modification anomalies can be categorized into three types:


o Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new
tuple into a relationship due to lack of data.
o Deletion Anomaly: The delete anomaly refers to the situation where the deletion
of data results in the unintended loss of some other important data.
o Updatation Anomaly: The update anomaly is when an update of a single data
value requires multiple rows of data to be updated.

Advantages of Normalization

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 1
o Normalization helps to minimize data redundancy.
o Greater overall database organization.
o Data consistency within the database.
o Much more flexible database design.
o Enforces the concept of relational integrity.

Disadvantages of Normalization

o You cannot start building the database before knowing what the user needs.
o The performance degrades when normalizing the relations to higher normal
forms, i.e., 4NF, 5NF.
o It is very time-consuming and difficult to normalize relations of a higher degree.
o Careless decomposition may lead to a bad database design, leading to serious
problems.

Types of Normal Forms:

Normalization works through a series of stages called Normal forms. The normal forms
apply to individual relations. The relation is said to be in particular normal form if it
satisfies constraints.

Normal Description
Form
1NF A relation is in 1NF if it contains an atomic value.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional
dependent on the primary key.
3NF A relation will be in 3NF if it is in 2NF and no transition dependency exists.

First Normal Form (1NF)


o A relation will be 1NF if it contains an atomic value.
o It states that an attribute of a table cannot hold multiple values. It must hold only
single-valued attribute.
o First normal form disallows the multi-valued attribute, composite attribute, and
their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute


EMP_PHONE.

EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP
-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 2
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab

Second Normal Form (2NF)


o In the 2NF, relational must be in 1NF.
o In the second normal form, all non-key attributes are fully functional dependent
on the primary key

Example: Let's assume, a school can store the data of teachers and the subjects they
teach. In a school, a teacher can teach more than one subject.

TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID


which is a proper subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer

Third Normal Form (3NF)


o A relation will be in 3NF if it is in 2NF and not contain any transitive partial
dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the data
integrity.

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 3
o If there is no transitive dependency for non-prime attributes, then the relation
must be in third normal form.

A relation is in third normal form if it holds atleast one of the following conditions for
every non-trivial function dependency X → Y.

1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-
prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP


dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY)
transitively dependent on super key(EMP_ID). It violates the rule of third normal
form.

That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.

EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 4
The benefits of normalization include:
 Searching, sorting, and creating indexes is faster, since tables are narrower, and more
rows fit on a data page.
 You usually have more tables.
You can have more clustered indexes (one per table), so you get more flexibility in
tuning queries.
 Index searching is often faster, since indexes tend to be narrower and shorter.
 More tables allow better use of segments to control physical placement of data.
 You usually have fewer indexes per table, so data modification commands are faster.
 Fewer null values and less redundant data, making your database more compact.
 Triggers execute more quickly if you are not maintaining redundant data.
 Data modification anomalies are reduced.
 Normalization is conceptually cleaner and easier to maintain and change as your
needs change.

Denormalization in Databases
Denormalization is a database optimization technique in which we add redundant data
to one or more tables. This can help us avoid costly joins in a relational database.
Note that denormalization does not mean ‘reversing normalization’ or ‘not to
normalize’. It is an optimization technique that is applied after normalization.
Basically, the process of taking a normalized schema and making it non-normalized is
called denormalization, and designers use it to tune the performance of systems to
support time-critical operations.
In a traditional normalized database, we store data in separate logical tables and
attempt to minimize redundant data. We may strive to have only one copy of each
piece of data in a database.
For example, in a normalized database, we might have a Courses table and a
Teachers table. Each entry in Courses would store the teacherID for a Course but not
the teacherName. When we need to retrieve a list of all Courses with the Teacher’s
name, we would do a join between these two tables.
In some ways, this is great; if a teacher changes his or her name, we only have to
update the name in one place.
The drawback is that if tables are large, we may spend an unnecessarily long time
doing joins on tables.
Denormalization, then, strikes a different compromise. Under denormalization, we
decide that we’re okay with some redundancy and some extra effort to update the
database in order to get the efficiency advantages of fewer joins.

How is Denormalization Different from Normalization?


Denormalization is a method to merge data from multiple relational tables into a single
table in a database that can be retrieved quickly. While Normalization, on the other
hand, is used to delete redundant/duplicate data from a database and replace it with
non-redundant and reliable data.
-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 5
Denormalization is used when joins are costly, and queries are run regularly on the
tables. Normalization, on the other hand, is typically used when a large number of
insert/update/delete operations are performed, and joins between those tables are not
expensive.

Advantages of Denormalization:
Denormalization is used by the database managers to increase the performance of a
database. Some of its advantages are:
1. Enhanced Query Performance Querying data from normalized databases
requires joining multiple tables which slows down the query performance. To
overcome this, redundancy is added to the database thus minimizing the number
of joins.
2. More convenient to manage databases In a normalized database, required
values are calculated while query execution takes a longer time thus slowing
down the execution of the query. Thus to counter this, denormalization is done to
fetch queries that can be simpler because we need to look at fewer tables.
3. Facilitate and accelerate reporting In normalized databases, if we want to
regularly monitor revenues over a certain period, Generating such reports from
live data will require "searching" throughout the entire database thus significantly
slowing it down.
4. Minimizing the need for joins
5. Reducing the number of tables
6. Queries to be retrieved can be simpler.
7. Less likely to have bugs
8. Pre-computing derived values
9. Reducing the number of relations
10. Reducing the number of foreign keys in relation
11. Data modification at the computing time and rather than at the select time
12. Retrieving data is faster due to fewer joins

Disadvantages of Denormalization:
 Although Data Denormalization can avoid some anomalies that can lead to the
mismatch of the result, it may
 Slow down updates, although maybe speeding up retrievals.
 Make it more complex in others, although simplifying implementation.
 Be inconsistent.
 Sacrifice flexibility.
 It takes a lot of storage space due to storing redundant data.
 It is very expensive to maintain updates and inserts in a Denormalized database,
also code for insert and update is harder to write in this case.
 Data Integrity is not maintained in Denormalized databases, as there is
redundancy so data is inconsistent.

It also can
 Increase the size of relations.
 Make the update and insert codes harder to write.
 Involve Data redundancy which necessitates more storage.

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 6
The data can be changed now in many places, so we have to be careful while adjusting
the data to avoid data anomalies. We can use triggers, transactions, or procedures to
avoid such inconsistencies.

Difference between Normalization and Denormalization


The process to alter the structure of a database is basically categorized into two ways,
one is Normalization and the other is Denormalization. The basic difference between
normalization and denormalization is that the database normalization removes the
redundancy of data and anomalies in a poorly designed table, while denormalization
combines multiple table data into one so that it can be queried quickly.
What is Normalization?
Normalization is used to remove redundant data from the database and to store non-
redundant and consistent data into it. It is a process of converting an unnormalized
table into a normalized table. Database normalization is an important process because
a poorly designed database table is inconsistent and may create issues while
performing operations like insertion, deletion, updating, etc.
The process of Normalization involves resolution of database anomalies, elimination of
data redundancy, data dependency, isolation of data, and data consistency.
Normalization in databases provides a formal framework to analyze the relations based
on the key attributes and their functional dependencies. It reduces the requirements of
restructuring of tables.
What is Denormalization?
Denormalization is used to combine multiple table data into one so that it can be
queried quickly. It is a process of storing the join of higher normal form relations in the
form of base relation that is in a lower normal form. The primary goal of denormalization
is to achieve the faster execution of the queries.
In the process of denormalization, the data is integrated into the same database.
Denormalization is mainly used where joins are expensive and queries are executed on
the table very frequently. However, there is a drawback of denormalization, that is, a
small wastage of memory.

Difference between Normalization and Denormalization


The following table highlights the important differences between Normalization and
Denormalization −
Factor Normalization Denormalization
Implementation Normalization is used to remove Denormalization is the process of
redundant data from the database adding some redundant data to a
and to store non-redundant and database that has been normalized,
consistent data into it. so as to improve the read
performance (execution time) of the
database
Focus Normalization mainly focuses on The real goal of denormalization is
clearing the database from unused to achieve the faster execution of
data and to reduce the data the queries by introducing
redundancy and inconsistency. redundancy.
Number of During Normalization, data is During Denormalization, data is
Tables reduced, so there will be a decrease integrated into the same database
-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 7
in the number of tables. and hence there will be an increase
in the number of tables.
Memory Normalization uses optimized Denormalization introduces some
consumption memory and hence faster in sort of wastage of memory.
performance.
Data integrity Normalization maintains data Denormalization does not maintain
integrity, i.e., any addition or deletion any data integrity.
of data from the table will not create
any mismatch in the relationship of
the tables.
Where to use Normalization is generally used Denormalization is used where joins
where a number of are expensive and frequent queries
insert/update/delete operations are are executed on the tables.
performed and joins of those tables
are not expensive.

Conclusion
Normalization is an important step in designing efficient and functional databases. Normalizing
a database involves removing redundancy which means there will be no duplicate entries of the
same data. Denormalization is a process that adds some redundant data into a normalized
database to enhance its functionality and minimize the running time of specific database
queries. You should not confuse a Denormalized database with a database that has not been
normalized.

Introduction of Parallel Database


Nowadays organizations need to handle a huge amount of data with a high transfer
rate. For such requirements, the client-server or centralized system is not efficient.
With the need to improve the efficiency of the system, the concept of the parallel
database comes in picture. A parallel database system seeks to improve the
performance of the system through parallelizing concept.
Need:
Multiple resources like CPUs and Disks are used in parallel. The operations are
performed simultaneously, as opposed to serial processing. A parallel server can
allow access to a single database by users on multiple machines. It also performs
many parallelization operations like data loading, query processing, building indexes,
and evaluating queries.
Advantages:
Here, we will discuss the advantages of parallel databases. Let’s have a look.
1. Performance Improvement –
By connecting multiple resources like CPU and disks in parallel we can significantly

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 8
increase the performance of the system.

2. High availability –
In the parallel database, nodes have less contact with each other, so the failure of
one node doesn’t cause for failure of the entire system. This amounts to significantly
higher database availability.

3. Proper resource utilization –


Due to parallel execution, the CPU will never be idle. Thus, proper utilization of
resources is there.

4. Increase Reliability –
When one site fails, the execution can continue with another available site which is
having a copy of data. Making the system more reliable.

Performance Measurement of Databases:


Here, we will emphasize the performance measurement factor-like Speedup and
Scale-up. Let’s understand it one by one with the help of examples.

Speedup –
The ability to execute the tasks in less time by increasing the number of resources is
called Speedup.

What Is a Parallel Database?


A variety of hardware architectures allow multiple computers to share access to data,
software, or peripheral devices. A parallel database is designed to take advantage of
such architectures by running multiple instances which "share" a single physical
database. In appropriate applications, a parallel server can allow access to a single
database by users on multiple machines, with increased performance.

A parallel server processes transactions in parallel by servicing a stream of transactions


using multiple CPUs on different nodes, where each CPU processes an entire
transaction. Using parallel data manipulation language you can have one transaction
being performed by multiple nodes. This is an efficient approach because many
applications consist of online insert and update transactions which tend to have short
data access requirements. In addition to balancing the workload among CPUs, the
parallel database provides for concurrent access to data and protects data integrity.

Working of Parallel Database


Let us discuss how parallel database works in step by step manner −
Step 1 − Parallel processing divides a large task into many smaller tasks and executes
the smaller tasks concurrently on several CPU’s and completes it more quickly.
Step 2 − The driving force behind parallel database systems is the demand of
applications that have to query extremely large databases of the order of terabytes or
that have to process a large number of transactions per second.
Step 3 − In parallel processing, many operations are performed simultaneously as
opposed to serial processing, in which the computational steps are performed
sequentially.

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 9
Parallel Database Architecture
A parallel Database is a DBMS that runs across multiple processors or CPUs and is
mainly designed to execute query operations in parallel, wherever possible. The
parallel DBMS link a number of smaller machines to achieve the same throughput as
expected from a single large machine.
In Parallel Databases, mainly there are three architectural designs for Parallel DBMS.
They are as follows:
 Shared Memory Architecture
 Shared Disk Architecture
 Shared Nothing Architecture
 Hierarchical System or Non-Uniform Memory Architecture

Let’s discuss them one by one:

Shared Memory Architecture


In Shared Memory Architecture, there are multiple CPUs that are attached to an
interconnection network. They are able to share a single or global main memory and
common disk arrays. It is to be noted that, In this architecture, a single copy of a multi-
threaded operating system and multithreaded DBMS can support these multiple
CPUs. Also, the shared memory is a solid coupled architecture in which multiple CPUs
share their memory. It is also known as Symmetric multiprocessing (SMP). This
architecture has a very wide range which starts from personal workstations that
support a few microprocessors in parallel via RISC.

 Shared Memory Architecture uses multiple processors which is attached to a global


shared memory via intercommunication channel or communication bus.
 Shared memory system have large amount of cache memory at each processors, so
referencing of the shared memory is avoided.
 If a processor performs a write operation to memory location, the data should be
updated or removed from that location.

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 10
Advantages of Shared Memory Architecture

 Data is easily accessible to any processor.


 One processor can send message to other efficiently.
 It has high-speed data access for a limited number of processors.
 The communication is efficient.
Disadvantages of Shared Memory Architecture
 Waiting time of processors is increased due to more number of processors.
 Bandwidth problem.
 It cannot use beyond 80 or 100 CPUs in parallel.
 The bus or the interconnection network gets block due to the increment of the
large number of CPUs.

Shared Disk System


In Shared Disk Architecture, various CPUs are attached to an interconnection
network. In this, each CPU has its own memory and all of them have access to the
same disk. Also, note that here the memory is not shared among CPUs therefore each
node has its own copy of the operating system and DBMS. Shared disk architecture is
a loosely coupled architecture optimized for applications that are inherently
centralized. They are also known as clusters.

 Shared disk system uses multiple processors which are accessible to multiple disks via
intercommunication channel and every processor has local memory.
 Each processor has its own memory so the data sharing is efficient.
 The systems built around this system are called as clusters.

Advantages of Shared Disk System

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 11
 Fault tolerance is achieved using shared disk system.
Fault tolerance: If a processor or its memory fails, the other processor can
complete the task. This is called as fault tolerance.
 The interconnection network is no longer a bottleneck each CPU has its own
memory.
 Load-balancing is easier in shared disk architecture.
 There is better fault tolerance.

Disadvantage of Shared Disk System


 Shared disk system has limited scalability as large amount of data travels
through the interconnection channel.
 If more processors are added the existing processors are slowed down.
 If the number of CPUs increases, the problems of interference and memory
contentions also increase.
 There’s also exists a scalability problem.

Shared Nothing Disk System (Independent Resource)


Shared Nothing Architecture is multiple processor architecture in which each
processor has its own memory and disk storage. In this, multiple CPUs are attached to
an interconnection network through a node. Also, note that no two CPUs can access
the same disk area. In this architecture, no sharing of memory or disk resources is
done. It is also known as Massively Parallel Processing (MPP).

 Each processor in the shared nothing system has its own local memory and local disk.
 Processors can communicate with each other through intercommunication channel.
 Any processor can act as a server to serve the data which is stored on local disk.
Advantages of Shared nothing disk system
 Number of processors and disk can be connected as per the requirement in
share nothing disk system.
 Shared nothing disk system can support for many processors, which makes the
system more scalable.
 It has better scalability as no sharing of resources is done
-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 12
 Multiple CPUs can be added

Disadvantages of Shared nothing disk system

 Data partitioning is required in shared nothing disk system.


 Cost of communication for accessing local disk is much higher.

Hierarchical System or Non-Uniform Memory Architecture


 Hierarchical model system is a hybrid of shared memory system, shared disk
system and shared nothing system.
 Hierarchical model is also known as Non-Uniform Memory Architecture
(NUMA).
 In this system each group of processor has a local memory. But processors from
other groups can access memory which is associated with the other group in
coherent.
 NUMA uses local and remote memory (Memory from other group), hence it will
take longer time to communicate with each other.

Disadvantages of NUMA

 The cost of the architecture is higher compared to other architectures.

Note that this technology is typically used for very large databases that have the size
of 1012 bytes or TB or for the system that has the process of thousands of transactions
per second.

Advantages of Parallel Database System


1. High Computing Speed -
This type of database system has a very high computing speed so it can manage
applications with a large amount of data.

2. Capacity to handle very large very large number of transactions -

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 13
This system can handle a very large number of transactions per second, so these are
used to speed up the processing of transactions on data based systems or servers.

3. High throughput and response time -


In these systems throughput and response time are very high.
Throughput: Number of tasks completed in a given specific time period.
Response time: The time duration a single task actually occupies to complete itself
from the all-over time allotted.

4. Performance Improvement –
By connecting multiple resources like CPU and disks in parallel we can significantly
increase the performance of the system.

5. High availability –
In the parallel database, nodes have less contact with each other, so the failure of one
node doesn’t cause for failure of the entire system. This amounts to significantly
higher database availability.

6. Proper resource utilization –


Due to parallel execution, the CPU will never be idle. Thus, proper utilization of
resources is there.

7. Increase Reliability –
When one site fails, the execution can continue with another available site which is
having a copy of data, making the system more reliable.

Disadvantages of Parallel Database System


1. The start-up cost is very high in this system. Start-up cost actually means the
time a single task (from all tasks allotted) uses to start itself. So the main
disadvantage of Parallel DBMS is cost.
2. Due to shared resources, each task has to wait for the required resource to
become available.
3. Another disadvantage of Parallel DBMS is the huge amount of resources.
4. Also other disadvantage of Parallel DBMS is the difficulty in systems managing.
5. Deadlock conditions may occur in these type of DBMS.

Basic Concepts of Distributed databases


A Distributed database is defined as a logically related collection of data that is shared
which is physically distributed over a computer network on different sites.
Distributed DBMS:
The Distributed DBMS is defined as, the software that allows for the management of
the distributed database and make the distributed data available for the users.
A distributed DBMS consist of a single logical database that is divided into a number
of pieces called the fragments. In DDBMS, Each site is capable of independently
processing the user’s request.
Users can access the DDBMS via applications classified:

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 14
1. Local Applications –
Those applications that don’t require data from the other sites are classified under
the category of Local applications.
2. Global Applications –
Those applications that require data from the other sites are classified under the
category of Global applications.
Characteristics of Distributed DDBMS:
A DDBMS has the following characteristics-
1. A collection of logically related shared data.
2. The data is split into a number of fragments.
3. Fragments may be duplicate.
4. Fragments are allocated to sites.
5. The data at each site is under the control of DBMS and managed by DBMS.

Distributed Processing:
The Distributed processing is centralized database that can be accessed over a
computer network by different sites. The data is centralized even though other users
may be accessing the data from the other sites, we do not consider this to be DDBMS,
simply distributed processing.

Advantages of Distributed Database


1. Better Reliability: Distributed databases offers better reliability than centralized
databases. When database failure occurs in a centralized database, the system
comes to a complete stop. But in the case of distributed databases, the system
functions even when a failure occurs, only performance-related issues occur
which are negotiable.
2. Modular Development: It implies that the system can be expanded by adding
new computers and local data to the new site and connecting them to the
distributed system without interruption.
3. Lower Communication Cost: Locally storing data reduces communication costs
for data manipulation in distributed databases. In centralized databases, local
storage is not possible.
4. Better Response Time: As the data is distributed efficiently in distributed
databases, this provides a better response time when user queries are met
locally. While in the case of centralized databases, all of the queries have to pass
through the central machine which increases response time.

Disadvantages of Distributed Database


1. Costly Software: Maintaining a distributed database is costly because we need
to ensure data transparency, coordination across multiple sites which requires
costly software.
2. Large Overhead: Many operations on multiple sites require complex and
numerous calculations, causing a lot of processing overhead.
3. Improper Data Distribution: If data is not properly distributed across different
sites, then responsiveness to user requests is affected. This in turn increases the
response time.

Architectures of Distributed DBMS


-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 15
The basic types of distributed DBMS are as follows:

1. Client-server architecture of Distributed system.


 A client server architecture has a number of clients and a few servers connected in a
network.
 A client sends a query to one of the servers. The earliest available server solves it and
replies.
 A Client-server architecture is simple to implement and execute due to centralized
server system.

2. Collaborating server architecture.


 Collaborating server architecture is designed to run a single query on multiple servers.
 Servers break single query into multiple small queries and the result is sent to the
client.
 Collaborating server architecture has a collection of database servers. Each server is
capable for executing the current transactions across the databases.

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 16
3. Middleware Architecture
 Middleware architectures are designed in such a way that single query is executed on
multiple servers.
 This system needs only one server which is capable of managing queries and
transactions from multiple servers.
 Middleware architecture uses local servers to handle local queries and transactions.
 The softwares are used for execution of queries and transactions across one or more
independent database servers, this type of software is called as middleware.

Comparison of DBMS & DDBMS

DBMS DDBMS

A DBMS is a generalized software for DDBMS is a software that manages a


managing and manipulating databases. Distributed database make distribution
transparent to the user.

DBMS have the capability to process the DBMS have the capability to handle remote
query at the single site. sites and transmit query and data among
various users.

Data replication is minimum Data replication is necessary to improve


availability, reliability and performance.

DBMS may be categorized into DDBMS and


Parallel DBMS according to location of DDBMS may be categorized into
storage and it may be categorized into homogeneous and heterogeneous databases
RDBMS, Hierarchical DBMS or Network according to the type of DBMS used in
DBMS according to its structure. different sites.

There is only directory and has no concept of DDBMS maintains two types of directories-
local or global directory. Global for the entire Distributed databases and
Local for each site.

Data may not be distributed in case of Data is distributed at different sites that may
DBMS. be geographical apart.

Difference between Parallel and Distributed databases

Parallel Database Distributed Database

In parallel databases, processes are tightly In distributed databases, the sites are loosely
coupled and constitutes a single database coupled and share no physical components
system i.e., the parallel database is a i.e., distributed database is our geographically
centralized database and data reside in a departed, and data are distributed at several
single location locations.

In parallel databases, query processing and In distributed databases, query processing and
transaction is complicated. transaction is more complicated.

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 17
In parallel databases, it’s not applicable. In distributed databases, a local and global
transaction can be transformed into distributed
database systems

In parallel databases, the data is partitioned In distributed databases, each site preserve a
among various disks so that it can be local database system for faster processing
retrieved faster. due to the slow interconnection between sites

In parallel databases, there are 3 types of


architecture: shared memory, shared disk, Distributed databases are generally a kind of
and shared shared-nothing. shared-nothing architecture

In distributed databases, query Optimization


In parallel databases, query optimization is techniques may be different at different sites
more complicated. and are easy to maintain

In parallel databases, data is generally not In distributed databases, data is replicated at


copied. any number of sites to improve the
performance of systems

Parallel databases are generally Distributed databases may be homogeneous or


homogeneous in nature heterogeneous in nature.

Skew is the major issue with the increasing Blocking due to site failure and transparency
degree of parallelism in parallel databases. are the major issues in distributed databases.

-------------------------------------------------------------------------------------------------------------------------------
Database Management System (Unit-III) Prof. Ratna Biswas Page- 18

Common questions

Powered by AI

Denormalization might be more beneficial in scenarios where query retrieval speed and performance are prioritized over storage efficiency and data consistency . For applications with complex queries that involve frequent and expensive join operations, such as real-time data analysis or reporting systems, denormalization can significantly reduce the time required for data retrieval by reducing the need for joins . Although it compromises data integrity and increases storage, the trade-off can be acceptable in environments where read performance is more critical than update speed or storage cost .

Parallel and distributed databases have distinct architectural and processing differences. Parallel databases consist of tightly coupled processes with centralized data storage, focusing on performance through data partitioning and parallel task execution . Their architecture types include shared memory, shared disk, and shared-nothing . In contrast, distributed databases are loosely coupled, with data distributed across multiple locations, often geographically separated, enhancing system reliability and availability through data replication . This architecture is typically shared-nothing, emphasizing managing local databases independently . Query optimization and transaction handling are more complex in distributed databases due to network latency and data distribution .

Distributed databases enhance system reliability and availability by maintaining data across multiple sites. If one site fails, others can continue to operate, ensuring ongoing service and access . Data replication across several locations ensures that the system can withstand site-specific failures, thereby increasing fault tolerance and operational up-time . This distribution also balances the load across various nodes, further improving system resilience and performance efficiency in handling local transactions independently .

Denormalization is chosen over normalization primarily when query performance is a critical concern. It reduces the number of joins required, thereby enhancing query performance . If database queries are frequent and join operations are costly, denormalization helps by pre-computing derived values, reducing the number of relations, and minimizing the complexity of queries, which simplifies management and accelerates report generation . However, it involves accepting trade-offs like increased storage due to data redundancy and potentially slower updates .

Client-server architecture is simpler and relies on a centralized server to handle queries from clients, which can create bottlenecks but simplifies network management . In contrast, collaborating server architecture involves multiple servers that work together to execute parts of a single query in parallel, which reduces bottlenecks and allows for better load distribution . This parallel execution can significantly enhance query handling by dividing tasks among multiple servers, improving overall system throughput and efficiency, although it requires more complex coordination mechanisms .

Parallel database architecture improves performance by distributing tasks across multiple CPUs and disks, allowing simultaneous data processing, which significantly boosts throughput and response time . It also provides high availability since the failure of one node doesn’t affect the entire system . However, it has a high start-up cost and requires substantial resources, which can lead to shared resource contention and potential deadlock issues . Managing these systems is complex as it requires careful allocation and synchronization of tasks among the nodes .

Normalization supports database integrity and efficiency by systematically organizing data to eliminate redundancy and ensure consistency . It isolates data, reduces unnecessary duplication, and establishes functional dependencies between data, which enhances reliability during operations like insertion, updating, and deletion . By minimizing data redundancy, normalization reduces storage needs and ensures that changes in data are consistently reflected across the database, thus maintaining its integrity .

Middleware architecture in distributed database management systems (DDBMS) acts as a mediator that coordinates the execution of queries and transactions across multiple independent database servers. It enhances functionality by allowing centralized management of distributed systems, supporting transparency in query execution across different sites, and ensuring efficient routing and balancing of server workloads . This helps in managing data effectively while providing users with a unified system view, regardless of data's physical distribution .

Data redundancy introduced by denormalization can negatively impact the integrity and consistency of a database. Redundancy means the same data can exist in multiple places, which increases the risk of data being inconsistent if not properly managed. When data is duplicated, any update operation needs to ensure that all copies of the data are updated, which is complex and error-prone . This lack of data integrity is a key disadvantage of denormalization as it can lead to anomalies and mismatches in the data .

An organization might opt for a parallel database system to leverage its higher computing speed and ability to manage large-scale transactions efficiently . Parallel databases improve performance by utilizing multiple CPUs and disks simultaneously, reducing bottlenecks associated with centralized systems . They also enhance reliability by allowing continued operatioin even if one node fails, thus offering higher availability and better resource utilization across a distributed architecture . These capabilities make parallel databases suitable for data-intensive applications requiring high throughput and rapid processing .

You might also like