UNIT - V:
Relational Database Design: Features of Good Relational Designs - Atomic Domains and First
Normal Form - Decomposition Using Functional Dependencies – Functional Dependency
Theory - Decomposition Using Multi valued Dependencies- More Normal Forms - Database-
Design Process.
RELATIONAL DATABASE DESIGN
Relational database design is the process of structuring a relational database in a way that
optimally organizes data for storage, retrieval, and management. It involves defining the data
entities (tables), their attributes (columns), and the relationships between these entities (keys and
constraints).
Relational database design is the process of organizing data into structured tables,
ensuring efficient storage, retrieval, and management. It involves defining entities (tables),
attributes (columns), and the relationships between them using primary and foreign keys.
The design aims to minimize redundancy and enhance data integrity through
normalization, which breaks down complex data into simpler, related tables. Key considerations
include defining constraints to maintain data validity and creating an effective schema that
outlines the database structure.
The entity-relationship (ER) model often guides the design process by visualizing
relationships among data entities. Overall, relational database design seeks to create a scalable,
maintainable, and efficient database that meets the needs of its users while preserving data
consistency.
Components of relational database design include:
1. Entities and Attributes: Identifying the main objects of interest (entities) and their
characteristics (attributes).
2. Tables: Structuring data into tables where each table represents an entity.
3. Primary Keys: Defining unique identifiers for each record in a table to ensure data integrity.
4. Foreign Keys: Establishing relationships between tables by linking a primary key in one table
to a foreign key in another.
5. Normalization: Organizing data to reduce redundancy and dependency by dividing large tables
into smaller, related tables.
[Link]: Implementing rules to ensure data validity, such as uniqueness, referential
integrity, and data types.
GOOD RELATIONAL DATABASE
1. Normalization
Effective normalization reduces data redundancy and dependency, ensuring that data is stored in
an optimal way across multiple tables.
2. Data Integrity
Maintaining data integrity through constraints, such as primary keys, foreign keys, and unique
constraints, ensures the accuracy and consistency of data.
3. Clear Relationships
Well-defined relationships among tables (one-to-one, one-to-many, many-to-many) facilitate
efficient data retrieval and logical organization.
4. Simplicity
A good design is as simple as possible while meeting the requirements. This includes avoiding
unnecessary complexity in tables and relationships.
5. Flexibility and Scalability
The design should accommodate future growth and changes in requirements without requiring
significant restructuring.
6 User-Friendly Structure
The database schema should be intuitive, making it easier for developers and users to understand
and work with the data.
7. Documentation
Comprehensive documentation of the database structure, relationships, and constraints aids in
maintenance and onboarding new team members.
8. Consistency
Data should be consistently defined across the database, with standardized naming conventions
and formats for attributes.
9. Security Features
Incorporating security measures, such as access controls and encryption, helps protect sensitive
data from unauthorized access.
DESIGNING GOOD RELATIONAL DATABASES:
Databases have a reputation for being difficult to construct and hard to maintain. The
power of modern database software makes it possible to create a database with a few
mouseclicks
Database design has nothing to do with using computers. It has everything to do with
research and planning. The design process should be completely independent of software
choices.
The basic elements of the design process are:
1. Defining the problem or objective
2. Researching the current database
3. Designing the data structures
4. Constructing database relationships
5. Implementing rules and constraints
6. Creating database views and reports
7. Implementing the design
Defining the problem or objective. The most important step in database design is the first one:
defining the problem the database will address or the objective of the database. It is important
however, to draw a distinction between:
How the database will be used and
What information needs to be stored in it.
Researching the current database. In most database design situations, there is some sort of
database already in existence. That database may be Post-it notes, paper order forms, a
spreadsheet of sales data, a word processor file of names and addresses, or a full-fledged digital
database (possibly in an outdated software package or older legacy system).
Designing the data structures. A database is essentially a collection of data tables, so the next
step in the design process is to identify and describe those data structures. Each table in a
database should represent some distinct subject or physical object, so it seems reasonable to
simply analyse the subjects or physical objects relevant to the purpose of the database, then
arrive at a list of tables.
Once the tables have been determined and fields have been assigned to each, the next step is to
develop the specifications for each field. The perfect field should be atomic: It should be unique
in all tables in the database (unless it is used as a key) and contain a single value, and it should
not be possible to break it into smaller components.
Constructing database relationships. Once the data structures are in place, the next step is to
establish the relationships between the databases. First you must ensure that each table has a
unique key that can identify the individual records in each table.
Implementing rules and constraints. In this step, the fields in the database are still fairly
amorphous. Defining the fields as text or numeric and getting a rough feel for the types of data
that the client needs to store has narrowed them down, but there is room for further refinement.
Creating database views and report. Now that the data design is essentially complete, the
penultimate step is to create the specifications that help turn the data into useful information in
the form of a report or view of the data. Views are simply collections of the data available in the
database combined and made accessible in one place.
Implementing the design in software. All of the work to this point has been accomplished
without explicitly worrying about the details of the program being used to produce the database.
In fact, the design should only exist as diagrams and notes on paper
ATOMIC DOMAIN
An atomic domain refers to a set of values that are indivisible and cannot be broken down
into smaller components. In relational databases, each attribute in a table should ideally hold
atomic values, meaning each value represents a single piece of information.
For example, in a table for "Customers," an attribute for "Phone Number" should contain
only one number, not multiple numbers combined. Similarly, a "Date of Birth" field should store
a single date, not a range or multiple dates.
This ensures data integrity and simplifies querying. If a field contains non-atomic values,
like "123 Main St, Apt 4," it should be split into separate fields for "Street" and "Apartment." By
keeping data atomic, we make it easier to manipulate and retrieve accurate information from the
database.
NORMALIZATION
Normalization is the process of organizing the data in the database. Normalization is used to
minimize the redundancy from a relation or set of relations. It is also used to eliminate
undesirable characteristics like Insertion, Update, and Deletion Anomalies.
1. Normalization is the process of organizing the data in the database.
2. Normalization is used to minimize the redundancy from a relation or set of relations.
3. It is also used to eliminate the undesirable characteristics like insertion, update and deletion
anomalies.
4. Normalization divides the larger table into the smaller table and links them using relationship.
5. The normal form is used to reduce redundancy from the database table.
6. A large database defined as a single relation may result in data duplication. This repetition of
data may result in:
7. Making relations very large.
8. It isn't easy to maintain and update data as it would involve searching many records in
relation.
9. Wastage and poor utilization of disk space and resources.
ANOMALIES
Anomalies in databases refer to issues that arise when data is not organized correctly,
leading to problems during data insertion, deletion, or updating. These anomalies can
compromise data integrity and consistency. The three main types of anomalies are
Insertion Anomaly: Insertion Anomaly refers to when one cannot insert a new tuple into a
relationship due to lack of data.
Deletion Anomaly: The delete anomaly refers to the situation where the deletion of data results
in the unintended loss of some other important data.
Updatation Anomaly: The update anomaly is when an update of a single data value requires
multiple rows of data to be updated.
TYPES OF NORMAL FORMS:
Normalization works through a series of stages called Normal forms. The normal forms apply to
individual relations. The relation is said to be in particular normal form if it satisfies constraints.
Following are the various types of Normal forms:
NORMAL FORM DESCRIPTION
1NF A relation is in 1NF if it contains an atomic value.
2NF A relation will be in 2NF if it is in 1NF and all
non-key attributes are fully functional dependent
on the primary key.
3NF A relation will be in 3NF if it is in 2NF and no
transition dependency exists.
BCNF A stronger definition of 3NF is known as Boyce
Codd's normal form.
4NF A relation will be in 4NF if it is in Boyce Codd's
normal form and has no multi-valued dependency.
5NF A relation is in 5NF. If it is in 4NF and does not
contain any join dependency, joining should be
lossless.
FIRST NORMAL FORM (1NF)
A relation will be 1NF if it contains an atomic value.
It states that an attribute of a table cannot hold multiple values. It must hold only single-valued
attribute.
First normal form disallows the multi-valued attribute, composite attribute, and their
combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385, UP
9064738238
20 Harry 8574783832 Bihar
12 Sam 7390372389, Punjab
8589830302
The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID EMP_NAME EMP_PHONE EMP_STATE
14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
SECOND NORMAL FORM (2NF)
In the 2NF, relational must be in 1NF.
In the second normal form, all non-key attributes are fully functional dependent on the primary
key
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a
school, a teacher can teach more than one subject. TEACHER table
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which
is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
THIRD NORMAL FORM (3NF)
o A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
o 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
o If there is no transitive dependency for non-prime attributes, then the relation must be in
third normal form.
A relation is in third normal form if it holds atleast one of the following conditions for every
non-trivial function dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Super key in the table above:
1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on
EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent
on super key(EMP_ID). It violates the rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new
<EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
BOYCE CODD NORMAL FORM (BCNF)
o BCNF is the advance version of 3NF. It is stricter than 3NF.
o A table is in BCNF if every functional dependency X → Y, X is the super key of the
table.
o For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO
264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
In the above table Functional dependencies are as follows:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT DEPT_TYPE EMP_DEPT_NO
Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
FOURTH NORMAL FORM (4NF)
o A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued
dependency.
o For a dependency A → B, if for a single value of A, multiple values of B exists, then the
relation will be a multi-valued dependency.
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent
entity. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two
courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-
valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
FIFTH NORMAL FORM (5NF)
o A relation is in 5NF if it is in 4NF and not contains any join dependency and joining
should be lossless.
o 5NF is satisfied when all the tables are broken into as many tables as possible in order to
avoid redundancy.
o 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT LECTURER SEMESTER
Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take
Math class for Semester 2. In this case, combination of all these fields required to identify a valid
data.
Suppose we add a new Semester as Semester 3 but do not know about the subject and who will
be taking that subject so we leave Lecturer and Subject as NULL. But all three columns together
acts as a primary key, so we can't leave other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMSTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
ADVANTAGES OF NORMALIZATION
Normalization helps to minimize data redundancy.
Greater overall database organization.
Data consistency within the database.
Much more flexible database design.
Enforces the concept of relational integrity.
DISADVANTAGES OF NORMALIZATION
You cannot start building the database before knowing what the user needs.
The performance degrades when normalizing the relations to higher normal forms, i.e.,
4NF, 5NF.
It is very time-consuming and difficult to normalize relations of a higher degree.
Careless decomposition may lead to a bad database design, leading to serious problems.
FUNCTIONAL DEPENDENCY IN DBMS
Functional dependency is a relationship between two sets of attributes in a relational
database. It indicates that the value of one attribute (or a set of attributes) determines the value of
another attribute. If attribute A functionally determines attribute B, then for each value of A,
there is precisely one value of B associated with it.
This is denoted as A → B.
Examples of Functional Dependency
1. In a table of employees, if each employee ID uniquely determines an employee's name, we
express this as:
EmployeeID → EmployeeName
Here, knowing the EmployeeID allows you to know the corresponding EmployeeName.
2. In a student table, if a student’s ID determines their major, it can be expressed as:
StudentID → Major
TYPES OF FUNCTIONAL DEPENDENCY
1. Trivial Functional Dependency:
A functional dependency is trivial if the dependent attribute is a subset of the
determinant.
Example: If A → B, and B is part of A, then the dependency is trivial. For example:
{StudentID, StudentName} → StudentName
2. Non-Trivial Functional Dependency:
A functional dependency is non-trivial if the dependent attribute is not a subset of the
determinant.
Example: If StudentID → Major, this is non-trivial because knowing the StudentID provides
information about the Major, which is not part of the StudentID.
3. Transitive Functional Dependency:
If A → B and B → C, then A → C is a transitive dependency.
Example: If StudentID → Major and Major → Department, then StudentID → Department.
4. Partial Functional Dependency:
A functional dependency is partial if it holds true for a subset of a composite primary
key.
Example: In a table with a composite key (CourseID, StudentID), if CourseID → CourseName,
this is a partial dependency.
5. Full Functional Dependency:
A functional dependency is full if it depends on the whole composite primary key and not
on any subset.
Example: In a table with (CourseID, StudentID) as the composite key, if (CourseID,
StudentID) → Grade, it is a full functional dependency.
NON-FUNCTIONAL DEPENDENCY
Non-functional dependency refers to a relationship where the value of one attribute does
not determine the value of another attribute. In other words, there is no functional relationship
between them.
Example: If we have an attribute "Favorite Color" in a student table, it does not depend on the
"StudentID." Knowing the StudentID does not provide any information about the student's
favorite color. This can be denoted as:
StudentID ⊭ Favorite Color
DECOMPOSITION
Decomposition in a Database Management System (DBMS) refers to the process of
dividing a table into multiple smaller tables to improve data consistency and eliminate
redundancy. It is crucial for ensuring that no data is lost while maintaining the integrity of the
original information.
Types of Decomposition
1. Lossless Decomposition: In lossless decomposition, the original relation can be perfectly
reconstructed using a join operation on the decomposed tables. No information is lost.
Example:
Original Relation R: {StudentID, Name, Course}
Data:
StudentID | Name | Course
-----------|--------|-------
1 | Alice | Math
2 | Bob | Science
3 | Charlie| Math
Decomposed Tables:
R1: {StudentID, Name}
Data:
StudentID | Name
-----------|-------
1 | Alice
2 | Bob
3 | Charlie
R2: {StudentID, Course}
Data:
StudentID | Course
-----------|-------
1 | Math
2 | Science
3 | Math
Join Result: Joining R1 and R2 yields the original relation R.
2. Lossy Decomposition: In lossy decomposition, the join operation on the decomposed tables
does not yield the original relation, often resulting in extra, incorrect tuples.
Example:
Original Relation R: {OrderID, Product, Quantity}
Data:
OrderID | Product | Quantity
---------|---------|---------
1 | Apple | 10
2 | Banana | 20
3 | Apple | 5
Decomposed Tables:
R1: {OrderID, Product}
Data:
OrderID | Product
---------|--------
1 | Apple
2 | Banana
3 | Apple
R2: {OrderID, Quantity}
Data:
OrderID | Quantity
---------|---------
1 | 10
2 | 20
3 |5
Join Result: The join of R1 and R2 could produce extra or incorrect combinations:
OrderID | Product | Quantity
---------|---------|---------
1 | Apple | 10
1 | Apple | 5
2 | Banana | 20
3 | Apple | 10
3 | Apple | 5
DECOMPOSITION USING FUNCTIONAL DEPENDENCIES
Decomposition in the context of databases refers to the process of breaking down a
relation (table) into smaller, more manageable sub-relations based on functional dependencies.
The goal of decomposition is to eliminate redundancy and anomalies while preserving the
original data and ensuring that the relationships between the data are maintained.
Examples of Decomposition
[Link] 1: Basic Decomposition
Initial Relation: Consider a relation R with attributes {StudentID, StudentName, Major,
Advisor}.
Functional Dependencies:
StudentID → StudentName
StudentID → Major
Major → Advisor
Decomposition:
We can decompose R into two relations:
R1: {StudentID, StudentName, Major}
R2: {Major, Advisor}
This decomposition removes redundancy and organizes the data based on the functional
dependencies.
BENEFITS OF DECOMPOSITION
Reduces Redundancy: By breaking tables into smaller ones based on functional dependencies,
we reduce duplicated data.
Eliminates Anomalies: Helps to avoid insertion, deletion, and update anomalies.
Enhances Data Integrity: Maintains accurate relationships among the data.
DECOMPOSITION USING MULTI-VALUED DEPENDENCIES
Decomposition using multi-valued dependencies (MVDs) involves breaking down a
relation into smaller relations based on dependencies where one attribute can have multiple
independent values associated with a single value of another attribute. This process helps to
eliminate redundancy and avoid anomalies in database design.
A multi-valued dependency occurs when, for a given value of attribute A, there are
multiple values of attribute B that are independent of any other attributes. This is denoted as \( A
\two head right arrow B \), meaning A determines B, and B is independent of other attributes in
the relation.
Example of Decomposition Using MVDs
1. Initial Relation:
Consider a relation R with the following attributes: {StudentID, Course, Hobby}.
Data:
StudentID | Course | Hobby
-----------|------------|-----------
1 | Math | Painting
1 | Math | Reading
1 | Science | Painting
2 | Science | Sports
2 | Math | Reading
2. Multi-Valued Dependency:
In this relation, we can observe that:
StudentID determines both Course and Hobby independently.
Thus, we have the following multi-valued dependencies:
\( StudentID \twoheadrightarrow Course \)
\( StudentID \twoheadrightarrow Hobby \)
3. Decomposition:
To eliminate redundancy, we can decompose relation R into two separate relations based on the
multi-valued dependencies:
R1: {StudentID, Course}
Data:
StudentID | Course
-----------|------------
1 | Math
1 | Science
2 | Science
2 | Math
R2: {StudentID, Hobby}
Data:
StudentID | Hobby
-----------|------------
1 | Painting
1 | Reading
2 | Sports
2 | Reading
4. Result of Decomposition:
- After decomposition, the relations R1 and R2 can be managed independently, reducing
redundancy:
- Now, each relation focuses on a single aspect (courses and hobbies) related to students
without duplication.
MORE NORMAL FORM
SIXTH NORMAL FORM (6NF)
Sixth Normal Form (6NF) is a level of database normalization that is primarily concerned
with handling temporal data and ensuring that all values in a relation are atomic and that the
relation is free of any non-trivial multi-valued dependencies. A relation is in 6NF if it is already
in 5NF and if it contains only atomic values, meaning that each attribute contains indivisible
values.
Example of 6NF
Consider a relation that tracks employees and their skills over time:
EmployeeID Skill StartDate EndDate
1 Java 2020-01-01 2022-12-31
1 Python 2021-01-01 2023-06-30
2 SQL 2019-01-01 2021-12-31
In this table:
The attributes are not atomic. For instance, if an employee has multiple skills that change
over time, the same record might need to be split further to reflect those changes
appropriately.
The relationship between skills and time periods needs to be expressed without losing
historical context.
DATABASE DESIGN PROCESS ( REFER UNIT 4 )