0% found this document useful (0 votes)
4 views44 pages

Dbms Unit-2

The Relational Model, proposed by E.F. Codd, organizes data into two-dimensional tables called relations, where columns represent attributes and rows represent records. It is widely used due to its simplicity, flexibility, and ability to maintain data integrity, although it may struggle with large databases and complex relationships. The model includes various types of keys, such as primary, candidate, and foreign keys, to uniquely identify records and establish relationships between tables.

Uploaded by

manju.papasani
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views44 pages

Dbms Unit-2

The Relational Model, proposed by E.F. Codd, organizes data into two-dimensional tables called relations, where columns represent attributes and rows represent records. It is widely used due to its simplicity, flexibility, and ability to maintain data integrity, although it may struggle with large databases and complex relationships. The model includes various types of keys, such as primary, candidate, and foreign keys, to uniquely identify records and establish relationships between tables.

Uploaded by

manju.papasani
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

What is the Relational Model?

E.F. Codd proposed the relational Model to model data in the form of relations or
tables.
After designing the conceptual model of the Database using ER diagram, we need to
convert the conceptual model into a relational model which can be implemented using
any RDBMS language like Oracle SQL, MySQL, etc.
The relational model for database management is an approach to logically represent
and manage the data stored in a database.
In this model, the data is organized into a collection of twodimensional interrelated
tables, also known as relations.
Each relation is a collection of columns and rows, where the column represents the
attributes of an entity and the rows (or tuples) represents the records.
The use of tables to store the data provided a straightforward, efficient, and flexible
way to store and access structured information. Because of this simplicity, this data
model provides easy data sorting and data access. Hence, it is used widely around the
world for data storage and processing.
Let's look at a scenario to understand the relational model:
Consider a case where you wish to store the name, the CGPA attained, and the roll
number of all the students of a particular class. This structured data can be easily
stored in a table as described below:

As we can notice from the above relation:




Any given row of the relation indicates a student i.e., the row of the table describes a
real
world entity.
The columns of the table indicate the attributes related to the entity. In this case, the
roll
number, CGPA, and the name of the student.
NOTE: A database implemented and organized in terms of the relational model is
known as a
relational database management system (RDBMS). Hence, the relational model
describes
how data is stored in relational databases.
Advantages of the Relational Model
Simple model: Relational Model is simple and easy to use in comparison to
other languages.
 Flexible: Relational Model is more flexible than any other relational model present.
 Secure: Relational Model is more secure than any other relational model.
 Data Accuracy: Data is more accurate in the relational data model.
 Data Integrity: The integrity of the data is maintained in the relational model.
 Operations can be Applied Easily: It is better to perform operations in the relational
model.

Disadvantages of the Relational Model
 Relational Database Model is not very good for large databases.
 Sometimes, it becomes difficult to find the relation between tables.
 Because of the complex structure, the response time for queries is high.
Characteristics of the Relational Model
 Data is represented in rows and columns called relations.
 Data is stored in tables having relationships between them called the Relational
model.
 The relational model supports the operations like Data definition, Data
manipulation,
and Transaction management.
 Each column has a distinct name and they are representing attributes.
 Each row represents a single entity.
Types of Keys
Keys play an important role in the relational database.
o It is used to uniquely identify any record or row of data from the table. It is also
used to establish and identify relationships between tables.
o
For example, ID is used as a key in the Student table because it is unique for each
student. In the PERSON table, passport_number, license_number, SSN are keys since
they are unique for each person.
Types of keys:

1. Primary key
It is the first key used to identify one and only one instance of an entity uniquely.
An entity can contain multiple keys, as we saw in the PERSON table. The key
which is most suitable from those lists becomes a primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. In the EMPLOYEE table, we can even select License_Number and
Passport_Number as primary keys since they are also unique.
o For each entity, the primary key selection is based on requirements and
developers.
o
2. Candidate key
A candidate key is an attribute or set of attributes that can uniquely identify a
tuple( here values may or may not be null ).
o Except for the primary key, the remaining attributes are considered a candidate
key. The candidate keys are as strong as the primary key.
o
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest
of
the attributes, like SSN, Passport_Number, License_Number, etc., are considered a
candidate key.

3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a
superset of a candidate key.
For example: In the above EMPLOYEE table, for(EMPLOEE_ID,
EMPLOYEE_NAME),
the name of two employees can be the same, but their EMPLYEE_ID can't be the
same.
Hence, this combination can also be a key.
4. Foreign key
Foreign keys are the column of the table used to point to the primary key of
another table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we can't store the department's
information in the employee table. That's why we link these two tables through the
primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new
attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.

5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely
identify
each tuple in a relation. These attributes or combinations of the attributes are called
the candidate keys. One key is chosen as the primary key from these candidate keys,
and
the remaining candidate key, if it exists, is termed the alternate key. In other words,
the
total number of the alternate keys is the total number of candidate keys minus the
primary key. The alternate key may or may not exist. If there is only one candidate
key in
a relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that
act
as candidate keys. In this relation, Employee_Id is chosen as the primary key, so the
other candidate key, PAN_No, acts as the Alternate key.

6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a
composite
key. This key is also known as Concatenated Key.

For example, in employee relations, we assume that an employee may be assigned


multiple roles, and an employee may work on multiple projects simultaneously. So the
primary key will be composed of all three attributes, namely Emp_ID, Emp_role, and
Proj_ID in combination. So these attributes act as a composite key since the primary
key comprises more than one attribute.
Structure of Relational Databases

A relational database consists of a collection of tables, each of which is assigned a


unique name.

A row in a table represents a relationship


among a set of values. Since a table is a
collection of such relationships, there is a
close correspondence between the concept of
table and the mathematical concept of
relation, from which the relational data
model takes its name. In what follows, we
introduce the concept of relation.

Basic Structure

Consider the account table of Figure 3.1. It has three column headers: account
number, branchname, and balance. Following the terminology of the relational model,
these headers are attributes. For each attribute, there is a set of permitted values,
called the domain of that attribute. For the attribute branchname, for example, the
domain is the set of all branch names.

LetD1 denote the set of all account numbers, D2 the set of all branch names, and
D3the set of all balances. Any row of account must consist of a 3tuple (v1, v2, v3),
where v1 is an account number (that is, v1 is in domain D1),v2 is a branch name (that
is, v2 is in domain D2), and v3 is a balance (that is, v3 is in domain D3). In general,
account will contain only a subset of the set of all possible rows. Therefore, account is
a subset of

In general, a table of n attributes


must be a subset of

Mathematicians define a relation to


be a subset of a Cartesian product of a list of domains. This definition corresponds
almost exactly with our definition of table. The only difference is that we have
assigned names to attributes, whereas mathematicians rely on numeric “names,”
using the integer 1 to denote the attribute whose domain appears first in the list of
domains, 2 for the attribute whose domain appears second, and so on. Because tables
are essentially relations, we shall use the mathematical terms relation and tuple in
place of the terms table and row. A tuple variable is a variable that stands for a tuple;
in other words, a tuple variable is a variable whose domain is the set of all tuples.

In the account relation of Figure 3.1, there are seven tuples. Let the tuple variable t
refer to the first tuple of the relation. We use the notation t[accountnumber] to denote the
value of t on the accountnumber attribute. Thus, t[accountnumber] = “A101,” and
t[branchname] = “Downtown”. Alternatively, we may write t[1] to denote the value of
tuple t on the first attribute (accountnumber), t[2] to denote branchname, and so on.
Since a relation is a set of tuples, we use the mathematical notation of t ∈r to denote
that tuple t is in relation r.

The order in which tuples appear in a relation is irrelevant, since a relation is a set of
tuples. Thus, whether the tuples of a relation are listed in sorted order, as in Figure
3.1, or are unsorted, as in Figure 3.2, does not matter; the relations in the two figures
above are the same, since both contain the same set of tuples.

For all relations r, the domains of all attributes of r be atomic. A domain is atomic if
elements of the domain are considered to be indivisible units. For example, the set of
integers is an atomic domain, but the set of all sets of integers is a non atomic domain.

Database Schema

The concept of a relation corresponds to the programminglanguage notion of a


variable. The concept of a relation schema Corresponds to the programminglanguage
notion of type definition.

It is convenient to give a name to a relation schema, just as we give names to type


definitions in programming languages. We adopt the convention of using lowercase
names for relations, and names beginning with an uppercase letter for relation
schemas. Following this notation, we use Accountschema to denote the relation
schema for relation account. Thus,

The fact that account is a relation on Account


schema by
The concept of a relation instance corresponds to the programming language notion of
a value of a variable. The value of a given variable may change with time; similarly,
the contents of a relation instance may change with time as the relation is updated.
However, we often simply say “relation” when we actually mean “relation instance.”

The schema for that relation is

Note that the attribute branchname appears in both Branchschema and Account
schema. This duplication is not a coincidence. Rather, using common attributes in
relation schemas is one way of relating tuples of distinct relations. For example,
suppose we wish to find the information about all of the accounts maintained in
branches located in Brooklyn. We look first at the branch relation to find the names of
all the branches located in Brooklyn. Then, for each such branch, we would look in the
account relation to find the information about the accounts maintained at that branch.

This is not surprising recall that the primary key attributes of a strong entity set
appear in the table created to represent the entity set, as well as in the tables created
to represent relationships that the entity set participates in.

Let us continue our banking example. We need


a relation to describe information about
customers. The relation schema is

Figure 3.4 shows a sample relation customer


(Customerschema).We assume that the customer name uniquely identifies a customer
obviously this may not be true in the real world, but the assumption makes our
examples much easier to read.

In a realworld database, the customerid (which could be a socialsecurity number, or an


identifier generated by the bank) would serve to uniquely identify customers. We need
a relation to describe the association between customers and accounts.

The relation schema to describe this association is


Figure 3.5 shows a sample relation depositor (Depositor
schema). It would appear that, for our banking example,
we could have just one relation schema, rather than
several. That is, it may be easier for a user to think in
terms of one relation schema, rather than in terms of
several. Suppose that we used only one relation for our example, with schema

Observe that, if a customer has several accounts, we must list her address once for
each account. That is, we must repeat certain information several times. This
repetition is wasteful and is avoided by the use of several relations, as in our example.

Schema Diagram

A database schema, along with primary key and foreign key dependencies, can be
depicted pictorially by schema diagrams. Figure 3.9 shows the schema diagram for our
banking enterprise. Each relation appears as a box, with the attributes listed inside it
and the relation name above it. If there are primary key attributes, a horizontal line
crosses the box, with the primary key attributes listed above the line. Foreign key
dependencies appear as arrows from the foreign key attributes of the referencing
relation to the primary key of the referenced relation.
Do not confuse a schema diagram with an ER diagram. In particular, ER diagrams do
not show foreign key attributes explicitly, whereas schema diagrams show them
explicity.

Many database systems provide design tools with a graphical user interface for
creating schema diagrams.

Relational Query Language in DBMS


SQL has its own querying methods to interact with the database. But how do these
queries work in the database? These queries work similarly to Relational Algebra that
we study in mathematics. In the database, we have tables participating in relational
Algebra.

Relational Database systems are expected to be equipped with a query language that assists users to
query the database. Relational Query Language is used by the user to communicate with the
database user requests for the information from the database. Relational algebra breaks the user
requests and instructs the DBMS to execute the requests. It is the language by which the user
communicates with the database. They are generally on a higher level than any other programming
language. These relational query languages can be Procedural and Non-Procedural.
Types of Relational Query Language
There are two types of relational query language:
• Procedural Query Language
• Non-Procedural Language

Procedural Query Language


In Procedural Language, the user instructs the system to perform a series of operations on the
database to produce the desired results. Users tell what data to be retrieved from the database and
how to retrieve it. Procedural Query Language performs a set of queries instructing the DBMS to
perform various transactions in sequence to meet user requests.

Relational Algebra is a Procedural Query Language


Relational Algebra could be defined as the set of operations on
relations. There are a few operators that are used in relational algebra –
1. Select (sigma): Returns rows of the input relation that satisfy the provided predicate. It is
unary Operator means requires only one operand.
2. Projection (ℼ): Show the list of those attribute which we desire to appear and rest other
attributes are eliminated from the table. It seperates the table vertically.
3. Set Difference (-): It returns the difference between two relations . If we have two relations
R and S them R-S will return all the tuples (row) which are in relation R but not in Relation
S , It is binary operator.
4. Cartesian Product (X): Combines every tuple (row) of one table with every tuple (row) in
other table ,also referred as cross Product . It is a binary operator.
5. Union (U): Outputs the union of tuples from both the relations. Duplicate tuples are
eliminated automatically. It is a binary operator means it require two operands.

Non-Procedural Language
In Non Procedural Language user outlines the desired information without giving a specific
procedure or without telling the steps by step process for attaining the information. It only gives a
single Query on one or more tables to get .The user tells what is to be retrieved from the database
but does not tell how to accomplish it.

For Example: get the name and the contact number of the student with a Particular ID will have a
single query on STUDENT table.
Relational Calculus is a Non Procedural Language .
Relational Calculus exists in two forms:
1. Tuple Relational Calculus (TRC): Tuple Relational Calculus is a non procedural query
language , It is used for selecting the tuples that satisfy the given condition or predicate . The
result of the relation can have one or more tuples (row).
2. Domain Relational Calculus (DRC): Domain Relational Calculus is a Non Procedural
Query Language , the records are filtered based on the domains , DRC uses the list of
attributes to be selected from relational based on the condition.
Basic/Fundamental Operations:

[Link] (σ)
2. Project (∏)
3. Union (�)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)

1. Select Operation (σ) :This is used to fetch rows (tuples) from table(relation)
which satisfies a given condition.
Syntax: σp(r)

 σ is the predicate
 r stands for relation which is the name of the table
 p is prepositional
logic ex: σage > 17 (Student)
This will fetch the tuples(rows) from table Student, for which age will be greater than 17.
σage > 17 and gender = 'Male' (Student)
This will return tuples(rows) from table Student with information of male students, of age
more than 17.

BRANCH_NAME LOAN_NO AMOUNT

Downtown L-17 1000

Redwood L-23 2000

Perryride L-15 1500

Downtown L-14 1500

Mianus L-13 500

Roundhill L-11 900

Perryride L-16 1300


Input:
σ BRANCH_NAME="perryride" (LOAN)

Output:

BRANCH_NAME LOAN_NO AMOUNT

Perryride L-15 1500

Perryride L-16 1300

Project Operation (∏):


 Project operation is used to project only a certain set of attributes of a relation.
In simple words, If you want to see only the names all of the students in
the Student table, then you can use Project Operation.
 It will only project or show the columns or attributes asked for, and will also
remove duplicate data from the columns.
Syntax of Project Operator (∏)
∏ column_name1, column_name2,...., column_nameN(table_name)

Example:
∏Name, Age(Student)
Above statement will show us only the Name and Age columns for all the rows of
data in Student table.

Example: CUSTOMER RELATION

NAME STREET CITY

Jones Main Harrison

Smith North Rye

Hays Main Harrison

Curry North Rye


Johnson Alma Brooklyn

Brooks Senator Brooklyn

Input:
∏ NAME, CITY (CUSTOMER)

Output:

NAME CITY

Jones Harrison

Smith Rye

Hays Harrison

Curry Rye

Johnson Brooklyn

Brooks Brooklyn

Union Operation (�):

 This operation is used to fetch data from two relations(tables) or


temporary relation(result of another operation).
 For this operation to work, the relations(tables) specified should have same number
of attributes(columns) and same attribute domain. Also the duplicate tuples are
autamatically eliminated from the result.
Syntax: A � B
∏Student(RegularClass) � ∏Student(ExtraClass)
Example:

DEPOSITOR RELATION

CUSTOMER_NAME ACCOUNT_NO

Johnson A-101

Smith A-121

Mayes A-321

Turner A-176

Johnson A-273

Jones A-472

Lindsay A-284

BORROW RELATION

CUSTOMER_NAME LOAN_NO

Jones L-17

Smith L-23

Hayes L-15

Jackson L-14

Curry L-93

Smith L-11
Williams L-17

Input:
∏ CUSTOMER_NAME (BORROW) � ∏ CUSTOMER_NAME (DEPOSITOR)

Output:

CUSTOMER_NAME

Johnson

Smith

Hayes

Turner

Jones

Lindsay

Jackson

Curry

Williams

Mayes

Set Difference (-):


This operation is used to find data present in one relation and not present in the second
relation. This operation is also applicable on two relations, just like Union operation.
Syntax: A - B
where A and B are relations.
For example, if we want to find name of students who attend the regular class but not
the extra class, then, we can use the below operation:
∏Student(RegularClass) - ∏Student(ExtraClass)
Input: ∏ CUSTOMER_NAME (BORROW) ∩ ∏ CUSTOMER_NAME (DEPOSITOR)

CUSTOMER_NAME

Smith

Jones

Cartesian Product (X):

This is used to combine data from two different relations(tables) into one and fetch data
from the combined relation.
Syntax: A X B
For example, if we want to find the information for Regular Class and Extra Class which
are conducted during morning, then, we can use the following operation:
σtime = 'morning' (RegularClass X ExtraClass)
For the above query to work, both RegularClass and ExtraClass should have
the attribute time.
Notation: E X D

EMPLOYEE

EMP_ID EMP_NAME EMP_DEPT

1 Smith A

2 Harry C

3 John B
DEPARTMENT

DEPT_NO DEPT_NAME

A Marketing

B Sales

C Legal

Input:
EMPLOYEE X DEPARTMENT

Output:

EMP_ID EMP_NAME EMP_DEPT DEPT_NO DEPT_NAME

1 Smith A A Marketing

1 Smith A B Sales

1 Smith A C Legal

2 Harry C A Marketing

2 Harry C B Sales

2 Harry C C Legal

3 John B A Marketing

3 John B B Sales

3 John B C Legal
Rename Operation (ρ):

This operation is used to rename the output relation for any query operation which
returns result like Select, Project etc. Or to simply rename a relation(table)
Syntax: ρ(RelationNew, RelationOld)

The rename operation is used to rename the output relation. It is denoted by rho (ρ).

Example: We can use the rename operator to rename STUDENT relation to


STUDENT1. ρ(STUDENT1, STUDENT)

Join in DBMS:

 A JOIN clause is used to combine rows from two or more tables, based on a
related column between them.
 Join in DBMS is a binary operation which allows you to combine join product
and selection in one single statement.
 The goal of creating a join condition is that it helps you to combine the data from
two or more DBMS tables.
 The tables in DBMS are associated using the primary key and foreign keys.

Types of SQL JOIN


1. INNER JOIN
2. LEFT JOIN

3. RIGHT JOIN
4. FULL JOIN

Table name: EMPLOYEE

EMP_ID EMP_NAME CITY SALARY AGE


1 Angelina Chicago 200000 30

2 Robert Austin 300000 26

3 Christian Denver 100000 42

4 Kristen Washington 500000 29

5 Russell Los angels 200000 36

6 Marry Canada 600000 48

PROJECT

PROJECT_NO EMP_ID DEPARTMENT

101 1 Testing

102 2 Development

103 3 Designing

104 4 Development

1. INNER JOIN

In SQL, INNER JOIN selects records that have matching values in both tables as long as
the condition is satisfied.

It returns the combination of all rows from both the tables where the condition satisfies.
Syntax
SELECT table1.column1,
table1.column2 FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;

Query
SELECT EMPLOYEE.EMP_NAME,
[Link] FROM EMPLOYEE INNER JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

2. LEFT JOIN

The SQL left join returns all the values from left table and the matching values from the
right table. If there is no matching join value, it will return NULL.
Syntax
SELECT table1.column1, table1.column2 FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;

Query
SELECT EMPLOYEE.EMP_NAME,
[Link] FROM EMPLOYEE LEFT JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing

Kristen Development

Russell NULL

Marry NULL
3. RIGHT JOIN

In SQL, RIGHT JOIN returns all the values from the values from the rows of right table
and the matched values from the left table. If there is no matching in both tables, it will return
NULL.

Syntax
SELECT table1.column1,
table1.column2 FROM table1 RIGHT
JOIN table2
ON table1.matching_column = table2.matching_column;

Query
SELECT EMPLOYEE.EMP_NAME,
[Link] FROM EMPLOYEE RIGHT JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development

Christian Designing
Kristen Development

4. FULL JOIN

In SQL, FULL JOIN is the result of a combination of both left and right outer join. Join
tables have all the records from both tables. It puts NULL on the place of matches not
found.

Syntax
SELECT table1.column1,
table1.column2 FROM table1 FULL
JOIN table2
ON table1.matching_column = table2.matching_column;

Query
SELECT EMPLOYEE.EMP_NAME,
[Link] FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;

Output

EMP_NAME DEPARTMENT

Angelina Testing

Robert Development
Christian Designing

Kristen Development

Russell NULL

Decomposition: the process of breaking up or dividing a single relation into two or more
sub relations is called as decomposition of a relation.

Decomposition in DBMS removes redundancy, anomalies and inconsistencies from a


database by dividing the table into multiple tables.

Lossless Decomposition

o If the information is not lost from the relation that is decomposed, then
the decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.

o The relation is said to be lossless decomposition if natural joins of all


the decomposition give the original relation.

Example:
o EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_N


AME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Market


ing

46 Stephan 30 Bangalore 869 Financ


e

52 Katherine 36 Mumbai 575 Product


ion

60 Jack 40 Noida 678 Testing

o The above relation is decomposed into two relations EMPLOYEE


and DEPARTMENT
o EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida
o DEPARTMENT table
o Now, when these two relations are joined on the common column "EMP_ID",
then the resultant relation will look like:
Employee ⋈ Department

EMP_I EMP_NAM EMP_AG EMP_CIT DEPT_I DEPT_NAM


D E E Y D E

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

o Hence, the decomposition is Lossless join decomposition.

Lossy Decomposition

As the name suggests, when a relation is decomposed into two or more relational
schemas, the loss of information is unavoidable when the original relation is retrieved.

Let us see an example −

<EmpInfo>

Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name

E001 Jacob 29 Alabama Dpt1 Operations

E002 Henry 32 Alabama Dpt2 HR

E003 Tom 22 Texas Dpt3 Finance


Decompose the above table into two tables −

<EmpDetails>

Emp_ID Emp_Name Emp_Age Emp_Location

E001 Jacob 29 Alabama

E002 Henry 32 Alabama

E003 Tom 22 Texas

<DeptDetails>

Dept_ID Dept_Name

Dpt1 Operations

Dpt2 HR

Dpt3 Finance

Now, you won’t be able to join the above tables, since Emp_ID isn’t part
of the DeptDetails relation.

Therefore, the above relation has lossy decomposition.

Dependency Preserving

o It is an important constraint of the database.

o In the dependency preservation, at least one decomposed table must satisfy


every dependency.

o If a relation R is decomposed into relation R1 and R2, then the dependencies of


R either must be a part of R1 or R2 or must be derivable from the combination
of functional dependencies of R1 and R2.
o For example, suppose there is a relation R (A, B, C, D) with functional
dependency set (A->BC). The relational R is decomposed into R1(ABC) and
R2(AD) which is dependency preserving because FD A->BC is a part of relation
R1(ABC).

Multivalued Dependency

o Multivalued dependency occurs when two attributes in a table are independent of


each other but, both depend on a third attribute.

o A multivalued dependency consists of at least two attributes that are dependent on


a third attribute that's why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two colors(white
and black) of each model every year.

BIKE_MODEL MANUF_YEAR COLOR

M2011 2008 White

M2001 2008 Black

M3001 2013 White

M3001 2013 Black

M4006 2017 White

M4006 2017 Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and


independent of each other.

In this case, these two columns can be called as multivalued dependent on


BIKE_MODEL. The representation of these dependencies is shown below:

BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and
"BIKE_MODEL multidetermined COLOR".

Normalization: Normalization is a process of organizing the data in database to avoid


data redundancy, insertion anomaly, update anomaly & deletion anomaly.

o Normalization is the process of organizing the data in the database.

o Normalization is used to minimize the redundancy from a relation or set of


relations. It is also used to eliminate the undesirable characteristics like Insertion,
Update and Deletion Anomalies.
o Normalization divides the larger table into the smaller table and links them
using relationship.
o The normal form is used to reduce redundancy from the database table.

Anomalies in DBMS

There are three types of anomalies that occur when the database is not normalized. These are
– Insertion, update and deletion anomaly.

Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing
the department details in which the employee works. At some point of time the table looks
like this:

Update anomaly: we have two rows for employee Rick as he belongs to two departments
of the company. If we want to update the address of Rick then we have to update the same
in two rows or the data will become inconsistent. If somehow, the correct address gets
updated in one department but not in other then as per the database, Rick would be having
two different addresses, which is not correct and would lead to inconsistent data.

Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into
the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890
then deleting the rows that are having emp_dept as D890 would also delete the information
of employee Maggie since she is assigned only to this department.

To overcome these anomalies we need to normalize the data. In the next section we
will discuss about normalization.

First Normal Form (1NF)


• A relation will be 1NF if it contains an atomic value.
• It states that an attribute of a table cannot hold multiple values. It must hold only single-valued
attribute.
• First normal form disallows the multi-valued attribute, composite attribute, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


7272826385,
14 John UP
9064738238
20 Harry 8574783832 Bihar
7390372389,
12 Sam Punjab
8589830302
The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID EMP_NAME EMP_PHONE EMP_STATE


14 John 7272826385 UP
14 John 9064738238 UP
20 Harry 8574783832 Bihar
12 Sam 7390372389 Punjab
12 Sam 8589830302 Punjab
Second Normal Form (2NF)
n the 2NF, relational must be in 1NF.
• In the second normal form, all non-key attributes are fully functional dependent on the primary key

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a
teacher can teach more than one subject.

TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE


25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper
subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer

Third Normal Form (3NF)


• A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
• 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
• If there is no transitive dependency for non-prime attributes, then the relation must be in third normal
form.

A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial
function dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY


222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal
Super key in the table above:

1. {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-
prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the
rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with
EMP_ZIP as a Primary key.

EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP


222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007
EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY


201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal

Boyce Codd normal form (BCNF)


BCNF is the advance version of 3NF. It is stricter than 3NF.
• A table is in BCNF if every functional dependency X → Y, X is the super key of the table.
• For BCNF, the table should be in 3NF, and for every FD, LHS is super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID EMP_COUNTRY EMP_DEPT DEPT_TYPE EMP_DEPT_NO


264 India Designing D394 283
264 India Testing D394 300
364 UK Stores D283 232
364 UK Developing D283 549
In the above table Functional dependencies are as follows:

1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP-ID, EMP-DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO


Designing D394 283
Testing D394 300
Stores D283 232
Developing D283 549
EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional dependencies is a key.

Fourth normal form (4NF)


• A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
• For a dependency A → B, if for a single value of A, multiple values of B exists, then the relation will
be a multi-valued dependency.

Example

STUDENT

STU_ID COURSE HOBBY


21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there
is no relationship between COURSE and HOBBY.

In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two
hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY

STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be
lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid
redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).

Example

SUBJECT LECTURER SEMESTER


Computer Anshika Semester 1
Computer John Semester 1
Math John Semester 1
Math Akash Semester 2
Chemistry Praveen Semester 1
In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for
Semester 2. In this case, combination of all these fields required to identify a valid data.

Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that
subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we
can't leave other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

P1

SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2

SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3

SEMSTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen

Multivalued Dependency
• Multivalued dependency occurs when two attributes in a table are independent of each other but,
both depend on a third attribute.
• A multivalued dependency consists of at least two attributes that are dependent on a third attribute
that's why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of each
model every year.

BIKE_MODEL MANUF_YEAR COLOR


M2011 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black
Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.

In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The representation
of these dependencies is shown below:

1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR

This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and "BIKE_MODEL


multidetermined COLOR".

BASICS OF FUNCTIONAL DEPENDENCY


• functional dependency is a relationship that exists when one attribute uniquely determines
another attribute.
• A functional dependency is an association between two attributes of the same relational database
table. One of the attributes is called the determinant and the other attribute is called the
determined.
• For each value of the determinant there is associated one and only one value of the determined.
• If A is the determinant and B is the determined then we say that A functionally determines B
and graphically represent this as A ­> B. can also be expressed as B is functionally determined by
A.
Since for each value of A there
is associated one and only one
value of B.

• Since for A = 3 there is associated more than one value of Functional dependency can also be
defined as follows:
• An attribute in a relational model is said to be functionally dependent on another attribute in
the table if it can take only one value for a given value of the attribute upon which it is
functionally dependent.
FUNCTIONAL DEPENDENCY DIAGRAM AND EXAMPLES

Here in Supplier table


Sno Supplier number of supplier that is
unique
Sname Supplier name
City City of the supplier
Status Status of the city
e.g. A grade cities may have status 10,
B grad cities may have status 20 and so on.

Here, Sname is FD on Sno. Because, Sname can take only one


value for the given value of Sno.
 FD is represented as:
Sno ­> Sname
 FD is shown by “­>” which means that Sname is functionally
dependent on Sno.
 Similarly, city and status are also FD on Sno, because for each
value of Sno there will be only one city and status.
 FD is represented as:
Sno ­ > City
Sno ­ > Status

A dependency diagram consists of


the attribute names and all
functional dependencies in a
given table.
Here, following functional
dependencies exist in supplier
table

Sno­>Sname
Sname­>Sno
Sno­>City
Sno­>Status
Sname­>City
Sname­>Status
City­>Status
The FD diagram of relation P
is shown in diagram.
 Here following functional
dependencies exist in Part
table:

Pno ­> Pname
 Pno ­> Color
 Pno ­> Wt

The FD diagram of relation


Shipment is shown in diagram.
 Here following functional
dependencies exist in parts
table

SP (Sno, Pno) ­> [Link]
FULLY F UNCTION D EPENDENCY (FDD)
The term full functional dependency (FFD) is used to indicate the minimum set of attributes in of a
functional dependency (FD).
 In other words, the set of attributes X will be full functionally dependent on the set of attributes Y if
the following conditions are satisfied:
 X is functionally dependent on Y and
 X is not functionally dependent on any subset of Y.
Example:
 Let R be Student(stuId, Name, branch)
 FDs in R include {stuId}→{Name}, but not the reverse
PARTIAL F UNCTION D EPENDENCY
Let us assume a relation R with attributes A, B, C, and D. Also, assume that the set of functional
dependencies F that hold on R as follows;
F = {A → B, D → C}.
 From set of attributes F, we can derive the primary key.
 For R, the key can be (A,D), a composite primary key.
 That means, AD → BC, AD can uniquely identify B and C.
 To identify B, attribute A is enough.
 Likewise, to identify C, attribute D is enough.
 The functional dependencies AD → B or AD → C are called as Partial functional dependencies.

Full Functional Dependency :


In a relation , there exists there exists Full Functional Dependency between any two attributes X and Y,
when X is functionally dependent on Y and is not functionally dependent on any proper subset of Y.
Partial Functional Dependency : In a relation, there exists Partial Dependency, when a non prime
attribute (the attributes which are not a part of any candidate key ) is functionally dependent on a
proper subset of Candidate Key.
example : Let there be a relation R ( Course, Sid , Sname , schedule , room , marks )
Full Functional Dependencies : (Course , Sid) ­> Sname ,(Course , Sid) ­> Marks, etc.
Partial Functional Dependencies : Course ­> Schedule , Course ­> Room

T RIVIAL D EPENDENCY
Consider a table with two columns Student_id and Student_Name.
 {Student_Id, Student_Name} ­> Student_Id is a trivial functional dependency as Student_Id is a
subset of {Student_Id, Student_Name}.
 That makes sense because if we know the values of Student_Id and Student_Name then the value of
Student_Id can be uniquely determined.
 Student_Id ­> Student_Id & Student_Name ­> Student_Name are trivial dependencies too.
N ON ­T RIVIAL D EPENDENCY
An employee table with three attributes: emp_id, emp_name,emp_address.
The following functional dependencies are non­trivial:
emp_id ­> emp_name (emp_name is not a subset of emp_id)
emp_id ­> emp_address (emp_address is not a subset of emp_id)
 On the other hand, the following dependencies are trivial:
{emp_id, emp_name} ­> emp_name
 Here, emp_name is a subset of {emp_id, emp_name}
C LOSURES OF A SET OF FUNCTIONAL DEPENDENCIES
A closure of a set of FDs is a set of all possible FDs that can be derived
from a given set of FDs. It is also referred as a complete set of FDs.
 If F is used to denote the set of FDs for relation R, then a closure of a set
of FDs implied by F is denoted by F + .


Example
Consider the following relation schema
Depositer_Account(cid, ano, acess_date, balance, bname).
For this relation, a set of functional dependencies F can be given as
F = { {cid, ano } → access_date , ano → { balance, bname } }
Find out the closure of F.
Determine each set of attributes X that appears as a left­hand side of FD
in F. { cid, ano } and ano.
Find out { cid, ano } +
Step­1 : { cid, ano } + = { cid, ano }
Step­2 : { cid, ano } + = { cid, ano, acess_date }
# {cid, ano} Í X +
{ cid, ano } + = { cid, ano, acess_date, balance, bname }
# ano Í X +
Step­3: { cid, ano } + = { cid, ano, acess_date, balance, bname }

Find out ano +
Step­1: ano + = ano
Step­2: ano + = { ano, balance, bname }
Step­3: ano + = { ano, balance, bname }
Combine all such sets of X + to form a closure of F.
{ cid, ano } + = { cid, ano, acess_date, balance, bname }
ano + = { ano, balance, bname }

You might also like