Dbms Unit-2
Dbms Unit-2
E.F. Codd proposed the relational Model to model data in the form of relations or
tables.
After designing the conceptual model of the Database using ER diagram, we need to
convert the conceptual model into a relational model which can be implemented using
any RDBMS language like Oracle SQL, MySQL, etc.
The relational model for database management is an approach to logically represent
and manage the data stored in a database.
In this model, the data is organized into a collection of twodimensional interrelated
tables, also known as relations.
Each relation is a collection of columns and rows, where the column represents the
attributes of an entity and the rows (or tuples) represents the records.
The use of tables to store the data provided a straightforward, efficient, and flexible
way to store and access structured information. Because of this simplicity, this data
model provides easy data sorting and data access. Hence, it is used widely around the
world for data storage and processing.
Let's look at a scenario to understand the relational model:
Consider a case where you wish to store the name, the CGPA attained, and the roll
number of all the students of a particular class. This structured data can be easily
stored in a table as described below:
1. Primary key
It is the first key used to identify one and only one instance of an entity uniquely.
An entity can contain multiple keys, as we saw in the PERSON table. The key
which is most suitable from those lists becomes a primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. In the EMPLOYEE table, we can even select License_Number and
Passport_Number as primary keys since they are also unique.
o For each entity, the primary key selection is based on requirements and
developers.
o
2. Candidate key
A candidate key is an attribute or set of attributes that can uniquely identify a
tuple( here values may or may not be null ).
o Except for the primary key, the remaining attributes are considered a candidate
key. The candidate keys are as strong as the primary key.
o
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest
of
the attributes, like SSN, Passport_Number, License_Number, etc., are considered a
candidate key.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a
superset of a candidate key.
For example: In the above EMPLOYEE table, for(EMPLOEE_ID,
EMPLOYEE_NAME),
the name of two employees can be the same, but their EMPLYEE_ID can't be the
same.
Hence, this combination can also be a key.
4. Foreign key
Foreign keys are the column of the table used to point to the primary key of
another table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we can't store the department's
information in the employee table. That's why we link these two tables through the
primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new
attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are
related.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely
identify
each tuple in a relation. These attributes or combinations of the attributes are called
the candidate keys. One key is chosen as the primary key from these candidate keys,
and
the remaining candidate key, if it exists, is termed the alternate key. In other words,
the
total number of the alternate keys is the total number of candidate keys minus the
primary key. The alternate key may or may not exist. If there is only one candidate
key in
a relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that
act
as candidate keys. In this relation, Employee_Id is chosen as the primary key, so the
other candidate key, PAN_No, acts as the Alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a
composite
key. This key is also known as Concatenated Key.
Basic Structure
Consider the account table of Figure 3.1. It has three column headers: account
number, branchname, and balance. Following the terminology of the relational model,
these headers are attributes. For each attribute, there is a set of permitted values,
called the domain of that attribute. For the attribute branchname, for example, the
domain is the set of all branch names.
LetD1 denote the set of all account numbers, D2 the set of all branch names, and
D3the set of all balances. Any row of account must consist of a 3tuple (v1, v2, v3),
where v1 is an account number (that is, v1 is in domain D1),v2 is a branch name (that
is, v2 is in domain D2), and v3 is a balance (that is, v3 is in domain D3). In general,
account will contain only a subset of the set of all possible rows. Therefore, account is
a subset of
In the account relation of Figure 3.1, there are seven tuples. Let the tuple variable t
refer to the first tuple of the relation. We use the notation t[accountnumber] to denote the
value of t on the accountnumber attribute. Thus, t[accountnumber] = “A101,” and
t[branchname] = “Downtown”. Alternatively, we may write t[1] to denote the value of
tuple t on the first attribute (accountnumber), t[2] to denote branchname, and so on.
Since a relation is a set of tuples, we use the mathematical notation of t ∈r to denote
that tuple t is in relation r.
The order in which tuples appear in a relation is irrelevant, since a relation is a set of
tuples. Thus, whether the tuples of a relation are listed in sorted order, as in Figure
3.1, or are unsorted, as in Figure 3.2, does not matter; the relations in the two figures
above are the same, since both contain the same set of tuples.
For all relations r, the domains of all attributes of r be atomic. A domain is atomic if
elements of the domain are considered to be indivisible units. For example, the set of
integers is an atomic domain, but the set of all sets of integers is a non atomic domain.
Database Schema
Note that the attribute branchname appears in both Branchschema and Account
schema. This duplication is not a coincidence. Rather, using common attributes in
relation schemas is one way of relating tuples of distinct relations. For example,
suppose we wish to find the information about all of the accounts maintained in
branches located in Brooklyn. We look first at the branch relation to find the names of
all the branches located in Brooklyn. Then, for each such branch, we would look in the
account relation to find the information about the accounts maintained at that branch.
This is not surprising recall that the primary key attributes of a strong entity set
appear in the table created to represent the entity set, as well as in the tables created
to represent relationships that the entity set participates in.
Observe that, if a customer has several accounts, we must list her address once for
each account. That is, we must repeat certain information several times. This
repetition is wasteful and is avoided by the use of several relations, as in our example.
Schema Diagram
A database schema, along with primary key and foreign key dependencies, can be
depicted pictorially by schema diagrams. Figure 3.9 shows the schema diagram for our
banking enterprise. Each relation appears as a box, with the attributes listed inside it
and the relation name above it. If there are primary key attributes, a horizontal line
crosses the box, with the primary key attributes listed above the line. Foreign key
dependencies appear as arrows from the foreign key attributes of the referencing
relation to the primary key of the referenced relation.
Do not confuse a schema diagram with an ER diagram. In particular, ER diagrams do
not show foreign key attributes explicitly, whereas schema diagrams show them
explicity.
Many database systems provide design tools with a graphical user interface for
creating schema diagrams.
Relational Database systems are expected to be equipped with a query language that assists users to
query the database. Relational Query Language is used by the user to communicate with the
database user requests for the information from the database. Relational algebra breaks the user
requests and instructs the DBMS to execute the requests. It is the language by which the user
communicates with the database. They are generally on a higher level than any other programming
language. These relational query languages can be Procedural and Non-Procedural.
Types of Relational Query Language
There are two types of relational query language:
• Procedural Query Language
• Non-Procedural Language
Non-Procedural Language
In Non Procedural Language user outlines the desired information without giving a specific
procedure or without telling the steps by step process for attaining the information. It only gives a
single Query on one or more tables to get .The user tells what is to be retrieved from the database
but does not tell how to accomplish it.
For Example: get the name and the contact number of the student with a Particular ID will have a
single query on STUDENT table.
Relational Calculus is a Non Procedural Language .
Relational Calculus exists in two forms:
1. Tuple Relational Calculus (TRC): Tuple Relational Calculus is a non procedural query
language , It is used for selecting the tuples that satisfy the given condition or predicate . The
result of the relation can have one or more tuples (row).
2. Domain Relational Calculus (DRC): Domain Relational Calculus is a Non Procedural
Query Language , the records are filtered based on the domains , DRC uses the list of
attributes to be selected from relational based on the condition.
Basic/Fundamental Operations:
[Link] (σ)
2. Project (∏)
3. Union (�)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)
1. Select Operation (σ) :This is used to fetch rows (tuples) from table(relation)
which satisfies a given condition.
Syntax: σp(r)
σ is the predicate
r stands for relation which is the name of the table
p is prepositional
logic ex: σage > 17 (Student)
This will fetch the tuples(rows) from table Student, for which age will be greater than 17.
σage > 17 and gender = 'Male' (Student)
This will return tuples(rows) from table Student with information of male students, of age
more than 17.
Output:
Example:
∏Name, Age(Student)
Above statement will show us only the Name and Age columns for all the rows of
data in Student table.
Input:
∏ NAME, CITY (CUSTOMER)
Output:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
∏ CUSTOMER_NAME (BORROW) � ∏ CUSTOMER_NAME (DEPOSITOR)
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
CUSTOMER_NAME
Smith
Jones
This is used to combine data from two different relations(tables) into one and fetch data
from the combined relation.
Syntax: A X B
For example, if we want to find the information for Regular Class and Extra Class which
are conducted during morning, then, we can use the following operation:
σtime = 'morning' (RegularClass X ExtraClass)
For the above query to work, both RegularClass and ExtraClass should have
the attribute time.
Notation: E X D
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
Rename Operation (ρ):
This operation is used to rename the output relation for any query operation which
returns result like Select, Project etc. Or to simply rename a relation(table)
Syntax: ρ(RelationNew, RelationOld)
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Join in DBMS:
A JOIN clause is used to combine rows from two or more tables, based on a
related column between them.
Join in DBMS is a binary operation which allows you to combine join product
and selection in one single statement.
The goal of creating a join condition is that it helps you to combine the data from
two or more DBMS tables.
The tables in DBMS are associated using the primary key and foreign keys.
3. RIGHT JOIN
4. FULL JOIN
PROJECT
101 1 Testing
102 2 Development
103 3 Designing
104 4 Development
1. INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables as long as
the condition is satisfied.
It returns the combination of all rows from both the tables where the condition satisfies.
Syntax
SELECT table1.column1,
table1.column2 FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME,
[Link] FROM EMPLOYEE INNER JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
2. LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the
right table. If there is no matching join value, it will return NULL.
Syntax
SELECT table1.column1, table1.column2 FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME,
[Link] FROM EMPLOYEE LEFT JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
3. RIGHT JOIN
In SQL, RIGHT JOIN returns all the values from the values from the rows of right table
and the matched values from the left table. If there is no matching in both tables, it will return
NULL.
Syntax
SELECT table1.column1,
table1.column2 FROM table1 RIGHT
JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME,
[Link] FROM EMPLOYEE RIGHT JOIN
PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
4. FULL JOIN
In SQL, FULL JOIN is the result of a combination of both left and right outer join. Join
tables have all the records from both tables. It puts NULL on the place of matches not
found.
Syntax
SELECT table1.column1,
table1.column2 FROM table1 FULL
JOIN table2
ON table1.matching_column = table2.matching_column;
Query
SELECT EMPLOYEE.EMP_NAME,
[Link] FROM EMPLOYEE
FULL JOIN PROJECT
ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID;
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Decomposition: the process of breaking up or dividing a single relation into two or more
sub relations is called as decomposition of a relation.
Lossless Decomposition
o If the information is not lost from the relation that is decomposed, then
the decomposition will be lossless.
o The lossless decomposition guarantees that the join of relations will result in the
same relation as it was decomposed.
Example:
o EMPLOYEE_DEPARTMENT table:
22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida
o DEPARTMENT table
o Now, when these two relations are joined on the common column "EMP_ID",
then the resultant relation will look like:
Employee ⋈ Department
Lossy Decomposition
As the name suggests, when a relation is decomposed into two or more relational
schemas, the loss of information is unavoidable when the original relation is retrieved.
<EmpInfo>
<EmpDetails>
<DeptDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
Now, you won’t be able to join the above tables, since Emp_ID isn’t part
of the DeptDetails relation.
Dependency Preserving
Multivalued Dependency
Example: Suppose there is a bike manufacturer company which produces two colors(white
and black) of each model every year.
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and
"BIKE_MODEL multidetermined COLOR".
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are
– Insertion, update and deletion anomaly.
Example: Suppose a manufacturing company stores the employee details in a table named
employee that has four attributes: emp_id for storing employee’s id, emp_name for storing
employee’s name, emp_address for storing employee’s address and emp_dept for storing
the department details in which the employee works. At some point of time the table looks
like this:
Update anomaly: we have two rows for employee Rick as he belongs to two departments
of the company. If we want to update the address of Rick then we have to update the same
in two rows or the data will become inconsistent. If somehow, the correct address gets
updated in one department but not in other then as per the database, Rick would be having
two different addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department then we would not be able to insert the data into
the table if emp_dept field doesn’t allow nulls.
Delete anomaly: Suppose, if at a point of time the company closes the department D890
then deleting the rows that are having emp_dept as D890 would also delete the information
of employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we
will discuss about normalization.
EMPLOYEE table:
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a
teacher can teach more than one subject.
TEACHER table
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial
function dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-
prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the
rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with
EMP_ZIP as a Primary key.
EMPLOYEE table:
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is a key.
Example
STUDENT
In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two
hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to
unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be
lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid
redundancy.
• 5NF is also known as Project-join normal form (PJ/NF).
Example
Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that
subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we
can't leave other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMSTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Multivalued Dependency
• Multivalued dependency occurs when two attributes in a table are independent of each other but,
both depend on a third attribute.
• A multivalued dependency consists of at least two attributes that are dependent on a third attribute
that's why it always requires at least three attributes.
Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of each
model every year.
In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The representation
of these dependencies is shown below:
1. BIKE_MODEL → → MANUF_YEAR
2. BIKE_MODEL → → COLOR
• Since for A = 3 there is associated more than one value of Functional dependency can also be
defined as follows:
• An attribute in a relational model is said to be functionally dependent on another attribute in
the table if it can take only one value for a given value of the attribute upon which it is
functionally dependent.
FUNCTIONAL DEPENDENCY DIAGRAM AND EXAMPLES
Sno>Sname
Sname>Sno
Sno>City
Sno>Status
Sname>City
Sname>Status
City>Status
The FD diagram of relation P
is shown in diagram.
Here following functional
dependencies exist in Part
table:
Pno > Pname
Pno > Color
Pno > Wt
T RIVIAL D EPENDENCY
Consider a table with two columns Student_id and Student_Name.
{Student_Id, Student_Name} > Student_Id is a trivial functional dependency as Student_Id is a
subset of {Student_Id, Student_Name}.
That makes sense because if we know the values of Student_Id and Student_Name then the value of
Student_Id can be uniquely determined.
Student_Id > Student_Id & Student_Name > Student_Name are trivial dependencies too.
N ON T RIVIAL D EPENDENCY
An employee table with three attributes: emp_id, emp_name,emp_address.
The following functional dependencies are nontrivial:
emp_id > emp_name (emp_name is not a subset of emp_id)
emp_id > emp_address (emp_address is not a subset of emp_id)
On the other hand, the following dependencies are trivial:
{emp_id, emp_name} > emp_name
Here, emp_name is a subset of {emp_id, emp_name}
C LOSURES OF A SET OF FUNCTIONAL DEPENDENCIES
A closure of a set of FDs is a set of all possible FDs that can be derived
from a given set of FDs. It is also referred as a complete set of FDs.
If F is used to denote the set of FDs for relation R, then a closure of a set
of FDs implied by F is denoted by F + .
Example
Consider the following relation schema
Depositer_Account(cid, ano, acess_date, balance, bname).
For this relation, a set of functional dependencies F can be given as
F = { {cid, ano } → access_date , ano → { balance, bname } }
Find out the closure of F.
Determine each set of attributes X that appears as a lefthand side of FD
in F. { cid, ano } and ano.
Find out { cid, ano } +
Step1 : { cid, ano } + = { cid, ano }
Step2 : { cid, ano } + = { cid, ano, acess_date }
# {cid, ano} Í X +
{ cid, ano } + = { cid, ano, acess_date, balance, bname }
# ano Í X +
Step3: { cid, ano } + = { cid, ano, acess_date, balance, bname }
Find out ano +
Step1: ano + = ano
Step2: ano + = { ano, balance, bname }
Step3: ano + = { ano, balance, bname }
Combine all such sets of X + to form a closure of F.
{ cid, ano } + = { cid, ano, acess_date, balance, bname }
ano + = { ano, balance, bname }