0% found this document useful (0 votes)
16 views174 pages

DBMS (R20)

This document provides an introduction to Database Management Systems (DBMS), covering key concepts such as the definition of a database, its applications, and the advantages and disadvantages of using a DBMS. It also discusses various data models, the architecture of a DBMS, and the importance of data independence. The document outlines the structure and components of a database system, emphasizing the role of data processing and metadata in managing information efficiently.

Uploaded by

lovelysai2224
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views174 pages

DBMS (R20)

This document provides an introduction to Database Management Systems (DBMS), covering key concepts such as the definition of a database, its applications, and the advantages and disadvantages of using a DBMS. It also discusses various data models, the architecture of a DBMS, and the importance of data independence. The document outlines the structure and components of a database system, emphasizing the role of data processing and metadata in managing information efficiently.

Uploaded by

lovelysai2224
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Aditya College of Engineering & Technology

Aditya Nagar, ADB Road, Surampalem - 533437

DATABASE MANAGEMENT SYSTEMS

UNIT I: Introduction

Syllabus:
Introduction: Database system, Characteristics (Database Vs File System), Database
Users (Actors on Scene, Workers behind the scene), Advantages of Database systems,
Database applications. Brief introduction of different Data Models; Concepts of Schema,
Instance and data independence; Three tier schema architecture for data independence;
Database system structure, environment, Centralized and Client Server architecture for
the database.

Objectives:
After studying this unit, you will be able to:
 Define database management system
 Explain database system applications
 State the characteristics and the database approach
 Understand different data models
 Discuss the advantages and disadvantages of database Discuss the database
architecture
DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

Introduction

The information storage and retrieval has become very important in our day-to-day life. The
old era of manual system is no longer used in most of the places. For example, to book your
airline tickets or to deposit your money in the bank the database systems may be used. The
database system makes most of the operations automated. A very good example for this is the
billing system used for the items purchased in a super market. Obviously this is done with the
help of a database application package. Inventory systems used in a drug store or in a
manufacturing industry are some more examples of database. We can add similar kind of
examples to this list.
Apart from these traditional database systems, more sophisticated database systems are used in
the Internet where a large amount of information is stored and retrieved with efficient search
engines. For instance, [Link] is a famous web site that enables users to
search for their favorite information on the net. In a database we can store starting from text
data to very complex data like audio, video, etc.

1.1 Database Management Systems (DBMS)

A database is a collection of related data stored in a standard format, designed to be shared by


multiple users. A database is defined as “A collection of interrelated data items that can be
processed by one or more application programs”.
A database can also be defined as “A collection of persistent data that is used by the application
systems of some given enterprise”. An enterprise can be a single individual (with a small
personal database), or a complete corporation or similar large body (with a large shared
database), or anything in between.

Example: A Bank, a Hospital, a University, a Manufacturing company

Data
Data is the raw material from which useful information is derived. The word data is the plural
of Datum. Data is commonly used in both singular and plural forms. It is defined as raw facts
or observations. It takes variety of forms, including numeric data, text and voice and images.
Data is a collection of facts, which is unorganized but can be made organized into useful
information. The term Data and Information come across in our daily life and are often
interchanged.
Example: Weights, prices, costs, number of items sold etc.
Information
Data that have been processed in such a way as to increase the knowledge of the person who
uses the data. The term data and information are closely related. Data are raw material
resources that are processed into finished information products. The information as data that
has been processed in such way that it can increase the knowledge of the person who uses it.
In practice, the database today may contain either data or information.
Data Processing
The process of converting the facts into meaningful information is known as data processing.
Data processing is also known as information processing.
Metadata
Data that describe the properties or characteristics of other data.

1 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

Data is only become useful when placed in some context. The primary mechanism for
providing context for data is Metadata. Metadata are data that describe the properties, or
characteristics of other data. Some of these properties include data definition, data structures
and rules or constraints. The Metadata describes the properties of data but do not include that
data.

1.2 Database System Applications

Databases are widely used. Here are some representative applications:


1. Banking: For customer information, accounts, and loans, and banking transactions.
2. Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner - terminals situated around the world
accessed the central database system through phone lines and other data networks.
3. Universities: For student information, course registrations, and grades.
4. Credit card transactions: For purchases on credit cards and generation of monthly
statements.
5. Telecommunication: For keeping records of calls made, generating monthly bills,
maintaining balances on prepaid calling cards, and storing information about the
communication networks.
6. Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds.
7. Sales: For customer, product, and purchase information.
8. Manufacturing: For management of supply chain and for tracking production of items in
factories, inventories of items in warehouses / stores, and orders for items.
9. Human resources: For information about employees, salaries, payroll taxes and benefits,
and for generation of paychecks.

1.3 File Systems Versus A DBMS (Characteristics)


In earlier days, the databases were created directly on top of file systems. File system has many
disadvantages.

1. Not enough primary memory to process large data sets. If data is maintained in other
storage devices like disks, tapes and bringing relevant data to main memory, it increases
the cost of performance. Problem in accessing the large data due to addressing the data
using 32 bit or 64 bit mode addressing mechanism.
2. Programs must be written to process the user request to process the data stored in files
which are complex in nature because of large volume of data to be searched.
3. Inconsistent data and complexity in providing concurrent accesses.
4. Not sufficiently flexible to enforce security policies in which different users have
permission to access different subsets of the data.

A DBMS is a piece of software that is designed to make the preceding tasks easier. By storing
data in a DBMS, rather than as a collection of operating system Files, we can use the DBMS's
features to manage the data in a robust and efficient manner.

2 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

1.4 Advantages of DBMS

One of the main advantages of using a database management system is that the organization
can exert via the DBA, centralized management and control over the data. The database
administrator is the focus of the centralized control.

The following are the major advantages of using a Database Management System (DBMS):
Data independence: Application programs should be as independent as possible from details
of data representation and storage. The DBMS can provide an abstract view of the data to
insulate application code from such details.

Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently. This feature is especially important if the data is stored on external
storage devices.

Data integrity and security: The DBMS can enforce integrity constraints on the data. The
DBMS can enforce access controls that govern what data is visible to different classes of users.

Data administration: When several users share the data, centralizing the administration of data
can offer significant improvements. It can be used for organizing the data representation to
minimize redundancy and for fine-tuning the storage of the data to make retrieval efficient.

Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in
such a manner that users can think of the data as being accessed by only one user at a time.
Further, the DBMS protects users from the effects of system failures. .

Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS.

1.5 Disadvantages of DBMS

The disadvantage of the DBMS system is overhead cost. The processing overhead introduced
by the DBMS to implement security, integrity, and sharing of the data causes a degradation of
the response and throughput times. An additional cost is that of migration from a traditionally
separate application environment to an integrated one.

Even though centralization reduces duplication, the lack of duplication requires that the
database be adequately backup so that in the case of failure the data can be recovered.

Backup and recovery operations are complex in a DBMS environment, and this is an increment
in a concurrent multi-user database system. A database system requires a certain amount of
controlled redundancies and duplication to enable access to related data items.

1.6 Data Models


A data model is a collection of high-level data description constructs that hide many low-level
storage details. A DBMS allows a user to define the data to be stored in terms of a data model. Most
database management systems today are based on the relational data model.

A schema is a description of a particular collection of data, using the given data model. The
relational model of data is the most widely used model today.

3 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

Main concept: relation, basically a table with rows and columns.


Every relation has a schema, which describes the columns, or fields.
Data Model is a collection of high-level data description constructs that hide many low-level
storage details. A DBMS allows a user to define the data to be stored in terms of a data model. Most
database management systems today are based on the Relational data model. Relational models
include – IBM’s DB2, Informix, Oracle, Sybase, Microsoft’s Access, Foxbase, Paradox, Tandem and
Teradata.

1.7 Categories of data models

 Conceptual (high-level, semantic) data models: Provide concepts that are close to the way
many users perceive data (Also called entity-based or object-based data models).
 Physical (low-level, internal) data models: Provide concepts that describe details of how data
is stored in the computer.
 Implementation (representational) data models: Provide concepts that fall between the
above two.

1. Hierarchical models:
Advantages:
 Hierarchical model is simple to construct and operate on.
 Corresponds to a number of natural hierarchical organized domains – e.g., assemblies in
manufacturing, personal organization in companies.
 Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN
PARENT etc.,
Disadvantages:
 Navigational and procedural nature of processing.
 Database is visualized as a linear arrangement of records.
 Little scope for “query optimization”.
 One-to-many relationships.

2. Network model:
Advantages:

 Network model is able to model complex relationships and represents semantics of


add/delete on the relationships.
 Can handle most situations for modeling using record types and relationship types.
 Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND
NEXT within set, GET etc. Programmers can do optimal navigation through the database.
Disadvantages:

 Navigational and procedural nature of processing.


 Database contains a complex array of pointers that are expensive and difficult to update
when inserting and deleting.
 Little scope for automated “query optimization”.

4 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

3. Relational model:
 A relation, basically a table with rows and columns.
 Every relation has a schema, which describes the columns, or fields.
 Student information in a university database may be stored in a relation with the following
schema
 Students (sid: string, name: string, login: string, age: integer, gpa: real)

1.8 Levels of Abstraction in a DBMS (Three tier schema architecture )

The data in a DBMS is described at three levels of abstraction.


The database description consists of a schema at each of these three levels of abstraction.
External, Conceptual and Physical
Views describe how users see the data.
Conceptual schema defines logical structure.
Physical schema describes the files and indexes used.

Conceptual schema:

 The conceptual schema(also called as logical schema) describes the stored data in terms of the
data model of the DBMS.
 In a relational DBMS, the conceptual schema describes all relations that are stored in the
database.
 In our sample university database, these relations contain information about entities, such as
students and faculty, and about relationships, such as students’ enrollment in courses.
Students(sid: string, name: string, login: string, age: integer, gpa:
real)
Faculty(fid: string, fname: string, salary : real)
Courses(cid: string, cname: string, credits: integer)
Rooms(nw: integer, address: string, capacity: integer)
Enrolled (sid: string, cid: string, grade: string)
Teaches (fid: string, cid: string)

5 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

The choice of relations, and the choice of fields for each relation, is not always obvious, and
the process of arriving at a good conceptual schema is called conceptual database design.

Physical Schema:

 The physical schema specifies storage details.


 It summarizes how the relations described in the conceptual schema are actually stored on
secondary storage devices such as disks and tapes.
 Decides what file organizations to use to store the relations and create auxiliary data
structures, called indexes, to speed up data retrieval operations.
 A sample physical schema for the university database is to store all relations as unsorted files
of records.
o Create indexes on the first column of the students, faculty and courses relations, the
salary column of faculty, and the capacity of column of rooms.

External Schema:

 This schema allows data access to be customized at the level of individual users or groups of
users.
 A database has exactly one conceptual schema and one physical schema, but it may have
several external schemas.
 An external schema is a collection of one or more views and relations from the conceptual
schema.
 A view is conceptually a relation, but the records in a view are not stored in the DBMS.

1.9 Data Independence


Application programs are insulated from changes in the way the data is structured and stored.
Data independence is achieved through use of the three levels of data abstraction.
Logical data independence: users can be shielded from changes in the logical structure of the
data, or changes in the choice of relations to be stored. This is the independence to change the
conceptual schema without having to change the external schemas and their application
programs.
Physical data independence: the conceptual schema insulated users from changes in physical
storage details. This is the independence to change the internal schema without having to change
the conceptual schema.

1.10 Architecture of a DBMS


The functional components of a database system can be broadly divided into query processor
components and storage manager components. The query processor includes:
1. DML Compiler: It translates DML statements in a query language into low-level instructions that
the query evaluation engine understands.
2. Embedded DML Pre-compiler: It converts DML statements embedded in an application
program to normal procedure calls in the host language. The pre-compiler must interact with
the DML compiler to generate the appropriate code.
3. DDL Interpreter: It interprets DDL Stateline its and records them in a set of tables containing

6 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

metadata.
4. Transaction Manager: Ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
5. File Manager: Manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
6. Buffer Manager: Is responsible for fetching data from disk storage into main memory and
deciding what data to cache in memory.
Also some data structures are required as part of the physical system implementation:
1. Data Files: The data files store the database by itself.

2. Data Dictionary: It stores metadata about the structure of the database, as it is used heavily.

3. Indices: It provides fast access to data items that hold particular values.

4. Statistical Data: It stores statistical information about the data in the database. This
information used by the query processor to select efficient ways to execute a query.

1.11 People Who Deal With Databases


Quite a variety of people are associated with the creation and use of databases. Obviously, there
are database implementors, who build DBMS software, and end users who wish to store and use
data in a DBMS.
Database implementors work for vendors such as IBM or Oracle. End users come from a diverse
and increasing number of fields.

7 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

In addition to end users and implementors, two other classes of people are associated with a
DBMS: application programmers and database administrators (DBAs).
Database application programmers develop packages that facilitate data access for end users,
who are usually not computer professionals, using the host or data languages and software tools
that DBMS vendors provide.

The task of designing and maintaining the database is entrusted to a professional called the
database administrator.

The DBA is responsible for many critical tasks:

 Design of the conceptual and physical schemas: The DBA is responsible for interacting with
the users of the system to understand what data is to be stored in the DBMS and how it is
likely to be used. Based on this knowledge, the DBA must design the conceptual schema
(decide what relations to store) and the physical schema (decide how to store them).
 Security and authorization: The DBA is responsible for ensuring that unauthorized data
access is not permitted. In general, not everyone should be able to access all the data. In a
relational DBMS, users can be granted permission to access only certain views and relations.
 Data availability and recovery from failures: The DBA must take steps to ensure that if the
system fails, users can continue to access as much of the uncorrupted data as possible.
 Database tuning: The needs of users are likely to evolve with time. The DBA is responsible
for modifying the database, in particular the conceptual and physical schemas, to ensure
adequate performance as user requirements change.

1.12 Database Environment


A database management system (DBMS) is a collection of programs that enables users to create
and maintain a database. The DBMS is hence a general-purpose software system that facilitates the
processes of defining, constructing, manipulating, and sharing databases among various users and
applications.

Defining a database involves specifying the data types, structures, and constraints for the data to
be stored in the database.

Constructing the database is the process of storing the data itself on some storage medium that is
controlled by the DBMS.

Manipulating a database includes such functions as querying the database to retrieve specific
data, updating the database to reflect changes in the mini world, and generating reports from the
data.

Sharing a database allows multiple users and programs to access the database concurrently.

Other important functions provided by the DBMS include protecting the database and maintaining
it over a long period of time.

Protection includes both system protection against hardware or software malfunction (or crashes),
and security protection against unauthorized or malicious access. A typical large database may
have a life cycle of many years, so the DBMS must be able to maintain the database system by
allowing the system to evolve as requirements change over time. We can call the database and
DBMS software together a database system.

8 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

1.13 Database Architecture


Database architecture uses programming languages to design a particular type of software for
businesses or organizations. Database architecture focuses on the design, development,
implementation and maintenance of computer programs that store and organize information for
businesses, agencies and institutions.

The architecture of a DBMS can be seen as either single tier or multi-tier. The tiers are classified as
follows:

1-tier architecture

2-tier architecture

3-tier architecture

n-tier architecture

1-tier architecture:

One-tier architecture involves putting all of the required components for a software application
or technology on a single server or platform.

9 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

2-tier architecture:
The two-tier is based on Client Server architecture. The two-tier architecture is like client server
application. The direct communication takes place between client and server. There is no
intermediate between client and server.

3-tier architecture:
A 3-tier architecture separates its tiers from each other based on the complexity of the users and
how they use the data present in the database. It is the most widely used architecture to design a
DBMS.

1.14 Centralized DBMS Architecture


Architectures for DBMSs have followed trends similar to those for general computer system
architectures. Earlier architectures used mainframe computers to provide the main processing for
all functions of the system, including user application programs and user interface programs, as
well as all the DBMS functionality.

10 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

As prices of hardware declined, most users replaced their terminals with personal computers
(PCs) and workstations. At first, database systems used these computers in the same way as they
had used display terminals, so that the DBMS itself was still a centralized DBMS in which all the
DBMS functionality, application program execution, and user interface processing were carried
out on one machine.

Gradually, DBMS systems started to exploit the available processing power at the user side,
which led to client/server DBMS architectures.

1.15 Client/Server Architecture:


The client/server architecture was developed to deal with computing environments in which a
large number of PCs, workstations, file servers, printers, database servers, Web servers, and other
equipment are connected via a network. The idea is to define specialized servers with specific
functionalities.

The resources provided by specialized servers can be accessed by many client machines. The
client machines provide the user with the appropriate interfaces to utilize these servers, as well as
with local processing power to run local applications. This concept can be carried over to
software, with specialized software-such as a DBMS or a CAD (computer-aided design) package
being stored on specific server machines and being made accessible to multiple clients.

The concept of client/server architecture assumes an underlying framework that consists of


many PCs and workstations as well as a smaller number of mainframe machines, connected via
local area networks and other types of computer networks. A client in this framework is typically
a user machine that provides user interface capabilities and local processing. When a client

11 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

requires access to additional functionality-such as database access-that does not exist at that
machine, it connects to a server that provides the needed functionality.

A server is a machine that can provide services to the client machines, such as file access,
printing, archiving, or database access. In the general case, some machines install only client
software, others only server software, and still others may include both client and server
software. However, it is more common that client and server software usually run on separate
machines.

In client/server architecture, the user interface programs and application programs can run on
the client side. When DBMS access is required, the program establishes a connection to the DBMS
(which is on the server side); once the connection is created, the client program can communicate
with the DBMS. A standard called Open Database Connectivity (ODBC) provides an application
programming interface (API), which allows client-side programs to call the DBMS, as long as
both client and server machines have the necessary software installed. Most DBMS vendors
provide ODBC drivers for their systems.

Review Questions

1. Distinguish between database systems and file systems.


2. Discuss about the client server architecture of the database.
3. Define DBMS. Explain database users in detail.
4. What are advantages of DBMS? Explain.
5. With a neat diagram, explain the structure of Database Management System.
6. What is data independence and how does a DBMS support it? Explain.
7. What is a Database model? List out various database models and explain any two of them.
8. Explain the difference between external, logical and physical level schemas. How are these
different schema layers related to the concepts of logical and physical data independence?
9. Compare and contrast various Data Models.
10. Demonstrate data abstraction implementation in DBMS.
11. List and explain various data models used for database design.
12. Explain about Buffer management in DBMS.

12 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION

13. Who are the different database users? Explain their interfaces to database management
system.
14. Describe the client server architecture for the database with necessary diagram.
15. Define Schema. Explain three level architecture in DBMS.
16. Explain Data Independence and its types in detail.
17. How does DBMS provide data abstraction? Explain the concept of data independence.
18. With a neat diagram describe the overall system structure of DBMS.

References:

 Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
 C.J. Date, Introduction to Database Systems, Pearson Education.
 Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.

13 ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY


Aditya College of Engineering & Technology
Aditya Nagar, ADB Road, Surampalem - 533437

DATABASE MANAGEMENT SYSTEMS

UNIT II: Relational Model & Basic SQL

Syllabus:
Relational Model: Introduction to relational model, concepts of domain, attribute, tuple,
relation, importance of null values, constraints (Domain, Key constraints, integrity constraints)
and their importance
BASIC SQL: Simple Database schema, data types, table definitions (create, alter), different
DML operations (insert, delete, update), basic SQL querying (select and project) using where
clause, arithmetic & logical operations, SQL functions (Date and Time, Numeric, String
conversion).

Objectives:
After studying this unit, you will be able to:
 Describe Relational model
 Describe SQL and data types
 Explain the basic structures of SQL queries
 Know how to create tables
 Realise aggregate functions and null values
DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2.1 Introduction
Relational Model was proposed by E.F Codd to model data in the form of relations or tables. After
designing the conceptual model of database using ER diagram, we need to convert the conceptual
model in the relational model which can be implemented using any RDBMS (Relational Data Base
Management System) like SQL, MY SQL etc.

The relational model is very simple and elegant; a database is a collection of one or more relations,
where each relation is a table with rows and columns.

This simple tabular representation enables even new users to understand the contents of a
database, and it permits the use of simple, high-level languages to query the data.

2.2 Relational Model


Relational Model represents how date is stored in relational databases.
A Relational database stores data in the form of relations (tables).
Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE and AGE as
shown in table.
ROLL_NO NAME ADDRESS PHONE AGE
1 Nishma Hyderabad 9455123451 28
2 Sai Guntur 9652431843 27
3 Swetha Nellore 9156253131 26
4 Raji Ongole 9215635311 25
Attribute: Attributes are the properties that define a relation.
Ex: ROLL_NO, NAME

Tuple: Each row in a relation is known as tuple.


Ex:
1 Nishma Hyderabad 9455123451 28
Degree: The number of attributes in the relation is known as degree.
Ex: The degree of the given STUDENT table is 5.
Column: Column represent the set of values for a particular attribute.
The column ROLL_NO is extracted from the relation STUDENT.
Ex:

ROLL_NO
1

Null values: The value which is not known or unavailable is called NULL VALUE. It is represented
by blank space.

Cardinality: The number of tuples are present in the relation is called as its cardinality.

Ex: The Cardinality of the STUDENT table is 4.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 2


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2.3 Concept Of Domain


The domain of a database is the set of all allowable values (or) attributes of the database.
Ex: Gender (Male, Female, Others).

Relation
 A relation is defined as a set of tuples and attributes.
 A relation consists of Relation schema and relation instance.
 Relation schema: A relation schema represents the name of the relation with its attributes.
Ex: STUDENT (ROLL_NO, NAME, ADDRESS, PHONE and AGE) is Relation schema for
STUDENT.
 Relation instance: The set of tuples of a relation at a particular instance of a time is called
Relation Instance.
An instance of „Employee „relation

Emp_code Emp_Name Dept_Name


01234 John HR
12567 Smith Sales
21678 Sai Production
12456 Jay Design

2.4 Importance of Null values:


 SQL supports a special value known as NULL which is used to represent the values of
attributes that may be unknown or not apply to a tuple.
 For example, the apartment_number attribute of an address applies only to the address that
is in apartment buildings and not to other types of residences.
 It is important to understand that a NULL value is different from Zero value.
 A Null value is used to represent a missing value, but that is usually has one of the following
interpretations:
 Value unknown (Value exists but it is unknown)
 Value not available (exists but it is purposely withheld)
 Attribute not applicable (undefined for this tuple)
 It is often not possible to determine which of the meanings is intended.

2.5 Constraints
 On modeling the design of the relational data base, we can put some rules(conditions) like
what values are allowed to be inserted in the relation
 Constraints are the rules enforced on the data columns of a table. These are used to limit the
type of data that can go in to a table
 This Ensure the accuracy and reliability of the data in the database. Constraints could be
either on a column level on a table level.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 3


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2.6 Domain Constraints In DBMS


 In DBMS table is viewed as a combination of rows and columns
 For example, if you are having a column called month and you want only (jan, feb,
march……) as values allowed to be entered for that particular column which is referred to as
domain for that particular column
Definition: Domain constraint ensures two things it makes sure that the data value entered for that
particular column matches with the data type defined by that column
It shows that the constraints (NOT NULL/UNIQUE/PRIMARY KEY/FOREIGN
KEY/CHECK/DEFAULT)

Domain constraint= data type check for the column +constraints.


Example:, we want to create a table “STUDENT” with “stu_id” field having a value greater
than 100, can create a domain and table like this.

 Create domain id_value int constraint id_test check (value>=100);


 CREATE table STUDENT (stu_id id value primary key, stu_name varchar
(30), stu_age int);

2.7 Key constraints in DBMS:


 Constraints are nothing but the rules that are to be followed while entering data into columns of the
database table.
 Constraints ensure that the data entered by the user into columns must be within the criteria specified by
the condition.
 We have 6 types of key constraints in DBMS
1. Not Null
2. Unique
3. Default
4. Check
5. Primary key
6. Foreign key

1. Not Null:
 Null represents a record where data may be missing data or data for that record may be optional.
 Once not null is applied to a particular column, you cannot enter null values to that column.
 A not null constraint cannot be applied at table level.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 4


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Example:

Create table EMPLOYEE (id int Not null, name varchar Not null, Age int
not null, address char (25), salary decimal (18,2), primary key(id));

 In the above example we have applied not null on three columns id, name and age which
means whenever a record is entered using insert statement all three columns should contain a
value other than null.
 We have two other columns address and salary, where not null is not applied which means
that you can leave the row as empty.

2. Unique:

Some times we need to maintain only. Unique data in the column of a database table, this is
possible by using a Unique constraint.

 Unique constraint ensures that all values in a column are Unique.

Example:

Create table PERSONS (id int unique, last_name varchar (25) not null,
First name varchar (25), age int);

 In the above example, as we have used unique constraint on ID column we are not supposed
to enter the data that is already present, simply no two ID values are same.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 5


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

3. Default:

Default in SQL is used to add default data to the columns.

 When a column is specified as default with same value then all the rows will use the same
value i.e., each and every time while entering the data we need not enter that value.
 But default column value can be customised i.e., it can be over ridden when inserting a data
for that row based on the requirement.

(Row with default values “abc”)

Example:

Create table EMPLOYEE (id int Not null, last_name varchar (25) Not null,
first_name varchar (25), Age int, city varchar (25) Default Hyderabad);

 As a result, whenever you insert a new row each time you need not enter a value for this
default column that is entering a column value for a default column is optional.

4. Check:
 Check constraint ensures that the data entered by the user for that column is within the range
of values or possible values specified.

Example: Create table STUDENT (id int, name varchar (25), age int,
check(age>=18));

 As we have used a check constraint as (age>=18) which means value entered by user for this
age column while inserting the data must be less than or equal to 18.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 6


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

5. Primary Key:
 A primary key is a constraint in a table which uniquely identifies each row record in a
database table by enabling one or more column in the table as primary key.

Creating a primary key:

 A particular column is made as a primary key column by using the primary key keyword
followed by the column name.
Example:
Create table EMP (ID int, name varchar (20), age int, course varchar
(10), Primary key (ID));
 Here we have used the primary key on ID column then ID column must contain unique
values i.e., one ID cannot be used for another student.

6. Foreign Key:
 The foreign key constraint is a column or list of columns which points to the primary key
column of another table.
 The main purpose of the foreign key is only those values are allowed in the present table that
will match to the primary key column of another table.

From the above two tables, COURSE_ID is a primary key of the table STUDENT_MARKS and also
behaves as a foreign key as it is same in STUDENT_DETAILS and STUDENT_MARKS.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 7


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Example:

(Reference Table)

Create table CUSTOMER1 (id int, name varchar (25), course varchar (10),
primary key (ID));

(Child table)

Create table CUSTOMER2 (id int, marks int, references customer1(ID));

2.8 Integrity Constraints in DBMS:


 There are two types of integrity constraints
1. Entity Integrity Constraints
2. Referential Integrity Constraints

Entity Integrity constraints:

 These constraints are used to ensure the uniqueness of each record or row in the data
table.
 Entity Integrity constraints says that no primary key can take NULL VALUE, since
using primary key we identify each tuple uniquely in a relation.

Example:

Explanation:

 In the above relation, EID is made primary key, and the primary key can‟t
take NULL values but in the 3rd tuple, the primary key is NULL, so it is
violating Entity integrity constraints.

Referential Integrity constraints:

 The referential integrity constraint is specified between two relations or tables and used
to maintain the consistency among the tuples in two relations.
 This constraint is enforced through foreign key, when an attribute in the foreign key of
relation R1 have the same domain as primary key of relation R2, then the foreign key of
R1 is said to reference or refer to the primary key of relation R2.
 The values of the foreign key in a tuple of relation R1 can either take the values of the
primary key for some tuple in Relation R2, or can take NULL values, but can‟t be empty.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 8


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Explanation:

 In the above, DNO of the first relation is the foreign key and DNO in the second relation is the
primary key
 DNO=22 in the foreign key of the first relation is not available in the second relation so, since
DNO=22 is not defined in the primary key of the second relation therefore Referential
integrity constraints is violated here.

2.9 Basic SQL (introduction)

 SQL stands for Structure Query Language it is used for storing and managing data in
relational database management system.
 It is standard language for relational database system. It enables a user to create, read, update
and delete relational databases and tables.
 All the RDBMS like MYSQL, Oracle, MA access and SQL Server use SQL as their standard
database language.
 SQL allows users to Query the database in a number of ways using statements like common
English.
Rules: SQL follows following rules

 SQL is not a case sensitive. Generally, keywords are represented in


UPPERCASE.
 Using the SQL statements, you can perform most of the actions in a database.
 Statements of SQL are dependent on text lines. We can use a single SQL
statement on one or multiple text line.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 9


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2.10 SQL Process:


 When an SQL command is executing for any RDBMS, then the system figure out the best
way to carry out the request and the sql engine determines that how to interrupt the task.
 In the process, various components are included. These components can be optimization
engine, query engine, query dispatcher etc.,
 All the non-sql queries are handled by the classic query engine, but sql query engine won‟t
handle logical files.

2.11 Characteristics of SQL:


 SQL is easy to learn.
 SQL is used to access data from relational database management system.
 SQL is used to describe the data.
 SQL is used to create and drop the database and table.
 SQL allows users to set permissions on tables, procedures and views.

2.12 Simple database Schema:


 A database schema is a structure that represents the logical storage of the data in the
database.
 It represents the organization of data and provides information about the relationships
between the tables in a given database.
 A database schema is the logical representation of a database, which shows how the data is
stored logically in the entire database.
 It contains list of attributes and instruction that informs the database engine that how the
data is organized and how the elements are related to each other.
 A database schema contains schema objects that may include tables, fields, packages, views,
relationship, primary key, foreign key.
 In actual, the data is physically stored in files that may be in unstructured form, but to
retrieve it and use it, we need to keep them in a structured manner. To do this a database
schema is used. It provides knowledge about how the data is organized in a database and
how it is associated with other data.
 A database schema object includes the following:
 Consistent formatting for all data entries.
 Database objects and unique keys for all data entries.
 Tables with multiple columns, and each column contains its names and datatypes.
 The given diagram is an example of a database schema it contains three tables, their
data types. This also represents the relationships between the tables and primary keys
as well as foreign keys.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 10


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2.13 SQL Commands:


SQL commands are categorized into three types.

1. Data Definition Language (DDL): used to create (define) a table.

2. Data Manipulation Language (DML): used to update, store and retrieve data from tables.

3. Data Control Language (DCL): used to control the access of database created using DDL and
DML.

2.14 SQL DATATYPES :


SQL data type is used to define the values that a column can contain

Every column is required to have a name and data type in the database table.

DATA TYPES OF SQL :

SQL DATA
TYPES

Binary data Numeric data Extract String data Date data type
type type numeric data type
type

1. BINARY DATATYPES:
There are three types of binary data types which are given below

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 11


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

DATA TYPE DESCRIPTION

Binary It has a maximum length of 800 bytes. It contains a fixed-


length binary data
Var binary It has a maximum length of 800 bytes. It contains a
variable - length binary data
Image It has a maximum length of 2,147,483,647 bytes. It contains
a variable - length binary data

2. NUMERIC DATATYPE:
DATA TYPE FROM TO DESCRIPTION

Float -1.79 E 1.79 E It is used to specify a floating-point value.


+308 +308 Ex: 6.2, 2.9 etc
Real -3.40 E 3.40 E It specifies a single precision floating
+38 +38 point number.

3. EXACT NUMERIC DATA TYPE:


DATA TYPE DESCCRIPTION

Int It is used to specify an integer value


Small int It is used to specify small integer value
Bit It has the number of bits to store
Decimal It specifies a numeric value that can have a decimal number
Numeric It is used to specify a numeric value

4. DATE AND TIME DATATYPES:


DATA TYPE DESCRIPTION
Date It is used to store the year, month, and days value
Time It is used to store the hour, minute, and seconds value
Time stamp It stores the year, month, hour, minute, and the second value

5. STRING DATATYPE:
DATA TYPE DESCRIPTION
Char It has a maximum length of 8000 characters. It contains fixed-length non-
Unicode characters.
Varchar It has a maximum length of 8000 characters. It contains variable-length
non-Unicode characters.
Text It has a maximum length of 2,147,483,647 characters. It contains variable-
length non-Unicode characters.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 12


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2.15 TABLE DEFINITIONS: (CREATE, ALTER)


SQL TABLE: SQL table is a collection of data which is organized in terms of rows and columns.

 In DBMS, the table is known as relation and row as a tuple


 Let‟s see an example of the “EMPLOYEE “table

EMP_ID EMP_NAME CITY PHONE_ID


1 Kristen Washington 7289201223
2 Anna Franklin 9378282882
3 Jackson California 9264783838
4 Daniel Hawaii 9638482678
 In the above table, “EMPLOYEE” is the table name, “EMP_ID, “EMP_NAME”, “CITY”,”
PHONE-NO” are the column names.
 The combination of data of multiple columns forms a row
EG: 1, “Kristen”, “Washington” and “7289201223 “are the data of one row

2.16 OPERATIONS ON TABLE:


1. Create table
2. Alter table
3. Drop table

[Link] table: SQL create table is used to create a table in the database. To define the table, you
should define the name of the table and also define its column and column‟s data type.

SYNTAX:

Create table table_name (“column1” “datatype”,

“column2” “datatype”,

“column3” “datatype”,

….

“column N” “datatype”);

EXAMPLE:

SQL > create table employee (emp_id int, emp_name varchar (25), phone_no int,
address char (30));

 If you create the table successfully, you can verify the table by looking at the message by the
sql server. else you can use DESC command as follows

SQL > DESC employee;

FIELD TYPE NULL DEFAULT EXTRA


Emp_id Int (11) No NULL
Emp_name Varchar (25) No NULL
Phone_no No Int (11) NULL
address yes NULL Char(30)

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 13


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2. ALTER TABLE:

 The alter table command adds, delete or modifies columns in a table


 The alter table command also adds and deletes various constraints in a table
 The following SQL adds an “EMAIL” column to the “EMPLOYEE “table

SYNTAX:

ALTER table table_name add column1 datatype;

EXAMPLE:

ALTER table employee add email varchar (255);

SQL > DESC employee;

FIELD TYPE NULL DEFAULT EXTRA


Emp_id Int (11) No NULL
Emp_name Varchar (25) No NULL
Phone_no No Int (11) NULL
Address Yes NULL Char (30)
Email Varchar (255) NULL

3. DROP TABLE:

 The drop table command deletes a table in the data base


 The following example SQL deletes the table “EMPLOYEE”

SYNTAX :

DROP table table_name;

EXAMPLE:

DROP table employee;

 Dropping a table results in loss of all information stored in the table.

2.17 Different DML Operations (insert, delete, update):


 DML-Data Manipulation Language.
 Data Manipulation Commands are used to manipulate data to the database.
 Some of the data manipulation commands are
1. Insert
2. Update
3. Delete

1. Insert:
SQL insert statement is a sql query. It is used to insert a single multiple records in a table.

Syntax:

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 14


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Insert into table name values (value 1, value 2, value 3);

Let‟s take an example of table which has 3 records within it.

▪ insert into student values(„alekhya‟,501,‟hyderabad‟);

▪ insert into student values(„deepti‟,502,‟guntur‟);

▪ insert into student values(„ramya‟,503,‟nellore‟);

The following table will be as follows:

NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Ramya 503 Nellore

2. Update:
 The SQL Commands update are used to modify the data that is already in the database.
 SQL Update statement is used to change the data of records held by tables which rows is to
be update, it is decided by condition to specify condition, we use “WHERE” clause.
 The update statement can be written in following form:

Syntax:
Update table_name set column_name=expression where condition;

Example:

Let‟s take an example: here we are going to update an entry in the table.

Update students set name=‟rasi‟ where id=503;

After update the table is as follows:

NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Rasi 503 Nellore

3. Delete:
 The SQL delete statement is used to delete rows from a table.
 Generally, delete statement removes one or more records from a table.

Syntax:

delete from table_name [where condition];

Example:

Let us take a table named “student” table

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 15


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Delete from students where id=501;

Resulting table after the query:

NAME ID CITY
Deepti 502 Guntur
Rasi 503 Nellore

2.18 Basic SQL querying (select and project) using where clause:
 The following are the various SQL clauses:

SQL Clause

Group by clause having clause Order by clause

1. Group by:
 SQL group by statement is used to arrange identical data into groups.
 The group by statement is used with the SQL select statement.
 The group by statement follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.

Syntax:
Select column from table_name where column group by column, order by
column;

Sample table: product


PRODUCT COMPANY QTY RATE COST
Item 1 Com 1 2 10 20
Item 2 Com 2 3 25 75
Item 3 Com 1 2 30 60
Item 4 Com 3 5 10 50
Item 5 Com 2 2 20 40

Example:
 Select company count (*) from product group by company;

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 16


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Output:
Com 1 2
Com 2 3
Com 3 5

2. Having clause:
 Having clause is used to specify a search condition for a group or an aggregate.

Having clause is used in a group by clause, if you are not using group by clause then you can
use having function like a where clause.

Syntax:
Select column1, column2 from table_name

Where conditions

Group by column1, column2

Having conditions

Order by column1, column2;

Example:
 select company count (*) from product

Group by company

Having count (*) > 2;

Output:
Com 3 5
Com 2 2

3. Order by clause:
The order by clause sorts the result _set in ascending or descending order.

Syntax:

Select column1, column2, from table_name

Where condition

Order by column1, column2…asc;

Sample table:

Take a student table

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 17


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Example:

Select * from student order by name;

Output:

NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Rasi 503 Nellore

2.19 SQL Where clause:


 A where clause in SQL is a data manipulation language statement.
 Where clauses are not mandatory clauses of SQL DML statements but it can be used to limit
the number of rows affected by a SQL DML statement or returned by query.
 Actually, it follows the [Link] returns only those queries which the specific conditions.

Syntax:
Select column1, column2, …………column from table_name where[condition];

 Where clause uses same conditional selection.

= Equal to
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
<> Not equal to

2.20 Arithmetic and logical operations:

SQL operators:
 SQL statements generally contain some reserved words or characters that are used to
perform operations such as arithmetic and logical operations etc. Their reserved words are
known as operators.

SQL arithmetic operator:


 We can use various arithmetic operators on the data stored in tables.
 Arithmetic operators are:

+ Addition
- Subtraction
/ Division
* Multiplication

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 18


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

% modulus

1. Addition (+):
It is used to perform addition operation on data items.

Sample table:
EMP_ID EMP_NAME SALARY

1 Alex 25000
2 John 55000
3 Daniel 52000
4 Sam 12312

 select emp id, emp_name, salary, salary+100 as “salary +100” from


addition;

Output:
EMP_ID EMP_NAME SALARY SALARY+100
1 Alex 25000 25100
2 John 55000 55100
3 Daniel 52000 52100
4 Sam 12312 12412
 Here we have done addition of 100 to each emp‟s salary.

2. Subtraction (-):
 It is used to perform subtraction on the data items.

Example:
Select emp_id, emp_name, salary, salary-100 as “salary-100” from
subtraction;
EMP_ID EMP_NAME SALARY SALARY-100
1 Alex 25000 24900
2 John 55000 54900
3 Daniel 52000 51900
4 Sam 90000 89900
Here we have done subtraction of 100 for each emp‟s salary.

3. Division (/):
 The division function is used to integer division (x is divided by y).an integer value is
returned.

Example:
 Select emp_id, emp_name, salary, salary/100 as “salary/100” from
division;

EMP_ID EMP_NAME SALARY Salary/100


1 Alex 25000 250
2 John 55000 550
3 Daniel 52000 520
4 Sam 90000 900

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 19


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

4. Multiplication (*):
 It is used to perform multiplication of data items.
 Select emp_id, emp_name, salary, salary*100 as “salary*100” from
multiplication;

EMP_ID EMP_NAME SALARY SALARY*100


1 Alex 25000 2,500,000
2 John 55000 5,500,000
3 Daniel 52000 5,200,000
4 Sam 90000 9,000,000
 Here we have done multiplication of 100 to each emp‟s salary.

5. Modulus (%):
 It is used to get remainder when one data is divided by another.
 Select emp_id, emp_name, salary, salary%25000 as “salary%25000” from
modulus;

Output:
EMP_ID EMP_NAME SALARY SALARY%25000
1 Alex 25000 0
2 John 55000 5000
3 Daniel 52000 2000
4 Sam 90000 15000
 Here we have done modulus operation to each emp‟s salary.

2.21 Logical operations:


 Logical operations allow you to test for the truth of a condition.
 The following table illustrates the SQL logical operator.
OPERATOR MEANING
ALL Returns true if all comparisons are true
AND Returns true if both expressions are true
ANY Returns true if any one of the comparisons is
true
BETWEEN Return true if the operand is within a range
IN Return true if the operand is equal to one of
the values in a list
EXISTS Return true if the sub query contains any rows
1. AND:
The AND operator allows you to construct multiple condition in the WHERE clause of an SQL
statement such as select.

 The following example finds all employees where salaries are greater than the 5000 and less
than 7000.
 Select first_name, last_name, salary from employees where
salary>5000 AND salary<7000 order by salary;

Output:
FIRST_NAME LAST_NAME SALARY
John Wesley 6000
Eden Daniel 6000
Luis Popp 6900
Shanta Suji 6500

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 20


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2. ALL:
The ALL operator compares a value to all values in another value set.

 The following example finds all employees whose salaries are greater than all salaries of
employees.
EX:
select first_name, last_name, salary from employees where salary>=ALL
(select salary from employees where department_id =8) order by salary
DESC;
Output:
FIRST_NAME LAST_NAME SALARY
Steven King 24000
John Russel 17000
Neena Kochhar 14000

3. ANY:
The ANY operator compares a value to any value in a set ascending to condition.
The following example statement finds all employees whose salaries are greater than the average
salary of every department.
EX:
select first_name, last_name, salary from employees where salary >ANY
(select avg (salary) from employees‟ group by department_id) order by
first_name, last_name;

Output:
FIRST_NAME LAST_NAME SALARY
Alexander Hunold 9000.00
Charles Johnson 6200.00
David Austin 4800.00
Eden Flip 9000.00

4. Between:
 The between operator searches for values that are within a set of values.
 For example, the following statement finds all employees where salaries are between 9000
and 12000.

EX:

select first_name, last_name, salary from employees where salary


between 9000 AND 12000 order by salary;

Output:
FIRST_NAME LAST_NAME SALARY
Alexander Hunold 9000.00
Den Richards 10000.00
Nancy Prince 12000.00

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 21


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

5. IN:
 The IN operator compares a value to list of specified values. The IN operator return true if
compared value matches at least one value in the list.
 The following statement finds all employees who work in department _id 8 or 9.

EX:

select first_name, last_name, department_id from employees where


department_id IN (8,9) order by department_id;

Output:
FIRST_NAME LAST_NAME DEPARTMENT_ID
John Russel 8
Jack Livingstone 8
Steven King 9
Neena Kochhar 9

6. Exists:
 The EXISTS operator tests if a sub query contains any rows.
 For example, the following statement finds all employees who have dependents.
 select first_name, last_name from employees where EXISTS (select 1
from dependent d where d.employee_id=e.employee_id);

FIRST_NAME LAST_NAME
Steven King
Neena Kochhar
Alexander Hunold

2.22 SQL FUNCTIONS (Date & Time, Numeric, Aggregate, String conversions):

DATE & TIME FUNCTIONS:

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 22


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 23


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Some important date and time functions are below:

Sysdate: It generates the system date.

Ex: Select sysdate from dual;

Output: 05-DEC-2021.

ADD_MONTHS: This function returns a date after adding data with specified no of months.

EX: Select ADD_MONTHS („2017-02-29‟,1) from dual;

Output: 31-MAR-17.

Select add_months(sysdate,3) from dual;

Output: 05-MAR-22.

CURRENT_DATE: This function displays the current date.

Ex: Select CURRENT_DATE from dual;

Output: 05-DEC-2021.

NEXT_DAY: This function represents both day and date and returns the day of the next given day.

EX: Select NEXT_DAY(SYSDATE,‟MONDAY‟) from dual;

Output: 07-DEC-21.

LAST_DAY: This function returns a day corresponding last day of months.

EX: Select LAST_DAY (sysdate) from dual;

Output: 31-DEC-21.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 24


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

MONTHS_BETWEEN: It is used to find no of months between two given dates.

EX: Select MONTHS_BETWEEN(‟16-APRIL-2021‟,‟16-AUGUST-2021) from dual;

Output: -4.

ROUND: It gives the nearest value or round off value for the argument pass. (or) It returns a date
rounded to a specific unit of measure.

EX: Select ROUND(‟26-NOV-21‟,‟YYYY‟) from dual;

Output: 01-JAN-22.

TRUNC: This function returns the date with the time(co-efficient) portion of the date truncated to
the unit specified.

EX: Select TRUNC (sysdate, ‟MM‟) from dual;

Output: 01-DEC-21.

TO_DATE: This function converts date which is in the character string to a date value.

EX: Select TO_DATE (‟01 jan 2017‟,‟DD MM YYYY‟) from dual;

Output: 01-JAN-17.

TO_CHAR: This function converts DATE or an INTERVAL value to a character string in a


specified format.

EX: Select TO_CHAR (sysdate,‟DD MM YYYY‟) from dual;

Output: 05 12 2021.

LEAST: This function displays the oldest date present in the argument list.

EX: Select LEAST(‟01-march-2021‟,‟16-feb-2021‟,‟28-dec-2021‟) from dual;

Output: 01-MAR-21.

GREATEST: This function displays the latest date present in the argument list.

EX: Select GREATEST (‟01-march-2021‟,‟16-feb-2021‟,‟28-dec-2021‟) from dual;

Output: 28-DEC-21.

2.23 Aggregate Functions:


Aggregate Functions take a collection of values as input and returns a single value.

[Link] ()

[Link] ()

[Link] ()

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 25


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

[Link] ()

[Link] ()

1. Count (): This function returns number of rows returned by a query.

Syntax: Select count(column_name)

From table_name

Where condition);

Example: Select count (distinct manager_id) from employees;

[Link] (): It will add/ sum all the column values in the query.

Syntax: Select sum (column_name)

From table_name

Where condition);

Example: Select sum(salaries) from employees;

[Link] (): Avg function used to calculate average values of the set of rows.

Syntax: Select avg (column_name)

From table_name

Where condition);

Example: Select avg(salary) from employees;

[Link] (): This function is used to find maximum value from the set of values.

Syntax: Select max (column_name)

From table_name

Where condition);

Example: Select max (salary) from employees;

[Link] (): This function is used to find minimum value from the set of values.

Syntax: Select min (column_name)

From table_name

Where condition);

Example: Select min (salary) from employees;

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 26


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

2.24 SQL NUMERIC FUNCTIONS:

Numeric functions are used to perform operations on numbers and return numbers.

Following are some of the Numeric functions

1. ABS (): It returns the absolute value of a number.

EX: select ABS (-243.5) from dual;

OUTPUT: 243.5

2. ACOS (): It returns the cosine of a number.


EX: select ACOS (0.25) from dual;
OUTPUT: 1.318116071652818

3. ASIN (): It returns the arc sine of a number.


EX: select ASIN (0.25) from dual;
OUTPUT: 0.253680255142

4. CEIL (): It returns the smallest integer value that is a greater than or equal to a number.
EX: select CEIL (25.77) from dual;
OUTPUT: 26

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 27


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

5. FLOOR (): It returns the largest integer value that is a less than or equal to a number.
EX: select FLOOR (25.75) from dual;

OUTPUT: 25
6. TRUNCATE (): This does not work for SQL server. It returns the truncated to 2 places right of the
decimal point.
EX: select TRUNCATE (7.53635, 2) from dual;

OUTPUT: 7.53
7. MOD (): It returns the remainder when two numbers are divided.
EX: select MOD (55,2) from dual;
OUTPUT: 1.
8. ROUND (): This function rounds the given value to given number of digits of precision.
EX: select ROUND (14.5262,2) from dual;
OUTPUT: 14.53.
9. POWER (): This function gives the value of m raised to the power of n.
EX: select POWER (4,9) from dual;
OUTPUT: 262144.
10. SQRT (): This function gives the square root of the given value n.
EX: Select SQRT (576) from dual;
OUTPUT: 24.
11. LEAST (): This function returns least integer from given set of integers.

EX: select LEAST (1,9,2,4,6,8,22) from dual;

OUTPUT: 1.

12. GREATEST (): This function returns greatest integer from given set of integers.

EX: select GREATEST (1,9,2,4,6,8,22) from dual;

OUTPUT: 22

2.25 STRING CONVERSION FUNCTIONS OF SQL:

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 28


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

String Functions are used to perform an operation on input string and return the output string.
Following are the string functions

[Link] (): This function is used to add two words (or) strings.

EX: select „database‟ ||‟ „|| „management system‟ From dual;

OUTPUT: „database management system‟

[Link] (): This function is used to find the occurrence of an alphabet.

EX: instr („database system‟,‟ a‟) from dual;

OUTPUT: 2 (the first occurrence of „a‟)

[Link] (): This function is used to convert the given string into lowercase.

EX: select lower („DATABASE‟) from dual;

OUTPUT: database

[Link] (): This function is used to convert the lowercase string into uppercase.

EX: select upper („database „) from dual;

OUTPUT: DATABASE

[Link] (): This function is used to make the given string of the given size by adding the given symbol.

EX: > lpad („system „, 8, „0‟) from dual;

OUTPUT: 00system

[Link] (): This function is used to make the given string as long as the given size by adding the given
symbol on the right.

EX: rpad („system „,8, „0„) from dual;

OUTPUT: system00

[Link] (): This function is used to cut the given substring from the original string.

EX: ltrim („database „, „data „) from dual;

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 29


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

OUTPUT: base

[Link] (): This function is used to cut the given substring from the original string.

EX: rtrim („database „, „base „) from dual;

OUTPUT: data.

9. INITCAP (): This function returns the string with first letter of each word starts with uppercase.

EX: Select INITCAP („data base management system‟) from dual;

OUTPUT: Data Base Management System.

10. LENGTH (): Tis function returns the length of the given string.

EX: select LENGTH („SQ LANGUAGE‟) from dual;

OUTPUT: 11.

[Link] (): This function returns a portion of a string beginning at the character position.

EX: select SUBSTR („MY WORLD IS AMAZING‟,12,3) from dual;

OUTPUT: AM.

[Link] (): This function returns a string after replacing some set of characters into another set.

EX: select TRANSLATE („Delhi is the capital of India‟,‟i‟,‟a‟) from dual;

OUTPUT: Delha as the capatal andaa.

Review Questions:

1. Why NULL values are needed in databases?


2. Discuss GROUPBY and HAVING clauses with an example. And also give the constraints
related to their usage.
3. Consider the SAILOR DATABASE
Sailors (Sid: string, sname: string, rating: integer, age: real)
Boats (bid: integer, bname: string, colour: string)
Reserves (Sid: integer, bid: integer, day: date)
Based on the above schema, write the corresponding SQL queries for the following?
Find the colours of boats reserved by Lubber.
Find the names of sailors who have reserved at least one boat.
Find the names of sailors who have reserved a red or green boat.
Find the names of the sailors who have reserved both a red boat and a green boat.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 30


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

Find names of sailors who have reserved all boats.


4. Write about the usability of „group by‟ and „having‟ clauses In SQL.
5. How would you use the operators IN, EXISTS, UNIQUE, ANY and ALL in writing nested
queries? Why are they useful? Explain with an example.
6. Explain commands with respect to SQL: (i) Rename (ii) Alter (iii) View
7. Explain two aggregate functions of SQL.
8. Explain the following SQL constructs with examples:
(1) order by (2) group by and having (3) as select (4) schema

9. Write SQL Queries for following set of tables:


EMPLOYEE (EmpNo, Name, DoB, Address, Gender, Salary, DNumber)
DEPARTMENT (DNumber, Dname, ManagerEmpNo, ManagerStartDate).
Display the Age of „male‟ employees.
Display all employees in Department named „Marketing‟.
Display the name of highest salary paid „female‟ employee.
Which employee is oldest manger in company?
Display the name of department of the employee „SMITH.
10. Describe creating and modifying relations using SQL. Give examples for each.
11. Explain about integrity constraints over relations.
12. Consider the following insurance database, where the primary keys are underlined.
Construct the following SQL queries for this relational database. person (driver-id#, name,
address)
car (license, model, year)
accident (report-number, date, location)
owns (driver-id#, license)
participated (driver-id, car, report-number, damage-amount)
a. Find the total number of people who owned cars that were involved in accidents in
1989.
b. Find the number of accidents in which the cars belonging to “John Smith” were
involved.
c. Add a new accident to the database; assume any values for required attributes.
d. Delete the Mazda belonging to “John Smith”.
e. Update the damage amount for the car with license number “AABB2000” in the
accident with report number “AR2197” to $3000.
13. List the SQL functions for string conversions.
14. Write SQL statements for following:
Student (Enrno, name, courseId, emailId, cellno)
Course (courseId, course_nm, duration)

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 31


DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL

i. Add a column city in student table.


ii. Find out list of students who have enrolled in “computer” course.
iii. List name of all courses with their duration.
iv. List name of all students start with „a‟.
v. List email Id and cell no of all mechanical engineering students.
15. How to define a domain constraint? Give an example.
16. What is an integrity constraint? Explain its enforcement by DBMS with illustrative
17. List the data types supported by SQL.
18. Demonstrate the use of DISTINCT keyword in SQL select statement.
19. Consider the following database schema to write nested queries in SQL
Supplier (id, name, city)
Parts (pno, pname, pdescription)
Supply (id, pno, cost)
i. Find the names of the parts supplied by “RamRaj”
ii. Find the names of the suppliers who supply “Nuts”
iii. Find the cost of bolts being supplied by Nagpur suppliers.

References:

 Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
 C.J. Date, Introduction to Database Systems, Pearson Education.
 Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 32


Aditya College of Engineering & Technology
Aditya Nagar, ADB Road, Surampalem - 533437

DATABASE MANAGEMENT SYSTEMS

UNIT III: Entity Relationship Model


(PART – 1)

Syllabus:
Entity Relationship Model: Introduction, Representation of entities, attributes, entity set,
relationship, relationship set, constraints, sub classes, super class, inheritance, specialization,
generalization using ER Diagrams.
SQL: Creating tables with relationship, implementation of key and integrity constraints,
nested queries, sub queries, grouping, aggregation, ordering, implementation of different
types of joins, view(updatable and non-updatable), relational set operations.

Objectives:
After studying this unit, you will be able to:
 Explain entity relationship model
 Know the constraints
 Describe E-R diagrams and extended ER features
 Describe subqueries and nested subqueries
 Explain complex queries
 Define views
 Discuss joined relations
DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

3.1 Introduction

The entity-relationship (ER) data model allows us to describe the data involved in a real-world
enterprise in terms of objects and their relationships and is widely used to develop an initial
database design.
The ER model is important primarily for its role in database design. It provides useful concepts
that allow us to move from an informal description of what users want from their database to a
more detailed and precise, description that can be implemented in a DBMS.
Even though the ER model describes the physical database model, it is basically useful in the
design and communication of the logical database model.

3.2 Overview of Database Design


Our primary focus is the design of the database. The database design process can be divided into
six steps:

Requirements Analysis

The very first step in designing a database application is to understand what data is to be stored in
the database, what applications must be built on the database, and what operations must be
performed on the database. In other words, we must find out what the users want from the
database. This process involves discussions with user groups, a study of the current operating
environment, how it is expected to change an analysis of any available documentation on existing
applications and so on.

Conceptual Database Design

The information gathered in the requirement analysis step is used to develop a high-level
description of the data to be stored in the database, along with the conditions known to hold this
data. The goal is to create a description of the data that matches both—how users and developers
think of the data (and the people and processes to be represented in the data). This facilitates
discussion among all the people involved in the design process i.e., developers and as well as
users who have no technical background. In simple words, the conceptual database design phase
is used in drawing ER model.

Logical Database Design

We must implement our database design and convert the conceptual database design into a
database schema (a description of data) in the data model (a collection of high-level data
description constructs that hide many low-level storage details) of the DBMS. We will consider
only relational DBMSs, and therefore, the task in the logical design step is to convert the
conceptual database design in the form of E-R Schema (Entity-Relationship Schema) into a
relational database schema.

Schema Refinement

The fourth step in database design is to analyze the collection, of relations (tables) in our relational
database schema to identify future problems, and to refine (clear) it.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 2


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

Physical Database Design

This step may simply involve building indexes on some tables and clustering some tables, or it
may involve redesign of parts of the database schema obtained from the earlier design steps.

Application and Security Design

Any software project that involves a DBMS must consider applications that involve processes and
identify the entities.

3.3 Entities, Attributes and Entity Sets


Entity: An entity is an object in the real world that is distinguishable from other objects.
Entity set: An entity set is a collection of similar entities. The Employees entity set with attributes
ssn, name, and lot is shown in the following figure.

Attribute: An attribute describes a property associated with entities. Attribute will have a name
and a value for each entity.
Domain: A domain defines a set of permitted values for an attribute
Entity Relationship Model: An ERM is a theoretical and conceptual way of showing data
relationships in software development. It is a database modeling technique that generates an
abstract diagram or visual representation of a system's data that can be helpful in designing a
relational database.
ER model allows us to describe the data involved in a real-world enterprise in terms of objects and
their relationships and is widely used to develop an initial database design.

3.4 Representation of Entities and Attributes


ENTITIES: Entities are represented by using rectangular boxes. These are named with the entity
name that they represent.

ATTRIBUTES: Attributes are the properties of entities. Attributes are represented by means of
ellipses. Every ellipse represents one attribute and is directly connected to its entity.

Types of attributes:
 Simple attribute − Simple attributes are atomic values, which cannot be divided further. For
example, a student's roll number is an atomic value.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 3


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

 Composite attribute − Composite attributes are made of more than one simple attribute. For
example, a student's complete name may have first_name and last_name.

 Derived attribute − Derived attributes are the attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. For
example, average_salary in a department should not be saved directly in the database, instead
it can be derived. For another example, age can be derived from data_of_birth.

 Single-value attribute − Single-value attributes contain single value. For example −


Social_Security_Number.
 Multi-value attribute − Multi-value attributes may contain more than one values. For example,
a person can have more than one phone number, email_address, etc.

3.5 Relationship and Relationship set


Relationships are represented by diamond-shaped box. Name of the relationship is written inside the
diamond-box. All the entities (rectangles) participating in a relationship, are connected to it by a line.
Types of relationships:

Degree of Relationship is the number of participating entities in a relationship defines the degree of the
relationship. Based on degree the relationships are categorized as

 Unary = degree 1
 Binary = degree 2

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 4


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

 Ternary = degree 3
 n-array = degree
Unary Relationship: A relationship with one entity set. It is like a relationship among 2 entities of
same entity set. Example: A professor ( in-charge) reports to another professor (Head Of the Dept).

Binary Relationship: A relationship among 2 entity sets. Example: A professor teaches a course
and a course is taught by a professor.

Ternary Relationship: A relationship among 3 entity sets. Example: A professor teaches a course
in so and so semester.

n-array Relationship: A relationship among n entity sets.

Cardinality:
Defines the number of entities in one entity set, which can be associated with the number of
entities of other set via relationship set. Cardinality ratios are categorized into 4. They are.

1. One-to-One relationship: When only one instance of entities is associated with the
relationship, then the relationship is one-to-one relationship. Each entity in A is associated
with at most one entity in B and each entity in B is associated with at most one entity in A.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 5


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

2. One-to-many relationship: When more than one instance of an entity is associated with a
relationship, then the relationship is one-to-many relationship. Each entity in A is associated
with zero or more entities in B and each entity in B is associated with at most one entity in A.

3. Many-to-one relationship: When more than one instance of entity is associated with the
relationship, then the relationship is many-to-one relationship. Each entity in A is associated
with at most one entity in B and each entity in B is associated with 0 (or) more entities in A.

4. Many-to-Many relationship: If more than one instance of an entity on the left and more than
one instance of an entity on the right can be associated with the relationship, then it depicts
many-to-many relationship. Each entity in A is associated with 0 (or) more entities in B and
each entity in B is associated with 0 (or) more entities in A.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 6


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

Relationship Set:
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can
have attributes. These attributes are called descriptive attributes.

Participation Constraints:
 Total Participation − If Each entity in the entity set is involved in the relationship then the
participation of the entity set is said to be total. Total participation is represented by double
lines.
 Partial participation − If, Not all entities of the entity set are involved in the relationship then
such a participation is said to be partial. Partial participation is represented by single lines.
Example:

3.6 Additional Features Of The ER Model


Key Constraints

Consider a relationship set called Manages between the Employees and Departments entity sets
such that each department has at most one manager, although a single employee is allowed to
manage more than one department. The restriction that each department has at most one manager
is an example of a key constraint, and it implies that each Departments entity appears in at most
one Manages relationship in any allowable instance of Manages. This restriction is indicated in the
ER diagram of below Figure by using an arrow from Departments to Manages. Intuitively, the
arrow states that given a Departments entity, we can uniquely determine the Manages relationship
in which it appears.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 7


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

Key Constraints for Ternary Relationships

If an entity set E has a key constraint in a relationship set R, each entity in an instance of E appears
in at most one relationship in (a corresponding instance of) R. To indicate a key constraint on entity
set E in relationship set R, we draw an arrow from E to R.

Below figure show a ternary relationship with key constraints. Each employee works in at most one
department, and at a single location.

Weak Entities

Strong Entity set: If each entity in the entity set is distinguishable or it has a key then such an entity
set is known as strong entity set.

Weak Entity set: If each entity in the entity set is not distinguishable or it doesn't has a key then
such an entity set is known as weak entity set.

eno is key so it is represented by solid underline. dname is partial key. It can't distinguish the
tuples in the Dependent entity set. so dname is represented by dashed underline.
Weak entity set is always in total participation with the relation. If entity set is weak then the
relationship is also known as weak relationship, since the dependent relation is no longer needed
when the owner left.
Ex: policy dependent details are not needed when the owner (employee) of that policy left or fired
from the company or expired. The detailed ER Diagram is as follows.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 8


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

The cardinality of the owner entity set is with weak relationship is 1 : m. Weak entity set is
uniquely identifiable by partial key and key of the owner entity set.
Dependent entity set is key to the relation because the all the tuples of weak entity set are
associated with the owner entity set tuples.

Dependents is an example of a weak entity set. A weak entity can be identified uniquely only by
considering some of its attributes in conjunction with the primary key of another entity, which is
called the identifying owner.
The following restrictions must hold:
 The owner entity set and the weak entity set must participate in a one-to-many relationship
set (one owner entity is associated with one or more weak entities, but each weak entity has
a single owner). This relationship set is called the identifying relationship set of the weak
entity set.
 The weak entity set must have total participation in the identifying relationship set

3.7 E-R Diagrams Implementation

Now we are in a position to write the ER diagram for the Company database which was introduced
in the beginning of this unit. The readers are strictly advised to follow the steps shown in this unit
to design an ER diagram for any chosen problem.

Step 1: Identify the Strong and Weak Entity Sets

After careful analysis of the problem we come to a conclusion that there are four possible entity sets
as shown below:
1. Employees Strong Entity Set
2. Departments Strong Entity Set
3. Projects Strong Entity Set
4. Dependents Weak Entity Set

Step 2: Identify the Relevant Attributes

The next step is to get all the attributes that are most applicable for each entity set. Do this work by
considering each entity set in mind and also the type of attributes. Next job is to pick the primary
key for strong entity sets and partial key for weak entity sets.

Example: Following are the attributes:


1. Employees SSN. Name, Addr, DateOfBirth, Sex, Salary
2. Departments DNo. DName, DLocation
3. Projects PNo. PName, PLocation
4. Dependents (weak) DepName, DateOf Birth, Sex, Relationship

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 9


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

The underlined attributes are the primary keys and DepName is the partial key of Dependents.
Also, DLocation may be treated as a multivalued attribute.

Step 3: Identify the Relationship Sets

In this step we need to find all the meaningful relationship sets among possible entity sets. This
step is very tricky, as redundant relationships may lead to complicated design and in turn a bad
implementation.

Example: Let us show below what the possible relationship sets are:
1. Employees and Departments WorksFor
2. Employees and Departments Manages
3. Departments and Projects Controls
4. Projects and Employees WorksOn
5. Dependents and Employees Has
6. Employees and Employees Supervises

Some problems may not have recursive relationship sets but some do have. In fact, our Company
database has one such relationship set called Supervises. You can complete this step adding
possible descriptive attributes of the relationship sets (Manages has StartDate and WorksOn has
Hours).

Step 4: Identify the Cardinality Ratio and Participation Constraints

This step is relatively a simple one. Simply apply the business rules and your common sense. So,
we write the structural constraints for our example as follows:

1. WorksFor N: 1 Total on either side


2. Manages 1: 1 Total on Employees and Partial on Departments side
3. Controls 1: N Total on either side
4. WorksOn M: N Total on either side
5. Has 1: M Total on Dependents and Partial on Employees

Step 5: Identify the IS-A and Has-A Relationship Sets

The last step is to look for “is-a” and “has-a” relationships sets for the given problem. As far as
the Company database is concerned, there are no generalization and aggregation relationships
in the Company database.

The complete single ER diagram by combining all the above five steps is shown in figure

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 10


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

3.8 Class Hierarchies


To classify the entities in an entity set into subclass entity is known as class hierarchies. Example,
we might want to classify Employees entity set into subclass entities Hourly-Emps entity set and
Contract-Emps entity set to distinguish the basis on which they are paid. Then the class hierarchy
is illustrated as follows.

This class hierarchy illustrates the inheritance concept. Where, the subclass attributes ISA (read as
: is a) super class attributes; indicating the “is a” relationship (inheritance concept).Therefore, the

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 11


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

attributes defined for a Hourly-Emps entity set are the attributes of Hourly-Emps plus attributes
of Employees (because subclass can have superclass properties). Likewise the attributes defined
for a Contract-Emps entity set are the attributes of Contract-Emps plus attributes of Employees.

Class Hierarchy based on Sub-super Set

1. Specialization: Specialization is the process of identifying subsets (subclasses) of an entity set


(superclass) that share some special distinguishable characteristic. Here, the superclass
(Employee) is defined first, then the subclasses (Hourly-Emps, Contract-Emps, etc.) are defined
next.
In short, Employees is specialized into subclasses.
2. Generalization: Generalization is the process of identifying (defining) some generalized
(common) characteristics of a collection of (two or more) entity sets and creating a new entity
set that contains (possesses) these common characteristics. Here, the subclasses (Hourly-Emps,
Contract-Emps, etc.) are defined first, then the Superclass (Employee) is defined, next.
In shortly, Hourly-Emps and Contract-Emps are generalized by Employees.

Class Hierarchy based on Constraints


1. Overlap constraints: Overlap constraints determine whether two subclasses are allowed
to contain the same entity.

Example: Can Akbar be both an Hourly-Emps entity and a Contract-Emps entity?


The answer is, No.
Other example, can Akbar be both a Contract-Emps entity and a Senior-Emps entity (among
them)?
The answer is, Yes. Thus, this is a specialisation hierarchy property. We denote this by
writing “Contract-Emps OVERLAPS Senior-Emps”.

2. Covering Constraints: Covering constraints determine whether the entities in the


subclasses collectively include all entities in the superclass.

Example: Should every Employee be a Hourly-Emps or .Contract-Emps?


The Answer is, No. He can be a Daily-Emps.
Other example, should every Motor-vehicle (superclass) be a Bike (subclass) or a Car
(subclass)?
The Answer is YES. Thus generalization hierarchies property is that every instance of a
superclass is an instance of a subclass.
We denote this by writing “ Bikes and Cars COVER Motor-vehicles”.

3.9 Aggregation
Aggregation allows us to indicate that a relationship set (identified through a dashed box)
participates in another relationship sets. That is, a relationship set in an association between entity
sets. Sometimes we have to model a relationship between a collection of entities and relationships.

Example: Suppose that we have an entity set called Project and that each Project entity is sponsored
by one or more departments. Thus, the sponsors relationship set captures this information but, a
department that sponsors a project, might assign employees to monitor the sponsorship.
Therefore, Monitors should be a relationship set that associates a sponsors relationship (rather

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 12


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

than a Project or Department entity) with an Employees entity. However, again we have to define
relationships to associate two or more entities.

Use of Aggregation
We use an aggregation, when we need to express a relationship among relationships. Thus, there
are really two distinct relationships, Sponsors and Monitors, each with its own attributes.

3.10 Conceptual Database Design With The ER Model (ER Design Issues)
The following are the ER design issues:

1. Use entry sets attributes

2. Use of Entity sets or relationship sets

3. Binary versus entry relationship sets

4. Aggregation versus ternary relationship.

1. Use of Entity Sets versus Attributes

Consider the relationship set (called Works In2) shown in Figure

Intuitively, it records the interval during which an employee works for a department. Now
suppose that it is possible for an employee to work in a given department over more than one
period.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 13


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

This possibility is ruled out by the ER diagram’s semantics. The problem is that we want to record
several values for the descriptive attributes for each instance of the Works_In2 relationship. (This
situation is analogous to wanting to record several addresses for each employee.) We can address
this problem by introducing an entity set called, say, Duration, with attributes from and to, as
shown in Figure

2. Entity versus Relationship


Consider the relationship set called Manages that each department manager is given a
discretionary budget (dbudget), as shown in below figure, in which we have also renamed the
relationship set to Manages2.

There is at most one employee managing a department, but a given employee could manage
several departments; we store the starting date and discretionary budget for each manager-
department pair. This approach is natural if we assume that a manager receives a separate
discretionary budget for each department that he or she manages.

But what if the discretionary budget is a sum that covers all departments managed by that
employee? In this case each Manages2 relationship that involves a given employee will have the
same value in the dbudget field. In general such redundancy could be significant and could cause
a variety of problems. Another problem with this design is that it is misleading.

We can address these problems by associating dbudget with the appointment of the employee as
manager of a group of departments. In this approach, we model the appointment as an entity set,
say Mgr_Appts, and use a ternary relationship, say Man ages3, to relate a manager, an
appointment, and a department. The details of an appointment (such as the discretionary budget)
are not repeated for each department that is included in the appointment now, although there is
still one Manages3 relationship instance per such department. Further, note that each department
has at most one manager, as before, because of the key constraint. This approach is illustrated in
below Figure.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 14


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

3. Binary versus Ternary Relationships


Consider the ER diagram shown in below figure. It models a situation in which an employee can
own several policies, each policy can be owned by several employees, and each dependent can be
covered by several policies.

Suppose that we have the following additional requirements:

 A policy cannot be owned jointly by two or more employees.


 Every policy must be owned by some employee.
 Dependents is a weak entity set, and each dependent entity is uniquely identified by
taking pname in conjunction with the policyid of a policy entity (which, intuitively,
covers the given dependent).

The first requirement suggests that we impose a key constraint on Policies with respect to Covers,
but this constraint has the unintended side effect that a policy can cover only one dependent. The
second requirement suggests that we impose a total participation constraint on Policies. This
solution is acceptable if each policy covers at least one dependent. The third requirement forces us
to introduce an identifying relationship that is binary (in our version of ER diagrams, although
there are versions in which this is not the case).

Even ignoring the third point above, the best way to model this situation is to use two binary
relationships, as shown in below figure.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 15


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

4. Aggregation versus Ternary Relationships


The choice between using aggregation or a ternary relationship is mainly determined by the
existence of relationship that relates a relationship set to an entity set (or second relationship
set). The choice may also be guided by certain integrity constraints to we want to express.

Consider the constraint that each sponsorship (of a project by a department) be monitored by at
most one employee. We cannot express this constraint in terms of the Sponsors2 relationship
set. Also we can express the constraint by drawing an arrow from the aggregated relationship.
Sponsors to the relationship Monitors. Thus, the presence of such a constraint serves as another
reason for using aggregation rather than a ternary relationship set.

Review Questions

1. Define the following terms and give examples


(i)cardinality (ii)unary relationships (iii)aggregation (iv)specialization
2. What is Entity set? and also define Relationship set. List and explain the symbols used to
draw ER Diagram.
3. Design a database for an airline. The database must keep track of customers and their

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 16


DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL

reservations, flights and their status, seat assignments on individual flights, and the schedule
and routing of future flights. Your design should include an E-R diagram, a set of relational
schemas, and a list of constraints, including primary-key and foreign-key constraints.
4. Discuss the representation of total participation and multivalued attribute in an E/R
diagram.
5. What is an Entity Relationship diagram and why it is useful?
6. What is a weak entity in ER diagram?
7. Give the diagrammatic representation of recursive relationship in an ER diagram and also
explain the importance of role names in representing a recursive relationship by taking a real
time example.
8. Consider a database used to record the marks that students get in different exams of different
course offerings.
a. Construct an E-R diagram that models exams as entities, and uses a ternary relationship,
for the above database.
b. Construct an alternative E-R diagram that uses only a binary relationship between
students and course-offerings. Make sure that only one relationship exists between a
particular student and course-offering pair, yet you can represent the marks that a student
gets in different exams of a course offering.
9. Explain about relationship sets in ER model with examples.
10. Explain about ER model design issues.

References:

 Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
 C.J. Date, Introduction to Database Systems, Pearson Education.
 Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 17


UNIT-3
PART- II SQL

 Creating Tables with relationship


 Implementation of key and integrity constraints
 Relational Set Operations
 Implementation of different types of Joins
 Views(Updatable and Non Updatable)
 Grouping, Aggregation, Ordering
 Nested queries, Sub queries

[Link] TO CREATING TABLES WITH RELATIONSHIP

Creating tables using CREATE command:


This command is used to create a database and its objects such as Tables, Views, Procedures,
Triggers etc. It defines each column of the table uniquely. Each column has minimum of three
attributes, a column name , data type and size.

Syntax: CREATE TABLE <table_name>


(
column _name 1 DATATYPE 1 (SIZE),
column _name 2 DATATYPE 2 (SIZE),
:
column _name n DATATYPE N (SIZE) );

CREATING TABLES WITH RELATIONSHIP

When we want to create tables with relationship , we need to use Referential integrity
constraints. The referential integrity constraint enforces relationship between tables.
-It designates a column or combination of columns as a Foreign key.
-The foreign key establish a relationship with a specified primary or unique key in another
table called the Referenced key.
- When referential integrity is enforced , it prevents from..
1) Adding records to a related table if there is no associated record in the primary table.
2) Changing values in a primary table that result in orphaned records in a related table.
3) Deleting records from a primary table if there are matching related records.
Note: The table containing the foreign key is called the child table and the table containing the
referenced key is called the Parent table.

SYNTAX: CREATE TABLE <tablename>(


col_name1 datatype[size] ,
col_name2 datatype[size] ,
:
col_name n datatype[size],
FOREIGN KEY(column_name) REFERENCES <parent_table_name>(column_name));

EX: SQL> CREATE TABLE marks(


1
sid VARCHAR2(4),
marks NUMBER(3),
PRIMARY KEY(sid),
FOREIGN KEY(sid) REFERENCES student1(sid));

[Link] OF KEY AND INTEGRITY CONSTRAINTS

Data constraints: All business of the world run on business data being gathered, stored and
analyzed. Business managers determine a set of business rules that must be applied to their data
prior to it being stored in the database/table of ensure its integrity.
For instance , no employee in the sales department can have a salary of less than Rs.1000/- .
Such rules have to be enforced on data stored. If not, inconsistent data is maintained in database.

Note: It is used to impose business rules on DBs.


It allows to enter only valid data.

Various types of Integrity Constraints:

Integrity constraints are the rules in real life, which are to be imposed on the data. If the data is
not satisfying the constraints then it is considered as inconsistent. These rules are to be enforced
on data because of the presence of these rules in real life. These rules are called integrity
constraints. Every DBMS software must enforce integrity constraints, otherwise inconsistent data
is generated.

You can use constraints to do the following:

 To prevent invalid data entry into tables.


 To Enforce rules on the data in a table whenever a row is inserted, updated, or deleted
from that table. The constraint must be satisfied for the operation to succeed.
 To Prevent the deletion of a record from a table if there are dependencies.

Example for Integrity Constraints :-

2
Constraints are categorized as follows.

[Link] integrity constraints - A domain means a set of values assigned to a column. i.e A
set of permitted values. Domain constraints are handled by
Defining proper data type
Specifying not null constraint
Specifying check constraint.
Specifying default constraint

Not null –indicates that a column cannot store NULL value.


Check – Ensures that the value in column meets a specific condition.
Default- prevents null value in column when value is not provided in column by user. So, it
assigns default value or globally assigned value.

[Link] integrity constraints – are TWO types


Unique constraint –It defines a Entity or column as a UNIQUE for particular table.
And ensures that each row of a column must have a UNIQUE value
or name.
Primary key constraint – This avoids duplicate and null values. It combination of a NOT
NULL and UNIQUE.
[Link] integrity constraints
Foreign key –indicates the relationship between child and parent tables.
This Constraint are always attached to a column not a table.
We can add constraints in two ways.

Column level :-

 Constraint is declared immediately declaring column.


 Define with each column
 Use column level to declare constraint for single column.
 Composite key cannot be defined at column level.

Table level :-
3
constraint is declared after declaring all columns.
use table level to declare constraint for combination of columns.(i.e composite key)
 not null cannot be defined.

Another type is possible at Alter level

 constraint is declared with ALTER command.


When we use this , make sure that the table should not contain data.

To add these constraints , we can use constraint with label or with out label.

TWO BASIC TYPES ------- 1. Constraint WITH NAME


[Link] WITHOUT NAME.
i) Declaring Constraint at “TABLE” level (Constraints with label)
Syntax :- CREATE TABLE <table name>
(
col_name1 DATATYPE(SIZE) ,
…..,
col_nameN DATATYPE(SIZE) ,
CONSTRAINT <cons_lable> NAME _OF_ THE_CONSTRAINT [column_list] );

ii) Declaring Constraint at “Column” level (Constraints with label)


Syntax :-
col_name DATATYPE(SIZE) CONSTRAINT <cons_lable> NAME _OF_ THE_CONSTRAINT

iii) Declaring Constraint at “TABLE” level (Constraints without label)


Syntax :-
CREATE TABLE <table name>
(
col_name1 DATATYPE(SIZE) ,
…..,
col_nameN DATATYPE(SIZE) ,
NAME _OF_ THE_CONSTRAINT [column_list] );

iv) Declaring Constraint at “Column” level (Constraints without label)


Syntax :-
col_name DATATYPE(SIZE) NAME _OF_ THE_CONSTRAINT
v) Adding Constraint to a table at “Alter” level (Constraints with label)

A constraint can be added to a table at any time after the table was created by using ALTER
TABLE statement , using ADD clause.

Syntax:
ALTER TABLE <table_name> ADD CONSTRAINT cont_label NAME _OF_
THE_CONSTRAINT (column);
4
vi) Declaring Constraint at “Alter” level (Constraints without label)

Syntax:
ALTER TABLE <table_name> ADD NAME _OF_ THE_CONSTRAINT (column);
Note:’ Constraint ‘ clause is not required when constraints declared without a label.

1.1 NOT NULL:


 It ensures that a table column cannot be left empty.
 Column declared with NOT NULL is a mandatory column i.e data must be entered.
 The NOT NULL constraint can only be applied at column level.
 It allows DUPLICATE data.
 Used to avoid null values into columns.

NOTE: It is applicable at COLUMN LEVEL only.


SYNTAX: column_name DATATAYPE[SIZE] NOT NULL

EX 1 : CREATE TABLE table_notnull(


sid NUMBER(4) NOT NULL, // COLUMN LEVEL
sname VARCHAR2(10));
SQL> SELECT *FROM table_notnull;
SID SNAME
---------- ----------
501 GITA
502 RAJU
503
503
504
Here, SID column not allowed any null values and it can allow duplicate values , but sname can
allows it.
Ex 2:

1.2 CHECK :
 Used to impose a conditional rule a table column.
 It defines a condition that each row must satisfy.
 Check constraint validates data based on a condition .

5
 Value entered in the column should not violate the condition.
 Check constraint allows null values.
 Check constraint can be declared at table level or column level.
 There is no limit to the number of CHECK constraints that can be defined on a
condition.
Limitations :-
 Conditions should not contain/not applicable to pseudo columns like ROWNUM,
SYSDATE etc.
 Condition should not access columns of another table

//CONSTRAINT @ COLUMN LEVEL


SYNTAX: column_name DATATYPE (SIZE) CHECK (condition) // without label

Here, we are creating a table with two columns such as Sid, sname.

Ex: CREATE TABLE check_column(


sid VARCHAR2(4) CHECK (sid LIKE 'C%' AND LENGTH(sid)=4), // without label
sname VARCHAR2(10));
Here, sid should start with ‘C ‘and a length of sid is exactly 4 characters.

SQL> SELECT *FROM check_column;


SID SNAME
---- ----------
C501 MANI
C502 DHANA
C503 RAVI
C504 RAJA
// with label
Syntax: Column_name DATATYPE(SIZE) CONSTRAINT constaint_label
CHECK(condtion)

CREATE TABLE check_column (


sid VARCHAR2(4) CONSTRAINT ck CHECK (sid LIKE 'C%' AND LENGTH(sid)=4),
sname VARCHAR2(10) );

//CHECK @TABLE LEVEL

CREATE TABLE check_table (


sid VARCHAR2(4) ,
sname VARCHAR2(10),
CHECK (sid LIKE'C%' AND LENGTH(sid)=4),
CHECK(sname LIKE '%A'));

Here, sid should start with ‘C ‘and a length of sid is exactly 4 characters. And sname should
ends with letter ‘ A’

6
SQL> SELECT *FROM check_table;

SID SNAME
---- ----------
C401 ABHILA
C401 ANITHA
C403 NANDHITHA
C522 LOHITHA

// with label

CONSTRAINT ck1 CHECK (sid LIKE'C%' AND LENGTH(sid)=4),


CONSTRAINT ck2 CHECK(sname LIKE '%A'));

@ ALTER LEVEL
Here, we add check constraint to new table with columns.

SQL> CREATE TABLE check_alter(sid VARCHAR2(4),


Sname VARCHAR2(10));

//CHECK @ ALTER LEVEL: / / CONSTRAINT WITHOUT NAME


SQL> ALTER TABLE <table name>ADD CHECK (condition );

SQL> ALTER TABLE check_alter ADD CHECK ( sid like ‘C%’ );

SYNTAX: ALTER TABLE <table_name> ADD CONSTRAINT cont_name


CHECK(cond);

SQL> ALTER TABLE check_alter ADD CONSTRAINT ck CHECK ( sid LIKE 'C%');

ANOTHER EXAMPLE FOR TABLE LEVEL CONSTRAINT

Here, We create table with THREE columns

ADD CHECK CONSTRAINT @ TABLE LEVEL (AT THE END OF TABLE


DEFINITION)
MARKS IN BETWEEN 0 AND 100.

SQL> CREATE TABLE marks2 ( sid VARCHAR2(4),


sec VARCHAR2(2),
marks NUMBER(3),
CHECK(marks>0 AND marks<=100) );

DROP @ CHECK CONSTRAINT

SYNTAX: ALTER TABLE <table_name> DROP CONSRAINT cont_name;

SQL> ALTER TABLE check_table DROP CONSTRAINT ck;


7
DEFAULT

-If values are not provided for table column , default will be considered.
-This prevents NULL values from entering the columns , if a row is inserted without a value for
a column.
-The default value can be a literal, an expression, or a SQL function.
-The default expression must match the data type of the column.
- The DEFAULT constraint is used to provide a default value for a column.

-The default value will be added to all new records IF no other value is specified.

Syntax: Column_name datatype (size) DEFAULT <value/expression/function>

Ex: MIDDLENAME VARCHAR(10) DEFAULT 'UNAVAILABLE'

CONTACTNO NUMBER(10) DEFAULT 9999999999

This defines what value the column should use when no value has been supplied explicitly when
inserting a record in the table.

CREATE TABLE tab_default( sid NUMBER(10),


contactno number(10) DEFAULT 9999999999);
Add data to table:

Insert into tab_default (sid,contactno) values(501,9493949312);


Insert into tab_student(sid) values(502);
Insert into tab_student(sid) values(503);
Insert into tab_student(sid,sname) values(504,9393949412);

Select * from tab_default;

SID CONTACTNO
---------- ------------------
501 9493949312
502 9999999999
503 9999999999
504 9393949412

2.1. UNIQUE
 Columns declared with UNIQUE constraint does not accept duplicate values.
 One table can have a number of unique keys.
 Unique key can be defined on more than one column i.e composite unique key
 A composite key UNIQUE key is always defined at the table level only.
 By default UNIQUE columns accept null values unless declared with NOT NULL
constraint
 Oracle automatically creates UNIQUE index on the column declared with UNIQUE
constraint
8
 UNIQUE constraint can be declared at column level and table level.

UNIQUE@ COLUMN LEVEL


SYNTAX: column_name DATA_TYPE(SIZE) UNIQUE

CREATE TABLE table_unique(


sid NUMBER(4) UNIQUE,
sname VARCHAR2(10));
//UNIQUE @ TABLE LEVEL
SYNTAX: UNIQUE(COLUMN_LIST);
CREATE TABLE table_unique2(
sid NUMBER(4),
sname VARCHAR2(10) ,
UNIQUE(sid,sname));

SQL> SELECT *FROM TABLE_UNIQUE2;


SID SNAME
---------- ----------
401 RAMU
402 SITA // Here , these two records are distinct not the same.
402 GITHA
403 GITHA
404 RAMU

Unique @ ALTER level:

Alter table table_unique ADD UNIQUE (sid) // with out label


Alter table table_unique ADD CONSTRAINT uq UNIQUE(sid) // with label

DROP UNIQUE @ TABLE LEVEL


SQL> ALTER TABLE table_unique2 DROP UNIQUE(sid,sname);

Now , we removed unique constraint , so now this table consists duplicate data.
//UNIQUE@ ALTER LEVEL (here, the table contains duplicates, so it is not works)
//delete data from table_unique2
SQL> DELETE FROM table_unique2;

PRIMARY KEY constraint :-


PRIMARY KEY is one of the candidates key , which uniquely identifies a record in a table.
-used to define key column of a table.
-it is provided with an automatic index.
-A primary key constraint combines a NOT NULL and UNIQUE behavior in one declaration.
Characterstics of PRIMARY KEY :-

 There should be at the most one Primary Key or Composite primary key per table.
9
 PK column do not accept null values.
 PK column do not accept duplicate values.
 RAW,LONG RAW,VARRAY,NESTED TABLE,BFILE columns cannot be declared with PK
 If PK is composite then uniqueness is determined by the combination of columns.
 A composite primary key cannot have more than 32 columns
 It is recommended that PK column should be short and numeric.
 Oracle automatically creates Unique Index on PK column
EX:

// PRMARY KEY @ COLUMN LEVEL


SYNTAX : column_name DATA_TYPE(SIZE) PRIMARY KEY

SQL> CREATE TABLE student1 (


sid VARCHAR2(4) PRIMARY KEY
CHECK (sid LIKE 'V%' AND LENGTH(sid)=4 ) ,
name VARCHAR2(10));

SQL> DESC student1;


Name Null? Type
----------------------------------------- -------- ----------------------------
SID NOT NULL VARCHAR2(4)
NAME VARCHAR2(10)

CASE 2: ADD PRIMARY KEY @ ALTER LEVEL

SQL> CREATE TABLE student2( sid VARCHAR2(4),


name VARCHAR2(10));

SYNTAX: ALTER TABLE <tablename> ADD PRIMARY KEY (col_name);

SQL> ALTER TABLE student2 ADD PRIMARY KEY(sid);

Table altered.

SQL> DESC student2;


Name Null? Type
----------------------------------------- -------- ----------------------------
SID NOT NULL VARCHAR2(4)
NAME VARCHAR2(10)
10
CASE 3 : ADD PRIMARY KEY @ TABLE LEVEL
here, we can create a simple and composite primary keys;

SYNTAX: CREATE TABLE < tablename>( col_name1 datatype[size],


col_name2 datatype[size],
:
col_namen datatype[size],
PRIMARY KEY (col_name);
//SIMPLE PRIMARY KEY @ TABLE LEVEL
SQL> CREATE TABLE student3(
sid VARCHAR2(4),
name VARCHAR2(10),
marks NUMBER(3),
PRIMARY KEY(sid) );
SQL> DESC student3;
Name Null? Type
----------------------------------------- -------- ----------------------------
SID NOT NULL VARCHAR2(4)
NAME VARCHAR2(10)
MARKS NUMBER(3)

// COMPOSITE PRIMARY KEY @ TABLE LEVEL

SYNTAX:
CREATE TABLE < tablename>( col_name1 datatype[size],
col_name2 datatype[size],
:
col_namen datatype[size],
PRIMARY KEY (col_name1,col_name2….colmn_name n);

SQL> CREATE TABLE student4(


sid VARCHAR2(4),
name VARCHAR2(10),
marks NUMBER(3),
PRIMARY KEY(sid,name) ); // WITH OUT LABEL

CONSTRAINT pk PRIMARY KEY(sid,name) // WITH LABEL

SQL> DESC STUDENT4;


Name Null? Type
----------------------------------------- -------- ----------------------------
SID NOT NULL VARCHAR2(4)
NAME NOT NULL VARCHAR2(10)
MARKS NUMBER(3)

FOREIGN KEY Constraint:-


11
 Foreign key is used to establish relationship between tables.

 Foreign key is a column in one table that refers primary key/unique


columns of another or same table.

 Values of foreign key should match with values of primary key/unique or


foreign key can be null.

 Foreign key column allows null values unless it is declared with NOT
NULL.

 Foregin key column allows duplicates unless it is declared with UNIQUE

 By default oracle establish 1:M relationship between two tables.

 To establish 1:1 relationship between two tables declare foreign key with
unique constraint

 Foreign key can be declared at column level or table level.

 Composite foreign key must refer composite primary key or Composite


unique key.

EX: TABLES: STYDENT 1 & MARKS 1

CHILD TABLE ‘ MARKS1’ :

ADDING PRIMARY AND CHECK CONSTRAINT @ CREATE LEVEL


SQL> CREATE TABLE marks1(
sid VARCHAR2(4) PRIMARY KEY CHECK( sid LIKE 'V%' AND LENGTH(sid)=4),
marks NUMBER(3) );

// ADDING PRIMARY AND FOREIGN KEY @ TABLE/ CREATE LEVEL

SYNTAX: CREATE TABLE <tablename>(


12
col_name1 datatype[size] ,
col_name2 datatype[size] ,
:
col_name n datatype[size],
FOREIGN KEY(column_name) REFERENCES <parent_table_name>(column_name));
EX: SQL> CREATE TABLE marks3(
sid VARCHAR2(4),
marks NUMBER(3),
PRIMARY KEY(sid),
FOREIGN KEY(sid) REFERENCES student1(sid));

SQL> DESC MARKS3;


Name Null? Type
----------------------------------------- -------- ----------------------------
SID NOT NULL VARCHAR2(4)
MARKS NUMBER(3)

Query : ADD CHECK CONSTRAINT @ ALTER LEVEL ( ON EXISTING TABLE)


MARKS IN BETWEEN 0 AND 100.

SQL> ALTER TABLE marks3 ADD CHECK ( marks>0 AND marks< =100 );

ADD CHECK CONSTRAINT @ TABLE LEVEL (AT THE END OF TABLE


DEFINITION)
MARKS IN BETWEEN 0 AND 100.

SQL> CREATE TABLE marks3 ( sid VARCHAR2(4),


sec VARCHAR2(2),
marks NUMBER(3),
CHECK(marks>0 AND marks<=100) );

//ADDING FOREIGN KEY @ ALTER LEVEL

SQL> ALTER TABLE marks1 ADD FOREIGN KEY (sid) REFERENCES


STUDENT1(sid);
SQL> desc marks1;
Name Null? Type
----------------------------------------- -------- ----------------------------
SID NOT NULL VARCHAR2(4)
MARKS NUMBER(3)

Note :-
 PRIMARY KEY cannot be dropped if it referenced by any FOREIGN KEY constraint.
If PRIMARY KEY is dropped with CASCADE option then along with PRIMARY KEY referencing
FOREING KEY is also dropped.
PRIMARY KEY column cannot be dropped if it is referenced by some FOREIGN KEY.
PRIMARY KEY table cannot be dropped if it is referenced by some FOREIGN KEY.

13
PRIMARY KEY table cannot be truncated if it is referenced by some FOREIGN KEY.
Note:: Once the primary key and foreign key relationship has been created then you can
not remove any parent record if the dependent childs exists.

USING ON DELETE CASCADE

By using this clause you can remove the parent record even if childs exists. Because when
ever you remove parent record oracle automatically removes all its dependent records from
child table, if this clause is present while creating foreign key constraint.
Ex: Consider twe tables dept(parent) and emp(child) tables.
TABLE LEVEL
SQL> create table emp(empno number(2), ename varchar(10), deptno number(2), primary
key(empno), foreign key(deptno) references dept(deptno) on delete cascade); // without label

SQL> create table emp(empno number(2), ename varchar(10), deptno number(2),


constraint pk primary key(empno), constraint fk foreign key(deptno) references
dept(deptno) on delete cascade); // with label
ALTER LEVEL
SQL> alter table emp add foreign key(deptno) references dept(deptno) on delete cascade;
SQL> alter table emp add constraint fk foreign key(deptno) references dept(deptno) on
delete cascade;
Enabling/Disabling a Constraint:
If the constraints are present, then for each DML operation constraints are checked by
executing certain codes internally. It may slow down the DML operation marginally. For massive
DML operations, such as transferring data from one table to another because of the presence
of constraint, the speed will be considered slower. To improve the speed in such cases, the
following methods are adopted:

Disable constraint
Performing the DML operation DML operation
Enable constraint

Disabling Constraint:-
Syntax :-
ALTER TABLE <tabname> DISABLE CONSTRAINT <constraint_name> ;
Example :-
SQL>ALTER TABLE student1 DISABLE CONSTRAINT ck ;
SQL>ALTER TABLE mark1 DISABLE PRIMARY KEY CASCADE;

14
NOTE:-
If constraint is disabled with CASCADE then PK is disabled with FK.

Enabling Constraint :-
Syntax :-
ALTER TABLE <tabname> ENABLE CONSTRAINT <name>
Example :-
SQL>ALTER TABLE student1 ENABLE CONSTRAINT ck;

III. SET OPERATIONS IN SQL


SQL supports few Set operations to be performed on table data. These are used to get meaningful
results from data, under different special conditions. The SET operators combine the results of
two or more component queries into one result. Queries containing SET operators are called
Compound Queries.

The number of columns and data types of the columns being selected must be identical in all the
SELECT statements used in the query. The names of the columns need not be identical.

All SET operators have equal precedence. If a SQL statement contains multiple SET operators,
the oracle server evaluates them from left (top) to right (bottom) if no parentheses explicitly
specify another order.

Introduction

SQL set operators allows combine results from two or more SELECT statements. At first sight
this looks similar to SQL joins although there is a big difference. SQL joins tends to combine
columns i.e. with each additionally joined table it is possible to select more and more columns.
SQL set operators on the other hand combine rows from different queries with strong
preconditions .

 Retrieve the same number of columns and


 The data types of corresponding columns in each involved SELECT must be compatible
(either the same or with possibility implicitly convert to the data types of the first
SELECT statement).

Set operator types

According to SQL Standard there are following Set operator types:


15
 UNION ---returns all rows selected by either query. To return all rows from multiple
tables and eliminates any duplicate rows.
 UNION ALL-- returns all rows from multiple tables including duplicates.
 INTERSECT – returns all rows common to multiple queries.
 MINUS—returns rows from the first query that are not present in second query.

Note: Whenever these operators used select statement must have

- Equal no. of columns.


- Similar data type columns.

Syntax :-

SELECT statement 1
UNION / UNION ALL / INTERSECT / MINUS
SELECT statement 2 ;
Rules :-

1 No of columns returned by first query must be equal to no of columns returned by


second query
2 Corresponding columns datatype type must be same.

1. UNION

 UNION operator combines data returned by two SELECT statement.


 Eliminates duplicates.
 Sorts result.
 This will combine the records of multiple tables having the same structure.

Example :-

1 SQL>SELECT job FROM emp WHERE deptno=10


UNION
SELECT job FROM emp WHERE deptno=20 ;

2 SQL>SELECT job,sal FROM emp WHERE deptno=10


16
UNION
SELECT job,sal FROM emp WHERE deptno=20 ORDER BY sal ;
NOTE:- ORDER BY clause must be used with last query.

3 SQL> select * from student1 union select * from student2;

2. UNION ALL

This will combine the records of multiple tables having the same structure
but including duplicates. IT is similar to UNION but it includes duplicates.

Example :-

SQL>SELECT job FROM emp WHERE deptno=10


UNION ALL
SELECT job FROM emp WHERE deptno=20 ;

SQL> select * from student1 union all select * from student2;

3. INTERSECT

This will give the common records of multiple tables having the same
structure.

INTERSECT operator returns common values from the result of two SELECT statements.

Example:-

Display common jobs belongs to 10th and 20th departments ?

EX 1: SQL>SELECT job FROM emp WHERE deptno=10


17
INTERSECT
SELECT job FROM emp WHERE deptno=20;

EX2: SQL> select * from student1 intersect select * from student2;

4. MINUS

This will give the records of a table whose records are not in other tables
having the same structure.

MINUS operator returns values present in the result of first SELECT statement and not present
in the result of second SELECT statement.

Example:-

Display jobs in 10th dept and not in 20th dept ?

EX1: SQL>SELECT job FROM emp WHERE deptno=10


MINUS
SELECT job FROM emp WHERE deptno=20;

Ex2: SQL> select * from student1 minus select * from student2;

UNION vs JOIN :-

UNION JOIN
Union combines data Join relates data
Union is performed on similar structures Join can be performed also be performed
on
dissimilar structures also

[Link] JOINS

A SQL JOIN is an Operation , used to retrieve data from multiple tables. It is performed
whenever two or more tables are joined in a SQL statement. so, SQL Join clause is used to
combine records from two or more tables in a database. A JOIN is a means for combining fields
from two tables by using values common to each. Several operators can be used to join tables,

18
such as =, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; these all to be used to join tables.
However, the most common operator is the equal symbol.
SQL Join Types:
There are different types of joins available in SQL:
 INNER JOIN: Returns rows when there is a match in both tables.
 OUTER JOIN : Returns all rows even there is a match or no match in tables.
- LEFT JOIN/LEFT OUTER JOIN: Returns all rows from the left table,
even if there are no matches in the right table.
-RIGHT JOIN/RIGHT OUTER JOIN : Returns all rows from the right table, even if
there are
no matches in the left table.
-FULL JOIN/FULL OUTER JOIN : Returns rows when there is a match in one of the
tables.
 SELF JOIN: It is used to join a table to itself as if the table were two tables, temporarily
renaming at least one table in the SQL statement.
 CARTESIAN JOIN or CROSS JOIN : It returns the Cartesian product of the sets of records
from the two or more joined tables.
Based on Operators, The Join can be classified as
- Inner join or Equi Join
- Non-Equi Join
 NATURAL JOIN: It is performed only when common column name is same. In this,no
need to specify join condition explicitly , ORACLE automatically performs join
operation on the column with same name.
1. SQL INNER JOIN (simple join)
It is the most common type of SQL join. SQL INNER JOINS return all rows from multiple
tables where the join condition is met.
Syntax
SELECT columns FROM table1 INNER JOIN table2 ON [Link] = [Link];
Visual Illustration
In this visual diagram, the SQL INNER JOIN returns the shaded area:

19
The SQL INNER JOIN would return the records where table1 and table2 intersect.
Let's look at some data to explain how the INNER JOINS work with example.
We have a table called SUPPLIERS with two fields (supplier_id and supplier_name).
It contains the following data:
supplier_id supplier_name
10000 ibm
10001 hewlett packard
10002 microsoft
10003 nvidia
We have another table called ORDERS with three fields (order_id, supplier_id, and
order_date).
It contains the following data:
order_id supplier_id order_date
500125 10000 2003/05/12
500126 10001 2003/05/13
500127 10004 2003/05/14
Example of INNER JOIN:
Q: List supplier id, name and order id of supplier.
SELECT s.supplier_id, s.supplier_name, od.order_date FROM suppliers s INNER JOIN
orders od ON s.supplier_id = od.supplier_id;
This SQL INNER JOIN example would return all rows from the suppliers and orders
tables where there is a matching supplier_id value in both the suppliers and orders tables.
Our result set would look like this:
supplier_id name order_date
10000 ibm 2003/05/12
10001 hewlett packard 2003/05/13

20
The rows for Microsoft and NVIDIA from the supplier table would be omitted, since the
supplier_id's 10002 and 10003 do not exist in both tables.
The row for 500127 (order_id) from the orders table would be omitted, since the
supplier_id 10004 does not exist in the suppliers table.
[Link] JOIN:
Inner / Equi join returns only matching records from both the tables but not unmatched record,
An Outer join retrieves all row even when one of the column met join condition.
Types of outer join:
1. LEFT JOIN/LEFT OUTER JOIN
[Link] JOIN/RIGHT OUTER JOIN
[Link] JOIN/FULL OUTER JOIN
[Link] OUTER JOIN
This type of join returns all rows from the LEFT-hand table specified in the ON
condition and only those rows from the other table where the joined fields are equal (join
condition is met).
Syntax
SELECT columns FROM table1 LEFT [OUTER] JOIN table2
ON [Link] = [Link];
Visual Illustration
In this visual diagram, the SQL LEFT OUTER JOIN returns the shaded area:

The SQL LEFT OUTER JOIN would return the all records from table1 and only those
records from table2 that intersect with table1.
Example
SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM
suppliers LEFT OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;
This LEFT OUTER JOIN example would return all rows from the suppliers table and only
those rows from the orders table where the joined fields are equal.
21
supplier_id supplier_name order_date
-------------- ---------------------- -----------------------
10000 ibm 2003/05/12
10001 hewlett packard 2003/05/13
10002 microsoft <null>
10003 nvidia <null>

The rows for Microsoft and NVIDIA would be included because a LEFT OUTER JOIN
was used. However, you will notice that the order_date field for those records contains a
<null> value.
2.2 SQL RIGHT OUTER JOIN
This type of join returns all rows from the RIGHT-hand table specified in the ON
condition and only those rows from the other table where the joined fields are equal (join
condition is met).
Syntax
SELECT columns FROM table1 RIGHT [OUTER] JOIN table2 ON [Link] =
[Link];
In some databases, the RIGHT OUTER JOIN keywords are replaced with RIGHT JOIN.
Visual Illustration
In this visual diagram, the SQL RIGHT OUTER JOIN returns the shaded area:

The SQL RIGHT OUTER JOIN would return the all records from table2 and only those
records from table1 that intersect with table2.
Example
SELECT orders.order_id, orders.order_date, suppliers.supplier_name FROM suppliers
RIGHT OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;

22
This RIGHT OUTER JOIN example would return all rows from the orders table and only
those rows from the suppliers table where the joined fields are equal.
If a supplier_id value in the orders table does not exist in the suppliers table, all fields in
the suppliers table will display as <null> in the result set.
order_id order_date supplier_name
------------ --------------- -----------------
500125 2013/05/12 ibm
500126 2013/05/13 hewlett packard
500127 2013/05/14 <null>

The row for 500127 (order_id) would be included because a RIGHT OUTER JOIN was
used. However, you will notice that the supplier_name field for that record contains a
<null> value.
2.3. SQL FULL OUTER JOIN
This type of join returns all rows from the LEFT-hand table and RIGHT-hand table with
nulls in place where the join condition is not met.
Syntax
SELECT columns FROM table1 FULL [OUTER] JOIN table2 ON [Link] =
[Link]; In some databases, the FULL OUTER JOIN keywords are replaced with
FULL JOIN.
Visual Illustration
In this visual diagram, the SQL FULL OUTER JOIN returns the shaded area:

The SQL FULL OUTER JOIN would return the all records from both table1 and table2.
Example
Here is an example of a SQL FULL OUTER JOIN:
Query : Find supplier id, supplier name and order date of suppliers who have ordered.

23
SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM
suppliers FULL OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;

This FULL OUTER JOIN example would return all rows from the suppliers table and all
rows from the orders table and whenever the join condition is not met, <nulls> would be
extended to those fields in the result set.
If a supplier_id value in the suppliers table does not exist in the orders table, all fields in
the orders table will display as <null> in the result set. If a supplier_id value in the orders
table does not exist in the suppliers table, all fields in the suppliers table will display as
<null> in the result set.
supplier_id supplier_name order_date
----------------- -------------------- ---------------------
10000 ibm 2013/05/12
10001 hewlett packard 2013/05/13
10002 microsoft <null>
10003 nvidia <null>
<null> <null> 2013/05/14

The rows for Microsoft and NVIDIA would be included because a FULL OUTER JOIN
was used. However, you will notice that the order_date field for those records contains a
<null> value.
The row for supplier_id 10004 would be also included because a FULL OUTER JOIN was
used. However, you will notice that the supplier_id and supplier_name field for those
records contain a <null> value.
Equi join :
When the Join Condition is based on EQUALITY (=) operator, the join is said to be an Equi
join. It is also called as Inner Join.
Syntax
Select col1,col2,…From <table 1>,<table 2>Where <join condition with ‘=’ > .
[Link] : Find supplier id, supplier name and order date of suppliers who have ordered .
select [Link], [Link] ,o.order_date from suppliers s, orders o where
[Link] =[Link].

24
supplier_id name order_date
10000 ibm 2003/05/12
10001 hewlett packard 2003/05/13
Non Equi Join :-
When the join condition based on other than equality operator , the join is said to be a Non-Equi
join.
Syntax:-
Select col1,col2,…….
From <table 1>,<table 2>
Where <join condition > [AND <join cond> AND <cond> ----]
In NON- EQUI JOIN, JOIN COND is not based on = operator. It is based on other than =
operator, usually BETWEEN or > or < operators.
Query : Find supplier id,supplier name and order date in between 50025 and 500127.
sql> select s.supplier_id,s.supplier_name,o.order_date from suppliers s , orders o where
o.order_id between 500125 and 500127;

SUPPLIER_ID SUPPLIER_N ORDER_DAT


----------- ---------- ---------
10000 ibm 12-may-03
10000 ibm 13-may-03
10000 ibm 14-may-03
10001 hewlett 12-may-03
10001 hewlett 13-may-03
10001 hewlett 14-may-03
10002 microsoft 12-may-03
10002 microsoft 13-may-03
10002 microsoft 14-may-03
10003 nvidia 12-may-03
10003 nvidia 13-may-03
10003 nvidia 14-may-03
Query : Find supplier id,supplier name and order date above 500126.
sql> select s.supplier_id,s.supplier_name,o.order_date from suppliers s , orders o where
o.order_id >500126;

SUPPLIER_ID SUPPLIER_NO ORDER_DAT


----------- ---------- ---------
10000 ibm 14-may-03
10001 hewlett 14-may-03
10002 microsoft 14-may-03
10003 nvidia 14-may-03
25
Self Join :-
Joining a table to itself is called Self Join.
 Self Join is performed when tables having self refrential integrity.
 To perform Self Join same table must be listed twice with different alias.
 Self Join is Equi Join within the table.
It is used to join a table to itself as if the table were two tables, temporarily renaming at least
one table in the SQL statement.

Syntax :
(Here T1 and T2 refers same table)
SELECT <collist> From Table1 T1, Table1 T2
Where T1.Column1=T2.Column2;
Example:
select s1.supplier_id ,s1.supplier_name ,s2.supplier_id from suppliers s1, suppliers s2 where
s1.supplier_id=s2.supplier_id ;
supplier_id supplier_name supplier_id
----------------- ------------------ ---------------
10000 ibm 10000
10001 hewlett packard 10001
10002 microsoft 10002
10003 nvidia 10003

CROSS JOIN:
It returns the Cartesian product of the sets of records from the two or more joined tables. In
Cartesian product, each element of one set is combined with every element of another set to form
the resultant elements of Cartesian product.
Sytax: SELECT * FROM <tablename1> CROSS JOIN <tablename2>

26
 CROSS JOIN returns cross product of two tables.
 Each record of one table is joined to each and every record of another table.
 If table1 contains 10 records and table2 contains 5 records then CROSS JOIN between
table1 and table2 returns 50 records.
 ORACLE performs CROSS JOIN when we submit query without JOIN COND.
Example: sql> SELECT * FROM suppliers CROSS JOIN orders;
supplier_id supplier_n order_id supplier_id order_dat
----------- ---------- ---------- ----------- ---------
10000 ibm 500125 10000 12-may-03
10000 ibm 500126 10001 13-may-03
10000 ibm 500127 10003 14-may-03
10001 hewlett 500125 10000 12-may-03
10001 hewlett 500126 10001 13-may-03
10001 hewlett 500127 10003 14-may-03
10002 microsoft 500125 10000 12-may-03
10002 microsoft 500126 10001 13-may-03
10002 microsoft 500127 10003 14-may-03
10003 nvidia 500125 10000 12-may-03
10003 nvidia 500126 10001 13-may-03
NATURAL JOIN:
 NATURAL JOIN is possible in ANSI SQL/92 standard.
 NATURAL JOIN is similar to EQUI JOIN.
 NATURAL JOIN is performed only when common column name is same.
 In NATURAL JOIN no need to specify join condition explicitly , ORACLE
automatically performs join operation on the column with same name.

27
Syntax: SELECT <column list> FROM table1 NATURAL JOIN table2;
Example: ( Sailors table)
SELECT sid,sname,sid FROM sailors NATURAL JOIN reserves ; //both tables have
same column name.
SID SNAME SID
---------- ---------- ----------
22 DUSTIN 22
22 DUSTIN 22
22 DUSTIN 22
22 DUSTIN 22
31 LUBBER 31
31 LUBBER 31
31 LUBBER 31
64 HORTIO 64
64 HORTIO 64
74 HORTIO 74
VI. VIEWS
A view in SQL is a logical subset of data from one or more tables. View is used to restrict data
[Link] abstraction is usually required after a table is created and populated with data. Data
held by some tables might require restricted access to prevent all users from accessing all columns
of a table, for data security reasons. Such a security issue can be solved by creating several tables
with appropriate columns and assigning specific users to each such table, as required. This answers
data security requirements very well but gives rise to a great deal of redundant data being resident
in tables, in the [Link] reduce redundant data to the minimum possible, Oracle provides
Virtual tables which are Views.
View Definition :-
A View is a virtual table based on the result returned by a SELECT query.
The most basic purpose of a view is restricting access to specific column/rows from a table thus
allowing different users to see only certain rows or columns of a table.
Composition Of View:-
A view is composed of rows and columns, very similar to table. The fields in a view are fields
from one or more database tables in the database.
SQL functions, WHERE clauses and JOIN statements can be applied to a view in the same
manner as they are applied to a table.
View storage:-
Oracle does not store the view data. It recreates the data, using the view’s SELECT statement,
every time a user queries a view.

28
A view is stored only as a definition in Oracle’s system catalog.
When a reference is made to a view, its definition is scanned, the base table is opened and the
view is created on top of the base [Link], therefore, means that a view never holds data, until
a specific call to the view is made. This reduces redundant data on the HDD to a very large
extent.
Advantages Of View:-
Security:- Each user can be given permission to access only a set of views that contain specific
data.
Query simplicity:- A view can drawn from several different tables and present it as a single table
turning multiple table queries into single table queries against the view.
Data Integrity:- If data is accessed and entered through a view, the DBMS can automatically
check the data to ensure that it meets specified integrity constraints.
Disadvantage of View:-
Performance:- Views only create the appearance of the table but the RDBMS must still translate
queries against the views into the queries against the underlined source tables. If the view is
defined on a complex multiple table query then even a simple query against the view becomes a
complicated join and takes a long time to execute.
Types of Views :-
 Simple Views
 Complex Views

Simple Views :-
a View based on single table is called simple view.
Syntax:-
CREATE VIEW <View Name>
AS
SELECT<ColumnName1>,<ColumnName2>..
FROM <TableName>
[WHERE <COND>]
[WITH CHECK OPTION]
[WITH READ ONLY]

Example :-

SQL>CREATE VIEW emp_v


AS
SELECT empno,ename,sal FROM emp ;

Views can also be used for manipulating the data that is available in the base tables[i.e. the
user can perform the Insert, Update and Delete operations through view.

Views on which data manipulation can be done are called Updateable Views.
29
If an Insert, Update or Delete SQL statement is fired on a view, modifications to data in the
view are passed to the underlying base table.
For a view to be updatable,it should meet the following criteria:
Views defined from Single table.
If the user wants to INSERT records with the help of a view, then the PRIMARY KEY
column(s) and all the NOT NULL columns must be included in the view.
Inserting record through view :-

SQL>INSERT INTO emp_v VALUES(1,’A’,5000,200) ;

Updating record throught view :-

Updating a View:
A view can be updated under certain conditions:
 The SELECT clause may not contain the keyword DISTINCT.
 The SELECT clause may not contain summary functions.
 The SELECT clause may not contain set functions.
 The SELECT clause may not contain set operators.
 The SELECT clause may not contain an ORDER BY clause.
 The FROM clause may not contain multiple tables.
 The WHERE clause may not contain subqueries.
 The query may not contain GROUP BY or HAVING.
 Calculated columns may not be updated.
 All NOT NULL columns from the base table must be included in the view in order for
the INSERT query to function.
So if a view satisfies all the above-mentioned rules then you can update a view.

EX: SQL>UPDATE emp_v SET sal=2000 WHERE empno=1;

Deleting record throught view :-

SQL>DELETE FROM emp_v WHERE empno=1;

With Check Option :-

If VIEW created with WITH CHECK OPTION then any DML operation through that view
violates where condition then that DML operation returns error.

Example :-
SQL>CREATE VIEW V2
AS
SELECT empno,ename,sal,deptno FROM emp
WHERE deptno=10
WITH CHECK OPTION ;

30
Then insert the record into emp table through view V2
SQL>INSERT INTO V2 VALUES(2323,’RAJU’,4000,20) ;
The above INSERT returns error because DML operation violating WHERE clause.
Complex Views :-
A view is said to complex view
If it based on more than one table
Query contains
AGGREGATE functions
DISTINCT clause
GROUP BY clause
HAVING clause
Sub-queries
Constants
Strings or Values Expressions
UNION,INTERSECT,MINUS operators.
Example 1 :-
SQL>CREATE VIEW V3
AS
SELECT [Link],[Link],[Link],[Link],[Link]
FROM emp E JOIN dept D
USING(deptno) ;

NON- UPDATABLE VIEWS:

we cannot perform insert or update or delete operations on base table through complex views.
Complex views are not updatable views.
Example 2 :-
SQL>CREATE VIEW V2
AS
SELECT deptno,SUM(sal) AS sumsal
FROM EMP
GROUP BY deptno;

Destroying a View:-
The DROP VIEW command is used to destroy a view from the database.
Syntax:-
DROP VIEW<viewName>
Example :-
SQL>DROP VIEW emp_v;
DIFFERENCES BETWEEN SIMPLE AND COMPLEX VIEWS:
31
SIMPLE COMPLEX

Created from one table Created from one or more tables

Does not contain functions Conations functions

Does not contain groups of data Contain groups of data

MATERIALIZED VIEW: @ DATAWAREHOUSE SYSTEMS

A materialized view in Oracle is a database object that contains the results of a query. They are
local copies of data located remotely, or are used to create summary tables based on aggregations
of a table's data. Materialized views, which store data based on remote tables are also, know as
snapshots.

A materialized view can query tables, views, and other materialized views. Collectively these are
called master tables (a replication term) or detail tables (a data warehouse term).

For replication purposes, materialized views allow you to maintain copies of remote data on your
local node. These copies are read-only. If you want to update the local copies, you have to use
the Advanced Replication feature. You can select data from a materialized view as you would
from a table or view.

For data warehousing purposes, the materialized views commonly created are aggregate views,
single-table aggregate views, and join views.

In replication environments, the materialized views commonly created are primary key, rowid,
and subquery materialized views.

SYNTAX:
CREATE MATERIALIZED VIEW view-name
BUILD [IMMEDIATE | DEFERRED]
REFRESH [FAST | COMPLETE | FORCE ]
ON [COMMIT | DEMAND ]
[[ENABLE | DISABLE] QUERY REWRITE]
[ON PREBUILT TABLE]
AS
SELECT COLUMN_LIST FROM TABLE_NAME;

The BUILD clause options are shown below.


 IMMEDIATE : The materialized view is populated immediately.
 DEFERRED : The materialized view is populated on the first requested refresh.

32
The following refresh types are available.
 FAST : A fast refresh is attempted. If materialized view logs are not present against the
source tables in advance, the creation fails.
 COMPLETE : The table segment supporting the materialized view is truncated and
repopulated completely using the associated query.
 FORCE : A fast refresh is attempted. If one is not possible a complete refresh is
performed.
A refresh can be triggered in one of two ways.
 ON COMMIT : The refresh is triggered by a committed data change in one of the
dependent tables.
 ON DEMAND : The refresh is initiated by a manual request or a scheduled task.
The QUERY REWRITE clause tells the optimizer if the materialized view should be consider
for query rewrite operations. An example of the query rewrite functionality is shown below.
The ON PREBUILT TABLE clause tells the database to use an existing table segment, which
must have the same name as the materialized view and support the same column structure as the
query.

Example:
The following statement creates the rowid materialized view on table emp located on a remote
database:
SQL> CREATE MATERIALIZED VIEW mv_emp_rowid
REFRESH WITH ROWID
AS SELECT * FROM emp@remote_db;

Materialized view log created.

[Link]

USING “ ORDER BY” clause:

This will be used to ordering the columns data (ascending or descending).


Syntax1: (simple form)
select * from <table_name> order by <col> desc;

Note: By default oracle will use ascending order.

If you want output in descending order you have to use desc keyword after the column.
Ex:
SQL> select * from student order by no; SQL> select * from student order by no desc;

33
The order of rows returned in a query result is undefined. The ORDER BY clause can be used to
sort the rows. If you use the ORDER BY clause, it must be the last clause of the SQL statement.
You can specify an expression, or an alias, or column position in ORDER BY clause.

Syntax2 : ( complex form)

SELECT expr FROM table


[WHERE condition(s)]
[ORDER BY {column, expr} [ASC|DESC]];

In the syntax,

ORDER BY :specifies the order in which the retrieved rows are displayed.
orders the rows in ascending order ( default order)
orders the rows in descending order

t Ordering of Data :-

 Numeric values are displayed with the lowest values firs for example 1–999.
 Date values are displayed with the earliest value first for example 01-JAN-92 before 01-
JAN-95.
 Character values are displayed in alphabetical order—for example, A first and Z last.
 Null values are displayed last for ascending sequences and first for descending
sequences.
Examples :-
Arrange employee records in ascending order of their sal ?

34
SQL>SELECT * FROM emp ORDER BY sal ;

Arrange employee records in descending order of their sal ?

SQL>SELECT * FROM emp ORDER BY sal DESC ;

Display employee records working for 10th dept and arrange the result in ascending order of
their sal ?

SQL>SELECT * FROM emp WHERE deptno=10 ORDER BY sal ;

Arrange employee records in ascending of their deptno and with in dept arrange records in
descending order of their sal ?

SQL>SELECT * FROM emp ORDER BY deptno,sal DESC ;

In ORDER BY clause we can use column name or column position , for example

SQL>SELECT * FROM emp ORDER BY 5 DESC ;

In the above example records are sorted based on the fifth column in emp table.

Arrange employee records in descending order of their comm. If comm. Is null then arrange
those records last ?

SQL>SELECT * FROM emp ORDER BY comm DESC NULLS LAST ;

[Link] BY AND HAVING CLAUSE

GROUP BY clause

Using group by, we can create groups of related information. Columns used in select must be
used with group by, otherwise it was not a group by expression.

SELECT [DISTINCT] select-list

FROM from-list

WHERE qualification

GROUP BY grouping-list

HAVING group-qualification

 The select list in the SELECT clause contain


1. A list of column names
2. A list of terms having the form aggop( aggregate operators)
Every column that appear in (1) must also appear in grouping-list

 The expression appearing in the group-qualification in the HAVING clause must have a
single value per group.
35
Ex: SQL> select deptno, sum(sal) from emp group by deptno;

SQL> select deptno, sum(sal) from emp group by deptno;

DEPTNO SUM(SAL)

---------- ----------

10 8750

20 10875

30 9400

SQL> select deptno,job,sum(sal) from emp group by deptno,job;

Sql> Find the age of the youngest sailor for each rating level.

SQL> Select [Link], MIN([Link]) from sailors s GROUP BY [Link];

Find the age of the youngest sailor who is eligible to vote for each rating level with at least two
such sailors ?

SQL> select [Link], MIN([Link]) as minage from sailors s where [Link]>=18

GROUP BY [Link]

HAVING COUNT(*) > 1;

For each red boat find the number of reservations for this boat?

SQL> Select [Link], COUNT(*) AS reservationcount from boats b,

reserves r where [Link]=[Link] and [Link]=’red’

GROUP BY [Link];

Find the average age of sailors for each rating level that has at least two sailors ?

SQL> Select [Link], AVG([Link]) AS avgage from sailors s

GROUP BY [Link]

HAVING COUNT(*) > 1;

IX. AGGREGATION

It is a group operation, which will be works on all records of a table. To do this, Group
functions required to process group of rows and Returns one value from that group.

36
These functions are also called AGGREGATE functions or GROUP functions

 Aggregate functions - max(),min(),sum(),avg(),count(),count(*).


Group functions will be applied on all the rows but produces single output.
a) SUM
This will give the sum of the values of the specified column.
Syntax: sum (column)
Ex: SQL> select sum(sal) from emp;
b) AVG
This will give the average of the values of the specified column.
Syntax: avg (column)
Ex: SQL> select avg(sal) from emp;
c) MAX
This will give the maximum of the values of the specified column.
Syntax: max (column)
Ex: SQL> select max(sal) from emp;
d) MIN
This will give the minimum of the values of the specified column.
Syntax: min (column)
Ex: SQL> select min(sal) from emp;
e) COUNT
This will give the count of the values of the specified column.
Syntax: count (column)
Ex: SQL> select count(sal),count(*) from emp;

X. SUB QUERIES
What is subquery in SQL?
A subquery is a SQL query nested inside a larger query.
 A subquery may occur in :
- A SELECT clause
- A FROM clause
37
- A WHERE clause
 The subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE
statement or inside another subquery.
 A subquery is usually added within the WHERE Clause of another SQL SELECT
statement.
 You can use the comparison operators, such as >, <, or =. The comparison operator can
also be a multiple-row operator, such as IN, ANY, or ALL.
 A subquery is also called an inner query or inner select, while the statement containing a
subquery is also called an outer query or outer select.
 The inner query executes first before its parent query so that the results of inner query can
be passed to the outer query.
You can use a subquery in a SELECT, INSERT, DELETE, or UPDATE statement to perform
the following tasks :
 Compare an expression to the result of the query.
 Determine if an expression is included in the results of the query.
 Check whether the query selects any rows.
Syntax :

 The subquery (inner query) executes once before the main query (outer query) executes.
 The main query (outer query) use the subquery result.

SQL Subqueries Example :


In this section, you will learn the requirements of using subqueries. We have the following two
tables 'student' and 'marks' with common field 'SID'.

SQL> select *from student1;


SID NAME
---- ----------

38
v001 abhi
v002 abhay
v003 arjun
v004 anand
SQL> select *from marks;
SID TOTALMARKS
---------- ----------
v001 95
v002 80
v003 74
v004 81
Now we want to write a query to identify all students who get better marks than that of the
student who's StudentID is 'V002', but we do not know the marks of 'V002'.
- To solve this problem, we require two queries.
One query returns the marks (stored in Totalmarks field) of 'V002' and a second query identifies
the students who get better marks than the result of the first query.
SQL> select *from marks where sid='v002';
Query Result:
SID TOTALMARKS
---------- ----------
v002 80
The result of query is 80.
- Using the result of this query, here we have written another query to identify the students who
get better marks than 80. Here is the query :
Second query :
SQL> select [Link],[Link],[Link] from student1 s, marks m where [Link]=[Link] and
[Link]>80;
SID NAME TOTALMARKS
---- ---------- ----------
v001 abhi 95
v004 anand 81
Above two queries identified students who get better number than the student who's StudentID is
'V002' (Abhi).

39
You can combine the above two queries by placing one query inside the other. The subquery
(also called the 'inner query') is the query inside the parentheses. See the following code and
query result :

SQL> select [Link],[Link],[Link] from student1 s,marks m where [Link]=[Link] and


[Link] >(select totalmarks from marks where sid='v002');
SID NAME TOTALMARKS
---- ---------- ----------
v001 abhi 95
v004 anand 81

40
Subqueries: Guidelines
There are some guidelines to consider when using subqueries :
-A subquery must be enclosed in parentheses.
-A subquery must be placed on the right side of the comparison operator.
-Subqueries cannot manipulate their results internally, therefore ORDER BY clause cannot
be added in to a [Link] can use a ORDER BY clause in the main SELECT statement
(outer query) which will be last clause.
-Use single-row operators with single-row subqueries.
-If a subquery (inner query) returns a null value to the outer query, the outer query will not
return any rows when using certain comparison operators in a WHERE clause.
Type of Subqueries
 Single row subquery : Returns zero or one row.
 Multiple row subquery : Returns one or more rows.
 Multiple column subquery : Returns one or more columns.
 Correlated subqueries : Reference one or more columns in the outer SQL statement.
The subquery is known as a correlated subquery because the subquery is related to the outer
SQL statement.
 Nested subqueries : Subqueries are placed within another subqueries.
1)SINGLE ROW SUBQUERIES:- Returns zero or one row.
If inner query returns only one row then it is called single row subquery.

Syntax :-

SELECT <collist> FROM <tabname>


WHERE colname OPERATOR (SELECT statement)
Operator can be < > <= >= = <>
Examples:- (on Emp Table)

Q: Display employee records whose job equals to job of SMITH?

SQL>SELECT * FROM emp


WHERE job = (SELECT job FROM emp WHERE ename=’SMITH’) ;
Q: Display employee name earning maximum salary ?
SQL>SELECT ename FROM emp
WHERE sal = (SELECT MAX(sal) FROM emp) ;
41
Example2: (on SAILORS _BOAT_RESERVATION DATABASE )

SQL> SELECT * FROM SAILORS;


SID SNAME RATING AGE
---------- ---------- ---------- ----------
22 DUSTIN 7 45
29 BRUTUS 1 33
31 LUBBER 8 55.5
32 CANDY 8 25.5
58 RUSTY 10 35
64 HORATIO 7 35
71 ZOBRA 10 16
74 HORATIO 9 35
85 ART 3 25.5
95 BOB 3 63.5
SQL> SELECT * FROM BOATS;
BID BNAME COLOR
---------- ---------- ----------
101 INTERLAKE BLUE
102 INTERLAKE RED
103 CLIPPER GREEN
104 MARINE RED
SQL> SELECT * FROM RESERVES;
SID BID DAY
---------- ---------- ---------
22 101 10-OCT-17
22 102 10-OCT-17
22 103 10-OCT-17
22 104 10-JUL-17
31 102 11-OCT-17
31 103 11-JUN-17
31 104 11-DEC-17
64 101 09-MAY-17
64 102 09-AUG-17
74 103 09-AUG-17
Q: Find the sailor’s ID whose name is equal to ‘DUSTIN’
SQL> SELECT SID FROM SAILORS WHERE SID = (SELECT SID FROM
SAILORS WHERE SNAME='DUSTIN');
SID
----------
22
Q:Find sailors records whose name equals to ‘ DUSTIN’?

SQL> SELECT *FROM SAILORS WHERE SID = (SELECT SID FROM


SAILORS WHERE SNAME='DUSTIN');
42
SID SNAME RATING AGE
---------- ---------- ---------- ----------
22 DUSTIN 7 45
Q:Find the rating of a sailor whose name is ‘DUSTIN’.

SQL> SELECT RATING FROM SAILORS WHERE SID = (SELECT SID FROM SAILORS WHERE
SNAME='DUSTIN');

RATING
----------
7
Q: Find the sailors records whose sid is geater than ‘dustin’?
SQL> SELECT *FROM SAILORS WHERE SID > (SELECT SID FROM
SAILORS WHERE SNAME='DUSTIN');

SID SNAME RATING AGE


---------- ---------- ---------- ----------
29 BRUTUS 1 33
31 LUBBER 8 55.5
32 CANDY 8 25.5
58 RUSTY 10 35
64 HORATIO 7 35
71 ZOBRA 10 16
74 HORATIO 9 35
85 ART 3 25.5
95 BOB 3 63.5
Q:Find the sailors records ,whose sailors’ having maximum rating .
SQL> SELECT *FROM SAILORS WHERE RATING = (SELECT
MAX(RATING) FROM SAILORS);
SID SNAME RATING AGE
---------- ---------- ---------- ----------
58 RUSTY 10 35
71 ZOBRA 10 16

Q:Find the records of sailors whose rating is same as ‘DUSTIN’


SQL> SELECT *FROM SAILORS WHERE RATING = (SELECT RATING
FROM SAILORS WHERE SNAME='DUSTIN');

SID SNAME RATING AGE


---------- ---------- ---------- ----------
22 DUSTIN 7 45
43
64 HORATIO 7 35
Q:Find the records of sailors whose rating is higher than ‘DUSTIN’

SQL> SELECT *FROM SAILORS WHERE RATING > (SELECT


MAX(RATING) FROM SAILORS WHERE SNAME='DUSTIN');

SID SNAME RATING AGE


---------- ---------- ---------- ----------
31 LUBBER 8 55.5
32 CANDY 8 25.5
58 RUSTY 10 35
71 ZOBRA 10 16
74 HORATIO 9 35
MULTI ROW SUBQUERIES :

if inner query returns more than one row then it is called multi row subquery.

Syntax :-

SQL>SELECT <collist> FROM <tabname>


WHERE colname OPERATOR (SELECT statement) ;
Here, OPERATOR must be IN , NOT IN, ANY, ALL
IN operator :-

To test for values in a specified list of values, use IN operator. The IN operator can be used with
any data type. If characters or dates are used in the list, they must be enclosed in single
quotation marks (’’).

Syntax:-

IN (V1,V2,V3------------);

Note :-

IN ( ... ) is actually translated by Oracle server to a set of ‘OR’ conditions: a =value1 OR a =


value2 OR a = value3. So using IN ( ... ) has no performance benefits, and it is used for logical
simplicity.

Example :-

Q:Display employee records working as CLERK OR MANAGER ?

SQL>SELECT * FROM emp WHERE job IN (‘CLERK’,’MANAGER’) ;

44
Q:Find the name of sailors who have reserved boat 103

SQL> SELECT [Link] FROM SAILORS S WHERE [Link] IN (SELECT [Link]


FROM RESERVES R WHERE [Link]=103);
SNAME
----------
DUSTIN
LUBBER
HORATIO
Q:Find the name of sailors who have reserved a red boat

SQL> SELECT [Link] FROM SAILORS S WHERE [Link] IN (SELECT [Link]


FROM RESERVES R WHERE [Link] IN (SELECT [Link] FROM BOATS B WHERE
[Link]='RED'));
SNAME
----------
DUSTIN
LUBBER
HORATIO
Q:Find the names of sailors who have not reserved a red boat.

SELECT [Link] FROM SAILORS S WHERE [Link] NOT IN (SELECT [Link] FROM
RESERVES R WHERE [Link] IN (SELECT [Link] FROM BOATS B WHERE [Link]
= 'RED'));
SNAME
----------
BRUTUS
CANDY
RUSTY
ZOBRA
HORATIO
ART
BOB
Using EXISTS operator :-
EXISTS operator returns TRUE or FALSE.

If inner query returns at least one record then EXISTS returns TRUE otherwise returns FALSE.

ORACLE recommends EXISTS and NOT EXISTS operators instead of IN and NOT IN.

Q: Find the name of sailors who have reserved boat 103

SQL> SELECT [Link] FROM SAILORS S WHERE EXISTS (SELECT * FROM


RESERVES R WHERE [Link]=103 AND [Link] = [Link]) ;
45
SNAME
----------
DUSTIN
LUBBER
HORATIO
Q:Find the name of sailors who have not reserved boat 103
SQL> SELECT [Link] FROM SAILORS S WHERE NOT EXISTS (SELECT *
FROM RESERVES R WHERE [Link]=103 AND [Link] = [Link]) ;
SNAME
----------
BRUTUS
CANDY
RUSTY
HORATIO
ZOBRA
ART
BOB
ANY operator:-

Compares a value to each value in a list or returned by a query. Must be preceded by =, !=, >, <,
<=, >=. Evaluates to FALSE if the query returns no rows.

Select employees whose salary is greater than any salesman’s salary ?

SQL>SELECT ename FROM emp


WHERE SAL > ANY ( SELECT sal FROM emp WHERE job = 'SALESMAN');
Q:Find sailors whose rating is better than some sailor called Horatio?

SQL> SELECT [Link] FROM SAILORS S WHERE [Link] > ANY ( SELECT
[Link] FROM SAILORS S2 WHERE [Link]=’HORATIO’) ;
SID
----------
58
71
74
31
32
ALL operator :-
Compares a value to every value in a list or returned by a query. Must be preceded by =, !=, >,
<, <=, >=. evaluates to TRUE if the query returns no rows.

Example:-

Select employees whose salary is greater than every salesman’s salary ?

46
SQL>SELECT ename FROM emp
WHERE SAL > ALL ( SELECT sal FROM emp WHERE job = 'SALESMAN');
Q:Find sailors whose rating is better than every sailor called Horation?

SQL> SELECT [Link] FROM SAILORS S WHERE [Link] > ALL ( SELECT
[Link] FROM SAILORS S2 WHERE [Link]=’HORATIO’) ;
SID
----------
58
71
Multi Column Subqueries:-
If inner query returns more than one column value then it is called MULTI COLUMN subquery.

Example :-

Display employee names earning maximum salaries in their dept ?


SQL>SELECT ename FROM emp
WHERE (deptno,sal) IN (SELECT deptno,MAX(sal)
FROM emp
GROUP BY deptno) ;
SQL> SELECT SNAME FROM SAILORS WHERE (RATING,AGE) IN (SELECT
RATING,MAX(AGE) FROM SAILORS GROUP BY RATING);
SNAME
----------
DUSTIN
BRUTUS
LUBBER
RUSTY
HORATIO
BOB
SQL> SELECT SID,SNAME FROM SAILORS WHERE (RATING,AGE) IN (SELECT
RATING,MAX(AGE) FROM SAILORS GROUP BY RATING);
SID SNAME
---------- ----------
22 DUSTIN
29 BRUTUS
31 LUBBER
58 RUSTY
74 HORATIO
95 BOB
Nested Queries:-
A subquery embedded in another subquery is called NESTED QUERY.
47
Queries can be nested upto 255 level.

Example :-
Display employee name earning second maximum salary ?
SQL>SELECT ename FROM emp
WHERE sal = (SELECT MAX(sal) FROM EMP
WHERE sal < (SELECT MAX(sal) FROM emp)) ;

Q:Find the names of sailors who have not reserved a red boat.

SELECT [Link] FROM SAILORS S WHERE [Link] NOT IN (SELECT [Link] FROM
RESERVES R WHERE [Link] IN (SELECT [Link] FROM BOATS B WHERE [Link]
= 'RED'));
SNAME
----------
BRUTUS
CANDY
RUSTY
ZOBRA
HORATIO
ART
BOB
CORRELATED SUB QUERIES:
In the Co-Related sub query a parent query will be executed first and based on the output of
outer query the inner query execute.
If parent query returns N rows ,inner query executed for N times.
If a subquery references one or more columns of parent query is called CO-RELATED subquery
because it is related to outer query. This subquery executes once for each and every row of
main query.

Example1 :-

Display employee names earning more than avg(sal) of their dept ?

SQL>SELECT ename FROM emp x


WHERE sal > (SELECT AVG(sal) FROM emp
WHERE deptno=[Link]);

Example2: Find sailors whose rating more than avg(rating ) of their id.
SQL> SELECT [Link] FROM SAILORS S WHERE RATING > (SELECT AVG(RATING) FROM
SAILORS WHERE SID=[Link]);

48
no rows selected.

SUB QUERIES WITH SET OPERATORS:


Q1) Find the names of sailors who have reserved a red or a green boat?

SQL> Select [Link] from sailors s, reserves r, boats b where [Link]=[Link] and
[Link]=[Link] and ([Link] = ‘red’ or [Link]= ‘green’);

Or

SQL> Select [Link] from sailors s, reserves r, boats b where [Link]=[Link] and
[Link]=[Link] and [Link]=’red’

UNION

Select [Link] from sailors s, reserves r, boats b where [Link]=[Link] and


[Link]=[Link] and [Link]=’green’;

SNAME

Dustin

Lubber

Horatio

Q2) Find the names of sailors who have reserved a red and a green boat?

SQL> Select [Link] from sailors s, reserves r, boats b where [Link]=[Link]


and [Link]=[Link] and [Link]=’red’

INTERSECT

Select [Link] from sailors s, reserves r, boats b where [Link]=[Link] and


[Link]=[Link] and [Link]=’green’;

SNAME

Dustin

Lubber

49
Horatio

Q3) Find the names of sailors who have reserved a red boat but not green boat?

SQL> Select [Link] from sailors s, reserves r, boats b where [Link]=[Link]


and [Link]=[Link] and [Link]=’red’

MINUS

Select [Link] from sailors s, reserves r, boats b where [Link]=[Link] and


[Link]=[Link] and [Link]=’green’;

NO ROWS SELECTED

Q4) Find all sids of sailors who have a rating of 10 or reserved boat 104?

SQL>select [Link] from sailors s where [Link]=10

UNION

Select [Link] from reserves r where [Link]=104;

SID

22

31

58

71

50
Aditya College of Engineering & Technology
Aditya Nagar, ADB Road, Surampalem - 533437

DATABASE MANAGEMENT SYSTEMS

UNIT IV: Schema Refinement (Normalization)

Syllabus:
Schema Refinement (Normalization): Purpose of Normalization or schema refinement,
concept of functional dependency, normal forms based on functional dependency(1NF, 2NF
and 3 NF), concept of surrogate key, Boyce-codd normal form(BCNF), Lossless join and
dependency preserving decomposition, Fourth normal form(4NF), Fifth Normal Form (5NF).

Objectives:
After studying this unit, you will be able to:
 Discuss the different types of anomalies in a database
 State what is functional dependency
 List the different forms of normalization
 Differentiate among different types of normalization
DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

INTRODUCTION TO SCHEMA REFINEMENT

The Schema Refinement refers to refine the schema by using some technique. The best
technique of schema refinement is decomposition.
Normalization means “split the tables into small tables which will contain less number of
attributes in such a way that table design must not contain any problem of inserting,
deleting, updating anomalies and guarantees no redundancy”.
Normalization or Schema Refinement is a technique of organizing the data in the database.
It is a systematic approach of decomposing tables to eliminate data redundancy and
undesirable characteristics like Insertion, Update and Deletion Anomalies.
Redundancy: refers to repetition of same data or duplicate copies of same data stored in
different locations.
Anomalies: Anomalies refers to the problems occurred after poorly planned and normalized
databases where all the data is stored in one table which is sometimes called a flat file
database.

Anomalies or problems facing without normalization (problems due to


redundancy):

Anomalies refers to the problems occurred after poorly planned and unnormalized
databases where all the data is stored in one table which is sometimes called a flat file
database. Let us consider such type of schema

Here all the data is stored in a single table which causes redundancy of data or say
anomalies as SID and Sname are repeated once for same CID . Let us discuss anomalies one by
one.
Due to redundancy of data we may get the following problems, those are-
[Link] anomalies : It may not be possible to store some information unless some other
information is stored as well.
[Link] storage: some information is stored repeatedly
[Link] anomalies: If one copy of redundant data is updated, then inconsistency is created
unless all redundant copies of data are updated.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 2


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

[Link] anomalies: It may not be possible to delete some information without losing some
other information as well.
Problem in updation / updation anomaly – If there is updation in the fee from 5000 to 7000,
then we have to update FEE column in all the rows, else data will become inconsistent.

Insertion Anomaly and Deletion Anomaly- These anomalies exist only due to redundancy,
otherwise they do not exist.
Insertion Anomalies: New course is introduced C4, But no student is there who is having C4
subject.

Because of insertion of some data, It is forced to insert some other dummy data.
Deletion Anomaly:
Deletion of S3 student cause the deletion of course. Because of deletion of some data forced to
delete some other useful data.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 3


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Solutions To Anomalies : Decomposition of Tables – Schema Refinement, shown below.

Purpose of Normalization:

 Minimize the redundancy in data.


 Remove insert, update, and delete anomalies during the database activities.
 Reduce the need to organize the data when it is modified or enhanced.
 Normalization reduces a complex user view to a set of small and sub groups of fields or
relations. This process helps to design a logical data model known as conceptual data
model.

Advantages of Normalization:
1. Greater overall database organization will be gained.
2. The amount of unnecessary redundant data reduced.
3. Data integrity is easily maintained within the database.
4. The database & application design processes are much for flexible.
5. Security is easier to maintain or manage.

Disadvantages of Normalization:
1. The disadvantage of normalization is that it produces a lot of tables with a relatively
small number of columns. These columns then have to be joined using their
primary/foreign key relationship.
2. This has two disadvantages.
Performance: all the joins required to merge data slow processing & place
additional stress on your hardware.
Complex queries: developers have to code complex queries in order to merge
data from different tables.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 4


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Concept of Functional Dependency:

Functional Dependencies are fundamental to the process of Normalization i.e., Functional


Dependency plays key role in differentiating good database design from bad database designs.
A functional dependency is a “type of constraint that is a generalization of the notation of
the key”.
Functional Dependency describes the relationship between attributes (columns) in a table.
Functional dependency is represented by an arrow sign (→).
In other words, a dependency FD: “X → Y” means that the values of Y are determined by the
values of X. Two tuples sharing the same values of X will necessarily have the same values of
Y. An attribute on left hand side is known as “Determinant”. Here X is a Determinant.

Example [Identifying the FD’s]


A B C D
A1 B1 C1 D1
A1 B2 C1 D2
A2 B2 C2 D2
A2 B2 C2 D3
A3 B3 C2 D4

Case1: A →B
Here A1 belongs to B1 & B2. So A1 does not have unique value in B. So it is not in FD.
Case1: A →C
Here A1→C1 and A2, A3→C2. So A has unique values in B. So it is in FD.
Note: try to find all the possibilities. i.e., A→D, B→C, B→D, and C→D

Reasoning about functional dependencies:

Armstrong Axioms (Inference Rules ) : The term Armstrong axioms refers to the sound
and complete set of inference rules or axioms, introduced by William W. Armstrong, that is
used to test logical implication of functional dependencies.

Armstrong axioms define the set of rules for reasoning about functional dependencies and also
to infer all the functional dependencies on a relational database.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 5


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Various axioms rules or inference rules:


Primary axioms:

Secondary or derived axioms:

Closure of a Set of Attributes:

Attribute closure of an attribute set can be defined as set of attributes which can be
functionally determined from it.
The set of FD’s that is logically implied by F is called the closure of F and written as F +. And it
is defined as “If F is a set FD’s on a relation R, the F+, the closure of F by using the inferences
axioms that are not contained in F+.
Example: R (A, B, C, D) and set of Functional Dependencies are A→B, B→D, C→B then what
is the Closure of A, B, C, D?
Solution: A+ is
A+→ {A, B, D} i.e., A→B, B→D is exists and C is not FD on A. So it is eliminated.
B+→ { B, D} i.e., B→D is exists and A, C is not FD on A. So it is eliminated.
C+→ {C, B, D} i.e., C→B, B→D is exists and A is not FD on C. So it is eliminated.

The algorithm for computing the attribute closure of a set X of attributes is shown below

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 6


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Types of functional dependencies:

1. Fully Functional Dependency: A functional dependency is said to be full dependency


“if and only if the determinant of the functional dependency if either candidate key or
super key, and the dependent can be either prime or non-prime attribute”.
(OR)
Let’s take the functional dependency X → Y (i.e., X determines y). Here Y is said to be
fully determinant, if it cannot determine any subset of X.
Example: Consider the following determinant ABC → D i.e., ABC determines D but D
is not determined by any subset of A/ BC/C/B/AB i.e., BC→D, C→D, A→D
Functional dependencies are not exists. So D is Fully Functional Dependent.

2. Partial Functional Dependency: If a non-prime attribute of the relation is getting


derived by only a part of the candidate key, then such dependency is known as Partial
Dependency.
(OR)
In a relation having more than one key field, a subset of non key fields may depend on
all key fields but another subset or a particular non-key field may depend on only one
of the key fields. Such dependency is defined as Partial Dependency.
Example: Consider the following determinants AC→P, A→D, D→P. From these determinants P
is not fully FD on AC. Because, If we find A+ (means A’s Closure) A→D, D→P i.e., A→P. But we
don’t have any requirement of C. C attribute is removed completely. So P is Partially Dependent
on AC.
Under the following conditions a table cannot have partial F.D
(1) If primary key consists a single attribute
(2) If table consists only two attributes
(3) If all the attributes in the table are part of the primary key

3. Transitive Functional Dependency: If a non-prime attribute of a relation is getting


derived by either another non-prime attribute or the combination of the part of the
candidate key along with non-prime attribute, then such dependency is defined as
Transitive dependency. i.e., in a relation, there may be dependency among non-key
fields. Such dependency is called Transitive Functional Dependency.
Example: X→Y, and Y→Z then we can determine X→Z holds.
Under the following Circumstances, a table cannot have transitive F.D
(1) If table consists only two attributes
(2) If all the attributes in the table are part of the primary key.
4. Trivial Functional Dependency: It is basically related to Reflexive rule. i.e., if X is a set
of attributes, and Y is subset of X then X→Y holds.
Example: ABC→BC is a Trivial Dependency.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 7


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

5. Multi-Valued Dependency: Consider 3 fields X, Y, and Z in a relation. If for each value


of X, there is a well-defined set of values Y and Well-defined set of values of Z and set
of values of Y is independent of the set values of Z. This dependency is Multi-valued
Dependency. i.e., X →Y / Z.

Prime and non-prime attributes


Attributes which are parts of any candidate key of relation are called as prime attribute, others
are non-prime attributes.

Candidate Key:
Candidate Key is minimal set of attributes of a relation which can be used to identify a tuple
uniquely.
Consider student table: student(sno, sname,sphone,age)
we can take sno as candidate key. we can have more than 1 candidate key in a table.
types of candidate keys:
1. simple(having only one attribute)
2. composite(having multiple attributes as candidate key)

Super Key:
Super Key is set of attributes of a relation which can be used to identify a tuple uniquely.
 Adding zero or more attributes to candidate key generates super key.
 A candidate key is a super key but vice versa is not true.
Consider student table: student(sno, sname,sphone,age)
we can take sno, (sno, sname) as super key

Operations performed functional dependencies (applications of closure set of


attributes):
(1) To identify the additional F.D’s.
(2) To identify the keys.
(3) To identify the equivalences of the F.D’s
(4) To identify irreducible set (minimal set) of F.D’s or canonical forms of F.D’s or standard
form of F.D’s.

(1) To identify the additional F.D’s :


To check any F.D’s like AB can be determined from F1 or not. Complete A+ from F1 is A+
includes B also then; AB can be derived as a F.D in F1.

Examples:

1. In a schema with attributes A,B,C,D and E the following set of attributes are given
AB, AC, CDE, BD, EA. Find CDAC determines from the given FDs or
not.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 8


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Sol: Given FD is CDAC find the closure set of CD.

CD+ = CDE (∵ CD E)


= CDEA (∵ E A)
= CDEAB (∵ A B)
From the closure set the attributes AC are determined by CD so CD AC.

2. Check DA can be derived from the following FDs or not ABC, BCAD, DE,
CFB.

(2) Identification of key by using closure set as attributes:


A key attribute: An attribute that is capable of identifying all other attributes in a given table.

(i) Primary key: It is an unique value attribute in a table to enforce entity integrity and
ti identify rows in the table uniquely.
(ii) Composite Primary Key: Sometimes single attribute is not sufficient to identify
uniquely the rows in the table so, we combine 2 or more attributes to identify the
rows uniquely.
(iii) Candidate keys: Sometimes 2 or more independent attribute or attributes can be
used to identify the rows uniquely Eg :( vech no,veng no,purchase date) Either
vehicle no or vehicle engine no can be used as a key attribute then they are called as
candidate keys one of the candidate key can be elected as primary key.

Example 1: Find candidate keys for the relation R(ABCD) having following FD’s ABCD,
CA, DA.

Sol: From the given FD’s, the attribute B is key attribute because it is not in RHS of
functional dependency.

B+ = B (not a candidate key, find the combinations of B)


AB+ = ABCD (∵ AB CD)
BC+ = BCAD (∵ C A, AB CD)
BD+ = BDA (∵ D A )
CD+ = CDA (∵ D A )
AC+ = AC

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 9


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

AD+ =AD
From the above attributes AB and BC determines all attributes.
AB, BC are candidate keys.
Example 2: Find candidate keys for the relation R(ABCDE) having following FD’s ABC,
CDE, BD, EA.

Sol: From the given FD’s, no attribute is key attribute because all are in RHS of
functional dependency. So check for all attributes of LHS.

A+ = ABC (∵ A BC)
= ABCD (∵ B D)
= ABCDE (∵ CD E)
B+ = BD (∵ B D)
E+ = EA (∵ E A)
= EABC (∵ A BC)
= EABCD (∵ B D)
C + = C
D + = D
CD+ = CDE (∵ CD E)
= CDEA (∵ E A)
= CDEAB (∵ A BC)
BC+ = BCD (∵ B D)
= BCDE (∵ CD E)
= BCDEA (∵ E A)
From the above attributes A, E, CD and BC determines all attributes.
A, E, CD, BC are candidate keys.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 10


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

(3) To identify equivalence of F.D’s :

Different database designers may define different F.D’s sets from the same requirements. To
evaluate whether they are equivalent if we are able to derive all F.D’s in G from F and vice-
versa.

Find the equivalence of two sets of FDs.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 11


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Sol:

Step 1: Take set F and enclose all FD’s in G that can be derived from F.

ACD
A+ from F
=A
=AC (∵ A C)
=ACD (∵ AC D)
A  CD can be derived from F

EAH
E+ from F
=E
=EAD
=EADH
 E  AH can be derived from F

Step 2: Take set G and enclose all F.D’s in F that can be derived from
G. AC
A+ from G
=A
=ACD
A  C can be derived from G

E AD
E+ from G
=E
=EAH
=EAHCD
E  AH & E  AD can be derived from G
G and F are equivalent.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 12


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 13


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

(4) To identify the irreducible form of FD’s /canonical Form (minimal cover):
We try to minimize the functional dependency. The minimize FD should be equivalent to
original FD,
Procedure to find minimal set:
Step 1: Have single attributes on the RHS for every FD.
Step 2: Evaluate all F.D’s in step 1 for their necessity. If they are not necessary, remove them
from the list.
Step 3: Evaluate the necessity of the LHS attributes in FD’s obtained from step [Link] they are not
necessary remove from FD.
Step 4: Apply the union rule for common to LHS attribute in the FD’s obtained from step
[Link] we will get irreducible set.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 14


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Remove 4 and compute D+ from 1, 2, 4, 5&6

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 15


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Step 4:

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 16


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Normal forms based on functional dependency (1NF, 2NF and 3 NF, Boyce-
Codd normal form (BCNF), 4NF)

Normalization means “split the tables into small tables which will contain less number of attributes in
such a way that table design must not contain any problem of inserting, deleting, updating anomalies
and guarantees no redundancy”.
The evolution of Normalization theories / Steps of Normalization / Different Normal Forms
is illustrated below-
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF).

Points to be Remember
 1 NF is a mandatory NF and remaining are the optional
 If you construct E-R diagrams in to the tables, then 4 NF and 5 NF need not be applied
on the table.
 Practically applied normalization is upto 3NF and very rarely we will go beyond that.
 2 NF dealing with the partial dependencies and 3NF is dealing with transitive
dependencies.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 17


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

First Normal Form (1NF): A relation is said to in the 1NF if it is already in un-normalized
form and it satisfies the following conditions or rules or qualifications are:
1. Each attribute name must be unique.
2. Each attribute value must be single or atomic i.e., Single Valued Attributes.
3. Each row / record must be unique.
4. There is no repeating group’s.
Example: How do we bring an un-normalized table into first normal form? Consider the
following relation:

Solution: This table is not in first normal form because the [Color] column can contain
multiple values. For example, the first row includes values "red" and "green." To bring this
table to first normal form, we split the table into two tables and now we have the resulting
tables:

Second Normal Form (2NF): A relation is said to be in 2NF, if it is already in 1st NF and it
has no Partial Dependency i.e., no non-prime attribute is dependent on the only a part of the
candidate key.
(OR)
A relation is in second normal form if it satisfies the following conditions:
• It is in first normal form
• All non-key attributes are fully functional dependent on the primary key.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 18


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Note: Partial Functional Dependency: If a non-prime attribute of the relation is getting


derived by only a part of the candidate key, then such dependency is known as Partial
Dependency

Example: Consider the following relation

➔This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is
[Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is
only part of the primary key. Therefore, this table does not satisfy second normal form.
➔ To bring this table to second normal form, we break the table into two tables, and now we
have the following:

Q1 Given relation R(ABCD) and F:{ABC, BD} Decompose in into 2NF.


from the given FDs determine primary key. Necessary attributes to include in the key
are A, B (because this attributes are not in RHS of FD).
Find the closure set of AB
AB+ = ABC
= ABCD (∵ B D)
 AB is a primary key.
From the FDs BD is partially depending on AB. So decompose the table.
(D is a non-prime attribute derived by a part of the key)
ABCD

ABC BD
ABC BD

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 19


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Q2 Consider the relation R=ABCDEF and set of FDs are A FC, CD, B E Find the
key and normalize into 2NF.

Third Normal Form (3NF): A database is in third normal form if it satisfies the following
conditions:
• It is in 2NF.
• There is no transitive functional dependency
 By transitive functional dependency, we mean we have the following relationships in
the table: A is functionally dependent on B, and B is functionally dependent on C. In
this case, C is transitively dependent on A via B. and A non-key attribute is
depending on a non-key attribute.

Example: Consider the following relation.

➔ In the table able, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID] and we have transitive
functional dependency, and this structure does not satisfy third normal form.
➔ To bring this table to third normal form, we split the table into two as follows:

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 20


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Q1 Given relation R(ABCDE) and F:{ABC, BD, DE} Decompose in into 3NF.
from the given FDs determine primary key. Necessary attributes to include in the key
are A, B (because this attributes are not in RHS of FD).
Find the closure set of AB
AB+ = ABC
= ABCD (∵ B D)
= ABCDE (∵ D E)
 AB is a primary key.
From the FDs BD is partially depending on AB. So decompose the table.
(D is a non-prime attribute derived by a part of the key)
B+ = BDE

ABCD
B+
ABC BDE
ABC BD, DE
 table is in 2NF but not in 3NF. Because DE is transitive dependency.
(No non-key attribute should determining a non-key attribute)
D+ = DE
BDE
D+
BD DE
BD DE
 Table is 3NF.
The relations after decomposing into 3NF.
R1: ABC
R2: BD
R3: DE

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 21


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Q2 Given relation R=ABCDEFGHIJ and the set of FDs are AB C, ADE, BF, FGH,
D IJ Decompose R into 3NF.

Q3(a) Given a set of FDs for the relation schema R(ABCD) with primary key AB under
which R is 1NF but not in 2NF
(b) Find FDs such that R is in 2NF but not in 3NF

Sol: R=ABCD
Key=AB
(a) Atomic values are allowed in 1NF and partial dependency is not allowed in 2NF.
The following FDs are allowed.
B C, AC, B D, A D
(show the FDs which is having partial dependency)
(b) According to question partial dependencies are not allowed and transitivity
dependency is allowed. The following FDs are allowed.
C D, DC

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 22


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Boyce-Codd normal form (BCNF): A relation is said to be in BCNF, if and only if every
determinant should be a candidate key.
✓ BCNF is the advance version of 3NF. It is stricter than 3NF.
✓ A table is in 3NF if for every functional dependency X → Y, X is the super key of the table.
✓ For BCNF, the table should be in 3NF and for every FD, LHS is super key.

Example: Let's assume there is a company where employees work in more than one
department. EMPLOYEE table:

emp_id emp_nationality emp_dept dept_type dept_no_of_emp

1001 Austrian Production and planning D001 200

1001 Austrian stores D001 250

design and technical


1002 American D134 100
support

1002 American Purchasing department D134 600

➔ In the above table Functional dependencies are as follows: EMP_ID → EMP_COUNTRY


and EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO} Candidate key: {EMP-ID, EMP-DEPT}
➔ The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys. To
convert the given table into BCNF, we decompose it into three tables:

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 23


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

FOR BCNF problems refer your note book.


Q1 Consider the relation schema R(A,B,C), which has the FD B → C. If A is a candidate key for R,
is it possible for R to be in BCNF? If so, under what conditions? If not, explain why not.
Sol : The only way R could be in BCNF is if B includes a key, i.e. B is a key for R

Fourth Normal Form (4NF): A relation said to be in 4NF if it is in Boyce Codd normal
form and should have no multi-valued dependency.
✓ For a dependency A→ B, if for a single value of A, multiple value of B exists then the
relation will be multi-valued dependency.
✓ Note: Multi Valued Dependency: A table is said to have multi-valued dependency, if the
following conditions are true,
1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the
table may have multi-valued dependency.
2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R (A, B, C), if there is a multi-valued dependency between, A and
B, then B and C should be independent of each other.
◼ If all these conditions are true for any relation (table), it is said to have multi-valued
dependency.

Example

 The given STUDENT table is in 3NF but the COURSE and HOBBY are two independent
entity. Hence, there is no relationship between COURSE and HOBBY. In the STUDENT
relation, student with STU_ID, 21 contains two courses, Computer and Math and two
hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID,
which leads to un-necessary repetition of data.
 So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 24


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

STUDENT_HOBBY

Concept of Surrogate Key:


✓ Alternate of Primary Key that allows duplication of data’s/records.
✓ Surrogate key is a unique identification key, it is like an artificial key to production key,
because the production key may be alphanumeric or composite key but the surrogate key is
always single numeric key.
✓ A surrogate key has the following characteristics:
i. The value is never reused and is unique within the whole system.
ii. It is system generated and an integer.
iii. The value cannot be manipulated by the user or application.
iv. The value is not an amalgam of different values from multiple domains.
✓ A Surrogate Keys can be generated in a variety of ways, and most databases offers ways to
generate surrogate keys.
Example: Oracle uses SEQUENCE,
MYSQL uses Auto_Increment,
and SQL Server uses IDENTITY.

Lossless join and Dependency preserving decomposition:

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 25


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 26


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 27


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Review Questions

1. What is Functional Dependency? Explain types and properties of FD’s.


2. What is a normal form? Explain about various normal forms with examples.
3. What is normalization? Differentiate between second normal form and third normal
form.
4. Explain briefly about 3NF, 4NF with suitable examples?
5. Explain about Boyce Codd normal form with an example.
6. Why normalization is needed? Explain the process of normalization.
7. How to compute closure of set of functional dependency? Explain with a suitable
example schema.
8. What is multi valued dependency? State and explain fourth normal form based on this
concept.
9. List and explain the inference rules of functional dependencies.
10. Explain insertion, deletion, and modification anomalies.
11. What is the importance of dependency preservation during decomposition? How to
achieve it?
12. Consider the relation R on attributes (ABCDE) with functional dependencies:
AB CDE, AC BDF, BC, CB, C D, BE
i) Determine a Key for relation R
ii) Find 3NF decomposition for R using normalization process

13. Give asset o FDs for the relation schema R(A,B,C,D) with primary key AB under which
R is in 1NF but not in 2NF.
14. Why is a relation that is in 3NF generally considered good? Explain.
15. Discuss about 4NF with suitable example.
16. What are the problems caused by redundantly storing information? Explain
17. Given Relation, R=(A,B,C,D,E,F,G) and Functional Dependencies
F={ {A,B}→{C}, { A,C}→{B}, {A,D}→{E}, {B}→{D}, { B,C}→{A}, {E}→{F}}.
Check whether the following decomposition of R into R1=(A,B,C), R2=(A,C,D,E) and
R3=(A,D,F) is satisfying the lossless Decomposition property.
18. What is dependency preservation property for decomposition? Explain why it is important.
19. Given a Relation R=(X,Y,Z) and Functional Dependencies are F={ {X,Y}→{Z}, {Z}→{X} }
Determine all Candidate keys of R and the normal form of R with proper explanation.
20. Define functional dependency? How can you compute the minimal cover for a set of
functional dependencies? Explain it with an example.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 28


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

21. Consider schema R = (A, B, C, G, H, I) and the set F of functional dependencies {AB, AC,
CG H, CGI, BH}. Compute the candidate keys of the schema. Compute the closure of the
same.
22. Explain 3NF & BCNF. What is the difference between them?
23. What is functional dependency? Explain its usage in database design.
24. What is a surrogate key? How can it be used for schema refinement?
25. How to compute closure of set of functional dependency? Explain with a suitable example schema.
26. What is multi valued dependency? State and explain fourth normal form based on this concept.
27. Given a set of FDs for the relation schema R(A,B,C,D) with Primary key AB, and D C or
C D or AC D or AD C or BC D or BD C. In which normal form is R?
28. Discuss the problems caused by redundancy and the purpose of normalization.
29. Give relation schemas for the following normal forms
i) 2NF but not in 3NF ii) 3NF but not in BCNF

References:

 Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
 C.J. Date, Introduction to Database Systems, Pearson Education.
 Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 29


Aditya College of Engineering & Technology
Aditya Nagar, ADB Road, Surampalem - 533437

DATABASE MANAGEMENT SYSTEMS

UNIT V: Transaction Concept & Indexing Techniques

Syllabus:
Transaction Concept: Transaction State, Implementation of Atomicity and Durability,
Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing for
Serializability, Failure Classification, Storage, Recovery and Atomicity, Recovery algorithm.
Indexing Techniques: B+ Trees: Search, Insert, Delete algorithms, File Organization and
Indexing, Cluster Indexes, Primary and Secondary Indexes , Index data Structures, Hash Based
Indexing: Tree base Indexing ,Comparison of File Organizations, Indexes and Performance
Tuning.

Objectives:
After studying this unit, you will be able to:
 Discuss the different types of ACID properties and its implementation.
 Describe concurrent Execution, Serializability and Recoverability.
 Understand physical design of a database system, by discussing Database indexing
techniques and storage techniques
 Examine issues in data storage and query processing and can formulate appropriate
solutions
DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

5.1. Introduction
• A transaction is a unit of program execution that accesses and possibly updates
various data items.
• The transaction consists of all operations executed between the statements begin
and end of the transaction
• Transaction operations: Access to the database is accomplished in a transactionby
the following two operations:
 read (X): Performs the reading operation of data item X from the database
 write (X): Performs the writing operation of data item X to the database
• A transaction must see a consistent database
• During transaction execution the database may be inconsistent
• When the transaction is committed, the database must be consistent
• Two main issues to deal with:
 Failures, e.g. hardware failures and system crashes
 Concurrency, for simultaneous execution of multiple transactions
5.2 ACID Properties
 To preserve integrity of data, the database system must ensure:
• Atomicity: Either all operations of the transaction are properly reflected in the
database or none are
• Consistency: Execution of a transaction in isolation preserves the consistency of the
database
• Isolation: Although multiple transactions may execute concurrently, each transaction
must be unaware of other concurrently executing transactions; intermediate
transaction results must be hidden from other concurrently executed transactions
• Durability: After a transaction completes successfully, the changes it has made to
the database persist, even if there are system failures
Example of Fund Transfer: Let Ti be a transaction that transfers 50 from account A to B. This
transaction can be illustrated as follows

Transfer $50 from account A to B:


Ti : read(A)
A := A – 50
write(A)
read(B)
B := B + 50

write(B)

• Consistency: the sum of A and B is unchanged by the execution of the transaction.


• Atomicity: if the transaction fails after step 3 and before step 6, the system should
ensure that its updates are not reflected in the database, else an inconsistency will

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 2


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

result.
• Durability: once the user has been notified that the transaction has completed, the
updates to the database by the transaction must persist despite failures.
• Isolation: between steps 3 and 6, no other transaction should access the partially
updated database, or else it will see an inconsistent state (the sum A + B will be less
than it should be).

5.3 Transaction and Schedules

• A transaction is seen by the DBMS as a series, or list of actions. We therefore establish


a simple transaction model named as transaction states.

Transaction State: A transaction must be one of the following states:

• Active, the initial state; the transaction stays inthis


state while it is executing
• Partially committed, after the final statementhas
been executed.
• Committed, after successful completion.
• Failed: after the discovery that normal execution
can no longer proceed.
• Aborted: after the transaction has been rolled back
and the database restored to its state prior to the start
of the transaction.

5.4 Concurrent Execution and Schedules

Concurrent execution: executing transactions simultaneously has the followingadvantages:

■ increased processor and disk utilization, leading to better throughput


■ one transaction can be using the CPU while another is reading from orwriting to
the disk
■ reduced average response time for transactions: short transactions neednot wait
behind long ones
Concurrency control schemes: these are mechanisms to achieve isolation

■ to control the interaction among the concurrent transactions in order toprevent


them from destroying the consistency of the database
Schedules: sequences that indicate the chronological order in which instructionsof concurrent
transactions are executed

 a schedule for a set of transactions must consist of all instructions of thosetransactions


 must preserve the order in which the instructions appear in each individual transaction

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 3


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Example Schedules
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The
following is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.

Schedule 1

• Let T1 and T2 be the transactions defined previously. The following schedule is not a
serial schedule, but it is equivalent to above Schedule.

Schedule 2

• The following concurrent schedule does not preserve thevalue of the sum A + B

Schedule 3

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 4


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Serializable Schedule
 A serializable schedule over a set S of committed transactions is a schedule whose
effect on any consistent database is guaranteed to be identical to that of some
complete serial schedule over S. i.e., even though the actions of transactions are
interleaved, the result of executing transactions serially in different order may
produce different results.
 Example: The schedule shown in the following figure is serializable.

T1 T2
R(A)
W(A)
R(A)
W(A)
R(B)
W(B)
R(B)
W(A)
Commi
t
Commi
t

Even though the actions of T1 and T2 are interleaved, the result of this schedule is
equivalent to first running T1 entirely and then running and T2 entirely. Actually T1‘s
read and write of B is not influenced by T2‘s actions on B, and the net effect is the same if
these actions are the serial schedule First T1, then T2. This schedule is also serializable if
first T2, then T1. Therefore if T1 and T2 are submitted concurrently to a DBMS, either of
these two schedules could be chosen as first
 A DBMS might sometimes execute transactions which is not a serial execution i.e., not
serializable.
 This can be happen for two reasons:
 First the DBMS might use a concurrency control method that ensures the
executed schedule itself.
 Second, SQL gives programmers the authority to instruct the DBMS tochoose
non-serializable schedule.
Anomalies due to Interleaved execution
 There are three main situations when the actions of two transactions T1 and T2
conflict with each other in the interleaved execution on the same data object.
■ Write-Read (WR) Conflict: Reading Uncommitted data.
■ Read-Write (RW) Conflict: Unrepeatable Reads
■ Write-Write (WW) Conflict: Overwriting Uncommitted Data.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 5


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

 Reading Uncommitted Data (WR Conflicts)


■ Dirty Read: The first source of anomalies is that a transaction T2 could read a
database object A that has been just modified by another transaction T1, which
has not yet committed, such a read is called a dirty read.
■ Example: Consider two transactions T1 and T2, where T1 stands for
transferring $100 from A to B and T2 stands for incrementing both A and B by
6% of their accounts. Suppose that their actions are interleaved as follows:
(i) T1 deducts $100 from account A, then immediately
(ii) T2 reads accounts of A and B adds 6% interest to each, and then,
(iii) T1 adds $100 to account B.
This corresponding schedule is illustrated as follows:

T1 T2
R(A)
A: = A -
100 W(A)
R(A)
A: = A + 0.06
A W(A)
R(B)
B:= B+.06
R(B) BW(B)
B: = B + Commit
100W(B)
Commit

The problem here is T2 has added incorrect 6% interest to each A and B.


Because before commitment that $100 is deducted from A, it has added 6% to
account A before commitment that $100 is credited to B, it has added 6% to
account B. thus, the result of this schedule is different from the result of the
other schedule which is serializable: first T1 then T2.

 Unrepeatable Reads (RW Conflicts)


■ The second source of anomalies is that a transaction T2 could change the value
ofan object A that has been read by a transaction T1 and T1 is still in progress.
This situation causes a problem that, if T1 tries to read the value of A again, it
will get a different result, even though it has not modified A in the meantime.
But, this situation could not arise in a serial execute of two transactions: this, it
is called as unrepeatable read.
■ Example: Suppose that both T1 and T2 reads the same value of A, say 5. Then
T1 has incremented A value to 6 but before commitment as A value 6, T2 has
decremented A value from 5 to 4. Thus, instead of answer of A value as 5, i.e.,
from to 5 we got an answer 4 which is incorrect.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 6


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

 Overwriting Uncommitted Data (WW Conflicts)


■ The third source of anomalies is that a transaction T2 could overwrite the value
of an object A, which has already been modified by a transaction T1, while
T1 isstill in progress.
■ Example: Suppose that A and B are two employees, and their salaries must be
kept equal. Transaction T1 sets their salaries to $1000 and transaction T2 sets
their salaries to $2000.

The following interleaving of the actions T1 and T2 occurs:


i) T1 sets A’s salary to $1000, at the same time, T2 sets B’s salary to $2000.
ii) T1 sets B’s salary is set to to $2000, at the same time, T2 sets A’s salary to
$2000.
As a result A’s salary is set to $2000 and B’s salary is set to $1000, i.e., the result
is not identical

■ Blind-Write: Neither transaction reads a value before writing it-such awrite is


called a blind-write.
The above example is the best example of blind write because T1 and T2

are concentrating only on writing but not on reading.

Schedules involving aborted Transactions


 All transactions of aborted transactions are to be undone, and we can therefore
imagine that they were never carried out to begin with.
 Example: Suppose that transaction T1 deducts $100 from account A then immediately
before committing A’s new value the transaction T2 reads the current values of
accounts A and B and adds 6% interest to each, then commits, but incidentally T1 is
aborted. So, we get incorrect result of transaction T2 because T1 was aborted in the
middle of the process and T2 has taken incorrect value of A by T1 and added 6%. We
say that such a schedule is Unrecoverable Schedule. The corresponding schedule is
shown as follows:

T1 T2
R(A)
A: = A -
100 W(A)
R(A)
A: = A +
0.06 A W(A)
R(B)
B:= B+.06
Abort B W(B)
Commit

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 7


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

 Whereas, a recoverable schedule is one in which transactions read only the changes of
committed transactions.

5.5 Serializability
 Basic Assumption – Each transaction, on its own, preserves database consistency
• i.e. serial execution of transactions preserves database consistency
 A (possibly concurrent) schedule is serializable if it is equivalent to a serial
schedule
 Different forms of schedule equivalence give rise to the notions of conflict
serializability and view serializability
 Simplifying assumptions:
• ignore operations other than read and write instructions
• assume that transactions may perform arbitrary computations on data inlocal buffers
between reads and writes
• simplified schedules consist only of reads and writes

 Conflict Serializability
 Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at least one of these instructions
wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
 Intuitively, a conflict between li and lj forces a (logical) temporal order between
them
 If li and lj are consecutive in a schedule and they do not conflict, their results
would remain the same even if they had been interchanged in the ordering
 If a schedule S can be transformed into a schedule S´ by a series of swaps of non-
conflicting instructions, we say that S and S´ are conflict equivalent.
 We say that a schedule S is conflict serializable if it is conflict equivalent to a
serial schedule

• Example of a schedule that is not conflict serializable:

We are unable to swap instructions in the above schedule to obtain


either the serial schedule < T3, T4 >,or the serial schedule < T4, T3 >.
• Schedule 3 below can be transformed into Schedule 1,a
serial schedule where T2 follows T1, by series of swaps of

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 8


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

non-conflicting instructions.
Therefore Schedule 3 is conflict serializable.
 View Serializability
• Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met, where Q is a data item and Ti is a
transaction:
1. If Ti reads the initial value of Q in schedule S, then Ti must, in schedule
S´, also read the initial value of Q
2. If Ti executes read(Q) in schedule S, and that value was produced by
transaction Tj (if any), then transaction Ti must in schedule S´ also read the
value of Q that was produced by transaction Tj
3. The transaction (if any) that performs the final write(Q) operation inschedule S
(for any data item Q) must perform the final write(Q) operationin schedule S´
NB: View equivalence is also based purely on reads and writes
• A schedule S is view serializable it is view equivalent to a serial schedule
• Every conflict serializable schedule is also view serializable
• Schedule 9 (from book) — a schedule which is view-serializable but not conflict
serializable

Every view serializable schedule that is not conflict serializable has blind writes
Other Notions of Serializability
• This schedule produces the same outcome as the serial schedule < T1, T5 >
• However it is not conflict equivalent or view equivalent to it
• Determining such equivalence requires analysis of operations other than read and
write

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 9


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

 Testing for Serializability


• Consider some schedule of a set of transactions T1, T2, ..., Tn
• Precedence graph: a directed graph where the vertices are transaction names
• We draw an arc from Ti to Tj if the two transaction conflict, and Ti accessed thedata
item before Tj
• We may label the arc by the item that was accessed
• Example:

• Example Schedule and Precedence Graph

• A schedule is conflict serializable if and only if its precedence graph is acyclic


■ Cycle-detection algorithms exist which take order n2 time, where n is the number of
vertices in the graph
■ If precedence graph is acyclic, the serializability order can be obtained by a topological
sorting of the graph. This is a linear order consistent with the partial order of the
graph. For example, a serializability order for this graph is T2  T1
 T3  T4  T5

• The precedence graph test for conflict serializability must be modified to apply to a
test for view serializability
■ The problem of checking if a schedule is view serializable is NP-complete. Thus
existence of an efficient algorithm is unlikely. However practical algorithms that just
check some sufficient conditions for view serializability can still be used

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 10


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Example of an acyclic precedence graph

 Concurrency Control vs. Serializability Tests


• Goal – to develop concurrency control protocols that will ensure serializability
• These protocols will impose a discipline that avoids nonseralizable schedules
• A common concurrency control protocol uses locks
■ while one transaction is accessing a data item, no other transaction can modify it
■ require a transaction to lock the item before accessing it
■ two standard lock modes are “shared” (read-only) and “exclusive” (read-write)

5.6 Recoverability
 Need to address the effect of transaction failures on concurrently running
transactions.
 Recoverable schedule: if a transaction Tj reads a data item previously written by a
transaction Ti , the commit operation of Ti appears before the commit operation of
Tj
 The following schedule (Schedule 11) is not recoverable if T9 commits immediately
after the read

• If T8 should abort, T9 would have read (and possibly shown to the user) an
inconsistent database state. Hence database must ensure that schedules are
recoverable
• Cascading rollback – a single transaction failure leads to a series of transaction rollbacks
• Consider the following schedule where none of the transactions has yet committed
(so the schedule is recoverable)
• If T10 fails, T11 and T12 must also be rolled back
• Can lead to the undoing of a significant amount of work

• Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions
Ti and Tj such that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the read operation of Tj

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 11


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

• Every cascadeless schedule is also recoverable


• It is desirable to restrict the schedules to those that are cascadeless

5.7 Implementation of Atomicity and Durability


 The recovery-management component of a database system implements the
support for atomicity and durability.

E.g. the shadow-database scheme:


all updates are made on a shadow copy of the database
db_pointer is made to point to the updated shadow copy after
the transaction reaches partial commit and all updated pages have been flushed to disk.
db_pointer always points to the current consistent copy of the database.
In case transaction fails, old consistent copy pointed to by db_pointer can be used, and the
shadow copy can be deleted.
The shadow-database scheme:
19Assumes that only one transaction is active at a time. Assumes disks do not fail
Does not handle concurrent transactions

5.8 Storage Structure

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 12


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Stable-Storage Implementation

5.9 File Organization


A file is organized logically as a sequence of records. These records are mapped onto disk
blocks. Files are provided as a basic construct in operating systems, so we shall assume the
existence of an underlying file system.
Each file is also logically partitioned into fixed-length storage units called blocks, which
are the units of both storage allocation and data transfer.
Most databases use block sizes of 4 to 8 kilobytes by default, but many databases allow
the block size to be specified when a database instance is created.
Two possible approaches to store records: • Record size is fixed • Record size is variable

Fixed-Length Records
As an example, let us consider a file of instructor records for our university database.
Each record of this file is defined (in pseudocode) as:

We allocate the maximum number of bytes that each attribute can hold. Then, the
instructor record is 53 bytes long.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 13


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

A simple approach is to use the first 53 bytes for the first record, the next 53 bytes for the
second record, and so on as shown below figure.

There are two problems with this simple approach:


1. Unless the block size happens to be a multiple of 53 (which is unlikely), some records will
cross block boundaries. That is, part of the record will be stored in one block and part in
another. It would thus require two block accesses to read or write such a record.
2. It is difficult to delete a record from this structure. The space occupied by the record to be
deleted must be filled with some other record of the file, or we must have a way of marking
deleted records so that they can be ignored.

To avoid the first problem, we allocate only as many records to a block as would fit
entirely in the block (this number can be computed easily by dividing the block size by the
record size, and discarding the fractional part). Any remaining bytes of each block are left
unused.

To avoid the second problem, When a record is deleted, we could move the record that
came after it into the space formerly occupied by the deleted record, and so on, until every
record following the deleted record has been moved ahead shown in below figure.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 14


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Such an approach requires moving a large number of records. It might be easier simply
to move the final record of the file into the space occupied by the deleted record shown in
below figure.

It is undesirable to move records to occupy the space freed by a deleted record, since
doing so requires additional block accesses. Since insertions tend to be more frequent than
deletions, it is acceptable to leave open the space occupied by the deleted record, and to wait
for a subsequent insertion before reusing the space.
A simple marker on a deleted record is not sufficient, since it is hard to find this
available space when an insertion is being done. Thus, we need to introduce an additional
structure.
At the beginning of the file, we allocate a certain number of bytes as a file header. The
header will contain a variety of information about the file. For now, all we need to store there
is the address of the first record whose contents are deleted.
We use this first record to store the address of the second available record, and so on.
Intuitively, we can think of these stored addresses as pointers, since they point to the location of
a record. The deleted records thus form a linked list, which is often referred to as a free list.
Below figure shows with the free list, after records 1, 4, and 6 have been deleted. On
insertion of a new record, we use the record.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 15


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Variable-Length Records
Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields.
• Record types that allow repeating fields, such as arrays or multisets.
Different techniques for implementing variable-length records exist. Two different problems
must be solved by any such technique:
• How to represent a single record in such a way that individual attributes can be extracted
easily.
• How to store variable-length records within a block, such that records in a block can be
extracted easily.

Representation of variable-length record


The representation of a record with variable-length attributes typically has two parts: an
initial part with fixed length attributes, followed by data for variable length attributes. Fixed-
length attributes, such as numeric values, dates, or fixed length character strings are allocated
as many bytes as required to store their value. Variable-length attributes, such as varchar
types, are represented in theinitial part of the record by a pair (offset, length), where offset
denotes where the data for that attribute begins within the record, and length is the length in
bytes of the variable-sized attribute. The values for these attributes are stored consecutively,
after the initial fixed-length part of the record. Thus, the initial part of the record stores a fixed
size of information about each attribute, whether it is fixed-length or variable-length.
An example of such a record representation is shown in below figure. The figure shows
an instructor record,whose first three attributes ID, name, and dept name are variable-length
strings, and whose fourth attribute salary is a fixed-sized number. We assume that the offset
and length values are stored in two bytes each, for a total of 4 bytes per attribute. The salary
attribute is assumed to be stored in 8 bytes, and each string takes as many bytes as it has
characters.

The figure also illustrates the use of a null bitmap, which indicates which attributes of the
record have a null value. In this particular record, if the salary were null, the fourth bit of the
bitmap would be set to 1, and the salary value stored in bytes 12 through 19 would be ignored.

Storing variable-length records in a block


The slotted-page structure is commonly used for organizing records within a block,
There is a header at the beginning of each block, containing the following information:
1. The number of record entries in the header.
2. The end of free space in the block.
3. An array whose entries contain the location and size of each record.
The actual records are allocated contiguously in the block, starting from the end of the
block. The free space in the block is contiguous, between the final entry in the header array,

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 16


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

and the first record. If a record is inserted, space is allocated for it at the end of free space, and
an entry containing its size and location is added to the header.
If a record is deleted, the space that it occupies is freed, and its entry is set to deleted (its
size is set to −1, for example). Further, the records in the block before the deleted record are
moved, so that the free space created by the deletion gets occupied, and all free space is again
between the final entry in the header array and the first record. The end-of-free-space pointer
in the header is appropriately updated as well. Records can be grown or shrunk by similar
techniques, as long as there is space in the block. The cost of moving the records is not too
high, since the size of a block is limited: typical values are around 4 to 8 kilobytes.

5.10 Organization of Records in Files


A relation is a set of records. Given a set of records, the next question is how to organize
them in a file. Several of the possible ways of organizing records in files are:
• Heap file organization. Any record can be placed anywhere in the file where there is space
for the record. There is no ordering of records. Typically, there is a single file for each
relation.
• Sequential file organization. Records are stored in sequential order, according to the value
of a “search key” of each record.
• Hashing file organization. A hash function is computed on some attribute of each record.
The result of the hash function specifies in which block of the

Sequential File Organization

A sequential file is designed for efficient processing of records in sorted order based on
some search key. A search key is any attribute or set of attributes; it need not be the primary
key, or even a superkey. To permit fast retrieval of records in search-key order, we chain
together records by pointers. The pointer in each record points to the next record in search-key
order. Furthermore, to minimize the number of block accesses in sequential file processing, we
store records physically in search-key order, or as close to search-key order as possible.
Blow figure shows a sequential file of instructor records taken from our university example.
In that example, the records are stored in search-key order, using ID as the search key.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 17


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

It is difficult, however, to maintain physical sequential order as records are inserted and
deleted, since it is costly to move many records as a result of a single insertion or deletion. We
can manage deletion by using pointer chains, as we saw previously. For insertion, we apply
the following rules:
1. Locate the record in the file that comes before the record to be inserted in search-key order.
2. If there is a free record (that is, space left after a deletion) within the same block as this
record, insert the new record there. Otherwise, insert the new record in an overflow block. In
either case, adjust the pointers so as to chain together the records in search-key order.
Below figure shows the record after the insertion of the record (32222, Verdi, Music, 48000).

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 18


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

5.11 Introduction to Indexing

An index for a file in a database system works in much the same way as the index in this
textbook. If we want to learn about a particular topic (specified by a word or a phrase) in this
textbook, we can search for the topic in the index at the back of the book, find the pages where
it occurs, and then read the pages to find the information for which we are looking.
Database-system indices play the same role as book indices in libraries. For example, to
retrieve a student record given an ID, the database system would look up an index to find on
which disk block the corresponding record resides, and then fetch the disk block, to get the
appropriate student record.
There are two basic kinds of indices:
• Ordered indices. Based on a sorted ordering of the values.
• Hash indices. Based on a uniform distribution of values across a range of buckets. The
bucket to which a value is assigned is determined by a function, called a hash function.

Index Evaluation Metrics


• Access types: The types of access that are supported efficiently. Access types can include
finding records with a specified attribute value and finding records whose attribute values
fall in a specified range.
• Access time: The time it takes to find a particular data item, or set of items, using the
technique in question.
• Insertion time: The time it takes to insert a new data item. This value includes the time it
takes to find the correct place to insert the new data item, as well as the time it takes to
update the index structure.
• Deletion time: The time it takes to delete a data item. This value includes the time it takes
to find the item to be deleted, as well as the time it takes to update the index structure.
• Space overhead: The additional space occupied by an index structure. Provided that the
amount of additional space is moderate, it is usually worthwhile to sacrifice the space to
achieve improved performance.

An attribute or set of attributes used to look up records in a file is called a search key.

5.12 Ordered Indices


To gain fast random access to records in a file, we can use an index structure. Each
index structure is associated with a particular search key. Just like the index of a book or a
library catalog, an ordered index stores the values of the search keys in sorted order, and
associates with each search key the records that contain it.
A file may have several indices, on different search keys. If the file containing the
records is sequentially ordered, a clustering index is an index whose search key also defines
the sequential order of the file.
Clustering indices are also called primary indices; the term primary index may appear
to denote an index on a primary key, but such indices can in fact be built on any search key.
Indices whose search key specifies an order different from the sequential order of the
file are called nonclustering indices, or secondary indices.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 19


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Dense and Sparse Indices


An index entry, or index record, consists of a search-key value and pointers to one or more
records with that value as their search-key value. The pointer to a record consists of the
identifier of a disk block and an offset within the disk block to identify the record within the
block.
There are two types of ordered indices that we can use:
• Dense index: In a dense index, an index entry appears for every search-key value in the file.
In a dense clustering index, the index record contains the search-key value and a pointer to the
first data record with that search-key value. The rest of the records with the same search-key
value would be stored sequentially after the first record, since, because the index is a
clustering one, records are sorted on the same search key. In a dense nonclustering index, the
index must store a list of pointers to all records with the same search-key value.
• Sparse index: In a sparse index, an index entry appears for only some of the search-key
values. Sparse indices can be used only if the relation is stored in sorted order of the search
key, that is, if the index is a clustering index. As is true in dense indices, each index entry
contains a search-key value and a pointer to the first data record with that search-key value. To
locate a record, we find the index entry with the largest search-key value that is less than or
equal to the search-key value for which we are looking. We start at the record pointed to by
that index entry, and follow the pointers in the file until we find the desired record.

Dense index

Sparse index

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 20


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Multilevel Indices
If an index is small enough to be kept entirely in main memory, the search time to find
an entry is low. However, if the index is so large that not all of it can be kept in memory, index
blocks must be fetched from disk when required. (Even if an index is smaller than the main
memory of a computer, main memory is also required for a number of other tasks, so it may
not be possible to keep the entire index in memory.) The search for an entry in the index then
requires several disk-block reads.
We treat the index just as we would treat any other sequential file, and construct a
sparse outer index on the original index, which we now call the inner index, as shown in
below figure. Note that the index entries are always in sorted order, allowing the outer index
to be sparse. To locate a record, we first use binary search on the outer index to find the record
for the largest search-key value less than or equal to the one that we desire. The pointer points
to a block of the inner index. We scan this block until we find the record that has the largest
search-key value less than or equal to the one that we desire. The pointer in this record points
to the block of the file that contains the record for which we are looking.

Two level sparse index

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 21


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Index Update
Regardless of what form of index is used, every index must be updated whenever a record is
either inserted into or deleted from the file.
We first describe algorithms for updating single-level indices.
• Insertion. First, the system performs a lookup using the search-key value that appears in the
record to be inserted. The actions the system takes next depend on whether the index is dense
or sparse:
◦ Dense indices:
1. If the search-key value does not appear in the index, the system inserts an index entry
with the search-key value in the index at the appropriate position.
2. Otherwise the following actions are taken:
a. If the index entry stores pointers to all records with the same search key value,
the system adds a pointer to the new record in the index entry.
b. Otherwise, the index entry stores a pointer to only the first record with the
search-key value. The system then places the record being inserted after the other
records with the same search-key values.
◦ Sparse indices: We assume that the index stores an entry for each block. If the system
creates a new block, it inserts the first search-key value (in search-key order) appearing
in the new block into the index. On the other hand, if the new record has the least
search-key value in its block, the system updates the index entry pointing to the block;
if not, the system makes no change to the index.
• Deletion. To delete a record, the system first looks up the record to be deleted. The actions
the system takes next depend on whether the index is dense or sparse:
◦ Dense indices:
1. If the deleted record was the only record with its particular search-key value, then the
system deletes the corresponding index entry from the index.
2. Otherwise the following actions are taken:
a. If the index entry stores pointers to all records with the same search key value,
the system deletes the pointer to the deleted record from the index entry.
b. Otherwise, the index entry stores a pointer to only the first record with the
search-key value. In this case, if the deleted record was the first record with the
search-key value, the system updates the index entry to point to the next record.
◦ Sparse indices:
1. If the index does not contain an index entry with the search-key value of the deleted
record, nothing needs to be done to the index.
2. Otherwise the system takes the following actions:
a. If the deleted record was the only record with its search key, the system
replaces the corresponding index record with an index record for the next search-
key value (in search-key order). If the next search-key value already has an index
entry, the entry is deleted instead of being replaced.
b. Otherwise, if the index entry for the search-key value points to the record
being deleted, the system updates the index entry to point to the next record with
the same search-key value.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 22


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Secondary Indices
Secondary indices must be dense, with an index entry for every search-key value, and a
pointer to every record in the file. A clustering index may be sparse, storing only some of the
search-key values, since it is always possible to find records with intermediate search-key
values by a sequential access to a part of the file, as described earlier. If a secondary index
stores only some of the search-key values, records with intermediate search-key values may be
anywhere in the file and, in general, we cannot find them without searching the entire file.
We can use an extra level of indirection to implement secondary indices on search keys
that are not candidate keys. The pointers in such a secondary index do not point directly to the
file. Instead, each points to a bucket that contains pointers to the file.
Below figure shows the structure of a secondary index that uses an extra level of
indirection on the instructor file, on the search key salary.

5.13 B+ Tree Index Files


The main disadvantage of the index-sequential file organization is that performance
degrades as the file grows, both for index lookups and for sequential scans through the data.
The B+ tree index structure is the most widely used of several index structures that
maintain their efficiency despite insertion and deletion of data.
A B+ tree index takes the form of a balanced tree in which every path from the root of the
tree to a leaf of the tree is of the same length. Each non leaf node in the tree has between n/2
and n children, where n is fixed for a particular tree.

Structure of a B+ Tree

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 23


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 24


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Queries on B+ Trees

Let us consider how we process queries on a B+-tree. Suppose that we wish to find
records with a search-key value of V.
Intuitively, the function starts at the root of the tree, and traverses the tree down until it
reaches a leaf node that would contain the specified value if it exists in the tree. Specifically,
starting with the root as the current node, the function repeats the following steps until a leaf
node is reached. First, the current node is examined, looking for the smallest i such that search-
key value Ki is greater than or equal to V. Suppose such a value is found; then, if Ki is equal to
V, the current node is set to the node pointed to by Pi+1, otherwise Ki > V, and the current node
is set to the node pointed to by Pi. If no such value Ki is found, then clearly V > Km−1, where Pm
is the last non null pointer in the node. In this case the current node is set to that pointed to by
Pm. The above procedure is repeated, traversing down the tree until a leaf node is reached.
At the leaf node, if there is a search-key value equal to V, let Ki be the first such value;
pointer Pi directs us to a record with search-key value Ki. The function then returns the leaf
node L and the index i. If no search-key with value V is found in the leaf node, no record with
key value V exists in the relation, and function find returns null, to indicate failure.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 25


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Updates on B+ Trees
When a record is inserted into, or deleted from a relation, indices on the relation must
be updated correspondingly.

Insertion

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 26


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 27


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

Deletion

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 28


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 29


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

5.14 Hash Organization


For a huge database structure, it can be almost next to impossible to search all the index
values through all its level and then reach the destination data block to retrieve the desired
data. Hashing is an effective technique to calculate the direct location of a data record on the
disk without using index structure.
Hashing uses hash functions with search keys as parameters to generate the address of a data
record.
Hash Organization
• Bucket − A hash file stores data in bucket format. Bucket is considered a unit of
storage. A bucket typically stores one complete disk block, which in turn can store one
or more records.
• Hash Function − A hash function, h, is a mapping function that maps all the set of
search-keys K to the address where actual records are placed. It is a function from
search keys to bucket addresses.
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes
the same address. For example, if mod-4 hash function is used, then it shall generate only 5
values. The output address shall always be same for that function. The number of buckets
provided remains unchanged at all times.

Operation
• Insertion − When a record is required to be entered using static hash, the hash
function h computes the bucket address for search key K, where the record will be
stored.
Bucket address = h(K)
• Search − When a record needs to be retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is stored.
• Delete − This is simply a search followed by a deletion operation.
Bucket Overflow
The condition of bucket-overflow is known as collision. This is a fatal state for any static hash
function. In this case, overflow chaining can be used.
• Overflow Chaining − When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 30


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

• Linear Probing − When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.

Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and on-demand. Dynamic hashing is also
known as extended hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a
few are used initially.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 31


DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION

5.15 COMPARISON OF THREE FILE ORGANIZATIONS


Refer text book
Database Management Systems, 3/e, Raghurama Krishnan, Johannes Gehrke, TMH
Chapter 8 File Organizations & Indexes
Pages from 232 to 236.

Review Questions

1. Explain about the measures that are to be considered for comparing the performance of
various file organization techniques.
2. Explain in detail B+ tree file organization.
3. Write short notes on: i) Primary index ii) Clustered index iii) Secondary index.
4. Explain various anomalies that arise due to interleaved execution of transactions with
suitable examples.
5. What is static hashing? What rules are followed for index selection?
6. Define transaction and explain desirable properties of transactions.
7. What is database Recovery? Explain Shadow paging in detail.
8. Explain about Conflict Serializability and view serializability.
9. Explain the following a) Concurrent executions, b) Transaction states.

References:

• Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
• C.J. Date, Introduction to Database Systems, Pearson Education.
• Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.

ADITYA COLLEGE OF ENGINEERING AND TECHNOLOGY 32

You might also like