DBMS (R20)
DBMS (R20)
UNIT I: Introduction
Syllabus:
Introduction: Database system, Characteristics (Database Vs File System), Database
Users (Actors on Scene, Workers behind the scene), Advantages of Database systems,
Database applications. Brief introduction of different Data Models; Concepts of Schema,
Instance and data independence; Three tier schema architecture for data independence;
Database system structure, environment, Centralized and Client Server architecture for
the database.
Objectives:
After studying this unit, you will be able to:
Define database management system
Explain database system applications
State the characteristics and the database approach
Understand different data models
Discuss the advantages and disadvantages of database Discuss the database
architecture
DATABASE MANAGEMENT SYSTEMS UNIT – I : INTRODUCTION
Introduction
The information storage and retrieval has become very important in our day-to-day life. The
old era of manual system is no longer used in most of the places. For example, to book your
airline tickets or to deposit your money in the bank the database systems may be used. The
database system makes most of the operations automated. A very good example for this is the
billing system used for the items purchased in a super market. Obviously this is done with the
help of a database application package. Inventory systems used in a drug store or in a
manufacturing industry are some more examples of database. We can add similar kind of
examples to this list.
Apart from these traditional database systems, more sophisticated database systems are used in
the Internet where a large amount of information is stored and retrieved with efficient search
engines. For instance, [Link] is a famous web site that enables users to
search for their favorite information on the net. In a database we can store starting from text
data to very complex data like audio, video, etc.
Data
Data is the raw material from which useful information is derived. The word data is the plural
of Datum. Data is commonly used in both singular and plural forms. It is defined as raw facts
or observations. It takes variety of forms, including numeric data, text and voice and images.
Data is a collection of facts, which is unorganized but can be made organized into useful
information. The term Data and Information come across in our daily life and are often
interchanged.
Example: Weights, prices, costs, number of items sold etc.
Information
Data that have been processed in such a way as to increase the knowledge of the person who
uses the data. The term data and information are closely related. Data are raw material
resources that are processed into finished information products. The information as data that
has been processed in such way that it can increase the knowledge of the person who uses it.
In practice, the database today may contain either data or information.
Data Processing
The process of converting the facts into meaningful information is known as data processing.
Data processing is also known as information processing.
Metadata
Data that describe the properties or characteristics of other data.
Data is only become useful when placed in some context. The primary mechanism for
providing context for data is Metadata. Metadata are data that describe the properties, or
characteristics of other data. Some of these properties include data definition, data structures
and rules or constraints. The Metadata describes the properties of data but do not include that
data.
1. Not enough primary memory to process large data sets. If data is maintained in other
storage devices like disks, tapes and bringing relevant data to main memory, it increases
the cost of performance. Problem in accessing the large data due to addressing the data
using 32 bit or 64 bit mode addressing mechanism.
2. Programs must be written to process the user request to process the data stored in files
which are complex in nature because of large volume of data to be searched.
3. Inconsistent data and complexity in providing concurrent accesses.
4. Not sufficiently flexible to enforce security policies in which different users have
permission to access different subsets of the data.
A DBMS is a piece of software that is designed to make the preceding tasks easier. By storing
data in a DBMS, rather than as a collection of operating system Files, we can use the DBMS's
features to manage the data in a robust and efficient manner.
One of the main advantages of using a database management system is that the organization
can exert via the DBA, centralized management and control over the data. The database
administrator is the focus of the centralized control.
The following are the major advantages of using a Database Management System (DBMS):
Data independence: Application programs should be as independent as possible from details
of data representation and storage. The DBMS can provide an abstract view of the data to
insulate application code from such details.
Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store and
retrieve data efficiently. This feature is especially important if the data is stored on external
storage devices.
Data integrity and security: The DBMS can enforce integrity constraints on the data. The
DBMS can enforce access controls that govern what data is visible to different classes of users.
Data administration: When several users share the data, centralizing the administration of data
can offer significant improvements. It can be used for organizing the data representation to
minimize redundancy and for fine-tuning the storage of the data to make retrieval efficient.
Concurrent access and crash recovery: A DBMS schedules concurrent accesses to the data in
such a manner that users can think of the data as being accessed by only one user at a time.
Further, the DBMS protects users from the effects of system failures. .
Reduced application development time: Clearly, the DBMS supports many important
functions that are common to many applications accessing data stored in the DBMS.
The disadvantage of the DBMS system is overhead cost. The processing overhead introduced
by the DBMS to implement security, integrity, and sharing of the data causes a degradation of
the response and throughput times. An additional cost is that of migration from a traditionally
separate application environment to an integrated one.
Even though centralization reduces duplication, the lack of duplication requires that the
database be adequately backup so that in the case of failure the data can be recovered.
Backup and recovery operations are complex in a DBMS environment, and this is an increment
in a concurrent multi-user database system. A database system requires a certain amount of
controlled redundancies and duplication to enable access to related data items.
A schema is a description of a particular collection of data, using the given data model. The
relational model of data is the most widely used model today.
Conceptual (high-level, semantic) data models: Provide concepts that are close to the way
many users perceive data (Also called entity-based or object-based data models).
Physical (low-level, internal) data models: Provide concepts that describe details of how data
is stored in the computer.
Implementation (representational) data models: Provide concepts that fall between the
above two.
1. Hierarchical models:
Advantages:
Hierarchical model is simple to construct and operate on.
Corresponds to a number of natural hierarchical organized domains – e.g., assemblies in
manufacturing, personal organization in companies.
Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN
PARENT etc.,
Disadvantages:
Navigational and procedural nature of processing.
Database is visualized as a linear arrangement of records.
Little scope for “query optimization”.
One-to-many relationships.
2. Network model:
Advantages:
3. Relational model:
A relation, basically a table with rows and columns.
Every relation has a schema, which describes the columns, or fields.
Student information in a university database may be stored in a relation with the following
schema
Students (sid: string, name: string, login: string, age: integer, gpa: real)
Conceptual schema:
The conceptual schema(also called as logical schema) describes the stored data in terms of the
data model of the DBMS.
In a relational DBMS, the conceptual schema describes all relations that are stored in the
database.
In our sample university database, these relations contain information about entities, such as
students and faculty, and about relationships, such as students’ enrollment in courses.
Students(sid: string, name: string, login: string, age: integer, gpa:
real)
Faculty(fid: string, fname: string, salary : real)
Courses(cid: string, cname: string, credits: integer)
Rooms(nw: integer, address: string, capacity: integer)
Enrolled (sid: string, cid: string, grade: string)
Teaches (fid: string, cid: string)
The choice of relations, and the choice of fields for each relation, is not always obvious, and
the process of arriving at a good conceptual schema is called conceptual database design.
Physical Schema:
External Schema:
This schema allows data access to be customized at the level of individual users or groups of
users.
A database has exactly one conceptual schema and one physical schema, but it may have
several external schemas.
An external schema is a collection of one or more views and relations from the conceptual
schema.
A view is conceptually a relation, but the records in a view are not stored in the DBMS.
metadata.
4. Transaction Manager: Ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
5. File Manager: Manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
6. Buffer Manager: Is responsible for fetching data from disk storage into main memory and
deciding what data to cache in memory.
Also some data structures are required as part of the physical system implementation:
1. Data Files: The data files store the database by itself.
2. Data Dictionary: It stores metadata about the structure of the database, as it is used heavily.
3. Indices: It provides fast access to data items that hold particular values.
4. Statistical Data: It stores statistical information about the data in the database. This
information used by the query processor to select efficient ways to execute a query.
In addition to end users and implementors, two other classes of people are associated with a
DBMS: application programmers and database administrators (DBAs).
Database application programmers develop packages that facilitate data access for end users,
who are usually not computer professionals, using the host or data languages and software tools
that DBMS vendors provide.
The task of designing and maintaining the database is entrusted to a professional called the
database administrator.
Design of the conceptual and physical schemas: The DBA is responsible for interacting with
the users of the system to understand what data is to be stored in the DBMS and how it is
likely to be used. Based on this knowledge, the DBA must design the conceptual schema
(decide what relations to store) and the physical schema (decide how to store them).
Security and authorization: The DBA is responsible for ensuring that unauthorized data
access is not permitted. In general, not everyone should be able to access all the data. In a
relational DBMS, users can be granted permission to access only certain views and relations.
Data availability and recovery from failures: The DBA must take steps to ensure that if the
system fails, users can continue to access as much of the uncorrupted data as possible.
Database tuning: The needs of users are likely to evolve with time. The DBA is responsible
for modifying the database, in particular the conceptual and physical schemas, to ensure
adequate performance as user requirements change.
Defining a database involves specifying the data types, structures, and constraints for the data to
be stored in the database.
Constructing the database is the process of storing the data itself on some storage medium that is
controlled by the DBMS.
Manipulating a database includes such functions as querying the database to retrieve specific
data, updating the database to reflect changes in the mini world, and generating reports from the
data.
Sharing a database allows multiple users and programs to access the database concurrently.
Other important functions provided by the DBMS include protecting the database and maintaining
it over a long period of time.
Protection includes both system protection against hardware or software malfunction (or crashes),
and security protection against unauthorized or malicious access. A typical large database may
have a life cycle of many years, so the DBMS must be able to maintain the database system by
allowing the system to evolve as requirements change over time. We can call the database and
DBMS software together a database system.
The architecture of a DBMS can be seen as either single tier or multi-tier. The tiers are classified as
follows:
1-tier architecture
2-tier architecture
3-tier architecture
n-tier architecture
1-tier architecture:
One-tier architecture involves putting all of the required components for a software application
or technology on a single server or platform.
2-tier architecture:
The two-tier is based on Client Server architecture. The two-tier architecture is like client server
application. The direct communication takes place between client and server. There is no
intermediate between client and server.
3-tier architecture:
A 3-tier architecture separates its tiers from each other based on the complexity of the users and
how they use the data present in the database. It is the most widely used architecture to design a
DBMS.
As prices of hardware declined, most users replaced their terminals with personal computers
(PCs) and workstations. At first, database systems used these computers in the same way as they
had used display terminals, so that the DBMS itself was still a centralized DBMS in which all the
DBMS functionality, application program execution, and user interface processing were carried
out on one machine.
Gradually, DBMS systems started to exploit the available processing power at the user side,
which led to client/server DBMS architectures.
The resources provided by specialized servers can be accessed by many client machines. The
client machines provide the user with the appropriate interfaces to utilize these servers, as well as
with local processing power to run local applications. This concept can be carried over to
software, with specialized software-such as a DBMS or a CAD (computer-aided design) package
being stored on specific server machines and being made accessible to multiple clients.
requires access to additional functionality-such as database access-that does not exist at that
machine, it connects to a server that provides the needed functionality.
A server is a machine that can provide services to the client machines, such as file access,
printing, archiving, or database access. In the general case, some machines install only client
software, others only server software, and still others may include both client and server
software. However, it is more common that client and server software usually run on separate
machines.
In client/server architecture, the user interface programs and application programs can run on
the client side. When DBMS access is required, the program establishes a connection to the DBMS
(which is on the server side); once the connection is created, the client program can communicate
with the DBMS. A standard called Open Database Connectivity (ODBC) provides an application
programming interface (API), which allows client-side programs to call the DBMS, as long as
both client and server machines have the necessary software installed. Most DBMS vendors
provide ODBC drivers for their systems.
Review Questions
13. Who are the different database users? Explain their interfaces to database management
system.
14. Describe the client server architecture for the database with necessary diagram.
15. Define Schema. Explain three level architecture in DBMS.
16. Explain Data Independence and its types in detail.
17. How does DBMS provide data abstraction? Explain the concept of data independence.
18. With a neat diagram describe the overall system structure of DBMS.
References:
Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
C.J. Date, Introduction to Database Systems, Pearson Education.
Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.
Syllabus:
Relational Model: Introduction to relational model, concepts of domain, attribute, tuple,
relation, importance of null values, constraints (Domain, Key constraints, integrity constraints)
and their importance
BASIC SQL: Simple Database schema, data types, table definitions (create, alter), different
DML operations (insert, delete, update), basic SQL querying (select and project) using where
clause, arithmetic & logical operations, SQL functions (Date and Time, Numeric, String
conversion).
Objectives:
After studying this unit, you will be able to:
Describe Relational model
Describe SQL and data types
Explain the basic structures of SQL queries
Know how to create tables
Realise aggregate functions and null values
DATABASE MANAGEMENT SYSTEMS UNIT – II : RELATIONAL MODEL & SQL
2.1 Introduction
Relational Model was proposed by E.F Codd to model data in the form of relations or tables. After
designing the conceptual model of database using ER diagram, we need to convert the conceptual
model in the relational model which can be implemented using any RDBMS (Relational Data Base
Management System) like SQL, MY SQL etc.
The relational model is very simple and elegant; a database is a collection of one or more relations,
where each relation is a table with rows and columns.
This simple tabular representation enables even new users to understand the contents of a
database, and it permits the use of simple, high-level languages to query the data.
ROLL_NO
1
Null values: The value which is not known or unavailable is called NULL VALUE. It is represented
by blank space.
Cardinality: The number of tuples are present in the relation is called as its cardinality.
Relation
A relation is defined as a set of tuples and attributes.
A relation consists of Relation schema and relation instance.
Relation schema: A relation schema represents the name of the relation with its attributes.
Ex: STUDENT (ROLL_NO, NAME, ADDRESS, PHONE and AGE) is Relation schema for
STUDENT.
Relation instance: The set of tuples of a relation at a particular instance of a time is called
Relation Instance.
An instance of „Employee „relation
2.5 Constraints
On modeling the design of the relational data base, we can put some rules(conditions) like
what values are allowed to be inserted in the relation
Constraints are the rules enforced on the data columns of a table. These are used to limit the
type of data that can go in to a table
This Ensure the accuracy and reliability of the data in the database. Constraints could be
either on a column level on a table level.
1. Not Null:
Null represents a record where data may be missing data or data for that record may be optional.
Once not null is applied to a particular column, you cannot enter null values to that column.
A not null constraint cannot be applied at table level.
Example:
Create table EMPLOYEE (id int Not null, name varchar Not null, Age int
not null, address char (25), salary decimal (18,2), primary key(id));
In the above example we have applied not null on three columns id, name and age which
means whenever a record is entered using insert statement all three columns should contain a
value other than null.
We have two other columns address and salary, where not null is not applied which means
that you can leave the row as empty.
2. Unique:
Some times we need to maintain only. Unique data in the column of a database table, this is
possible by using a Unique constraint.
Example:
Create table PERSONS (id int unique, last_name varchar (25) not null,
First name varchar (25), age int);
In the above example, as we have used unique constraint on ID column we are not supposed
to enter the data that is already present, simply no two ID values are same.
3. Default:
When a column is specified as default with same value then all the rows will use the same
value i.e., each and every time while entering the data we need not enter that value.
But default column value can be customised i.e., it can be over ridden when inserting a data
for that row based on the requirement.
Example:
Create table EMPLOYEE (id int Not null, last_name varchar (25) Not null,
first_name varchar (25), Age int, city varchar (25) Default Hyderabad);
As a result, whenever you insert a new row each time you need not enter a value for this
default column that is entering a column value for a default column is optional.
4. Check:
Check constraint ensures that the data entered by the user for that column is within the range
of values or possible values specified.
Example: Create table STUDENT (id int, name varchar (25), age int,
check(age>=18));
As we have used a check constraint as (age>=18) which means value entered by user for this
age column while inserting the data must be less than or equal to 18.
5. Primary Key:
A primary key is a constraint in a table which uniquely identifies each row record in a
database table by enabling one or more column in the table as primary key.
A particular column is made as a primary key column by using the primary key keyword
followed by the column name.
Example:
Create table EMP (ID int, name varchar (20), age int, course varchar
(10), Primary key (ID));
Here we have used the primary key on ID column then ID column must contain unique
values i.e., one ID cannot be used for another student.
6. Foreign Key:
The foreign key constraint is a column or list of columns which points to the primary key
column of another table.
The main purpose of the foreign key is only those values are allowed in the present table that
will match to the primary key column of another table.
From the above two tables, COURSE_ID is a primary key of the table STUDENT_MARKS and also
behaves as a foreign key as it is same in STUDENT_DETAILS and STUDENT_MARKS.
Example:
(Reference Table)
Create table CUSTOMER1 (id int, name varchar (25), course varchar (10),
primary key (ID));
(Child table)
These constraints are used to ensure the uniqueness of each record or row in the data
table.
Entity Integrity constraints says that no primary key can take NULL VALUE, since
using primary key we identify each tuple uniquely in a relation.
Example:
Explanation:
In the above relation, EID is made primary key, and the primary key can‟t
take NULL values but in the 3rd tuple, the primary key is NULL, so it is
violating Entity integrity constraints.
The referential integrity constraint is specified between two relations or tables and used
to maintain the consistency among the tuples in two relations.
This constraint is enforced through foreign key, when an attribute in the foreign key of
relation R1 have the same domain as primary key of relation R2, then the foreign key of
R1 is said to reference or refer to the primary key of relation R2.
The values of the foreign key in a tuple of relation R1 can either take the values of the
primary key for some tuple in Relation R2, or can take NULL values, but can‟t be empty.
Explanation:
In the above, DNO of the first relation is the foreign key and DNO in the second relation is the
primary key
DNO=22 in the foreign key of the first relation is not available in the second relation so, since
DNO=22 is not defined in the primary key of the second relation therefore Referential
integrity constraints is violated here.
SQL stands for Structure Query Language it is used for storing and managing data in
relational database management system.
It is standard language for relational database system. It enables a user to create, read, update
and delete relational databases and tables.
All the RDBMS like MYSQL, Oracle, MA access and SQL Server use SQL as their standard
database language.
SQL allows users to Query the database in a number of ways using statements like common
English.
Rules: SQL follows following rules
2. Data Manipulation Language (DML): used to update, store and retrieve data from tables.
3. Data Control Language (DCL): used to control the access of database created using DDL and
DML.
Every column is required to have a name and data type in the database table.
SQL DATA
TYPES
Binary data Numeric data Extract String data Date data type
type type numeric data type
type
1. BINARY DATATYPES:
There are three types of binary data types which are given below
2. NUMERIC DATATYPE:
DATA TYPE FROM TO DESCRIPTION
5. STRING DATATYPE:
DATA TYPE DESCRIPTION
Char It has a maximum length of 8000 characters. It contains fixed-length non-
Unicode characters.
Varchar It has a maximum length of 8000 characters. It contains variable-length
non-Unicode characters.
Text It has a maximum length of 2,147,483,647 characters. It contains variable-
length non-Unicode characters.
[Link] table: SQL create table is used to create a table in the database. To define the table, you
should define the name of the table and also define its column and column‟s data type.
SYNTAX:
“column2” “datatype”,
“column3” “datatype”,
….
“column N” “datatype”);
EXAMPLE:
SQL > create table employee (emp_id int, emp_name varchar (25), phone_no int,
address char (30));
If you create the table successfully, you can verify the table by looking at the message by the
sql server. else you can use DESC command as follows
2. ALTER TABLE:
SYNTAX:
EXAMPLE:
3. DROP TABLE:
SYNTAX :
EXAMPLE:
1. Insert:
SQL insert statement is a sql query. It is used to insert a single multiple records in a table.
Syntax:
NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Ramya 503 Nellore
2. Update:
The SQL Commands update are used to modify the data that is already in the database.
SQL Update statement is used to change the data of records held by tables which rows is to
be update, it is decided by condition to specify condition, we use “WHERE” clause.
The update statement can be written in following form:
Syntax:
Update table_name set column_name=expression where condition;
Example:
Let‟s take an example: here we are going to update an entry in the table.
NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Rasi 503 Nellore
3. Delete:
The SQL delete statement is used to delete rows from a table.
Generally, delete statement removes one or more records from a table.
Syntax:
Example:
NAME ID CITY
Deepti 502 Guntur
Rasi 503 Nellore
2.18 Basic SQL querying (select and project) using where clause:
The following are the various SQL clauses:
SQL Clause
1. Group by:
SQL group by statement is used to arrange identical data into groups.
The group by statement is used with the SQL select statement.
The group by statement follows the WHERE clause in a SELECT statement and precedes the
ORDER BY clause.
Syntax:
Select column from table_name where column group by column, order by
column;
Example:
Select company count (*) from product group by company;
Output:
Com 1 2
Com 2 3
Com 3 5
2. Having clause:
Having clause is used to specify a search condition for a group or an aggregate.
Having clause is used in a group by clause, if you are not using group by clause then you can
use having function like a where clause.
Syntax:
Select column1, column2 from table_name
Where conditions
Having conditions
Example:
select company count (*) from product
Group by company
Output:
Com 3 5
Com 2 2
3. Order by clause:
The order by clause sorts the result _set in ascending or descending order.
Syntax:
Where condition
Sample table:
Example:
Output:
NAME ID CITY
Alekhya 501 Hyderabad
Deepti 502 Guntur
Rasi 503 Nellore
Syntax:
Select column1, column2, …………column from table_name where[condition];
= Equal to
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
<> Not equal to
SQL operators:
SQL statements generally contain some reserved words or characters that are used to
perform operations such as arithmetic and logical operations etc. Their reserved words are
known as operators.
+ Addition
- Subtraction
/ Division
* Multiplication
% modulus
1. Addition (+):
It is used to perform addition operation on data items.
Sample table:
EMP_ID EMP_NAME SALARY
1 Alex 25000
2 John 55000
3 Daniel 52000
4 Sam 12312
Output:
EMP_ID EMP_NAME SALARY SALARY+100
1 Alex 25000 25100
2 John 55000 55100
3 Daniel 52000 52100
4 Sam 12312 12412
Here we have done addition of 100 to each emp‟s salary.
2. Subtraction (-):
It is used to perform subtraction on the data items.
Example:
Select emp_id, emp_name, salary, salary-100 as “salary-100” from
subtraction;
EMP_ID EMP_NAME SALARY SALARY-100
1 Alex 25000 24900
2 John 55000 54900
3 Daniel 52000 51900
4 Sam 90000 89900
Here we have done subtraction of 100 for each emp‟s salary.
3. Division (/):
The division function is used to integer division (x is divided by y).an integer value is
returned.
Example:
Select emp_id, emp_name, salary, salary/100 as “salary/100” from
division;
4. Multiplication (*):
It is used to perform multiplication of data items.
Select emp_id, emp_name, salary, salary*100 as “salary*100” from
multiplication;
5. Modulus (%):
It is used to get remainder when one data is divided by another.
Select emp_id, emp_name, salary, salary%25000 as “salary%25000” from
modulus;
Output:
EMP_ID EMP_NAME SALARY SALARY%25000
1 Alex 25000 0
2 John 55000 5000
3 Daniel 52000 2000
4 Sam 90000 15000
Here we have done modulus operation to each emp‟s salary.
The following example finds all employees where salaries are greater than the 5000 and less
than 7000.
Select first_name, last_name, salary from employees where
salary>5000 AND salary<7000 order by salary;
Output:
FIRST_NAME LAST_NAME SALARY
John Wesley 6000
Eden Daniel 6000
Luis Popp 6900
Shanta Suji 6500
2. ALL:
The ALL operator compares a value to all values in another value set.
The following example finds all employees whose salaries are greater than all salaries of
employees.
EX:
select first_name, last_name, salary from employees where salary>=ALL
(select salary from employees where department_id =8) order by salary
DESC;
Output:
FIRST_NAME LAST_NAME SALARY
Steven King 24000
John Russel 17000
Neena Kochhar 14000
3. ANY:
The ANY operator compares a value to any value in a set ascending to condition.
The following example statement finds all employees whose salaries are greater than the average
salary of every department.
EX:
select first_name, last_name, salary from employees where salary >ANY
(select avg (salary) from employees‟ group by department_id) order by
first_name, last_name;
Output:
FIRST_NAME LAST_NAME SALARY
Alexander Hunold 9000.00
Charles Johnson 6200.00
David Austin 4800.00
Eden Flip 9000.00
4. Between:
The between operator searches for values that are within a set of values.
For example, the following statement finds all employees where salaries are between 9000
and 12000.
EX:
Output:
FIRST_NAME LAST_NAME SALARY
Alexander Hunold 9000.00
Den Richards 10000.00
Nancy Prince 12000.00
5. IN:
The IN operator compares a value to list of specified values. The IN operator return true if
compared value matches at least one value in the list.
The following statement finds all employees who work in department _id 8 or 9.
EX:
Output:
FIRST_NAME LAST_NAME DEPARTMENT_ID
John Russel 8
Jack Livingstone 8
Steven King 9
Neena Kochhar 9
6. Exists:
The EXISTS operator tests if a sub query contains any rows.
For example, the following statement finds all employees who have dependents.
select first_name, last_name from employees where EXISTS (select 1
from dependent d where d.employee_id=e.employee_id);
FIRST_NAME LAST_NAME
Steven King
Neena Kochhar
Alexander Hunold
2.22 SQL FUNCTIONS (Date & Time, Numeric, Aggregate, String conversions):
Output: 05-DEC-2021.
ADD_MONTHS: This function returns a date after adding data with specified no of months.
Output: 31-MAR-17.
Output: 05-MAR-22.
Output: 05-DEC-2021.
NEXT_DAY: This function represents both day and date and returns the day of the next given day.
Output: 07-DEC-21.
Output: 31-DEC-21.
Output: -4.
ROUND: It gives the nearest value or round off value for the argument pass. (or) It returns a date
rounded to a specific unit of measure.
Output: 01-JAN-22.
TRUNC: This function returns the date with the time(co-efficient) portion of the date truncated to
the unit specified.
Output: 01-DEC-21.
TO_DATE: This function converts date which is in the character string to a date value.
Output: 01-JAN-17.
Output: 05 12 2021.
LEAST: This function displays the oldest date present in the argument list.
Output: 01-MAR-21.
GREATEST: This function displays the latest date present in the argument list.
Output: 28-DEC-21.
[Link] ()
[Link] ()
[Link] ()
[Link] ()
[Link] ()
From table_name
Where condition);
[Link] (): It will add/ sum all the column values in the query.
From table_name
Where condition);
[Link] (): Avg function used to calculate average values of the set of rows.
From table_name
Where condition);
[Link] (): This function is used to find maximum value from the set of values.
From table_name
Where condition);
[Link] (): This function is used to find minimum value from the set of values.
From table_name
Where condition);
Numeric functions are used to perform operations on numbers and return numbers.
OUTPUT: 243.5
4. CEIL (): It returns the smallest integer value that is a greater than or equal to a number.
EX: select CEIL (25.77) from dual;
OUTPUT: 26
5. FLOOR (): It returns the largest integer value that is a less than or equal to a number.
EX: select FLOOR (25.75) from dual;
OUTPUT: 25
6. TRUNCATE (): This does not work for SQL server. It returns the truncated to 2 places right of the
decimal point.
EX: select TRUNCATE (7.53635, 2) from dual;
OUTPUT: 7.53
7. MOD (): It returns the remainder when two numbers are divided.
EX: select MOD (55,2) from dual;
OUTPUT: 1.
8. ROUND (): This function rounds the given value to given number of digits of precision.
EX: select ROUND (14.5262,2) from dual;
OUTPUT: 14.53.
9. POWER (): This function gives the value of m raised to the power of n.
EX: select POWER (4,9) from dual;
OUTPUT: 262144.
10. SQRT (): This function gives the square root of the given value n.
EX: Select SQRT (576) from dual;
OUTPUT: 24.
11. LEAST (): This function returns least integer from given set of integers.
OUTPUT: 1.
12. GREATEST (): This function returns greatest integer from given set of integers.
OUTPUT: 22
String Functions are used to perform an operation on input string and return the output string.
Following are the string functions
[Link] (): This function is used to add two words (or) strings.
[Link] (): This function is used to convert the given string into lowercase.
OUTPUT: database
[Link] (): This function is used to convert the lowercase string into uppercase.
OUTPUT: DATABASE
[Link] (): This function is used to make the given string of the given size by adding the given symbol.
OUTPUT: 00system
[Link] (): This function is used to make the given string as long as the given size by adding the given
symbol on the right.
OUTPUT: system00
[Link] (): This function is used to cut the given substring from the original string.
OUTPUT: base
[Link] (): This function is used to cut the given substring from the original string.
OUTPUT: data.
9. INITCAP (): This function returns the string with first letter of each word starts with uppercase.
10. LENGTH (): Tis function returns the length of the given string.
OUTPUT: 11.
[Link] (): This function returns a portion of a string beginning at the character position.
OUTPUT: AM.
[Link] (): This function returns a string after replacing some set of characters into another set.
Review Questions:
References:
Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
C.J. Date, Introduction to Database Systems, Pearson Education.
Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.
Syllabus:
Entity Relationship Model: Introduction, Representation of entities, attributes, entity set,
relationship, relationship set, constraints, sub classes, super class, inheritance, specialization,
generalization using ER Diagrams.
SQL: Creating tables with relationship, implementation of key and integrity constraints,
nested queries, sub queries, grouping, aggregation, ordering, implementation of different
types of joins, view(updatable and non-updatable), relational set operations.
Objectives:
After studying this unit, you will be able to:
Explain entity relationship model
Know the constraints
Describe E-R diagrams and extended ER features
Describe subqueries and nested subqueries
Explain complex queries
Define views
Discuss joined relations
DATABASE MANAGEMENT SYSTEMS UNIT – III : ER MODEL
3.1 Introduction
The entity-relationship (ER) data model allows us to describe the data involved in a real-world
enterprise in terms of objects and their relationships and is widely used to develop an initial
database design.
The ER model is important primarily for its role in database design. It provides useful concepts
that allow us to move from an informal description of what users want from their database to a
more detailed and precise, description that can be implemented in a DBMS.
Even though the ER model describes the physical database model, it is basically useful in the
design and communication of the logical database model.
Requirements Analysis
The very first step in designing a database application is to understand what data is to be stored in
the database, what applications must be built on the database, and what operations must be
performed on the database. In other words, we must find out what the users want from the
database. This process involves discussions with user groups, a study of the current operating
environment, how it is expected to change an analysis of any available documentation on existing
applications and so on.
The information gathered in the requirement analysis step is used to develop a high-level
description of the data to be stored in the database, along with the conditions known to hold this
data. The goal is to create a description of the data that matches both—how users and developers
think of the data (and the people and processes to be represented in the data). This facilitates
discussion among all the people involved in the design process i.e., developers and as well as
users who have no technical background. In simple words, the conceptual database design phase
is used in drawing ER model.
We must implement our database design and convert the conceptual database design into a
database schema (a description of data) in the data model (a collection of high-level data
description constructs that hide many low-level storage details) of the DBMS. We will consider
only relational DBMSs, and therefore, the task in the logical design step is to convert the
conceptual database design in the form of E-R Schema (Entity-Relationship Schema) into a
relational database schema.
Schema Refinement
The fourth step in database design is to analyze the collection, of relations (tables) in our relational
database schema to identify future problems, and to refine (clear) it.
This step may simply involve building indexes on some tables and clustering some tables, or it
may involve redesign of parts of the database schema obtained from the earlier design steps.
Any software project that involves a DBMS must consider applications that involve processes and
identify the entities.
Attribute: An attribute describes a property associated with entities. Attribute will have a name
and a value for each entity.
Domain: A domain defines a set of permitted values for an attribute
Entity Relationship Model: An ERM is a theoretical and conceptual way of showing data
relationships in software development. It is a database modeling technique that generates an
abstract diagram or visual representation of a system's data that can be helpful in designing a
relational database.
ER model allows us to describe the data involved in a real-world enterprise in terms of objects and
their relationships and is widely used to develop an initial database design.
ATTRIBUTES: Attributes are the properties of entities. Attributes are represented by means of
ellipses. Every ellipse represents one attribute and is directly connected to its entity.
Types of attributes:
Simple attribute − Simple attributes are atomic values, which cannot be divided further. For
example, a student's roll number is an atomic value.
Composite attribute − Composite attributes are made of more than one simple attribute. For
example, a student's complete name may have first_name and last_name.
Derived attribute − Derived attributes are the attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. For
example, average_salary in a department should not be saved directly in the database, instead
it can be derived. For another example, age can be derived from data_of_birth.
Degree of Relationship is the number of participating entities in a relationship defines the degree of the
relationship. Based on degree the relationships are categorized as
Unary = degree 1
Binary = degree 2
Ternary = degree 3
n-array = degree
Unary Relationship: A relationship with one entity set. It is like a relationship among 2 entities of
same entity set. Example: A professor ( in-charge) reports to another professor (Head Of the Dept).
Binary Relationship: A relationship among 2 entity sets. Example: A professor teaches a course
and a course is taught by a professor.
Ternary Relationship: A relationship among 3 entity sets. Example: A professor teaches a course
in so and so semester.
Cardinality:
Defines the number of entities in one entity set, which can be associated with the number of
entities of other set via relationship set. Cardinality ratios are categorized into 4. They are.
1. One-to-One relationship: When only one instance of entities is associated with the
relationship, then the relationship is one-to-one relationship. Each entity in A is associated
with at most one entity in B and each entity in B is associated with at most one entity in A.
2. One-to-many relationship: When more than one instance of an entity is associated with a
relationship, then the relationship is one-to-many relationship. Each entity in A is associated
with zero or more entities in B and each entity in B is associated with at most one entity in A.
3. Many-to-one relationship: When more than one instance of entity is associated with the
relationship, then the relationship is many-to-one relationship. Each entity in A is associated
with at most one entity in B and each entity in B is associated with 0 (or) more entities in A.
4. Many-to-Many relationship: If more than one instance of an entity on the left and more than
one instance of an entity on the right can be associated with the relationship, then it depicts
many-to-many relationship. Each entity in A is associated with 0 (or) more entities in B and
each entity in B is associated with 0 (or) more entities in A.
Relationship Set:
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can
have attributes. These attributes are called descriptive attributes.
Participation Constraints:
Total Participation − If Each entity in the entity set is involved in the relationship then the
participation of the entity set is said to be total. Total participation is represented by double
lines.
Partial participation − If, Not all entities of the entity set are involved in the relationship then
such a participation is said to be partial. Partial participation is represented by single lines.
Example:
Consider a relationship set called Manages between the Employees and Departments entity sets
such that each department has at most one manager, although a single employee is allowed to
manage more than one department. The restriction that each department has at most one manager
is an example of a key constraint, and it implies that each Departments entity appears in at most
one Manages relationship in any allowable instance of Manages. This restriction is indicated in the
ER diagram of below Figure by using an arrow from Departments to Manages. Intuitively, the
arrow states that given a Departments entity, we can uniquely determine the Manages relationship
in which it appears.
If an entity set E has a key constraint in a relationship set R, each entity in an instance of E appears
in at most one relationship in (a corresponding instance of) R. To indicate a key constraint on entity
set E in relationship set R, we draw an arrow from E to R.
Below figure show a ternary relationship with key constraints. Each employee works in at most one
department, and at a single location.
Weak Entities
Strong Entity set: If each entity in the entity set is distinguishable or it has a key then such an entity
set is known as strong entity set.
Weak Entity set: If each entity in the entity set is not distinguishable or it doesn't has a key then
such an entity set is known as weak entity set.
eno is key so it is represented by solid underline. dname is partial key. It can't distinguish the
tuples in the Dependent entity set. so dname is represented by dashed underline.
Weak entity set is always in total participation with the relation. If entity set is weak then the
relationship is also known as weak relationship, since the dependent relation is no longer needed
when the owner left.
Ex: policy dependent details are not needed when the owner (employee) of that policy left or fired
from the company or expired. The detailed ER Diagram is as follows.
The cardinality of the owner entity set is with weak relationship is 1 : m. Weak entity set is
uniquely identifiable by partial key and key of the owner entity set.
Dependent entity set is key to the relation because the all the tuples of weak entity set are
associated with the owner entity set tuples.
Dependents is an example of a weak entity set. A weak entity can be identified uniquely only by
considering some of its attributes in conjunction with the primary key of another entity, which is
called the identifying owner.
The following restrictions must hold:
The owner entity set and the weak entity set must participate in a one-to-many relationship
set (one owner entity is associated with one or more weak entities, but each weak entity has
a single owner). This relationship set is called the identifying relationship set of the weak
entity set.
The weak entity set must have total participation in the identifying relationship set
Now we are in a position to write the ER diagram for the Company database which was introduced
in the beginning of this unit. The readers are strictly advised to follow the steps shown in this unit
to design an ER diagram for any chosen problem.
After careful analysis of the problem we come to a conclusion that there are four possible entity sets
as shown below:
1. Employees Strong Entity Set
2. Departments Strong Entity Set
3. Projects Strong Entity Set
4. Dependents Weak Entity Set
The next step is to get all the attributes that are most applicable for each entity set. Do this work by
considering each entity set in mind and also the type of attributes. Next job is to pick the primary
key for strong entity sets and partial key for weak entity sets.
The underlined attributes are the primary keys and DepName is the partial key of Dependents.
Also, DLocation may be treated as a multivalued attribute.
In this step we need to find all the meaningful relationship sets among possible entity sets. This
step is very tricky, as redundant relationships may lead to complicated design and in turn a bad
implementation.
Example: Let us show below what the possible relationship sets are:
1. Employees and Departments WorksFor
2. Employees and Departments Manages
3. Departments and Projects Controls
4. Projects and Employees WorksOn
5. Dependents and Employees Has
6. Employees and Employees Supervises
Some problems may not have recursive relationship sets but some do have. In fact, our Company
database has one such relationship set called Supervises. You can complete this step adding
possible descriptive attributes of the relationship sets (Manages has StartDate and WorksOn has
Hours).
This step is relatively a simple one. Simply apply the business rules and your common sense. So,
we write the structural constraints for our example as follows:
The last step is to look for “is-a” and “has-a” relationships sets for the given problem. As far as
the Company database is concerned, there are no generalization and aggregation relationships
in the Company database.
The complete single ER diagram by combining all the above five steps is shown in figure
This class hierarchy illustrates the inheritance concept. Where, the subclass attributes ISA (read as
: is a) super class attributes; indicating the “is a” relationship (inheritance concept).Therefore, the
attributes defined for a Hourly-Emps entity set are the attributes of Hourly-Emps plus attributes
of Employees (because subclass can have superclass properties). Likewise the attributes defined
for a Contract-Emps entity set are the attributes of Contract-Emps plus attributes of Employees.
3.9 Aggregation
Aggregation allows us to indicate that a relationship set (identified through a dashed box)
participates in another relationship sets. That is, a relationship set in an association between entity
sets. Sometimes we have to model a relationship between a collection of entities and relationships.
Example: Suppose that we have an entity set called Project and that each Project entity is sponsored
by one or more departments. Thus, the sponsors relationship set captures this information but, a
department that sponsors a project, might assign employees to monitor the sponsorship.
Therefore, Monitors should be a relationship set that associates a sponsors relationship (rather
than a Project or Department entity) with an Employees entity. However, again we have to define
relationships to associate two or more entities.
Use of Aggregation
We use an aggregation, when we need to express a relationship among relationships. Thus, there
are really two distinct relationships, Sponsors and Monitors, each with its own attributes.
3.10 Conceptual Database Design With The ER Model (ER Design Issues)
The following are the ER design issues:
Intuitively, it records the interval during which an employee works for a department. Now
suppose that it is possible for an employee to work in a given department over more than one
period.
This possibility is ruled out by the ER diagram’s semantics. The problem is that we want to record
several values for the descriptive attributes for each instance of the Works_In2 relationship. (This
situation is analogous to wanting to record several addresses for each employee.) We can address
this problem by introducing an entity set called, say, Duration, with attributes from and to, as
shown in Figure
There is at most one employee managing a department, but a given employee could manage
several departments; we store the starting date and discretionary budget for each manager-
department pair. This approach is natural if we assume that a manager receives a separate
discretionary budget for each department that he or she manages.
But what if the discretionary budget is a sum that covers all departments managed by that
employee? In this case each Manages2 relationship that involves a given employee will have the
same value in the dbudget field. In general such redundancy could be significant and could cause
a variety of problems. Another problem with this design is that it is misleading.
We can address these problems by associating dbudget with the appointment of the employee as
manager of a group of departments. In this approach, we model the appointment as an entity set,
say Mgr_Appts, and use a ternary relationship, say Man ages3, to relate a manager, an
appointment, and a department. The details of an appointment (such as the discretionary budget)
are not repeated for each department that is included in the appointment now, although there is
still one Manages3 relationship instance per such department. Further, note that each department
has at most one manager, as before, because of the key constraint. This approach is illustrated in
below Figure.
The first requirement suggests that we impose a key constraint on Policies with respect to Covers,
but this constraint has the unintended side effect that a policy can cover only one dependent. The
second requirement suggests that we impose a total participation constraint on Policies. This
solution is acceptable if each policy covers at least one dependent. The third requirement forces us
to introduce an identifying relationship that is binary (in our version of ER diagrams, although
there are versions in which this is not the case).
Even ignoring the third point above, the best way to model this situation is to use two binary
relationships, as shown in below figure.
Consider the constraint that each sponsorship (of a project by a department) be monitored by at
most one employee. We cannot express this constraint in terms of the Sponsors2 relationship
set. Also we can express the constraint by drawing an arrow from the aggregated relationship.
Sponsors to the relationship Monitors. Thus, the presence of such a constraint serves as another
reason for using aggregation rather than a ternary relationship set.
Review Questions
reservations, flights and their status, seat assignments on individual flights, and the schedule
and routing of future flights. Your design should include an E-R diagram, a set of relational
schemas, and a list of constraints, including primary-key and foreign-key constraints.
4. Discuss the representation of total participation and multivalued attribute in an E/R
diagram.
5. What is an Entity Relationship diagram and why it is useful?
6. What is a weak entity in ER diagram?
7. Give the diagrammatic representation of recursive relationship in an ER diagram and also
explain the importance of role names in representing a recursive relationship by taking a real
time example.
8. Consider a database used to record the marks that students get in different exams of different
course offerings.
a. Construct an E-R diagram that models exams as entities, and uses a ternary relationship,
for the above database.
b. Construct an alternative E-R diagram that uses only a binary relationship between
students and course-offerings. Make sure that only one relationship exists between a
particular student and course-offering pair, yet you can represent the marks that a student
gets in different exams of a course offering.
9. Explain about relationship sets in ER model with examples.
10. Explain about ER model design issues.
References:
Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
C.J. Date, Introduction to Database Systems, Pearson Education.
Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.
When we want to create tables with relationship , we need to use Referential integrity
constraints. The referential integrity constraint enforces relationship between tables.
-It designates a column or combination of columns as a Foreign key.
-The foreign key establish a relationship with a specified primary or unique key in another
table called the Referenced key.
- When referential integrity is enforced , it prevents from..
1) Adding records to a related table if there is no associated record in the primary table.
2) Changing values in a primary table that result in orphaned records in a related table.
3) Deleting records from a primary table if there are matching related records.
Note: The table containing the foreign key is called the child table and the table containing the
referenced key is called the Parent table.
Data constraints: All business of the world run on business data being gathered, stored and
analyzed. Business managers determine a set of business rules that must be applied to their data
prior to it being stored in the database/table of ensure its integrity.
For instance , no employee in the sales department can have a salary of less than Rs.1000/- .
Such rules have to be enforced on data stored. If not, inconsistent data is maintained in database.
Integrity constraints are the rules in real life, which are to be imposed on the data. If the data is
not satisfying the constraints then it is considered as inconsistent. These rules are to be enforced
on data because of the presence of these rules in real life. These rules are called integrity
constraints. Every DBMS software must enforce integrity constraints, otherwise inconsistent data
is generated.
2
Constraints are categorized as follows.
[Link] integrity constraints - A domain means a set of values assigned to a column. i.e A
set of permitted values. Domain constraints are handled by
Defining proper data type
Specifying not null constraint
Specifying check constraint.
Specifying default constraint
Column level :-
Table level :-
3
constraint is declared after declaring all columns.
use table level to declare constraint for combination of columns.(i.e composite key)
not null cannot be defined.
To add these constraints , we can use constraint with label or with out label.
A constraint can be added to a table at any time after the table was created by using ALTER
TABLE statement , using ADD clause.
Syntax:
ALTER TABLE <table_name> ADD CONSTRAINT cont_label NAME _OF_
THE_CONSTRAINT (column);
4
vi) Declaring Constraint at “Alter” level (Constraints without label)
Syntax:
ALTER TABLE <table_name> ADD NAME _OF_ THE_CONSTRAINT (column);
Note:’ Constraint ‘ clause is not required when constraints declared without a label.
1.2 CHECK :
Used to impose a conditional rule a table column.
It defines a condition that each row must satisfy.
Check constraint validates data based on a condition .
5
Value entered in the column should not violate the condition.
Check constraint allows null values.
Check constraint can be declared at table level or column level.
There is no limit to the number of CHECK constraints that can be defined on a
condition.
Limitations :-
Conditions should not contain/not applicable to pseudo columns like ROWNUM,
SYSDATE etc.
Condition should not access columns of another table
Here, we are creating a table with two columns such as Sid, sname.
Here, sid should start with ‘C ‘and a length of sid is exactly 4 characters. And sname should
ends with letter ‘ A’
6
SQL> SELECT *FROM check_table;
SID SNAME
---- ----------
C401 ABHILA
C401 ANITHA
C403 NANDHITHA
C522 LOHITHA
// with label
@ ALTER LEVEL
Here, we add check constraint to new table with columns.
SQL> ALTER TABLE check_alter ADD CONSTRAINT ck CHECK ( sid LIKE 'C%');
-If values are not provided for table column , default will be considered.
-This prevents NULL values from entering the columns , if a row is inserted without a value for
a column.
-The default value can be a literal, an expression, or a SQL function.
-The default expression must match the data type of the column.
- The DEFAULT constraint is used to provide a default value for a column.
-The default value will be added to all new records IF no other value is specified.
This defines what value the column should use when no value has been supplied explicitly when
inserting a record in the table.
SID CONTACTNO
---------- ------------------
501 9493949312
502 9999999999
503 9999999999
504 9393949412
2.1. UNIQUE
Columns declared with UNIQUE constraint does not accept duplicate values.
One table can have a number of unique keys.
Unique key can be defined on more than one column i.e composite unique key
A composite key UNIQUE key is always defined at the table level only.
By default UNIQUE columns accept null values unless declared with NOT NULL
constraint
Oracle automatically creates UNIQUE index on the column declared with UNIQUE
constraint
8
UNIQUE constraint can be declared at column level and table level.
Now , we removed unique constraint , so now this table consists duplicate data.
//UNIQUE@ ALTER LEVEL (here, the table contains duplicates, so it is not works)
//delete data from table_unique2
SQL> DELETE FROM table_unique2;
There should be at the most one Primary Key or Composite primary key per table.
9
PK column do not accept null values.
PK column do not accept duplicate values.
RAW,LONG RAW,VARRAY,NESTED TABLE,BFILE columns cannot be declared with PK
If PK is composite then uniqueness is determined by the combination of columns.
A composite primary key cannot have more than 32 columns
It is recommended that PK column should be short and numeric.
Oracle automatically creates Unique Index on PK column
EX:
Table altered.
SYNTAX:
CREATE TABLE < tablename>( col_name1 datatype[size],
col_name2 datatype[size],
:
col_namen datatype[size],
PRIMARY KEY (col_name1,col_name2….colmn_name n);
Foreign key column allows null values unless it is declared with NOT
NULL.
To establish 1:1 relationship between two tables declare foreign key with
unique constraint
SQL> ALTER TABLE marks3 ADD CHECK ( marks>0 AND marks< =100 );
Note :-
PRIMARY KEY cannot be dropped if it referenced by any FOREIGN KEY constraint.
If PRIMARY KEY is dropped with CASCADE option then along with PRIMARY KEY referencing
FOREING KEY is also dropped.
PRIMARY KEY column cannot be dropped if it is referenced by some FOREIGN KEY.
PRIMARY KEY table cannot be dropped if it is referenced by some FOREIGN KEY.
13
PRIMARY KEY table cannot be truncated if it is referenced by some FOREIGN KEY.
Note:: Once the primary key and foreign key relationship has been created then you can
not remove any parent record if the dependent childs exists.
By using this clause you can remove the parent record even if childs exists. Because when
ever you remove parent record oracle automatically removes all its dependent records from
child table, if this clause is present while creating foreign key constraint.
Ex: Consider twe tables dept(parent) and emp(child) tables.
TABLE LEVEL
SQL> create table emp(empno number(2), ename varchar(10), deptno number(2), primary
key(empno), foreign key(deptno) references dept(deptno) on delete cascade); // without label
Disable constraint
Performing the DML operation DML operation
Enable constraint
Disabling Constraint:-
Syntax :-
ALTER TABLE <tabname> DISABLE CONSTRAINT <constraint_name> ;
Example :-
SQL>ALTER TABLE student1 DISABLE CONSTRAINT ck ;
SQL>ALTER TABLE mark1 DISABLE PRIMARY KEY CASCADE;
14
NOTE:-
If constraint is disabled with CASCADE then PK is disabled with FK.
Enabling Constraint :-
Syntax :-
ALTER TABLE <tabname> ENABLE CONSTRAINT <name>
Example :-
SQL>ALTER TABLE student1 ENABLE CONSTRAINT ck;
The number of columns and data types of the columns being selected must be identical in all the
SELECT statements used in the query. The names of the columns need not be identical.
All SET operators have equal precedence. If a SQL statement contains multiple SET operators,
the oracle server evaluates them from left (top) to right (bottom) if no parentheses explicitly
specify another order.
Introduction
SQL set operators allows combine results from two or more SELECT statements. At first sight
this looks similar to SQL joins although there is a big difference. SQL joins tends to combine
columns i.e. with each additionally joined table it is possible to select more and more columns.
SQL set operators on the other hand combine rows from different queries with strong
preconditions .
Syntax :-
SELECT statement 1
UNION / UNION ALL / INTERSECT / MINUS
SELECT statement 2 ;
Rules :-
1. UNION
Example :-
2. UNION ALL
This will combine the records of multiple tables having the same structure
but including duplicates. IT is similar to UNION but it includes duplicates.
Example :-
3. INTERSECT
This will give the common records of multiple tables having the same
structure.
INTERSECT operator returns common values from the result of two SELECT statements.
Example:-
4. MINUS
This will give the records of a table whose records are not in other tables
having the same structure.
MINUS operator returns values present in the result of first SELECT statement and not present
in the result of second SELECT statement.
Example:-
UNION vs JOIN :-
UNION JOIN
Union combines data Join relates data
Union is performed on similar structures Join can be performed also be performed
on
dissimilar structures also
[Link] JOINS
A SQL JOIN is an Operation , used to retrieve data from multiple tables. It is performed
whenever two or more tables are joined in a SQL statement. so, SQL Join clause is used to
combine records from two or more tables in a database. A JOIN is a means for combining fields
from two tables by using values common to each. Several operators can be used to join tables,
18
such as =, <>, <=, >=, !=, BETWEEN, LIKE, and NOT; these all to be used to join tables.
However, the most common operator is the equal symbol.
SQL Join Types:
There are different types of joins available in SQL:
INNER JOIN: Returns rows when there is a match in both tables.
OUTER JOIN : Returns all rows even there is a match or no match in tables.
- LEFT JOIN/LEFT OUTER JOIN: Returns all rows from the left table,
even if there are no matches in the right table.
-RIGHT JOIN/RIGHT OUTER JOIN : Returns all rows from the right table, even if
there are
no matches in the left table.
-FULL JOIN/FULL OUTER JOIN : Returns rows when there is a match in one of the
tables.
SELF JOIN: It is used to join a table to itself as if the table were two tables, temporarily
renaming at least one table in the SQL statement.
CARTESIAN JOIN or CROSS JOIN : It returns the Cartesian product of the sets of records
from the two or more joined tables.
Based on Operators, The Join can be classified as
- Inner join or Equi Join
- Non-Equi Join
NATURAL JOIN: It is performed only when common column name is same. In this,no
need to specify join condition explicitly , ORACLE automatically performs join
operation on the column with same name.
1. SQL INNER JOIN (simple join)
It is the most common type of SQL join. SQL INNER JOINS return all rows from multiple
tables where the join condition is met.
Syntax
SELECT columns FROM table1 INNER JOIN table2 ON [Link] = [Link];
Visual Illustration
In this visual diagram, the SQL INNER JOIN returns the shaded area:
19
The SQL INNER JOIN would return the records where table1 and table2 intersect.
Let's look at some data to explain how the INNER JOINS work with example.
We have a table called SUPPLIERS with two fields (supplier_id and supplier_name).
It contains the following data:
supplier_id supplier_name
10000 ibm
10001 hewlett packard
10002 microsoft
10003 nvidia
We have another table called ORDERS with three fields (order_id, supplier_id, and
order_date).
It contains the following data:
order_id supplier_id order_date
500125 10000 2003/05/12
500126 10001 2003/05/13
500127 10004 2003/05/14
Example of INNER JOIN:
Q: List supplier id, name and order id of supplier.
SELECT s.supplier_id, s.supplier_name, od.order_date FROM suppliers s INNER JOIN
orders od ON s.supplier_id = od.supplier_id;
This SQL INNER JOIN example would return all rows from the suppliers and orders
tables where there is a matching supplier_id value in both the suppliers and orders tables.
Our result set would look like this:
supplier_id name order_date
10000 ibm 2003/05/12
10001 hewlett packard 2003/05/13
20
The rows for Microsoft and NVIDIA from the supplier table would be omitted, since the
supplier_id's 10002 and 10003 do not exist in both tables.
The row for 500127 (order_id) from the orders table would be omitted, since the
supplier_id 10004 does not exist in the suppliers table.
[Link] JOIN:
Inner / Equi join returns only matching records from both the tables but not unmatched record,
An Outer join retrieves all row even when one of the column met join condition.
Types of outer join:
1. LEFT JOIN/LEFT OUTER JOIN
[Link] JOIN/RIGHT OUTER JOIN
[Link] JOIN/FULL OUTER JOIN
[Link] OUTER JOIN
This type of join returns all rows from the LEFT-hand table specified in the ON
condition and only those rows from the other table where the joined fields are equal (join
condition is met).
Syntax
SELECT columns FROM table1 LEFT [OUTER] JOIN table2
ON [Link] = [Link];
Visual Illustration
In this visual diagram, the SQL LEFT OUTER JOIN returns the shaded area:
The SQL LEFT OUTER JOIN would return the all records from table1 and only those
records from table2 that intersect with table1.
Example
SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM
suppliers LEFT OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;
This LEFT OUTER JOIN example would return all rows from the suppliers table and only
those rows from the orders table where the joined fields are equal.
21
supplier_id supplier_name order_date
-------------- ---------------------- -----------------------
10000 ibm 2003/05/12
10001 hewlett packard 2003/05/13
10002 microsoft <null>
10003 nvidia <null>
The rows for Microsoft and NVIDIA would be included because a LEFT OUTER JOIN
was used. However, you will notice that the order_date field for those records contains a
<null> value.
2.2 SQL RIGHT OUTER JOIN
This type of join returns all rows from the RIGHT-hand table specified in the ON
condition and only those rows from the other table where the joined fields are equal (join
condition is met).
Syntax
SELECT columns FROM table1 RIGHT [OUTER] JOIN table2 ON [Link] =
[Link];
In some databases, the RIGHT OUTER JOIN keywords are replaced with RIGHT JOIN.
Visual Illustration
In this visual diagram, the SQL RIGHT OUTER JOIN returns the shaded area:
The SQL RIGHT OUTER JOIN would return the all records from table2 and only those
records from table1 that intersect with table2.
Example
SELECT orders.order_id, orders.order_date, suppliers.supplier_name FROM suppliers
RIGHT OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;
22
This RIGHT OUTER JOIN example would return all rows from the orders table and only
those rows from the suppliers table where the joined fields are equal.
If a supplier_id value in the orders table does not exist in the suppliers table, all fields in
the suppliers table will display as <null> in the result set.
order_id order_date supplier_name
------------ --------------- -----------------
500125 2013/05/12 ibm
500126 2013/05/13 hewlett packard
500127 2013/05/14 <null>
The row for 500127 (order_id) would be included because a RIGHT OUTER JOIN was
used. However, you will notice that the supplier_name field for that record contains a
<null> value.
2.3. SQL FULL OUTER JOIN
This type of join returns all rows from the LEFT-hand table and RIGHT-hand table with
nulls in place where the join condition is not met.
Syntax
SELECT columns FROM table1 FULL [OUTER] JOIN table2 ON [Link] =
[Link]; In some databases, the FULL OUTER JOIN keywords are replaced with
FULL JOIN.
Visual Illustration
In this visual diagram, the SQL FULL OUTER JOIN returns the shaded area:
The SQL FULL OUTER JOIN would return the all records from both table1 and table2.
Example
Here is an example of a SQL FULL OUTER JOIN:
Query : Find supplier id, supplier name and order date of suppliers who have ordered.
23
SELECT suppliers.supplier_id, suppliers.supplier_name, orders.order_date FROM
suppliers FULL OUTER JOIN orders ON suppliers.supplier_id = orders.supplier_id;
This FULL OUTER JOIN example would return all rows from the suppliers table and all
rows from the orders table and whenever the join condition is not met, <nulls> would be
extended to those fields in the result set.
If a supplier_id value in the suppliers table does not exist in the orders table, all fields in
the orders table will display as <null> in the result set. If a supplier_id value in the orders
table does not exist in the suppliers table, all fields in the suppliers table will display as
<null> in the result set.
supplier_id supplier_name order_date
----------------- -------------------- ---------------------
10000 ibm 2013/05/12
10001 hewlett packard 2013/05/13
10002 microsoft <null>
10003 nvidia <null>
<null> <null> 2013/05/14
The rows for Microsoft and NVIDIA would be included because a FULL OUTER JOIN
was used. However, you will notice that the order_date field for those records contains a
<null> value.
The row for supplier_id 10004 would be also included because a FULL OUTER JOIN was
used. However, you will notice that the supplier_id and supplier_name field for those
records contain a <null> value.
Equi join :
When the Join Condition is based on EQUALITY (=) operator, the join is said to be an Equi
join. It is also called as Inner Join.
Syntax
Select col1,col2,…From <table 1>,<table 2>Where <join condition with ‘=’ > .
[Link] : Find supplier id, supplier name and order date of suppliers who have ordered .
select [Link], [Link] ,o.order_date from suppliers s, orders o where
[Link] =[Link].
24
supplier_id name order_date
10000 ibm 2003/05/12
10001 hewlett packard 2003/05/13
Non Equi Join :-
When the join condition based on other than equality operator , the join is said to be a Non-Equi
join.
Syntax:-
Select col1,col2,…….
From <table 1>,<table 2>
Where <join condition > [AND <join cond> AND <cond> ----]
In NON- EQUI JOIN, JOIN COND is not based on = operator. It is based on other than =
operator, usually BETWEEN or > or < operators.
Query : Find supplier id,supplier name and order date in between 50025 and 500127.
sql> select s.supplier_id,s.supplier_name,o.order_date from suppliers s , orders o where
o.order_id between 500125 and 500127;
Syntax :
(Here T1 and T2 refers same table)
SELECT <collist> From Table1 T1, Table1 T2
Where T1.Column1=T2.Column2;
Example:
select s1.supplier_id ,s1.supplier_name ,s2.supplier_id from suppliers s1, suppliers s2 where
s1.supplier_id=s2.supplier_id ;
supplier_id supplier_name supplier_id
----------------- ------------------ ---------------
10000 ibm 10000
10001 hewlett packard 10001
10002 microsoft 10002
10003 nvidia 10003
CROSS JOIN:
It returns the Cartesian product of the sets of records from the two or more joined tables. In
Cartesian product, each element of one set is combined with every element of another set to form
the resultant elements of Cartesian product.
Sytax: SELECT * FROM <tablename1> CROSS JOIN <tablename2>
26
CROSS JOIN returns cross product of two tables.
Each record of one table is joined to each and every record of another table.
If table1 contains 10 records and table2 contains 5 records then CROSS JOIN between
table1 and table2 returns 50 records.
ORACLE performs CROSS JOIN when we submit query without JOIN COND.
Example: sql> SELECT * FROM suppliers CROSS JOIN orders;
supplier_id supplier_n order_id supplier_id order_dat
----------- ---------- ---------- ----------- ---------
10000 ibm 500125 10000 12-may-03
10000 ibm 500126 10001 13-may-03
10000 ibm 500127 10003 14-may-03
10001 hewlett 500125 10000 12-may-03
10001 hewlett 500126 10001 13-may-03
10001 hewlett 500127 10003 14-may-03
10002 microsoft 500125 10000 12-may-03
10002 microsoft 500126 10001 13-may-03
10002 microsoft 500127 10003 14-may-03
10003 nvidia 500125 10000 12-may-03
10003 nvidia 500126 10001 13-may-03
NATURAL JOIN:
NATURAL JOIN is possible in ANSI SQL/92 standard.
NATURAL JOIN is similar to EQUI JOIN.
NATURAL JOIN is performed only when common column name is same.
In NATURAL JOIN no need to specify join condition explicitly , ORACLE
automatically performs join operation on the column with same name.
27
Syntax: SELECT <column list> FROM table1 NATURAL JOIN table2;
Example: ( Sailors table)
SELECT sid,sname,sid FROM sailors NATURAL JOIN reserves ; //both tables have
same column name.
SID SNAME SID
---------- ---------- ----------
22 DUSTIN 22
22 DUSTIN 22
22 DUSTIN 22
22 DUSTIN 22
31 LUBBER 31
31 LUBBER 31
31 LUBBER 31
64 HORTIO 64
64 HORTIO 64
74 HORTIO 74
VI. VIEWS
A view in SQL is a logical subset of data from one or more tables. View is used to restrict data
[Link] abstraction is usually required after a table is created and populated with data. Data
held by some tables might require restricted access to prevent all users from accessing all columns
of a table, for data security reasons. Such a security issue can be solved by creating several tables
with appropriate columns and assigning specific users to each such table, as required. This answers
data security requirements very well but gives rise to a great deal of redundant data being resident
in tables, in the [Link] reduce redundant data to the minimum possible, Oracle provides
Virtual tables which are Views.
View Definition :-
A View is a virtual table based on the result returned by a SELECT query.
The most basic purpose of a view is restricting access to specific column/rows from a table thus
allowing different users to see only certain rows or columns of a table.
Composition Of View:-
A view is composed of rows and columns, very similar to table. The fields in a view are fields
from one or more database tables in the database.
SQL functions, WHERE clauses and JOIN statements can be applied to a view in the same
manner as they are applied to a table.
View storage:-
Oracle does not store the view data. It recreates the data, using the view’s SELECT statement,
every time a user queries a view.
28
A view is stored only as a definition in Oracle’s system catalog.
When a reference is made to a view, its definition is scanned, the base table is opened and the
view is created on top of the base [Link], therefore, means that a view never holds data, until
a specific call to the view is made. This reduces redundant data on the HDD to a very large
extent.
Advantages Of View:-
Security:- Each user can be given permission to access only a set of views that contain specific
data.
Query simplicity:- A view can drawn from several different tables and present it as a single table
turning multiple table queries into single table queries against the view.
Data Integrity:- If data is accessed and entered through a view, the DBMS can automatically
check the data to ensure that it meets specified integrity constraints.
Disadvantage of View:-
Performance:- Views only create the appearance of the table but the RDBMS must still translate
queries against the views into the queries against the underlined source tables. If the view is
defined on a complex multiple table query then even a simple query against the view becomes a
complicated join and takes a long time to execute.
Types of Views :-
Simple Views
Complex Views
Simple Views :-
a View based on single table is called simple view.
Syntax:-
CREATE VIEW <View Name>
AS
SELECT<ColumnName1>,<ColumnName2>..
FROM <TableName>
[WHERE <COND>]
[WITH CHECK OPTION]
[WITH READ ONLY]
Example :-
Views can also be used for manipulating the data that is available in the base tables[i.e. the
user can perform the Insert, Update and Delete operations through view.
Views on which data manipulation can be done are called Updateable Views.
29
If an Insert, Update or Delete SQL statement is fired on a view, modifications to data in the
view are passed to the underlying base table.
For a view to be updatable,it should meet the following criteria:
Views defined from Single table.
If the user wants to INSERT records with the help of a view, then the PRIMARY KEY
column(s) and all the NOT NULL columns must be included in the view.
Inserting record through view :-
Updating a View:
A view can be updated under certain conditions:
The SELECT clause may not contain the keyword DISTINCT.
The SELECT clause may not contain summary functions.
The SELECT clause may not contain set functions.
The SELECT clause may not contain set operators.
The SELECT clause may not contain an ORDER BY clause.
The FROM clause may not contain multiple tables.
The WHERE clause may not contain subqueries.
The query may not contain GROUP BY or HAVING.
Calculated columns may not be updated.
All NOT NULL columns from the base table must be included in the view in order for
the INSERT query to function.
So if a view satisfies all the above-mentioned rules then you can update a view.
If VIEW created with WITH CHECK OPTION then any DML operation through that view
violates where condition then that DML operation returns error.
Example :-
SQL>CREATE VIEW V2
AS
SELECT empno,ename,sal,deptno FROM emp
WHERE deptno=10
WITH CHECK OPTION ;
30
Then insert the record into emp table through view V2
SQL>INSERT INTO V2 VALUES(2323,’RAJU’,4000,20) ;
The above INSERT returns error because DML operation violating WHERE clause.
Complex Views :-
A view is said to complex view
If it based on more than one table
Query contains
AGGREGATE functions
DISTINCT clause
GROUP BY clause
HAVING clause
Sub-queries
Constants
Strings or Values Expressions
UNION,INTERSECT,MINUS operators.
Example 1 :-
SQL>CREATE VIEW V3
AS
SELECT [Link],[Link],[Link],[Link],[Link]
FROM emp E JOIN dept D
USING(deptno) ;
we cannot perform insert or update or delete operations on base table through complex views.
Complex views are not updatable views.
Example 2 :-
SQL>CREATE VIEW V2
AS
SELECT deptno,SUM(sal) AS sumsal
FROM EMP
GROUP BY deptno;
Destroying a View:-
The DROP VIEW command is used to destroy a view from the database.
Syntax:-
DROP VIEW<viewName>
Example :-
SQL>DROP VIEW emp_v;
DIFFERENCES BETWEEN SIMPLE AND COMPLEX VIEWS:
31
SIMPLE COMPLEX
A materialized view in Oracle is a database object that contains the results of a query. They are
local copies of data located remotely, or are used to create summary tables based on aggregations
of a table's data. Materialized views, which store data based on remote tables are also, know as
snapshots.
A materialized view can query tables, views, and other materialized views. Collectively these are
called master tables (a replication term) or detail tables (a data warehouse term).
For replication purposes, materialized views allow you to maintain copies of remote data on your
local node. These copies are read-only. If you want to update the local copies, you have to use
the Advanced Replication feature. You can select data from a materialized view as you would
from a table or view.
For data warehousing purposes, the materialized views commonly created are aggregate views,
single-table aggregate views, and join views.
In replication environments, the materialized views commonly created are primary key, rowid,
and subquery materialized views.
SYNTAX:
CREATE MATERIALIZED VIEW view-name
BUILD [IMMEDIATE | DEFERRED]
REFRESH [FAST | COMPLETE | FORCE ]
ON [COMMIT | DEMAND ]
[[ENABLE | DISABLE] QUERY REWRITE]
[ON PREBUILT TABLE]
AS
SELECT COLUMN_LIST FROM TABLE_NAME;
32
The following refresh types are available.
FAST : A fast refresh is attempted. If materialized view logs are not present against the
source tables in advance, the creation fails.
COMPLETE : The table segment supporting the materialized view is truncated and
repopulated completely using the associated query.
FORCE : A fast refresh is attempted. If one is not possible a complete refresh is
performed.
A refresh can be triggered in one of two ways.
ON COMMIT : The refresh is triggered by a committed data change in one of the
dependent tables.
ON DEMAND : The refresh is initiated by a manual request or a scheduled task.
The QUERY REWRITE clause tells the optimizer if the materialized view should be consider
for query rewrite operations. An example of the query rewrite functionality is shown below.
The ON PREBUILT TABLE clause tells the database to use an existing table segment, which
must have the same name as the materialized view and support the same column structure as the
query.
Example:
The following statement creates the rowid materialized view on table emp located on a remote
database:
SQL> CREATE MATERIALIZED VIEW mv_emp_rowid
REFRESH WITH ROWID
AS SELECT * FROM emp@remote_db;
[Link]
If you want output in descending order you have to use desc keyword after the column.
Ex:
SQL> select * from student order by no; SQL> select * from student order by no desc;
33
The order of rows returned in a query result is undefined. The ORDER BY clause can be used to
sort the rows. If you use the ORDER BY clause, it must be the last clause of the SQL statement.
You can specify an expression, or an alias, or column position in ORDER BY clause.
In the syntax,
ORDER BY :specifies the order in which the retrieved rows are displayed.
orders the rows in ascending order ( default order)
orders the rows in descending order
t Ordering of Data :-
Numeric values are displayed with the lowest values firs for example 1–999.
Date values are displayed with the earliest value first for example 01-JAN-92 before 01-
JAN-95.
Character values are displayed in alphabetical order—for example, A first and Z last.
Null values are displayed last for ascending sequences and first for descending
sequences.
Examples :-
Arrange employee records in ascending order of their sal ?
34
SQL>SELECT * FROM emp ORDER BY sal ;
Display employee records working for 10th dept and arrange the result in ascending order of
their sal ?
Arrange employee records in ascending of their deptno and with in dept arrange records in
descending order of their sal ?
In ORDER BY clause we can use column name or column position , for example
In the above example records are sorted based on the fifth column in emp table.
Arrange employee records in descending order of their comm. If comm. Is null then arrange
those records last ?
GROUP BY clause
Using group by, we can create groups of related information. Columns used in select must be
used with group by, otherwise it was not a group by expression.
FROM from-list
WHERE qualification
GROUP BY grouping-list
HAVING group-qualification
The expression appearing in the group-qualification in the HAVING clause must have a
single value per group.
35
Ex: SQL> select deptno, sum(sal) from emp group by deptno;
DEPTNO SUM(SAL)
---------- ----------
10 8750
20 10875
30 9400
Sql> Find the age of the youngest sailor for each rating level.
Find the age of the youngest sailor who is eligible to vote for each rating level with at least two
such sailors ?
GROUP BY [Link]
For each red boat find the number of reservations for this boat?
GROUP BY [Link];
Find the average age of sailors for each rating level that has at least two sailors ?
GROUP BY [Link]
IX. AGGREGATION
It is a group operation, which will be works on all records of a table. To do this, Group
functions required to process group of rows and Returns one value from that group.
36
These functions are also called AGGREGATE functions or GROUP functions
X. SUB QUERIES
What is subquery in SQL?
A subquery is a SQL query nested inside a larger query.
A subquery may occur in :
- A SELECT clause
- A FROM clause
37
- A WHERE clause
The subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE
statement or inside another subquery.
A subquery is usually added within the WHERE Clause of another SQL SELECT
statement.
You can use the comparison operators, such as >, <, or =. The comparison operator can
also be a multiple-row operator, such as IN, ANY, or ALL.
A subquery is also called an inner query or inner select, while the statement containing a
subquery is also called an outer query or outer select.
The inner query executes first before its parent query so that the results of inner query can
be passed to the outer query.
You can use a subquery in a SELECT, INSERT, DELETE, or UPDATE statement to perform
the following tasks :
Compare an expression to the result of the query.
Determine if an expression is included in the results of the query.
Check whether the query selects any rows.
Syntax :
The subquery (inner query) executes once before the main query (outer query) executes.
The main query (outer query) use the subquery result.
38
v001 abhi
v002 abhay
v003 arjun
v004 anand
SQL> select *from marks;
SID TOTALMARKS
---------- ----------
v001 95
v002 80
v003 74
v004 81
Now we want to write a query to identify all students who get better marks than that of the
student who's StudentID is 'V002', but we do not know the marks of 'V002'.
- To solve this problem, we require two queries.
One query returns the marks (stored in Totalmarks field) of 'V002' and a second query identifies
the students who get better marks than the result of the first query.
SQL> select *from marks where sid='v002';
Query Result:
SID TOTALMARKS
---------- ----------
v002 80
The result of query is 80.
- Using the result of this query, here we have written another query to identify the students who
get better marks than 80. Here is the query :
Second query :
SQL> select [Link],[Link],[Link] from student1 s, marks m where [Link]=[Link] and
[Link]>80;
SID NAME TOTALMARKS
---- ---------- ----------
v001 abhi 95
v004 anand 81
Above two queries identified students who get better number than the student who's StudentID is
'V002' (Abhi).
39
You can combine the above two queries by placing one query inside the other. The subquery
(also called the 'inner query') is the query inside the parentheses. See the following code and
query result :
40
Subqueries: Guidelines
There are some guidelines to consider when using subqueries :
-A subquery must be enclosed in parentheses.
-A subquery must be placed on the right side of the comparison operator.
-Subqueries cannot manipulate their results internally, therefore ORDER BY clause cannot
be added in to a [Link] can use a ORDER BY clause in the main SELECT statement
(outer query) which will be last clause.
-Use single-row operators with single-row subqueries.
-If a subquery (inner query) returns a null value to the outer query, the outer query will not
return any rows when using certain comparison operators in a WHERE clause.
Type of Subqueries
Single row subquery : Returns zero or one row.
Multiple row subquery : Returns one or more rows.
Multiple column subquery : Returns one or more columns.
Correlated subqueries : Reference one or more columns in the outer SQL statement.
The subquery is known as a correlated subquery because the subquery is related to the outer
SQL statement.
Nested subqueries : Subqueries are placed within another subqueries.
1)SINGLE ROW SUBQUERIES:- Returns zero or one row.
If inner query returns only one row then it is called single row subquery.
Syntax :-
SQL> SELECT RATING FROM SAILORS WHERE SID = (SELECT SID FROM SAILORS WHERE
SNAME='DUSTIN');
RATING
----------
7
Q: Find the sailors records whose sid is geater than ‘dustin’?
SQL> SELECT *FROM SAILORS WHERE SID > (SELECT SID FROM
SAILORS WHERE SNAME='DUSTIN');
if inner query returns more than one row then it is called multi row subquery.
Syntax :-
To test for values in a specified list of values, use IN operator. The IN operator can be used with
any data type. If characters or dates are used in the list, they must be enclosed in single
quotation marks (’’).
Syntax:-
IN (V1,V2,V3------------);
Note :-
Example :-
44
Q:Find the name of sailors who have reserved boat 103
SELECT [Link] FROM SAILORS S WHERE [Link] NOT IN (SELECT [Link] FROM
RESERVES R WHERE [Link] IN (SELECT [Link] FROM BOATS B WHERE [Link]
= 'RED'));
SNAME
----------
BRUTUS
CANDY
RUSTY
ZOBRA
HORATIO
ART
BOB
Using EXISTS operator :-
EXISTS operator returns TRUE or FALSE.
If inner query returns at least one record then EXISTS returns TRUE otherwise returns FALSE.
ORACLE recommends EXISTS and NOT EXISTS operators instead of IN and NOT IN.
Compares a value to each value in a list or returned by a query. Must be preceded by =, !=, >, <,
<=, >=. Evaluates to FALSE if the query returns no rows.
SQL> SELECT [Link] FROM SAILORS S WHERE [Link] > ANY ( SELECT
[Link] FROM SAILORS S2 WHERE [Link]=’HORATIO’) ;
SID
----------
58
71
74
31
32
ALL operator :-
Compares a value to every value in a list or returned by a query. Must be preceded by =, !=, >,
<, <=, >=. evaluates to TRUE if the query returns no rows.
Example:-
46
SQL>SELECT ename FROM emp
WHERE SAL > ALL ( SELECT sal FROM emp WHERE job = 'SALESMAN');
Q:Find sailors whose rating is better than every sailor called Horation?
SQL> SELECT [Link] FROM SAILORS S WHERE [Link] > ALL ( SELECT
[Link] FROM SAILORS S2 WHERE [Link]=’HORATIO’) ;
SID
----------
58
71
Multi Column Subqueries:-
If inner query returns more than one column value then it is called MULTI COLUMN subquery.
Example :-
Example :-
Display employee name earning second maximum salary ?
SQL>SELECT ename FROM emp
WHERE sal = (SELECT MAX(sal) FROM EMP
WHERE sal < (SELECT MAX(sal) FROM emp)) ;
Q:Find the names of sailors who have not reserved a red boat.
SELECT [Link] FROM SAILORS S WHERE [Link] NOT IN (SELECT [Link] FROM
RESERVES R WHERE [Link] IN (SELECT [Link] FROM BOATS B WHERE [Link]
= 'RED'));
SNAME
----------
BRUTUS
CANDY
RUSTY
ZOBRA
HORATIO
ART
BOB
CORRELATED SUB QUERIES:
In the Co-Related sub query a parent query will be executed first and based on the output of
outer query the inner query execute.
If parent query returns N rows ,inner query executed for N times.
If a subquery references one or more columns of parent query is called CO-RELATED subquery
because it is related to outer query. This subquery executes once for each and every row of
main query.
Example1 :-
Example2: Find sailors whose rating more than avg(rating ) of their id.
SQL> SELECT [Link] FROM SAILORS S WHERE RATING > (SELECT AVG(RATING) FROM
SAILORS WHERE SID=[Link]);
48
no rows selected.
SQL> Select [Link] from sailors s, reserves r, boats b where [Link]=[Link] and
[Link]=[Link] and ([Link] = ‘red’ or [Link]= ‘green’);
Or
SQL> Select [Link] from sailors s, reserves r, boats b where [Link]=[Link] and
[Link]=[Link] and [Link]=’red’
UNION
SNAME
Dustin
Lubber
Horatio
Q2) Find the names of sailors who have reserved a red and a green boat?
INTERSECT
SNAME
Dustin
Lubber
49
Horatio
Q3) Find the names of sailors who have reserved a red boat but not green boat?
MINUS
NO ROWS SELECTED
Q4) Find all sids of sailors who have a rating of 10 or reserved boat 104?
UNION
SID
22
31
58
71
50
Aditya College of Engineering & Technology
Aditya Nagar, ADB Road, Surampalem - 533437
Syllabus:
Schema Refinement (Normalization): Purpose of Normalization or schema refinement,
concept of functional dependency, normal forms based on functional dependency(1NF, 2NF
and 3 NF), concept of surrogate key, Boyce-codd normal form(BCNF), Lossless join and
dependency preserving decomposition, Fourth normal form(4NF), Fifth Normal Form (5NF).
Objectives:
After studying this unit, you will be able to:
Discuss the different types of anomalies in a database
State what is functional dependency
List the different forms of normalization
Differentiate among different types of normalization
DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION
The Schema Refinement refers to refine the schema by using some technique. The best
technique of schema refinement is decomposition.
Normalization means “split the tables into small tables which will contain less number of
attributes in such a way that table design must not contain any problem of inserting,
deleting, updating anomalies and guarantees no redundancy”.
Normalization or Schema Refinement is a technique of organizing the data in the database.
It is a systematic approach of decomposing tables to eliminate data redundancy and
undesirable characteristics like Insertion, Update and Deletion Anomalies.
Redundancy: refers to repetition of same data or duplicate copies of same data stored in
different locations.
Anomalies: Anomalies refers to the problems occurred after poorly planned and normalized
databases where all the data is stored in one table which is sometimes called a flat file
database.
Anomalies refers to the problems occurred after poorly planned and unnormalized
databases where all the data is stored in one table which is sometimes called a flat file
database. Let us consider such type of schema
Here all the data is stored in a single table which causes redundancy of data or say
anomalies as SID and Sname are repeated once for same CID . Let us discuss anomalies one by
one.
Due to redundancy of data we may get the following problems, those are-
[Link] anomalies : It may not be possible to store some information unless some other
information is stored as well.
[Link] storage: some information is stored repeatedly
[Link] anomalies: If one copy of redundant data is updated, then inconsistency is created
unless all redundant copies of data are updated.
[Link] anomalies: It may not be possible to delete some information without losing some
other information as well.
Problem in updation / updation anomaly – If there is updation in the fee from 5000 to 7000,
then we have to update FEE column in all the rows, else data will become inconsistent.
Insertion Anomaly and Deletion Anomaly- These anomalies exist only due to redundancy,
otherwise they do not exist.
Insertion Anomalies: New course is introduced C4, But no student is there who is having C4
subject.
Because of insertion of some data, It is forced to insert some other dummy data.
Deletion Anomaly:
Deletion of S3 student cause the deletion of course. Because of deletion of some data forced to
delete some other useful data.
Purpose of Normalization:
Advantages of Normalization:
1. Greater overall database organization will be gained.
2. The amount of unnecessary redundant data reduced.
3. Data integrity is easily maintained within the database.
4. The database & application design processes are much for flexible.
5. Security is easier to maintain or manage.
Disadvantages of Normalization:
1. The disadvantage of normalization is that it produces a lot of tables with a relatively
small number of columns. These columns then have to be joined using their
primary/foreign key relationship.
2. This has two disadvantages.
Performance: all the joins required to merge data slow processing & place
additional stress on your hardware.
Complex queries: developers have to code complex queries in order to merge
data from different tables.
Case1: A →B
Here A1 belongs to B1 & B2. So A1 does not have unique value in B. So it is not in FD.
Case1: A →C
Here A1→C1 and A2, A3→C2. So A has unique values in B. So it is in FD.
Note: try to find all the possibilities. i.e., A→D, B→C, B→D, and C→D
Armstrong Axioms (Inference Rules ) : The term Armstrong axioms refers to the sound
and complete set of inference rules or axioms, introduced by William W. Armstrong, that is
used to test logical implication of functional dependencies.
Armstrong axioms define the set of rules for reasoning about functional dependencies and also
to infer all the functional dependencies on a relational database.
Attribute closure of an attribute set can be defined as set of attributes which can be
functionally determined from it.
The set of FD’s that is logically implied by F is called the closure of F and written as F +. And it
is defined as “If F is a set FD’s on a relation R, the F+, the closure of F by using the inferences
axioms that are not contained in F+.
Example: R (A, B, C, D) and set of Functional Dependencies are A→B, B→D, C→B then what
is the Closure of A, B, C, D?
Solution: A+ is
A+→ {A, B, D} i.e., A→B, B→D is exists and C is not FD on A. So it is eliminated.
B+→ { B, D} i.e., B→D is exists and A, C is not FD on A. So it is eliminated.
C+→ {C, B, D} i.e., C→B, B→D is exists and A is not FD on C. So it is eliminated.
The algorithm for computing the attribute closure of a set X of attributes is shown below
Candidate Key:
Candidate Key is minimal set of attributes of a relation which can be used to identify a tuple
uniquely.
Consider student table: student(sno, sname,sphone,age)
we can take sno as candidate key. we can have more than 1 candidate key in a table.
types of candidate keys:
1. simple(having only one attribute)
2. composite(having multiple attributes as candidate key)
Super Key:
Super Key is set of attributes of a relation which can be used to identify a tuple uniquely.
Adding zero or more attributes to candidate key generates super key.
A candidate key is a super key but vice versa is not true.
Consider student table: student(sno, sname,sphone,age)
we can take sno, (sno, sname) as super key
Examples:
1. In a schema with attributes A,B,C,D and E the following set of attributes are given
AB, AC, CDE, BD, EA. Find CDAC determines from the given FDs or
not.
2. Check DA can be derived from the following FDs or not ABC, BCAD, DE,
CFB.
(i) Primary key: It is an unique value attribute in a table to enforce entity integrity and
ti identify rows in the table uniquely.
(ii) Composite Primary Key: Sometimes single attribute is not sufficient to identify
uniquely the rows in the table so, we combine 2 or more attributes to identify the
rows uniquely.
(iii) Candidate keys: Sometimes 2 or more independent attribute or attributes can be
used to identify the rows uniquely Eg :( vech no,veng no,purchase date) Either
vehicle no or vehicle engine no can be used as a key attribute then they are called as
candidate keys one of the candidate key can be elected as primary key.
Example 1: Find candidate keys for the relation R(ABCD) having following FD’s ABCD,
CA, DA.
Sol: From the given FD’s, the attribute B is key attribute because it is not in RHS of
functional dependency.
AD+ =AD
From the above attributes AB and BC determines all attributes.
AB, BC are candidate keys.
Example 2: Find candidate keys for the relation R(ABCDE) having following FD’s ABC,
CDE, BD, EA.
Sol: From the given FD’s, no attribute is key attribute because all are in RHS of
functional dependency. So check for all attributes of LHS.
A+ = ABC (∵ A BC)
= ABCD (∵ B D)
= ABCDE (∵ CD E)
B+ = BD (∵ B D)
E+ = EA (∵ E A)
= EABC (∵ A BC)
= EABCD (∵ B D)
C + = C
D + = D
CD+ = CDE (∵ CD E)
= CDEA (∵ E A)
= CDEAB (∵ A BC)
BC+ = BCD (∵ B D)
= BCDE (∵ CD E)
= BCDEA (∵ E A)
From the above attributes A, E, CD and BC determines all attributes.
A, E, CD, BC are candidate keys.
Different database designers may define different F.D’s sets from the same requirements. To
evaluate whether they are equivalent if we are able to derive all F.D’s in G from F and vice-
versa.
Sol:
Step 1: Take set F and enclose all FD’s in G that can be derived from F.
ACD
A+ from F
=A
=AC (∵ A C)
=ACD (∵ AC D)
A CD can be derived from F
EAH
E+ from F
=E
=EAD
=EADH
E AH can be derived from F
Step 2: Take set G and enclose all F.D’s in F that can be derived from
G. AC
A+ from G
=A
=ACD
A C can be derived from G
E AD
E+ from G
=E
=EAH
=EAHCD
E AH & E AD can be derived from G
G and F are equivalent.
(4) To identify the irreducible form of FD’s /canonical Form (minimal cover):
We try to minimize the functional dependency. The minimize FD should be equivalent to
original FD,
Procedure to find minimal set:
Step 1: Have single attributes on the RHS for every FD.
Step 2: Evaluate all F.D’s in step 1 for their necessity. If they are not necessary, remove them
from the list.
Step 3: Evaluate the necessity of the LHS attributes in FD’s obtained from step [Link] they are not
necessary remove from FD.
Step 4: Apply the union rule for common to LHS attribute in the FD’s obtained from step
[Link] we will get irreducible set.
Step 4:
Normal forms based on functional dependency (1NF, 2NF and 3 NF, Boyce-
Codd normal form (BCNF), 4NF)
Normalization means “split the tables into small tables which will contain less number of attributes in
such a way that table design must not contain any problem of inserting, deleting, updating anomalies
and guarantees no redundancy”.
The evolution of Normalization theories / Steps of Normalization / Different Normal Forms
is illustrated below-
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. Boyce-Codd Normal Form (BCNF)
5. Fourth Normal Form (4NF)
6. Fifth Normal Form (5NF).
Points to be Remember
1 NF is a mandatory NF and remaining are the optional
If you construct E-R diagrams in to the tables, then 4 NF and 5 NF need not be applied
on the table.
Practically applied normalization is upto 3NF and very rarely we will go beyond that.
2 NF dealing with the partial dependencies and 3NF is dealing with transitive
dependencies.
First Normal Form (1NF): A relation is said to in the 1NF if it is already in un-normalized
form and it satisfies the following conditions or rules or qualifications are:
1. Each attribute name must be unique.
2. Each attribute value must be single or atomic i.e., Single Valued Attributes.
3. Each row / record must be unique.
4. There is no repeating group’s.
Example: How do we bring an un-normalized table into first normal form? Consider the
following relation:
Solution: This table is not in first normal form because the [Color] column can contain
multiple values. For example, the first row includes values "red" and "green." To bring this
table to first normal form, we split the table into two tables and now we have the resulting
tables:
Second Normal Form (2NF): A relation is said to be in 2NF, if it is already in 1st NF and it
has no Partial Dependency i.e., no non-prime attribute is dependent on the only a part of the
candidate key.
(OR)
A relation is in second normal form if it satisfies the following conditions:
• It is in first normal form
• All non-key attributes are fully functional dependent on the primary key.
➔This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is
[Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is
only part of the primary key. Therefore, this table does not satisfy second normal form.
➔ To bring this table to second normal form, we break the table into two tables, and now we
have the following:
ABC BD
ABC BD
Q2 Consider the relation R=ABCDEF and set of FDs are A FC, CD, B E Find the
key and normalize into 2NF.
Third Normal Form (3NF): A database is in third normal form if it satisfies the following
conditions:
• It is in 2NF.
• There is no transitive functional dependency
By transitive functional dependency, we mean we have the following relationships in
the table: A is functionally dependent on B, and B is functionally dependent on C. In
this case, C is transitively dependent on A via B. and A non-key attribute is
depending on a non-key attribute.
➔ In the table able, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type].
Therefore, [Book ID] determines [Genre Type] via [Genre ID] and we have transitive
functional dependency, and this structure does not satisfy third normal form.
➔ To bring this table to third normal form, we split the table into two as follows:
Q1 Given relation R(ABCDE) and F:{ABC, BD, DE} Decompose in into 3NF.
from the given FDs determine primary key. Necessary attributes to include in the key
are A, B (because this attributes are not in RHS of FD).
Find the closure set of AB
AB+ = ABC
= ABCD (∵ B D)
= ABCDE (∵ D E)
AB is a primary key.
From the FDs BD is partially depending on AB. So decompose the table.
(D is a non-prime attribute derived by a part of the key)
B+ = BDE
ABCD
B+
ABC BDE
ABC BD, DE
table is in 2NF but not in 3NF. Because DE is transitive dependency.
(No non-key attribute should determining a non-key attribute)
D+ = DE
BDE
D+
BD DE
BD DE
Table is 3NF.
The relations after decomposing into 3NF.
R1: ABC
R2: BD
R3: DE
Q2 Given relation R=ABCDEFGHIJ and the set of FDs are AB C, ADE, BF, FGH,
D IJ Decompose R into 3NF.
Q3(a) Given a set of FDs for the relation schema R(ABCD) with primary key AB under
which R is 1NF but not in 2NF
(b) Find FDs such that R is in 2NF but not in 3NF
Sol: R=ABCD
Key=AB
(a) Atomic values are allowed in 1NF and partial dependency is not allowed in 2NF.
The following FDs are allowed.
B C, AC, B D, A D
(show the FDs which is having partial dependency)
(b) According to question partial dependencies are not allowed and transitivity
dependency is allowed. The following FDs are allowed.
C D, DC
Boyce-Codd normal form (BCNF): A relation is said to be in BCNF, if and only if every
determinant should be a candidate key.
✓ BCNF is the advance version of 3NF. It is stricter than 3NF.
✓ A table is in 3NF if for every functional dependency X → Y, X is the super key of the table.
✓ For BCNF, the table should be in 3NF and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more than one
department. EMPLOYEE table:
Fourth Normal Form (4NF): A relation said to be in 4NF if it is in Boyce Codd normal
form and should have no multi-valued dependency.
✓ For a dependency A→ B, if for a single value of A, multiple value of B exists then the
relation will be multi-valued dependency.
✓ Note: Multi Valued Dependency: A table is said to have multi-valued dependency, if the
following conditions are true,
1. For a dependency A → B, if for a single value of A, multiple value of B exists, then the
table may have multi-valued dependency.
2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R (A, B, C), if there is a multi-valued dependency between, A and
B, then B and C should be independent of each other.
◼ If all these conditions are true for any relation (table), it is said to have multi-valued
dependency.
Example
The given STUDENT table is in 3NF but the COURSE and HOBBY are two independent
entity. Hence, there is no relationship between COURSE and HOBBY. In the STUDENT
relation, student with STU_ID, 21 contains two courses, Computer and Math and two
hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID,
which leads to un-necessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STUDENT_HOBBY
Review Questions
13. Give asset o FDs for the relation schema R(A,B,C,D) with primary key AB under which
R is in 1NF but not in 2NF.
14. Why is a relation that is in 3NF generally considered good? Explain.
15. Discuss about 4NF with suitable example.
16. What are the problems caused by redundantly storing information? Explain
17. Given Relation, R=(A,B,C,D,E,F,G) and Functional Dependencies
F={ {A,B}→{C}, { A,C}→{B}, {A,D}→{E}, {B}→{D}, { B,C}→{A}, {E}→{F}}.
Check whether the following decomposition of R into R1=(A,B,C), R2=(A,C,D,E) and
R3=(A,D,F) is satisfying the lossless Decomposition property.
18. What is dependency preservation property for decomposition? Explain why it is important.
19. Given a Relation R=(X,Y,Z) and Functional Dependencies are F={ {X,Y}→{Z}, {Z}→{X} }
Determine all Candidate keys of R and the normal form of R with proper explanation.
20. Define functional dependency? How can you compute the minimal cover for a set of
functional dependencies? Explain it with an example.
21. Consider schema R = (A, B, C, G, H, I) and the set F of functional dependencies {AB, AC,
CG H, CGI, BH}. Compute the candidate keys of the schema. Compute the closure of the
same.
22. Explain 3NF & BCNF. What is the difference between them?
23. What is functional dependency? Explain its usage in database design.
24. What is a surrogate key? How can it be used for schema refinement?
25. How to compute closure of set of functional dependency? Explain with a suitable example schema.
26. What is multi valued dependency? State and explain fourth normal form based on this concept.
27. Given a set of FDs for the relation schema R(A,B,C,D) with Primary key AB, and D C or
C D or AC D or AD C or BC D or BD C. In which normal form is R?
28. Discuss the problems caused by redundancy and the purpose of normalization.
29. Give relation schemas for the following normal forms
i) 2NF but not in 3NF ii) 3NF but not in BCNF
References:
Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
C.J. Date, Introduction to Database Systems, Pearson Education.
Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.
Syllabus:
Transaction Concept: Transaction State, Implementation of Atomicity and Durability,
Concurrent Executions, Serializability, Recoverability, Implementation of Isolation, Testing for
Serializability, Failure Classification, Storage, Recovery and Atomicity, Recovery algorithm.
Indexing Techniques: B+ Trees: Search, Insert, Delete algorithms, File Organization and
Indexing, Cluster Indexes, Primary and Secondary Indexes , Index data Structures, Hash Based
Indexing: Tree base Indexing ,Comparison of File Organizations, Indexes and Performance
Tuning.
Objectives:
After studying this unit, you will be able to:
Discuss the different types of ACID properties and its implementation.
Describe concurrent Execution, Serializability and Recoverability.
Understand physical design of a database system, by discussing Database indexing
techniques and storage techniques
Examine issues in data storage and query processing and can formulate appropriate
solutions
DATABASE MANAGEMENT SYSTEMS UNIT – IV : NORMALIZATION
5.1. Introduction
• A transaction is a unit of program execution that accesses and possibly updates
various data items.
• The transaction consists of all operations executed between the statements begin
and end of the transaction
• Transaction operations: Access to the database is accomplished in a transactionby
the following two operations:
read (X): Performs the reading operation of data item X from the database
write (X): Performs the writing operation of data item X to the database
• A transaction must see a consistent database
• During transaction execution the database may be inconsistent
• When the transaction is committed, the database must be consistent
• Two main issues to deal with:
Failures, e.g. hardware failures and system crashes
Concurrency, for simultaneous execution of multiple transactions
5.2 ACID Properties
To preserve integrity of data, the database system must ensure:
• Atomicity: Either all operations of the transaction are properly reflected in the
database or none are
• Consistency: Execution of a transaction in isolation preserves the consistency of the
database
• Isolation: Although multiple transactions may execute concurrently, each transaction
must be unaware of other concurrently executing transactions; intermediate
transaction results must be hidden from other concurrently executed transactions
• Durability: After a transaction completes successfully, the changes it has made to
the database persist, even if there are system failures
Example of Fund Transfer: Let Ti be a transaction that transfers 50 from account A to B. This
transaction can be illustrated as follows
write(B)
result.
• Durability: once the user has been notified that the transaction has completed, the
updates to the database by the transaction must persist despite failures.
• Isolation: between steps 3 and 6, no other transaction should access the partially
updated database, or else it will see an inconsistent state (the sum A + B will be less
than it should be).
Example Schedules
• Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The
following is a serial schedule (Schedule 1 in the text), in which T1 is followed by T2.
Schedule 1
• Let T1 and T2 be the transactions defined previously. The following schedule is not a
serial schedule, but it is equivalent to above Schedule.
Schedule 2
• The following concurrent schedule does not preserve thevalue of the sum A + B
Schedule 3
Serializable Schedule
A serializable schedule over a set S of committed transactions is a schedule whose
effect on any consistent database is guaranteed to be identical to that of some
complete serial schedule over S. i.e., even though the actions of transactions are
interleaved, the result of executing transactions serially in different order may
produce different results.
Example: The schedule shown in the following figure is serializable.
T1 T2
R(A)
W(A)
R(A)
W(A)
R(B)
W(B)
R(B)
W(A)
Commi
t
Commi
t
Even though the actions of T1 and T2 are interleaved, the result of this schedule is
equivalent to first running T1 entirely and then running and T2 entirely. Actually T1‘s
read and write of B is not influenced by T2‘s actions on B, and the net effect is the same if
these actions are the serial schedule First T1, then T2. This schedule is also serializable if
first T2, then T1. Therefore if T1 and T2 are submitted concurrently to a DBMS, either of
these two schedules could be chosen as first
A DBMS might sometimes execute transactions which is not a serial execution i.e., not
serializable.
This can be happen for two reasons:
First the DBMS might use a concurrency control method that ensures the
executed schedule itself.
Second, SQL gives programmers the authority to instruct the DBMS tochoose
non-serializable schedule.
Anomalies due to Interleaved execution
There are three main situations when the actions of two transactions T1 and T2
conflict with each other in the interleaved execution on the same data object.
■ Write-Read (WR) Conflict: Reading Uncommitted data.
■ Read-Write (RW) Conflict: Unrepeatable Reads
■ Write-Write (WW) Conflict: Overwriting Uncommitted Data.
T1 T2
R(A)
A: = A -
100 W(A)
R(A)
A: = A + 0.06
A W(A)
R(B)
B:= B+.06
R(B) BW(B)
B: = B + Commit
100W(B)
Commit
T1 T2
R(A)
A: = A -
100 W(A)
R(A)
A: = A +
0.06 A W(A)
R(B)
B:= B+.06
Abort B W(B)
Commit
Whereas, a recoverable schedule is one in which transactions read only the changes of
committed transactions.
5.5 Serializability
Basic Assumption – Each transaction, on its own, preserves database consistency
• i.e. serial execution of transactions preserves database consistency
A (possibly concurrent) schedule is serializable if it is equivalent to a serial
schedule
Different forms of schedule equivalence give rise to the notions of conflict
serializability and view serializability
Simplifying assumptions:
• ignore operations other than read and write instructions
• assume that transactions may perform arbitrary computations on data inlocal buffers
between reads and writes
• simplified schedules consist only of reads and writes
Conflict Serializability
Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there
exists some item Q accessed by both li and lj, and at least one of these instructions
wrote Q.
1. li = read(Q), lj = read(Q). li and lj don’t conflict.
2. li = read(Q), lj = write(Q). They conflict.
3. li = write(Q), lj = read(Q). They conflict
4. li = write(Q), lj = write(Q). They conflict
Intuitively, a conflict between li and lj forces a (logical) temporal order between
them
If li and lj are consecutive in a schedule and they do not conflict, their results
would remain the same even if they had been interchanged in the ordering
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-
conflicting instructions, we say that S and S´ are conflict equivalent.
We say that a schedule S is conflict serializable if it is conflict equivalent to a
serial schedule
non-conflicting instructions.
Therefore Schedule 3 is conflict serializable.
View Serializability
• Let S and S´ be two schedules with the same set of transactions. S and S´ are view
equivalent if the following three conditions are met, where Q is a data item and Ti is a
transaction:
1. If Ti reads the initial value of Q in schedule S, then Ti must, in schedule
S´, also read the initial value of Q
2. If Ti executes read(Q) in schedule S, and that value was produced by
transaction Tj (if any), then transaction Ti must in schedule S´ also read the
value of Q that was produced by transaction Tj
3. The transaction (if any) that performs the final write(Q) operation inschedule S
(for any data item Q) must perform the final write(Q) operationin schedule S´
NB: View equivalence is also based purely on reads and writes
• A schedule S is view serializable it is view equivalent to a serial schedule
• Every conflict serializable schedule is also view serializable
• Schedule 9 (from book) — a schedule which is view-serializable but not conflict
serializable
Every view serializable schedule that is not conflict serializable has blind writes
Other Notions of Serializability
• This schedule produces the same outcome as the serial schedule < T1, T5 >
• However it is not conflict equivalent or view equivalent to it
• Determining such equivalence requires analysis of operations other than read and
write
• The precedence graph test for conflict serializability must be modified to apply to a
test for view serializability
■ The problem of checking if a schedule is view serializable is NP-complete. Thus
existence of an efficient algorithm is unlikely. However practical algorithms that just
check some sufficient conditions for view serializability can still be used
5.6 Recoverability
Need to address the effect of transaction failures on concurrently running
transactions.
Recoverable schedule: if a transaction Tj reads a data item previously written by a
transaction Ti , the commit operation of Ti appears before the commit operation of
Tj
The following schedule (Schedule 11) is not recoverable if T9 commits immediately
after the read
• If T8 should abort, T9 would have read (and possibly shown to the user) an
inconsistent database state. Hence database must ensure that schedules are
recoverable
• Cascading rollback – a single transaction failure leads to a series of transaction rollbacks
• Consider the following schedule where none of the transactions has yet committed
(so the schedule is recoverable)
• If T10 fails, T11 and T12 must also be rolled back
• Can lead to the undoing of a significant amount of work
• Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions
Ti and Tj such that Tj reads a data item previously written by Ti, the commit
operation of Ti appears before the read operation of Tj
Stable-Storage Implementation
Fixed-Length Records
As an example, let us consider a file of instructor records for our university database.
Each record of this file is defined (in pseudocode) as:
We allocate the maximum number of bytes that each attribute can hold. Then, the
instructor record is 53 bytes long.
A simple approach is to use the first 53 bytes for the first record, the next 53 bytes for the
second record, and so on as shown below figure.
To avoid the first problem, we allocate only as many records to a block as would fit
entirely in the block (this number can be computed easily by dividing the block size by the
record size, and discarding the fractional part). Any remaining bytes of each block are left
unused.
To avoid the second problem, When a record is deleted, we could move the record that
came after it into the space formerly occupied by the deleted record, and so on, until every
record following the deleted record has been moved ahead shown in below figure.
Such an approach requires moving a large number of records. It might be easier simply
to move the final record of the file into the space occupied by the deleted record shown in
below figure.
It is undesirable to move records to occupy the space freed by a deleted record, since
doing so requires additional block accesses. Since insertions tend to be more frequent than
deletions, it is acceptable to leave open the space occupied by the deleted record, and to wait
for a subsequent insertion before reusing the space.
A simple marker on a deleted record is not sufficient, since it is hard to find this
available space when an insertion is being done. Thus, we need to introduce an additional
structure.
At the beginning of the file, we allocate a certain number of bytes as a file header. The
header will contain a variety of information about the file. For now, all we need to store there
is the address of the first record whose contents are deleted.
We use this first record to store the address of the second available record, and so on.
Intuitively, we can think of these stored addresses as pointers, since they point to the location of
a record. The deleted records thus form a linked list, which is often referred to as a free list.
Below figure shows with the free list, after records 1, 4, and 6 have been deleted. On
insertion of a new record, we use the record.
Variable-Length Records
Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields.
• Record types that allow repeating fields, such as arrays or multisets.
Different techniques for implementing variable-length records exist. Two different problems
must be solved by any such technique:
• How to represent a single record in such a way that individual attributes can be extracted
easily.
• How to store variable-length records within a block, such that records in a block can be
extracted easily.
The figure also illustrates the use of a null bitmap, which indicates which attributes of the
record have a null value. In this particular record, if the salary were null, the fourth bit of the
bitmap would be set to 1, and the salary value stored in bytes 12 through 19 would be ignored.
and the first record. If a record is inserted, space is allocated for it at the end of free space, and
an entry containing its size and location is added to the header.
If a record is deleted, the space that it occupies is freed, and its entry is set to deleted (its
size is set to −1, for example). Further, the records in the block before the deleted record are
moved, so that the free space created by the deletion gets occupied, and all free space is again
between the final entry in the header array and the first record. The end-of-free-space pointer
in the header is appropriately updated as well. Records can be grown or shrunk by similar
techniques, as long as there is space in the block. The cost of moving the records is not too
high, since the size of a block is limited: typical values are around 4 to 8 kilobytes.
A sequential file is designed for efficient processing of records in sorted order based on
some search key. A search key is any attribute or set of attributes; it need not be the primary
key, or even a superkey. To permit fast retrieval of records in search-key order, we chain
together records by pointers. The pointer in each record points to the next record in search-key
order. Furthermore, to minimize the number of block accesses in sequential file processing, we
store records physically in search-key order, or as close to search-key order as possible.
Blow figure shows a sequential file of instructor records taken from our university example.
In that example, the records are stored in search-key order, using ID as the search key.
It is difficult, however, to maintain physical sequential order as records are inserted and
deleted, since it is costly to move many records as a result of a single insertion or deletion. We
can manage deletion by using pointer chains, as we saw previously. For insertion, we apply
the following rules:
1. Locate the record in the file that comes before the record to be inserted in search-key order.
2. If there is a free record (that is, space left after a deletion) within the same block as this
record, insert the new record there. Otherwise, insert the new record in an overflow block. In
either case, adjust the pointers so as to chain together the records in search-key order.
Below figure shows the record after the insertion of the record (32222, Verdi, Music, 48000).
An index for a file in a database system works in much the same way as the index in this
textbook. If we want to learn about a particular topic (specified by a word or a phrase) in this
textbook, we can search for the topic in the index at the back of the book, find the pages where
it occurs, and then read the pages to find the information for which we are looking.
Database-system indices play the same role as book indices in libraries. For example, to
retrieve a student record given an ID, the database system would look up an index to find on
which disk block the corresponding record resides, and then fetch the disk block, to get the
appropriate student record.
There are two basic kinds of indices:
• Ordered indices. Based on a sorted ordering of the values.
• Hash indices. Based on a uniform distribution of values across a range of buckets. The
bucket to which a value is assigned is determined by a function, called a hash function.
An attribute or set of attributes used to look up records in a file is called a search key.
Dense index
Sparse index
Multilevel Indices
If an index is small enough to be kept entirely in main memory, the search time to find
an entry is low. However, if the index is so large that not all of it can be kept in memory, index
blocks must be fetched from disk when required. (Even if an index is smaller than the main
memory of a computer, main memory is also required for a number of other tasks, so it may
not be possible to keep the entire index in memory.) The search for an entry in the index then
requires several disk-block reads.
We treat the index just as we would treat any other sequential file, and construct a
sparse outer index on the original index, which we now call the inner index, as shown in
below figure. Note that the index entries are always in sorted order, allowing the outer index
to be sparse. To locate a record, we first use binary search on the outer index to find the record
for the largest search-key value less than or equal to the one that we desire. The pointer points
to a block of the inner index. We scan this block until we find the record that has the largest
search-key value less than or equal to the one that we desire. The pointer in this record points
to the block of the file that contains the record for which we are looking.
Index Update
Regardless of what form of index is used, every index must be updated whenever a record is
either inserted into or deleted from the file.
We first describe algorithms for updating single-level indices.
• Insertion. First, the system performs a lookup using the search-key value that appears in the
record to be inserted. The actions the system takes next depend on whether the index is dense
or sparse:
◦ Dense indices:
1. If the search-key value does not appear in the index, the system inserts an index entry
with the search-key value in the index at the appropriate position.
2. Otherwise the following actions are taken:
a. If the index entry stores pointers to all records with the same search key value,
the system adds a pointer to the new record in the index entry.
b. Otherwise, the index entry stores a pointer to only the first record with the
search-key value. The system then places the record being inserted after the other
records with the same search-key values.
◦ Sparse indices: We assume that the index stores an entry for each block. If the system
creates a new block, it inserts the first search-key value (in search-key order) appearing
in the new block into the index. On the other hand, if the new record has the least
search-key value in its block, the system updates the index entry pointing to the block;
if not, the system makes no change to the index.
• Deletion. To delete a record, the system first looks up the record to be deleted. The actions
the system takes next depend on whether the index is dense or sparse:
◦ Dense indices:
1. If the deleted record was the only record with its particular search-key value, then the
system deletes the corresponding index entry from the index.
2. Otherwise the following actions are taken:
a. If the index entry stores pointers to all records with the same search key value,
the system deletes the pointer to the deleted record from the index entry.
b. Otherwise, the index entry stores a pointer to only the first record with the
search-key value. In this case, if the deleted record was the first record with the
search-key value, the system updates the index entry to point to the next record.
◦ Sparse indices:
1. If the index does not contain an index entry with the search-key value of the deleted
record, nothing needs to be done to the index.
2. Otherwise the system takes the following actions:
a. If the deleted record was the only record with its search key, the system
replaces the corresponding index record with an index record for the next search-
key value (in search-key order). If the next search-key value already has an index
entry, the entry is deleted instead of being replaced.
b. Otherwise, if the index entry for the search-key value points to the record
being deleted, the system updates the index entry to point to the next record with
the same search-key value.
Secondary Indices
Secondary indices must be dense, with an index entry for every search-key value, and a
pointer to every record in the file. A clustering index may be sparse, storing only some of the
search-key values, since it is always possible to find records with intermediate search-key
values by a sequential access to a part of the file, as described earlier. If a secondary index
stores only some of the search-key values, records with intermediate search-key values may be
anywhere in the file and, in general, we cannot find them without searching the entire file.
We can use an extra level of indirection to implement secondary indices on search keys
that are not candidate keys. The pointers in such a secondary index do not point directly to the
file. Instead, each points to a bucket that contains pointers to the file.
Below figure shows the structure of a secondary index that uses an extra level of
indirection on the instructor file, on the search key salary.
Structure of a B+ Tree
Queries on B+ Trees
Let us consider how we process queries on a B+-tree. Suppose that we wish to find
records with a search-key value of V.
Intuitively, the function starts at the root of the tree, and traverses the tree down until it
reaches a leaf node that would contain the specified value if it exists in the tree. Specifically,
starting with the root as the current node, the function repeats the following steps until a leaf
node is reached. First, the current node is examined, looking for the smallest i such that search-
key value Ki is greater than or equal to V. Suppose such a value is found; then, if Ki is equal to
V, the current node is set to the node pointed to by Pi+1, otherwise Ki > V, and the current node
is set to the node pointed to by Pi. If no such value Ki is found, then clearly V > Km−1, where Pm
is the last non null pointer in the node. In this case the current node is set to that pointed to by
Pm. The above procedure is repeated, traversing down the tree until a leaf node is reached.
At the leaf node, if there is a search-key value equal to V, let Ki be the first such value;
pointer Pi directs us to a record with search-key value Ki. The function then returns the leaf
node L and the index i. If no search-key with value V is found in the leaf node, no record with
key value V exists in the relation, and function find returns null, to indicate failure.
Updates on B+ Trees
When a record is inserted into, or deleted from a relation, indices on the relation must
be updated correspondingly.
Insertion
Deletion
Operation
• Insertion − When a record is required to be entered using static hash, the hash
function h computes the bucket address for search key K, where the record will be
stored.
Bucket address = h(K)
• Search − When a record needs to be retrieved, the same hash function can be used to
retrieve the address of the bucket where the data is stored.
• Delete − This is simply a search followed by a deletion operation.
Bucket Overflow
The condition of bucket-overflow is known as collision. This is a fatal state for any static hash
function. In this case, overflow chaining can be used.
• Overflow Chaining − When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.
• Linear Probing − When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and on-demand. Dynamic hashing is also
known as extended hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a
few are used initially.
Review Questions
1. Explain about the measures that are to be considered for comparing the performance of
various file organization techniques.
2. Explain in detail B+ tree file organization.
3. Write short notes on: i) Primary index ii) Clustered index iii) Secondary index.
4. Explain various anomalies that arise due to interleaved execution of transactions with
suitable examples.
5. What is static hashing? What rules are followed for index selection?
6. Define transaction and explain desirable properties of transactions.
7. What is database Recovery? Explain Shadow paging in detail.
8. Explain about Conflict Serializability and view serializability.
9. Explain the following a) Concurrent executions, b) Transaction states.
References:
• Raghurama Krishnan, Johannes Gehrke, Database Management Systems, 3rd Edition, Tata
McGraw Hill.
• C.J. Date, Introduction to Database Systems, Pearson Education.
• Elmasri Navrate, Fundamentals of Database Systems, Pearson Education.