0% found this document useful (0 votes)
12 views120 pages

DBMS Overview: Concepts and Architecture

The document provides an overview of Database Management Systems (DBMS), detailing their purpose, structure, and the various data models used to manage databases. It discusses the limitations of traditional file systems, the importance of data abstraction, and the concepts of data independence, schemas, and instances. Additionally, it covers different database languages and their functions, as well as the differences between SQL commands such as DELETE, TRUNCATE, and DROP.

Uploaded by

karmakarmegha01
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
12 views120 pages

DBMS Overview: Concepts and Architecture

The document provides an overview of Database Management Systems (DBMS), detailing their purpose, structure, and the various data models used to manage databases. It discusses the limitations of traditional file systems, the importance of data abstraction, and the concepts of data independence, schemas, and instances. Additionally, it covers different database languages and their functions, as well as the differences between SQL commands such as DELETE, TRUNCATE, and DROP.

Uploaded by

karmakarmegha01
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Database Management System

IT-601
Contracts: 3L
Credits- 3

Module 1
Introduction :Concept & Overview of DBMS, Data Models, Database Languages, , Database Users, Three
Schema architecture of DBMS.

Introduction A database management system is a collection of interrelated data and a set of


programs to access those data.
Collection of data is referred to as a database. Primary goal of dbms is to provide a way to store and
retrieve database information that is both convenient and efficient. Dbms allows us to define structure
for storage of information and also provides mechanism to manipulate this information. Dbms also
provides safety for the information stored despite system crashes or attempts of authorized access.

What is a database?

It is a collection of data, typically describing the activities of one or more related organizations, e.g., a
University, an airline reservation or a banking database.

Limitations of data processing environment:- (FILE SYSTEM)

1) Data redundancy and consistency:- Different files have different formats of programs written in
different programming languages by different users. So the same information may be duplicated in
several files. It may lead to data inconsistency. If a customer changes his address, then it may be
reflected in one copy of data but not in the other.
2) Difficulty in accessing data:- The file system environment does not allow needed data to be retrieved
in a convenient and efficient manner.
3) Data isolation:- Data is scattered in various files; so it gets isolated because file may be in different
formats.
4) Integrity problems:- Data values stored in the database must satisfy consistency constraints. Problem
occurs when constraints involve several data items from different files.
5) Atomicity problems:- If failure occurs, data must be stored to constant state that existed prior to
failure. For example, if in a bank account, a person abc is transferring Rs 5000 to the account of pqr, and
abc has withdrawn the money but before it
gets deposited to the pqr’s account, the system failure occurs, then Rs5000 should be deposited back to
abc’s bank account.
6) Concurrent access anomalies:- Many systems allow multiple users to update data simultaneously.
Concurrent updates should not result in inconsistent data.
7) Security problems:- Not every user of the database system should be able to access all data. Data
base should be protected from access by unauthorized users.

View of Data: DATA ABSTRACTION Architecture for a database system

Department of Information Technology


Nivedita Ray Page 1
Database Management System

Major purpose of dbms is to provide users with abstract view of data i.e. the system hides certain details
of how the data are stored and maintained.
Since database system users are not computer trained, developers hide the complexity from users
through 3 levels of abstraction, to simplify user’s interaction with the system.
1) Physical level of data abstraction:
This is the lowest level of abstraction which describes how data are actually stored.
2) Logical level of data abstraction:
This level hides what data are actually stored in the database and what relationship exists among them.
3) View Level of data abstraction:
View provides security mechanism to prevent user from accessing certain parts of database.

Explanation:
Physical level: describes how a record (e.g., customer) is stored.
Logical level: describes data stored in database, and the relationships among the data.
type customer = record
name : string;
street : string;
city : integer;
end;
View level: application programs hide details of data types. Views can also hide information (e.g. salary)
for security purposes.

DATA INDEPENDENCE:

We can define two types of data independence:


1. Logical data independence:

Department of Information Technology


Nivedita Ray Page 2
Database Management System

It is the capacity to change the conceptual schema without having to change external schemas or
application programs. We may change the conceptual schema to expand the database (by adding a
record type or data item), or to reduce the database (by
removing a record type or data item). In the latter case, external schemas that refer only to the
remaining data should not be affected. Only the view definition and the mappings need be changed in a
DBMS that supports logical data independence.
Application programs that reference the external schema constructs must work as before, after the
conceptual schema undergoes a logical reorganization. Changes to constraints can be applied also to the
conceptual schema without affecting the external schemas or application programs.

2. Physical data independence:


It is the capacity to change the internal schema without having to change the conceptual (or external)
schemas. Changes to the internal schema may be needed because some physical files had to be
reorganized—for example, by creating additional access structures—to improve the performance of
retrieval or update. If the same data as before remains in the database, we should not have to change
the conceptual schema. Whenever we have a multiplelevel DBMS, its catalog must be expanded to
include information on how to map requests and data among the various levels. The DBMS uses
additional software to accomplish these mappings by referring to the mapping information in the
catalog. Data independence is accomplished because, when the schema is changed at some level, the
schema at the next higher level remains unchanged; only the mapping between the two levels is
changed. Hence, application programs referring to the higher-level schema need not be changed.

Database Schema: Structure of data (metadata). Physically stored in "Data dictionary".


Database Instance: The collection of data stored in the database at a particular moment. Physically
stored in one or more "data file" or "data chunk".

Database Schemes and Database Instances

Independent from the database model it is important to differentiate between the description of the
database and the database itself. The description of the database is called database scheme or also
metadata. The database scheme is defined during the database design process and changes very rarely
afterwards.
The actual content of the database, the data, changes often over the years. A database state at a
specific time defined through the currently existing content and relationship and their attributes is
called a database instance. The following illustration shows that a database scheme could be looked at
like a template or building plan for one or several database instances.

Department of Information Technology


Nivedita Ray Page 3
Database Management System

Analogy Database Schemes and Building Plans

When designing a database it is differentiated between two levels of abstraction and their respective
data schemes, the conceptual and the logical data scheme.

Conceptual Data Scheme:


A conceptual data scheme is a system independent data description. That means that it is
independent from the database or computer systems used.
Logical Data Scheme:
A logical data scheme describes the data in a data definition language DDL of a specific database
management system.

The conceptual data scheme orients itself exclusively by the database application and therefore by the
real world. It does not consider any data technical infrastructure like DBMS or computer systems, which
are eventually employed. Entity relationship diagrams and relations are tools for the development of a
conceptual scheme.
When designing a database the conceptual data scheme is derived from the logical data scheme
Relational Database Design. This derivation results in a logical data scheme for one specific application
and one specific DBMS. A DB-Development System converts then the logical scheme directly into
instructions for the DBMS.

Department of Information Technology


Nivedita Ray Page 4
Database Management System

DATA MODELS:

Many data models have been proposed, and we can categorize them according to the types of concepts
they use to describe the database structure.
High-level or conceptual data models provide concepts that are close to the way many users perceive
data, whereas lowlevel or physical data models provide concepts that describe the details of how data
is stored in the computer. Concepts provided by low-level data models are generally meant for
computer specialists, not for typical end users. Between these two extremes is a class of
representational (or implementation) data models, which provide concepts that may be understood by
end users but that are not too far removed from the way data is organized within the computer.
Representational data models hide some details of data storage but can be implemented on a computer
system in a direct way. Conceptual data models use concepts such as entities, attributes, and
relationships.

An entity represents a real-world object or concept, such as an employee or a project, that is described
in the database. An attribute represents some property of interest that further describes an entity, such
as the employee’s name or salary. A relationship among two or more entities represents an interaction
among the entities, which is explained by the Entity-Relationship model—a popular high-level
conceptual data model. Representational or implementation data models are the models used most
frequently in traditional commercial DBMSs, and they include the widely-used relational data model, as
well as the so-called legacy data models—the network and hierarchical
models—that have been widely used in the past. We can regard object data models as a new family of
higherlevel implementation data models that are closer to conceptual
data models. Object data models are also frequently utilized as highlevel conceptual models,
particularly in the software engineering domain.
Physical data models describe how data is stored in the computer by representing information such as
record formats, record orderings, and access paths. An access path is a structure that makes the search
for particular database records efficient.

OVERVIEW OF CONVENTIONAL DATA MODELS:

Hierarchical Data Model:


One of the most important applications for the earliest database management systems was production
planning for manufacturing companies. If an automobile manufacturer decided to produce 10,000 units
of one car model and 5,000 units of another model, it needed to know how many parts to order from its
suppliers. To answer the question, the product (a car) had to be decomposed into assemblies (engine,
body, chassis), which were decomposed into subassemblies (valves, cylinders, spark plugs), and then
into sub-subassemblies, and so on. Handling this list of parts, known as a bill of materials, was a job
tailor-made for computers. The bill of materials for a product has a natural hierarchical structure. To
store this data, the hierarchical data model, illustrated in Figure below was developed. In this model,
each record in the database represented a specific part. The records had parent/child relationships,
linking each part to its subpart, and so on.

Department of Information Technology


Nivedita Ray Page 5
Database Management System

A hierarchical bill-of-materials database

To access the data in the database, a program could:


• find a particular part by number (such as the left door),
• move "down" to the first child (the door handle),
• move "up" to its parent (the body), or
• move "sideways" to the next child (the right door).
Retrieving the data in a hierarchical database thus required navigating through the records, moving up,
down, and sideways one record at a time.
One of the most popular hierarchical database management systems was IBM's
Information Management System (IMS), first introduced in 1968.
The advantages of IMS and its hierarchical model are as follows:
• Simple structure: The organization of an IMS database was easy to understand. The database
hierarchy paralleled that of a company organization chart or a family tree.
• Parent/child organization: An IMS database was excellent for representing parent/child relationships,
such as "A is a part of B" or "A is owned by B."
• Performance: IMS stored parent/child relationships as physical pointers from one data record to
another, so that movement through the database was rapid. Because the structure was simple, IMS
could place parent and child records close to one another on the disk, minimizing disk input/output. IMS
is still a very widely used DBMS on IBM mainframes. Its raw performance makes it the database of
choice in high-volume transaction processing applications such as processing bank ATM transactions,
verifying credit card numbers, and tracking the delivery of overnight packages. Although relational
database performance has improved dramatically over the last decade, the
performance requirements of applications such as these have also increased, insuring a continued role
for IMS.

Network Data Model:

The simple structure of a hierarchical database became a disadvantage when the data had a more
complex structure. In an order-processing database, for example, a single order might participate in
three different parent/child relationships, linking the order to the customer who placed it, the
salesperson who took it, and the product ordered. The structure of this type of data simply didn't fit the
strict hierarchy of IMS. To deal with applications such as order processing, a new network data model
was developed. The network data model extended the hierarchical model by allowing a record to
participate
in multiple parent/child relationships. For a programmer, accessing a network database was very similar
to accessing a hierarchical database. An application program could:
• find a specific parent record by key (such as a customer number),

Department of Information Technology


Nivedita Ray Page 6
Database Management System

• Move down to the first child in a particular set (the first order placed by this customer),
• Move sideways from one child to the next in the set (the next order placed by the same customer), or
• move up from a child to its parent in another set (the salesperson who took the order).
Once again the programmer had to navigate the database record-by-record, this time specifying which
relationship to navigate as well as the direction.
Network databases had several advantages:
• Flexibility: Multiple parent/child relationships allowed a network database to represent data that did
not have a simple hierarchical structure.
• Standardization: The CODASYL standard boosted the popularity of the network model, and
minicomputer vendors such as Digital Equipment Corporation and Data General implemented network
databases.
• Performance: Despite their greater complexity, network databases boasted performance approaching
that of hierarchical databases. Sets were represented by pointers to physical data records, and on some
systems, the database administrator could specify data clustering based on a set relationship. Network
databases had their disadvantages, too. Like hierarchical databases, they were very rigid. The set
relationships and the structure of the records had to be specified in advance.
Changing the database structure typically required rebuilding the entire database.

Database Languages:

Data Definition Language (DDL) statements are used to define the database structure or schema. Some
examples:
CREATE - to create objects in the database
ALTER - alters the structure of the database
DROP - delete objects from the database
TRUNCATE - remove all records from a table, including all spaces allocated for the records are removed
COMMENT - add comments to the data dictionary
RENAME - rename an object

Data Manipulation Language (DML) statements are used for managing data within schema objects.
Some examples:
SELECT - retrieve data from the a database
INSERT - insert data into a table
UPDATE - updates existing data within a table
DELETE - deletes all records from a table, the space for the records remain
MERGE - UPSERT operation (insert or update)
CALL - call a PL/SQL or Java subprogram
EXPLAIN PLAN - explain access path to data
LOCK TABLE - control concurrency

Data Control Language (DCL) statements. Some examples:


GRANT - gives user's access privileges to database
REVOKE - withdraw access privileges given with the GRANT command

Transaction Control (TCL) statements are used to manage the changes made by DML statements. It
allows statements to be grouped together into logical transactions.
COMMIT - save work done

Department of Information Technology


Nivedita Ray Page 7
Database Management System

SAVEPOINT - identify a point in a transaction to which you can later roll back
ROLLBACK - restore database to original since the last COMMIT
SET TRANSACTION - Change transaction options like isolation level and what rollback segment to use.

Difference between truncate and delete and drop:


SQL Delete Statement
This command deletes only the rows from the table based on the condition given in the where clause or
deletes all the rows from the table if no condition is specified. But it does not free the space containing
the table. The Syntax of a SQL DELETE statement is:
DELETE FROM table_name [WHERE condition];
For Example: To delete an employee with id 100 from the employee table, the sql delete query would
be like,
DELETE FROM employee WHERE id = 100;
To delete all the rows from the employee table, the query would be like,
DELETE FROM employee;
SQL TRUNCATE Statement
The SQL TRUNCATE command is used to delete all the rows from the table and free the space containing
the table.
Syntax to TRUNCATE a table:
TRUNCATE TABLE table_name;
For Example: To delete all the rows from employee table, the query would be like,
TRUNCATE TABLE employee;
SQL DROP Statement:
The SQL DROP command is used to remove an object from the database. If you drop a table, all the rows
in the table is deleted and the table structure is removed from the database. Once a table is dropped we
cannot get it back. When a table is dropped all the references to the table will not be valid.
Syntax to drop a sql table structure:
DROP TABLE table_name;
For Example: To drop the table employee, the query would be like DROP TABLE employee;

Difference between DROP and TRUNCATE Statement:

If a table is dropped, all the relationships with other tables will no longer be valid, the integrity
constraints will be dropped, grant or access privileges on the table will also be dropped, if you want use
the table again it has to be recreated with the integrity constraints, access privileges and the
relationships with other tables should be established again. But, if a table is truncated, the table
structure remains the same, therefore any of the above problems will not exist.

Different users of DB:

DBMS developer: person or a group of people who design and implement DBMS software.
Database Administrator (DBA): person or a group of people who define, modify, and manage a
database using an existing DBMS.
Database User (Naive User, Sophisticated User, and DB Application Programmer): Person who uses a
DBS.

Department of Information Technology


Nivedita Ray Page 8
Database Management System

PEOPLE WHO WORK WITH THE DATABASE:

The people who use the database can be categorized


a) Database users
b) Database administrator (DBA).
a) Database users are of 4 different types:

1) Naive users:
These are the unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously.
E.g. consider a user who checks for account balance information over the World Wide Web. Such a user
access a form, enters the account number and password etc. And the application program on the
internet then retrieves the account balance using given account information which s passed to the user.
2) Application programmers:
These are computer professionals who write application programs, used to develop user interfaces. The
application programmer uses Rapid Application Development (RAD) toolkit or special type of
programming languages which include special features to facilitate generation of forms and display of
date on screen.
3) Sophisticated users:
These users interact with the database using database query language. They submit their query to the
query processor.
Then Data Manipulation Language (DML) functions are performed on the database to retrieve the data.
Tools used by these users are OLAP (Online Analytical Processing) and data mining tools.
4) Specialized users:
These users write specialized database applications to retrieve data. These applications can be used to
retrieve data with complex data types e.g. graphics data and audio data.
b) Database Administrator (DBA)
A person having who has central control over data and programs that access the data is called DBA.
Following are the functions of the DBA.
1) Schema definition: DBA creates database schema by executing Data Definition Language (DDL)
statements.
2) Storage structure and access method definition
3) Schema and physical organization modification: If any changes are to be made in the original
schema, to fit the need of your organization, then these changes are carried out by the DBA.
4) Granting of authorization foe data access: DBA can decide which parts of data can be accessed by
which users. Before any user access the data, dbms checks which rights are granted to the user by the
DBA.
5) Routine maintenance: DBA has to take periodic backups of the database, ensure that enough disk
space is available to store new data, ensure that performance of dbms ix not degraded by any operation
carried out by the users.

Department of Information Technology


Nivedita Ray Page 9
Database Management System

-----------Database System

Database Administrator [DBA]

Centralized control of the database is exerted by a person or group of persons under the supervision of
a high – level administrator. This person or group is referred to as the database administrator (DBA).
They are the users who are most familiar with the database and are responsible for creating, modifying,
and maintaining its three levels. Database Administrator is responsible to manage the DBMS’s use and
ensure that the database is functioning properly.

DBA administers the three levels of database and consultation with the overall user community, sets up
the definition of the global view of the various users and applications and is responsible the definition
and implementation of the internal level, including the storage structure and access methods to be used
for the optimum performance of the DBMS. DBA is responsible for granting permission to the users of
the database and stores the profile of each user in the database.

Responsibilities of DBA
Deciding the information content of the database – It is the DBA’s job to decide exactly what
information is to be held in the database – in other words, to identify the entities of interest to the
enterprise and to identify the information to be recorded about those entities. Having done this, the
DBA must then define the content of the database by writing the conceptual schema.

Deciding the storage structure and access strategy – The DBA must also decide how the data is to be
represented in the database, and must specify the representation by writing the storage structure
definition. In addition, the associated mapping between storage structure definition and the conceptual
schema must also be specified.

Liaising with the users – It is the business of the DBA to liaise with users, to ensure that the data they
require is available, and to write the necessary external schemas. In addition, the mapping between any
given external schema and the conceptual schema must also be specified. In practice the external DDL
will probably include the means for specifying the mapping, but the schema and the mapping should be
clearly distinguishable.

Department of Information Technology


Nivedita Ray De Sarkar Page
1010
Database Management System

Defining authorization checks and validation procedures – Authorization checks and validation
procedures may be considered as logical extensions of the conceptual schema. The conceptual DDL will
include facilities for specifying such checks and procedures.

DBMS ARCHITECTURE: Three Schema Architecture

The goal of the three-schema architecture, illustrated in above Figure, is to separate the user
applications and the physical database. In this architecture, schemas can be defined at the following
three levels:
1. The internal level has an internal schema, which describes the physical storage structure of the
database. The internal schema uses a physical data model and describes the complete details of data
storage and access paths for the database.
2. The conceptual level has a conceptual schema, which describes the structure of the whole database
for a community of users. The conceptual schema hides the details of physical storage structures and
concentrates on describing entities, data types, relationships, user operations, and constraints. A high-
level data model or an implementation data model can be used at this level.
3. The external or view level includes a number of external schemas or user views. Each external
schema describes the part of the database that a particular user group is interested in and hides the rest
of the database from that user group. A high-level
data model or an implementation data model can be used at this level.
The three-schema architecture is a convenient tool for the user to visualize the schema levels in a
database system. In most DBMSs that support user views, external schemas are specified in the same
data model that describes the conceptual-level information. Some DBMSs allow different data models to
be used at the conceptual and external levels. Notice that the three schemas are only descriptions of
data; the only data that actually exists is at the physical level. In a DBMS based on the three schema
architecture, each user group refers only to its own external schema. Hence, the DBMS must transform
a request specified on an external schema into a request against the conceptual schema, and then into a
request on the internal schema for processing over the stored database. If the request is database
retrieval, the data extracted from the stored database must be reformatted to match the user’s external
view. The processes of transforming requests and results between levels are called mappings. These
mappings may be time-consuming, so some DBMSs—especially those that are meant to support small

Department of Information Technology


Nivedita Ray De Sarkar Page
1111
Database Management System

databases—do not support external views. Even in such systems, however, a certain amount of mapping
is necessary to transform requests between the conceptual and internal levels.

Module 2
Entity-Relationship Model
Basic concepts, Design Issues, Mapping Constraints, Keys, Entity-Relationship Diagram, Weak
Entity Sets, Extended E-R features.
ENTITY RELATIONSHIP MODEL

ENTITY

 The basic object that the ER model represents is an entity, which is a "thing" in the real world
with an independent existence.
 An entity may be an object with a physical existence—a particular person, car, house, or
employee—or it may be an object with a conceptual existence—a company, a job, or a
university course.
ATTRIBUTES

• Each entity has attributes—the particular properties that describe it.


• For example, an employee entity may be described by the employee’s name, age, address, salary, and
job.
• A particular entity will have a value for each of its attributes.
• The attribute values that describe each entity become a major part of the data stored in the database.
• Several types of attributes occur in the ER model: simple versus composite; single-valued versus multi-
valued; and stored versus derived.

Composite versus Simple (Atomic) Attributes

• Composite attributes can be divided into smaller subparts, which represent more basic attributes with
independent meanings.
• For example, the Address attribute of the employee entity can be sub-divided into Street_Name, City,
State, and Zip.
• Attributes that are not divisible are called simple or atomic attributes.
• Composite attributes can form a hierarchy; for example, Name can be subdivided into three simple
attributes, First_Name, Middle Name, Last_Name.
• The value of a composite attribute is the concatenation of the values of its constituent simple
attributes.

Composite Attributes

Single-valued Versus Multi-valued Attributes

• Attributes which have only one value for a entity are called single valued attributes.
• E.g. For a student entity, RollNo attribute has only one single value.

Department of Information Technology


Nivedita Ray De Sarkar Page
1212
Database Management System

• But phone number attribute may have multiple values. Such values are called Multi-valued attributes.

Stored Versus Derived Attributes

• Two or more attribute values are related—for example, the Age and Birth Date attributes of a person.
• For a particular person entity, the value of Age can be determined from the current (today’s) date and
the value of that person’s Birth Date.
• The Age attribute is hence called a derived
• The attribute from which another attribute value is derived is called stored attribute.
• In the above example, date of birth is the stored attribute.
• Take another example, if we have to calculate the interest on some principal amount for a given time,
and for a particular rate of interest, we can simply use the interest formula.
o Interest=NPR/100;
• In this case, interest is the derived attribute whereas principal amount(P), time(N) and rate of
interest(R) are all stored attributes.

KEYS

An important constraint on the entities of an entity type is the key or uniqueness constraint on
attributes.
• A key is an attribute (also known as column or field) or a combination of attribute that is used to
identify records.
• Sometimes we might have to retrieve data from more than one table, in those cases we require to join
tables with the help of keys.
• The purpose of the key is to bind data together across tables without repeating all of the data in every
table
• Such an attribute is called a key attribute, and its values can be used to identify each entity uniquely.
• For example, the Name attribute is a key of the COMPANY entity type because no two companies are
allowed to have the same name.
• For the PERSON entity type, a typical key attribute is SocialSecurityNumber.
• Sometimes, several attributes together form a key, meaning that the combination of the attribute
values must be distinct for each entity.
• If a set of attributes possesses this property, we can define a composite attribute that becomes a key
attribute of the entity type.
The various types of key with e.g. in SQL are mentioned below, (For examples let suppose we have an
Employee Table with attributes ‘ID’ , ‘Name’ ,’Address’ , ‘Department_ID’ ,’Salary’)
(I) Super Key – An attribute or a combination of attribute that is used to identify the records uniquely is
known as Super Key. A table can have many Super Keys.

E.g. of Super Key


1 ID
2 ID, Name
3 ID, Address
4 ID, Department_ID
5 ID, Salary
6 Name, Address

Department of Information Technology


Nivedita Ray De Sarkar Page
1313
Database Management System

7 Name, Address, Department_ID ………… So on as any combination which can identify the records
uniquely will be a Super Key.
(II) Candidate Key – It can be defined as minimal Super Key or irreducible Super Key. In other words an
attribute or a combination of attribute that identifies the record uniquely but none of its proper subsets
can identify the records uniquely.
E.g. of Candidate Key
1 Code
2 Name, Address
For above table we have only two Candidate Keys (i.e. Irreducible Super Key) used to identify the
records from the table uniquely. Code Key can identify the record uniquely and similarly combination of
Name and Address can identify the record uniquely,
but neither Name nor Address can be used to identify the records uniquely as it might be possible that
we have two employees with similar name or two employees from the same house.
(III) Primary Key – A Candidate Key that is used by the database designer for unique identification of
each row in a table is known as Primary Key. A Primary Key can consist of one or more attributes of a
table.
E.g. of Primary Key - Database designer can use one of the Candidate Key as a Primary Key. In this case
we have “Code” and “Name, Address” as Candidate Key, we will consider “Code” Key as a Primary Key as
the other key is the combination of more than one attribute.
(IV) Foreign Key – A foreign key is an attribute or combination of attribute in one base table that points
to the candidate key (generally it is the primary key) of another table. The purpose of the foreign key is
to ensure referential integrity of the data i.e. only values that are supposed to appear in the database
are permitted.
E.g. of Foreign Key – Let consider we have another table i.e.
Department Table with Attributes “Department_ID”, “Department_Name”, “Manager_ID”,
”Location_ID” with Department_ID as an Primary Key. Now the Department_ID attribute of Employee
Table (dependent or child table) can be defined as the Foreign Key as it can reference to the
Department_ID attribute of the Departments table (the referenced
or parent table), a Foreign Key value must match an existing value in the parent table or be NULL.
(V) Composite Key – If we use multiple attributes to create a Primary Key then that Primary Key is called
Composite Key (also called a Compound Key or Concatenated Key).
E.g. of Composite Key, if we have used “Name, Address” as a Primary Key then it will be our Composite
Key.
(VI) Alternate Key – Alternate Key can be any of the Candidate Keys except for the Primary Key.
E.g. of Alternate Key is “Name, Address” as it is the only other Candidate Key which is not a Primary Key.
(VII) Secondary Key – The attributes that are not even the Super Key but can be still used for
identification of records (not unique) are known as Secondary Key.
E.g. of Secondary Key can be Name, Address, Salary, Department_ID etc. as they can identify the
records but they might not be unique.

RELATION

• There are several implicit relationships among the various entity types.
• In fact, whenever an attribute of one entity type refers to another entity type, some relationship
exists.
• For example, the attribute Manager of department refers to an employee who manages the
department.

Department of Information Technology


Nivedita Ray De Sarkar Page
1414
Database Management System

• In the ER model, these references should not be represented as relationships or relation. There is a
relation “borrower” in the entities customer and account which can be shown as follows:

CARDINALITY

Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can
be associated via a relationship set.
For a relationship set R between entity sets A and B, the mapping cardinality must be one of the
following:
There are three types of relationships
1) One to one
2) One to many
3) Many to many
2.5.1 One to one:
An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one
entity in A.
2.5.2 One to many:
An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can
be associated with at most one entity in A.
2.5.3 Many to one:
An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with
any number (zero or more) of entities in A.
2.5.4 Many to many:
An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is
associated with any number (zero or more) of entities in A.

(a) One to one. (b) One to many.

Department of Information Technology


Nivedita Ray De Sarkar Page
1515
Database Management System

(a) Many to one. (b) Many to many

PARTICIPATION

• The participation of an entity set E in a relationship set R is said to be total if every entity in E
participates in at least one relationship in R.
• If only some entities in E participate in relationships in R, the participation of entity set E in
relationship R is said to be partial.
• For example, we expect every loan entity to be related to at least one customer through the borrower
relationship.
• Therefore the participation of loan in the relationship set borrower is total.
• In contrast, an individual can be a bank customer whether or not she has a loan with the bank.
• Hence, it is possible that only some of the customer entities are related to the loan entity set through
the borrower relationship, and the participation of customer in the borrower relationship set is
therefore partial.

WEAK ENTITIES

• An entity set may not have sufficient attributes to form a primary key.
• Such an entity set is termed a weak entity set.
• An entity set that has a primary key is termed a strong entity set.
• As an illustration, consider the entity set payment, which has the three attributes: payment-number,
payment-date, and paymentamount.
• Payment numbers are typically sequential numbers, starting from 1, generated separately for each
loan.
• Thus, al-though each payment entity is distinct, payments for different loans may share the same
payment number. Thus, this entity set does not have a primary key; it is a weak entity set.
• For a weak entity set to be meaningful, it must be associated with another entity set, called the
identifying or owner entity set.
• Every weak entity must be associated with an identifying entity; that is, the weak entity set is said to
be existence dependent on the identifying entity set.
• The identifying entity set is said to own the weak entity set that it identifies.
• The relationship associating the weak entity set with the identifying entity set is called the identifying
relationship.
• The identifying relationship is many to one from the weak entity set to the identifying entity set, and
the participation of the weak entity set in the relationship is total.
• In our example, the identifying entity set for payment is loan, and a relationship loan-payment that
associates payment entities with their corresponding loan entities is the identifying relationship.
Department of Information Technology
Nivedita Ray De Sarkar Page
1616
Database Management System

• Although a weak entity set does not have a primary key, we nevertheless need a means of
distinguishing among all those entities in the weak entity set that depend on one particular strong
entity.
• The discriminator of a weak entity set is a set of attributes that allows this distinction to be made.
• For example, the discriminator of the weak entity set payment is the attribute payment-number, since,
for each loan, a payment number uniquely identifies one single payment for that loan.
• The discriminator of a weak entity set is also called the partial key of the entity set.
• The primary key of a weak entity set is formed by the primary key of the identifying entity set, plus the
weak entity set’s discriminator.
• In the case of the entity set payment, its primary key is {loannumber, payment-number}, where loan-
number is the primary key of the identifying entity set, namely loan, and paymentnumber distinguishes
payment entities within the same loan.
• The identifying relationship set should have no descriptive attributes, since any required attributes can
be associated with the weak entity set
• A weak entity set can participate in relationships other than the identifying relationship.
• For instance, the payment entity could participate in a relationship with the account entity set,
identifying the account from which the payment was made.
• A weak entity set may participate as owner in an identifying relationship with another weak entity set.
• It is also possible to have a weak entity set with more than one identifying entity set.
• A particular weak entity would then be identified by a combination of entities, one from each
identifying entity set.
• The primary key of the weak entity set would consist of the union of the primary keys of the
identifying entity sets, plus the discriminator of the weak entity set.
• In E-R diagrams, a doubly outlined box indicates a weak entity set, and a doubly outlined diamond
indicates the corresponding identifying relationship.
• The weak entity set payment depends on the strong entity set loan via the relationship set loan-
payment.
• The figure also illustrates the use of double lines to indicate total participation—the participation of
the (weak) entity set payment in the relationship loan-payment is total, meaning that every payment
must be related via loan-payment to some loan.

Finally, the arrow from loan-payment to loan indicates that each payment is for a single loan. The
discriminator of a weak entity set also is underlined, but with a dashed, rather than a solid, line.

Department of Information Technology


Nivedita Ray De Sarkar Page
1717
Database Management System

ER DIAGRAM- SPECIALIZATION, GENERALIZATION AND AGGREGATION

Specialization:

• An entity set may include sub groupings of entities that are distinct in some way from other entities in
the set.
• For instance, a subset of entities within an entity set may have attributes that are not shared by all the
entities in the entity set.
The E-R model provides a means for representing these distinctive entity groupings.
• Consider an entity set person, with attributes name, street, and city. A person may be further
classified as one of the following:
• Customer
• Employee
• Each of these person types is described by a set of attributes that includes all the attributes of entity
set person plus possibly additional attributes.
• For example, customer entities may be described further by the attribute customer-id, whereas
employee entities may be described further by the attributes employee-id and salary.
• The process of designating sub groupings within an entity set is called specialization.
• The specialization of person allows us to distinguish among persons according to whether they are
employees or customers.
• As another example, suppose the bank wishes to divide accounts into two categories, checking
account and savings account. Savings accounts need a minimum balance, but the bank may set interest
rates differently for different customers, offering better rates to favored customers.
• Checking accounts have a fixed interest rate, but offer an overdraft facility; the overdraft amount on a
checking account must be recorded.
• The bank could then create two specializations of account, namely savings-account and checking-
account.
• As we saw earlier, account entities are described by the attributes account-number and balance.
• The entity set savings-account would have all the attributes of account and an additional attribute
interest-rate.
• The entity set checking-account would have all the attributes of account, and an additional attribute
overdraft-amount.
• We can apply specialization repeatedly to refine a design scheme. For instance, bank employees may
be further classified as one of the following:
• Officer

Department of Information Technology


Nivedita Ray De Sarkar Page
1818
Database Management System

• Teller
• Secretary
• Each of these employee types is described by a set of attributes that includes all the attributes of
entity set employee plus additional attributes. For example, officer entities may be described further by
the attribute office-number, teller entities by the attributes station-number and hours-per-week, and
secretary entities by the attribute hours-per-week. Further, secretary entities may participate in a
relationship secretary-for, which identifies which employees are assisted by a secretary.
• An entity set may be specialized by more than one distinguishing feature. In our example, the
distinguishing feature among employee entities is the job the employee performs.
Another, coexistent, specialization could be based on whether the person is a temporary (limited-term)
employee or a permanent employee, resulting in the entity sets temporaryemployee and permanent-
employee. When more than one specialization is formed on an entity set, a particular entity may belong
to multiple specializations. For instance, a given employee may be a temporary employee who is a
secretary.
• In terms of an E-R diagram, specialization is depicted by a triangle component labeled ISA. The label
ISA stands for “is a” and represents, for example, that a customer “is a” person. The ISA relationship may
also be referred to as a super class subclass relationship. Higher- and lower-level entity sets are depicted
as regular entity sets—that is, as rectangles containing the name of the entity set.

Generalization:

• The refinement from an initial entity set into successive levels of entity sub groupings represents a
top-down design process in which distinctions are made explicit. The design process may also proceed in
a bottom-up manner, in which multiple entity sets are synthesized into a higher-level entity set on the
basis of common features. The database designer may have first identified a customer entity set with
the attributes name, street, city, and customer-id, and an employee entity set with the attributes name,
street, city, employee-id, and salary. There are similarities between the customer entity set and the
employee entity set in the sense that they have several attributes in common. This commonality can be
expressed by generalization, which is a containment relationship that exists between a higher-level
entity set and one or more lower-level entity sets. In our example, person is the higher-level entity set
and customer and employee are lower-level entity sets.
• Higher- and lower-level entity sets also may be designated by the terms super class and subclass,
respectively. The person entity set is the super class of the customer and employee subclasses.
• For all practical purposes, generalization is a simple inversion of specialization. We will apply both
processes, in combination, in the course of designing the E-R schema for an enterprise. In terms of the E-
R diagram itself, we do not distinguish between specialization and generalization. New levels of entity
representation will be distinguished (specialization) or synthesized (generalization) as the design schema
comes to express fully the database application and the user requirements of the database.

Department of Information Technology


Nivedita Ray De Sarkar Page
1919
Database Management System

Specialization and Generalization

Aggregation:

• One limitation of the E-R model is that it cannot express relationships among relationships.
• To illustrate the need for such a construct, consider the ternary relationship works-on, which we saw
earlier, between a employee, branch,and job.
• Now, suppose we want to record managers for tasks performed by an employee at a branch; that is,
we want to record managers for (employee, branch, job)combinations. Let us assume that there is an
entity set manager.
• One alternative for representing this relationship is to create a quaternary relationship manages
between employee, branch, job, and manager. (A quaternary relationship is required—a binary
relationship between manager and employee would not permit us to represent which (branch, job)
combinations of an employee are managed by which manager.)
• Using the basic E-R modeling constructs, we obtain the E-R diagram as follows:

E-R diagram with redundant relationships.

Department of Information Technology


Nivedita Ray De Sarkar Page
2020
Database Management System

• It appears that the relationship sets works-on and manages can be combined into one single
relationship set.
• Nevertheless, we should not combine them into a single relationship, since some employee, branch,
job combinations many not have a manager.
• There is redundant information in the resultant figure, however, since every employee, branch, job
combination in manages is also in works-on.
• If the manager were a value rather than an manager entity, we could instead make manager a multi
valued attribute of the relationship works-on.
• But doing so makes it more difficult (logically as well as in execution cost) to find, for example,
employee-branch-job triples for which a manager is responsible. Since the manager is a manager entity,
this alternative is ruled out in any case.
• The best way to model a situation such as the one just described is to use aggregation.
• Aggregation is an abstraction through which relationships are treated as higher-level entities.
• Following figure shows a notation for aggregation commonly used to represent the above situation.

1. Comparison between Strong and Weak Entity.


Strong Entity Weak Entity
1. One that does have a primary key is 1. An entity set that does not possess
called a strong entity set. sufficient attributes to form a primary
key is called a weak entity set.
The discriminator (or partial key) of a
weak entity set is the set of attributes
that distinguishes among all the
entities of a weak entity set
2. It is not dependent on weak entity. 2. The existence of a weak entity set
depends on the existence of a identifying
entity set. It must relate to the identifying
entity set via a total, one-to-many
relationship set from the identifying to the
weak entity set
3. Depicted by single rectangle. 3. We depict a weak entity set by double
rectangles in E-R diagram. We underline the
discriminator of a weak entity set with a
dashed line in E-R diagram.

Department of Information Technology


Nivedita Ray De Sarkar Page
2121
Database Management System

2. How to convert an ER schema to a Relational Schema


S1. Create a relation for every entity present in the ER schema. The primary key f the entity is
the primary key of the relation.
S2. For every weak entity set, create a relation. The primary key of this relation is the primary
key is the primary key of the owner’s entity taken as a foreign key. If there exists any partial key
of the weak entity then the primary key of this relation will consist of the primary key of the
owner’s relation and the partial key taken together.

Conversion of relationships:
a. For every one-to-one relationship, any one of the entity’s primary key is taken as foreign key
to the other entity to establish the relationship.
b. For every n-to-1 or 1-to-n relationship the primary key of the entity in the side 1 of the ER
schema is taken as foreign key to the other side to establish the relationship.
c. For a n-to-m relationship create a relation for the relationship set where the primary key will
consist of the primary keys of the entities present taken as foreign key.
d. For multivalued attributes create a relation taking the multivalued attributes and the
entity’s primary key as a foreign key.

Module 3
Relational Model : Structure of relational Databases, Relational Algebra, Relational Calculus, Extended
Relational Algebra Operations, Views, Modifications Of the Database.

INTRODUCTION TO RELATIONAL MODEL:

The relational model represents the database as a collection of relations. Informally, each relation
resembles a table of values or, to some extent, a "flat" file of records. When a relation is thought of as a
table of values, each row in the table represents a collection of related data values. In the relational
model, each row in the table represents a fact that typically corresponds to a real world entity or
relationship. The table name and column names are used to help in interpreting the meaning of the
values in each row. In the formal relational model terminology, a row is called a tuple, a column header
is called an attribute, and the table is called a relation. The data type describing the types of values that
can appear in each column is called a domain. We now define these terms—domain, tuple, attribute,
and relation—more precisely.

Department of Information Technology


Nivedita Ray De Sarkar Page
2222
Database Management System

The account relation

INTEGRITY CONSTRAINTS OVER THE RELATION:

To preserve the consistency and correctness of its stored data, a relational DBMS typically imposes one
or more data integrity constraints. These constraints restrict the data values that can be inserted into
the database or created by a database update.
Several different types of data integrity constraints are commonly found in relational databases,
including:
• Required data: Some columns in a database must contain a valid data value in every row; they are not
allowed to contain missing or NULL values. In the sample database, every order must have an associated
customer who placed the order. The DBMS can
be asked to prevent NULL values in this column.
• Validity checking: Every column in a database has a domain, a set of data values that are legal for that
column. The DBMS can be asked to prevent other data values in these columns.
• Entity integrity: The primary key of a table must contain a unique value in each row, which is different
from the values in all other rows. Duplicate values are illegal, because they wouldn't allow the database
to distinguish one entity from another. The DBMS can be asked to enforce this unique values constraint.
• Referential integrity: A foreign key in a relational database links each row in the child table containing
the foreign key to the row of the parent table containing the matching primary key value. The DBMS can
be asked to enforce this foreign key/primary key
constraint.
• Other data relationships: The real-world situation modeled by a database will often have additional
constraints that govern the legal data values that may appear in the database.
The DBMS can be asked to check modifications to the tables to make sure that their values are
constrained in this way.

Business rules: Updates to a database may be constrained by business rules governing the real-world
transactions that are represented by the updates.
• Consistency: Many real-world transactions cause multiple updates to a database. The DBMS can be
asked to enforce this type of consistency rule or to support applications that implement such rules.

RELATIONAL ALGEBRA:

The relational algebra is a procedural query language. It consists of a set of operations that take one or
two relations as input and produce a new relation as their result. The fundamental operations in the
relational algebra are select, project, union, set difference, Cartesian product, and rename. In addition
to the fundamental operations, there are several other operations—namely, set intersection, natural
join, division, and assignment. We will define these operations in terms of the fundamental operations.

Department of Information Technology


Nivedita Ray De Sarkar Page
2323
Database Management System

Relational algebra is the basic set of operations for the relational model
 These operations enable a user to specify basic retrieval requests (or queries)
 The result of an operation is a new relation, which may have been formed from one or more
input relations
 This property makes the algebra “closed” (all objects in relational algebra are relations)
 The algebra operations thus produce new relations
----------These can be further manipulated using operations of the same algebra
 A sequence of relational algebra operations forms a relational algebra expression
------------- The result of a relational algebra expression is also a relation that represents the result of
a database query (or retrieval request).

Fundamental Operations
The select, project, and rename operations are called unary operations,
because they operate on one relation. The other three operations operate on pairs of relations and are,
therefore, called binary operations.

The Select Operation

The select operation selects tuples that satisfy a given predicate. We use the lowercase Greek letter
sigma (σ) to denote selection. The predicate appears as a subscript to σ.
The argument relation is in parentheses after the σ. Thus, to select those tuples of the loan relation
where the branch is “Perryridge,” we write
σbranch-name =“Perryridge” (loan)
We can find all tuples in which the amount lent is more than $1200 by writing σamount>1200 (loan)
In general, we allow comparisons using =, _=, <, ≤, >, ≥ in the selection predicate.
Furthermore, we can combine several predicates into a larger predicate by using the connectives and
( ), or ( ), and not (¬).
Thus, to find those tuples pertaining to loans of more than $1200 made by the Perryridge branch, we
write:
σbranch-name =“Perryridge” amount>1200 (loan)

Result of σbranch-name =“Perryridge” (loan).

The selection predicate may include comparisons between two attributes. To illustrate, consider the
relation loan-officer that consists of three attributes: customer-name, banker-name, and loan-number,
which specifies that a particular banker is the loan
officer for a loan that belongs to some customer. To find all customers who have the same name as their
loan officer, we can write σcustomer-name =banker-name (loan-officer).
Relational algebra, an offshoot of first-order logic (and of algebra of sets), deals with a set of finitary
relations which is closed under certain operators. These operators operate on one or more relations to
yield a relation.
As in any algebra, some operators are primitive and the others, being definable in terms of the primitive
ones, are derived. It is useful if the choice of primitive operators parallels the usual choice of primitive
Department of Information Technology
Nivedita Ray De Sarkar Page
2424
Database Management System

logical operators. Although it is well known that the usual choice in logic of AND, OR and NOT is
somewhat arbitrary, Codd made a similar arbitrary choice for his algebra. The six primitive operators of
Codd's algebra are the selection, the projection, the Cartesian product (also called the cross product or
cross join), the set union, the set difference, and the rename. (Actually, Codd omitted the rename, but
the compelling case for its inclusion was shown by the inventors of ISBL.) These six operators are
fundamental in the sense that none of them can be omitted without losing expressive power. Many
other operators have been defined in terms of these six. Among the most important are set intersection,
division, and the natural join. In fact ISBL made a compelling case for replacing the Cartesian product
with the natural join, of which the Cartesian product is a degenerate case. Altogether, the operators of
relational algebra have identical expressive power to that of domain relational calculus or tuple
relational calculus. However, for the reasons given in the Introduction above, relational algebra has
strictly less expressive power than that of first-order predicate calculus without function symbols.
Relational algebra actually corresponds to a subset of first-order logic that is Horn clauses without
recursion and negation.

Set operators

Although three of the six basic operators are taken from set theory, there are additional constraints that
are present in their relational algebra counterparts: For set union and set difference, the two relations
involved must be union-compatible—that is, the two relations must have the same set of attributes. As
set intersection can be defined in terms of set difference, the two relations involved in set intersection
must also be union-compatible.

The Cartesian product is defined differently from the one defined in set theory in the sense that tuples
are considered to be 'shallow' for the purposes of the operation. That is, unlike in set theory, where the
Cartesian product of a n-tuple by an m-tuple is a set of 2-tuples, the Cartesian product in relational
algebra has the 2-tuple "flattened" into an n+m-tuple. More formally, R × S is defined as follows:
R × S = {r s | r R, s S}
In addition, for the Cartesian product to be defined the two relations involved must have disjoint
headers — that is, they must not have a common attribute name.
Projection (π)
A projection is a unary operation written as where a1,...,an is a set of attribute names. The result of
such projection is defined as the set that is obtained when all tuples
in R are restricted to the set {a1,...,an}.
Selection (σ)
A generalized selection is a unary operation written as where is a propositional formula that consists of
atoms as allowed in the normal selection and the logical operators (and), (or) and (negation). This
selection selects all those tuples in R
for which holds.
Rename (ρ)
A rename is a unary operation written as ρa / b(R) where the result is identical to R except that the b
field in all tuples is renamed to an a field. This is simply used to rename the attribute of a relation or the
relation itself.

Department of Information Technology


Nivedita Ray De Sarkar Page
2525
Database Management System

UNION Operation

 Binary operation, denoted by ∪


 The result of R ∪ S, is a relation that includes all tuples that are either in R or in S or in both R
and S
 Duplicate tuples are eliminated
 The two operand relations R and S must be “type compatible” (or UNION compatible)
 R and S must have same number of attributes
 Each pair of corresponding attributes must be type compatible (have same or compatible
domains).

 Example:
To retrieve the social security numbers of all employees who either work in department 5
(RESULT1 below) or directly supervise an employee who works in department 5 (RESULT2
below)
We can use the UNION operation as follows:
DEP5_EMPS ← σDNO=5 (EMPLOYEE)
RESULT1 ← π SSN(DEP5_EMPS)
RESULT2(SSN) ← πSUPERSSN(DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2
 The union operation produces the tuples that are in either RESULT1 or RESULT2 or both.

Type Compatibility of operands is required for the binary set operation UNION ∪, (also for
INTERSECTION ∩, and SET DIFFERENCE –)
 R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type compatible if:
--- they have the same number of attributes, and
--- the domains of corresponding attributes are type compatible (i.e. dom(Ai)=dom(Bi) for i=1, 2, ..., n).
 The resulting relation for R1∪R2 (also for R1∩R2, or R1–R2, see next slides) has the same
attribute names as the first operand relation R1 (by convention).

INTERSECTION is denoted by ∩
 The result of the operation R ∩ S, is a relation that includes all tuples that are in both R and S
 The attribute names in the result will be the same as the attribute names in R
 The two operand relations R and S must be “type compatible”

SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –


 The result of R – S, is a relation that includes all tuples that are in R but not in S
 The attribute names in the result will be the same as the attribute names in R
 The two operand relations R and S must be “type compatible”

Some properties of UNION, INTERSECT, and DIFFERENCE


 Notice that both union and intersection are commutative operations; that is
 R ∪ S = S ∪ R, and R ∩ S = S ∩ R
 Both union and intersection can be treated as n-ary operations applicable to any number of
relations as both are associative operations; that is

Department of Information Technology


Nivedita Ray De Sarkar Page
2626
Database Management System

 R ∪ (S ∪ T) = (R ∪ S) ∪ T
 (R ∩ S) ∩ T = R ∩ (S ∩ T)
 The minus operation is not commutative; that is, in general
 R–S≠S–R

CARTESIAN (or CROSS) PRODUCT Operation


 This operation is used to combine tuples from two relations in a combinatorial fashion.
 Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm)
 Result is a relation Q with degree n + m attributes:
 Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
 The resulting relation state has one tuple for each combination of tuples—one from R and one
from S.
 Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then R x S will have nR *
nS tuples.
 The two operands do NOT have to be "type compatible”.

Binary Relational Operations: JOIN

 The sequence of CARTESIAN PRODECT followed by SELECT is used quite commonly to identify
and select related tuples from two relations
 A special operation, called JOIN combines this sequence into a single operation
 This operation is very important for any relational database with more than a single relation,
because it allows us combine related tuples from various relations
 The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm)
is: R <join condition>S
where R and S can be any relations that result from general relational algebra expressions.

Example: Suppose that we want to retrieve the name of the manager of each department.
 To get the manager’s name, we need to combine each DEPARTMENT tuple with the EMPLOYEE
tuple whose SSN value matches the MGRSSN value in the department tuple.
 We do this by using the join operation.
DEPT_MGR ← DEPARTMENT MGRSSN=SSN EMPLOYEE
 MGRSSN=SSN is the join condition Combines each department record with the employee who
manages the department
The join condition can also be specified as [Link]= [Link]
EQUIJOIN Operation
 The most common use of join involves join conditions with equality comparisons only
 Such a join, where the only comparison operator used is =, is called an EQUIJOIN.
In the result of an EQUIJOIN we always have one or more pairs of attributes (whose names need not be
identical) that have identical values in every tuple.

NATURAL JOIN Operation

Another variation of JOIN called NATURAL JOIN — denoted by * — was created to get rid of the second
(superfluous) attribute in an EQUIJOIN condition.
because one of each pair of attributes with identical values is superfluous

Department of Information Technology


Nivedita Ray De Sarkar Page
2727
Database Management System

The standard definition of natural join requires that the two join attributes, or each pair of
corresponding join attributes, have the same name in both relations
If this is not the case, a renaming operation is applied first.

DIVISION Operation

The division operation is applied to two relations


R(Z) ÷ S(X), where X subset Z. Let Y = Z - X (and hence Z = X ∪ Y); that is, let Y be the set of attributes of R
that are not attributes of S.
For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination
with every tuple in S.

The OUTER JOIN Operation

In NATURAL JOIN and EQUIJOIN, tuples without a matching (or related) tuple are eliminated from the
join result
Tuples with null in the join attributes are also eliminated
This amounts to loss of information.
A set of operations, called OUTER joins, can be used when we want to keep all the tuples in R, or all
those in S, or all those in both relations in the result of the join, regardless of whether or not they have
matching tuples in the other relation.

The left outer join operation keeps every tuple in the first or left relation R in R S; if no matching tuple is
found in S, then the attributes of S in the join result are filled or “padded” with null values.
 A similar operation, right outer join, keeps every tuple in the second or right relation S in the
result of R S.
 A third operation, full outer join, denoted by keeps all tuples in both the left and the right
relations when no matching tuples are found, padding them with null values as needed.

Aggregate Functions and Grouping

 A type of request that cannot be expressed in the basic relational algebra is to specify
mathematical aggregate functions on collections of values from the database.
 Examples of such functions include retrieving the average or total salary of all employees or the
total number of employee tuples.
 These functions are used in simple statistical queries that summarize information from the
database tuples.
 Common functions applied to collections of numeric values include
 SUM, AVERAGE, MAXIMUM, and MINIMUM.
TheAggregate
Use of the COUNT function is used
Functional for counting
operation ℱ tuples or values.
 ℱMAX Salary (EMPLOYEE) retrieves the maximum salary value from the EMPLOYEE relation
 ℱMIN Salary (EMPLOYEE) retrieves the minimum Salary value from the EMPLOYEE relation
 ℱSUM Salary (EMPLOYEE) retrieves the sum of the Salary from the EMPLOYEE relation

Department of Information Technology


Nivedita Ray De Sarkar Page
2828
Database Management System

 ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE) computes the count (number) of employees and
their average salary
 Note: count just counts the number of rows, without removing duplicates

Using Grouping with Aggregation


➲ The previous examples all summarized one or more attributes for a set of tuples
 Maximum Salary or Count (number of) Ssn
➲ Grouping can be combined with Aggregate Functions
➲ Example: For each department, retrieve the DNO, COUNT SSN, and AVERAGE SALARY
➲ A variation of aggregate operation allows this: ℱ
 Grouping attribute placed to left of symbol
 Aggregate functions to right of symbol
 DNO ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE)
➲ Above operation groups employees by DNO (department number) and computes the count of
employees and average salary per department

Examples of Queries in Relational Algebra : Procedural Form


 Q1: Retrieve the name and address of all employees who work for the ‘Research’ department.
RESEARCH_DEPT ← σ DNAME=’Research’ (DEPARTMENT)
RESEARCH_EMPS ← (RESEARCH_DEPT DNUMBER= DNOEMPLOYEEEMPLOYEE)
RESULT ← π FNAME, LNAME, ADDRESS (RESEARCH_EMPS)
 Q2: Retrieve the names of employees who have no dependents.
ALL_EMPS ← π SSN(EMPLOYEE)
EMPS_WITH_DEPS(SSN) ← π ESSN(DEPENDENT)
EMPS_WITHOUT_DEPS ← (ALL_EMPS - EMPS_WITH_DEPS)
RESULT ← π LNAME, FNAME (EMPS_WITHOUT_DEPS * EMPLOYEE)

Relational Calculus
➲ A relational calculus expression creates a new relation, which is specified in terms of variables that
range over rows of the stored database relations (in tuple calculus) or over columns of the stored
relations (in domain calculus).
➲ In a calculus expression, there is no order of operations to specify how to retrieve the query
result—a calculus expression specifies only what information the result should contain.
 This is the main distinguishing feature between relational algebra and relational calculus.

Relational calculus is considered to be a nonprocedural or declarative language.


➲ This differs from relational algebra, where we must write a sequence of operations to specify a
retrieval request; hence relational algebra can be considered as a procedural way of
stating a query.
Tuple Relational Calculus
➲ The tuple relational calculus is based on specifying a number of tuple variables.
➲ Each tuple variable usually ranges over a particular database relation, meaning that the variable may
take as its value any individual tuple from that relation.
➲ A simple tuple relational calculus query is of the form
{t | COND(t)}
where t is a tuple variable and COND (t) is a conditional expression involving t.

Department of Information Technology


Nivedita Ray De Sarkar Page
2929
Database Management System

 The result of such a query is the set of all tuples t that satisfy COND (t).

➲ Example: To find the first and last names of all employees whose salary is above $50,000, we can
write the following tuple calculus expression:
{[Link], [Link] | EMPLOYEE(t) AND [Link]>50000}
➲ The condition EMPLOYEE(t) specifies that the range relation of tuple variable t is EMPLOYEE.
➲ The first and last name (PROJECTION π FNAME, LNAME) of each EMPLOYEE tuple t that satisfies the
condition [Link]>50000 (SELECTION σ SALARY >50000) will be retrieved.

The Existential and Universal Quantifiers


➲ Two special symbols called quantifiers can appear in formulas; these are the universal quantifier (∀)
and the existential quantifier (∃).
➲ Informally, a tuple variable t is bound if it is quantified, meaning that it appears in an (∀ t) or (∃ t)
clause; otherwise, it is free.
➲ If F is a formula, then so are (∃ t)(F) and (∀ t)(F), where t is a tuple variable.
 The formula (∃t)(F) is true if the formula F evaluates to true for some (at least one) tuple
assigned to free occurrences of t in F; otherwise (∃ t)(F) is false.
 The formula (∀ t)(F) is true if the formula F evaluates to true for every tuple (in the universe)
assigned to free occurrences of t in F; otherwise (∀ t)(F) is false.
∀ is called the universal or “for all” quantifier because every tuple in “the universe of” tuples
must make F true to make the quantified formula true.
➲ ∃ is called the existential or “there exists” quantifier because any tuple that exists in “the
universe of” tuples may make F true to make the quantified formula true.

Example Query Using Existential Quantifier


➲ Retrieve the name and address of all employees who work for the ‘Research’ department. The query
can be expressed as :
{[Link], [Link], [Link] | EMPLOYEE(t) and (∃ d) (DEPARTMENT(d) and [Link]=‘Research’
and [Link]=[Link]) }
➲ The only free tuple variables in a relational calculus expression should be those that appear to the
left of the bar ( | ).
 In above query, t is the only free variable; it is then bound successively to each tuple.
➲ If a tuple satisfies the conditions specified in the query, the attributes FNAME, LNAME, and ADDRESS
are retrieved for each such tuple.
 The conditions EMPLOYEE (t) and DEPARTMENT(d) specify the range relations for t and d.
 The condition [Link] = ‘Research’ is a selection condition and corresponds to a SELECT
operation in the relational algebra, whereas the condition [Link] = [Link] is a JOIN
condition.
Find the names of employees who work on all the projects controlled by department number 5. The
query can be:
{[Link], [Link] | EMPLOYEE(e)
OR ( (∃ w)(WORKS_ON(w) and ( (∀ x)(not(PROJECT(x))
and [Link]=[Link] or not([Link]=5)
and [Link]=[Link]))))}
➲ Exclude from the universal quantification all tuples that we are not interested in by making the
condition true for all such tuples.
 The first tuples to exclude (by making them evaluate automatically to true) are those that are
not in the relation R of interest.

Department of Information Technology


Nivedita Ray De Sarkar Page
3030
Database Management System

➲ In query above, using the expression not(PROJECT(x)) inside the universally quantified formula
evaluates to true all tuples x that are not in the PROJECT relation.
 Then we exclude the tuples we are not interested in from R itself. The expression
not([Link]=5) evaluates to true all tuples x that are in the project relation but are not
controlled by department 5.
➲ Finally, we specify a condition that must hold on all the remaining tuples in R.
( (∃ w)(WORKS_ON(w) and [Link]=[Link] and [Link]=[Link])

Department of Information Technology


Nivedita Ray De Sarkar Page
3131
Database Management System

SQL
SQL is a standard language for accessing and manipulating databases.

What Can SQL do?

 SQL can execute queries against a database


 SQL can retrieve data from a database
 SQL can insert records in a database
 SQL can update records in a database
 SQL can delete records from a database
 SQL can create new databases
 SQL can create new tables in a database
 SQL can create stored procedures in a database
 SQL can create views in a database
 SQL can set permissions on tables, procedures, and views

The SQL SELECT Statement


The SELECT statement is used to select data from a database.
The result is stored in a result table, called the result-set.
SQL SELECT Syntax
SELECT column_name,column_name
FROM table_name
and
SELECT * FROM table_name

SELECT * Example
The following SQL stament selects all the columns from the "Customers" table:
Example
SELECT * FROM Customers;

The SQL SELECT DISTINCT Statement


In a table, a column may contain many duplicate values; and sometimes you only want to list the
different (distinct) values.
The DISTINCT keyword can be used to return only distinct (different) values.
SQL SELECT DISTINCT Syntax
SELECT DISTINCT column_name,column_name
FROM table_name

SELECT DISTINCT Example


The following SQL statement selects only the distinct values from the "City" columns from the
"Customers" table:

Department of Information Technology


Nivedita Ray De Sarkar Page
3232
Database Management System

Example
SELECT DISTINCT City FROM Customers;
The SQL WHERE Clause
The WHERE clause is used to extract only those records that fulfill a specified criterion.
SQL WHERE Syntax
SELECT column_name,column_name
FROM table_name
WHERE column_name operator value
WHERE Clause Example
The following SQL statement selects all the customers from the country "Mexico", in the "Customers"
table:
Example
SELECT * FROM Customers
WHERE City='Kolkata';

The SQL AND & OR Operators


The AND operator displays a record if both the first condition AND the second condition are true.
The OR operator displays a record if either the first condition OR the second condition is true.
AND Operator Example
The following SQL statement selects all customers from the country "Germany" AND the city "Berlin", in
the "Customers" table:

Example
SELECT * FROM Customers
WHERE Country='India'
AND City='Kolkata';

OR Operator Example

The following SQL statement selects all customers from the city "Berlin" OR "München", in the
"Customers" table:

Example

SELECT * FROM Customers


WHERE City='Kolkata'
OR City='Delhi';

Combining AND & OR


You can also combine AND and OR (use parenthesis to form complex expressions).

The following SQL statement selects all customers from the country "Germany" AND the city must be
equal to "Berlin" OR "München", in the "Customers" table:

Example

Department of Information Technology


Nivedita Ray De Sarkar Page
3333
Database Management System

SELECT * FROM Customers


WHERE Country='Germany'
AND (City='Berlin' OR City='München');

The SQL ORDER BY Keyword

The ORDER BY keyword is used to sort the result-set by one or more columns.

The ORDER BY keyword sorts the records in ascending order by default. To sort the records in a
descending order, you can use the DESC keyword.

SQL ORDER BY Syntax


SELECT column_name,column_name
FROM table_name
ORDER BY column_name,column_name ASC|DESC

ORDER BY Example
The following SQL stament selects all customers from the "Customers" table, sorted by the "Country"
column:

Example

SELECT * FROM Customers


ORDER BY Country;

ORDER BY DESC Example

The following SQL stament selects all customers from the "Customers" table, sorted DESCENDING by the
"Country" column:

Example
SELECT * FROM Customers
ORDER BY Country DESC

The SQL INSERT INTO Statement

The INSERT INTO statement is used to insert new records in a table.

SQL INSERT INTO Syntax

It is possible to write the INSERT INTO statement in two forms.

The first form does not specify the column names where the data will be inserted, only their values:
INSERT INTO table_name
VALUES (value1,value2,value3,...)

Department of Information Technology


Nivedita Ray De Sarkar Page
3434
Database Management System

The second form specifies both the column names and the values to be inserted:
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...)

INSERT INTO Example


Now we want to insert a new row in the "Customers" table.
We use the following SQL statement:
INSERT INTO Customers
VALUES ('A','B','C','D','34','India');

Insert Data Only in Specified Columns

It is also possible to only add data in specific columns.


The following SQL statement will add a new row, but only add data in the "CustomerID",
"CustomerName", "City", and "Country" columns:
INSERT INTO Customers (CustomerID, CustomerName, City, Country)
VALUES ('C008', 'ABD', 'Kolkata', ‘India’);

The UPDATE Statement

The UPDATE statement is used to update existing records in a table.


SQL UPDATE Syntax
UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value
Note: Notice the WHERE clause in the UPDATE syntax. The WHERE clause specifies which record or
records that should be updated. If you omit the WHERE clause, all records will be updated!

SQL UPDATE Example

Now we want to update the customer "ABD" in the "Customers" table.


We use the following SQL statement:
UPDATE Customers
SET ContactName=’BBB', City='Delhi'
WHERE CustomerName='ABD' AND Country='India';

SQL UPDATE Warning

Be careful when updating records. If we had omitted the WHERE clause in the example above, like this:
UPDATE Customers
SET ContactName=’BBB’, City='Delhi';

The DELETE Statement

The DELETE statement is used to delete rows in a table.

Department of Information Technology


Nivedita Ray De Sarkar Page
3535
Database Management System

SQL DELETE Syntax


DELETE FROM table_name
WHERE some_column=some_value

Note: Notice the WHERE clause in the DELETE syntax. The WHERE clause specifies which record or
records that should be deleted. If you omit the WHERE clause, all records will be deleted!

SQL DELETE Example

Now we want to delete the company "XYZ" in the "Customers" table.

We use the following SQL statement:


DELETE FROM Customers
WHERE CustomerName=’ABD AND ContactName='BBB';

Delete All Rows

It is possible to delete all rows in a table without deleting the table. This means that the table structure,
attributes, and indexes will be intact:

DELETE FROM table_name;

or

DELETE * FROM table_name;


Note: Be very careful when deleting records. You cannot undo this statement!

Selecting based on a particular row


SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number

Example
SELECT *
FROM Persons
WHERE ROWNUM <=5;

The LIKE Operator

The LIKE operator is used to search for a specified pattern in a column.

SQL LIKE Syntax

Department of Information Technology


Nivedita Ray De Sarkar Page
3636
Database Management System

SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern

LIKE Operator Example

The following SQL statement selects the customers in cities starting with an "s" from the "Customers"
table:

Example
SELECT * FROM Customers
WHERE City LIKE 'K%';

The "%" sign can be used to define wildcards (missing letters in the pattern) both before and after the
pattern.

The following SQL statement selects the customers in a city ending with an "s" from the "Customers"
table:

Example
SELECT * FROM Customers
WHERE City LIKE '%s';

It is also possible to search for a pattern, not just a single letter.

The following SQL statement selects the customers in a country containing the pattern "land" from the
"Customers" table:

Example
SELECT * FROM Customers
WHERE Country LIKE '%land%';

Using the NOT keyword allows you to select records that does NOT match the pattern.

The following SQL statement selects the customers in a country NOT containing the pattern "land" from
the "Customers" table:

Example
SELECT * FROM Customers
WHERE Country NOT LIKE '%land%';

The IN Operator

The IN operator allows you to specify multiple values in a WHERE clause.

Department of Information Technology


Nivedita Ray De Sarkar Page
3737
Database Management System

SQL IN Syntax
SELECT column_name(s)
FROM table_name
WHERE column_name IN (value1,value2,...)

IN Operator Example

The following SQL statement selects the companies in the cities "Paris" or "London" from the
"Customers" table:

Example
SELECT * FROM Customers
WHERE City IN ('Kolkata','Delhi');

The BETWEEN Operator

The BETWEEN operator selects a range of data between two values. The values can be numbers, text, or
dates.

SQL BETWEEN Syntax


SELECT column_name(s)
FROM table_name
WHERE column_name
BETWEEN value1 AND value2

BETWEEN Operator Example

The following SQL statement selects the customers with "CustomerName" alphabetically between
"ABD" and "JKL" from the "Customers" table:

Example
SELECT * FROM Customers
WHERE CustomerName
BETWEEN ‘ABD' AND 'JKL';

To display the companies outside the range in the previous example, use NOT BETWEEN:

Example

SELECT * FROM Customers


WHERE CustomerName
NOT BETWEEN ‘ABD' AND 'JKL';

SQL Alias

Department of Information Technology


Nivedita Ray De Sarkar Page
3838
Database Management System

You can give a table or a column another name by using an alias. This can be a good thing to do if you
have very long or complex table names or column names.

An alias name could be anything, but usually it is short.

SQL Alias Syntax for Tables


SELECT column_name(s)
FROM table_name
AS alias_name

SQL Alias Syntax for Columns


SELECT column_name AS alias_name
FROM table_name

Alias Example

Using the "Customers" and "Orders" tables, we give the table aliases of "c" and "o" respectively.

The following SQL statement selects all the orders from the customer "ABD":

Example
SELECT [Link], [Link]
FROM Customers AS c,
Orders AS o
WHERE [Link]=ABD';

SQL JOIN

The JOIN keyword is used in an SQL statement to query data from two or more tables, based on a
relationship between certain columns in these tables.

Tables in a database are often related to each other with keys.

A primary key is a column (or a combination of columns) with a unique value for each row. Each primary
key value must be unique within the table. The purpose is to bind data together, across tables, without
repeating all of the data in every table.

Different SQL JOINs

Before we continue with examples, we will list the types of JOIN you can use, and the differences
between them.

 JOIN: Return rows when there is at least one match in both tables
 LEFT JOIN: Return all rows from the left table, even if there are no matches in the right table
 RIGHT JOIN: Return all rows from the right table, even if there are no matches in the left table
 FULL JOIN: Return rows when there is a match in one of the tables

Department of Information Technology


Nivedita Ray De Sarkar Page
3939
Database Management System

SQL INNER JOIN Keyword

The INNER JOIN keyword returns rows when there is at least one match in both tables.

SQL INNER JOIN Syntax


SELECT column_name(s)
FROM table_name1
INNER JOIN table_name2
ON table_name1.column_name=table_name2.column_name

PS: INNER JOIN is the same as JOIN.

SQL INNER JOIN Example

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 1
4 24562 1
5 34764 15

Now we want to list all the persons with any orders.

We use the following SELECT statement:

SELECT [Link], [Link], [Link]


FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY [Link];

The result-set will look like this:

LastName FirstName OrderNo

Department of Information Technology


Nivedita Ray De Sarkar Page
4040
Database Management System

Hansen Ola 22456


Hansen Ola 24562
Pettersen Kari 77895
Pettersen Kari 44678

The INNER JOIN keyword returns rows when there is at least one match in both tables. If there are rows
in "Persons" that do not have matches in "Orders", those rows will NOT be listed.

SQL LEFT JOIN Keyword

The LEFT JOIN keyword returns all rows from the left table (table_name1), even if there are no matches
in the right table (table_name2).

SQL LEFT JOIN Syntax


SELECT column_name(s)
FROM table_name1
LEFT JOIN table_name2
ON table_name1.column_name=table_name2.column_name

PS: In some databases LEFT JOIN is called LEFT OUTER JOIN.

SQL LEFT JOIN Example

The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id

Department of Information Technology


Nivedita Ray De Sarkar Page
4141
Database Management System

1 77895 3

2 44678 3

3 22456 1

4 24562 1

5 34764 15

Now we want to list all the persons and their orders - if any, from the tables above.

We use the following SELECT statement:

SELECT [Link], [Link], [Link]


FROM Persons
LEFT JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY [Link]

The result-set will look like this:

LastName FirstName OrderNo

Hansen Ola 22456

Hansen Ola 24562

Pettersen Kari 77895

Pettersen Kari 44678

Svendson Tove

The LEFT JOIN keyword returns all the rows from the left table (Persons), even if there are no matches in
the right table (Orders).

SQL RIGHT JOIN Keyword

The RIGHT JOIN keyword returns all the rows from the right table (table_name2), even if there are no
matches in the left table (table_name1).

Department of Information Technology


Nivedita Ray De Sarkar Page
4242
Database Management System

SQL RIGHT JOIN Syntax


SELECT column_name(s)
FROM table_name1
RIGHT JOIN table_name2
ON table_name1.column_name=table_name2.column_name
PS: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.

SQL RIGHT JOIN Example

The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id

1 77895 3

2 44678 3

3 22456 1

4 24562 1

5 34764 15

Now we want to list all the orders with containing persons - if any, from the tables above.

We use the following SELECT statement:

SELECT [Link], [Link], [Link]


FROM Persons
RIGHT JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY [Link]

Department of Information Technology


Nivedita Ray De Sarkar Page
4343
Database Management System

The result-set will look like this:

LastName FirstName OrderNo

Hansen Ola 22456

Hansen Ola 24562

Pettersen Kari 77895

Pettersen Kari 44678

34764

The RIGHT JOIN keyword returns all the rows from the right table (Orders), even if there are no matches
in the left table (Persons).

SQL FULL JOIN Keyword

The FULL JOIN keyword return rows when there is a match in one of the tables.

SQL FULL JOIN Syntax


SELECT column_name(s)
FROM table_name1
FULL JOIN table_name2
ON table_name1.column_name=table_name2.column_name

SQL FULL JOIN Example

The "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

Department of Information Technology


Nivedita Ray De Sarkar Page
4444
Database Management System

3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id

1 77895 3

2 44678 3

3 22456 1

4 24562 1

5 34764 15

Now we want to list all the persons and their orders, and all the orders with their persons.

We use the following SELECT statement:

SELECT [Link], [Link], [Link]


FROM Persons
FULL JOIN Orders
ON Persons.P_Id=Orders.P_Id
ORDER BY [Link]

The result-set will look like this:

LastName FirstName OrderNo

Hansen Ola 22456

Hansen Ola 24562

Pettersen Kari 77895

Pettersen Kari 44678

Svendson Tove

Department of Information Technology


Nivedita Ray De Sarkar Page
4545
Database Management System

34764

The FULL JOIN keyword returns all the rows from the left table (Persons), and all the rows from the right
table (Orders). If there are rows in "Persons" that do not have matches in "Orders", or if there are rows
in "Orders" that do not have matches in "Persons", those rows will be listed as well.

The SQL UNION Operator

The UNION operator is used to combine the result-set of two or more SELECT statements.

Notice that each SELECT statement within the UNION must have the same number of columns. The
columns must also have similar data types. Also, the columns in each SELECT statement must be in the
same order.

SQL UNION Syntax


SELECT column_name(s) FROM table_name1
UNION
SELECT column_name(s) FROM table_name2

Note: The UNION operator selects only distinct values by default. To allow duplicate values, use UNION
ALL.

SQL UNION ALL Syntax


SELECT column_name(s) FROM table_name1
UNION ALL
SELECT column_name(s) FROM table_name2

PS: The column names in the result-set of a UNION are always equal to the column names in the first
SELECT statement in the UNION.

SQL UNION Example

Look at the following tables:

"Employees_Norway":

E_ID E_Name

01 Hansen, Ola

02 Svendson, Tove

Department of Information Technology


Nivedita Ray De Sarkar Page
4646
Database Management System

03 Svendson, Stephen

04 Pettersen, Kari

"Employees_USA":

E_ID E_Name

01 Turner, Sally

02 Kent, Clark

03 Svendson, Stephen

04 Scott, Stephen

Now we want to list all the different employees in Norway and USA.

We use the following SELECT statement:

SELECT E_Name FROM Employees_Norway


UNION
SELECT E_Name FROM Employees_USA

The result-set will look like this:

E_Name

Hansen, Ola

Svendson, Tove

Svendson, Stephen

Pettersen, Kari

Turner, Sally

Kent, Clark

Department of Information Technology


Nivedita Ray De Sarkar Page
4747
Database Management System

Scott, Stephen

Note: This command cannot be used to list all employees in Norway and USA. In the example above we
have two employees with equal names, and only one of them will be listed. The UNION command
selects only distinct values.

SQL UNION ALL Example

Now we want to list all employees in Norway and USA:

SELECT E_Name FROM Employees_Norway


UNION ALL
SELECT E_Name FROM Employees_USA

Result

E_Name

Hansen, Ola

Svendson, Tove

Svendson, Stephen

Pettersen, Kari

Turner, Sally

Kent, Clark

Svendson, Stephen

Scott, Stephen

The SQL SELECT INTO Statement

The SELECT INTO statement selects data from one table and inserts it into a different table.

The SELECT INTO statement is most often used to create backup copies of tables.

Department of Information Technology


Nivedita Ray De Sarkar Page
4848
Database Management System

SQL SELECT INTO Syntax

We can select all columns into the new table:

SELECT *
INTO new_table_name [IN externaldatabase]
FROM old_tablename

Or we can select only the columns we want into the new table:

SELECT column_name(s)
INTO new_table_name [IN externaldatabase]
FROM old_tablename

SQL SELECT INTO Example

Make a Backup Copy - Now we want to make an exact copy of the data in our "Persons" table.

We use the following SQL statement:

SELECT *
INTO Persons_Backup
FROM Persons
We can also use the IN clause to copy the table into another database:
SELECT *
INTO Persons_Backup IN '[Link]'
FROM Persons

We can also copy only a few fields into the new table:

SELECT LastName,FirstName
INTO Persons_Backup
FROM Persons

SQL SELECT INTO - With a WHERE Clause

We can also add a WHERE clause.

The following SQL statement creates a "Persons_Backup" table with only the persons who lives in the
city "Sandnes":

Department of Information Technology


Nivedita Ray De Sarkar Page
4949
Database Management System

SELECT LastName,Firstname
INTO Persons_Backup
FROM Persons
WHERE City='Sandnes'

SQL SELECT INTO - Joined Tables

Selecting data from more than one table is also possible.

The following example creates a "Persons_Order_Backup" table contains data from the two tables
"Persons" and "Orders":

SELECT [Link],[Link]
INTO Persons_Order_Backup
FROM Persons
INNER JOIN Orders
ON Persons.P_Id=Orders.P_Id

The CREATE DATABASE Statement

The CREATE DATABASE statement is used to create a database.

SQL CREATE DATABASE Syntax


CREATE DATABASE database_name

CREATE DATABASE Example

Now we want to create a database called "my_db".

We use the following CREATE DATABASE statement:

CREATE DATABASE my_db

Database tables can be added with the CREATE TABLE statement.

The CREATE TABLE Statement

The CREATE TABLE statement is used to create a table in a database.

SQL CREATE TABLE Syntax


CREATE TABLE table_name
(

Department of Information Technology


Nivedita Ray De Sarkar Page
5050
Database Management System

column_name1 data_type,
column_name2 data_type,
column_name3 data_type,
....
)

The data type specifies what type of data the column can hold. For a complete reference of all the data
types available in MS Access, MySQL, and SQL Server, go to our complete Data Types reference.

CREATE TABLE Example

Now we want to create a table called "Persons" that contains five columns: P_Id, LastName, FirstName,
Address, and City.

We use the following CREATE TABLE statement:

CREATE TABLE Persons


(
P_Id int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)

The P_Id column is of type int and will hold a number. The LastName, FirstName, Address, and City
columns are of type varchar with a maximum length of 255 characters.

The empty "Persons" table will now look like this:

P_Id LastName FirstName Address City

The empty table can be filled with data with the INSERT INTO statement.

Integrity Constraints

Integrity constraints provide a way of ensuring that changes made to the database by authorized users
do not result in a loss of data consistency.

Data Integrity

Department of Information Technology


Nivedita Ray De Sarkar Page
5151
Database Management System

Integrity ensures that the data in a database is both accurate and complete, in other words, that the
data makes sense. There are at least five different types of integrity that needs to be considered:

1. Domain constraints
2. Entity integrity
3. Column constraints
4. User-defined integrity constraints
5. Referential integrity

The data analysis stage will identify the requirements of these.

Domain Constraints

A domain is defined as the set of all unique values permitted for an attribute. For example, a domain of
date is the set of all possible valid dates, a domain of integer is all possible whole numbers, a domain of
day-of-week is Monday, Tuesday ... Sunday.

This in effect is defining rules for a particular attribute. If it is determined that an attribute is a date then
it should be implemented in the database to prevent invalid dates being entered.

A classic example of this is where the data from a legacy system is loaded into a newly designed
database. The new system is well designed. Columns that hold dates are defined as such whereas, in the
old system, they were held as character strings. Much data is rejected because of invalid dates, eg 30
February 2000.

If the system supports domain constraints then this invalid data would not have stored in the first place.
That is, the integrity of the database is being preserved.

Entity Integrity

A requirement of E F Codd in his seminal paper is that a primary key of an entity, or any part of it, can
never take a null value.
Oracle, and most other relational database management systems, will enforce this.

Column Constraints

During the data analysis phase, business rules will identify any column constraints. For example, a salary
cannot be negative; an employee number must be in the range 1000 - 2000, etc.

Oracle, and some other RDBMSs, will allow storage of these rules within the database itself. That is, in a
central data dictionary.

Referential Integrity

This is the most common type of integrity constraint. This is used to manage the relationships between
primary and foreign keys.

Department of Information Technology


Nivedita Ray De Sarkar Page
5252
Database Management System

Referential integrity is best illustrated by an example.

Let's assume the department and employee entities have been implemented as tables in a relational
database system.

When entering a new employee, the department in which they work needs to be specified. Department
number is the foreign key in the employee table and the primary key in the department table.

In order to preserve the integrity of the data in the database there are a set of rules that need to be
observed:

 If inserting an employee in the employee table, the insertion should only be allowed if their
department number exists in the department table
 If deleting a department in the department table, the deletion should only be allowed if there
are no employees working in that department
 If changing the value of a department number in the department table, the update should only
be allowed if there are no employees working in the department whose number is being
changed
 If changing the value of a department number in the employee table, the update should only be
allowed if the new value exists in the department table

If any of the above is allowed to happen then we have data in an inconsistent state. The integrity of the
data is compromised - the data does not make sense.

Oracle, and other relational database management systems, will allow enforcement of referential
integrity rules. If implemented, and a user tries to break any of the rules, an error message will be given
and the change will not take place.

SQL Constraints

Constraints are used to limit the type of data that can go into a table.

Constraints can be specified when a table is created (with the CREATE TABLE statement) or after the
table is created (with the ALTER TABLE statement).

We will focus on the following constraints:

 NOT NULL
 UNIQUE
 PRIMARY KEY
 FOREIGN KEY
 CHECK
 DEFAULT

The next chapters will describe each constraint in detail.

Department of Information Technology


Nivedita Ray De Sarkar Page
5353
Database Management System

SQL NOT NULL Constraint

The NOT NULL constraint enforces a column to NOT accept NULL values.

The NOT NULL constraint enforces a field to always contain a value. This means that you cannot insert a
new record, or update a record without adding a value to this field.

The following SQL enforces the "P_Id" column and the "LastName" column to not accept NULL values:
CREATE TABLE Persons
(
P_Id number NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)

SQL UNIQUE Constraint

The UNIQUE constraint uniquely identifies each record in a database table.


The UNIQUE and PRIMARY KEY constraints both provide a guarantee for uniqueness for a column or set
of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint defined on it.
Note that you can have many UNIQUE constraints per table, but only one PRIMARY KEY constraint per
table.

SQL UNIQUE Constraint on CREATE TABLE

The following SQL creates a UNIQUE constraint on the "P_Id" column when the "Persons" table is
created:

CREATE TABLE Persons


(
P_Id number NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
To allow naming of a UNIQUE constraint, and for defining a UNIQUE constraint on multiple columns, use
the following SQL syntax:
CREATE TABLE Persons
(
P_Id int NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Department of Information Technology
Nivedita Ray De Sarkar Page
5454
Database Management System

Address varchar(255),
City varchar(255),
CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)
)

SQL UNIQUE Constraint on ALTER TABLE

To create a UNIQUE constraint on the "P_Id" column when the table is already created, use the
following SQL:

ALTER TABLE Persons


ADD UNIQUE (P_Id)

To allow naming of a UNIQUE constraint, and for defining a UNIQUE constraint on multiple columns, use
the following SQL syntax:

ALTER TABLE Persons


ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)

To DROP a UNIQUE Constraint

To drop a UNIQUE constraint, use the following SQL:

ALTER TABLE Persons


DROP CONSTRAINT uc_PersonID

SQL PRIMARY KEY Constraint

The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain unique values.
A primary key column cannot contain NULL values.
Each table should have a primary key, and each table can have only ONE primary key.

SQL PRIMARY KEY Constraint on CREATE TABLE

The following SQL creates a PRIMARY KEY on the "P_Id" column when the "Persons" table is created:

CREATE TABLE Persons


(
P_Id number PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),

Department of Information Technology


Nivedita Ray De Sarkar Page
5555
Database Management System

City varchar(255)
)

To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple
columns, use the following SQL syntax:

CREATE TABLE Persons


(
P_Id number NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)
)

Note: In the example above there is only ONE PRIMARY KEY (pk_PersonID). However, the value of the
pk_PersonID is made up of two columns (P_Id and LastName).

SQL PRIMARY KEY Constraint on ALTER TABLE

To create a PRIMARY KEY constraint on the "P_Id" column when the table is already created, use the
following SQL:
ALTER TABLE Persons
ADD PRIMARY KEY (P_Id)
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY constraint on multiple
columns, use the following SQL syntax:
ALTER TABLE Persons
ADD CONSTRAINT pk_PersonID PRIMARY KEY (P_Id,LastName)

Note: If you use the ALTER TABLE statement to add a primary key, the primary key column(s) must
already have been declared to not contain NULL values (when the table was first created).

To DROP a PRIMARY KEY Constraint

To drop a PRIMARY KEY constraint, use the following SQL:

ALTER TABLE Persons


DROP CONSTRAINT pk_PersonID

SQL FOREIGN KEY Constraint

A FOREIGN KEY in one table points to a PRIMARY KEY in another table.

Let's illustrate the foreign key with an example. Look at the following two tables:

Department of Information Technology


Nivedita Ray De Sarkar Page
5656
Database Management System

The "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger

The "Orders" table:

O_Id OrderNo P_Id


1 77895 3
2 44678 3
3 22456 2
4 24562 1

Note that the "P_Id" column in the "Orders" table points to the "P_Id" column in the "Persons" table.

The "P_Id" column in the "Persons" table is the PRIMARY KEY in the "Persons" table.

The "P_Id" column in the "Orders" table is a FOREIGN KEY in the "Orders" table.

The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.

The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column,
because it has to be one of the values contained in the table it points to.

SQL FOREIGN KEY Constraint on CREATE TABLE

The following SQL creates a FOREIGN KEY on the "P_Id" column when the "Orders" table is created:

CREATE TABLE Orders


(
O_Id varchar NOT NULL PRIMARY KEY,
OrderNo varchar NOT NULL,
P_Id varchar FOREIGN KEY REFERENCES Persons(P_Id)
)

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple
columns, use the following SQL syntax:

CREATE TABLE Orders


(
O_Id number NOT NULL,

Department of Information Technology


Nivedita Ray De Sarkar Page
5757
Database Management System

OrderNo number NOT NULL,


P_Id number,
PRIMARY KEY (O_Id),
CONSTRAINT fk_PerOrders FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)
)

SQL FOREIGN KEY Constraint on ALTER TABLE

To create a FOREIGN KEY constraint on the "P_Id" column when the "Orders" table is already created,
use the following SQL:

ALTER TABLE Orders


ADD FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)

To allow naming of a FOREIGN KEY constraint, and for defining a FOREIGN KEY constraint on multiple
columns, use the following SQL syntax:

ALTER TABLE Orders


ADD CONSTRAINT fk_PerOrders
FOREIGN KEY (P_Id)
REFERENCES Persons(P_Id)

To DROP a FOREIGN KEY Constraint

To drop a FOREIGN KEY constraint, use the following SQL:

ALTER TABLE Orders


DROP CONSTRAINT fk_PerOrders

SQL CHECK Constraint

The CHECK constraint is used to limit the value range that can be placed in a column.
If you define a CHECK constraint on a single column it allows only certain values for this column.
If you define a CHECK constraint on a table it can limit the values in certain columns based on values in
other columns in the row.

SQL CHECK Constraint on CREATE TABLE

The following SQL creates a CHECK constraint on the "P_Id" column when the "Persons" table is created.
The CHECK constraint specifies that the column "P_Id" must only include integers greater than 0.

CREATE TABLE Persons


(

Department of Information Technology


Nivedita Ray De Sarkar Page
5858
Database Management System

P_Id number NOT NULL CHECK (P_Id>0),


LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255)
)
To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple columns, use
the following SQL syntax:
CREATE TABLE Persons
(
P_Id number NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255),
CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')
)

SQL CHECK Constraint on ALTER TABLE

To create a CHECK constraint on the "P_Id" column when the table is already created, use the following
SQL:

ALTER TABLE Persons


ADD CHECK (P_Id>0)

To allow naming of a CHECK constraint, and for defining a CHECK constraint on multiple columns, use
the following SQL syntax:

ALTER TABLE Persons


ADD CONSTRAINT chk_Person CHECK (P_Id>0 AND City='Sandnes')

To DROP a CHECK Constraint

To drop a CHECK constraint, use the following SQL:

ALTER TABLE Persons


DROP CONSTRAINT chk_Person

SQL DEFAULT Constraint

The DEFAULT constraint is used to insert a default value into a column.


The default value will be added to all new records, if no other value is specified.

Department of Information Technology


Nivedita Ray De Sarkar Page
5959
Database Management System

SQL DEFAULT Constraint on CREATE TABLE

The following SQL creates a DEFAULT constraint on the "City" column when the "Persons" table is
created:

CREATE TABLE Persons


(
P_Id number NOT NULL,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Address varchar(255),
City varchar(255) DEFAULT 'Sandnes'
)

The DEFAULT constraint can also be used to insert system values, by using functions like GETDATE():

CREATE TABLE Orders


(
O_Id number NOT NULL,
OrderNo number NOT NULL,
P_Id number,
OrderDate date DEFAULT GETDATE()
)

SQL DEFAULT Constraint on ALTER TABLE

To create a DEFAULT constraint on the "City" column when the table is already created, use the
following SQL:

ALTER TABLE Persons


MODIFY City DEFAULT 'SANDNES'

To DROP a DEFAULT Constraint

To drop a DEFAULT constraint, use the following SQL:

ALTER TABLE Persons


ALTER COLUMN City DROP DEFAULT

The CREATE INDEX statement is used to create indexes in tables.

Indexes allow the database application to find data fast; without reading the whole table.

Indexes

Department of Information Technology


Nivedita Ray De Sarkar Page
6060
Database Management System

An index can be created in a table to find data more quickly and efficiently.

The users cannot see the indexes, they are just used to speed up searches/queries.

Note: Updating a table with indexes takes more time than updating a table without (because the
indexes also need an update). So you should only create indexes on columns (and tables) that will be
frequently searched against.

SQL CREATE INDEX Syntax

Creates an index on a table. Duplicate values are allowed:

CREATE INDEX index_name


ON table_name (column_name)

SQL CREATE UNIQUE INDEX Syntax

Creates a unique index on a table. Duplicate values are not allowed:

CREATE UNIQUE INDEX index_name


ON table_name (column_name)

Note: The syntax for creating indexes varies amongst different databases. Therefore: Check the syntax
for creating indexes in your database.

CREATE INDEX Example

The SQL statement below creates an index named "PIndex" on the "LastName" column in the "Persons"
table:

CREATE INDEX PIndex


ON Persons (LastName)

If you want to create an index on a combination of columns, you can list the column names within the
parentheses, separated by commas:

CREATE INDEX PIndex


ON Persons (LastName, FirstName)

The DROP INDEX Statement

The DROP INDEX statement is used to delete an index in a table.

DROP INDEX Syntax for MS Access:


DROP INDEX index_name ON table_name

Department of Information Technology


Nivedita Ray De Sarkar Page
6161
Database Management System

DROP INDEX Syntax for MS SQL Server:


DROP INDEX table_name.index_name

DROP INDEX Syntax for DB2/Oracle:


DROP INDEX index_name

DROP INDEX Syntax for MySQL:


ALTER TABLE table_name DROP INDEX index_name

The DROP TABLE Statement

The DROP TABLE statement is used to delete a table.

DROP TABLE table_name

The DROP DATABASE Statement

The DROP DATABASE statement is used to delete a database.

DROP DATABASE database_name

The TRUNCATE TABLE Statement

What if we only want to delete the data inside the table, and not the table itself?

Then, use the TRUNCATE TABLE statement:

TRUNCATE TABLE table_name

The ALTER TABLE Statement

The ALTER TABLE statement is used to add, delete, or modify columns in an existing table.

SQL ALTER TABLE Syntax

To add a column in a table, use the following syntax:

ALTER TABLE table_name


ADD column_name datatype
To delete a column in a table, use the following syntax (notice that some database systems don't allow
deleting a column):
ALTER TABLE table_name
DROP COLUMN column_name

My SQL / Oracle:

Department of Information Technology


Nivedita Ray De Sarkar Page
6262
Database Management System

ALTER TABLE table_name


MODIFY column_name datatype

SQL ALTER TABLE Example

Look at the "Persons" table:

P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

Now we want to add a column named "DateOfBirth" in the "Persons" table.

We use the following SQL statement:

ALTER TABLE Persons


ADD DateOfBirth date

Notice that the new column, "DateOfBirth", is of type date and is going to hold a date. The data type
specifies what type of data the column can hold.

Change Data Type Example

Now we want to change the data type of the column named "DateOfBirth" in the "Persons" table.
We use the following SQL statement:
ALTER TABLE Persons
ALTER COLUMN DateOfBirth year
Notice that the "DateOfBirth" column is now of type year and is going to hold a year in a two-digit or
four-digit format.

DROP COLUMN Example

Next, we want to delete the column named "DateOfBirth" in the "Persons" table.

We use the following SQL statement:

Department of Information Technology


Nivedita Ray De Sarkar Page
6363
Database Management System

ALTER TABLE Persons


DROP COLUMN DateOfBirth
The "Persons" table will now like this:
P_Id LastName FirstName Address City

1 Hansen Ola Timoteivn 10 Sandnes

2 Svendson Tove Borgvn 23 Sandnes

3 Pettersen Kari Storgt 20 Stavanger

A view is a virtual table.

This chapter shows how to create, update, and delete a view.

SQL CREATE VIEW Statement

In SQL, a view is a virtual table based on the result-set of an SQL statement.

A view contains rows and columns, just like a real table. The fields in a view are fields from one or more
real tables in the database.

You can add SQL functions, WHERE, and JOIN statements to a view and present the data as if the data
were coming from one single table.

SQL CREATE VIEW Syntax


CREATE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Note: A view always shows up-to-date data! The database engine recreates the data, using the view's
SQL statement, every time a user queries a view.

SQL CREATE VIEW Examples

If you have the Northwind database you can see that it has several views installed by default.

The view "Current Product List" lists all active products (products that are not discontinued) from the
"Products" table. The view is created with the following SQL:
CREATE VIEW [Current Product List] AS
SELECT ProductID,ProductName

Department of Information Technology


Nivedita Ray De Sarkar Page
6464
Database Management System

FROM Products
WHERE Discontinued=No
We can query the view above as follows:
SELECT * FROM [Current Product List]

Another view in the Northwind sample database selects every product in the "Products" table with a
unit price higher than the average unit price:

CREATE VIEW [Products Above Average Price] AS


SELECT ProductName,UnitPrice
FROM Products
WHERE UnitPrice>(SELECT AVG(UnitPrice) FROM Products)
We can query the view above as follows:
SELECT * FROM [Products Above Average Price]
Another view in the Northwind database calculates the total sale for each category in 1997. Note that
this view selects its data from another view called "Product Sales for 1997":
CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName,Sum(ProductSales) AS CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName
We can query the view above as follows:
SELECT * FROM [Category Sales For 1997]

We can also add a condition to the query. Now we want to see the total sale only for the category
"Beverages":

SELECT * FROM [Category Sales For 1997]


WHERE CategoryName='Beverages'

SQL Updating a View

You can update a view by using the following syntax:

SQL CREATE OR REPLACE VIEW Syntax


CREATE OR REPLACE VIEW view_name AS
SELECT column_name(s)
FROM table_name
WHERE condition

Now we want to add the "Category" column to the "Current Product List" view. We will update the view
with the following SQL:

CREATE VIEW [Current Product List] AS


SELECT ProductID,ProductName,Category
FROM Products
WHERE Discontinued=No;

Department of Information Technology


Nivedita Ray De Sarkar Page
6565
Database Management System

SQL Dropping a View

You can delete a view with the DROP VIEW command.

SQL DROP VIEW Syntax


DROP VIEW view_name;

SQL Aggregate Functions

SQL aggregate functions return a single value, calculated from values in a column.

Useful aggregate functions:

 AVG() - Returns the average value


 COUNT() - Returns the number of rows
 FIRST() - Returns the first value
 LAST() - Returns the last value
 MAX() - Returns the largest value
 MIN() - Returns the smallest value
 SUM() - Returns the sum

SQL Scalar functions

SQL scalar functions return a single value, based on the input value.

Useful scalar functions:

 UCASE() - Converts a field to upper case


 LCASE() - Converts a field to lower case
 MID() - Extract characters from a text field
 LEN() - Returns the length of a text field
 ROUND() - Rounds a numeric field to the number of decimals specified
 NOW() - Returns the current system date and time
 FORMAT() - Formats how a field is to be displayed

Tip: The aggregate functions and the scalar functions will be explained in details in the next chapters.

The AVG() Function

The AVG() function returns the average value of a numeric column.

SQL AVG() Syntax


SELECT AVG(column_name) FROM table_name

SQL AVG() Example

Department of Information Technology


Nivedita Ray De Sarkar Page
6666
Database Management System

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer

1 2008/11/12 1000 Hansen

2 2008/10/23 1600 Nilsen

3 2008/09/02 700 Hansen

4 2008/09/03 300 Hansen

5 2008/08/30 2000 Jensen

6 2008/10/04 100 Nilsen

Now we want to find the average value of the "OrderPrice" fields.


We use the following SQL statement:
SELECT AVG(OrderPrice) AS OrderAverage FROM Orders
The result-set will look like this:
OrderAverage

950

Now we want to find the customers that have an OrderPrice value higher than the average OrderPrice
value.

We use the following SQL statement:


SELECT Customer FROM Orders
WHERE OrderPrice>(SELECT AVG(OrderPrice) FROM Orders)
The result-set will look like this:
Customer

Hansen

Nilsen

Jensen

The COUNT() function returns the number of rows that matches a specified criteria.

Department of Information Technology


Nivedita Ray De Sarkar Page
6767
Database Management System

SQL COUNT(column_name) Syntax

The COUNT(column_name) function returns the number of values (NULL values will not be counted) of
the specified column:

SELECT COUNT(column_name) FROM table_name

SQL COUNT(*) Syntax

The COUNT(*) function returns the number of records in a table:

SELECT COUNT(*) FROM table_name

SQL COUNT(DISTINCT column_name) Syntax

The COUNT(DISTINCT column_name) function returns the number of distinct values of the specified
column:

SELECT COUNT(DISTINCT column_name) FROM table_name

Note: COUNT(DISTINCT) works with ORACLE and Microsoft SQL Server, but not with Microsoft Access.

SQL COUNT(column_name) Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer

1 2008/11/12 1000 Hansen

2 2008/10/23 1600 Nilsen

3 2008/09/02 700 Hansen

4 2008/09/03 300 Hansen

5 2008/08/30 2000 Jensen

6 2008/10/04 100 Nilsen

Now we want to count the number of orders from "Customer Nilsen".

We use the following SQL statement:

Department of Information Technology


Nivedita Ray De Sarkar Page
6868
Database Management System

SELECT COUNT(Customer) AS CustomerNilsen FROM Orders


WHERE Customer='Nilsen'
The result of the SQL statement above will be 2, because the customer Nilsen has made 2 orders in
total:
CustomerNilsen

SQL COUNT(*) Example

If we omit the WHERE clause, like this:

SELECT COUNT(*) AS NumberOfOrders FROM Orders

The result-set will look like this:

NumberOfOrders

which is the total number of rows in the table.

SQL COUNT(DISTINCT column_name) Example

Now we want to count the number of unique customers in the "Orders" table.
We use the following SQL statement:
SELECT COUNT(DISTINCT Customer) AS NumberOfCustomers FROM Orders
The result-set will look like this:
NumberOfCustomers
3
which is the number of unique customers (Hansen, Nilsen, and Jensen) in the "Orders" table.

The MAX() Function

The MAX() function returns the largest value of the selected column.

SQL MAX() Syntax


SELECT MAX(column_name) FROM table_name

Department of Information Technology


Nivedita Ray De Sarkar Page
6969
Database Management System

SQL MAX() Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer

1 2008/11/12 1000 Hansen

2 2008/10/23 1600 Nilsen

3 2008/09/02 700 Hansen

4 2008/09/03 300 Hansen

5 2008/08/30 2000 Jensen

6 2008/10/04 100 Nilsen

Now we want to find the largest value of the "OrderPrice" column.

We use the following SQL statement:

SELECT MAX(OrderPrice) AS LargestOrderPrice FROM Orders

The result-set will look like this:

LargestOrderPrice
2000

The MIN() Function

The MIN() function returns the smallest value of the selected column.

SQL MIN() Syntax


SELECT MIN(column_name) FROM table_name

SQL MIN() Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer

Department of Information Technology


Nivedita Ray De Sarkar Page
7070
Database Management System

1 2008/11/12 1000 Hansen


2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
Now we want to find the smallest value of the "OrderPrice" column.
We use the following SQL statement:
SELECT MIN(OrderPrice) AS SmallestOrderPrice FROM Orders
The result-set will look like this:
SmallestOrderPrice
100

The SUM() Function

The SUM() function returns the total sum of a numeric column.

SQL SUM() Syntax


SELECT SUM(column_name) FROM table_name

SQL SUM() Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen
Now we want to find the sum of all "OrderPrice" fields".
We use the following SQL statement:
SELECT SUM(OrderPrice) AS OrderTotal FROM Orders
The result-set will look like this:
OrderTotal
5700

Aggregate functions often need an added GROUP BY statement.

The GROUP BY Statement

Department of Information Technology


Nivedita Ray De Sarkar Page
7171
Database Management System

The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by
one or more columns.

SQL GROUP BY Syntax


SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name

SQL GROUP BY Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen
4 2008/09/03 300 Hansen
5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find the total sum (total order) of each customer.

We will have to use the GROUP BY statement to group the customers.

We use the following SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders


GROUP BY Customer

The result-set will look like this:

Customer SUM(OrderPrice)
Hansen 2000
Nilsen 1700
Jensen 2000

Let's see what happens if we omit the GROUP BY statement:

SELECT Customer,SUM(OrderPrice) FROM Orders

The result-set will look like this:

Department of Information Technology


Nivedita Ray De Sarkar Page
7272
Database Management System

Customer SUM(OrderPrice)
Hansen 5700
Nilsen 5700
Hansen 5700
Hansen 5700
Jensen 5700
Nilsen 5700

The result-set above is not what we wanted.

Explanation of why the above SELECT statement cannot be used: The SELECT statement above has two
columns specified (Customer and SUM(OrderPrice). The "SUM(OrderPrice)" returns a single value (that
is the total sum of the "OrderPrice" column), while "Customer" returns 6 values (one value for each row
in the "Orders" table). This will therefore not give us the correct result. However, you have seen that the
GROUP BY statement solves this problem.

GROUP BY More Than One Column

We can also use the GROUP BY statement on more than one column, like this:

SELECT Customer,OrderDate,SUM(OrderPrice) FROM Orders


GROUP BY Customer,OrderDate

The HAVING Clause

The HAVING clause was added to SQL because the WHERE keyword could not be used with aggregate
functions.

SQL HAVING Syntax


SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value

SQL HAVING Example

We have the following "Orders" table:

O_Id OrderDate OrderPrice Customer


1 2008/11/12 1000 Hansen
2 2008/10/23 1600 Nilsen
3 2008/09/02 700 Hansen

Department of Information Technology


Nivedita Ray De Sarkar Page
7373
Database Management System

4 2008/09/03 300 Hansen


5 2008/08/30 2000 Jensen
6 2008/10/04 100 Nilsen

Now we want to find if any of the customers have a total order of less than 2000.

We use the following SQL statement:

SELECT Customer,SUM(OrderPrice) FROM Orders


GROUP BY Customer
HAVING SUM(OrderPrice)<2000

The result-set will look like this:

Customer SUM(OrderPrice)
Nilsen 1700

Now we want to find if the customers "Hansen" or "Jensen" have a total order of more than 1500.

We add an ordinary WHERE clause to the SQL statement:


SELECT Customer,SUM(OrderPrice) FROM Orders
WHERE Customer='Hansen' OR Customer='Jensen'
GROUP BY Customer
HAVING SUM(OrderPrice)>1500

The result-set will look like this:

Customer SUM(OrderPrice)
Hansen 2000
Jensen 2000

The UCASE() Function

The UCASE() function converts the value of a field to uppercase.

SQL UCASE() Syntax


SELECT UCASE(column_name) FROM table_name

Syntax for SQL Server


SELECT UPPER(column_name) FROM table_name

SQL UCASE() Example

Department of Information Technology


Nivedita Ray De Sarkar Page
7474
Database Management System

We have the following "Persons" table:

P_Id LastName FirstName Address City


1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
Now we want to select the content of the "LastName" and "FirstName" columns above, and convert the
"LastName" column to uppercase.
We use the following SELECT statement:
SELECT UCASE(LastName) as LastName,FirstName FROM Persons
The result-set will look like this:
LastName FirstName
HANSEN Ola
SVENDSON Tove
PETTERSEN Kari
The LCASE() Function
The LCASE() function converts the value of a field to lowercase.
SQL LCASE() Syntax
SELECT LCASE(column_name) FROM table_name

Syntax for SQL Server


SELECT LOWER(column_name) FROM table_name

SQL LCASE() Example


We have the following "Persons" table:
P_Id LastName FirstName Address City
1 Hansen Ola Timoteivn 10 Sandnes
2 Svendson Tove Borgvn 23 Sandnes
3 Pettersen Kari Storgt 20 Stavanger
Now we want to select the content of the "LastName" and "FirstName" columns above, and convert the
"LastName" column to lowercase.
We use the following SELECT statement:
SELECT LCASE(LastName) as LastName,FirstName FROM Persons
The result-set will look like this:
LastName FirstName
hansen Ola
svendson Tove
pettersen Kari

Department of Information Technology


Nivedita Ray De Sarkar Page
7575
Database Management System

Cursor

A cursor is a control structure that enables traversal over the records in a database. Cursors facilitate
subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of
database records.

A cursor is a temporary work area created in the system memory when a SQL statement is executed. A
cursor contains information on a select statement and the rows of data accessed by it. This temporary
work area is used to store the data retrieved from the database, and manipulate this data. A cursor can
hold more than one row, but can process only one row at a time. The set of rows the cursor holds is
called the active set.

Cursors are used by database programmers to process individual rows returned by database system
queries. Cursors enable manipulation of whole result-sets at once -- a capability that most procedural
programming languages lack. In this scenario, a cursor enables the rows in a result-set to be processed
sequentially.

There are two types of cursors in PL/SQL:

Implicit cursors:

These are created by default when DML statements like, INSERT, UPDATE, and DELETE statements are
executed. They are also created when a SELECT statement that returns just one row is executed.

Oracle provides few attributes called as implicit cursor attributes to check the status of DML operations.
The cursor attributes available are %FOUND, %NOTFOUND, %ROWCOUNT, and %ISOPEN.

For example, When you execute INSERT, UPDATE, or DELETE statements the cursor attributes tell us
whether any rows are affected and how many have been affected.
When a SELECT... INTO statement is executed in a PL/SQL Block, implicit cursor attributes can be used to
find out whether any row has been returned by the SELECT statement. PL/SQL returns an error when no
data is selected.

The status of the cursor for each of these attributes are defined in the below table.

Attributes Return Value Example


%FOUND The return value is TRUE, if the DML statements like SQL%FOUND
INSERT, DELETE and UPDATE affect at least one row
and if SELECT ….INTO statement return at least one
row.
The return value is FALSE, if DML statements like
INSERT, DELETE and UPDATE do not affect row and
if SELECT….INTO statement do not return a row.
%NOTFOUND The return value is FALSE, if DML statements like SQL%NOTFOUND
INSERT, DELETE and UPDATE at least one row and if
SELECT ….INTO statement return at least one row.

Department of Information Technology


Nivedita Ray De Sarkar Page
7676
Database Management System

The return value is TRUE, if a DML statement like


INSERT, DELETE and UPDATE do not affect even one
row and if SELECT ….INTO statement does not
return a row.
%ROWCOUNT Return the number of rows affected by the DML SQL%ROWCOUNT
operations INSERT, DELETE, UPDATE, SELECT

For Example: Consider the PL/SQL Block that uses implicit cursor attributes as shown below:

Explicit cursors:

They must be created when you are executing a SELECT statement that returns more than one row.
Even though the cursor stores multiple records, only one record can be processed at a time, which is
called as current row. When you fetch a row the current row position moves to next row.

An explicit cursor is defined in the declaration section of the PL/SQL Block. It is created on a SELECT
Statement which returns more than one row. We can provide a suitable name for the cursor.

The General Syntax for creating a cursor is as given below:

CURSOR cursor_name IS select_statement;

 cursor_name – A suitable name for the cursor.


 select_statement – A select query which returns multiple rows.

There are four steps in using an Explicit Cursor.

 DECLARE the cursor in the declaration section.


 OPEN the cursor in the Execution Section.
 FETCH the data from cursor into PL/SQL variables or records in the Execution Section.
 CLOSE the cursor in the Execution Section before you end the PL/SQL Block.
 1) Declaring a Cursor in the Declaration Section:
 DECLARE
 CURSOR emp_cur IS
 SELECT *
 FROM emp_tbl
 WHERE salary > 5000;
 In the above example we are creating a cursor ‘emp_cur’ on a query which returns the
records of all the
employees with salary greater than 5000. Here ‘emp_tbl’ in the table which contains records
of all the
employees.
 2) Accessing the records in the cursor:
Once the cursor is created in the declaration section we can access the cursor in the

Department of Information Technology


Nivedita Ray De Sarkar Page
7777
Database Management System

execution
section of the PL/SQL program.
 How to access an Explicit Cursor?
 These are the three steps in accessing the cursor.
1) Open the cursor.
2) Fetch the records in the cursor one at a time.
3) Close the cursor.
 General Syntax to open a cursor is:
OPEN cursor_name;
 General Syntax to fetch records from a cursor is:
FETCH cursor_name INTO record_name;
 OR
FETCH cursor_name INTO variable_list;
 General Syntax to close a cursor is:
CLOSE cursor_name;
 When a cursor is opened, the first row becomes the current row. When the data is fetched it is
copied to the record or variables and the logical pointer moves to the next row and it becomes
the current row. On every fetch statement, the pointer moves to the next row. If you want to
fetch after the last row, the program will throw an error. When there is more than one row in a
cursor we can use loops along with explicit cursor attributes to fetch all the records.
 Points to remember while fetching a row:
 We can fetch the rows in a cursor to a PL/SQL Record or a list of variables created in the PL/SQL
Block.
If you are fetching a cursor to a PL/SQL Record, the record should have the same structure as
the cursor.
If you are fetching a cursor to a list of variables, the variables should be listed in the same order
in the fetch statement as the columns are present in the cursor.
 General Form of using an explicit cursor is:

DECLARE
variables;
records;
create a cursor;
BEGIN
OPEN cursor;
FETCH cursor;
process the records;
CLOSE cursor;
END;

 Lets Look at the example below

Example 1:
1> DECLARE
2> emp_rec emp_tbl%rowtype;
3> CURSOR emp_cur IS

Department of Information Technology


Nivedita Ray De Sarkar Page
7878
Database Management System

4> SELECT *
5> FROM
6> WHERE salary > 10;
7> BEGIN
8> OPEN emp_cur;
9> FETCH emp_cur INTO emp_rec;
10> dbms_output.put_line (emp_rec.first_name || ' ' || emp_rec.last_name);
11> CLOSE emp_cur;
12> END;
 In the above example, first we are creating a record ‘emp_rec’ of the same structure as of table
‘emp_tbl’ in line no 2. We can also create a record with a cursor by replacing the table name
with the cursor name. Second, we are declaring a cursor ‘emp_cur’ from a select query in line no
3 - 6. Third, we are opening the cursor in the execution section in line no 8. Fourth, we are
fetching the cursor to the record in line no 9. Fifth, we are displaying the first_name and
last_name of the employee in the record emp_rec in line no 10. Sixth, we are closing the cursor
in line no 11.
 What are Explicit Cursor Attributes?
 Oracle provides some attributes known as Explicit Cursor Attributes to control the data
processing while using cursors. We use these attributes to avoid errors while accessing cursors
through OPEN, FETCH and CLOSE Statements.
 When does an error occur while accessing an explicit cursor?
 a) When we try to open a cursor which is not closed in the previous operation.
b) When we try to fetch a cursor after the last operation.
 These are the attributes available to check the status of an explicit cursor.

Attributes Return values Example


%FOUND TRUE, if fetch statement returns at least Cursor_name%FOUND
one row.
FALSE, if fetch statement doesn’t return a
row.
%NOTFOUND TRUE, , if fetch statement doesn’t return a Cursor_name%NOTFOUND
row.
FALSE, if fetch statement returns at least
one row.
%ROWCOUNT The number of rows fetched by the fetch Cursor_name%ROWCOUNT
statement
If no row is returned, the PL/SQL
statement returns an error.
%ISOPEN TRUE, if the cursor is already open in the Cursor_name%ISNAME
program
FALSE, if the cursor is not opened in the
program.

Department of Information Technology


Nivedita Ray De Sarkar Page
7979
Database Management System

Both implicit and explicit cursors have the same functionality, but they differ in the way they are
accessed.

Disadvantages of cursors

Fetching a row from the cursor may result in a network round trip each time. This uses much more
network bandwidth than would ordinarily be needed for the execution of a single SQL statement like
DELETE. Repeated network round trips can severely impact the speed of the operation using the cursor.
Some DBMSs try to reduce this impact by using block fetch. Block fetch implies that multiple rows are
sent together from the server to the client. The client stores a whole block of rows in a local buffer and
retrieves the rows from there until that buffer is exhausted.

Cursors allocate resources on the server, for instance locks, packages, processes, temporary storage, etc.

Stored Procedure

A stored procedure or in simple a proc is a named PL/SQL block which performs one or more specific
task. This is similar to a procedure in other programming languages. A procedure has a header and a
body. The header consists of the name of the procedure and the parameters or variables passed to the
procedure. The body consists or declaration section, execution section and exception section similar to a
general PL/SQL Block. A procedure is similar to an anonymous PL/SQL Block but it is named for repeated
usage.

We can pass parameters to procedures in three ways.


1) IN-parameters
2) OUT-parameters
3) IN OUT-parameters

A procedure may or may not return any value.

A stored procedure is a subroutine available to applications accessing a relational database system.


Stored procedures are actually stored in the database data dictionary.

Stored procedures are typically used for data validation (integrated into the database) or access control
mechanisms. Extensive or complex processing that requires the execution of several SQL statements is
moved into stored procedures, and all applications call the procedures. One can use nested stored
procedures, by executing one stored procedure from within another.

Stored procedures are similar to user-defined functions (UDFs). The major difference is that UDFs can be
used like any other expression within SQL statements, whereas stored procedures must be invoked
using the CALL statement.

General Syntax to create a procedure is:

Department of Information Technology


Nivedita Ray De Sarkar Page
8080
Database Management System

CREATE [OR REPLACE] PROCEDURE proc_name [list of parameters]


IS
Declaration section
BEGIN
Execution section
EXCEPTION
Exception section
END;

Stored procedures may return result sets, i.e. the results of a SELECT statement. Stored procedures may
also contain declared variables for processing data and cursors that allow it to loop through multiple
rows in a table. Stored procedure languages typically include IF, WHILE, LOOP, REPEAT, and CASE
statements, and more. Stored procedures can receive variables, return results or modify variables and
return them, depending on how and where the variable is declared.

There are two ways to execute a procedure.

1) From the SQL prompt.

EXECUTE [or EXEC] procedure_name;

2) Within another procedure – simply use the procedure name.

procedure_name;

Disadvantages

Stored procedure languages are quite often vendor-specific. If you want to switch to using another
vendor's database, then you have to rewrite your stored procedures. Stored procedure languages from
different vendors have different levels of sophistication; for example, Oracle's PL/SQL has more
languages features and built-in features than Microsoft's T-SQL.

Normalization
Data normalization is a set of rules and techniques concerned with:

 Identifying relationships among attributes.

 Combining attributes to form relations.

 Combining relations to form a database.

Department of Information Technology


Nivedita Ray De Sarkar Page
8181
Database Management System

It follows a set of rules worked out by E F Codd in 1970. A normalized relational database provides
several benefits:

Elimination of redundant data storage.

Close modeling of real world entities, processes, and their relationships.

Structuring of data so that the model is flexible.

First Normal Form:


The only attribute values permitted by 1 NF are single atomic (or indivisible) values. If there is no value
present for the column, then it should be denoted as null.
An example for the following table data named as Student.

Student_ID First_Name Last_Name Telephone_No

A102 Ram Sinha

A103 Tom Joseph 224-504-2803,


224-728-2403

This table is not in First Normal Form because field Telephone_No containing more than one value. Also
the first record is not having any values. To make the table in the first normal form we need to split the
A103 rows into two and supply the blank value with null. So the resultant table is as follows:-

Student_ID First_Name Last_Name Telephone_No

A102 Ram Sinha null

A103 Tom Joseph 224-504-2803

A103 Tom Joseph


224-728-2403

Here repetitive data cannot be avoided since they represent unique items.

Department of Information Technology


Nivedita Ray De Sarkar Page
8282
Database Management System

Second Normal Form


2NF is based on the concept of full functional dependency and the table must be in 1NF. A relation
schema R is in 2NF if every nonprime attribute A in R is fully functional dependent on the primary key of
R.
Anomalies can occur when attributes are dependent on only part of a multi-attribute (composite) key.
A relation is in second normal form when all non-key attributes are dependent on the whole key. That is,
no attribute is dependent on only a part of the key.
Any relation having a key with a single attribute is in second normal form.
An example
Student_ID Sub_Code Course Student_Name Student_Add

A981 M101 MCA Tom Kolkata

A982 MB102 MBA Sam Delhi


There are two functional dependencies, to make this table in 2NF these attribute must be removed and
placed somewhere else.

{Student_ID, Sub_Code} Course


Student_ID {Student_Name, Student_Add}
So, we have partial dependency since Student_Name and Student_Add are partially dependent on the
key attribute. After decomposition:
Course Table:-
Student_ID Sub_Code Course

A981 M101 MCA

A982 MB102 MBA


Student Table:-
Student_ID Student_Name Student_Add

A981 Tom Kolkata

A982 Sam Delhi

Third Normal Form


A table is in third normal form (3NF) if and only if it is in 2NF and every non key attribute is non
transitively dependent on the primary key (i.e. there are no transitive dependencies).
Anomalies can occur when a relation contains one or more transitive dependencies.
A relation is in 3NF when it is in 2NF and has no transitive dependencies.
A relation is in 3NF when 'All non-key attributes are dependent on the key, the whole key and nothing
but the key'.
An example:

Emp_ID EmpName Emp_DOB Emp_Add Dept_No Dept_Name

Department of Information Technology


Nivedita Ray De Sarkar Page
8383
Database Management System

A101 Adam 29-10-1979 Kolkata 11234 Accounts

B202 Brian 20-03-1989 Delhi 12344


IT

The table is not in 3NF as the Dept_Name is not directly dependent on the EMP_ID but on the Dept_No
and the Emp_Id is associated to the Dept_No because each employee should belong to some
department. So the table is split into two following relations to make it in 3NF.
FDs :
Emp_ID EmpName
Emp_ID Emp_DOB
Emp_ID Emp_Add
Emp_ID Dept_No Transitive dependency
Dept_no Dept_Name

So the table is split into two following relations to make it in 3NF.


R1 (Emp_ID,EmpName,Emp_DOB,Emp_Add,Dept_No)
Emp_ID EmpName Emp_DOB Emp_Add Dept_No

A101 Adam 29-10-1979 Kolkata 11234

B202 Brian 20-03-1989 Delhi 12344

R2 (Dept_No,Dept_Name)
Dept_No Dept_Name

11234 Accounts

12344 IT

Boyce-Codd Normal Form (BCNF)


A table is in Boyce-Codd normal form (BCNF) if and only if it is in 3NF and every determinant is a
candidate key.
Anomalies can occur in relations in 3NF if there is a composite key in which part of that key has a
determinant which is itself not a candidate key.
This can be expressed as R(A,B,C), C--->A where:

The relation R contains attributes A, B and C.

A and B form a candidate key.

Department of Information Technology


Nivedita Ray De Sarkar Page
8484
Database Management System

C is the determinant for A (C -- > A).

C is not part of any key.

Anomalies can also occur where a relation contains several candidate keys where:

The keys contain more than one attribute (they are composite keys).

An attribute is common to more than one key.

Example to understand BCNF:-

campus course class time room/bldg

A English 101 1 8:00-9:00


A212

B English 101 2 10:00-11:00 O305

A English 101 3 8:00-9:00


A102

No two buildings on any of the university campuses have the same name, thus
ROOM/BLDG ----->CAMPUS. As the determinant is not a candidate key this table is NOT in Boyce-Codd
normal form.

This table should be decomposed into the following relations:


R1 (course, class, room/bldg, time)

course class time room/bldg

English 101 1 8:00-9:00


A212

English 101 2 10:00-11:00 O305

English 101 3 8:00-9:00


A102

R2 (room/bldg, campus)

Department of Information Technology


Nivedita Ray De Sarkar Page
8585
Database Management System

campus room/bldg

A A212

B O305

A A102

Fourth Normal Form


A table is in fourth normal form (4NF) if and only if it is in BCNF and contains no more than one multi-
valued dependency.
Anomalies can occur in relations in BCNF if there is more than one multi-valued dependency.
If A--->B and A--->C but B and C are unrelated, ie A---> (B, C) is false, then we have more than one multi-
valued dependency.
Example to understand 4NF:-
Info (employee#, skills, hobbies)
E_no skills hobbies

1 Programming Golf

1 Programming Bowling

1 Analysis Golf

1 Analysis Bowling

2 Analysis Golf

2 Analysis Gardening

2 Management Golf

2 Management
Gardening

Department of Information Technology


Nivedita Ray De Sarkar Page
8686
Database Management System

This table is difficult to maintain since adding a new hobby requires multiple new rows corresponding to
each skill. This problem is created by the pair of multi-valued dependencies EMPLOYEE#--->SKILLS and
EMPLOYEE#--->HOBBIES.
A much better alternative would be to decompose INFO into two relations:
Skills (employee#, skill)

E_no skills

1 Programming

1 Analysis

2 Analysis

2
Management
Hobbies (employee#, hobby)
E_no hobbies

1 Golf

1 Bowling

2 Golf

2
Gardening

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin
in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the
second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is
concerned with a more general type of dependency known as a multivalued dependency. A table is in
4NF if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a superkey—
that is, X is either a candidate key or a superset thereof.

Multivalued dependencies

Department of Information Technology


Nivedita Ray De Sarkar Page
8787
Database Management System

If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z,
then, in the context of a particular row, we can refer to the data beneath each group of headings as x, y,
and z respectively. A multivalued dependency X →→ Y signifies that if we choose any x actually
occurring in the table (call this choice xc), and compile a list of all the xcyz combinations that occur in the
table, we will find that xc is associated with the same y entries regardless of z.

A trivial multivalued dependency X →→ Y is one in which Y consists of all columns belonging to X. That
is, a subset of attributes in a table has a trivial multivalued dependency on the remaining subset of
attributes.

A functional dependency is a special case of multivalued dependency. In a functional dependency X → Y,


every x determines exactly one y, never more than one.

Example

Consider the following example:

Pizza Delivery Permutations


Restaurant Pizza Variety Delivery Area
A1 Pizza Thick Crust Garia
A1 Pizza Thick Crust Salt Lake
A1 Pizza Thick Crust New Market
A1 Pizza Stuffed Crust Garia
A1 Pizza Stuffed Crust Salt Lake
A1 Pizza Stuffed Crust New Market
Elite Pizza Thin Crust New Market
Elite Pizza Stuffed Crust New Market
Domino's Pizza Thick Crust Garia
Domino's Pizza Thick Crust Salt Lake
Domino's Pizza Thin Crust Garia
Domino's Pizza Thin Crust Salt Lake

Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.

The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}.
Therefore it meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a
restaurant are not affected by delivery area, then it does not meet 4NF. The problem is that the table
features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a
superkey). The dependencies are:

Department of Information Technology


Nivedita Ray De Sarkar Page
8888
Database Management System

 {Restaurant} →→ {Pizza Variety}


 {Restaurant} →→ {Delivery Area}

These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza
a restaurant offers are independent from the areas to which the restaurant delivers. This state of affairs
leads to redundancy in the table: for example, we are told three times that A1 Pizza offers Stuffed Crust,
and if A1 Pizza starts producing Cheese Crust pizzas then we will need to add multiple rows, one for each
of A1 Pizza's delivery areas. There is, moreover, nothing to prevent us from doing this incorrectly: we
might add Cheese Crust rows for all but one of A1 Pizza's delivery areas, thereby failing to respect the
multivalued dependency {Restaurant} →→ {Pizza Variety}.

To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a
different table from the facts about delivery areas, yielding two tables that are both in 4NF:

Varieties By Restaurant
Restaurant Pizza Variety
A1 Pizza Thick Crust
A1 Pizza Stuffed Crust
Elite Pizza Thin Crust
Elite Pizza Stuffed Crust
Domino's Pizza Thick Crust
Domino's Pizza Thin Crust

Delivery Areas By Restaurant


Restaurant Delivery Area
A1 Pizza Garia
A1 Pizza Salt Lake
A1 Pizza New Market
Elite Pizza New Market
Domino's Pizza Garia
Domino's s Pizza Salt Lake

In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one
delivery area to another, the original three-column table would satisfy 4NF.

Ronald Fagin demonstrated that it is always possible to achieve 4NF.

Department of Information Technology


Nivedita Ray De Sarkar Page
8989
Database Management System

Fifth Normal Form (Projection-Join Normal Form)


A table is in fifth normal form (5NF) or Project-Join Normal Form (PJNF) if it is in 4NF and it cannot have
a lossless decomposition into any number of smaller tables.
Properties of 5NF:-
Anomalies can occur in relations in 4NF if the primary key has three or more fields.
5NF is based on the concept of join dependence - if a relation cannot be decomposed any further then it
is in 5NF.
Pair wise cyclical dependency means that:

You always need to know two values (pair wise).

For any one you must know the other two (cyclical).

Example to understand 5NF:


The following table structure as an example of a buying table. This is used to track buyers, what they
buy, and from whom they buy. Take the following sample data:

buyer vendor item


Sally L Tshirts
Mary L Tshirts
Sally J Jeans
Mary J Jeans
Sally J Sneakers

Problem:- The problem with the above table structure is that if L starts to sell Jeans then how many
records must you create to record this fact? The problem is there are pair wise cyclical dependencies in
the primary key. That is, in order to determine the item you must know the buyer and vendor, and to
determine the vendor you must know the buyer and the item, and finally to know the buyer you must
know the vendor and the item.

Solution: - The solution is to break this one table into three tables; Buyer-Vendor, Buyer-Item, and
Vendor-Item. So following tables are in the 5NF.
Buyer-Vendor

buyer vendor
Sally L
Mary L
Sally J
Mary J

Buyer-Item
buyer item

Department of Information Technology


Nivedita Ray De Sarkar Page
9090
Database Management System

Sally Tshirts
Mary Tshirts
Sally Jeans
Mary Jeans
Sally Sneakers

Vendor-Item

vendor item
L Tshirt
J Jeans
J Sneakers

Note:- There is also one more normal form i.e. 6 NF. A table is in sixth normal form (6NF) or Domain-Key
normal form (DKNF) if it is in 5NF and if all constraints and dependencies that should hold on the relation
can be enforced simply by enforcing the domain constraints and the key constraints specified on the
relation.

A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.

A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z
is a superkey for R.

Example

Consider the following example:

Travelling Salesman Product Availability By Brand


Travelling Salesman Brand Product Type
Jack Schneider Acme Vacuum Cleaner
Jack Schneider Acme Breadbox
Willy Loman Robusto Pruning Shears
Willy Loman Robusto Vacuum Cleaner
Willy Loman Robusto Breadbox
Willy Loman Robusto Umbrella Stand
Louis Ferguson Robusto Vacuum Cleaner
Louis Ferguson Robusto Telescope
Louis Ferguson Acme Vacuum Cleaner
Louis Ferguson Acme Lava Lamp
Louis Ferguson Nimbus Tie Rack

Department of Information Technology


Nivedita Ray De Sarkar Page
9191
Database Management System

The table's predicate is: Products of the type designated by Product Type, made by the brand designated
by Brand, are available from the travelling salesman designated by Travelling Salesman.

In the absence of any rules restricting the valid possible combinations of Travelling Salesman, Brand, and
Product Type, the three-attribute table above is necessary in order to model the situation correctly.

Suppose, however, that the following rule applies: A Travelling Salesman has certain Brands and certain
Product Types in his repertoire. If Brand B is in his repertoire, and Product Type P is in his repertoire, then
(assuming Brand B makes Product Type P), the Travelling Salesman must offer products of Product Type
P made by Brand B.

In that case, it is possible to split the table into three:

Product Types By Travelling Salesman


Travelling Salesman Product Type
Jack Schneider Vacuum Cleaner
Jack Schneider Breadbox
Willy Loman Pruning Shears
Willy Loman Vacuum Cleaner
Willy Loman Breadbox
Willy Loman Umbrella Stand
Louis Ferguson Telescope
Louis Ferguson Vacuum Cleaner
Louis Ferguson Lava Lamp
Louis Ferguson Tie Rack

Brands By Travelling Salesman


Travelling Salesman Brand
Jack Schneider Acme
Willy Loman Robusto
Louis Ferguson Robusto
Louis Ferguson Acme
Louis Ferguson Nimbus

Product Types By Brand


Brand Product Type

Department of Information Technology


Nivedita Ray De Sarkar Page
9292
Database Management System

Acme Vacuum Cleaner


Acme Breadbox
Acme Lava Lamp
Robusto Pruning Shears
Robusto Vacuum Cleaner
Robusto Breadbox
Robusto Umbrella Stand
Robusto Telescope
Nimbus Tie Rack

Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling Robusto's
products. In the previous setup we would have to add two new entries since Jack Schneider is able to
sell two Product Types covered by Robusto: Breadboxes and Vacuum Cleaners. With the new setup we
need only add a single entry (in Brands By Travelling Salesman).

Query Optimization

Query Processing
Query Processing: Activities involved in retrieving data from the database.
Aims of QP:
 transform query written in high-level language (e.g. SQL), into correct and efficient execution
strategy
 expressed in low-level language (implementing RA);
 execute the strategy to retrieve required data.

Query Optimization
Query Optimization: Activity of choosing an efficient execution strategy for processing query.
 As there are many equivalent transformations of same high-level query, aim of QO is to choose
one that minimizes resource usage.
 Generally, reduce total execution time of query.
 Problem computationally intractable with large number of relations, so strategy adopted is
reduced to finding near optimum solution.

Example 1 - Different Strategies


Find all Managers that work at a London branch:
SELECT *
FROM staff s, branch b
WHERE [Link] = [Link] AND
([Link] = ‘Manager’ AND [Link] = ‘London’);

Department of Information Technology


Nivedita Ray De Sarkar Page
9393
Database Management System

 3 equivalent RA queries are:


σ(position='Manager') ∧ (city='London') ∧ ([Link]=[Link]) (Staff X Branch)
σ(position='Manager') ∧ (city='London')(Staff Branch)
(σposition='Manager'(Staff)) (σcity='London'(Branch))

Example 1 - Different Strategies


Assume:
 1000 tuples in Staff; 50 tuples in Branch;
 50 Managers; 5 London branches;
 No indexes or sort keys;
 Results of any intermediate operations stored on
 disk;
 Cost of the final write is ignored;
 Tuples are accessed one at a time.

Phases of Query Processing


QP has 4 main phases:
 decomposition
– Aims are to transform high-level query into RA query and check that query is
syntactically and
 semantically correct.
 optimization
 code generation
 execution.

Optimization: Heuristical Processing Strategies


 Perform selection operations as early as possible.
 Keep predicates on same relation together.
 Combine Cartesian product with subsequent selection whose predicate represents join
condition into a join operation.
 Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive
selection operations executed first.

Optimization: Heuristical Processing Strategies


 Perform projection as early as possible.
 Keep projection attributes on same relation together.
 Compute common expressions once.
 If common expression appears more than once, and result not too large, store result and reuse
it when required.
 Useful when querying views, as same expression is used to construct view each time.
Evaluation of SQL Statement
• The query is evaluated in a different order.
„ The tables in the from clause are combined using Cartesian products.
„ The where predicate is then applied.
„ The resulting tuples are grouped according to the group by clause.
„ The having predicate is applied to each group, possibly eliminating some groups.

Department of Information Technology


Nivedita Ray De Sarkar Page
9494
Database Management System

„ The aggregates are applied to each remaining group.


The select clause is performed last.

Overview of Query Processing


1. Parsing and translation
2. Optimization
3. Evaluation
4. Execution

Transactions
Concurrent execution of user programs is essential for good DBMS performance. Because disk accesses
are frequent, and relatively slow, it is important to keep the cpu humming by working on several user
programs concurrently. A user’s program may carry out many operations on the data retrieved from the
database, but the DBMS is only concerned about what data is read/written from/to the database.
A transaction is the DBMS’s abstract view of a user program: a sequence of reads and writes.

Department of Information Technology


Nivedita Ray De Sarkar Page
9595
Database Management System

A transaction is a unit of a program execution that accesses and possibly modifies various data objects
(tuples, relations).
DBMS has to maintain the following properties of transactions:
Atomicity: A transaction is an atomic unit of processing, and it either has to be performed in its entirety
or not at all.
Consistency: A successful execution of a transaction must take a consistent database state to a (new)
consistent database state. (; integrity constraints)
Isolation: A transaction must not make its medications visible to other transactions until it is committed,
i.e., each transaction is unaware of other transactions executing concurrently in the system. (;
concurrency control)
Durability: Once a transaction has committed its changes, these changes must never get lost due to
subsequent (system) failures. (; recovery)
Model used for representing database modifications of a transaction:

read(A,x): assign value of database object A to variable x;


write(x,A): write value of variable x to database object A
Example of a Transaction T
read(A,x)
x := x - 200
write(x,A) Transaction Schedule reflects
read(B,y) chronological order of operations
y := y + 100
write(y,B)

Problems that can occur for certain transaction schedules without appropriate concurrency control
mechanisms:

Department of Information Technology


Nivedita Ray De Sarkar Page
9696
Database Management System

Department of Information Technology


Nivedita Ray De Sarkar Page
9797
Database Management System

Department of Information Technology


Nivedita Ray De Sarkar Page
9898
Database Management System

Department of Information Technology


Nivedita Ray De Sarkar Page
9999
Database Management System

Checks for serializability are based on precedence graph that describes dependencies among concurrent
transactions; if the graph has no cycle, then the transactions are serializable.
They can be executed concurrently without affecting each others transaction result.

Department of Information Technology


Nivedita Ray De Sarkar Page
1001001
Database Management System

Department of Information Technology


Nivedita Ray De Sarkar Page
1011011
Database Management System

Concurrency Control: Lock-Based Protocols


One way to ensure serializability is to require that accesses to data objects must be done in a mutually
exclusive manner.
Allow transaction to access data object only if it is currently holding a lock on that object.
Serializability can be guaranteed using locks in a certain fashion

Department of Information Technology


Nivedita Ray De Sarkar Page
1021021
Database Management System

Department of Information Technology


Nivedita Ray De Sarkar Page
1031031
Database Management System

Dirty-reads and non-repeatable reads are not possible. Furthermore, this mode guarantees serializability
(but does not provide much parallelism).

Checkpoint :
[Checkpoint] is appended to the log when writes of all previously committed transactions are effected
on the database.
Purpose:
Reduce the work required for recovery.

Department of Information Technology


Nivedita Ray De Sarkar Page
1041041
Database Management System

How to check for View Serializable and Conflict Serializable

This is a note for myself about how to check whether a schedule is view serializable, conflict serializable,
or not.

There is various resources in the internet about how to do this, but the examples are a bit scattered, so
in this post I just want to make a neat note on how to do it properly with several examples that can
cover many possibilities as well.

Let’s get started!

For the sake of this tutorial, we will use this example:


R1(X),R2(Y),R2(Y),W2(X),W3(Y),R1(X)

or if we spread it to three different transactions, it will be:

T1 | R1(X) R1(X)
T2 | R2(Y) R2(Y) W2(X)
T3 | W3(Y)

How to check for Conflict Serializable?

To check for conflict serializability takes two steps.


Step #1 : Check for the conflicting actions.

——————————————————
Two or more actions are said to be in conflict if:

1. The actions belong to different transactions.


2. At least one of the actions is a write operation.
3. The actions access the same object (read or write).

The following set of actions is conflicting:

T1:R(X), T2:W(X), T3:W(X)

While the following sets of actions are not:

T1:R(X), T2:R(X), T3:R(X)


T1:R(X), T2:W(Y), T3:R(X)
—————————————————–

For our example, we have a conflict on X (T1 reads it and T2 writes it).
We also have a conflict on Y (T2 reads it and T3 writes it).

So it has a conflict property but is it serializable?

Department of Information Technology


Nivedita Ray De Sarkar Page
1051051
Database Management System

Step #2 : Check for a cycle in the Precedence Graph.

The easiest way to check for a cyclic transaction, is to paste the schedule to this web app.

But if you are like me, you won’t be satisfied getting answer from someone without understanding how
exactly they come up with the the answer (though sometimes I don’t care either..). So let’s find out how
to do exactly just that.

First, draw all the Transactions (Tx):

Check if there is a Tx that reads an item after a different Tx writes it.

 We have T1 that reads X after T2 writes it, so draw arrow from T2 -> T1

Check if there is a Tx that writes an item after a different Tx reads it.

 We have T2 that writes X after T1 reads it, so draw arrow from T1 -> T2
 We also have T3 that writes Y after T2 reads it, so draw arrow from T2 -> T3

Check if there is a Tx that writes an item after a different TX writes it.

 not in this case

So, the final graph is :

Department of Information Technology


Nivedita Ray De Sarkar Page
1061061
Database Management System

We can see that there is a cycle between T1 and T2, so the graph is cyclic, and therefore it is not conflict
serializable.

Need more example?


Graph for R1(X),R2(Y),W3(Z),W2(Y),W2(X),R1(Z),W3(Y),W2(X)

Graph for R1(X),R2(Y),R3(Z),W2(Y),W2(X),W1(Z),W3(Y),W2(X)

(same like the previous example)

How to check for View Serializable?

To check for view serializability of a schedule, you must create all possible combinations of the
Transactions. So let’s say you have three transactions, then you need to check for these combinations:
< T1, T2, T3 >

Department of Information Technology


Nivedita Ray De Sarkar Page
1071071
Database Management System

< T1, T3, T2 >


< T2, T1, T3 >
< T2, T3, T1 >
< T3, T1, T2 >
< T3, T2, T1 >

Now the schedule is view serializable if:

A Tx reads an initial data in a Schedule, the same Tx also should read the initial data in one of the
transaction combination.

 For our example, at least T1 should occur before T2, because T1 reads initial value X. If T2 occurs
before T1, then T1 reads X value after T2 writes. So remove these combinations:
o < T2, T1, T3 >
o < T2, T3, T1 >
o < T3, T2, T1 >

A Tx reads a data after another Tx has written in a Schedule, the same Tx also should read the data after
another Tx has written it in one of the transaction combination.

 In our example, T1 reads X after T2 writes, so it means that T2 should occur before T1. But wait
a minute, isn’t it we’ve just said that T1 should occur before T1 on the previous condition?
That’s what a cycle in the graph also caused. We need T1 before T2 and at the same time we
need T2 before T1. Because of this, none of the combinations can satisfy these two conditions,
so it is not view serializable.

A Tx writes the final value for a data in a Schedule, the same Tx should also write the final data in one of
the transaction combination.

 Although we’ve said that our schedule is not view serializable, let’s just take a look which
combination satisfy this condition. We know that T2 writes Y and X, but in our schedule the T3
writes the last value of Y. So the combination that satisfy these condition should have T2 occur
before T3. If we are to choose the combinations, keep everything else but remove these:
o < T1, T3, T2 >
o < T3, T1, T2 >
o < T3, T2, T1 >

View Serializable Example:

Now, let’s take a look an example that is view serializable


W3(Z), R2(X), W2(Y), R1(Z), W3(Y), W1(Y)

A Tx reads an initial data in a Schedule, the same Tx also should read the initial data in one of the
transaction combination.

 Here, at least T2 must occur first, though it actually doesn’t matter also because no one else
writes X, so we still keep all our Tx combinations.
Department of Information Technology
Nivedita Ray De Sarkar Page
1081081
Database Management System

A Tx reads a data after another Tx has written in a Schedule, the same Tx also should read the data after
another Tx has written it in one of the transaction combination.

 Now, this means that T1 must occur after T3 because T1 reads Z after T3 writes it. So we remove
o < T1, T2, T3 >
o < T1, T3, T2 >
o < T2, T1, T3 >

A Tx writes the final value for a data in a Schedule, the same Tx should also write the final data in one of
the transaction combination.

 Here, T1 must occur last after T3 or T2 because if not T3 or T2 will overwrites Y that T1 writes in
our schedule. So we remove < T3, T1, T2 >

So the two combinations left satisfy the view serializability this time, they are:

 < T2, T3, T1 >


 < T3, T2, T1 >

Locking

The most common way in which access to items is controlled is by “locks.” Lock manager is the part of a
DBMS that records, for each item A, whether one or more transactions are reading or writing any part of
A. If so, the manager will forbid another transaction from gaining access to A, provided the type of
access (read or write) could cause a conflict, such as the duplicate selling of an airline seat.

Serializability through timestamps

A timestamp is a centrally dispensed number assigned to each transaction in strictly increasing order.
The DBMS guarantees that the final result from competing transactions will appear as if the transactions
had executed serially in timestamp order.

The log files role in rollbacks and failure recovery

A log file maintains a record of all changes to the database, including the ID of the perpetrating
transaction, a before-image of each modified object.

The log file enables recovery from a failure that loses the memory buffer’s contents but doesn’t corrupt
the database. You scan the log backward and reverse transactions by rewriting their before-images. You
then scan it forward and reverse transactions by rewriting their after-images.

2-phase locking protocol

• lock every item you touch

• once you release your first lock, you can’t acquire any more locks

Department of Information Technology


Nivedita Ray De Sarkar Page
1091091
Database Management System

• 2Φ locking implies transactions are serializable

• problem with 2Φ locking: can require cascaded rollback (impossible to do in practice)

Locks are applied and removed in two phases:

1. Expanding phase(growing phase): locks are acquired and no locks are released.
2. Shrinking phase: locks are released and no locks are acquired.

The strict two-phase locking (S2PL) class of schedules is the intersection of the 2PL class with the class
of schedules possessing the Strictness property.

To comply with the S2PL protocol a transaction needs to comply with 2PL, and release its write
(exclusive) locks only after it has ended, i.e., being either committed or aborted. On the other hand, read
(shared) locks are released regularly during phase 2.

S2PL is a special case of 2PL, i.e., the S2PL class is a proper subclass of 2PL

Solution to Cascaded Rollbacks Problem

• Modify 2 Φ locking protocol so that transactions hold all their locks until after they commit.

Department of Information Technology


Nivedita Ray De Sarkar Page
1101101
Database Management System

File Organization & Index Structures


Typical DB operations:
In the following discussion, we assume that a DB has already been defined, and each table in it already
contains some amount (usually large amount) of data.
Our concern is: how fast can the DB server respond to some query that is sent to it. When is this
important? If a large number of users are simultaneously accessing a DB; and/or if some table that
needs to be accessed is very large, namely, have many rows, perhaps millions.
The second thing we need to know is “what is the user trying to do with the data in the DB?” In general
We can divide our analysis into the following types of cases:
• Search for a particular row of data in a table
• Creation a row in a table
• Modify some data in a row of a table
• Deleting a row of data from the table

In terms of the logical operations to be performed on the data, relational tables provide a beautiful
mechanism for all of the three above tasks. The storage of a Database in a computer memory is mainly
concerned with:
1. The need to store a set of tables [each table can be stored as an independent file];
2. The attributes in a table are often accessed together [Why?].
Therefore, it makes sense to store the different attribute values in each record contiguously. For each
record of a table, the attributes MUST be stored in the same sequence [Why?]
3. Since there is no prescribed order in which records must be stored in a table, we may choose the
sequence in which we store the different records of a table. We shall see that this observation is quite
useful.
It is useful to look at how tables are stored on the HD in a little more detail. Each row (called record) of a
table can contain different amount of data [Why?].
Therefore, the record is stored with each subsequent attribute separated by the next by a special ASCII
character called a field separator. Of course, in each block, we may place many records. Each record is
separated from the next, again by another special ASCII character called the record separator. For our
discussion, we shall assume that a record is stored in a block only if there is sufficient space to store it
completely (namely, we do not store part of the record in one block, and the remaining in the next
block). There are some cases where each record of a table can be larger in size than one sector on the
HD, for example, a DB storing street maps, where each map segment is a JPEG image of over 3MBytes.
However, we will ignore such special cases for our discussion.

Department of Information Technology


Nivedita Ray De Sarkar Page
1111111
Database Management System

What affects the speed of operations in a DB?


The speed with which a DBMS can execute a query depends on the basic operation that is to be done.
Recall that the basic operations are:
(i) find a row
(ii) insert a row,
(iii) delete a row
(iv) modify a cell in a row. In order for a computer program to work with the data, it must first read the
data to chip memory accessible to the CPU, namely the RAM. In general, the speed at which the CPU
and RAM communicate can be 100x or 1000x faster than the speed at which data can be transferred
from the HD to the RAM (for example, to fetch one block of data from the HD and load it into the RAM
may take 10-3sec, but to search for a record with a particular ID value in one block of data stored in the
RAM may take 10-6sec!).
Therefore, in all the following analysis, we shall be concerned with the time it takes to transfer the
required data between the HD and RAM to complete an operation.

HEAP files:

The simplest method of storing a DB table is to store all the records of the table in the order in which
they are created, on contiguous blocks, in a large file. Such files are called HEAP files, or a
PILE. We shall examine the storage methods in terms of the operations we need to perform on the
Database.
Operation:
Insert a new record
Method:
The heap file data records the address of the first block, and the file size (in blocks). It is therefore easy
to calculate the last block of the file, which is directly copied into the buffer. The new record is inserted
at the end of the last existing record, and the block is written back to the disk.
Performance:
Very fast -- in the worst case, the DBMS needs to read two blocks of data and write back one block (i.e.
read the last block, if it does not have sufficient empty space, read the next available block and finally,
write the record to this block). If the time for transfer of 1 block = t, then the worst case time required is
2t. In addition, there is a fixed overhead cost associated with updating the size of the file, which sectors
it occupies, etc. -- but this is negligible.

Department of Information Technology


Nivedita Ray De Sarkar Page
1121121
Database Management System

Operation:
Search for a record
Method:
Linear search: (i) The first block to the RAM; (ii) CPU searches each record in the block to match the
search criterion; (iii) If no match is found, copy the next block into RAM and go to (ii). Performance:
Poor. Assume that a table occupies B blocks in size. In the worst case, we will find the data in the last
Block (or not find it at all); worst case time =Bt. On the average, if the searched data is
uniformly distributed, the average search time= Bt/2, which is still very poor.

Operation:
Update a cell in a record
Method:
Linear search: (i) The first block to the RAM; (ii) CPU searches each record in the block to match the
search criterion; (iii) If no match is found, copy the next block into RAM and go to (ii); (iv) else, if a record
in the block matches the criterion, update the record, and write it back.
Performance:
It is easy to see that the worst case time =Bt+t, and the average time= Bt/2+t(assuming that the time to
write a block of data = time to read a block = t).
[Note: it is possible that the update increases the size of the block greater than 1 sector; convince
yourself that this will introduce a constant, and relatively negligible, overhead]
Operation:
Delete a record
Method:
Search the record that is to be deleted (requires linear search).
Performance:
Poor (analysis is same as for “update a record” case).
Problem:
After the record is deleted, the block has some extra (unused) space. What to do about the
unused space?
Different DBMS may use a different policy about this. Typical policies may be: (a) Delete the space and
rewrite the block. At periodic intervals (few days), read the entire file into a large
RAM buffer and write it back into a new file. (b) For each deleted record, instead of actually deleting the
record, just use an extra bit per record, which is the 'RECORD_DELETED' marker.
IF: RECORD_DELETED == 1, the record is ignored in all searches.
Approach (b) may have some advantage in terms of recovery of data deleted by mistake, or in cases
where a deleted record must be recovered. In any case, after fixed intervals, the file is updated just as in
case (a), to recover wasted space. Heaps are quite inefficient when we need to search for data in large
database tables.

Sorted Files:
The simplest method to make efficient searching is to organize the table in a Sorted File. The entire file is
sorted by increasing value of one of the attributes, called the ordering attribute
(or ordering field). If the ordering field) is also a key attribute, it is called the ordering key. Figure shows
an example of a table sorted by the "SSN" attribute.

Department of Information Technology


Nivedita Ray De Sarkar Page
1131131
Database Management System

Operation:
Search for a record with a given value of the ordering attribute.
Method:
Binary search
For a file of b blocks, look in the block number |b/2|. If the searched value is in this block, we are done;
If the searched value is larger than the last ordering attribute value in this block, repeat the binary
search in blocks between ( ⎡b/2⎤+ 1) and b; otherwise repeat the search in blocks between 1 and ( ⎡b/2⎤-
1).
Performance:
Let us consider the worst case time spent for a sorted file of size b blocks. In each iteration, we need to
check one block (= t units of time); Worst case we do not find the record in until the last remaining
block. In the 2nd iteration, at most b/2 blocks will remain to be searched; in the 3rd iteration, (b/2)/2 =
b/2
2blocks remain. After i-iterations, b/2i-1 blocks remain to be searched. Of course, if only one block
remains to be searched, no further sub-division of the list of blocks is required, namely, we stop when
b/2i-1= 1, which gives: b = 2i-1, or i = (1 + lg2b). Total number of blocks searched after i-iterations is i,
and the total time is t(1 + lg2b).
Let’s compare the relative worst-case search time between heap storage and sorted file storage. Let’s
assume a file occupies 8192 blocks. Stored as a heap file, the worst case time is 8192t; stored as a sorted
file, the worst case time is t( 1 + lg28192) = t(1+ 13) = 14t. The search is 8192/14 ≈585 times faster!
Operation:
Delete a record/update a value in a record (other than the ordering attribute value)
Method:
First, search the record to be deleted (using binary search).

Department of Information Technology


Nivedita Ray De Sarkar Page
1141141
Database Management System

After the record is deleted, we still need to perform the usual file compacting once every few days.
Performance:
Quite fast. The worst case time is the same as that for search, plus ‘t’ units of time to write
the modified block.
Operation:
Insert a new record/update a record where the ordering attribute is changed
Method:
If we insert the new record in its correct ordered position, we would need to shift every subsequent
record in the table! This is of course very time consuming. Consider the case that a new record is
inserted for ‘SSN=1003’ in our example of figure 7. Since this record goes into the first block, the record
for ‘SSN=1008’ will need to be removed to make space for the new record. The ‘SSN=1008’ record must
be shifted to Block-2, but this may require ‘SSN=1086’ to be shifted out of Block 2, and so on. Total
worst-case time spent is approximately ≈Search for the insertion point ≈t(1+ lg2b) + (Read and Write
each block = 2bt)
Performance: Very inefficient.
Solution:
Overflow file. The table is stored as a sorted file. However, when a new record is inserted, it is stored in
a different file, called an overflow file (thus each table has its own main file and its own overflow file).
The main file is sorted, while the overflow file is a heap file.
Periodically, the overflow file is merged with the ordered file. Thus every insert operation is very fast
(constant time) -- just add the record at the end of the overflow file.
For other operations, we first use binary searching in the regular file. If we cannot find a matching
record, we then perform a linear search in the overflow file. Note that since the overflow file is not
ordered, it requires a linear search.

FASTER searching: Hashing Functions


Hashing is a common idea that divides the total data to be stored into a series of organized “buckets”.
Let’s look at a simple, physical example first. Suppose you need to store paper files for several hundred
people in file folders. An efficient way to do so is to use 26 folders -- one for all last-names that begin
with ‘A’, one for ‘B’ etc. Here, each folder can be though of as a ‘bucket’ storing a subset of the data. The
same idea can be used in storing records of a table in a DB file.
In order for Hashing to work, we must first have a reasonable idea of the total size of the data (e.g. the
expected maximum number of records, and size of each record). We also decide upon the approximate
size of each ‘bucket’ -- thus, if we expect the maximum size of our table to be 10,000 blocks of data, and
we would want a response for a search in under 10tunits of time, then we may use 1000 buckets of 10
blocks each. Secondly, we must specify the hashing attribute, whose value will be used to search for the
data -- this attribute will be used to create the ‘hashing index’. Finally, we need to define a hashing
function: its input is the value of the hashing attribute, and its output is an integer that is in the range [1,
..., max number of buckets]. It is beneficial to design hashing functions whose output is uniformly
distributed over
this range.
A simple hash function is: Hash Address = K mod M, where K is the hashing key, an integer equivalent to
the value of the hashing attribute, M is the maximum number of buckets needed to accommodate the
table.
Suppose each bucket has a size of S blocks. Initially, the DBMS will reserve a space of MxS contiguous
blocks on the HD. Even if the table has very few records, this entire space is reserv

Department of Information Technology


Nivedita Ray De Sarkar Page
1151151
Database Management System

ed for the hashed table. For simplicity, let’s assume that the table is stored in blocks [1, ..., MxS]. The
first bucket occupies blocks (1, ... S), the 2nd bucket is stored from blocks (S+1, ..., 2S), etc.
Operation:
Insert a record.
Procedure:
1. Using the Hashing attribute value, calculate hashing key, K;
2. Calculate hashing address, [in our example, (K mod M)];
3. Place the record in first sector with enough space, starting from address Sx(K mod M)
Performance:
In general, the maximum time that it will take to do an insertion will be tS +t( tS to read the S blocks in
this bucket, and t units to write the block where the record is inserted).

Operation:
Search for a record
Procedure:
Compute the hashing address, and look in the blocks starting from the address. If all blocks of this
bucket are full, look into the blocks for the next bucket until an empty block is encountered
Performance:
Quite fast (Almost constant time)
Operation:
Deletion of a record.
Procedure, Performance:
Similar to that for Searching for a record.
All the above are techniques used by the DBMS's to physically store each of the tables in a Database. In
all of the above, we assume that each table is arranged according to the value of a sorting key. However,
in practical use, a record may be searched by the value of some other key. This is a very common
requirement, and often it is essential to get a fast data operation on a table based on more than one
searching attribute. DBMS's provide another mechanism to quickly search for records, at the cost of
using some extra disk space: INDEXES.

Basics of indexes
The main idea of an index is similar to the physical index. Relational DB’s can have two different types of
indexes; we shall study each type below Primary Indexes.
A primary index file is an index that is constructed using the sorting attribute of the main file.
In the primary index file, we store the value of the sorting attribute of the main file in the first record in
each block of the main file. Corresponding to each entry, we store the address of the block. Notice that
the index file is much smaller (i.e. has very few blocks) compared to the main file.

Department of Information Technology


Nivedita Ray De Sarkar Page
1161161
Database Management System

Main data file and its corresponding primary index file

Operation:
Search for a record in the main file
Procedure:
First perform a binary search on the primary index file, to find the address of the corresponding data.
Suppose we are searching for ‘SSN=1208’. From the primary index file we can tell that if a record for
‘SSN=1208’ exists, it must be in Block 3. We then fetch the contents of Block3 from the HD, and search in
this block for the record for ‘SSN=1208’.
Performance:
Very fast! The worst case search time to search for the address in the primary index file of size P blocks
is ≈t(1 + lg2P); an additional t units is needed to fetch the correct block from the main file.
Thus the total time ≈t(2 + lg2b)
Similarly you can analyze the performance of other operations.
Problem:

Department of Information Technology


Nivedita Ray De Sarkar Page
1171171
Database Management System

The Primary Index will work only if the main file is a sorted file. What if we want to insert a new record
into the main file? Recall that inserting a record in a sorted file is time-consuming, therefore records are
initially stored in an overflow file. Earlier we studied how simple heap files can be used for overflow
files; in reality, most DBMS’s will use a special structure in storing such records, called a balanced tree
(or B-tree). However, in the following solution to our problem, we shall just assume that overflow files
are heaps.
Solution:
The new records are inserted into an unordered (heap) in the overflow file for the table.
Periodically, the ordered and overflow tables are merged together; at this time, the main file is sorted
again, and the Primary Index file is accordingly updated.
Thus any search for a record first looks for the INDEX file, and searches for the record in the indicated
Block.
If the record is not found, then a linear search is conducted in the overflow file for a possible match.

Secondary Indexes
It is possible to construct indexes for any attribute of a table. Of course, since the main file can only be
sorted by one field (usually the primary key field), therefore the contents of secondary indexes is
different from that of primary indexes.
Note that the data file was sorted using SSN. Obviously, we cannot order the records of the main file
using our secondary index key attribute, ‘Lname’. The Secondary Index is a two column file storing the
block address of every secondary index attribute value of the table.

Department of Information Technology


Nivedita Ray De Sarkar Page
1181181
Database Management System

Schematic figure for a secondary index on Lname attribute of main file

You can also create Secondary Indexes for attributes that are non-unique. The idea is similar, though the
storage details are slightly different (for example, if there are three persons with
‘Lname=Wong’, then the three records may be located in different blocks -- thus the secondary
index file must record multiple entries for the same ordering attribute (Lname), one for each record --
and the search algorithm must account for each occurrence. Of course, you can create as many
secondary indexes as you wish.

Creating Indexes using SQL


Example:
Create an index file for Lname attribute of the EMPLOYEE Table of our Database.
CREATE INDEX myLnameIndex ON EMPLOYEE( Lname);
This command will create an index file which contains all entries corresponding to the rows of the
EMPLOYEE table sorted by Lname in Ascending Order.
Example:

Department of Information Technology


Nivedita Ray De Sarkar Page
1191191
Database Management System

You can also create an Index on a combination of attributes.


CREATE INDEX myNamesIndex ON EMPLOYEE( Lname, Fname);
DROP (delete) an index
Example:
Delete the index created in the previous example.
DROP INDEX myNamesIndex;
In conclusion, indexes provide the advantage of speeding up access time for any of the DB’s basic
operations. The main disadvantages of Indexes are that they use up space on hard disk, and require
some additional DB maintenance operations.

Department of Information Technology


Nivedita Ray De Sarkar Page
1201201

You might also like