Database Management System Overview

The document outlines the concepts and significance of Database Management Systems (DBMS) as part of a course at Manipal University Jaipur. It covers topics such as data independence, data modeling, and various applications of databases across different sectors, emphasizing their importance in modern enterprises. The document also includes self-assessment questions to reinforce learning objectives related to DBMS.

DCA2102: Database Management System Manipal University Jaipur (MUJ)

BACHELOR OF COMPUTER
APPLICATIONS
SEMESTER 3

DCA2102
DATABASE MANAGEMENT SYSTEM

Unit 1: Database Management System Concepts 1


Unit 1
Database Management System Concepts
Table of Contents

SL No | Topic | Fig No / Table / Graph | SAQ / Activity
1 | Introduction | - | -
1.1 | Objectives | - | -
2 | Significance of Database | - | 1
3 | Database System Applications | - | 2
4 | Data Independence | - | 3
5 | Data Modeling for a Database | - | 4
5.1 | Entities and their Attributes | - | -
5.2 | Relationships and Relationship Types | - | -
6 | Advantages and Disadvantages of Database Management System | - | 5
7 | DBMS Vs RDBMS | 1 | 6
8 | Summary | - | -
9 | Terminal Questions | - | -
10 | Answers | - | -

1. INTRODUCTION
In this unit, we will introduce the basic concepts of DBMS. Database technology has evolved
rapidly in the three decades since the rise and eventual dominance of relational database
systems. While many specialized database systems (spatial, object-oriented, multimedia, etc.)
have found substantial user communities in science and engineering, relational systems
remain the preferred database technology for business enterprises.

A database is a collection of related information stored so that it is available to several users
for several different purposes. The content of a database is obtained by combining data from
all the different sources in an organization so that the data are available to all users and
duplicated data can be minimized or eliminated. A computer database gives us an electronic
filing system with many ways of cross-referencing, which gives the user several different
ways to retrieve and reorganize data.

1.1 Objectives:
By the end of Unit 1, the learners should be able to:

❖ Understand the definition of database and database management system
❖ Study the applications of database systems
❖ Understand data modeling for a database
❖ Understand the advantages and disadvantages of DBMS
❖ Differentiate between DBMS and RDBMS

2. SIGNIFICANCE OF DATABASE
A database is a collection of related data. A data item is the smallest identified unit of data
that has meaning in the real world (for example, last name, first name, street address, ID
number, or political party) and is the fundamental component of a file in a file system. A
group of related data items treated as a single unit by an application is called a record.
Examples of types of records are salesperson, customer, order, product, and department.
A file is a collection of records of a single type.

A database has the following properties:

• A database represents some aspect of the real world, sometimes called the universe
of discourse (UoD) or mini-world. Changes to the mini-world are reflected in the
database.
• A database is a logically organised collection of data that has some inherent meaning.
• A database is designed, built, and populated with relevant data for a specific purpose.

A database is a more complex object; it is a collection of interrelated stored data that
serves the needs of several users within one or more organizations, that is, interrelated
collections of many different types of tables. The motivations for using databases rather
than files include less redundancy of data, greater availability to a diverse set of users,
and integration of data for easier access to and updating of complex transactions.

The basic definitions of some database concepts are:


Data: Data is a collection of known facts that may be recorded and has an underlying
significance.

Database: It is a collection of related data.

Database System: It is the DBMS software together with the data itself. Sometimes, the
applications are also included.

Database Management System (DBMS): It is a software package/system to facilitate the
creation and maintenance of a computerized database.

Examples of database management systems software are ORACLE, SQL Server, MS Access,
DB2, SYBASE, INFORMIX, etc.

A database can handle accounting, filing, and business inventory, and use the information
in its files to prepare summaries, estimates, and other reports. A database might also
store books, newspaper articles, magazines, and comics. On practically every subject, there
is already a well-defined market for specialised information for a select group of customers.

The management of data in a database system is handled by a database management system,
which is a general-purpose software programme. The database management system is the
major software component of a database system. A database management system is therefore
software that can be used to set up and monitor a database and to manage the retrieval and
updating of the data stored in it. The majority of database management systems have the
following capabilities:
• Creating, adding, and deleting files, and adding, modifying, and deleting data.
• Retrieving data selectively or collectively.
• Sorting or indexing the data at the user's discretion and direction.
• Generating reports, either standardized or produced to specific user requirements.
• Performing mathematical functions on data stored in the database to carry out the
desired calculations.
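
The capabilities listed above can be tried out on a small scale. The following is a minimal sketch using Python's built-in sqlite3 module; the product table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the listed capabilities.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL, qty INTEGER)"
)

# Addition of data
conn.executemany(
    "INSERT INTO product (name, price, qty) VALUES (?, ?, ?)",
    [("pen", 1.5, 100), ("notebook", 4.0, 40), ("stapler", 9.0, 5)],
)

# Modification and deletion of data
conn.execute("UPDATE product SET qty = qty - 10 WHERE name = 'pen'")
conn.execute("DELETE FROM product WHERE name = 'stapler'")

# Selective retrieval, sorted at the user's direction (cheapest first)
cheap_first = conn.execute(
    "SELECT name, qty FROM product WHERE qty > 0 ORDER BY price"
).fetchall()

# A simple report: a mathematical function applied to the stored data
stock_value = conn.execute("SELECT SUM(price * qty) FROM product").fetchone()[0]

print(cheap_first)
print(stock_value)
```

The application states what it wants (which rows, in which order, which total) and leaves storage and access to the DBMS.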

Self-Assessment Questions – 1

1. The basic component of a file in a file system is a ______.
2. UoD stands for __________.
3. Database Management System is a ______ system to facilitate the creation and
maintenance of a computerized database.
4. The database management system is the major software component of a
_________.

3. DATABASE SYSTEM APPLICATIONS


Databases are widely used. Here are some of the representative applications:
• Universities: For student information, course registrations, exams, and grades.
• Credit card transactions: For purchases on credit cards, generation of monthly
statements, and payments.
• Banking: For accounts, loans, customer information, and online/offline banking
transactions.
• Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds.
• Sales: For customer, product, and purchase information.
• Telecommunication: For keeping track of incoming and outgoing calls, monitoring
prepaid calling card balances, generating monthly bills, and storing information about
communication networks.
• Manufacturing: For inventories of items in warehouses/stores, management of the
supply chain, tracking the production of items in factories, and orders for different items.
• Human resources: For information about employees, salaries, payroll taxes and
benefits, and for generation of paychecks.
• Airlines: For reservations, cancellations, and schedule information. Airlines were
among the first to use databases in a geographically distributed manner: terminals
situated across the world accessed the central database system through data
networks.

So we can say that databases form an essential part of almost all enterprises today. During
the last four decades of the twentieth century, the use of databases grew in all enterprises.
In the early days, few people interacted directly with database systems, although many
interacted with them indirectly, without realising it, through printed reports such as credit
card statements or through agents such as bank tellers and airline reservation agents. Then
automated teller machines came along and let users interact directly with databases. Phone
interfaces to computers also allowed users to deal with databases directly. For example, a
caller could dial a number and press phone keys to enter information or select alternative
options, to find flight arrival/departure times.

During the late 1990s, the internet revolution sharply increased direct user access to
databases. Organizations converted many of their phone interfaces to databases into Web
interfaces and made a variety of services and information available online. For
example, when you access a Web site, information about you may be retrieved from a
database to select which advertisements should be shown to you.

When you explore a book or music collection in an online retailer, you are accessing data
stored in a database.

When you place an online order, it is saved in a database.

When you go to a bank's website to get your account balance and transaction history, the
data is pulled from the bank's database system. Moreover, data about your Web accesses may
be stored in a database. Thus, although user interfaces hide the details of access to a
database, and most people are unaware they are dealing with a database, accessing databases
forms an essential part of almost everyone's life these days. The importance of database
systems can be judged in another way: today, database system vendors like Oracle are among
the largest software companies in the world, and database systems form an essential part of
the product line of more diversified companies like IBM and Microsoft.

Self-Assessment Questions – 2

5. ____ were among the first to use databases in a geographically distributed
manner.
6. Web accesses may be stored in a _____.
7. The ____ machines came along and let users interact directly with databases.
8. In __________, a database is used for keeping records of calls made, generating
monthly bills, maintaining balances on prepaid calling cards, and storing
information about the communication networks.

4. DATA INDEPENDENCE
We can define data independence as the capability to modify the schema definition at one
level without affecting the schema definition at the next higher level. It is usually considered
from two points of view: physical data independence and logical data independence.
Physical data independence permits changes in the physical storage devices or the
organization of the files to be made without requiring changes in the conceptual view or any
of the external views, and hence in the application programs using the database. Thus, the
files may migrate from one physical medium to another, or the file structure may change,
without requiring any changes in the application programs.

Logical data independence means that application programs need not be changed if fields are
added to or deleted from an existing record. In other words, the conceptual schema can be
changed without affecting the existing external schemas. In a database environment, data
independence is advantageous because it allows changes at one level of the database
without requiring any change at other levels. The mappings between the levels absorb these
changes. Since application programs are heavily dependent on the logical structure of the
data they access, logical data independence is more difficult to achieve than physical data
independence.

In many respects, the concept of data independence is similar to the concept of abstract data
types in programming languages like C++. Both hide implementation details from the users.
This allows users to concentrate on the general structure rather than on low-level
implementation details.
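
Both kinds of data independence can be observed on a small scale. The sketch below uses Python's built-in sqlite3 module; the employee table, the staff_directory view (standing in for an external view), and the index name are hypothetical. The application reads only through the view, so adding a field or an index leaves it unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, job_title TEXT)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Analyst')")

# Hypothetical external view that an application program reads from.
conn.execute("CREATE VIEW staff_directory AS SELECT emp_id, emp_name FROM employee")


def directory(connection):
    """The 'application program': it knows only the external view."""
    return connection.execute(
        "SELECT emp_id, emp_name FROM staff_directory ORDER BY emp_id"
    ).fetchall()


before = directory(conn)

# Logical change: a field is added to an existing record type.
conn.execute("ALTER TABLE employee ADD COLUMN phone_no TEXT")
# Physical change: an index alters how the data is accessed, not what it means.
conn.execute("CREATE INDEX idx_employee_name ON employee (emp_name)")

after = directory(conn)
print(before == after)
```

Deleting a field that the view depends on would not be absorbed this way, which is one reason logical data independence is harder to achieve in full.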

Self-Assessment Questions – 3

9. Data independence is usually considered from __________ points of view.
10. ___ data independence allows changes in the physical storage devices.
11. ___ data independence implies that application programs need not be changed
if fields are added to an existing record.
12. Logical data independence is more __________ to achieve than physical
independence.

5. DATA MODELING FOR A DATABASE


The data model is one part of the conceptual design process. The other is the function
model. The data model focuses on what data should be stored in the database, whereas the
function model deals with how the data is processed. In the context of a relational
database, the data model is used to design the relational tables, whereas the function
model is used to design the queries that will access and perform operations on those tables.

Data modeling is preceded by planning and analysis. The effort devoted to this stage is
proportional to the scope of the database. Planning and analysing a database intended to
serve the needs of an enterprise requires more effort than planning one intended to serve a
small workgroup.

5.1 Entities and Their Attributes


The entity-relationship (E-R) data model is based on a perception of the real world as
consisting of a set of basic objects called entities and of relationships among these objects.
It was developed to facilitate database design by allowing the specification of an enterprise
schema, which represents the overall logical structure of a database. The E-R data model is
one of several semantic data models; the semantic aspect of the model lies in its attempt to
represent the meaning of the data. The E-R model is very useful in mapping the meanings and
interactions of real-world enterprises onto a conceptual schema. Because of this usefulness,
many database-design tools draw on concepts from the E-R model. More about the E-R model is
explained in Unit 3 (Entity-Relationship Model).

Entities
These are the main data objects about which information is to be collected; they usually
denote a person, place, thing, or event of informational interest. A particular occurrence of
an entity is called an entity instance or sometimes an entity occurrence. In a Company
database, for example, department, division, project, skill, employee, and location are all
entities. More examples of entities are given in Unit 3 (Entity-Relationship Model).

Attributes
These are characteristics of entities that provide descriptive details about them.

A particular instance of an attribute within an entity or relationship is called an attribute
value.

An Employee entity may have attributes such as emp-id, emp-name, phone-no, fax-no,
job-title, emp-address, and so on.

Each attribute is associated with the entity it describes.

There are two types of attributes: identifiers and descriptors. An identifier (or key), also
known as a key attribute, is used to uniquely identify an instance of an entity; a descriptor
(or non-key attribute) is used to specify a non-unique characteristic of a particular entity
instance. Both identifiers and descriptors may consist of a single attribute or a composite
of several attributes.

For example, an identifier or key attribute of an Employee is emp-id, and a descriptor of an
Employee is emp-name or job-title. Strong entities have internal identifiers that uniquely
determine the existence of entity instances, whereas weak entities derive their identity from
the identifying attributes of one or more "parent" entities. Weak entities are often drawn
as a double-bordered rectangle, which denotes that all occurrences of that entity depend on
an associated (strong) entity for their existence in the database. More about attributes is
explained in Unit 3 (Entity-Relationship Model).
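
The difference between identifiers, descriptors, and weak entities can be made concrete in a relational schema. The following is a sketch only, using Python's built-in sqlite3 module: the Employee attributes come from the text (with hyphens written as underscores), while the dependent table, standing in for a weak entity, is a hypothetical addition for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Strong entity: emp_id is the identifier (key); the other columns are descriptors.
conn.execute("""
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        emp_name  TEXT NOT NULL,
        job_title TEXT
    )""")

# Hypothetical weak entity: a dependent is identified by the parent's emp_id
# together with its own name, and cannot exist without the parent employee.
conn.execute("""
    CREATE TABLE dependent (
        emp_id   INTEGER NOT NULL REFERENCES employee(emp_id) ON DELETE CASCADE,
        dep_name TEXT NOT NULL,
        PRIMARY KEY (emp_id, dep_name)
    )""")

conn.execute("INSERT INTO employee VALUES (1, 'Asha', 'Analyst')")
conn.execute("INSERT INTO employee VALUES (2, 'Asha', 'Manager')")  # descriptors may repeat
conn.execute("INSERT INTO dependent VALUES (1, 'Ravi')")

# An identifier value must be unique, so a second employee with emp_id 1 is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Kiran', 'Clerk')")
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

# Deleting the parent removes the weak entity instances that depend on it.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining_dependents = conn.execute("SELECT COUNT(*) FROM dependent").fetchone()[0]

print(duplicate_id_rejected, remaining_dependents)
```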

5.2 Relationships and Relationship Types


Relationships
Relationships represent associations among one or more entities and have no physical or
conceptual existence other than that which depends upon their entity associations. Degree,
connectivity, and existence are all concepts used to characterise relationships. The most
common meaning associated with the term relationship is indicated by the connectivity
between entity occurrences: one-to-one, one-to-many, and many-to-many.

The relationship construct is a diamond box that connects the associated entities. The
relationship name can be written inside or just outside the diamond box.

A role is the name of one end of a relationship when each end needs a distinct name for clarity
of the relationship. In most cases, the role of each entity in the relationship is clear from
the entity names combined with the relationship name. However, in some cases, role names
should be used to clarify ambiguities. Role names are generally nouns. More information
about relationships and their types is given in Unit 3 (Entity-Relationship Model).

Relationships Types
The contents of a database must conform to certain constraints defined by an E-R enterprise
schema. Mapping cardinalities, or cardinality ratios, express the number of entities to which
another entity can be associated through a relationship set.

For a binary relationship set R between two entity sets A and B, the mapping cardinality
must be one of the following:
• One-to-one (1:1): An entity in A is linked to at most one entity in B, and an entity in
B is linked to at most one entity in A.
• One-to-many (1:N): An entity in A is linked to any number of entities in B. An entity
in B, however, can be linked to at most one entity in A.
• Many-to-one (N:1): An entity in A is linked to at most one entity in B. An entity in B,
however, can be linked to any number of entities in A.
• Many-to-many (M:N): An entity in A is linked to any number of entities in B, and an
entity in B is linked to any number of entities in A.
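
The four cases above can be checked mechanically for a given set of links. The sketch below is a hypothetical Python helper, not part of any DBMS: it takes a relationship set as (a, b) pairs and reports the most restrictive mapping cardinality that the sample satisfies. In a real design, the cardinality is a constraint stated in the schema rather than something inferred from data.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship set given as (a, b) pairs.

    Returns the most restrictive cardinality that the sample data satisfies.
    """
    b_per_a = defaultdict(set)  # entities in B linked to each entity in A
    a_per_b = defaultdict(set)  # entities in A linked to each entity in B
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    a_links_many = any(len(linked) > 1 for linked in b_per_a.values())
    b_links_many = any(len(linked) > 1 for linked in a_per_b.values())

    if a_links_many and b_links_many:
        return "many-to-many"
    if a_links_many:
        return "one-to-many"
    if b_links_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical samples: A is the first component, B the second.
print(mapping_cardinality([("dept1", "emp1"), ("dept1", "emp2")]))
print(mapping_cardinality([("stu1", "c1"), ("stu1", "c2"), ("stu2", "c1")]))
```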

Self-Assessment Questions – 4

13. The ___ is one part of the conceptual design process.
14. Data modeling is preceded by ______ and analysis.
15. The E-R data model is based on a perception of the real world that consists of a
set of basic objects called ____.
16. A particular occurrence of an entity is called an entity _____.
17. An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A. This type of relationship is ___________.
18. In one-to-many relationships, an entity in A is associated with any number of
entities in B. An entity in B, however, can be associated with at most
__________ entity in A.

6. ADVANTAGES AND DISADVANTAGES OF DATABASE MANAGEMENT SYSTEM

One of the main advantages of using a database system is that the organization can exert,
through the database administrator (DBA), centralized management and control over the data.
The DBA is the focus of this centralized control. Any application that requires a change in
the structure of a data record must arrange the change with the DBA, who makes the
necessary modifications.

The following are some of the most significant advantages of DBMS:

6.1 Advantages
Sharing Data
A database permits the sharing of data under its control by any number of users or
application programs.

Data Redundancy Reduction


Centralized control of data by the DBA avoids unnecessary duplication of data and effectively
reduces the total amount of data storage required. It also reduces the extra processing
needed to find the relevant data in a large mass of data. Another benefit of minimising
duplication is that it eliminates the inconsistencies that tend to be present in redundant
data files. Any redundancies that exist in the DBMS are controlled, and the system ensures
that the multiple copies of the same data are consistent.

Data Integrity
Centralized control can also ensure that adequate checks are incorporated in the DBMS to
provide data integrity. Data integrity means that the data contained in the database is both
accurate and consistent.

Therefore, data values being entered for storage can be checked to ensure that they fall
within a specified range and are in the correct format. For example, the value for the age of
an employee may be restricted to the range of 16 to 75. Another integrity check that should
be incorporated in the database is to ensure that if there is a reference to a certain object,
that object must exist. In the case of an automated teller machine, for example, a user is
not allowed to transfer funds from a nonexistent savings account to an existing one.
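
Both integrity checks described above can be declared to the DBMS so that it enforces them. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names are hypothetical, and the age range is the one given in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Range check: an employee's age must lie between 16 and 75.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        age    INTEGER NOT NULL CHECK (age BETWEEN 16 AND 75)
    )""")

# Referential check: a transfer may only name accounts that exist.
conn.execute("CREATE TABLE account (acct_no INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("""
    CREATE TABLE transfer (
        transfer_id INTEGER PRIMARY KEY,
        from_acct   INTEGER NOT NULL REFERENCES account(acct_no),
        to_acct     INTEGER NOT NULL REFERENCES account(acct_no),
        amount      REAL NOT NULL
    )""")
conn.execute("INSERT INTO account VALUES (100, 500.0)")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


age_in_range_rejected = rejected("INSERT INTO employee VALUES (1, 30)")
age_too_low_rejected = rejected("INSERT INTO employee VALUES (2, 12)")
# Account 999 does not exist, so a transfer out of it is refused.
bad_transfer_rejected = rejected("INSERT INTO transfer VALUES (1, 999, 100, 50.0)")

print(age_in_range_rejected, age_too_low_rejected, bad_transfer_rejected)
```

Declaring the rules once in the schema means every application that uses the database is held to the same checks.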

Data Security
Data is of vital importance to an organization and may be confidential. Such confidential
data must be protected from unauthorized persons. The DBA, as the owner of the data in the
DBMS, can ensure that suitable access procedures are followed, such as proper authentication
schemes for DBMS access and additional checks before granting access to sensitive data.

Different levels of security could be established for different types of data and operations.
The enforcement of security could be data-value dependent (e.g., a manager has access to
the salary details of employees in his or her department only), as well as data-type
dependent (e.g., the manager cannot access the medical history of any employees, including
those in his or her department).
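
One common way to express both kinds of restriction is a view. The sketch below uses Python's built-in sqlite3 module; the employee table, its rows, and the sales_manager_view name are hypothetical. SQLite has no user accounts, so the view only shows which rows and columns would be exposed; in a multi-user DBMS the DBA would typically grant the manager access to such a view (for example with the SQL GRANT statement) rather than to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id          INTEGER PRIMARY KEY,
        name            TEXT,
        dept            TEXT,
        salary          REAL,
        medical_history TEXT
    )""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Asha", "Sales", 50000.0, "confidential"),
        (2, "Ravi", "Sales", 42000.0, "confidential"),
        (3, "Meena", "Finance", 61000.0, "confidential"),
    ],
)

# Data-value dependent: only rows of the manager's own department are visible.
# Data-type dependent: the medical_history column is left out entirely.
conn.execute("""
    CREATE VIEW sales_manager_view AS
    SELECT emp_id, name, salary FROM employee WHERE dept = 'Sales'""")

cursor = conn.execute("SELECT * FROM sales_manager_view ORDER BY emp_id")
visible_columns = [column[0] for column in cursor.description]
visible_rows = cursor.fetchall()

print(visible_columns)
print(visible_rows)
```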

Conflict Resolution
Since the database is under the control of the DBA, he or she should resolve the conflicting
requirements of various users and applications. In essence, the DBA chooses the best file
structure and access method to give the best performance to response-critical applications,
while allowing less critical applications to continue to use the database with a slower
response time.

6.2 Disadvantages
The cost of the DBMS is a significant disadvantage. In addition to the cost of purchasing
or developing the software, the hardware has to be upgraded to accommodate the extensive
programs and the workspaces required for their execution and storage. The processing
overhead introduced by the DBMS to implement security, integrity, and sharing of the data
degrades response time and throughput. A further cost is that of migration from a
traditionally separate application environment to an integrated one.

While centralization reduces duplication, the lack of duplication requires that the database
be adequately backed up so that the data can be recovered in case of failure. Backup and
recovery operations are fairly complex in a DBMS environment, and this is exacerbated in a
concurrent multi-user database system. A database system also needs a certain amount of
controlled redundancy and duplication to allow access to related data items.

Due to centralization, the data is accessible from a single source, namely the database.

As a result, security breaches, and disruptions to the organization's operations caused by
downtime and failures, become more severe.

Many of the problems caused by downtime and failures are reduced when a centralised
database is replaced with a federation of independent and cooperating distributed
databases.

Self-Assessment Questions – 5

19. The database administrator is the focus of the _____ control.
20. Any redundancies that exist in the DBMS are controlled and the system ensures
that these multiple copies are _________.
21. ____ means that the data contained in the database is both accurate and
consistent.
22. Data is of _______ importance to an organization and may be confidential.
23. A significant disadvantage of the DBMS system is _________.

7. DBMS Vs RDBMS
Currently, the market-leading DBMS products are all SQL DBMS products. They were
originally based on the relational database management model but did not implement the
model completely accurately. First, a SQL DBMS permits the data to be queried based
on any column in any table. Relational/SQL data is easier to query than data held in the
CODASYL, hierarchical, or some other model.

Secondly, since the relational model is based on set theory, its usefulness and accuracy have
a basis in mathematics, one that is centuries old and proven.
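
The set-theoretic basis can be seen by modeling relations as plain sets of tuples. The following is an illustrative sketch in Python with hypothetical data; it is not how a DBMS stores relations, only a demonstration that the basic relational operations are ordinary set operations.

```python
# Two relations with the same attributes (emp_id, dept), modeled as sets of tuples.
# The data is hypothetical.
sales_staff = {(1, "Sales"), (2, "Sales")}
trained_staff = {(2, "Sales"), (3, "Finance")}

# Union, intersection, and difference are the ordinary set operations.
union = sales_staff | trained_staff
intersection = sales_staff & trained_staff
difference = sales_staff - trained_staff

# Selection keeps the tuples that satisfy a condition.
selection = {row for row in union if row[1] == "Sales"}

# Projection keeps some attributes; duplicates vanish because the result is a set.
projection = {(dept,) for (_emp_id, dept) in union}

print(sorted(union))
print(sorted(projection))
```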

Thirdly, a relational database describes data in terms of its natural structure only, which
means that it excludes all details having to do with machine representation. The comparison
between DBMS and RDBMS is shown in Table 1.1.

Table 1.1: DBMS Vs RDBMS

Sr. No. | DBMS | RDBMS
1. | It was introduced in the 1960s. | It was introduced in the 1970s.
2. | It takes more time to fetch data from a complex and large data set. | It is comparatively faster because of its relational model.
3. | It is used for applications that use a small amount of data. | It is used for huge applications that use a complex and large amount of data.
4. | Managing DBMS is difficult due to higher data redundancy. | Managing RDBMS is easier because data redundancy is avoided.
5. | Some examples are dBase, Microsoft Access, and FoxPro. | Some example systems are SQL Server, Oracle, and MySQL.

Self-Assessment Questions – 6

24. Relational/SQL data is ______ to query than hierarchical, CODASYL, or some other
model.
25. The relational model is based on _____________, so its accuracy and usefulness have
a basis in mathematics.

8. SUMMARY
A database system is a collection of related files along with information about their
definition, interpretation, maintenance, and manipulation. A DBMS is a vital software
component of the database system. It consists of the collection of interrelated data and
programs to access that data. The main goal of a DBMS is to provide an environment that is
both efficient and convenient to use in retrieving desired information from and storing
information in the database.

The DBMS not only makes the integrated collection of reliable and accurate data available to
multiple applications and users but also prevents unauthorized users from accessing the data.
A DBMS has both advantages and disadvantages, and it differs from an RDBMS.

9. TERMINAL QUESTIONS
1. List the implicit properties of a database.
2. What are the representative applications of databases? List them.
3. Differentiate between physical data independence and logical data independence.
4. What are entities and attributes? Give one example.
5. What are relationships? Explain the relationship types.
6. Explain the Advantages and Disadvantages of the DBMS.
7. Compare DBMS with RDBMS.

10. ANSWERS
Self Assessment Questions
1. Data item
2. Universe of Discourse
3. Software
4. Database system
5. Airlines
6. Database
7. Automated teller
8. Telecommunication
9. Two
10. Physical
11. Logical
12. Difficult
13. data model
14. Planning
15. Entities
16. Instance
17. One-to-one
18. One
19. Centralized
20. Consistent
21. Data integrity
22. Vital
23. Cost
24. Easier
25. set theory

Terminal Questions
1. Implicit properties of a database are: (i) A database represents some aspect of the real
world, sometimes called the mini-world or universe of discourse (UoD). (ii) A database
is a logically coherent collection of data with some inherent meaning, and so on. (Refer
to section 1.2 for detail.)
2. Representative applications of databases include Banking, Airlines, Universities,
Credit card transactions, Telecommunication, etc. (Refer to section 1.3 for detail.)
3. Physical data independence allows changes in the physical storage devices or
organization of the files to be made without requiring changes in the conceptual view,
whereas logical data independence implies that application programs need not be
changed if fields are added to an existing record. (Refer to section 1.4 for detail.)
4. Entities are the principal data objects about which information is to be collected.
Attributes are characteristics of entities that provide descriptive details about them.
(Refer to section 1.5 for detail.)
5. Relationships represent real-world associations among one or more entities. A
relationship is indicated by the connectivity between entity occurrences: one-to-one,
one-to-many, and many-to-many. (Refer to section 1.5.2 for detail.)
6. The advantages of using a database system are that the organization can exert, via the
DBA, centralized management and control over the data, apart from reduction of
redundancies, data integrity, data security, and conflict resolution. (Refer to section
1.6.1 for detail.)
7. Relational/SQL data is easier to query than hierarchical, CODASYL, or some other
model. SQL DBMS products were originally based on the relational database
management model but did not implement the model fully or completely accurately.
(Refer to section 1.7.)

Unit 2: Database System Architecture 1


Unit 2
Database System Architecture
Table of Contents

SL No | Topic | Fig No / Table / Graph | SAQ / Activity
1 | Introduction | - | -
1.1 | Objectives | - | -
2 | Three Level Architecture of DBMS | 1 | 1
2.1 | The External Level or Subschema | - | -
2.2 | The Conceptual Level or Conceptual Schema | - | -
2.3 | The Internal Level or Physical Schema | - | -
2.4 | Mapping | - | -
3 | MySQL Architecture | 2 | 2
4 | SQL Server 2000 Architecture | 3 | 3
5 | Oracle Architecture | 4 | 4
6 | Database Management System Facilities | - | 5
6.1 | Data Definition Language | - | -
6.2 | Data Manipulation Language | - | -
7 | Database Management System Structure | 5 | 6
7.1 | Database Manager | - | -
7.2 | Database Administrator | - | -
7.3 | Data Dictionary | - | -
8 | Distributed Processing | 6 | 7
8.1 | Information and Communications Technology System (ICT) | 7 | -
8.2 | Client / Server Architecture | 8 | -
9 | Summary | - | -
10 | Terminal Questions | - | -
11 | Answers | - | -


1. INTRODUCTION
The architecture of a database system is greatly influenced by the underlying computer
system on which the database system runs. Database systems can be centralized, or client–
server, where one server machine executes work on behalf of multiple client machines.
Database systems can also be designed to exploit parallel computer architectures.
Distributed databases span multiple geographically separated machines. Distributed query
processing and directory systems are also described in this unit.

In particular, the architecture is shaped by such aspects of the underlying computer
architecture as networking, parallelism, and distribution:
• Networking of computers allows some tasks to be executed on a server system, and
some tasks to be executed on client systems.
• Parallel processing within a computer system allows database-system activities to be
sped up, allowing faster response to transactions, as well as more transactions per
second.
• Distributing data across sites or departments in an organization allows those data to
reside where they are generated or most needed, but still to be accessible from other
sites and other departments.

Also, the architecture of DBMS packages has evolved from the early monolithic systems,
where the whole DBMS software package was one tightly integrated system, to the modern
DBMS packages that are modular in design, with a client/server system architecture. A DBMS
can be considered as a buffer between application programs, end-users, and a database
designed to fulfill features of data independence. In 1975 the American National Standards
Institute Standards Planning and Requirements Committee (ANSI-SPARC) proposed a three-
level architecture for this buffer. This architecture identified three levels of abstraction.
These levels are sometimes referred to as schemas or views.


1.1 Objectives:
By the end of Unit 2, you should be able to understand:
❖ The three views of data, i.e., the three levels of architecture of a DBMS
❖ The architectures of MySQL, SQL Server, and Oracle
❖ Database management systems: facilities and structure
❖ Database manager and database administrator
❖ Distributed processing and client-server architecture.


2. THREE LEVEL ARCHITECTURE OF DBMS


Database systems are made up of complex data structures. To provide easy interaction
with the database system, developers hide internal irrelevant details from users. This
process of hiding irrelevant details from the user is known as data abstraction. Database
systems support three levels of data abstraction. A database management system that
provides these three levels of data is said to follow three-level architecture as shown in
Figure 2.1. These three levels are the external level, the conceptual level, and the internal
level.

Fig 2.1: Three level architecture for a DBMS

The view at each of these levels is described by a schema. A schema as mentioned earlier is
an outline or a plan that describes the records and relationships existing in the view. The
schema also describes the way in which entities at one level of abstraction can be mapped to
the next level. The overall design of the database is called the database schema. A database
schema includes such information as:
• Characteristics of data items such as entities and attributes
• Logical structure and relationship among those data items
• Format for storage representation
• Integrity parameters such as physical authorization and backup policies.

The concept of a database schema corresponds to the programming-language notion of a type
definition. A variable of a given type has a particular value at a given instant in time. The
concept of the value of a variable in programming languages corresponds to the concept of
an instance of a database schema.
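
The schema/instance distinction can be illustrated with a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS. The table name student and the helper names schema_of and instance_of are hypothetical and chosen only for this illustration. The schema (the stored table definition) stays the same while the instance (the rows present at a given moment) changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema plays the role of a type definition: it describes the data.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def schema_of(connection):
    """Return the stored definition (schema) of the student table."""
    row = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
    ).fetchone()
    return row[0]


def instance_of(connection):
    """Return the current contents (instance) of the student table."""
    return connection.execute(
        "SELECT roll_no, name FROM student ORDER BY roll_no"
    ).fetchall()


schema_before = schema_of(conn)
instance_before = instance_of(conn)   # no rows yet

conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")

schema_after = schema_of(conn)        # same definition as before
instance_after = instance_of(conn)    # the instance has changed
```

Just as a variable keeps its type while its value changes, the table keeps its definition while its rows change.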

Since each view is defined by a schema, there exist several schemas in the database, and
these schemas are partitioned following the three levels of data abstraction or views. At the
lowest level, we have the physical schema; at the intermediate level, we have the conceptual
schema; and at the highest level, we have the subschemas. In general, a database system
supports one physical schema, one conceptual schema, and several subschemas.

2.1 The External Level or Subschema


The external level is at the highest level of database abstraction where only those portions
of the database of concern to a user or application program are included. Any number of user
views (some of which may be identical) may exist for a given global or conceptual view.

Each external view is described through a schema called an external schema or subschema.
The external schema consists of the definition of the logical records and the relationships in
the external view. The external schema also contains the method of deriving the objects in
the external view from the objects in the conceptual view. These objects include entities,
attributes, and relationships.
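
In SQL-based systems an external view is commonly realized as a view defined over the base tables. The following is a minimal sketch, assuming Python's sqlite3 module; the table employee and the view staff_directory are hypothetical names. The view definition is the "method of deriving" the external objects from the conceptual ones, and it exposes only the columns of concern to one group of users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the full employee record.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL,
        salary REAL NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Asha', 'Sales', 52000);
    INSERT INTO employee VALUES (2, 'Ravi', 'HR', 48000);

    -- External level: a subschema that leaves out the salary column.
    CREATE VIEW staff_directory AS
        SELECT emp_id, name, dept FROM employee;
""")

cursor = conn.execute("SELECT * FROM staff_directory ORDER BY emp_id")
directory_columns = [column[0] for column in cursor.description]
directory_rows = cursor.fetchall()
```

A user who works only with staff_directory never sees the salary attribute, although it exists in the conceptual schema.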

2.2 The Conceptual Level or Conceptual Schema


At this level of database abstraction, all the database entities and the relationships among
them are included. One conceptual view represents the entire database. This conceptual view
is defined by the conceptual schema. It describes all the records and relationships included
in the conceptual view and, therefore, in the database. There is only one conceptual schema
per database. This schema also contains the method of deriving the objects in the conceptual
view from the objects in the internal view.

The description of data at this level is in a format independent of its physical representation.
It also includes features that specify the checks to retain data consistency and integrity.

2.3 The Internal Level or Physical Schema


We find this view at the lowest level of abstraction, closest to the physical storage method
used. It indicates how the data will be stored and describes the data structures and access
methods to be used by the database. The internal view is expressed by the internal schema,
which contains the definition of the stored record, the method of representing the data fields,
and the access aids used.
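
The separation between the conceptual and internal levels can be sketched as follows, assuming Python's sqlite3 module; the table orders and the index idx_orders_customer are hypothetical names. Adding an index changes the access method at the internal level, yet the query written against the conceptual schema, and its result, stay the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Asha", 250.0), (2, "Ravi", 120.0), (3, "Asha", 75.0)],
)

QUERY = "SELECT order_id, amount FROM orders WHERE customer = ? ORDER BY order_id"


def plan_text(connection):
    """Return SQLite's description of how it intends to run QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, ("Asha",)).fetchall()
    return " ".join(str(row[-1]) for row in rows)


rows_before = conn.execute(QUERY, ("Asha",)).fetchall()
plan_before = plan_text(conn)

# Internal-level change only: a new access aid, no change to the table definition.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

rows_after = conn.execute(QUERY, ("Asha",)).fetchall()
plan_after = plan_text(conn)
```

The exact wording of the plan text depends on the SQLite version, so the sketch only inspects whether the index name appears in it.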

2.4 Mapping
Two mappings are required in a database system with three different views as shown in
Figure 2.1. A mapping between the external and conceptual levels gives the correspondence
among the records and the relationships of the external and conceptual levels.

Similarly, there is a mapping from a conceptual record to an internal one. An internal record
is a record at the internal level, not necessarily a stored record on a physical storage
device. The internal record of Figure 2.2 may be split up into two or more physical records.
The physical database is the data that is stored on secondary storage devices. It is made up
of records with certain data structures and organized in files. Consequently, there is an
additional mapping from the internal record to one or more stored records on secondary
storage devices.

Self-Assessment Questions – 1

1. The overall design of the database is called the __________ schema.
2. In general, a database system supports one physical schema, one conceptual
schema, and several ________.
3. The highest level of abstraction as seen by a user is called the
___________ view.
4. The ____________ level describes what data are actually stored in the database.
5. There is only _________ conceptual schema per database.
6. The physical database is the data that is stored on _____________ storage devices.


3. MYSQL ARCHITECTURE
MySQL is based on a tiered architecture, consisting of both primary subsystems and support
components, that interact with each other to read, parse, and execute queries, and to cache
and return query results.

Primary Subsystems
The MySQL architecture consists of five primary subsystems that work together to respond
to a request made to the MySQL database server:
• The Query Engine
• The Storage Manager
• The Buffer Manager
• The Transaction Manager
• The Recovery Manager

The organization of these features is shown in Figure 2.2. We’ll explain each one briefly
to help you gain a better understanding of how the parts fit together.

Fig 2.2: Architecture of MySQL

The Query Engine: This subsystem contains three interrelated components:


• The Syntax Parser
• The Query Optimizer
• The Execution Component

The Syntax Parser decomposes the SQL commands it receives from calling programs into a
form that can be understood by the MySQL engine. The objects that will be used are
identified, along with the correctness of the syntax. The Syntax Parser also checks the objects
being referenced to ensure that the privilege level of the calling program allows it to use
them.

The Query Optimizer then streamlines the syntax for use by the Execution Component, which
then prepares the most efficient plan of query execution. The Query Optimizer checks to see
which index should be used to retrieve the data as quickly and efficiently as possible. It
chooses one from among the several ways it has found to execute the query and then creates
a plan of execution that can be understood by the Execution Component.

The Execution Component then interprets the execution plan and, based on the information
it has received, makes requests of the other components to retrieve the records.
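
The division of work among the three components can be pictured with a toy model. The sketch below is not MySQL code; it is a simplified illustration in Python, and every name in it (TABLES, parse, optimize, execute) is hypothetical. It accepts only queries of the form SELECT * FROM table WHERE column = number.

```python
import re

# Hypothetical in-memory "database": one table and a note of its indexed column.
TABLES = {
    "book": {
        "columns": {"id", "title"},
        "indexed_columns": {"id"},
        "rows": [
            {"id": 1, "title": "SQL Basics"},
            {"id": 2, "title": "DBMS Concepts"},
            {"id": 3, "title": "Data Models"},
        ],
    }
}

PATTERN = re.compile(r"\s*SELECT \* FROM (\w+) WHERE (\w+) = (\d+)\s*", re.IGNORECASE)


def parse(sql):
    """Syntax Parser: check the syntax and identify the objects referenced."""
    match = PATTERN.fullmatch(sql)
    if match is None:
        raise ValueError("syntax error")
    table, column = match.group(1).lower(), match.group(2).lower()
    if table not in TABLES or column not in TABLES[table]["columns"]:
        raise ValueError("unknown table or column")
    return {"table": table, "column": column, "value": int(match.group(3))}


def optimize(parsed):
    """Query Optimizer: choose an access path and produce an execution plan."""
    indexed = parsed["column"] in TABLES[parsed["table"]]["indexed_columns"]
    return {**parsed, "access": "index_lookup" if indexed else "full_scan"}


def execute(plan):
    """Execution Component: carry out the plan and return the matching rows."""
    rows = TABLES[plan["table"]]["rows"]
    if plan["access"] == "index_lookup":
        # A real engine keeps its index built; it is rebuilt here to stay short.
        index = {}
        for row in rows:
            index.setdefault(row[plan["column"]], []).append(row)
        return index.get(plan["value"], [])
    return [row for row in rows if row[plan["column"]] == plan["value"]]


plan = optimize(parse("SELECT * FROM book WHERE id = 2"))
result = execute(plan)
```

A query on the indexed column id is planned as an index lookup, while a query on title falls back to a full scan.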

The Storage Manager: The Storage Manager interfaces with the operating system (OS) to
write data to the disk efficiently. Because the storage functions reside in a separate
subsystem, the MySQL engine operates at a level of abstraction away from the operating
system. This means that if you port to a different operating system that uses a different
storage mechanism, for example, you can rewrite only the storage portion of the code while
leaving the rest of the engine as it is. With the help of MySQL’s Function Libraries, the Storage
Manager writes to disk all of the data in the user tables, indexes, and logs as well as the
internal system data.

The Buffer Manager: This subsystem handles all memory management issues between
requests for data by the Query Engine and the Storage Manager. MySQL makes aggressive
use of memory to cache result sets that can be returned as-is rather than making duplicate
requests to the Storage Manager; this cache is maintained in the Buffer Manager.

This is also the area where new records can be cached while waiting for the availability of
targeted tables and indexes. If any new data is needed, it’s requested from the Storage
Manager and placed in the buffer, before then being sent to the Query Engine.
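
The caching role of a buffer manager can be sketched in a few lines of Python. This is an illustration only: the class name BufferManager, the page-numbered storage dictionary, and the least-recently-used replacement policy are all assumptions made for the example, not a description of MySQL's internals.

```python
from collections import OrderedDict


class BufferManager:
    """Keeps recently used pages in memory and counts trips to storage."""

    def __init__(self, storage, capacity):
        self.storage = storage        # stands in for the Storage Manager
        self.capacity = capacity      # maximum number of pages held in memory
        self.pages = OrderedDict()    # page_id -> data, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pages:
            # Cache hit: serve from memory and mark the page as recently used.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        # Cache miss: ask storage for the page, then keep it in the buffer.
        self.disk_reads += 1
        data = self.storage[page_id]
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        return data


storage = {1: "page-1", 2: "page-2", 3: "page-3"}
buffer_manager = BufferManager(storage, capacity=2)
for page_id in (1, 2, 1, 3, 2):
    buffer_manager.get_page(page_id)
```

Of the five requests, the second request for page 1 is served from memory; the other four go to storage.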

The Transaction Manager: The function of the Transaction Manager is to facilitate
concurrency in data access. This subsystem provides a locking facility to ensure that multiple
simultaneous users consistently access the data, without corrupting or damaging the data in
any way. Transaction control takes place via the Lock Manager subcomponent, which places
and releases locks on various objects being used in transactions. Each transactional table
handler implements its own Transaction Manager to handle all locking and concurrency
needs.
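
The idea of placing and releasing locks can be illustrated with a small sketch in Python. The class LockManager below is hypothetical, and the use of shared ("S") and exclusive ("X") lock modes is an assumption made for the example; the text above does not state which lock modes MySQL's table handlers use.

```python
class LockManager:
    """Grants shared (S) and exclusive (X) locks on named objects."""

    def __init__(self):
        self.locks = {}   # object name -> (mode, set of transaction ids)

    def acquire(self, txn, obj, mode):
        """Return True if the lock is granted, False if txn would have to wait."""
        held = self.locks.get(obj)
        if held is None:
            self.locks[obj] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            holders.add(txn)              # readers can share the object
            return True
        if holders == {txn}:
            # The sole holder may re-acquire or upgrade its own lock.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[obj] = (new_mode, holders)
            return True
        return False                      # conflict with another transaction

    def release_all(self, txn):
        """Release every lock held by txn, e.g. at commit or rollback."""
        for obj in list(self.locks):
            _, holders = self.locks[obj]
            holders.discard(txn)
            if not holders:
                del self.locks[obj]


lock_manager = LockManager()
t1_reads = lock_manager.acquire("T1", "accounts", "S")
t2_reads = lock_manager.acquire("T2", "accounts", "S")
t2_writes_early = lock_manager.acquire("T2", "accounts", "X")   # T1 still reading
lock_manager.release_all("T1")
t2_writes_later = lock_manager.acquire("T2", "accounts", "X")
```

Two readers may hold the object together, but a writer is refused until it is the only holder.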

The Recovery Manager: The Recovery Manager’s job is to keep copies of data for retrieval
later, in case of a loss of data. It also logs commands that modify the data and other significant
events inside the database.
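
The two duties described above, keeping a copy of the data and logging the commands that modify it, can be combined to rebuild lost data. The following Python sketch is a toy model; the class RecoveryManager and its methods are hypothetical and do not correspond to MySQL's actual log format.

```python
import copy


class RecoveryManager:
    """Keeps a backup copy of the data plus a log of later modifications."""

    def __init__(self, data):
        self.backup = copy.deepcopy(data)   # copy kept for later retrieval
        self.log = []                       # modifications made since the backup

    def record(self, key, value):
        """Log a command that sets key to value."""
        self.log.append((key, value))

    def recover(self):
        """Rebuild the data by replaying the log on top of the backup."""
        restored = copy.deepcopy(self.backup)
        for key, value in self.log:
            restored[key] = value
        return restored


data = {"a": 1}
recovery_manager = RecoveryManager(data)

data["b"] = 2
recovery_manager.record("b", 2)
data["a"] = 5
recovery_manager.record("a", 5)

data.clear()                                # simulated loss of data
recovered = recovery_manager.recover()
```

Replaying the logged changes in order on top of the backup restores the state that existed just before the loss.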

Self-Assessment Questions – 2

7. The MySQL architecture consists of __________ primary subsystems.
8. The _____ decomposes the SQL commands it receives from calling
programs into a form that can be understood by the MySQL engine.
9. The _____ interfaces with the operating system (OS) to write data to the
disk efficiently.
10. The buffer manager subsystem handles all memory management issues
between requests for data by the ______ and Storage Manager.
11. Each transactional table handler implements its own ______ to handle all
locking and concurrency needs.


4. SQL SERVER 2000 ARCHITECTURE


Microsoft® SQL Server 2000 is a family of products that meet the data storage requirements
of the largest data processing systems and commercial Web sites, yet at the same time can
provide easy-to-use data storage services to an individual or small business. The SQL Server
architecture is shown in Figure 2.3.

Fig 2.3: SQL Server Architecture

The data storage needs of a modern corporation or government organization are very
complex. Some examples are:
• Online Transaction Processing (OLTP) systems must be capable of handling thousands
of orders placed at the same time.

• Increasing numbers of corporations are implementing large Web sites as a mechanism
for their customers to enter orders, contact the service department, get information
about products, and for many other tasks that previously required contact with
employees. These sites require data storage that is secure, yet tightly integrated with
the Web.


• Organizations are implementing off-the-shelf software packages for critical services
such as human resources planning, manufacturing resources planning, and inventory
control. These systems require databases capable of storing large amounts of data and
supporting large numbers of users.

• Organizations have many users who must continue working when they do not have
access to the network. Examples are mobile disconnected users, such as traveling sales
representatives or regional inspectors. These users must synchronize the data on a
notebook or laptop with the current data in the corporate system, disconnect from the
network, record the results of their work while in the field, and then finally reconnect
with the corporate network and merge the results of their fieldwork into the corporate
data store.

• Managers and marketing personnel need increasingly sophisticated analysis of trends
recorded in corporate data. They need robust Online Analytical Processing (OLAP)
systems that are easily built from OLTP data and support sophisticated data analysis.

• Independent Software Vendors (ISVs) must be able to distribute data storage
capabilities with applications targeted at individuals or small workgroups. This means
the data storage mechanism must be transparent to the users who purchase the
application. This requires a data storage system that can be configured by the
application and then tunes itself automatically so that the users do not need to dedicate
database administrators to constantly monitor and tune the application.

Self-Assessment Questions – 3

12. ________ systems must be capable of handling thousands of orders placed at the
same time.
13. In ___________the data storage mechanism must be transparent to the users who
purchase the application.


5. ORACLE ARCHITECTURE
The Oracle server consists of physical files and memory components. Figure 2.4 displays the
architecture of the Oracle Database 9i. It is broadly divided into memory components which
form the Oracle instance and the physical database components, where different kinds of
data are stored.

Fig 2.4: Oracle Architecture

The Oracle 9i Database product is made up of three main components namely:


• The Oracle Server – This is the Oracle database management system that is able to
store, manage and manipulate data. It consists of all the files, structures, and processes
that form Oracle Database 9i. The Oracle server is made up of an Oracle instance and an
Oracle database.
• The Oracle Instance – Consists of the memory components of Oracle and various
background processes.
• The Oracle Database – This is the centralized repository where the data is stored. It
has a physical structure, made up of operating system files, that is visible to the
operating system, and a logical structure that is recognized only by the Oracle Server.


Self-Assessment Questions – 4

14. The Oracle 9i Database product is made up of _____________ main components.
15. The ______ consists of physical files and memory components.
16. The ________ consists of the memory components of Oracle and various
background processes.


6. DATABASE MANAGEMENT SYSTEM FACILITIES


Two main types of facilities are supported by the DBMS: the data definition facility or data
definition language (DDL), and the data manipulation facility or data manipulation language
(DML).

6.1 Data Definition Language


Database management systems provide a facility known as the data definition language
(DDL), which can be used to define the conceptual schema and also give some details about
how to implement this schema in the physical devices used to store the data. This definition
includes all the entity sets and their associated attributes, as well as the relationships among
the entity sets. The definition also includes any constraints that have to be maintained,
including the constraints on the value that can be assigned to a given attribute, and the
constraints on the values assigned to different attributes in the same or different records.
These definitions, which can be described as metadata about the data in the database, are
expressed in the DDL of the DBMS and maintained in a compiled form (usually as a set of
tables). The compiled form of the definitions is known as a data dictionary, directory, or
system catalog. The data dictionary contains information on the data stored in the
database and is consulted by the DBMS before any data manipulation operation.
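
As a small illustration of DDL, the following sketch uses Python's sqlite3 module; the tables department and employee are hypothetical. The CREATE TABLE statements define entity sets, their attributes, a relationship (through a foreign key), and constraints on attribute values. SQLite then records these definitions in its own catalog table, sqlite_master, which plays the part of the data dictionary here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        -- constraint on the values an attribute may take
        leave_days INTEGER NOT NULL CHECK (leave_days BETWEEN 0 AND 25),
        -- relationship between the two entity sets
        dept_id    INTEGER NOT NULL REFERENCES department (dept_id)
    );
""")

# The definitions are now metadata that can be queried like any other data.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
employee_columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(employee)")
]
```

Other DBMS products expose their catalogs differently, so the catalog queries above apply to SQLite only.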

The database management system maintains the information on the file structure and the
method used to efficiently access the relevant data (i.e., the access method). It also provides
a method whereby the application programs indicate their data requirements. The
application program could use a subset of the conceptual data definition language or a
separate language. The database system also contains mapping functions that allow it to
interpret the stored data for the application program.

The internal schema is specified in a somewhat similar data definition language called data
storage definition language. The definition of the internal view is compiled and maintained
by the DBMS. The compiled internal schema specifies the implementation details of the
internal database, including the access methods employed. This information is handled by
the DBMS; the user need not be aware of these details.


6.2 Data Manipulation Language


DML is a language that enables users to access or manipulate data as organized by the
appropriate data model. Data manipulation involves retrieval of data from the database,
insertion of new data into the database, and deletion or modification of existing data. The
first of these data manipulation operations is called a query. A query is a statement in the
DML that requests the retrieval of data from the database. The subset of the DML used to
pose a query is known as a query language; however, we use the terms DML and query
language synonymously.

The DML provides commands to select and retrieve data from the database. Commands are
also provided to insert, update, and delete records. They could be used in an interactive
mode or embedded in conventional programming languages such as Assembler, COBOL,
FORTRAN, Pascal, or PL/I. The data manipulation functions provided by the DBMS can be
invoked in application programs directly by procedure calls or by preprocessor statements.
The latter would be replaced by appropriate procedure calls by either a preprocessor or the
compiler.

There are basically two types of DML:


Procedural: which requires a user to specify what data is needed and how to get it.

Nonprocedural: which requires a user to specify what data is needed without specifying
how to get it.
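
The data manipulation operations, and the contrast between the two styles of DML, can be sketched as follows, assuming Python's sqlite3 module and a hypothetical account table. SQL is nonprocedural: the SELECT states only which rows are wanted. The loop that follows it is a procedural-style equivalent, written in the host language, that spells out how each record is examined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no INTEGER PRIMARY KEY, holder TEXT, balance REAL)"
)

# Insertion, modification, and deletion of data.
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(101, "Asha", 500.0), (102, "Ravi", 1500.0), (103, "Meena", 2500.0)],
)
conn.execute("UPDATE account SET balance = balance + 100 WHERE acc_no = 101")
conn.execute("DELETE FROM account WHERE acc_no = 103")

# Nonprocedural query: say what is needed and let the DBMS decide how to get it.
declarative = conn.execute(
    "SELECT holder FROM account WHERE balance > 550 ORDER BY holder"
).fetchall()

# Procedural style: fetch every record and state how each one is tested.
procedural = []
for acc_no, holder, balance in conn.execute(
    "SELECT acc_no, holder, balance FROM account"
):
    if balance > 550:
        procedural.append((holder,))
procedural.sort()
```

Both approaches return the same holders; they differ only in who decides how the data is reached.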

Self-Assessment Questions – 5

17. Database management systems provide a facility known as the ____________
which can be used to define the conceptual schema.
18. DML is a language that enables users to access or manipulate data as organized by the
appropriate _______.
19. ___________ DML requires a user to specify what data is needed and how to get it.


7. DATABASE MANAGEMENT SYSTEM STRUCTURE


Data definition of the external view in most current DBMSs is done outside the application
program or interactive session. Data manipulation is done by procedure calls to subroutines
provided by a DBMS or via preprocessor statements. In an integrated environment, data
definition and manipulation are achieved using a uniform set of constructs that form part
of the user's programming environment.

The major components of a DBMS structure are shown in Figure 2.5 and explained below.

Fig 2.5: DBMS Structure

DML Precompiler: It converts DML statements embedded in an application program to
normal procedure calls in the host language. The precompiler must interact with the query
processor in order to generate the appropriate code.

DDL Compiler: The DDL compiler converts the data definition statements into a set of
tables. These tables contain information concerning the database and are in a form that can
be used by other components of the DBMS.

File Manager: The file manager manages the allocation of space on disk storage and the data
structure used to represent information stored on disk. The file manager can be
implemented using an interface to the existing file subsystem provided by the operating
system of the host computer, or it can include a file subsystem written especially for the
DBMS.

7.1 Database Manager


Databases typically require a large amount of storage space. Corporate databases are usually
measured in terms of gigabytes of data. Since the main memory of computers cannot store
this information, it is stored on disks. Data is moved between disk storage and main
memory as needed.

Since the movement of data to and from the disk is slow relative to the speed of the central
processing unit of computers, it is imperative that the database system structure data to
minimize the need to move data between disk and main memory. A database manager is a
program module that provides the interface between the low-level data stored in the
database and the application programs and queries submitted to the system. It is responsible
for interfacing with the file system.

One of the functions of a database manager is to convert the user's queries coming directly
via the query processor or indirectly via an application program from the user's logical view
to the physical file system. In addition, the tasks of enforcing constraints to maintain the
consistency and integrity of the data as well as its security are also performed by the
database manager. Synchronizing the simultaneous operations performed by concurrent
users is under the control of the database manager. It also performs backup and recovery
operations. Let us now summarize the important responsibilities of a database manager:
• Interaction with a file manager: The raw data is stored on the disk using the file
system which is usually provided by a conventional operating system. The database
manager translates the various DML statements into low-level file system commands.
Thus the database manager is responsible for the actual storing, retrieving, and
updating of data in the database.
• Integrity enforcement: The data values stored in the database must satisfy certain
types of consistency constraints. For example, the balance of a bank account may
never fall below a prescribed amount (for example, ₹ 200). Similarly, the number of
holidays per year an employee may take should not exceed 25 days. These
constraints must be specified explicitly by the DBA. If such constraints are specified,
then the database manager can check whether updates to the database result in the
violation of any of these constraints and, if so, appropriate action may be imposed.
• Security enforcement: As discussed above, not every user of the database needs to
have access to the entire content of the database. It is the job of the database manager
to enforce these security requirements.
• Backup and recovery: A computer system like any other mechanical or electrical
device is subject to failure. There are a variety of causes of such failure, including
disk crash, power failure, and software errors. In each of these cases,
information concerning the database is lost. It is the responsibility of the database
manager to detect such failures and restore the database to a state that existed prior to
the occurrence of the failure. This is usually accomplished through backup and recovery
procedures.
• Concurrency Control: When several users update the database concurrently, the
consistency of data may no longer be preserved. It is necessary for the system to control
the interaction among the concurrent users, and achieving such control is one of the
responsibilities of the database manager.
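
Integrity enforcement can be seen in action with a minimal sketch, assuming Python's sqlite3 module. The account table, the withdraw helper, and the minimum balance of 200 are hypothetical and follow the bank-account example above. The constraint is declared once, and the DBMS then rejects any update that would violate it.

```python
import sqlite3

MIN_BALANCE = 200   # the prescribed minimum balance from the example above

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    f" balance REAL NOT NULL CHECK (balance >= {MIN_BALANCE}))"
)
conn.execute("INSERT INTO account VALUES (101, 500)")
conn.commit()


def withdraw(connection, acc_no, amount):
    """Try to withdraw; return False if the DBMS rejects the update."""
    try:
        with connection:   # commits on success, rolls back on an exception
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                (amount, acc_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = withdraw(conn, 101, 250)   # leaves 250, which satisfies the constraint
rejected = withdraw(conn, 101, 100)   # would leave 150, which violates it
final_balance = conn.execute(
    "SELECT balance FROM account WHERE acc_no = 101"
).fetchone()[0]
```

The second withdrawal is refused by the database itself, so the application does not need to repeat the rule in its own code.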

Query Processor: The database user retrieves data by formulating a query in the data
manipulation language provided with the database. The query processor is used to interpret
the online user's query and convert it into an efficient series of operations, in a form capable
of being sent to the database manager for execution. The query processor uses the data
dictionary to find the structure of the relevant portion of the database and uses this
information in modifying the query and preparing an optimal plan to access the database.

7.2 Database Administrator


One of the main reasons for having a database management system is to have control of both
the data and programs accessing that data. The person having such control over the system
is called the database administrator (DBA). The DBA administers the three levels of the
database and, in consultation with the overall user community, sets up the definition of the
global view or conceptual level of the database. The DBA further specifies the external view
of the various users and applications and is responsible for the definition and
implementation of the internal level, including the storage structure and access methods to
be used for the optimum performance of the DBMS. Changes to any of the three levels
necessitated by changes or growth in the organization and/or emerging technology are
under the control of the DBA.

Mappings between the internal and the conceptual levels, as well as between the conceptual
and external levels, are also
defined by the DBA. Ensuring that appropriate measures are in place to maintain the
integrity of the database and that the database is not accessible to unauthorized users is
another responsibility. The DBA is responsible for granting permission to the users of the
database and stores the profile of each user in the database. This profile describes the
permissible activities of a user on that portion of the database accessible to the user via one
or more user views. The user profile can be used by the database system to verify that a
particular user can perform a given operation on the database.
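
The way a stored user profile can be used to verify an operation is sketched below in Python. The structure USER_PROFILES, the view names, and the function is_permitted are hypothetical; real DBMS products keep this information in their own catalogs and manage it through authorization statements.

```python
# Hypothetical user profiles: for each user, the permitted operations per view.
USER_PROFILES = {
    "clerk": {"account_view": {"SELECT"}},
    "manager": {
        "account_view": {"SELECT", "UPDATE"},
        "audit_view": {"SELECT"},
    },
}


def is_permitted(user, view, operation):
    """Check a requested operation against the profile stored for the user."""
    return operation in USER_PROFILES.get(user, {}).get(view, set())


clerk_can_read = is_permitted("clerk", "account_view", "SELECT")
clerk_can_update = is_permitted("clerk", "account_view", "UPDATE")
manager_can_update = is_permitted("manager", "account_view", "UPDATE")
stranger_can_read = is_permitted("guest", "account_view", "SELECT")
```

A user with no profile, or a view that is missing from the profile, is simply denied.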

The DBA is also responsible for defining procedures to recover the database from failures
due to human, natural, or hardware causes with minimal loss of data. This recovery
procedure should enable the organization to continue to function, and the intact portion of
the database should continue to be available.

Let us summarize the functions of the DBA:


• Schema definition: The creation of the original database schema. This is accomplished
by writing a set of definitions which is translated by the DDL compiler to a set of tables
that is permanently stored in the data dictionary.
• Storage structure and access method definition: The creation of appropriate
storage structure and access method. This is accomplished by writing a set of
definitions which are translated by the data storage definition language compiler.
• Schema and physical organization modification: Either the modification of the
database schema or the description of the physical storage organization. These
changes, although relatively rare, are accomplished by writing a set of definitions which
is used by either the DDL compiler or the data storage definition language compiler
to generate modifications to the appropriate internal system tables (for example, the
data dictionary).
• Granting of authorization for data access: The granting of different types of
authorization for data access to the various users of the database.


• Integrity constraint specification: These constraints are kept in a special system
structure that is consulted by the database manager whenever an update takes place
in the system.

One of the most valuable tools that the DBA uses to carry out data administration is the
data dictionary.

7.3 Data Dictionary


It is seen that when a program becomes somewhat large in size, keeping track of all the
variable names that are used, and the purpose for which they were used, becomes more and
more difficult. Of course, it is possible for a programmer who has coined the variable names
to bear them in mind, but should the same author come back to the program after a
significant time, or should another programmer have to modify the program, it would be
found that it is extremely difficult to make a reliable account of the purpose the data files
were used for.

The problem becomes even more difficult when the number of data types that an
organization has in its database increases. It is also now recognized that the data of an
organization is a valuable corporate resource, and therefore some kind of an inventory, and
catalog of it must be maintained to assist in both the utilization and management of the
resource.

It is for this purpose that a data dictionary or dictionary/directory is emerging as a major
tool. A dictionary provides definitions of things. A directory tells you where to find them. A
data dictionary/directory contains information (or data) about the data.

A comprehensive data dictionary would define data items, how they fit into the data
structure, and how they relate to other entities in the database. With this comprehensive
base of information, the data dictionary can serve several useful purposes across
the whole spectrum of planning, determining information requirements, designing and
implementing operations, and revision. There is now a greater emphasis on having an
integrated system in which the data dictionary is part of the DBMS. In such a case the data
dictionary would store the information concerning the external, conceptual, and internal
levels of the databases. It would also record the source of each data field value (that is,
where the authentic value is obtained), the frequency of its use, and the audit trail
regarding the updates, including user identification with the time of each update.


The DBA uses the data dictionary in every phase of a database life cycle, starting from the
embryonic data gathering phase to the design, implementation, and maintenance phases.
The documentation provided by a data dictionary is as valuable to end-users and managers
as it is essential to the programmers. Users can plan their applications with the database
only if they know exactly what is stored in it. For example, the description of a data item in a
data dictionary may include its origin and other text description in plain English, in addition
to its data format. Thus users and managers will be able to see exactly what is available in
the database. You could consider a data dictionary to be a road map that guides users to
access information within a large database.

An ideal data dictionary should include everything a DBA wants to know about the database:

1. External, conceptual, and internal database descriptions
2. Descriptions of entities (record types), attributes (fields), as well as
cross-references, origin, and meaning of data elements
3. Synonyms, authorization, and security codes
4. Which external schemas are used by which programs, who the users are, and what
their authorizations are

A data dictionary is implemented as a database so that users can query its content by either
interactive or batch processing. Whether or not the cost of acquiring a data dictionary system
is justifiable depends on the size and complexity of the information system. The cost-
effectiveness of a data dictionary increases as the complexity of an information system
increases. A data dictionary can be a great asset not only to the DBA for database design,
implementation, and maintenance but also to managers or end-users in their project
planning.
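
As a hedged illustration of a data dictionary that is itself queryable, the sketch below uses Python's built-in sqlite3 module. SQLite's catalog table sqlite_master and its PRAGMA table_info command play the role of a very small data dictionary here. The customer table and its columns are hypothetical and are not taken from this unit.

```python
import sqlite3

# In-memory database; the "customer" table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " cust_name TEXT NOT NULL,"
    " phone TEXT)"
)

# Ask the catalog which tables exist (data about the data).
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Describe the columns of one table: name, declared type, NOT NULL flag.
columns = [
    (name, col_type, bool(notnull))
    for _cid, name, col_type, notnull, _default, _pk in conn.execute(
        "PRAGMA table_info(customer)"
    )
]

print(table_names)
for column in columns:
    print(column)
```

A full data dictionary would also hold origins, synonyms, and authorizations; the catalog shown here covers only the structural descriptions.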


Self-Assessment Questions – 6

20. ______________ converts DML statements embedded in an application
program to normal procedure calls in the host language.
21. The DDL compiler converts the data definition statements into a set of
_______________.
22. The _______ translates the various DML statements into low-level file
system commands.
23. The _______ is used to interpret the online user's query and convert it
into an efficient series of operations.
24. The _______ is also responsible for defining procedures to recover the
database from failures.
25. The DBA grants different types of __________ for data access to the various
users of the database.
26. The DBA uses the _________ in every phase of a database life cycle.
27. A data dictionary is implemented as a database so that users can query
its content by either interactive or _________ processing.


8. DISTRIBUTED PROCESSING
It is possible to make the distinction between two dimensions of distributed database
systems: distributed data and distributed processing. In this Unit, we provide an overview
of the topic of distributed processing. Distribution can also be discussed in terms of the
distribution of functions or processing. Figure 2.6 illustrates a system characterized by
distributed processing. Here, we also have four sites connected by a communications
network. However, in this configuration only site 1 stores any data. Sites 2, 3, and 4 act as
clients to this database, perhaps running particular information system applications. It is
for this reason we include client-server database systems within our discussion of
distributed database systems. Although most current client-server database systems do not
effectively distribute data, they do distribute functionality.

Fig 2.6: Example of Distributed Processing

8.1 Information and Communications Technology System (ICT)


In terms of the software, it is useful to consider an ICT system as being made up of a number
of subsystems or horizontal layers (Figure 2.7). The layers define the major component parts
of processing in an ICT system:


Fig 2.7: Components of ICT

• Interface subsystem: This subsystem is responsible for managing interaction with the
end-user. It is generally referred to as the user interface
• Rules subsystem: This subsystem manages the application logic in terms of a defined
model of business rules. In a database application, these will comprise functions, which
primarily enforce both inherent and additional constraints on data
• Transaction subsystem: This subsystem acts as a link between the data subsystem and
the rules and interface subsystems. Querying, insertion, and update activity are
triggered at the interface, validated by the rules subsystem, and packaged as units
(transactions) that will initiate actions (responses or changes) in the data subsystem
• Data subsystem: This subsystem is responsible for managing the underlying data
needed by an application.
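
The division of labour between the four subsystems can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the class names, the account data, and the "balance may not go negative" rule are hypothetical and do not come from this unit.

```python
# Hypothetical four-layer sketch; all names and data are illustrative.

class DataSubsystem:
    """Manages the underlying data (here, balances keyed by account id)."""

    def __init__(self):
        self.balances = {"A1": 100}

    def apply(self, transaction):
        account, amount = transaction
        self.balances[account] += amount
        return self.balances[account]


class RulesSubsystem:
    """Enforces business rules before anything reaches the data."""

    def validate(self, balances, account, amount):
        if account not in balances:
            raise ValueError("unknown account")
        if balances[account] + amount < 0:
            raise ValueError("balance may not go negative")


class TransactionSubsystem:
    """Links the interface and rules subsystems to the data subsystem."""

    def __init__(self, rules, data):
        self.rules = rules
        self.data = data

    def submit(self, account, amount):
        self.rules.validate(self.data.balances, account, amount)
        return self.data.apply((account, amount))  # packaged as one unit


def interface(transactions, account, amount):
    """Stands in for the user interface: turns input into a request."""
    try:
        return f"new balance: {transactions.submit(account, amount)}"
    except ValueError as error:
        return f"rejected: {error}"


transactions = TransactionSubsystem(RulesSubsystem(), DataSubsystem())
ok_message = interface(transactions, "A1", -30)
bad_message = interface(transactions, "A1", -500)
print(ok_message)
print(bad_message)
```

The request is triggered at the interface, validated by the rules subsystem, and only then passed on as a unit to the data subsystem, which mirrors the flow described in the list above.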

8.2 Client /Server Architecture


Client-server is a software architecture in which two processes interact as superior and
subordinate. A client-server architecture is shown in Figure 2.8. The client process always
initiates requests and the server process always responds to requests. Theoretically, the
client and server can reside on the same machine. Typically, however, they run on separate
machines linked by some form of communications network, usually a local area network.

Many different types of servers exist, for example:

• Mail servers
• Print servers
• File servers
• Database (SQL) servers


Most people tend to equate the term client-server with a network of PCs or workstations
connected by a network to a remote machine running a database server.

Fig 2.8: Client/Server Architecture

In practice, a client-server database system generally refers to a local area network of
personal computers (PCs). At least one of these PCs is dedicated to serving the database
needs of the others, which act in a client capacity. The database is held on the server. The
user interface and application development tools are held on the client machines.

The server in this configuration is set up either as a file server or as an SQL server. In a file
server situation, an SQL query expressed by a client issues a request to the server for the
appropriate files needed by the query. The client then performs the query and extracts the
relevant data. In an SQL server situation, the SQL statement travels down the
communication line from client to server. The server then executes the query and sends back
only the extracted data. Hence, because of the reduced communication traffic, most DBMSs
now offer SQL server facilities.
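
The difference in communication traffic can be made concrete with a small sketch. It uses an in-memory SQLite database as a stand-in for the server; the orders table, its rows, and the variable names are hypothetical, and counting rows is only a rough proxy for network traffic.

```python
import sqlite3

# The "server" holds the data; table and values are hypothetical.
server = sqlite3.connect(":memory:")
server.execute("CREATE TABLE orders (order_id INTEGER, city TEXT)")
server.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "Jaipur"), (2, "Delhi"), (3, "Jaipur"), (4, "Mumbai")],
)

# File-server style: every row is shipped, and the client does the filtering.
shipped_rows = server.execute("SELECT order_id, city FROM orders").fetchall()
file_server_result = [row for row in shipped_rows if row[1] == "Jaipur"]

# SQL-server style: the query travels to the server, only matches come back.
sql_server_result = server.execute(
    "SELECT order_id, city FROM orders WHERE city = ? ORDER BY order_id",
    ("Jaipur",),
).fetchall()

rows_sent_file_server = len(shipped_rows)
rows_sent_sql_server = len(sql_server_result)
print(rows_sent_file_server, rows_sent_sql_server)
```

Both approaches produce the same answer, but the file-server style moves the whole table to the client first.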

Self-Assessment Questions – 7

28. In ICT, the ________ subsystem is responsible for managing
interaction with the end-user.
29. The transaction subsystem acts as the link between the ________ and the rules
and interface subsystems.
30. The _______ subsystem manages the application logic in terms of a defined
model of business rules.
31. Client-server is a software architecture in which two processes interact as
superior and _________.
32. In practice, a client-server database system generally refers to a
_____________ of personal computers (PCs).


9. SUMMARY
This Unit gives the descriptions and different components of the MySQL, SQL Server 2000, and
Oracle 9i architectures. The DBMS structure is defined by three views of data. A DBMS is a
major software system consisting of a number of elements. It provides users with a DDL for
defining the external and conceptual views of the data and a DML for manipulating the data
stored in the database. The database manager is the component of the DBMS that provides the
interface between the user and the file system. The database administrator defines and
maintains the three levels of the database as well as the mappings between levels to insulate
the higher levels from changes that take place in the lower levels. The DBA is responsible for
implementing measures for ensuring the security, integrity, and recovery of the database.

10. TERMINAL QUESTIONS


1. Explain the three-level architecture of DBMS.
2. List and explain the MySQL architectural primary subsystems.
3. What are the applications of SQL Server 2000 architecture?
4. Explain the Oracle architecture with a neat diagram.
5. What are the two main types of facilities that are supported by the DBMS? Discuss their
applications.
6. Discuss the responsibilities of the database manager and database administrator.
7. Describe the features of distributed processing.
8. Give the advantages of the client/server architecture.


11. ANSWERS
Self-Assessment Questions
1. Database
2. Subschema
3. External
4. Internal
5. One
6. Secondary
7. Five
8. Syntax Parser
9. Storage Manager
10. Query Engine
11. Transaction Manager
12. Online Transaction Processing (OLTP)
13. Independent Software Vendors (ISVs)
14. Three
15. Oracle server
16. Oracle Instance
17. data definition language (DDL)
18. data model
19. Procedural
20. DML Precompiler
21. Tables
22. database manager
23. query processor
24. DBA
25. Authorization
26. data dictionary
27. batch processing
28. Interface subsystem
29. data subsystem
30. rules
31. subordinate
32. local area network


Terminal Questions
1. The three-level architecture of DBMS includes the External Level or Subschema, the
Conceptual Level or Conceptual Schema, and the Internal Level or Physical Schema.
(Refer section 2 for detail)
2. The MySQL architecture consists of five primary subsystems that work together to
respond to a request made to the MySQL database server: the Query Engine, the Storage
Manager, the Buffer Manager, the Transaction Manager, and the Recovery Manager.
(Refer section 3 for detail)
3. Microsoft® SQL Server 2000 is applied in the different fields that meet the data storage
requirements of the largest data processing systems and commercial Web sites, yet at
the same time can provide easy-to-use data storage services to an individual or small
business. (Refer section 4 for detail)
4. The Oracle server consists of physical files and memory components. E.g. the Oracle
9i Database product is made up of three main components namely: (i) The Oracle
Server (ii) The Oracle Instance (iii) The Oracle Database (Refer section 5 for detail)
5. Two main types of facilities that are supported by the DBMS are the data definition
facility or data definition language (DDL), the data manipulation facility, or data
manipulation language (DML). (Refer section 6 for detail)
6. A database manager is a program module which provides the interface between the
low-level data stored in the database and the application programs and queries
submitted to the system. The DBA administers the three levels of the database and, in
consultation with the overall user community, sets up the definition of the global view
or conceptual level of the database. (Refer section 7 for detail)
7. The main feature of distributed processing is that several sites are connected by a
communications network and share the processing work, even if only one site stores the
data; the other sites act as clients of that database. (Refer section 8 for detail)
8. The main advantage of Client/Server Architecture is that the client process always
initiates requests and the server process always responds to requests so this type of
architecture is applicable with the network of PCs or workstations connected by a
network to a remote machine running a database server. (Refer section 8.2 for detail)


Unit 3
Database Models and Implementation
Table of Contents

SL No   Topic                                    Fig No / Table / Graph   SAQ / Activity
1       Introduction                             -                        -
1.1     Objectives                               -                        -
2       Data Model and Types of Data Models      -                        1
2.1     Relational Data Model                    -                        -
2.2     Hierarchical Model                       -                        -
2.3     Network Data Model                       -                        -
2.4     Object/Relational Model                  -                        -
2.5     Object-Oriented Model                    -                        -
3       Entity-Relationship Model                -                        -
3.1     Modeling using E-R Diagrams              -                        -
3.2     Notation used in E-R Model               1                        -
3.3     Relationships and Relationship Types     2, 3, 4                  2
4       Associative Database Model               -                        3
5       Summary                                  -                        -
6       Terminal Questions                       -                        -
7       Answers                                  -                        -


1. INTRODUCTION
A data model is an integrated collection of concepts for describing data, relationships
between data, and constraints on the data. A data model allows us to treat a database as an
abstract machine. In other words, we can concentrate on the principles of design divorced
from an immediate concern with implementation. Data models constitute formal languages
for defining data structures, declaring integrity constraints, and manipulating data. A data
model is a mechanism for specifying the schema of some database. Data models establish the
principles underlying DBMSs. Every database, and indeed every DBMS, must adhere to the
principles of some data model. However, the term data model is somewhat ambiguous. In the
database literature, the term is used in a number of different senses, two of which are the
most important: that of an architecture for data, and that of an integrated set of data
requirements.

1.1 Objectives:
After completing Unit 3, you should be able to:
❖ Define the concepts of the Relational Model and its implementations
❖ Understand the concepts of other Data Models like Hierarchical Model, Network Data
Model, Object/Relational Model, Object-Oriented Model, and the Associative Database
Model.
❖ Understand the Entity-Relationship Model and its applications
❖ Differentiate between data models


2. DATA MODEL AND TYPES OF DATA MODELS


The term data model is used to refer to a set of general principles for handling data. Here,
people talk of the relational data model, the hierarchical data model, or the object-oriented
data model.

This set of principles that define a data model may be divided into three major parts:
• Data definition – a set of principles concerned with how data is structured.
• Data manipulation – a set of principles concerned with how data is operated upon.
• Data integrity – a set of principles concerned with determining which states are
valid for a database.

Data definition involves defining an organization for data, a set of templates for the
organization of data. Data manipulation concerns the process of how the data is accessed
and how it is changed in the database. Data integrity is very much linked with the idea of
data manipulation, in the sense that integrity concerns the idea of what are valid changes
and invalid changes to data.
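
The three parts can be seen side by side in a short sketch. It assumes Python's built-in sqlite3 module as a convenient relational engine; the account table and the rule that a balance may not be negative are hypothetical examples, not part of this unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: how the data is structured.
conn.execute(
    "CREATE TABLE account ("
    " acc_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity rule
)

# Data manipulation: how the data is accessed and changed.
conn.execute("INSERT INTO account VALUES (1, 500)")
conn.execute("UPDATE account SET balance = balance - 200 WHERE acc_no = 1")

# Data integrity: a change that would lead to an invalid state is refused.
try:
    conn.execute("UPDATE account SET balance = balance - 1000 WHERE acc_no = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute(
    "SELECT balance FROM account WHERE acc_no = 1"
).fetchone()[0]
print(rejected, balance)
```

The refused update leaves the stored balance untouched, which is the sense in which integrity decides which changes are valid.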

We may make a distinction between three generations of architectural data model:


• Primitive data models: In this approach, objects are represented by record structures
grouped in file structures. The main operations available are read and write operations
over records
• Classic data models: These are the hierarchical, network, and relational data models.
The hierarchical data model is an extension of the primitive data model discussed
above. The network data model is an extension of the hierarchical approach. The
relational data model is a fundamental departure from the hierarchical and network
approaches.
• Semantic data models: The main problem with classic data models, such as the
relational data model, is that they maintain a fundamental record orientation. Semantic
data models attempt to represent the meaning of the data more directly.

2.1 Relational Data Model


The relational model is today the primary data model for commercial data-processing
applications. It has attained its primary position because of its simplicity, which eases the
job of the programmer, as compared to earlier data models such as the network model or the
hierarchical model.


A database based on the relational model developed by E.F. Codd allows the definition of
data structures, storage and retrieval operations, and integrity constraints. In such a
database the data and relations between them are organized in tables. A table is a collection
of records and each record in a table contains the same fields.

Properties of Relational Tables:
• Values Are Atomic
• Each Row is Unique
• Column Values Are of the Same Kind
• The Sequence of Columns is Insignificant
• The Sequence of Rows is Insignificant
• Each Column has a Unique Name

In the relational data model, the database is represented as a group of related tables. The
relational data model was introduced in 1970. It is currently the most popular model. The
mathematical simplicity and ease of visualization of the relational data model have
contributed to its success. The relational data model is based on the mathematics of set
theory, whose basic components are the following.

Relation: A two-dimensional table. A relation is a collection of tuples, each of which contains
values for a fixed number of attributes. Relations are sometimes referred to as flat files,
because of their resemblance to an unstructured sequence of records. Each tuple in a relation
must be unique; that is, there can be no duplicates.

A Relation may be defined in multiple ways. The Relation Schema R, denoted by
R(A1, A2, ..., An), is made up of a relation name R and a list of attributes A1, A2, ..., An.

For Example - CUSTOMER (Cust-id, Cust-name, Address, Phone#). Here, CUSTOMER is a
relation defined over the four attributes Cust-id, Cust-name, Address, Phone#, each of which
has a domain or a set of valid values.

A tuple is an ordered set of values. Each value is derived from an appropriate domain. Each
row in the CUSTOMER table may be referred to as a tuple in the table and would consist of
four values.


<1759, "Rama Krishna", "101 Main 3rd Cross Manipal", "0820-2653487"> is a tuple
belonging to the CUSTOMER relation.

A relation may be regarded as a set of tuples (rows). Columns in a table are also called
attributes of the relation.
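
A minimal sketch of these ideas in Python follows. The Customer class mirrors the CUSTOMER schema and the first tuple is the one given in the text; the second customer and the names degree and cardinality are assumptions added for illustration.

```python
from typing import NamedTuple


class Customer(NamedTuple):
    """One tuple of the CUSTOMER relation: an ordered set of four values."""
    cust_id: int
    cust_name: str
    address: str
    phone: str


rama = Customer(1759, "Rama Krishna", "101 Main 3rd Cross Manipal", "0820-2653487")

# A relation instance modelled as a set of tuples: adding the same tuple
# twice has no effect, so duplicates cannot occur.
customer_relation = {rama, rama}

# Hypothetical second customer, added only for illustration.
customer_relation.add(
    Customer(1760, "Sita Devi", "12 Lake Road Udupi", "0820-2650000")
)

degree = len(Customer._fields)        # number of attributes (columns)
cardinality = len(customer_relation)  # number of tuples (rows)
print(degree, cardinality)
```

Using a set rather than a list also reflects that the rows of a relation carry no inherent order.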

A domain has a logical definition, for example, “India_Phone_numbers” is the set of 10-digit
mobile numbers. A domain may have a data type or a format defined for it. The
India_Phone_numbers may have a format: (ddd)-ddddddd where each d is a decimal digit.
E.g., dates have various formats such as month name, date, year; or yyyy-mm-dd; or
dd,mm,yyyy, etc.
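
A domain format of this kind can be checked mechanically. The sketch below assumes the phone format is read as (ddd)-ddddddd with no space after the hyphen; the function name in_phone_domain is hypothetical.

```python
import re

# Three digits in parentheses, a hyphen, then seven digits.
PHONE_FORMAT = re.compile(r"\([0-9]{3}\)-[0-9]{7}")


def in_phone_domain(value: str) -> bool:
    """Return True when the whole string matches the assumed domain format."""
    return PHONE_FORMAT.fullmatch(value) is not None


print(in_phone_domain("(082)-2653487"))
print(in_phone_domain("0820-2653487"))
```

The second value has ten digits as well, but it does not follow the declared format, so it falls outside the domain as defined here.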

An attribute designates the role played by the domain. E.g., the domain Date may be used
to define attributes “Invoice-date” and “Payment-date”. The relation is formed over the
Cartesian product of the sets; each set has values from a domain; that domain is used in a
specific role which is conveyed by the attribute name. For example, attribute Cust-name is
defined over the domain of strings of 25 characters. The role these strings play in the
CUSTOMER relation is that of the name of the customers.
Given R(A1, A2, ..., An),
r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An)

where R is the schema of the relation and r(R) is a specific "value" or population of R. Here R
is also called the intension of a relation and r is also called the extension of a relation.

Let S1 and S2 be domains, S1 = {0, 1} and S2 = {a, b, c}. The relation R can be written as
R ⊆ S1 × S2. Then
r(R) = {<0,a>, <0,b>, <1,c>}

is one possible “state” or “population” or “extension” r(R), defined over domains S1 and S2.
It has three tuples.
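
The example can be checked directly with Python sets. The sketch below builds the Cartesian product of S1 and S2 with itertools.product and tests whether a candidate state is a subset of it; the variable names are illustrative.

```python
from itertools import product

S1 = {0, 1}
S2 = {"a", "b", "c"}

# All pairs that could ever appear in a relation over S1 and S2.
cartesian = set(product(S1, S2))

# The state given in the text.
r = {(0, "a"), (0, "b"), (1, "c")}

is_valid_state = r <= cartesian              # subset test
invalid_is_state = {(2, "a")} <= cartesian   # 2 is not in S1

print(len(cartesian), is_valid_state, invalid_is_state)
```

Any subset of the six possible pairs, including the empty set, is a legal state of the relation.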

Characteristics of Relations
• Ordering of tuples in a relation r(R): The tuples are not considered to be ordered,
even though they appear to be in the tabular form.


• Ordering of attributes in a relation schema R (and of values within each tuple):
We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to
be ordered.
• Values in a tuple: All values are considered atomic (indivisible). A special null value is
used to represent values that are unknown or inapplicable to certain tuples.

Relational Integrity Constraints


Constraints are conditions that must hold on all valid relation instances. There are
three main types of constraints:
– Key constraints
– Entity integrity constraints
– Referential integrity constraints

Key Constraints
Superkey of R: A set of attributes SK of R such that no two tuples in any valid relation
instance r(R) will have the same value for SK. That is, for any distinct tuples t1 and t2 in r(R),
t1[SK] ≠ t2[SK].

Key of R: A "minimal" superkey; that is, a superkey K such that removal of any attribute from
K results in a set of attributes that is not a superkey.

For Example: Consider a CAR relation schema:

CAR(State, Reg#, SerialNo, Make, Model, Year)

It has two keys, Key1 = {State, Reg#} and Key2 = {SerialNo}, which are also superkeys.
{SerialNo, Make} is a superkey but not a key.

If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The
primary key attributes are underlined.
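
The definitions of superkey and key can be tested against a sample instance. Note the limitation: a superkey is a property of every valid instance, so a check on one instance can refute a candidate but cannot prove it. The car rows below are hypothetical, and the function names are assumptions made for this sketch.

```python
from itertools import combinations

ATTRS = ("State", "Reg#", "SerialNo", "Make", "Model", "Year")

# Hypothetical instance of the CAR relation.
cars = [
    ("RJ", "1234", "S001", "Maruti", "Swift", 2019),
    ("KA", "1234", "S002", "Maruti", "Swift", 2019),
    ("RJ", "5678", "S003", "Tata", "Nexon", 2021),
]


def is_superkey(rows, attrs):
    """No two rows agree on all of the given attributes (in this instance)."""
    positions = [ATTRS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in positions) for row in rows]
    return len(set(projected)) == len(projected)


def is_key(rows, attrs):
    """A superkey from which no single attribute can be removed."""
    if not is_superkey(rows, attrs):
        return False
    smaller = combinations(attrs, len(attrs) - 1)
    return not any(is_superkey(rows, subset) for subset in smaller)


print(is_key(cars, ("State", "Reg#")))
print(is_key(cars, ("SerialNo",)))
print(is_superkey(cars, ("SerialNo", "Make")), is_key(cars, ("SerialNo", "Make")))
```

Checking only the subsets that drop one attribute is enough, because any superset of a superkey is itself a superkey.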


Entity Integrity
Relational Database Schema: A set S of relation schemas that belong to the same database.
S is the name of the database.
S = {R1, R2, ..., Rn}

Entity Integrity: The primary key attributes PK of each relation schema R in S cannot
have null values in any tuple of r(R). This is because primary key values are used to identify
the individual tuples.
t[PK] ≠ null for any tuple t in r(R)

Referential Integrity
A constraint involving two relations (the previous constraints involve a single relation). It is
used to specify a relationship among tuples in two relations, the referencing relation and the
referenced relation.

Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that
reference the primary key attributes PK of the referenced relation R2. A tuple t1 in R1 is said
to reference a tuple t2 in R2 if t1[FK] = t2[PK].

A referential integrity constraint can be displayed in a relational database schema as a
directed arc from R1.FK to R2.

Referential Integrity Constraints


The value in the foreign key column (or columns) FK of the referencing relation R1 can be
either an existing value of the corresponding primary key PK in the referenced relation R2
or a null.
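
Entity integrity and referential integrity can both be observed with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the department and employee tables are hypothetical, the primary keys are declared NOT NULL explicitly, and foreign key enforcement is switched on because SQLite leaves it off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Referenced relation R2 (department) and referencing relation R1 (employee).
conn.execute(
    "CREATE TABLE department (dept_id TEXT NOT NULL PRIMARY KEY, name TEXT)"
)
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER NOT NULL PRIMARY KEY,"
    " dept_id TEXT REFERENCES department(dept_id))"
)
conn.execute("INSERT INTO department VALUES ('D1', 'Sales')")


def try_insert(sql, params):
    """Return True if the row was accepted, False if a constraint refused it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Entity integrity: a null primary key is refused.
null_pk_ok = try_insert("INSERT INTO department VALUES (?, ?)", (None, "Ghost"))

# Referential integrity: FK must match an existing PK value or be null.
valid_fk_ok = try_insert("INSERT INTO employee VALUES (?, ?)", (1, "D1"))
null_fk_ok = try_insert("INSERT INTO employee VALUES (?, ?)", (2, None))
dangling_fk_ok = try_insert("INSERT INTO employee VALUES (?, ?)", (3, "D9"))

print(null_pk_ok, valid_fk_ok, null_fk_ok, dangling_fk_ok)
```

Only the employee rows whose dept_id is either 'D1' or null are accepted, which matches the rule stated above.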

Attribute: A table column. Other commonly used terms for attribute are property and field.
The set of permissible values for each attribute is called the domain for that attribute.
Tuple: A table row. A tuple is an instance of an entity or relationship or whatever is
represented by the relation.
Key: A single attribute or combination of attributes whose values uniquely identify the
tuples of the relation. That is, each row has a different value for the key attribute(s). The
relational model requires that every relation have a key and that:


• No two tuples may have the same key value, and
• Every tuple must have a value for the key attribute (the key fields have non-null values).

There are two restrictions on the relational model that are sometimes circumvented in
practice:
• Duplicate tuples are not permitted. If two tuples are entered with the same value for
each and every attribute, they are considered to be the same tuple.
• No ordering of tuples within a relation is assumed. In practice, however, one method or
another of ordering tuples is often used.

One of the main advantages of the relational model is that it is conceptually simple and, more
importantly, based on the mathematical theory of relations. It also frees the users from details
of the storage structure and access methods. The relational model, like all other models,
consists of three basic components:
• a set of domains and a set of relations
• operations on relations
• integrity rules

In this unit, we first provide the formal definition of a relational data model. The basic
operations of relational algebra are discussed in Unit 4.

In general, we say that a relation defined over n domains has a degree n or is n-ary. The
elements of this set are n-tuples. We shall distinguish between the definition of a relation
and the relation itself. We shall say that the definition of a relation gives a name to the
relation and specifies the components over which it is defined. These components are
referred to as relation attributes or attributes for short. An attribute has a domain associated
with it from which it takes on values. The relation itself, on the other hand, is the set of tuples
which constitute it at a given instance of time.

For example, a statement which says that a relation Supplier is built over attributes S#, P#,
and SCITY, having domains integer, integer, and character string respectively, is the definition
of the relation Supplier. The relation itself is shown below. It must be noted that at the time
the definition of a relation is given, a relation with no tuples in it, i.e. a null relation, is
created.


Supplier

S#    P#    SCITY
10    1     BANGALORE
10    2     BANGALORE
10    3     BANGALORE
11    1     BOMBAY
11    2     BOMBAY
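
The distinction between the definition of a relation and the relation itself can be sketched with sqlite3. The column types used here (integer, integer, text) are an assumption about the intended domains; the rows are the ones shown in the Supplier table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The definition: a name plus the attributes and their domains.
conn.execute('CREATE TABLE supplier ("S#" INTEGER, "P#" INTEGER, "SCITY" TEXT)')

cursor = conn.execute("SELECT * FROM supplier")
degree = len(cursor.description)               # number of attributes
tuples_at_definition = len(cursor.fetchall())  # relation is empty when defined

# The relation itself: the set of tuples at a given instant.
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [
        (10, 1, "BANGALORE"),
        (10, 2, "BANGALORE"),
        (10, 3, "BANGALORE"),
        (11, 1, "BOMBAY"),
        (11, 2, "BOMBAY"),
    ],
)
tuples_after_insert = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
print(degree, tuples_at_definition, tuples_after_insert)
```

The degree is fixed by the definition, while the number of tuples changes as the relation is populated.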

2.2 Hierarchical Model


The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent
and child data segments. This structure implies that a record can have repeating information,
generally in the child data segments. Data is held in a series of records, each of which has a
set of field values attached to it. The model collects all the instances of a specific record
together as a record type. These record types are the equivalent of tables in the relational
model, with the individual records being the equivalent of rows. To create links between these
record types, the hierarchical model uses Parent-Child Relationships. These are a 1:N mapping
between record types. For example, an organization might store information about an
employee, such as name, employee number, department, and salary. The organization might also
store information about an employee's children, such as name and date of birth. The employee
and children data form a hierarchy, where the employee data represents the parent
segment and the children data represents the child segment. If an employee has three
children, then there would be three child segments associated with one employee segment.
In a hierarchical database, the parent-child relationship is one to many. This restricts a child
segment to having only one parent segment. Hierarchical DBMSs were popular from the late
1960s, with the introduction of IBM's Information Management System (IMS) DBMS, through
the 1970s.
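
The employee and children example can be sketched as a small tree in Python. This is not how IMS stores data; it is only an illustration of the 1:N parent-child relationship, and all names and values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChildSegment:
    name: str
    date_of_birth: str


@dataclass
class EmployeeSegment:
    name: str
    employee_number: int
    department: str
    salary: int
    # One parent segment owns many child segments (1:N).
    children: list = field(default_factory=list)


employee = EmployeeSegment("A. Rao", 101, "Accounts", 50000)
for child_name, dob in [
    ("Asha", "2010-04-01"),
    ("Ravi", "2012-09-15"),
    ("Meera", "2015-01-20"),
]:
    employee.children.append(ChildSegment(child_name, dob))

child_count = len(employee.children)
print(child_count)
```

Each child segment is reachable only through its single parent, which is the restriction the hierarchical model imposes.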


2.3 Network Data Model


The popularity of the network data model coincided with the popularity of the hierarchical
data model. Some data were more naturally modeled with more than one parent per child.
So, the network model permitted the modeling of many-to-many relationships in data. In
1971, the Conference on Data Systems Languages (CODASYL) formally defined the network
model. The basic data modeling construct in the network model is the set construct. A set
consists of an owner record type, a set name, and a member record type. A member record
type can have that role in more than one set; hence the multi-parent concept is supported.
An owner record type can also be a member or owner in another set. The data model is a
simple network, and link and intersection record types may exist, as well as sets between
them. Thus, the complete network of relationships is represented by several pair-wise sets;
in each set, one record type is the owner (at the tail of the relationship arrow) and one
or more record types are members (at the head of the relationship arrow). Usually, a set
defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based
on mathematical set theory.
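
The set construct can be sketched with two small Python classes. The record types SUPPLIER, PART, and SHIPMENT and the set names are hypothetical; the point is only that one member record can belong to sets with different owners.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_type: str
    key: str


@dataclass
class SetOccurrence:
    """A named set: one owner record and its member records."""
    set_name: str
    owner: Record
    members: list = field(default_factory=list)


supplier = Record("SUPPLIER", "S10")
part = Record("PART", "P1")
shipment = Record("SHIPMENT", "S10-P1")  # link (intersection) record

# The same member record takes part in two sets with different owners.
supplies = SetOccurrence("SUPPLIES", owner=supplier, members=[shipment])
used_in = SetOccurrence("USED_IN", owner=part, members=[shipment])

owners_of_shipment = [
    s.owner.key for s in (supplies, used_in) if shipment in s.members
]
print(owners_of_shipment)
```

Two 1:M sets that share a link record in this way are how a many-to-many relationship between suppliers and parts is represented.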

2.4 Object/Relational Model


Object/relational database management systems (ORDBMSs) add new object storage
capabilities to the relational systems at the core of modern information systems. These new
facilities integrate the management of traditional fielded data, complex objects such as
time-series and geospatial data, and diverse binary media such as audio, video, images, and
applets. By encapsulating methods with data structures, an ORDBMS server can execute
complex analytical and data manipulation operations to search and transform multimedia
and other complex objects.

As an evolutionary technology, the object/relational (OR) approach has inherited the robust
transaction and performance-management features of its relational ancestor and the
flexibility of its object-oriented cousin.

Database designers can work with familiar tabular structures and data definition languages
(DDLs) while assimilating new object-management possibilities. Query and procedural
languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages,
and ODBC, JDBC, and proprietary call interfaces are all extensions of RDBMS languages and
interfaces. And the leading vendors are, of course, quite well known: IBM, Informix, and
Oracle.

2.5 Object-Oriented Model


Object DBMSs add database functionality to object programming languages. They bring
much more than persistent storage of programming language objects. Object DBMSs extend
the semantics of the C++, Smalltalk, and Java object programming languages to provide full-
featured database programming capability, while retaining native language compatibility. A
major benefit of this approach is the unification of the application and database development
into a seamless data model and language environment. As a result, applications require less
code, use more natural data modeling, and code bases are easier to maintain. Object
developers can write complete database applications with a modest amount of additional
effort.

The object-oriented database (OODB) paradigm is the combination of object-oriented
programming language (OOPL) systems and persistent systems. The power of the OODB
comes from the seamless treatment of both persistent data, as found in databases, and
transient data, as found in executing programs.

In contrast to a relational DBMS where a complex data structure must be flattened out to fit
into tables or joined together from those tables to form the in-memory structure, object
DBMSs have no performance overhead to store or retrieve a web or hierarchy of interrelated
objects. This one-to-one mapping of object programming language objects to database
objects has two benefits over other storage approaches: it provides higher performance
management of objects, and it enables better management of the complex interrelationships
between objects. This makes object DBMSs better suited to support applications such as
financial portfolio risk analysis systems, telecommunications service applications, WWW
(World Wide Web) document structures, design and manufacturing systems, and hospital
patient record systems, which have complex relationships between data.


Self-Assessment Questions – 1

1. ________ is a set of principles concerned with determining which states
are valid for a database.
2. In the primitive data models approach, objects are represented by
___________ structures grouped in file structures.
3. In the __________ data model the database is represented as a group of
related tables.
4. Each tuple in a relation must be ________; that is, there can be no
duplicates.
5. A tuple is an ___________ set of values.
6. In relational integrity constraints, there are ________ main types of constraints.
7. If a relation has several candidate keys, one is chosen arbitrarily to be the
_________ key.
8. The set of permissible values for each attribute is called the
______________ for that attribute.
9. A single attribute or combination of attributes whose values uniquely
identify the ______ of the relation.
10. The hierarchical data model organizes data in a ________________ structure.
11. In a hierarchical database the parent-child relationship is _______________.
12. The popularity of the network data model coincided with the popularity of
the ________ data model.
13. Object DBMSs add database functionality to __________ programming
languages.
14. A major benefit of the object-oriented data model approach is the
_____________ of the application.


3. ENTITY-RELATIONSHIP MODEL
There are three basic notions that the E-R data model employs: entity sets, relationship sets,
and attributes. Consider an example COMPANY database. The COMPANY is organized into
DEPARTMENTs. Each department has a name, a number, and an employee who manages the
department. We keep track of the start date of the department manager. Each department
controls a number of PROJECTs. Each project has a name and a number and is located at a
single location.

We store each EMPLOYEE’s social security number, address, salary, sex, and birth date. Each
employee works for one department but may work on several projects.

We keep track of the number of hours per week that an employee currently works on each
project. We also keep track of the direct supervisor of each employee.

Each employee may have a number of DEPENDENTs. For each dependent, we keep track of
their name, sex, birth date, and relationship to the employee.

An entity is a “thing” or object in the real world that is distinguishable from all other
objects. For example, each employee in an enterprise or company is an entity. An entity has
a set of properties, and the values for some set of properties may uniquely identify an entity.

An entity set is a set of entities of the same type that share the same properties or attributes.
Attributes are properties used to describe an entity. For example, an EMPLOYEE entity may
have a Name, SSN, Address, Sex, BirthDate. A specific entity will have a value for each of its
attributes. For example, a specific employee entity may have Name='John Smith',
SSN='123456789', Address='731, Fondren, Houston, TX', Sex='M', BirthDate='09-JAN-55'.
Each attribute has a value set (or data type) associated with it, e.g. integer, string, subrange,
enumerated type.
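
The idea of an entity, its attributes, and an entity set can be sketched in a few lines of Python. This is only an illustration: the class name Employee and the variable names are assumptions made for the sketch, and the attribute values are the ones from the John Smith example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Employee:
    """One entity of the EMPLOYEE entity set; each field is an attribute."""
    name: str
    ssn: str          # later used as the key attribute
    address: str
    sex: str
    birth_date: date  # value set: calendar dates


# A specific entity has a value for each of its attributes.
john = Employee(
    name="John Smith",
    ssn="123456789",
    address="731, Fondren, Houston, TX",
    sex="M",
    birth_date=date(1955, 1, 9),
)

# An entity set is a set of entities of the same type.
employee_set = {john}
```

Each field's type annotation plays the role of the value set of the attribute.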

Attributes used in the E-R model can be characterized by the following attribute types.
• Simple
– Each entity has a single atomic value for the attribute. For example, SSN or Sex.
• Composite
– The attribute may be composed of several components. For example, Address
(Apt#, House#, Street, City, State, ZipCode, Country) or Name (FirstName,
MiddleName, LastName). The composition may form a hierarchy where some
components are themselves composite.
• Single valued
– An entity has a single value for that attribute. For example, the loan-number in the
loan entity.
• Multi-valued
– An entity may have multiple values for that attribute. For example, the Color of a
CAR or the Previous Degrees of a STUDENT. Denoted as {Color} or {Previous
Degrees}.
• Null attribute
– A null value is used when an attribute does not have a value, or when the value
exists but is not known at present. For instance, an employee may have no
dependent. In another example, where the task-completed status is not known at
present, the value for this attribute can be kept as null.
• Derived attribute
– The value for this type of attribute can be derived from the values of other related
attributes or entities. For example, the value of the attribute “age” can be derived if
we know the value of the attribute “dob” (date of birth), as age = current_date - dob.
As another example, the value of the attribute “loe” (length of employment) can be
derived if we know the value of the attribute “doj” (date of joining), as
loe = current_date - doj.
– For instance, account number and balance. Here the balance attribute is derived
from the account number attribute.
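
The two derived attributes above can be computed rather than stored. The following is a minimal sketch; the function names are hypothetical, and the choice to report age in completed years and length of employment in days is an assumption of the sketch, not something the text prescribes.

```python
from datetime import date


def age_in_years(dob: date, today: date) -> int:
    """Derive age from date of birth, counting completed years only."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def length_of_employment_days(doj: date, today: date) -> int:
    """Derive length of employment (loe) as whole days since joining."""
    return (today - doj).days
```

Passing the current date as a parameter keeps the derivation repeatable, since the same stored dob or doj gives a different derived value on a different day.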

An attribute of an entity type for which each entity must have a unique value is called a key
attribute of the entity type. For example, SSN is the key attribute of the EMPLOYEE entity
type. A key attribute may be composite. For example, Vehicle Tag Number is a key of the
CAR entity type with components (Number, State).


3.1 Modeling using E-R Diagrams


A model is an abstract form of any system or process that hides the unnecessary details while
highlighting those details important to the application. Scale models of large campuses or
buildings, for example, help people visualize the structure before it is built. Along similar
lines, we can model our software applications before they are developed. Modeling
databases using E-R diagrams is called E-R modeling. This technique is also called the
top-down approach because one need not identify all the attributes to model the system using
this technique.

Steps in E-R Modeling


Usually, the following six steps are followed to generate E-R Models.

• Identify the entities: Look for general nouns in the requirement specification
document which are of business interest to business users.
• Find relationships: Identify the natural relationship and their cardinalities between
the entities.
• Identify the key attributes for every entity: Identify the attribute or set of attributes
which can identify the instance of the entity uniquely.
• Identify other relevant attributes: Identify other attributes which are of interest to
business users and about which they want to store information in the database.
• Complete E-R diagram: Draw a complete E-R diagram with all attributes, including
the primary key.
• Review your results with your business users: Look at the list of attributes
associated with each entity to see if anything has been omitted.
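
The outcome of these steps can be written down as plain data before any diagram is drawn. The sketch below does this for the COMPANY example from section 3. Several details are assumptions made only for illustration: Number as the key of DEPARTMENT and PROJECT, the relationship name CONTROLS, the 1:1 and M:N cardinalities marked in the comments, and the review helper itself.

```python
# Entities and attributes come from the COMPANY description in section 3.
# The keys chosen for DEPARTMENT and PROJECT are assumptions of this sketch.
entities = {
    "EMPLOYEE": {"key": ["SSN"],
                 "attributes": ["SSN", "Address", "Salary", "Sex", "BirthDate"]},
    "DEPARTMENT": {"key": ["Number"], "attributes": ["Name", "Number"]},
    "PROJECT": {"key": ["Number"], "attributes": ["Name", "Number", "Location"]},
}

# (relationship name, first entity, second entity, cardinality first:second)
relationships = [
    ("WORKS_FOR", "EMPLOYEE", "DEPARTMENT", "N:1"),
    ("MANAGES", "EMPLOYEE", "DEPARTMENT", "1:1"),   # assumed 1:1
    ("CONTROLS", "DEPARTMENT", "PROJECT", "1:N"),
    ("WORKS_ON", "EMPLOYEE", "PROJECT", "M:N"),     # assumed M:N
]


def review(entities, relationships):
    """Return a list of problems found; an empty list means the model passes."""
    problems = []
    for name, spec in entities.items():
        if not spec["key"]:
            problems.append(f"{name} has no key attribute")
        for k in spec["key"]:
            if k not in spec["attributes"]:
                problems.append(f"{name} key {k} is not a listed attribute")
    for rel, left, right, _ in relationships:
        for side in (left, right):
            if side not in entities:
                problems.append(f"{rel} refers to unknown entity {side}")
    return problems
```

A check like review() only catches mechanical omissions; the review with business users in the last step is still needed to find attributes that were never listed at all.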

3.2 Notation used in E-R Model


An entity-relationship model is a tool for analyzing the semantic features of an application
that are independent of events. This approach includes a graphical notation, which depicts
entity classes as rectangles, relationships as diamonds, and attributes as circles or ovals. For
complex situations, a partial entity-relationship diagram may be used to present a summary
of the entities and relationships that does not include the details of the attributes.

The entity-relationship diagram provides a convenient method for visualizing the
interrelationships among entities in a given application. This tool has proven to be useful in
making the transition from an informal application description to a formal database
schema. The entity-relationship model is used for describing the conceptual schema of an
enterprise without attention to the efficiency of the physical database design. The
entity-relationship diagrams are then turned into a logical schema in which the database is
actually implemented. Figure 3.0 shows the notation used in entity-relationship diagrams.

Fig 3.0: Diagram used in entity-relationship.

3.3 Relationships and Relationship Types


A relationship is an association among several entities. A relationship set is a set of
relationships of the same type. Formally, it is a mathematical relation on n ≥ 2 entity sets.
If E1, E2, ..., En are entity sets, then a relationship set R is a subset of
{ (e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En }
where (e1, e2, ..., en) is a relationship.
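
For the binary case (n = 2), the definition above says that a relationship set is simply a subset of the Cartesian product of two entity sets. A minimal sketch, with made-up entity labels e1..e3 and d1..d2 chosen only for illustration:

```python
from itertools import product

employees = {"e1", "e2", "e3"}      # entity set E1
departments = {"d1", "d2"}          # entity set E2

# Every possible (employee, department) pair: the Cartesian product E1 x E2.
all_pairs = set(product(employees, departments))

# A relationship set R is any subset of that product.
works_for = {("e1", "d1"), ("e2", "d1"), ("e3", "d2")}

is_valid_relationship_set = works_for <= all_pairs
```

Each tuple in works_for is one relationship; a tuple mentioning an entity outside E1 or E2 would make the subset test fail.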

A relationship relates two or more distinct entities with a specific meaning. For example,
EMPLOYEE John Smith works on the Product X PROJECT, or EMPLOYEE Franklin manages the
Research DEPARTMENT. Relationships of the same type are grouped or typed into a
relationship type. For example, the WORKS_ON relationship type in which EMPLOYEEs and
PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs and
DEPARTMENTs participate. The degree of a relationship type is the number of participating
entity types. Both MANAGES and WORKS_ON are binary relationships. More than one
relationship type can exist with the same participating entity types. For example,
MANAGES and WORKS_FOR are distinct relationships between EMPLOYEE and
DEPARTMENT, but with different meanings and different relationship instances.

Figure 3.1 shows example instances of the WORKS_FOR relationship between EMPLOYEE and
DEPARTMENT.

[Figure: relationship instances R1 to R6 linking EMPLOYEE entities e1 to e6 with
DEPARTMENT entities d1 to d3]

Fig 3.1: Instances of the WORKS_FOR relationship

Weak Entity Types


An entity type that does not have a key attribute of its own is called a weak entity type. A
weak entity must participate in an identifying relationship type with an owner, or
identifying, entity type. Entities of a weak entity type are identified by the combination of:

– A partial key of the weak entity type
– The particular entity they are related to in the identifying entity type


Example:
Suppose that a DEPENDENT entity is identified by the dependent’s first name and birth date,
and the specific EMPLOYEE that the dependent is related to. DEPENDENT is a weak entity
type with EMPLOYEE as its identifying entity type via the identifying relationship type
DEPENDENT_OF.
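
The DEPENDENT example can be sketched as follows. The class layout, the identifier method, and the sample values (names, SSNs, dates) are all hypothetical and serve only to show how the owner's key and the partial key combine.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dependent:
    owner_ssn: str      # key of the owning EMPLOYEE (identifying entity)
    first_name: str     # partial key, part 1
    birth_date: date    # partial key, part 2
    relationship: str   # descriptive attribute, not part of the identifier

    def identifier(self):
        """Full identifier: owner's key combined with the partial key."""
        return (self.owner_ssn, self.first_name, self.birth_date)


# Two dependents with the same partial key but different owners.
a = Dependent("123456789", "Alice", date(1988, 4, 5), "Daughter")
b = Dependent("987654321", "Alice", date(1988, 4, 5), "Daughter")
```

The partial key alone cannot tell a and b apart; only the combination with the owning employee's key does, which is why DEPENDENT is weak.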

Relationship Types
An E-R enterprise schema may define certain constraints to which the contents of a database
must conform. Mapping cardinalities, or cardinality ratios, express the number of entities
to which another entity can be associated via a relationship set.

For a binary relationship set R between entity sets A and B, the mapping cardinality must be
one of the following:
❖ One-to-one (1:1): An entity in A is associated with at most one entity in B, and an entity
in B is associated with at most one entity in A. See Figure 3.2 (a).
❖ One-to-many (1:N): An entity in A is associated with any number of entities in B. An
entity in B, however, can be associated with at most one entity in A. See Figure 3.2 (b).
❖ Many-to-one (N:1): An entity in A is associated with at most one entity in B. An entity
in B, however, can be associated with any number of entities in A. See Figure 3.3 (c).
❖ Many-to-many (M:N): An entity in A is associated with any number of entities in B. An
entity in B is associated with any number of entities in A. See Figure 3.3 (d).
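
The four cases can be told apart mechanically from a set of (a, b) pairs. The function below is a hypothetical helper written for this illustration. It classifies only the instances it is given; a mapping cardinality in a schema is a constraint on all future instances, so this is a check on sample data, not a way to discover the constraint.

```python
from collections import defaultdict


def mapping_cardinality(pairs):
    """Classify a binary relationship given as (a, b) pairs.

    Returns '1:1', '1:N', 'N:1' or 'M:N', matching the four cases above,
    based only on the instances supplied.
    """
    b_per_a = defaultdict(set)   # entities in B linked to each entity in A
    a_per_b = defaultdict(set)   # entities in A linked to each entity in B
    for a, b in pairs:
        b_per_a[a].add(b)
        a_per_b[b].add(a)

    a_has_many = any(len(bs) > 1 for bs in b_per_a.values())
    b_has_many = any(len(as_) > 1 for as_ in a_per_b.values())

    if a_has_many and b_has_many:
        return "M:N"
    if a_has_many:
        return "1:N"   # one A, many B; each B has at most one A
    if b_has_many:
        return "N:1"   # many A share one B; each A has at most one B
    return "1:1"
```

For example, a department linked to two courses, with each course linked to only that department, is classified as 1:N.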

[Figure: two panels mapping entity set A (A1, A2, A3) to entity set B (B1, B2, B3):
(a) one-to-one, (b) one-to-many]

Fig 3.2: Relationships types a) One-to-One b) One-to-many


[Figure: two panels mapping entity set A (A1, A2, A3) to entity set B (B1, B2, B3):
(c) many-to-one, (d) many-to-many]

Fig 3.3: Relationships types c) Many-to-One d) Many-to-many

Examples for Relationships


1. The department offers multiple courses and each course belongs to only one
department, hence the cardinality between department and course is one-to-many.

[Diagram: Department - Offers - Course]

2. One course is taken up by multiple students and one student enrolls for multiple
courses, hence the relationship is many-to-many.

[Diagram: Course - Enrolled by - Student]

3. Each department has one “Head of Department” (HOD), hence the relation is one-to-one.

[Diagram: Department - Headed by - HOD]


Self-Assessment Questions – 2

15. There are ____________basic notions that E-R data model employs.
16. An ______________is a “thing” or object in the real world that is
distinguishable from all other objects.
17. An entity set is a set of entities of the same type that share the same
properties or _____________.
18. Attributes are properties used to describe an ___________.
19. Each entity has a single atomic value for the attribute is called _________
attribute.
20. Address is an example for _________attribute.
21. The value for the ______ type of attribute can be derived from the values of
other related attributes or entities.
22. An attribute of an entity type for which each entity must have a unique
value is called a ________ of the entity type.
23. Which one of the following is a demerit of E-R modeling?
a. Gives a higher level abstraction of the system.
b. Can be generalized and specialized based on needs.
c. Physical design derived from E-R Model may have some amount of
ambiguities or inconsistency.
d. Intuitive and helps in physical database creation
24. How many basic notions does the E-R data model employ?
a. Three
b. Four
c. Five
d. Six
25. An ________ is a “thing” or object in the real world that is
distinguishable from all other objects.
a. Relation
b. Entity
c. Attribute
d. Simple attribute


26. Pick out the composite attribute from the list of attributes
a. Sex
b. Address
c. SSN
d. Department number
27. ‘Color of the car and degrees of students’ are examples for the ________.
a. Null attribute
b. Derived attribute
c. Single valued
d. Multi valued
28. Identifying the natural relationships and their cardinalities between the
entities is a step of ________.
a. Identify the entities
b. Find relationships
c. Identify the key attributes for every entity
d. Identify other relevant attributes
29. An E-R diagram includes a graphical notation, which depicts entity classes as
________.
a. Rectangles
b. Ovals
c. Diamonds
d. Circles
30. A ________ is an association among several entities.
a. Relationship
b. Key
c. Partial key
d. Entity
31. An entity that does not have a key attribute is called
a. Weak entity types
b. Entity Types
c. Null attribute
d. Derived attribute


4. ASSOCIATIVE DATABASE MODEL


The Associative model was devised by Simon Williams of Lazy Software and is said to be
built upon current research with some unique additions. If you are familiar with applying
XML to data, you will feel comfortable with most of the underlying concepts. Both XML and
Associative databases have a common root in Semantic Databases and Topic Maps. Most of
the terms and concepts used will be found in references to Binary Relational Databases.

An Associative database has two fundamental data structures. There is a set of "Items" and
a set of "Links" that connect them together. In the "Item" structure entries have a unique
identifier, a name, and a type. Each entry in the "links" structure also has a unique identifier
together with the identifiers for the relevant "source", "verb" and "target" (the subject, verb,
and object from our sentences). This can be illustrated with the following two diagrams. For
clarity, the question of item type has been ignored in this illustration.

Items
Identifier   Name
12           Red
41           Is a
76           Colour
14           Mary
81           Vegetarian
43           Eats
82           Plants
15           Ski Lessons
39           Start at
83           08:00
42           On
85           Sunday

Links
Identifier   Source   Verb   Target
101          12       41     76
103          14       41     81
124          81       43     82
105          15       39     83
107          105      42     85


The last entry in the Links structure (107) shows how another entry (identifier 105) has
become the Source for that entry. Together, the two entries show how you could store the
"ski lesson" sentence within the Links structure. Readers with some familiarity with
Microsoft Access will have seen the "AutoNumber" facility, which can be used to create a
unique numeric identifier for any row in a given table. Here we see an identifier being
assigned on a database-wide basis. The number itself has no significance; it is simply
required to be unique.
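
The way a link can act as the source of another link is easy to see in code. The sketch below loads the Items and Links tables shown earlier into two dictionaries; the dictionary layout and the render function are assumptions made for illustration, not part of any Associative database product.

```python
items = {12: "Red", 41: "Is a", 76: "Colour", 14: "Mary", 81: "Vegetarian",
         43: "Eats", 82: "Plants", 15: "Ski Lessons", 39: "Start at",
         83: "08:00", 42: "On", 85: "Sunday"}

# link identifier -> (source, verb, target)
links = {101: (12, 41, 76), 103: (14, 41, 81), 124: (81, 43, 82),
         105: (15, 39, 83), 107: (105, 42, 85)}


def render(identifier):
    """Turn an item or link identifier into text, expanding nested links."""
    if identifier in items:
        return items[identifier]
    source, verb, target = links[identifier]
    return f"{render(source)} {render(verb)} {render(target)}"
```

Calling render(107) first expands link 105 and so yields "Ski Lessons Start at 08:00 On Sunday". This works because identifiers are unique across the whole database, so an identifier names either an item or a link, never both.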

The Associative model structure is economical with storage space as there is no need to hold
available "spaces" for data that is not available, even if it is a normal part of a given data set.
This contrasts with relational databases. A relational database stores a minimum of a single
"null" byte for missing data items in any given row. Some relational databases reserve the
maximum space for a given column in every row. The Associative database makes the
storage of "custom" data for different users, or for other varying needs, straightforward and
"inexpensive" in terms of maintenance or network resources. If there is a need to store
different data about, say, different customers or customer groups in different countries, then
an Associative database can manage this more efficiently than a relational database.

The Associative model differentiates between what it calls Entities and Associations. An
entity is defined as being discrete and having an independent existence. An association is
something that depends upon one or more other things. An example may help with this. A
person or company is an entity, while a supplier, a customer, or an employee is an
association: their existence depends upon the role being played at any one time. Indeed it is
possible
for an Entity to have multiple business roles simultaneously, each being recorded as an
association. If circumstances change, one or more of the associations may die away but the
entity would endure. The difference may seem a little moot at first but is designed to simplify
rather than complicate the data model.

An Associative database comprises a number of "chapters", and a user's view of the
content of a database is controlled by his or her "Profile". A Profile is a list of Chapters. The
database designer consigns the various elements to specific Chapters, and the user Profile
restricts access to the relevant Chapters for a given user. If some links exist between items
in chapters inside and outside of a particular user Profile, then those links are not visible to
that user. The combination of Chapters and Profiles can simplify the tailoring of the database
to particular users or subject groups. Data that is relevant to one user group could be
invisible to another, and indeed may be replaced by an alternate data set.
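
The visibility rule for links can be sketched as a simple filter. Everything named here is hypothetical: the chapter names, the sample items, and the visible_links function are invented for the illustration, and a real Associative database may apply further rules.

```python
# Hypothetical assignment of items to chapters (names are illustrative only).
item_chapter = {"Mary": "people", "Vegetarian": "diet", "Salary band C": "payroll"}

# Each link is (source, verb, target); verbs are not checked in this sketch.
links = [("Mary", "Is a", "Vegetarian"), ("Mary", "Is in", "Salary band C")]


def visible_links(profile, links, item_chapter):
    """Return the links whose source and target both lie in the profile's chapters."""
    allowed = set(profile)
    return [
        (s, v, t) for (s, v, t) in links
        if item_chapter[s] in allowed and item_chapter[t] in allowed
    ]
```

A user whose Profile lists only the "people" and "diet" chapters sees the first link but not the second, because the second link reaches into a chapter outside that Profile.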

The concept of a record is missing from the Associative model. To assemble all of the current
information on something as complex as (say) a sales order, the data storage will need to be
re-visited many times. This is a potential disadvantage, although it should be recognized
that a well-normalized relational database would probably also require a number of
data store reads to establish a similar data set. Some rough calculations based upon a small
personal sample would suggest that the Associative database would require more than four
times as many data reads as a relational database. If the process of reading a sequence of
links can be optimized, then this may go some way to minimizing the difference as
experienced by the user.

Self-Assessment Questions – 3

32. An Associative database has _____________ fundamental data structures.
33. In the "Item" structure entries have a ______________ identifier, a name and a
type.
34. A relational database stores a minimum of a single ________ byte for missing
data items in any given row.
35. A ___________ is a list of Chapters.
36. The combination of Chapters and Profiles can simplify the ________ of the
database to particular users.
37. The concept of a ________ is missing from the Associative model.


5. SUMMARY
In this unit, we reviewed three major traditional data models used in current DBMSs. These
three models are primitive, classic and semantic data models. In the primitive data models
approach, objects are represented by record structures grouped in file structures. The main
operations available are read and write operations over records. Classic data models are the
hierarchical, network and relational data models.

The hierarchical data model is an extension of the primitive data model discussed above. The
network is an extension of the hierarchical approach. The relational data model is a
fundamental departure from the hierarchical and network approaches. The main problem
with classic data models such as the relational data model is that they maintain a
fundamental record-orientation.

The hierarchical model evolved from the file-based system. It uses tree-type data structure
to represent the relationship among records. The hierarchical data model restricts each
record type to only one parent record type. Each parent record type can have any
number of child record types.

In a network model, one child record may have more than one parent node. A network can
be converted into one or more trees by introducing redundant nodes.

The relational model is based on a collection of tables. A table is also called a relation. A tree
or network structure can be converted into a relational structure by separating each node in
the data structure into a relation.

The entity-relationship diagrams are useful in representing the relationship among entities.
They help in logical database design. We have also presented implementation schemes of
each of the traditional database models.


6. TERMINAL QUESTIONS
1. Distinguish between three major types of the architectural data model.
2. Describe the differences in meaning between the terms relation and relation
schema.
3. Explain the distinctions among the terms primary key, candidate key, and superkey.
4. Construct an E-R diagram for a car-insurance company whose customers own one or
more cars each. Each car has associated with it zero to any number of recorded
accidents.
5. Construct an E-R diagram for a hospital with a set of patients and a set of medical
doctors. Associate with each patient a log of the various tests and examinations
conducted.
6. Explain the difference between a weak and a strong entity set.
7. We can convert any weak entity set to a strong entity set by simply adding appropriate
attributes. Why, then, do we have weak entity sets?


7. ANSWERS
Self-Assessment Questions
1. Data integrity
2. Record
3. Relational
4. Unique
5. Ordered
6. Three
7. Primary
8. Domain
9. Tuples
10. Tree
11. one to many
12. hierarchical
13. object
14. unification
15. Three
16. Entity
17. Attributes
18. Entity
19. Simple
20. Composite
21. Derived
22. key attribute
23. Physical design derived from E-R Model may have some amount of ambiguities or
inconsistencies.
24. Three
25. Entity
26. Address
27. Multi-valued
28. Find relationships
29. Rectangles


30. Relationship
31. Weak entity types
32. Two
33. Unique
34. "null"
35. Profile
36. Tailoring
37. Record

Terminal Questions
1. In terms of three generations, the architectural data models are:
(i) Primitive data models (ii) Classic data models (iii) Semantic data models. These
three data models can be distinguished in terms of object representation, hierarchical
network formation, and record-orientation. (Refer section 2)
2. A relation is a collection of tuples, each of which contains values for a fixed number of
attributes, whereas the relation schema R, denoted by R (A1, A2, ..., An), is made up of
a relation name R and a list of attributes A1, A2, ..., An. (Refer section 2.1)
3. A set of attributes that can uniquely identify any row in a relation is known as a
super key. A minimal super key, that is, a super key from which no attribute can be
removed without losing uniqueness, is termed a candidate key. If a relation has several
candidate keys, one is chosen arbitrarily to be the primary key. (Refer section 2.1)
4. To construct the E-R diagram, identify the entities such as car and customer, find the
natural relationships between car and customer, identify the key attributes for every
entity, and identify other relevant attributes related to car, customer, and
car-insurance to complete the E-R diagram. Also review the results. (Refer section 3.1
for detail)
5. Follow the same steps as shown in Q. No. 4. (Refer section 3.1 for detail)
6. A strong entity is independent of other entities and can exist on its own. A weak entity
is dependent on one or more other entities in order for it to exist. (Refer section 3.3 for
detail)
7. A weak entity is required because it is dependent on one or more other entities in order
for it to exist, and it is often prevalent in databases. (Refer section 3.3 for detail)



Unit 4
File Organization for Conventional DBMS
Table of Contents

1 Introduction
1.1 Objectives
2 Storage Devices and its Characteristics
2.1 Magnetic Disks
2.2 Physical Characteristics of Disks
2.3 Performance Measures of Disks
2.4 Optimization of Disk-Block Access
3 File Organization
3.1 Fixed-Length Records
3.2 Variable-Length Records
3.3 Organization of Records in Files
4 Sequential file Organization
5 Indexed Sequential Access Method (ISAM)
6 Virtual Storage Access Method (VSAM)
7 Summary
8 Terminal Questions
9 Answers


1. INTRODUCTION
Although a database system provides a high-level view of data, ultimately data have to be
stored as bits on one or more storage devices. A vast majority of databases today store data
on magnetic disk and fetch data into main memory for processing, or copy data onto
tapes and other backup devices for archival storage. The physical characteristics of storage
devices play a major role in the way data are stored, in particular because access to a random
piece of data on disk is much slower than memory access: Disk access takes tens of
milliseconds, whereas memory access takes a tenth of a microsecond.
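
The size of this gap is easier to appreciate as a ratio. The short calculation below takes 10 milliseconds as a representative disk access time, which is an assumption at the low end of "tens of milliseconds", and a tenth of a microsecond for memory access as stated above.

```python
disk_access_s = 10e-3       # "tens of milliseconds": 10 ms taken as the low end
memory_access_s = 0.1e-6    # "a tenth of a microsecond"

ratio = disk_access_s / memory_access_s
print(f"One disk access costs about {ratio:,.0f} memory accesses")
```

Under these figures a single random disk access takes as long as roughly one hundred thousand memory accesses, which is why file organization concentrates on reducing the number of disk accesses.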

1.1 Objectives:
By the end of unit 4, the learners should be able to understand:

❖ Different Storage devices and their characteristics
❖ File Organization and records in file organizations
❖ Organization of Sequential files
❖ Indexed Sequential Access Method (ISAM)
❖ Virtual Storage Access Method (VSAM)


2. STORAGE DEVICES AND ITS CHARACTERISTICS


Several types of data storage exist in most computer systems. These storage media are
classified by the speed with which data can be accessed, by the cost per unit of data to buy
the medium, and by the medium’s reliability. Among the media typically available are these:
• Cache. The cache is the fastest and most costly form of storage after registers, which
hold even less data than the cache. Cache memory is small; its use is managed by the
computer system hardware. We shall not be concerned about managing cache storage
in the database system.
• Main memory. The storage medium used for data that are available to be operated on
is main memory. The general-purpose machine instructions operate on main
memory. Although main memory may contain many megabytes of data, or even
gigabytes of data in large server systems, it is generally too small (or too expensive) for
storing the entire database. The contents of main memory are usually lost if a power
failure or system crash occurs.
• Flash memory. Also known as electrically erasable programmable read-only memory
(EEPROM), flash memory differs from main memory in that data survives power
failure. Reading data from flash memory takes less than 100 nanoseconds (a
nanosecond is 1/1000 of a microsecond), which is roughly as fast as reading data from
the main memory.
• Magnetic-disk storage. The primary medium for the long-term online storage of data
is the magnetic disk. Usually, the entire database is stored on a magnetic disk. The
system must move the data from the disk to the main memory so that they can be
accessed. After the system has performed the designated operations, the data that have
been modified must be written to disk.
• Optical storage. The most popular forms of optical storage are the compact disks (CD),
which can hold about 640 megabytes of data, and the digital video disk (DVD) which
can hold 4.7 or 8.5 gigabytes of data per side of the disk (or up to 17 gigabytes on a
two-sided disk). Data are stored optically on a disk and are read by a laser. The optical disks
used in read-only compact disks (CD-ROM) or read-only digital video disk (DVD-ROM)
cannot be written, but are supplied with data prerecorded.


• Tape storage. Tape storage is used primarily for backup and archival data. Although
magnetic tape is much cheaper than disks, access to data is much slower, because
the tape must be accessed sequentially from the beginning. For this reason, tape storage
is referred to as sequential-access storage. In contrast, disk storage is referred to as
direct-access storage because it is possible to read data from any location on a disk.

Fig 4.1: Storage Hierarchy

The fastest storage media – for example, cache and the main memory – are referred to as
primary storage. The media in the next level in the hierarchy – for example, magnetic
disks – are referred to as secondary storage or online storage. The media in the lowest
level in the hierarchy – for example, magnetic tape and optical disk jukeboxes – are referred
to as tertiary storage, or offline storage.

In addition to the speed and cost of the various storage systems, there is also the issue of
storage volatility. Volatile storage loses its contents when the power to the device is
removed. In the hierarchy shown in Figure 4.1, the storage systems from main memory up
are volatile, whereas the storage systems below main memory are nonvolatile. In the absence
of expensive battery and generator backup systems, data must be written to nonvolatile
storage for safekeeping.


2.1 Magnetic Disks


Magnetic disks provide the bulk of secondary storage for modern computer systems. Disk
capacities have been growing at over 50 percent per year, but the storage requirements of
large applications have also been growing very fast, in some cases even faster than the
growth rate of disk capacities. A large database may require hundreds of disks.

2.2 Physical Characteristics of Disks


Physically, disks are relatively simple (Figure 4.2). Each disk platter has a flat circular shape.
Its two surfaces are covered with a magnetic material, and information is recorded on the
surfaces. Platters are made from rigid metal or glass and are covered (usually on both sides)
with magnetic recording material. We call such magnetic disks hard disks, to distinguish
them from floppy disks, which are made from a flexible material.

The disk surface is logically divided into tracks, which are subdivided into sectors. A sector
is the smallest unit of information that can be read from or written to the disk. The read-
write head stores information on a sector magnetically as reversals of the direction of
magnetization of the magnetic material. There may be hundreds of concentric tracks on a
disk surface, containing thousands of sectors.

Fig 4.2: Magnetic Disks

Each side of a platter of a disk has a read-write head, which moves across the platter to access
different tracks. A disk typically contains many platters, and the read-write heads of all the
tracks are mounted on a single assembly called a disk arm, and move together. The disk
platters mounted on a spindle and the heads mounted on a disk arm are together known
as head-disk assemblies.

A fixed-head disk has a separate head for each track. This arrangement allows the computer
to switch from track to track quickly, without having to move the head assembly, but
because of the large number of heads, the device is extremely expensive. Some disk systems
have multiple disk arms, allowing more than one track on the same platter to be accessed at
a time. Fixed-head disks and multiple-arm disks were used in high-performance mainframe
systems, but are no longer in production.

A disk controller interfaces between the computer system and the actual hardware of the
disk drive. It accepts high-level commands to read or write a sector and initiates actions,
such as moving the disk arm to the right track and reading or writing the data. Disk
controllers also attach checksums to each sector that is written; the checksum is computed
from the data written to the sector. When the sector is read back, the controller computes
the checksum again from the retrieved data and compares it with the stored checksum; if the
data are corrupted, with a high probability the newly computed checksum will not match the
stored checksum. If such an error occurs, the controller will retry the read several times; if
the error continues to occur, the controller will signal a read failure. A disk controller
interface is shown in Figure 4.3.

Fig 4.3: Disk controller interface

In the storage area network (SAN) architecture, large numbers of disks are connected by a
high-speed network to a number of server computers. The disks are usually organized locally
using redundant arrays of independent disks (RAID) storage organizations, but the RAID
organization may be hidden from the server computers: the disk subsystems pretend each
RAID system is a very large and very reliable disk.


2.3 Performance Measures of Disks


The main measures of the qualities of a disk are capacity, access time, data transfer rate, and
reliability.

Access time is the time from when a read or write request is issued to when data transfer
begins. To access (that is, to read or write) data on a given sector of a disk, the arm first must
move, so that it is positioned over the correct track, and then must wait for the sector to
appear under it as the disk rotates. The time for repositioning the arm is called the seek
time, and it increases with the distance that the arm must move. Typical seek times range
from 2 to 30 milliseconds, depending on how far the track is from the initial arm position.
Smaller disks tend to have lower seek time since the head has to travel a smaller distance.

The average seek time is the average of the seek times, measured over a sequence of
(uniformly distributed) random requests. If all tracks have the same number of sectors, and
we disregard the time required for the head to start moving and to stop moving, we can show
that the average seek time is one-third of the worst-case seek time. Taking these factors into
account, the average seek time is around one-half of the maximum seek time. Average seek
time currently ranges between 4 milliseconds and 10 milliseconds, depending on the disk
model.
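
The one-third figure can be checked numerically. The sketch below is an illustration only: it assumes a hypothetical disk whose tracks are all equally likely to be requested and a seek time proportional to the distance moved (start and stop time ignored, as in the text). The function name is made up for this example.

```python
def average_seek_fraction(num_tracks: int) -> float:
    """Mean arm travel between two random tracks, as a fraction of the worst case."""
    total = 0
    # Sum |i - j| over every ordered pair of tracks, grouped by distance d.
    for d in range(1, num_tracks):
        # There are (num_tracks - d) pairs at distance d in each direction.
        total += 2 * d * (num_tracks - d)
    mean_distance = total / (num_tracks * num_tracks)
    worst_case = num_tracks - 1
    return mean_distance / worst_case


# For a large number of tracks the ratio approaches one-third.
print(average_seek_fraction(1000))
```

The measured averages quoted in the text are closer to one-half because real arms also spend time accelerating and settling, which this model leaves out.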

Once the arm has been positioned over the correct track, the time spent waiting for the
sector to be accessed to appear under the head is called the rotational latency time. On
average, one-half of a rotation of the disk is required for the beginning of the desired
sector to appear under the head. Thus, the average latency time of the disk is one-half the
time for a full rotation of the disk.

The access time is then the sum of the seek time and the latency, and ranges from 8 to 20
milliseconds. Once the first sector of the data to be accessed has come under the head, the
data transfer begins. The data-transfer rate is the rate at which data can be retrieved from
or stored on the disk.
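
As a worked example of the sum above, the following sketch computes an average access time. The rotation speed (7,200 rpm) and the 8-millisecond average seek are assumed values chosen for illustration, not figures from this unit; the function name is also hypothetical.

```python
def average_access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Average access time = average seek time + average rotational latency."""
    full_rotation_ms = 60_000 / rpm        # one rotation, in milliseconds
    avg_latency_ms = full_rotation_ms / 2  # on average, half a rotation
    return avg_seek_ms + avg_latency_ms


# Assumed disk: 8 ms average seek, 7,200 rpm.
print(round(average_access_time_ms(8.0, 7200), 2))
```

With these assumed values the result falls inside the 8 to 20 millisecond range given in the text.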

The final commonly used measure of a disk is the mean time to failure (MTTF), which is a
measure of the reliability of the disk. The mean time to failure of a disk (or of any other
system) is the amount of time that, on average, we can expect the system to run continuously
without any failure.


2.4 Optimization of Disk-Block Access


Requests for disk I/O are generated both by the file system and by the virtual memory
manager found in most operating systems. Each request specifies the address on the disk to
be referenced; that address is in the form of a block number. A block is a contiguous sequence
of sectors from a single track of one platter. Block sizes range from 512 bytes to several
kilobytes. Data is transferred between disk and main memory in units of blocks. The lower
levels of the file-system manager convert block addresses into the hardware-level cylinder,
surface, and sector number.

Since access to data on disk is several orders of magnitude slower than access to data in main
memory, equipment designers have focused on techniques for improving the speed of access
to blocks on disk.

• Scheduling: If several blocks from a cylinder need to be transferred from disk to
main memory, we may be able to save access time by requesting the blocks in the order
in which they will pass under the heads. If the desired blocks are on different cylinders,
it is advantageous to request the blocks in an order that minimizes disk-arm movement.
Disk-arm-scheduling algorithms attempt to order accesses to tracks in a fashion that
increases the number of accesses that can be processed. A commonly used algorithm is
the elevator algorithm, which works in the same way many elevators do.

• File organization: To reduce block-access time, we can organize blocks on disk in a
way that corresponds closely to the way we expect data to be accessed. For example, if
we expect a file to be accessed sequentially, then we should ideally keep all the blocks
of the file sequentially on adjacent cylinders. Older operating systems, such as the IBM
mainframe operating systems, provided programmers with fine control on placement
of files, allowing a programmer to reserve a set of cylinders for storing a file. However,
this control places a burden on the programmer or system administrator to decide, for
example, how many cylinders to allocate for a file, and may require costly
reorganization if data is inserted into or deleted from the file.

• Nonvolatile write buffers: Since the contents of the main memory are lost in a power
failure, information about database updates has to be recorded on the disk to survive
possible system crashes. For this reason, the performance of update-intensive
database applications, such as transaction-processing systems, is heavily dependent on
the speed of disk writes. A nonvolatile RAM buffer, whose contents survive a power
failure, speeds up writes: a write is treated as complete once it reaches the buffer, and
the block is written to disk later.

• Log disk: Another approach to reducing write latencies is to use a log disk – that is, a
disk devoted to writing a sequential log – in much the same way as a nonvolatile RAM
buffer. All access to the log disk is sequential, essentially eliminating seek time, and
several consecutive blocks can be written at once, making writes to the log disk several
times faster than random writes. As before, the data have to be written to their actual
location on disk as well, but the log disk can do the write later, without the database
system having to wait for the write to be completed. Furthermore, the log disk can
reorder the writes to minimize disk-arm movement. If the system crashes before some
writes to the actual disk location have been completed, when the system comes back, it
reads the log disk to find those writes that had not been completed, and carries them
out then. File systems that support log disks as above are called journaling file
systems.
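
The elevator algorithm mentioned under Scheduling above can be sketched as follows. This is a minimal illustration, not a disk controller's actual implementation: the arm keeps moving in its current direction, serving every pending request it passes, and reverses only when no requests remain ahead of it. The function names and the cylinder numbers in the usage lines are assumptions made for the example.

```python
def elevator_order(requests, head, direction=1):
    """Return pending cylinder requests in the order one elevator sweep serves them.

    direction = 1 means the arm is moving toward higher cylinder numbers,
    direction = -1 toward lower ones.
    """
    ahead = sorted(c for c in requests if (c - head) * direction >= 0)
    behind = sorted(c for c in requests if (c - head) * direction < 0)
    if direction == 1:
        # Serve upward in ascending order, then reverse and come back down.
        return ahead + behind[::-1]
    # Serve downward in descending order, then reverse and go back up.
    return ahead[::-1] + behind


def total_movement(order, head):
    """Total number of cylinders the arm crosses when serving `order`."""
    moved = 0
    for cylinder in order:
        moved += abs(cylinder - head)
        head = cylinder
    return moved


pending = [98, 183, 37, 122, 14, 124, 65, 67]   # hypothetical request queue
order = elevator_order(pending, head=53, direction=1)
print(order)
print(total_movement(order, 53), "vs", total_movement(pending, 53), "in arrival order")
```

For this made-up queue, the elevator ordering moves the arm far less than serving the requests in arrival order.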


Self-Assessment Questions – 1

1. The ______ is the fastest and most costly form of storage.
2. The contents of main memory are usually _______ if a power failure or
system crash occurs.
3. Flash memory differs from main memory in that data survive _____.
4. The primary medium for the long-term online storage of data is the
____________.
5. The CD and DVD come under ___________ memory storage.
6. Tape storage is referred to as __________ access storage.
7. Each disk platter has a flat _________ shape.
8. A ___________ interfaces between the computer system and the actual
hardware of the disk drive.
9. In the __________ architecture, large numbers of disks are connected by a
high-speed network to a number of server computers.
10. ___________ time is the time from when a read or write request is issued to
when data transfer begins.
11. The _____ is the average of the seek times, measured over a sequence of
(uniformly distributed) random requests.
12. A ___________ is a contiguous sequence of sectors from a single track of one
platter.
13. A commonly used algorithm in scheduling is ______.


3. FILE ORGANIZATION
A file is organized logically as a sequence of records. These records are mapped onto disk
blocks. Files are provided as a basic construct in operating systems, so we shall assume the
existence of an underlying file system. We need to consider ways of representing logical data
models in terms of files.

Although blocks are of a fixed size determined by the physical properties of the disk and by
the operating system, record sizes vary. In a relational database, tuples of distinct relations
are generally of different sizes. A file containing account records is shown in Figure 4.4.

Fig 4.4: File containing account records

One approach to mapping the database to files is to use several files and store records of only
one fixed length in any given file.

3.1 Fixed-Length Records


As an example, let us consider a file of account records for our bank database. Each record of
this file is defined as:
type deposit = record
    account-number : char(10);
    branch-name : char(22);
    balance : real;
end

If we assume that each character occupies 1 byte and that a real occupies 8 bytes, our
account record is 40 bytes long. A simple approach is to use the first 40 bytes for the first
record, the next 40 bytes for the second record, and so on (Figure 4.4). However, there are
two problems with this simple approach:
1. It is difficult to delete a record from this structure. The space occupied by the record
to be deleted must be filled with some other record of the file, or we must have a way
of marking deleted records so that they can be ignored.
2. Unless the block size happens to be a multiple of 40 (which is unlikely), some records
will cross block boundaries. That is, part of the record will be stored in one block and
part in another. It would thus require two-block accesses to read or write such a record.
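
The 40-byte layout and the block-boundary problem can be illustrated with a short sketch. It is an approximation under stated assumptions: Python's struct module stands in for the record definition above, the 512-byte block size is an assumed value, and the helper names and sample record are made up for this example.

```python
import struct

# 10-byte account number, 22-byte branch name, 8-byte real: 40 bytes in all.
# "<" turns off alignment padding so the layout matches the byte counts in the text.
RECORD = struct.Struct("<10s22sd")
BLOCK_SIZE = 512  # assumed block size, for illustration only


def pack_record(account_number: str, branch_name: str, balance: float) -> bytes:
    """Encode one account record; short strings are padded with zero bytes."""
    return RECORD.pack(account_number.encode("ascii"),
                       branch_name.encode("ascii"), balance)


def unpack_record(raw: bytes):
    """Decode a 40-byte record back into (account_number, branch_name, balance)."""
    acct, branch, balance = RECORD.unpack(raw)
    return (acct.rstrip(b"\x00").decode("ascii"),
            branch.rstrip(b"\x00").decode("ascii"), balance)


def crosses_block_boundary(record_index: int) -> bool:
    """True if record number `record_index` (from 0) is split across two blocks."""
    start = record_index * RECORD.size
    end = start + RECORD.size - 1
    return start // BLOCK_SIZE != end // BLOCK_SIZE


print(RECORD.size)
print([i for i in range(30) if crosses_block_boundary(i)])
```

Because 512 is not a multiple of 40, some records in this sketch straddle two blocks, which is the second problem listed above.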

Fig 4.5: File of Figure 4.4, with record 2 deleted and all records moved.

When a record is deleted, we could move the record that came after it into the space formerly
occupied by the deleted record, and so on, until every record following the deleted record
has been moved ahead (Figure 4.5). Such an approach requires moving a large number of
records. It might be easier simply to move the final record of the file into the space occupied
by the deleted record (Figure 4.6).

Fig 4.6: File of Figure 4.4, with record 2 deleted and final record moved.

It is undesirable to move records to occupy the space freed by a deleted record since doing
so requires additional block accesses. Since insertions tend to be more frequent than
deletions, it is acceptable to leave open the space occupied by the deleted record and to wait
for a subsequent insertion before reusing the space. A simple marker on a deleted record is
not sufficient, since it is hard to find this available space when an insertion is being done.
Thus, we need to introduce an additional structure.

At the beginning of the file, we allocate a certain number of bytes as a file header.

The header will contain a variety of information about the file. For now, all we need to store
there is the address of the first record whose contents are deleted. We use this first record
to store the address of the second available record, and so on. Intuitively, we can think of
these stored addresses as pointers, since they point to the location of a record.

Fig 4.7: File of Figure 4.4, with free list after deletion of records 1, 4 and 6

The deleted records thus form a linked list, which is often referred to as a free list. Figure
4.7 shows the file of Figure 4.4, with the free list, after records 1, 4, and 6 have been deleted.
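
A free list of this kind can be sketched in a few lines. The class below is a toy in-memory model, not the layout of any particular system: record slots are list entries, the "file header" is a single attribute, and a deleted slot simply stores the number of the next free slot. Newly freed slots are pushed at the head of the list, which is one simple policy assumed here, so the chain order depends on the order of deletions.

```python
class FixedLengthFile:
    """Toy file of fixed-length record slots with a free list threaded through it."""

    def __init__(self):
        self.slots = []        # slot i holds a record, or the next free slot number
        self.free_head = None  # file header: first free slot, or None if none

    def insert(self, record):
        if self.free_head is None:
            self.slots.append(record)      # no free slot: extend the file
            return len(self.slots) - 1
        slot = self.free_head
        self.free_head = self.slots[slot]  # header now points at the next free slot
        self.slots[slot] = record
        return slot

    def delete(self, slot):
        self.slots[slot] = self.free_head  # freed slot remembers the old head
        self.free_head = slot

    def free_list(self):
        """Follow the pointer chain from the header and list the free slots."""
        chain, slot = [], self.free_head
        while slot is not None:
            chain.append(slot)
            slot = self.slots[slot]
        return chain


f = FixedLengthFile()
for n in range(9):
    f.insert(f"record {n}")
for slot in (6, 4, 1):       # delete in this order so the chain reads 1, 4, 6
    f.delete(slot)
print(f.free_list())
print(f.insert("new record"), f.free_list())
```

An insertion reuses the first slot on the chain, so no search for free space is needed.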

3.2 Variable-Length Records


Variable-length records arise in database systems in several ways:
• Storage of multiple record types in a file
• Record types that allow variable lengths for one or more fields
• Record types that allow repeating fields

Different techniques for implementing variable-length records exist. For purposes of
illustration, we shall use one example to demonstrate the various implementation
techniques. We shall consider a different representation of the account information stored
in the file of Figure 4.4, in which we use one variable-length record for each branch name
and for all the account information for that branch. The format of the record is


type account-list = record
    branch-name : char(22);
    account-info : array [1..∞] of
        record
            account-number : char(10);
            balance : real;
        end
end

We define account-info as an array with an arbitrary number of elements. That is, the type
definition does not limit the number of elements in the array, although any actual record
will have a specific number of elements in its array. There is no limit on how large a record
can be (up to, of course, the size of the disk storage!).

The slotted-page structure appears in Figure 4.8. There is a header at the beginning of each
block, containing the following information:
1. The number of record entries in the header
2. The end of free space in the block
3. An array whose entries contain the location and size of each record

Fig 4.8: Slotted-page structure

The actual records are allocated contiguously in the block, starting from the end of the block.
The free space in the block is contiguous, between the final entry in the header array, and
the first record. If a record is inserted, space is allocated for it at the end of free space, and
an entry containing its size and location is added to the header.

If a record is deleted, the space that it occupies is freed, and its entry is set to deleted (its size
is set to −1, for example). Further, the records in the block before the deleted record are
moved, so that the free space created by the deletion gets occupied, and all free space is again
between the final entry in the header array and the first record. The end-of-free-space
pointer in the header is appropriately updated as well.
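
The insert and delete steps just described can be sketched as below. This is a simplified model under stated assumptions: the header array is kept as a Python list rather than as bytes inside the block, so the space the header itself occupies is ignored, and the class and method names are invented for the example.

```python
class SlottedPage:
    """Toy slotted page: header entries at the front, records packed at the end."""

    def __init__(self, size=64):
        self.data = bytearray(size)
        self.entries = []     # header array of [offset, length]; length -1 = deleted
        self.free_end = size  # end of free space (where the record area begins)

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:
            raise ValueError("not enough free space in this block")
        self.free_end -= len(record)  # allocate at the end of free space
        self.data[self.free_end:self.free_end + len(record)] = record
        self.entries.append([self.free_end, len(record)])
        return len(self.entries) - 1  # slot number of the new record

    def read(self, slot: int) -> bytes:
        offset, length = self.entries[slot]
        if length < 0:
            raise KeyError(f"slot {slot} is deleted")
        return bytes(self.data[offset:offset + length])

    def delete(self, slot: int) -> None:
        offset, length = self.entries[slot]
        # Records lying before the deleted one (at lower offsets) slide toward
        # the end of the block so that the free space stays contiguous.
        self.data[self.free_end + length:offset + length] = self.data[self.free_end:offset]
        for entry in self.entries:
            if entry[1] >= 0 and entry[0] < offset:
                entry[0] += length
        self.entries[slot] = [0, -1]  # mark the header entry as deleted
        self.free_end += length       # update the end-of-free-space pointer


page = SlottedPage()
a = page.insert(b"Perryridge")
b = page.insert(b"Mianus")
c = page.insert(b"Downtown")
page.delete(b)
print(page.read(a), page.read(c), page.free_end)
```

Because other structures refer to a record by its slot number rather than its byte offset, records can be moved inside the block without changing those references.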

3.3 Organization of Records in Files


So far, we have studied how records are represented in a file structure. An instance of a
relation is a set of records. Given a set of records, the next question is how to organize them
in a file. Several of the possible ways of organizing records in files are:
• Heap file organization: Any record can be placed anywhere in the file where there is
space for the record. There is no ordering of records. Typically, there is a single file for
each relation.

• Sequential file organization: Records are stored in sequential order, according to the
value of a “search key” of each record.

• Hashing file organization: A hash function is computed on some attribute of each
record. The result of the hash function specifies in which block of the file the record
should be placed.
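
As a rough illustration of the hashing file organization in the list above, the sketch below places each record in a block chosen by a hash of one attribute. The choice of CRC-32 as the hash function, the number of blocks, and the sample records are all assumptions made for the example; a real system would choose its own hash function and handle full blocks.

```python
import zlib


def block_for(key: str, num_blocks: int) -> int:
    """Map a search-key value to a block number with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % num_blocks


def build_hashed_file(records, num_blocks):
    """Place each record in the block selected by hashing its first field."""
    blocks = [[] for _ in range(num_blocks)]
    for record in records:
        blocks[block_for(record[0], num_blocks)].append(record)
    return blocks


def lookup(blocks, key):
    """Read only the one block the hash points to, then filter by key."""
    return [r for r in blocks[block_for(key, len(blocks))] if r[0] == key]


accounts = [("Perryridge", "A-102", 400), ("Mianus", "A-215", 700),
            ("Perryridge", "A-201", 900)]
blocks = build_hashed_file(accounts, num_blocks=4)
print(lookup(blocks, "Perryridge"))
```

A lookup by the hashed attribute touches a single block, whereas a heap file would have to be scanned.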


4. SEQUENTIAL FILE ORGANIZATION


A sequential file is designed for efficient processing of records in sorted order based on
some search key. A search key is any attribute or set of attributes; it need not be the
primary key, or even a super key. To permit fast retrieval of records in search-key order,
we chain together records by pointers. The pointer in each record points to the next record
in search-key order. Furthermore, to minimize the number of block accesses in sequential
file processing, we store records physically in search-key order, or as close to search-key
order as possible.

Figure 4.9 shows a sequential file of account records taken from our banking example. In
that example, the records are stored in search-key order, using branch-name as the search
key.

The sequential file organization allows records to be read in sorted order; that can be useful
for display purposes, as well as for certain query-processing algorithms.

It is difficult, however, to maintain physical sequential order as records are inserted and
deleted, since it is costly to move many records as a result of a single insertion or deletion.
We can manage deletion by using pointer chains, as we saw previously.

Fig 4.9: Sequential file for account records


Fig 4.10: Sequential file after insertion

For insertion, we apply the following rules:


1. Locate the record in the file that comes before the record to be inserted in search-key
order.
2. If there is a free record (that is, space left after a deletion) within the same block as this
record, insert the new record there. Otherwise, insert the new record in an overflow
block. In either case, adjust the pointers so as to chain together the records in
search-key order.

Figure 4.10 shows the file of Figure 4.9 after the insertion of the record (North Town,
A-888, 800). The structure in Figure 4.10 allows the fast insertion of new records, but
forces sequential file-processing applications to process records in an order that does
not match the physical order of the records.
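
The two insertion rules can be sketched as follows. This is a toy model under stated assumptions: blocks are Python lists with None marking a free slot, a single extra list stands in for the overflow block, records are reduced to their search-key value, and all names are invented for the example. A key smaller than every existing key has no predecessor, and this sketch sends it to the overflow block.

```python
class SequentialFile:
    """Toy sequential file whose records are chained in search-key order."""

    def __init__(self, blocks):
        self.blocks = [list(b) for b in blocks]  # None marks a free slot
        self.blocks.append([])                   # the last block is the overflow block
        self.overflow = len(self.blocks) - 1
        # Initial records are assumed to be stored physically in key order.
        locs = [(b, s) for b, blk in enumerate(self.blocks)
                for s, key in enumerate(blk) if key is not None]
        self.head = locs[0] if locs else None
        self.next = dict(zip(locs, locs[1:] + [None]))  # pointer chain

    def key_at(self, loc):
        return self.blocks[loc[0]][loc[1]]

    def scan(self):
        """Follow the pointer chain, yielding keys in search-key order."""
        loc = self.head
        while loc is not None:
            yield self.key_at(loc)
            loc = self.next[loc]

    def insert(self, key):
        # Rule 1: locate the record that comes before the new one in key order.
        prev, loc = None, self.head
        while loc is not None and self.key_at(loc) <= key:
            prev, loc = loc, self.next[loc]
        # Rule 2: use a free slot in that record's block, else the overflow block.
        target = prev[0] if prev is not None else self.overflow
        if target != self.overflow and None in self.blocks[target]:
            slot = self.blocks[target].index(None)
            self.blocks[target][slot] = key
        else:
            target = self.overflow
            self.blocks[target].append(key)
            slot = len(self.blocks[target]) - 1
        new = (target, slot)
        # Adjust the pointers so the chain stays in search-key order.
        self.next[new] = loc
        if prev is None:
            self.head = new
        else:
            self.next[prev] = new
        return new


f = SequentialFile([["Brighton", "Downtown", None],
                    ["Mianus", "Perryridge", "Redwood"]])
print(f.insert("North Town"))   # lands in the overflow block
print(list(f.scan()))
```

Scanning by pointer still returns the records in search-key order, but the physical order no longer matches, which is the trade-off noted above.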


5. INDEXED SEQUENTIAL ACCESS METHOD (ISAM)


To understand the motivation for the ISAM technique, it is useful to begin with a simple
sorted file. Consider a file of Students records sorted by GPA (grade point average) as shown
in Figure 4.11.

Students(sid: string, name: string, login: string, age: integer, gpa: real)

Fig 4.11: An Instance of the Students Relation

To answer a range selection such as “Find all students with a gpa higher than 3.0,” we must
identify the first such student by doing a binary search of the file and then scan the file from
that point on. If the file is large, the initial binary search can be quite expensive; can we
improve upon this method?

One idea is to create a second file with one record per page in the original (data) file, of the
form (first key on page, pointer to page), again sorted by the key attribute (which is gpa in
our example). The format of a page in the second index file is illustrated in Figure 4.12.

Fig 4.12: Format of an index page

We refer to pairs of the form <key, pointer> as entries. Notice that each index page contains
one pointer more than the number of keys. Each key serves as a separator for the
contents of the pages pointed to by the pointers to its left and right. This structure is
illustrated in Figure 4.13.


Fig 4.13: One level index structure

We can do a binary search of the index file to identify the page containing the first key (gpa)
value that satisfies the range selection (in our example, the first student with gpa over 3.0)
and follow the pointer to the page containing the first data record with that key value. We
can then scan the data file sequentially from that point on to retrieve other qualifying
records. This example uses the index to find the first data page containing a Students record
with gpa greater than 3.0, and the data file is scanned from that point on to retrieve other
such Student records.
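
The lookup just described can be sketched with a sorted list standing in for the index file. The gpa values and page contents below are hypothetical, the data records are reduced to their key, and the function names are invented; the only library call used is bisect_right from Python's standard bisect module.

```python
from bisect import bisect_right


def build_index(pages):
    """One index entry per data page: the first key on that page."""
    return [page[0] for page in pages]


def range_scan(pages, index, low):
    """Yield every key greater than `low` from data pages sorted by key."""
    # Binary search of the index: the last page whose first key is <= low is
    # the first page that can still hold a qualifying key.
    start = max(bisect_right(index, low) - 1, 0)
    for page in pages[start:]:   # then scan the data file sequentially
        for key in page:
            if key > low:
                yield key


# Hypothetical data file: four pages of gpa values in sorted order.
data_pages = [[1.8, 2.0], [2.5, 3.0], [3.2, 3.5], [3.8, 4.0]]
index = build_index(data_pages)
print(list(range_scan(data_pages, index, 3.0)))
```

The binary search runs over one entry per page rather than over every page of the data file, which is where the saving comes from.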

Because the size of an entry in the index file (key value and page id) is likely to be much
smaller than the size of a page, and only one such entry exists per page of the data file, the
index file is likely to be much smaller than the data file; thus, a binary search of the index file
is much faster than a binary search of the data file. However, a binary search of the index file
could still be fairly expensive, and the index file is typically still large enough to make inserts
and deletes expensive.

The potential large size of the index file motivates the ISAM idea: Why not apply the previous
step of building an auxiliary file on the index file and so on recursively until the final auxiliary
file fits on one page? This repeated construction of a one-level index leads to a tree structure
that is illustrated in Figure 4.14.

The data entries of the ISAM index are in the leaf pages of the tree and additional overflow
pages that are chained to some leaf page. In addition, some systems carefully organize the
layout of pages so that page boundaries correspond closely to the physical characteristics of
the underlying storage device. The ISAM structure is completely static (except for the
overflow pages, of which, it is hoped, there will be few) and facilitates such low-level
optimizations.


Fig 4.14: ISAM Index Structure

Each tree node is a disk page, and all the data resides in the leaf pages. This corresponds to
an index that uses Alternative (1) for data entries, in which each data entry is the actual
data record; we can create an index with Alternative (2) by storing the data records in a
separate file and storing <key, rid> pairs in the leaf pages of the ISAM index. When the file
is created, all leaf pages are
allocated sequentially and sorted on the search key value. (If Alternatives (2) or (3) are
used, the data records are created and sorted before allocating the leaf pages of the ISAM
index.) The non-leaf level pages are then allocated. If there are several inserts to the file
subsequently, so that more entries are inserted into a leaf than will fit onto a single
page, additional pages are needed because the index structure is static. These additional
pages are allocated from an overflow area. The allocation of pages is illustrated in Figure
4.15.

Fig 4.15: Page allocation in ISAM


The basic operations of insertion, deletion, and search are all quite straightforward. For an
equality selection search, we start at the root node and determine which subtree to search
by comparing the value in the search field of the given record with the key values in the node.
For a range query, the starting point in the data (or leaf) level is determined similarly, and
data pages are then retrieved sequentially. For inserts and deletes, the appropriate page is
determined as for a search, and the record is inserted or deleted with overflow pages added
if necessary.

The following example illustrates the ISAM index structure. Consider the tree shown in
Figure 4.16. All searches begin at the root. For example, to locate a record with the key-value
27, we start at the root and follow the left pointer, since 27 < 40. We then follow the middle
pointer, since 20 <= 27 < 33. For a range search, we find the first qualifying data entry as for
an equality selection and then retrieve primary leaf pages sequentially (also retrieving
overflow pages as needed by following pointers from the primary pages). The primary leaf
pages are assumed to be allocated sequentially (this assumption is reasonable because the
number of such pages is known when the tree is created and does not change subsequently
under inserts and deletes), and so no ‘next leaf page’ pointers are needed.

We assume that each leaf page can contain two entries. If we now insert a record with key-
value 23, the entry 23* belongs on the second data page, which already contains 20* and
27* and has no more space. We deal with this situation by adding an overflow page and
putting 23* in the overflow page. Chains of overflow pages can easily develop.

For instance, inserting 48*, 41*, and 42* leads to an overflow chain of two pages. The tree of
Figure 4.16 with all these insertions is shown in Figure 4.17.
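
The search and insert behaviour described above can be sketched as below. This is a minimal model, not a full ISAM implementation: the index levels are fixed once built, leaf pages hold two entries, and a full leaf grows a chain of overflow pages. The root key 40, the separators 20 and 33, and the leaf holding 20* and 27* come from the text; the remaining keys in the sample tree are assumed values, since the figure itself is not reproduced here.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, capacity=2):
        self.keys = list(keys)
        self.capacity = capacity
        self.overflow = None  # next page in this leaf's overflow chain


class Node:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children  # one more child than keys


def find_leaf(node, key):
    """Walk from the root to the primary leaf page that should hold `key`."""
    while isinstance(node, Node):
        # Keys equal to a separator belong to the subtree on its right.
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    page = find_leaf(root, key)
    while page is not None:  # primary page first, then its overflow chain
        if key in page.keys:
            return True
        page = page.overflow
    return False


def insert(root, key):
    page = find_leaf(root, key)
    while len(page.keys) >= page.capacity:  # page full: move along the chain
        if page.overflow is None:
            page.overflow = Leaf([], page.capacity)
        page = page.overflow
    page.keys.append(key)


root = Node([40], [
    Node([20, 33], [Leaf([10, 15]), Leaf([20, 27]), Leaf([33, 37])]),
    Node([51, 63], [Leaf([40, 46]), Leaf([51, 55]), Leaf([63, 97])]),
])
for k in (23, 48, 41, 42):
    insert(root, k)
leaf = find_leaf(root, 42)
print(leaf.keys, leaf.overflow.keys, leaf.overflow.overflow.keys)
```

After the four inserts, the leaf reached for key 42 has an overflow chain of two pages, matching the description in the text.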

Fig 4.16: Sample ISAM Tree


The deletion of an entry k* is handled by simply removing the entry. If this entry is on an
overflow page and the overflow page becomes empty, the page can be removed. If the entry
is on a primary page and deletion makes the primary page empty, the simplest approach is
to simply leave the empty primary page as it is; it serves as a placeholder for future insertions
(and possibly non-empty overflow pages, because we do not move records from the overflow
pages to the primary page when deletions on the primary page create space).

Fig 4.17: ISAM tree after inserts

Thus, the number of primary leaf pages is fixed at file creation time. Notice that deleting
entries could lead to a situation in which key values that appear in the index levels do not
appear in the leaves! Since index levels are used only to direct a search to the correct leaf
page, this situation is not a problem. The tree of Figure 4.17 is shown in Figure 4.18 after the
deletion of the entries 42*, 51*, and 97*.

Note that after deleting 51*, the key-value 51 continues to appear in the index level. A
subsequent search for 51* would go to the correct leaf page and determine that the entry is
not in the tree.


Fig 4.18: ISAM tree after deletes

The non-leaf pages direct a search to the correct leaf page. The number of disk I/Os is equal
to the number of levels of the tree and is equal to log_F N, where N is the number of primary
leaf pages and the fan-out F is the number of children per index page. This number is
considerably less than the number of disk I/Os for binary search, which is log_2 N; in fact, it is
reduced further by pinning the root page in memory. The cost of access via a one-level index
is log_2(N/F). If we consider a file with 1,000,000 records, 10 records per leaf page, and 100
entries per index page, the cost (in page I/Os) of a file scan is 100,000, a binary search of the
sorted data file is 17, a binary search of a one-level index is 10, and the ISAM file (assuming
no overflow) is 3.
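
The four costs in this example can be reproduced with a few lines of arithmetic. The sketch below applies the formulas from the paragraph above and rounds each logarithm up to a whole number of page I/Os; the variable names are chosen for this example.

```python
import math

records = 1_000_000
records_per_leaf = 10
fan_out = 100  # F: entries per index page

leaf_pages = records // records_per_leaf  # N = 100,000 primary leaf pages

file_scan = leaf_pages                                            # read every page
binary_search_data = math.ceil(math.log2(leaf_pages))             # log_2 N
binary_search_index = math.ceil(math.log2(leaf_pages / fan_out))  # log_2(N/F)
isam = math.ceil(math.log(leaf_pages, fan_out))                   # log_F N

print(file_scan, binary_search_data, binary_search_index, isam)
# prints: 100000 17 10 3
```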


6. VIRTUAL STORAGE ACCESS METHOD (VSAM)


The term Virtual Storage Access Method (VSAM) applies to both a data set type and the
access method used to manage various user data types. As an access method, VSAM provides
much more complex functions than other disk access methods. VSAM keeps disk records in
a unique format that is not understandable by other access methods.

You can use VSAM to organize records into four types of data sets: key-sequenced,
entry-sequenced, linear, or relative record. The primary difference among these types of
data sets is the way their records are stored and accessed.

VSAM data sets


Key Sequenced Data Set (KSDS)
This type is the most common use for VSAM. Each record has one or more key fields and a
record can be retrieved (or inserted) by key value. This provides random access to data.
Records are of variable length.

Entry Sequenced Data Set (ESDS)


This form of VSAM keeps records in sequential order. Records can be accessed sequentially.
It is used by IMS™, DB2®, and z/OS® UNIX®.

Relative Record Data Set (RRDS)


This VSAM format allows retrieval of records by number; record 1, record 2, and so forth.
This provides random access and assumes the application program has a way to derive the
desired record numbers.

Linear Data Set (LDS)


This type is, in effect, a byte-stream data set and is the only form of a byte-stream data set in
traditional z/OS files (as opposed to z/OS UNIX files). A number of z/OS system functions
use this format heavily, but it is rarely used by application programs.

Several additional methods of accessing data in VSAM are not listed here. Most applications
use VSAM for keyed data.
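
The difference between access by key, by entry order, and by record number can be pictured with a small in-memory model. The Python sketch below is not real VSAM and says nothing about how VSAM stores data on disk; the class and method names (KeySequenced, EntrySequenced, RelativeRecord, and so on) are hypothetical and chosen only for this illustration.

```python
class KeySequenced:
    """Toy KSDS: a record is retrieved or inserted by its key value."""

    def __init__(self):
        self._records = {}

    def insert(self, key, record):
        self._records[key] = record

    def read(self, key):
        return self._records[key]

    def scan(self):
        # Sequential processing visits the records in key order.
        return [self._records[key] for key in sorted(self._records)]


class EntrySequenced:
    """Toy ESDS: records are kept in the order in which they were added."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)

    def scan(self):
        return list(self._records)


class RelativeRecord:
    """Toy RRDS: a record is retrieved by its number (record 1, record 2, ...)."""

    def __init__(self, slots):
        self._slots = [None] * slots

    def write(self, number, record):
        self._slots[number - 1] = record

    def read(self, number):
        return self._slots[number - 1]


ksds = KeySequenced()
ksds.insert("B20", "pump")
ksds.insert("A10", "valve")

esds = EntrySequenced()
esds.append("first entry")
esds.append("second entry")

rrds = RelativeRecord(slots=3)
rrds.write(2, "record two")

print(ksds.read("A10"), ksds.scan(), esds.scan(), rrds.read(2))
```

In the model, only the key-sequenced and relative record classes offer random access, which mirrors the description of the data set types above.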


VSAM works with a logical data area known as a control interval (CI) that is diagrammed in
Figure 4.19. The default CI size is 4K bytes, but it can be up to 32K bytes. The CI contains data
records, unused space, record descriptor fields (RDFs), and a CI descriptor field.

Fig 4.19: Simple VSAM control interval

Multiple CIs are placed in a control area (CA). A VSAM data set consists of control areas and
index records. One form of index record is the sequence set, in which the lowest-level index
is pointing to a control interval.

VSAM data is always variable-length and records are automatically blocked in control
intervals. The RECFM attributes (F, FB, V, VB, U) do not apply to VSAM, nor does the BLKSIZE
attribute. You can use the Access Method Services (AMS) utility to define and delete VSAM
structures, such as files and indexes.


7. SUMMARY
This unit has covered storage devices, file organization, and the indexed and virtual storage
access methods used in a DBMS. We discussed that several types of data storage media are
used in a computer system and that they are classified by the speed with which data can be
accessed, by the cost per unit of data to buy the medium, and by the medium's reliability. We
also discussed file organizations and ways of representing logical data models in terms of
files. We then illustrated the ISAM index structure and how files are accessed through it.
Another access method discussed is the Virtual Storage Access Method (VSAM), a term that
applies to both a data set type and the access method used to manage various user data
types. We also found that VSAM keeps disk records in a unique format that is not
understandable by other access methods.

8. TERMINAL QUESTIONS
1. Explain various storage devices and their characteristics.
2. Explain various performance measures of disks.
3. Explain various file organizations in detail.
4. Explain indexed and virtual storage access methods.


9. ANSWERS
Self-Assessment Questions
1. Cache
2. Lost
3. Power failure.
4. magnetic disk
5. optical
6. sequential
7. circular
8. disk controller
9. storage area network (SAN)
10. Access
11. average seek time
12. block
13. elevator algorithm

Terminal Questions
1. Various storage devices available in most computer systems are: main memory, flash
memory, magnetic disk storage, optical storage, and tape storage. Each of these storage
devices has its own characteristics. (Refer section 4.2 for detail)
2. Various performance measures of a disk are access time, average seek time, capacity,
latency time, data-transfer rate, and reliability. (Refer section 4.2.3 for detail)
3. Files are organized as fixed-length records or variable-length records. (Refer section 4.3
for detail)
4. In the Indexed Sequential Access Method (ISAM), index pages holding key values and
page pointers are built over the sorted data file and are searched to locate a key. In the
Virtual Storage Access Method, four types of data sets are used, namely key-sequenced,
entry-sequenced, linear, and relative record.


Unit 5
An Introduction to RDBMS
Table of Contents

SL No  Topic                                        Fig No / Table / Graph   SAQ / Activity
1      Introduction                                 -                        -
1.1    Objectives                                   -                        -
2      An Informal Look at the Relational Model     1, 2                     1
3      Relational Database Management System        3, 4, 5                  2
4      RDBMS Properties                             6, 7                     3
4.1    The Entity-Relationship Model                -                        -
5      Overview of Relational Query Optimization    -                        4
6      System Catalog in a Relational DBMS          -                        5
6.1    Information Stored in the System Catalog     -                        -
6.2    How Catalogs Are Stored                      8                        -
7      Summary                                      -                        -
8      Terminal Questions                           -                        -
9      Answers                                      -                        -


1. INTRODUCTION
A relational database is a collection of relations with distinct relation names. The
relational database schema is the collection of schemas for the relations in the database.
For example, a university database contains schemas for relations called Students, Faculty,
Courses, Rooms, Enrolled Numbers, Teachers, etc. An instance of a relational database is a
collection of relation instances, one per relation schema in the database schema; of course,
each relation instance must satisfy the domain constraints in its schema.

1.1 Objectives:
By the end of Unit 5, learners should be able to understand:
❖ The relation, relation schema, and domain name concept
❖ The relational database components like tables, columns, etc.
❖ Properties of RDBMS including entity-relationship models
❖ Optimization in RDBMS
❖ System catalogs in RDBMS


2. AN INFORMAL LOOK AT THE RELATIONAL MODEL


The main construct for representing data in the relational model is a relation. A relation
consists of a relation schema and a relation instance. The relation instance is a table,
and the relation schema describes the column heads for the table. We first describe the
relation schema and then the relation instance. The schema specifies the relation's name, the
name of each field (or column, or attribute), and the domain of each field. A domain is
referred to in a relation schema by the domain name and has a set of associated values.

We use the example of student information in a university database to illustrate the parts of
a relation schema:
Students(sid: string, name: string, login: string, age: integer, gpa: real)

This says, for instance, that the field name sid has a domain name string. The set of values
associated with the domain string is the set of all character strings. We now turn to the
instances of a relation. An instance of a relation is a set of tuples, also called records, in
which each tuple has the same number of fields as the relation schema. A relation instance
can be thought of as a table in which each tuple is a row, and all rows have the same number
of fields. (The term relation instance is often abbreviated to just relation when there is no
confusion with other aspects of a relation such as its schema.)

An instance of the Students relation appears in Figure 5.1. The instance S1 contains six tuples
and has, as we expect from the schema, five fields. Note that no two rows are identical.

Fig 5.1: An Instance S1 of the Students Relation


This is a requirement of the relational model - each relation is defined to be a set of unique
tuples or rows. The order in which the rows are listed is not important.

Fig 5.2: An Alternative Representation of Instance S1 of Students

Figure 5.2 shows the same relation instance. If the fields are named, as in our schema
definitions and figures depicting relation instances, the order of fields does not matter either.
However, an alternative convention is to list fields in a specific order and to refer to a field
by its position. Thus sid is field 1 of Students, login is field 3, and so on. If this convention is
used, the order of fields is significant. Most database systems use a combination of these
conventions. For example, in SQL the named-fields convention is used in statements that
retrieve tuples, and the ordered-fields convention is commonly used when inserting tuples.
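
The two conventions can be seen side by side in a small example. The sketch below uses Python's built-in sqlite3 module as a stand-in for any SQL system; the sample tuple is invented for illustration and is not taken from Figure 5.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)

# Ordered-fields convention: values are matched to columns by position,
# so 's1' becomes sid (field 1) and 'asha@cs' becomes login (field 3).
conn.execute("INSERT INTO Students VALUES ('s1', 'Asha', 'asha@cs', 19, 3.4)")

# Named-fields convention: columns are requested by name, in any order.
row = conn.execute("SELECT gpa, name FROM Students WHERE sid = 's1'").fetchone()
print(row)  # (3.4, 'Asha')
```

The query returns gpa before name even though name comes first in the schema, because the retrieval refers to fields by name rather than by position.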

A relation schema specifies the domain of each field or column in the relation instance.

These domain constraints in the schema specify an important condition that we want each
instance of the relation to satisfy: The values that appear in a column must be drawn from
the domain associated with that column. Thus, the domain of a field is essentially the type of
that field, in programming language terms, and restricts the values that can appear in the
field. Domain constraints are so fundamental in the relational model that we will henceforth
consider only relation instances that satisfy them; therefore, relation instance means
relation instance that satisfies the domain constraints in the relation schema.
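
As a rough illustration of these ideas, the Python sketch below models the Students schema as a mapping from field names to domains and a relation instance as a set of tuples. Python types stand in for domains, the helper names are hypothetical, and the sample tuples are invented.

```python
# Schema: field name -> domain, with Python types standing in for domains.
STUDENTS_SCHEMA = {"sid": str, "name": str, "login": str, "age": int, "gpa": float}


def satisfies_domain_constraints(schema, row):
    """True if the tuple has as many fields as the schema and each value lies in its domain."""
    if len(row) != len(schema):
        return False
    return all(isinstance(value, domain) for value, domain in zip(row, schema.values()))


def make_instance(schema, rows):
    """Build a relation instance: a set of tuples that all satisfy the domain constraints."""
    instance = set()
    for row in rows:
        if not satisfies_domain_constraints(schema, row):
            raise ValueError(f"domain constraint violated: {row!r}")
        instance.add(tuple(row))  # a set keeps only one copy of identical tuples
    return instance


s1 = make_instance(
    STUDENTS_SCHEMA,
    [
        ("s1", "Asha", "asha@cs", 19, 3.4),
        ("s1", "Asha", "asha@cs", 19, 3.4),  # identical tuple, stored once
        ("s2", "Ravi", "ravi@ee", 18, 3.2),
    ],
)
print(len(s1))  # 2
```

A tuple such as ("s3", "Mina", "mina@cs", "nineteen", 3.8) is rejected by make_instance, because the value supplied for age is not drawn from the integer domain.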

Self-Assessment Questions – 1

1. A relation consists of a __________and a relation instance.


2. An instance of a relation is a set of tuples, also called ___________.
3. Relation instance means relation instance that satisfies the ___________ in the
relation schema.


3. RELATIONAL DATABASE MANAGEMENT SYSTEM


A Relational Database Management System (RDBMS) provides a comprehensive and
integrated approach to information management. A relational model provides the basis for
a relational database. A relational model has three aspects:
1. Structures
2. Operations
3. Integrity rules

Structures consist of a collection of objects or relations that store data. An example of a
relation is a table. You can store information in a table and use the table to retrieve and
modify data.

Operations are used to manipulate data and structures in a database. When using operations,
you must adhere to a predefined set of integrity rules.

Integrity rules are laws that govern the operations allowed on data in a database. This
ensures data accuracy and consistency.

Relational database components as shown in Figure 5.3 include:


• Table
• Row
• Column
• Field
• Primary key
• Foreign key


Fig 5.3: Relational database components

A Table is a basic storage structure of an RDBMS and consists of columns and rows as shown
in Figure 5.4. A table represents an entity. For example, the S_DEPT table stores information
about the departments of an organization.

A Row is a combination of column values in a table and is identified by a primary key. Rows
are also known as records. For example, a row in the table S_DEPT contains information
about one department.

A Column is a collection of one type of data in a table. Columns represent the attributes of an
object. Each column has a column name and contains values that are bound by the same type
and size. For example, a column in the table S_DEPT specifies the names of the departments
in the organization.

A Field is an intersection of a row and a column. A field contains one data value. If there is
no data in the field, the field is said to contain a NULL value.


Fig 5.4: Table, Row, Column & Field

A Primary key is a column or a combination of columns that is used to uniquely identify
each row in a table. For example, the column containing department numbers in the S_DEPT
table is created as a primary key, and therefore every department number is different. A
primary key must contain a value. It cannot contain a NULL value.

A Foreign key is a column or set of columns that refers to a primary key of another table.
You use foreign keys to establish principal connections between, or within, tables. A foreign
key must either match a primary key or else be NULL. Rows are connected logically when
required. The logical connections are based upon conditions that define a relationship
between corresponding values, typically between a primary key and a matching foreign key.
This relational method of linking provides great flexibility as it is independent of physical
links between records. The primary and foreign key concept is shown in Figure 5.5.
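
The rules for primary and foreign keys can be tried out in a small example. The sketch below uses Python's built-in sqlite3 module; S_DEPT comes from the text, while the S_EMP table, the column names, and the sample rows are assumptions made for illustration. SQLite enforces foreign keys only after the PRAGMA shown has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE S_DEPT (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE S_EMP ("
    " id INTEGER PRIMARY KEY,"
    " last_name TEXT NOT NULL,"
    " dept_id INTEGER REFERENCES S_DEPT(id))"  # foreign key; NULL is allowed
)

conn.execute("INSERT INTO S_DEPT VALUES (10, 'Finance')")
conn.execute("INSERT INTO S_EMP VALUES (1, 'Rao', 10)")     # matches a primary key
conn.execute("INSERT INTO S_EMP VALUES (2, 'Iyer', NULL)")  # no department yet


def try_insert(sql):
    """Run an INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert("INSERT INTO S_DEPT VALUES (10, 'Sales')"))   # duplicate primary key
print(try_insert("INSERT INTO S_EMP VALUES (3, 'Khan', 99)"))  # no department 99
```

Both of the final inserts are rejected: the first repeats an existing primary key value, and the second supplies a foreign key value that matches no primary key in S_DEPT.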

Fig 5.5: Primary & Foreign key


Self-Assessment Questions – 2

4. ___________consist of a collection of objects or relations that store data.


5. Operations are used to _______data and structures in a database.
6. Integrity rules ensure data accuracy and ___________.
7. A _____is a basic storage structure of an RDBMS and consists of columns and rows.
8. A Row is a combination of column values in a table and is identified by a
_____key.
9. A _______is an intersection of a row and a column.


4. RDBMS PROPERTIES
An RDBMS is easily accessible. You execute commands in the Structured Query Language
(SQL) to manipulate data. SQL is the International Organization for Standardization (ISO)
standard language for interacting with an RDBMS. The interaction between SQL and the
database is shown in Figure 5.6.

An RDBMS provides full data independence. The organization of the data is independent of
the applications that use it. You do not need to specify the access routes to tables or know
how data is physically arranged in a database.

A relational database is a collection of individual, named objects. The basic unit of data
storage in a relational database is called a table. A table consists of rows and columns used
to store values. For access purposes, the order of rows and columns is insignificant. You can
control the access order as required.

Fig 5.6: SQL & Database

When querying the database, you use conditional operations such as joins and restrictions.
A join combines data from separate database rows, typically rows of different tables whose
values match. A restriction limits the specific rows returned by a query as shown in
Figure 5.7. Join operations are covered in more detail under relational algebra in Unit 7.
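
A restriction and a join can be compared on a pair of small tables. The sketch below uses Python's built-in sqlite3 module; the S_EMP table, the column names, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE S_DEPT (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE S_EMP  (id INTEGER PRIMARY KEY, last_name TEXT, dept_id INTEGER);
    INSERT INTO S_DEPT VALUES (10, 'Finance'), (20, 'Sales');
    INSERT INTO S_EMP  VALUES (1, 'Rao', 10), (2, 'Iyer', 20), (3, 'Khan', 20);
""")

# Restriction: keep only the rows that satisfy a condition.
restricted = conn.execute(
    "SELECT last_name FROM S_EMP WHERE dept_id = 20 ORDER BY id"
).fetchall()

# Join: combine each employee row with the matching department row.
joined = conn.execute(
    "SELECT e.last_name, d.name"
    " FROM S_EMP e JOIN S_DEPT d ON e.dept_id = d.id"
    " ORDER BY e.id"
).fetchall()

print(restricted)  # [('Iyer',), ('Khan',)]
print(joined)      # [('Rao', 'Finance'), ('Iyer', 'Sales'), ('Khan', 'Sales')]
```

The restriction returns fewer rows from one table, whereas the join returns rows that carry values from both tables.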


Fig 5.7: Conditional operations

An RDBMS enables data sharing between users. At the same time, you can ensure the
consistency of data across multiple tables by using integrity constraints. An RDBMS uses
various types of data integrity constraints. These types include entity, column, referential,
and user-defined constraints.

Entity constraints ensure the uniqueness of rows, and column constraints ensure the
consistency of the type of data within a column. Referential constraints ensure the
validity of foreign keys, and user-defined constraints are used to enforce specific business
rules.
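
The four kinds of constraint can each be declared in SQL. The sketch below uses Python's built-in sqlite3 module; the S_EMP table, its columns, and the salary rule are assumptions made for illustration. Because SQLite is lenient about declared column types, the column constraint shown here is NOT NULL rather than a type check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE S_DEPT (
        id   INTEGER PRIMARY KEY,              -- entity constraint: rows are unique
        name TEXT NOT NULL                     -- column constraint: a value is required
    );
    CREATE TABLE S_EMP (
        id      INTEGER PRIMARY KEY,
        salary  REAL CHECK (salary > 0),       -- user-defined business rule
        dept_id INTEGER REFERENCES S_DEPT(id)  -- referential constraint
    );
    INSERT INTO S_DEPT VALUES (10, 'Finance');
""")

# One statement that breaks each kind of constraint.
violations = {
    "entity": "INSERT INTO S_DEPT VALUES (10, 'Sales')",
    "column": "INSERT INTO S_DEPT VALUES (20, NULL)",
    "referential": "INSERT INTO S_EMP VALUES (1, 5000, 99)",
    "user-defined": "INSERT INTO S_EMP VALUES (2, -1, 10)",
}


def is_rejected(sql):
    """True if the DBMS refuses the statement because of an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


for kind, sql in violations.items():
    print(kind, "constraint enforced:", is_rejected(sql))
```

Each of the four statements is refused, so the tables never hold data that breaks a declared rule.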

An RDBMS minimizes the redundancy of data. This means that similar data is not repeated
in multiple tables.

4.1 The Entity-Relationship Model


The entity-relationship model is a tool for analyzing the semantic features of an application
that are independent of events. This approach includes a graphical notation, which depicts
entity classes as rectangles, relationships as diamonds, and attributes as circles or ovals. For
a complex situation, a partial entity-relationship diagram may be used to present a summary
of the entities and relationships that do not include the details of the attributes.

The entity-relationship diagram provides a convenient method for visualizing the
interrelationships among entities in a given application. This tool has proven to be useful in
making the transition from an informal application description to a formal database
schema. The entity-relationship model is used for describing the conceptual schema of an
enterprise, without attention to the efficiency of the physical database design. The entity-
relationship diagrams are then turned into a logical schema in which the database is
implemented.

Short definitions of some of the basic terms that are used for describing important entity-
relationship concepts are:

1. Entity: An entity is a thing that exists and is distinguishable.
a) Entity instance: An instance is a particular occurrence of an entity. For example,
each person is an instance of the entity Person, each car is an instance of the entity
Car, etc.
b) Entity class: A group of similar entities is called an entity class or entity type. An
entity class has common attributes.

2. Attributes: Attributes describe the properties of entities and relationships.
a) Simple and composite attributes: A simple attribute is the smallest semantic unit
of data, which is atomic (no internal structure). A composite attribute can be
subdivided into parts, e.g., address (street, city, state, zip).
b) Single-valued and multivalued attributes: Single-valued attributes have a single
value for a particular entity. Multivalued attributes have multiple values for a
particular entity; e.g., the degrees a student holds or the courses a student takes.
c) Domain: The conceptual definition of an attribute: a named set of scalar values, all of
the same type, that forms the pool of possible values.

3. Relationships: A relationship is a connection between entities. For example, a
relationship between PERSONS and AUTOMOBILES could be an "OWNS" relationship.
That is to say, automobiles are owned by people.
• Is-a hierarchies: A special type of relationship that allows attribute inheritance.
For example, to say that a truck is an automobile and an automobile has a model
and serial number implies that a truck also has a model and serial number.

4. Keys: A key uniquely differentiates one entity instance from all others in the entity.
A key is an identifier.

a) Primary Key: An identifier used to uniquely identify one particular instance of an
entity.
A primary key
• can be one or more attributes (e.g., consider substituting a single
concatenated key attribute for a multiple-attribute key),
• must be unique within the domain (not just the current data set),
• should not change in value over time,
• must always have a value, and
• is created when no obvious attribute exists, so that each instance has a value.

b) Candidate Key: When multiple possible identifiers exist, each of them is a
candidate key.

c) Concatenated Key: A key made up of parts which, when combined, become a unique
identifier. Multiple-attribute keys are concatenated keys.

d) Borrowed Key Attributes: If an is-a relationship exists, the key of the more general
entity is also a key of the sub-entities. For example, if the serial number is a key for
automobiles, it would also be a key for trucks.

e) Foreign Keys: Foreign keys reference a related table through the primary key of
that related table.

An ER schema may identify certain constraints to which the content of the data must
conform.

Two of the most important types of constraints are:


1. The mapping cardinality of a relationship indicates the number of instances in entity
E1 that can or must be associated with instances in entity E2:
a) One-One Relationship. For each entity instance in one entity there is at most one
associated entity instance in the other entity. For example, for each husband, there
is at most one current legal wife. A wife has at most one current legal husband.
b) Many-One Relationships. One entity instance in entity E2 is associated with zero
or more entity instances in entity E1, but each entity instance in E1 is associated
with at most one entity instance in E2. For example, a woman may have many
children but a child has only one birth mother.
c) Many-Many Relationships. There are no restrictions on how many entity instances
in either entity are associated with a single entity instance in the other. An example
of a many-to-many relationship would be students taking classes. Each student
takes many classes. Each class has many students.
Mapping cardinality is derived from cardinality constraints. The cardinality
constraint between two entities E1 and E2, denoted by (m,n), specifies that an
instance in E1 appears in E2 at least m and at most n times. Mapping cardinality
takes the maximum number of the cardinality constraint for each entity in a
relationship.
2. Existence dependence. If the existence of an entity instance x depends on the existence
of an entity instance y, then x is said to be existence dependent on y. If y is deleted, so is
x. For example, loan_payment is existence dependent on loan_number. If loan_number
is deleted, so is loan_payment.
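
The three mapping cardinalities can be told apart by counting how often each instance takes part in a relationship. The Python sketch below is a simplified illustration: the function name mapping_cardinality and the sample data are hypothetical, and classifying one set of pairs shows only what that data exhibits, not the constraint a designer has declared.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a relationship instance given as (e1, e2) pairs."""
    pairs = set(pairs)
    # Largest number of partners observed for any single instance on each side.
    max_per_e1 = max(Counter(e1 for e1, _ in pairs).values(), default=0)
    max_per_e2 = max(Counter(e2 for _, e2 in pairs).values(), default=0)
    if max_per_e1 <= 1 and max_per_e2 <= 1:
        return "one-one"
    if max_per_e1 <= 1:
        return "many-one"  # many E1 instances may share one E2 instance
    if max_per_e2 <= 1:
        return "one-many"
    return "many-many"


marriages = [("husband1", "wife1"), ("husband2", "wife2")]
births = [("child1", "mother1"), ("child2", "mother1"), ("child3", "mother2")]
enrolments = [("s1", "maths"), ("s1", "physics"), ("s2", "maths")]

print(mapping_cardinality(marriages))   # one-one
print(mapping_cardinality(births))      # many-one
print(mapping_cardinality(enrolments))  # many-many
```

The three sample relationships correspond to the husband and wife, child and birth mother, and student and class examples given above.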

Self-Assessment Questions – 3

10. ________ is the International Organization for Standardization (ISO) standard
language for interacting with an RDBMS.
11. The basic unit of ____________in a relational database is called a table.
12. A ___________combines data from separate database rows.
13. An RDBMS enables data ____________between users.
14. An RDBMS minimizes the ____of data.
15. A degree of a student is a _________ attribute.
16. The ___uniquely differentiates one entity instance from all others in the
entity.
17. A Concatenated Key is made up of parts which, when combined, become a
_________identifier.


5. OVERVIEW OF RELATIONAL QUERY OPTIMIZATION


The goal of a query optimizer is to find a good evaluation plan for a given query. The space
of plans considered by a typical relational query optimizer can be understood by
recognizing that a query is essentially treated as a σ-π-× (select-project-cross-product)
algebra expression, with the remaining operations (if any, in a given query) carried out on
the result of the σ-π-× expression. Optimizing such a relational algebra expression involves
two basic steps:

a) Enumerating alternative plans for evaluating the expression; typically, an optimizer
considers a subset of all possible plans because the number of possible plans is very
large.
b) Estimating the cost of each enumerated plan, and choosing the plan with the least
estimated cost.
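
The two steps can be sketched for a query that joins three relations. The Python example below is a toy, not how any real optimizer works: the relation sizes, the selectivities, and the cost measure (the total size of the intermediate results) are all assumptions made for illustration, and it enumerates every join order, where a real optimizer would consider only a subset.

```python
from itertools import permutations

# Hypothetical statistics: number of tuples in each relation.
NTUPLES = {"Students": 10_000, "Enrolled": 50_000, "Courses": 20}

# Hypothetical selectivity of each join condition (fraction of tuple pairs that match).
SELECTIVITY = {
    frozenset({"Students", "Enrolled"}): 1 / 10_000,
    frozenset({"Enrolled", "Courses"}): 1 / 200,
}


def estimate_cost(order):
    """Toy cost: total size of the intermediate results when joining in this order."""
    joined = [order[0]]
    size = NTUPLES[order[0]]
    cost = 0
    for rel in order[1:]:
        size *= NTUPLES[rel]
        for other in joined:
            # With no join condition between two relations, the result is a cross product.
            size *= SELECTIVITY.get(frozenset({other, rel}), 1.0)
        joined.append(rel)
        cost += size
    return cost


def choose_plan(relations):
    """Step (a): enumerate the plans; step (b): pick the least estimated cost."""
    plans = list(permutations(relations))
    return min(plans, key=estimate_cost)


best = choose_plan(["Students", "Enrolled", "Courses"])
print(best, estimate_cost(best))
```

Under these assumptions, the orders that begin by combining Students with Courses are the most expensive, because no join condition links the two and the first step degenerates into a cross product.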

The select and project operations of relational algebra are covered in more detail in Unit 7.

Self-Assessment Questions – 4

18. The goal of a query optimizer is to find a good evaluation plan for a given __________.
19. Optimizing a relational algebra expression involves _ _ _ _ _ basic steps.


6. SYSTEM CATALOG IN A RELATIONAL DBMS


We can store a relation using one of several alternative file structures, and we can create one
or more indexes each stored as a file on every relation. Conversely, in a relational DBMS,
every file contains either the tuples in a relation or the entries in an index. The collection of
files corresponding to users' relations and indexes represents the data in the database.

A fundamental property of a database system is that it maintains a description of all the data
that it contains. A relational DBMS maintains information about every relation and index that
it contains. The DBMS also maintains information about views, for which no tuples are stored
explicitly; rather, a definition of the view is stored and used to compute the tuples that belong
in the view when the view is queried. This information is stored in a collection of relations,
maintained by the system, called the catalog relations; an example of a catalog relation is
shown in Figure 5.8. The catalog relations are also called the system catalog, the catalog, or
the data dictionary. The system catalog is sometimes referred to as metadata; that is, not
data, but descriptive information about the data. The information in the system catalog is
used extensively for query optimization.

6.1 Information Stored in The System Catalog


Let us consider what is stored in the system catalog. At a minimum, we have system-wide
information, such as the size of the buffer pool and the page size, and the following
information about individual relations, indexes, and views:

For each relation:

• Its relation name, the file name (or some identifier), and the file structure (e.g., heap
file) of the file in which it is stored.
• The attribute name and type of each of its attributes.
• The index name of each index on the relation.
• The integrity constraints (e.g., primary key and foreign key constraints) on the relation.

For each index:

• The index name and the structure of the index.
• The search key attributes.

For each view:

• Its view name and definition.

In addition, statistics about relations and indexes are stored in the system catalogs and
updated periodically (not every time the underlying relations are modified). The following
information is commonly stored:
Cardinality: The number of tuples NTuples(R) for each relation R.
Size: The number of pages NPages(R) for each relation R.
Index Cardinality: Number of distinct key values NKeys(I) for each index I.
Index Size: The number of pages INPages(I) for each index I.
Index Height: The number of nonleaf levels IHeight(I) for each tree index I.
Index Range: The minimum present key value ILow(I) and the maximum present key
value IHigh(I) for each index I.
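
Several of these statistics are simple to compute from the data itself. The Python sketch below is illustrative only: the helper names relation_stats and index_stats, the sample tuples, and the figure of three tuples per page are assumptions, and index size and index height are left out because they depend on how the index is physically built.

```python
def relation_stats(tuples, tuples_per_page):
    """Cardinality NTuples(R) and size NPages(R) of a relation held as a list of tuples."""
    ntuples = len(tuples)
    npages = -(-ntuples // tuples_per_page)  # ceiling division
    return {"NTuples": ntuples, "NPages": npages}


def index_stats(tuples, key_position):
    """NKeys(I), ILow(I), and IHigh(I) for an index on one field of the relation."""
    keys = [t[key_position] for t in tuples]
    return {"NKeys": len(set(keys)), "ILow": min(keys), "IHigh": max(keys)}


students = [
    ("s1", "Asha", 19),
    ("s2", "Ravi", 18),
    ("s3", "Mina", 19),
    ("s4", "Joel", 20),
]

print(relation_stats(students, tuples_per_page=3))  # {'NTuples': 4, 'NPages': 2}
print(index_stats(students, key_position=2))        # {'NKeys': 3, 'ILow': 18, 'IHigh': 20}
```

Because the catalog is updated only periodically, the stored values may lag behind what such a computation would return on the current data.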

6.2 How Catalogs Are Stored


A very elegant aspect of a relational DBMS is that the system catalog is itself a collection of
relations. For example, we might store information about the attributes of relations in a
catalog relation called Attribute_Cat:
Attribute_Cat(attr_name: string, rel_name: string, type: string, position: integer)

Suppose that the database contains two relations:

Students(sid: string, name: string, login: string, age: integer, gpa: real)

Faculty(fid: string, fname: string, sal: real)

Figure 5.8 shows the tuples in the Attribute_Cat relation that describe the attributes of these
two relations. Notice that in addition to the tuples describing Students and Faculty, other
tuples (the first four listed) describe the four attributes of the Attribute_Cat relation itself!
These other tuples illustrate an important point: the catalog relations describe all the
relations in the database, including the catalog relations themselves. When information about
a relation is needed, it is obtained from the system catalog. Of course, at the implementation
level, whenever the DBMS needs to find the schema of a catalog relation, the code that
retrieves this information must be handled specially.


Fig 5.8: An Instance of the Attribute_Cat Relation

The fact that the system catalog is also a collection of relations is very useful. For example,
catalog relations can be queried just like any other relation, using the query language of the
DBMS! Further, all the techniques available for implementing and managing relations apply
directly to catalog relations. The choice of catalog relations and their schemas is not unique
and is made by the implementor of the DBMS. Real systems vary in their catalog schema
design, but the catalog is always implemented as a collection of relations, and it essentially
describes all the data stored in the database.
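
Querying a catalog with ordinary SQL can be tried in SQLite, whose catalog is a table named sqlite_master. The sketch below uses Python's built-in sqlite3 module and reshapes the catalog information into rows that resemble the Attribute_Cat relation discussed in this section; SQLite's own catalog layout is different, so this is an analogy rather than a reproduction of Figure 5.8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Faculty  (fid TEXT, fname TEXT, sal REAL);
""")

# sqlite_master is SQLite's catalog relation; it is queried like any other table.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Build rows shaped like (attr_name, rel_name, type, position).
attribute_cat = []
for rel_name in tables:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    for cid, attr_name, attr_type, *_ in conn.execute(f"PRAGMA table_info({rel_name})"):
        attribute_cat.append((attr_name, rel_name, attr_type, cid + 1))

for row in attribute_cat:
    print(row)
```

The printed rows list every attribute of Faculty and Students with its type and position, all obtained from the catalog rather than from the data.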

Self-Assessment Questions – 5

20. The catalog relations are also called the ____catalog.
21. For each view, the system catalog stores the view name and _______.
22. Index ______means the number of distinct key values NKeys(I) for each index I.


7. SUMMARY
In this unit, we first took an informal look at the relational model. We saw that the main
construct for representing data in the relational model is a relation, which consists of a
relation schema and a relation instance. We also dealt with relational database
management systems and discussed RDBMS properties. We briefly presented the
entity-relationship model as a tool for analyzing the semantic features of an application that
are independent of events. Finally, we discussed query optimization and system catalogs in
relational database management systems. We dealt with the catalog relations, also known as
the system catalog, the catalog, or the data dictionary. The system catalog is sometimes
referred to as metadata, which is not data but descriptive information about the data.

8. TERMINAL QUESTIONS
1. Create a relational schema to hold information about insurance policies and
policyholders. Insurance policies have properties policyNo, startDate, premium,
renewalDate, policyType. Policyholders are characterized by holderNo, holderName,
holderAddress and holderTelno.
a. What would be suitable primary keys in this database?
b. How do we relate insurance policy information with policyholder information?
c. Declare a suitable domain for policy type.
2. What is the goal of query optimization? Why is it important?
3. What information is stored in the system catalogs?
4. What are the benefits of making the system catalogs relations?


9. ANSWERS
Self Assessment Questions
1. relation schema
2. records
3. domain constraints
4. Structures
5. Manipulate
6. Consistency
7. Table
8. Primary
9. Field
10. SQL
11. data storage
12. join
13. sharing
14. redundancy
15. Multivalued
16. key
17. unique
18. query
19. two
20. System
21. Definition
22. Cardinality


Terminal Questions
1. (a) You can create a relational schema of your own. In this case, policyNo would be a
suitable primary key for the policy relation and holderNo for the policyholder relation.
(b) Insurance policy information and policyholder information can be related through
a foreign key: for example, the policy relation can include holderNo as a foreign key
that refers to the primary key of the policyholder relation.
(c) You can declare a domain of your own, or use the policy types offered by an existing
insurance company as the set of allowed values. (Refer entire unit for details)

2. The goal of a query optimizer is to find a good evaluation plan for a given query.
(Refer section 5)

3. For each relation, the system catalog stores the relation name, the file name, the file
structure, the name and type of each attribute, the index name of each index on the
relation, and the integrity constraints on the relation. Information about indexes and
views is stored as well. (Refer section 6.1 for detail)

4. The main benefit of making the system catalogs relations is that the catalog relations
can be queried just like any other relation, using the query language of the DBMS.
(Refer section 6.2 for detail)


Unit 6
SQL – 1
Table of Contents

SL No  Topic                            Fig No / Table / Graph   SAQ / Activity
1      Introduction to SQL              -                        -
1.1    Objectives                       -                        -
2      Categories of SQL Commands       -                        1
3      Data Definition                  -                        2
4      Data Manipulation Statements     -                        3
4.1    SELECT – The Basic Form          -                        -
4.2    Subqueries                       -                        -
4.3    GROUP BY Feature                 -                        -
4.4    Updating the Database            -                        -
4.5    Data Definition Facilities       -                        -
5      Summary                          -                        -
6      Terminal Questions               1, 2                     -
7      Answers                          -                        -


1. INTRODUCTION TO SQL
SQL is an acronym for Structured Query Language. It is available in a number of database
management packages based on the relational model of data, for example, in IBM's DB2
and in UNIFY from the Unify Corporation.

Originally defined by D. D. Chamberlin in 1974, SQL has undergone a number of modifications
over the years. Today, SQL is an official ANSI standard.

It allows for data definition, manipulation, and data control for a relational database. The
data definition facilities of SQL permit the definition of relations and various alternative
views of relations. Further, the data control facility gives features for one user to authorize
other users to access his data. This facility also permits assertions to be made about data
integrity. All the three major facilities of SQL, namely, data manipulation, data definition, and
data control are bound together in one integrated language framework.

1.1 Objectives:
After going through Unit 6, you should be able to:

❖ Differentiate among SQL commands
❖ List data manipulation commands
❖ List data definition commands
❖ Make queries using data manipulation commands
❖ Understand different views and their definition
❖ Understand embedded SQL and Transactions


2. CATEGORIES OF SQL COMMANDS


SQL commands can be roughly divided into three major categories with regard to their
functionality. First, there are those used to create and maintain the database structure. The
second category includes the commands that manipulate the data in such structures, and
the third includes those that control the use of the database. Having all this functionality in a
single language is a clear advantage over many other systems, and certainly contributes
to SQL's reputation for being easy to use.

It is worth naming these three fundamental types of commands for future reference. Those that
create and maintain the database are grouped into the class called DDL, or Data Definition
Language, statements. Those used to manipulate data in the tables, of which there are four, are
the DML, or Data Manipulation Language, commands. To control usage of the data, the DCL (Data
Control Language) commands are used. These three classes, plus one or two additions,
define SQL. SQL therefore has no environmental statements of the kind found in COBOL:
no statements to control program flow (if/then/else, perform, goto) and no commands to
open and close files or to read individual records. At this level, it is easy to see where SQL
gets its end-user-tool and easy-to-use tags.

The Data Definition Statements


To construct and administer the database there are two major DDL statements, CREATE
and DROP, which form the backbone of many commands:

CREATE DATABASE to create a database
DROP DATABASE to remove a database
CREATE TABLE to create a table
DROP TABLE to drop a table
CREATE INDEX to create an index on a column
DROP INDEX to drop an index
CREATE VIEW to create a view
DROP VIEW to drop a view

There may be some additional ones, such as ALTER TABLE or MODIFY DATABASE, whose
availability and exact form vary between vendors.

The Data Manipulation Statements


To manipulate data in tables directly or through views, we use the four standard DML
statements:
SELECT, DELETE, INSERT, UPDATE

These statements are now universally accepted, as is their functionality, although the degree
to which products support this functionality varies somewhat. Compare, for example, the
functionality of different implementations of UPDATE.

Data Control
This deals with three issues:
a) Recovery and Concurrency
Concurrency is concerned with the manner in which multiple users operate upon the
database.

Each user can either make the updates of a transaction permanent by using COMMIT or cancel
all the updates of a transaction by using ROLLBACK.
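
The effect of COMMIT and ROLLBACK can be tried out with a small sketch. It uses Python's standard sqlite3 module purely as a convenient way to run SQL; the table S, the column names SNO and TURNOVER, and the sample row are assumptions made for illustration (SQLite does not accept S# as a plain column name).

```python
import sqlite3

# In-memory database; table, columns, and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO INTEGER PRIMARY KEY, TURNOVER INTEGER)")
conn.execute("INSERT INTO S VALUES (13, 20)")
conn.commit()  # COMMIT: the insert becomes permanent

conn.execute("UPDATE S SET TURNOVER = TURNOVER + 20 WHERE SNO = 13")
conn.rollback()  # ROLLBACK: the uncommitted update is cancelled

turnover_after_rollback = conn.execute(
    "SELECT TURNOVER FROM S WHERE SNO = 13").fetchone()[0]
print(turnover_after_rollback)
```

Because the update was rolled back before it was committed, the stored turnover is still the committed value.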

b) Security
Security has two aspects to it.

The first is the VIEW mechanism. A view of a relation can be created which hides the sensitive
information and defines only that part of a relation that should be visible. A user can then be
allowed to access this view.
CREATE VIEW LOCAL AS
SELECT * FROM SUPPLIER
WHERE [Link] = 'Delhi'
The above view reveals only the suppliers of Delhi.
The second is the GRANT operation. This grants one or more access rights to
perform data manipulation operations on the relations.

c) Integrity Constraints
For example, one can specify that an attribute of a relation will not take on null values.


Self-Assessment Questions – 1

1. SQL commands can be roughly divided into ________ major categories with
regard to their functionality.
2. To construct and administer the database there are ________ major DDL
statements.
3. To manipulate data in tables directly or through views we use the four standard
________ statements.
4. ________ is concerned with the manner in which multiple users operate upon
the database.
5. A ________ of a relation can be created which hides the sensitive information and
defines only that part of a relation that should be visible.


3. DATA DEFINITION
Data definition in SQL is via the create statement. The statement can be used to create a table,
index, or view (i.e., a virtual table based on existing tables). To create a table, the create
statement specifies the name of the table and the name and data type of each column of
the table. Its format is:
create table <relation> (<attribute list>)

where the attribute list is specified as:

<attribute list> ::= <attribute name> <data type> [not null] [, <attribute list>]

The data types supported by SQL depend on the particular implementation. However, the
following data types are generally included: integer, decimal, real (i.e., floating point values),
and character strings, both of fixed size and varying length. A number of ranges of values for
the integer data type are generally supported, for example, integer and short int. The decimal
value declaration requires the specification of the total number of decimal digits for the value
and (optionally) the number of digits to the right of the decimal point. The number of
fractional decimal digits is assumed to be zero if only the total number of digits is specified.

<data type> ::= <integer> | <short int> | <char(n)> | <float> | <decimal(p[,q])>

In addition, some implementations support additional data types such as bit strings,
graphical strings, logical, date, and time. Some DBMSs support the concept of date. One
possible implementation of date could be as eight unsigned decimal digits representing the
date in the yyyymmdd format. Here yyyy represents the year, mm represents the month, and
dd represents the day. Two dates can be compared to find the one that is larger and hence
occurs later. The system ensures that only legal date values are inserted (20081316, for
example, would be illegal because there is no month 13), and functions are provided to
perform operations such as adding a number of days to a date to come up with another date,
or subtracting a date from the current date to find the number of days, months, or years.
Date constants are provided either in the format given above or as a character string in one
of the following formats: mm/dd/yy; mm/dd/yyyy; dd-mmm-yy; dd-mmm-yyyy. In this text,
we represent a date constant as eight unsigned decimal digits in the format yyyymmdd.


The employee relation for the hotel database can be defined using the create table statement
given below. Here, Empl_No is specified to be not null to prevent this unique identifier
from having a null value. SQL supports the concept of null values and, unless a column is
declared with the not null option, it can be assigned a null value.

create table EMPLOYEE
(
Empl_No integer not null,
Name char(25),
Skill char(20),
Pay_Rate decimal
)

The definition of an existing relation can be altered by using the alter statement. This
statement allows a new column to be added to an existing relation. The existing tuples of the
altered relation are logically considered to be assigned the null value for the added column.
The physical alteration occurs to a tuple only during an update of the record.

alter table existing-table-name
add column-name data-type

alter table EMPLOYEE
add Phone_Number decimal(10)
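
The two statements above can be exercised with a minimal sketch, assuming Python's standard sqlite3 module as the SQL engine. The sample employees are invented for illustration, and SQLite spells the second statement ALTER TABLE ... ADD COLUMN; other products may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        Empl_No  INTEGER NOT NULL,
        Name     CHAR(25),
        Skill    CHAR(20),
        Pay_Rate DECIMAL
    )""")
# Sample row: the values are invented for this sketch.
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Asha', 'Chef', 12.5)")

# Adding a column: existing rows are treated as holding NULL in it.
conn.execute("ALTER TABLE EMPLOYEE ADD COLUMN Phone_Number DECIMAL(10)")
phone = conn.execute("SELECT Phone_Number FROM EMPLOYEE").fetchone()[0]

# NOT NULL rejects a row that supplies no Empl_No.
try:
    conn.execute("INSERT INTO EMPLOYEE (Name) VALUES ('Ravi')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
```

Here phone comes back as Python's None, which is how sqlite3 reports a null value, and the insert without an employee number is refused.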

The create index statement allows the creation of an index for an already existing relation.
The columns to be used in the generation of the index are also specified. The index is named,
and the ordering for each column used in the index can be specified as either ascending or
descending. The cluster option can be specified to indicate that the records are to be placed
in physical proximity to each other. The unique option specifies that only one record can
exist at any time with a given value for the column(s) specified in the statement to create the
index. (An index is only an access aid, so this is arguably the wrong place to declare a
primary key.) Such columns, for instance, could form the primary key of the relation, and hence
duplicate tuples are not allowed. One case is the ORDER relation, where the key is the
combination of the attributes Bill# and Dish#. In the case of an existing relation, an attempt
to create an index with the unique option will not succeed if the relation does not satisfy this
uniqueness criterion. The syntax of the create index statement is shown below:
create [unique] index name-of-index
on existing-table-name
(column-name [asc | desc]
[, column-name [asc | desc] ...])
[cluster]

The following statement causes an index called empindex to be built on the columns Name
and Pay_Rate. The entries in the index are ascending by Name value and descending by
Pay_Rate. In this example, there are no restrictions on the number of records with the same
Name and Pay_Rate.
create index empindex
on EMPLOYEE (Name asc, Pay_Rate desc);
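
The difference between a plain index and a unique index can be checked with a short sketch. It assumes Python's standard sqlite3 module; the sample rows and the index names bad_index and emp_no_index are hypothetical, and SQLite has no cluster option, so that part is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (Empl_No INTEGER NOT NULL, Name CHAR(25), Pay_Rate DECIMAL)")
# Two invented employees who share a name and a pay rate.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
                 [(1, "Asha", 12.5), (2, "Asha", 12.5)])

# Non-unique index: duplicate (Name, Pay_Rate) pairs are allowed.
conn.execute("CREATE INDEX empindex ON EMPLOYEE (Name ASC, Pay_Rate DESC)")

# A unique index cannot be built over data that already contains duplicates.
try:
    conn.execute("CREATE UNIQUE INDEX bad_index ON EMPLOYEE (Name)")
    unique_on_name_created = True
except sqlite3.IntegrityError:
    unique_on_name_created = False

# A unique index on Empl_No succeeds and then blocks duplicate numbers.
conn.execute("CREATE UNIQUE INDEX emp_no_index ON EMPLOYEE (Empl_No)")
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Ravi', 9.0)")
    duplicate_inserted = True
except sqlite3.IntegrityError:
    duplicate_inserted = False
```

Both attempts that would break uniqueness are rejected, which matches the rule that a unique index cannot be created on a relation that does not already satisfy the uniqueness criterion.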
An existing relation or index can be deleted from the database by the drop SQL statement.
The syntax of the drop statement is as follows:

drop table existing-table-name;
drop index existing-index-name;

Self-Assessment Questions – 2

6. Data definition in SQL is via the ________ statement.
7. The data types supported by SQL depend on the particular ________.
8. The definition of an existing relation can be altered by using the ________
statement.
9. An existing relation or index could be deleted from the database by the ________
SQL statement.


4. DATA MANIPULATION STATEMENTS


Data manipulation capabilities allow one to retrieve and modify the contents of the database.
The most important of these is the SELECT operation, which allows data to be retrieved from
the database. The relations used in the rest of the unit are described below.

There are parts that are supplied by suppliers. S contains the details about each supplier.
Turnover for a supplier is in terms of lakhs of rupees. Information regarding suppliers of
specific parts is contained in SP, whereas information about the parts themselves is
contained in P. As the queries that follow show, S holds the supplier number (S#), name,
city, and turnover; SP holds the supplier number, part number (P#), and quantity (QTY); and
P holds the part number, weight, colour, cost, and selling price.

4.1 SELECT – The Basic Form


The select statement specifies the method of selecting the tuples of the relation(s). The
tuples processed are from one or more relations specified by the from clause of the select
statement. The basic form of SELECT is
select <target list>
from <relation list>
[where <predicate>]

SELECT lists the attributes to be selected.
FROM names the relations from which information is to be used.
WHERE gives the condition. The rows that qualify are those for which the condition evaluates
to true. The condition is a single predicate or a collection of predicates combined using the
Boolean operators AND, OR, and NOT.

The column names following SELECT are retrieved from the relations specified in the
FROM part. WHERE specifies the condition that the tuples must satisfy in order to be part of
the result.


Below we shall state first the retrieval query in English and then specify its SQL equivalent.
Unqualified Retrieval
1. Get the part numbers of all the parts being supplied.
SELECT P#
FROM SP
P#
1
1
2
2
3
3
1
Part numbers getting repeated? That's right. SELECT does not eliminate duplicate rows
(unlike the project operation of the relational algebra). To eliminate them, DISTINCT is used.

2. Get the part numbers of all the parts being supplied with no duplication.
SELECT DISTINCT P#
FROM SP
P#
1
2
3
If all the columns of the relation are to be retrieved, one need not list all of them. An
asterisk (*) can be specified after SELECT to indicate retrieval of the entire relation.

3. Get full details of all suppliers.
SELECT *
FROM S

The ORDER BY clause
The result of a query can be ordered either in ascending (ASC) order or in descending
(DESC) order.


4. Get the supplier numbers and turnover in descending order of turnover.
SELECT S#, TURNOVER
FROM S
ORDER BY TURNOVER DESC
S# TURNOVER
11 100
12 70
10 50
13 20
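
DISTINCT and ORDER BY can be tried together in a small sketch, assuming Python's standard sqlite3 module. The column names SNO and CITY stand in for S# and the city attribute, and the supplier rows are a guess at the sample data: the numbers and turnovers follow the result above, while the name and city of supplier 12 are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO INTEGER, SNAME TEXT, CITY TEXT, TURNOVER INTEGER)")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    (10, "CAUVERY", "BANGALORE", 50),
    (11, "NARMADA", "BOMBAY", 100),
    (12, "GODAVARI", "DELHI", 70),   # name and city invented for the sketch
    (13, "TAPI", "BOMBAY", 20),
])

# DISTINCT removes the repeated BOMBAY; ORDER BY defaults to ascending.
cities = [row[0] for row in conn.execute(
    "SELECT DISTINCT CITY FROM S ORDER BY CITY")]

# Descending order of turnover, as in query 4.
by_turnover = conn.execute(
    "SELECT SNO, TURNOVER FROM S ORDER BY TURNOVER DESC").fetchall()
```

With this data, by_turnover lists suppliers 11, 12, 10, and 13 in that order, matching the result shown for query 4.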
Instead of a column name, the ordinal position of the column in the result can be
used. That is, the above query can be written as
SELECT S#, TURNOVER
FROM S
ORDER BY 2 DESC

5. Get the details of suppliers who operate from Bombay with a turnover greater than 50.
SELECT S.*
FROM S
WHERE CITY = 'BOMBAY' AND TURNOVER > 50
S# SNAME SCITY TURNOVER
11 NARMADA BOMBAY 100
The above form is a conjunction of comparison predicates. A comparison predicate is of the
form
scalar-expr O scalar-expr

where O is any of the six relational operators
=, <>, <, >, <=, >=
and a scalar expression is an arithmetic expression built from the operators +, -, *, / and
operands that are column names, functions, or constants.

BETWEEN Predicate


6. Get the part numbers and weights of parts weighing between 25 and 35
SELECT P#, WEIGHT
FROM P
WHERE WEIGHT BETWEEN 25 AND 35
P# WEIGHT
1 25
2 30

The use of BETWEEN gives the range within which the values must lie; the end points are
included. If the value should lie outside a range, then BETWEEN is to be preceded by NOT.
For example,
SELECT P#, WEIGHT
FROM P
WHERE WEIGHT NOT BETWEEN 25 AND 35
P# WEIGHT
3 45
would retrieve all parts whose weight is less than 25 or greater than 35, as shown
above.

LIKE Predicate
This predicate is used for pattern matching. A column of type char can be compared with a
string constant. LIKE does not look for an exact match but for a wildcard match.
A % or _ can appear in the string constant, where
% stands for a sequence of n (>= 0) characters
_ stands for a single character

Examples
ADDRESS LIKE '%Bangalore%' – ADDRESS must contain Bangalore somewhere within
it if the match is to succeed.
STRANGE_STRING LIKE '\_%' ESCAPE '\'

Here, the normal meaning of _ is overridden by the use of the escape character.
STRANGE_STRING above will match any string beginning with _.
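
The three uses of LIKE described above can be compared in a short sketch, assuming Python's standard sqlite3 module. The table T and its four values are invented for illustration; note that SQLite's LIKE ignores case for ASCII letters, which other products may not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (ADDRESS TEXT)")
# Invented sample values.
conn.executemany("INSERT INTO T VALUES (?)", [
    ("12 MG Road, Bangalore 560001",),
    ("_internal",),
    ("Xinternal",),
    ("Mysore",),
])

# % matches any run of characters, including none.
contains = [r[0] for r in conn.execute(
    "SELECT ADDRESS FROM T WHERE ADDRESS LIKE '%Bangalore%'")]

# _ matches exactly one character, so both nine-character values qualify.
one_char = [r[0] for r in conn.execute(
    "SELECT ADDRESS FROM T WHERE ADDRESS LIKE '_internal' ORDER BY ADDRESS")]

# With ESCAPE, \_ stands for a literal underscore.
literal = [r[0] for r in conn.execute(
    "SELECT ADDRESS FROM T WHERE ADDRESS LIKE '\\_%' ESCAPE '\\'")]
```

Only the escaped pattern singles out the value that really begins with an underscore; the unescaped pattern also accepts Xinternal.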


7. Get the names and cities of suppliers whose names begin with C
SELECT SNAME, SCITY
FROM S
WHERE SNAME LIKE 'C%'
SNAME SCITY
CAUVERY BANGALORE

When the data is to be retrieved from more than one relation, all the relation
names are specified in the FROM clause and the join condition is given in the WHERE part.

8. Get pairs of supplier numbers such that both operate from the same city.
SELECT FIRST.S#, SECOND.S#
FROM S FIRST, S SECOND
WHERE [Link] = [Link]
AND FIRST.S#< > SECOND.S#
FIRST and SECOND are tuple variables, both ranging over S. The last line prevents a
supplier from being paired with itself.
S# S#
11 13
13 11

But we see that suppliers 11 and 13 are paired twice. Can that
be avoided? How about using < instead of <>?
SELECT FIRST.S#, SECOND.S#
FROM S FIRST, S SECOND
WHERE [Link] = [Link]
AND FIRST.S# < SECOND.S#
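
The effect of replacing <> with < in the self-join can be checked with a minimal sketch, assuming Python's standard sqlite3 module. The aliases FIRST_S and SECOND_S, the column names SNO and CITY, and the supplier rows are assumptions made for illustration; only suppliers 11 and 13 are taken to share a city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO INTEGER, SNAME TEXT, CITY TEXT, TURNOVER INTEGER)")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    (10, "CAUVERY", "BANGALORE", 50),
    (11, "NARMADA", "BOMBAY", 100),
    (12, "GODAVARI", "DELHI", 70),   # name and city invented for the sketch
    (13, "TAPI", "BOMBAY", 20),
])

JOIN = """
    SELECT FIRST_S.SNO, SECOND_S.SNO
    FROM S FIRST_S, S SECOND_S
    WHERE FIRST_S.CITY = SECOND_S.CITY
      AND FIRST_S.SNO {op} SECOND_S.SNO
    ORDER BY FIRST_S.SNO
"""

# <> only rules out pairing a supplier with itself, so each pair appears twice.
pairs_not_equal = conn.execute(JOIN.format(op="<>")).fetchall()

# < keeps each unordered pair exactly once.
pairs_less_than = conn.execute(JOIN.format(op="<")).fetchall()
```

The second form returns the single pair (11, 13), whereas the first returns it in both orders.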
IN Predicate
This is to be used whenever you want to test whether an attribute value is one of a set of
values. For example,

9. Get the part numbers of parts whose selling price is 20, 40, or 45
SELECT P.P#, SELLING PRICE
FROM P
WHERE SELLING PRICE IN (20,40,45)
P# SELLING PRICE
2 45
3 45
It is a quicker way of specifying a series of comparisons. The format of the predicate is
scalar-expr [NOT] IN (atom list)

4.2 Subqueries
The expression following WHERE can be either a simple predicate as explained above or it
can contain a query itself! This query following WHERE is called a subquery.

A subquery, which in turn is a query, can have its own subquery, and the process of specifying
subqueries can continue ad infinitum! More practically, the process ends once the query has
been fully expressed as an SQL statement.

Subqueries can appear when using the comparison predicate, the IN predicate and when
quantifiers are used.

Comparison Predicate
10. Get the supplier numbers and names of suppliers who are located in the same city as TAPI.
SELECT S.S#, SNAME
FROM S
WHERE [Link] = (SELECT CITY
FROM S
WHERE SNAME = 'TAPI')

The inner select (subquery) retrieves the city of the supplier named TAPI. The outer select
(the main one) then compares the city of each supplier in the supplier relation and picks up
those where the comparison succeeds.


S# SNAME
11 NARMADA
13 TAPI

Notice that the subquery appears after the comparison operator. The format of this form of
expression is
scalar-expr operator subquery

IN Predicate
In this form, the subquery selects a set of values. The outer query checks whether the
value of a specified attribute is in this set.
11. Get the names of suppliers who supply part 2
SELECT [Link]
FROM S
WHERE S.S# IN
(SELECT SP.S#
FROM SP
WHERE SP.P#= 2)
SNAME
CAUVERY
NARMADA

The above query can be equivalently expressed as


SELECT [Link]
FROM S
WHERE 2 IN
(SELECT P#
FROM SP
WHERE S.S#=S#)
In the subquery, the second S# is unqualified and therefore refers to SP. That is because every
unqualified attribute name is implicitly qualified with the relation name from the nearest
applicable FROM clause.
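
The subquery forms discussed so far can be run side by side in a short sketch, assuming Python's standard sqlite3 module. The column names SNO, PNO, and CITY and all rows of S and SP are assumptions chosen to be consistent with the results shown in this unit; they are not the unit's actual sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO INTEGER, SNAME TEXT, CITY TEXT, TURNOVER INTEGER)")
conn.execute("CREATE TABLE SP (SNO INTEGER, PNO INTEGER, QTY INTEGER)")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    (10, "CAUVERY", "BANGALORE", 50), (11, "NARMADA", "BOMBAY", 100),
    (12, "GODAVARI", "DELHI", 70), (13, "TAPI", "BOMBAY", 20),
])
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    (10, 1, 50), (11, 1, 50), (10, 2, 40), (11, 2, 40),
    (12, 3, 60), (13, 3, 50), (13, 1, 25),
])

# Comparison predicate: the subquery yields one value (TAPI's city).
same_city = [r[0] for r in conn.execute("""
    SELECT SNO FROM S
    WHERE CITY = (SELECT CITY FROM S WHERE SNAME = 'TAPI')
    ORDER BY SNO""")]

# IN predicate: the subquery yields the set of suppliers of part 2.
with_in = [r[0] for r in conn.execute("""
    SELECT SNAME FROM S
    WHERE SNO IN (SELECT SNO FROM SP WHERE PNO = 2)
    ORDER BY SNAME""")]

# Equivalent correlated form: for each supplier, is 2 among its parts?
correlated = [r[0] for r in conn.execute("""
    SELECT SNAME FROM S
    WHERE 2 IN (SELECT PNO FROM SP WHERE SP.SNO = S.SNO)
    ORDER BY SNAME""")]
```

The last two queries return the same names, which illustrates that the IN form and the correlated form are equivalent.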


Quantified predicates
The two quantifiers that can be used are ALL and ANY. ANY stands for the existential
quantifier and ALL for the universal quantifier.

Let's first look at ANY. It can be specified in a comparison predicate just following the
comparison operator. That is,
scalar-expr O ANY subquery

The subquery is first evaluated to give a set of values. The above expression is true if the
comparison O holds between the scalar-expr and at least one of the values that form the
result of the subquery.

12. Get the part numbers for parts whose cost is less than the current maximum cost.
SELECT P#, COST
FROM P
WHERE COST < ANY
(SELECT COST
FROM P)
P# COST
1 10
2 15

The inner select gets the cost of all the parts. In the outer select, a P# is selected if its cost is
less than some element of the set selected in the earlier step.

4.3 Aggregate Functions


Some standard functions are defined in SQL and can be used when framing queries. There
are five aggregate functions. These are

COUNT – number of values in a column
SUM – sum of the values in a column
AVG – average of the values in a column
MAX – maximum of all the values in a column
MIN – minimum of all the values in a column

If the function name is followed by the word DISTINCT, as in COUNT(DISTINCT CITY), then only
unique values are used. On the other hand, if ALL follows the function name, then all the
values are used for evaluating the function. ALL is the default.
COUNT(*) has a special meaning in that it counts the number of rows of a relation. In standard
SQL, COUNT(column) counts the non-null values in the column, duplicates included, so
DISTINCT must be written explicitly when only the distinct values are to be counted.

13. Get the total number of suppliers


SELECT COUNT(*)
FROM S
COUNT (*) counts the number of tuples of S and hence, the number of suppliers.

14. Get the total quantity of Part 2 that is supplied


SELECT SUM([Link])
FROM SP
WHERE SP.P#=2
The answer is one of
a) 25 b) 40 c) 80 d) 100

15. Get the part numbers whose cost is greater than the average cost.
SELECT P#
FROM P
WHERE COST >
(SELECT AVG (COST)
FROM P)

16. Get the names of suppliers who supply from a city where there is at least one more
supplier
SELECT SNAME
FROM S FIRST
WHERE 2 <=
(SELECT COUNT (CITY)
FROM S
WHERE CITY = [Link])
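
The aggregate functions, and the correlated COUNT of query 16, can be checked with a small sketch, assuming Python's standard sqlite3 module. The alias F, the column names SNO and CITY, and the supplier rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S (SNO INTEGER, SNAME TEXT, CITY TEXT, TURNOVER INTEGER)")
conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", [
    (10, "CAUVERY", "BANGALORE", 50), (11, "NARMADA", "BOMBAY", 100),
    (12, "GODAVARI", "DELHI", 70), (13, "TAPI", "BOMBAY", 20),
])

# COUNT(*) counts rows; COUNT(CITY) counts non-null values, duplicates
# included; COUNT(DISTINCT CITY) counts each city once.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(CITY), COUNT(DISTINCT CITY) FROM S").fetchone()

# MAX, MIN, and SUM over the turnover column.
stats = conn.execute(
    "SELECT MAX(TURNOVER), MIN(TURNOVER), SUM(TURNOVER) FROM S").fetchone()

# Query 16: suppliers whose city holds at least two suppliers.
shared_city = [r[0] for r in conn.execute("""
    SELECT SNAME FROM S F
    WHERE 2 <= (SELECT COUNT(*) FROM S WHERE CITY = F.CITY)
    ORDER BY SNAME""")]
```

The comparison in the last query is written as 2 <= count, that is, the city must contain two or more suppliers including the one being tested.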


4.4 GROUP BY Feature


This feature allows one to partition the result into a number of groups such that all rows of
the group have the same value in some specified column.

17. Get the part number and the total quantity.
SELECT P#, SUM(QTY)
FROM SP
GROUP BY P#
P# SUM(QTY)
1 125
2 80
3 110
GROUP BY groups together all the rows which have the same value for P#. The function SUM
is then applied to each group. That is, the result consists of a part number along with the total
quantity in which it is supplied.

When a condition applies to whole groups rather than to individual rows, the HAVING clause
is used instead of WHERE. The meaning of HAVING is the same as that of WHERE except that
the condition is applied to each group. A WHERE clause can still be used together with
GROUP BY; it selects the rows before they are grouped.

18. Get the part numbers for parts supplied by more than one supplier.
SELECT P#
FROM SP
GROUP BY P#
HAVING COUNT(*) > 1

Each group contains one or more tuples which have the same part number. COUNT(*) is
applied to each such group.

Every part in SP is supplied by more than one supplier, so the result is

P#
1
2
3
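
The interplay of WHERE, GROUP BY, and HAVING can be seen in a minimal sketch, assuming Python's standard sqlite3 module. The column names SNO and PNO and the rows of SP are assumptions; the quantities were chosen so that the totals agree with the result of query 17.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SP (SNO INTEGER, PNO INTEGER, QTY INTEGER)")
conn.executemany("INSERT INTO SP VALUES (?, ?, ?)", [
    (10, 1, 50), (11, 1, 50), (10, 2, 40), (11, 2, 40),
    (12, 3, 60), (13, 3, 50), (13, 1, 25),
])

# One output row per part number, with SUM applied inside each group.
totals = conn.execute(
    "SELECT PNO, SUM(QTY) FROM SP GROUP BY PNO ORDER BY PNO").fetchall()

# HAVING keeps only the groups that contain more than one row.
multi_supplier = [r[0] for r in conn.execute(
    "SELECT PNO FROM SP GROUP BY PNO HAVING COUNT(*) > 1 ORDER BY PNO")]

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
large_supplies = conn.execute("""
    SELECT PNO, SUM(QTY) FROM SP
    WHERE QTY >= 50
    GROUP BY PNO
    HAVING COUNT(*) > 1
    ORDER BY PNO""").fetchall()
```

In the last query, part 2 disappears because none of its rows survive the WHERE condition, so no group is ever formed for it.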


4.5 Updating the Database


The contents of the database can be modified by inserting a new tuple, deleting an existing
tuple, or changing the values of attributes of one or more tuples.

INSERT
The insertion facility allows new tuples to be inserted into given relations. Attributes which
are not specified by the insertion statement are given null values. Consider

1. Add a part with part number 14, weight 10, coloured red, with the cost and selling price
as 20 and 60 respectively
INSERT INTO P VALUES (14, 10, 'red', 20, 60)

The tuple is inserted into P.

If not all the fields are known, a tuple can still be added. The attributes whose values are
not specified will have a null value.
INSERT INTO P (P#, COLOUR) VALUES (15, 'green')

The values of the weight, cost, and selling price fields, which are not specified, are
assumed to be null.

2. Let us assume that there is a relation called RED-PART with one column, P#. Add to it the
part numbers of all red parts.
INSERT INTO RED-PART
SELECT P#
FROM P
WHERE COLOUR = 'red'

The part numbers of the tuples of P whose colour is red are identified and inserted into the
relation RED-PART.


DELETE
The deletion facility removes specified tuples from the database. Consider
1. Delete supplier 13
DELETE FROM S
WHERE S# = 13
Since S# is the primary key, only one tuple will be deleted from S.

2. Delete all suppliers who supply from Bangalore
DELETE FROM S
WHERE SCITY = 'BANGALORE'
Here, more than one supplier can get deleted.

3. Delete all suppliers
DELETE FROM S
The definition of S still exists, but the relation is empty.

UPDATE
When columns are to be modified, the SET clause is used. This clause specifies the update to be
made to the selected tuples.
1. Change the city of supplier 13 to Bangalore and increase the turnover by 20 lakhs.
UPDATE S
SET CITY = 'BANGALORE',
TURNOVER = TURNOVER + 20
WHERE S# = 13

2. Increase quantity by 10 for all supplies of red coloured parts.
UPDATE SP
SET QTY = QTY + 10
WHERE P# IN
(SELECT P#
FROM P
WHERE COLOUR = 'RED')
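
The three update operations can be tried in sequence with a short sketch, assuming Python's standard sqlite3 module. The column names PNO and PRICE, the table name RED_PART (SQLite does not accept RED-PART or P# as plain names), and the two initial parts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE P (PNO INTEGER PRIMARY KEY, WEIGHT INTEGER,
                                COLOUR TEXT, COST INTEGER, PRICE INTEGER)""")
conn.execute("CREATE TABLE RED_PART (PNO INTEGER)")
conn.executemany("INSERT INTO P VALUES (?, ?, ?, ?, ?)",
                 [(1, 25, "red", 10, 30), (2, 30, "blue", 15, 45)])

# INSERT with every value, then with only some columns named.
conn.execute("INSERT INTO P VALUES (14, 10, 'red', 20, 60)")
conn.execute("INSERT INTO P (PNO, COLOUR) VALUES (15, 'green')")
weight_15 = conn.execute("SELECT WEIGHT FROM P WHERE PNO = 15").fetchone()[0]

# INSERT ... SELECT copies the result of a query into another relation.
conn.execute("INSERT INTO RED_PART SELECT PNO FROM P WHERE COLOUR = 'red'")
red_parts = [r[0] for r in conn.execute("SELECT PNO FROM RED_PART ORDER BY PNO")]

# UPDATE: several columns are changed in one SET clause, separated by commas.
conn.execute("UPDATE P SET COLOUR = 'black', COST = COST + 5 WHERE PNO = 2")
part_2 = conn.execute("SELECT COLOUR, COST FROM P WHERE PNO = 2").fetchone()

# DELETE removes every tuple that satisfies the condition.
deleted = conn.execute("DELETE FROM P WHERE COLOUR = 'red'").rowcount
remaining = conn.execute("SELECT COUNT(*) FROM P").fetchone()[0]
```

The unspecified weight of part 15 is reported as None (null), and the DELETE removes both red parts while leaving the definition of P in place.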


4.6 Data Definition Facilities


Data definition facilities permit users to create and drop relations, and to define alternative
views of relations.

The CREATE statement is used to define a relation. The name of the relation to be created and
its various fields, together with their data types, must be specified. If a certain attribute is
barred from containing null values, then a NOT NULL specification must be made for it.

It must be noted that the word TABLE is used in this syntax instead of
RELATION.
Example
CREATE TABLE DEPT
(DNO CHAR(2) NOT NULL,
DNAME VARCHAR(12),
LOC VARCHAR(12))

VIEW
A very important aspect of data definition is the ability to define alternative views of data.
The process of specifying an alternative view is very similar to that of framing a query. The
definition of the derived relation is stored, and the view can be used thereafter as an object of
the various commands. It is also possible to define other views on top of the newly created view.

Example
CREATE VIEW D50 AS
SELECT EMPNO, NAME, JOB
FROM EMP
WHERE DNO = 50
D50 contains the employee number, name, and job of those employees who are in
department 50.
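
That a view is derived from its base relation each time it is used can be seen in a minimal sketch, assuming Python's standard sqlite3 module. The columns of EMP follow the example above, while the employee rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (EMPNO INTEGER, NAME TEXT, JOB TEXT, DNO INTEGER)")
# Invented employees: one in department 50, one elsewhere.
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?, ?)",
                 [(1, "Asha", "CLERK", 50), (2, "Ravi", "ANALYST", 40)])

conn.execute("""CREATE VIEW D50 AS
                SELECT EMPNO, NAME, JOB FROM EMP WHERE DNO = 50""")

# The view is queried exactly like a table.
before = [r[0] for r in conn.execute("SELECT NAME FROM D50 ORDER BY EMPNO")]

# A row added to EMP later shows up in the view without redefining it.
conn.execute("INSERT INTO EMP VALUES (3, 'Meena', 'MANAGER', 50)")
after = [r[0] for r in conn.execute("SELECT NAME FROM D50 ORDER BY EMPNO")]
```

The new employee appears in D50 as soon as the row is added to EMP, which shows that the view holds a stored query rather than a stored copy of the data.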


Self-Assessment Questions – 3

10. Data manipulation capabilities allow one to ________ and modify the contents
of the database.
11. The ________ statement specifies the method of selecting the tuples of the
relation(s).
12. SELECT does not eliminate ________ rows.
13. The use of ________ gives the range within which the values must lie.
14. The LIKE predicate is used for pattern ________.
15. Subqueries can appear when using the ________ predicate, the IN predicate, and
when quantifiers are used.
16. ANY stands for the existential quantifier and ALL for the ________
quantifier.
17. There are ________ built-in functions in SQL.
18. In SQL, the ________ function has a special meaning in that it counts the number
of rows of a relation.
19. When columns are to be modified, the ________ clause is used.


5. SUMMARY
This unit discussed how most commercial relational DBMSs support some form of the SQL
data manipulation language, which has given rise to different dialects of SQL. We saw that SQL
commands can be roughly divided into three major categories: (i) commands that create and
maintain the database structure, (ii) commands that manipulate the data in such
structures, and (iii) commands that control the use of the database. We also discussed
subqueries, functions, the GROUP BY feature, updating the database, and data definition
facilities.

6. TERMINAL QUESTIONS
1. Consider the insurance database of Figure 6.1, where the primary keys are underlined.
Construct the following SQL queries for this relational database.
a) Find the total number of people who owned cars that were involved in accidents in
1989.
b) Find the number of accidents in which the cars belonging to “John Smith” were
involved.
c) Add a new accident to the database; assume any values for required attributes.
d) Delete the Mazda belonging to “John Smith”.
e) Update the damage amount for the car with license number “AABB2000” in the accident
with report number “AR2197” to $3000.

Fig 6.1: Insurance database

Fig 6.2: Employee database


2. Consider the employee database of Figure 6.2, where the primary keys are underlined.
Give an expression in SQL for each of the following queries.
a) Find the names of all employees who work for First Bank Corporation.
b) Find the names and cities of residence of all employees who work for First Bank
Corporation.
c) Find the names, street addresses, and cities of residence of all employees
who work for First Bank Corporation and earn more than $10,000.
d) Find all employees in the database who live in the same cities as the companies
for which they work.
3. Consider the relational database of Figure 6.2. Using SQL, define a view consisting of
manager-name and the average salary of all employees who work for that manager.
Explain why the database system should not allow updates to be expressed in terms of
this view.
4. Describe the circumstances in which you would choose to use embedded SQL rather
than SQL alone or only a general-purpose programming language.

7. ANSWERS

Self-Assessment Questions
1. Three
2. Two
3. DML
4. Concurrency
5. View
6. Create
7. Implementation
8. Alter
9. Drop
10. Retrieve
11. Select
12. Duplicate
13. BETWEEN
14. Matching
15. Comparison
16. Universal
17. Five
18. COUNT(*)
19. SET


Terminal Questions

1. Refer to the whole unit to construct the SQL queries for the given insurance database, and
run them in practice to confirm the results.
3. Updates should not be allowed on this view because an average is computed from many
rows, so there is no way to determine how to change the underlying data. (Refer section 6.4
for detail)
4. Writing queries in SQL is easier than coding the same queries in a general-purpose
programming language. Embedded SQL is less complicated since it avoids the clutter of
the ODBC or JDBC function calls, but it requires a specialized pre-processor. (Refer to the
whole unit for details)


Unit 7
SQL – 2
Table of Contents

1 Introduction
   1.1 Objectives
2 Views (SAQ 1)
3 Embedded SQL (SAQ 2)
   3.1 Declaring Variables and Exceptions
   3.2 Embedding SQL Statements
4 Transaction Processing (SAQ 3)
   4.1 Consistency and Isolation
   4.2 Atomicity and Durability
5 Summary
6 Terminal Questions
7 Answers


1. INTRODUCTION
In the last unit, SQL was introduced as a language that is available in a number
of database management packages based on the relational model of data. We saw that
SQL is used for data definition, data manipulation, and data control for a relational database,
and that these three major facilities are bound together in one integrated language framework.

This unit introduces views, which are useful for hiding unneeded
information and for collecting together information from more than one relation into a single
view. It also introduces embedded SQL, in which SQL commands are used within a host
language program.

We also discuss transaction processing. A transaction consists of a sequence of operations
that must appear to be atomic.

1.1 Objectives:
After going through Unit 7, the learners should be able to:

❖ Discuss views and their definition
❖ Explain embedded SQL and transactions
❖ Discuss transaction processing


2. VIEWS
A view is a virtual table that does not actually exist. It is made up of a query on other
tables in the database. It could include only certain columns or rows from a table or from
many tables. A view that restricts the user to certain rows is called a horizontal view, and a
vertical view restricts the user to certain columns. You are not restricted to purely horizontal
or vertical slices of data.

A view can be as complicated as you like. You can have grouped views, where the query
contains a GROUP BY clause. This makes the view a summary of the data in a table or tables.

If the list of column names is omitted, the columns in the view take the same names as in the
underlying tables. You must specify column names if the query includes calculated columns
or two columns with the same name. There are several advantages to views, including:

Security: Users can be given access to only the rows/columns of tables that concern them.
The table may contain data for the whole firm, but they see only the data for their department.

Data integrity: The WITH CHECK OPTION clause is used to enforce the query conditions on
any updates to the view. If the view shows data for a particular office the user can only enter
data for that office.

Simplicity: Even if a view is a multi-table query, querying the view still looks like a
single-table query.

Protection from change: If the structure of the database changes, the user's view of the data
can remain the same.

There are two disadvantages to views:

Performance: A view may look like a single table, but underneath the DBMS is usually still
running multi-table queries. If the view is complex, then even simple queries can take a long
time.

Update restrictions: Updating the data through a view may or may not be possible. If the
view is complex, the DBMS may decide it can't perform updates and make the view read-
only.


The ISO standard specifies five conditions that a view must meet in order to allow updates:
• The view must not have a DISTINCT clause.
• The view must only name one table in the FROM clause.
• All the columns must be real columns – no expressions, calculated columns, or column
functions
• The WHERE clause must not contain a sub-query
• There must be no GROUP BY or HAVING clause

You will find that most dialects of SQL are not quite so restrictive. The underlying principle
is that updates are allowed if the rows and columns of the view are traceable back to actual
rows and columns in tables.

The format of the view statement is as follows:

create view <view name> as <query expression>

A view is a relation (virtual rather than base) and can be used in query expressions, that is,
queries can be written using the view as a relation. Views generally are not stored, since the
data in the base relations may change. The base relations on which a view is based are
sometimes called the existing relations. The definition of a view in a create view statement
is stored in the system catalog. Having been defined, it can be used as if the view really
represented a real relation. However, such a virtual relation defined by a view is recomputed
whenever a query refers to it.
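The behaviour described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the staff table, its columns, its rows, and the view name jaipur_staff are hypothetical and do not come from this unit. The view restricts both rows and columns, and because it is recomputed whenever it is queried, a later change to the base table shows through.

```python
import sqlite3

# Hypothetical schema and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, office TEXT, salary REAL);
    INSERT INTO staff VALUES (1, 'Jones', 'Jaipur', 50000),
                             (2, 'Smith', 'Delhi', 42000),
                             (3, 'Drew', 'Jaipur', 61000);
    -- Horizontal and vertical slice: only Jaipur rows, only the id and name columns.
    CREATE VIEW jaipur_staff AS
        SELECT id, name FROM staff WHERE office = 'Jaipur';
""")

# Querying the view looks like querying a single table.
before = conn.execute("SELECT id, name FROM jaipur_staff ORDER BY id").fetchall()

# The view is not stored; it is recomputed, so a new base row appears in it.
conn.execute("INSERT INTO staff VALUES (4, 'Evan', 'Jaipur', 39000)")
after = conn.execute("SELECT id, name FROM jaipur_staff ORDER BY id").fetchall()

print(before)
print(after)
```

The salary column never appears in the view, which is the security advantage listed earlier: a user who is given access only to the view cannot see it.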

Self-Assessment Questions – 1

1. A view is a __________ table, that is, one that does not actually exist.


2. Users can be given access to only the rows/columns of __________that concern
them.
3. If the structure of the database changes, the user's view of the data ___________
4. The view must not have a ____clause.
5. The base relations on which a view is based are sometimes called the
_____________ relations.


3. EMBEDDED SQL
We have looked at a wide range of SQL query constructs, treating SQL as an independent
language in its own right. A relational DBMS supports an interactive SQL interface, and users
can directly enter SQL commands. This simple approach is fine as long as the task at hand can
be accomplished entirely with SQL commands. In practice, we often encounter situations in
which we need the greater flexibility of a general-purpose programming language, in
addition to the data manipulation facilities provided by SQL. For example, we may want to
integrate a database application with a nice graphical user interface, or we may want to ask
a query that cannot be expressed in SQL.

To deal with such situations, the SQL standard defines how SQL commands can be executed
from within a program in a host language such as C or Java. The use of SQL commands within
a host language program is called embedded SQL. Details of embedded SQL also depend on
the host language. Although similar capabilities are supported for a variety of host languages,
the syntax sometimes varies.

Conceptually, embedding SQL commands in a host language program is straightforward. SQL
statements (i.e., not declarations) can be used wherever a statement in the host language is
allowed (with a few restrictions). Of course, SQL statements must be clearly marked so that
a preprocessor can deal with them before invoking the compiler for the host language. Also,
any host language variables used to pass arguments into an SQL command must be declared
in SQL. In particular, some special host language variables must be declared in SQL (for
example, the error-reporting variables described in Section 3.1).

There are, however, two complications to bear in mind. First, the data types recognized by
SQL may not be recognized by the host language, and vice versa. This mismatch is typically
addressed by casting data values appropriately before passing them to or from SQL
commands. (SQL, like C and other programming languages, provides an operator to cast
values of one type into values of another type.) The second complication has to do with the
fact that SQL is set-oriented; commands operate on and produce tables, which are sets (or
multisets) of rows. Programming languages do not typically have a data type that
corresponds to sets or multisets of rows. Thus, although SQL commands deal with tables, the
interface to the host language is constrained to be one row at a time.


In our discussion of embedded SQL, we assume for concreteness that the host language is C,
because minor differences exist in how SQL statements are embedded in different host
languages.

3.1 Declaring Variables and Exceptions


SQL statements can refer to variables defined in the host program. Such host-language
variables must be prefixed by a colon (:) in SQL statements and must be declared between
the commands EXEC SQL BEGIN DECLARE SECTION and EXEC SQL END DECLARE
SECTION. The declarations are similar to how they would look in a C program and, as usual
in C, are separated by semicolons. For example, we can declare variables c_sname, c_sid,
c_rating, and c_age (with the initial c used as a naming convention to emphasize that these
are host language variables) as follows:
EXEC SQL BEGIN DECLARE SECTION;
char c_sname[20];
long c_sid;
short c_rating;
float c_age;
EXEC SQL END DECLARE SECTION;

The first question that arises is which SQL types correspond to the various C types, since we
have just declared a collection of C variables whose values are intended to be read (and
possibly set) in an SQL run-time environment when an SQL statement that refers to them is
executed. The SQL-92 standard defines such a correspondence between the host language
types and SQL types for a number of host languages. In our example c_sname has the type
CHARACTER(20) when referred to in an SQL statement, c_sid has the type INTEGER, c_rating
has the type SMALLINT, and c_age has the type REAL.

An important point to consider is that SQL needs some way to report what went wrong if an
error condition arises when executing an SQL statement. The SQL-92 standard recognizes
two special variables for reporting errors, SQLCODE and SQLSTATE. SQLCODE is the older
of the two and is defined to return some negative value when an error condition arises,
without specifying further just what error a particular negative integer denotes.


SQLSTATE, introduced in the SQL-92 standard for the first time, associates predefined values
with several common error conditions, thereby introducing some uniformity to how errors
are reported. One of these two variables must be declared. The appropriate C type for
SQLCODE is long and the appropriate C type for SQLSTATE is char[6], that is, a character
string that is five characters long. (Recall the null-terminator in C strings!) In this unit, we
will assume that SQLSTATE is declared.

3.2 Embedding SQL Statements


All SQL statements that are embedded within a host program must be clearly marked, with
the details dependent on the host language; in C, SQL statements must be prefixed by EXEC
SQL. An SQL statement can essentially appear in any place in the host language program,
where a host language statement can appear.

As a simple example, the following embedded SQL statement inserts a row, whose column
values are based on the values of the host language variables contained in it, into the Sailors
relation:

EXEC SQL INSERT INTO Sailors VALUES (:c_sname, :c_sid, :c_rating, :c_age);

Observe that a semicolon terminates the command, as per the convention for terminating
statements in C.

The SQLSTATE variable should be checked for errors and exceptions after each embedded
SQL statement. SQL provides the WHENEVER command to simplify this tedious task:

EXEC SQL WHENEVER [ SQLERROR | NOT FOUND ] [ CONTINUE | GOTO stmt ]

The intent is that after each embedded SQL statement is executed, the value of SQLSTATE
should be checked. If SQLERROR is specified and the value of SQLSTATE indicates an
exception, control is transferred to stmt, which is presumably responsible for
error/exception handling. Control is also transferred to stmt if NOT FOUND is specified and
the value of SQLSTATE is 02000, which denotes NO DATA.
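As a rough analogue only, the same ideas can be tried out in Python with the built-in sqlite3 module. Python is not an embedded SQL host language in the sense of this section: no preprocessor is involved, and errors are reported as exceptions rather than through SQLSTATE. The sketch assumes column types for Sailors that the text does not give, and the function insert_sailor and the sample sailors are hypothetical. The named placeholders (:c_sname and so on) play a role similar to host language variables, and the except clause plays a role similar to a WHENEVER SQLERROR handler.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column order follows the INSERT statement in the text; the types are assumed.
conn.execute(
    "CREATE TABLE Sailors (sname TEXT, sid INTEGER PRIMARY KEY, rating INTEGER, age REAL)"
)

def insert_sailor(c_sname, c_sid, c_rating, c_age):
    """Insert one row; return True on success, False if the DBMS reports an error."""
    try:
        conn.execute(
            "INSERT INTO Sailors VALUES (:c_sname, :c_sid, :c_rating, :c_age)",
            {"c_sname": c_sname, "c_sid": c_sid, "c_rating": c_rating, "c_age": c_age},
        )
        return True
    except sqlite3.Error:
        # Comparable in spirit to the handler named in WHENEVER SQLERROR GOTO stmt.
        return False

first = insert_sailor("Dustin", 22, 7, 45.0)
duplicate = insert_sailor("Lubber", 22, 8, 55.5)  # same sid: primary key violation

# Query results reach the host program one row at a time through a cursor.
names = [row[0] for row in conn.execute("SELECT sname FROM Sailors ORDER BY sid")]
print(first, duplicate, names)
```

Binding values through placeholders, rather than building the SQL string by hand, also leaves the conversion between host language types and SQL types to the interface.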


Self-Assessment Questions – 2

6. The use of SQL commands within a host language program is called __________ .
7. The _________ variable should be checked for errors and exceptions after
each embedded SQL statement.

4. TRANSACTION PROCESSING

A user writes data access/update programs in terms of the high-level query and update
language supported by the DBMS. To understand how the DBMS handles such requests, with
respect to concurrency control and recovery, it is convenient to regard an execution of a user
program, or transaction, as a series of reads and writes of database objects:
To read a database object, it is first brought into main memory (specifically, some frame in
the buffer pool) from the disk, and then its value is copied into a program variable.

To write a database object, an in-memory copy of the object is first modified and then written
to disk.

Database ‘objects’ are the units in which programs read or write information. The units could
be pages, records, and so on, but this is dependent on the DBMS and is not central to the
principles underlying concurrency control or recovery. In this unit, we will consider a
database to be a fixed collection of independent objects. When objects are added to or deleted
from a database, or there are relationships between database objects that we want to exploit
for performance, some additional issues arise.

There are four important properties of transactions that a DBMS must ensure to maintain
data in the face of concurrent access and system failures:
1. Users should be able to regard the execution of each transaction as atomic: either all
actions are carried out or none are. Users should not have to worry about the effect of
incomplete transactions (say, when a system crash occurs).
2. Each transaction, run by itself with no concurrent execution of other transactions, must
preserve the consistency of the database. This property is called consistency, and the
DBMS assumes that it holds for each transaction. Ensuring this property of a
transaction is the responsibility of the user.
3. Users should be able to understand a transaction without considering the effect of
other concurrently executing transactions, even if the DBMS interleaves the actions of
several transactions for performance reasons. This property is sometimes referred to
as isolation: transactions are isolated, or protected, from the effects of concurrently
scheduled transactions.
4. Once the DBMS informs the user that a transaction has been successfully completed, its
effects should persist even if the system crashes before all its changes are reflected on
the disk. This property is called durability.

The acronym ACID is sometimes used to refer to the four properties of transactions that we
have presented here: atomicity, consistency, isolation, and durability. We now consider how
each of these properties is ensured in a DBMS.

4.1 Consistency and Isolation


Users are responsible for ensuring transaction consistency. That is, the user who submits a
transaction must ensure that when run to completion by itself against a ‘consistent’ database
instance, the transaction will leave the database in a ‘consistent’ state. For example, the user
may (naturally!) have the consistency criterion that fund transfers between bank accounts
should not change the total amount of money in the accounts. To transfer money from one
account to another, a transaction must debit one account, temporarily leaving the database
inconsistent in a global sense, even though the new account balance may satisfy any integrity
constraints with respect to the range of acceptable account balances. The user's notion of a
consistent database is preserved when the second account is credited with the transferred
amount. If a faulty transfer program always credits the second account with one dollar less
than the amount debited from the first account, the DBMS cannot be expected to detect
inconsistencies due to such errors in the user program’s logic.

The isolation property is ensured by guaranteeing that even though actions of several
transactions might be interleaved, the net effect is identical to executing all transactions one
after the other in some serial order. For example, if two transactions T1 and T2 are executed
concurrently, the net effect is guaranteed to be equivalent to executing (all of) T1 followed
by executing T2 or executing T2 followed by executing T1. (The DBMS provides no
guarantees about which of these orders is effectively chosen.) If each transaction maps a
consistent database instance to another consistent database instance, executing several
transactions one after the other (on a consistent initial database instance) will also result in
a consistent final database instance.
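The two serial orders mentioned above need not give the same final state. The following is a small sketch in plain Python in which a dictionary stands in for the database. The two transactions (a transfer and an interest payment), the account names, and the balances are hypothetical and chosen only for illustration. Isolation promises that a concurrent execution of T1 and T2 ends in one of these two states, without saying which.

```python
# A dictionary stands in for the database; the transactions are hypothetical.
def t1(db):
    """T1: transfer 100 from account A to account B."""
    db["A"] -= 100
    db["B"] += 100

def t2(db):
    """T2: add 10% interest to both accounts (integer arithmetic)."""
    db["A"] = db["A"] * 110 // 100
    db["B"] = db["B"] * 110 // 100

def run_serial(order):
    """Run the given transactions one after the other on a fresh database."""
    db = {"A": 1000, "B": 500}
    for txn in order:
        txn(db)
    return db

t1_then_t2 = run_serial([t1, t2])
t2_then_t1 = run_serial([t2, t1])
print(t1_then_t2)  # {'A': 990, 'B': 660}
print(t2_then_t1)  # {'A': 1000, 'B': 650}
```

Both outcomes are acceptable results of running T1 and T2 concurrently; an interleaving that produced any other state would violate isolation.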

Database consistency is the property that every transaction sees a consistent database
instance. Database consistency follows from transaction atomicity, isolation, and
transaction consistency. Next, we discuss how atomicity and durability are guaranteed in a
DBMS.

4.2 Atomicity and Durability


Transactions can be incomplete for three kinds of reasons. First, a transaction can be
aborted, or terminated unsuccessfully, by the DBMS because some anomaly arises during
the execution. If a transaction is aborted by the DBMS for some internal reason, it is
automatically restarted and executed anew. Second, the system may crash (e.g., because
the power supply is interrupted) while one or more transactions are in progress. Third, a
transaction may encounter an unexpected situation (for example, read an unexpected data
value or be unable to access some disk) and decide to abort (i.e., terminate itself).

Of course, since users think of transactions as being atomic, a transaction that is interrupted
in the middle may leave the database in an inconsistent state. Thus a DBMS must find a way
to remove the effects of partial transactions from the database, that is, it must ensure
transaction atomicity: either all of a transaction's actions are carried out, or none are. A
DBMS ensures transaction atomicity by undoing the actions of incomplete transactions. This
means that users can ignore incomplete transactions in thinking about how the database is
modified by transactions over time. To be able to do this, the DBMS maintains a record, called
the log, of all writes to the database.

The log is also used to ensure durability. If the system crashes before the changes made by
a completed transaction are written to disk, the log is used to remember and restore these
changes when the system restarts.
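Atomicity can be observed from the application side with a short sketch that uses Python's built-in sqlite3 module. The account table, its CHECK constraint, and the transfer function are assumptions made for this illustration and are not part of the unit. The transfer credits one account and then debits the other; if the debit fails, the rollback removes the credit as well, so either both actions happen or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: balances may not become negative.
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Credit dst and debit src as one transaction; undo both on any error."""
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()  # the DBMS undoes the partial work, here the credit
        return False

def balances():
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))

ok = transfer("A", "B", 30)       # succeeds
failed = transfer("A", "B", 500)  # the debit violates the CHECK constraint
print(ok, failed, balances())
```

After the failed transfer the balances are the same as before it started; the credit to B that had already been applied is not visible.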


SELF ASSESSMENT QUESTIONS – 3

8. To write a database object, an in-memory copy of the object is first _________
and then written to disk.
9. Each transaction, run by itself with no concurrent execution of other
transactions, must preserve the __________of the database.
10. The acronym ACID is sometimes used to refer to the ________of
transactions.
11. ________are responsible for ensuring transaction consistency.
12. Transactions can be incomplete for _________ kinds of reasons.
13. The _______is also used to ensure durability.

5. SUMMARY
In this unit, we dealt with the SQL data definition language used to create relations with
specified schemas. We discussed Views, embedded SQL, and transaction processing
properties. The SQL DDL supports a number of types including date and time types. SQL
queries can be invoked from host languages, via embedded SQL.

6. TERMINAL QUESTIONS

1. What is a view? With an example, use the format of the view statement to create a view.
2. What is an Embedded SQL statement? Describe briefly.
3. Explain any two important properties of transactions that a DBMS must ensure to
maintain data in the face of concurrent access and system failures.


7. ANSWERS
Self-Assessment Questions
1. Virtual
2. Tables
3. can remain the same.
4. DISTINCT
5. Existing
6. embedded SQL
7. SQLSTATE
8. Modified
9. Consistency
10. four properties
11. Users
12. Three
13. Log

Terminal Questions
1. A view is a virtual table which does not actually exist. It is made up of a query on other
tables in the database. It could include only certain columns or rows from a table or
from many tables. (Refer section 7.2 for detail)
2. The use of SQL commands within a host language program is called embedded SQL.
(Refer section 7.3 for detail)
3. There are four important properties of transactions that a DBMS must ensure to
maintain data in the face of concurrent access and system failures. They are (i)
Atomicity (ii) Consistency (iii) Isolation (iv) Durability. (Refer section 7.4 for detail)


Unit 8
Relational Algebra
Table of Contents

SL No   Topic                                         Fig No / Table / Graph   SAQ / Activity   Page No
1       Introduction                                  -                        -                3
1.1     Objectives                                    -                        -
2       Basic Operations                              1                        1                4 - 8
2.1     Union (∪)                                     2                        -
2.2     Difference (-)                                -                        -
2.3     Intersection (∩)                              -                        -
2.4     Cartesian Product (×)                         3                        -
3       Additional Relational Algebraic Operations    -                        2                9 - 17
3.1     Projection (π)                                4                        -
3.2     Selection (σ)                                 5                        -
3.3     JOIN (⨝)                                      6, 7                     -
3.4     Division (÷)                                  8                        -
4       Summary                                       -                        -                18
5       Terminal Questions                            -                        -                18 - 19
6       Answers                                       -                        -                19 - 20


1. INTRODUCTION
Relational algebras received little attention until the publication of E.F. Codd's relational
model of data in 1970. Codd proposed such an algebra as a basis for database query
languages. The first query language to be based on Codd's algebra was ISBL, and this
pioneering work has been acclaimed by many authorities as having shown the way to make
Codd's idea into a useful language. Even the query language of SQL is loosely based on a
relational algebra, though the operands in SQL (tables) are not exactly relations and several
useful theorems about relational algebra do not hold in the SQL counterpart (arguably to the
detriment of optimizers and/or users).

Relational algebra is a procedural language. It specifies the operations to be performed on
existing relations to derive result relations. Furthermore, it defines the complete scheme for
each of the result relations. The relational algebraic operations can be divided into basic
set-oriented operations and relational-oriented operations. The former are the traditional set
operations; the latter are those for performing joins, selection, projection, and division.

1.1 Objectives:
By the end of Unit 8, the learners should be able to understand:

❖ Relational algebra and its history
❖ Basic operations such as union, difference, and intersection
❖ Additional relational algebra operations, including the selection, projection, join, and
division operations.


2. BASIC OPERATIONS
Basic operations are the traditional set operations: union, difference, intersection, and
Cartesian product. Three of these four basic operations (union, intersection, and difference)
require that the operand relations be union compatible. The Cartesian product can be
defined on any two relations. Two relations P and Q are said to be union compatible if both
P and Q are of the same degree (arity) n and the domains of the corresponding n attributes are
identical,

i.e.,
if P = { P1, ..., Pn } and Q = { Q1, ..., Qn } then
Dom(Pi) = Dom(Qi) for i = 1, 2, ..., n
where Dom(Pi) represents the domain of the attribute Pi.

Example 1
In the examples to follow, we utilize two relations P and Q given in Figure 8.1. R is a
computed result relation. We assume that the relations P and Q in Figure 8.1 represent
employees working on the development of software application packages J1 (say) and J2
(say) respectively.

P:
Id Name
101 Jones
103 Smith
104 Lalonde
107 Evan
110 Drew
112 Smith
Q:
Id Name
103 Smith
104 Lalonde
106 Byron
110 Drew
Fig 8.1: Union compatible relations


2.1 Union (∪)

If we assume that P and Q are two union-compatible relations, then the union of P and
Q is the set-theoretic union of P and Q.

The resultant relation, R = P ∪ Q, has tuples drawn from P and Q such that
R = { t | t ∈ P ∨ t ∈ Q }

The result relation R contains tuples that are in either P or Q or in both of them. The duplicate
tuples are eliminated.

Remember that from our definition of union compatibility the degree of the relations P, Q, and
R is the same. The cardinality of the resultant relation depends on the duplication of tuples
in P and Q.

From the above expression, we can see that if all the tuples in Q were contained in P, then |R|
= |P| and R= P, while if the tuples in P and Q were disjoint, then |R| = |P|+|Q|.

Example 2
R, the union of P and Q given in Figure 8.1 in the above example 1 is shown in Figure 8.2(a).
R represents employees working on the packages J1 or J2, or both of these packages. Since a
relation does not have duplicate tuples, an employee working on both J1 and J2 will appear
in the relation R only once.
R:
Id Name
101 Jones
103 Smith
104 Lalonde
106 Byron
107 Evan
110 Drew
112 Smith
a) P ∪ Q
R:
Id Name
101 Jones
107 Evan
112 Smith
b) P-Q


R:
Id Name
103 Smith
104 Lalonde
110 Drew
c) P ∩ Q
Fig 8.2: Results of (a) union (b) difference and (c) intersection operations

2.2 Difference ( - )
The difference operation removes common tuples from the first relation.
R = P-Q such that
R = { t | t ∈ P ∧ t ∉ Q }

Example 3
R, the result of P-Q, gives employees working only on package J1. (figure 8.2(b) in example
2). Employees working on both packages J1 and J2 have been removed.

2.3 Intersection (∩)

The intersection operation selects the common tuples from the two relations.
R = P ∩ Q where
R = { t | t ∈ P ∧ t ∈ Q }

Example 4
The resultant relation P ∩ Q is the set of all employees working on both the packages. (figure
8.2(c) of example 2).

The intersection operation is really unnecessary. It can be very simply expressed as:

P ∩ Q = P - (P - Q)

It is, however, more convenient to write an expression with a single intersection operation
than one involving a pair of difference operations.
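The three set operations can be tried out directly with Python sets, since a relation is a set of tuples. This is a sketch only: the relations P and Q are copied from Figure 8.1, while the variable names are chosen for this illustration. It reproduces the results of Figure 8.2 and checks the identity that expresses intersection through difference.

```python
# Relations P and Q from Figure 8.1, each tuple being (Id, Name).
P = {(101, "Jones"), (103, "Smith"), (104, "Lalonde"),
     (107, "Evan"), (110, "Drew"), (112, "Smith")}
Q = {(103, "Smith"), (104, "Lalonde"), (106, "Byron"), (110, "Drew")}

union = P | Q          # employees on J1 or J2 or both; duplicates appear once
difference = P - Q     # employees working only on J1
intersection = P & Q   # employees working on both J1 and J2

print(sorted(union))
print(sorted(difference))
print(sorted(intersection))

# Intersection expressed with two difference operations.
print(intersection == P - (P - Q))
```

The union has 7 tuples rather than 6 + 4 = 10, because the three employees who appear in both relations are kept only once.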

Note that in these examples the operand and the result relation schemes, including the
attribute names, are identical i.e. P=Q=R. If the attribute names of compatible relations are
not identical, the naming of the attributes of the result relation will have to be resolved.


2.4 Cartesian Product (×)

The extended Cartesian product, or simply the Cartesian product, of two relations is the
concatenation of tuples belonging to the two relations. A new resultant relation scheme is
created consisting of all possible combinations of the tuples.

R = P × Q
where a tuple r ∈ R is given by { t1 || t2 | t1 ∈ P ∧ t2 ∈ Q }, i.e. the result relation is obtained
by concatenating each tuple in relation P with each tuple in relation Q. Here, || represents
the concatenation operation.

The scheme of the result relation is given by:
R = P || Q
The degree of the result relation is given by:
degree(R) = degree(P) + degree(Q)
The cardinality of the result relation is given by:
|R| = |P| * |Q|

Example 5
The Cartesian product of the PERSONNEL relation and SOFTWARE_PACKAGE relations of
figure 8.3(a) is shown in figure 8.3(b). Note that the relations P and Q from figure 8.1 of
Example 1 are a subset of the PERSONNEL relation.

PERSONNEL:
Id Name
101 Jones
103 Smith
104 Lalonde
106 Byron
107 Evan
110 Drew
112 Smith

SOFTWARE_PACKAGE:
S
J1
J2
(a)


PERSONNEL × SOFTWARE_PACKAGE:
Id Name S
101 Jones J1
101 Jones J2
103 Smith J1
103 Smith J2
104 Lalonde J1
104 Lalonde J2
106 Byron J1
106 Byron J2
107 Evan J1
107 Evan J2
110 Drew J1
110 Drew J2
112 Smith J1
112 Smith J2
(b)
Fig 8.3: (a) PERSONNEL (Id, Name) and SOFTWARE_PACKAGE (S)
represent employees and software packages respectively; (b) the Cartesian product of
PERSONNEL and SOFTWARE_PACKAGE.
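The Cartesian product of Example 5 can be sketched in Python with itertools.product. The PERSONNEL list below holds the seven employees that appear in Figure 8.3(b); the variable names are chosen for this illustration. Each result tuple is the concatenation of one PERSONNEL tuple and one SOFTWARE_PACKAGE tuple, so the degrees add and the cardinalities multiply.

```python
from itertools import product

# The seven employees that appear in Figure 8.3(b), each tuple being (Id, Name).
PERSONNEL = [(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (106, "Byron"),
             (107, "Evan"), (110, "Drew"), (112, "Smith")]
SOFTWARE_PACKAGE = [("J1",), ("J2",)]

# Concatenate every PERSONNEL tuple with every SOFTWARE_PACKAGE tuple.
R = [p + s for p, s in product(PERSONNEL, SOFTWARE_PACKAGE)]

degree = len(R[0])     # 2 + 1 = 3 attributes: Id, Name, S
cardinality = len(R)   # 7 * 2 = 14 tuples

print(degree, cardinality)
print(R[:2])
```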

The union and intersection operations are associative and commutative; therefore,
given relations R, S, T:
R ∪ (S ∪ T) = (R ∪ S) ∪ T = (S ∪ R) ∪ T = T ∪ (S ∪ R) = ...
R ∩ (S ∩ T) = (R ∩ S) ∩ T = ...
The difference operation, in general, is non-commutative and non-associative.
R - S ≠ S - R (non-commutative)
R - (S - T) ≠ (R - S) - T (non-associative)

Self-Assessment Questions – 1

1. In Relational Algebra the Basic operations are the traditional ______ operations.
2. The Cartesian product can be defined on any ______ relations.
3. If we assume that P and Q are two union-compatible relations, then the union of
P and Q is the set-theoretic ______ of P and Q.
4. The difference operation removes _____ tuples from the first relation.
5. The ______ operation selects the common tuples from the two relations.
6. The Cartesian product of two relations is the ____ of tuples belonging to
the two relations.
7. The union and intersection operations are associative and ______.
8. The ____ operation, in general, is non-commutative and non-associative.


3. ADDITIONAL RELATIONAL ALGEBRAIC OPERATIONS

The basic set operations, which provide a very limited data manipulation facility, have been
supplemented by the definition of the following operations: projection, selection, join, and
division. These operations are represented by the symbols π, σ, ⨝, and ÷ respectively.
Projection and selection are unary operations; join and division are binary.

3.1 Projection (π)


The projection of a relation is defined as a projection of all its tuples over some set of
attributes, i.e., it yields a vertical subset of the relation. The projection operation is used to
either reduce the number of attributes in the resultant relation or to reorder attributes.
In the first case, the arity (or degree) of the relation is reduced. The projection operation
is shown graphically in figure 8.4. Figure 8.4 shows the projection of the relation
PERSONNEL on the attribute Name. The cardinality of the result relation is also reduced due
to the deletion of duplicate tuples.

Id Name Name
101 Jones Jones
103 Smith Smith
104 Lalonde Lalonde
106 Byron Byron
107 Evan Evan
110 Drew Drew
112 Smith Smith

Fig 8.4: Projection of relation PERSONNEL over attribute Name

We define the projection of a tuple ti over the attribute A, denoted ti[A] or πA(ti), as (a),
where a is the value of tuple ti over the attribute A. Similarly, we define the projection of a
relation T, denoted by T[A] or πA(T), on the attribute A. This is defined in terms of the
projection of each tuple ti belonging to T on the attribute A as:

T[A] = { ai | ti[A] = ai ∧ ti ∈ T }

where T[A] is a single-attribute relation and |T[A]| ≤ |T|. The cardinality |T[A]| may be less
than the cardinality |T| because of the deletion of any duplicates in the result. A case in point
is illustrated in figure 8.4.


Similarly, we can define the projection of a relation on a set of attribute names, X, as a
concatenation of the projections for each attribute A in X for every tuple in the relation.

T[X] = { ti[X] | ti ∈ T }
where ti[X] represents the concatenation of ti[A] for all A ∈ X (A belongs to X).

Simply stated, the projection of a relation P on a set of attribute names Y, where Y is a subset
of the attributes of P, is the projection of each tuple of the relation P on the set of attribute
names Y.

Note that the projection operation reduces the arity if the number of attributes in X is less
than the arity of the relation. The projection operation may also reduce the cardinality of the
result relation since duplicate tuples are removed. (Note that the projection operation
produces a relation as the result. By definition, a relation cannot have duplicate tuples. In
most commercial implementations of the relational model, however, the duplicates would
still be present in the result).
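Projection can be sketched in a few lines of Python. The helper function project and the SCHEME tuple are hypothetical names introduced for this illustration; the PERSONNEL data are the seven tuples shown in Figure 8.4. Because the result is built as a set, duplicate tuples disappear, which is why projecting on Name gives fewer tuples than the original relation.

```python
SCHEME = ("Id", "Name")
# The seven PERSONNEL tuples shown in Figure 8.4.
PERSONNEL = {(101, "Jones"), (103, "Smith"), (104, "Lalonde"), (106, "Byron"),
             (107, "Evan"), (110, "Drew"), (112, "Smith")}

def project(relation, scheme, attributes):
    """Return the projection of relation on attributes as a set (no duplicates)."""
    positions = [scheme.index(a) for a in attributes]
    return {tuple(t[i] for i in positions) for t in relation}

names = project(PERSONNEL, SCHEME, ["Name"])            # arity reduced to 1
reordered = project(PERSONNEL, SCHEME, ["Name", "Id"])  # attributes reordered only

print(len(PERSONNEL), len(names), len(reordered))
```

The two employees named Smith collapse into one tuple in names, so its cardinality is 6, while reordered keeps all 7 tuples because the Id values still distinguish them.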

3.2 Selection (σ)

Suppose we want to find those employees in the relation PERSONNEL of figure 8.3(a) of
Example 5 with an Id less than 105. This is an operation that selects only some of the tuples
in the relation. Such an operation is known as a selection operation. The projection
operation yields a vertical subset of a relation: the action is defined over a subset of the
attribute names but over all the tuples in the relation. The selection operation, however,
yields a horizontal subset of a given relation, i.e., the action is defined over the complete set of
attribute names but only a subset of the tuples are included in the result. To have a tuple
included in the result relation, the specified selection conditions or predicates must be
satisfied by it. The selection operation is sometimes known as the restriction operation.


PERSONNEL:
Id Name
101 Jones
103 Smith
104 Lalonde
106 Byron
107 Evan
110 Drew
112 Smith

Results of Selection
Id Name
101 Jones
103 Smith
104 Lalonde

Fig 8.5: Result of Selection over PERSONNEL for Id< 105

Any finite number of predicates connected by Boolean operators may be specified in the
selection operation. The predicates may define a comparison between two domain-
compatible attributes or between an attribute and a constant value; if the comparison is
between attribute A1 andconstant c1, then c1 belongs to Dom (A1).

Given a relation P and a predicate expression B, the selection of those tuples of relation P
that satisfy the predicate B is a relation R written as:
R = σB (P)

The above expression could be read as "select those tuples t from P in which the predicate
B(t) is true". The set of tuples in R is in this case defined as follows:
R = { t | t ∈ P ∧ B(t) }
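
The definition of selection can be tried out with a short Python sketch. It assumes a relation is modelled as a set of tuples and uses a hypothetical helper named select; the PERSONNEL tuples are the ones listed in the figure above.

```python
# PERSONNEL modelled as a set of (Id, Name) tuples.
PERSONNEL = {
    (101, "Jones"), (103, "Smith"), (104, "Lalonde"),
    (107, "Evan"), (110, "Drew"), (112, "Smith"),
}

def select(relation, predicate):
    """Hypothetical helper: keep the tuples t of `relation` for which predicate(t) is true."""
    return {t for t in relation if predicate(t)}

# Predicate B is "Id < 105"; Id is field 0 of each tuple.
result = select(PERSONNEL, lambda t: t[0] < 105)
print(sorted(result))
```

The result keeps every attribute of the selected tuples, which is the horizontal subset described above.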

3.3 JOIN (⨝)


The join operator, as the name suggests, allows the combining of two relations to form a
single new relation. The tuples from the operand relations that participate in the operation
and contribute to the result are related. The join operation allows the processing of
relationships existing between the operand relations.


Example 6
In figure 8.6 we encounter the following relations:
ASSIGNMENT (Emp#, Prod#, Job#)
JOB_FUNCTION (Job#, Title)
EMPLOYEE:
Emp# Name Profession
101 Jones Analyst
103 Smith Programmer
104 Lalonde Receptionist
106 Byron Receptionist
107 Evan VPR & D
110 Drew VP operations
112 Smith Manager
PRODUCT:
Prod# Prod-Name Prod-details
HEAP1 HEAP-SORT ISS Module
BINS9 BINARY-SEARCH ISS/R Module
FM6 FILE-MANAGER ISS/R-PC Subsys
B++1 B++_TREE ISS/R Turbo Sys
B++2 B++_TREE VPR & D
a)
JOB-FUNCTION:
Job# Title
1000 CEO
900 President
800 Manager
700 Chief Programmer
600 Analyst

ASSIGNMENT
Emp# Prod# Job#
107 HEAP1 800
101 HEAP1 600
110 BINS9 800
103 HEAP1 700
101 BINS9 700
110 FM6 800
107 B++1 800
b)
Fig 8.6: (a) Relation schemes for employee roles in development teams; (b) Sample relations


Suppose we want to respond to the query “Get product number of assignments whose
development teams have a chief programmer”. This requires first computing the Cartesian
product of the ASSIGNMENT and JOB_FUNCTION relations. Let us name this product relation
TEMP. This is followed by selecting those tuples of TEMP where the attribute Title has the
value 'chief programmer' and the values of the attribute Job# in ASSIGNMENT and
JOB_FUNCTION are the same. The required result, shown below, is obtained by projecting
these tuples on the attribute Prod#. The operations are specified below.
TEMP = (ASSIGNMENT × JOB_FUNCTION)
πProd# (σ Title = 'chief programmer' ∧ ASSIGNMENT.Job# = JOB_FUNCTION.Job# (TEMP))
Prod#
HEAP1
BINS9

In another method of responding to this query, we can first select those tuples from the
JOB_FUNCTION relation in which the value of the attribute Title is 'chief programmer'. Let us
call this set of tuples the relation TEMP1. We then compute the Cartesian product of
ASSIGNMENT and TEMP1 and keep only those tuples in which the two Job# values agree, calling
the result TEMP2. This is followed by a projection on Prod# over TEMP2 to give us the
required response. These operations are specified below:
TEMP1 = (σ Title = 'chief programmer' (JOB_FUNCTION))
TEMP2 = (σ ASSIGNMENT.Job# = JOB_FUNCTION.Job# (ASSIGNMENT × TEMP1))
πProd# (TEMP2) gives the required result.

Notice that in the selection operation that follows the Cartesian product we take only those
tuples where the values of the attributes ASSIGNMENT.Job# and JOB_FUNCTION.Job# are the
same. These combined operations of the Cartesian product followed by selection constitute
the join operation. Note that we have qualified the identically named attributes by the name
of the corresponding relation to distinguish them.
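
The following Python sketch retraces the first method (Cartesian product, then selection, then projection) on the sample relations of figure 8.6. It is only an illustration: relations are assumed to be sets of tuples, the variable names are made up, and the Title value is spelled as it appears in the JOB-FUNCTION table.

```python
from itertools import product

# (Emp#, Prod#, Job#)
ASSIGNMENT = {
    (107, "HEAP1", 800), (101, "HEAP1", 600), (110, "BINS9", 800),
    (103, "HEAP1", 700), (101, "BINS9", 700), (110, "FM6", 800),
    (107, "B++1", 800),
}
# (Job#, Title)
JOB_FUNCTION = {
    (1000, "CEO"), (900, "President"), (800, "Manager"),
    (700, "Chief Programmer"), (600, "Analyst"),
}

# Cartesian product: each tuple is (Emp#, Prod#, ASSIGNMENT.Job#, JOB_FUNCTION.Job#, Title).
temp = {a + j for a, j in product(ASSIGNMENT, JOB_FUNCTION)}

# Selection: the two Job# values agree and the title is the one asked for.
joined = {t for t in temp if t[2] == t[3] and t[4] == "Chief Programmer"}

# Projection on Prod# (field 1); a set removes duplicates automatically.
prod_numbers = {t[1] for t in joined}
print(sorted(prod_numbers))
```

The product holds 7 × 5 = 35 tuples before the selection, which is why the second method (select first, then combine) does less work.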

In case of the join of a relation with itself, we would need to rename either the attributes of
one of the copies of the relation or the relation name itself. We illustrate this in example 7.

In general, the join condition may have more than one term, necessitating the use of the
subscript in the comparison operator. Now we shall define the different types of join
operations.


In these discussions, we use P, Q, R, and so on to represent both the relation scheme and the
collection or bag of underlying domains of the attributes. We call it a bag of domains because
more than one attribute may be defined on the same domain.

Typically, P ∩ Q may be null and this guarantees the uniqueness of attribute names in the
result relation. When the same attribute name occurs in the two schemes we use qualified
names.

Two common and very useful variants of the join are the equi-join and the natural join. In
the equi-join the comparison operator θi (i = 1, 2, ..., n) is always the equality operator (=).
Similarly, in the natural join, the comparison operator is always the equality operator.
However, only one of the two sets of domain-compatible attributes involved in the natural
join is kept in the result relation. If the attributes involved in the natural join are Ai from P
and Bi from Q, for i = 1, ..., n, the natural join predicate is a conjunction of terms of the
following form:
( t1[Ai] = t2[Bi] ) for i = 1, 2, ..., n
Domain compatibility requires that the domains of Ai and Bi be compatible, and for this
reason relation schemes P and Q have attributes defined on common domains, i.e., P ∩ Q ≠ ∅.
Therefore, join attributes have common domains in the relation schemes P and Q.
Consequently, only one set of the join attributes on these common domains needs to be
preserved in the result relation. This is achieved by taking a projection after the join
operation, thereby eliminating the duplicate attributes. If the relations P and Q have
attributes with the same domain, but different attribute names, then renaming or projection
may be specified.

Example 7
Given the EMPLOYEE and SALARY relations of figure 8.7(i), if we have to find the salary of
employees by name, we join the tuples in the relation EMPLOYEE with those in SALARY such
that the value of the attribute Id in EMPLOYEE is the same as that in SALARY. The natural
join takes the predicate expression to be EMPLOYEE.Id = SALARY.Id. The result of the natural
join is shown in figure 8.7(i). When using the natural join, we do not need to specify this
predicate. The expression to specify the operation of finding the salary of employees by name
is given as follows. Here we project the result of the natural join operation on the attributes
Name and Salary:

π(Name, Salary) (EMPLOYEE ⨝ SALARY)


EMPLOYEE:
Id Name
101 Jones
103 Smith
104 Lalonde
107 Evan

SALARY:
Id Salary
101 67
103 55
104 75
107 80

EMPLOYEE ⨝ SALARY


Id Name Salary
101 Jones 67
103 Smith 55
104 Lalonde 75
107 Evan 80

[Link]# [Link]#
107 107
107 101
107 103
101 107
101 101
101 103
110 101
103 107
103 101
103 103
101 110

Fig 8.7: i) The natural join of EMPLOYEE and SALARY relations;
ii) The join of ASSIGNMENT with the renamed copy
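
As an illustration of Example 7, the sketch below computes the natural join of EMPLOYEE and SALARY in Python. It assumes each tuple is a dictionary keyed by attribute name; natural_join is a hypothetical helper written for this example, not a library function.

```python
EMPLOYEE = [
    {"Id": 101, "Name": "Jones"}, {"Id": 103, "Name": "Smith"},
    {"Id": 104, "Name": "Lalonde"}, {"Id": 107, "Name": "Evan"},
]
SALARY = [
    {"Id": 101, "Salary": 67}, {"Id": 103, "Salary": 55},
    {"Id": 104, "Salary": 75}, {"Id": 107, "Salary": 80},
]

def natural_join(r, s):
    """Pair tuples that agree on every commonly named attribute, keeping one copy of each."""
    out = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                out.append({**t1, **t2})  # merging the dicts keeps a single Id column
    return out

joined = natural_join(EMPLOYEE, SALARY)

# Projection on (Name, Salary).
name_salary = [(t["Name"], t["Salary"]) for t in joined]
print(name_salary)
```

The join predicate EMPLOYEE.Id = SALARY.Id is never written out; it follows from the shared attribute name, which is the point made in the example.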


3.4 Division (÷)


Before we define the division operation, let us consider an example.

Example 8
Given the relations P and Q, as shown in figure 8.8(a), the result of dividing P by Q is the
relation R and it has two tuples. For each tuple in R, its product with the tuples of Q must be
in P. In our example (a1,b1) and (a1,b2) must both be tuples in P; the same is true for (a5,b1)
and (a5,b2).

Simply stated, the Cartesian product of Q and R is a subset of P. In figure 8.8(b), the result
relation R has four tuples; the Cartesian product of R and Q gives a resulting relation which is
again a subset of P. In figure 8.8(c), since there are no tuples in P with a value b3 for the
attribute B (i.e., the selection σB=b3 (P) is empty), we have an empty relation R, which has a
cardinality of zero.

Fig 8.8: Examples of the division operation. (a) R = P / Q; (b) R = P / Q (P is the same as
in part (a)); (c) R = P / Q (P is the same as in part (a)); (d) R = P / Q (P is the same as in part (a))

In figure 8.8(d), the relation Q is empty. The result relation can be defined as the projection
of P on the attributes in P-Q. However, it is usual to disallow division by an empty relation.


Finally, if relation P is an empty relation, then relation R is also an empty relation.

Let us treat Q as representing one set of properties (the properties are defined on Q,
each tuple in Q representing an instance of these properties) and the relation R as
representing entities with these properties (entities are defined on P-Q, and the properties
are, as before, defined on Q); note that P ∪ Q must be equal to P. Each tuple in P represents
an object with some given property. The resultant relation R, then, is the set of entities that
possess all the properties specified in Q. The two entities a1 and a5 possess all the
properties, i.e., b1 and b2. The other entities in P, a2, a3, and a4, only possess one, not both, of
the properties. The division operation is useful when a query involves the phrase "for all
objects having all the specified properties." Note that both P-Q and Q in general represent a
set of attributes. It should be clear that Q is not a subset of P.
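
The "entities that possess all the properties" reading of division can be sketched in Python as follows. Figure 8.8 is not reproduced here, so the tuples of P below are assumed sample data chosen to agree with the description (a1 and a5 have both b1 and b2; a2, a3, and a4 have only one); divide is a hypothetical helper, and Q is simplified to a set of single values.

```python
# P over attributes (A, B): assumed sample data, not copied from figure 8.8.
P = {
    ("a1", "b1"), ("a1", "b2"),
    ("a2", "b1"),
    ("a3", "b2"),
    ("a4", "b1"),
    ("a5", "b1"), ("a5", "b2"),
}
Q = {"b1", "b2"}  # the required properties

def divide(p, q):
    """Return the A-values that are paired in p with every B-value in q."""
    candidates = {a for a, _ in p}
    return {a for a in candidates if all((a, b) in p for b in q)}

R = divide(P, Q)
print(sorted(R))
```

With these definitions, dividing by {"b3"} gives the empty set, and dividing by an empty Q returns every A-value in P, which matches the projection on P-Q mentioned for figure 8.8(d).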

Self-Assessment Questions – 2

9. The projection of a relation yields a _________ subset of the relation.
10. The projection operation may _________ the cardinality of the result relation
since duplicate tuples are removed.
11. The selection operation yields a ____________ subset of a given relation.
12. The ____ operator allows the combining of two relations to form a
single new relation.
13. The join operation allows the processing of _____________ existing between the
operand relations.
14. Two common and very useful variants of the join are the equi-join and the
______.
15. In the _______________ the comparison operator theta (i = 1, 2, ..., n) is always the
equality operator (=).
16. In the natural join the _______________ operator is always the equality
operator.


4. SUMMARY
In this unit, we discussed that relational algebra defines a set of algebraic operations that
operate on tables, and output tables as their results. These operations can be combined to
get expressions that express desired queries. The algebra defines the basic operations used
within relational query languages.

The operations in relational algebra can be divided into two types: basic operations, and
additional operations that can be expressed in terms of the basic operations.
We used relational algebra with the assignment operator to express these modifications.

5. TERMINAL QUESTIONS
1. Explain the statement that relational algebra operators can be composed. Why is the
ability to compose operators important?
2. Given two relations R1 and R2, where R1 contains N1 tuples, R2 contains N2 tuples, and
N2 > N1 > 0, give the minimum and maximum possible sizes (in tuples) for the result
relation produced by each of the following relational algebra expressions. In each case,
state any assumptions about the schemas for R1 and R2 that are needed to make the
expression meaningful:
a. R1 ∪ R2
b. R1 ∩ R2
c. R1 − R2
d. R1 × R2
e. σ a=5 (R1)
f. π a (R1)
g. R1 / R2
3. Consider the following schema:
Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, cost: real)


The key fields are underlined, and the domain of each field is listed after the field name. Thus
sid is the key for Suppliers, pid is the key for Parts, and sid and pid together form the key for
Catalog. The Catalog relation lists the prices charged for parts by Suppliers. Write the
following queries in relational algebra.
1. Find the names of suppliers who supply some red part.
2. Find the sids of suppliers who supply some red or green part.
3. Find the sids of suppliers who supply some red part or are at 221 Packer Ave.
4. Find the sids of suppliers who supply some red part and some green part.
5. Find the sids of suppliers who supply every part.
6. Find the sids of suppliers who supply every red part.
7. Find the sids of suppliers who supply every red or green part.
8. Find the sids of suppliers who supply every red part or supply every green part.

6. ANSWERS
Self-Assessment Questions
1. Set
2. Two
3. Union
4. Common
5. Intersection
6. Concatenation
7. Commutative
8. Difference
9. Vertical
10. Reduce
11. Horizontal
12. Join
13. Relationships
14. Natural join
15. Equi-join
16. Comparison


Terminal Questions
1. The basic operators in relational algebra take relations as their operands
and return a relation as the result. Hence the output of one operation may be used as
input to another operation. In other words, operators can be composed. (Refer to section 3
for details.)
2. Refer to section 2 for details.
3. Refer to sections 2 and 3 for details.


Unit 9
Relational Calculus
Table of Contents

SL No   Topic                                       Fig No / Table / Graph   SAQ / Activity
1       Introduction                                -                        -
1.1     Objectives                                  -                        -
2       Tuple Relational Calculus                   -                        1
2.1     Semantics of TRC Queries                    1, 2, 3, 4               -
2.2     Examples of TRC Queries                     -                        -
3       Domain Relational Calculus                  -                        2
4       Relational Algebra vs Relational Calculus   -                        3
5       Summary                                     -                        -
6       Terminal Questions                          -                        -
7       Answers                                     -                        -


1. INTRODUCTION
Relational calculus is an alternative to relational algebra. In contrast to algebra, which is
procedural, the calculus is nonprocedural, or declarative, in that it allows us to describe
the set of answers without being explicit about how they should be computed. Relational
calculus has had a big influence on the design of commercial query languages such as SQL
and, especially, Query-by-Example (QBE).

The variant of the calculus that we present in detail is called the tuple relational calculus
(TRC). Variables in TRC take on tuples as values. In another variant, called the domain
relational calculus (DRC), the variables range over field values. TRC has had more of an
influence on SQL, while DRC has strongly influenced QBE.

1.1 Objectives:
After learning this unit, you should be able to:

❖ Explain relational calculus.
❖ Describe tuple relational calculus and its semantics.
❖ Explain domain relational calculus.
❖ Differentiate between relational algebra and relational calculus.


2. TUPLE RELATIONAL CALCULUS


A tuple variable is a variable that takes on tuples of a particular relation schema as values.
That is, every value assigned to a given tuple variable has the same number and type of
fields. A tuple relational calculus query has the form { T | p(T) } where T is a tuple variable
and p(T) denotes a formula that describes T; we will shortly define formulas and queries
rigorously. The result of this query is the set of all tuples t for which the formula p(T)
evaluates to true with T = t. The language for writing formulas p(T) is thus at the heart of
TRC and is essentially a simple subset of first-order logic. As a simple example, consider the
following query.

For example: Find all sailors with a rating above 7 (S is a tuple variable ranging over the
relation Sailors).
{ S | S ε Sailors ∧ S.rating > 7 }

When this query is evaluated on an instance of the Sailors relation, the tuple variable S is
instantiated successively with each tuple, and the test S.rating > 7 is applied. The answer
contains those instances of S that pass this test.
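
A Python set comprehension mirrors this evaluation closely. The sketch below is only illustrative: the Sailor field names and the four sample tuples are assumptions made for the example and are not taken from the figures of this unit.

```python
from collections import namedtuple

# Assumed schema and sample instance, for illustration only.
Sailor = namedtuple("Sailor", ["sid", "sname", "rating", "age"])
sailors = {
    Sailor(22, "Dustin", 7, 45.0),
    Sailor(29, "Brutus", 1, 33.0),
    Sailor(31, "Lubber", 8, 55.5),
    Sailor(32, "Andy", 8, 25.5),
}

# { S | S ε Sailors ∧ S.rating > 7 }: S takes each tuple in turn and the test is applied.
answer = {s for s in sailors if s.rating > 7}
print(sorted(s.sname for s in answer))
```

The comprehension states which tuples belong to the answer without prescribing an access strategy, which is the declarative flavour of the calculus.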

Syntax of TRC Queries


We now define these concepts formally, beginning with the notion of a formula. Let Rel be a
relation name, R and S be tuple variables, a an attribute of R, and b an attribute of S. Let op
denote an operator in the set { <, >, =, ≤, ≥ }. An atomic formula is one of the following:
• R ε Rel (ε is read as "belongs to")
• R.a op S.b
• R.a op constant, or constant op R.a

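A formula is recursively defined to be one of the following, where p and q are themselves
formulas, and p(R) denotes a formula in which the variable R appears:
• any atomic formula
• ¬p, p ∧ q, p ∨ q, or p ⇒ q
• For any R(p(R)), where R is a tuple variable
• For all R(p(R)), where R is a tuple variable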


In the last two clauses above, the quantifiers “For any” (the existential quantifier, ∃) and
“For all” (the universal quantifier, ∀) are said to bind the variable R. A variable is said to be
free in a formula or subformula (a formula contained in a larger formula) if the subformula
does not contain an occurrence of a quantifier that binds it.

We observe that every variable in a TRC formula appears in a subformula that is atomic, and
every relation schema specifies a domain for each field; this observation ensures that each
variable in a TRC formula has a well-defined domain from which values for the variable are
drawn. That is, each variable has a well-defined type, in the programming language sense.
Informally, an atomic formula R ε Rel gives R the type of tuples in Rel, and comparisons such
as R.a op S.b and R.a op constant induce type restrictions on the field R.a. If a variable R does
not appear in an atomic formula of the form R ε Rel (i.e., it appears only in atomic formulas
that are comparisons), we will follow the convention that the type of R is a tuple, whose fields
include all (and only) fields of R that appear in the formula.

We will not define types of variables formally, but the type of a variable should be clear in
most cases, and the important point to note is that comparisons of values having different
types should always fail. (In discussions of relational calculus, the simplifying assumption is
often made that there is a single domain of constants and that this is the domain associated
with each field of each relation.)

A TRC query is defined to be an expression of the form { T | p(T) }, where T is the only free
variable in the formula p.

2.1 Semantics of TRC Queries


What does a TRC query mean? More precisely, what is the set of answer tuples for a given
TRC query? The answer to a TRC query { T | p(T) }, as we noted earlier, is the set of all tuples
t for which the formula p(T) evaluates to true with variable T assigned the tuple value t. To
complete this definition, we must state which assignments of tuple values to the free
variables in a formula make the formula evaluate to true.

A query is evaluated on a given instance of the database. Let each free variable in a formula
F be bound to a tuple value. For the given assignment of tuples to variables, with respect
to the given database instance, F evaluates to (or simply 'is') true if one of the following
holds:
• F is an atomic formula R ε Rel, and R is assigned a tuple in the instance of relation Rel.
• F is a comparison R.a op S.b, R.a op constant, or constant op R.a, and the tuples assigned
to R and S have field values R.a and S.b that make the comparison true.
• F is of the form ¬p, and p is not true; or of the form p ∧ q, and both p and q are true; or of
the form p ∨ q, and one of them is true; or of the form p ⇒ q, and q is true whenever p is
true.
• F is of the form For any R(p(R)), and there is some assignment of tuples to the free
variables in p(R), including the variable R, that makes the formula p(R) true.
• F is of the form For all R(p(R)), and there is some assignment of tuples to the free
variables in p(R) that makes the formula p(R) true no matter what tuple is assigned to
R.
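
The truth conditions for the connectives and quantifiers can be mimicked with Python's built-in any and all. This is a simplified sketch: the helper names are made up for the example, and each quantified variable is assumed to range over a finite relation instance (here a small, hypothetical set of (sid, bid) pairs) rather than over all possible tuples.

```python
def implies(p, q):
    """p ⇒ q is false only when p is true and q is false."""
    return (not p) or q

def for_any(relation, p):
    """'For any R ε relation (p(R))': some tuple makes p true."""
    return any(p(r) for r in relation)

def for_all(relation, p):
    """'For all R ε relation (p(R))': every tuple makes p true."""
    return all(p(r) for r in relation)

# Hypothetical instance of (sid, bid) pairs.
reserves = {(22, 101), (22, 103), (31, 103)}

has_103 = for_any(reserves, lambda r: r[1] == 103)   # some reservation is for boat 103
all_by_22 = for_all(reserves, lambda r: r[0] == 22)  # every reservation is by sailor 22
print(has_103, all_by_22)
```

Restricting the quantifiers to a relation instance corresponds to the shorthand For any R ε Rel (p(R)) used in the examples of this unit.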

2.2 Examples of TRC Queries


We now illustrate the calculus through several examples, using the instances B1 of Boats,
R2 of Reserves, and S3 of Sailors shown in Figures 9.1, 9.2, and 9.3. We will use parentheses,
as needed, to make our formulas unambiguous. Often, a formula p(R) includes a condition R
ε Rel and the meaning of the phrases some tuple R and for all tuples R is intuitive. We will use
the notation For any R ε Rel (p(R)) as shorthand for For any R (R ε Rel ∧ p(R)). Similarly, we
use the notation For all R ε Rel (p(R)) as shorthand for For all R (R ε Rel ⇒ p(R)).

Fig 9.1: An Instance S3 of Sailors
Fig 9.2: An Instance R2 of Reserves
Fig 9.3: An Instance B1 of Boats

Find the names and ages of sailors with a rating above 7.
{ P | For any S ε Sailors (S.rating > 7 ∧ P.name = S.sname ∧ P.age = S.age) }

This query illustrates a useful convention: P is considered to be a tuple variable with exactly
two fields, which are called name and age, because these are the only fields of P that are
mentioned and P does not range over any of the relations in the query; that is, there is no
subformula of the form P ε Relname. The result of this query is a relation with two fields,
name and age. The atomic formulas P.name = S.sname and P.age = S.age give values to the
fields of an answer tuple P. On instances B1, R2, and S3, the answer is the set of tuples
<Lubber, 55.5>, <Andy, 25.5>, <Rusty, 35.0>, <Zorba, 16.0>, and <Horatio, 35.0>.

Find the sailor name, boat id, and reservation date for each reservation.

For each Reserves tuple, we look for a tuple in Sailors with the same sid. Given a pair of such
tuples, we construct an answer tuple P with fields sname, bid, and day by copying the
corresponding fields from these two tuples. This query illustrates how we can combine
values from different relations in each answer tuple. The answer to this query on instances
B1, R2, and S3 is shown in Figure 9.4.

Find the names of sailors who have reserved boat 103.


This query can be read as follows: “Retrieve all sailor tuples for which there exists a tuple in
Reserves, having the same value in the sid field, and with bid = 103." That is, for each sailor
tuple, we look for a tuple in Reserves that shows that this sailor has reserved boat 103. The
answer tuple P contains just one field, sname.

(Q2) Find the names of sailors who have reserved a red boat.

Fig 9.4: Answer to Query

This query can be read as follows: “Retrieve all sailor tuples S for which there exist tuples R
in Reserves and B in Boats such that S.sid = R.sid, R.bid = B.bid, and B.color = ‘red’.” Another
way to write this query, which corresponds more closely to this reading, is as follows:

(Q7) Find the names of sailors who have reserved at least two boats.

Contrast this query with the algebra version and see how much simpler the calculus version
is. In part, this difference is due to the cumbersome renaming of fields in the algebra version,
but the calculus version really is simpler.

(Q9) Find the names of sailors who have reserved all boats.


This query was expressed using the division operator in relational algebra. Notice how easily
it is expressed in calculus. The calculus query directly reflects how we might express the
query in English: “Find sailors S such that for all boats B there is a Reserves tuple showing
that sailor S has reserved boat B.”
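
The English reading above ("for all boats B there is a Reserves tuple") maps onto a nested all/any expression. The following Python sketch uses small hypothetical instances (not the instances B1, R2, and S3 of the figures) to show the shape of the query.

```python
# Hypothetical instances, for illustration only.
sailors = {(22, "Dustin"), (31, "Lubber"), (58, "Rusty")}   # (sid, sname)
boats = {101, 102, 103}                                     # bid values
reserves = {(22, 101), (22, 102), (22, 103),                # (sid, bid)
            (31, 102), (31, 103),
            (58, 103)}

# For all boats B, there is some Reserves tuple R with R.sid = S.sid and R.bid = B.bid.
names = {
    sname
    for sid, sname in sailors
    if all(any(r == (sid, bid) for r in reserves) for bid in boats)
}
print(names)
```

In this sample data only sailor 22 has a reservation for every boat, so only that sailor's name is returned.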

(Q14) Find sailors who have reserved all red boats.

This query can be read as follows: For each candidate (sailor), if a boat is red, the sailor must
have reserved it. That is, for a candidate sailor, a boat being red must imply the sailor having
reserved it. Observe that since we can return an entire sailor tuple as the answer instead of
just the sailor's name, we have avoided introducing a new free variable (e.g., the variable P
in the previous example) to hold the answer values. In instances B1, R2, and S3, the answer
contains the Sailors tuples with sids 22 and 31.

We can write this query without using implication, by observing that an expression of the
form p ⇒ q is logically equivalent to ¬p ∨ q:

This query should be read as follows: “Find sailors S such that for all boats B, either the boat
is not red, or a Reserves tuple shows that sailor S has reserved boat B.”
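
The equivalence of the two formulations can be checked on sample data. The Python sketch below writes the "all red boats" query once with an implication helper and once with the ¬p ∨ q form; the instances and the helper name implies are assumptions made for this illustration.

```python
def implies(p, q):
    """p ⇒ q, defined through its equivalent form ¬p ∨ q."""
    return (not p) or q

# Hypothetical instances, for illustration only.
sailors = {22, 31, 58}                                      # sid values
boats = {(101, "Interlake", "blue"), (102, "Interlake", "red"),
         (103, "Clipper", "green"), (104, "Marine", "red")}  # (bid, bname, color)
reserves = {(22, 101), (22, 102), (22, 104),
            (31, 102), (31, 104),
            (58, 103)}                                      # (sid, bid)

# Version 1: a boat being red implies that the sailor has reserved it.
with_implies = {
    s for s in sailors
    if all(implies(color == "red", (s, bid) in reserves) for bid, _, color in boats)
}

# Version 2: either the boat is not red, or the sailor has reserved it.
with_or = {
    s for s in sailors
    if all(color != "red" or (s, bid) in reserves for bid, _, color in boats)
}
print(sorted(with_implies), sorted(with_or))
```

Both versions select the same sailors, here those who hold reservations for boats 102 and 104.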

Self-Assessment Questions – 1

1. A _______ is a variable that takes on tuples of a particular relation schema as
values.
2. TRC stands for ___________.
3. In the atomic formula clauses, the quantifiers “For any” and “For all” are said to
_______ the variable R.
4. A ________ query is defined to be an expression of the form { T | p(T) }, where T
is the only free variable in the formula p.
5. A query is evaluated on a given __________ of the database.


3. DOMAIN RELATIONAL CALCULUS


A domain variable is a variable that ranges over the values in the domain of some attribute
(e.g., the variable can be assigned an integer if it appears in an attribute whose domain is the
set of integers). A DRC query has the form { <x1, x2, ..., xn> | p(<x1, x2, ..., xn>) }, where each
xi is either a domain variable or a constant and p(<x1, x2, ..., xn>) denotes a DRC formula
whose only free variables are the variables among the xi, 1 ≤ i ≤ n. The result of this query is
the set of all tuples <x1, x2, ..., xn> for which the formula evaluates to true.

A DRC formula is defined in a manner that is very similar to the definition of a TRC formula.
The main difference is that the variables are now domain variables. Let op denote an
operator in the set { <, >,=, ≤, ≥, ≠ } and let X and Y be domain variables.

An atomic formula in DRC is one of the following:


• <x1, x2, ..., xn> ε Rel, where Rel is a relation with n attributes; each xi, 1 ≤ i ≤ n,
is either a variable or a constant.
• X op Y
• X op constant, or constant op X

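A formula is recursively defined to be one of the following, where p and q are themselves
formulas, and p(X) denotes a formula in which the variable X appears:
• any atomic formula
• ¬p, p ∧ q, p ∨ q, or p ⇒ q
• For any X(p(X)), where X is a domain variable
• For all X(p(X)), where X is a domain variable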

The reader is invited to compare this definition with the definition of TRC formulas and see
how closely these two definitions correspond. We will not define the semantics of DRC
formulas formally; this is left as an exercise for the reader.

Examples of DRC Queries


We now illustrate DRC through several examples. The reader is invited to compare these
with the TRC versions.


(Q11) Find all sailors with a rating above 7.
{ <I, N, T, A> | <I, N, T, A> ε Sailors ∧ T > 7 }

This differs from the TRC version in giving each attribute a (variable) name. The condition
<I, N, T, A> ε Sailors ensures that the domain variables I, N, T, and A are restricted to be
fields of the same tuple. In comparison with the TRC query, we can say T > 7 instead of
S.rating > 7, but we must specify the tuple <I, N, T, A> in the result, rather than just S.
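
In Python, tuple unpacking plays the role of the domain variables: each field of a Sailors tuple gets its own name. The sketch below uses an assumed three-tuple sample instance, and the variable names I, N, T, and A simply follow the text.

```python
# Assumed sample instance of Sailors: (sid, sname, rating, age).
sailors = {
    (22, "Dustin", 7, 45.0),
    (31, "Lubber", 8, 55.5),
    (32, "Andy", 8, 25.5),
}

# <I, N, T, A> ε Sailors binds the four domain variables to the fields of one tuple;
# the condition T > 7 then refers to the rating directly.
answer = {(I, N, T, A) for (I, N, T, A) in sailors if T > 7}
print(sorted(answer))
```

Note that the result has to be rebuilt field by field as (I, N, T, A), just as the DRC query must list the tuple in its head.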

(Q1) Find the names of sailors who have reserved boat 103.

Notice that only the sname field is retained in the answer and that only N is a free variable.
We use the notation For any Ir, Br, D(...) as a shorthand for For any Ir(For any Br(For any
D(...))).

Very often, all the quantified variables appear in a single relation, as in this example. An even
more compact notation in this case is For any < Ir, Br, D> ε Reserves. With this notation, which
we will use henceforth, the above query would be as follows:

The comparison with the corresponding TRC formula should now be straightforward. This
query can also be written as follows; notice the repetition of variable I and the use of the
constant 103:

(Q2) Find the names of sailors who have reserved a red boat.

(Q7) Find the names of sailors who have reserved at least two boats.
{ <N> | For any I, T, A (<I, N, T, A> ε Sailors ∧
For any Br1, Br2, D1, D2 (<I, Br1, D1> ε Reserves ∧
<I, Br2, D2> ε Reserves ∧ Br1 ≠ Br2)) }

Notice how the repeated use of variable I ensures that the same sailor has reserved both
the boats in question.

(Q9) Find the names of sailors who have reserved all boats.

This query can be read as follows: “Find all values of N such that there is some tuple <I, N, T, A>
in Sailors satisfying the following condition: for every <B, BN, C>, either this is not a
tuple in Boats or there is some tuple <Ir, Br, D> in Reserves that proves that Sailor I has
reserved boat B.” The For all quantifier allows the domain variables B, BN, and C to range
over all values in their respective attribute domains, and the pattern ‘¬(<B, BN, C> ε Boats) ∨’
is necessary to restrict attention to those values that appear in tuples of Boats. This pattern
is common in DRC formulas, and the notation For all <B, BN, C> ε Boats can be used as a
shorthand instead. This is similar to the notation introduced earlier for For any. With this
notation the query would be written as follows:

(Q14) Find sailors who have reserved all red boats.

Here, we find all sailors such that for every red boat there is a tuple in Reserves that
shows the sailor has reserved it.

Self-Assessment Questions – 2

6. A _______ is a variable that ranges over the values in the domain of some attribute.
7. A DRC formula is defined in a manner that is very similar to the definition of a
_______.
8. The main difference between the DRC and TRC is that the variables are now
_______ variables.


4. RELATIONAL ALGEBRA VS RELATIONAL CALCULUS


Relational algebra is a procedural language while relational calculus is a non-procedural
language. We have presented two formal query languages for the relational model. Are
they equivalent in power? Can every query that can be expressed in relational algebra also
be expressed in relational calculus? The answer is yes, it can. Can every query that can be
expressed in relational calculus also be expressed in relational algebra? Before we answer
this question, we consider a major problem with calculus as we have presented it.

Consider the query { S | ¬(S ε Sailors) }. This query is syntactically correct. However, it asks
for all tuples S such that S is not in (the given instance of) Sailors. The set of such S tuples is
obviously infinite, in the context of infinite domains such as the set of all integers. This simple
example illustrates an unsafe query. It is desirable to restrict relational calculus to disallow
unsafe queries.

We now sketch how calculus queries are restricted to be safe. Consider a set I of relation
instances, with one instance per relation that appears in the query Q. Let Dom(Q,I) be the set
of all constants that appear in these relation instances I, or in the formulation of the query Q
itself. Since we only allow finite instances I, Dom(Q, I) is also finite.

For a calculus formula Q to be considered safe, at a minimum we want to ensure that for any
given I, the set of answers for Q contains only values that are in Dom(Q, I).
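
To make Dom(Q, I) concrete, the Python sketch below collects the constants of a tiny, assumed Sailors instance and shows why the query { S | ¬(S ε Sailors) } violates the condition just stated. The function name active_domain and the sample tuples are hypothetical.

```python
# Assumed sample instance of Sailors: (sid, sname, rating, age).
sailors = {(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5)}

def active_domain(instances, query_constants=()):
    """Collect every constant that appears in the relation instances or in the query."""
    dom = set(query_constants)
    for relation in instances:
        for t in relation:
            dom.update(t)
    return dom

dom = active_domain([sailors])  # the query itself mentions no constants

# A tuple that is not in Sailors satisfies ¬(S ε Sailors), so it belongs to the answer,
# yet it is built from values that do not occur in Dom(Q, I).
outsider = (99, "Nobody", 0, 0.0)
in_answer = outsider not in sailors
uses_only_dom = all(v in dom for v in outsider)
print(in_answer, uses_only_dom)
```

Since infinitely many such outsider tuples exist when the underlying domains are infinite, the answer cannot be confined to Dom(Q, I), which is what makes the query unsafe.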

While this restriction is obviously required, it is not enough. Not only do we want the set of
answers to be composed of constants in Dom(Q, I), we wish to compute the set of answers by
only examining tuples that contain constants in Dom(Q, I)! This wish leads to a subtle point
associated with the use of quantifiers For all and For any: Given a TRC formula of the form
For any R(p(R)), we want to find all values for variable R that make this formula true by
checking only tuples that contain constants in Dom(Q, I). Similarly, given a TRC formula of
the form For all R(p(R)), we want to find any values for variable R that make this formula
false, by checking only tuples that contain constants in Dom(Q, I).


We therefore define a safe TRC formula Q to be a formula such that:


1. For any given I, the set of answers for Q contains only values that are in Dom(Q,I).
2. For each subexpression of the form ∃R(p(R)) (there exists R) in Q, if a tuple r (assigned to
variable R) makes the formula true, then r contains only constants in Dom(Q, I).
3. For each subexpression of the form ∀R(p(R)) (for all R) in Q, if a tuple r (assigned to variable
R) contains a constant that is not in Dom(Q, I), then r must make the formula true.

Note that this definition is not constructive, that is, it does not tell us how to check if a query
is safe.
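To make Dom(Q, I) concrete, the following is a minimal Python sketch, not a safety checker. The Sailors rows and the helper name `active_domain` are hypothetical and chosen only for illustration; they are not the instance S1 of Figure 9.1. The sketch shows why the query { S | ¬(S ∈ Sailors) } from the start of this section violates condition 1: a tuple built from a value outside Dom(Q, I) still belongs to the answer.

```python
def active_domain(instances, query_constants=()):
    """Return Dom(Q, I): every constant in the instances I or in the query Q."""
    dom = set(query_constants)
    for relation in instances:
        for row in relation:
            dom.update(row)
    return dom


# Hypothetical instance of Sailors(sid, sname, rating), for illustration only.
sailors = {(22, "Dustin", 7), (31, "Lubber", 8)}
dom = active_domain([sailors])

# The value 999 does not occur in Dom(Q, I), yet this tuple is not in Sailors,
# so it satisfies the query and the answer cannot be confined to Dom(Q, I).
outsider = (999, "Dustin", 7)
in_answer = outsider not in sailors
uses_only_dom = all(value in dom for value in outsider)
print(in_answer, uses_only_dom)  # True False
```

Because the domain of integers is infinite, there are infinitely many such outsider tuples, which is exactly what makes the query unsafe.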

The query Q = { S | ¬(S ∈ Sailors) } is unsafe by this definition. Dom(Q, I) is the set of all values
that appear in (an instance I of) Sailors. Consider the instance S1 shown in Figure 9.1. The
answer to this query obviously includes values that do not appear in Dom(Q, S1).

Returning to the question of expressiveness, we can show that every query that can be expressed
using a safe relational calculus query can also be expressed as a relational algebra query. The
expressive power of relational algebra is often used as a metric of how powerful a relational
database query language is. If a query language can express all the queries that we can
express in relational algebra, it is said to be relationally complete. A practical query
language is expected to be relationally complete; in addition, commercial query languages
typically support features that allow us to express some queries that cannot be expressed in
relational algebra.

Self-Assessment Questions – 3

9. Can every query that can be expressed in relational algebra also be expressed
in relational calculus?
10. If a query language can express all the queries that we can express in relational
algebra, it is said to be __________.


5. SUMMARY
In this unit, we saw that instead of describing a query by how to compute the output
relation, a relational calculus query describes the tuples in the output relation. The language
for specifying the output tuples is essentially a restricted subset of first-order predicate logic.
In tuple relational calculus, variables take on tuple values and in domain relational calculus,
variables take on field values, but the two versions of the calculus are very similar.

All relational algebra queries can be expressed in relational calculus. If we restrict ourselves
to safe queries in the calculus, the converse also holds. An important criterion for
commercial query languages is that they should be relationally complete in the sense that
they can express all relational algebra queries.

6. TERMINAL QUESTIONS
1. What is relational completeness? If a query language is relationally complete, can you
write any desired query in that language?
2. What is an unsafe query? Give an example and explain why it is important to disallow
such queries.
3. Let the following relation schemas be given:
R = (A, B, C)
S = (D, E, F)
Let relations r(R) and s(S) be given. Give an expression in the tuple relational
calculus that is equivalent to each of the following:
a. ΠA(r)
b. σB=17(r)
c. r × s
d. ΠA,F(σC=D(r × s))
Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give an expression in
the domain relational calculus that is equivalent to each of the following:
a. ΠA(r1)
b. σB=17(r1)
c. r1 ∪ r2
d. r1 ∩ r2
e. r1 − r2
f. ΠA,B(r1) ∪ ΠB,C(r2)
4. How do you differentiate relational algebra and relational calculus?


7. ANSWERS
Self-Assessment Questions
1. tuple variable
2. tuple relational calculus
3. bind
4. TRC
5. Instance
6. domain variable
7. TRC formula
8. Domain
9. Yes
10. relationally complete

Terminal Questions
1. If a query language can express all the queries that we can express in relational
algebra, it is said to be relationally complete. This does not mean that any desired query
can be written: queries that cannot be expressed in relational algebra can be written only
if the language supports additional features. (Refer to section 4)
2. An unsafe query is a query whose set of answer tuples can be infinite in the context of
infinite domains such as the set of all integers, for example { S | ¬(S ∈ Sailors) }.
3. Refer to the whole unit for details.
4. A relational algebra query describes how to compute the output relation, whereas a
relational calculus query describes the tuples in the output relation. All relational
algebra queries can be expressed in relational calculus. If we restrict ourselves to safe
queries in the calculus, the converse also holds.


Unit 10
Normalization
Table of Contents

SL No   Topic                                 Fig No / Table / Graph     SAQ / Activity
1       Introduction                          -                          -
1.1     Objectives                            -                          -
2       Functional Dependency                 T1, F1, F2                 1
3       Anomalies in a Database               T2, F3                     2
4       Properties of Normalized Relations    -                          3
5       First Normalization                   T3, T3a, T3b, F4           4
6       Second Normal Form Relation           T4, F5                     5
7       Third Normal Form                     T5, T6, F6, F7             6
8       Boyce-Codd Normal Form (BCNF)         F8, F9, T7, T8             7
9       Fourth and Fifth Normal Form          T9, T10, T11, T12, F10     8
10      Summary                               -                          -
11      Terminal Questions                    -                          -
12      Answers                               -                          -


1. INTRODUCTION
The basic objective of normalization is to reduce redundancy, which means that information
is to be stored only once. Storing information several times leads to wastage of storage space
and an increase in the total size of the data stored. Relations are normalized so that when
relations in a database are to be altered during the lifetime of the database, we do not lose
information or introduce inconsistencies. The types of alterations normally needed for
relations are:
• Insertion of new data values to a relation. This should be possible without being
forced to leave blank fields for some attributes.
• Deletion of a tuple, namely, a row of a relation. This should be possible without losing
vital information unknowingly.
• Updating or changing a value of an attribute in a tuple. This should be possible
without exhaustively searching all the tuples in the relation.

1.1 Objectives
After going through this unit, the reader should be able to:

❖ Discuss the different types of anomalies in a database.
❖ Define functional dependency.
❖ List the different forms of normalization.
❖ Differentiate among different types of normalization.


2. FUNCTIONAL DEPENDENCY
As the concept of dependency is very important, it is essential that we first understand it well
and then proceed to the idea of normalization. There is no fool-proof algorithmic method of
identifying dependency. We have to use our common sense and judgment to specify
dependencies.

Let X and Y be the two attributes of a relation. Given the value of X, if there is only one value
of Y corresponding to it, then Y is said to be functionally dependent on X. This is indicated by
the notation:
X→Y
For example, given the value of the item code, there is only one value of the item name for it.
Thus item name is functionally dependent on the item code. This is shown as:
Item code → item name

Similarly, in Table 10.1, given an order number, the date of the order is known.
Thus: Order no. → Order date

Functional dependency may also be based on a composite attribute. For example, if we write
X, Z → Y
it means that there is only one value of Y corresponding to given values of X, Z. In other
words, Y is functionally dependent on the composite X, Z. In Table 10.1 below, for
example, Order no. and Item code together determine Qty. and Price.

Thus:
Order no., Item code → Qty., Price
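The definition above can be tested against stored data. The following Python sketch uses the rows of Table 10.1; the function name `fd_holds` and the column names are assumptions made for this illustration. Note that a sample of rows can only refute a dependency, never prove it: as stated earlier, dependencies have to be specified from our understanding of the data.

```python
def fd_holds(rows, lhs, rhs):
    """True if each combination of lhs values appears with only one rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


COLUMNS = ("order_no", "order_date", "item_code", "qty", "price")
DATA = [  # rows of Table 10.1
    (1456, "260289", 3687, 52, 50.40),
    (1456, "260289", 4627, 38, 60.20),
    (1456, "260289", 3214, 20, 17.50),
    (1886, "040389", 4629, 45, 20.25),
    (1886, "040389", 4627, 30, 60.20),
    (1788, "040489", 4627, 40, 60.20),
]
orders = [dict(zip(COLUMNS, row)) for row in DATA]

print(fd_holds(orders, ["order_no"], ["order_date"]))                  # True
print(fd_holds(orders, ["order_no", "item_code"], ["qty", "price"]))   # True
print(fd_holds(orders, ["order_no"], ["item_code"]))                   # False
```

The last call fails because order 1456 appears with three different item codes, so Order no. alone does not determine Item code.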
As another example, consider the relation

Student (Roll no, Name, Address, Dept., Year of study)


Table 10.1: Normalized Form of the Relation

Order no.   Order date   Item code   Quantity   Price/unit
1456        260289       3687        52         50.40
1456        260289       4627        38         60.20
1456        260289       3214        20         17.50
1886        040389       4629        45         20.25
1886        040389       4627        30         60.20
1788        040489       4627        40         60.20

In the relation Student, Name is functionally dependent on Roll no. In fact, given the value of
Roll no., the values of all other attributes can be uniquely determined. Name and Department
are not functionally dependent, because, given the name of a student, one cannot find his
department uniquely. This is due to the fact that there may be more than one student with the
same name. Name in this case is not a key. Department and Year of study are not functionally
dependent, as Year of study pertains to a student, whereas Department is an independent
attribute. The functional dependencies in this relation are shown as a dependency diagram in
figure 10.1. Such dependency diagrams are very useful in normalization.

Relation Key: Consider a relation with the attributes Vendor code, Vendor name and Address.
Given the Vendor code, the Vendor name and Address are uniquely determined. Thus Vendor
code is the relation key. Given a relation, if the value of an attribute X uniquely determines
the value of all other attributes in a row, then X is said to be the key of that relation.
Sometimes more than one attribute is needed to uniquely determine other attributes in a
relation row. In that case, such a set of attributes is the key.

Fig 10.1: Dependency diagram for the relation "Student"


In table 10.1, Order no. and Item code together form the key. In the relation "Supplies"
(Vendor code, Item code, Qty supplied, Date of supply, Price/unit) Vendor code and Item
code together form the key. This dependency is shown in the following diagram (figure 10.2).

Fig 10.2: Dependency diagram for the relation "Supplies"

Observe that in the figure the fact that Vendor code and Item code together form a composite
key is clearly shown by enclosing them together in a rectangle.

SELF ASSESSMENT QUESTIONS – 1

1. Let X and Y be two attributes of a relation. If Y is functionally dependent on X, the
functional dependency can be written as _________.
2. Functional dependency may also be based on a _________ attribute.


3. ANOMALIES IN A DATABASE
Consider the following relation scheme pertaining to the information about a student
maintained by a university:
STDINF(Name, Course, Phone_No, Major, Prof, Grade)
Table 10.2 shows some tuples of a relation on the relation scheme STDINF(Name, Course,
Phone_No, Major, Prof, Grade). The functional dependencies among its attributes are shown
in Figure 10.3. The key of the relation is (Name, Course), and the relation has, in addition, the
following functional dependencies: {Name → Phone_No, Name → Major, Name Course →
Grade, Course → Prof}.
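One way to confirm that (Name, Course) is a key is to compute an attribute closure, a standard technique that this unit does not name: starting from a set of attributes, keep adding everything the listed dependencies determine. The sketch below is illustrative only; the function name `closure` and the tuple encoding of dependencies are assumptions.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole left side is already determined.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


ALL_ATTRS = {"Name", "Course", "Phone_No", "Major", "Prof", "Grade"}
FDS = [  # functional dependencies of STDINF as listed above
    (("Name",), ("Phone_No",)),
    (("Name",), ("Major",)),
    (("Name", "Course"), ("Grade",)),
    (("Course",), ("Prof",)),
]

print(closure({"Name", "Course"}, FDS) == ALL_ATTRS)  # True
print(sorted(closure({"Name"}, FDS)))    # ['Major', 'Name', 'Phone_No']
print(sorted(closure({"Course"}, FDS)))  # ['Course', 'Prof']
```

Neither Name nor Course alone determines every attribute, but together they do, which is what makes the pair a key.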

Table 10.2: Student Data Representation in Relation STDINF

Name     Course   Phone_No   Major          Prof     Grade
Jones    353      237-4539   Comp Sci       Smith    A
Ng       329      427-7390   Chemistry      Turner   B
Jones    328      237-4539   Comp Sci       Clark    B
Martin   456      388-5183   Physics        James    A
Dulles   293      371-6259   Decision Sci   Cook     C
Duke     491      823-7293   Mathematics    Lamb     B
Duke     356      823-7293   Mathematics    Bond     in prog
Jones    492      237-4539   Comp Sci       Cross    in prog
Baxter   379      839-0827   English        Broes    C

Here the attribute Phone_No, which is not in any key of the relation scheme STDINF, is not
functionally dependent on the whole key, but only on one part of the key, namely, the attribute
Name. Similarly, the attributes Major and Prof, which are not in any key of the relation
scheme STDINF either, are fully functionally dependent on the attributes Name and Course,
respectively. Thus the determinants of these functional dependencies are again not the
entire key, but only part of the key of the relation. Only the attribute Grade is fully functionally
dependent on the key Name Course.

The relation scheme STDINF can lead to several undesirable problems:

Redundancy: The aim of the database system is to reduce redundancy, meaning that
information is to be stored only once. Storing information several times leads to the waste
of storage space and an increase in the total size of the data stored. Updates to a database
with such redundancies have the potential of making it inconsistent, as explained below. In
the relation of table 10.2, the Major and Phone_No. of a student are stored several times in
the database: once for each course that is or was taken by the student.

Update Anomalies: Multiple copies of the same fact may lead to update anomalies or
inconsistencies when an update is made and only some of the multiple copies are updated.
Thus, a change in the Phone_No. of Jones must be made, for consistency, in all tuples
pertaining to the student Jones. If one of the three Jones tuples of Table 10.2 is not changed
to reflect the new Phone_No. of Jones, there will be an inconsistency in the data.

Insertion Anomalies: If this is the only relation in the database showing the association
between a faculty member and the course he or she teaches, the fact that a given professor
is teaching a given course cannot be entered into the database unless a student is registered
in the course. Also, if another relation establishes a relationship between a course and the
professor who teaches that course, the information stored in these relations has to be
consistent.

Fig 10.3: Functional dependencies in STDINF

Deletion Anomalies: If the only student registered in a given course discontinues the
course, the information as to which professor is offering the course will be lost, if this is the
only relation in the database showing the association between a faculty member and the
course she or he teaches. If another relation in the database also establishes the relationship
between a course and the professor who teaches that course, the deletion of the last tuple in
STDINF for a given course will not cause the information about the course's teacher to be
lost.

The problems of database inconsistency and redundancy of data are similar to the problems
that exist in the hierarchical and network models. These problems are addressed in the
network model by the introduction of virtual fields and in the hierarchical model by the
introduction of virtual records. In the relational model, the above problems can be remedied
by decomposition. We define decomposition as follows:

Definition: Decomposition
The decomposition of a relation scheme R = (A1, A2, ..., An) is its replacement by a set of
relation schemes {R1, R2, ..., Rm} such that Ri ⊆ R for 1 ≤ i ≤ m and R1 ∪ R2 ∪ ... ∪ Rm = R.

A relation scheme R can be decomposed into a collection of relation schemes {R1, R2, R3, ..., Rm}
to eliminate some of the anomalies contained in the original relation R. Here the relation
schemes Ri (1 ≤ i ≤ m) are subsets of R, and the intersection Ri ∩ Rj for i ≠ j need not be
empty. Furthermore, the union of Ri (1 ≤ i ≤ m) is equal to R, i.e. R = R1 ∪ R2 ∪ ... ∪ Rm.

The problems in the relation scheme STDINF can be resolved if we replace it with the
following relation schemes:
STUDENT_INFO (Name, Phone_No, Major)
TRANSCRIPT (Name, Course, Grade)
TEACHER (Course, Prof)

The first relation scheme gives the phone number and the major of each student, and such
information will be stored only once for each student. Any change in the phone number will
thus require a change in only one tuple of this relation.

The second relation scheme stores the grade of each student in each course that the student
is or was enrolled in. The third relation scheme records the teacher of each course. One of
the disadvantages of replacing the original relation scheme STDINF with the three relation
schemes is that the retrieval of certain information requires a natural join operation to be
performed. For instance, finding the major of a student who obtained a grade of A in course
353 requires a join to be performed: (STUDENT_INFO ⋈ TRANSCRIPT). The same
information could be derived from the original relation STDINF by selection and projection.
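The query just described can be sketched in Python as follows. The helper `natural_join` is a hypothetical, naive nested-loop join written for illustration, and the rows are a small subset of Table 10.2 split across the two schemes.

```python
def natural_join(r, s):
    """Join two lists of dict rows on every attribute name they share."""
    joined = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                joined.append({**a, **b})
    return joined


student_info = [
    {"Name": "Jones", "Phone_No": "237-4539", "Major": "Comp Sci"},
    {"Name": "Ng", "Phone_No": "427-7390", "Major": "Chemistry"},
    {"Name": "Martin", "Phone_No": "388-5183", "Major": "Physics"},
]
transcript = [
    {"Name": "Jones", "Course": 353, "Grade": "A"},
    {"Name": "Ng", "Course": 329, "Grade": "B"},
    {"Name": "Jones", "Course": 328, "Grade": "B"},
    {"Name": "Martin", "Course": 456, "Grade": "A"},
]

joined = natural_join(student_info, transcript)
# Selection (Course = 353 and Grade = A) followed by projection on Major.
majors = {row["Major"] for row in joined
          if row["Course"] == 353 and row["Grade"] == "A"}
print(majors)  # {'Comp Sci'}
```

The join is the extra work that the decomposition introduces; on the single relation STDINF the same answer needs only the selection and the projection.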

When we replace the original scheme STDINF with the relation schemes STUDENT_INFO,
TRANSCRIPT, and TEACHER, the consistency and referential integrity constraints have to
be enforced. The referential integrity enforcement implies that if a tuple in the relation
TRANSCRIPT exists, such as (Jones, 353, in prog), a tuple must exist in STUDENT_INFO with
Name = Jones, and furthermore, a tuple must exist in TEACHER with Course = 353.
The attribute Name, which forms part of the key of the relation TRANSCRIPT, is a key of the
relation STUDENT_INFO. Such an attribute (or a group of attributes), which establishes a
relationship between specific tuples (of the same or two distinct relations), is called a foreign
key. Notice that the attribute Course in the relation TRANSCRIPT is also a foreign key since it
is a key of the relation TEACHER.
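A referential integrity check of this kind can be sketched as below. The function name `dangling` is hypothetical, and the last TRANSCRIPT row is deliberately inconsistent so that the check has something to report.

```python
def dangling(child, column, parent):
    """Return the values of child[column] that have no matching row in parent."""
    parent_keys = {row[column] for row in parent}
    return sorted({row[column] for row in child} - parent_keys)


student_info = [{"Name": "Jones"}, {"Name": "Ng"}]
teacher = [{"Course": 353, "Prof": "Smith"}, {"Course": 329, "Prof": "Turner"}]
transcript = [
    {"Name": "Jones", "Course": 353, "Grade": "A"},
    {"Name": "Ng", "Course": 329, "Grade": "B"},
    {"Name": "Duke", "Course": 491, "Grade": "B"},  # violates both foreign keys
]

print(dangling(transcript, "Name", student_info))  # ['Duke']
print(dangling(transcript, "Course", teacher))     # [491]
```

An empty result for both calls would mean that every foreign key value in TRANSCRIPT refers to an existing tuple.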

Note that the decomposition of STDINF into the relation schemes STUDENT (Name,
Phone_No, Major, Grade) and COURSE (Course, Prof) is a bad decomposition for the
following reasons:
1. Redundancy and update anomaly, because the data for the attributes Phone_No and
Major are repeated.
2. Loss of information, because we lose the fact that a student has a given grade in a
particular course.

Self-Assessment Questions – 2

3. The aim of the database system is to reduce redundancy, meaning that information is to
be stored _______.
4. Multiple copies of the same fact may lead to update ________.
5. In the relational model, the problem of redundancy and inconsistency can be remedied by
_______.


4. PROPERTIES OF NORMALIZED RELATIONS


Ideal relations after normalization should have the following properties so that the problems
mentioned above do not occur for relations in the (ideal) normalized form:
1. No data value should be duplicated in different rows unnecessarily.
2. A value must be specified (and required) for every attribute in a row.
3. Each relation should be self-contained. In other words, if a row from a relation is
deleted, important information should not be accidentally lost.
4. When a row is added to a relation, other relations in the database should not be
affected.
5. A value of an attribute in a tuple may be changed independently of other tuples in the
relation and other relations.

The idea of normalizing relations to higher and higher normal forms is to attain the goal
of having a set of ideal relations meeting the above criteria.

Self-Assessment Questions – 3

6. According to the properties of normalized relations, no data value should be ___________
in different rows unnecessarily.
7. Each relation should be _____________.


5. FIRST NORMALIZATION
The relation shown in table 10.1 is said to be in the First Normal Form, abbreviated as 1NF.
This form is also called a flat file. There are no composite attributes, and every attribute is
single and describes one property. How do we convert an un-normalized table into the first
normal form? Consider the following example:

Table 10.3: Department

Department_ID   Department_Name   Location
1               Production        Delhi, Kolkata
2               Sales             Mumbai
3               Marketing         Chennai
4               Research          Goa, Gurugram

Table 10.3 is not in the first normal form because the Location column contains multiple
values. For example, the first row includes the values "Delhi" and "Kolkata". To bring this
table to its first normal form, we split the table into two tables, and now we have the
resulting tables:

Table 10.3a: Department_Name

Department_ID   Department_Name
1               Production
2               Sales
3               Marketing
4               Research

Table 10.3b: Department_Location

Department_ID   Location
1               Delhi
1               Kolkata
2               Mumbai
3               Chennai
4               Goa
4               Gurugram
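The split shown in Tables 10.3a and 10.3b can be sketched in a few lines of Python. The variable names are assumptions, and the sketch assumes that the multiple locations of Table 10.3 are stored as one comma-separated string.

```python
# Rows of Table 10.3: the Location field holds several values at once.
department = [
    (1, "Production", "Delhi, Kolkata"),
    (2, "Sales", "Mumbai"),
    (3, "Marketing", "Chennai"),
    (4, "Research", "Goa, Gurugram"),
]

# Table 10.3a: one row per department.
department_name = [(dept_id, name) for dept_id, name, _ in department]

# Table 10.3b: one row per (department, single location) pair.
department_location = [
    (dept_id, city.strip())
    for dept_id, _, locations in department
    for city in locations.split(",")
]

for row in department_location:
    print(row)
```

Every field in both results now holds exactly one value, which is the 1NF requirement.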


Now the first normal form is satisfied, as the columns in each table all hold just one value.
Converting a relation to 1NF is the first essential step in normalization. There are
successive higher normal forms known as 2NF, 3NF, BCNF, 4NF, and 5NF. Each form is an
improvement over the earlier form. In other words, 2NF is an improvement on 1NF, 3NF is
an improvement on 2NF, and so on. A higher normal form relation is a subset of a lower
normal form, as shown in figure 10.4. The higher normalization steps are based
on three important concepts:

1. Dependencies among attributes in a relation
2. Identification of an attribute or a set of attributes as the key of a relation
3. Multivalued dependency between attributes.

Fig 10.4: Illustration of successive normal forms of a relation

SELF ASSESSMENT QUESTIONS – 4

8. First Normal Form is also called a ________ file.
9. A higher normal form relation is a subset of a ________ normal form.


6. SECOND NORMAL FORM RELATION


We will now define a relation in the Second Normal Form (2NF). A relation is said to be in
2NF if it is in 1NF, and non-key attributes are functionally dependent on the key attribute(s).
Further, if the key has more than one attribute, then no non-key attribute should be
functionally dependent upon a part of the key attributes. Consider, for example, the relation
given in table 10.1.

This relation is in 1NF. The key is (Order no., Item code). The dependency diagram for
attributes of this relation is shown in figure 10.5. The non-key attribute Price/unit is
functionally dependent on the Item code, which is part of the relation key. Also, the non-key
attribute Order date is functionally dependent on Order no., which is a part of the relation
key. Thus the relation is not in 2NF. It can be transformed to 2NF by splitting it into three
relations as shown in table 10.4.

In table 10.4 the relation Orders has Order no. as the key. The relation Order details
has the composite key (Order no., Item code). The relation Prices has Item code as the key.

Fig 10.5: Dependency diagram for the relation given in table 10.1

In all three relations, the non-key attributes are functionally dependent on the whole key.
Observe that by transforming to 2NF relations the repetition of the Order date (table 10.1)
has been removed. Further, if an order for an item is cancelled, the price of the item is not
lost. For example, if Order no. 1886 for item code 4629 is cancelled in table 10.1, then the
fourth row will be removed, and the price of the item is lost. In table 10.4, only the fourth
row of table 10.4(b) is removed. The item price is not lost as it is available in table 10.4(c).
The date of the order is also not lost as it is in table 10.4(a).


Table 10.4: Splitting of Relation given in table 10.1 into 2NF Relations

a) Orders
Order no.   Order date
1456        260289
1886        040389
1788        040489

b) Order Details
Order no.   Item code   Qty.
1456        3687        52
1456        4627        38
1456        3214        20
1886        4629        45
1886        4627        30
1788        4627        40

c) Prices
Item code   Price/unit
3687        50.40
4627        60.20
3214        17.50
4629        20.25

These relations in 2NF form meet all the "ideal" conditions specified. Observe that the three
relations obtained are self-contained. There is no duplication of data within a relation.
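The three relations of table 10.4 are projections of table 10.1 with duplicate rows removed. A minimal Python sketch of that step follows; the helper name `project` and the column names are assumptions made for illustration.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, keeping the first copy of each tuple."""
    seen, result = set(), []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result


COLUMNS = ("order_no", "order_date", "item_code", "qty", "price")
DATA = [  # rows of table 10.1
    (1456, "260289", 3687, 52, 50.40),
    (1456, "260289", 4627, 38, 60.20),
    (1456, "260289", 3214, 20, 17.50),
    (1886, "040389", 4629, 45, 20.25),
    (1886, "040389", 4627, 30, 60.20),
    (1788, "040489", 4627, 40, 60.20),
]
rows = [dict(zip(COLUMNS, r)) for r in DATA]

orders = project(rows, ["order_no", "order_date"])             # table 10.4(a)
order_details = project(rows, ["order_no", "item_code", "qty"])  # table 10.4(b)
prices = project(rows, ["item_code", "price"])                 # table 10.4(c)

print(len(orders), len(order_details), len(prices))  # 3 6 4
```

The order date is now stored three times instead of six, and the price of item 4627 once instead of three times.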

Self-Assessment Questions – 5

10. A relation is said to be in 2NF if it is in ______ and non-key attributes are
functionally dependent on the key attribute(s).
11. If the key has more than one attribute, then no _______ attributes should be
functionally dependent upon a part of the key attributes.


7. THIRD NORMAL FORM


A Third Normal Form normalization will be needed when the attributes in a relation tuple
are not functionally dependent only on the key attribute. If two non-key attributes are
functionally dependent, then there will be unnecessary duplication of data. Consider the
relation given in table 10.5. Here, Roll no. is the key, and all the other attributes are
functionally dependent on it.

Table 10.5: A 2NF Form Relation

Roll no.   Name       Department    Year   Hostel name
1784       Raman      Physics       1      Ganga
1648       Krishnan   Chemistry     1      Ganga
1768       Gopalan    Mathematics   2      Kaveri
1848       Raja       Botany        2      Kaveri
1682       Maya       Geology       3      Krishna
1485       Singh      Zoology       4      Godavari

Thus it is in 2NF. If it is known that in the college all first-year students are accommodated
in the Ganga hostel, all second-year students in Kaveri, all third-year students in Krishna, and
all fourth-year students in the Godavari, then the non-key attribute Hostel name is
dependent on the non-key attribute Year. This dependency is shown in figure 10.6.

Fig 10.6: Dependency diagram for the relation

Observe that given the year of the student, his hostel is known and vice versa. The
dependency of the hostel on year leads to duplication of data, as is evident from table 10.5. If
it is decided to ask all first-year students to move to Kaveri hostel, and all second-year
students to Ganga hostel, this change has to be made in many places in table 10.5. Also, when
a student's year of study changes, his change of hostel also has to be noted in table 10.5. This
is undesirable. A relation is said to be in 3NF if it is in 2NF and no non-key attribute is
functionally dependent on any other non-key attribute. Table 10.5 is thus not in 3NF. To
transform it to 3NF, we should introduce another relation that includes the functionally
related non-key attributes.

Table 10.6: Conversion of table 10.5 into two 3NF relations

Roll no.   Name       Department    Year
1784       Raman      Physics       1
1648       Krishnan   Chemistry     1
1768       Gopalan    Mathematics   2
1848       Raja       Botany        2
1682       Maya       Geology       3
1485       Singh      Zoology       4

Year   Hostel name
1      Ganga
2      Kaveri
3      Krishna
4      Godavari

It should be stressed again that dependency between attributes is a semantic property and
has to be stated in the problem specification. In this example, the dependency between Year
and Hostel is clearly stated. If the hostel allocated to students does not depend on their
year in college, then table 10.5 is already in 3NF.
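The conversion of table 10.5 into the two relations of table 10.6 can be sketched as follows. The variable names are assumptions; the sketch also shows that the original rows can be rebuilt from the two relations, and that a hostel reallocation becomes a single change.

```python
students = [  # rows of table 10.5
    (1784, "Raman", "Physics", 1, "Ganga"),
    (1648, "Krishnan", "Chemistry", 1, "Ganga"),
    (1768, "Gopalan", "Mathematics", 2, "Kaveri"),
    (1848, "Raja", "Botany", 2, "Kaveri"),
    (1682, "Maya", "Geology", 3, "Krishna"),
    (1485, "Singh", "Zoology", 4, "Godavari"),
]

# First 3NF relation: drop the Hostel name column.
student = [row[:4] for row in students]

# Second 3NF relation: one (Year, Hostel name) pair per year.
year_hostel = dict(sorted({(row[3], row[4]) for row in students}))

# Joining the two relations on Year rebuilds table 10.5 exactly.
rebuilt = [row + (year_hostel[row[3]],) for row in student]
print(rebuilt == students)  # True

# Moving all first-year students to Kaveri is now one update, not two.
year_hostel[1] = "Kaveri"
print(year_hostel)
```

In table 10.5 the same reallocation would have required changing every first-year row.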

Let us consider another example of a relation. The relation Employee is given below and its
dependency diagram in figure 10.7.

Employee (Employee code, Employee name, Dept., Salary, Project no., Termination date of
project).

As can be seen from the figure, the termination date of a project is dependent on Project no.
Thus this relation is not in 3NF. The 3NF relations are:

Employee (Employee code, Employee name, Dept., Salary, Project no.)
Project (Project no., Termination date)

Fig 10.7: Dependency diagram of employee relation

SELF ASSESSMENT QUESTIONS – 6

12. A __________ Form normalization will be needed where all attributes in a relation
tuple are not functionally dependent only on the key attribute.
13. If two non-key attributes are functionally _______, then there will be
unnecessary duplication of data.
14. In 3NF, dependency between attributes is a ______ property and has to be stated
in the problem specification.


8. BOYCE-CODD NORMAL FORM (BCNF)


Assume that a relation has more than one possible key. Assume further that the composite
keys have a common attribute. If an attribute of a composite key is dependent on an attribute
of the other composite key, a normalization called BCNF is needed. Consider, as an example,
the relation Professor:

Professor (Professor code, Dept., Head of Dept., Percent time)

It is assumed that
1. A professor can work in more than one department
2. The percentage of the time he spends in each department is given.
3. Each department has only one Head of Department.

The relationship diagram for the above relation is given in figure 10.8. Table 10.7 gives the
relation attributes. The two possible composite keys are Professor Code and Dept. or
Professor Code and Head of Dept. Observe that department as well as Head of Dept. are not
non-key attributes. They are a part of a composite key.

Fig 10.8: Dependency diagram of Professor relation


Table 10.7: Normalization of Relation "Professor"

Professor Code   Department    Head of Dept.   Percent time
P1               Physics       Ghosh           50
P1               Mathematics   Krishnan        50
P2               Chemistry     Rao             25
P2               Physics       Ghosh           75
P3               Mathematics   Krishnan        100

The relation given in table 10.7 is in 3NF. Observe, however, that the names of Dept. and Head
of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted. We lose
the information that Rao is the Head of the Department of Chemistry.

The normalization of the relation is done by creating a new relation for Dept. and Head of
Dept. and deleting Head of Dept. from the Professor relation. The normalized relations are
shown in table 10.8, and the dependency diagrams for these new relations are in figure
10.9.

Table 10.8: Normalized Professor Relation in BCNF

a)
Professor Code   Department    Percent time
P1               Physics       50
P1               Mathematics   50
P2               Chemistry     25
P2               Physics       75
P3               Mathematics   100

b)
Department    Head of Dept.
Physics       Ghosh
Mathematics   Krishnan
Chemistry     Rao

The dependency diagram gives the important clue to this normalization step, as is clear from
figures 10.8 and 10.9.
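A common textbook statement of BCNF, not spelled out in this unit, is that the left side of every functional dependency must be a key (or contain one). Under that assumption, the step above can be checked with the short Python sketch below. The function names are hypothetical, and the dependency Head of Dept. → Dept. is an assumption implied by the statement that (Professor code, Head of Dept.) is also a possible key.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the dependencies whose left side does not determine the whole schema."""
    return [(lhs, rhs) for lhs, rhs in fds if not set(schema) <= closure(lhs, fds)]


PROFESSOR = ("Professor code", "Dept", "Head of Dept", "Percent time")
FDS = [
    (("Professor code", "Dept"), ("Percent time",)),
    (("Dept",), ("Head of Dept",)),
    (("Head of Dept",), ("Dept",)),  # assumed: a head leads only one department
]
print(len(bcnf_violations(PROFESSOR, FDS)))  # 2

# The two relations of table 10.8, each with the dependencies that apply to it.
print(bcnf_violations(("Professor code", "Dept", "Percent time"), FDS[:1]))  # []
print(bcnf_violations(("Dept", "Head of Dept"), FDS[1:]))                    # []
```

The two violations in the original relation are the dependencies between Dept. and Head of Dept.; after the split, each of them sits in a relation where its left side is a key.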


Fig 10.9: Dependency diagram of Professor relation

SELF ASSESSMENT QUESTIONS – 7

15. The acronym for Boyce-Codd Normal Form is _______.
16. Removing more than one independent multivalued dependency from a relation by
splitting the relation is called ________.


9. FOURTH AND FIFTH NORMAL FORM


When attributes in a relation have a multivalued dependency, further normalization to 4NF
and 5NF is required. We will illustrate this with an example. Consider a vendor supplying
many items to many projects in an organization. The following are the assumptions:
1. A vendor is capable of supplying many items.
2. A project uses many items.
3. A vendor supplies many projects.
4. An item may be supplied by many vendors.

Table 10.9 gives the relation for this problem and figure 10.10 the dependency diagram(s).

Table 10.9: Vendor-supply-projects Relation

Vendor Code   Item code   Project no.
V1            I1          P1
V1            I2          P1
V1            I1          P3
V1            I2          P3
V2            I2          P1
V2            I3          P1
V3            I1          P2
V3            I1          P2

Fig 10.10: Dependency diagram of the Vendor-supply-projects relation

The relation given in table 10.9 has a number of problems. For example, if vendor V1 has to
supply to project P2, but the item is not yet decided, then a row with a blank for item code
has to be introduced. The information about item I1 is stored twice for vendor V3. Observe
that the relation given in table 10.9 is in 3NF and also in BCNF. It still has the problems
mentioned above. The problem is reduced by expressing this relation as two relations in the
Fourth Normal Form (4NF). A relation is in 4NF if it has no more than one independent
multivalued dependency, or one independent multivalued dependency with a functional
dependency.

Table 10.9 can be expressed as the two 4NF relations given in Table 10.10. The facts that
vendors are capable of supplying certain items and that they are assigned to supply for some
projects are independently specified in the 4NF relations.

Table 10.10: Vendor-supply-project Relations in 4NF

a) Vendor supply
Vendor code   Item code
V1            I1
V1            I2
V2            I2
V2            I3
V3            I1

b) Vendor project
Vendor code   Project no.
V1            P1
V1            P3
V2            P1
V3            P1
V3            P2
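
The decomposition can be checked mechanically. The following is a minimal Python sketch, not part of the original unit: it projects the rows of Table 10.9 onto the two 4NF relations and joins them back on the vendor code. It assumes that the two V3 rows of Table 10.9 are (V3, I1, P1) and (V3, I1, P2), which is what Table 10.10(b) implies; the variable names are illustrative only.

```python
# Rows of Table 10.9 as (vendor, item, project). The two V3 rows are assumed to be
# (V3, I1, P1) and (V3, I1, P2), as implied by Table 10.10(b).
supply = {
    ("V1", "I1", "P1"), ("V1", "I2", "P1"),
    ("V1", "I1", "P3"), ("V1", "I2", "P3"),
    ("V2", "I2", "P1"), ("V2", "I3", "P1"),
    ("V3", "I1", "P1"), ("V3", "I1", "P2"),
}

# Project onto the two independent multivalued facts.
vendor_item = {(v, i) for v, i, _ in supply}      # Table 10.10(a)
vendor_project = {(v, p) for v, _, p in supply}   # Table 10.10(b)

# Natural join of the two projections on the vendor code.
rejoined = {
    (v, i, p)
    for v, i in vendor_item
    for v2, p in vendor_project
    if v == v2
}

print(sorted(vendor_item))
print(sorted(vendor_project))
print(rejoined == supply)  # whether the join reproduces the original rows
```

For this particular data the join of the two projections gives back exactly the original rows, so nothing is lost by storing the two smaller relations instead.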
These relations still have a problem. Even though vendor V1's capability to supply items and
his allotment to supply for specified projects are recorded, a project may not need all the
items that the vendor can supply. We thus need another relation that specifies which items
each project uses. This is called the 5NF form. The 5NF relations are the relations in Table
10.10(a) and 10.10(b) together with the relation given in Table 10.11.

Table 10.11: 5NF Additional Relation
Project no.   Item code
P1            I1
P1            I2
P2            I1
P3            I1
P3            I3

In Table 10.12 we summarize the normalization steps already explained.


Table 10.12: Summary of Normalization steps

Input relation   Transformation                                       Output relation
All relations    Eliminate variable-length records.                   1NF
1NF relation     Remove dependency of non-key attribute on part of    2NF
                 a multiattribute key
2NF              Remove dependency of non-key attribute on other      3NF
                 non-key attributes
3NF              Remove dependency of an attribute of a               BCNF
                 multiattribute key on an attribute of another
                 (overlapping) multiattribute key
BCNF             Remove more than one independent multivalued         4NF
                 dependency from relation by splitting relation
4NF              Add one relation relating attributes with            5NF
                 multivalued dependency to the two relations with
                 multivalued dependency

SELF ASSESSMENT QUESTIONS – 8

17. A relation is in 4NF if it has no more than one _______ multivalued
dependency or one independent multivalued dependency with a functional
dependency.
18. In 5NF, transformation is done by adding one relation relating attributes with
multivalued dependency to the ______ relations with multivalued
dependency.


10. SUMMARY
In this unit, we discussed that there is no foolproof algorithmic method of identifying
dependencies, and hence we have to use our common sense and judgment to specify
them. We dealt with the importance of having a consistent database without
repetition of data and pointed out the anomalies that could be introduced in a database with
an undesirable schema. We also discussed the properties of normalized relations. Then we
discussed the several forms of normalization that help in removing these anomalies.

11. TERMINAL QUESTIONS

1. What is the basic purpose of 4NF?


2. What types of anomalies are found in relational database?
3. Define the term functional dependency.
4. Give a set of FDs for the relation schema R(A,B,C,D) with the primary key AB under
which R is in 1NF but not in 2NF.
5. Consider the relation schema R(A,B,C), which has the FD B➔ C. If A is a candidate key
for R, is it possible for R to be in BCNF? If so, under what conditions? If not, explain why
not.
6. Suppose that we have a relation schema R(A,B,C) representing a relationship between
two entity sets with keys A and B, respectively, and suppose that R has (among others)
the FDs A ➔ B and B ➔ A. Explain what such a pair of dependencies means (i.e., what
they imply about the relationship that the relation models).
7. Consider a relation R with five attributes ABCDE. You are given the following
dependencies: A➔ B, BC ➔ E, and ED ➔ A.
a. List all keys for R.
b. Is R in 3NF?
c. Is R in BCNF?


8. Consider the attribute set R = ABCDEGH and the FD set F = { AB ➔ C, AC ➔ B, AD ➔ E,
B ➔ D, BC ➔ A, E ➔ G }.
a. For each of the following attribute sets, do the following:
i. Compute the set of dependencies that hold over the set and write down a
minimal cover.
ii. Name the strongest normal form that is not violated by the relation containing
these attributes.
iii. Decompose it into a collection of BCNF relations if it is not in BCNF.
a) ABC b) ABCD c) ABCEG d) DCEGH e) ACEH

b. Which of the following decompositions of R = ABCDEG, with the same set of
dependencies F, is (a) dependency-preserving? (b) lossless-join?
i. {AB, BC, ABDE, EG}
ii. {ABC, ACDE, ADG}

12. ANSWERS
Self-Assessment Questions
1. X ➔ Y
2. composite
3. only once
4. anomalies
5. decomposition
6. Duplicated
7. self-contained
8. flat
9. lower
10. 1NF
11. non-key
12. Third Normal
13. Dependent
14. Semantic
15. BCNF
16. Fourth Normal Form (4NF)
17. Independent
18. two


Terminal Questions
1. The basic purpose of the 4NF transformation is to normalize data when attributes in a
relation have multivalued dependencies. It is done by splitting the relation so that no
relation has more than one independent multivalued dependency. (Refer section 9)
2. Anomalies found in relational databases are update anomalies, insertion anomalies,
and deletion anomalies. (Refer section 3 for detail)
3. A functional dependency occurs when one attribute in a relation uniquely determines
another attribute. This can be written X ➔ Y, which is the same as stating "Y is
functionally dependent upon X".
4. Refer section 2 for detail.
5. Yes, it is possible for R to be in BCNF if B is also a candidate key for R, so that the
determinant of B ➔ C is a key. (Refer section 8 for detail)
6. The pair of dependencies A ➔ B and B ➔ A means that each value of A determines exactly
one value of B and vice versa, i.e., the relationship is one-to-one. (Refer section 10.2
for detail)
7. Refer whole unit for detail.
8. Refer whole unit for detail.


Unit 11
Query Processing and Optimization
Table of Contents

SL No   Topic                                                Fig No / Table / Graph   SAQ / Activity
1       Introduction                                         -                        -
1.1     Objectives                                           -                        -
2       Query Interpretation                                 1                        1
3       Equivalence of Expressions                           -                        2
4       Algorithm for Executing Query Operations             -                        3
4.1     External sorting                                     -                        -
4.2     Select operation                                     -                        -
4.3     Join operation                                       -                        -
4.4     PROJECT and set operation                            -                        -
4.5     Aggregate operations                                 -                        -
4.6     Outer join                                           -                        -
5       Heuristics in Query Optimization                     2, 3                     4
6       Semantic Query Optimization                          -                        -
7       Converting Query Tree to Query Evaluation Plan       -                        5
8       Cost Estimates in Query Optimization                 -                        6
8.1     Measure of query cost                                -                        -
8.2     Catalog information for cost estimation of queries   -                        -
9       Join Strategies for Parallel Processing              -                        7
9.1     Parallel join                                        -                        -
9.2     Pipelined multiway join                              -                        -
9.3     Physical organisation                                -                        -
10      Summary                                              -                        -
11      Terminal Questions                                   -                        -
12      Answers                                              -                        -


1. INTRODUCTION

In the previous unit, you learned that normalization is needed to reduce redundancy, and
that relations are normalized so that, when relations in a database are altered during
the lifetime of the database, we do not lose information or introduce inconsistencies. In this
unit, we learn about the query processing and optimization techniques used in database
management systems.

Query processing is a set of activities to obtain the desired information from a database
system in a predictable and reliable fashion. These activities are:

(i) parsing a query and translating it into a form that can be optimized,
(ii) optimizing the query, and
(iii) evaluating an execution plan over the physical data model, using operations on file
structures, indices, etc.

In SQL, queries are expressed in a high-level declarative form. The query therefore has to be
processed and optimized so that its internal form gets a suitable execution strategy. This
optimization helps to produce the result in less time.

There can be several possible strategies for processing a query, depending upon its
complexity. One should select a good strategy for processing a query.

This unit will introduce you to the basic concepts of query processing and query
optimization strategies in the relational database domain.

1.1 Objectives

After studying this unit, you should be able to:

❖ Process and optimize a query
❖ Explain the equivalence of expressions in queries
❖ Write basic algorithms for executing query operations
❖ Describe heuristics in query optimization
❖ Explain cost estimates in query optimization
❖ Discuss basic query optimization strategies


2. QUERY INTERPRETATION

There are three phases that a query passes through during DBMS processing: (i) parsing and
translation, (ii) optimization, and (iii) evaluation. Figure 11.1 shows the steps in query
processing.

Figure 11.1: Steps in the query processing process

Parsing and translation are needed because a query in a high-level language is suitable for
human use only. For example, in SQL, queries are expressed in a high-level declarative form,
and a high-level relational query is generally non-procedural in nature. It simply states
"what" is required rather than "how" to obtain it. Internally, a query should be represented
in a more useful form, such as relational algebra. So, first, the system must translate the
query into its internal form. After parsing and translating the query into a relational
algebra expression, the query is transformed into a form, usually a query tree or graph, that
can be handled by the optimization engine (query optimizer).

After the query has been parsed and translated into a form that can be optimized, the
question arises why this form needs to be optimized at all. The reason is that an efficient
method must be found to compute the result of the query on the existing database structure.
Optimization itself can be done using information in main memory, with little or no disk
access. During optimization, various analyses are performed on the query, which yield valid
evaluation plans for computing the result. Execution of the query, in contrast, requires disk
access, and the transfer of data from disk is slow relative to the speed of main memory and
the CPU. It is therefore worthwhile to spend effort on saving disk accesses. Keeping all
these limitations in mind, optimization is done to find the best possible method.

There are two main approaches to finding the best possible solution:

(i) rewriting the query in a more effective manner, and
(ii) estimating the cost of various execution strategies for the query.

Usually, both strategies are combined, because a good strategy has to be found in very
little time.

In the network and hierarchical systems, optimization is left for the most part to the
application programmer. It is so because one has to know about the entire application
program and it is not easy to transform a hierarchical or network query into another one.

When the optimization phase is over, the detailed strategy is fixed and the best
evaluation plan is passed to the evaluation engine for processing the query. At this stage
the specific indices to use, the order in which tuples are to be processed, and so on are
chosen.

The best evaluation plan is chosen out of multiple methods of executing a query and is then
executed to get the result.


Self-Assessment Questions - 1

1. Query is parsed and translated to get relational algebra expression. (True/False)


2. Optimization of query processing is done by ________ .


3. EQUIVALENCE OF EXPRESSIONS
A good query-processing strategy looks for a relational algebra expression that is
equivalent to the given query but more efficient to execute.

An SQL query is first translated into an equivalent extended relational algebra expression,
represented as a query tree data structure, that is then optimized. This is done by first
decomposing the SQL query into query blocks. A query block is the basic unit that can be
translated into an extended relational algebra expression and then optimized. Nested queries
within a query are identified as separate query blocks.

A query block contains a single SELECT-FROM-WHERE expression (and may also contain GROUP BY
and HAVING clauses).

Let us look at the SQL query in the example of a University database on the FACULTY relation.

FACULTY (FID: string, FNAME: string, LNAME: string, DEPT: string, SALARY: real, ENO:
int, DNO: int, SEX: string)

FACULTY
FID   FNAME   LNAME   DEPT   SALARY    DNO   ENO   SEX
46    Mitra   Guru    IT     $88,000   2     76    M
47    Lara    Maya    MBA    $88,999   3     85    F

Consider the following SQL query on the FACULTY relation:

SELECT LNAME, FNAME
FROM FACULTY
WHERE SALARY > (SELECT MAX (SALARY)
                FROM FACULTY
                WHERE DNO = 2);


This query includes a nested subquery and hence would be decomposed into two blocks. The
inner block is

(SELECT MAX (SALARY)
 FROM FACULTY
 WHERE DNO = 2);

and the outer block is

SELECT LNAME, FNAME
FROM FACULTY
WHERE SALARY > c

where c represents the result returned from the inner query block. The inner block can be
translated into the extended relational algebra expression

δ<MAX SALARY> (σ<DNO = 2> (FACULTY))

and the outer block into the expression

π<LNAME, FNAME> (σ<SALARY > c> (FACULTY))

After the SQL query is decomposed, the query optimizer chooses an execution plan for
each block. It should be noted that in the above example, the inner block needs to be
evaluated only once to produce the maximum salary, which is then used as the constant c
by the outer block. This is known as an uncorrelated nested query. It is more difficult to
optimize correlated nested queries, where a tuple variable from the outer block
appears in the WHERE-clause of the inner block.
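
The two-block evaluation can be tried out directly. The sketch below is an illustration only: it uses Python's standard sqlite3 module with an in-memory database, loads the two sample FACULTY rows shown earlier, evaluates the inner block once to obtain c, and then runs the outer block with c as a constant. The column types are assumptions, and the manual split stands in for what an optimizer would do internally; it is not a description of how any particular DBMS works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FACULTY (FID TEXT, FNAME TEXT, LNAME TEXT, DEPT TEXT, "
    "SALARY REAL, DNO INTEGER, ENO INTEGER, SEX TEXT)"
)
conn.executemany(
    "INSERT INTO FACULTY VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("46", "Mitra", "Guru", "IT", 88000.0, 2, 76, "M"),
        ("47", "Lara", "Maya", "MBA", 88999.0, 3, 85, "F"),
    ],
)

# Inner block: evaluated once; its result becomes the constant c.
c = conn.execute("SELECT MAX(SALARY) FROM FACULTY WHERE DNO = 2").fetchone()[0]

# Outer block: uses c as an ordinary constant.
two_step = conn.execute(
    "SELECT LNAME, FNAME FROM FACULTY WHERE SALARY > ?", (c,)
).fetchall()

# The original nested query, for comparison.
nested = conn.execute(
    "SELECT LNAME, FNAME FROM FACULTY "
    "WHERE SALARY > (SELECT MAX(SALARY) FROM FACULTY WHERE DNO = 2)"
).fetchall()

print(c, two_step, nested)
```

Both forms return the same rows, which is what makes it safe to evaluate an uncorrelated inner block a single time.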

Self-Assessment Questions - 2

3. Nested query blocks are not identified as separate query blocks. (True/False)
4. It is difficult to optimize the more complex correlated nested queries, where a tuple
variable from the outer block appears in the ________of the inner block.


4. ALGORITHM FOR EXECUTING QUERY OPERATIONS

In this section, we will discuss the algorithms for executing query operations. They are
(i) external sorting, (ii) SELECT operation, (iii) JOIN operation, (iv) PROJECT and set
operations, (v) aggregate operations, and (vi) outer join.

4.1 External Sorting

External sorting is one of the primary algorithms used in query processing. It is suitable
for large files of records stored on disk that do not fit entirely in main memory, as is the
case for most database files.

Whenever an SQL query specifies an ORDER BY clause, the query result must be sorted.
Sorting is also a key component in sort-merge algorithms used for JOIN and other operations
(such as UNION and INTERSECTION), and in duplicate elimination algorithms for the
PROJECT operation (when an SQL query specifies the DISTINCT option in the SELECT
clause).

The typical external sorting algorithm uses a sort-merge strategy. This algorithm consists
of two phases: (i) the sorting phase and (ii) the merging phase.

Like other database algorithms, the sort-merge algorithm requires buffer space in main
memory, where the actual sorting and merging of the runs (portions or pieces of the
file) is performed.

In the sorting phase, portions of the file that fit in the available buffer space are read
into main memory, sorted using an internal sorting algorithm, and written back to disk as
temporary sorted subfiles (or runs). The size of a run and the number of initial runs are
dictated by the number of file blocks (b) and the available buffer space.

In the merging phase, the sorted runs are merged during one or more passes. The degree
of merging is the number of runs that can be merged together in each pass. In each pass, one
buffer block is needed to hold one block from each of the runs being merged, and one more
block is needed to hold one block of the merge result.
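
The two phases can be sketched in a few lines of Python. This is a simulation under simplifying assumptions, not a disk-based implementation: plain lists stand in for the file and for the runs on disk, `buffer_records` (a hypothetical parameter) is the number of records that fit in the buffer, and `merge_degree` is the number of runs merged together in each pass.

```python
import heapq

def external_sort(records, buffer_records, merge_degree):
    """Simulate external sort-merge; returns (sorted records, merge passes)."""
    if buffer_records < 1 or merge_degree < 2:
        raise ValueError("need buffer_records >= 1 and merge_degree >= 2")

    # Sorting phase: cut the file into runs that fit in the buffer,
    # sort each run in memory, and "write it back" as a sorted run.
    runs = [
        sorted(records[i:i + buffer_records])
        for i in range(0, len(records), buffer_records)
    ]

    # Merging phase: merge up to merge_degree runs at a time,
    # pass after pass, until a single run remains.
    passes = 0
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + merge_degree]))
            for i in range(0, len(runs), merge_degree)
        ]
        passes += 1

    return (runs[0] if runs else []), passes

data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
print(external_sort(data, buffer_records=3, merge_degree=2))
```

With ten records and room for three in the buffer, the sorting phase produces four runs. Merging two runs at a time then takes two passes, while a merge degree of four would finish in one.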


4.2 SELECT Operation

There are several algorithms for executing the SELECT operation. Which one can be used
depends on the file having specific access paths, and some of them apply only to
certain types of selection conditions.

Search methods for simple selection

There are a number of search algorithms for selecting records from a file. Algorithms that
scan the records of a file to search for and retrieve records that satisfy a selection
condition are known as file scans. A search algorithm that uses an index for searching is
called an index scan. The search methods given below are examples of search algorithms that
can be used to implement a select operation:

i) Linear search (brute force): This method retrieves every record in the file and
tests whether its attribute values satisfy the selection condition.
ii) Binary search: This method can be used if the selection condition involves an equality
comparison on a key attribute on which the file is ordered.
iii) Using a primary index (or hash key): This method is used if the selection condition
involves an equality comparison on a key attribute with a primary index (or hash key).
iv) Using a primary index to retrieve multiple records: This method is used if the
comparison condition is >, >=, <, or <= on a key field with a primary index.
v) Using a clustering index to retrieve multiple records: This method is used if the
selection condition involves an equality comparison on a non-key attribute with a
clustering index.
vi) Using a secondary (B+-tree) index on an equality comparison: This search method
can be used to retrieve a single record if the indexing field is a key (has unique values)
or to retrieve multiple records if the indexing field is not a key. It can also be used
for comparisons involving >, >=, <, or <=.

Linear search (brute force) applies to any file, but all the other methods depend on having
the appropriate access path on the attribute used in the selection condition. Binary search is
more efficient than linear search if the selection condition involves an equality comparison
on a key attribute on which the file is ordered. The primary index, clustering index, and
secondary index can be used to retrieve records in a certain range. Queries involving such
conditions are called range queries.
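
The difference between a file scan and a search on an ordered file can be sketched as follows. This is a small in-memory illustration, not a storage-level implementation: the list `records` is a hypothetical file ordered on the key FID (only the rows for FID 46 and 47 come from the FACULTY example), and Python's standard bisect module plays the role of the binary search.

```python
from bisect import bisect_left, bisect_right

# Hypothetical file of (FID, FNAME) records, physically ordered on the key FID.
records = [(41, "Asha"), (46, "Mitra"), (47, "Lara"), (52, "Ravi"), (58, "Nina")]
keys = [r[0] for r in records]  # the ordering key of each record

def linear_search(records, predicate):
    """Brute force: test every record against the selection condition."""
    return [r for r in records if predicate(r)]

def binary_search(records, keys, value):
    """Equality on the ordering key: halve the search range at each step."""
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return records[i]
    return None

def range_search(records, keys, low, high):
    """Range query low <= key <= high on the ordering key."""
    return records[bisect_left(keys, low):bisect_right(keys, high)]

print(linear_search(records, lambda r: r[1] == "Lara"))
print(binary_search(records, keys, 47))
print(range_search(records, keys, 46, 52))
```

The linear search works for any condition, including one on the unordered attribute FNAME, whereas the binary and range searches rely on the file being ordered on FID.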

4.3 Join Operation

The JOIN operation is one of the most time-consuming operations in query processing. There
are many possible ways to implement a two-way join, which is a join on two files. Joins
involving more than two files are called multiway joins. The number of possible ways to
execute multiway joins grows very rapidly. Let us consider the algorithms for join operations
of the form

P ⨝<A=B> Q

for relations P and Q, where A and B are domain-compatible attributes of P and Q,
respectively.

Methods for implementing Joins:

Following are the methods commonly used for implementing Joins:

i) Nested-loop join (brute force): In this method, for each record t in P (outer loop), every
record s from Q (inner loop) is retrieved and tested to see whether the two records
satisfy the join condition t[A] = s[B].
ii) Single-loop join (using an access structure to retrieve the matching records): In this
method, if an index (or hash key) exists for one of the two join attributes, say B of
Q, then each record t in P is retrieved one at a time (single loop), and the access
structure is used to retrieve directly all matching records s from Q that satisfy s[B] =
t[A].
iii) Sort-merge join: In this method, the records of P and Q have to be physically sorted
(ordered) by the value of the join attributes A and B, respectively. Then both files are
scanned concurrently in order of the join attributes, matching the records that have the
same values for A and B. If the files are not sorted, they may be sorted first by using
external sorting. Pairs of file blocks are copied into memory buffers in order, and the
records of each file are scanned only once each for matching with the other file, unless
both A and B are non-key attributes. In such a case, the method needs to be modified
slightly. We use P(i) to refer to the ith record in P. A variation of the sort-merge join can
be used when secondary indexes exist on both join attributes. The indexes provide the
ability to access (scan) the records in order of the join attributes, but the records
themselves are physically scattered all over the file blocks. This method may be quite
inefficient if every record access involves accessing a different disk block.
iv) Hash-join: In this method, the records of files P and Q are both hashed to the same hash
file, using the same hashing function on the join attributes A of P and B of Q as hash
keys. In the first phase, a single pass through the file with fewer records (say, P)
hashes its records to the hash file buckets. This is called the partitioning phase
because the records of P are partitioned into the hash buckets. In the second phase,
called the probing phase, a single pass through the other file (Q) hashes each of its
records to probe the appropriate bucket, and that record is combined with all matching
records from P in that bucket. Here we assume that the smaller of the two files fits
entirely into memory buckets after the first phase, although this assumption is not
always required.
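
The four methods can be compared on a small example. The sketch below is an in-memory illustration under stated assumptions: records are Python tuples, `a` and `b` are the positions of the join attributes A and B, a dictionary stands in for an existing index on B of Q, and the relations P and Q hold made-up data.

```python
from collections import defaultdict

def nested_loop_join(P, Q, a, b):
    """For each t in P, test every s in Q against t[A] = s[B]."""
    return [(t, s) for t in P for s in Q if t[a] == s[b]]

def single_loop_join(P, Q, a, b):
    """One loop over P; a dictionary stands in for an index on B of Q."""
    index = defaultdict(list)
    for s in Q:
        index[s[b]].append(s)
    return [(t, s) for t in P for s in index.get(t[a], [])]

def sort_merge_join(P, Q, a, b):
    """Sort both files on the join attributes, then scan them together."""
    P = sorted(P, key=lambda t: t[a])
    Q = sorted(Q, key=lambda s: s[b])
    result, i, j = [], 0, 0
    while i < len(P) and j < len(Q):
        if P[i][a] < Q[j][b]:
            i += 1
        elif P[i][a] > Q[j][b]:
            j += 1
        else:
            # Collect the group of equal values on each side (handles the
            # case where A and B are both non-key attributes).
            value, i_end, j_end = P[i][a], i, j
            while i_end < len(P) and P[i_end][a] == value:
                i_end += 1
            while j_end < len(Q) and Q[j_end][b] == value:
                j_end += 1
            result.extend((t, s) for t in P[i:i_end] for s in Q[j:j_end])
            i, j = i_end, j_end
    return result

def hash_join(P, Q, a, b, n_buckets=8):
    """Partition P into hash buckets, then probe the buckets with Q."""
    buckets = [[] for _ in range(n_buckets)]
    for t in P:                                   # partitioning phase
        buckets[hash(t[a]) % n_buckets].append(t)
    result = []
    for s in Q:                                   # probing phase
        for t in buckets[hash(s[b]) % n_buckets]:
            if t[a] == s[b]:
                result.append((t, s))
    return result

P = [(1, "p1"), (2, "p2"), (2, "p3"), (3, "p4")]
Q = [(2, "q1"), (2, "q2"), (3, "q3"), (4, "q4")]
for join in (nested_loop_join, single_loop_join, sort_merge_join, hash_join):
    print(join.__name__, len(join(P, Q, 0, 0)))
```

All four functions return the same set of matching pairs; they differ only in how many record comparisons they need to find them.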

4.4 PROJECT And Set Operations

PROJECT and set operations are very useful, and their implementation matters for query
optimization.

PROJECT operation: A PROJECT operation π<attribute list>(R) is straightforward to implement
if <attribute list> includes a key of relation R, because in this case the result of the
operation will have the same number of tuples as R, but with only the values for the
attributes in <attribute list> in each tuple. If <attribute list> does not include a key of
R, duplicate tuples must be eliminated. This is usually done by sorting the result of the
operation and then eliminating duplicate tuples, which appear consecutively after sorting.

Hashing can also be used to eliminate duplicates. Each record is hashed and inserted into a
bucket of the hash file in memory, and it is checked against those already in the bucket. If
it is a duplicate, it is not inserted.


Set operations: The set operations are UNION, INTERSECTION, SET DIFFERENCE, and
CARTESIAN PRODUCT. The CARTESIAN PRODUCT operation R x S is quite expensive because its
result includes a record for each combination of records from R and S. Moreover, the
attributes of the result include all attributes of R and S. Hence the CARTESIAN PRODUCT
operation is avoided, and other equivalent operations are used during query optimization.

The other three set operations UNION, INTERSECTION, and SET DIFFERENCE apply only to
union-compatible relations, which have the same number of attributes and the same
attribute domains.

Hashing can also be used to implement UNION, INTERSECTION, and SET DIFFERENCE.
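
The two duplicate-elimination strategies for PROJECT can be sketched as follows. This is an illustration only: rows are Python dictionaries, the sample rows are made up, and a Python set plays the role of the in-memory hash file. The last lines show hash-based set operations on union-compatible relations, using Python's built-in set type as a stand-in.

```python
def project_sort(rows, attrs):
    """Project, sort the result, then drop duplicates that are now adjacent."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(rows, attrs):
    """Project and keep a tuple only if it is not already in the hash table."""
    seen = set()
    result = []
    for row in rows:
        t = tuple(row[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

# Hypothetical rows; FID is the key.
rows = [
    {"FID": 46, "DEPT": "IT", "DNO": 2},
    {"FID": 47, "DEPT": "MBA", "DNO": 3},
    {"FID": 48, "DEPT": "IT", "DNO": 2},
]
print(project_sort(rows, ["DEPT", "DNO"]))
print(project_hash(rows, ["DEPT", "DNO"]))
print(project_hash(rows, ["FID", "DEPT"]))  # includes the key: no duplicates

# Hash-based set operations on two union-compatible relations.
r = {("IT", 2), ("MBA", 3)}
s = {("MBA", 3), ("CS", 4)}
print(r | s, r & s, r - s)
```

When the attribute list includes the key FID, every projected tuple is distinct and no elimination step is needed, which matches the case described above.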

4.5 Aggregate Operations

Aggregate operations compute a single value from a set of tuples. MIN, MAX, COUNT, AVERAGE,
and SUM are the aggregate operators used to compute the aggregate values. They can be
computed by a table scan or by using an appropriate index. For example:

SELECT MAX (SALARY)
FROM FACULTY;

If an ascending index on SALARY exists for the FACULTY relation, it can be used to find the
largest value directly, since the maximum is the last entry in the index; otherwise the
entire table has to be scanned.

The index can also be used for the COUNT, AVERAGE, and SUM aggregates, but only if the index
is dense, that is, there is an index entry for every record in the main file.

When a GROUP BY clause is used in a query, the aggregate operator must be applied
separately to each group of tuples.

In order to do this, the table is first partitioned into subsets of tuples, where each partition
(group) has the same value for the grouping attributes. In this case, the computation is more
complex. Let us consider the following query:

SELECT DNO, AVG (SALARY)
FROM FACULTY
GROUP BY DNO;

Usually either sorting or hashing is used on the grouping attributes to partition the file into
the appropriate groups and then the algorithm computes the aggregate function for the
tuples in each group which have the same grouping attribute values.
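
A hash-based version of this grouping can be sketched in Python. This is a minimal illustration, assuming rows are dictionaries; the third row is a made-up addition to the two FACULTY sample rows so that one group has more than one tuple.

```python
from collections import defaultdict

def group_average(rows, group_attr, value_attr):
    """Partition rows by hashing on the grouping attribute, then average each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_attr]] += row[value_attr]
        counts[row[group_attr]] += 1
    return {group: totals[group] / counts[group] for group in totals}

faculty = [
    {"FID": 46, "DNO": 2, "SALARY": 88000.0},
    {"FID": 47, "DNO": 3, "SALARY": 88999.0},
    {"FID": 48, "DNO": 2, "SALARY": 90000.0},  # hypothetical extra row
]
print(group_average(faculty, "DNO", "SALARY"))
```

Only one pass over the table is needed, because the running total and count for each group are kept in the hash table while the rows are read.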

4.6 Outer Join

The outer join operation can be implemented in two ways. It can be computed by modifying
one of the join algorithms, such as nested-loop join or single-loop join. The sort-merge and
hash-join algorithms can also be extended to compute outer joins.

An outer join can also be computed by executing a combination of relational algebra
operators. In this case, the cost of the outer join is the sum of the costs of the
associated steps, i.e., inner join, projections, and union.
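
Both approaches can be sketched for a left outer join. The code below is an illustration under stated assumptions: records are tuples, `a` and `b` are the positions of the join attributes, `None` stands for a null value, and the sample relations are made up.

```python
def left_outer_join_nested(P, Q, a, b, q_width):
    """Modified nested-loop join: pad unmatched records of P with nulls."""
    result = []
    for t in P:
        matched = False
        for s in Q:
            if t[a] == s[b]:
                result.append(t + s)
                matched = True
        if not matched:
            result.append(t + (None,) * q_width)
    return result

def left_outer_join_algebra(P, Q, a, b, q_width):
    """Combination of operators: inner join, projection, difference, union."""
    inner = [t + s for t in P for s in Q if t[a] == s[b]]
    # Projection of the join result on the attributes of P.
    matched = {row[:len(row) - q_width] for row in inner}
    # Records of P that took no part in the join, padded with nulls.
    dangling = [t + (None,) * q_width for t in P if t not in matched]
    return inner + dangling  # union of the two parts

P = [("E1", 10), ("E2", 20), ("E3", 30)]   # hypothetical (employee, DeptID)
Q = [(10, "Sales"), (20, "Marketing")]      # hypothetical (DeptID, Dname)
print(left_outer_join_nested(P, Q, 1, 0, 2))
print(left_outer_join_algebra(P, Q, 1, 0, 2))
```

The second function performs several separate steps, which is why its cost is the sum of the costs of those steps.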

Self-Assessment Questions - 3

5. The sort-merge algorithm requires buffer space in __________ memory, where the
actual sorting and merging of the runs is performed.
6. Queries in which the primary index, clustering index, or secondary index is
used to retrieve records in a certain range are called _______________ queries.
7. The JOIN operation is a very fast operation in query processing. (True/False)
8. Among the set operations, which operation is avoided? (Choose the correct option)
a) UNION
b) INTERSECTION
c) SET DIFFERENCE
d) CARTESIAN PRODUCT


5. HEURISTICS IN QUERY OPTIMIZATION

Heuristic rules are applied for query optimization by modifying the internal representation
of the query. This internal representation is generally a query tree or a query graph data
structure. Although some optimization techniques were based on query graphs, they are
rarely applied now, because a query graph cannot show the order of operations, which
the query optimizer needs for query execution. So this unit will deal mainly with the
heuristic optimization of the query tree.

A heuristic rule is applied to the initial query expression and produces a heuristically
transformed, equivalent query expression. This is done by transforming the initial
expression (tree) into an equivalent expression (tree) that is more efficient to
execute. Such rules work well in most cases but are not guaranteed to produce the best
result.

Execution of the query tree consists of executing an internal node operation whenever its
operands are available and then replacing that internal node with the relation that results
from executing the operation.

Query trees and query graphs

The query tree is a tree data structure that represents the relational algebra expression in
the query optimization process. The leaf nodes in the query tree correspond to the input
relations of the query. The internal nodes represent the relational algebra operations. The
system will execute an internal node operation whenever its operands are available and then
the internal node is replaced by the relation which is obtained from executing the operation.
The execution is terminated when the root node is executed and produces the result relation
for the query.
For example, consider the following two tables with the following schema:

Emp(EmpID, Name, Sal, DeptID)
Dept(DeptID, Dname, Location)

To find the names of employees working in the Marketing department, a query in relational
algebra can be written as shown in Q, and as a query tree as shown in Figure 11.2.

Q: π<Name> (σ<Dname = "Marketing"> (Emp ⨝ Dept))


Figure 11.2: Query Tree for Q

The query can also be represented by a query graph. In this case, the relations in the query
are represented by relation nodes, which are displayed as single circles. Constant nodes
are used to represent constant values and are displayed as double circles or ovals. Selection
and join conditions are represented by the graph edges. Finally, the attributes to be
retrieved from each relation are displayed in square brackets above each relation. The query
graph representation does not give an order for performing the operations, because there is
only a single graph corresponding to each query.

Hence query trees are preferred over query graphs: the query optimizer needs to show
the order of operations for query execution, which is not possible in query graphs.
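
The bottom-up execution of a query tree can be sketched in Python. This is a minimal illustration, not an actual DBMS structure: each node is a tuple whose first element names the operation, rows are dictionaries, and the Emp and Dept rows are made up. The tree `q` corresponds to the query Q above.

```python
def evaluate(node, db):
    """Evaluate a query tree bottom-up: operands first, then the operator."""
    op = node[0]
    if op == "relation":      # leaf node: ("relation", name)
        return db[node[1]]
    if op == "select":        # ("select", predicate, child)
        return [row for row in evaluate(node[2], db) if node[1](row)]
    if op == "project":       # ("project", attrs, child), duplicates removed
        result = []
        for row in evaluate(node[2], db):
            projected = {attr: row[attr] for attr in node[1]}
            if projected not in result:
                result.append(projected)
        return result
    if op == "join":          # ("join", attr, left, right): equality on attr
        left, right = evaluate(node[2], db), evaluate(node[3], db)
        return [{**l, **r} for l in left for r in right if l[node[1]] == r[node[1]]]
    raise ValueError(f"unknown operation: {op}")

# Hypothetical sample data for the Emp and Dept schemas.
db = {
    "Emp": [
        {"EmpID": 1, "Name": "Asha", "Sal": 50000, "DeptID": 10},
        {"EmpID": 2, "Name": "Ravi", "Sal": 45000, "DeptID": 20},
        {"EmpID": 3, "Name": "Nina", "Sal": 52000, "DeptID": 10},
    ],
    "Dept": [
        {"DeptID": 10, "Dname": "Marketing", "Location": "Jaipur"},
        {"DeptID": 20, "Dname": "Sales", "Location": "Delhi"},
    ],
}

# Query tree for Q: project Name over select Dname = "Marketing" over Emp join Dept.
q = ("project", ["Name"],
     ("select", lambda row: row["Dname"] == "Marketing",
      ("join", "DeptID", ("relation", "Emp"), ("relation", "Dept"))))

print(evaluate(q, db))
```

The recursion makes the order of operations explicit: the join at the bottom of the tree is executed first, and the root node produces the result relation.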

Heuristic optimization of query trees:

Two relational algebra expressions are said to be equivalent if the two expressions generate
relations with the same set of attributes and the same set of tuples, although their
attributes may be ordered differently.

Hence, in heuristic optimization of query trees, many different relational algebra
expressions can generally be found that are equivalent and correspond to the same query.
For every relational algebra expression a query tree can be drawn, so there can be many
different query trees that correspond to the same query.

The query parser generates a standard initial query tree that corresponds to the SQL query,
without doing any optimization. The heuristic query optimizer then transforms this initial
query tree into a final query tree that is efficient to execute.


A general outline of a Heuristic Algebraic Optimization Algorithm

Heuristic rules are generally applied as per the following steps:

1) First of all, SELECT operations with conjunctive conditions are broken up into a cascade
of SELECT operations.
2) Then SELECT operations are moved as far down the query tree as is permitted by the
attributes involved in the select condition.
3) Leaf nodes of the tree are rearranged by:
o positioning the leaf node relations with the most restrictive SELECT
operations so that they are executed first in the query tree representation,
o and making sure that the ordering of leaf nodes does not cause CARTESIAN PRODUCT
operations.
4) CARTESIAN PRODUCT operations are combined with a subsequent SELECT operation
in the tree into a JOIN operation if the condition represents a join condition.
5) Lists of projection attributes are broken down and moved down the tree as far as
possible by creating new PROJECT operations as needed.
6) Lastly, subtrees are identified that represent groups of operations that can be
executed by a single algorithm.

Using the above rules, the optimized query tree for Q is shown in Figure 11.3. It is more
efficient because less data from the Dept table now has to be transferred for the join with
the Emp table.

Figure 11.3: Optimized Query Tree for Q

These steps are applied using general transformation rules that convert relational algebra
operations into equivalent ones. A transformation must not change the meaning of the
operations or the resulting relations.
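To make the effect of moving SELECT operations down the tree concrete, here is a minimal
Python sketch that compares two equivalent plans on small in-memory relations. The sample
rows, the attribute names (ename, dno, dname) and the helper functions are hypothetical and
are not taken from query Q; the sketch only counts how many tuple pairs each plan has to
build before producing the same answer.

```python
# Hypothetical sample relations; attribute names are illustrative only.
emp = [
    {"ename": "Asha", "dno": 1},
    {"ename": "Ravi", "dno": 2},
    {"ename": "Meena", "dno": 2},
]
dept = [
    {"dno": 1, "dname": "Sales"},
    {"dno": 2, "dname": "Research"},
    {"dno": 3, "dname": "Admin"},
]


def plan_initial(emp_rows, dept_rows):
    """Initial tree: CARTESIAN PRODUCT first, then a single SELECT on top."""
    pairs = [(e, d) for e in emp_rows for d in dept_rows]
    result = [
        (e["ename"], d["dname"])
        for e, d in pairs
        if e["dno"] == d["dno"] and d["dname"] == "Research"
    ]
    return result, len(pairs)


def plan_optimized(emp_rows, dept_rows):
    """Optimized tree: SELECT pushed down to Dept, product + SELECT turned into a JOIN."""
    research = [d for d in dept_rows if d["dname"] == "Research"]
    pairs = [(e, d) for e in emp_rows for d in research if e["dno"] == d["dno"]]
    result = [(e["ename"], d["dname"]) for e, d in pairs]
    return result, len(pairs)


initial_result, initial_pairs = plan_initial(emp, dept)
optimized_result, optimized_pairs = plan_optimized(emp, dept)
```

Both plans return the same tuples, but the second one builds far fewer intermediate pairs,
which is the point of steps 2 and 4 above.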

Self-Assessment Questions - 4

9. In a query graph, constant nodes are used to represent constant values and are
displayed as double circles or__________ .
10. The query parser generates a standard initial query tree to correspond to the SQL
query without __________ . (Choose the correct option)
a) optimizing
b) parsing
c) evaluation
d) sorting

6. SEMANTIC QUERY OPTIMIZATION

This is one of the alternative approaches to query optimization. It uses constraints
specified on the database schema.

Suppose a query asks for the names of all students in a university who are older than their
faculty, and suppose the database schema has a constraint stating that no student is older
than the faculty. If the semantic query optimizer checks for the existence of this
constraint, it need not execute the query at all, because the result is known to be empty.
However, searching through many constraints to find those applicable to a given query can be
quite time-consuming.
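The check described above can be sketched in a few lines of Python. This is only an
illustration under simplifying assumptions: constraints and query predicates are modelled as
hypothetical (left, operator, right) triples, and a query is declared empty only when one of
its predicates is the exact negation of a stored constraint. A real semantic optimizer would
need a much richer reasoning step.

```python
# Maps each comparison operator to its logical negation.
NEGATION = {"<=": ">", ">": "<=", "<": ">=", ">=": "<", "=": "!=", "!=": "="}


def contradicts(predicate, constraint):
    """True when the predicate is the exact negation of the constraint."""
    left, op, right = constraint
    return predicate == (left, NEGATION[op], right)


def semantic_optimize(predicates, constraints):
    """Return 'EMPTY' if some predicate can never hold, otherwise 'EXECUTE'."""
    for predicate in predicates:
        # Scanning every constraint is the time-consuming part noted in the text.
        for constraint in constraints:
            if contradicts(predicate, constraint):
                return "EMPTY"
    return "EXECUTE"


# Hypothetical encoding of "no student is older than the faculty".
constraints = [("student.age", "<=", "faculty.age")]
# Hypothetical encoding of "students whose age is more than their faculty".
query = [("student.age", ">", "faculty.age")]
decision = semantic_optimize(query, constraints)
```

Here the decision is that the result is empty, so the query never has to touch the stored
data.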

7. CONVERTING QUERY TREE TO QUERY EVALUATION PLAN


We have seen above that query optimizers use equivalence rules to generate an expression
that is logically equivalent to the given query expression but easier to evaluate. Such
equivalent expressions make the optimization process easier. We also observed that the
evaluation plan must specify a detailed algorithm for each operation in the expression so
that the execution of the operations is coordinated one after another.

We saw that the output of the parsing and translation step in query processing is a
relational algebra expression. If the query is complex, the expression consists of several
operations and interacts with various relations. Evaluating such an expression can become
very costly in terms of both time and memory space, so it is important to consider how to
evaluate an expression containing multiple operations. The two approaches used for
evaluating expressions are materialization and pipelining. Which of the two is applicable
depends on the situation.

Materialization

In the materialization approach to query evaluation, we start from the lowest-level
operations in the expression. With materialized evaluation, every intermediate operation
produces a temporary relation, which is then used for the evaluation of the higher-level
operations. These temporary relations vary in size and might have to be stored on disk.
Hence, when materialization is used, the cost of evaluation is the sum of the costs of all
operations plus the cost of writing the intermediate results to disk and reading them back.

Pipelining

Pipelining of operations can improve query evaluation efficiency by reducing the number of
temporary relations that are produced. To achieve this reduction, we combine several
operations into a pipeline, in which the results of one operation are passed on to the next
operation as they are produced. The pipeline can be implemented by generating query
execution code that combines these operations. Where the situation allows it, pipelining
reduces the number of temporary files and thus the cost of query evaluation.
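The difference between the two approaches can be illustrated with a small Python sketch. It
is not how a DBMS implements evaluation; it simply uses lists to stand in for temporary
relations and generators to stand in for a pipeline. The relation emp, its attributes, and
the function names are hypothetical.

```python
def select_materialized(rows, predicate):
    """Materialized SELECT: builds the full temporary relation as a list."""
    return [row for row in rows if predicate(row)]


def project_materialized(rows, attrs):
    """Materialized PROJECT over an already stored temporary relation."""
    return [{a: row[a] for a in attrs} for row in rows]


def select_pipelined(rows, predicate):
    """Pipelined SELECT: yields one tuple at a time and stores nothing."""
    for row in rows:
        if predicate(row):
            yield row


def project_pipelined(rows, attrs):
    """Pipelined PROJECT: consumes tuples as the operator below emits them."""
    for row in rows:
        yield {a: row[a] for a in attrs}


# Hypothetical relation used only for illustration.
emp = [
    {"ename": "Asha", "salary": 30000},
    {"ename": "Ravi", "salary": 55000},
    {"ename": "Meena", "salary": 70000},
]


def high_paid(row):
    return row["salary"] > 50000


# Materialization: the temporary relation `temp` exists in full before PROJECT runs.
temp = select_materialized(emp, high_paid)
materialized = project_materialized(temp, ["ename"])

# Pipelining: no temporary relation; each tuple flows through both operators.
pipelined = list(project_pipelined(select_pipelined(emp, high_paid), ["ename"]))
```

Both evaluations give the same answer; the pipelined version never holds the intermediate
result of the SELECT as a whole.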

Self-Assessment Questions - 5

11. If the query is complex, then the expressions consist of several multiple operations
and interact with various _________ .
12. We can combine several operations into a pipeline of operations to improve ________
evaluation.

8. COST ESTIMATES IN QUERY OPTIMIZATION


The query optimizer works by estimating and comparing the costs of executing a query using
different strategies. It should be able to find the strategy with the lowest estimated cost.
After considering the candidate strategies, it chooses the query execution plan with the
lowest cost and least execution time. Accurate cost estimates are therefore required, so
that strategies can be compared fairly and realistically.

There are different components involved in the cost of query execution and different types
of information needed in cost functions. This information is kept in the DBMS catalog.

8.1 Measure Of Query Cost

The cost of executing a query includes the following components:

i) Access cost to secondary storage: This is the cost of searching for, reading, and
writing data blocks that reside on secondary storage, mainly on disk. The cost of
searching for records in a file depends on the type of access structures on that file,
such as ordering, hashing, and primary or secondary indexes. In addition, access cost
is affected by whether the file blocks are allocated contiguously on the same disk
cylinder or scattered across the disk. This cost is usually the most important one for
a large database, since disk accesses are slow compared to in-memory operations.
ii) Storage cost: This is the cost of storing any intermediate files that are generated by an
execution strategy for the query.
iii) Computation cost: This is the cost of performing in-memory operations on the data
buffers during query execution. Such operations include searching for and sorting
records, merging records for a join, and performing computations on field values. This
cost is important when the database is small and almost all of the data reside in
memory.
iv) Memory usage cost: This is the cost pertaining to the number of memory buffers
needed during query execution.

v) Communication cost: This is the cost of shipping the query and its results from the
database site to the site or terminal where the query originated. In a distributed
system, this cost should be minimized.

Other factors may also appear as components of a cost function, but they are usually of
lesser importance. Therefore, some cost functions consider only disk access cost as a
reasonable measure of the cost of a query-evaluation plan.

8.2 Catalog Information For Cost Estimation Of Queries

As already mentioned, query optimizers use the statistical information stored in the DBMS
catalog to estimate the cost of a plan. The relevant catalog information about a relation
includes:

• Number of tuples in relation r: denoted by nr
• Number of blocks containing tuples of relation r: denoted by br
• Size of a tuple of relation r (assuming all records in a file are of the same type):
denoted by sr
• Blocking factor of relation r, which is the number of tuples that fit into one block:
denoted by fr
• V(A,r) is the number of distinct values of attribute A in relation r. This value is the
same as the size of πA(r). If A is a key attribute, then V(A,r) = nr
• SC(A,r) is the selection cardinality of attribute A of relation r. This is the average
number of records that satisfy an equality condition on attribute A.

Moreover, some information about indices is also used along with the relation information:

• Number of levels in index i.
• Number of lowest-level index blocks in index i (number of blocks in the leaf level of the
index)

The statistics listed above are a simplified set. In a real database management system, the
optimizer may maintain more information to improve the accuracy of its cost estimates.

This statistical information maintained in the DBMS catalog, together with the measures of
query cost based on the number of disk accesses, helps in estimating the cost of different
relational algebra operations.
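As a rough illustration of how these catalog statistics feed a cost estimate, the Python
sketch below derives fr, br and SC(A,r) from assumed values. The formulas br = ceil(nr / fr)
and SC(A,r) = nr / V(A,r) are common textbook simplifications that assume the tuples of r
are stored together in one file and that attribute values are uniformly distributed. The
block size and all numbers are hypothetical.

```python
import math


def blocking_factor(block_size, tuple_size):
    """fr: whole tuples that fit in one block (records do not span blocks)."""
    return block_size // tuple_size


def blocks_needed(n_tuples, f):
    """br = ceil(nr / fr), assuming the tuples of r are stored together."""
    return math.ceil(n_tuples / f)


def selection_cardinality(n_tuples, distinct_values):
    """SC(A, r) = nr / V(A, r), assuming uniformly distributed values."""
    return n_tuples / distinct_values


def linear_scan_cost(b):
    """Disk-access cost, in block reads, of scanning the whole relation."""
    return b


# Hypothetical catalog entries for a relation r.
nr = 10000          # number of tuples
sr = 200            # tuple size in bytes
block_size = 4096   # assumed block size in bytes
v_a = 50            # V(A, r): distinct values of attribute A

fr = blocking_factor(block_size, sr)
br = blocks_needed(nr, fr)
sc_a = selection_cardinality(nr, v_a)
scan_cost = linear_scan_cost(br)
```

With these assumed numbers, 20 tuples fit in a block, the relation occupies 500 blocks, and
an equality condition on A is expected to match about 200 tuples.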

Self-Assessment Questions - 6

13. Memory usage cost is the cost pertaining to the number of ________ buffers needed
during query execution.
14. Query optimizers use the statistic information stored in ________________ catalog to
estimate the cost of a plan.

9. JOIN STRATEGIES FOR PARALLEL PROCESSING

Multiple processors can be used for parallel computation to make processing faster, and in
many cases several processors are available for computing a join in parallel. Architectures
vary and include database machines. We will consider an architecture in which all
processors have access to all disks and all processors share main memory.

9.1 Parallel Join

In Parallel-join, pairs to be tested are split over several processors. Each processor computes
part of the join, and then the results are assembled (merged).

Ideally, the overall work of computing join is partitioned evenly over all processors. If such
a split is achieved without any overhead, a parallel join using N processors will take 1/N
times as long as the same join would take on a single processor.

In practice, the speedup is smaller than this ideal, because overhead is incurred in
partitioning the work among the processors and in collecting the results computed by each
processor. If the split is not even, the final result cannot be obtained until the last
processor has finished. Speedup also depends on how much the processors compete for shared
system resources. For example, for a join A ⨝ B in which each processor uses its own
partition of A and main memory cannot hold the entire B, the processors need to
synchronize their access to B so as to reduce the number of times that each block of B
must be read in from the disk.

A parallel hash algorithm is used to reduce memory contention.
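A minimal sketch of a partitioned parallel join is given below. It hashes both relations on
the join attribute so that each worker only has to join its own pair of partitions, and then
merges the partial results. Python threads stand in for processors here purely for
illustration; the function names and the sample relations are hypothetical, and no claim is
made about actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, key, n):
    """Split rows into n partitions by hashing the join attribute."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def local_join(a_part, b_part, key):
    """Join one pair of partitions using an in-memory hash table on b_part."""
    table = {}
    for b in b_part:
        table.setdefault(b[key], []).append(b)
    return [{**a, **b} for a in a_part for b in table.get(a[key], [])]


def parallel_join(a_rows, b_rows, key, n_workers):
    """Each worker joins its own partition pair; the results are then merged."""
    a_parts = partition(a_rows, key, n_workers)
    b_parts = partition(b_rows, key, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        pieces = pool.map(local_join, a_parts, b_parts, [key] * n_workers)
        return [row for piece in pieces for row in piece]


# Hypothetical relations A and B sharing the join attribute "id".
A = [{"id": i, "a": i * 10} for i in range(6)]
B = [{"id": i, "b": i * 100} for i in range(0, 6, 2)]
joined = parallel_join(A, B, "id", 3)
```

Because matching tuples always hash to the same partition number, no worker needs to look
at another worker's data, which is what keeps contention low in this scheme.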

9.2 Pipelined Multiway Join

A multiway join can be pipelined if several joins are computed in parallel on different
processors. For example:

1) r1 ⨝ r2 ⨝ r3 ⨝ r4 can be computed by first computing t1 ← r1 ⨝ r2 and t2 ← r3 ⨝ r4,
and then t1 ⨝ t2.
2) The same join can be computed in a pipelined way: one processor (say P1) is assigned
to compute r1 ⨝ r2, another processor (say P2) to compute r3 ⨝ r4, and a third
processor (say P3) to join the tuples as they are generated by P1 and P2.
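The pipelined arrangement can be sketched with threads and a queue. In this illustration,
threads play the roles of P1 and P2 and the calling thread plays P3, which joins tuples as
they arrive instead of waiting for t1 and t2 to be complete. For simplicity it is assumed
that all four relations share one join attribute; that assumption, the symmetric hash join
used in P3, and all names are choices made for this sketch, not something the text
prescribes.

```python
import queue
import threading


def producer(tag, left, right, key, out):
    """P1 or P2: join two base relations and stream each result tuple to P3."""
    for a in left:
        for b in right:
            if a[key] == b[key]:
                out.put((tag, {**a, **b}))
    out.put((tag, None))  # sentinel: this producer has finished


def consumer(key, inbox):
    """P3: symmetric hash join over tuples arriving from P1 and P2."""
    tables = {"t1": {}, "t2": {}}
    other = {"t1": "t2", "t2": "t1"}
    finished = 0
    results = []
    while finished < 2:
        tag, row = inbox.get()
        if row is None:
            finished += 1
            continue
        # Remember the tuple, then probe the tuples already seen from the other side.
        tables[tag].setdefault(row[key], []).append(row)
        for match in tables[other[tag]].get(row[key], []):
            results.append({**row, **match})
    return results


def pipelined_multiway_join(r1, r2, r3, r4, key):
    inbox = queue.Queue()
    p1 = threading.Thread(target=producer, args=("t1", r1, r2, key, inbox))
    p2 = threading.Thread(target=producer, args=("t2", r3, r4, key, inbox))
    p1.start()
    p2.start()
    results = consumer(key, inbox)
    p1.join()
    p2.join()
    return results


# Hypothetical relations sharing the join attribute "k".
r1 = [{"k": 1, "a": "a1"}, {"k": 2, "a": "a2"}]
r2 = [{"k": 1, "b": "b1"}, {"k": 2, "b": "b2"}]
r3 = [{"k": 1, "c": "c1"}, {"k": 3, "c": "c3"}]
r4 = [{"k": 1, "d": "d1"}, {"k": 3, "d": "d3"}]
result = pipelined_multiway_join(r1, r2, r3, r4, "k")
```

Each matching pair is emitted exactly once, when the later of its two tuples arrives, so the
order of arrival does not change the final answer.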

9.3 Physical Organization

We can also organize the physical resources so that the strategies for query evaluation
become more efficient.

The database can be partitioned over several disks for accessing in parallel to avoid
contention between several processors. We must also distribute data among disks to exploit
parallel disk access.

For the parallel 2-way join, tuples of individual relations can be split among several disks.
This technique is known as disk striping. For example, in the hash-join algorithm, we can
assign tuples to disks based on the hash function value, so that all tuples that share a
bucket are assigned to the same disk. Each group can be assigned to a separate disk, if
possible; otherwise the groups can be distributed uniformly among the available disks in
order to exploit parallel disk access.
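One possible way to place hash buckets on disks is sketched below: each tuple is mapped to a
bucket by hashing the join attribute, and buckets are spread over the disks in round-robin
fashion. The round-robin rule, the function name, and the sample data are assumptions made
for illustration only.

```python
def assign_to_disks(rows, key, n_buckets, n_disks):
    """Map each tuple to a bucket by hash, and each bucket to exactly one disk."""
    disks = [[] for _ in range(n_disks)]
    for row in rows:
        bucket = hash(row[key]) % n_buckets
        # All tuples of one bucket land on the same disk; buckets rotate over disks.
        disks[bucket % n_disks].append((bucket, row))
    return disks


# Hypothetical tuples keyed by account number.
accounts = [{"account-number": n} for n in range(8)]
layout = assign_to_disks(accounts, "account-number", n_buckets=4, n_disks=2)
```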

For the pipelined join, it is better to keep each relation on one disk and to assign distinct
relations to separate disks to the degree possible.

The best physical organization differs from query to query, so the physical resources
should be organized for the queries that are expected. Only if the physical organization is
done properly can the query optimizer choose the best technique, by estimating the cost of
each technique on the given physical organization.

Self-Assessment Questions - 7

15. In Parallel-join pairs to be tested are split over several processors. (True/False)
16. __________ can be pipelined if several joins are computed in parallel.

10. SUMMARY
Queries expressed in a high-level declarative form have to be processed and optimized so
that the internal form of the query gets a suitable execution strategy. To achieve this,
several query processing and optimization steps have to be performed.

Query processing and optimization is a set of activities for obtaining the desired
information from a database system in less time and at low cost. Queries are parsed and
translated into an equivalent relational algebra expression, which is then easier to
optimize using different rules and algorithms.

We use algorithms for external sorting, the select operation, the join operation, project
and set operations, aggregate operations, and outer join to execute query operations. We
also apply heuristic optimization of query trees or semantic optimization wherever
appropriate, and then convert the optimized query tree into a query evaluation plan.

For the candidate query evaluation plans, the cost is estimated in order to find the plan
with the lowest cost. Costs are estimated using the cost components incurred while
executing a plan and the statistical information stored in the DBMS catalog. Finally, the
chosen plan is executed over the physical data model, using operations on file structures,
indices, etc.

11. TERMINAL QUESTIONS


1. Draw and explain the architecture of Query Processing.
2. Explain sort-merge strategy in external sorting.
3. What is Heuristic Optimization?
4. Explain the measures of query cost in Query Optimization.
5. Explain Join strategies for parallel processing.

12. ANSWERS
Self Assessment Questions

1. True
2. Query optimizer
3. False
4. WHERE clause
5. Main
6. Range
7. False
8. d)-CARTESIAN PRODUCT.
9. ovals
10. a)-optimizing.
11. Relations
12. Query
13. Memory
14. DBMS
15. True
16. Multiway join

Terminal Questions

1. There are three phases that a query passes through during the DBMS processing of that
query: (i) Parsing and translation (ii) Optimization (iii) Evaluation. (Refer section 2
for detail)
2. A sort-merge strategy in external sorting consists of two phases: (i) Sorting Phase (ii)
Merging Phase. In the sorting phase, runs (portions or pieces of the file) that can fit
in the available buffer space are read into main memory, sorted using an internal
sorting algorithm, and written back to disk as temporary sorted subfiles (or runs). In
the merging phase, the sorted runs are merged during one or more passes.
(Refer section 4.1 for detail)

3. Heuristic rules are applied for query optimization by modifying the internal
representation of the query. A heuristic rule is applied to the initial query expression
and produces a heuristically transformed equivalent query expression. This is
performed by transforming an initial expression (tree) into an equivalent expression
(tree) that is more efficient to execute. These rules work well in most cases but are
not guaranteed to work well in every case. (Refer section 5 for detail)
4. The measure of query cost includes the components like access cost to secondary
storage, storage cost, computation cost, memory usage cost, and communication cost.
(Refer section 8.1 for detail)
5. Join strategies for parallel processing are Parallel join, pipelined multiway join along
with the way physical resources are organized. (Refer section 9 for detail)

Unit 12
Distributed Databases
Table of Contents

SL No   Topic                                      Fig No / Table / Graph   SAQ / Activity
1       Introduction                               -                        -
1.1     Objectives                                 -                        -
2       Structure of Distributed Database          1                        1
3       Tradeoffs in Distributing the Database     -                        2
3.1     Advantages of Data Distribution            -                        -
3.2     Disadvantages of Data Distribution         -                        -
4       Design of Distributed Databases            2, 3, 4, 5, 6            3
4.1     Data Replication                           -                        -
4.2     Data Fragmentation                         -                        -
5       Summary                                    -                        -
6       Terminal Questions                         -                        -
7       Answers                                    -                        -

1. INTRODUCTION

In a distributed database system, the database is stored on several computers. The
computers in a distributed system communicate with each other through various
communication media, such as high-speed buses or telephone lines. They do not share main
memory, nor do they share a clock.

The processors in a distributed system may vary in size and function. They may include small
microcomputers, workstations, minicomputers, and large general-purpose computer
systems. These processors are referred to by a number of different names such as sites,
nodes, computers, and so on, depending on the context in which they are mentioned. We
mainly use the term site, in order to emphasize the physical distribution of these systems.

A distributed database system consists of a collection of sites, each of which may participate
in the execution of transactions which access data at one site, or several sites. The main
difference between centralized and distributed database systems is that, in the former, the
data resides in one single location, while in the latter, the data resides in several locations.
As we shall see, this distribution of data is the cause of many difficulties that will be
addressed in this chapter.

1.1 Objectives

After going through this unit, the learners should be able to:

❖ differentiate between a Distributed DBMS (DDBMS) and a conventional DBMS
❖ discuss network topologies for a DDBMS
❖ distinguish between horizontal and vertical fragmentation

2. STRUCTURE OF DISTRIBUTED DATABASE

A distributed database system consists of a collection of sites, each of which maintains a local
database system. Each site is able to process local transactions, those transactions that
access data only in that single site. In addition, a site may participate in the execution of
global transactions, those transactions that access data in several sites. The execution of
global transactions requires communication among the sites.

The sites in the system can be connected physically in a variety of ways. The various
topologies are represented as graphs, whose nodes correspond to sites. An edge from node
A to node B corresponds to a direct connection between the two sites. Some of the most
common configurations are depicted in Figure 12.1. The major differences among these
configurations involve:

• Installation cost: The cost of physically linking the sites in the system.
• Communication cost: The cost of time and money to send a message from site A to site
B.
• Reliability: The frequency with which a link or site fails.
• Availability: The degree to which data can be accessed despite the failure of some links
or sites.

As we shall see, these differences play an important role in choosing the appropriate
mechanism for handling the distribution of data. The sites of a distributed database system
may be distributed physically either over a large geographical area (such as all the Indian
states) or over a small geographical area (such as a single building or a number of adjacent
buildings). The former type of network is referred to as a long-haul network, while the latter
is referred to as a local-area network.

Since the sites in long-haul networks are distributed physically over a large geographical
area, the communication links are likely to be relatively slow and less reliable as compared
with local-area networks. Typical long-haul links are telephone lines, microwave links, and
satellite channels. In contrast, since all the sites in local-area networks are close to each
other, communication links are of higher speed and lower error rate than their counterparts
in long-haul networks. The most common local-area links are twisted pair, baseband coaxial,
broadband coaxial, and fiber optics.

Let us illustrate these concepts by considering a banking system consisting of four branches
located in four different cities. Each branch has its own computer with a database consisting
of all the accounts maintained at that branch. Each such installation is thus a site. There also
exists one single site which maintains information about all the branches of the bank.
Suppose that the database systems at the various sites are based on the relational model.
Thus, each branch maintains (among others) the relation deposit (Deposit-scheme) where

Deposit-scheme=(branch-name, account-number, customer-name, balance)

The site containing information about the four branches maintains the relation branch
(Branch-scheme), where

Branch-scheme = (branch-name, assets, branch-city)

There are other relations maintained at the various sites which are ignored for the purpose
of our example.

Figure 12.1: Network Topology

A local transaction is a transaction that accesses accounts in the one single site, at which the
transaction was initiated. A global transaction, on the other hand, is one which either
accesses accounts in a site different from the one at which the transaction was initiated, or
accesses accounts in several different sites. To illustrate the difference between these two
types of transactions, consider the transaction to add $ 50 to account number 177 located at
the Delhi branch. If the transaction was initiated at the Delhi branch, then it is considered
local; otherwise, it is considered global. A transaction to transfer $ 50 from account 177 to
account 305, which is located at the Bombay branch, is a global transaction since accounts in
two different sites are accessed as a result of its execution.
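The distinction can be expressed as a small Python check. The mapping of accounts to sites
below simply encodes the example above (account 177 at Delhi, account 305 at Bombay); the
function name and the data structure are hypothetical.

```python
# Placement of accounts at branch sites, following the example in the text.
ACCOUNT_SITE = {177: "Delhi", 305: "Bombay"}


def classify(initiating_site, accounts):
    """A transaction is local only if every account it accesses is stored at the
    site where the transaction was initiated; otherwise it is global."""
    sites_touched = {ACCOUNT_SITE[account] for account in accounts}
    return "local" if sites_touched == {initiating_site} else "global"


deposit_at_delhi = classify("Delhi", [177])        # add $ 50 to account 177 from Delhi
deposit_from_bombay = classify("Bombay", [177])    # same update, initiated elsewhere
transfer = classify("Delhi", [177, 305])           # transfer between two branches
```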

What makes the above configuration a distributed database system are the facts that:

• The various sites are aware of each other.
• Each site provides an environment for executing both local and global transactions.

Self-Assessment Questions - 1

1. A distributed database system consists of a collection of sites, each of which
maintains a ___________ database system.
2. The cost of physically linking the sites in the system is called ______________.
3. Availability is the degree to which data can be accessed despite failure of
some_______________ .
4. A network whose sites are distributed physically over a large geographical area is
called a ________________.
5. A ____________is a transaction that accesses accounts in the one single site, at which
the transaction was initiated.

3. TRADEOFFS IN DISTRIBUTING THE DATABASE

There are several reasons for building distributed database systems, including sharing of
data, reliability and availability, and speedup of query processing. However, along with
these advantages come several disadvantages, including software development cost, greater
potential for bugs, and increased processing overhead. In this section, we shall elaborate
briefly on each of these.

3.1 Advantages Of Data Distribution

The primary advantage of distributed database systems is the ability to share and access data
in a reliable and efficient manner.

Data sharing and Distributed Control

If a number of different sites are connected to each other, then a user at one site may be able
to access data that is available at another site. For example, in the distributed banking
system, it is possible for a user in one branch to access data in another branch. Without this
capability, a user wishing to transfer funds from one branch to another would have to resort
to some external mechanism for such a transfer. This external mechanism would, in effect,
be a single centralized database.

The primary advantage to accomplishing data sharing by means of data distribution is that
each site is able to retain a degree of control over data stored locally. In a centralized system,
the database administrator of the central site controls the database. In a distributed system,
there is a global database administrator responsible for the entire system. A part of these
responsibilities is delegated to the local database administrator for each site. Depending
upon the design of the distributed database system, each local administrator may have a
different degree of autonomy which is often a major advantage of distributed databases.

Reliability and Availability

If one site fails in a distributed system, the remaining sites may be able to continue
operating. In particular, if data are replicated in several sites, transactions needing a
particular data item may find it in several sites. Thus, the failure of a site does not
necessarily imply the shutdown of the system.

The failure of one site must be detected by the system, and appropriate action may be needed
to recover from the failure. The system must no longer use the service of the failed site.
Finally, when the failed site recovers or is repaired, mechanisms must be available to
integrate it smoothly back into the system.

Although recovery from failure is more complex in distributed systems than in a centralized
system, the ability of most of the systems to continue to operate despite the failure of one
site results in increased availability. Availability is crucial for database systems used for real-
time applications. Loss of access to data, for example, in an airline may result in the loss of
potential ticket buyers to competitors.

Speedup Query Processing

If a query involves data at several sites, it may be possible to split the query into subqueries
that can be executed in parallel by several sites. Such parallel computation allows for faster
processing of a user's query. In those cases in which data is replicated, queries may be
directed by the system to the least heavily loaded sites.

3.2 Disadvantages Of Data Distribution

The primary disadvantage of distributed database systems is the added complexity required
to ensure proper coordination among the sites. This increased complexity takes the form of:

Software development cost: It is more difficult to implement a distributed database system
and, thus, more costly.

Greater potential for bugs: Since the sites that comprise the distributed system operate in
parallel, it is harder to ensure the correctness of algorithms. The potential exists for
extremely subtle bugs. The art of constructing distributed algorithms remains an active and
important area for research.

Increased processing overhead: The exchange of messages and the additional
computation required to achieve inter-site coordination is a form of overhead that does not
arise in centralized systems.

In choosing the design for a database system, the designer must balance the advantages
against the disadvantages of distributing data. Designs range from fully distributed
designs to designs that include a large degree of centralization.

Self-Assessment Questions - 2

6. The primary advantage of distributed database systems is the ability to
___________ and _____________ data in a reliable and efficient manner.
7. In a distributed system, there is a __________ administrator responsible for the
entire system.
8. According to reliability and availability, the failure of one site must be
______________ by the system, and appropriate action may be needed to recover from
the failure.
9. The primary disadvantage of distributed database systems is the added
____________ required to ensure proper coordination among the sites.
10. It is more difficult to implement a distributed database system and, thus, more
__________ .

4. DESIGN OF DISTRIBUTED DATABASES

The principles of database design that we discussed earlier apply to distributed databases as
well. In this section, we focus on those design issues that are specific to distributed
databases.

Consider a relation that is to be stored in the database. There are several issues involved in
storing this relation in the distributed database, including:

Replication: The system maintains several identical replicas (copies) of the relation. Each
replica is stored at a different site, resulting in data replication. The alternative to replication
is to store only one copy of the relation.

Fragmentation: The relation is partitioned into several fragments. Each fragment is stored
at a different site.

Replication and Fragmentation: This is a combination of the above two notions. The
relation is partitioned into several fragments. The system maintains several identical
replicas of each such fragment.

In the following subsections, we elaborate on each of these.

4.1 Data Replication

If relation r is replicated, a copy of relation r is stored in two or more sites. In the most
extreme case, we have full replication, in which a copy is stored on every site in the system.

There are a number of advantages and disadvantages to replication.

Availability: If one of the sites containing relation r fails, then the relation r may be found in
another site. Thus, the system may continue to process queries involving r despite the
failure of one site.

Increased parallelism: In the case where the majority of access to the relation r results in
only the reading of the relation, the several sites can process queries involving r in parallel.
The more replicas of r there are, the greater the chance that the needed data is found on the
site where the transaction is executing. Hence, data replication minimizes the movement of
data between sites.

Increased overhead on update: The system must ensure that all replicas of a relation r are
consistent since otherwise erroneous computations may result. This implies that whenever
r is updated, this update must be propagated to all sites containing replicas, resulting in
increased overhead. For example, in a banking system, where account information is
replicated on various sites, it is necessary that transactions assure that the balance in a
particular account agrees on all sites.

In general, replication enhances the performance of reading operations and increases the
availability of data to read transactions. However, update transactions incur greater
overhead. The problem of controlling concurrent updates by several transactions to
replicated data is more complex than in the centralized approach to concurrency control. We
may simplify the management of replicas of relation r by choosing one of them as the primary
copy of r. For example, in a banking system, an account may be associated with the site in
which the account has been opened. Similarly, in an airline reservation system, a flight may
be associated with the site at which the flight originates.
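
The trade-off described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class names Site and ReplicatedRelation, the third site name, and the message counter are hypothetical and do not come from any real distributed DBMS. A read is answered by any available replica, while a write must reach every replica. Failure handling during writes is deliberately left out.

```python
# Hypothetical sketch: full replication of an account relation over three sites.

class Site:
    """One site holding a complete replica of the relation (account -> balance)."""

    def __init__(self, name):
        self.name = name
        self.up = True          # whether the site is currently available
        self.accounts = {}      # the local replica


class ReplicatedRelation:
    def __init__(self, sites):
        self.sites = sites
        self.messages_sent = 0  # counts update propagations (overhead)

    def read(self, account):
        # Availability: any site that is up can answer the read.
        for site in self.sites:
            if site.up:
                return site.accounts[account]
        raise RuntimeError("no replica available")

    def write(self, account, balance):
        # Consistency: the update is propagated to every replica.
        for site in self.sites:
            site.accounts[account] = balance
            self.messages_sent += 1


sites = [Site("Bombay"), Site("Delhi"), Site("Site3")]  # "Site3" is made up
relation = ReplicatedRelation(sites)
relation.write(305, 500)      # one logical update, three physical updates
sites[0].up = False           # the Bombay site fails
balance_after_failure = relation.read(305)  # still answered, by another site
```

One logical update costs one message per replica, whereas the read survives the failure of a site. This is the availability versus update-overhead trade-off in miniature.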

4.2 Data Fragmentation

If the relation r is fragmented, r is divided into a number of fragments r1, r2, ..., rn. These
fragments contain sufficient information to reconstruct the original relation r. As we shall
see, this reconstruction can take place through the application of either the union operation
or a special type of join operation on the various fragments.

There are three different schemes for fragmenting a relation: horizontal fragmentation,
vertical fragmentation, and mixed fragmentation. Horizontal fragmentation splits the
relation by assigning each tuple of r to one or more fragments. Vertical fragmentation splits
the relation by decomposing the scheme R of relation r in a special way that we shall discuss
below. Mixed fragmentation is a combination of horizontal and vertical fragmentation. These
three schemes can be applied successively to the same relation, resulting in a number of
different fragments. Note that some information may appear in several fragments.

Below we discuss the various ways of fragmenting a relation. We shall illustrate these by
fragmenting the relation deposit, with the scheme:

Deposit-scheme = (branch-name, account-number, customer-name, balance)

The relation deposit (Deposit-scheme) is shown in Figure 12.2.

branch-name   account-number   customer-name   balance
Bombay        305              Lowman          500
Bombay        226              Camp            336
Delhi         117              Camp            205
Delhi         402              Khan            10000
Bombay        155              Khan            62
Delhi         408              Khan            1123
Delhi         639              Khan            750

Figure 12.2: Sample deposit relation

Horizontal Fragmentation:

It refers to the division of a relation into subsets of tuples (rows). Each fragment has unique
rows and all fragments have the same attributes (columns). The relation r is partitioned into
a number of subsets r1, r2, ..., rn.

Each subset consists of a number of tuples of relation r. Each tuple of relation r must belong
to at least one of the fragments so that the original relation can be reconstructed if needed.

A fragment may be defined as a selection on the global relation r. That is, a predicate Pi is
used to construct fragment ri as follows:

ri = σ Pi (r)

The reconstruction of the relation r can be obtained by taking the union of all fragments, that
is,

r = r1 ∪ r2 ∪ ... ∪ rn

To illustrate this, suppose that the relation r is the deposit relation in Figure 12.2. This
relation can be divided into n different fragments, each of which consists of tuples of
accounts belonging to a particular branch.

a)
branch-name   account-number   customer-name   balance
Bombay        305              Lowman          500
Bombay        226              Camp            336
Bombay        155              Khan            62

b)
branch-name   account-number   customer-name   balance
Delhi         117              Camp            205
Delhi         402              Khan            10000
Delhi         408              Khan            1123
Delhi         639              Khan            750

Figure 12.3: Horizontal fragmentation of relation deposit

If the banking system has only two branches, Bombay and Delhi, then there are two different
fragments:

deposit1 = branch-name = "Bombay" (deposit)

deposit2 = branch-name = "Delhi" (deposit)

These two fragments are shown in Figure 12.3. Fragment deposit1 is stored at the Bombay
site. Fragment deposit2 is stored at the Delhi site.

In our example, the fragments are disjoint. By changing the selection predicates used to
construct the fragments, we may have a particular tuple of r appear in more than one of the
ri. This is a form of data replication about which we shall say more at the end of this section.
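
Horizontal fragmentation by selection, and reconstruction by union, can be sketched in a few lines of Python. This is a minimal illustration that represents the deposit relation as a list of tuples. The helper name select is hypothetical and stands in for the relational selection operation.

```python
# The deposit relation as (branch-name, account-number, customer-name, balance).
deposit = [
    ("Bombay", 305, "Lowman", 500),
    ("Bombay", 226, "Camp", 336),
    ("Delhi", 117, "Camp", 205),
    ("Delhi", 402, "Khan", 10000),
    ("Bombay", 155, "Khan", 62),
    ("Delhi", 408, "Khan", 1123),
    ("Delhi", 639, "Khan", 750),
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


# One fragment per branch, each defined by a selection predicate.
deposit1 = select(deposit, lambda t: t[0] == "Bombay")
deposit2 = select(deposit, lambda t: t[0] == "Delhi")

# Reconstruction: the union of all fragments gives back the original relation.
reconstructed = set(deposit1) | set(deposit2)
```

Because every tuple satisfies exactly one of the two predicates, the fragments are disjoint and their union is the whole relation.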

Vertical Fragmentation:
In its simplest form, vertical fragmentation is the same as decomposition. Vertical
fragmentation of r(R) involves the definition of several subsets R1, R2, ..., Rn of R such that

R1 ∪ R2 ∪ ... ∪ Rn = R

Each fragment ri of r is defined by:

ri = Π Ri (r)

Relation r can be reconstructed from the fragments by taking the natural join:

r = r1 ⨝ r2 ⨝ r3 ⨝ ... ⨝ rn

branch-name   account-number   customer-name   balance   tuple-id
Bombay        305              Lowman          500       1
Bombay        226              Camp            336       2
Delhi         117              Camp            205       3
Delhi         402              Khan            10000     4
Bombay        155              Khan            62        5
Delhi         408              Khan            1123      6
Delhi         639              Khan            750       7

Figure 12.4: The deposit relation of Figure 12.2 with tuple-ids

More generally, vertical fragmentation is accomplished by adding a special attribute called a
tuple-id to the scheme R. A tuple-id is a physical or logical address for a tuple. Since each
tuple in r must have a unique address, the tuple-id attribute is a key for the augmented
scheme.

In Figure 12.4, we show the relation deposit', the deposit relation of Figure 12.2 with
tuple-ids added. Figure 12.5 shows a vertical decomposition of the scheme
Deposit-scheme ∪ (tuple-id) into:

Deposit-scheme-3 = (branch-name, customer-name, tuple-id)

Deposit-scheme-4 = (account-number, balance, tuple-id)

The two relations shown in Figure 12.5 result from computing:

deposit3 = Π Deposit-scheme-3 (deposit')

deposit4 = Π Deposit-scheme-4 (deposit')

a)
branch-name   customer-name   tuple-id
Bombay        Lowman          1
Bombay        Camp            2
Delhi         Camp            3
Delhi         Khan            4
Bombay        Khan            5
Delhi         Khan            6
Delhi         Khan            7

b)
account-number   balance   tuple-id
305              500       1
226              336       2
117              205       3
402              10000     4
155              62        5
408              1123      6
639              750       7

Figure 12.5: Vertical fragmentation of relation deposit


To reconstruct the original deposit relation from the fragments, we compute:
Π Deposit-scheme (deposit3 ⨝ deposit4)
Note that the expression
deposit3 ⨝ deposit4

is a special form of natural join. The join attribute is tuple-id. Since the tuple-id value
represents an address, it is possible to pair a tuple of deposit3 with the corresponding tuple
of deposit4 by using the address given by the tuple-id value. This address allows direct
retrieval of the tuple without the need for an index. Thus, this natural join may be computed
much more efficiently than typical natural joins.

Although the tuple-id attribute is important in the implementation of vertical partitioning, it
is important that this attribute is not visible to users. If users are given access to tuple-ids, it
becomes impossible for the system to change tuple addresses. Furthermore, the accessibility
of internal addresses violates the notion of data independence, one of the main virtues of the
relational model.
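
The vertical scheme can be sketched in Python as follows. This is a minimal illustration that again represents deposit as a list of tuples. Here the tuple-id is simply a sequence number assigned by the sketch, standing in for a physical or logical address, and the dictionary lookup is an assumed stand-in for direct retrieval by address.

```python
# The deposit relation as (branch-name, account-number, customer-name, balance).
deposit = [
    ("Bombay", 305, "Lowman", 500),
    ("Bombay", 226, "Camp", 336),
    ("Delhi", 117, "Camp", 205),
    ("Delhi", 402, "Khan", 10000),
    ("Bombay", 155, "Khan", 62),
    ("Delhi", 408, "Khan", 1123),
    ("Delhi", 639, "Khan", 750),
]

# deposit': the same relation with a tuple-id appended to every tuple.
deposit_prime = [row + (tid,) for tid, row in enumerate(deposit, start=1)]

# Projections onto Deposit-scheme-3 and Deposit-scheme-4.
deposit3 = [(branch, customer, tid)
            for (branch, _acct, customer, _bal, tid) in deposit_prime]
deposit4 = [(acct, bal, tid)
            for (_branch, acct, _cust, bal, tid) in deposit_prime]

# Join on tuple-id: each tuple-id addresses exactly one tuple of deposit4,
# so the matching tuple is fetched directly instead of being searched for.
by_tid = {tid: (acct, bal) for (acct, bal, tid) in deposit4}
reconstructed = [
    (branch, by_tid[tid][0], customer, by_tid[tid][1])
    for (branch, customer, tid) in deposit3
]
```

The final list comprehension also drops the tuple-id, which corresponds to projecting the join result back onto Deposit-scheme.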

Mixed Fragmentation:

It is also known as hybrid fragmentation. Mixed fragmentation refers to a combination of
vertical and horizontal strategies. It can be achieved in one of two ways: by performing
vertical fragmentation followed by horizontal fragmentation or by performing horizontal
fragmentation followed by vertical fragmentation. The need for mixed fragmentation in
distributed databases arises because database users often access subsets of data which are
horizontal and vertical fragments of global relations and there is a need to process
transactions or queries that would access these data fragments.

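Example: Suppose we want to create a fragment that consists of the account number and
balance of all the customers who belong to the Delhi branch. We can do this using
mixed fragmentation as:

deposit5 = Π account-number, balance (σ branch-name = "Delhi" (deposit))

OR

deposit5 = Π account-number, balance (σ branch-name = "Delhi" (Π branch-name, account-number, balance (deposit)))

In the second form, the vertical fragment must retain branch-name so that the selection can
still be applied; branch-name is projected away afterwards.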


account-number   balance
117              205
402              10000
408              1123
639              750

Figure 12.6: Mixed fragmentation of deposit relation
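
The two orders of mixed fragmentation can be compared with a minimal Python sketch. It represents deposit as a list of tuples, and the variable names are illustrative only. Note the assumption made in the vertical-first variant: the intermediate vertical fragment keeps branch-name so that the Delhi tuples can still be selected.

```python
# The deposit relation as (branch-name, account-number, customer-name, balance).
deposit = [
    ("Bombay", 305, "Lowman", 500),
    ("Bombay", 226, "Camp", 336),
    ("Delhi", 117, "Camp", 205),
    ("Delhi", 402, "Khan", 10000),
    ("Bombay", 155, "Khan", 62),
    ("Delhi", 408, "Khan", 1123),
    ("Delhi", 639, "Khan", 750),
]

# Horizontal first: select the Delhi tuples, then project two attributes.
delhi_rows = [t for t in deposit if t[0] == "Delhi"]
deposit5_horizontal_first = [(acct, bal) for (_b, acct, _c, bal) in delhi_rows]

# Vertical first: project (keeping branch-name), then select, then drop it.
vertical = [(branch, acct, bal) for (branch, acct, _c, bal) in deposit]
deposit5_vertical_first = [(acct, bal)
                           for (branch, acct, bal) in vertical
                           if branch == "Delhi"]
```

Both orders produce the same fragment, which is why either strategy may be used.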

Self-Assessment Questions - 3

11. Replication is the system that maintains several ________ of the relation.
12. Data replication ___________ movement of data between sites.
13. In general, replication enhances the performance of read operations and
increases the __________ of data to read transactions.
14. Horizontal fragmentation ____________ the relation by assigning each tuple of r to
one or more fragments.
15. ___________fragmentation splits the relation by decomposing the scheme R of
relation r in a special way.
16. In its most simple form, vertical fragmentation is the same as ____________ .
17. Mixed fragmentation is a combination of _____________ and ___________ .

5. SUMMARY

A distributed database system consists of a collection of sites, each of which maintains a local
database system. Each site is able to process local transactions, those transactions that
access data only in that single site. In addition, a site may participate in the execution of
global transactions, those transactions that access data in several sites. The execution of
global transactions requires communication among the sites.

There are several reasons for building distributed database systems, including sharing of
data, reliability and availability, and speed of query processing. However, along with these
advantages come several disadvantages, including software development cost, greater
potential for bugs, and increased processing overhead. The primary disadvantage of
distributed database systems is the added complexity required to ensure proper
coordination among the sites.

There are several issues involved in storing a relation in the distributed database, including
replication and fragmentation. It is essential that the system minimize the degree to which a
user needs to be aware of how a relation is stored.

6. TERMINAL QUESTIONS

1. Discuss the relative advantages of centralized and distributed databases.
2. How does a distributed database designed for a local-area network differ from one
designed for a wide-area network?
3. When is it useful to have replication or fragmentation of data? Explain.
4. To build a highly available distributed system, you must know what kinds of failures
can occur.
a) List possible types of failure in a distributed system.
b) Which items in your list from part a are also applicable to a centralized system?
5. Consider a distributed system with two sites, A and B. Can site A distinguish among the
following?
a) B goes down.
b) The link between A and B goes down.
c) B is extremely overloaded and the response time is 100 times longer than normal.
6. What implications does your answer to question 5 have for recovery in distributed systems?
7. Explain the difference between data replication in a distributed system and the
maintenance of a remote backup site.
8. What are the advantages of a distributed database management system over a
centralized DBMS?

7. ANSWERS

Self-Assessment Questions

1. Local
2. Installation cost
3. Links or sites
4. Long-haul network
5. Local transaction
6. Share and access
7. Global database
8. Detected
9. Complexity
10. Costly
11. Identical replicas
12. Minimizes
13. Availability
14. Splits
15. Vertical
16. Decomposition
17. Horizontal fragmentation, vertical fragmentation

Terminal Questions

1. The main difference between centralized and distributed database systems is that, in a
centralized database, the data resides in one single location, while in the distributed
database the data resides in several locations. (Refer section 3.1 for detail)
2. In local-area networks, all the sites are close to each other, so communication links are
of higher speed and lower error rate than in wide-area networks. (Refer section 2 for
detail)

3. Replication of data is useful when the interconnection of sites fails often but data is
always required to be available, and when most accesses only read the data. Fragmentation
of data is useful when each fragment is used mainly at one site and can be stored at that site.
4. a) Possible types of failure in a distributed system are failures of sites and failures of
links.
b) Failures of links can also occur in centralized systems. (Refer section 4 for detail)
5. Site A cannot reliably distinguish among the three cases. In each of them, A simply stops
receiving timely responses from B, so a failed site, a failed link, and an extremely slow site
look the same to A until communication with B resumes. (Refer section 3 for detail)
6. If the data are replicated then the data can be recovered even if the link or site fails in
the distributed system. (Refer section 3 for detail)
7. Data replication is done by maintaining several identical replicas (copies) of the
relation at several sites in a distributed database. A remote backup site is maintained
for use when one of the sites fails for some reason.
8. The advantage of a distributed database management system over a centralized DBMS
is that, even if there is a link failure or site failure, data can still be updated. (Refer
section 3 for detail)

Unit 13
Object Oriented Database Management System
Table of Contents
1 Introduction
1.1 Objectives
2 Next Generation Database System
3 New Database Application
4 Object Oriented Database Management System
5 Features of Object Oriented System
6 Advantages of Object Oriented Database Management System
7 Deficiencies of Relational Database Management System
8 Difference between Relational Database Management System and Object Oriented
Database Management System
9 Alternative Object Oriented Database Strategies
10 Summary
11 Terminal Questions
12 Answers

1. INTRODUCTION

Since the 1960s, database management systems (DBMS) have been widely used in the data
processing environment. The main reason for their success is their support for characteristics
such as data sharing, independence, consistency, and integrity, which traditional file
management systems do not inherently offer.
A database system is usually organized according to a data model. In Unit 3, we discussed
the three most popular models: hierarchical, network, and relational. Although all three are
record-based, they differ in the way they organize records.
They were mainly designed to process a large amount of relatively simple and fixed format
data. DBMS, based on these models along with sophisticated indexing and query
optimization techniques, have served business-oriented database applications especially
well.
RDBMSs were originally designed for mainframe computers and business data processing
applications. Moreover, relational systems were optimized for environments with a large
number of users who issue short queries. But today's applications have moved into areas
such as computer-aided design (CAD), multimedia systems, software engineering, and
knowledge bases. These applications require complex operations and complex data structure
representation. For example, a multimedia database may contain variable-length text,
graphics, images, audio, and video data. Finally, a knowledge base system requires data that
is rich in semantics.
Existing commercial DBMSs, both small and large, have proven inadequate for these
applications. The traditional database notion of storing data in two-dimensional tables, or
in flat files, breaks down quickly in the face of the complex data structures and data types
used in today's applications.

Research in modeling and processing complex data has gone in two directions:
a. Extending the functionality of RDBMS
b. Developing and implementing OODBMSs that are based on the object-oriented
programming paradigm.

OODBMSs are designed for use in today's application areas such as multimedia, CAD, office
automation, etc. In this unit, we will touch upon some of the basic issues related to OODBMS.

1.1 Objectives
By the end of this unit, you should be able to:
❖ Define an object-oriented DBMS
❖ Differentiate between RDBMS and OODBMS
❖ List next-generation database systems
❖ List advantages of object oriented DBMS

2. NEXT GENERATION DATABASE SYSTEM


Computer science has gone through several generations of database management, starting
with indexed files and, later, network and hierarchical database management systems
(DBMS). More recently, relational DBMS revolutionized the industry by providing powerful
data management capabilities based on a few simple concepts. Now, we are on the verge of
another generation of database systems called object-oriented DBMS (OODBMS), based on
the object-oriented programming paradigm. This new kind of DBMS, unlike previous DBMS
models, manages more complex kinds of data. One example is the knowledge database
management system (KDBMS), which is used to support the management of shared
knowledge. It supports a large number of complex rules for automatic data inferencing
(retrieval) and maintenance of data integrity.
The goal of this new DBMS is to support a much wider range of data-intensive applications
in engineering and in scientific and medical graphic representation. This new DBMS can also
support new generations of traditional business applications.

Self-Assessment Questions - 1
1. OODBMS is an acronym for ________.
2. KDBMS is an acronym for ________.
3. The new DBMS can support new generations of traditional business ________.

3. NEW DATABASE APPLICATION


Some applications that require the manipulation of large amounts of data can benefit from
using a DBMS. However, the nature of the data in these applications does not fit well into the
relational framework.
1. Design databases: Engineering design databases are useful in computer-aided
design/manufacturing/software engineering (CAD/CAM/CASE) systems. In such
systems, complex objects can be recursively partitioned into smaller objects.
Furthermore, an object can have different representations at different levels of
abstraction (equivalent objects). Moreover, a record of an object's evolution (object
versions) should be maintained. Traditional database technology does not support the
notions of complex objects, equivalent objects, or object versions.
2. Multimedia databases: In modern office information or other multimedia systems,
data includes not only text and numbers but also images, graphics, and digital audio
and video. Such multimedia data is typically stored as sequences of bytes with variable
lengths, and segments of data are linked together for easy reference. The variable-
length data structure cannot fit well into the relational framework, which mainly deals
with fixed-format records. Furthermore, applications may require access to multimedia
data on the basis of the structure of a graphical item or by following logical links.
Conventional query languages were not designed for such applications.
3. Knowledge bases: Artificial intelligence and expert systems represent information as
facts and rules that can be collectively viewed as a knowledge base. In typical Artificial
Intelligence applications, knowledge representation requires data structures with rich
semantics that go beyond the simple structure of the relational model. Artificial
decomposition and mapping would be necessary if a relational DBMS were used.
Furthermore, operations in a knowledge base are more complex than those in a
traditional database. When a rule is added, the system must check for contradiction and
redundancy. Such operations cannot be represented directly by relational operations,
and the complexity of checking increases rapidly, as the size of the knowledge base
grows.

In general, these applications require the representation of complex data elements as well
as complex relationships among them. Users in these environments have found relational
technology inadequate in terms of flexibility, modeling power, and efficiency.

Self-Assessment Questions - 2
4. CASE is an acronym for ________.
5. The ________ data is typically stored as sequences of bytes with variable
lengths, and segments of data are linked together for easy reference.
6. Artificial intelligence and expert systems represent information as facts and rules
that can be collectively viewed as a ________.

4. OBJECT ORIENTED DATABASE MANAGEMENT SYSTEM


Object-oriented technologies in use today include object-oriented programming languages
(e.g., C++ and Smalltalk), object-oriented database systems, object-oriented user interfaces
(e.g., Macintosh and Microsoft Windows systems), and so on. An object-oriented technology
is a technology that makes available to users facilities that are based on object-oriented
concepts. To define object-oriented concepts, we must first understand what an object is.

Object
The term object means a combination of data and program that represents some real-world
entity. For example, consider an employee named Amit; Amit is 25 years old, and his salary
is $25000. Then Amit may be represented in a computer program as an object. The data part
of this object would be (name: Amit, age: 25, salary: $25000). The program part of the object
may be a collection of programs (hire, retrieve the data, change age, change salary, fire). The
data part consists of data of any type. For the Amit object, a string is used for the name, an
integer for age, and a monetary type for salary; but in general, any user-defined type, such as
Employee, may be used. In the Amit object, the name, age, and salary are called attributes of
the object.

Encapsulation
Often, an object is said to encapsulate data and program. This means that the users cannot
see the inside of the object, but can use the object by calling the program part of the object.
This is not much different from procedure calls in conventional programming; the users call
a procedure by supplying values for input parameters and receive results in output
parameters.
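
The Amit object and the idea of encapsulation can be illustrated with a minimal Python sketch. The method names below are hypothetical renderings of the programs listed in the text (retrieve the data, change age, change salary), and the rule that a salary cannot be negative is an added assumption used only to show why access goes through the program part. Python needs a class statement to define the object; the term class is explained in the next subsection.

```python
class Employee:
    """An object combines a data part (attributes) and a program part (methods)."""

    def __init__(self, name, age, salary):
        # Data part. The leading underscore marks it as internal by convention.
        self._name = name
        self._age = age
        self._salary = salary

    # Program part: users work with the object only through these calls.
    def retrieve(self):
        return {"name": self._name, "age": self._age, "salary": self._salary}

    def change_age(self, new_age):
        self._age = new_age

    def change_salary(self, new_salary):
        if new_salary < 0:  # assumed rule, enforced inside the object
            raise ValueError("salary cannot be negative")
        self._salary = new_salary


amit = Employee("Amit", 25, 25000)
amit.change_salary(27000)       # the caller never touches the stored data
amit_data = amit.retrieve()
```

The caller supplies input values and receives results, much like a procedure call, while the way the data is stored stays hidden inside the object.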

Inheritance and Class


The term object-oriented roughly means a combination of object encapsulation and
inheritance. The term inheritance is sometimes called reuse. Inheritance means roughly that
a new object may be created by extending an existing object. Now let us understand the term
inheritance more precisely. An object has a data part and a program part. All objects that
have the same attributes for the data part and the same program part are collectively called
a class (or type). The classes are arranged such that some classes may inherit the attributes
and program parts from some other classes.

Amit, Ankit, and Anup are each an Employee object. The data part of each of these objects
consists of the attributes name, age, and salary. Each of these Employee objects has the same
program part (hire, retrieve the data, change age, change salary, fire). Each program in the
program part is called a method. The term class refers to the collection of all objects that
have the same attributes and methods. In our example, the Amit, Ankit, and Anup objects
belong to the class Employee, since they all have the same attributes and methods. This class
may be used as the type of an attribute of any object. At this time, there is only one class in
the system, namely the class Employee, and three objects that belong to the class, namely the
Amit, Ankit, and Anup objects.

Inheritance Hierarchy or Class Hierarchy


Now suppose that a user wishes to create two sales employees, Jai and Prakash. But sales
employees have an additional attribute namely, commission. The sales employees cannot
belong to the class Employee. However, the user can create a new class, Sales Employee, such
that all attributes and methods associated with the class Employee may be reused and the
attribute commission may be added to Sales Employee. The user does this by declaring the
class Sales Employee to be a subclass of the class Employee. The user can now proceed to
create the two sales employees as objects belonging to the class Sales Employee. The users
can create new classes as subclasses of existing classes. In general, a class may inherit from
one or more existing classes, and the inheritance structure of classes becomes a directed
acyclic graph (DAG); but for simplicity, the inheritance structure is called an inheritance
hierarchy or class hierarchy.
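
The Sales Employee example can be sketched in Python as follows. This is a minimal illustration: the attribute values given to Jai and Prakash are made up, and only one method of Employee is shown to keep the sketch short.

```python
class Employee:
    def __init__(self, name, age, salary):
        self.name = name
        self.age = age
        self.salary = salary

    def change_salary(self, new_salary):
        self.salary = new_salary


class SalesEmployee(Employee):
    """Subclass: reuses everything in Employee and adds the attribute commission."""

    def __init__(self, name, age, salary, commission):
        super().__init__(name, age, salary)  # inherited attributes
        self.commission = commission         # the additional attribute


# Hypothetical values, chosen only for illustration.
jai = SalesEmployee("Jai", 30, 20000, 0.05)
prakash = SalesEmployee("Prakash", 28, 18000, 0.04)

jai.change_salary(22000)  # method inherited from Employee, not redefined
```

Nothing in Employee had to be rewritten to support sales employees, which is the reuse that inheritance is meant to provide.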

The power of object-oriented concepts is delivered when encapsulation and inheritance
work together.
Since inheritance makes it possible for different classes to share the same set of attributes
and methods, the same program can be run against objects that belong to different classes.
This is the basis of the object-oriented user interface that desktop publishing systems and
windows management systems provide today. The same set of programs (e.g., open, close,
drop, create, move, etc.) apply to different types of data (image, text file, audio, directory,
etc.).
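
A minimal Python sketch of this idea follows. The class names DesktopItem, Image, TextFile, and Directory and the function open_all are hypothetical; they model the desktop example in the text and are not taken from any real windowing system.

```python
class DesktopItem:
    """Common superclass: defines the shared methods once."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def open(self):
        self.is_open = True
        return f"opened {self.name}"

    def close(self):
        self.is_open = False
        return f"closed {self.name}"


class Image(DesktopItem):
    pass


class TextFile(DesktopItem):
    pass


class Directory(DesktopItem):
    pass


def open_all(items):
    # The same program runs against objects of different classes.
    return [item.open() for item in items]


results = open_all([Image("photo"), TextFile("notes"), Directory("docs")])
```

The function open_all does not need to know which class each object belongs to, because all three classes inherit the same open method.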
If the users define many classes, and each class has many attributes and methods, the benefit
of sharing not only the attributes but also the programs can be dramatic. The attributes and
programs need not be defined and written from scratch. New classes can be created by
adding attributes and methods to those of existing classes, thereby reducing the opportunity to
introduce new errors to existing classes.

Self-Assessment Questions - 3
7. The term ________ means a combination of data and program that
represents some real-world entity.
8. ________ means that the users cannot see the inside of the object but can use the
object by calling the program part of the object.
9. The term inheritance is sometimes called ________.
10. Each program in the program part is called a ________.
11. In general, a class may inherit from one or more existing classes, and the
inheritance structure is called an inheritance hierarchy or ____________.

5. FEATURES OF OBJECT ORIENTED SYSTEM


Object-oriented systems make these promises:
Reduced maintenance: The primary goal of object-oriented development is the assurance
that the system will enjoy a longer life while having far smaller maintenance costs. Because
most of the processes within the system are encapsulated, the behaviors may be reused and
incorporated into new behaviors.

Real-world modeling: Object-oriented systems tend to model the real world in a more
complete fashion than do traditional methods. Objects are organized into classes of objects,
and objects are associated with behaviors. The model is based on objects rather than on data
and processing.

Improved reliability: Object-oriented systems promise to be far more reliable than
traditional systems, primarily because new behaviors can be built from existing objects.

High code reusability: When a new object is created, it will automatically inherit the data
attributes and characteristics of the class from which it was spawned. The new object will
also inherit the data and behaviors from all superclasses in which it participates.

Self-Assessment Questions - 4
12. The primary goal of object-oriented development is the assurance that the system
will enjoy a longer life while having far smaller ________.
13. The real-world model is based on objects rather than on ________
and processing.
14. Object-oriented systems promise to be far more ________ than traditional
systems.

6. ADVANTAGES OF OBJECT ORIENTED DATABASE MANAGEMENT SYSTEM

An object-oriented programming language (OOPL) provides facilities to create classes for
organizing objects, create objects, structure an inheritance hierarchy to organize classes so
that subclasses may inherit attributes and methods from superclasses, and call methods to
access specific objects. Similarly, an object-oriented database system (OODB) should provide
facilities to create classes for organizing objects, create objects, structure an inheritance
hierarchy to organize classes so that subclasses may inherit attributes and methods from
superclasses, and call methods to access specific objects. Beyond these, an OODB, because it
is a database system, must provide standard database facilities found in today's relational
database systems (RDBs), including a nonprocedural query facility for retrieving objects,
automatic query optimization and processing, dynamic schema changes (changing the class
definitions and inheritance structure), automatic management of access methods (e.g., B+-
tree index, extensible hashing, sorting, etc.) to improve query processing performance,
automatic transaction management, concurrency control, recovery from system crashes, and
security and authorization. Programming languages, including OOPLs, are designed with one
user and a relatively small database in mind. Database systems are designed with many users
and very large databases in mind; hence performance, security and authorization,
concurrency control, and dynamic schema changes become important issues. Further,
transaction systems are used to maintain critical data accurately; hence, transaction
management, concurrency control, and recovery are important facilities.
In view of the fact that C++, despite its growing popularity, is not the only programming
language that database application programmers are using or will ever use, there is a
significant gulf between a programming language and a database system that will deliver the
power of object-oriented concepts to database application programmers. Regardless of the
approach, OODBs, if done right, can bring about a quantum jump in the productivity of
database application programmers and even in the performance of the application
programs.

Advantages of Object-Oriented Databases


Systems developed with object-oriented languages have many benefits, as previously
discussed. Yet, as also described, these systems have particular attributes that can be
complemented with object-oriented databases. These attributes include lack of persistence,
inability to share objects among multiple users, limited version control, and lack of access to
other data, for example, data in other databases.

In systems designed with object-oriented languages, objects are created during the running
of a program and are destroyed when the program ends. Providing a database that can store
the objects between runs of a program offers both increased flexibility and increased
security. The ability to store the objects also allows the objects to be shared in a distributed
environment. An object-oriented database can allow only the actively used objects to be
loaded into memory, and thus minimizes or preempts the need for virtual memory paging.
This is especially useful in large-scale systems. Persistent objects also allow objects to be
stored for each version. This version control is useful not only for testing applications but
also for many object-oriented design applications where version control is a functional
requirement of the application itself. Access to other data sources can also be facilitated with
object-oriented databases, especially those built as hybrid relational systems, which can
access relational tables as well as other object types.

Object-oriented databases also offer many of the benefits that were formerly found only in
expert systems. With an object-oriented database, the relationships between objects and the
constraints in objects are maintained by the database management system, that is, the
objects themselves. The rules associated with the expert system are essentially replaced by
the object schema and the methods. As many expert systems currently do not have adequate
database support, object-oriented databases afford the possibility of offering expert system
functionality with much better performance.

Object-oriented databases offer benefits over current hierarchical and relational database
models. They enable support of complex applications not supported well by the other
models. They enable programmability and performance, improve navigational access, and
simplify concurrency control. They lower the risks associated with referential integrity, and
they provide a better user metaphor than the relational model.


Object-oriented databases by definition allow the inclusion of more of the code (i.e. the
object's methods) in the database itself. This incremental knowledge about the application
has a number of potential benefits for the database system itself, including the ability to
optimize query processing and to control the concurrent execution of transactions.

Performance, always a significant issue in system implementation, may be significantly
improved by using an object-oriented model instead of a relational model. The greatest
improvement can be expected in applications with high data complexity and large numbers
of inter-relationships. Clustering, or locating the related objects in close proximity, can be
accomplished through class hierarchy or by other interrelations. Caching, or the retention of
certain objects in memory or storage can be optimized by anticipating that the user or
application may retrieve a particular instance of the class. When there is high data
complexity, clustering and caching techniques in object databases gain tremendous
performance benefits that relational databases, because of their fundamental architecture,
will never be able to approach.

Object-oriented databases can store not only complex application components, but also
larger structures. Although relational systems can support a large number of tuples (i.e.,
rows in a table), individual types are limited in size. Object-oriented databases with large
objects do not suffer performance degradation, because the objects do not need to be broken
apart and reassembled by applications, regardless of the complexity of the properties of the
application objects.

Since objects contain direct references to other objects, complex data sets can be efficiently
assembled using these direct references. The ability to search by direct references
significantly improves navigational access. In contrast, complex data sets in relational
databases must be assembled by the application program using the slow process of joining
tables.
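The contrast between navigating direct references and joining tables can be sketched as follows. This is a minimal illustration only, assuming a hypothetical Part class and hypothetical part and component tables in an in-memory SQLite database; it shows the two access styles and makes no claim about their relative speed.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Part:
    """Hypothetical object type; components are held as direct references."""
    name: str
    components: list["Part"] = field(default_factory=list)


def names_by_reference(part: Part) -> list[str]:
    """Assemble the component names by following object references."""
    return [component.name for component in part.components]


def names_by_join(conn: sqlite3.Connection, part_name: str) -> list[str]:
    """Assemble the same names relationally, by joining tables on key values."""
    rows = conn.execute(
        "SELECT c.name FROM part p "
        "JOIN component link ON link.parent_id = p.id "
        "JOIN part c ON c.id = link.child_id "
        "WHERE p.name = ? ORDER BY c.id",
        (part_name,),
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_db() -> sqlite3.Connection:
    """Create the hypothetical tables and load the same data as the objects."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE part (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE component (parent_id INTEGER, child_id INTEGER);
        INSERT INTO part VALUES (1, 'bicycle'), (2, 'wheel'), (3, 'frame');
        INSERT INTO component VALUES (1, 2), (1, 3);
        """
    )
    return conn


bicycle = Part("bicycle", [Part("wheel"), Part("frame")])
demo_conn = build_demo_db()
```

Both names_by_reference(bicycle) and names_by_join(demo_conn, "bicycle") yield the same list of component names; the first walks references held by the object, the second matches key values across tables.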

For the programmer, one of the challenges in building a database is the data manipulation
language (DML) of the database. DMLs for relational databases usually differ from the
programming language used to construct the rest of the application. This contrast is due to
differences in the programming paradigms and mismatches of type systems. The
programmer must learn two languages, two toolsets, and two paradigms because neither
alone has the functionality to build an entire application. Certain types of programming tools,
such as application generators and fourth-generation languages (4GLs) have emerged to
produce code for the entire application, thereby bridging the mismatch between the
programming language and the DML, but most of these tools compromise the application
programming process.

With object-oriented databases, much of this problem is eliminated. The DML can be
extended so that the application can be written in the DML. Or an object-oriented application
language, for example C++, can be extended to be the DML. More of the application can be
built into the database itself. Class libraries can also assist the programmer in speeding the
creation of databases. Class libraries encourage the reuse of existing code and help minimize
the cost of later modifications. Programming is easier because the data structures model the
problem more closely. Having the data and procedures encapsulated in a single object makes
it less likely that a change to one object will affect the integrity of other objects in the
database. Concurrency control is also simplified with an object-oriented database. In a
relational database, the application needs to look into each record in each table explicitly
because related data is represented across a number of tables. Integrity, a key
requirement for databases, can be better supported with an object-oriented database
because the application can lock all the relevant data in one operation. Referential integrity
is better supported in an object-oriented database because the pointers are maintained and
updated by the database itself. Finally, object-oriented databases offer a better user
metaphor than relational databases. The tuple or table, although enabling a well-defined
implementation strategy, is not an intuitive modeling framework, especially outside the
domain of numbers. Objects offer a more natural and encompassing modeling metaphor.


Self-Assessment Questions - 5
15. Programming languages, including OOPLs, are designed with one user and a
relatively small ____________ in mind.
16. Providing a database that can store the objects between runs of a program
offers both increased flexibility and ____________.
17. Object-oriented databases also offer many of the benefits that were formerly
found only in ____________.
18. Object-oriented databases offer benefits over current ____________
and ____________ database models.
19. Object-oriented databases can store not only complex application components
but also ____________.


7. DEFICIENCIES OF RELATIONAL DATABASE MANAGEMENT SYSTEM


The data type facilities are the keys to eliminating three of the important deficiencies of
RDBs. These are summarized in the paragraphs given below.

RDBs force users to represent hierarchical data (or complex nested data or compound
data), such as a bill of materials, in terms of tuples in multiple relations. This is awkward to
start with. Further, to retrieve data thus spread out in multiple relations, RDBs must resort
to joins, a generally expensive operation. The data type of an attribute of an object in OOPLs
may be a primitive type or an arbitrary user-defined type (class). The fact that an object may
have an attribute whose value may be another object naturally leads to nested object
representation, which in turn allows hierarchical data to be naturally (i.e., hierarchically)
represented.
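To make the bill-of-materials point concrete, the sketch below holds the same data once as nested objects and once as tuples in two relations. The Item class and the ITEM and CONTAINS relation names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical bill-of-materials node: each part is (quantity, sub-item)."""
    name: str
    parts: list[tuple[int, "Item"]] = field(default_factory=list)


def to_relations(root: Item):
    """Flatten a nested bill of materials into two relations (lists of tuples)."""
    items = []      # relation ITEM(item_id, name)
    contains = []   # relation CONTAINS(parent_id, child_id, quantity)

    def visit(item: Item) -> int:
        item_id = len(items) + 1          # assign the next surrogate key
        items.append((item_id, item.name))
        for quantity, sub_item in item.parts:
            child_id = visit(sub_item)
            contains.append((item_id, child_id, quantity))
        return item_id

    visit(root)
    return items, contains


# Nested (hierarchical) representation: the structure is visible directly.
car = Item("car", [(4, Item("wheel", [(5, Item("bolt"))])), (1, Item("engine"))])

# Relational representation: the hierarchy is spread over two sets of tuples.
items, contains = to_relations(car)
```

In the nested form, the wheels of the car are reached as car.parts[0]; in the flattened form, the same fact has to be recovered by matching parent_id and child_id values across the two relations, which is what a join does.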

RDBs offer a set of primitive built-in data types for use as domains of columns of a relation,
but they do not offer any means of adding user-defined data types. The built-in data types
are basically all numbers and short symbols. RDBs are not designed to allow new data types
to be added and therefore often require major surgery to the system architecture and code
to add any new data type. Adding a new data type to a database system means allowing its
use as the data type of an attribute – that is, storage of data of that type, querying, and
updating of such data. Object encapsulation in OOPLs does not impose any restriction on the
types of data; the data may be of primitive types or user-defined types. Further, new data
types may be created as new classes, possibly even as subclasses of existing classes,
inheriting their attributes and methods.
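As an illustration of user-defined types in an OOPL, the sketch below defines a new type as a class, derives a second type from it by subclassing, and uses the type as the data type of an attribute of another object. All class and attribute names here are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Point:
    """A user-defined type with its own data and method."""
    x: float
    y: float

    def distance_to(self, other: "Point") -> float:
        return math.hypot(self.x - other.x, self.y - other.y)


@dataclass
class LabeledPoint(Point):
    """A new type created as a subclass; it inherits x, y, and distance_to."""
    label: str = ""


@dataclass
class Building:
    name: str
    location: Point   # attribute whose type is user-defined, not primitive


library = Building("Library", LabeledPoint(0.0, 0.0, "main gate"))
hostel = Building("Hostel", Point(3.0, 4.0))
distance = library.location.distance_to(hostel.location)
```

The subclass LabeledPoint can be used wherever a Point is expected, and it reuses distance_to without redefining it, which is the kind of automatic reuse through inheritance described above.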

Object encapsulation is the basis for the storage and management of programs as well as
data in the database. RDBs now support stored procedures – that is, they allow programs
to be written in some procedural language and stored in the database for later loading and
execution. However, the stored procedures in RDBs are not encapsulated with the data – that
is, they are not associated with any relation or any tuple of a relation. Further, since
RDBs do not have the inheritance mechanism, the stored procedures cannot automatically
be reused.


Self-Assessment Questions - 6
20. The data type facilities are the keys to eliminating ____________ of the important
deficiencies of RDBs.
21. Adding a new data type to a ____________ system means allowing its use as the data
type of an attribute – that is, storage of data of that type, querying, and updating
of such data.


8. DIFFERENCE BETWEEN RELATIONAL DATABASE MANAGEMENT
SYSTEM AND OBJECT ORIENTED DATABASE MANAGEMENT SYSTEM

RDBMSs were never designed to allow for nested structures. Applications that need such
structures are found extensively in CAD/CAE, aerospace, etc. An OODBMS can easily support
these applications. Moreover, it is much easier and more natural to navigate through these
complex structures in the form of objects that model the real world in an OODBMS than
through tables, tuples, and records in an RDBMS.

It is hard to confuse a relational database with an object-oriented database. The normalized
relational model is based on a fairly elegant mathematical theory. Relational databases
derive a virtual structure at run time, based on values from sets of data stored in tables.
The database constructs views of the data by selecting data from multiple tables and loading
it into a single table, whereas OODBs traverse the data from object to object.

Relational databases have a limited number of simple, built-in data types, such as integer and
string, and a limited number of built-in operations that can handle these data types. You can
create complex data types in a relational database, but you must do it on a linear basis, such
as combining fields into records. And the operations on these new complex types are
restricted, again, to those defined for the basic types (as opposed to arbitrary data types or
sub-classing with inheritance as found in OODBs).

The object model supports browsing of object class libraries, which allows the reuse, rather
than the reinvention, of commonly used data elements. Objects in an OODB survive multiple
sessions; they are persistent. If you delete an object stored in a relational database, other
objects may be left with references to the deleted one and may now be incorrect. The
integrity of the data thus becomes suspect and creates inconsistent versions.

In the relational database, complex objects must be broken up and stored in separate tables.
This can only be done in a sequential procedure, with the next retrieval relying on the
outcome of the previous. The relational database does not understand a global request, and
thus cannot optimize multiple requests; OODBs can issue a single message (request) that
contains multiple transactions.


The relational model, however, suffers at least one major disadvantage. It is difficult to
express the semantics of complex objects with only a table model for data storage. Although
relational databases are adequate for accounting or other typical transaction–processing
applications, where the data types are simple and few in number, the relational model offers
limited help when data types become numerous and complex.

Object-oriented databases are favored for applications where the relationships among
elements in the database carry the key information. Relational databases are favored when
the values of the database elements carry the key information. That is, object-oriented
models capture the structure of the data; relational models organize the data itself. If a
record can be understood in isolation, then the relational database is probably suitable. If a
record makes sense only in the context of other records, then an object-oriented database is
more appropriate.

Engineering and technical applications were the first applications to require databases that
handle complex data types and capture the structure of the data. Applications such as
mechanical and electrical computer-aided design (MCAD and ECAD) have always used
nontraditional forms of data, representing such phenomena as three-dimensional images
and VLSI circuit designs. Currently, these application programs store their data in
application-specific file structures. The data-intensiveness of these applications lies not only
in the large amount of data that needs to be programmed into the database but also in the
complexity of the data itself. In these design-based applications, relationships among
elements in the database carry key information for the user. Functional requirements for
complex cross-references, structural dependencies, and version management all require a
richer representation than what is provided by hierarchical or relational databases. The
comparison of OODBMS and RDBMS is shown in Table 13.1.


Table 13.1: Comparison of OODBMS with RDBMS

OODBMS                                    RDBMS
----------------------------------------  ----------------------------------------
It emphasizes data encapsulation and      It emphasizes mainly data independence.
data independence.

It stores methods and data.               It stores data only.

The data can be used only through their   The data can be partitioned on the basis
classes' methods.                         of users' requirements and on the
                                          specific users' applications.

Data and methods non-redundancy is        Data normalization aims at reducing or
achieved through encapsulation and        eliminating data redundancy. It is used
inheritance. Inheritance helps to reduce  in the stage of designing the database
the redundancy of methods.                and not in the stage of developing the
                                          applications.

Object-oriented databases are favored     Relational databases are favored when
for applications where the relationships  the values of the database elements
among elements in the database carry      carry the key information.
the key information.

It follows object-oriented properties.    It does not have object-oriented
                                          properties.

Self-Assessment Questions - 7
22. Relational databases derive a ____________ structure at run time based on values
from sets of data stored in tables.
23. Objects in an OODB survive multiple sessions; they are ____________.
24. In the relational database, complex objects must be broken up and stored in
separate ____________.
25. Object-oriented databases are favored for applications where the relationships
among elements in the database carry the ____________ information.
26. Relational databases are favored when the values of the ____________
elements carry the key information.


9. ALTERNATIVE OBJECT-ORIENTED DATABASE STRATEGIES


There are at least six approaches for incorporating object orientation capabilities in
databases; four of them are described below:
1. Novel database data model/data language approach: The most aggressive
approach develops an entirely new database management system with object
orientation capabilities. Most of the research projects in object-oriented databases
have pursued this approach. The industry introduces novel DML (Data Manipulation
Language) and DDL (Data Definition Language) constructs for a data model based on
semantic and functional data models.
2. Extending an existing database language with object-orientation capabilities: A
number of programming languages have been extended with object-oriented
constructs. C++, Flavors (an extension of LISP), and Object Pascal are examples of this
approach in programming languages. It is conceivable to follow a similar strategy with
database languages. Since SQL is a standard and the most popular database language,
the most reasonable solution is to extend this language with object-oriented constructs,
reflecting the object oriented capabilities of the underlying database management
system. This approach is being pursued by most vendors of relational systems, as they
evolve the next generation products. There have been many such attempts
incorporating inheritance, function composition for nested entities, and even some
support of encapsulation in an SQL framework.
3. Extending an existing object-oriented programming language with database
capabilities: Another approach is to introduce database capabilities to an existing
object-oriented language. Object-oriented features such as object identity will already
be supported by the object-oriented language. The extensions will incorporate database
features (querying, transaction support, persistence, and so on).
4. Embedding object-oriented database language constructs in a host
(conventional) language: Database languages can be embedded in host programming
languages. For example, SQL statements can be embedded in PL/I, FORTRAN, and
Ada. The types of SQL (that is, relations and rows in relations) are quite different from
the type systems of these host languages. Some object-oriented databases have taken a
similar approach with a host language and an object-oriented database language.


Self-Assessment Questions - 8
27. Most of the research projects in object-oriented databases have pursued
_______approach.
28. A number of programming languages have been extended with
_____constructs.
29. Database languages can be embedded in host ____________ languages.


10. SUMMARY
During the past decade, object-oriented technology has found its way into database user
interfaces, operating systems, programming languages, expert systems, and the like. In this
unit, we discussed the advantages of OODBMS and the deficiencies of RDBMS. We discussed
that an OODB must provide the standard database facilities found in today's relational database
systems (RDBs), including a nonprocedural query facility for retrieving objects, automatic
query optimization and processing, dynamic schema changes, automatic management of
access methods to improve query processing performance, automatic transaction
management, concurrency control, recovery from system crashes, and security and
authorization.
Object-oriented database products have been on the market for several years, and
several vendors of RDBMSs are now declaring that they will extend their products with
object-oriented capabilities. OODBMSs continue to evolve in the industry to overcome the
drawbacks of RDBMSs.

11. TERMINAL QUESTIONS

1. What are the drawbacks of current commercial databases?
2. What is the meaning of multi-media data?
3. List a few requirements for multi-media data management.
4. What are the two kinds of new data types supported in object-database systems? Give
an example of each, and discuss how the example situation would be handled if only an
RDBMS were available.
5. Compare RDBMSs with OODBMSs. Describe an application scenario for which you
would choose an RDBMS, and explain why. Similarly, describe an application scenario
for which you would choose an OODBMS, and explain why.


12. ANSWERS

Self Assessment Questions


1. Object Oriented Database Management Systems
2. Knowledge-based Database Management Systems
3. Applications
4. Computer Oriented Software Engineering
5. Multimedia
6. knowledge base
7. Object
8. Encapsulation
9. Reuse
10. Method
11. class hierarchy
12. maintenance costs
13. data
14. reliable
15. database
16. increased security.
17. expert systems
18. hierarchical and relational
19. larger structures
20. Three
21. database
22. Virtual
23. Persistent
24. Tables
25. Key
26. Database
27. Novel database data model
28. object-oriented
29. programming


Terminal Questions

1. Existing commercial DBMSs, both small and large, have proven inadequate for
applications such as computer-aided design (CAD), multimedia systems, software
engineering, and knowledge bases, which require complex operations and data
structure representations. (Refer to Sections 1, 3 and 7 for details.)

2. Multi-media data includes text, numbers, images, graphics, and digital audio and
video. Such data is typically stored as sequences of bytes with variable lengths, and
segments of data are linked together for easy reference. (Refer to Section 3 for details.)

3. Multi-media data requires storage for sequences of bytes with variable lengths, and
segments of data that can be linked together for easy reference. The variable-length
data structure does not fit well into the relational framework, which mainly deals with
fixed-format records. (Refer to Section 3.)

4. In object-oriented database systems, the data type of an attribute of an object may be
(i) a primitive type or (ii) an arbitrary user-defined type (class). (Refer to Section 7
for details.)

5. Both RDBMSs and OODBMSs are used to store and retrieve information. However, an
RDBMS does not support user-defined data types (classes) and does not allow nested
structures, whereas an OODBMS supports both. It is also more natural to navigate
complex structures in the form of objects that model the real world in an OODBMS than
through tables, tuples, and records in an RDBMS. (Refer to Section 8 for details.)


BACHELOR OF COMPUTER
APPLICATIONS
SEMESTER 3

DCA2102
DATABASE MANAGEMENT SYSTEM


Unit 14
Object Relational Mapping
Table of Contents
SL No  Topic                                                      SAQ / Activity
1      Introduction
       1.1 Objectives
2      Significance of Mapping                                    1
3      Mapping Basics                                             1
4      Mapping a Class Inheritance Tree                           1
5      Mapping Object Relationships                               1
       5.1 Types of relationships
       5.2 Implementation of object relationships
       5.3 Implementation of relational database relationships
       5.4 Relationship mappings
       5.5 Mapping ordered collections
       5.6 Mapping recursive relationships
6      Modeling with Join Tables                                  1
7      Open Source Object Relational Mapping Software             1
8      Summary
9      Terminal Questions
10     Answers


1. INTRODUCTION
As discussed so far, database management ensures that the data an application system
works with is retained, so that it can yield information and knowledge. This task may be
done by structured query language database management systems (SQL DBMS), by
object-oriented programming techniques, or by other traditional methods. An SQL DBMS
manipulates scalar values, such as integers and strings, organized within normalized tables,
whereas object-oriented programming manages data by manipulating objects that represent
non-scalar values. To store such objects in a database, a programmer can either convert the
object values into groups of simpler values for storage (and convert them back upon
retrieval) or use only simple scalar values within the program.
In this unit, we will discuss Object-relational mapping (ORM or O/RM or O/R mapping)
which is a programming technique for converting data between two different paradigms (a
relational database and an object-oriented programming language). We will also discuss the
significance of mapping, modeling, and implementation aspects related to Database
management.

1.1 Objectives

After studying this unit, you should be able to:


❖ Describe Object Relational Mapping and its significance.
❖ Discuss the Mapping basics
❖ Explain strategies for representing a class inheritance tree
❖ Describe how object relationships can be mapped to the relational model
❖ Explain the open-source object relational mapping software.


2. SIGNIFICANCE OF MAPPING

We have already mentioned that Object-relational mapping (ORM or O/RM or O/R mapping)
is a programming technique for converting data between two different paradigms (a
relational database and an object-oriented programming language) where data
representations, data manipulation, modeling techniques and other entities related to it are
different. It helps to create a "virtual object database" that can be used from within the
programming language so that the objects can be translated into forms that can be stored in
the database for easy retrieval while preserving the properties of the objects and their
relationships.

Persistent Object
While dealing with ORM you have to know what a persistent object is. It is the object that can
automatically store and retrieve itself in the database while preserving the properties of
objects and their relationships.

Although ORM often reduces the amount of code that needs to be written, it also has
disadvantages. Some O/R mapping tools do not perform well during bulk deletions of data.
In addition, ORM software has been pointed to as a major factor in producing poorly
designed databases.

Now let us discuss the significance of mapping. Suppose we write an application, for
example in Java, that stores data in an RDBMS by relying on data-aware controls (e.g.
data-aware GUI components) to interface directly with the database. Such applications are
not object oriented, and with such two-layer (GUI/database) applications we lose the
benefits of the object-oriented design principle of encapsulation. In applications that use
data-aware widgets or controls, the client and the database are very tightly coupled: GUI
code, business logic, and SQL statements are all interwoven throughout the application
source code. As a result, any change in the database schema is likely to cascade into
unexpected failures, resulting in a maintenance nightmare. The technical difficulties that
are often encountered when a relational database management system (RDBMS) is used by a
program written in an object-oriented programming language or style are known as the
object-relational impedance mismatch. This problem occurs when objects or class
definitions are mapped in a straightforward way to database tables or relational schema.

Object-relational impedance mismatch is due to the following reasons:

• Objects cannot be directly saved to and retrieved from relational databases.
• Objects have identity, state, and behavior in addition to data, whereas an RDBMS stores
data only.
• Even data alone can present a problem, since there is often no direct mapping between
Java and RDBMS data types.
• Objects are traversed using direct references, while RDBMS tables are related via
matching values in foreign and primary keys.
• Current RDBMSs have no parallel to Java's object inheritance for data and behavior.
• The goal of relational modeling is to normalize data (i.e., eliminate redundant data from
tables), whereas the goal of object-oriented design is to model a business process by
creating real-world objects with data and behavior.

One of the main objectives of object-relational mapping is to solve the problem stated
above (i.e. the object-relational impedance mismatch).
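Some of these points can be seen in a small hand-written persistence routine. The sketch below is illustrative only: the Customer class, the customer table, and the save and load helpers are hypothetical, and Python with an in-memory SQLite database stands in for the Java and RDBMS setting discussed in the text.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: int
    name: str
    active: bool    # no matching column type in this table; stored as 0 or 1


def save(conn: sqlite3.Connection, customer: Customer) -> None:
    """The object cannot be stored as-is; it is taken apart into scalar values."""
    conn.execute(
        "INSERT INTO customer (id, name, active) VALUES (?, ?, ?)",
        (customer.customer_id, customer.name, int(customer.active)),
    )


def load(conn: sqlite3.Connection, customer_id: int):
    """A row is not an object; it has to be converted back by hand."""
    row = conn.execute(
        "SELECT id, name, active FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    return Customer(row[0], row[1], bool(row[2]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
)
save(conn, Customer(1, "Asha", True))
restored = load(conn, 1)
```

Every persistent class needs a pair of routines like save and load when the conversion is written by hand; an object-relational mapping layer aims to generate or hide this code.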

Self-Assessment Questions - 1
1. Object-relational impedance mismatch occurs when objects or class definitions are
mapped in a straightforward way to________________.
2. The main objective of Object-Relational mapping is to solve object-relational _________.


3. MAPPING BASICS

Classes of an OOP language can be mapped to RDBMS tables. For example, Java
classes can be mapped to RDBMS tables.

A persistent class can be mapped to a table in many ways. The simplest mapping between
a persistent class and a table is one-to-one. In this case, each attribute of the persistent class
is represented by a column of the table. Each instance of a business class is then stored in
a row of the table.
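A one-to-one mapping of this kind can be sketched generically: one table per class, one column per attribute, one row per instance. The Product class, the SQL_TYPES lookup, and the helper functions below are assumptions made for illustration; they are not the API of any particular mapping tool.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields

# Assumed mapping from Python attribute types to SQLite column types.
SQL_TYPES = {int: "INTEGER", str: "TEXT", float: "REAL"}


@dataclass
class Product:              # hypothetical persistent class
    sku: str
    title: str
    price: float


def create_table_sql(cls) -> str:
    """One column per attribute; the table is named after the class."""
    columns = ", ".join(f"{f.name} {SQL_TYPES[f.type]}" for f in fields(cls))
    return f"CREATE TABLE {cls.__name__.lower()} ({columns})"


def insert(conn: sqlite3.Connection, obj) -> None:
    """Store one instance as one row."""
    marks = ", ".join("?" for _ in fields(obj))
    table = type(obj).__name__.lower()
    conn.execute(f"INSERT INTO {table} VALUES ({marks})", astuple(obj))


def load_all(conn: sqlite3.Connection, cls) -> list:
    """Rebuild one instance per row, in insertion order."""
    names = ", ".join(f.name for f in fields(cls))
    table = cls.__name__.lower()
    rows = conn.execute(f"SELECT {names} FROM {table} ORDER BY rowid")
    return [cls(*row) for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(Product))
insert(conn, Product("A-1", "Notebook", 2.5))
insert(conn, Product("B-2", "Pen", 0.75))
products = load_all(conn, Product)
```

Because the table layout is derived directly from the class, any change to the class changes the table, which is one reason this simple mapping can conflict with an existing relational design.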

Although this type of mapping is straightforward, it might conflict with the existing object
and entity-relation (ER) models. This is because the goal of object modeling is to model a
business process using real-world objects, whereas the goal of ER modeling is normalization
and fast retrieval of data.

Let us take an example of the Visual Business Sight Framework ™ (VBSF) which is an object-
relational Java framework that allows Java objects to be easily saved and retrieved from
relational databases. Here a Java developer has to work with two different models: (i) an
object model and (ii) a relational model. Because Java code that accesses relational databases
must deal with tables, rows, and columns, developers have to write tedious routines that
convert rows and columns into Java objects. This is known as the impedance mismatch
between Java objects and relational databases. VBSF contains an object-relational mapping
engine that automatically implements object persistence to relational databases, allowing a
programmer to forget about JDBC and stay focused on the object model.

As a result, VBSF supports two types of class-to-table mappings that help overcome the
differences between relational and object models: SUBSET and SUPERSET.

SUBSET Mapping: This type of mapping is used when all the attributes of a persistent class
are mapped to the same table. It is also used when a persistent class is not concerned with
some of the columns (not part of the business model) of its corresponding table in the
database.

The attributes of a persistent class with a subset mapping represent either a portion of the
columns in a table or all of the columns in the table. Subset mappings can also create
"projection classes" for tables with a large number of columns. A projection class allows a
user to select a row for full retrieval from the database. The full row can be mapped to
another persistent class. This type of design reduces the amount of information passed
across the network. Subset mappings are also used to map a class inheritance tree to a table
using filters.
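
A subset mapping with a projection class might look like the sketch below. It is written in Python with sqlite3 purely for illustration and does not use the VBSF API; the product table, its audit_user column, and the ProductSummary and Product classes are assumptions made for the example.

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE product (
           product_id INTEGER PRIMARY KEY, name TEXT, description TEXT,
           price REAL, audit_user TEXT)"""
)
conn.execute(
    "INSERT INTO product VALUES (1, 'Lamp', 'A long description', 19.5, 'etl_job')"
)


@dataclass
class ProductSummary:  # projection class: just enough columns to pick a row
    product_id: int
    name: str


@dataclass
class Product:  # fuller class; still a subset, it ignores audit_user
    product_id: int
    name: str
    description: str
    price: float


def list_summaries():
    rows = conn.execute("SELECT product_id, name FROM product ORDER BY product_id")
    return [ProductSummary(*row) for row in rows]


def load_product(product_id):
    row = conn.execute(
        "SELECT product_id, name, description, price FROM product WHERE product_id = ?",
        (product_id,),
    ).fetchone()
    return Product(*row)


summaries = list_summaries()                      # small rows cross the network
full = load_product(summaries[0].product_id)      # full row fetched on selection
```

Only two columns per row are transferred while the user browses; the wider row is read once a specific product has been selected.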

SUPERSET Mapping: A persistent class with a superset mapping contains attributes derived
from columns of multiple tables. This type of mapping is also known as table spanning.
Superset mappings can be used to create "view classes" that hide the underlying data model.
They can also map a class inheritance tree to the database using a vertical mapping
approach. VBSF fully supports performing insertions, updates, and deletions of objects with
this type of mapping, while transparently updating and maintaining all foreign key columns
that join the tables.
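
As a rough illustration of table spanning, the sketch below builds one "view class" from columns of two tables. It uses Python and sqlite3 rather than VBSF, and the person and address tables and the PersonView class are hypothetical.

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (address_id INTEGER PRIMARY KEY,
                          person_id INTEGER REFERENCES person(person_id),
                          city TEXT);
    INSERT INTO person  VALUES (1, 'Ravi');
    INSERT INTO address VALUES (10, 1, 'Jaipur');
    """
)


@dataclass
class PersonView:  # view class: attributes come from two tables
    person_id: int
    name: str      # from person
    city: str      # from address


def load_view(person_id):
    row = conn.execute(
        """SELECT p.person_id, p.name, a.city
           FROM person p JOIN address a ON a.person_id = p.person_id
           WHERE p.person_id = ?""",
        (person_id,),
    ).fetchone()
    return PersonView(*row)


view = load_view(1)
```

Code that uses PersonView never sees that the data is split across two tables joined by a foreign key.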

Self-Assessment Questions - 2
3. SUBSET Mapping is used when all the attributes of a persistent class are mapped to a
different table. (True/False)
4. SUPERSET Mapping is done when a persistent class with a superset mapping contains
attributes derived from columns of ________.

4. MAPPING A CLASS INHERITANCE TREE

A class inheritance tree can be mapped to an RDBMS using vertical mapping, horizontal
mapping, or filtered mapping.
VBSF supports all three methods. A combination of these types of mappings can also be used
within an inheritance tree.

• Vertical Mapping: In vertical mapping, each class in the tree, whether abstract or
concrete, is mapped to a different table. All branch and leaf tables in the tree must be
linked to their parent tables. This is done by the use of a foreign key column that
references the primary key of the parent table.
• Horizontal Mapping: Under horizontal mapping, each concrete class in the tree is
mapped to a different table. Each of these mapped tables contains columns for all
attributes in its concrete class, plus all attributes inherited from all its abstract parent
classes. In other words, abstract classes are not mapped to their own table. This
approach results in fast performance and is simple to design. However, if an attribute
of an abstract parent class is changed, then potentially many tables must be modified.
Consequently, this method is most useful if the inheritance tree is more method-driven
than attribute-driven. In short, if a substantial number of classes inherit a large number
of attributes from an abstract parent class, then the vertical or filtered methods might
be better choices.
• Filtered Mapping: In filtered mapping, all concrete classes in the tree are mapped to
the same table. The table must contain columns for all attributes of all the abstract and
concrete classes in the inheritance tree (or the part of the tree using this mapping). In
addition, a filter column is created in the table. The value of the filter column is used to
distinguish between subclasses. Abstract classes are not mapped to this table. This
approach provides adequate performance but violates table-normalization rules. More
specifically, it could lead to a substantial number of NULL values in the table, resulting
in wasted space. Consequently, this method is most useful if most of the attributes
are inherited from the abstract parent classes.
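
The filtered mapping described above can be sketched as follows, with the other two layouts noted in comments for comparison. This is a simplified Python and sqlite3 illustration, not VBSF; the Employee, Manager, and Engineer classes and the kind filter column are hypothetical.

```python
import sqlite3


class Employee:  # abstract parent in this example; never stored on its own
    def __init__(self, emp_id, name):
        self.emp_id, self.name = emp_id, name


class Manager(Employee):
    def __init__(self, emp_id, name, bonus):
        super().__init__(emp_id, name)
        self.bonus = bonus


class Engineer(Employee):
    def __init__(self, emp_id, name, language):
        super().__init__(emp_id, name)
        self.language = language


conn = sqlite3.connect(":memory:")
# Filtered mapping: one table for the whole tree plus a filter column (kind).
# Vertical mapping would instead use employee, manager, and engineer tables,
# with manager/engineer holding a foreign key to employee.
# Horizontal mapping would use only manager(emp_id, name, bonus) and
# engineer(emp_id, name, language).
conn.execute(
    """CREATE TABLE employee (
           emp_id INTEGER PRIMARY KEY, kind TEXT NOT NULL,
           name TEXT, bonus REAL, language TEXT)"""
)
conn.execute("INSERT INTO employee VALUES (1, 'manager', 'Meera', 5000.0, NULL)")
conn.execute("INSERT INTO employee VALUES (2, 'engineer', 'Arun', NULL, 'Java')")


def load(emp_id):
    """Use the filter column to decide which subclass to instantiate."""
    row_id, kind, name, bonus, language = conn.execute(
        "SELECT emp_id, kind, name, bonus, language FROM employee WHERE emp_id = ?",
        (emp_id,),
    ).fetchone()
    if kind == "manager":
        return Manager(row_id, name, bonus)
    if kind == "engineer":
        return Engineer(row_id, name, language)
    raise ValueError(f"unknown kind: {kind}")


first, second = load(1), load(2)
```

Note the NULL in the bonus or language column of every row: this is the wasted space that the filtered approach trades for having a single table.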

Self-Assessment Questions - 3
5. In vertical mapping, each class in the tree, whether abstract or concrete, is mapped
to a ________ table.
6. In filtered mapping, a filter column is created in the table and the value of the filter
column is used to distinguish between subclasses. (True/False)

5. MAPPING OBJECT RELATIONSHIPS


The previous section discussed mapping a class inheritance tree. This section discusses
relationship mapping. There are three types of object relationships that may need to be
mapped: association, aggregation, and composition. For now, let us treat these three types of
relationships the same, i.e. they are mapped in the same way, although doing so creates a
problem when it comes to referential integrity.

5.1 Types Of Relationships


Object relationships (for mapping to the relational model) can be categorized based
on multiplicity, directionality, ownership, and reference.

Based on multiplicity, object relationships can be one-to-one relationships, one-to-many
relationships, and many-to-many relationships.

• One-to-one relationships: This is a relationship where the maximum of each of its
multiplicities is one.

In the relational model, a one-to-one relationship is usually maintained by means of an
embedded foreign key column. This foreign key column holds the value of the primary
key (the object ID) of the row (object) being referenced.
For example, a one-to-one relationship is implemented in VBSF by defining a Reference
attribute. VBSF transparently converts the foreign key reference to an object and
updates the value of the foreign key column from referenced objects.

It is also possible to define a one-to-one relationship in the relational model using a join
table. VBSF also supports this scenario.

• One-to-many relationships: Also known as a many-to-one relationship, this
occurs when the maximum of one multiplicity is one and the other is greater than one.

In the object model, there are two types of one-to-many relationships: aggregation
(part-of), and association (acquaintance).
Under VBSF an aggregation relationship is defined by means of an Owned Collection
attribute, and an association relationship by means of a Referenced Collection attribute.
The difference between the two is that in an owned relationship, when the owner is
updated in the database, all objects in all its owned collections are automatically
updated (this default behavior can be overridden at runtime if necessary).

In the relational model, a one-to-many relationship can be defined either using an
embedded foreign key column or using a join table. An embedded foreign key is a
column defined in the table on the many side of the relationship that holds a key to the
table on the one side of the relationship. A join table is a table whose main purpose is
to store mapping values between the key and foreign key of the two tables involved in
the relationship.

An owned relationship can only be implemented using an embedded foreign key
column. A referenced relationship can be implemented using either method: an embedded
foreign key or a join table.

• Many-to-many relationships: This is a relationship where the maximum of both
multiplicities is greater than one.

A many-to-many relationship can be thought of as a bi-directional one-to-many
association. To create this type of relationship you simply define a Referenced
Collection attribute in each of the classes involved in the relationship. In the relational
model, a many-to-many relationship can be defined either using foreign key columns
or using a join table.

To use foreign keys, an embedded foreign key column is defined in each of the tables
involved in the relationship. Each foreign key column holds the key to the other table.
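
To make the embedded foreign key idea concrete, the sketch below models a one-to-many relationship in which the table on the many side holds a key to the table on the one side. It is a Python and sqlite3 illustration with hypothetical department and employee tables, not VBSF code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    -- Embedded foreign key: lives in the table on the many side.
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id));
    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO employee VALUES (10, 'Kiran', 1), (11, 'Leela', 1);
    """
)


def employees_of(dept_id):
    """Follow the relationship from the one side to the many side."""
    rows = conn.execute(
        "SELECT name FROM employee WHERE dept_id = ? ORDER BY emp_id", (dept_id,)
    )
    return [name for (name,) in rows]


def rejects_unknown_department():
    """An employee row may not reference a department that does not exist."""
    try:
        conn.execute("INSERT INTO employee VALUES (12, 'Mohan', 99)")
    except sqlite3.IntegrityError:
        return True
    return False


names = employees_of(1)
```

The same schema would serve a one-to-one relationship if the dept_id column were additionally declared UNIQUE.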

Based on directionality, object relationships can be uni-directional relationships and
bi-directional relationships.
• Uni-directional relationships: A uni-directional relationship exists between two
objects when an object knows about the object(s) it is related to but the other
object(s) does not know about the original object. Uni-directional relationships are
easier to implement than bi-directional relationships.
Example - The holds relationship between Employee and Position. Employee objects
know about the position that they hold, but Position objects do not know which
employee holds them (nor is there any need for them to).
• Bi-directional relationships: A bi-directional relationship exists when the objects
on both ends of the relationship know about each other.
Example - The works-in relationship between Employee and Department.
Employee objects know in which department they work and Department objects also
know which employees work in them.

The other types of relationships can be:


Owned relationships (Aggregation): In an owned relationship, the owner class holds a
reference to its owned collection using an Owned Collection attribute and the owned class
holds a direct reference to its owner using a Reference attribute. An owner object
automatically handles all inserts, updates, and deletions of all its owned objects in the
database.

VBSF automatically and transparently takes care of updating owned objects in the database
whenever an owner object is updated in the database. This behavior is recursive, so if any
owned objects are in turn owners (i.e. they hold other owned collections), all their owned
objects will also be updated in the database, and so on. In this manner, whole object
graphs can be updated in the database in one operation. This type of behavior is known as
persistence by reachability.

An owned relationship is implemented in the relational model using an embedded foreign
key column in the table on the many side of the relationship.
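
Persistence by reachability can be sketched as a recursive save that walks the owned collections. The code below is a simplified Python and sqlite3 illustration of the idea and does not reproduce how VBSF implements it; the Node class, the node table, and the save function are hypothetical.

```python
import sqlite3


class Node:
    """Hypothetical persistent object that may own further objects."""

    def __init__(self, node_id, label):
        self.node_id, self.label = node_id, label
        self.owner = None   # direct reference to the owner
        self.owned = []     # owned collection

    def own(self, child):
        child.owner = self
        self.owned.append(child)
        return child


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE node (
           node_id  INTEGER PRIMARY KEY,
           label    TEXT,
           owner_id INTEGER REFERENCES node(node_id))"""  # embedded foreign key
)


def save(obj):
    """Save an object, then everything reachable through its owned collections."""
    owner_id = obj.owner.node_id if obj.owner else None
    conn.execute(
        "INSERT OR REPLACE INTO node VALUES (?, ?, ?)",
        (obj.node_id, obj.label, owner_id),
    )
    for child in obj.owned:
        save(child)


order = Node(1, "order")
line = order.own(Node(2, "order line"))
line.own(Node(3, "line note"))
save(order)  # one call persists the whole object graph
saved = conn.execute("SELECT node_id, owner_id FROM node ORDER BY node_id").fetchall()
```

A single call on the top-level owner writes all three rows, each carrying the key of its owner in the embedded foreign key column.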

Referenced relationships (Association): This type of relationship exists when (for example,
in a one-to-many association) the class on the one side of the relationship merely references
other classes and does not own them. In this case an object on the one side (also referred to
as the holder of the collection) of the relationship cannot create, update, or delete objects on
the other side of the relationship and can only retrieve them. So the holder has to use a
Referenced Collection attribute to reference its associated collection instead of an Owned
Collection attribute. A Referenced Collection attribute can also be used to hold collections
of objects of the same class as the holder of the collection.

A Referenced Collection attribute can handle the association in the database by using either
an embedded foreign key column or a join table. To handle the association using a join table,
the referenced collection attribute must be mapped to a join table.

5.2 Implementation Of Object Relationships


As mentioned in the previous section, relationships in object schemas are implemented by a
combination of references to objects and operations. This section describes how object
relationships are implemented.

When the multiplicity is one (e.g. 0..1 or 1), the relationship is implemented with a reference
to an object, a getter operation, and a setter operation. The attribute(s) and
operations required to implement a relationship are often referred to as scaffolding.

When the multiplicity is many (e.g. N, 0..*, 1..*), the relationship is implemented via a
collection attribute, such as an Array or a HashSet in Java, and operations to manipulate that
collection.

When a relationship is uni-directional, the code is implemented only by the object that knows
about the other object(s). When the relationship is bi-directional, the code is implemented
by both classes.
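
The scaffolding described above can be sketched as follows. The example is written in Python rather than Java, and the class and method names are hypothetical; it shows a uni-directional holds relationship (only Employee has scaffolding) and a bi-directional works-in relationship (both classes have scaffolding and keep each other in sync).

```python
class Position:
    """Knows nothing about employees: the holds relationship is uni-directional."""

    def __init__(self, title):
        self.title = title


class Department:
    def __init__(self, name):
        self.name = name
        self._employees = set()  # collection attribute for the "many" end

    def get_employees(self):
        return frozenset(self._employees)

    def add_employee(self, employee):
        if employee not in self._employees:
            self._employees.add(employee)
            employee.set_department(self)  # keep the other end in sync

    def remove_employee(self, employee):
        if employee in self._employees:
            self._employees.discard(employee)
            if employee.get_department() is self:
                employee.set_department(None)


class Employee:
    def __init__(self, name):
        self.name = name
        self._position = None    # reference for multiplicity 0..1
        self._department = None  # reference for multiplicity 0..1

    def get_position(self):
        return self._position

    def set_position(self, position):
        self._position = position

    def get_department(self):
        return self._department

    def set_department(self, department):
        if department is self._department:
            return
        old, self._department = self._department, department
        if old is not None:
            old.remove_employee(self)
        if department is not None:
            department.add_employee(self)


sales, ops = Department("Sales"), Department("Ops")
kiran = Employee("Kiran")
kiran.set_position(Position("Clerk"))
sales.add_employee(kiran)
```

Because the works-in relationship is bi-directional, either end can be updated and the other follows; calling `kiran.set_department(ops)` also removes Kiran from the Sales collection.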

5.3 Implementation Of Relational Database Relationships


As mentioned in the earlier sections, relationships in relational databases are
maintained through the use of foreign keys. A foreign key is one or more data attributes that
appear in one table and that may be part of, or coincide with, the key of another table.

In the case of a one-to-one relationship, the foreign key needs to be implemented by one of
the tables. To implement a one-to-many relationship, a foreign key is implemented from the
“one table” to the “many table”; that is, the “many table” holds a foreign key column that
references the key of the “one table”. One can also choose to overbuild the database schema
and implement a one-to-many relationship via an associative table, effectively making it a
many-to-many relationship.

Many-to-many relationships can be implemented in two ways in a relational database. The
first is to implement, in each table, the foreign key column(s) to the other table several
times. Unfortunately, with this approach a problem occurs when a row has to be related to
more rows of the other table than there are foreign key columns. A better approach is to
implement what is called an associative table, which includes the combination of the primary
keys of the tables that it associates. In this approach, the many-to-many relationship is
converted into two one-to-many relationships, both of which involve the associative table.
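
An associative table can be sketched as below. The example uses Python with sqlite3, and the employee, task, and employee_task tables are hypothetical; the associative table's key is the combination of the primary keys of the two tables it associates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id  INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE task     (task_id INTEGER PRIMARY KEY, title TEXT);
    -- Associative table: one row per (employee, task) pair.
    CREATE TABLE employee_task (
        emp_id  INTEGER NOT NULL REFERENCES employee(emp_id),
        task_id INTEGER NOT NULL REFERENCES task(task_id),
        PRIMARY KEY (emp_id, task_id));
    INSERT INTO employee VALUES (1, 'Kiran'), (2, 'Leela');
    INSERT INTO task VALUES (100, 'Audit'), (101, 'Report');
    INSERT INTO employee_task VALUES (1, 100), (2, 100), (2, 101);
    """
)


def employees_on(task_id):
    rows = conn.execute(
        """SELECT e.name FROM employee e
           JOIN employee_task et ON et.emp_id = e.emp_id
           WHERE et.task_id = ? ORDER BY e.emp_id""",
        (task_id,),
    )
    return [name for (name,) in rows]


def tasks_of(emp_id):
    rows = conn.execute(
        """SELECT t.title FROM task t
           JOIN employee_task et ON et.task_id = t.task_id
           WHERE et.emp_id = ? ORDER BY t.task_id""",
        (emp_id,),
    )
    return [title for (title,) in rows]


audit_team = employees_on(100)
leela_tasks = tasks_of(2)
```

Any number of employees can be attached to a task by adding rows, which is exactly what a fixed set of repeated foreign key columns cannot offer.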

A consistent key strategy within the database can greatly simplify the relationship mapping
effort. The first step is to prefer single-column keys, and the next step is to use a
globally unique surrogate key.

Because foreign keys are used to join tables, all relationships in a relational database are
effectively bi-directional. This is why it doesn’t matter in which table you implement a one-
to-one relationship; the code to join the two tables is virtually the same.

5.4 Relationship Mappings


We have already seen how object relationships and relational database relationships are
implemented. This section describes how the mapping between them is done.

The mappings are described from the point of view of mapping the object relationships
into the relational database. Note that in some cases there are design choices to make.

A general rule of thumb with relationship mapping is that the multiplicities should be kept
the same. Therefore a one-to-one object relationship maps to a one-to-one data relationship,
a one-to-many maps to a one-to-many, and a many-to-many maps to a many-to-many
relationship. But this doesn’t have to be the case always; a one-to-one object relationship can
be implemented into a one-to-many or even a many-to-many data relationship. This is
because a one-to-one data relationship is a subset of a one-to-many data relationship and a
one-to-many relationship is a subset of a many-to-many relationship.

One-to-one mappings: In a one-to-one mapping, when an object of a one-to-one object
relationship is read into memory, the application can automatically traverse the relationship
(for example, the holds relationship) and read in the corresponding object. Alternatively, the
application can traverse the relationship manually in code, taking a lazy read approach in
which the other object is read only at the time it is required by the application.
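
The lazy read approach can be sketched as below. This is a minimal Python and sqlite3 illustration with hypothetical employee and job_position tables; the queries list exists only to make visible when each table is actually read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_position (position_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE employee (
        emp_id      INTEGER PRIMARY KEY,
        name        TEXT,
        position_id INTEGER REFERENCES job_position(position_id));
    INSERT INTO job_position VALUES (7, 'Analyst');
    INSERT INTO employee VALUES (1, 'Kiran', 7);
    """
)

queries = []  # records which tables have been read, for illustration only


class Employee:
    def __init__(self, emp_id):
        queries.append("employee")
        self.emp_id = emp_id
        self.name, self._position_id = conn.execute(
            "SELECT name, position_id FROM employee WHERE emp_id = ?", (emp_id,)
        ).fetchone()
        self._position = None  # not read yet

    @property
    def position(self):
        """Read the related row only on first access, then reuse it."""
        if self._position is None:
            queries.append("job_position")
            (self._position,) = conn.execute(
                "SELECT title FROM job_position WHERE position_id = ?",
                (self._position_id,),
            ).fetchone()
        return self._position


emp = Employee(1)
reads_before = list(queries)   # only the employee table has been read so far
title = emp.position           # first access triggers the second read
title_again = emp.position     # served from memory, no further read
```

An eager (automatic) traversal would instead read both rows inside the constructor, which costs an extra query even when the position is never used.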

One-to-many mappings: In a one-to-many mapping, when an object on the many side of a
one-to-many relationship (for example, an employee) is read into memory, the relationship is
automatically traversed to read in the object on the one side (for example, the division that
the employee works in).

Many-to-many mappings: To implement many-to-many relationships one has to know the
concept of an associative table, a data entity whose main objective is to maintain the
relationship between two or more tables in a relational database.

In relational databases the attributes contained in an associative table are traditionally the
combination of the keys in the tables involved in the relationship. The name of an associative
table is typically either the combination of the names of the tables that it associates with or
the name of the association that it implements.

The rule for mapping is that the multiplicities should "cross over" once the associative table
is introduced. A multiplicity of 1 is always introduced on the outside edges of the relationship
within the data schema to preserve the overall multiplicity of the original relationship.

Many-to-many relationships are interesting because of the addition of the associative table:
two business classes are mapped to three data tables to support the relationship, which
creates extra work.

5.5 Mapping Ordered Collections


Sometimes a model places an ordering on an aggregation association between two classes. In
such a case, an ordered constraint is placed on the relationship, i.e. users care about the
order in which items appear on an order. When mapping this to a relational database, you
need to add an additional column to track this information.

Although this mapping seems straightforward on the surface, there are several issues that
need to be taken into consideration. These issues become apparent when you consider basic
persistence functionality for the aggregate. These issues are:
(i) reading the data in the proper sequence
(ii) not including the sequence number in the key
(iii) updating the sequence numbers after rearranging the order items
(iv) updating sequence numbers after deleting an order item
(v) considering sequence number gaps greater than one.
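
The issues listed above can be illustrated with a small sketch. It uses Python and sqlite3; the order_item table, its seq column, and the helper functions are hypothetical, and a gap of 10 between sequence numbers is just one possible choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE order_item (
           item_id  INTEGER PRIMARY KEY,  -- the key does not include seq
           order_id INTEGER NOT NULL,
           seq      INTEGER NOT NULL,     -- additional column tracking the order
           product  TEXT)"""
)
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?, ?)",
    [(1, 500, 10, "pen"), (2, 500, 20, "ink"), (3, 500, 30, "pad")],
)


def items(order_id):
    """Read the items in the proper sequence."""
    rows = conn.execute(
        "SELECT product FROM order_item WHERE order_id = ? ORDER BY seq", (order_id,)
    )
    return [product for (product,) in rows]


def move_between(item_id, seq_before, seq_after):
    """Gaps greater than one let an item move without touching its neighbours."""
    conn.execute(
        "UPDATE order_item SET seq = ? WHERE item_id = ?",
        ((seq_before + seq_after) // 2, item_id),
    )


def resequence(order_id, step=10):
    """Renumber after rearranging or deleting items so the gaps are even again."""
    ids = [
        item_id
        for (item_id,) in conn.execute(
            "SELECT item_id FROM order_item WHERE order_id = ? ORDER BY seq",
            (order_id,),
        )
    ]
    for position, item_id in enumerate(ids, start=1):
        conn.execute(
            "UPDATE order_item SET seq = ? WHERE item_id = ?",
            (position * step, item_id),
        )


before = items(500)
move_between(3, 10, 20)                                   # pad moves between pen and ink
conn.execute("DELETE FROM order_item WHERE item_id = 1")  # pen is removed
resequence(500)
after = items(500)
final_seqs = [
    seq
    for (seq,) in conn.execute(
        "SELECT seq FROM order_item WHERE order_id = 500 ORDER BY seq"
    )
]
```

Keeping seq out of the key means an item can be moved with a plain UPDATE, without changing its identity or any foreign keys that point to it.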

5.6 Mapping Recursive Relationships


A recursive relationship, also called a reflexive relationship, exists when the same entity
(class, data entity, table, ...) is involved with both ends of the relationship.

For example, an employee may manage several other employees. Similarly, the aggregate
relationship that the Team class has with itself is recursive: a team may be a part of one or
more other teams.

The many-to-many recursive aggregation is mapped to the Subteams associative table in the
same way a normal many-to-many relationship is mapped. The only difference is that both
columns are foreign keys into the same table. Similarly, the one-to-many association is
mapped in the same way a normal one-to-many relationship is mapped.
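
Both recursive mappings can be sketched as follows. The example uses Python and sqlite3; the column names and sample data are hypothetical, while the Subteams associative table follows the name used in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One-to-many recursive association: the foreign key points into the same table.
    CREATE TABLE employee (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT,
        manager_id INTEGER REFERENCES employee(emp_id));

    -- Many-to-many recursive aggregation: both columns are foreign keys into team.
    CREATE TABLE team (team_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE subteams (
        parent_team_id INTEGER NOT NULL REFERENCES team(team_id),
        child_team_id  INTEGER NOT NULL REFERENCES team(team_id),
        PRIMARY KEY (parent_team_id, child_team_id));

    INSERT INTO employee VALUES (1, 'Meera', NULL), (2, 'Arun', 1), (3, 'Kiran', 1);
    INSERT INTO team VALUES (1, 'Engineering'), (2, 'Research'), (3, 'Database');
    INSERT INTO subteams VALUES (1, 3), (2, 3);
    """
)


def reports_of(manager_id):
    rows = conn.execute(
        "SELECT name FROM employee WHERE manager_id = ? ORDER BY emp_id", (manager_id,)
    )
    return [name for (name,) in rows]


def parent_teams(team_id):
    rows = conn.execute(
        """SELECT t.name FROM team t
           JOIN subteams s ON s.parent_team_id = t.team_id
           WHERE s.child_team_id = ? ORDER BY t.team_id""",
        (team_id,),
    )
    return [name for (name,) in rows]


meera_reports = reports_of(1)
database_parents = parent_teams(3)
```

Here the Database team is part of two other teams, which is why the aggregation needs an associative table rather than a single foreign key column.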

Self-Assessment Questions - 4
7. Based on directionality, object relationships can be ________
relationships and ________ relationships.
8. In Referenced Relationships (Association) a holder has to use a
________ to reference its associated collection instead of an
Owned Collection attribute.
9. While implementing relational database relationships, in a one-to-many relationship a
________ key is implemented from the “one table” to the “many
table”.
10. A general rule of thumb with relationship mapping is that the multiplicities in a
relationship should be kept ________. (Choose correct
option)
a) same
b) different
c) uni-directional
d) bi-directional

6. MODELING WITH JOIN TABLES


A join table is commonly used to implement many-to-many relationships in the relational
model. This section discusses how join tables can be used to implement many-to-many
associations in the object model. A join table is used to maintain a relationship between the
rows of two tables (or two different rows in the same table) in a relational database. A join
table must contain a foreign key column for each of the primary keys of the tables
in the association. Additional information about the relationship may also be maintained in
the join table.
When modeling many-to-many relationships in the object model, we should always think of
a class that represents the join table. If the join table is transactional, then a corresponding
persistent class should be created in the object model.

The join is said to be transactional if one or more additional fields such as a quantity or date
are defined in the join table. The join is said to be transparent if the join table only contains
foreign key columns.

Transactional Joins
In a transactional join, the join table is modeled by two or more transaction classes. Each
transaction class is owned by one of the classes involved in the many-to-many relationship.

Transparent Joins
In transparent joins, the join table is not modeled by a class. Here each class involved in the
relationship has get methods that return its related objects of the other class. This is
implemented using a Referenced Collection attribute on each class involved in the
relationship.

When two classes are related via a join table, one of the classes must be designated as the
join manager. The join manager has no functionality in the object model. Only one
persistent class can be assigned as the join manager of a join table.
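
The difference between the two kinds of join can be sketched as follows. This is a simplified Python and sqlite3 illustration, not VBSF code, and it uses a single transaction class for brevity; the student, course, and club tables and the Enrollment class are hypothetical.

```python
import sqlite3
from dataclasses import dataclass

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE club    (club_id    INTEGER PRIMARY KEY, name  TEXT);
    -- Transactional join: foreign keys plus an additional field (a date).
    CREATE TABLE enrollment (
        student_id  INTEGER NOT NULL REFERENCES student(student_id),
        course_id   INTEGER NOT NULL REFERENCES course(course_id),
        enrolled_on TEXT NOT NULL,
        PRIMARY KEY (student_id, course_id));
    -- Transparent join: foreign key columns only.
    CREATE TABLE student_club (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        club_id    INTEGER NOT NULL REFERENCES club(club_id),
        PRIMARY KEY (student_id, club_id));
    INSERT INTO student VALUES (1, 'Asha');
    INSERT INTO course  VALUES (20, 'DBMS');
    INSERT INTO club    VALUES (30, 'Chess');
    INSERT INTO enrollment   VALUES (1, 20, '2024-07-01');
    INSERT INTO student_club VALUES (1, 30);
    """
)


@dataclass
class Enrollment:  # persistent class that models the transactional join table
    student_id: int
    course_id: int
    enrolled_on: str


def enrollments_of(student_id):
    rows = conn.execute(
        "SELECT student_id, course_id, enrolled_on FROM enrollment WHERE student_id = ?",
        (student_id,),
    )
    return [Enrollment(*row) for row in rows]


def clubs_of(student_id):
    """Transparent join: no class for student_club, related objects come back directly."""
    rows = conn.execute(
        """SELECT c.name FROM club c
           JOIN student_club sc ON sc.club_id = c.club_id
           WHERE sc.student_id = ? ORDER BY c.club_id""",
        (student_id,),
    )
    return [name for (name,) in rows]


asha_enrollments = enrollments_of(1)
asha_clubs = clubs_of(1)
```

The enrollment date belongs to neither the student nor the course, so it needs a class of its own; the club membership carries no extra data and can stay invisible in the object model.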

Self-Join Tables
A join does not always have to involve two different tables. You can join a table to itself,
creating a self-join. Joining a table to itself can be useful when you want to compare values
in a column to other values in the same column.
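
A self-join can be sketched as below, using Python with sqlite3 and a hypothetical employee table. The query compares the city column with other values of the same column to find pairs of employees in the same city.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO employee VALUES
        (1, 'Meera', 'Jaipur'), (2, 'Arun', 'Delhi'), (3, 'Kiran', 'Jaipur');
    """
)

# The table appears twice under the aliases a and b.
# The condition a.emp_id < b.emp_id avoids pairing a row with itself
# and avoids listing each pair twice.
pairs = conn.execute(
    """SELECT a.name, b.name
       FROM employee a JOIN employee b
         ON a.city = b.city AND a.emp_id < b.emp_id
       ORDER BY a.emp_id, b.emp_id"""
).fetchall()
```

With the sample data only Meera and Kiran share a city, so the query returns a single pair.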

Self-Assessment Questions - 5
11. The join is said to be ________ if one or more additional fields such as
a quantity or date are defined in a join table. (Choose correct option).
a) transactional
b) transparent
c) self-join
d) join manager
12. When two classes are related via a join table, one of the classes must be designated as the
________.

7. OPEN SOURCE OBJECT RELATIONAL MAPPING SOFTWARE


Many open-source object-relational mapping tools are available for many high-level
languages. A few well-known object-relational mapping tools are listed below.

LiteSQL: It is an open-source mapper and a C++ library that integrates C++ objects tightly
with a relational database and thus provides an object persistence layer. LiteSQL supports
SQLite3, PostgreSQL and MySQL as backends. This library is distributed under the terms of
the BSD License.
Java: The open-source mappers for Java are given below:

• Cayenne: Apache Cayenne is an open-source persistence framework licensed under
the Apache License, providing object-relational mapping (ORM) and remoting services.
Cayenne seamlessly binds one or more database schemas directly to Java objects,
managing atomic commits and rollbacks, SQL generation, joins, sequences, and more.
• Carbonado: It is an open-source framework, backed by Berkeley DB or JDBC. It is an
extensible, high-performance persistence abstraction layer for Java applications,
providing a relational view to the underlying persistence technology. It also supports
queries, joins, indexes, and it performs query optimization.
• Hibernate: It is open source and provides relational persistence for Java and .NET.
Hibernate facilitates the storage and retrieval of Java domain objects via
Object/Relational Mapping.
• KeyAccess: KeyAccess is a lightweight Object-Relational Mapping (ORM) tool that uses
existing databases as the starting point to generate a domain model. KeyAccess doesn’t
mandate a special architecture or framework – it will work in any application using
JDBC.

Other open-source object-relational mapping software for Java includes
Java Data Objects (JDO), Java Persistence API (JPA), DataNucleus, JPOX, Object Relational
Bridge, OpenJPA, ORMLite, QuickDB ORM, Sobat, etc.

.NET: The open-source mappers for .NET are given below:

• NHibernate: It is an open-source port of Hibernate Core for Java to the .NET
Framework. It handles persisting plain .NET objects to and from an underlying
relational database. Given an XML description of the entities and relationships,
NHibernate automatically generates SQL for loading and storing the objects.
NHibernate supports transparent persistence and the object classes don't have to
follow a restrictive programming model.
• Castle ActiveRecord: It is an implementation of the ActiveRecord pattern for .NET. The
ActiveRecord pattern consists of instance properties representing a record in the
database, instance methods acting on that specific record, and static methods acting on
all records.

Other open-source mappers for .NET (some may come with commercial support) are
Base One Foundation Component Library, BLToolkit, ECO, Habanero,
iBATIS, Neo, nHydrate, [Link], Picasso, Quick Objects, SubSonic and AgileFx.

PHP: The open-source mappers for PHP are:

• PHP-ActiveRecord: It is an open-source ORM library based on the ActiveRecord
pattern. It aims to massively simplify the interactions with your database and eliminate
the chore of handwritten SQL for common operations. Unlike other ORMs, it requires
neither code generators nor maintenance of mapping files for the tables.
• Propel: It is an open-source Object-Relational Mapping (ORM) and Query-Toolkit for
PHP 5, inspired by Apache Torque. It allows you to access your database using a set of
objects, providing a simple API for storing and retrieving data.
Other open-source mappers for PHP are CakePHP, Doctrine, PdoMap,
Rocks, Qcodo, and Torpor.

There are also open-source mappers for other languages such as Python, Ruby, Perl, Scala,
Delphi, and VB6. For example, the open-source mappers for Python include Django,
SQLObject, Storm, XRecord, Autumn, and Tryton, while ActiveRecord, Datamapper, iBATIS,
and Sequel are used in Ruby.

Self-Assessment Questions - 6
13. Hibernate is an open-source mapper and provides relational Persistence for Java and
.NET. (True/False)
14. Castle ActiveRecord is an implementation of the ActiveRecord pattern for _____.

8. SUMMARY
There is much debate over whether the object-relational impedance mismatch is a
problem or not. A few programmers advocate the use of object-oriented databases, rather
than ORM, as a solution for the impedance mismatch. The objective of object-relational
mapping is to solve the problem of object-relational impedance mismatch.
Classes of objects can be mapped to RDBMS tables. For example, VBSF supports two types of
class-to-table mappings that help overcome the differences between relational and object
models: SUBSET and SUPERSET. A class inheritance tree can be mapped to an RDBMS using
vertical mapping, horizontal mapping, or filtered mapping.

Object relationships for mapping to the relational model can be categorized based on
multiplicity, directionality, ownership, and reference.

Based on multiplicity, object relationships can be one-to-one relationships, one-to-many
relationships, and many-to-many relationships. Based on directionality, object
relationships can be uni-directional relationships and bi-directional relationships.

A general rule of thumb with relationship mapping is that the multiplicities should be kept
the same.

The join is said to be transactional if one or more additional fields such as a quantity or date
are defined in a join table. The join is said to be transparent if the join table only contains
foreign key columns.

There are many open-source object-relational mapping tools for C++, Java,
Python, Ruby, Perl, Scala, Delphi, VB6, etc.

9. TERMINAL QUESTIONS

1. What is object-relational impedance mismatch?
2. Explain vertical mapping, horizontal mapping, and filtered mapping.
3. What is the difference between owned relationship and referenced relationship?
4. What are the different open source mappers?

10. ANSWERS

Self-Assessment Questions
1. database tables or relational schema
2. impedance mismatch
3. False
4. multiple tables
5. different
6. True
7. uni-directional, bi-directional
8. Referenced Collection attribute
9. Foreign
10. a) same
11. a) transactional
12. join manager
13. True
14. .NET

Terminal Questions

1. In object-relational mapping, when an RDBMS is used by a program written
in an object-oriented programming language or style, any change in the database
schema will result in unexpected failure of the system. This phenomenon is known
as an impedance mismatch. (Refer to Section 2 for details)
2. In vertical mapping, each class in the tree, whether abstract or concrete, is mapped to
a different table. In horizontal mapping, each concrete class in the tree is mapped to a
different table. In filtered mapping, all concrete classes in the tree are mapped to the
same table. (Refer to Section 4 for details)
3. In an owned relationship, the owner class holds a reference to its owned collection
using an Owned Collection attribute and the owned class holds a direct reference to its
owner using a Reference attribute. In a referenced relationship an object on the one side
(also referred to as the holder of the collection) has to use a Referenced Collection
attribute to reference its associated collection instead of an Owned Collection attribute.
4. A few of the open-source mappers are Cayenne, Hibernate, Castle ActiveRecord, PHP-
ActiveRecord, etc. (Refer to Section 7 for details)

Unit 15
Technological Trends in DBMS
Table of Contents

SL No   Topic                                     Fig No / Table / Graph   SAQ / Activity
1       Introduction                              -                        -
1.1     Objectives                                -                        -
2       Cloud Computing                           1, 2                     1
2.1     Functioning of Cloud Computing            -                        -
2.2     Cloud Architecture                        -                        -
3       Cloud Storage and Cloud Services          -                        -
4       Cloud Industrial Applications             3, 4, 5, 6               2
5       Temporal Database                         -                        3
6       Big Data                                  -                        4
7       NoSQL Databases                           -                        5
7.1     Types of NoSQL databases                  -                        -
7.2     Advantages and Disadvantages of NoSQL     -                        -
7.3     SQL Databases vs. NoSQL Databases         -                        -
8       Summary                                   -                        -
9       Terminal Questions                        -                        -
10      Answers                                   -                        -

1. INTRODUCTION

In the previous unit, we studied object-relational mapping and several associated aspects
such as mapping a class inheritance tree, mapping object relationships, modeling with join
tables, and open-source object-relational mapping software. In this unit, we will cover
technological trends in the database management system.

The rapidly evolving and dynamic nature of systems-driven research imposes special
requirements on the technology, design, approach, and architecture of computational
infrastructure including database applications. Several technological advances have emerged
in the field of databases during the last few years. In this unit, you will study advanced database
technological developments including cloud computing, temporal database, big data, and
NoSQL databases.

1.1 Objectives

After studying this unit, you should be able to:

❖ explain the functions of cloud computing
❖ understand the design of cloud architecture
❖ discuss cloud storage and cloud services
❖ discuss temporal database
❖ understand the big data concepts
❖ explain NoSQL databases
❖ differentiate between SQL databases and NoSQL databases

2. CLOUD COMPUTING

Cloud computing has its roots in a combination of client/server computing and
peer-to-peer distributed computing. It is a natural evolution of the extensive
adoption of virtualization, service-oriented architecture, and autonomic and utility
computing.

Cloud computing can be defined as Internet-based computing that provides shared
computer processing resources and data to computers and other devices on demand.
It is a model for supporting ubiquitous, on-demand access to a shared pool of configurable
computing resources such as networks, storage, servers, services, and applications, which
can be rapidly provisioned and released with minimal management effort.

2.1 Functioning Of Cloud Computing

Before going deeper into cloud computing technology, it is important to understand its
key element, the “cloud”: a set of computers organized as a network that serves as a
service-oriented architecture for delivering information and software. Cloud computing
technology differs from the traditional approach because computer resources are arranged
in such a manner that applications can function irrespective of the configuration of the
server that runs them.

This approach uses resources more sparingly and reduces the need for dedicated hardware
to run applications. Cloud in cloud computing technology takes up the idea
of using the Internet to run software on an individual’s computer. Because the Internet has
become a hub for almost everything, many users prefer software that is entirely web-based
and that can be used through a simple web browser.

To understand cloud computing technology, think of the cloud as a set of layers
divided into two parts: the back end and the front end. The front-end layer consists of
everything that is visible to the person using the technology and with which that person
can interact. The back end consists of both the hardware and the software
required to make the front-end interface function properly.

The computers in a cloud are pooled so that any application can take whatever resources
it needs and use their combined power just as it would if it were running on a single
machine. Cloud computing also provides flexibility: the amount of resources consumed can
vary with the task at hand, which means that the resources can either decrease or increase
according to the job.

Trends change rapidly, and the number of people using cloud computing keeps increasing
with no sign of slowing. This growth is welcome, but cloud computing carries its own set of
risks. The primary one is that if the Internet connection goes down for any reason, access
to data held on remote systems is interrupted, stopping work for at least some time. Access
may be lost for longer periods if the Internet bill is not paid on time.

2.2 Cloud Architecture

The key to cloud computing is the “cloud”, a massive network of servers or even individual
PCs interconnected in a grid. These computers run in parallel, combining the resources of
each to generate supercomputing-like power. What, exactly, is the “cloud”? Put simply, the
cloud is a collection of computers and servers that are publicly accessible via the Internet.
This hardware is typically owned and operated by a third party on a consolidated basis in
one or more data center locations. The machines can run any combination of operating
systems; it’s the processing power of the machines that matters, not what their desktops look
like. As shown in Figure 15.1, individual users connect to the cloud from their own personal
computers or portable devices, over the Internet. To these individual users, the cloud is seen
as a single application, device, or document. The hardware in the cloud (and the operating
system that manages the hardware connections) is invisible.

Figure 15.1: Cloud architecture

This cloud architecture is deceptively simple, although it does require some intelligent
management to connect all those computers together and assign task processing to
multitudes of users. As you can see in Figure 15.2, it all starts with the front-end interface
seen by individual users. This is how users select a task or service (either starting an
application or opening a document). The user’s request then gets passed to the system
management, which finds the correct resources and then calls the system’s appropriate
provisioning services. These services carve out the necessary resources in the cloud, launch
the appropriate web application, and either create or open the requested document. After
the web application is launched, the system’s monitoring and metering functions track the
usage of the cloud so that resources are apportioned and attributed to the proper user(s).
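
The request flow described above can be sketched in a few lines of Python. This is a toy model, not a real cloud API: the class `ToyCloud`, its method names, and the abstract "resource units" are all hypothetical, chosen only to mirror the steps in the text (front-end request, system management, provisioning, and metering).

```python
from collections import defaultdict


class ToyCloud:
    """Toy model of the cloud management flow (all names are hypothetical)."""

    def __init__(self, capacity):
        self.free = capacity              # unallocated resource units in the pool
        self.usage = defaultdict(int)     # metering: units attributed to each user

    def provision(self, units):
        # Provisioning service: carve the requested resources out of the pool.
        if units > self.free:
            raise RuntimeError("not enough free resources in the cloud")
        self.free -= units
        return units

    def handle_request(self, user, task, units):
        # System management: find resources and call the provisioning service.
        granted = self.provision(units)
        # Monitoring and metering: attribute the usage to the proper user.
        self.usage[user] += granted
        return f"{task} launched for {user} with {granted} unit(s)"


cloud = ToyCloud(capacity=10)
message = cloud.handle_request("alice", "open document", 2)
cloud.handle_request("bob", "start application", 3)
print(message)
print(dict(cloud.usage), "free:", cloud.free)
```

No person takes part in the allocation here; every step from request to metering is a function call, which is the point the text makes about automation.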

Figure 15.2: Process in cloud management

As you can see, the key to the notion of cloud computing is the automation of many
management tasks. The system isn’t a cloud if it requires human management to allocate
processes to resources. What you have in this instance is merely a twenty-first-century
version of the old-fashioned data center–based client/server computing. For the system to
attain cloud status, manual management must be replaced by automated processes.

Self-Assessment Questions - 1

1. _________________ in cloud computing technology takes up the idea of using the Internet
to run software on any individual’s computer.
2. Key to cloud computing is a massive network of servers or even individual PCs
interconnected in a grid [True/False].
3. Name the component through which the cloud user interacts.

3. CLOUD STORAGE AND CLOUD SERVICES

Cloud storage is a model of networked online storage where data is stored in virtualized
pools of storage which are generally hosted by third parties. Any web-based application or
service offered via cloud computing is called a cloud service.

Cloud storage

Cloud storage is a system of networked, aggregated hardware in which information
is stored on multiple virtual servers rather than being hosted on a single dedicated server.
Hosting companies operate huge server hubs with backup and data protection systems, and
people who use the service buy or lease storage capacity from them.

One of the primary uses of cloud computing is for data storage. With cloud storage, data is
stored on multiple third-party servers, rather than on the dedicated servers used in
traditional networked data storage. When storing data, the user sees a virtual server, that is,
it appears as if the data is stored in a particular place with a specific name. But that place
doesn’t exist in reality. It’s just an assumed name used to reference virtual space carved out
of the cloud. In reality, the user’s data could be stored on any one or more of the computers
used to create the cloud. The actual storage location may even differ from day to day or even
minute to minute, as the cloud dynamically manages available storage space. But even
though the location is virtual, the user sees a “static” location for the data and can
manage the storage space as if it were connected to the user’s own PC.

Cloud storage has both financial and security-associated advantages. Financially, virtual
resources in the cloud are typically cheaper than dedicated physical resources connected to
a personal computer or network. As for security, data stored in the cloud is protected from
accidental erasure or hardware crashes, because it is duplicated across multiple physical
machines. Since multiple copies of the data are kept continually, the cloud continues to
function as normal even if one or more machines go offline. If one machine crashes, the data
is still available from the copies on other machines in the cloud.
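
The idea of one virtual name backed by several physical copies can be illustrated with a small sketch. It is only a toy under stated assumptions: the class `ToyCloudStorage`, the machine names, and the choice of two replicas are hypothetical, and real services use far more elaborate placement and recovery schemes.

```python
import random


class ToyCloudStorage:
    """Toy replicated store: one virtual name maps to copies on several machines."""

    def __init__(self, machine_names, replicas=2):
        self.machines = {name: {} for name in machine_names}  # machine -> {key: data}
        self.online = set(machine_names)
        self.replicas = replicas

    def put(self, key, data):
        # The cloud, not the user, decides which physical machines hold the copies.
        targets = random.sample(sorted(self.online), self.replicas)
        for name in targets:
            self.machines[name][key] = data
        return targets

    def get(self, key):
        # Read from any online machine that still holds a copy.
        for name in sorted(self.online):
            if key in self.machines[name]:
                return self.machines[name][key]
        raise KeyError(key)


storage = ToyCloudStorage(["m1", "m2", "m3"], replicas=2)
holders = storage.put("report.txt", "quarterly figures")
storage.online.discard(holders[0])       # simulate one machine going offline
recovered = storage.get("report.txt")    # still served from the other copy
print(recovered)
```

The caller only ever uses the name `report.txt`; which machines hold the data, and which of them answers a read, stays hidden behind `put` and `get`.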

Cloud service

Cloud computing is a general term for anything that involves delivering hosted services over
the Internet. These services are broadly divided into three categories:

• Infrastructure-as-a-Service (IaaS)
• Platform-as-a-Service (PaaS)
• Software-as-a-Service (SaaS)

The name cloud computing was inspired by the cloud symbol that's often used to represent
the Internet in flowcharts and diagrams.

A cloud service has three distinct characteristics that differentiate it from traditional hosting.
It is sold on demand, typically by the minute or the hour; it is elastic, meaning that a user
can have as much or as little of a service as they want at any given time; and the service is
fully managed by the provider (the consumer needs nothing but a personal computer and
Internet access). Significant innovations in virtualization and distributed computing, as well
as improved access to high-speed Internet and a weak economy, have accelerated interest in
cloud computing.

A cloud can be private or public. A public cloud sells services to anyone on the Internet.
(Currently, Amazon Web Services is the largest public cloud provider.) A private cloud is a
proprietary network or a data center that supplies hosted services to a limited number of
people. When a service provider uses public cloud resources to create their private cloud,
the result is called a virtual private cloud. Private or public, the goal of cloud computing is to
provide easy, scalable access to computing resources and IT services.

Infrastructure-as-a-Service providers such as Amazon Web Services offer virtual server
instances together with an application programming interface (API) that customers use to
start, stop, access, and configure their virtual servers and storage. In the enterprise, cloud
computing allows a company to pay for only as much capacity as is needed, and bring more
online as soon as required. Because this pay-for-what-you-use model resembles the way
electricity, fuel, and water are consumed, it is sometimes referred to as utility computing.
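
A short calculation shows why paying only for the capacity used can matter. The numbers below are illustrative assumptions only: the hourly rate, the server counts, and the 20-hour peak are hypothetical and do not reflect any provider's actual pricing.

```python
def usage_cost(hours_used, servers, rate_per_server_hour):
    """Pay only for the server-hours actually consumed."""
    return hours_used * servers * rate_per_server_hour


RATE = 0.10            # hypothetical price per server-hour
HOURS_IN_MONTH = 720   # 30 days x 24 hours

# Fixed capacity: 10 servers kept running all month to cover the peak.
fixed = usage_cost(HOURS_IN_MONTH, 10, RATE)

# Pay-per-use: 2 servers all month, plus 8 extra servers for a 20-hour peak.
on_demand = usage_cost(HOURS_IN_MONTH, 2, RATE) + usage_cost(20, 8, RATE)

print(f"fixed capacity: {fixed:.2f}")
print(f"pay-per-use:    {on_demand:.2f}")
```

Under these assumed figures the fixed installation is billed for 7,200 server-hours, while the elastic one is billed for 1,600.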

Platform-as-a-Service in cloud computing is defined as a set of software and product
development tools hosted on the provider's infrastructure. Developers create applications
on the provider's platform over the Internet. PaaS providers may use APIs, website portals,
or gateway software installed on the customer's computer. [Link] (an outgrowth of
[Link]) and Google Apps are examples of PaaS. Developers need to know that
currently there are no standards for interoperability or data portability in the cloud. Some
providers will not allow software created by their customers to be moved off the provider's
platform.

In the Software-as-a-Service cloud model, the vendor supplies the hardware infrastructure
and the software product, and interacts with the user through a front-end portal. SaaS is a
very broad market. Services can be anything from Web-based email to inventory control and
database processing. Because the service provider hosts both the application and the data,
the end-user is free to use the service from anywhere.

4. CLOUD INDUSTRIAL APPLICATIONS


The cloud computing industry has seen a rapid rise in the number of vendors, with each
vendor trying to get the first-mover advantage. Figure 15.3 shows the responsibilities of the
vendor and the user for the various cloud services, i.e., IaaS, PaaS, and SaaS.

Figure 15.3: Responsibilities of vendor and user for different types of services

Salesforce: One of the most popular cloud computing SaaS applications is Salesforce CRM.
It was one of the first software products with a multitenant platform that charged based on
usage, so that customers did not have to buy, deploy, and maintain the software themselves.
You access the software over the Internet.

Google Apps: Google Apps is a suite of cloud computing SaaS applications that includes
e-mail (Gmail), an organizer (Google Calendar), word processing (Google Docs), and
others. Figure 15.4 illustrates the various components of Google Apps. It has a free edition
with few applications and other editions with a lot more functionality. Google’s Web-based
messaging and collaboration apps require no server-side hardware or software and need
minimal administration, creating tremendous time and cost savings for businesses.

Figure 15.4: Components of Google Apps

Office 365: Office 365 is the familiar Microsoft Office, now available on the cloud as SaaS. It
is offered as a per-user, per-month subscription. You do not need to install the
software on your PC; you just need a web browser to access the service. Figure 15.5
illustrates the various components of Office 365.

Figure 15.5: Components of Office 365

Zoho: Zoho is one of the leading companies founded in India that offers cloud-based SaaS.
It has applications similar to the ones offered by Salesforce, Office 365, and Google
Apps. Figure 15.6 illustrates the various components that Zoho supports.

Figure 15.6: Components of Zoho

[Link]: [Link] is a PaaS offering from [Link]. It is a platform for creating and
deploying applications for social enterprises. Because there are no servers or software to
buy or manage, you can focus solely on building applications that include built-in social and
mobile functionality, business processes, reporting, and search. Your applications run on a
secure, proven service of [Link] that scales, tunes, and backs up data automatically.

Windows Azure Platform: Windows Azure Platform is a cloud platform (PaaS) that
enables you to quickly build, deploy, and manage Windows applications across a global
network of Microsoft-managed datacenters.

Google App Engine: Google App Engine is a PaaS cloud computing platform used for
developing and hosting web applications in Google-managed datacenters.

Dropbox: Dropbox is an IaaS offering that provides a Web-based file hosting service. It uses
cloud storage to enable users to store and share files and folders with others across the
Internet using file synchronization.

Sify mystorage: Sify mystorage is an IaaS offering that provides a cloud-based online storage
and backup solution.

Amazon Web Services: Amazon Web Services (AWS) is a collection of remote computing
services (also called web services) that together make up a cloud computing IaaS platform,
offered over the Internet by [Link]. The most central and well-known of these services
are Amazon EC2 for resizable compute capacity and Amazon S3 Cloud Storage.

Netmagic SimpliCloud: Netmagic SimpliCloud is an IaaS Cloud Computing Platform that
features instant online provisioning of virtual machines and virtual appliances. It also
features ‘Elastic’ plans where customers can pay for their cloud infrastructure by the hour,
thereby availing of true ‘Pay-As-You-Use’ and ‘On-Demand’ infrastructure.

Traditional companies like Oracle and SAP sold software as licenses with an annual
maintenance contract. Cloud companies, on the other hand, have a subscription model where
customers pay based on usage. This allows companies to try new applications at a very low
cost and, when they are comfortable, move all users to the service. Cloud computing
technology fundamentally challenges how IT operations are managed and,
therefore, how the business as a whole is run. Established companies such as SAP, Oracle,
Microsoft, and Google are now trying to capture a large share of the cloud market.

Self-Assessment Questions - 2

4. In cloud storage, data is stored on multiple ___________.
5. Name the three broad categories of cloud services.
6. Name the first software with a multitenant platform that charged based on usage
instead of buying the software.
7. __________ is the familiar Microsoft Office now available on the cloud as SaaS.
8. AWS stands for ______________.

5. TEMPORAL DATABASE

Temporal databases are used to record time-referenced data. In practice, the majority of
database applications are temporal in nature. For example:

• Record-keeping functions (inventory administration, medical records, and personnel)
• Financial functions (banking, accounting, and portfolio organization)
• Scientific functions (weather monitoring)
• Scheduling functions (project organization; hotel, airline, and train reservations)

All these functions rely on temporal databases.

Temporal databases are best suited to applications in which information is organized around
time. They therefore illustrate well the need for a consistent set of concepts that
application developers can use. The framing of a temporal database (objective, design,
coding, interface, and implementation) is carried out by application developers and
designers.

There are numerous applications where time is an important factor in storing information.
For example:

• Insurance, to keep a record of accidents and claims.
• Healthcare, to maintain patient histories.
• Reservation systems, to check reservations and the availability of seats or rooms for
trains, airlines, hotels, car rentals, and many more.
• Scientific databases, where the outcome of an experiment needs to be stored along with
the time at which it was carried out.

Even the familiar COMPANY and UNIVERSITY example databases can easily be expanded into
temporal applications. For example, in the COMPANY database, it may be desirable to keep
the PROJECT, JOB, and SALARY histories of all the employees.

The same applies to the UNIVERSITY database, which can store the grade history of each
STUDENT. The details about the YEAR, SEMESTER, COURSE, and each SECTION are also
included in the database.

In fact, it is reasonable to conclude that many database applications store some temporal
information. However, many users try to ignore temporal features because they add
complexity to the applications.

Different Forms of Temporal Databases

Time can be interpreted as valid time (when data occurred or is true in reality) or transaction
time (when data was entered into the database).

A historical database stores data with respect to valid time.

A rollback database stores data with respect to transaction time.

A bitemporal database stores data with respect to both valid time and transaction time; that
is, it stores the history of the data along both time dimensions.
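
The difference between the two time dimensions is easier to see on a concrete record. The sketch below is a minimal illustration in plain Python, not a temporal DBMS: the `SalaryFact` class, the field names `valid_from` and `recorded_on`, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SalaryFact:
    employee: str
    salary: int
    valid_from: date      # valid time: when the salary took effect in reality
    recorded_on: date     # transaction time: when the row was entered


facts = [
    SalaryFact("Asha", 50000, date(2023, 1, 1), date(2023, 1, 5)),
    # A raise effective 1 July is entered late, on 10 August.
    SalaryFact("Asha", 55000, date(2023, 7, 1), date(2023, 8, 10)),
]


def salary_as_of(facts, valid_on, known_on):
    """Salary in effect on `valid_on`, using only rows recorded by `known_on`."""
    visible = [f for f in facts
               if f.recorded_on <= known_on and f.valid_from <= valid_on]
    return max(visible, key=lambda f: f.valid_from).salary if visible else None


# What the database believed on 1 August about the salary on 15 July:
believed = salary_as_of(facts, date(2023, 7, 15), known_on=date(2023, 8, 1))
# What it knows at the end of the year about the same day:
actual = salary_as_of(facts, date(2023, 7, 15), known_on=date(2023, 12, 31))
print(believed, actual)
```

Filtering on `valid_from` alone gives the historical view, filtering on `recorded_on` alone gives the rollback view, and using both, as `salary_as_of` does, gives the bitemporal view.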

Application domains of Temporal Data

Examples of application domains dealing with temporal data are:

• Financial Applications – e.g. history of stock markets; share prices
• Reservation Systems – e.g. when was a flight booked
• Medical Systems – e.g. patient records
• Computer Applications – e.g. history of file backups
• Archive Management Systems – e.g. sporting events, publications, journals, etc.

It is always possible to identify application domains of temporal data, since any data can be
represented as temporal data. The question is: when is such a transition (considering all the
different states of data) required and important? The answer often depends on how vital
temporal data is to an organization and on the benefits that it will bring. Often, when
organizations store archives of data, a Temporal Database Management System
(TDBMS) will be useful for manipulating temporal data.

Solutions in developing a Temporal Database

The following approaches outline how a temporal database may be created:

1. Use the date type provided by a non-temporal (any commercial) DBMS.
2. Extend a non-temporal data model to a temporal data model by attaching time
attributes to each data item.
3. Develop a new temporal database system from scratch that provides a primitive data
type time and handles the different states/time instances of data being stored.

The first and second solutions do not involve any changes to existing database technology
and may be simple to implement, as we just build new methods for temporal support on top of
the existing database system that will be used.

The third solution involves developing a whole new database system with temporal support.
This will be difficult, as the underlying principles used by commercial DBMSs to optimize
operations must be reworked, and a lot of theoretical work needs to be carried out to show
that the new system is complete and that all new and modified operations perform as required.
The amount of time and manpower required for this approach is similar to that needed by
commercial vendors to develop the DBMSs that we are all familiar with today.

Thus, the third solution is usually out of reach for a typical project in terms of the time
and manpower available.

The second solution is the practical choice: a relational database system is used to model
and store temporal relations and hence produce a temporal database.
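
A minimal sketch of the second solution, using Python's built-in `sqlite3` module as the non-temporal relational system. The table name `salary_history`, the columns `valid_from` and `valid_to`, and the convention of marking the current row with the date `9999-12-31` are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE salary_history (
        employee   TEXT NOT NULL,
        salary     INTEGER NOT NULL,
        valid_from TEXT NOT NULL,   -- ISO dates compare correctly as text
        valid_to   TEXT NOT NULL    -- '9999-12-31' marks the current row
    )
""")
conn.executemany(
    "INSERT INTO salary_history VALUES (?, ?, ?, ?)",
    [
        ("Asha", 50000, "2022-01-01", "2023-06-30"),
        ("Asha", 55000, "2023-07-01", "9999-12-31"),
    ],
)


def salary_on(conn, employee, day):
    """Return the salary that was valid for `employee` on `day` (YYYY-MM-DD)."""
    row = conn.execute(
        "SELECT salary FROM salary_history "
        "WHERE employee = ? AND valid_from <= ? AND ? <= valid_to",
        (employee, day, day),
    ).fetchone()
    return row[0] if row else None


past = salary_on(conn, "Asha", "2022-12-01")
current = salary_on(conn, "Asha", "2024-01-15")
print(past, current)
```

Nothing in the database engine changes; the temporal behaviour comes entirely from the two extra columns and from the way queries are written on top of them.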

Self-Assessment Questions - 3

9. Temporal databases are used to record _____________ data.
10. Financial functions of temporal databases include:
a. banking
b. accounting
c. portfolio organisation
d. all of the above

Activity 1
Explain how database applications use temporal databases to organise their
information.

6. BIG DATA

Big data is a term that describes the large volume of data – both structured and unstructured
– that inundates a business on a day-to-day basis. But it’s not the amount of data that’s
important. It’s what organizations do with the data that matters. Big data can be analyzed for
insights that lead to better decisions and strategic business moves.

While the term “big data” is relatively new, the act of gathering and storing large amounts of
information for eventual analysis is age-old. The concept gained momentum in the early
2000s when industry analysts articulated the now-mainstream definition of big data as the
three Vs:

Volume – Organizations collect data from a variety of sources, including business
transactions, social media, and sensor or machine-to-machine data. In the
past, storing it would’ve been a problem, but new technologies such as Hadoop have eased
the burden.

Velocity – Data streams in at an unprecedented speed and must be dealt with in a timely
manner. RFID tags, sensors, and smart metering are driving the need to deal with torrents of
data in near-real-time.

Variety – Data comes in all types of formats – from structured, numeric data in traditional
databases to unstructured text documents, email, video, audio, stock ticker data, and
financial transactions.

In the real world, big data is used in several applications. Some of the big data applications
include the following:

• Monitor premature infants to alert when intervention is needed
• Predict machine failures in manufacturing
• Prevent traffic jams, save fuel, reduce pollution
• Telecommunication network monitoring
• Count the number of users logging in from a particular location on the earth

The various tools used in big data scenarios are:

• NoSQL Databases (high-performance, non-relational databases) - MongoDB, CouchDB,
Cassandra, Redis, BigTable, HBase, Hypertable, Voldemort, Riak, ZooKeeper
• MapReduce (a programming model introduced by Google for processing and
generating large data sets on clusters of computers) - Hadoop, Hive, Pig, Cascading,
Cascalog, mrjob, Caffeine, S4, MapR, Acunu, Flume, Kafka, Azkaban, Oozie, Greenplum
• Storage - S3, Hadoop Distributed File System
• Servers - EC2, Google App Engine, Elastic Beanstalk, Heroku
• Processing - R, Yahoo! Pipes, Mechanical Turk, Solr/Lucene,
ElasticSearch, Datameer, BigSheets, Tinkerpop
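
The MapReduce programming model mentioned in the list above can be illustrated with the classic word-count example. The sketch runs in a single Python process and only imitates the three phases; the function names `map_phase`, `shuffle`, and `reduce_phase` are chosen for this illustration, and no Hadoop API is involved.

```python
from collections import defaultdict


def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine the values for one key into a single result.
    return key, sum(values)


documents = ["big data big insights", "data streams in"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

On a cluster, each call to `map_phase` and `reduce_phase` could run on a different machine, because each one depends only on its own input record or its own key.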

Self-Assessment Questions - 4

11. Big data supports 3 Vs: Volume, Velocity and _________.
12. MapReduce is introduced by ___________.

7. NOSQL DATABASES

NoSQL is a term used to describe high-performance, non-relational databases. NoSQL
databases utilize a variety of data models, including document, graph, key-value, and
columnar. NoSQL databases are widely recognized for ease of development, scalable
performance, high availability, and resilience. NoSQL encompasses a wide range of
technologies and architectures, and seeks to solve the scalability and big data performance
issues that relational databases weren’t designed to address.

NoSQL is especially useful when an enterprise needs to access and analyze massive amounts
of unstructured data or data that are stored remotely on multiple virtual servers in the cloud.
Contrary to misconceptions caused by its name, NoSQL does not prohibit SQL. While it's true
that some NoSQL systems are entirely non-relational, others simply avoid selected relational
functionality such as fixed table schemas and join operations. For example, instead of using
tables, a NoSQL database might organize data into objects, key/value pairs, or tuples.
Arguably, the most popular NoSQL database is Apache Cassandra. Cassandra, which was
once Facebook’s proprietary database, was released as open source in 2008. Other NoSQL
implementations include SimpleDB, Google BigTable, Apache Hadoop, MemcacheDB,
Voldemort, HBase, and Hypertable. Companies that use NoSQL include Twitter, Netflix, and
LinkedIn.

7.1 Types Of NoSQL Databases

There are 4 basic types of NoSQL databases:

Key-Value Store – It has a big hash table of keys and values. A key-value store is a NoSQL
database optimized for read-heavy application workloads such as social networking,
gaming, media sharing, and Q&A portals, or compute-intensive workloads such as a
recommendation engine. In-memory caching improves application performance by storing
critical pieces of data in memory for low-latency access. The cached information may include
the results of I/O-intensive database queries or the results of computationally intensive
calculations. Some examples of such databases are Amazon S3 (Dynamo) and Riak.
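
The caching role of a key-value store can be sketched with an ordinary Python dictionary standing in for the hash table. The function `slow_recommendations`, the key format `recs:<id>`, and the artificial delay are hypothetical stand-ins for an expensive query or calculation.

```python
import time

cache = {}          # the key-value store: a hash table held in memory
calls = 0           # counts how often the slow path actually runs


def slow_recommendations(user_id):
    """Stand-in for an expensive query or computation (hypothetical)."""
    global calls
    calls += 1
    time.sleep(0.01)                       # simulate I/O or heavy computation
    return [f"item-{user_id}-{n}" for n in range(3)]


def get_recommendations(user_id):
    key = f"recs:{user_id}"                # values are addressed only by key
    if key not in cache:
        cache[key] = slow_recommendations(user_id)
    return cache[key]


first = get_recommendations(42)     # computed and stored
second = get_recommendations(42)    # served from memory
print(first == second, calls)
```

The second lookup never reaches the slow function, which is why this pattern suits read-heavy workloads.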

Document-based Store – It stores documents made up of tagged elements. A document
database is designed to store semi-structured data as documents, typically in JSON or XML
format. Unlike SQL databases, the schema for each non-relational (NoSQL) document can
vary, giving developers, database administrators, and IT professionals more flexibility in
organizing and storing application data and reducing the storage required for optional
values. An example of such a database is CouchDB.
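
A document collection with a varying schema can be imitated with a list of Python dictionaries. This is only a sketch of the data model: the `find` helper and the sample customer documents are hypothetical, and no CouchDB API is used.

```python
import json

# Each document is a self-describing record; fields may differ between documents.
customers = [
    {"_id": 1, "name": "Asha", "phone": "555-0100"},
    {"_id": 2, "name": "Ravi", "address": {"city": "Jaipur", "state": "Rajasthan"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match all the given criteria."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in criteria.items())]


# Optional fields are simply absent, so no storage is spent on empty values.
with_phone = [doc["name"] for doc in customers if "phone" in doc]
ravi = find(customers, name="Ravi")
print(with_phone)
print(json.dumps(ravi[0]))
```

The second document carries a nested `address` that the first one lacks, and neither needed a table definition to be changed beforehand.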

Column-based Store – Each storage block contains data from only one column. In a
column-oriented NoSQL database, data is stored in cells grouped in columns of data rather
than as rows of data. Columns are logically grouped into column families. Column families
can contain a virtually unlimited number of columns, which can be created at runtime or in
the schema definition. Reads and writes are done using columns rather than rows. Some
examples of such databases include HBase and Cassandra.

While most relational DBMSs store data in rows, the advantage of storing data in columns is
fast search/access and data aggregation. Relational databases store a single row as a
continuous disk entry, and different rows are stored in different places on the disk. Columnar
databases, by contrast, store all the cells corresponding to a column as a continuous disk
entry, thus making search/access faster. For example, querying the titles of a million
articles is a painstaking task in a row-oriented relational database, because it has to go
to each row's location to get the title. In a columnar database, on the other hand, the
titles of all the items can be obtained with just one disk access.
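
The two layouts can be compared side by side with in-memory Python structures. The sketch models only how the values are grouped, not actual disk access, and the sample articles are made up for the illustration.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "title": "Cloud basics", "body": "x" * 300},
    {"id": 2, "title": "Temporal data", "body": "x" * 300},
    {"id": 3, "title": "NoSQL types", "body": "x" * 300},
]

# Column-oriented layout: all the values of one column are stored together.
columns = {
    "id": [r["id"] for r in rows],
    "title": [r["title"] for r in rows],
    "body": [r["body"] for r in rows],
}

# Row store: every record must be visited to collect the titles.
titles_from_rows = [r["title"] for r in rows]

# Column store: the titles are already one contiguous sequence.
titles_from_columns = columns["title"]

print(titles_from_rows == titles_from_columns)
```

Both layouts hold the same data; the column layout simply places the values a title query needs next to each other, so the large `body` values never have to be touched.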

Graph-based – A network database that uses edges and nodes to represent and store data.
A graph-based NoSQL database does not use the rigid format of SQL or the table-and-column
representation. Instead, it uses a flexible graph representation that is well suited to
addressing scalability concerns. Graph structures consist of nodes, edges, and properties,
and they provide index-free adjacency. Data can be easily transformed from one model to the
other using a graph-based NoSQL database. Nodes are organised by their relationships with
one another, which are represented by edges between the nodes. Both the nodes and the
relationships have defined properties. Examples of such databases include Neo4j and Giraph.
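
A property graph of nodes and edges can be sketched with plain Python dictionaries and lists. The node names, relationship types, and the `neighbours` helper are hypothetical; a real graph database provides its own query language for such traversals.

```python
# Nodes and edges both carry properties.
nodes = {
    "asha": {"label": "Person", "city": "Jaipur"},
    "ravi": {"label": "Person", "city": "Delhi"},
    "dbms": {"label": "Course", "code": "DCA2102"},
}
edges = [
    ("asha", "FRIEND_OF", "ravi", {"since": 2021}),
    ("asha", "ENROLLED_IN", "dbms", {"semester": 3}),
    ("ravi", "ENROLLED_IN", "dbms", {"semester": 3}),
]

# Adjacency lists: each node points directly at its neighbours,
# so a traversal does not need a global index lookup.
adjacency = {name: [] for name in nodes}
for source, relation, target, properties in edges:
    adjacency[source].append((relation, target, properties))


def neighbours(node, relation):
    """Follow edges of one relationship type out of `node`."""
    return [target for rel, target, _ in adjacency[node] if rel == relation]


friends = neighbours("asha", "FRIEND_OF")
courses = neighbours("ravi", "ENROLLED_IN")
print(friends, courses)
```

Following a relationship here means reading a node's own adjacency list, which is the idea behind index-free adjacency.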

7.2 Advantages And Disadvantages Of NoSQL

In comparison to relational databases, NoSQL databases are more scalable and provide
superior performance. Some of the advantages of NoSQL are given below:

Replication

In contrast to SQL databases, most NoSQL databases allow automatic database replication to
increase availability in the event of outages or planned maintenance events. More
sophisticated NoSQL databases are fully self-healing, offering automated failover and
recovery, as well as the ability to distribute the database across multiple geographic regions
to withstand regional failures and enable data localization.

Integrated Caching

In SQL database systems, a caching tier can be provided by a number of separate products.
These systems can substantially improve read performance, but they do not improve write
performance, and they add operational complexity to system deployments. If your
application is dominated by reads, then a distributed cache could be considered, but if your
application has just a modest write volume, then a distributed cache may not improve the
overall experience of your end-users and will add complexity in managing cache invalidation.

Most NoSQL database technologies have excellent inbuilt caching capabilities, keeping
frequently-used data in system memory as much as possible and removing the need for a
separate caching layer. Additionally, some NoSQL databases also offer a fully managed,
integrated in-memory database management layer for workloads demanding the highest
throughput and lowest latency.

Dynamic Schemas

As you know, relational databases require schemas to be defined before you
can add data. For example, if you want to store data about your customers such as first and
last name, phone numbers, address, city, and state, a SQL database needs to know what you
are storing in advance.

NoSQL databases, by contrast, allow data to be inserted without a predefined schema. That
makes it easy to make significant application changes in real time, without worrying about
service interruptions, which means code integration is more reliable, development is faster,
and less database administrator time is required. Without a schema, developers have
typically had to add application-side code to enforce data quality controls, such as
mandating the presence of specific fields, data types, or permissible values. More
sophisticated NoSQL databases allow validation rules to be applied within the database,
allowing users to enforce governance across data while maintaining the agility benefits of a
dynamic schema.
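
A minimal sketch of a dynamic schema with an optional validation rule is given below. It uses a plain Python list as a stand-in for a document collection; the names `insert` and `require_first_name` and the rule itself are assumptions chosen for illustration, not the API of any real database.

```python
def require_first_name(doc):
    """Hypothetical validation rule: 'first_name' must be a non-empty string."""
    return isinstance(doc.get("first_name"), str) and doc["first_name"] != ""


def insert(collection, doc, validator=None):
    """Add a document to the collection, applying a rule if one is given."""
    if validator is not None and not validator(doc):
        raise ValueError(f"document rejected by validation rule: {doc}")
    collection.append(doc)


customers = []  # stands in for a schema-less document collection

# Documents with different sets of fields can live side by side.
insert(customers, {"first_name": "Asha", "city": "Jaipur"}, require_first_name)
insert(customers, {"first_name": "Ravi", "phones": ["0141-000000"], "state": "RJ"},
       require_first_name)

# A document that breaks the rule is rejected inside the "database".
try:
    insert(customers, {"city": "Delhi"}, require_first_name)
    rejected = False
except ValueError:
    rejected = True
```

The two accepted documents have different fields, yet the rule on `first_name` is still enforced in one place rather than in every application.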

Auto-sharding

Because relational databases are structured around cross-table joins and transactions, they
usually scale vertically: a single server has to host the entire database to ensure
acceptable performance. This gets expensive quickly, places limits on scale, and concentrates
risk in a relatively small number of failure points for the database infrastructure. The
solution for rapidly growing applications is to scale horizontally, by adding servers instead
of concentrating more capacity on a single server.

'Sharding' a database across many server instances is generally not automatic with SQL
databases. NoSQL databases, on the other hand, usually support auto-sharding, meaning
that they natively and automatically spread data across an arbitrary number of servers,
without requiring the application to even be aware of the composition of the server pool.
Data and query load are automatically balanced across servers, and when a server goes
down, it can be quickly and transparently replaced with no application disruption.
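
One simple way to spread data across servers is to hash each key and use the result to pick a shard. The sketch below assumes this hash-and-modulo scheme purely for illustration; the names `shard_for` and `ShardedStore` are hypothetical, and real products often use range partitioning or consistent hashing instead, so that adding a server does not move most of the keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store that hides the shard layout from the caller."""

    def __init__(self, num_shards):
        # Each dictionary stands in for one server.
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
for i in range(100):
    store.put(f"customer:{i}", i)

# The caller never mentions a shard; the store routes each request.
keys_per_shard = [len(shard) for shard in store.shards]
```

The application only calls `put` and `get`, which mirrors the point above that it need not be aware of the composition of the server pool.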

Despite the above advantages, NoSQL also has a few disadvantages, which are given below:

Data Consistency

Many NoSQL databases do not support ACID transactions, a tried-and-true technique for
ensuring data consistency across the entire database as data is moved around. Instead, they
follow the principle of "eventual consistency." This provides some performance benefits,
but it poses the risk that data on one database node may go out of sync with data on another
node.
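
The risk of stale reads under eventual consistency can be shown with two in-memory "nodes". This is a deliberately simplified sketch: the class `EventuallyConsistentPair` and its `sync` step are assumptions, and real systems replicate in the background over a network rather than on request.

```python
class EventuallyConsistentPair:
    """Two nodes where writes reach the second node only after a delay."""

    def __init__(self):
        self.node_a = {}
        self.node_b = {}
        self.pending = []  # writes accepted by node A, not yet applied to B

    def write(self, key, value):
        # The write is acknowledged as soon as node A has stored it.
        self.node_a[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Replication catches up: apply the queued writes to node B in order.
        for key, value in self.pending:
            self.node_b[key] = value
        self.pending.clear()


pair = EventuallyConsistentPair()
pair.write("balance", 500)

stale = pair.node_b.get("balance")   # node B has not seen the write yet
pair.sync()
fresh = pair.node_b.get("balance")   # after replication, both nodes agree
```

Between the write and the call to `sync`, a client reading from node B sees old data, which is the out-of-sync window described above.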

Security

In comparison to SQL databases, NoSQL databases are generally subject to a fairly long list
of security issues. Most of them will likely be solved in time (a few already have been solved
on certain NoSQL platforms) as NoSQL continues to mature. But currently, security is a
limiting factor for NoSQL deployment.

7.3 SQL Databases Vs. NoSQL Databases


Characteristics: Types
SQL Databases: It supports one type with minor variations.
NoSQL Databases: It supports many different types including key-value stores, document databases, column-based stores, and graph databases.

Characteristics: Development History
SQL Databases: It was developed in the 1970s to deal with the first wave of data storage applications.
NoSQL Databases: It was developed in the late 2000s to deal with the limitations of SQL databases, especially scalability, multi-structured data, geo-distribution, and agile development sprints.

Characteristics: Data Model
SQL Databases: Its relational model normalizes data into tabular structures known as tables, which consist of rows and columns. A schema strictly defines the tables, columns, indexes, relationships between tables, and other database elements.
NoSQL Databases: It does not enforce a schema. A partition key is generally used to retrieve values, column sets, or semi-structured JSON, XML, or other documents containing related item attributes.

Characteristics: Scaling
SQL Databases: It grows vertically, which means a single server must be made increasingly powerful in order to deal with increased demand. It is possible to spread SQL databases over many servers, but significant additional engineering is generally required, and core relational features such as JOINs, referential integrity, and transactions are typically lost.
NoSQL Databases: It grows horizontally, which means that to add capacity, a database administrator can simply add more commodity servers or cloud instances. The database automatically spreads data across servers as necessary.

Characteristics: Development Model
SQL Databases: A mix of closed source (e.g., Oracle Database) and open-source (e.g., Postgres, MySQL).
NoSQL Databases: Open-source.

Characteristics: Data Manipulation
SQL Databases: With the help of a specific language using Select, Insert, and Update statements, e.g. SELECT fields FROM table WHERE…
NoSQL Databases: Using object-oriented APIs.

Characteristics: Consistency
SQL Databases: It can be configured for strong consistency.
NoSQL Databases: It depends on the product. Some provide strong consistency (e.g., MongoDB, with tunable consistency for reads) whereas others offer eventual consistency (e.g., Cassandra).

Characteristics: Performance
SQL Databases: Its performance is generally dependent on the disk subsystem. Optimization of queries, indexes, and table structure is required to achieve peak performance.
NoSQL Databases: Its performance is generally a function of the underlying hardware cluster size, network latency, and the calling application.

Characteristics: Examples
SQL Databases: MySQL, Postgres, Microsoft SQL Server, Oracle Database
NoSQL Databases: MongoDB, Cassandra, HBase, Neo4j
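
The difference in data model and data manipulation can be seen by storing the same customers in both styles. The sketch below uses Python's built-in `sqlite3` module for the SQL side and a plain list of dictionaries as a stand-in for a document store; the table name, field names, and the `find` helper are assumptions made for illustration and do not represent any NoSQL product's API.

```python
import sqlite3

# SQL style: the schema is declared first, then data is queried with SELECT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Asha", "Jaipur"), ("Ravi", "Delhi")],
)
sql_names = [
    row[0]
    for row in conn.execute("SELECT name FROM customers WHERE city = ?", ("Jaipur",))
]
conn.close()

# Document style: no schema is declared, and each record may carry extra fields.
documents = [
    {"name": "Asha", "city": "Jaipur"},
    {"name": "Ravi", "city": "Delhi", "tags": ["premium"]},
]


def find(collection, **criteria):
    """Hypothetical helper: return documents whose fields match the criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


doc_names = [doc["name"] for doc in find(documents, city="Jaipur")]
```

Both queries return the same customer, but the SQL version depends on the table definition, while the document version works through a programming interface on records of any shape.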

Self-Assessment Questions - 5

13. NoSQL does not prohibit SQL. [True/False]
14. Neo4J is a ___________ based database.
15. Data consistency is high in NoSQL as compared to relational databases. [True/False]

8. SUMMARY
Let us recapitulate the important points discussed in this unit:

• Cloud computing has two precursors: client-server computing and peer-to-peer
distributed computing.
• The term ‘cloud’ is used as a metaphor for the Internet, based on the cloud drawing
used in the past to represent the telephone network and later to depict the Internet in
computer network diagrams as an abstraction of the underlying infrastructure it
represents.
• Cloud services are broadly divided into three categories: Infrastructure-as-a-Service
(IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS).
• Temporal databases are used to record time-referenced data. Basically the majority of
the database technologies are temporal.
• Big data describes the large volume of data – both structured and unstructured – that
inundates a business on a day-to-day basis. Big data can be analyzed for insights that
lead to better decisions and strategic business moves.
• NoSQL describes high-performance, non-relational databases. There are 4 basic types
of NoSQL databases: Key-Value Store, Document-based Store, Column-based Store, and
Graph-based database. NoSQL databases can store both structured and unstructured data.

9. TERMINAL QUESTIONS
1. Describe the functioning of cloud computing.
2. Explain cloud architecture.
3. What are cloud services?
4. Explain the role of cloud computing in industrial applications.
5. Explain temporal database.
6. What is big data? List the applications where big data can be used.
7. Explain the 3 Vs of big data.
8. Describe the types of NoSQL.
9. Explain the advantages and disadvantages of NoSQL databases.
10. Compare SQL databases and NoSQL databases.

10. ANSWERS

Self-Assessment Questions

1. Cloud
2. True
3. User interface
4. third-party servers
5. SaaS, PaaS, IaaS
6. Salesforce
7. Office 365
8. Amazon Web Services
9. Time-referenced
10. d. all of the above
11. Variety
12. Google
13. True
14. Graph
15. False

Terminal Questions
1. Cloud computing differs fundamentally from the traditional method because it is the
delivery of computing as a service rather than as a product. (Refer to Section 2.1 for
more details.)
2. The key to cloud computing is the “cloud”, a massive network of servers or even
individual PCs interconnected in a grid. (Refer to Section 2.2 for more details.)
3. The different categories of cloud services are Software as a Service (SaaS), Platform
as a Service (PaaS), and Infrastructure as a Service (IaaS). (Refer to Section 3 for more
details.)
4. The cloud computing industry has seen a rapid rise in the number of vendors, with each
vendor trying to get the first-mover advantage. (Refer to Section 4 for more details.)
5. The temporal database works on time-referenced data. (Refer to Section 5 for more
details.)
6. Big data is a term that describes the large volume of data – both structured and
unstructured – that inundates a business on a day-to-day basis. (Refer to Section 6 for
more details.)
7. Volume, Velocity, and Variety. (Refer to Section 6 for more details.)
8. There are 4 basic types of NoSQL databases: Key-Value Store, Document-based Store,
Column-based Store, and Graph-based database. (Refer to Section 7.1 for more details.)
9. Refer to Section 7.2.
10. Refer to Section 7.3.

