0% found this document useful (0 votes)
13 views17 pages

Understanding Database Management Systems

A Database Management System (DBMS) is a software that facilitates the efficient storage and retrieval of data, addressing issues like data redundancy and inconsistency found in traditional file processing systems. DBMS applications span various sectors including enterprise information, banking, and telecommunications, offering advantages such as data integrity, security, and ease of access. The document also discusses data models, levels of data abstraction, and the architecture of DBMS, highlighting the importance of data independence and user interaction.

Uploaded by

ajaysir2k6
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
13 views17 pages

Understanding Database Management Systems

A Database Management System (DBMS) is a software that facilitates the efficient storage and retrieval of data, addressing issues like data redundancy and inconsistency found in traditional file processing systems. DBMS applications span various sectors including enterprise information, banking, and telecommunications, offering advantages such as data integrity, security, and ease of access. The document also discusses data models, levels of data abstraction, and the architecture of DBMS, highlighting the importance of data independence and user interaction.

Uploaded by

ajaysir2k6
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Database Management System

Introduction:

A database-management system (DBMS) is a collection of interrelated data and a set of


programs to access those data. The collection of data, usually referred to as the database,
contains information relevant to an enterprise. The primary goal of a DBMS is to provide a
way to store and retrieve database information that is both convenient and efficient.

Database-System Applications:

Databases are widely used in the following applications:


1. Enterprise Information
 Sales
 Accounting
 Human resources
2. Manufacturing
3. Banking and Finance
4. Universities
5. Airlines
6. Telecommunication

Purpose of Database Systems :

Database systems are designed to manage large bodies of information. Management of


data involves both defining structures for storage of information and providing mechanisms
for the manipulation of information. Before arrival of DBMS, File Processing system is used
for managing the information. file-processing system is supported by a conventional operating
system. The system stores permanent records in various files, and it needs different application
programs to extract records from, and add records to, the appropriate files

A file-processing system has a number of major disadvantages:

 Data redundancy is high – Duplication of data at several places.


 Data inconsistency is high – Various copies of same data may no longer get matched.
 Difficulty in accessing data – The conventional file system does not allow to retrieve
the desired data in efficient and convenient manner.
 Data isolation – As the data scattered over several files and files may be in different
formats, it becomes difficult to retrieve the desired data from the file for writing the
new application.
 Integrity problems – Data integrity means data values entered in the database fall within
a specified range and are of correct format. With the use of several files enforcing such
constraint on the data becomes difficult.
 Atomicity problems – An atomicity means particular operation must be carried out
entirely or not at all with the database. It is difficult to ensure atomicity in conventional
file processing system.
 Concurrent-access anomalies - For efficient execution, multiple users update data
simultaneously, in such a case data need to be synchronized. As in the traditional file
systems, data is distributed over multiple files,, one cannot access these files
concurrently.
 Security problems – Every user is not allowed to access all the data of the database
system. In the file system, the authorized user is allowed to view all the data which
means File system provides the detail of the data representation and storage of data.
Hence enforcing security constraints is difficult.

These difficulties, among others, prompted the development of database systems. The
main purpose of DBMS is handling the issues of file processing system and managing large
bodies of information and provides a way to store and retrieve database information that is both
convenient and efficient. The following are the advantages of DBMS:

 Data redundancy is low


 Data inconsistency is low
 Easy accessing of data
 Integrity is ensured through constraints
 Atomicity is ensured
 Concurrent access and Crash recovery.
 Reduced Application Development Time.
 Security is ensured.

Comparison between DBMS and File System

 There are following differences between DBMS and File system:

DBMS File System


Data Redundancy is less Data Redundancy is high
Data inconsistency is less Data inconsistency is high
DBMS gives an abstract view of data that File system provides the detail of the data
hides the details representation and storage of data.
Easy accessing of data Difficulty in accessing data
DBMS takes care of Concurrent access of In the File system, concurrent access has
data using some form of locking. many problems like redirecting the file
while other deleting some information or
updating some information.
Integrity is ensured through constraints Ensuring Integrity is difficult

Characteristics of a Database

There are a number of characteristics that distinguish the database approach with the file-based
approach.

1. Self-Describing Nature of a Database System


A Database System contains not only the database itself but also the descriptions of
data structure and constraints (meta-data). This information is used by the DBMS software or
database users if needed.
2. Insulation between Program and Data
In the database approach, the data structure is stored in the system catalog not in the
programs. Therefore, one change is all that’s needed.

3. Support multiple views of data


A view is a subset of the database which is defined and dedicated for particular users
of the system. Multiple users in the system might have different views of the system. Each
view might contain only the data of interest to a user or a group of users.

4. Sharing of data and Multiuser system


A multiuser database system must allow multiple users access to the database at the
same time.

5. Control Data Redundancy


In the Database approach, ideally each data item is stored in only one place in the
database. In some cases redundancy still exists so as to improve system performance, but such
redundancy is controlled and kept to minimum.

6. Data Sharing
The integration of the whole data in an organization leads to the ability to produce more
information from a given amount of data.

7. Enforcing Integrity Constraints


DBMSs should provide capabilities to define and enforce certain constraints such as
data type, data uniqueness, etc.

8. Restricting Unauthorised Access


Not all users of the system have the same accessing privileges. DBMSs should provide
a security subsystem to create and control the user accounts.

9. Data Independence
System data (Meta Data) descriptions are separated from the application programs. Changes
to the data structure is handled by the DBMS and not embedded in the program.

10. Transaction Processing


The DBMS must include concurrency control subsystems to ensure that several users trying to
update the same data do so in a controlled manner. The results of any updates to the database
must maintain consistency and validity.

11. Providing backup and recovery facilities


If the computer system fails in the middle of a complex update process, the recovery subsystem
is responsible for making sure that the database is restored to the stage it was in before the
process started executing.

12. Managing information


Managing information means taking care of it so that it works for us, and is useful for the work
we are doing.

--------------------------------------------------------------------------------------------------------------

View of Data

A major purpose of a database system is to provide users with an abstract view of the data.
That is, the system hides certain details of how the data are stored and maintained. Hiding
irrelevant details from user and providing abstract view of data users, helps in easy and efficient
user – Database interaction.

Levels of data abstraction: DBMS provides abstract view of data in three levels.

i) Physical Level ii) Logical Level iii) View Level 1.3 View of Data 7

view level

view 1 view 2 … view n

logical
level

physical
level

Figure 1.1 The three levels of data abstraction.


Level of Abstraction
languages support the notion of a structured type. For example, we may describe
Physical alevel:
recordThe lowest 1level of abstraction describes how the data are actually stored. The
as follows:
physical level describes complex low-level data structures in detail.
type instructor = record
ID : char (5);
Logical level: The next-higher level of abstraction describes what data are stored in the
name : char (20);
database, and what relationships exist among those data. The (20);
dept name : char logical level thus describes the
entire database in terms of a small number of relatively simple(8,2);
salary : numeric structures.
end;
View level: The highest level of abstraction describes only part of the entire database. Many
users of the database
This system
code defines do not
a new needtype
record all this information;
called instead,
instructor with fourthey need
fields. to field
Each access only
has a name and a type associated with it. A university organization may have
a part of the database. The view level of abstraction exists to simplify their interaction with the
several
system. The suchmay
system record types,many
provide including
views for the same database.

• department, with fields dept name, building, and budget


• course, with fields course id, title, dept name, and credits
• student, with fields ID, name, dept name, and tot cred

At the physical level, an instructor, department, or student record can be de-


Example:

Consider the following record:

type employee = record


empid: number (10)
ename: char (20)
deptno: number (10)
salary: number (10)

At Physical level these records can be described as blocks of consecutive storage locations
(bytes, gigabytes, terabytes etc.) in memory. These details are often hidden from the
programmers.

At the logical level these records can be described as fields and attributes along with their data
types, their relationship among each other can be logically implemented. The programmers
generally work at this level.

At view level, user just interact with system using GUI and enter the details at the screen, they
are not aware how the data is stored and what data is stored; such details are hidden from them.
In addition to hiding details of the logical level of the database, the views also provide a security
mechanism to prevent users from accessing certain parts of the database.

Instances and Schemas:

Databases change over time as information is inserted and deleted. The collection of
information stored in the database at a particular moment is called an instance of the database.
The overall design of the database is called the database schema.

The concept of database schemas and instances can be understood by analogy to a program
written in a programming language. A database schema corresponds to the variable
declarations (along with associated type definitions) in a program. Each variable has a
particular value at a given instant. The values of the variables in a program at a point in time
correspond to an instance of a database schema.

Database systems have several schemas, partitioned according to the levels of abstraction. The
physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level. A database may also have several schemas
at the view level, sometimes called subschemas, that describe different views of the database.

Data Independence:

Data Independence is defined as a property of DBMS that helps to change the Database
schema at one level of a database system without requiring to change the schema at the next
higher level. Data independence helps you to keep data separated from all programs that make
use of it.
Types of Data Independence

In DBMS there are two types of data independence


1. Physical data independence
2. Logical data independence.

Physical level data independence : It refers to the characteristic of being able to modify the
physical schema without any alterations to the conceptual or logical schema

Logical level data independence: It refers characteristic of being able to modify the logical
schema without affecting the external schema or application program. The user view of the
data would not be affected by any changes to the conceptual view of the data.

Data Models

Data model is a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints. A data model provides a way to describe the design of
a database at the physical, logical, and view levels.

Types of Data Models:

 Entity Relationship Model


 Relational Model
 Hierarchical Model
 Network Model
 Object Oriented Model
 Object Relational Model

Entity Relationship Model:

ER model is based on the perception of the real world object. It express the logical
structure of the database graphically. The entity-relationship (E-R) data model uses a collection
of basic objects, called entities, and relationships among these objects. An entity is a “thing”
or “object” in the real world that is distinguishable from other objects. The entity-relationship
model is widely used in database design,

The basic components are:


 Entity set - Rectangle
 Attribute - Ellipse
 Relationship between entity sets – Diamond
 Links – Line ( which link attributes to entity sets and entity sets to relationships )

ER Diagram
Advantages:
 It is easy to develop relational model using ER model
 It specifies constraints

Drawbacks:
 It is just used for database design and not for implementation.

Relational Model:

The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
Tables are also known as relations.
20 Silberschatz−Korth−Sudarshan: 1. Introduction Text © The McGraw−Hill

Database System Each row of the table referred as tuples Companies, 2001
Concepts, Fourth Edition
 The collection of values of one particular column is referred as domain.

The relational model is an example of a record-based model. Record-based models are so


named because the database is structured in fixed-format records of several types. Each table
contains records of a particular type. Each record type defines a fixed number of fields, or
10attributes.
ChapterThe
1 columns of the table correspond to the attributes of the record type.
Introduction

customer-id customer-name customer-street customer-city


192-83-7465 Johnson 12 Alma St. Palo Alto
019-28-3746 Smith 4 North St. Rye
677-89-9011 Hayes 3 Main St. Harrison
182-73-6091 Turner 123 Putnam Ave. Stamford
321-12-3123 Jones 100 Main St. Harrison
336-66-9999 Lindsay 175 Park Ave. Pittsfield
019-28-3746 Smith 72 North St. Rye
(a) The customer table

Advantages: account-number balance


A-101 500
A-215 700
 Structural Independence A-102 400
A-305 350
A-201 900
A-217 750
A-222 700
(b) The account table
 Greater Flexibility

Drawbacks:

 Significant hardware and software overheads

Hierarchical Model:

In hierarchical model, data is organized into a tree like structure with each record is
having one parent record and many children. The heirarchy starts from the Root data, and
expands like a tree, adding child nodes to the parent nodes. The hierarchical model is a top-
down structure where each parent may have many children but each child can have only one
parent. This model supports one-to-one and one-to-many relationships.

Advantages:

 Widely used in Main Frame systems


 High speed access to large data sets

Drawbacks:

 Implementation complexity
 Lack of structural independence

Network Model:

This is an extension of the Hierarchical model. In this model data is organised more
like a graph, and are allowed to have more than one parent node. The network model builds on
the hierarchical model by allowing many-to-many relationships between linked records,
implying multiple parent records.
Advantages:

 Handles more relationship types


 Data access flexibility
 Promotes database integrity

Drawbacks:

 System complexity
 Lack of structural independence

Object Oriented Model:

This model defines a database as a collection of objects, or reusable software elements, with
associated features and methods. The object that contains the same type of values and methods
are grouped together into classes.
Advantages:

 Application require less code


 Object oriented features improves productivity

Drawbacks:

 Complex navigational data access.

Object Relational Model:

This hybrid database model combines the simplicity of the relational model with some of the
advanced functionality of the object-oriented database model. In essence, it allows designers
to incorporate objects into the familiar table structure.

----------------------------------------------------------------------------------------------------------------

Architecture of DBMS

System Architecture:

The architecture of a database system is greatly influenced by the underlying computer system
on which the database system runs. Database systems can be centralized, or client-server,
where one server machine executes work on behalf of multiple client machines.

A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. A primary goal of a database system is to retrieve information from and store
new information into the database. People who work with a database can be categorized as
database users or database administrators.

There are four different types of database-system users.

Naive users are unsophisticated users who interact with the system by invoking one of the
application programs that have been written previously.

Application programmers are computer professionals who write application programs.


Application programmers can choose from many tools to develop user interfaces.

Sophisticated users interact with the system without writing programs. Instead, they form
their requests either using a database query language or by using tools

Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework.
24 Chapter 1 Introduction

Database applications are usually partitioned into two or three parts, as in


Figure 1.6. In a two-tier architecture, the application resides at the client machine,
where it invokes database system functionality at the server machine through

naive users sophisticated


application database
(tellers, agents, users
programmers administrators
w eb users) (analysts)

use w rite use use

application application query administration


interfaces programs tools tools

compiler and
DM L queries DDL interpreter
linker

application
program DM L compiler
object code and organizer

query evaluation
engine
query processor

buffer manager fi le manager authorization transaction


and integrity manager
manager

storage manager

disk storage
indices data dictionary

data statistical data

Figure 1.5 System structure.


Database Administrator

A person who has such central control over the system is called a database administrator
(DBA). The functions of a DBA include:

1. Schema definition.
2. Storage structure and access-method definition.
3. Schema and physical-organization modification.
4. Granting of authorization for data access..
5. Routine maintenance. Examples of the database administrator’s routine maintenance
activities are:

 Periodically backing up the database, either onto tapes or onto remote servers, to
prevent loss of data in case of disasters such as flooding.
 Ensuring that enough free disk space is available
 Monitoring jobs running on the database and ensuring that performance is not
degraded

The functional components of a database system can be broadly divided into the storage
manager and the query processor components.
The Query Processor

The query processor subsystem compiles and executes DDL and DML statements. The query
processor components include:

 DDL interpreter, which interprets DDL statements and records the definitions in the
data dictionary.
 DML compiler, which translates DML statements in a query language in to an
evaluation plan consisting of low-level instructions that the query evaluation engine
understands.
 A query can usually be translated into any of a number of alternative evaluation plans
that all give the same result. The DML compiler also performs query optimization;
that is, it picks the lowest cost evaluation plan from among the alternatives.
 Query evaluation engine, which executes low-level instructions generated by the
DML compiler.

The storage manager is the component of a database system that provides the interface between
the low-level data stored in the database and the application programs and queries submitted
to the system. The storage manager components include:

 Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
 Transaction manager, which ensures that the database remains in a consistent
(correct) state despite system failures, and that concurrent transaction executions
proceed without conflicting.
 File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
 Buffer manager, which is responsible for fetching data from disk storage in to main
memory, and deciding what data to cache in main memory.

The storage manager implements several data structures as part of the physical system
implementation:

 Datafiles, which store the database itself.


 Data dictionary, which stores metadata about the structure of the database,
in particular the schema of the database.
 Indices, which can provide fast access to data items.

Application Architecture:

Database applications are usually partitioned into two or three parts, as in Figure. In a two-tier
architecture, the application resides at the client machine, where it invokes database system
functionality at the server machine through query language statements. Application program
interface standards like ODBC and JDBC are used for interaction between the client and the
server.
1.10 Data Mining and Information Retrieval 25

user user
client

application application client

netw ork netw ork

application server
database system
server
database system

(a) Tw o-tier architecture (b) Three-tier architecture

Figure 1.6 Two-tier and three-tier architectures.

query language statements. Application program interface standards like ODBC


and JDBC are used for interaction between the client and the server.
In contrast, in a three-tier architecture, the client machine acts as merely a front end and does
In contrast, in a three-tier architecture, the client machine acts as merely a
front end and does not contain any direct database calls. Instead, the client end
not contain any direct database calls. Instead, the client end communicates with an application
communicates with an application server, usually through a forms interface.
server, usually through a forms interface. The application server in turn communicates with a
The application server in turn communicates with a database system to access
data. The business logic of the application, which says what actions to carry out
database system to access data.
under what conditions, is embedded in the application server, instead of being
distributed across multiple clients. Three-tier applications are more appropriate
for large applications, and for applications that run on the World Wide Web.
---------------------------------------------------------------------------------------------------------------
1.10 Data Mining and Information Retrieval
Introduction to Relational Database:
The term data mining refers loosely to the process of semiautomatically analyzing
large databases to find useful patterns. Like knowledge discovery in artificial
intelligence (also called machine learning) or statistical analysis, data mining
A relational database is based on the relational model and uses a collection of tables to represent
attempts to discover rules and patterns from data. However, data mining differs
from machine learning and statistics in that it deals with large volumes of data,
both data and the relationships among those data.
stored primarily on disk. That is, data mining deals with “knowledge discovery
in databases.”
Some types of knowledge discovered from a database can be represented by
a set of rules. The following is an example of a rule, stated informally: “Young
Properties of Relations: women with annual incomes greater than $50,000 are the most likely people to buy
small sports cars.” Of course such rules are not universally true, but rather have

 Relation name is distinct from all other relations


 Each cell of relation contains exactly one atomic (single ) value.
 Each attribute has a distinct name
 Values of an attribute are all from the same domain
 Order of attributes have no significance.
 Each tuple is distinct.
 Order of tuples has no significance.

Tables − In relational data model, relations are saved in the format of Tables. This format
stores the relation among entities. A table has rows and columns, where rows represents
records and columns represent the attributes.

Tuple − A single row of a table, which contains a single record for that relation is called a
tuple.

Relation instance − A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.

Relation schema − A relation schema describes the relation name (table name), attributes,
and their names.

Relation key − Each row has one or more attributes, known as relation key, which can identify
the row in the relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute
domain.

Constraints

Every relation has some conditions that must hold for it to be a valid relation. These conditions
are called Relational Integrity Constraints. Integrity constraints ensure that changes made to
the database by authorized users do not result in a loss of data consistency. Thus, integrity
constraints guard against accidental damage to the database. The constraints can be set during
the creation of table as well as alteration of table.

Types of Integrity Constraints:

 Domain Integrity Constraints


 Entity Integrity Constraints
 Referential Integrity Constraints

Domain Integrity Constraints:


It is the entry level constraint. It ensures that the entered data are in proper format and
are in with in the range.

Types:
 NOTNULL
 CHECK

NOTNULL:

The NOT NULL constraint enforces a column to NOT accept NULL values. This
enforces a field to always contain a value, which means that you cannot insert a new
record, or update a record without adding a value to this field.

Syntax:

CREATE TABLE Student


(
ID int(6) NOT NULL,
NAME varchar(10) NOT NULL,
ADDRESS varchar(20)
)

CHECK:

CHECK constraint is used for specifying range of values for a particular column of a
table. This constraint is used to restrict the value of a column between a range. It
performs check on the values, before storing them into the database. It’s like condition
checking before saving data into a column
Syntax:
CREATE TABLE table_name
(
column1 datatype null/not null,
column2 datatype null/not null,
?
CONSTRAINT constraint_name CHECK (column_name condition)
);

Eg.
CREATE TABLE student (id numeric(4), name varchar2(50), CONSTRAINT check_id C
HECK(id BETWEEN 1 and 10))

Entity Integrity Constraints


Entity Integrity constraint uniquely identifies each row in table.
Types:
 Primary Key
 Unique Key

Primary Key:
The PRIMARY KEY constraint uniquely identifies each record in a table. Primary keys must
contain UNIQUE values, and cannot contain NULL values.
Syntax:
CREATE TABLE Persons (
ID int PRIMARY KEY,
Name varchar(255),
Age int
);

Unique Key:
The UNIQUE constraint ensures that all values in a column are different
CREATE TABLE table_name (
...
column_name data_type UNIQUE
...
);
Eg.
CREATE TABLE Persons (
ID int Unique,
Name varchar(255),
Age int
);

Key Differences Between Primary key and Unique key:

 Primary key will not accept NULL values whereas Unique key can accept one NULL
value.
 A table can have only primary key whereas there can be multiple unique key on a table.

Referential Integrity Constraint-


This constraint is specified between two tables (parent and child); it maintains the
correspondence between rows in these tables. Referential integrity constraints work on the
concept of Foreign Keys. A foreign key is a key attribute of a relation that can be referred in
other relation.
Referential integrity constraint states that if a relation refers to a key attribute of a different or
same relation, then that key element must exist.

The rules are:

1. Records cannot be deleted from a primary table if matching records exist in a related
table.
2. Records cannot be inserted to the child table if the corresponding record is not found
in parent table.
3. Primary key value in the primary table cannot be changed if that record has related
records.

Keys in DBMS
KEYS is an attribute or set of attributes which helps you to identify a row(tuple) in a
relation(table). They allow you to find the relation between two tables.
Types of keys:
 Super Key - A super key is a group of single or multiple keys which identifies rows
in a table. Example, the combination of customer-name and customer-id is a super key
for the entity set customer

 Candidate Key - is a set of attributes that uniquely identify tuples in a table. Candidate
Key is a super key with no repeated attributes. It is the sub set of super keys.

For example: In the EMPLOYEE table, id is best suited for the primary key. Rest of
the attributes like SSN, Passport_Number, and License_Number, etc. are considered as
a candidate key.
 Primary Key - is a column or group of columns in a table that uniquely identify every
row in that table.

 Alternate Key - is a column or group of columns in a table that uniquely identify every
row in that table. All the keys which are not primary key are called an Alternate Key.
 Foreign Key - is a column that creates a relationship between two tables. The purpose
of Foreign keys is to maintain data integrity and allow navigation between two different
instances of an entity.
 Compound Key - has two or more attributes that allow you to uniquely recognize a
specific record.
 Surrogate Key - An artificial key which aims to uniquely identify each record is called
a surrogate key. These kind of key are unique because they are created when you don't
have any natural primary key.

You might also like