UNIT-1
DBMS= Database Management System
Database-: Database is an organized collection of data about people, location or things so that it can be
easily accessed and managed.
Introduction:
In computerized information system data is the basic resource of the organization. So, proper
organization and management for data is required for organization to run smoothly. Database
management system deals the knowledge of how data stored and managed on a computerized
information system. In any organization, it requires accurate and reliable data for better decision
making, ensuring privacy of data and controlling data efficiently. The examples include deposit and/or
withdrawal from a bank, hotel, airline or railway reservation, purchase items from supermarkets in all
cases, a database is accessed.
Q-: Difference between data and information?
= Data is a collection of facts, while information put those facts into context.
Data -: A collection of raw facts and figures ,audio,video,image etc is called data.
Data: data is the known facts or figures that have implicit meaning.
It can also be defined as it is the representation of facts ,concepts or instruction in a formal manner,
which is suitable for understanding and processing.
Data can be represented in alphabets(A-Z, a-z),in digits(0-9) and using special characters(+,-.#,$, etc)
e.g: 25, “ajit” etc.
Information: Information is the processed data on which decisions and actions are based. Information
can be defined as the organized and classified data to provide meaningful values.
Eg: “The age of ajit is 25”
Information-: Processed form of data is called information. Systematic and meaningful form of data.
Q-: Which level of database is viewed by user-
= External level
DDL= Data Definitiom Language
Q-: Which person is esponsible for over all activities for database-
= Databse Administrator (DBA)
SQL-: It is a language (structure querry language ) which is used to operate data stored in database.
Schema-: Logical structure of the database is called schema.
Database:
A database is an organized collection of data, so that it can be easily accessed and managed.
You can organize data into tables, rows, columns, and index it to make it easier to find relevant
information.
It is stored only once for example: consider the roll no, name, address of a student stored in a student
file. It is collection of related data with an implicit meaning.
DBMS
DBMS is a collection of programs that allows you to create, manage and operate database.
Database Management System (DBMS): A database management system consists of collection of
related data and refers to a set of programs for defining, creation, maintenance and manipulation of a
database.
• Here are some examples of popular DBMS used these days:
• MySql
• Oracle
• SQL Server
• IBM DB2
• Amazon SimpleDB (cloud based) etc.
Q-: Differentiate between foreign key and primary key?
= Primary key is a unique identifier for each record in table. Foreign establishes a relationship between
tables by referencing the primary keyof another table.
Function of DBMS/characteristics:
1. Defining database schema: it must give facility for defining the database structure also specifies
access rights to authorized users.
2. Manipulation of the database: The dbms must have functions like insertion of record into
database updation of data, deletion of data, retrieval of data
3. Sharing of database:The DBMS must share data items for multiple users by maintaining
consistency of data.
4. Protection of database: It must protect the database against unauthorized users.
5. Database recovery: If for any reason the system fails DBMS must facilitate data base recovery.
6.
File System Approach
File: File is a collection of related data stored in secondary memory. A File Management system is a
DBMS that allows access to single files or tables at a time. In a File System, data is directly stored in a set
of files. It contains flat files that have no relation to other files (when only one table is stored in a single
file, then this file is known as a flat file).
Q-: What do you mean by relationship in DBMS ? Give an example.
= Any association between two entity type is called a relationship. Entities take part in a relationship. It
is represented by diamond shape.
Ex-: A teacher teaches students.
Here teacher and students are the entity and teaches is relationship between teacher entity and
students entity.
Q-: Before use of DBMS information was stored using-
=File Management System
Q-: DBMS helps achieve-
=Data Independence
Q-:How to write the databse schema-
=DDL
SQL is the databse language.
Columns= Field/Attributes
Rows= Record/Tuple
Q-: Difference between truncate and delete command?
= The delete command remove all record from table but the truncate command remove all record from
table but not the structure.
Normalization-: The process of organizing data in databse is called normalization.
In the above figure: Some fields are duplicated in more than one file, which leads to data redundancy.
So to overcome this problem, we need to create a centralized system, i.e. DBMS approach.
File Oriented Approach: The traditional file oriented approach to information processing each
application has a separate master file and its own set of personal file. In file oriented approach the
program dependent on the files and files dependent upon the programs.
Disadvantages of file oriented approach:
1) Data redundancy and inconsistency: The same information may be written in several files. This
redundancy leads to higher storage and access cost. It may lead data inconsistency that is the various
copies of the same data may present at multiple places for example a changed customer address may be
reflected in single file but not else where in the system.
2) Difficulty in accessing data : The conventional file processing system do not allow data to be retrieved
in a convenient and efficient manner according to user choice.
3) Data isolation : Because data are scattered in various files and files may be in different formats with
new application programs to retrieve the appropriate data is difficult.
4)Integrity Problems:Developers enforce data validation in the system by adding appropriate code in
the various application program. How ever when new constraints are added, it is difficult to change the
programs to enforce them.
5) Atomicity: It is difficult to ensure atomicity in a file processing system when transaction failure occurs
due to power failure, networking problems etc. (atomicity: either all operations of the transaction are
reflected properly in the database or non are)
6) Concurrent access: In the file processing system it is not possible to access the same file for
transaction at same the time.
7) Security problems: There is no security provided in file processing system to secure the data from
unauthorized user access.
Advantages of dbms:
Reduction of redundancies: Centralized control of data by the DBA avoids unnecessary duplication of
data and effectively reduces the total amount of data storage required avoiding duplication in the
elimination of the inconsistencies that tend to be present in redundant data files.
Sharing of data: A database allows the sharing of data under its control by any number of application
programs or users.
Data Integrity: Data integrity means that the data contained in the database is both accurate and
consistent. Therefore data values being entered for storage could be checked to ensure that they fall
with in a specified range and are of the correct format.
Data Security: The DBA who has the ultimate responsibility for the data in the dbms can ensure that
proper access procedures are followed including proper authentication schemas for access to the DBS
and additional check before permitting access to sensitive data
Data abstraction:
Data Abstraction is a process of hiding unwanted or irrelevant details from the end user. It provides a
different view and helps in achieving data independence which is used to enhance the security of data.
Three level/View Architecture of DBMS:
External level :It is also called view level. The reason this level is called “view” is because several users
can view their desired data from this level which is internally fetched from database with the help of
conceptual and internal level mapping.
Conceptual level
It is also called logical level. The whole design of the database such as relationship among data, schema
of data etc. are described in this level.
Database constraints and security are also implemented in this level of architecture. This level is
maintained by DBA (database administrator).
Internal level
This level is also known as physical level. This level describes how the data is actually stored in the
storage devices. This level is also responsible for allocating space to the data. This is the lowest level of
the architecture.
Data Independence: Data independence refers characteristic of being able to modify the schema at one
level of the database system without altering the schema at the next higher level.
Data independence is usually considered from two points of views;
physical data independence and logical data independence.
Physical data Independence: allows changes in the physical storage devices or organization of the files
to be made without requiring changes in the conceptual view or any of the external views.
Logical data independence: indicates that the conceptual schema can be changed without affecting the
existing external schema or any application program.
Disadvantage of DBMS:
1. DBMS software and hardware (networking installation) cost is high .
2. The processing overhead by the dbms for implementation of security, integrity and
sharing of the data.
3. centralized database control.
4. Setup of the database system requires more knowledge, money, skills, and time.
5. The complexity of the database may result in poor performance.
Hardware
The hardware is the actual computer system used for keeping and accessing the database. The
conventional DBMS hardware consists of secondary storage devices such as hard disks.
Databases run on the range of machines from micro computers to mainframes.
• Software
Software is the actual DBMS between the physical database and the users of the system. All the
requests from the user for accessing the database are handled by DBMS.
• Data
It is an important component of the database management system. The main task of DBMS is to
process the data. Databases are used to store the data, retrieved, and updated to and from the
databases.
• Users
There are a number of users who can access or retrieve the data on demand using the
application and the interfaces provided by the DBMS.
Database users
Naive users : Users who need not be aware of the presence of the database system or any other
system supporting their usage are considered naïve users . A user of an automatic teller
machine falls on this category.
Application programmers : Professional programmers who are responsible for developing
application programs or user interfaces utilized by the naïve and online user falls into this
category.
Sophisticated Users : Sophisticated users can be engineers, scientists, business analyst, who are
familiar with the database. They can develop their own database applications according to their
requirement.
Database Administrator (DBA) : Database Administrator (DBA) is a person/team who defines
the schema and also controls the 3 levels of database. The DBA will then create a new account
id and password for the user if he/she need to access the database.
Role of DBA
Schema definition: The DBA creates the original database schema by executing a set of data definition
statements in the DDL, Storage structure and access-method definition.
Granting of authorization for data access: By granting different types of authorization, the database
administrator can regulate which parts of the database various users can access. The authorization
information is kept in a special system structure that the database system consults whenever someone
attempts to access the data in the system.
Routine maintenance: Examples of the database administrator’s routine maintenance activities are:
[Link] backing up the database, either onto tapes or onto remote servers, to prevent loss of data
in case of disasters such as flooding.
DBMS Architecture
• Database management systems architecture will help us understand the components of
database system and the relation among them.
• The architecture of DBMS depends on the computer system on which it runs. For example, in a
client-server DBMS architecture, the database systems at server machine can run several
requests made by client machine.
Types of DBMS Architecture
There are three types of DBMS architecture:
1. Single tier architecture
2. Two tier architecture
3. Three tier architecture
1. Single tier architecture
• In this type of architecture, the database is readily available on the client machine, any request
made by client doesn’t require a network connection to perform the action on the database.
• For example, lets say you want to fetch the records of employee from the database and the
database is available on your computer system, so the request to fetch employee details will be
done by your computer and the records will be fetched from the database by your computer as
well. This type of system is generally referred as local database system.
2. Two tier architecture
In two-tier architecture, the Database system is present at the server machine and the DBMS application
is present at the client machine, these two machines are connected with each other through a reliable
network as shown in the above diagram. Ex.- MS- Access.
3. Three tier architecture
In three-tier architecture, another layer is present between the client machine and server machine. In
this architecture, the client application doesn’t communicate directly with the database systems present
at the server machine, rather the client application communicates with server application and the server
application internally communicates with the database system present at the server.
Ex.:web server, application server, database server
• SQL or Structured Query Language is used to operate on the data stored in a database. SQL
depends on relational algebra and tuple relational calculus.
• A cylindrical structure is used to display the image of a database.
Database Language
• Once data is stored or filled it requires manipulation like insertion, deletion, updating, and
modification of data. For these operations a set of languages are provided by the database
management system (DBMS). So, the database languages are used to read, update and store
data in the database.
• The different types of DBMS languages are as follows −
• Data Definition Language (DDL)
• Data Manipulation Language(DML)
• Data Control Language(DCL)
• Transactional Control Language(TCL)
DDL is the short name for Data Definition Language, which deals with database schemas and
descriptions, of how the data should reside in the database.
• CREATE: to create a database and its objects like (table, index, views, store procedure, function,
and triggers)
• ALTER: alters the structure of the existing database
• DROP: delete objects from the database
• TRUNCATE: remove all records from a table, including all spaces allocated for the records are
removed
• COMMENT: add comments to the data dictionary
• RENAME: rename an object
DML is the short name for Data Manipulation Language which deals with data manipulation and
includes most common SQL statements such SELECT, INSERT, UPDATE, DELETE, etc., and it is used to
store, modify, retrieve, delete and update data in a database.
• SELECT: retrieve data from a database
• INSERT: insert data into a table
• UPDATE: updates existing data within a table
• DELETE: Delete all records from a database table
DCL is short for Data Control Language which acts as an access specifier to the database.(basically to
grant and revoke permissions to users in the database
• GRANT: grant permissions to the user for running DML(SELECT, INSERT, DELETE,…) commands
on the table
• REVOKE: revoke permissions to the user for running DML(SELECT, INSERT, DELETE,…) command
on the specified table
TCL is short for Transactional Control Language which acts as an manager for all types of transactional
data and all transactions. Some of the command of TCL are
• Role Back: Used to cancel or Undo changes made in the database
• Commit: It is used to apply or save changes in the database
• Save Point: It is used to save the data on the temporary basis in the database
• SQL - CREATE Table
• Creating a basic table involves naming the table and defining its columns and each column's
data type.
• The SQL CREATE TABLE statement is used to create a new table.
• Syntax
• The basic syntax of the CREATE TABLE statement is as follows −
• CREATE TABLE table_name ( column1 datatype,
• column2 datatype,
• column3 datatype, ..... columnN datatype,
• PRIMARY KEY( one or more columns ) );
You can verify if your table has been created successfully by looking at the message displayed by the SQL
server, otherwise you can use the DESC command as follows −
SQL> DESC CUSTOMERS;
DML COMMAND
INSERT INTO
The SQL INSERT INTO Statement is used to add new rows of data to a table in the database.
Syntax-
• The syntaxes of the INSERT INTO statement which is shown below.
• INSERT INTO TABLE_NAME
• (column1, column2, column3,...columnN)
• VALUES (value1, value2, value3,...valueN);
Example-
The following statements would create six records in the CUSTOMERS table.
• INSERT INTO CUSTOMERS (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );
• INSERT INTO CUSTOMERS (2, 'Khilan', 25, 'Delhi', 1500.00 );
• INSERT INTO CUSTOMERS (3, 'kaushik', 23, 'Kota', 2000.00 );
UNIT-2 (DATA MODEL)
Introduction-: Data models are used to describe how the data is stored, accessed, and updated in a
DBMS. A set of symbols and text is used to represent them so that all the members of an organization
can understand how the data is organized. It provides a set of conceptual tools that are vastly used to
represent the description of data.
There are many types of data models that are used in the industry, some of them are –
Hierarchical Model
The hierarchical data model is one of the oldest data models, developed in the 1950s by IBM. In this
data model, the data is organized in a hierarchical tree-like structure. This data model can be easily
visualized because each record has one parent and many children (possibly 0) as shown in the image
given below.
Network Model
A network model is nothing but a generalization of the hierarchical data model as this data model
allows many to many relationships therefore in this model a record can also have more than one parent.
The network model can be represented as a graph and hence it replaces the hierarchical tree with a
graph in which object types are the nodes and relationships are the edges. For example -
Relational Model
This is the most widely accepted data model. In this model, the database is represented as a collection
of relations in the form of rows and columns of a two-dimensional table. Each row is known as
a tuple (a tuple contains all the data for an individual record) while each column represents an attribute.
For example -
Stu. Id Name Branch
101 Naman CSE
102 Saloni ECE
103 Rishabh IT
104 Pulkit ME
Advantages of Data Models
• Data models ensure that the data is represented accurately.
• The relationship between the data is well defined.
• Data redundancy can be minimized and missing data can be identified easily.
• Last but not the least, security of the data is not compromised.
ER (Entity-Relationship) Diagram in DBMS
• ER model stands for an Entity-Relationship model. It is a high-level data model. This model is
used to define the data elements and relationship for a specified system.
• It develops a conceptual design for the database. It also develops a very simple and easy to
design view of data.
• For example, Suppose we design a school database. In this database, the student will be an
entity with attributes like address, name, id, age, etc. The address can be another entity with
attributes like city, street name, pin code, etc and there will be a relationship between them.
Component of ER Diagram
1. Entity:
• An entity may be any object, class, person or place. In the ER diagram, an entity can be
represented as rectangles.
• Consider an organization as an example- manager, product, employee, department etc. can be
taken as an entity.
Weak Entity: An entity that depends on another entity called a weak entity. The weak entity doesn't
contain any key attribute of its own. The weak entity is represented by a double rectangle.
Attribute
The attribute is used to describe the property of an entity. Ellipse is used to represent an attribute.
• For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
• The key attribute is used to represent the main characteristics of an entity. It represents a
primary key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute
• An attribute that composed of many other attributes is known as a composite attribute. The
composite attribute is represented by an ellipse, and those ellipses are connected with an
ellipse.
c. Multivalued Attribute An attribute can have more than one value. These attributes are known as a
multivalued attribute. The double oval is used to represent multivalued attribute.
For example, a student can have more than one phone number.
d. Derived Attribute
• An attribute that can be derived from other attribute is known as a derived attribute. It can be
represented by a dashed ellipse.
• For example, A person's age changes over time and can be derived from another attribute like
Date of birth.
3. Relationship
• A relationship is used to describe the relation between entities. Diamond or rhombus is used to
represent the relationship.
Types of relationship are as follows:
a. One-to-One Relationship
When only one instance of an entity is associated with the relationship, then it is known as one to one
relationship.
For example, A person has one passport and a passport can opted by one person.
b. One-to-many relationship
• When only one instance of the entity on the left, and more than one instance of an entity on the
right associates with the relationship then this is known as a one-to-many relationship.
• For example, Scientist can invent many inventions, but the invention is done by the only specific
scientist.
c. Many-to-one relationship
When more than one instance of the entity on the left, and only one instance of an entity on the right
associates with the relationship then it is known as a many-to-one relationship.
For example, Student enrolls for only one course, but a course can have many students.
d. Many-to-many relationship
• When more than one instance of the entity on the left, and more than one instance of an entity
on the right associates with the relationship then it is known as a many-to-many relationship.
• For example, Employee can assign by many projects and project can have many employees.
Keys
Keys play an important role in the relational database.
• It is used to uniquely identify any record or row of data from the table. It is also used to establish
and identify relationships between tables.
• For example, ID is used as a key in the Student table because it is unique for each student. In the
PERSON table, passport_number, license_number, SSN are keys since they are unique for each
person.
Types of keys:
1. Primary key
It is the first key used to identify one and only one instance of an entity uniquely. An entity can contain
multiple keys, as we saw in the PERSON table. The key which is most suitable from those lists becomes a
primary key.
In the EMPLOYEE table, ID can be the primary key since it is unique for each employee. In the EMPLOYEE
table, we can even select License_Number and Passport_Number as primary keys since they are also
unique.
2. Candidate key
A candidate key is an attribute or set of attributes that can uniquely identify a tuple.
Except for the primary key, the remaining attributes are considered a candidate key. The candidate keys
are as strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for the primary key. The rest of the attributes, like
SSN, Passport_Number, License_Number, etc., are considered a candidate key.
3. Super Key
• Super key is an attribute set that can uniquely identify a tuple. A super key is a superset of a
candidate key.
For example: In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the name of two
employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this combination can also
be a key.
4. Foreign key
• Foreign keys are the column of the table used to point to the primary key of another table.
• Every employee works in a specific department in a company, and employee and department
are two different entities. So we can't store the department's information in the employee
table. That's why we link these two tables through the primary key of one table.
• We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.
• In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.
5. Alternate key
There may be one or more attributes or a combination of attributes that uniquely identify each tuple in
a relation. These attributes or combinations of the attributes are called the candidate keys. One key is
chosen as the primary key from these candidate keys, and the remaining candidate key, if it exists, is
termed the alternate key. In other words, the total number of the alternate keys is the total number of
candidate keys minus the primary key. The alternate key may or may not exist. If there is only one
candidate key in a relation, it does not have an alternate key.
For example, employee relation has two attributes, Employee_Id and PAN_No, that act as candidate
keys. In this relation, Employee_Id is chosen as the primary key, so the other candidate key, PAN_No,
acts as the Alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite key. This key is
also known as Concatenated Key.
Properties of Relations
• Name of the relation is distinct from all other relations.
• Each relation cell contains exactly one atomic (single) value
• Each attribute contains a distinct name
• Attribute domain has no significance
• tuple has no duplicate value
• Order of tuple can have a different sequence
Relational database concepts to understand in DBMS include:
• Tables: A table is a data collection organized into rows and columns. In the relational
model for database management, tables are used to store information about entities.
• Relation Schema: A relation schema is a blueprint for a table. It defines the attributes
that are contained in a table as well as the relationships between those attributes.
• Degree: The degree is the number of attributes that it contains.
• Cardinality: Cardinality defines the relationship between two attributes in a relation
schema. There are three possible relationships: one-to-one, one-to-many, and many-to-
many.
• Relation key: A relation key is an attribute or combination of attributes that uniquely
identifies a tuple in a table.
• Domain: The domain of an attribute is the set of values that can be stored in that
attribute. For example, if an attribute represents ages, its domain would be any age
from 0 onward.
Integrity Constraints
• Integrity constraints are a set of rules. It is used to maintain the quality of information.
• Integrity constraints ensure that the data insertion, updating, and other processes have
to be performed in such a way that data integrity is not affected.
• Thus, integrity constraint is used to guard against accidental damage to the database.
Data integrity is the overall accuracy, completeness, and consistency of data.
Types of Integrity Constraint
1. Domain constraints
• Domain constraints can be defined as the definition of a valid set of values for an
attribute.
• The data type of domain includes string, character, integer, time, date, currency, etc.
The value of the attribute must be available in the corresponding domain
2. Entity integrity constraints
• The entity integrity constraint states that primary key value can't be null.
• This is because the primary key value is used to identify individual rows in relation and if
the primary key has a null value, then we can't identify those rows.
• A table can contain a null value other than the primary key field.
3. Referential Integrity Constraints
• A referential integrity constraint is specified between two tables.
• In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary
Key of Table 2, then every value of the Foreign Key in Table 1 must be null or be
available in Table 2.
4. Key constraints
• Keys are the entity set that is used to identify an entity within its entity set uniquely.
• An entity set can have multiple keys, but out of which one key will be the primary key. A
primary key can contain a unique and null value in the relational table.
Codd's Rule for Relational DBMS
E.F Codd was a Computer Scientist who invented the Relational model for Database management. Based
on relational model, the Relational database was created. Codd proposed 12 rules popularly known as
Codd's 12 rules to test DBMS's concept against his relational model. Codd's rule actually define what
quality a DBMS requires in order to become a Relational Database Management System (RDBMS).
Till now, there is hardly any commercial product that follows all the 12 Codd's rules. Even
Oracle follows only eight and half(8.5) out of 12. The Codd's 12 rules are as follows
Rule 1: Information Rule
A database contains various information, and this information must be stored in each cell of a
table in the form of rows and columns.
Rule 2: Guaranteed Access Rule
Every single or precise data (atomic value) may be accessed logically from a relational database
using the combination of primary key value, table name, and column name.
Rule 3: Systematic Treatment of Null Values
This rule defines the systematic treatment of Null values in database records. The null value has
various meanings in the database, like missing the data, no value in a cell, inappropriate
information, unknown data and the primary key should not be null.
Rule 4: Active Online Catalog
The structure description of the entire database must be stored in an online catalog, known
as data dictionary, which can be accessed by authorized users. Users can use the same query
language to access the catalog which they use to access the database itself.
Rule 5: Comprehensive Data Sub-Language Rule
A database can only be accessed using a language having linear syntax that supports data
definition, data manipulation, and transaction management operations. This language can be
used directly or by means of some application. If the database allows access to data without
any help of this language, then it is considered as a violation.
Rule 6: View Updating Rule
All the views of a database, which can theoretically be updated, must also be updatable by the
system.
Rule 7: Relational Level Operation
There must be Insert, Delete, Update operations at each level of relations. It also supports
union, intersection and minus operation in the database system.
Rule 8: Physical Data Independence Rule
All stored data in a database or an application must be physically independent to access the
database. Each data should not depend on other data or an application. If data is updated or
the physical structure of the database is changed, it will not show any effect on external
applications that are accessing the data from the database.
Rule 9: Logical Data Independence Rule
It is similar to physical data independence. It means, if any changes occurred to the logical level
(table structures), it should not affect the user's view (application). For example, suppose a
table either split into two tables, or two table joins to create a single table, these changes
should not be impacted on the user view application.
Rule 10: Integrity Independence
A database must be independent of the application that uses it. All its integrity constraints can
be independently modified without the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.
Rule 11: Distribution Independence Rule
The distribution independence rule represents a database that must work properly, even if it is
stored in different locations and used by different end-users.
Rule 12: Non-Subversion Rule
If a system has an interface that provides access to low-level or different language to access the
database system other than SQL, it should not bypass security and integrity constraints.