Data Base Management System
With Oracle
Unit 2
Relational model: Introduction to relational model, codd’s rules, concept of
domain, attributes, tuple, relation, constraints(domain, key constraints, integrity
constraints) and their importance, concept of keys(super key, candidate key,
primary key, surrogate key, foreign key), relational algebra & relational calculs
Normalization: purpose of normalization or schema refinement, concept of
functional dependency, normal forms based on functional dependency(1NF, 2NF
and 3NF), boyce-codd normal form(BCNF)
Introduction to relational model:
Relational Model is a key concept in Database Management Systems (DBMS) that
organizes data in a structured and efficient way.
It represents data and their relationships using tables.
Each table has multiple columns, each with a unique name. These tables are also called
relations.
The relational model transforms conceptual designs from ER diagrams into
implementable structures. These structures are used in relational database systems like
Oracle SQL and MySQL.
Key Terms in the Relational Model:
Relations (Tables): It is the basic structure in which data is stored. Each relation is made
up of rows and columns.
Example: The table above named Student, is a relation. It stores data about students
using rows and columns.
Relational Schema: Schema represents the structure of a
relation.
Example: Relational Schema of STUDENT relation can be represented as
STUDENT(StudentID, Name, Age, Course).
Relational Instance: The set of values present in a relationship at a particular instance of
time is known as a relational instance as shown in Table .
Attribute: Each relation is defined in terms of some properties, each of which is known
as an attribute. or Each column shows an attribute of the data.
Example: StudentID, Name, Age, and Course are the attributes in this table.
The domain of an attribute: The possible values an attribute can take in a relation is
called its domain.
Example: The domain of the Age column is valid ages like 21, 22, 23 etc.
The domain of the Course column includes valid courses like "Computer Science,"
"Mathematics," and "Physics."
Tuple: Each row of a relation is known as tuple.
Example: STUDENT relation has 4 tuples.
Cardinality: Cardinality refers to the number of distinct values in a column compared to
the total number of rows in a table.
Example: The Age column has 3 distinct values: 21, 22 and 23.
Degree (Arity): The degree of relation refers to total number of attribute a relation has. It
is also known as Arity.
Example: The degree of this table is 4 because it has 4 columns: StudentID, Name,
Age and Course.
Primary Key: The primary key is an attribute or a set of attributes that help to uniquely
identify the tuples(records) in the relational table.
NULL values: Values of some attribute for some tuples may be unknown, missing, or
undefined which are represented by NULL. Two NULL values in a relationship are
considered different from each other.
STUDENT TABLE
Imagine a Student Table in a database:
StudentID Name Age Course
1 Vaibhav Garg 21 Computer Science
2 Komal Gupta 22 Mathematics
3 Varnika Gangwar 21 Physics
StudentID Name Age Course
4 Shubhi Chandra 23 Computer Science
Relational Model can be represented as shown below:
STUDENT (StudNo, Sname, Special)
ENROLLMENT (StudNo, Subcode, marks)
SUBJECT (Subcode, Subname, Maxmarks, Faccode)
FACULTY (Faccode, Fname, Dept)
Characteristics of the Relational Model:
Data Representation: Data is organized in tables (relations), with rows (tuples) representing
records and columns (attributes) representing data fields.
Atomic Values: Each attribute in a table contains atomic values, meaning no multi-valued or
nested data is allowed in a single cell.
Unique Keys: Every table has a primary key to uniquely identify each record, ensuring no
duplicate rows.
Attribute Domain: Each attribute has a defined domain, specifying the valid data types and
constraints for the values it can hold.
Data Independence: The model ensures logical and physical data independence, allowing
changes in the database schema without affecting the application layer.
Relational Operations: Supports operations like selection, projection, join, union and
intersection, enabling powerful data retrieval manipulation.
Data Consistency: Ensures data consistency through constraints, reducing redundancy and
anomalies.
Set-Based Representation: Tables in the relational model are treated as sets and operations
follow mathematical set theory principles.
CODD Rules:
Codd's rules are proposed by a computer scientist named Dr. Edgar F. Codd and he also invent
the relational model for database management. These rules are made to ensure data integrity,
consistency, and usability. This set of rules basically signifies the characteristics and
requirements of a relational database management system (RDBMS). In this article, we will
learn about various Codd's rules.
Codd's Rules in DBMS
Rule 1: The Information Rule
All information, whether it is user information or metadata, that is stored in a database
must be entered as a value in a cell of a table. It is said that everything within the
database is organized in a table layout.
Rule 2: The Guaranteed Access Rule
Each data element is guaranteed to be accessible logically with a combination of the table
name, primary key (row value), and attribute name (column value).
Rule 3: Systematic Treatment of NULL Values
Every Null value in a database must be given a systematic and uniform treatment.
Rule 4: Active Online Catalog Rule
The database catalog, which contains metadata about the database, must be stored and
accessed using the same relational database management system.
Rule 5: The Comprehensive Data Sublanguage Rule
A crucial component of any efficient database system is its ability to offer an easily
understandable data manipulation language (DML) that facilitates defining, querying, and
modifying information within the database.
Rule 6: The View Updating Rule
All views that are theoretically updatable must also be updatable by the system.
Rule 7: High-level Insert, Update, and Delete
A successful database system must possess the feature of facilitating high-level
insertions, updates, and deletions that can grant users the ability to conduct these
operations with ease through a single query.
Rule 8: Physical Data Independence
Application programs and activities should remain unaffected when changes are made to
the physical storage structures or methods.
Rule 9: Logical Data Independence
Application programs and activities should remain unaffected when changes are made to
the logical structure of the data, such as adding or modifying tables.
Rule 10: Integrity Independence
Integrity constraints should be specified separately from application programs and stored
in the catalog. They should be automatically enforced by the database system.
Rule 11: Distribution Independence
The distribution of data across multiple locations should be invisible to users, and the
database system should handle the distribution transparently.
Rule 12: Non-Subversion Rule
If the interface of the system is providing access to low-level records, then the interface
must not be able to damage the system and bypass security and integrity constraints.
Concept of domain:
A domain in a DBMS is the set of all allowable, atomic values (data type, size, range, and
format) that a specific attribute (column) can hold. It ensures data integrity by restricting inputs
to valid, predefined values, such as integers for age or specific strings for gender.
Key Aspects of Domains in DBMS
Definition: Defines the pool of values from which an attribute draws its value.
Components: Includes data type (e.g., INT , VARCHAR , DATE ), size (e.g., length of
string), and constraints (e.g., CHECK , NOT NULL ).
Purpose: Enforces data integrity and accuracy by preventing invalid data (e.g., string in a
numeric column) from being stored.
Example: A "Month" attribute domain might restrict inputs to only: January , February ,
..., December .
Implementation: Created using CREATE DOMAIN in SQL.
Domain Constraints
These are rules that restrict the values in a column to maintain data quality.
NOT NULL: Prevents a column from being empty.
CHECK: Restricts the range or format of values.
Example 1:
Creating a table “student” with the “ROLL” field having a value greater than 0.
Domain:
Table:
The above example will only accept the roll no. which is greater than 0.
Attributes and tuple relation:
Domain :
Data is modeled by using atomic values as the basis for the domain. In the relational model,
atomic values refer to the number of values in a domain that are indivisible. First Name is a
set of character strings that represent the names of people in the domain.
In a database, a domain is a column that contains a data type. Data types can be built-in
(such as integers or strings) or custom types that define constraints on the data themselves.
A SQL Domain is a set of valid values that can be named by the user. Name of the Domain’s
set of values that must belong to (for character string types). This is the name of the
domain’s default Collation.
Example :
In a table, a domain is a set of values that can be used to attribute an attribute. The domain of a
month can accept January, February, etc. A domain of integers can accept whole numbers that are
negative, positive, and zero in December.
Tuple :
Tuples are one of the most used items in Database Management Systems (or DBMS). A Tuple in
DBMS is just a row having inter-related data about a particular entity(it can be any object).
This data is spread across some columns having various attributes such as name, age,
gender, marks, etc. It should be noted that Tuples are mostly seen in Relational Databases
Management Systems(RDBMS) as RDBMS works on the relational model (Tabular format).
What Is Tuple In DBMS?
In Database Management System (DBMS), most of the time we need to store the data in tabular
format . This kind of data storage model is also called a Relational model and the system which
leverages the relational model is called Relational Database Management System (RDBMS).
These relations (or tables) consist of rows and columns. But in DBMS, we call these rows
“Tuples” and a row “Tuple”.
Let us see Tuple in DBMS in detail. Let us understand this with the help of a real-life example.
Example Of Single Record Or Tuple
Consider the table given below. We have data of some students like their id, name, age, etc. here,
each row has almost all the information of the respective student. Like the first row has all the
information about a student named “Sufiyan”, similarly, all other rows contain information about
other students. Hence, a single row is also termed a “record” as it contains all the information of a
student. This row or record is termed as Tuple in DBMS. Hence Tuple in DBMS is just a row
representing some inter-related data of a particular entity such as student, employee, user, etc.
Table for reference:
Press enter or click to view image in full size
A Tuple from the above-given table
In the above-given image, you can see that a Tuple is just a row having attributes of a particular
entity like name, age, marks, etc.
Attributes :
Any real-world object is considered to be an entity that has self-existence and these entities
in DBMS have their own characteristics and properties known as attributes. Attributes give
us additional information about entities and help us to study their relationship within the
specified system.
Attributes in an ER (Entity Relationship) model are always represented in an elliptical shape.
There are different types of attributes in DBMS: Simple, Composite, Single Valued, Multi-
Valued, Stored, Derived, Key, and Complex attributes.
An entity may contain any number of attributes while one of the attributes is considered to
be a primary key attribute.
An attribute can take its values from a set of possible values for each entity instance in an
ER model in DBMS.
Press enter or click to view image in full size
We always represent attributes in DBMS in an elliptical shape. We can refer to the above image
where we have an ER model diagram and the student represented in rectangle shape is our entity
object. Student entity has different attributes: Roll_No, Name, DOB, Phone_No, Age, Address,
Country, State, City, and Street.
constraints(domain, key constraints, integrity
constraints):
In modeling the design of the relational database we can put some restrictions like what values
are allowed to be inserted in the relation, and what kind of modifications and deletions are
allowed in the relation. These are the restrictions we impose on the relational database.
In models like Entity-Relationship models, we did not have such features. Database Constraints
can be categorized into 3 main categories:
1. Constraints that are applied in the data model are called Implicit Constraints.
2. Constraints that are directly applied in the schemas of the data model, by specifying them in
the DDL(Data Definition Language). These are called Schema-Based Constraints or
Explicit Constraints.
3. Constraints that cannot be directly applied in the schemas of the data model. We call these
Application-based or Semantic Constraints.
So here we are going to deal with Implicit constraints.
Relational Constraints
These are the restrictions or sets of rules imposed on the database contents. It validates the
quality of the database. It validates the various operations like data insertion, updation, and other
processes that have to be performed without affecting the integrity of the data. It protects us
against threats/damages to the database. Mainly Constraints on the relational database are of 4
types
Domain constraints
Key constraints or Uniqueness Constraints
Entity Integrity constraints
Referential integrity constraints
----
Types of Relational Constraints
Let's discuss each of the above constraints in detail.
1. Domain Constraints
Every domain must contain atomic values(smallest indivisible units) which means
composite and multi-valued attributes are not allowed.
We perform a datatype check here, which means when we assign a data type to a
column we limit the values that it can contain. Eg. If we assign the datatype of
attribute age as int, we can't give it values other than int datatype.
Example:
EID Name Phone
123456789
01 Bikash Dutta
234456678
Explanation: In the above relation, Name is a composite attribute and Phone is a multi-values
attribute, so it is violating domain constraint.
2. Key Constraints or Uniqueness Constraints
These are called uniqueness constraints since it ensures that every tuple in the
relation should be unique.
A relation can have multiple keys or candidate keys(minimal superkey), out of which
we choose one of the keys as the primary key, we don’t have any restriction on
choosing the primary key out of candidate keys, but it is suggested to go with
the candidate key with less number of attributes.
Null values are not allowed in the primary key, hence Not Null constraint is also part
of the key constraint.
Example:
EID Name Phone
01 Bikash 6000000009
02 Paul 9000090009
01 Tuhin 9234567892
Explanation: In the above table, EID is the primary key, and the first and the last tuple have the
same value in EID ie 01, so it is violating the key constraint.
3. Entity Integrity Constraints
Entity Integrity constraints say that no primary key can take a NULL value, since
using the primary key we identify each tuple uniquely in a relation.
Example:
EID Name Phone
01 Bikash 9000900099
02 Paul 600000009
NULL Sony 9234567892
Explanation: In the above relation, EID is made the primary key, and the primary key can't take
NULL values but in the third tuple, the primary key is null, so it is violating Entity Integrity
constraints.
Advantages of Relational Database Model :
It is simpler than the hierarchical model and network model.
It is easy and simple to understand.
Its structure can be changed anytime upon requirement.
Data Integrity: The relational database model enforces data integrity through various
constraints such as primary keys, foreign keys, and unique constraints. This ensures that the
data in the database is accurate, consistent, and valid.
Flexibility: The relational database model is highly flexible and can handle a wide range of
data types and structures. It also allows for easy modification and updating of the data
without affecting other parts of the database.
Scalability: The relational database model can scale to handle large amounts of data by
adding more tables, indexes, or partitions to the database. This allows for better performance
and faster query response times.
Security: The relational database model provides robust security features to protect the data
in the database. These include user authentication, authorization, and encryption of sensitive
data.
Data consistency: The relational database model ensures that the data in the database is
consistent across all tables. This means that if a change is made to one table, the
corresponding changes will be made to all related tables.
Query Optimization: The relational database model provides a query optimizer that can
analyze and optimize SQL queries to improve their performance. This allows for faster query
response times and better scalability.
concept of keys(super key, candidate key, primary
key, surrogate key, foreign key):
Keys are fundamental elements of the relational database model that ensure uniqueness, data
integrity, and efficient data access.
They uniquely identify each row in a table.
They prevent data duplication and maintain consistency.
They create relationships between different tables.
Importance of Keys in DBMS:
Keys are important in a Database Management System (DBMS) for several reasons:
Uniqueness: Keys ensure that each record in a table is unique and can be identified
distinctly.
Data Integrity: Keys prevent data duplication and maintain the consistency of the
data.
Efficient Data Retrieval: Keys help in creating relationships between tables,
allowing faster queries and better data organization.
Key Types in DBMS
Super Key: A set of one or more attributes that uniquely identifies rows within a table. It
can contain extra, unnecessary attributes.
Candidate Key: A minimal super key; a set of attributes that uniquely identifies tuples
with no redundant attributes. A table can have multiple candidate keys.
Primary Key: A candidate key chosen by the database designer to uniquely identify
tuples in a table. It cannot contain NULL values and must be unique.
o It uniquely identifies every tuple (row) and does not allow duplicate values.
o It cannot be NULL, as each record must have a valid identifier.
o It may be single-column or composite (made of multiple columns).
o Databases often organize data using the primary key to allow faster access and
searching
Example: The STUDENT table has the structure Student(STUD_NO, SNAME,
ADDRESS, PHONE), where STUD_NO is the primary key.
Foreign Key: An attribute that links two tables together by referencing the primary key
of another table, enforcing referential integrity.
The table that contains the foreign key is called the referencing table and the table that
is referenced is called the referenced table.
Alternate Key: A candidate key that was not selected as the primary key.
Unique Key: Ensures all values in a column are unique, but unlike the primary key, it
allows one NULL value.
Composite Key: A key composed of two or more attributes used together to uniquely
identify a record when a single attribute is insufficient.
Surrogate Key: An artificially generated, unique identifier (often numeric) used when a
natural key is not available or suitable.
Compound Key: Similar to a composite key, it consists of two or more attributes, but
these attributes can be foreign keys themselves
Key Differences
Primary vs. Unique: Primary keys cannot be NULL; Unique keys allow one NULL
value.
Super vs. Candidate: Candidate keys are minimal; Super keys may contain redundant
attributes.
Primary vs. Foreign: Primary keys identify rows in the current table; Foreign keys refer
to rows in another table.
relational algebra & relational calculs:
relational algebra:
Relational Algebra is a formal language used to query and manipulate relational
databases, consisting of a set of operations like selection, projection, union, and join.
It provides a mathematical framework for querying databases, ensuring efficient data
retrieval and manipulation.
Relational algebra serves as the mathematical foundation for query SQL.
SQL queries are based on relational algebra operations, enabling users to retrieve data
effectively.
Key Concepts in Relational Algebra:
Before explaining relational algebra operations, let's define some fundamental concepts:
Relations: In relational algebra, a relation is a table that consists of rows and columns,
representing data in a structured format. Each relation has a unique name and is made up
of tuples.
Tuples: A tuple is a single row in a relation, which contains a set of values for each
attribute. It represents a single data entry or record in a relational table.
Attributes: Attributes are the columns in a relation, each representing a specific
characteristic or property of the data. For example, in a "Students" relation, attributes
could be "Name", "Age", and "Grade".
Domains: A domain is the set of possible values that an attribute can have. It defines the
type of data that can be stored in each column of a relation, such as integers, strings, or
dates.
relational calculus:
Relational calculus, a non-procedural query language in database management systems, guides
users on what data is needed without specifying how to obtain it. Commonly utilized in
commercial relational languages like SQL-QBE and QUEL, relational calculus ensures a focus
on desired data without delving into procedural details, promoting a more efficient and abstract
approach to querying in relational databases.
What is Relational Calculus?
Before understanding Relational calculus in DBMS, we need to understand Procedural
Language and Declarative Langauge.
1. Procedural Language - Those Languages which clearly define how to get the required
results from the Database are called Procedural Language. Relational algebra is a
Procedural Language.
2. Declarative Language - Those Language that only cares about What to get from the
database without getting into how to get the results are called Declarative
Language. Relational Calculus is a Declarative Language.
So Relational Calculus is a Declarative Language that uses Predicate Logic or First-Order Logic
to determine the results from Database.
Types of Relational Calculus in DBMS
Relational Calculus is of Two Types:
1. Tuple Relational Calculus (TRC)
2. Domain Relational Calculus (DRC)
Tuple Relational Calculus (TRC)
Tuple Relational Calculus in DBMS uses a tuple variable (t) that goes to each row of the table
and checks if the predicate is true or false for the given row. Depending on the given predicate
condition, it returns the row or part of the row.
The Tuple Relational Calculus expression Syntax
{t \| P(t)}
Where t is the tuple variable that runs over every Row, and P(t) is the predicate logic expression
or condition.
Let's take an example of a Customer Database and try to see how TRC expressions work.
Customer Table
Customer_id Name Zip code
1 Rohit 12345
2 Rahul 13245
3 Rohit 56789
4 Amit 12345.
Example 1: Write a TRC query to get all the data of customers whose zip code is 12345.
TRC Query: {t \| t ∈ Customer ∧ [Link] = 12345} or TRC Query: {t \| Customer(t) ∧
t[Zipcode] = 12345 }
Workflow of query - The tuple variable "t" will go through every tuple of the Customer table.
Each row will check whether the Cust_Zipcode is 12345 or not and only return those rows that
satisfies the Predicate expression condition.
The TRC expression above can be read as "Return all the tuple which belongs to the
Customer Table and whose Zipcode is equal to 12345."
Result of the TRC expression above:
Customer_id Name Zip code
1 Rohit 12345
4. Amit 12345
Example 2: Write a TRC query to get the customer id of all the Customers.
Domain Relational Calculus (DRC)
Domain Relational Calculus uses domain Variables to get the column values required from the
database based on the predicate expression or condition.
The Domain realtional calculus expression syntax:
{<x1,x2,x3,x4...> \| P(x1,x2,x3,x4...)}
where,
<x1,x2,x3,x4...> are domain variables used to get the column values required,
and P(x1,x2,x3...) is predicate expression or condition.
Let's take the example of Customer Database and try to understand DRC queries with some
examples.
Customer Table
Customer_id Name Zip code
1 Rohit 12345
2 Rahul 13245
3 Rohit 56789
4 Amit 12345
Example 1: Write a DRC query to get the data of all customers with Zip code 12345.
DRC query: {<x1,x2,x3> \| <x1,x2> ∈ Customer ∧ x3 = 12345 }
Workflow of Query: In the above query x1,x2,x3 (ordered) refers to the attribute or column
which we need in the result, and the predicate condition is that the first two domain variables x1
and x2 should be present while matching the condition for each row and the third domain
variable x3 should be equal to 12345.
Result of the DRC query will be:
Customer_id Name Zip code
1 Rohit 12345
4 Amit 12345
Normalization: purpose of normalization or schema
refinement:
If a database design is not perfect, it may contain anomalies, which are like a bad dream for any
database administrator. Managing a database with anomalies is next to impossible
Update anomalies − If data items are scattered and are not linked to each other properly,
then it could lead to strange situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances get updated properly while
a few others are left with old values. Such instances leave the database in an inconsistent
state
Deletion anomalies − We tried to delete a record, but parts of it was left undeleted
because of unawareness, the data is also saved somewhere else
Insert anomalies − We tried to insert data in a record that does not exist at all
Normalization is a method to remove all these anomalies and bring the database to a consistent
state
Concept of functional dependency:
Functional dependency (FD) is a set of constraints between two attributes in a relation.
Functional dependency says that if two tuples have same values for attributes A1, A2,..., An,
then those two tuples must have to have same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally
determines Y. The left-hand side attributes determine the values of attributes on the right-hand
side.
Amstrong’s Axioms:
If F is a set of functional dependencies then the closure of F, denoted as F +, is the set of all
functional dependencies logically implied by F. Armstrong's Axioms are a set of rules, that when
applied repeatedly, generates a closure of functional dependencies
Reflexive rule − If alpha is a set of attributes and beta is_subset_of alpha, then alpha
holds beta
Augmentation rule − If a → b holds and y is attribute set, then ay → by also holds. That
is adding attributes in dependencies, does not change the basic dependencies
Transitivity rule − Same as transitive rule in algebra, if a → b holds and b → c holds,
then a → c also holds. a → b is called as a functionally that determines b
First Normal Form(1NF):
First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all
the attributes in a relation must have atomic domains. The values in an atomic domain are
indivisible units
We re-arrange the relation (table) as below, to convert it to First Normal Form
Each attribute must contain only a single value from its pre-defined domain
Second normal form(2NF):
Before we learn about the second normal form, we need to understand the following
Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime
attribute
Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a
non-prime attribute
If we follow second normal form, then every non-prime attribute should be fully functionally
dependent on prime key attribute. That is, if X → A holds, then there should not be any proper
subset Y of X, for which Y → A also holds true
We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID.
According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon
both and not on any of the prime key attribute individually. But we find that Stu_Name can be
identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is
called partial dependency, which is not allowed in Second Normal Form
We broke the relation in two as depicted in the above picture. So there exists no partial
dependency
Third Normal Form(3NF):
For a relation to be in Third Normal Form, it must be in Second Normal form and the following
must satisfy
No non-prime attribute is transitively dependent on prime key attribute
For any non-trivial functional dependency, X → A, then either
o X is a superkey or
A is prime attribute
We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute.
We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor
is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists transitive
dependency
To bring this relation into third normal form, we break the relation into two relations as follows
Boyce-Codd Normal Form (BCNF):
Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms. BCNF
states that
For any non-trivial functional dependency, X → A, X must be a super-key
In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-
key in the relation ZipCodes. So
Stu_ID → Stu_Name, Zip
And
Zip → City
Which confirms that both the relations are in BCNF