Understanding DBMS Architecture Types
Understanding DBMS Architecture Types
Unit-1
A Database stores a lot of critical information to access data quickly and securely. Hence it is
important to select the correct architecture for efficient data management. Database
Management System (DBMS) architecture is crucial for efficient data management and
system performance. It helps users to get their requests done while connecting to the
database. It focuses on how the database is designed, built and maintained, shaping how
users access and interact with it.
Types of DBMS Architecture
There are several types of DBMS Architecture that we use according to the usage
requirements. Types of DBMS Architecture are discussed here.
1-Tier Architecture
2-Tier Architecture
3-Tier Architecture
1-Tier Architecture
In 1-Tier Architecture the database is directly available to the user, the user can directly sit
on the DBMS and use it that is, the client, server, and Database are all present on the same
machine. This setup is simple and is often used in personal or standalone applications where
the user interacts directly with the database.
For Example: A Microsoft Excel spreadsheet is a great example of one-tier architecture.
Everything—the user interface, application logic and data is handled on a single
system.
The user directly interacts with the application, performs operations like
calculations or data entry and stores data locally on the same machine.
This architecture is simple and works well for personal, standalone applications where no
external server or network connection is needed.
Advantages of 1-Tier Architecture
Below mentioned are the advantages of 1-Tier Architecture.
Simple Architecture: 1-Tier Architecture is the most simple architecture to set up,
as only a single machine is required to maintain it.
Cost-Effective: No additional hardware is required for implementing 1-Tier
Architecture, which makes it cost-effective.
Easy to Implement: 1-Tier Architecture can be easily deployed, and hence it is
mostly used in small projects.
2-Tier Architecture
The 2-tier architecture is similar to a basic client-server model . The application at the client
end directly communicates with the database on the server side. APIs like ODBC and JDBC
are used for this interaction. The server side is responsible for providing query processing
and transaction management functionalities. On the client side, the user interfaces and
application programs are run. The application on the client side establishes a connection with
the server side to communicate with the DBMS.
For Example: A Library Management System used in schools or small organizations is a
classic example of two-tier architecture.
1. Client Layer (Tier 1): This is the user interface that library staff or users interact
with. For example they might use a desktop application to search for books, issue
them, or check due dates.
2. Database Layer (Tier 2): The database server stores all the library records such as
book details, user information, and transaction logs.
The client layer sends a request (like searching for a book) to the database layer which
processes it and sends back the result. This separation allows the client to focus on the user
interface, while the server handles data storage and retrieval.
Advantages of 2-Tier Architecture
Easy to Access: 2-Tier Architecture makes easy access to the database, which
makes fast retrieval.
Scalable: We can scale the database easily, by adding clients or upgrading
hardware.
Low Cost: 2-Tier Architecture is cheaper than 3-Tier Architecture and Multi-Tier
Architecture.
Easy Deployment: 2-Tier Architecture is easier to deploy than 3-Tier Architecture.
Simple: 2-Tier Architecture is easily understandable as well as simple because of
only two components.
3-Tier Architecture
In 3-Tier Architecture, there is another layer between the client and the server. The client
does not directly communicate with the server. Instead, it interacts with an application server
which further communicates with the database system and then the query processing and
transaction management takes place. This intermediate layer acts as a medium for the
exchange of partially processed data between the server and the client. This type of
architecture is used in the case of large web applications.
For Example: E-commerce Store
User: You visit an online store, search for a product and add it to your cart.
Processing: The system checks if the product is in stock, calculates the total price and
applies any discounts.
Database: The product details, your cart and order history are stored in the database for
future reference.
Advantages of 3-Tier Architecture
Enhanced scalability: Scalability is enhanced due to the distributed deployment of
application servers. Now, individual connections need not be made between the
client and server.
Data Integrity: 3-Tier Architecture maintains Data Integrity. Since there is a
middle layer between the client and the server, data corruption can be
avoided/removed.
Security: 3-Tier Architecture Improves Security. This type of model prevents
direct interaction of the client with the server thereby reducing access to
unauthorized data.
Disadvantages of 3-Tier Architecture
More Complex: 3-Tier Architecture is more complex in comparison to 2-Tier
Architecture. Communication Points are also doubled in 3-Tier Architecture.
Difficult to Interact: It becomes difficult for this sort of interaction to take place
due to the presence of middle layers.
Data abstraction is the process of hiding unwanted and irrelevant details from the end user.
It helps to store information in such a way that the end user can access data which is
necessary, the user will not be able to see what data is stored or how it is stored in a
database. Data abstraction helps to keep data secure from unauthorized access and it hides all
the implementation details.
Levels of Abstraction in DBMS
There are three levels of data abstraction in DBMS that are mentioned below.
2. ALTER
Alter command is used for altering the table in many forms like:
1. Add a column
2. Rename existing column
3. Drop a column
4. Modify the size of the column or change datatype of the column
ADD using ALTER:
Example:
ALTER TABLE Student
ADD
(Address VARCHAR(200));
RENAME using ALTER
Example:
ALTER TABLE
Student
RENAME
Marks TO Age;
DROP using ALTER
Example:
ALTER TABLE Student
DROP
(Age);
MODIFY using ALTER
Example:
ALTER TABLE
Student
MODIFY
(name varchar(300));
3. TRUNCATE
This command removes all the records from a table. But this command will not destroy the
table’s structure.
Example:
TRUNCATE TABLE Student;
4. DROP
This command completely removes the table from the database along with the destruction of
the table structure.
Example:
DROP TABLE Student;
Advantages of DDL
Defines Structure: DDL commands like CREATE and ALTER allow you to set up
and modify the structure of database objects (tables, indexes, views), organizing data
effectively.
Manages Schema: You can easily create, update, or remove entire database schemas,
making it simpler to maintain and update the database structure over time.
Enforces Data Integrity: DDL allows the implementation of rules and constraints
(PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL) that ensure data remains
accurate and consistent.
Boosts Performance: By creating indexes and partitions using DDL commands, you
can significantly improve query performance and speed up data retrieval.
Standardized: DDL commands are standardized across most SQL-based databases,
making it easier to apply knowledge across different database systems.
Disadvantages of DDL
Irreversible Changes: DDL commands, especially DROP and TRUNCATE, are
auto-committed, meaning changes can’t be undone once executed. This can result in
permanent data loss if not used carefully.
Risk of Data Loss: Misuse of commands like DROP can delete entire tables along
with their data, leading to significant data loss, particularly if no backup is available.
Complex for Large Databases: Altering the structure of large databases can be
complicated and may require downtime or additional planning to avoid disrupting
operations.
Causes Locking: DDL operations can lock database objects during execution, which
might slow down or block other operations, impacting overall database performance.
Compatibility Issues: While DDL is standardized, different databases (like MySQL,
Oracle, or PostgreSQL) may have variations in syntax and features, leading to
compatibility issues during migration.
Applications of DDL
Creating Database Objects: DDL statements can be used to create various database
objects such as tables, views, indexes, and stored procedures.
Modifying Database Objects: DDL statements can be used to modify the structure of
existing database objects such as adding or dropping columns from tables, modifying
the data type of columns, renaming tables or columns, etc.
Managing Database Constraints: DDL statements can be used to create or alter
database constraints such as primary keys, foreign keys, unique constraints, and check
constraints.
Granting or Revoking Permissions: DDL statements can be used to grant or revoke
permissions to various database objects such as tables, views, stored procedures, and
indexes.
Indexing: DDL statements can be used to create or modify indexes on database tables,
which can improve the performance of SQL queries.
Partitioning: DDL statements can be used to create or modify partitioned tables, which
can improve the performance of queries that access large amounts of data.
DML stands for Data Manipulation Language. Tables and formulas are helpful when
communicating with data stored up to a point in a database through SQL (Structured
Query Language), but a time comes when we actually want to execute some fairly
complicated data interactions. We will also need the Data Manipulation Language in that
situation. DML is a way to inform a database precisely what we want it to do by conversing
in a manner that it has been built to comprehend from scratch. When it comes to interacting
with existing data, whether adding, moving, or deleting data, it provides a convenient way to
do so.
Generally, DML commands fall into one of four primary categories:
INSERT adds fresh data to a table.
UPDATE Change the data that is already in a table.
DELETE takes a record out of a table.
SELECT Get information out of one or more tables.
Types of Data Manipulation Language
There are basically two types of Data Manipulation Language. These are mentioned below.
We have described them in the difference between format.
High-Level or Non-Procedural DML:
It is also labelled as set-at-a-time or series oriented DML.
It can be used on its own for precisely specifying complex operations in the database.
It is prescriptive in nature.
It demands that a user must clearly state which data is needed without clarifying how
and when to obtain those data.
For Example: Every SQL statement is a prescriptive command.
Low-level or Procedural DML
It is also labelled as track-at-a-time DML.
It must be integrated to a general-purpose programming language.
It is indispensable in nature.
It demands that a user must clearly state which data is needed and how to obtain those
data.
For Example: DB2’s SQL PL, Oracle’s PL/SQL.
Characteristics of DML
It performs interpret-only data queries. It is used in a database schema to recall and
manipulate the information. DML It is a dialect which is used to select, insert, delete and
update data in a database. Data Manipulation Language (DML) commands are as follows:
SELECT Command
This command is used to get data out of the database. It helps users of the database to access
from an operating system, the significant data they need. It sends a track result set from one
tables or more.
Example:
SELECT *
FROM students
where due_fees <=20000;
INSERT Command
This command is used to enter the information or values into a row. We can connect one or
more records to a single table within a repository using this instruction. This is often used to
connect an unused tag to the documents.
Example :
INSERT INTO students ('stu_id' int, 'stu_name' varchar(20), 'city' varchar(20))
VALUES ('1', 'Nirmit', 'Gorakhpur');
UPDATE Command
This command is used to alter existing table records. Within a table, it modifies data from
one or more records. This command is used to alter the data which is already present in a
table.
Example:
UPDATE students
SET due_fees = 20000
WHERE stu_name = 'Mini';
DELETE Command
It deletes all archives from a table. This command is used to erase some or all of the previous
table’s records. If we do not specify the ‘WHERE’ condition then all the rows would be
erased or deleted.
Example:
DELETE FROM students
WHERE stu_id = '001';
Advantages of DML
DML statements could alter the data that is contained or stored in the database.
It delivers effective human contact with the machine.
User could specify what data is required.
DML aims to have many different varieties and functionalities between vendors
providing databases.
Disadvantages of DML
We cannot use DML to change the structure of the database.
Limit table view i.e., it could conceal some columns in tables.
Access the data without having the data stored in the object.
Unable to build or erase lists or sections using DML.
A Data Model in Database Management System (DBMS) is the concept of tools that are
developed to summarize the description of the database. Data Models provide us with a
transparent picture of data which helps us in creating an actual database. It shows us from the
design of the data to its proper implementation of data.
Types of Relational Models
1. Conceptual Data Model
The conceptual data model describes the database at a very high level and is useful to
understand the needs or requirements of the database. It is this model, that is used in the
requirement-gathering process i.e. before the Database Designers start making a particular
database. One such popular model is the entity/relationship model (ER model). The E/R
model specializes in entities, relationships, and even attributes that are used by database
designers. In terms of this concept, a discussion can be made even with non-computer
science(non-technical) users and stakeholders, and their requirements can be understood.
Entity-Relationship Model( ER Model): It is a high-level data model which is used to
define the data and the relationships between them. It is basically a conceptual design of any
database which is easy to design the view of data.
Components of ER Model:
1. Entity: An entity is referred to as a real-world object. It can be a name, place,
object, class, etc. These are represented by a rectangle in an ER Diagram.
2. Attributes: An attribute can be defined as the description of the entity. These are
represented by Ellipse in an ER Diagram. It can be Age, Roll Number, or Marks
for a Student.
3. Relationship: Relationships are used to define relations among different entities.
Diamonds and Rhombus are used to show Relationships.
Characteristics of a conceptual data model
Offers Organization-wide coverage of the business concepts.
This type of Data Models are designed and developed for a business audience.
The conceptual model is developed independently of hardware specifications like
data storage capacity, location or software specifications like DBMS vendor and
technology. The focus is to represent data as a user will see it in the “real world.”
Conceptual data models known as Domain models create a common vocabulary for all
stakeholders by establishing basic concepts and scope
2. Representational Data Model
This type of data model is used to represent only the logical part of the database and does not
represent the physical structure of the database. The representational data model allows us to
focus primarily, on the design part of the database. A popular representational model is
a Relational model. The relational Model consists of Relational Algebra and Relational
Calculus. In the Relational Model, we basically use tables to represent our data and the
relationships between them. It is a theoretical concept whose practical implementation is
done in Physical Data Model.
The advantage of using a Representational data model is to provide a foundation to form the
base for the Physical model.
Characteristics of Representational Data Model
Represents the logical structure of the database.
Relational models like Relational Algebra and Relational Calculus are commonly
used.
Uses tables to represent data and relationships.
Provides a foundation for building the physical data model.
3. Physical Data Model
The physical Data Model is used to practically implement Relational Data Model.
Ultimately, all data in a database is stored physically on a secondary storage device such as
discs and tapes. This is stored in the form of files, records, and certain other data structures.
It has all the information on the format in which the files are present and the structure of the
databases, the presence of external data structures, and their relation to each other. Here, we
basically save tables in memory so they can be accessed efficiently. In order to come up with
a good physical model, we have to work on the relational model in a better way. Structured
Query Language (SQL) is used to practically implement Relational Algebra.
This Data Model describes HOW the system will be implemented using a specific DBMS
system. This model is typically created by DBA and developers. The purpose is actual
implementation of the database.
Characteristics of a physical data model:
The physical data model describes data need for a single project or application
though it maybe integrated with other physical data models based on project scope.
Data Model contains relationships between tables that which addresses cardinality
and nullability of the relationships.
Developed for a specific version of a DBMS, location, data storage or technology
to be used in the project.
Columns should have exact datatypes, lengths assigned and default values.
Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc.
are defined
The Entity Relationship Model is a model for identifying entities (like student, car or
company) to be represented in the database and representation of how those entities are
related. The ER data model specifies enterprise schema that represents the overall logical
structure of a database graphically.
Why Use ER Diagrams In DBMS?
ER diagrams represent the E-R model in a database, making them easy to convert
into relations (tables).
ER diagrams provide the purpose of real-world modeling of objects which makes
them intently useful.
ER diagrams require no technical knowledge of the underlying DBMS used.
It gives a standard solution for visualizing the data logically.
Components of ER Diagram
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database
System.
What is Entity?
An Entity may be an object with a physical existence – a particular person, car, house, or
employee – or it may be an object with a conceptual existence – a company, a job, or a
university course.
What is Entity Set?
An Entity is an object of Entity Type and a set of all entities is called an entity set. For
Example, E1 is an entity having Entity Type Student and the set of all students is called
Entity Set. In ER diagram, Entity Type is represented as:
Types of Entity
There are two types of entity:
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute. Strong Entity does not depend on
other Entity in the Schema. It has a primary key, that helps in identifying it uniquely, and it
is represented by a rectangle. These are called Strong Entity Types.
2. Weak Entity
An Entity type has a key attribute that uniquely identifies each entity in the entity set. But
some entity type exists for which key attributes can’t be defined. These are called Weak
Entity types .
What is Attributes?
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB,
Age, Address, and Mobile_No are the attributes that define entity type Student. In ER
diagram, the attribute is represented by an oval.
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute.
For example, Roll_No will be unique for each student. In ER diagram, the key attribute is
represented by an oval with underlying lines.
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For example,
the Address attribute of the student Entity type consists of Street, City, State, and Country. In
ER diagram, the composite attribute is represented by an oval comprising of ovals.
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example, Phone_No
(can be more than one for a given student). In ER diagram, a multivalued attribute is
represented by a double oval.
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. e.g.; Age (can be derived from DOB). In ER diagram, the derived attribute is
represented by a dashed oval.
Relationship Type and Relationship Set
A Relationship Type represents the association between entity types. For example, ‘Enrolled
in’ is a relationship type that exists between entity type Student and Course. In ER diagram,
the relationship type is represented by a diamond and connecting the entities with lines.
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.
2. One-to-Many: In one-to-many mapping as well where each entity can be related to more
than one entity and the total number of tables that can be used in this is 2. Let us assume that
one surgeon department can accommodate many doctors. So the Cardinality will be 1 to M.
It means one department has many Doctors.
total number of tables that can used is 3.
3. Many-to-One: When entities in one entity set can take part only once in the relationship
set and entities in other entity sets can take part more than once in the relationship set,
cardinality is many to one. Let us assume that a student can take only one course but one
course can be taken by many students. So the cardinality will be n to 1. It means that for one
course there can be n students but for one student, there will be only one course.
The total number of tables that can be used in this is 3.
Using Sets, it can be represented as:
In this case, each student is taking only 1 course but 1 course has been taken by many
students.
4. Many-to-Many: When entities in all entity sets can take part more than once in the
relationship cardinality is many to many. Let us assume that a student can take more than
one course and one course can be taken by many students. So the relationship will be many
to many.
the total number of tables that can be used in this is 3.
In this example, student S1 is enrolled in C1 and C3 and Course C3 is enrolled by S1, S3,
and S4. So it is many-to-many relationships.
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of students will be total. Total
participation is shown by a double line in the ER diagram.
2. Partial Participation – The entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled by any of the students, the participation in the
course will be partial.
Network Model
This model was formalized by the Database Task group in the 1960s. This model is the
generalization of the hierarchical model. This model can consist of multiple parent segments
and these segments are grouped as levels but there exists a logical association between the
segments belonging to any level. Mostly, there exists a many-to-many logical association
between any of the two segments. We called graphs the logical associations between the
segments. Therefore, this model replaces the hierarchical tree with a graph-like structure, and
with that, there can more general connections among different nodes. It can have M: N
relations i.e, many-to-many which allows a record to have more than one parent segment.
Here, a relationship is called a set, and each set is made up of at least 2 types of record which
are given below:
An owner record that is the same as of parent in the hierarchical model.
A member record that is the same as of child in the hierarchical model.
Structure of a Network Model
In the above figure, member TWO has only one owner ‘ONE’ whereas member FIVE has
two owners i.e, TWO and THREE. Here, each link between the two record types represents
1 : M relationship between them. This model consists of both lateral and top-down
connections between the nodes. Therefore, it allows 1: 1, 1 : M, M : N relationships among
the given entities which helps in avoiding data redundancy problems as it supports multiple
paths to the same record. There are various examples such as TOTAL by Cincom Systems
Inc., EDMS by Xerox Corp., etc.
Advantages of Network Model
This model is very simple and easy to design like the hierarchical data model.
This model is capable of handling multiple types of relationships which can help in
modeling real-life applications, for example, 1: 1, 1: M, M: N relationships.
In this model, we can access the data easily, and also there is a chance that the
application can access the owner’s and the member’s records within a set.
This network does not allow a member to exist without an owner which leads to
the concept of Data integrity.
Like a hierarchical model, this model also does not have any database standard,
This model allows to represent multi parent relationships.
Disadvantages of Network Model
The schema or the structure of this database is very complex in nature as all the
records are maintained by the use of pointers.
There’s an existence of operational anomalies as there is a use of pointers for
navigation which further leads to complex implementation.
The design or the structure of this model is not user-friendly.
This model does not have any scope of automated query optimization.
This model fails in achieving structural independence even though the network
database model is capable of achieving data independence.
Object Oriented Data Model :
In Object Oriented Data Model, data and their relationships are contained in a single
structure which is referred as object in this data model. In this, real world problems are
represented as objects with different attributes. All objects have multiple relationships
between them. Basically, it is combination of Object Oriented programming and Relational
Database Model as it is clear from the following figure :
Object Oriented Data Model
= Combination of Object Oriented Programming + Relational database model
Inheritance –
By using inheritance, new class can inherit the attributes and methods of the old class
i.e. base class. For example: as classes Student, Doctor and Engineer are inherited
from the base class Person.
Advantages of Object Oriented Data Model :
Codes can be reused due to inheritance.
Easily understandable.
Cost of maintenance can reduced due to reusability of attributes and functions
because of inheritance.
Disadvantages of Object Oriented Data Model :
It is not properly developed so not accepted by users easily.
Relational Model
The relational model represents how data is stored in Relational Databases. A relational
database consists of a collection of tables each of which is assigned a unique name. Consider
a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS, PHONE, and AGE
shown in the table.
Table STUDENT
Unit-2
Relational Database systems are expected to be equipped with a query language that assists
users to query the database. Relational Query Language is used by the user to
communicate with the database user requests for the information from the database.
Relational algebra breaks the user requests and instructs the DBMS to execute the requests.
It is the language by which the user communicates with the database. They are generally on a
higher level than any other programming language. These relational query languages can be
Procedural and Non-Procedural.
1. User-Defined Data Types (UDTs): SQL3 allows users to define complex data types
tailored to specific application needs. For instance, a company can create a custom
data type to represent a full address, encapsulating fields like street, city, and postal
code. This enhances data modeling flexibility and accuracy.
2. Object-Oriented Support: Incorporating object-oriented programming concepts,
SQL3 introduces features such as inheritance, methods, and constructors. This enables
more natural and intuitive modeling of complex data structures, allowing objects to
encapsulate both data and behavior, thereby simplifying data management in intricate
applications.
3. New Data Types:
o BLOB (Binary Large Object): Designed to store large amounts of binary data,
such as images or multimedia files.
o CLOB (Character Large Object): Used for storing extensive text data.
o ARRAY: Enables the storage of arrays as column values, allowing for the
representation of multi-valued attributes within a single table.
o Structured Types: Facilitates the creation of complex data structures with
multiple attributes, akin to classes in object-oriented languages.
4. Collections of Objects: SQL3 generalizes the relational model into an object model,
offering abstract data types, multiple inheritance, and dynamic polymorphism. Tables
can contain collections (multisets) of objects, with sets and lists defined as related
collection types. This allows for more flexible and hierarchical data modeling.
5. Enhanced Procedural Capabilities: SQL3 introduces procedural language
extensions, making SQL a computationally complete programming language. This
facilitates the definition of functions and procedures within the database, allowing for
centralized processing logic and reduced network traffic.
These enhancements make SQL3 a powerful tool for developers and database administrators,
providing greater flexibility, improved performance, and more intuitive data modeling
capabilities in relational database management systems.
Normal
Forms Description of Normal Forms
First Normal A relation is in first normal form if every attribute in that relation is single-
Form (1NF) valued attribute.
Second A relation that is in First Normal Form and every non-primary-key attribute
Normal Form is fully functionally dependent on the primary key, then the relation is
(2NF) in Second Normal Form (2NF).
Boyce-Codd For BCNF the relation should satisfy the below conditions
Normal Form
The relation should be in the 3rd Normal Form.
(BCNF)
Normal
Forms Description of Normal Forms
Domain:
A domain defines the permissible set of values that an attribute (or column) in a database can
hold. It establishes constraints on the data type and format for a given attribute, ensuring that
only valid and consistent data is stored. For instance, if we have an attribute "BirthMonth,"
its domain might be the set of all twelve month names (e.g., January, February, etc.). This
means that any value entered into the "BirthMonth" column must be one of these specified
month names
Data Dependency
A dependency is a constraint that governs or defines the relationship between two
or more attributes.
In a database, it happens when information recorded in the same table uniquely
determines other information stored in the same table.
This may also be described as a relationship in which knowing the value of one
attribute (or collection of attributes) in the same table tells you the value of another
attribute (or set of attributes).
It's critical to understand database dependencies since they serve as the foundation
for database normalization.
Armstrong’s Axioms
Armstrong’s axioms, introduced by William W. Armstrong in 1974, are a set of inference
rules utilized to deduce all the functional dependencies within a relational database. These
rules, when applied to a set of functional dependencies denoted as F+, are both sound,
ensuring that only dependencies within the closure set F are generated, and complete,
meaning that applying these rules iteratively will derive all functional dependencies within
the closure F+.
Types of Axioms
There are 3 types of Axioms:
1. Reflexivity Axiom:
For a set of attributes A and a subset B of A, if B is a subset of A (B⊆A), then A→B.
This axiom represents a trivial property where a set of attributes implies itself.
2. Augmentation Axiom:
If A→B is true and Y is a set of attributes, then AY→BY is also true. This axiom
demonstrates that adding attributes to dependencies does not alter the fundamental
dependencies. If A→B holds, AC→BC holds for any set of attributes C.
3. Transitivity Axiom:
If A→B holds and B→C holds, then A→C also holds, similar to the transitive rule in
algebra. Functionally, it means that if A determines B and B determines C, then A
determines C. If X→Y and Y→Z are true, then X→Z is also true.
Rules of Axioms
Below are the rules of Axioms in DBMS:
1. Decomposition
If A→BC, then A→B and A→C
Proof:
A→BC (given)_______________ (i)
BC→B (reflexivity)____________ (ii)
A→B (transitivity from i and ii)
2. Composition
If A→B and C→D then AC→BD
Proof
A→B_________(i)
C→D_________(ii)
AC→BC________(iii) (Augmentation of i and C)
AC→B________(iv) Decomposition of iii)
AC→AD_______(v) (Augmentation of ii and A)
AC→D___________(vi) (Decomposition of v)
AC→BD________ (Union iv and vi)
3. Union (Notation)
If A→B and A→C then A→BC
Proof
A→B________(i) (given)
A→C________(ii) (given)
A→AC_______(iii) (Augmentation of ii and A)
AC→BC______(iv) (Augmentation of i and C)
A→BC________(transitivity of iii and ii)
4. Pseudo transitivity
If A→B and BC→D then AC→D
Proof
A→B__________(i) (Given)
BC→D________(ii) (Given)
AC→BC_______(iii) (Augmentation of i and C)
AC →D_________(Transitivity of iii and ii)
5. Self-determination
A→A for any given A.
This rule directly follows the Axiom of Reflexivity.
6. Extensivity
Extensivity is a particular case of augmentation where C=A
If A→B, then A→AB
In the sense that augmentation can be proven from extensivity and other axioms, extensivity
can replace augmentation as an axiom.
Proof
AC→A____(i)
A→B________(ii)
AC→B________(iii) (Transitivity of i and ii)
AC→ABC_______(iv) (Extensivity of iii)
ABC→BC______(v) (Reflexivity)
AC→BC (Transitivity of iv and v)
What is Armstrong Relation in DBMS?
An Armstrong relation refers to a relation that satisfies all the functional dependencies
present in the closure F+ and no other dependencies, based on a given set of functional
dependencies F. However, it is worth noting that for a given set of dependencies, the
smallest possible Armstrong relation can have a size that grows exponentially with the
number of attributes in the dependencies being considered.
Normalization
Normalization is the process of organizing database data by minimizing redundancy
and eliminating anomalies such as update, insertion, and deletion issues.
It involves breaking a single table into smaller tables and linking them through
relationships, following different normal forms to reduce redundancy.
Normalization continues until the database follows the Single Responsibility
Principle (SRP), ensuring that each table has a specific role for better efficiency and
consistency.
Why do we need normalization in databases?
Redundancy in data occurs when the same piece of information exists in a single database.
Database redundancy can lead to many drawbacks and introduces three anomalies (or
abnormalities). These anomalies are-
Insertion Anomaly
This anomaly occurs when specific data cannot be inserted into the table or database due to
the absence of some other data where both of these are independent of each other.
Deletion Anomaly
While deleting some data, when some critical information is lost that was necessary to
maintain the integrity of data, it is known as a deletion anomaly.
Updation / Modified Anomaly
An update anomaly occurs when modifying a single piece of data requires changes in
multiple rows, leading to potential data inconsistency.
It increases storage costs and database complexity due to redundant data, making
maintenance more challenging.
Normalization helps eliminate redundancy, optimizing the database and ensuring data
consistency.
Types of DBMS Normal Form
Normalization in a database is done through a series of normal forms.
Normal Description
Form
1NF If a table has no repeated groups, it is in 1NF.
2NF If a table is in 1NF and every non-key attribute is fully dependent on the primary
key, then it is in 2NF.
BCNF If a table is in 3NF and every non-prime attribute fully dependent on the candidate
keys, then it is in BCNF.
1 John 12345767890
1 John 12345767890
2 Claire 9242314321
2 Claire 7689025341
Here, we can notice data repetition, but 1NF doesn’t care about it.
Second Normal Form (2NF)
In 2NF, the relation present should be 1NF, and no partial dependency should exist. Partial
dependency is when the non-prime attributes depend entirely on the candidate or primary
key, even if the primary key is composite.
Example 1: (depicting partial dependency issues)
If given with a relation R(A, B, C, D) where we have {A, B} as the primary key where A
and B can’t be NULL simultaneously, but both can be NULL independently and C, D are
non-prime attributes. If B is NULL, and we are given the functional dependency, say,
B → C. So can this ever hold?
As B contains NULL, it can never determine the value of C. So, as B → C is a partial
dependency, it creates a problem. Therefore, the non-prime attributes cannot be determined
by a part of the primary key. We can remove the partial dependency present by creating two
relations ( the 2NF conversion)-
Relation 1 = R1(ABD), where {A, B} is the primary key. AB determines D.
Relation 2 = R1(BC), where B is the primary key. And from this, B determines C.
Example 2:
Consider the following table. Its primary key is {StudentId, ProjectId}.
The Functional dependencies given are -
StudentId → StudentName
ProjectId → ProjectName
Student Project
StudentId ProjectId
Name Name
1 P2 John IOT
2 P1 Claire Cloud
3 P7 Clara IOT
4 P3 Abhk Cloud
4 P3 Abhk
As this table is not in BCNF form, so we decompose it into the following tables:
John Olivia
Clara Emma
Robin Olivia
Teacher Subject
In the above table, the subject and student phone numbers are two independent entities,
showing no relation between the subject and
phone number. So to convert it into 4NF, we Student Subject
decompose the table as -
John Physics
Student Student Phone Number
Clara English
John 12345767890
Robin Mathematics
Clara 9242314321
Kaley Database
Robin 7689025341
Kaley 9878632656
Here the Student is mentioned in both tables to set up a relationship between them.
Importance of Normalization in Database Design
Reduces Data Redundancy: Normalization minimizes the duplication of data by
organizing it into related tables, ensuring that each piece of information is stored only
once.
Improves Data Integrity: By enforcing rules and constraints, normalization ensures the
accuracy and consistency of data across the database.
Simplifies Database Maintenance: A normalized database is easier to update, modify,
and maintain, as changes need to be made in only one place.
Enhances Query Performance: Properly normalized databases can improve query
efficiency by reducing the amount of data scanned during operations.
Prevents Update Anomalies: Normalization eliminates issues like insertion, deletion,
and update anomalies, ensuring that data remains consistent.
Facilitates Scalability: A well-structured, normalized database can handle growth and
changes in data requirements more effectively.
Supports Better Design Practices: Normalization encourages a logical and organized
approach to database design, making it easier to understand and manage.
Dependency Preservation:
A decomposition is considered dependency-preserving if, after the decomposition, all the
original functional dependencies are still valid. This means that either:
Each functional dependency from the original relation appears directly in one of the
decomposed relations.
The union of the functional dependencies in the decomposed relations is equivalent to
the original set of functional dependencies.
Why is it important?
Data Integrity: Dependency preservation ensures that the relationships between data
are maintained even after decomposition.
Reduced Redundancy: By breaking down relations, dependency preservation helps to
minimize data redundancy and improve data integrity.
Improved Query Performance: Smaller, more focused relations can lead to faster
query performance.
Easier Maintenance and Updates: Decomposed relations are easier to maintain and
update.
Example:
Let's say you have a relation R(A, B, C) with functional dependencies A -> B and B -> C.
If you decompose R into R1(A, B) and R2(B, C), then the decomposition is dependency-
preserving because both original dependencies are present in the decomposed relations.
If you decompose R into R1(A, B) and R2(A, C), then the decomposition is not
dependency-preserving because the dependency B -> C is not present in the decomposed
relations.
Lossless Join Decomposition
Consider there is a relation R which is decomposed into sub relations R 1, R2, …., Rn.
This decomposition is called lossless join decomposition when the join of the sub
relations results in the same relation R that was decomposed.
For lossless join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
where ⋈ is a natural join operator.
Example:
Consider the following relation R( A , B , C )-
Consider this relation is decomposed into two sub relations R1( A , B ) and R2( B , C )-
Now, let us check whether this decomposition is lossless or not. For lossless decomposition,
we must have-
R1 ⋈ R2 = R
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and R2 , we get-
This relation is same as the original relation R. Thus, we conclude that the above
decomposition is lossless join decomposition.
Note
Lossless join decomposition is also known as non-additive join decomposition. This is
because the resultant relation after joining the sub relations is same as the decomposed
relation.
No extraneous tuples appear after joining of the sub-relations.
2. Lossy Join Decomposition
Consider there is a relation R which is decomposed into sub relations R 1 , R2 , …. , Rn.
This decomposition is called lossy join decomposition when the join of the sub relations
does not result in the same relation R that was decomposed.
The natural join of the sub relations is always found to have some extraneous tuples.
For lossy join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
where ⋈ is a natural join operator.
Example:
Consider the above relation R( A , B , C )
Consider this relation is decomposed into two sub relations as R1( A , C ) and R2( B , C )-
decomposition.
Note
Lossy join decomposition is also known as careless decomposition.
This is because extraneous tuples get introduced in the natural join of the sub-relations.
Extraneous tuples make the identification of the original tuples difficult.
Query
A query is a kind of request that is sent to the Database for retrieval of data. In a query, we
pass some specific condition, and then it matches the specific data if it is present in the
database.
We can write queries in SQL( Structured Query Language). This language is a way of
communication between the user and the relational database. We can perform different
operations on data by writing queries. We can select the data, delete the data, insert
something, etc.
Query Processing
Query Processing is the activity performed in extracting data from the database. In query
processing, it takes various steps for fetching the data from the database. The steps involved
are:
1. Parsing and translation
2. Optimization
3. Evaluation
The query processing works in the following way:
Parsing and Translation
As query processing includes certain activities for data retrieval. Initially, the given user
queries get translated in high-level database languages such as SQL. It gets translated into
expressions that can be further used at the physical level of the file system. After this, the
actual evaluation of the queries and a variety of query -optimizing transformations and takes
place. Thus before processing a query, a computer system needs to translate the query into a
human-readable and understandable language. Consequently, SQL or Structured Query
Language is the best suitable choice for humans. But, it is not perfectly suitable for the
internal representation of the query to the system. Relational algebra is well suited for the
internal representation of a query. The translation process in query processing is similar to
the parser of a query. When a user executes any query, for generating the internal form of the
query, the parser in the system checks the syntax of the query, verifies the name of the
relation in the database, the tuple, and finally the required attribute value. The parser creates
a tree of the query, known as 'parse-tree.' Further, translate it into the form of relational
algebra. With this, it evenly replaces all the use of the views when used in the query.
Thus, we can understand the working of a query processing in the below-described diagram:
Evaluation
For this, with addition to the relational algebra translation, it is required to annotate the
translated relational algebra expression with the instructions used for specifying and
evaluating each operation. Thus, after translating the user query, the system executes a query
evaluation plan.
Query Evaluation Plan
In order to fully evaluate a query, the system needs to construct a query evaluation plan.
The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query execution
plan.
A query execution engine is responsible for generating the output of the given query. It
takes the query execution plan, executes it, and finally makes the output for the user
query.
Optimization
The cost of the query evaluation can vary for different types of queries. Although the
system is responsible for constructing the evaluation plan, the user does need not to write
their query efficiently.
Usually, a database system generates an efficient query evaluation plan, which minimizes
its cost. This type of task performed by the database system and is known as Query
Optimization.
For optimizing a query, the query optimizer should have an estimated cost analysis of
each operation. It is because the overall operation cost depends on the memory
allocations to several operations, execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and produces the
output of the query.
Evaluation of Expressions
For evaluating an expression that carries multiple operations in it, we can perform the
computation of each operation one by one. However, in the query processing system, we use
two methods for evaluating an expression carrying multiple operations. These methods are:
1. Materialization
2. Pipelining
Let's take a brief discussion of these methods.
Materialization
In this method, the given expression evaluates one relational operation at a time. Also, each
operation is evaluated in an appropriate sequence or order. After evaluating all the
operations, the outputs are materialized in a temporary relation for their subsequent uses. It
leads the materialization method to a disadvantage. The disadvantage is that it needs to
construct those temporary relations for materializing the results of the evaluated operations,
respectively. These temporary relations are written on the disks unless they are small in size.
Pipelining
Pipelining is an alternate method or approach to the materialization method. In pipelining, it
enables us to evaluate each relational operation of the expression simultaneously in a
pipeline. In this approach, after evaluating one operation, its output is passed on to the next
operation, and the chain continues till all the relational operations are evaluated thoroughly.
Thus, there is no requirement of storing a temporary relation in pipelining. Such an
advantage of pipelining makes it a better approach as compared to the approach used in the
materialization method. Even the costs of both approaches can have subsequent differences
in-between. But, both approaches perform the best role in different cases. Thus, both ways
are feasible at their place.
External Sort-Merge Algorithm
Till now, we saw that sorting is an important term in any database system. It means
arranging the data either in ascending or descending order. We use sorting not only for
generating a sequenced output but also for satisfying conditions of various database
algorithms. In query processing, the sorting method is used for performing various relational
operations such as joins, etc. efficiently. But the need is to provide a sorted input value to the
system. For sorting any relation, we have to build an index on the sort key and use that index
for reading the relation in sorted order. However, using an index, we sort the relation
logically, not physically. Thus, sorting is performed for cases that include:
Case 1: Relations that are having either small or medium size than main memory.
Case 2: Relations having a size larger than the memory size.
In Case 1, the small or medium size relations do not exceed the size of the main memory. So,
we can fit them in memory. So, we can use standard sorting methods such as quicksort,
merge sort, etc., to do so.
For Case 2, the standard algorithms do not work properly. Thus, for such relations whose
size exceeds the memory size, we use the External Sort-Merge algorithm.
The sorting of relations which do not fit in the memory because their size is larger than the
memory size. Such type of sorting is known as External Sorting. As a result, the external-
sort merge is the most suitable method used for external sorting.
External Sort-Merge Algorithm
Here, we will discuss the external-sort merge algorithm stages in detail:
In the algorithm, M signifies the number of disk blocks available in the main memory buffer
for sorting.
Stage 1: Initially, we create a number of sorted runs. Sort each of them. These runs contain
only a few records of the relation.
i = 0;
repeat
read either M blocks or the rest of the relation having a smaller size;
sort the in-memory part of the relation;
write the sorted data to run file Ri;
i =i+1;
Primary B+-tree index, (hi +1) * (tr + ts) Each I/O operation needs one
Equality on Key seek and one block transfer to
fetch the record by traversing
the height of the tree.
Primary B+-tree index, hi * (tT + ts) + b * tT It needs one seek for each
Equality on a Nonkey level of the tree, and one seek
for the first block.
Secondary B+-tree index, (hi + 1) * (tr + ts) Each I/O operation needs one
Equality on Key seek and one block transfer to
fetch the record by traversing
the height of the tree.
Secondary B+-tree index, (hi + n) * (tr + ts) It requires one seek per
Equality on Nonkey record because each record
may be on a different block.
Primary B+-tree index, hi * (tr + ts) + b * tT It needs one seek for each
Comparison level of the tree, and one seek
for the first block.
Secondary B+-tree index, (hi + n) * (tr + ts) It requires one seek per
Comparison record because each record
may be on a different block.
It is most commonly used for evaluating an It is typical so rarely used in the systems. But,
expression. it is good for systems such as parallel
processing systems.
Pipelining Materialization
It does not use any temporary relations for It uses temporary relations for storing the
storing the results of the evaluated results of the evaluated operations. So, it
operations. needs more temporary files and I/O.
It is a more efficient way of query evaluation It is less efficient as it takes time to generate
as it quickly generates the results. the query results.
It requires memory buffers at a high rate for It does not have any higher requirements for
generating outputs. Insufficient memory memory buffers for query evaluation.
buffers will cause thrashing.
It optimizes the cost of query evaluation. As The overall cost includes the cost of
it does not include the cost of reading and operations plus the cost of reading and
writing the temporary storages. writing results on the temporary storage.
Unit-3
The records in databases are stored in file formats. Physically, the data is stored in
electromagnetic format on a device. The electromagnetic devices used in database systems
for data storage are classified as follows:
1. Primary Memory
2. Secondary Memory
3. Tertiary Memory
Types of Memory
1. Primary Memory
The primary memory of a server is the type of data storage that is directly accessible by the
central processing unit, meaning that it doesn’t require any other devices to read from it.
The primary memory must, in general, function flawlessly with equal contributions from the
electric power supply, the hardware backup system, the supporting devices, the coolant that
moderates the system temperature, etc.
The size of these devices is considerably smaller and they are volatile.
According to performance and speed, the primary memory devices are the fastest devices,
and this feature is in direct correlation with their capacity.
These primary memory devices are usually more expensive due to their increased speed
and performance.
The cache is one of the types of Primary Memory.
Cache Memory: Cache Memory is a special very high-speed memory. It is used to speed
up and synchronize with a high-speed CPU. Cache memory is costlier than main memory
or disk memory but more economical than CPU registers. Cache memory is an extremely
fast memory type that acts as a buffer between RAM and the CPU.
2. Secondary Memory
Data storage devices known as secondary storage, as the name suggests, are devices that can
be accessed for storing data that will be needed at a later point in time for various purposes
or database actions. Therefore, these types of storage systems are sometimes called backup
units as well. Devices that are plugged or connected externally fall under this memory
category, unlike primary memory, which is part of the CPU. The size of this group of
devices is noticeably larger than the primary devices and smaller than the tertiary devices.
It is also regarded as a temporary storage system since it can hold data when needed and
delete it when the user is done with it. Compared to primary storage devices as well as
tertiary devices, these secondary storage devices are slower with respect to actions and
pace.
It usually has a higher capacity than primary storage systems, but it changes with the
technological world, which is expanding every day.
Some commonly used Secondary Memory types that are present in almost every system are:
Flash Memory: Flash memory, also known as flash storage, is a type of nonvolatile
memory that erases data in units called blocks and rewrites data at the byte level. Flash
memory is widely used for storage and data transfer in consumer devices, enterprise
systems, and industrial applications. Unlike traditional hard drives, flash memories are
able to retain data even after the power has been turned off
Magnetic Disk Storage: A Magnetic Disk is a type of secondary memory that is a flat
disc covered with a magnetic coating to hold information. It is used to store various
programs and files. The polarized information in one direction is represented by 1, and
vice versa. The direction is indicated by 0.
3. Tertiary Memory
For data storage, Tertiary Memory refers to devices that can hold a large amount of data
without being constantly connected to the server or the peripherals. A device of this type is
connected either to a server or to a device where the database is stored from the outside.
Due to the fact that tertiary storage provides more space than other types of device
memory but is most slowly performing, the cost of tertiary storage is lower than primary
and secondary. As a means to make a backup of data, this type of storage is commonly
used for making copies from servers and databases.
The ability to use secondary devices and to delete the contents of the tertiary devices is
similar.
Some commonly used Tertiary Memory types that are almost present in every system are:
Optical Storage: It is a type of storage where reading and writing are to be performed
with the help of a laser. Typically data written on CDs and DVDs are examples
of Optical Storage.
Tape Storage: Tape Storage is a type of storage data where we use magnetic tape to
store data. It is used to store data for a long time and also helps in the backup of data
in case of data loss.
Memory Hierarchy
A computer system has a hierarchy of memory. Direct access to a CPU’s main memory and
inbuilt registers is available. Accessing the main memory takes less time than running a
CPU. Cache memory is introduced to minimize this difference in speed. Data that is most
frequently accessed by the CPU resides in cache memory, which provides the fastest access
time to data. Fastest-accessing memory is the most expensive. Although large storage
devices are slower and less expensive than CPU registers and cache memory, they can store
a greater amount of data.
1. Magnetic Disks
Present-day computer systems use hard disk drives as secondary storage devices. Magnetic
disks store information using the concept of magnetism. Metal disks are coated with
magnetizable material to create hard disks. Spindles hold these disks vertically. As the
read/write head moves between the disks, it de-magnetizes or magnetizes the spots under it.
There are two magnetized spots: 0 (zero) and 1 (one). Formatted hard disks store data
efficiently by storing them in a defined order. The hard disk plate is divided into many
concentric circles, called tracks. Each track contains a number of sectors. Data on a hard disk
is typically stored in sectors of 512 bytes.
2. Redundant Array of Independent Disks(RAID)
In the Redundant Array of Independent Disks technology, two or more secondary storage
devices are connected so that the devices operate as one storage medium. A RAID array
consists of several disks linked together for a variety of purposes. Disk arrays are categorized
by their RAID levels.
RAID 0: At this level, disks are organized in a striped array. Blocks of data are divided
into disks and distributed over disks. Parallel writing and reading of data occur on each
disk. This improves performance and speed. Level 0 does not support parity and backup.
RAID 1: Mirroring is used in RAID 1. A RAID controller copies data across all disks
in an array when data is sent to it. In case of failure, RAID level 1 provides 100%
redundancy.
RAID 2: The data in RAID 2 is striped on different disks, and the Error Correction
Code is recorded using Hamming distance. Similarly to level 0, each bit within a word
is stored on a separate disk, and ECC codes for the data words are saved on a separate
set of disks. As a result of its complex structure and high cost, RAID 2 cannot be
commercially deployed.
RAID 3: Data is striped across multiple disks in RAID 3. Data words are parsed to
generate a parity bit. It is stored on a different disk. Thus, single-disk failures can be
avoided.
RAID 4: This level involves writing an entire block of data onto data disks, and then
generating the parity and storing it somewhere else. At level 3, bytes are striped, while
at level 4, blocks are striped. Both levels 3 and 4 require a minimum of three disks.
RAID 5: The data blocks in RAID 5 are written to different disks, but the parity bits
are spread out across all the data disks rather than being stored on a separate disk.
RAID 6: The RAID 6 level extends the level 5 concept. A pair of independent parities
are generated and stored on multiple disks at this level. A pair of independent parities
are generated and stored on multiple disks at this level. Ideally, you need four disk
drives for this level.
Unit-4
Transactions
A transaction is a program including a collection of database operations, executed as a
logical unit of data processing. The operations performed in a transaction include one or
more of database operations like insert, delete, update or retrieve data. It is an atomic process
that is either performed into completion entirely or is not performed at all. A transaction
involving only data retrieval without any data update is called read-only transaction.
Each high level operation can be divided into a number of low level tasks or operations. For
example, a data update operation can be divided into three tasks −
read_item() − reads data item from storage to main memory.
modify_item() − change value of item in the main memory.
write_item() − write the modified value from main memory to storage.
Database access is restricted to read_item() and write_item() operations. Likewise, for all
transactions, read and write forms the basic database operations.
Transaction Operations
The low level operations performed in a transaction are −
begin_transaction − A marker that specifies start of transaction execution.
read_item or write_item − Database operations that may be interleaved with main
memory operations as a part of transaction.
end_transaction − A marker that specifies end of transaction.
commit − A signal to specify that the transaction has been successfully completed in its
entirety and will not be undone.
rollback − A signal to specify that the transaction has been unsuccessful and so all
temporary changes in the database are undone. A committed transaction cannot be rolled
back.
Transaction States
A transaction may go through a subset of five states, active, partially committed, committed,
failed and aborted.
Active − The initial state where the transaction enters is the active state. The transaction
remains in this state while it is executing read, write or other operations.
Partially Committed − The transaction enters this state after the last statement of the
transaction has been executed.
Committed − The transaction enters this state after successful completion of the
transaction and system checks have issued commit signal.
Failed − The transaction goes from partially committed state or active state to failed state
when it is discovered that normal execution can no longer proceed or system checks fail.
Aborted − This is the state after the transaction has been rolled back after failure and the
database has been restored to its state that was before the transaction began.
The following state transition diagram depicts the states in the transaction and the low level
transaction operations that causes change in states.
Parallel Schedules − In parallel schedules, more than one transactions are active
simultaneously, i.e. the transactions contain operations that overlap at time. This is
depicted in the following graph −
Conflicts in Schedules
In a schedule comprising of multiple transactions, a conflict occurs when two active
transactions perform non-compatible operations. Two operations are said to be in conflict,
when all of the following three conditions exists simultaneously −
The two operations are parts of different transactions.
Both the operations access the same data item.
At least one of the operations is a write_item() operation, i.e. it tries to modify the data
item.
Serializability
A serializable schedule of n transactions is a parallel schedule which is equivalent to a serial
schedule comprising of the same n transactions. A serializable schedule contains the
correctness of serial schedule while ascertaining better CPU utilization of parallel schedule.
Equivalence of Schedules
Equivalence of two schedules can be of the following types −
Result equivalence − Two schedules producing identical results are said to be result
equivalent.
View equivalence − Two schedules that perform similar action in a similar manner are
said to be view equivalent.
Conflict equivalence − Two schedules are said to be conflict equivalent if both contain
the same set of transactions and has the same order of conflicting pairs of operations.
Various concurrency control techniques are:
1. Locking Based Protocol
2. Time stamp ordering Protocol
3. Multi version concurrency control
4. Validation concurrency control
Two Phase Locking
The Two-Phase Locking (2PL) Protocol is a key technique used in database management
systems to manage how multiple transactions access and modify data at the same time. When
many users or processes interact with a database, it’s important to ensure that data remains
consistent and error-free. Without proper management, issues like data conflicts or
corruption can occur if two transactions try to use the same data simultaneously.
The Two-Phase Locking Protocol resolves this issue by defining clear rules for managing
data locks. It divides a transaction into two phases:
1. Growing Phase: In this step, the transaction gathers all the locks it needs to access the
required data. During this phase, it cannot release any locks.
2. Shrinking Phase: Once a transaction starts releasing locks, it cannot acquire any new
ones. This ensures that no other transaction interferes with the ongoing process.
Time Stamp Ordering Protocol
A timestamp is a tag that can be attached to any transaction or any data item, which denotes
a specific time on which the transaction or the data item had been used in any way. A
timestamp can be implemented in 2 ways. One is to directly assign the current value of the
clock to the transaction or data item. The other is to attach the value of a logical counter that
keeps increment as new timestamps are required. The timestamp of a data item can be of 2
types:
• W-timestamp(X): This means the latest time when the data item X has been written into.
• R-timestamp(X): This means the latest time when the data item X has been read from.
These 2 timestamps are updated each time a successful read/write operation is performed
on the data item X.
Multiversion Concurrency Control
Multiversion schemes keep old versions of data item to increase concurrency. Multiversion
2 phase locking: Each successful write results in the creation of a new version of the data
item written. Timestamps are used to label the versions. When a read(X) operation is issued,
select an appropriate version of X based on the timestamp of the transaction.
Validation Concurrency Control
The optimistic approach is based on the assumption that the majority of the database
operations do not conflict. The optimistic approach requires neither locking nor time
stamping techniques. Instead, a transaction is executed without restrictions until it is
committed. Using an optimistic approach, each transaction moves through 2 or 3 phases,
referred to as read, validation and write.
• During read phase, the transaction reads the database, executes the needed computations
and makes the updates to a private copy of the database values. All update operations of
the transactions are recorded in a temporary update file, which is not accessed by the
remaining transactions.
• During the validation phase, the transaction is validated to ensure that the changes made
will not affect the integrity and consistency of the database. If the validation test is
positive, the transaction goes to a write phase. If the validation test is negative, he
transaction is restarted and the changes are discarded.
• During the write phase, the changes are permanently applied to the database.
Deadlock in DBMS
A deadlock is a condition where two or more transactions are waiting indefinitely for one
another to give up locks. Deadlock is said to be one of the most feared complications in
DBMS as no task ever gets finished and is in waiting state forever.
Deadlock in DBMS
• Every process need some resource for its execution and these resources are granted in
sequential order
• First the process request some resource.
• OS grants the resource to the process if it is available or else it places the request in the
wait queue.
• The process uses it and releases on the completion so that it can be granted to another
process
• Deadlock is a situation where two or more transactions are waiting indefinitely for each
other to give up their locks.
Necessary conditions for Deadlocks
• Mutual Exclusion: It implies if two processes cannot use the same resource at the same
time.
• Hold and Wait: A process waits for some resources while holding another resource at the
same time.
• No pre-emption: The process which once scheduled will be executed till the completion.
No other process can be scheduled by the scheduler meanwhile.
• Circular Wait: All the processes must be waiting for the resources in a cyclic manner so
that the last process is waiting for the resource which is being held by the first process.
Starvation in a database management system (DBMS) is when a transaction or process is
repeatedly delayed or stalled because it cannot access the resources it needs to move
forward. When other transactions or processes take precedence over the one that is starving,
this might occur.
Reasons for Starvation –
If the waiting scheme for locked items is unfair. ( priority queue )
Victim selection. (the same transaction is selected as a victim repeatedly )
Resource leak.
Via denial-of-service attack.
Solutions of Starvation –
1. Increasing Priority – Starvation occurs when a transaction has to wait for an indefinite
time, In In this situation, we can increase the priority of that particular transaction/s. But
the drawback with this solution is that it may happen that the other transaction may have
to wait longer until the highest priority transaction comes and proceeds.
2. Modification in Victim Selection algorithm – If a transaction has been a victim of
repeated selections, then the algorithm can be modified by lowering its priority over other
transactions.
3. First Come First Serve approach – A fair scheduling approach i.e FCFS can be adopted,
In which the transaction can acquire a lock on an item in the order, in which the requested
the lock.
4. Wait for the die and wound wait scheme – These are the schemes that use the timestamp
ordering mechanism of transaction.
Database Recovery Techniques
Database Systems like any other computer system, are subject to failures but the data stored
in them must be available as and when required. When a database fails it must possess the
facilities for fast recovery. It must also have atomicity i.e. either transactions are completed
successfully and committed (the effect is recorded permanently in the database) or the
transaction should have no effect on the database.
Types of Recovery Techniques in DBMS
Database recovery techniques are used in database management systems (DBMS) to restore
a database to a consistent state after a failure or error has occurred. The main goal of
recovery techniques is to ensure data integrity and consistency and prevent data loss.
There are mainly two types of recovery techniques used in DBMS
• Rollback/Undo Recovery Technique
• Commit/Redo Recovery Technique
• CheckPoint Recovery Technique
Rollback/Undo Recovery Technique
The rollback/undo recovery technique is based on the principle of backing out or undoing the
effects of a transaction that has not been completed successfully due to a system failure or
error. This technique is accomplished by undoing the changes made by the transaction using
the log records stored in the transaction log. The transaction log contains a record of all the
transactions that have been performed on the database. The system uses the log records to
undo the changes made by the failed transaction and restore the database to its previous state.
Commit/Redo Recovery Technique
The commit/redo recovery technique is based on the principle of reapplying the changes
made by a transaction that has been completed successfully to the database. This technique is
accomplished by using the log records stored in the transaction log to redo the changes made
by the transaction that was in progress at the time of the failure or error. The system uses the
log records to reapply the changes made by the transaction and restore the database to its
most recent consistent state.
Checkpoint Recovery Technique
Checkpoint Recovery is a technique used to improve data integrity and system stability,
especially in databases and distributed systems. It entails preserving the system’s state at
regular intervals, known as checkpoints, at which all ongoing transactions are either
completed or not initiated. This saved state, which includes memory and CPU registers, is
kept in stable, non-volatile storage so that it can withstand system crashes. In the event of a
breakdown, the system can be restored to the most recent checkpoint, which reduces data
loss and downtime. The frequency of checkpoint formation is carefully regulated to decrease
system overhead while ensuring that recent data may be restored quickly.
Overall, recovery techniques are essential to ensure data consistency and availability
in Database Management System, and each technique has its own advantages and limitations
that must be considered in the design of a recovery system.
Database Systems
There are both automatic and non-automatic ways for both, backing up data and recovery
from any failure situations. The techniques used to recover lost data due to system crashes,
transaction errors, viruses, catastrophic failure, incorrect command execution, etc. are
database recovery techniques. So to prevent data loss recovery techniques based on deferred
updates and immediate updates or backing up data can be used. Recovery techniques are
heavily dependent upon the existence of a special file known as a system log. It contains
information about the start and end of each transaction and any updates which occur during
the transaction. The log keeps track of all transaction operations that affect the values of
database items. This information is needed to recover from transaction failure.
• The log is kept on disk start_transaction(T): This log entry records that transaction T
starts the execution.
• read_item(T, X): This log entry records that transaction T reads the value of database
item X.
• write_item(T, X, old_value, new_value): This log entry records that transaction T
changes the value of the database item X from old_value to new_value. The old value is
sometimes known as a before an image of X, and the new value is known as an
afterimage of X.
• commit(T): This log entry records that transaction T has completed all accesses to the
database successfully and its effect can be committed (recorded permanently) to the
database.
• abort(T): This records that transaction T has been aborted.
• checkpoint: A checkpoint is a mechanism where all the previous logs are removed from
the system and stored permanently in a storage disk. Checkpoint declares a point before
which the DBMS was in a consistent state, and all the transactions were committed.
A transaction T reaches its commit point when all its operations that access the database
have been executed successfully i.e. the transaction has reached the point at which it will
not abort (terminate without completing). Once committed, the transaction is permanently
recorded in the database. Commitment always involves writing a commit entry to the log and
writing the log to disk. At the time of a system crash, the item is searched back in the log for
all transactions T that have written a start_transaction(T) entry into the log but have not
written a commit(T) entry yet; these transactions may have to be rolled back to undo their
effect on the database during the recovery process.
• Undoing: If a transaction crashes, then the recovery manager may undo transactions i.e.
reverse the operations of a transaction. This involves examining a transaction for the log
entry write_item(T, x, old_value, new_value) and setting the value of item x in the
database to old-value. There are two major techniques for recovery from non-catastrophic
transaction failures: deferred updates and immediate updates.
• Deferred Update: This technique does not physically update the database on disk until a
transaction has reached its commit point. Before reaching commit, all transaction updates
are recorded in the local transaction workspace. If a transaction fails before reaching its
commit point, it will not have changed the database in any way so UNDO is not needed.
It may be necessary to REDO the effect of the operations that are recorded in the local
transaction workspace, because their effect may not yet have been written in the database.
Hence, a deferred update is also known as the No-undo/redo algorithm.
• Immediate Update: In the immediate update, the database may be updated by some
operations of a transaction before the transaction reaches its commit point. However,
these operations are recorded in a log on disk before they are applied to the database,
making recovery still possible. If a transaction fails to reach its commit point, the effect of
its operation must be undone i.e. the transaction must be rolled back hence we require
both undo and redo. This technique is known as undo/redo algorithm.
• Caching/Buffering: In this one or more disk pages that include data items to be updated
are cached into main memory buffers and then updated in memory before being written
back to disk. A collection of in-memory buffers called the DBMS cache is kept under the
control of DBMS for holding these buffers. A directory is used to keep track of which
database items are in the buffer. A dirty bit is associated with each buffer, which is 0 if
the buffer is not modified else 1 if modified.
• Shadow Paging: It provides atomicity and durability. A directory with n entries is
constructed, where the ith entry points to the ith database page on the link. When a
transaction began executing the current directory is copied into a shadow directory. When
a page is to be modified, a shadow page is allocated in which changes are made and when
it is ready to become durable, all pages that refer to the original are updated to refer new
replacement page.
• Backward Recovery: The term ” Rollback ” and ” UNDO ” can also refer to backward
recovery. When a backup of the data is not available and previous modifications need to
be undone, this technique can be helpful. With the backward recovery method, unused
modifications are removed and the database is returned to its prior condition. All
adjustments made during the previous traction are reversed during the backward
recovery. In other words, it reprocesses valid transactions and undoes the erroneous
database updates.
• Forward Recovery: “ Roll forward “and ” REDO ” refers to forwarding recovery.
When a database needs to be updated with all changes verified, this forward recovery
technique is helpful. Some failed transactions in this database are applied to the database
to roll those modifications forward. In other words, the database is restored using
preserved data and valid transactions counted by their past saves.
Backup Techniques
There are different types of Backup Techniques. Some of them are listed below.
• Full database Backup: In this full database including data and database, Meta
information needed to restore the whole database, including full-text catalogs are backed
up in a predefined time series.
• Differential Backup: It stores only the data changes that have occurred since the last full
database backup. When some data has changed many times since the last full database
backup, a differential backup stores the most recent version of the changed data. For this
first, we need to restore a full database backup.
• Transaction Log Backup: In this, all events that have occurred in the database, like a
record of every single statement executed is backed up. It is the backup of transaction log
entries and contains all transactions that had happened to the database. Through this, the
database can be recovered to a specific point in time. It is even possible to perform a
backup from a transaction log if the data files are destroyed and not even a single
committed transaction is lost.
Unit-5
Database Security
Database security in DBMS is a technique for protecting and securing a database from
intentional or accidental threats. Security considerations will apply not only to the data
stored in an organization's database: a breach of security may impact other aspects of the
system, which may ultimately affect the database structure. As a result, database
security encompasses hardware parts, software parts, human resources, and data.
To use the security efficiently, appropriate controls are required, which are separated into a
specific goal and purpose for the system. The demand for effective security, which was
frequently neglected or overlooked in the past, is now being rigorously verified by many
businesses.
We consider database security in the following scenarios:
Theft and fraudulent.
Loss of Data privacy.
Loss of Data integrity.
Loss of confidentiality or secrecy
Loss of availability of data.
These above-stated conditions generally represent the areas where the organization should
focus on lowering the risk, which is the possibility of data loss or destruction within a
database. Since all of the data inside an organization is interrelated, an activity that results
in a loss in one area may also lead to a loss in another.
Why Database Security is Important?
Security is an important concern in database management because the information stored in a
database is a very valuable and, at times, quite sensitive commodity. As a result, data in a
database management system must be protected from abuse and illegal access and updates.
Compromise of intellectual property: Our intellectual property—trade secrets,
inventions, or unique methods—could be essential for our ability to sustain an advantage
in our industry. If our intellectual property is stolen or leaked, then we will lose our
competitive advantage and it may be difficult to maintain or recover.
The reputational harm is done to our brand: Customers or partners may refuse to buy
goods or services from us (or do business with us) if they do not believe they can trust our
company to protect their data or their own.
The concept of business continuity (or lack of it): Some businesses are unable to
operate until a breach has been resolved.
Penalties or fines to be paid for failure: The cost of failing to comply with international
regulations such as the Sarbanes-Oxley Act (SAO) or Payment Card Industry Data
Security Standard (PCI DSS) specific to industry regulations on data privacy, such as
HIPAA, or regional privacy laws like the European Union's General Data Protection
Regulation (GDPR) could be significant, with fines exceeding many millions of dollars in
the worst-case scenario.
Costs of correcting breaches and notifying consumers about them: Along with
notifying customers of a breach, the organization that was breached must fund the
investigation and forensic services such as crisis management, triage repairs to the
affected systems, and much more.
Database Security Threats
Many software vulnerabilities, misconfigurations, or practices of misuse or carelessness
could lead to breaches. The following are some of the most well-known causes and types of
database security cyber threats.
1) SQL/NoSQL Injection Attacks
It is a type of attack that occurs when a malicious code is injected into frontend (web) apps
and then transmitted to the backend database. SQL injections provide hackers with
unrestricted access to any data saved in a database. There are two types of such computer
attacks: SQL injection attacks on traditional databases and NoSQL injection attacks on
large data databases. Typically, these are queries generated as an extension of online
application forms or received via HTTP requests. Any database system is vulnerable to these
attacks if developers do not follow secure coding practices and the organization does not
conduct regular vulnerability testing.
Countermeasures:
Direct queries should be replaced with stored procedures.
The MVC Architecture must be implemented.
2) Malware
Malware is software designed to corrupt data or harm a database. Malware could enter your
system via any endpoint device connected to the database's network and exploit
vulnerabilities in your system. Malware protection is important on any endpoint, but it is
particularly necessary on database servers due to their high value and
sensitivity. Examples of common malware include spyware, Trojan viruses, viruses, worms,
adware, and ransomware.
Countermeasures:
Safeguard any endpoints on your databases. Install specialized malware protection
software and spread awareness among users on risk and preventative measures.
3) Lack of Security Expertise and Education
Databases are breached and leaked due to insufficient level of IT security expertise and
education of non-technical employees, who may violate basic database security standards
and endanger databases. IT security employees may also lack the necessary expertise to
create security controls, enforce rules, or execute incident response processes.
Countermeasures:
Database users must be trained in database security.
IT security professionals will be encouraged to advance their professional level and
qualifications.
4) Denial of Service (DoS/DDoS) Attacks
In a denial of service (DoS) attack, the cybercriminal uses a huge number of fake requests
to overwhelm the target service—in this case, the database server. As a result, the server
cannot handle legitimate requests from actual users and frequently crashes or becomes
unstable.
A DoS (Denial of Service) attack causes a database server to slow down and may render it
unavailable to all users. Even though a DoS attack does not reveal the contents of a database,
it can cost victims a significant amount of time and money. Moreover, what is the use of
having a database if you can't use or access it?
In a distributed denial of service (DDoS) attack, fake traffic is generated by a large number
of computers that are part of an attacker-controlled botnet. This results in extremely high
traffic volumes, which are difficult to stop without a highly scalable defensive architecture.
Cloud-based DDoS prevention services can dynamically scale up to deal with massive DDoS
attacks.
Countermeasures:
Harden the TCP/IP stack by increasing the size of the TCP connection queue with the
appropriate registry settings.
Reduce the time it takes to establish a connection.
Use dynamic backlog methods to ensure that the connection queue is never full.
Use an Intrusion Detection System (IDS) on your network.
5) Exploitation of Database Software Vulnerabilities
Attackers are continuously attempting to isolate and target software vulnerabilities, and
database management software is a particularly desirable target. New vulnerabilities are
identified on a daily basis, and security updates are issued regularly by all open-source
database management platforms and commercial database software manufacturers. However,
if you do not apply these changes immediately, your database may be vulnerable to attack.
Even if you deploy patches on time, there is always the risk of zero-day attacks, which occur
when attackers find a vulnerability that the database vendor has not yet found and patched.
Countermeasures:
Encrypt any sensitive information in your database (s).
Apply the necessary database controls and permissions.
Conduct a regular search for new sensitive data in your databases. You may
accomplish this very successfully with the Periodic Data Discovery tool and
Compliance Manager, which will automatically discover and secure newly uploaded
sensitive data.
6) Excessive Database Privileges
Database users in DBMS may have varying levels of access. However, users may abuse
them, and the three basic categories of privilege abuse are as follows: excessive privilege
abuse, legitimate privilege abuse, and unused privilege abuse. Excessive privileges always
introduce unnecessary risks. According to statistics, 80% of attacks on company databases
are carried out by current or former workers.
Countermeasures:
It is recommended that a strict access and privileges control policy be implemented
and enforced.
Don't give staff too many privileges, and revoke outdated privileges as soon as
possible.
7) Weak Audit Trail
If a database is not audited, it represents a risk of noncompliance with sensitive data
protection rules at the national and international levels. All database events must be
automatically logged and registered, and automatic auditing solutions must be used. Failure
or unwillingness to do so represents a major risk on multiple levels.
Countermeasures:
Use automatic auditing solutions that have no impact on database performance.
Control Measures for the Security of Data in Databases
The following are the key control measures used to ensure data security in databases:
Authentication
Authentication is the process of confirming whether a user logs in only with the rights
granted to him to undertake database operations. A certain user can only log in up to
his privilege level, but he cannot access any other sensitive data.
The ability to access sensitive data is restricted by the use of authentication. For
example, a mobile phone performs authentication by requesting a PIN, fingerprint, or
by face recognition. Similarly, a computer verifies a username by requesting the
appropriate password.
However, in the context of databases, authentication takes on a new dimension
because it can occur at multiple levels. It can be done by the database itself, or the
configuration can be adjusted to allow the operating system or another external means
to authenticate users.
For example, when creating a database in Microsoft's SQL Server a user must specify
whether to use database authentication, operating system authentication, or both (the
so-called mixed-mode authentication). Other databases that prioritize security use
near-foolproof authentication methods such as fingerprint recognition and retinal
scanning.
By using various authentication technologies such as biometrics for retina and
fingerprints, you can protect your data from unauthorized/malicious users.
Access Control
Database access control is a means of restricting access to sensitive company data to
only those people (database users) who are authorized to access such data and
permitting access to unauthorized persons. It is a key security concept that reduces risk
to the business or organization.
Physical and logical access control are the two types of access control. Access to
campuses, buildings, rooms, and physical IT assets is restricted through physical
access control. Connections to computer networks, system files, and data are restricted
through logical access control.
To safeguard a facility, corporations use electronic access control systems to track
employee access to restricted company locations and private regions, such as data
centers, using user credentials, access card readers, auditing, and reports. Some of
these systems include access control panels to restrict access to rooms and buildings,
as well as alarms and lockdown features to prevent unauthorized access or operations.
Logical access control systems execute user and entity identification, authentication,
and authorization by evaluating needed login credentials, which can include
passwords, personal identification numbers, biometric scans, security tokens, or other
authentication factors. Multifactor authentication (MFA), which needs two or more
authentication factors, is frequently used as part of a layered defense to safeguard
access control systems.
The most well-known Database Access Control examples are:
o Discretionary Access Control (DAC): The data owner grants access to DAC
models. DAC is a method for assigning access rights based on rules defined by
the user.
o Mandatory Access Control (MAC): In MAC, people are permitted access
based on an information clearance, designed using a nondiscretionary paradigm.
MAC refers to a policy that assigns access permissions based on central
authority regulations.
o Role-Based Access Control (RBAC): RBAC uses fundamental security
principles like “least privilege” and “separation of privilege” to give access
depending on a user’s role. As a result, someone wanting to access information
can only access the data required for their function.
o Attribute-Based Access Control (ABAC): Each resource and user in ABAC
receives a set of attributes. This dynamic approach makes a judgment on
resource access based on comparing the user’s features, such as time of day,
position, and location.
Inference Control
Inference control in databases, also known as Statistical Disclosure Control (SDC), is a
discipline that aims to secure data so that it can be published without disclosing sensitive
information associated with specific individuals among those to whom the data
corresponds.
It prevents the user from completing any inference channel. This strategy prevents
sensitive information from indirect disclosure. There are two kinds of inferences: identity
disclosure and attribute disclosure.
SDC is used to protect the privacy of respondents in fields such as government statistics,
health statistics, e-commerce (sharing of customer data), etc. Since data modification
ultimately implies data protection, SDC aims to achieve protection with the minimum
amount of accuracy loss for database users.
Flow Control
Distributed systems involve a large amount of data flow from one site to another as well
as within a site. Flow control prohibits data from being transferred in such a way that
unauthorized agents cannot access it.
A flow policy specifies the channels through which data can flow. It also defines security
classes for data as well as transactions. Convert channels are the pathways for
information to flow implicitly in ways that violate a company's privacy policy.
Database Security Applying Statistical Method
Statistical database security focuses on the protection of sensitive individual values stored
in so-called statistical databases and used for statistical purposes, as well as retrieving
summaries of values based on categories. They do not allow the retrieval of individual
information.
Examples include patient records utilized by medical researchers and extensive phone call
records statistically examined by phone companies to improve their services.
It provides access to the database to obtain statistical information about the number of
employees in the company but not to obtain detailed confidential/personal information
about a specific individual employee.
The techniques used to prevent statistical database compromise are classified into two
types: noise addition, in which all data and/or statistics are available but are only
approximate rather than exact, and restriction, in which the system only delivers statistics
and/or data that are deemed safe.
Encryption
Data encryption protects data confidentiality by converting it to encoded information
known as ciphertext, which can only be decoded with a unique decryption key generated
either during or before encryption.
Data encryption can be used during data storage or transfer, and it is usually used in
conjunction with authentication services to ensure that keys are only given to or used by
authorized users.
Data is more accessible and desirable to attackers than ever before, increasing the need
for security. Additionally, many firms must comply with data protection regulations,
many of which specifically require the use of encryption.
Aside from the clear advantages of enhanced security, privacy protection, and the
avoidance of illegal access, encryption also helps to assure data integrity. Encryption
protects content against unauthorized change and can be used to validate the origin and
authenticity of data.
Data Protection Tools and Platforms
Several companies now offer data protection platforms and tools. A comprehensive approach
should include the following features:
Discovery: The ability to discover is often required to meet regulatory compliance
requirements. Look for a solution that can discover and classify vulnerabilities in our
databases, whether they're in the cloud or on-premises. It will also offer recommendations
to address any discovered vulnerabilities.
Data Activity Monitoring: The solution should be capable of monitoring and analyzing
all data activity in all databases, regardless of whether our application is on-premises, in
the cloud, or inside a container. It will notify us of suspicious activities in real-time,
allowing us to respond to threats more rapidly. It also provides visibility into the current
state of our data via an integrated and comprehensive user interface. It is also important to
select a system that enforces regulations, procedures, and the division of roles. Be
sure that the solution we select can generate the reports we require to comply with the
rules.
Data Tokenization and Encryption: In the case of an incident, encryption serves as an
additional line of protection against any breach. Any software we deploy must be capable
of protecting data in the cloud, on-premises hybrid, or multi-cloud environments. Find a
tool that has volume, file, and application encryption features that meet our company's
compliance requirements. Tokenization (data concealment) or advanced key management
of security keys may be required.
Data Security and Risk Analysis Optimization: An application that provides contextual
insights by combining security data with advanced analytics will allow the user to
perform optimizing, risk assessment, and reporting a breeze. Select a technology that can
store and combine vast amounts of recent and historical data regarding the security and
health of your databases. Also, choose a solution that allows for data analysis, auditing,
and reporting capabilities via a comprehensive but user-friendly self-service interface.
Unit-6
An object-oriented database (OODB) is a database that combines object-oriented
programming concepts with relational database principles. It is managed by an object-
oriented database management system (OODBMS). OODBs contain the following elements:
Objects. The basic building block and an instance of a class. The type is either
built-in or user-defined.
Classes. A schema or blueprint that defines object structure and behavior.
Methods. A blueprint that defines the behavior of a class.
Pointers. An entity that helps access elements of an object database. They also
help establish relationships between objects.
The main feature of objects is the possibility of user-constructed types. An object created in
a programming language (such as Java) saves into an OODB as is. All the information about
an object is in a single package instead of spread across multiple tables.
Object-Oriented Programming Concepts in Database Management
Object-oriented databases closely relate to object-oriented programming (OOP) concepts.
The four main features of OOP are:
Polymorphism.
Inheritance.
Encapsulation.
Abstraction.
These four attributes describe the main characteristics of object-oriented management
systems in general. The sections below explain each of these concepts in greater depth.
Polymorphism
Polymorphism is the capability of an object to take multiple forms. This ability allows the
same program code to work with different data types.
To illustrate, a Vehicle class can be defined to have a method called brake(). A Car and
a Bike can both inherit from the class and implement their version of the brake() method.
The same method is applied for different behaviors, resulting in polymorphism.
Inheritance
Inheritance creates a hierarchical relationship between related classes while making parts of
code reusable. Defining new types inherits all the existing class fields and methods plus
further extends them. The existing class is the parent class, while the child class extends the
parent.
For example, a parent class called Vehicle can have child classes such as Car and Bike.
Both child classes inherit information from the parent class. They also extend the parent
class with new information depending on the methods defined for each vehicle type.
Encapsulation
Encapsulation allows grouping variables and methods into a single object to create access
protection. This process hides information and details of how an object works from the rest
of the code and results in data and function security.
Classes interact with each other through interface methods without the need to know how
particular methods work.
For example, a Car class can have properties such as color, make, and model and methods
such as changeColor(). You can change the color of a car through a method, yet
the model and make are not accessible. Encapsulation bundles all the car information into
one entity, where some elements are modifiable while some are not.
Abstraction
Abstraction is the process of focusing on the essential characteristics to
provide functionality. The process selects vital information while unnecessary details stay
hidden. Abstraction helps reduce data complexity and simplifies code reusability.
For example, when a web browser connects to the internet, it doesn't need to know the
specific connection details. Whether the connection is established through Wi-
Fi or Ethernet is irrelevant. The specific connection type is hidden from the browser to
create an abstraction, whereas the various types of connections represent different
implementations of the abstraction.
Features of Object-Oriented Databases
OODB has different features and implementations. Most contain the attributes described in
the table below:
Feature Description
Query Language Finds objects and retrieve data from the database.
ACID Ensures that ACID transactions, and guarantees all transactions are
Transactions completed without conflicting changes.
Basis of
Comparison Data Warehousing Data Mining
A data warehouse is a
database system that is
Data mining is the process of analyzing
Definition designed for analytical
data patterns.
analysis instead of
transactional work.
Subject-oriented,
AI, statistics, databases, and machine
integrated, time-varying and
Functionality learning systems are all used in data
non-volatile constitute data
mining technologies.
warehouses.