Understanding the ER Model in Databases
Understanding the ER Model in Databases
Sign In
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Introduction of ER Model
Last Updated : 09 Sep, 2025
Components of ER Diagram
Gather the requirements (functional and data) by asking questions to the database
users.
Create a logical or conceptual design of the database. This is where ER model plays a
role. It is the most used graphical representation of the conceptual design of a database.
After this, focus on Physical Database Design (like indexing) and external design (like
views)
What is an Entity?
An Entity represents a real-world object, concept or thing about which data is stored in a
database. It act as a building block of a database. Tables in relational database represent
these entities.
Example of entities:
The entity type defines the structure of an entity, while individual instances of that type
represent specific entities.
Entity Set
We can represent the entity sets in an ER Diagram but we can't represent individual
entities because an entity is like a row in a table, and an ER diagram shows the structure
and relationships of data, not specific data entries (like rows and columns). An ER diagram
is a visual representation of the data model, not the actual data itself.
Types of Entity
There are two main types of entities:
1. Strong Entity
A Strong Entity is a type of entity that has a key Attribute that can uniquely identify each
instance of the entity. A Strong Entity does not depend on any other Entity in the Schema
for its identification. It has a primary key that ensures its uniqueness and is represented by
a rectangle in an ER diagram.
2. Weak Entity
A Weak Entity cannot be uniquely identified by its own attributes alone. It depends on a
strong entity to be identified. A weak entity is associated with an identifying entity (strong
entity), which helps in its identification. A weak entity are represented by a double
rectangle. The participation of weak entity types is always total. The relationship between
the weak entity type and its identifying strong entity type is called identifying relationship
and it is represented by a double diamond.
Example:
Attributes in ER Model
Attributes are the properties that define the entity type. For example, for a Student entity
Roll_No, Name, DOB, Age, Address, and Mobile_No are the attributes that define entity
type Student. In ER diagram, the attribute is represented by an oval.
Attribute
Types of Attributes
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key
attribute. For example, Roll_No will be unique for each student. In ER diagram, the key
attribute is represented by an oval with an underline.
Key Attribute
2. Composite Attribute
Composite Attribute
3. Multivalued Attribute
An attribute consisting of more than one value for a given entity. For example, Phone_No
(can be more than one for a given student). In ER diagram, a multivalued attribute is
represented by a double oval.
Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a
derived attribute. e.g.; Age (can be derived from DOB). In ER diagram, the derived attribute
is represented by a dashed oval.
Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following
relationship set depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in
C3.
Relationship Set
Unary Relationship
2. Binary Relationship: When there are TWO entities set participating in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.
Binary Relationship
3. Ternary Relationship: When there are three entity sets participating in a relationship,
the relationship is called a ternary relationship.
4. N-ary Relationship: When there are n entities set participating in a relationship, the
relationship is called an n-ary relationship.
Cardinality in ER Model
The maximum number of times an entity of an entity set participates in a relationship set is
known as cardinality.
1. One-to-One
When each entity in each entity set can take part only once in the relationship, the
cardinality is one-to-one. Let us assume that a male can marry one female and a female
can marry one male. So the relationship will be one-to-one.
2. One-to-Many
In one-to-many mapping as well where each entity can be related to more than one entity.
Let us assume that one surgeon department can accommodate many doctors. So the
Cardinality will be 1 to M. It means one department has many Doctors.
3. Many-to-One
When entities in one entity set can take part only once in the relationship set and entities in
other entity sets can take part more than once in the relationship set, cardinality is many to
one.
Let us assume that a student can take only one course but one course can be taken by
many students. So the cardinality will be n to 1. It means that for one course there can be n
students but for one student, there will be only one course.
In this case, each student is taking only 1 course but 1 course has been taken by many
students.
4. Many-to-Many
When entities in all entity sets can take part more than once in the relationship cardinality
is many to many. Let us assume that a student can take more than one course and one
course can be taken by many students. So the relationship will be many to many.
In this example, student S1 is enrolled in C1 and C3 and Course C3 is enrolled by S1, S3,
and S4. So it is many-to-many relationships.
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation: Each entity in the entity set must participate in the relationship. If
each student must enroll in a course, the participation of students will be total. Total
participation is shown by a double line in the ER diagram.
2. Partial Participation: The entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled by any of the students, the participation in the
course will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
Every student in the Student Entity set participates in a relationship but there exists a
course C4 that is not taking part in the relationship.
2. Identify Relationships: The next step is to identify the relationship between them and
represent them accordingly using the Diamond shape. Ensure that relationships are not
directly connected to each other.
3. Add Attributes: Attach attributes to the entities by using ovals. Each entity can have
multiple attributes (such as name, age, etc.), which are connected to the respective entity.
4. Define Primary Keys: Assign primary keys to each entity. These are unique identifiers
that help distinguish each instance of the entity. Represent them with underlined attributes.
6. Review for Clarity: Review the diagram make sure it is clear and effectively conveys the
relationships between the entities.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and
Generalization and Specialization are two essential ideas used to describe the hierarchical
connections between things in a database in the context of Enhanced Entity-Relationship
(EER) diagrams. The aforementioned principles facilitate the organization and structuring
of data by building connections among various entity levels. While specialization is the act
of breaking down a higher-level entity into more focused, lower-level entities,
generalization is integrating lower-level entities into a higher-level entity. Comprehending
these ideas is essential for efficient database administration and design.
What is Generalization?
In EER diagrams, generalization is a bottom-up method used to combine lower-level
entities into a higher-level object. This approach creates a more generic entity, known as a
superclass, by combining entities with similar features. By removing duplication and
arranging the data in a more organized manner, generalization streamlines the data model.
Advantages of Generalization
Disadvantages of Generalization
Loss of Specificity: The generic entity may take center stage over the distinctive
qualities of lower-level entities.
Complexity of Querying: As data becomes more abstracted, queries may get more
complicated.
Example of Generalization
Consider two entities Student and Patient. These two entities will have some
characteristics of their own. For example, the Student entity will have Roll_No, Name, and
Mob_No while the patient will have PId, Name, and Mob_No characteristics. Now in this
example Name and Mob_No of both Student and Patient can be combined as a Person to
form one higher-level entity and this process is called as Generalization Process.
What is Specialization?
In EER diagrams, specialization is a top-down method where a higher-level entity is split
into two or more lower-level entities according to their distinct qualities. This technique,
which includes splitting a single entity set into subgroups, is often connected to
inheritance, in which attributes from the higher-level entity are passed down to the lower-
level entities.
Advantages of Specialization
Disadvantages of Specialization
Expands Schema Size: Adding additional entities may lead to an increase in the
schema's complexity and size.
Can Cause Redundancy: There might be certain characteristics that are duplicated
across specialized entities.
Example of Specialization
Consider an entity Account. This will have some attributes consider them Acc_No and
Balance. Account entity may have some other attributes like Current_Acc and
Savings_Acc. Now Current_Acc may have Acc_No, Balance and Transactions while
Savings_Acc may have Acc_No, Balance and Interest_Rate henceforth we can say that
specialized entities inherits characteristics of higher level entity.
After applying generalization and specialization, the structure of resultant figures are
same.
Conclusion
In database design, generalization and specialization are both crucial ideas for producing
effective and well-organized data structures. Generalization reduces duplication and
simplifies the schema by combining related things into a higher-level object. To improve
specificity and facilitate inheritance, specialization, on the other hand, splits a higher-level
object into smaller, more focused entities. Although they work in different ways—top-
down for specialization and bottom-up for generalization—the end objective is to develop
an adaptable and effective data model that satisfies certain organizational requirements.
Explore
Basics of DBMS
Relational Algebra
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Enhanced ER Model
Last Updated : 09 Sep, 2025
As data complexity grows, the traditional ER model becomes less effective for database
modeling. Enhanced ER diagrams extend the basic ER model to better represent complex
applications. They support advanced concepts like subclasses, generalization,
specialization, aggregation and categories.
ER Model
Entities in a database.
Attributes that they had.
Relationships between them.
Superclass is a higher-level entity set that has common attributes. Subclass is a lower-
level entity set that inherits attributes and relationships from its superclass but also has its
own specific attributes or relationships.. This supports the concept of inheritance, where a
subclass automatically possesses the features of the superclass.
Example: Science is a Super class which has subclasses like Physics, Chemistry,
Biology.
Example: Here we have three sets of employees: Secretary, Technician and Engineer. The
employee is a super-class of the rest three sets of individual sub-class is a subset of
Employee set.
Employee Set
Enhanced ER Model
Total: Every entity in the superclass must be in at least one subclass (e.g., every
employee is either salaried or hourly).
Partial: Some entities may not belong to any subclass (e.g., not all employees are a
secretary, engineer or technician).
Total subclassing means complete coverage while, partial means incomplete coverage.
A Category (or Union Type) is a subclass that is derived from two or more superclasses
that may not be related. It allows the model to represent an entity that can be a member of
more than one entity set.
Example: Set of Library Members is UNION of Faculty, Student and Staff. A union
relationship indicates either type; for example, a library member is either Faculty or Staff or
Student. Below are two examples that show how UNION can be depicted in ERD - Vehicle
Owner is UNION of PERSON and Company andRTO Registered Vehicle is UNION of Car
and Truck.
There might be some confusion in Sub-class and UNION. Consider an example in above
figure Vehicle is super-class of CAR and Truck. In the example, Vehicle is a superclass of
Car and Truck, which normally implies inheritance of attributes. However, in the RTO-
registered case, Car and Truck form a union without inheriting from Vehicle, each has
independent attributes.
In the EER model, subclasses inherit all attributes and relationships of their superclasses.
This supports reusability and data consistency, as common attributes don’t need to be
redefined. An entity can be a sub-class of multiple entity types such entities are sub-class
of multiple entities and have multiple super-classes. In multiple inheritances, attributes of
sub-class are the union of attributes of all super-classes.
Example: If Employee has attributes like Name and ID, all subclasses like Manager or
Engineer will automatically have these, in addition to their own unique attributes (like
Department or Project).
Subtypes and Supertypes: The EER model allows for the creation of subtypes and
supertypes where a supertype represents general attributes and subtypes represent
specialized entities .
Generalization and Specialization: Generalization is the process of identifying common
attributes and combines common features into a supertype, while Specialization is the
process of defining subtypes with unique attributes from a supertype.
Inheritance: Inheritance is a mechanism that allows subtypes to inherit attributes and
relationships from their supertype.
Constraints: The EER model allows for the specification of constraints that must be
satisfied by entities and relationships.
Subclasses and Superclasses: EER model allows for the creation of a hierarchical
structure of entities.
Attribute Inheritance: EER model allows attributes to be inherited from a superclass to
its subclasses.
Union Types: EER model allows for the creation of a union type, which is a combination
of two or more entity types.
Aggregation: EER model allows for the creation of an aggregate entity that represents a
group of entities as a single entity.
Multi-valued Attributes: EER model allows an attribute to have multiple values for a
single entity instance.
Relationships with Attributes: EER model allows relationships between entities to
have attributes. These attributes can describe the nature of the relationship or provide
additional information about the relationship.
Overall, these features make the EER model more expressive and powerful than the
traditional ER model, allowing a more accurate representation of complex relationships
between entities.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
The relationship between sub class and super class is denoted with symbol.
1. Super Class
Super class is an entity type that has a relationship with one or more subtypes.
An entity cannot exist in database merely by being member of any super class.
For example: Shape super class is having sub groups as Square, Circle, Triangle.
2. Sub Class
Sub class is a group of entities with unique attributes.
Sub class inherits properties and attributes from its super class.
For example: Square, Circle, Triangle are the sub class of Shape super class.
Hence, as part of the Enhanced ER Model, along with other improvements, three new concepts
were added to the existing ER Model, they were:
1. Generalization
2. Specialization
3. Aggregration
1. Generalization
Generalization is a bottom-up approach in which two lower level entities combine to form
a higher level entity. In generalization, the higher level entity can also combine with other lower
level entities to make further higher level entity.
It's more like Superclass and Subclass system, but the only difference is the approach,
which is bottom-up. Hence, entities are combined to form a more generalised entity, in other
words, sub-classes are combined to form a super-class.
For example, Saving and Current account types entities can be generalised and an entity with
name Account can be created, which covers both.
2. Specialization
Specialization is a process that defines a group entities which is divided into sub groups
based on their characteristic.
It is a top down approach, in which one higher entity can be broken down into two
lower level entity.
It maximizes the difference between the members of an entity by identifying the
unique characteristic or attributes of each member.
It defines one or more sub class for the super class and also forms the
superclass/subclass relationship.
3. Aggregation
Aggregation is a process that represent a relationship between a whole object and its
component parts.
It abstracts a relationship between objects and viewing the relationship as an object.
It is a process when two entity is treated as a single entity.
In the above example, the relation between College and Course is acting as an Entity in
Relation with Student. In the diagram above, the relationship between Center and Course
together, is acting as an Entity, which is in relationship with another entity Visitor. Now in real
world, if a Visitor or a Student visits a Coaching Center, he/she will never enquire about the
center only or just about the course, rather he/she will ask enquire about both.
Category or Union
Category represents a single super class or sub class relationship with more than one super
class.
It can be a total or partial participation.
For example Car booking, Car owner can be a person, a bank (holds a possession on a Car) or a
company. Category (sub class) → Owner is a subset of the union of the three super classes →
Company, Bank, and Person. A Category member must exist in at least one of its super classes.
An entity belonging to a sub-class is related to some super-class entity. For instance emp,
no 1001 is a secretary, and his typing speed is 68. Emp no 1009 is an engineer (sub-class)
and her trade is “Electrical”, so forth.
Sub-class entity “inherits” all attributes of super-class; for example, employee 1001 will
have attributes eno, name, salary, and typing speed.
Databases SQL MySQL PostgreSQL PL/SQL MongoDB SQL Cheat Sheet SQL Interview Questions MySQL Interview Questions PL/SQL Interview Questions
The SQL UNION operator is used to combine the result sets of two or more SELECT
queries into a single result set. It is a powerful tool in SQL that helps aggregate data from
multiple tables, especially when the tables have similar structures.
In this guide, we'll explore the SQL UNION operator, how it differs from UNION ALL, and
provide detailed examples to demonstrate its usage.
There are some rules for using the SQL UNION operator.
Each table used within UNION must have the same number of columns.
The columns must have the same data types.
The columns in each table must be in the same order.
Syntax:
UNION operator provides unique values by default. To find duplicate values, use UNION
ALL.
Note: SQL UNION and UNION ALL difference is that UNION operator removes
duplicate rows from results set and
UNION ALL operator retains all rows, including duplicate.
Emp1 Table
Output:
Emp1 Table
Emp2 Table
Output:
Emp2 Table
In this example, we will find the cities (only unique values) from both the "Table1" and the
"Table2" tables:
Query:
Output:
output
In the below example, we will find the cities (duplicate values also) from both the "Emp1"
and the "Emp2" tables:
Query:
Output:
Country
Australia
Austria
England
France
India
India
Ireland
Spain
Spain
Sri lanka
The following SQL statement returns the cities (duplicate values also) from both the
"Geeks1" and the "Geeks2" tables:
Query:
Output:
output
The SQL UNION operator combines the result sets of two or more SELECT queries.
UNION returns unique rows, eliminating duplicate entries from the result set.
UNION ALL includes all rows, including duplicate rows.
Columns in the result set must be in the same order and have the same data types.
UNION is useful for aggregating data from multiple tables or applying different
filters to data from the same table.
Conclusion
The SQL UNION operator is a powerful tool for combining multiple SELECT statements into
one result set. Whether you need to eliminate duplicates or include them, UNION and
UNION ALL provide flexible options for aggregating data from multiple tables.
Understanding how and when to use these operators will make your SQL queries more
efficient and effective for data retrieval and analysis.
Explore
Basics
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Using the ER model for bigger data creates a lot of complexity while designing a database
model, So in order to minimize the complexity Generalization, Specialization and
Aggregation were introduced in the ER model. These were used for data abstraction. In
which an abstraction mechanism is used to hide details of a set of objects.
Generalization
Generalization is the process of extracting common properties from a set of entities and
creating a generalized entity from it. It is a bottom-up approach in which two or more
entities can be generalized to a higher-level entity if they have some attributes in common.
Generalization
Specialization
In specialization, an entity is divided into sub-entities based on its characteristics. It is a
top-down approach where the higher-level entity is specialized into two or more lower-
level entities.
Specialization
Inheritance
1. Attribute inheritance: It allows lower level entities to inherit the attributes of higher
level entities and vice versa. In diagram Car entity is an inheritance of Vehicle entity ,So
Car can acquire attributes of Vehicle. Example: Car can acquire Model attribute of
Vehicle.
2. Relationship Inheritance: Sub-entities also inherit relationships of the parent entity.
3. Overriding Inheritance: Sub-entities can override or add their own attributes or
behaviors different from the parent.
4. Participation inheritance: Participation inheritance in ER modeling refers to the
inheritance of participation constraints from a higher-level entity (superclass) to a lower-
level entity (subclass). It ensures that subclasses adhere to the same participation rules
in relationships, although attributes and relationships themselves are inherited
differently.
Example of Relation
Example: In diagram Vehicle entity has an relationship with Cycle entity, but it would not
automatically acquire the relationship itself with the Vehicle entity. Participation inheritance
only refers to the inheritance of participation constraints, not the actual relationships
between entities.
Aggregation
An ER diagram is not capable of representing the relationship between an entity and a
relationship which may be required in some scenarios. In those cases, a relationship with its
corresponding entities is aggregated into a higher-level entity. Aggregation is an
abstraction through which we can represent relationships as higher-level entity sets.
Aggregation
Example: an Employee working on a project may require some machinery. So, REQUIRE
relationship is needed between the relationship WORKS_FOR and entity MACHINERY.
Using aggregation, WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is
aggregated into a single entity and relationship REQUIRE is created between the
aggregated entity and MACHINERY.
This schema includes: The primary key of the aggregated relationship schema.
The primary key of the associated entity it relates to.
Any additional descriptive attributes of this higher-level relationship.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Chapter 9 1
Figure 3.2 ER schema diagram for the company database.
N 1
Address Locations
Name WORKS_FOR Name
Sex Salary
Ssn
___
NumberOfEmployees DEPARTMENT
StartDate
EMPLOYEE
Bdate 1 1 1
MANAGES
CONTROLS
N
Hours
supervisor supervisee N
WORKS_ON PROJECT
1 SUPERVISION N 1
Name
Location
Number
______
DEPENDENTS_OF
DEPENDENT
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Step 1: For each regular entity type E
• Create a relation R that includes all the
simple attributes of E.
• Include all the simple component attributes
of composite attributes.
• Choose one of the key attributes of E as
primary key for R.
• If the chosen key of E is composite, the set
of simple attributes that form it will together
form the primary key of R.
Chapter 9 2
Figure 7.5 Schema diagram for the COMPANY relational
database schema; the primary keys are underlined.
EMPLOYEE
FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO
DEPARTMENT
DEPT_LOCATIONS
DNUMBER DLOCATION
PROJECT
WORKS_ON
DEPENDENT
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Step 2: For each weak entity type W with
owner entity type E
• Create a relation R, and include all simple
attributes and simple components of
composite attributes of W as attributes of R.
• In addition, include as foreign key attributes
of R the primary key attribute(s) of the
relation(s) that correspond to the owner
entity type(s).
Chapter 9 3
Figure 7.5 Schema diagram for the COMPANY relational
database schema; the primary keys are underlined.
EMPLOYEE
FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO
DEPARTMENT
DEPT_LOCATIONS
DNUMBER DLOCATION
PROJECT
WORKS_ON
DEPENDENT
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Step 3: For each binary 1:1 relationship
type R
• Identify the relations S and T that correspond to
the entity types participating in R. Choose one of
the relations, say S, and include as foreign key in
S the primary key of T.
• It is better to choose an entity type with total
participation in R in the role of S.
• Include the simple attributes of the 1:1
relationship type R as attributes of S.
• If both participations are total, we may merge the
two entity types and the relationship into a single
relation.
Chapter 9 4
Figure 7.5 Schema diagram for the COMPANY relational
database schema; the primary keys are underlined.
EMPLOYEE
FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO
DEPARTMENT
DEPT_LOCATIONS
DNUMBER DLOCATION
PROJECT
WORKS_ON
DEPENDENT
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Step 4: For each regular binary 1:N
relationship type R
• Identify the relation S that represents the
participating entity type at the N-side of the
relationship type.
• Include as foreign key in S the primary key
of the relations T that represents the other
entity type participating in R.
• Include any simple attributes of the 1:N
relationship type as attributes of S.
Chapter 9 5
Figure 7.5 Schema diagram for the COMPANY relational
database schema; the primary keys are underlined.
EMPLOYEE
FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO
DEPARTMENT
DEPT_LOCATIONS
DNUMBER DLOCATION
PROJECT
WORKS_ON
DEPENDENT
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Step 5: For each binary M:N relationship
type R
• Create a new relation S to represent R.
• Include as foreign key attributes in S the
primary keys of the relations that represent
the participating entity types; their
combination will form the primary key of S.
• Also, include any simple attributes of the
M:N relationship type as attributes of S.
Chapter 9 6
Figure 7.5 Schema diagram for the COMPANY relational
database schema; the primary keys are underlined.
EMPLOYEE
FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO
DEPARTMENT
DEPT_LOCATIONS
DNUMBER DLOCATION
PROJECT
WORKS_ON
DEPENDENT
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Step 6: For each multi-valued attribute A
• Create a new relation R that includes an
attribute corresponding to A plus the
primary key attribute K (as a foreign key in
R) of the relation that represents the entity
type or relationship type that has A as an
attribute.
• The primary key of R is the combination of
A and K. If a multi-valued attribute is
composite, we include its components.
Chapter 9 7
Figure 7.5 Schema diagram for the COMPANY relational
database schema; the primary keys are underlined.
EMPLOYEE
FNAME MINIT LNAME SSN BDATE ADDRESS SEX SALARY SUPERSSN DNO
DEPARTMENT
DEPT_LOCATIONS
DNUMBER DLOCATION
PROJECT
WORKS_ON
DEPENDENT
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Step 7: For each n-ary relationship type R,
n>2
• Create a new relation S to represent R.
• Include as foreign key attributes in the S the
primary keys of the relations that represent
the participating entity types.
• Also include any simple attributes of the n-
ary relationship types as attributes of S.
• The primary key for S is usually a
combination of all the foreign keys that
reference the relations representing the
participating entity types.
Chapter 9 8
TERNARY RELATIONSHIPS
Figure 9.1 Mapping the n-ary relationship type SUPPLY
from Figure 4.13(a).
SUPPLIER
SNAME
______
PROJECT
PROJNAME
__________
PART
PARTNO
_______
SUPPLY
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
• However, if the participation constraint
(min,max) of one of the entity types E
participating in the R has max =1, then the
primary key of S can be the single foreign
key attribute that references the relation E’
corresponding to E
• This is because , in this case, each entity e
in E will participate in at most one
relationship instance of R and hence can
uniquely identify that relationship instance.
Chapter 9 9
Step 8: To convert each super-class/sub-
class relationship into a relational schema
you must use one of the four options
available.
Chapter 9 10
Option 8A (multiple relation option):
Chapter 9 11
Option 8B (multiple relation option):
Chapter 9 12
Figure 9.2 Options for mapping specializations (or generalizations) to relations.
(a) Mapping the EER schema of Figure 4.4 to relations by using Option A. (b) Mapping the EER
schema of Figure 4.3(b) into relations by using Option B. (c) Mapping the EER schema of
Figure 4.4 by using Option C, with JobType playing the role of type attribute. (d) Mapping the EER
schema of Figure 4.5 by using Option D, with two Boolean type fields Mflag and Pflag.
(a) EMPLOYEE
SSN FName MInit LName BirthDate Address JobType
(b) CAR
VehicleId LicensePlateNo Price MaxSpeed NoOfPassengers
TRUCK
VehicleId LicensePlateNo Price NoOfAxles Tonnage
(c) EMPLOYEE
SSN FName MInit LName BirthDate Address JobType TypingSpeed TGrade EngType
(d) PART
PartNo Description MFlag DrawingNo ManufactureDate BatchNo PFlag SupplierName ListPrice
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Figure 9.2 Options for mapping specializations (or generalizations) to relations.
(a) Mapping the EER schema of Figure 4.4 to relations by using Option A. (b) Mapping the EER
schema of Figure 4.3(b) into relations by using Option B. (c) Mapping the EER schema of
Figure 4.4 by using Option C, with JobType playing the role of type attribute. (d) Mapping the EER
schema of Figure 4.5 by using Option D, with two Boolean type fields Mflag and Pflag.
(a) EMPLOYEE
SSN FName MInit LName BirthDate Address JobType
(b) CAR
VehicleId LicensePlateNo Price MaxSpeed NoOfPassengers
TRUCK
VehicleId LicensePlateNo Price NoOfAxles Tonnage
(c) EMPLOYEE
SSN FName MInit LName BirthDate Address JobType TypingSpeed TGrade EngType
(d) PART
PartNo Description MFlag DrawingNo ManufactureDate BatchNo PFlag SupplierName ListPrice
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Option 8c (Single Relation Option)
• Create a single relation L with attributes
Attrs(L) = {K, A1, …, An} U
{attributes of S1} U… U
{attributes of Sm} U {T}
and PK(L)=K
• This option is for specialization whose subclasses are
DISJOINT, and T is a type attribute that indicates the
subclass to which each tuple belongs, if any. This option
may generate a large number of null values.
• Not recommended if many specific attributes are defined
in subclasses (will result in many null values!)
Chapter 9 13
Figure 9.2 Options for mapping specializations (or generalizations) to relations.
(a) Mapping the EER schema of Figure 4.4 to relations by using Option A. (b) Mapping the EER
schema of Figure 4.3(b) into relations by using Option B. (c) Mapping the EER schema of
Figure 4.4 by using Option C, with JobType playing the role of type attribute. (d) Mapping the EER
schema of Figure 4.5 by using Option D, with two Boolean type fields Mflag and Pflag.
(a) EMPLOYEE
SSN FName MInit LName BirthDate Address JobType
(b) CAR
VehicleId LicensePlateNo Price MaxSpeed NoOfPassengers
TRUCK
VehicleId LicensePlateNo Price NoOfAxles Tonnage
(c) EMPLOYEE
SSN FName MInit LName BirthDate Address JobType TypingSpeed TGrade EngType
(d) PART
PartNo Description MFlag DrawingNo ManufactureDate BatchNo PFlag SupplierName ListPrice
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Option 8d (Single Relation Option)
• Create a single relation schema L with attributes
Attrs(L) = {K, A1, …, An} U
{attributes of S1} U… U
{attributes of Sm} U {T1, …, Tn}
and PK(L)=K
• This option is for specialization whose subclasses are
overlapping, and each Ti, 1 < i < m, is a Boolean attribute
indicating whether a tuple belongs to subclass Si.
• This option could be used for disjoint subclasses too.
Chapter 9 14
Figure 9.2 Options for mapping specializations (or generalizations) to relations.
(a) Mapping the EER schema of Figure 4.4 to relations by using Option A. (b) Mapping the EER
schema of Figure 4.3(b) into relations by using Option B. (c) Mapping the EER schema of
Figure 4.4 by using Option C, with JobType playing the role of type attribute. (d) Mapping the EER
schema of Figure 4.5 by using Option D, with two Boolean type fields Mflag and Pflag.
(a) EMPLOYEE
SSN FName MInit LName BirthDate Address JobType
(b) CAR
VehicleId LicensePlateNo Price MaxSpeed NoOfPassengers
TRUCK
VehicleId LicensePlateNo Price NoOfAxles Tonnage
(c) EMPLOYEE
SSN FName MInit LName BirthDate Address JobType TypingSpeed TGrade EngType
(d) PART
PartNo Description MFlag DrawingNo ManufactureDate BatchNo PFlag SupplierName ListPrice
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Figure 9.3 Mapping the EER specialization lattice shown in Figure 4.7
using multiple options.
PERSON
SSN Name BirthDate Sex Address
EMPLOYEE
SSN Salary EmployeeType Position Rank PercentTime RAFlag TAFlag Project Course
ALUMNUS ALUMNUS_DEGREES
SSN SSN Year Degree Major
STUDENT
SSN MajorDept GradFlag UndergradFlag DegreeProgram Class StudAssistFlag
© Addison Wesley Longman, Inc. 2000, Elmasri/Navathe, Fundamentals of Database Systems, Third Edition
Option 8A for
•PERSON/{EMPLOYEE,ALUMNUS,STUDENT}
Option 8C for
•EMPLOYEE/{STAFF,FACULTY,STUDENT_ASSISTANT}
Option 8D for
•STUDENT_ASSISTANT/{RESEARCH_ASSISTANT,
TEACHING_ASSISTANT}
•STUDENT/{STUDENT_ASSISTANT}
•STUDENT/{GRADUATE_ASSISTANT,
UNDERGRADUATE_STUDENT}
Search... Sign In
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Case 1: Binary Relationship with 1:1 cardinality with total participation of an entity
First Convert each entity and relationship to tables. Person table corresponds to Person
Entity with key as Per-Id. Similarly Passport table corresponds to Passport Entity with key
as Pass-No. Has Table represents relationship between Person and Passport (Which
person has which passport). So it will take attribute Per-Id from Person and Pass-No from
Passport.
PR3 -
Table
1
As we can see from Table 1, each Per-Id and Pass-No has only one entry in Has Table. So
we can merge all three tables into 1 with attributes shown in Table 2. Each Per-Id will be
unique and not null. So it will be the key. Pass-No can’t be key because for some person, it
can be NULL.
Table 2
Case 2: Binary Relationship with 1:1 cardinality and partial participation of both
entities
A male marries 0 or 1 female and vice versa as well. So it is 1:1 cardinality with partial
participation constraint from both. First Convert each entity and relationship to tables. Male
table corresponds to Male Entity with key as M-Id. Similarly Female table corresponds to
Female Entity with key as F-Id. Marry Table represents relationship between Male and
Female (Which Male marries which female). So it will take attribute M-Id from Male and F-
Id from Female.
M1 - M1 F2 F1 -
M2 - M2 F1 F2 -
M3 - F3 -
Table 3
As we can see from Table 3, some males and some females do not marry. If we merge 3
tables into 1, for some M-Id, F-Id will be NULL. So there is no attribute which is always not
NULL. So we can’t merge all three tables into 1. We can convert into 2 tables. In table 4,
M-Id who are married will have F-Id associated. For others, it will be NULL. Table 5 will
have information of all females. Primary Keys have been underlined.
Table 4
Table 5
Note: Binary relationship with 1:1 cardinality will have 2 table if partial participation of both
entities in the relationship. If atleast 1 entity has total participation, number of tables
required will be 1.
In this scenario, every student can enroll only in one elective course but for an elective
course there can be more than one student. First Convert each entity and relationship to
tables. Student table corresponds to Student Entity with key as S-Id. Similarly
Elective_Course table corresponds to Elective_Course Entity with key as E-Id. Enrolls Table
represents relationship between Student and Elective_Course (Which student enrolls in
which course). So it will take attribute S-Id from Student and E-Id from Elective_Course.
S-Id Other Student Attribute S-Id E-Id E-Id Other Elective CourseAttribute
S1 - S1 E1 E1 -
S2 - S2 E2 E2 -
S3 - S3 E1 E3 -
S4 - S4 E1
Table 6
As we can see from Table 6, S-Id is not repeating in Enrolls Table. So it can be considered
as a key of Enrolls table. Both Student and Enrolls Table’s key is same. We can merge it as
a single table. The resultant tables are shown in Table 7 and Table 8. Primary Keys have
been underlined.
Table 7
Table 8
In this scenario, every student can enroll in more than 1 compulsory course and for a
compulsory course there can be more than 1 student. First Convert each entity and
relationship to tables. Student table corresponds to Student Entity with key as S-Id.
Similarly Compulsory_Courses table corresponds to Compulsory Courses Entity with key as
C-Id. Enrolls Table represents relationship between Student and Compulsory_Courses
(Which student enrolls in which course). So it will take attribute S-Id from Person and C-Id
from Compulsory_Courses.
S1 - S1 C1 C1 -
S2 - S1 C2 C2 -
S3 - S3 C1 C3 -
S4 - S4 C3 C4 -
S4 C2
S3 C3
Table 9
As we can see from Table 9, S-Id and C-Id both are repeating in Enrolls Table. But its
combination is unique; so it can be considered as a key of Enrolls table. All tables’ keys are
different, these can’t be merged. Primary Keys of all tables have been underlined.
In this scenario, an employee can have many dependents and one dependent can depend
on one employee. A dependent does not have any existence without an employee (e.g; you
as a child can be dependent of your father in his company). So it will be a weak entity and
its participation will always be total. Weak Entity does not have key of its own. So its key
will be combination of key of its identifying entity (E-Id of Employee in this case) and its
partial key (D-Name).
First Convert each entity and relationship to tables. Employee table corresponds to
Employee Entity with key as E-Id. Similarly Dependents table corresponds to Dependent
Entity with key as D-Name and E-Id. Has Table represents relationship between
Employee and Dependents (Which employee has which dependents). So it will take
attribute E-Id from Employee and D-Name from Dependents.
E1 - E1 RAM RAM E1 -
E2 - E1 SRINI SRINI E1 -
E3 - E2 RAM RAM E2 -
E3 ASHISH ASHISH E3 -
Table 10
As we can see from Table 10, E-Id, D-Name is key for Has as well as Dependents Table.
So we can merge these two into 1. So the resultant tables are shown in Tables 11 and 12.
Primary Keys of all tables have been underlined.
Table 11
Conclusion
Converting an ER diagram to a Relational Model is a crucial step in database design. The
ER model represents the conceptual structure, while the Relational Model is a physical
representation that can be directly implemented using a Relational Database Management
System (RDBMS) like Oracle or MySQL. We've explored how to convert ER diagrams to
Relational Models for different scenarios, including binary relationships with various
cardinalities and participation constraints. We've covered five cases, highlighting key
considerations and resulting table structures. By understanding these scenarios, database
designers and developers can effectively translate conceptual ER models into physical
Relational Models, ensuring successful database implementation using RDBMS. So,
mapping from ER Model to Relational Model is a vital skill, and we hope this article has
been helpful.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Advanced SQL in DBMS quickly save and access your notes anytime,
anywhere. 👉 Try it now
By Prakhar - April 14, 2025
Basic SQL provides a strong foundation to work with the databases, like inserting,
Recent Post
selecting, retrieve the data, whereas advanced SQL in DBMS helps in extending
the capabilities of the SQL and thus provides more sophisticated techniques,
features and functions to work with complex data. Top 30 DBMS Interview Questions in 2025
Blog 99
Advanced SQL is used to handle complex data through its advanced concepts and
many high capabilities of SQL.
Subqueries: Queries within the queries are called subqueries. It simply means
that the queries are nested in other queries. In Subqueries, the result or the
output of one query is used as the input of another query. It provides more
flexibility and thus enables more complex data manipulation and retrieval.
Subqueries can be used with INSERT, SELECT, DELETE and UPDATE statements.
Joins: Join function is used to combine the data from multiple tables based on
particular attributes. It is used to retrieve data from multiple tables based on
specific conditions. It gives a more comprehensive and meaningful result. There
are many types of joins such as INNER JOIN, RIGHT JOIN, LEFT JOIN and FULL
JOIN.
Views: These are called virtual tables, and are made or derived from the output
of the query. It is used to restrict sensitive and personal information and present
the data in a more user-friendly format.
Data Administrators and Developers can perform more complex operations with
data, and optimize the performance by using these components of the advanced
SQL. It can also help in creating more scalable and efficient database solutions.
Let’s see the list of some additional data types of the advanced SQL.
Arrays: It is the collection of multiple values of the same data types. It provides
efficient storage as retrieval of multiple related values happens at a time.
Examples are an array of integers, lists of tags, etc.
Binay Large Data (BLOB): This data type is used for storing very large data
objects such as images, videos, or documents. Advanced SQL supports BLOB
data type and can handle large data retrieval and manipulation easily. It uses
some specialized functions to perform any type of operation.
These additional data types may vary from the types of databases being used.
These data types are not supported by all the DBMSs.
SUM(): It is used to calculate the sum of the set of values in the query.
AVG(): It is used to calculate the average of the set of values in the query.
COUNT(): It is used to count the number of rows or all the not null values from
the set of values given.
MAX(): It is used to find the maximum value from the set of values.
MIN(): It is used to find the minimum values from the set of values.
2) String Functions: It operates on the character data, and is used for analyzing
and manipulating the string values. Some of the examples are:
SUBSTRING(): It is used to extract the specific portion from the given string.
TRIM(): It is used to remove the trailing and the leading spaces from the string.
ROUND(): It is used to round the given decimal value to the closest whole
number.
CASE WHEN THEN END: It is used to evaluate the multiple conditions using the
CASE WHEN statement, and give the specific result based on the conditions. It is
similar to the IF ELSE statements of the other programming language.
NULLIF(): It is used to compare two more expressions and returns null if the
expressions are equal otherwise returns the first expressions.
COALESCE(): It is used to return the first not null values from the list of
arguments passed.
Advanced SQL has a variety of powerful features that can be used by database
administrators, or developers to manipulate and retrieve complex data most easily.
It also offers a wide range of functions which helps perform various types of
calculations with large data and it is thus very helpful for business users.
Databases SQL MySQL PostgreSQL PL/SQL MongoDB SQL Cheat Sheet SQL Interview Questions MySQL Interview Questions PL/SQL Interview Questions
1. Aggregate Functions
Aggregate functions perform calculations on multiple rows and return a single value.
Example
Output :
Returns count of rows, average, total salary, min salary, and max salary.
2. Conditional Functions
Conditional functions help apply logic inside SQL queries.
Example
SELECT Name,
CASE WHEN Salary > 5000 THEN 'High'
ELSE 'Low' END AS Salary_Level
FROM Employees;
Output:
3. Mathematical Functions
Mathematical functions are used for numeric calculations. Some commonly used
mathematical functions are given below:
Example
Output :
15, 25.68, 8, 7
BIN()
SELECT BIN(18);
Output:
BINARY()
Output:
COALESCE()
SELECT COALESCE(NULL,NULL,'GeeksforGeeks',NULL,'Geeks');
Output:
CONNECTION_ID()
SELECT CONNECTION_ID();
Output:
CURRENT_USER()
Returns the user name and hostname for the MySQL account used by the server.
SELECT CURRENT_USER();
Output:
DATABASE()
SELECT DATABASE();
Output:
IF()
Output:
LAST_INSERT_ID()
Returns the first AUTO_INCREMENT value that was set by the most recent INSERT or
UPDATE statement
SELECT LAST_INSERT_ID();
Output:
NULLIF()
Output:
SESSION_USER()
Returns the user name and host name for the current MySQL user
SELECT SESSION_USER();
Output:
SYSTEM_USER()
Returns the user name and host name for the current MySQL user.
SELECT SYSTEM_USER();
Output:
USER()
It returns the user name and host name for the current MySQL user
SELECT USER();
Output:
VERSION()
SELECT VERSION();
Output:
Explore
Basics
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
File organization in DBMS refers to the method of storing data records in a file so they can
be accessed efficiently. It determines how data is arranged, stored, and retrieved from
physical storage.
we will be discussing each of the file Organizations in further sets of this article along with
the differences and advantages/ disadvantages of each file Organization method.
This method is quite simple, in which we store the records in a sequence i.e. one after the
other in the order in which they are inserted into the tables.
Insertion of the new record: Let the R1, R3, and so on up to R5 and R4 be four records in
the sequence. Here, records are nothing but a row in any table. Suppose a new record R2
has to be inserted in the sequence, then it is simply placed at the end of the file.
In this method, As the name itself suggests whenever a new record has to be inserted, it is
always inserted in a sorted (ascending or descending) manner. The sorting of records may
be based on any primary key or any other key.
Insertion of the new record: Let us assume that there is a preexisting sorted sequence of
four records R1, R3, and so on up to R7 and R8. Suppose a new record R2 has to be
inserted in the sequence, then it will be inserted at the end of the file and then it will sort
the sequence.
Insertion of the new record: Suppose we have four records in the heap R1, R5, R6, R4,
and R3, and suppose a new record R2 has to be inserted in the heap then, since the last
data block i.e data block 3 is full it will be inserted in any of the data blocks selected by the
DBMS, let's say data block 1.
If we want to search, delete or update data in the heap file Organization we will traverse
the data from the beginning of the file till we get the requested record. Thus if the
database is very huge, searching, deleting, or updating the record will take a lot of time.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Hashing is defined as a technique in DBMS that is used to search for records in databases that are very large or even small. In larger
Article Outline
databases, which contain thousands and millions of records, the indexing data structure technique becomes inefficient because searching
a specific record using indexing consumes more time. So, to counter this problem, hashing techniques are used. In this article, we will go
What is Hashing in DBMS? through various hashing techniques.
Hash Table: The total number of data records in the database determines the size of a hash table, which is an array or data structure.
Working of Hash Function The precise location of a data record is stored in each memory location in a hash table, which is referred to as a “bucket” or hash index
and is accessible via a hash function.
Types of Hashing in DBMS Bucket: In the hash table where the data record is stored, a bucket is a memory index. Typically, a disk block that holds numerous
records is stored in these buckets. Another name for it is the hash index.
Hash Function: A hash function is an algorithm or mathematical equation that computes the hash index as the output after receiving
Static Vs Dynamic Hashing in the main key of one data record as input.
DBMS
Conclusion Get curriculum highlights, career paths, industry insights and accelerate
your technology journey.
FAQs Download brochure
In this strategy, data is stored in blocks called addresses that are generated by the hashing process. These records are kept in memory at
locations called data buckets or data blocks.
In this instance, the address can be generated from any column value using a hash function. The hash function frequently generates the
address of the data block using the primary key. A hash function is a fundamental mathematical function to any sophisticated
mathematical function. The address of the data block, or any row that shares the same address as a main key within the data block, can
alternatively be thought of as the primary key.
The main key value in the image above corresponds to the data block addresses. An alternative to this hash function might be a
straightforward mathematical function, such as exponential, mod, cos, sin, and so forth. Let’s say we want to determine the address of the
data block using the mod (5) hash function. The mod (5) function is used in this situation to hash the primary keys, producing the results 3,
3, 1, 4, and 2, respectively. Records are then saved at those data block positions.
The main key value in the image above corresponds to the data block addresses. An alternative to this hash function might be a
straightforward mathematical function, such as exponential, mod, cos, sin, and so forth. Let’s say we want to determine the address of the
data block using the mod (5) hash function. The mod (5) function is used in this situation to hash the primary keys, producing the results 3,
3, 1, 4, and 2, respectively. Records are then saved at those data block positions.
Important Terminologies
Data Bucket
Data buckets are storage locations within a hash table where actual data records are kept. Each bucket can hold one or more records,
depending on the implementation.
Hash Function
A hash function is a mathematical algorithm that converts a given input (often a primary key) into a specific address within the hash table.
This address indicates where the corresponding data record is stored.
Hash Index
The hash index is the result produced by the hash function, representing the address of the data block within the hash table. It serves as a
quick reference to locate the desired data.
Linear Probing
Linear probing is a collision resolution technique used when the initial bucket calculated by the hash function is already occupied. It
sequentially checks the next available buckets in the hash table until an empty one is found.
Quadratic Probing
Quadratic probing is another collision resolution method similar to linear probing. However, instead of checking the next bucket linearly, it
uses a quadratic function to determine the next bucket to check, reducing clustering issues.
Bucket Overflow
Bucket overflow occurs when the bucket identified by the hash function is already full. This situation requires additional handling
strategies, such as linear or quadratic probing, to find an alternative bucket for storing the new record.
There isn’t a value that already occupies the generated hash index. Thus, this is where the data record’s address will be kept.
There is already another value occupying the hash index that was generated. This is referred to as a collision, so a collision resolution
technique will be used to combat it.
The hash function now applies whenever we query a specific record, returning the data record much faster than indexing because we can
use the hash function to find the exact location of the data record without having to search through all of the indices one by one.
Static Hashing
Dynamic Hashing
Fixed Number of Buckets: The number of data buckets remains constant throughout. Each bucket is a storage location where records
are stored.
Hash Function: A hash function is used to map search-key values to bucket addresses. For example, if the hash function is mod 5, it
will always map a given key to the same bucket address.
Insertion: When a record needs to be entered using a static hash, the hash function h determines the bucket address (where the record
will be stored) for search key K.
Variable Number of Buckets: Unlike static hashing, the number of buckets in dynamic hashing can grow or shrink based on the
number of records.
Directory: A directory is used to keep track of the buckets. The directory itself can grow or shrink dynamically.
Hash Function: The hash function generates a hash value, and the directory uses a certain number of bits from this hash value to
determine the bucket address.
Bucket Splitting: When a bucket overflows, it is split into two, and the directory is updated to reflect this change. This helps in
distributing the records more evenly.
Querying: Means looking at the depth value of the hash index and then using those to compute the bucket address.
Update: To perform a query as above and update the data.
Deletion: Perform a query to locate the desired data and delete the same.
Insertion: Compute the address of the bucket
If the bucket is already full.
Add more buckets.
Add additional bits to the hash value.
Re-compute the hash function.
Else
Add data to the bucket,
If all the buckets are full, perform the remedies of static hashing.
The overflow chain can grow rather lengthy if the hash table isn’t The advantage of dynamic hashing lies in its flexibility: you
optimized, which will lower performance and complicate scaling. don’t have to plan its size in advance
In cybersecurity, hashing plays a pivotal role in generating message digests, ensuring data integrity, and detecting tampering. Moreover,
during compiler operation, hashing helps expedite symbol table lookups and optimize code generation. Password verification is another
prime application, enhancing security by storing and comparing password hashes instead of the actual passwords.
Lastly, when linking file names and paths in file systems, hashing enables rapid access and reduces the need for exhaustive searches.
In these use cases, hashing offers a powerful solution to streamline operations and enhance both performance and security.
Conclusion
Hashing in databases is a technique that helps in speeding up data retrieval in large databases by using hash functions to search for
records. It overcomes the inefficiencies of traditional indexing methods. There are two main hashing methods Static and Dynamic, Static
uses fixed-sized tables while Dynamic adjusts itself based on the data volume. Each type has its advantages like static is simple and
dynamic is flexible. Hashing is widely used in areas that need fast access, security, and performance, like cybersecurity, graphics
processing, and file systems.
FAQs
What is a pointer in C?
A null pointer is a pointer that does not point to any valid memory location. It is often initialized with NULL to
signify that it is not currently pointing to a valid object.
Hashing is defined as a technique in DBMS that is used to search for records in databases that are very large or even small. In larger
Article Outline
databases, which contain thousands and millions of records, the indexing data structure technique becomes inefficient because searching
a specific record using indexing consumes more time. So, to counter this problem, hashing techniques are used. In this article, we will go
What is Hashing in DBMS? through various hashing techniques.
Hash Table: The total number of data records in the database determines the size of a hash table, which is an array or data structure.
Working of Hash Function The precise location of a data record is stored in each memory location in a hash table, which is referred to as a “bucket” or hash index
and is accessible via a hash function.
Types of Hashing in DBMS Bucket: In the hash table where the data record is stored, a bucket is a memory index. Typically, a disk block that holds numerous
records is stored in these buckets. Another name for it is the hash index.
Hash Function: A hash function is an algorithm or mathematical equation that computes the hash index as the output after receiving
Static Vs Dynamic Hashing in the main key of one data record as input.
DBMS
Conclusion Get curriculum highlights, career paths, industry insights and accelerate
your technology journey.
FAQs Download brochure
In this strategy, data is stored in blocks called addresses that are generated by the hashing process. These records are kept in memory at
locations called data buckets or data blocks.
In this instance, the address can be generated from any column value using a hash function. The hash function frequently generates the
address of the data block using the primary key. A hash function is a fundamental mathematical function to any sophisticated
mathematical function. The address of the data block, or any row that shares the same address as a main key within the data block, can
alternatively be thought of as the primary key.
The main key value in the image above corresponds to the data block addresses. An alternative to this hash function might be a
straightforward mathematical function, such as exponential, mod, cos, sin, and so forth. Let’s say we want to determine the address of the
data block using the mod (5) hash function. The mod (5) function is used in this situation to hash the primary keys, producing the results 3,
3, 1, 4, and 2, respectively. Records are then saved at those data block positions.
The main key value in the image above corresponds to the data block addresses. An alternative to this hash function might be a
straightforward mathematical function, such as exponential, mod, cos, sin, and so forth. Let’s say we want to determine the address of the
data block using the mod (5) hash function. The mod (5) function is used in this situation to hash the primary keys, producing the results 3,
3, 1, 4, and 2, respectively. Records are then saved at those data block positions.
Important Terminologies
Data Bucket
Data buckets are storage locations within a hash table where actual data records are kept. Each bucket can hold one or more records,
depending on the implementation.
Hash Function
A hash function is a mathematical algorithm that converts a given input (often a primary key) into a specific address within the hash table.
This address indicates where the corresponding data record is stored.
Hash Index
The hash index is the result produced by the hash function, representing the address of the data block within the hash table. It serves as a
quick reference to locate the desired data.
Linear Probing
Linear probing is a collision resolution technique used when the initial bucket calculated by the hash function is already occupied. It
sequentially checks the next available buckets in the hash table until an empty one is found.
Quadratic Probing
Quadratic probing is another collision resolution method similar to linear probing. However, instead of checking the next bucket linearly, it
uses a quadratic function to determine the next bucket to check, reducing clustering issues.
Bucket Overflow
Bucket overflow occurs when the bucket identified by the hash function is already full. This situation requires additional handling
strategies, such as linear or quadratic probing, to find an alternative bucket for storing the new record.
There isn’t a value that already occupies the generated hash index. Thus, this is where the data record’s address will be kept.
There is already another value occupying the hash index that was generated. This is referred to as a collision, so a collision resolution
technique will be used to combat it.
The hash function now applies whenever we query a specific record, returning the data record much faster than indexing because we can
use the hash function to find the exact location of the data record without having to search through all of the indices one by one.
Static Hashing
Dynamic Hashing
Fixed Number of Buckets: The number of data buckets remains constant throughout. Each bucket is a storage location where records
are stored.
Hash Function: A hash function is used to map search-key values to bucket addresses. For example, if the hash function is mod 5, it
will always map a given key to the same bucket address.
Insertion: When a record needs to be entered using a static hash, the hash function h determines the bucket address (where the record
will be stored) for search key K.
Variable Number of Buckets: Unlike static hashing, the number of buckets in dynamic hashing can grow or shrink based on the
number of records.
Directory: A directory is used to keep track of the buckets. The directory itself can grow or shrink dynamically.
Hash Function: The hash function generates a hash value, and the directory uses a certain number of bits from this hash value to
determine the bucket address.
Bucket Splitting: When a bucket overflows, it is split into two, and the directory is updated to reflect this change. This helps in
distributing the records more evenly.
Querying: Means looking at the depth value of the hash index and then using those to compute the bucket address.
Update: To perform a query as above and update the data.
Deletion: Perform a query to locate the desired data and delete the same.
Insertion: Compute the address of the bucket
If the bucket is already full.
Add more buckets.
Add additional bits to the hash value.
Re-compute the hash function.
Else
Add data to the bucket,
If all the buckets are full, perform the remedies of static hashing.
The overflow chain can grow rather lengthy if the hash table isn’t The advantage of dynamic hashing lies in its flexibility: you
optimized, which will lower performance and complicate scaling. don’t have to plan its size in advance
In cybersecurity, hashing plays a pivotal role in generating message digests, ensuring data integrity, and detecting tampering. Moreover,
during compiler operation, hashing helps expedite symbol table lookups and optimize code generation. Password verification is another
prime application, enhancing security by storing and comparing password hashes instead of the actual passwords.
Lastly, when linking file names and paths in file systems, hashing enables rapid access and reduces the need for exhaustive searches.
In these use cases, hashing offers a powerful solution to streamline operations and enhance both performance and security.
Conclusion
Hashing in databases is a technique that helps in speeding up data retrieval in large databases by using hash functions to search for
records. It overcomes the inefficiencies of traditional indexing methods. There are two main hashing methods Static and Dynamic, Static
uses fixed-sized tables while Dynamic adjusts itself based on the data volume. Each type has its advantages like static is simple and
dynamic is flexible. Hashing is widely used in areas that need fast access, security, and performance, like cybersecurity, graphics
processing, and file systems.
FAQs
What is a pointer in C?
A null pointer is a pointer that does not point to any valid memory location. It is often initialized with NULL to
signify that it is not currently pointing to a valid object.
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Hashing in DBMS
Last Updated : 31 Jul, 2025
The hashing technique utilizes an auxiliary hash table to store the data records using a
hash function. There are 3 key components in hashing:
Hash Table: A hash table is an array or data structure and its size is determined by the
total volume of data records present in the database. Each memory location in a hash
table is called a 'bucket' or hash indices and stores a data record's exact location and
can be accessed through a hash function.
Bucket: A bucket is a memory location (index) in the hash table that stores the data
record. These buckets generally store a disk block which further stores multiple records.
It is also known as the hash index.
Hash Function: A hash function is a mathematical equation or algorithm that takes one
data record's primary key as input and computes the hash index as output.
Hash Function
A hash function is a mathematical algorithm that computes the index or the location where
the current data record is to be stored in the hash table so that it can be accessed
efficiently later. This hash function is the most crucial component that determines the speed
of fetching data.
The hash function generates a hash index through the primary key of the data record.
1. The hash index generated isn't already occupied by any other value. So, the address of
the data record will be stored here.
2. The hash index generated is already occupied by some other value. This is called
collision so to counter this, a collision resolution technique will be applied.
3. Now whenever we query a specific record, the hash function will be applied and returns
the data record comparatively faster than indexing because we can directly reach the
exact location of the data record through the hash function rather than searching through
indices one by one.
Example:
Hashing
1. Static Hashing
In static hashing, the hash function always generates the same bucket's address. For
example, if we have a data record for employee_id = 107, the hash function is mod-5 which
is - H(x) % 5, where x = id. Then the operation will take place like this:
H(106) % 5 = 1.
This indicates that the data record should be placed or searched in the 1st bucket (or
1st hash index) in the hash table.
Example:
The primary key is used as the input to the hash function and the hash function generates
the output as the hash index (bucket's address) which contains the address of the actual
data record on the disk block.
To resolve this problem of bucket overflow, techniques such as - chaining and open
addressing are used. Here's a brief info on both:
Given:
Step-by-step hashing:
10 0 Bucket 0 → [10]
11 1 Bucket 1 → [11]
0 10 → 15 → 20 → 25 → 30
1 11
2 --
3 --
4 --
Key Points:
All keys that hash to the same index (like 10, 15, 20, etc.) are stored in a linked list at
that index.
Separate chaining avoids clustering and makes insertion easier.
Efficient when hash table load factor is high.
However, this will give rise to the problem bucket skew that is, if the hash function keeps
generating the same value again and again then the hashing will become inefficient as the
remaining data buckets will stay unoccupied or store minimal data.
Example:
Step-by-step insertion:
50 50 % 7 = 1 1 No 1
700 700 % 7 = 0 0 No 0
76 76 % 7 = 6 6 No 6
Index 0 1 2 3 4 5 6
Value 700 50 85 92 73 -- 76
4. Dynamic Hashing
Dynamic hashing is also known as extendible hashing, used to handle database that
frequently changes data sets. This method offers us a way to add and remove data
buckets on demand dynamically. This way as the number of data records varies, the
buckets will also grow and shrink in size periodically whenever a change is made.
Example: If global depth: k = 2, the keys will be mapped accordingly to the hash index. K
bits starting from LSB will be taken to map a key to the buckets. That leaves us with the
following 4 possibilities: 00, 11, 10, 01.
As we can see in the above image, the k bits from LSBs are taken in the hash index to map
to their appropriate buckets through directory IDs. The hash indices point to the directories,
and the k bits are taken from the directories' IDs and then mapped to the buckets. Each
bucket holds the value corresponding to the IDs converted in binary.
Article Tags : DBMS Geeks Premier League Geeks Premier League 2023
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Indexing in Databases
Last Updated : 31 Jul, 2025
Indexing in DBMS is used to speed up data retrieval by minimizing disk scans. Instead of
searching through all rows, the DBMS uses index structures to quickly locate data using key
values.
When an index is created, it stores sorted key values and pointers to actual data rows. This
reduces the number of disk accesses, improving performance especially on large datasets.
Attributes of Indexing
Several Important attributes of indexing affect the performance and efficiency of database
operations:
1. Access Types: This refers to the type of access such as value-based search, range
access, etc.
2. Access Time: It refers to the time needed to find a particular data element or set of
elements.
3. Insertion Time: It refers to the time taken to find the appropriate space and insert new
data.
4. Deletion Time: Time taken to find an item and delete it as well as update the index
structure.
5. Space Overhead: It refers to the additional space required by the index.
In this type of organization, the indices are based on a sorted ordering of the values. These
are generally fast and a more traditional type of storing mechanism. These Ordered or
Sequential file organizations might store the data in a dense or sparse format.
i. Dense Index: Every search key value in the data file corresponds to an index record. This
method ensures that each key value has a reference to its data location.
Example: If a table contains multiple entries for the same key, a dense index ensures that
each key value has its own index record.
Dense Index
ii. Sparse Index: The index record appears only for a few items in the data file. Each item
points to a block as shown. To locate a record, we find the index record with the largest
search key value less than or equal to the search key value we are looking for.
Access Method: To locate a record, we find the index record with the largest key value less
than or equal to the search key, and then follow the pointers sequentially.
Access Cost = log 2 (n) + 1 , where n is the number of blocks involved in the index
file.
Sparse Index
1. Clustered Indexing
Clustered Indexing stores related records together in the same file, reducing search time
and improving performance, especially for join operations. Data is stored in sorted order
based on a key (often a non-primary key) to group similar records, like students by
semester. If the indexed column isn't unique, multiple columns can be combined to form a
unique key. This makes data retrieval faster by keeping related records close and allowing
quicker access through the index.
Clustered Indexing
2. Primary Indexing
This is a type of Clustered Indexing wherein the data is sorted according to the search key
and the primary key of the database table is used to create the index. It is a default format
of indexing where it induces sequential file organization. As primary keys are unique and
are stored in a sorted manner, the performance of the searching operation is quite
efficient.
Key Features: The data is stored in sequential order, making searches faster and more
efficient.
A non-clustered index just tells us where the data lies, i.e. it gives us a list of virtual
pointers or references to the location where the data is actually stored. Data is not
physically stored in the order of the index. Instead, data is present in leaf nodes.
Example: The contents page of a book. Each entry gives us the page number or location of
the information stored. The actual data here(information on each page of the book) is not
organized but we have an ordered reference(contents page) to where the data points
actually lie. We can have only dense ordering in the non-clustered index as sparse ordering
is not possible because data is not physically organized accordingly.
It requires more time as compared to the clustered index because some amount of extra
work is done in order to extract the data by further following the pointer. In the case of a
clustered index, data is directly present in front of the index.
4. Multilevel Indexing
With the growth of the size of the database, indices also grow. As the index is stored in the
main memory, a single-level index might become too large a size to store with multiple
disk accesses. The multilevel indexing segregates the main block into various smaller
blocks so that the same can be stored in a single block.
The outer blocks are divided into inner blocks which in turn are pointed to the data blocks.
This can be easily stored in the main memory with fewer overheads. This hierarchical
approach reduces memory overhead and speeds up query execution.
Multilevel Indexing
Advantages of Indexing
Faster Queries: Indexes allow quick search of rows matching specific values, speeding
up data retrieval.
Efficient Access: Reduces disk I/O by keeping frequently accessed data in memory.
Improved Sorting: Speeds up sorting by indexing the relevant columns.
Consistent Performance: Maintains query speed even as data grows.
Data Integrity: Ensures uniqueness in columns indexed as unique, preventing duplicate
entries.
Disadvantages of Indexing
While indexing offers many advantages, it also comes with certain trade-offs:
Increased Storage Space: Indexes require additional storage. Depending on the size of
the data, this can significantly increase the overall storage requirements.
Increased Maintenance Overhead: Indexes must be updated whenever data is
inserted, deleted, or modified, which can slow down these operations.
Slower Insert/Update Operations: Since indexes must be maintained and updated,
inserting or updating data takes longer than in a non-indexed database.
Complexity in Choosing the Right Index: Determining the appropriate indexing strategy
for a particular dataset can be challenging and requires an understanding of query
patterns and access behaviors.
Features of Indexing
Several key features define the indexing process in databases:
Efficient Data Structures: Indexes use efficient data structures like B-trees, B+ trees,
and hash tables to enable fast data retrieval.
Periodic Index Maintenance: Indexes need to be periodically maintained, especially
when the underlying data changes frequently. Maintenance tasks include updating,
rebuilding, or removing obsolete indexes.
Query Optimization: Indexes play a critical role in query optimization. The DBMS query
optimizer uses indexes to determine the most efficient execution plan for a query.
Handling Fragmentation: Index fragmentation can reduce the effectiveness of an index.
Regular defragmentation can help maintain optimal performance.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Indexing in Databases
Last Updated : 31 Jul, 2025
Indexing in DBMS is used to speed up data retrieval by minimizing disk scans. Instead of
searching through all rows, the DBMS uses index structures to quickly locate data using key
values.
When an index is created, it stores sorted key values and pointers to actual data rows. This
reduces the number of disk accesses, improving performance especially on large datasets.
Attributes of Indexing
Several Important attributes of indexing affect the performance and efficiency of database
operations:
1. Access Types: This refers to the type of access such as value-based search, range
access, etc.
2. Access Time: It refers to the time needed to find a particular data element or set of
elements.
3. Insertion Time: It refers to the time taken to find the appropriate space and insert new
data.
4. Deletion Time: Time taken to find an item and delete it as well as update the index
structure.
5. Space Overhead: It refers to the additional space required by the index.
In this type of organization, the indices are based on a sorted ordering of the values. These
are generally fast and a more traditional type of storing mechanism. These Ordered or
Sequential file organizations might store the data in a dense or sparse format.
i. Dense Index: Every search key value in the data file corresponds to an index record. This
method ensures that each key value has a reference to its data location.
Example: If a table contains multiple entries for the same key, a dense index ensures that
each key value has its own index record.
Dense Index
ii. Sparse Index: The index record appears only for a few items in the data file. Each item
points to a block as shown. To locate a record, we find the index record with the largest
search key value less than or equal to the search key value we are looking for.
Access Method: To locate a record, we find the index record with the largest key value less
than or equal to the search key, and then follow the pointers sequentially.
Access Cost = log 2 ( n) + 1 , where n is the number of blocks involved in the index
file.
Sparse Index
1. Clustered Indexing
Clustered Indexing stores related records together in the same file, reducing search time
and improving performance, especially for join operations. Data is stored in sorted order
based on a key (often a non-primary key) to group similar records, like students by
semester. If the indexed column isn't unique, multiple columns can be combined to form a
unique key. This makes data retrieval faster by keeping related records close and allowing
quicker access through the index.
Clustered Indexing
2. Primary Indexing
This is a type of Clustered Indexing wherein the data is sorted according to the search key
and the primary key of the database table is used to create the index. It is a default format
of indexing where it induces sequential file organization. As primary keys are unique and
are stored in a sorted manner, the performance of the searching operation is quite
efficient.
Key Features: The data is stored in sequential order, making searches faster and more
efficient.
A non-clustered index just tells us where the data lies, i.e. it gives us a list of virtual
pointers or references to the location where the data is actually stored. Data is not
physically stored in the order of the index. Instead, data is present in leaf nodes.
Example: The contents page of a book. Each entry gives us the page number or location of
the information stored. The actual data here(information on each page of the book) is not
organized but we have an ordered reference(contents page) to where the data points
actually lie. We can have only dense ordering in the non-clustered index as sparse ordering
is not possible because data is not physically organized accordingly.
It requires more time as compared to the clustered index because some amount of extra
work is done in order to extract the data by further following the pointer. In the case of a
clustered index, data is directly present in front of the index.
4. Multilevel Indexing
With the growth of the size of the database, indices also grow. As the index is stored in the
main memory, a single-level index might become too large a size to store with multiple
disk accesses. The multilevel indexing segregates the main block into various smaller
blocks so that the same can be stored in a single block.
The outer blocks are divided into inner blocks which in turn are pointed to the data blocks.
This can be easily stored in the main memory with fewer overheads. This hierarchical
approach reduces memory overhead and speeds up query execution.
Multilevel Indexing
Advantages of Indexing
Faster Queries: Indexes allow quick search of rows matching specific values, speeding
up data retrieval.
Efficient Access: Reduces disk I/O by keeping frequently accessed data in memory.
Improved Sorting: Speeds up sorting by indexing the relevant columns.
Consistent Performance: Maintains query speed even as data grows.
Data Integrity: Ensures uniqueness in columns indexed as unique, preventing duplicate
entries.
Disadvantages of Indexing
While indexing offers many advantages, it also comes with certain trade-offs:
Increased Storage Space: Indexes require additional storage. Depending on the size of
the data, this can significantly increase the overall storage requirements.
Increased Maintenance Overhead: Indexes must be updated whenever data is
inserted, deleted, or modified, which can slow down these operations.
Slower Insert/Update Operations: Since indexes must be maintained and updated,
inserting or updating data takes longer than in a non-indexed database.
Complexity in Choosing the Right Index: Determining the appropriate indexing strategy
for a particular dataset can be challenging and requires an understanding of query
patterns and access behaviors.
Features of Indexing
Several key features define the indexing process in databases:
Efficient Data Structures: Indexes use efficient data structures like B-trees, B+ trees,
and hash tables to enable fast data retrieval.
Periodic Index Maintenance: Indexes need to be periodically maintained, especially
when the underlying data changes frequently. Maintenance tasks include updating,
rebuilding, or removing obsolete indexes.
Query Optimization: Indexes play a critical role in query optimization. The DBMS query
optimizer uses indexes to determine the most efficient execution plan for a query.
Handling Fragmentation: Index fragmentation can reduce the effectiveness of an index.
Regular defragmentation can help maintain optimal performance.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
(7)
UNIT 1: OBJECT ORIENTED DATABASE Space for learners:
SYSTEM
Unit Structure:
1.1 Introduction
1.2 Unit Objectives
1.3 Concepts of Object-Oriented Databases
1.3.1 Key Features of Object Databases
1.3.2 Drawbacks of Object Databases
1.3.3 Popular Object Databases
1.4 Standards, Languages and Design
1.4.1 Standards, Languages
1.4.2 Object Database and Relational Database Design
1.5 Object Relational Database Systems
1.5.1 Relational Database Management System (RDBMS)
1.5.2 History of Object Relational Database System
1.5.3 Object-Oriented Relational Database Management
System (OORDBMS)
1.5.4 Comparative Analysis of RDBMS and OORDBMS
1.6 Summing Up
1.7 Answers to Check Your Progress
1.8 Possible Questions
1.9 References and Suggested Readings
1.1 INTRODUCTION
227 | P a g e
addition to the relational form of existing database management system. Space for learners:
The data here are stored, manipulated and accessed as objects, which is
done in object-oriented programming paradigms. The concept of object
can be realized by defining one class with underlying characteristics
like data abstraction, information hiding, encapsulation and imposing on
it other object-oriented features like inheritance, polymorphism, early
binding and late binding. The idea of object-oriented database approach
comes into existence because of the acceptance of object-oriented
programming approach among wide range of users worldwide. Some
object databases, accepted widely and appreciated by the database
community are mentioned in this unit. The required standards in the
design of the object-oriented database systems and the associating query
languages needs to be discussed in order to have a detailed insight into
it. The relational database system is the basis on which the OORDBMS
approach is evolving. The history of object relational database system is
covered, which is followed up by its detailed description. The object-
oriented relational database management approach is compared with the
classic relational database management approach as the conclusive
topic.
228 | P a g e
1.3 CONCEPTS OF OBJECT-ORIENTED DATABASES Space for learners:
229 | P a g e
(ODBMS). The preparatory idea of object-oriented databases immerged Space for learners:
in the late nineties of the nineteenth century and currently it has become
common for various OOP based languages, such as C++, Java,
Smalltalk and LISP. For example, Smalltalk is used in GemStone, LISP
is used in Gbase, and COP is used in Vbase and so on.
Objects are composed of some data members and member functions or
methods, which are encapsulated within a single unit with individual
values and certain properties. Objects come into existence by
instantiation of certain user defined classes. Objects generally go
through a cycle that includes the creation or allocation of objects, use of
the objects and the deletion or de-allocation of objects. Object databases
are common among many modern high performances applications with
high speed data access and manipulative facilities. Some of the
significant areas where object databases are taking a pivotal role are the
real-time systems, architectural engineering for 3-D modeling,
telecommunications, robotics, molecular science, astronomy and many
more.
230 | P a g e
1.3.1 Key Features of Object Databases Space for learners:
Following are some of the popular object databases. These databases are
accepted by most database users because of the highly flexible features
that conform to the needs of current users. The descriptions of few such
databases are mentioned below.
231 | P a g e
Cache Space for learners:
Cache is developed by Inter Systems and it is a high-performing object
database. This object based database facilitates a set of services that
include data storage, concurrency management and handles diverse
transactions issues and process management activities. Cache engine
can be treated as full-fledged powerful database toolkit with extensive
relational database features. This database can be used for diverse
queries and modification purposes using standard SQL via ODBC,
JDBC or object based methods. The computational efficiency of Cache
is enormous and it is a most reliable relational database with high
scalability parameters. Some of the important features of Cache
database are mentioned below.
Able to model data as objects, while eliminating mismatch
between databases and object-oriented applications.
Supports user-defined data types.
The ability to take the advantage of methods and inheritance like
functions.
Object-extensions for SQL to handle object identity and
relationships.
The ability to avail SQL and object-based access through a
single application.
Clustering is used to store data ensuring maximum
performance.
ConceptBase
Concept Base is another database system with multi-user and object-
oriented support which is deductive in nature. It is a powerful tool for
meta-modeling and is very useful for customizing modeling languages.
Concept Base comes with an associating graphical user interface (GUI)
facilitating the users with some common routines. Concept Base is
developed by the Concept Base Team at University of Skövde (HIS)
and the University of Aachen (RWTH). Commonly available operating
systems like Linux, Windows and Mac support Concept Base. There is
also a pre-configured virtual application within Concep Base, which
contains associating executable files and source files along with the
232 | P a g e
tools for compiling. The system is distributed under a FreeBSD-style Space for learners:
license.
ObjectDB
Object DB is a powerful object-oriented database management system
(ODBMS) based on Java language. It is a compact but reliable system,
which is easy to use and extremely fast in terms of object database
access. It supports both the client-server mode and the embedded mode.
Object DB provides all the standard database management services.
This is the reason, why the development process gets easier and the
applications behave faster. It is capable of handling advanced level
queries and providing enhanced indexing facilities. It is very much
effective in multi-user environments, where there is always a rush of
users. ObjectDB can easily be embedded in any applications
irrespective of its sizes and types. This is such a database, which has
been tested with Tomcat, Jetty, GlassFish, JBoss and Spring.
Several other popular object based databases are ObjectDatabase++,
GemStone/S, Perst, ZODB, Wakanda, ODABA, Objectivity/DB. The
discussions on these object databases are beyond the scope of this
syllabus. The learners can use various internet sources to gather a
detailed knowledge on these object based databases.
Some of the sound reasons for the need of standards are as follows.
Standard provides support in maintaining the portability of
database applications. Portability is defined as the capability to
execute particular software or application on different platforms
with minimal modifications.
233 | P a g e
Standards help in achieving interoperability. Interoperability Space for learners:
refers to the ability of an application to access multiple systems.
Here, the same application program may access some data stored
under one ODBMS package, and another data stored under another
source or package.
Standard allows customers to compare commercial products of
various vendors more easily by determining which parts of the
standards are applied in their purchased product.
ODMG (Object Data Management Group) is an association for
monitoring the object-oriented database management activities. This
association proposed a standard for ODBMS in the year 1993 and it was
named as ODMG 1.0 followed by ODMG 2.0 in 1995 and ODMG 3.0
in 2000.
The ODMG 3.0 standard has the following major specifications:
Object Model
Object Definition Language (ODL)
Object Query Language (OQL)
C++ Language Binding
Smalltalk Language Binding
Java Language Binding
234 | P a g e
Space for learners:
235 | P a g e
process terminates, the memory is de-allocated. On the other extreme, Space for learners:
an object, whose lifetime is persistent, is allocated memory space to be
managed by the OODBMS runtime system. This kind of objects exists
in memory after the termination of the process initiated by the
application program. So, it has a long lifetime as compared to transient
form.
Object Structure: The structure of an object can be either atomic or
non-atomic (if the object is composed of other objects). The atomic
object referred here is user-defined in nature. There is no built-in atomic
object type included in OODBMS object models.
Some other important definitions useful for the demonstration of an
object model are stated in the following section. The terms used here are
class, interface, struct, literals and various literal types.
A class defines both the abstract state and the abstract behaviour of
the object.
The interface defines only the abstract behavior of some objects.
The struct defines the abstract state of some literals.
A literal has no identifier and cannot act alone. The literals are
embedded in objects and cannot be individually referenced. Objects
and literals can be classified by their types. Although, all the
elements of a given type have a common range of states and
behaviors, a literal defines only the abstract state of a literal type.
The value of literals does not change. Few examples of literal values
are 67, 17.161576, ‘P’, ‘Q’, “GUIDOL” and “August-15, 2021”.
These examples are some constant numbers, characters and strings.
236 | P a g e
bag<t>, list<t>, array<t>, dictionary<t, v> where t is a type of Space for learners:
objects or values in the collection.
o Structured literals correspond to the values constructed by
tuple constructor. They include the date, time, interval and
timestamp as built-in structures and any other user defined
structures.
[Link] Object Definition Language (ODL)
ODL is a specific kind of a language that specifies the structure of
databases in object-oriented terms. ODL is an extension of Interface
Description Language (IDL), which is again a component of CORBA
(Common Object Request Broker Architecture). CORBA is a standard
for distributed, object-oriented computing which will be discussed in
the later chapters. The ODL is basically a specification language or a
design language, which is used to define the specifications of object
types that obey the rules of ODMG object model. This can be used like
the E/R diagram used in the case of RDBMS platform. ODL is
independent of any programming language and it is not used for
database manipulation activities.
[Link] Object Query Language (OQL)
OQL is a query language preferred by object data management group
(ODMG) for object-oriented database management purpose. OQL
works closely with programming languages like C++. The embedded
OQL statements within a host language return compatible objects useful
for further processing. OQL’s syntax is similar to SQL with additional
features for object handling. This query language is designed to operate
on databases described through ODL. Unlike SQL, which produces
collection, OQL produces collections (sets, bags, lists) of objects. OQL
fits naturally in object oriented host languages. Returned objects are
assigned in the variables present in the host program and these variables
are then used for further programming based manipulative works.
[Link] C++ Language Binding
Binding of ODMG implementations to C++ intends at the writing of
portable C++ codes that manipulates persistent objects. This object
manipulative language of C++ is abbreviated as OML. The C++
language binding includes a version of the ODL that uses C++ syntax,
237 | P a g e
OQL invoking interface and some procedures for operations on Space for learners:
OODBMS prescribed transactions.
[Link] Smalltalk Language Binding
Binding of ODMG implementations to Smalltalk focuses on the binding
in terms of the mapping between ODL and Smalltalk. The Smalltalk
bindings also include a mechanism to invoke OQL and required
procedures for operations on databases and other transactions.
238 | P a g e
part of the class specifications. It is an important matter to specify the Space for learners:
operations needed during the design phase for all types of databases.
But it may be delayed in RDB design, because it is not mandatory until
the implementation phase comes in force. One can easily observe one
realistic difference between the relational model and the object model of
data in terms of behavioral specifications. Although relational data
models do not compel or encourage the database designers to set some
valid behaviors or operations, this is an implicit requirement in the case
of object models.
[Link] Mapping of an Enhanced Entity Relationship
(EER) Schema into an Object Database (ODB)
Schema
The correlation of EER schemas and ODB schemas is simple, because
the ODB schemas provide support for inheritance. Once the mapping
has been completed, the operations need to be added to ODB schemas.
It is because the EER schemas do not include any operations like ODB.
The mapping of EER into ODB schemas can be exhibited using the
following steps.
Step -1
Creation of an ODL class for each EER type.
Multi-valued attributes are declared by sets, bags or lists.
Composite attributes are mapped into tuple constructors.
Step – 2
Add reference attributes for each binary relationship into the
ODL classes that participate in the relationship.
Relationship cardinality is set as single-valued for 1:1 and N:1
types and set- valued for 1:N and M:N types.
Relationship attributes are created through the use of tuple
constructors.
Step - 3
Include the operations corresponding to each class.
239 | P a g e
EER schema does not provide these operations and it must be Space for learners:
added to the database design by choosing it from the original
requirements.
The associating constructor and destructor operations must also
be included.
Step - 4
Inheritance relationships can be specified via extends clause.
An ODL class that corresponds to a sub-class in the EER
schemas inherits the types and methods of its baser-class in the ODL
schemas.
Its non-inherited attributes, relationship references and
operations are specified as mentioned in the earlier steps.
Step - 5
Weak entities can also be mapped in the same way as the regular
entity types.
Non-participating weak entities in any relationships may
alternatively be presented as composite multi-valued attribute of the
owner entity.
The attributes of the weak entity are included in the struct <... >
construct.
Step - 6
Map categories (union types) to object definition language.
May follow the same mapping used for EER-to-relational
mapping.
Declare a class to represent the category.
Define the 1:1 relationship between the category and each of its
base-classes.
Step – 7
Map multi-dimensional cardinality relationships whose degree is
greater than 2.
240 | P a g e
Each relationship is mapped into a separate class with Space for learners:
appropriate reference to each participating class.
241 | P a g e
brought into reality by Postgres (UC Berkeley) during this time. In the Space for learners:
mid of 1990s, early commercially available products were released.
These releases include various products like Illustra
(IBM), Omniscience (Oracle) and UniSQL (KCOMS). The Ukrainian
developer Ruslan Zasukhin, who is the founder of Paradigma Software,
Inc. developed and released the first version of Valentina database in
the mid of 1990s, which was used as C++ SDK. After less than a decade
of time, PostgreSQL had become a commercially available database and
has become the basis for several currently available products
incorporating OORDBMS features. The experts in the domain started
referring these products as object oriented relational database
management systems or OORDBMS. Many of the ideas of early object
relational database efforts have largely been incorporated into SQL:
1999 via specific structured types. For example, IBM's DB2, Oracle
database, and Microsoft’s SQL Server are claiming to support most
OORDBMS requirements and do so with a varying degree of success.
SQL statements are written in RDBMS like this-
CREATE TABLE Customers (
Id CHAR(10) NOT NULL PRIMARY KEY,
Surname VARCHAR(30) NOT NULL,
FirstName VARCHAR(30) NOT NULL,
DOB DATE NOT NULL [# DOB :
Date of Birth]
);
242 | P a g e
WHERE Birthday (DOB) = Today(); Space for learners:
243 | P a g e
1.5.3 Object-Oriented Relational Database Management Space for learners:
System (OORDBMS)
244 | P a g e
extensive liberty to query languages to work on the object-oriented Space for learners:
principle. Some of the common implementations in this regard are the
Oracle Database, PostgreSQL, and Microsoft’s SQL Server. IBM DB2
also supports objects and can be considered as OORDBMS.
In OORDBMS, the approach is essentially that of relational databases,
where the data resides in the database and is manipulated collectively
with queries through a query language. But, in OODBMS, where the
database is essentially a persistent object store for software written in
an object-oriented programming language, a programming API is solely
responsible for storing and retrieving of objects. In this case, a very
little or no specific support presents for query languages.
The basic need of object–relational database arises from the fact that
both relational and object databases have their individual advantages
and drawbacks. Although, the object-oriented databases allow sets, lists,
arbitrary user-defined data types and nested objects, they do not provide
any mathematical base for in-depth analysis. The basic goal for the
object–relational database is to bridge the gap between relational
databases and the object-oriented modeling techniques. The commonly
used programming languages such as C++, C#, Java and Visual
[Link] are seen implementing these extensive features of object-
relational databases. Further, the object–relational DBMS or
OORDBMS allows software developers to integrate user-defined data
types and methods that apply to them into the DBMS. Some of the
leading features or characteristics of OORDBMS are Complex data,
Type inheritance and Object behavior.
Complex data creation is based on basic schema definition through
the user-defined types. Structured complex data are when stored in a
hierarchy; it offers an additional property termed as type inheritance.
That is, a structured type can have subtypes that reuse all of its
attributes and contain additional attributes specific to the subtype.
Finally, the object behavior is related with the access to the program
objects. Such program objects must be storable and transportable for
database processing. This is the reason why they are usually named
as persistent objects. Inside a database, all the relations with a persistent
program object are relations with its object identifiers. The mentioned
points above can be addressed in a proper relational system, although
the SQL standard and its implementations enforce arbitrary restrictions
245 | P a g e
and some amount of additional intricacy. Extension of the data Space for learners:
model with custom data types and methods is possible in a properly
arranged relational system.
RDBMS OORDBMS
246 | P a g e
CHECK YOUR PROGRESS Space for learners:
Multiple choice questions:
1. Object databases are based on
i) Relational approach
ii) Object based approach
iii) Both (i) and (ii)
iv) None of these
2. The term attribute refers to a
i) Record
ii) Row
iii) Column
iv) Key
3. Which of the following can be defined using ODL?
i) Structure
ii) Attribute
iii) Operation
iv) All of above
4. Which of the following belongs to an atomic literal?
i) String
ii) Boolean
iii) Long
iv) All of above
5. Which among the following is/are not Object Based Database(s)?
i) Cache
ii) Foxpro
iii) Wakanda
iv) Both (i) and (iii)
State whether True or False:
6. A single programming paradigm acts behind a single programming
language.
7. A class is an instance of an object in OOP.
8. ODMG looks after the object models in an OODBMS.
9. MS SQL Server does not support OOP principles in any of its
versions.
10. OORDBMS works in the principles of OOPs as well as the
relational models.
247 | P a g e
1.6 SUMMING UP Space for learners:
248 | P a g e
1.8 POSSIBLE QUESTIONS Space for learners:
Short answer type questions:
1) What is the difference between an object and a literal in the object
oriented data model (OODM)?
2) What is an object? What is an object model with reference to
ODMG standards?
3) What are the main difference between designing a relational
database and an object database?
4) Differentiate between:
i) Interface and Class
ii) Atomic object and Collection object
iii) Object identifier and Object lifetime
iv) Persistent object and Transient object
5) What is the significance of ODL in OODBMS?
Long answer type questions:
1) Explain the major specifications mentioned in ODMG 3.0 standard.
2) Describe the differences and similarities between objects and
literals in the ODMG object model?
3) Describe the steps involved in mapping the EER schema into ODB
schema.
4) Explain in detail the OORDBMS concept with the introduction to
all its organizing components.
5) Describe in detail the differences between RDBMS and
OORDBMS.
249 | P a g e
In Proceedings of the “First International Conference on Space for learners:
Deductive and Object-Oriented Databases”, pages 223-40,
Kyoto, Japan, December 1989.
[Link] And Lochovsky (Eds), Object-Oriented Concepts,
Databases, and Applications, Addison-Wesley (Reading MA),
1989
[Link]
management systems
[Link]
[Link]
250 | P a g e
UNIT 2: DISTRIBUTED DATABASE Space for learners:
Unit Structure:
2.1 Introduction
2.2 Unit Objectives
2.3 Distributed Database
2.4 Data Fragmentation
2.5 Data Replication and Allocation Technique.
2.6 Types of Distributed Database System
2.7. Query Processing in Distributed Database
2.8 Concurrency and Recovery Distributed Database
2.9 Summing Up
2.10 Answers to Check Your Progress
2.11Possible Questions
2.12 References and Suggested Readings
2.1 INTRODUCTION
The database is a collection of structured information. Among
other database systems, a distributed database is one where files
are stored in different sites or systems. This unit will give an
overview of the distributed database Management System
(DDBMS). The unit shows the uses of distributed databases. Data
fragmentation, replication, and allocation are very much important
in a database. These are also explained in detail in this unit by
considering the example of distributed database. Types of the
distributed database are also explained in this unit by considering
the examples. Query processing and data recovery of the
distributed database are shown by taking the database example.
251 | P a g e
ii) About the types of the distributed database Space for learners:
iii) About the data fragmentation, replication, and allocation in
a distributed system.
iv) About the query processing database distributed system.
v) About the concurrency and recovery in a distributed
database.
252 | P a g e
system. In this type of system, the data is distributed among the Space for learners:
system. That is the reason the database of this type of system is
known as a distributed database. Every DDBMS has some
features.
I. Databases of the DDBMS are interlinked logically and they
are connected through a communication network. Often, the
DDBMS act as a single database for the user.
II. Data is physically stored in multiple sites and the data is
managed by a local DBMS in the site which is independent
of the other sites.
III. A distributed database is not a loosely connected system.
IV. A distributed database integrates transaction processing.
V. DDBMS synchronizes the distributed database periodically
for which it is transparent to the users.
Every distributed database has to build with some goals and these
are as follows.
i) Reliability: In DDBMS, if one of the systems fails, then
other systems will provide the service to the user. The
other system can complete the task of the failure system.
ii) Availability: In DDBMS, sites or systems are available to
provide reliability to the system. If one distributed system
fails, other sites can give service, and it maintains the
availability of the systems.
iii) Performance: Performance of the DDBMS can be
achieved by distributing data or information over
different sites which are located in different locations. So,
the databases are available to every location which is
maintained through the communication channel.
253 | P a g e
database is divided into different subtables or sub relations so that Space for learners:
each subtable or sub relation can be saved in different sites of the
distributed system. These subtables or sub relations are the logical
units of the DDBMS. The fragmentation is done in such a way that
the subunits give the actual distributed database after combining it.
Let’s, you have a table T in your distributed system and it is
fragmented into different sub tables t1, t2, t3, ----, tn. These
fragments should have sufficient information, so that it will restore
the original table after combining the t1, t2, t3, ---, tn using the
UNION or JOIN operation. These subtables are known as the
fragments and the process is known as data fragmentation in
DDBMS. The fragments are independent of each other’s and user
are concerned about the data fragmentation. This is known as
fragmentation transparency.
The distributed data fragmentation process has some advantages:
I. As the data is fragmented and can be stored locally, the
performance of the DDBMS will increase.
II. Due to the local data store in the local sites, local query
optimization is possible in DDBMS.
III. Fragmentation helps to main the security and privacy of the
local system which will help to main the overall security of
the DDBMS.
You have 3 methods for data fragmenting of a table and they are.
i) Horizontal Fragmentation.
ii) Vertical Fragmentation.
iii) Hybrid Fragmentation.
254 | P a g e
2.4.2 Vertical Fragmentation Space for learners:
S_Roll_No S_Name
2020001 A
2019001 D
2020002 B
2020003 C
2019002 E
Vertical Fragmentation 2 =
S_Roll_No Branch
2020001 MSc. IT
2019001 BSc. IT
2020002 MSc. IT
2020003 MSc. IT
2019002 BSc. IT
In vertical fragmentation 1 and vertical fragmentation 2, one filed
is common i.e the primary key of the IDOL table. It is required to
perform the join operation between the fragments. You can join
the two fragments to get back the original table IDOL as follows.
ΠIDOL (T1 ⋈T2).
In DDBMS, the vertical fragments are saved in different sites as
follows. In Fig. 2.3, fragment 1 is saved in site A where fragment
2 is saved in site B.
256 | P a g e
Space for learners:
4. True or False
i) Vertical fragmentation divides the table column-wise
(attribute).
ii) Horizontal fragmentation allows dividing a table
horizontally into subsets of tables
5. Let's you have a table COURSE as follows.
258 | P a g e
Apart from the above data replication, there are another few data Space for learners:
replications in a distributed database.
i) Transactional Replication: Transactional replication is
generally used in server-to-server communication. In this
replication, a full copy of the database is present with one
system and that system gets the update notification from
the other system once the data changes. Data replication
is done in real-time, so it gives a consistency guarantee.
For example, Azure SQL.
ii) Snapshot Replication: In this replication, a snapshot of
the database is sent to one database from another
database. Data is not updated continuously. Data is
updated infrequently at a specific time. It is more
complex than transactional replication. For example, SQL
Server replication.
iii) Merge Replication: In this replication, data of one
database is combined with another database. In this type
of replication, the data is updated from both databases, so
hard to main consistency and concurrency. For example,
Server and Client Communication (SQL server).
Data replication in DDBMS happens in different modes. They are
as follows.
i) Full Replication: In this mode, a full copy of the database
is present at every site of the distributed system. This mode
increases the availability of the data in the system, and the
user gets the highest experience from it. It is hard to main
the concurrency.
ii) No Replication: Here, the data is divided into different
fragments and each fragment is present at only one site
which is located in different locations. Data availability is
less than the full replication but concurrency can be
controlled.
iii) Partial Replication: Here some of the data fragments of
the database are replicated but some are not. Data
replication is depending on the demands of the respective
data fragments.
Data allocation is a process to decide where exactly you want to
store the data. It involves at per which data has to be stored at
what location. The data allocation technique allocates data
fragments to a site in a distributed database. Each data fragment or
259 | P a g e
its replication can be stored in the particular site of a system. The Space for learners:
process of storing data in a site is known as data allocation. The
sites and numbers of data replication depend on the demand of the
data fragments. The choice of sites and the degree of replication
depend on the system performance and availability and also
depend on the number of transactions submitted on the site. If the
user demands high availability of data, then full replication is a
good choice for this allocation. Otherwise, if a fragment of data is
required then partial replication can be used to allocate the data.
Three main data allocation methods are there and they are as
follows.
i) Centralized: Here entire database is stored in a single site.
No such data distribution or replication occurs in this
process.
ii) Partitioned: In this technique, data is divided into different
fragments and those fragments are stored in different sites
of the distributed systems.
iii) Replicated: In this technique, a copy of the database is
present in a different location and it is accessed from those
locations.
260 | P a g e
2.7 QUERY PROCESSING IN DDB Space for learners:
261 | P a g e
the data. So, the federal schema should handle these Space for learners:
constraints.
iii) Query Language
In a distributed database, the databases are varying from site
to site. So, the query languages are also varied from site to
site. Hence federated schema should develop a common
language that is compatible with all the query languages.
iv) Data Transfer Cost:
In a distributed database, databases are distributed. So, the
table of the databases is also distributed. Even some tables are
fragmented. So, during the time of query processing; it may
need to access the tables at the different databases or different
locations. This demands a request and transfer cost for the
data which needs to optimize.
To explain data transfer cost, let’s you have two distributed
database tables namely IDOL_EMP and IDOL_DEPT. The
IDOL_EMP has a table ÉMP which is present in one location
(location 1) of the distributed system, and IDOL_DEPT has
another table DEPT in another location (location 2) of the
distributed system. The EMP contains the basic information of the
employee where the DEPT table contains the name of the
department where the employee works. Let’s you have 500 data of
size 50 bytes in your EMP table where DEPT table has 10 data of
size 10 bytes. Consider you have processed a query to find the
name of the employee and department from another location
(location 3). The result of this query will include 500 records,
assuming that every employee is related to a department. Suppose
that each record in the query result is 40 bytes long. In this
situation, you can execute your query based on the three costs, and
accordingly, you can choose the optimized cost.
CASE I. You are executing your query from location 3. For this
case, the cost is as bellow.
i) Cost of transferring EMP data: 500 records * 50 bytes =
25,000 bytes.
ii) Cost of transferring DEPT data: 10 records * 10 bytes =
100 bytes.
262 | P a g e
iii) Therefore, total cost = 25,000 bytes + 100 bytes Space for learners:
= 25,100 bytes
CASE II: You can shift the data of the EMP table from location 1
to location 2 and then you process it and transfer the data to
location 3. For this case, the cost is as bellow.
i) Cost of transferring EMP data: 500 records * 50 bytes =
25,000 bytes
ii) Cost of transferring the result: 500 records * 40 bytes =
20,000 bytes.
iii) Therefore, total cost = 25,000 bytes + 20,000 bytes
= 45,000 bytes
CASE III: You can shift the DEPT data of the EMP table from
location 2 to location 1 and then you process it and transfer the
data to location 3. For this case, the cost is as bellow.
i) Cost of transferring DEPT data: 10 records * 10 bytes =
100 bytes
ii) Cost of transferring the result: 500 records * 40 bytes =
20,000 bytes.
Therefore, total cost = 100 bytes + 20,000 bytes = 20,100 bytes
Now, if you compare the cost of CASE I, CASE II, and CASE II,
the cost of CASE III is the minimal one and it is optimized. Using
this method, you can perform your query in the distributed
database at a minimal cost.
263 | P a g e
In a distributed system, the database of one site may fail. Space for learners:
But the DDBMS should work with the other sites and it
will try to recover the sites and make its date up to date.
iii) Failure of Communication Network:
The DDBMS must deal with the communication failure
and will try to maintain the concurrency and recover the
sites as soon as possible. If network portioning occurs due
to network failure, then it is hard to recover the sites and
maintain consistency.
iv) Distributed Commit:
The problems occur when a commit transaction is done in
DDBMS where the database is present in a failed system.
The two-phase commit protocol is often used to deal with
this problem.
v) Distributed Deadlock:
Sometimes deadlock may occur in a distributed system. So,
it is necessary to main consistency and recovery in the
deadlock system.
The techniques which deal with concurrency control in DDBMS
are explained below.
I. Lock based protocol:
When two transactions are present in the database, a read-
write lock can apply in one transaction to avoid the
concurrency issue where others can access the data. This
lock
II. Shared lock system (Read lock):
The shared lock system is a read lock. The lock is shared
between the transaction. Any one of the transactions can
activate the shared lock for reading purposes.
III. Exclusive lock:
In this technique, an exclusive lock is activated for a
transaction for the read and write operation. In this
technique, no other lock can apply for the read and write
operation on the same data.
264 | P a g e
Lock-based concurrency protocol locks the data. A lock is a Space for learners:
variable that controls the read-write operation on data. It is two
types.
i) One phase Locking Protocol:
In this technique, a lock is applied by a transaction on data
before it uses and releases after the transaction is
complete.
ii) Two-phase locking protocol:
In the two-phase locking protocol, a transaction adopts all
the locks in the first phase and does not release any locks
until finish all read and write operations. In the second
phase, the transaction releases all the locks and never
requests any locks.
Recovery is the most important process in a DDBMS. It is
required to recover the information from a site. The recovery is
required due to the following reasons.
i) The receiver site may down
ii) The location of the receiver site may crash.
iii) The communication link between the sender and receiver
site may break.
A two-phase commit protocol is used to overcome the issue of the
data recovery on DDBMS. This atomic protocol coordinates the
process of DDBMS which decides to commit or terminate a
transaction. It provides the automatic recovery option in case of a
site failure. The original place of transaction is known as
coordinator and other places of the transaction are known as a
cohort. The protocol executes in two phases.
i) Commit request: In the commit phase, the coordinator
prepares the list of cohorts and asks to commit the
transaction.
ii) Commit phase: Based on the responses from the cohorts,
the coordinator can decide to commit or terminate a
transaction.
265 | P a g e
CHECK YOUR PROGRESS-III
Space for learners:
6. All sites in a distributed database commit at exactly the
same instant. TRUE/FALSE
7. Fill in the blanks.
i) The real use of the Two-phase commit protocol is
______________.
ii) Read one, write all available protocol is used to
increase ___________ in a distributed database
system.
iii) Commit and rollback in DDB are related to ..........
iv) If a distributed transactions are well-formed and 2-
phasedlocked, then ................ is the correct locking
mechanism in distributed transaction as well as in
centralized database.
v) A distributed transaction can be ............. if queries
are issued at one or more nodes.
2.9. SUMMING UP
266 | P a g e
You have 3 methods for data fragmenting of a table and they Space for learners:
are.
o Horizontal Fragmentation.
o Vertical Fragmentation.
o Hybrid Fragmentation
The process of storing data or information in more than one
site or system in a distributed system is known as data
replication. It is useful in improving the availability of data.
Two types of data replication are present.
o Synchronous Data Replication: In this type of
replication, once the changes are made in a table of the
database, the data replication is done immediately.
o Asynchronous Data Replication: In asynchronous
replication, the data replication is done after the commit
operation of the database.
Data allocation is a process to decide where exactly you want
to store the data. It involves at per which data has to be stored
at what location.
There are two types of distributed databases are found and
they are homogenous database and heterogeneous database. a)
Homogeneous Database b) Heterogeneous Database
In a distributed database system, query processing is done at
the end of the user site and server site. A query comes from
the user site, so it is checked and optimized at the user site i.e.
it is at the local level. The query comes to the server, so it is
processed and optimized at the server site i.e. it is at the
global level.
During the time of concurrency control and recovery
distributed databases face lots of issues. They are presented
below.
o Multiple copies of data,
o Failure of a site,
o Failure of Communication Network,
o Distributed Commit,
o Distributed Deadlock.
267 | P a g e
A lock is a variable that controls the read-write operation on Space for learners:
data. It is two types.
o One phase Locking Protocol: In this technique, a lock is
applied by a transaction on data before it uses and releases
after the transaction is complete.
o Two-phase locking protocol: In the two-phase locking
protocol, a transaction adopts all the locks in the first
phase and does not release any locks until finish all read
and write operations. In the second phase, the transaction
releases all the locks and never requests any locks.
ii)
S_Roll_No Branch
2019001 BSc. IT
2019002 BSc. IT
6) FALSE
268 | P a g e
Space for learners:
7)
i) Atomicity, i.e, all-or-nothing commits at all sites
ii) Both Availability and Robustness
iii) Data Consistency
iv) A two-phase locking.
v) partially read-only
269 | P a g e
2.12 REFERENCES AND SUGGESTED Space for learners:
READINGS
270 | P a g e
UNIT 3: IMAGE AND MULTIMEDIA Space for learners:
DATABASE
Unit Structure:
3.1 Introduction
3.2 Unit objectives
3.3 Concept of Image
3.4 Image Database and Multimedia database
3.5 Requirement of Multimedia database
3.6. Challenges of multimedia database
3.7 Contents of multimedia database
3.8 Application of multimedia database
3.9 Summing Up
3.10 Answers to Check Your Progress
3.11 Possible Questions
3.12 References and Suggested Readings
3.1 INTRODUCTION
271 | P a g e
ii) About types of multimedia database including an Space for learners:
image.
iii) About image and multimedia database.
iv) About the challenges and contents of the multimedia
database.
v) About the challenges of the multimedia database.
vi) About the applications of the multimedia database.
272 | P a g e
database for the BLOB data. You can execute the following queries Space for learners:
for the BLOB data.
i) Table Creation Query: CREATE TABLE `idol_emp` (
`emp_id` INT NOT NULL , `emp_name` TEXT NOT NULL ,
`emp_photo` BLOB NOT NULL , `emp_biodata` BLOB NOT
NULL , PRIMARY KEY (`id`))
In query (i), the emp_photo and emp_biodata, these two fields
require the BLOB data. So their data types are BLOB.
ii) Data Insertion Query: As BLOB is MySQL datatype and it
has the following four BLOB data type depending on the length
of the data that they can hold.
a) TINY BLOB
b) BLOB
c) MEDIUMBLOB
d) LONG BOB
To insert the data into ‘idol_emp’ using BLOB and python, you
need to perform the following steps.
a) You need to install MySQL-Python connector using pip and
then need to establish the connection.
b) You need a python function that converts images and other
multimedia data into binary data.
c) Then define your insert query and execute the query using
the [Link]() function.
d) After the query execution, you need to commit your
database changes.
e) Then you need to close your cursor and database
connection.
f) Finally, verify your result.
The code of insertion into the database using the BLOB is given
below.
273 | P a g e
import [Link] Space for learners:
def multimediaToBinary(filename):
with open(filename, 'rb') as file:
binaryData = [Link]()
return binaryData
def insertBLOB(emp_id, emp_name, emp_photo, emp_biodata):
print("Inserting multimedia data into idol_emp")
try:
connection = [Link](host='localhost',
database='idol_db',
user='idol',
password='idolidol')
cursor = [Link]()
sql_insert_blob = """ INSERT INTO idol_emp
(emp_id, emp_name, emp_photo, emp_biodata)
VALUES (%s,%s,%s,%s)"""
emp_photo = convertToBinaryData(emp_photo)
emp_biodata = convertToBinaryData(emp_biodata)
insert_blob = (emp_id, emp_name, emp_photo,
emp_biodata)
result = [Link](sql_insert_blob, insert_blob)
[Link]()
print("Image and biodata has inserted successfully ", result)
finally:
if connection.is_connected():
[Link]()
274 | P a g e
[Link]() Space for learners:
print("MySQL connection is closed")
insertBLOB(1, "idol_emp1", "path of the image",
"path of the text")
275 | P a g e
print("Storing employee’s photyo and biodata in the Space for learners:
local PC")
write_file(Employee image, emp_photo)
write_file(Employee biodata, emp_biodata)
except [Link] as error:
print("Failed to read data from idol_emp {}".format(error))
finally:
if connection.is_connected():
[Link]()
[Link]()
print("MySQL connection is closed")
readBLOB(1, "path of the image",
276 | P a g e
ii) Concurrency control: Like other DBMS, a multimedia Space for learners:
database should control the concurrency of the
transaction. Otherwise, consistency issues will be arises.
iii) Data Independency: In the multimedia database, data of
the different multimedia should be independent. It
should be managed from the user side.
iv) Persistence: Data of a multimedia database should be
saved and reused by the other transactions.
v) Recovery: Data should be recovered at the time of
failure. A system may fail due to different reasons, but
the recovery option of a multimedia database should
recover the data at the time of need.
277 | P a g e
Space for learners:
CHECK YOUR PROGRESS-II
6. What is concurrency in multimedia database?
7. State truth or false
i. Integration is a requirement of multimedia database.
ii. Data independecy should be a part of multimedia
database.
iii. All multimedia shoud not be recovered.
8. State two challenges of multimedia database.
278 | P a g e
iii) The Multimedia databases are used in the video on Space for learners:
demand. For example. Netflix
iv) A Multimedia database is used in music. For example.
Ganna.
v) The multimedia database is used in GIS. For example.
Landsat 8.
3.9 SUMMING UP
279 | P a g e
Table Creation Query: CREATE TABLE `idol_emp` ( Space for learners:
`emp_id` INT NOT NULL , `emp_name` TEXT NOT
NULL , `emp_photo` BLOB NOT NULL , `emp_biodata`
BLOB NOT NULL , PRIMARY KEY (`id`))
Data Insertion Query: As BLOB is MySQL datatype and
it has the following four BLOB data type depending on the
length of the data that they can hold.
a) TINY BLOB
b) BLOB
c) MEDIUMBLOB
d) LONG BOB
Like other DBMS, the multimedia database should address
the requirement issues.
1. Integration:
2. Concurrency control
3. Data Independency
4. Persistence
5. Recovery
280 | P a g e
The multimedia database can be applied in the following Space for learners:
areas.
1. Insurance claim records.
2. Inflibnet.
3. Netflix
4. Ganna.
5. Landsat 8
281 | P a g e
ii. As the multimedia database consists of images, text, Space for learners:
video, mp3, etc. So conversion of one file format to
another format is not so easy
9. i) False ii) True
10. The multimedia database can be applied in the following
areas.
i. Multimedia databases can be applied in digital libraries.
For example IR@ inflibnet.
ii. The Multimedia databases are used in the video on
demand. For example. Netflix
11. Netflix is an example of a multimedia database. It is a
streaming service that offers a wide variety of award-
winning TV shows, movies, anime, documentaries, etc.
12. Information and Library Network (INFLIBNET) is an
example of a multimedia database and is an autonomous
Inter-University Centre of the University Grants
Commission (UGC) that provides access to e-resources to
colleges, universities, and centrally funded technical
institutions
282 | P a g e
1. Explain the data insertion and retrieve in a multimedia Space for learners:
database using python.
2. Explain the different challenges of a multimedia database.
3. Explain the requirements of a multimedia database.
283 | P a g e
UNIT 4: SPATIAL DATABASE Space for learners:
Unit Structure:
4.1 Introduction
4.2 Unit Objectives
4.3 Spatial Database Concept
4.4 Spatial DBMS Data Models
4.5 Content-based Indexing and Retrieval
4.6 Different Indexing Techniques
4.7 Summing Up
4.8 Answers to Check Your Progress
4.9 Possible Questions
4.10 References and Suggested Readings
4.1 INTRODUCTION
284 | P a g e
iv) About the indexing technique such as R tree, R+ tree, Space for learners:
and KD tree.
285 | P a g e
v) Environmental applications: In the case of Fire or Space for learners:
Pollution Monitoring, the SDBMS is used
vi) Administrative applications: In Public networks
administration and vehicle navigation, the SBMS is used.
The SDBMS are necessary for the following requirements before
its designs.
i) For the manipulation of very large amounts of data, e.g.,
terabytes of data per day from satellite images, the SDBMS is
required.
ii) For data distinction, e.g., spatial and non-spatial
(alphanumeric) data, the DBMS is necessary.
iii) For Complex spatial relationships and operations, e.g.,
topological, directional, metric relationships, the SDBMS
is necessary.
iv) Complex spatial relationships, e.g., find all cities adjacent
to a river, find all dark shapes left to the heart, and find the
5 closest hospitals concerning a given location.
v) Spatial join: An expensive operation, e.g., Find the 5
closest hospitals concerning any highway.
In section 4.3, the two types of the data model of SDBMS are
already mentioned. They are
i) Raster Model: In the raster model, SDBMS spaces are
subdivided into cells of regular size and shape such as square,
287 | P a g e
triangle, hexagon, etc. Each cell of the raster is assigned the Space for learners:
value of the attribute it represents and only one value is
assigned for the same. Different attributes are stored in
separate files (layers).
ii) Vector Model: In the vector model, the subdivision of the
space is done based on the position of the geographic feature,
i.e., irregular. The features are represented by (2-D space),
such as Points (x,y), Lines (x1,y1, x2,y2, ..., xn,yn), Regions
(x1,y1, ..., xn,yn, x1,y1).
288 | P a g e
Space for learners:
289 | P a g e
data, virtual maps implementation, and handling geospatial Space for learners:
coordinates, etc. The properties R tree are given below.
i. Consists of a single root with internal and leaf nodes.
ii. The root node contains a pointer to the largest region.
iii. The parent nodes contain pointers to their child nodes
where the region of child nodes completely overlaps with
the regions of parent nodes.
iv. Leaf nodes contain the actual data within the Minimum
Bounding region (MBR) to the current objects where the
MBR is the sub-regions within the entire space that group
data as efficiently.
(a)
290 | P a g e
(b) Space for learners:
Fig. 3.3 R tree in SDBMS
291 | P a g e
Multiple noes of R+tree may store the same object. Three cases Space for learners:
should take care of the insertion.
i) Insert an object into a node where the covering
rectangles of all entries do not intersect with the object-
bounding rectangle.
ii) The second one is when the bounding rectangle of the
new object only partially intersects with the bounding
rectangles of entries.
iii) The third case is more serious in that the covering
rectangles of some entries can prevent each other from
expanding to include the new object.
III. KD Tree: KD tree is a binary search tree that is also known
as K dimensional tree. In the KD tree, the data in each node
of the tree represents the K dimensional point in space. So it
is also known as space partitioning data structure. It
represents the points or data in K dimensional space. The
non-leaf node of the KD tree effectively divides the tree into
two spaces, known as half-space. The data that is left of the
root will go into a left sub-tree, data right of a root will go in
a right subtree. Construction of the KD tree is as follows:
i) The axis used to generate splitting trees is cycled
repeatedly.
ii) The nodes are selected by taking the median of the
data being placed in the subtree.
292 | P a g e
iii) The 3rd coordinate is (70,70). Now 70>30, so it will go right Space for learners:
subtree.
iv) The 4th coordinate is (50,30). First, compare with root
50>30. But already (70,70) is in the right subtree. Now
Compare 50 with the y value of 70. 50<70. So, it will be in
the left subtree of (70,70).
v) The final coordinate is (35,45). Comparing with root
(35>30). It will go right subtree. In the right subtree,
comparing with y coordinates of (70,70), you find 35<70.
So, it will go left subtree of (70,70). But in the left subtree
of (70,70), the (50,30) coordinate is present. Now compare
35 of (35,45) with x of (50,30). So, 35<45. So, it will be on
the left of (50,30).
293 | P a g e
4.7 SUMMING UP Space for learners:
294 | P a g e
o The axis used to generate splitting trees is cycled Space for learners:
repeatedly.
o The nodes are selected by taking the median of the
data being placed in the subtree.
295 | P a g e
2) Explain raster and vector model in spatial DBMS. Space for learners:
3) What is CBIR? Give examples.
4) What are the applications of spatial DBMS?
5) What are the requirements of Spatial SBMS?
6) State the difference between R and R+ tree.
7) Give some examples of spatial query language.
8) What are advantages of R+ tree over R tree?
9) Why do need KD tree if you have R and R+ tree?
10) What are the requirements of spatial DBMS?
Long answer type questions:
1) Explain the CBIR system with an example and diagrams.
2) Explain the KD tree with an example.
3) What are the indexing techniques of Spatial DBMS?
Explain.
296 | P a g e
SQL - Object-Relational Extensions — CSCI 4380 Database Systems... [Link]
SQL - Object-Relational
Extensions
Contents
• Semantic Hierarchies - Inheritance
• Complex objects
• Postgresql and many other databases actually have many extensions that go well
beyond the relational data model.
• As these extensions violate relational data model, think about what you are giving up
and use them sparingly!
◦ Simplicity of data model and queries
◦ Optimizations may not be as easy to perform
• We will go through some of these here, using Postgresql as an example.
• Querying subtables:
1 of 7 13-10-2025, 21:41
SQL - Object-Relational Extensions — CSCI 4380 Database Systems... [Link]
SELECT
name
, altitude
FROM
cities
WHERE
altitude > 50;
SELECT
name
, altitude
FROM ONLY
cities
WHERE
altitude > 50;
SELECT
[Link]
, [Link]
, [Link]
FROM
cities c
, pg_class p
WHERE
[Link] > 50
and [Link] = [Link];
Output:
relname | name | altitude
----------+-----------+----------
cities | Las Vegas | 2174
cities | Mariposa | 1953
capitals | Madison | 84
• Semantic hierarchies about sets of objects and their relationship to each other.
◦ A type of object (capital) is a special type of city.
◦ All cities include the capitals.
Complex objects
2 of 7 13-10-2025, 21:41
SQL - Object-Relational Extensions — CSCI 4380 Database Systems... [Link]
id | name | phone
----+--------------+-----------------
1 | Kara Danvers | (555-1234,work)
• These complex types really go against the first normal form: that all values should be
atomic. But, they allow multiple related values to be encapsulated.
• You can access the types using dot notation
• Technically you should store the both attributes for phone separately, but this way,
you can tell that they belong together.
• You can also define user defined types to be restricted domains of values and then
use in multiple places.
Collection of Values
• In addition to records (like the one above), you can also define collection of values.
• Arrays:
3 of 7 13-10-2025, 21:41
SQL - Object-Relational Extensions — CSCI 4380 Database Systems... [Link]
squares
---------
8
(1 row)
msg
-----------
{world}
{feel,so}
(2 rows)
4 of 7 13-10-2025, 21:41
SQL - Object-Relational Extensions — CSCI 4380 Database Systems... [Link]
jsonb
-------------------------
{"foo": {"bar": "baz"}}
?column?
----------------
{"bar": "baz"}
Geographic Data
• PostGIS is an extension for supporting geographic data with many useful data types
of functions.
• First install postgis and create the extension from a superuser:
• Now you can use all the data types and methods available in postgis.
• SRID shows the projection used to compute the latitude and longitude.
• You can also enter polygons as arrays of points, line segments are arrays of lines, etc.
• Many geography functions are available (distance is in meters):
5 of 7 13-10-2025, 21:41
SQL - Object-Relational Extensions — CSCI 4380 Database Systems... [Link]
SELECT
[Link]
, [Link]
, ST_DISTANCE([Link], [Link])
FROM
bwithloc b1
, bwithloc b2
WHERE
[Link] < [Link] ;
• Other examples:
◦ Check whether a point is inside a polygon (which city is this restaurant in)?
◦ Check the length of a line segment
Text Querying
• The text queries we have seen so far very simplistic: find if the text contains a specific
word.
• More sophisticated approaches treat text as a collection of words or tokens.
◦ If you want to learn more, information retrieval is a field that studies this!
• Postgresql supports text processing:
to_tsvector
-----------------------------------
'ate':3 'cat':2 'fat':1,4 'rat':5
• Text queries will consist of boolean connection of keywords, tokenized and stop
words removed:
• You can search a keyword query in a document by relevance. The number of times a
word appears will increase the relevance of the text to the query.
We will use the Yelp database as an example:
6 of 7 13-10-2025, 21:41
SQL - Object-Relational Extensions — CSCI 4380 Database Systems... [Link]
SELECT
[Link]
, ts_rank_cd(to_tsvector('english', r.review_text), query) AS rank
FROM
reviews r
, businesses b
, to_tsquery('pizza & (crust | sauce) & (delicious|tasty)') query
WHERE
b.business_id = r.business_id
and to_tsvector('english', r.review_text) @@ query
ORDER BY
rank DESC
LIMIT 10;
name | rank
----------------------------+-----------
DeFazio's Pizzeria | 0.05
Little Bites and More | 0.05
Notty Pine Tavern | 0.0366667
Red Front Restrnt & Tavern | 0.0285714
New York Style Pizza | 0.025
Milano Restaurant | 0.0218698
DeFazio's Pizzeria | 0.0202986
The Fresh Market | 0.02
Dante's Pizzeria | 0.0192982
Labella Pizza | 0.0155556
Summary
• Postgresql extensible with many new data types and associated methods.
• We will also see how it is possible to create the appropriate indices for these data
types.
7 of 7 13-10-2025, 21:41
Search... Sign In
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Before designing a database, it's crucial to understand important terms and concepts. A
properly structured database guarantees efficiency, data accuracy, and usability. From
understanding data storage to the principles that define data relationships, these concepts
are essential for anyone involved in designing, managing, or optimizing databases.
However, depending on specific requirements above criteria might change. But these are
the most common things that ensure a good database design.
Step 2: List down all the entities that will be present in the database & what relationships
exist among them.
Step 3: Organize the information into different tables such that no or very little redundancy
is there.
Step 4: Ensure uniqueness in every table. The uniqueness of records present in any relation
is a very crucial part of database design that helps us avoid redundancy. Identify the key
attributes to uniquely identify every row from columns. You can use various key constraints
to ensure the uniqueness of your table, also keep in mind the uniquely identifying records
must consume as little space as possible & shall not contain any NULL values.
Step 5: After all the tables are structured, and information is organized apply Normalization
Forms to identify anomalies that may arise & redundancy that can cause inconsistency in
the database.
Following are the terminologies that a person should be familiar with before designing a
database:
Redundancy: Redundancy refers to the duplicity of the data. There can be specific use
cases when we need or don't need redundancy in our Database. For ex: If we have a
banking system application then we may need to strictly prevent redundancy in our
Database.
Schema: Schema is a logical container that defines the structure & manages the
organization of the data stored in it. It consists of rows and columns having data types
for each column.
Records/Tuples: A Record or a tuple is the same thing, basically its where our data is
stored inside a table
Indexing: Indexing is a data structure technique to promote efficient retrieval of the data
stored in our database.
Data Integrity & Consistency: Data integrity refers to the quality of the information
stored in our database and consistency refers to the correctness of the data stored.
Data Models: Data models provide us with visual modeling techniques to visualize the
data & the relationship that exists among those data. Ex: model, Network Model, Object
Oriented Model, Hierarchical model, etc.
Normalization: The process of organizing data to reduce redundancy and dependency
by dividing larger tables into smaller ones and defining relationships. It ensures data
storage and consistency.
Functional Dependency: Functional Dependency is a relationship between two
attributes of the table that represents that the value of one attribute can be determined
by another. Ex: {A -> B}, A & B are two attributes and attribute A can uniquely determine
the value of B.
Transaction: Transaction is a single logical unit of work. It signifies that some changes
are made in the database. A transaction must satisfy the ACID or BASE properties
(depending on the type of Database).
Schedule: Schedule defines the sequence of transactions in which they're executed by
one or multiple users.
Concurrency: Concurrency refers to allowing multiple transactions to operate
simultaneously without interfering with one another.
Constraints: Constraints are the rules applied to fields in a table to enforce data
integrity. e.g., NOT NULL, UNIQUE, CHECK, etc. It ensures data quality and accuracy.
1. Requirement Analysis
It's very crucial to understand the requirements of our application so that you can think in
productive terms. And imply appropriate integrity constraints to maintain the data integrity
& consistency.
This is the actual design phase that involves various steps that are to be taken while
designing a database. This phase is further divided into two stages:
Logical Data Model Design: This phase consists of coming up with a high-level design
of our database based on initially gathered requirements to structure & organize our
data accordingly. A high-level overview on paper is made of the database without
considering the physical level design, this phase proceeds by identifying the kind of
data to be stored and what relationship will exist among those data.
Entity, Key attributes identification & what constraints are to be implemented is the core
functionality of this phase. It involves techniques such as Data Modeling to visualize
data, normalization to prevent redundancy, etc.
Physical Design of Data Model: This phase involves the implementation of the logical
design made in the previous stage. All the relationships among data and integrity
constraints are implemented to maintain consistency & generate the actual database.
Finally, after implementing the physical design of the database, we're ready to input the
data & test our integrity. This phase involves testing our database for its integrity to see if
something got left out or, if anything new to add & then integrating it with the desired
application.
1. Data Models: Data modeling is a visual modeling technique used to get a high-level
overview of our database. Data models help us understand the needs and requirements of
our database by defining the design of our database through diagrammatic representation.
Ex: model, Network model, Relational Model, object-oriented data model.
Data Models
2. Entity: Entities are objects in the real world, which can have certain properties & these
properties are referred to as attributes of that particular entity. There are 2 types of entities:
Strong and weak entity, weak entity do not have a key attribute to identify them, their
existence solely depends on one 1-specific strong entity & also have full participation in a
relationship whereas strong entity does have a key attribute to uniquely identify them.
Weak entity example: Loan -> Loan will be given to a customer (which is optional) & the
load will be identified by the customer_id to whom the lone is granted.
3. Relationships: How data is logically related to each other defines the relationship of
that data with other entities. In simple words, the association of one entity with another is
defined here.
A relationship can be further categorized into - unary, binary, and ternary relationships.
Unary: In this, the associating entity & the associated entity both are the same. Ex:
Employee Manages themselves, and students are also given the post of monitor hence
here the student themselves is a monitor.
Binary: This is a very common relationship that you will come across while designing a
database.
Ex: Student is enrolled in courses, Employee is managed by different managers, One
student can be taught by many professors.
Ternary: In this, we have 3 entities involved in a single relationship. Ex: an employee
works on a project for a client. Note that, here we have 3 entities: Employee, Project &
Client.
4. Attributes: Attributes are nothing but properties of a specific entity that define its
behavior. For example, an employee can have unique_id, name, age, date of birth (DOB),
salary, department, Manager, project id, etc.
5. Normalization: After all the entities are put in place and the relationship among data is
defined, we need to look for loopholes or possible ambiguities that may arise as a result of
CRUD operations. To prevent various Anomalies such as INSERTION, UPDATION, and
DELETION Anomalies.
Data Normalization is a basic procedure defined for databases to eliminate such anomalies
& prevent redundancy.
Physical Design
The main purpose of the physical design is to actually implement the logical design that is,
show the structure of the database along with all the columns & their data types, rows,
relations, relationships among data & clearly define how relations are related to each other.
Step 1: Entities are converted into tables or relations that consist of their properties
(attributes)
Step 2: Apply integrity constraints: establish foreign key, unique key, and composite key
relationships among the data. And apply various constraints.
Step 3: Entity names are converted into table names, property names are translated into
attribute names, and so on.
Step 5: Final Schemes are defined based on the entities & attributes derived in logical
design.
Physical Design
Conclusion
In conclusion, a good database design is an essential part of a strong database
management system (DBMS). It provides the basis for data governance, data storage, and
data retrieval. The quality of a database has a direct impact on a system’s overall
performance and dependability. It is important to consider data organization,
standardization, performance, integrity, and more when designing a database to meet the
needs of your organization and your users.
Article Tags : DBMS Geeks Premier League Geeks Premier League 2023
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Conceptual database design is the process of identifying the essential data elements,
relationships, and constraints in a data model, which represents a particular
organization's business requirements. The conceptual design stage is the first step in
the database design process, which precedes the logical and physical design stages.
In this article, we will discuss the conceptual database design, its objectives, its
process, and the key components of a conceptual data model.
1 of 3 13-10-2025, 21:52
Conceptual Database Design [Link]
Establish the constraints ? Constraints are the rules that govern the
relationships between entities. The third objective of conceptual database
design is to establish the constraints between entities, which ensure data
consistency and integrity.
2 of 3 13-10-2025, 21:52
Conceptual Database Design [Link]
Entities ? Entities are objects or concepts that exist in the real world and
can be distinguished from each other. Examples of entities include
customers, orders, products, and employees.
Conclusion
Conceptual database design is an essential process in database development, as it
lays the foundation for the logical and physical design stages. The objectives of
conceptual database design are to identify the entities and their attributes, define the
relationships, and establish the constraints. The process of conceptual database
design involves requirements gathering and entity-relationship.
3 of 3 13-10-2025, 21:52
Overview of the C++ Language Binding in the ODMG Standard [Link]
Introduction
Diving into the world of data management and modeling can be a complex task,
especially when dealing with standards like the Object Data Management Group
(ODMG). Did you know that ODMG provides an essential standard for object-oriented
database systems, including a C++ language binding? This article will guide you
through an easy-to-understand overview of this very aspect of ODMG, highlighting its
key features such as ODL constructs and transactions.
The Object Data Management Group (ODMG) serves as a critical aspect of data
management, revolutionizing the way we perceive information storage. Composed of
several prominent object database and object-relational vendors, ODMG aims to
establish standards for programming-language-centric data management.
By providing specifications for C++, Java, and Smalltalk, it seeks to bridge the gap
between object-oriented programming languages and databases. The ODMG standard
hinges on the premise that integrating the database with client language helps
streamline application development?a game-changing paradigm in modern computing
environments.
The purpose of the ODMG standard is to provide a consistent and standardized way of
1 of 5 13-10-2025, 22:06
Overview of the C++ Language Binding in the ODMG Standard [Link]
This standard plays a crucial role in data modeling and management within object-
oriented databases. By defining an Object Definition Language (ODL) and an Object
Manipulation Language (OML), it provides a clear syntax and semantics for creating,
manipulating, querying, and deleting objects in the database.
The importance of the ODMG standard lies in its ability to bridge the gap between
application development using object-oriented programming languages like C++ or
Java, and underlying databases that store persistent data.
The Object Definition Language (ODL) is a key component of the ODMG standard,
which plays a crucial role in object-oriented database systems. It serves as a
declarative portion of the ODMG specification that allows developers to define objects
and their relationships within an object-oriented database.
By using ODL, developers can specify the structure, behavior, and constraints of
objects in a concise manner. This includes defining classes, attributes, methods,
inheritance hierarchies, and associations between objects.
With ODL, developers can easily model complex data structures and implement them
in an object-oriented database system. The language provides a standardized syntax
for representing various constructs related to object modeling.
The Object Manipulation Language (OML) is a key component of the C++ language
binding in the ODMG standard. It allows users to perform various operations on
objects, such as creating, naming, manipulating, and deleting them.
OML provides a set of commands and syntax for interacting with objects stored in an
object-oriented database management system (OODBMS). This includes features like
transaction support for ensuring data consistency and integrity during updates.
2 of 5 13-10-2025, 22:06
Overview of the C++ Language Binding in the ODMG Standard [Link]
systems. Discover how ODMG standardizes object creation, manipulation, and deletion
in this overview.
The C++ language binding in the ODMG Standard involves mapping ODL constructs to
C++ constructs. This allows users to utilize the power of C++ programming language
for object-oriented data management. Here is an overview of how ODL constructs are
mapped to C++ constructs
3 of 5 13-10-2025, 22:06
Overview of the C++ Language Binding in the ODMG Standard [Link]
It provides seamless integration between the C++ programming language and the
ODMG standard, making it easier for programmers to work with object-oriented data
modeling and management. The C++ class library also supports transactions,
ensuring that changes made to objects can be committed or rolled back as needed.
With the C++ class library provided by the ODMG Standard, developers
can easily create new objects in their C++ applications.
Transactions
Transactions play a crucial role in the C++ language binding of the ODMG Standard.
In this context, transactions refer to a set of operations that are executed as a single
logical unit, ensuring consistency and integrity of data.
Transactions provide atomicity, which means that either all the operations within a
transaction are completed successfully or none of them is applied at all. This helps in
maintaining data integrity by preventing partial updates or inconsistent states.
Transactions also provide durability by ensuring that once committed, changes made
during the transaction persist even in case of failures. With the C++ language binding
4 of 5 13-10-2025, 22:06
Overview of the C++ Language Binding in the ODMG Standard [Link]
in the ODMG Standard, developers can easily work with transactions using a well-
defined API and perform actions such as starting a transaction, committing changes,
or rolling back if needed.
Conclusion
The C++ language binding in the ODMG Standard provides an efficient and powerful
way to interact with object-oriented databases. By mapping ODL constructs to C++
constructs and providing a comprehensive class library for object manipulation,
developers are able to easily create, name, manipulate, and delete objects.
Additionally, transactions ensure data integrity and consistency within the database.
Overall, the C++ language binding in the ODMG Standard is a valuable tool for anyone
working with object-oriented database systems.
FAQs
1. What is the C++ language binding in the ODMG standard?
The C++ language binding in the ODMG (Object Data Management Group) standard is
a set of specifications and guidelines that define how C++ programming language can
be used to implement object-oriented databases.
2. How does the C++ language binding in the ODMG standard work?
The C++ language binding provides a set of classes, interfaces, and methods that
allow developers to interact with object-oriented databases using C++. It includes
features such as object persistence, query capabilities, and transaction management.
3. What are the benefits of using the C++ language binding in the ODMG
standard?
Using the C++ language binding allows developers familiar with C++ to leverage their
existing skills and knowledge for building applications that interact with object-
oriented databases. It provides a standardized approach for database interaction,
which promotes interoperability and code reusability.
5 of 5 13-10-2025, 22:06
ROHINI COLLEGE OF ENGINEERING & TECHNOLOGY
The translation process in query processing is similar to the parser of a query. When a
user executes any query, for generating the internal form of the query, the parser in the system
checks the syntax of the query, verifies the name of the relation in the database, the tuple, and
finally the required attribute value. The parser creates a tree of the query, known as 'parse-tree.'
Further, translate it into the form of relational algebra. With this, it evenly replaces all the use of
the views when used in the query.
It is done in the following steps:
Step-1:
Parser: During parse call, the database performs the following checks- Syntax check, Semantic
check and Shared pool check, after converting the query into relational algebra.
Parser performs the following checks as (refer detailed diagram):
1. Syntax check – concludes SQL syntactic validity. Example:
SELECT * FORM employee
Here error of wrong spelling of FROM is given by this check.
2. Semantic check – determines whether the statement is meaningful or not. Example:
query contains a table name which does not exist is checked by this check.
3. Shared Pool check – Every query possess a hash code during its execution. So, this check
determines existence of written hash code in shared pool if code exists in shared pool
then database will not take additional steps for optimization and execution.
Step-2:
Optimizer: During optimization stage, database must perform a hard parse atleast for one
unique DML statement and perform optimization during this parse. This database never
optimizes DDL unless it includes a DML component such as subquery that require optimization.
It is a process in which multiple query execution plan for satisfying a query are examined
and most efficient query plan is satisfied for execution.
Database catalog stores the execution plans and then optimizer passes the lowest cost plan for
execution.
Step-3:
Execution Engine: Finally runs the query and display the required result.
Thus, we can understand the working of a query processing in the below-described
diagram:
Suppose a user executes a query. As we have learned that there are various methods of
extracting the data from the database. In SQL, a user wants to fetch the records of the employees
whose salary is greater than or equal to 10000. For doing this, the following query is undertaken:
SELECT EMP_NAME FROM EMPLOYEE WHERE SALARY>10000;
Thus, to make the system understand the user query, it needs to be translated in the form of
relational algebra. We can bring this query in the relational algebra form as:
o σsalary>10000 (πEmp_Name(Employee))
o πEmp_Name(σsalary>10000 (Employee))
After translating the given query, we can execute each relational algebra operation by using
different algorithms. So, in this way, a query processing begins its working.
Evaluation
For this, with addition to the relational algebra translation, it is required to annotate the
translated relational algebra expression with the instructions used for specifying and evaluating
each operation. Thus, after translating the user query, the system executes a query evaluation
plan.
Query Evaluation Plan
o In order to fully evaluate a query, the system needs to construct a query evaluation plan.
o A query evaluation plan defines a sequence of primitive operations used for evaluating a
query. The query evaluation plan is also referred to as the query execution plan.
o A query execution engine is responsible for generating the output of the given query. It
takes the query execution plan, executes it, and finally makes the output for the user
query.
Optimization
o The cost of the query evaluation can vary for different types of queries. Although the
system is responsible for constructing the evaluation plan, the user does need not to
write their query efficiently.
o Usually, a database system generates an efficient query evaluation plan, which minimizes
its cost. This type of task performed by the database system and is known as Query
Optimization.
o For optimizing a query, the query optimizer should have an estimated cost analysis of
each operation. It is because the overall operation cost depends on the memory
allocations to several operations, execution costs, and so on.
Finally, after selecting an evaluation plan, the system evaluates the query and produces the
output of the query.
Example:
SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > (SELECT MAX (SALARY) FROM
EMPLOYEE WHERE DNO=5);
The inner block
(SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO=5)
Translated in: ∏ MAX SALARY (σDNO=5(EMPLOYEE))
The Outer block
SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > C
Translated in: ∏ LNAZME, FNAME (σSALARY>C (EMPLOYEE))
(C represents the result returned from the inner block.)
The query optimizer would then choose an execution plan for each block.
The inner block needs to be evaluated only once. (Uncorrelated nested query).
It is much harder to optimize the more complex correlated nested queries.
External Sorting
It refers to sorting algorithms that are suitable for large files of records on disk that do not fit
entirely in main memory, such as most database files..
ORDER BY.
Sort-merge algorithms for JOIN and other operations (UNION, INTERSECTION). Duplicate
elimination algorithms for the PROJECT operation (DISTINCT).
Typical external sorting algorithm uses a sort-merge strategy:
Sort phase: Create sort small sub-files (sorted sub-files are called runs).
CS8492-DATABASE MANAGEMENT SYSTEMS
ROHINI COLLEGE OF ENGINEERING & TECHNOLOGY
Merge phase: Then merges the sorted runs. N-way merge uses N memory buffers to
buffer input runs, and 1 block to buffer output. Select the 1st record (in the sort order) among
input buffers, write it to the output buffer and delete it from the input buffer. If output buffer
full, write it to disk. If input buffer empty, read next block from the corresponding run. E.g. 2-way
Sort-Merge
The query parser will typically generate a standard initial query tree to correspond to an
SQL query, without doing any optimization.
For example, for a SELECT-PROJECT-JOIN query, such as Q2, the initial tree is shown in
Figure. The CARTESIAN PRODUCT of the relations specified in the FROM clause is first
applied; then the selection and join conditions of the WHERE clause are applied, followed by the
projection on the SELECT clause attributes.
Such a canonical query tree represents a relational algebra expression that is very
inefficient if executed directly, because of the CARTESIAN PRODUCT (×) operations.
The heuristic query optimizer will transform this initial query tree into an equivalent final
query tree that is efficient to execute.
The optimizer must include rules for equivalence among relational algebra
expressions that can be applied to transform the initial tree into the final, optimized query tree.
First we discuss informally how a query tree is transformed by using heuristics, and then we
discuss general transformation rules and show how they can be used in an algebraic heuristic
optimizer.
6. Identify sub-trees that represent groups of operations that can be executed by a single
algorithm
Query "Find the last names of employees born after 1957 who work on a project named
‗Aquarius‘."
SQL
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME=‗Aquarius‘AND PNUMBER=PNO AND ESSN=SSN AND BDATE.‗1957-12-
31‘;
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
2. Join Order Heuristics: When there are queries involving multiple tables, deciding on
what optimum join order to use becomes necessary in order to decrease request execution
time. Join ordering heuristics can include the usage of greedy algorithms, dynamic
programming approaches, and others. These methods assist the system in exploring
different join orders and selecting the most efficient one. Furthermore, in cardinality
estimation techniques the scale of output temporarily created by intermediate join results is
predicted so that better judgments can be made regarding the order of joins.
3. Index Selection Heuristics: In addition to extraction and load of data, indexes have the
ability to accelerate query processing by making data retrieval fast. Index selection
heuristics encompass the usage of the heuristic optimization which can choose the most
influential indexes for query execution among them. These factors, like query predicates,
selectivity, and the fuel activation cost, are considered in the optimal indexing strategy.
Running queries based on index selection heuristics, DBMS optimizers lead to better
performance and a timely response decreasing overall.
4. Query Rewriting Heuristics: The use of heuristic optimization is focused on the rewriting
and transformation of the statements into semantically corresponding ones that are
suitable for optimal query execution. The rule of adaptation to domain knowledge is used
for the purpose of query structure suggestion and performance improvement. Methods like
operation shift and query splitting allow for compressed data flow and improved query
running strategies in this manner.
Suboptimality: Although the heuristic approaches sometimes can yield plans with
subpar query execution compared to the exhaustive precision optimization, they again
remove the need to follow exact heuristic rules and use only heuristics that are broadly
applicable, so they can be useful for many queries.
Cost Estimation Accuracy: The outcome of heuristic optimization, however, relies on the
accuracy of the cost estimations which in turn may be affected by the scalability of the
data, query complexity or system dynamics, among other factors.
Trade-offs: Heuristic optimization introduces a balance between optimality and
efficiency, ramifying the problem of proportionality between speed and quality of the
planning.
Conclusion
Heuristic optimization is an integral part of DBMS optimization tasks which are used to
efficiently handle the complexities of query optimization and system performance. Through
the application of heuristic methods within the cost estimation, join-order selection, index
usage, or query rewriting components of the DBMS, optimizers eventually come up with an
effective plan, thus improving the execution time for queries. Although heuristic optimization
undertakes the task of speeding up the optimality process, it should be noted that it has its
own limits, which make it vital to ensure that the speed and optimality do not move in an
opposite direction during an attempt to achieve exceptional performance in a database.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Databases SQL MySQL PostgreSQL PL/SQL MongoDB SQL Cheat Sheet SQL Interview Questions MySQL Interview Questions PL/SQL Interview Questions
An execution plan is a roadmap that shows how SQL Server retrieves the data for a query.
It breaks down the exact steps—like which indexes to use, how tables are joined, and in
what order operations are performed. The query optimizer creates this plan, evaluates
multiple options, and chooses the most efficient one. Once generated, plans are stored in
the plan cache for reuse.
In SQL Server, execution plans can be viewed as Graphical, Text, or XML formats.
The actual execution plan is produced after the query has been executed. It reflects the
real operations carried out by SQL Server, along with runtime performance details.
The estimated execution plan is created before the query executes. It represents the query
optimizer’s prediction of how the query will run, based on available statistics.
The actual execution plan can be achieved in the following ways in SQL Server:
1. After completely writing the query, Press Ctrl+M, and the actual execution plan will be
generated.
2. Go to the query window and right-click on it, then click on the context menu and select
‘Display Actual Execution Plan’.
3. Or the ‘Display Actual Execution Plan’ icon can be directly selected from the toolbar.
An estimated execution plan can be achieved using the following ways in SQL Server:
1. After completely writing the query, Press Ctrl+L, and the plan will be generated.
2. Go to the query window and right-click on it, then click on the context menu and select
"Display Estimated Execution Plan".
3. Or the "Display Estimated Execution Plan" icon can be directly selected from the toolbar.
Explore
Basics
Software Engineering Tutorial Software Development Life Cycle Waterfall Model Software Requirements Software Measurement and Metrics Software Design Process System configuration management Soft
Query optimization is the process of choosing the most efficient or the most favorable type
of executing an SQL statement. Query optimization is an art of science for applying rules to
rewrite the tree of operators that is invoked in a query and to produce an optimal plan. A
plan is said to be optimal if it returns the answer in the least time or by using the least
space.
Cost-Based Optimization:
For a given query and environment, the Optimizer allocates a cost in numerical form which
is related to each step of a possible plan and then finds these values together to get a cost
estimate for the plan or for the possible strategy. After calculating the costs of all possible
plans, the Optimizer tries to choose a plan which will have the possible lowest cost
estimate. For that reason, the Optimizer may be sometimes referred to as the Cost-Based
Optimizer. Below are some of the features of the cost-based optimization-
1. The cost-based optimization is based on the cost of the query that to be optimized.
2. The query can use a lot of paths based on the value of indexes, available sorting
methods, constraints, etc.
3. The aim of query optimization is to choose the most efficient path of implementing the
query at the possible lowest minimum cost in the form of an algorithm.
4. The cost of executing the algorithm needs to be provided by the query Optimizer so that
the most suitable query can be selected for an operation.
5. The cost of an algorithm also depends upon the cardinality of the input.
Cost Estimation:
To estimate the cost of different available execution plans or the execution strategies the
query tree is viewed and studied as a data structure that contains a series of basic
operation which are linked in order to perform the query. The cost of the operations that are
present in the query depends on the way in which the operation is selected such that, the
proportion of select operation that forms the output. It is also important to know the
expected cardinality of an operation output. The cardinality of the output is very important
because it forms the input to the next operation.
The cost of optimization of the query depends upon the following-
1. Cardinality-
Cardinality is known to be the number of rows that are returned by performing the
operations specified by the query execution plan. The estimates of the cardinality must
be correct as it highly affects all the possibilities of the execution plan.
2. Selectivity-
Selectivity refers to the number of rows that are selected. The selectivity of any row from
the table or any table from the database almost depends upon the condition. The
satisfaction of the condition takes us to the selectivity of that specific row. The condition
that is to be satisfied can be any, depending upon the situation.
3. Cost-
Cost refers to the amount of money spent on the system to optimize the system. The
measure of cost fully depends upon the work done or the number of resources used.
The first step is to use ANALYZE TABLE COMPUTE STATISTICS SQL command to compute
table statistics. Use DESCRIBE EXTENDED SQL command to inspect the statistics.
Table Statistics:
The table statistics can be computed for tables, partitions, and columns and are as follows-
Depending on the variant, ANALYZE TABLE computes different statistics, i.e. of a table,
partitions, or columns-
ANALYZE TABLE with neither PARTITION specification nor FOR COLUMNS clause.
ANALYZE TABLE with PARTITION specification (but no FOR COLUMNS clause).
ANALYZE TABLE with FOR COLUMNS clause (but no PARTITION specification).
Explore
Practice Questions
Accountancy Business Studies Economics Organisational Behaviour Human Resource Management Entrepreneurship Marketing Income Tax Finance Management
Table of Content
Types of Email Marketing
Steps to do Email Marketing
Benefits of Email Marketing
Drawbacks of Email Marketing
Conclusion
Frequently Asked Questions (FAQs)
Key Takeaways:
Each email campaign involves carefully crafting compelling subject lines and content that
speaks to the unique interests and needs of the recipients. Calls-to-action are integrated at
key points, guiding the next click. The business works to build trust and nurture ongoing
dialogues with its email subscribers over time. The success of email campaigns can be
measured by metrics like open rates, click-through rates on links, and conversion rates on
desired outcomes like purchases. Email marketing analytics provide insight into optimizing
messages and segments for improved results. When used correctly and following best
practices, email allows meaningful customer connections that may ultimately lead to sales.
1. Promotional Emails: These are emails focused on promoting special offers, sales, new
products, or other commercial announcements to drive purchases and transactions. For
example, coupon emails, sale announcement emails, or new product launch emails. They
advertise the business's latest deals.
2. Newsletters: Newsletters are regular, recurring emails that provide new and updated
content like articles, company news, blog summaries, tips, or other useful information to
subscribers. Rather than directly promoting products, they aim to build engagement.
3. Welcome Emails: Welcome emails are some of the most important emails sent. They are
the first email contact when a person signs up and sets the tone of the subscriber
relationship. Well-crafted welcome emails introduce the business, highlight subscription
benefits, and start subscriber engagement.
4. Cart Abandonment Emails: When customers add items to an online shopping cart but
don't complete the purchase, cart abandonment emails remind them to return and check
out. These transactional emails recover lost sales from shoppers needing an extra prompt
to buy.
5. Customer Re-engagement Emails: These emails target subscribers who have been
inactive for some time by re-engaging with them in an attempt to bring them back for
repeat business. Tactics may include sending promo codes, linking to the newest content,
or showcasing recently added inventory.
7. Holiday or Event Emails: These capitalize on major holidays, events, or cultural moments
to send relevant communications. For example, Independence Day sales emails, Mother's
Day gift ideas emails, or event promotion emails around occasions like music festivals or
industry tradeshows. They tie into seasonal moments.
8. Ratings and Reviews Emails: These requests satisfy customer reviews or star ratings
post-purchase. The feedback allows businesses to monitor satisfaction and improve
products. Review emails tend to see high open rates as customers want to share evaluative
input.
2. Establish your Goals: Decide on the purpose and goals of your email campaigns. Are
you aiming to drive traffic, generate leads, increase sales, boost customer engagement, and
promote brand awareness? Set specific KPIs related to your objectives, such as email open
rates, click-through rates, conversion rates, revenue metrics, or subscribers gained.
3. Create your Email List: Build your list through methods like offering opt-in forms on your
website, blog, or social channels, capturing leads at in-person events and promotions, and
through strategic list acquisition and partnerships. Focus on acquiring email contacts within
your target personas. Incentivize subscribers.
4. Pick an Email Campaign Type: Select campaign categories that align with audience
preferences and business goals. Campaign types include promo emails, content
newsletters, win-back offers, post-purchase follow-ups, holiday themes, and more. Map a
campaign calendar to your KPIs with campaigns scheduled.
5. Make a schedule: Build an email cadence and systematic schedule for how often to send
emails to each segment—weekly, monthly, etc. Welcome new subscribers with an
onboarding drip series. Leverage automation tools to schedule recurring campaigns like
win-back offers. Maintain a sense of exclusivity and anticipation without fatigue.
6. Measure your Results: Link the email platform to Google Analytics and add campaign
UTM tracking to monitor performance. See what emails drove the most website traffic,
subscriber growth, and sales to double down on those while reworking laggards.
3. Driving Website Traffic: Calls-to-action within email campaigns can effectively direct
engaged subscribers to targeted pages on your website or online store. Things like
promotional offers, gated content previews, and newsletter highlights convert existing
awareness into tangible website visits.
4. Lead Generation: Email often sits at the top of the purchase funnel, moving subscribers
from awareness into consideration. Asking for a lead-generating action within emails, such
as downloading an educational whitepaper or eBook, subscribing to a service trial,
registering for a demo, etc., can capture key contact information on hot prospects.
5. Enhanced Customer Retention: Ongoing email nurturing beyond the initial sale or sign-
up helps retain customers longer. Transactional and promotional emails focused squarely
on existing purchasers or loyal members build satisfaction and brand affinity, improving
customer lifetime value.
2. Audience Burnout: If you send too many emails or emails that are not relevant or
valuable, subscribers will disengage, open fewer emails, and may even unsubscribe from
your list altogether. Preventing this requires continual optimization.
4. Advanced Analytics requires Work: While email providers offer basic reporting,
integrating deeper web and customer analytics requires manually implementing additional
tracking tools that may be outside of their core capabilities.
Conclusion
Email marketing can be super helpful for connecting with customers and growing a
business when done right. With the perfect foundation built on customer needs, creativity,
and constantly optimizing based on data, an email marketing program can be a game-
changer. By understanding the dynamics and employing best practices, businesses can
leverage the strengths of email marketing while mitigating its drawbacks. Ultimately, a
well-executed email strategy will have the potential to promote meaningful connections,
drive sales, and fortify brand loyalty.
Explore
Accountancy Business Studies Economics Organisational Behaviour Human Resource Management Entrepreneurship Marketing Income Tax Finance Management
Affiliate marketing is a way for brands and people to work together and make money.
Whether you're a company wanting to sell more or someone looking to earn money by
promoting products, knowing about affiliate marketing is important.
This article is here to help you understand affiliate marketing better and use it to your
advantage. You'll also learn about the advantages, such as making money without creating
a product, and the disadvantages, like the potential unpredictability of earnings.
Table of Content
What is Affiliate Marketing?
How does Affiliate Marketing Works?
Types of Affiliate Marketing
Who are Affiliate Marketers?
Affiliate Marketing Examples
Advantages of Affiliate Marketing
Disadvantages of Affiliate Marketing
How to Start Affiliate Marketing
Geeky Takeaways:
The affiliate receives a commission or a portion of the money made when
customers click on these links created by the affiliate and complete the specified
activity, such as making a purchase, signing up, or downloading.
Affiliates are encouraged by this pay plan to sell the advertiser's products
successfully and increase website traffic and sales.
1. Advertiser/Merchant: This is the firm or enterprise that provides a good or service that
has to be advertised. To determine which affiliate is in charge of bringing in the sale, they
provide affiliates personalized tracking links or affiliate codes.
3. Consumers: These are the people who go to the advertiser's website by clicking on the
affiliate's special tracking link after viewing the affiliate's promotional content.
Under this strategy, affiliates produce videos, articles, blogs, and reviews that contain
affiliate connections to goods and services. The affiliate receives a commission if readers or
viewers click on these links and buy anything. This is a widely used technique among
writers and content producers.
Members in this group concentrate on providing their audience with coupons, discounts,
and exclusive offers. The affiliate is paid when customers utilise these offers to make
purchases. Websites with coupons and deals are especially common in the e-commerce
industry.
3. Email Marketing
Some affiliates create email lists and use email marketing campaigns to offer goods and
services to their members. They include affiliate links in their emails, and the affiliate
receives income from subscribers who click and make purchases.
To advertise goods and services, affiliates use social media sites like Facebook, Instagram,
and YouTube. In their descriptions or postings, they contain affiliate links. Influencers
frequently employ this strategy.
Affiliates build online resources that provide in-depth analyses and evaluations of different
goods and services in a particular market. When consumers click on the affiliate links in
their product reviews and make a purchase, the affiliate receives a commission.
In this strategy, affiliates market mobile or software apps. They could include affiliate links
for these items along with reviews, tutorials, and instructions.
Under this approach, affiliates have the ability to suggest other affiliates, and they are paid
for both their own and their recruited affiliates' referrals. A multi-tiered commission system
is produced as a result.
The goal of lead generation affiliates is to obtain leads or prospective consumers for the
advertiser, as opposed to concentrating on sales. They receive payment for bringing in
prospects who, frequently through forms or sign-ups, indicate interest in a good or service.
Affiliates in this category advertise goods and services on search engines and social media
platforms by running sponsored advertisements. When people click on their
advertisements and complete a desired activity on the advertiser's website, they are paid a
commission.
The salary of an affiliate marketer can vary widely based on factors like experience, niche,
platform used (e.g., blog, YouTube, social media), and the amount of time they invest. An
average salary of an Affiliate Marketer based on the level of experience can be given as:
Beginner 8-9LPA
Intermediate 13-16LPA
Advanced 18 LPA +
1. Blog Posts:A beauty blogger reviews a new skincare product and includes an affiliate
link to purchase the product online. When readers buy the product through this link, the
blogger earns a commission.
2. Social Media Influencers: A fitness influencer, for example, might share a post about
their favorite protein powder with a discount code provided by the company. Purchases
made with this code generate earnings for the influencer.
3. Email Newsletters: An email from a travel blogger could include recommendations for
travel gear with links to buy the items on an e-commerce site, earning the blogger a
commission on sales.
4. Product Review Websites: A tech review site might publish detailed reviews on the
latest smartphones with affiliate links to buy the phones from online retailers. The site
earns a commission for each sale made through these links.
5. Coupon and Deal Sites: For example, a coupon site might offer a special discount code
for online electronics stores, and each use of the code generates revenue for the site.
1. Choose a Niche: Select a topic you're passionate about or have knowledge in.
2. Find Affiliate Programs: Look for programs in your niche that offer good commission
rates.
3. Create Content: Start a blog, YouTube channel, or social media account to share content.
4. Promote Products: Use your content to promote products with your affiliate links.
5. Drive Traffic: Use SEO, social media, and email marketing to attract viewers.
6. Track and Optimize: Monitor your performance and optimize your strategies for better
results.
Conclusion
In conclusion, affiliate marketing is a smart way for companies and individuals to work
together, helping each other grow. It's like a partnership where both sides can win:
companies get more customers, and individuals or websites earn money by promoting
products. As the online world changes, affiliate marketing will too, but its essence of
rewarding effort and quality will stay the same, making it a valuable strategy for the digital
age.
Explore
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Distributed Database
Types
Some of the type of distributed database system are:
1. Homogeneous Database:
In a homogeneous database, all different sites store database identically. The operating
system, database management system, and the data structures used all are the same at all
sites. Hence, they're easy to manage.
Features:
Example: A bank with branches in different cities uses Oracle DB at every location. All
databases have the same structure and are synchronized regularly.
2. Heterogeneous Database
Features:
Example: A logistics company uses MySQL for inventory, MongoDB for vehicle tracking,
and PostgreSQL for billing. Integration middleware allows unified querying across these
platforms.
In this model, the server stores and manages the database, while clients send queries over
the network. It offers centralized control with distributed access, making it ideal for
enterprise systems and web applications. Clients can be lightweight while the server
handles heavy processing. Example: Web application interacting with a central PostgreSQL
server.
Features:
Example: An e-commerce website where the frontend (client) is hosted separately and
interacts with a central PostgreSQL server to manage orders, users, and inventory.
Here, all nodes are equal, with no fixed client or server roles. Each node can store data and
also process queries, leading to decentralized control. It supports fault tolerance and high
availability. Example: Blockchain networks like Ethereum, where each node maintains a part
of the distributed ledger.
Features:
These systems are deployed on cloud platforms and span multiple geographic regions for
scalability and reliability. They abstract infrastructure details and are offered as DBaaS,
making them ideal for dynamic workloads. Example: Google Cloud Spanner and Amazon
DynamoDB used for global applications.
Features:
Example:
1. Replication
In replication, copies of the same data are stored at two or more sites. If every site has the
full database, it's called full replication. This improves data availability and allows faster,
parallel query processing. However, updates must be made at all sites, or data may
become inconsistent. It also adds overhead and makes concurrency control more complex.
2. Fragmentation
In this approach, the relations are fragmented (i.e., they're divided into smaller parts) and
each of the fragments is stored in different sites where they're required. It must be made
sure that the fragments are such that they can be used to reconstruct the original relation
(i.e, there isn't any loss of data).
Fragmentation is advantageous as it doesn't create copies of data, consistency is not a
problem.
Fragmentation of relations can be done in two ways:
3. Concurrency Control
Concurrency control ensures data remains accurate when multiple transactions run at the
same time. Without it, issues like lost updates or dirty reads can occur. Its goal is to make
parallel transactions behave as if run one by one. Common methods include locking,
timestamps, and optimistic concurrency.
[Link] Heterogeneity
Semantic heterogeneity happens when different databases use the same data labels but
with different meanings, formats, or units. For example, one system may store salary in
dollars, another in rupees. This can cause confusion during data integration, so resolving it
is important for accurate results.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Centralized database
Architectural Models
• Client - Server Architecture for DDBMS
• Peer - to - Peer Architecture for DDBMS
• Multi - DBMS Architecture
– data management
– Optimization
Client performs
– Application
– User interface
Data processor
• Local query optimizer
• Acts as the access path selector
• Responsible for choosing the best access path
• Local Recovery Manager
– Bottom-up design
– Top-down procedure
Three issues
– A directory may either be global to the entire database or local to each site.
– Directory may be maintained centrally at one site, or in a distributed fashion by
distributing it over a number of sites.
➢ If system is distributed, directory is always distributed
– Replication may be single copy or multiple copies.
➢ Multiple copies would provide more reliability
Bottom-Up Approach
• Suitable for applications where database already exists
• Starting point is individual conceptual schemas
• Exists primarily in the context of heterogeneous database.
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −
Non-replicated and non-fragmented
Fully replicated
Partially replicated Fragmented
Mixed
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so that it
is at a close proximity to the site where it is used most. It is most suitable for database
systems where the percentage of queries needed to join information in tables placed at
different sites is low. If an appropriate distribution strategy is adopted, then this design
alternative helps to reduce the communication cost during data processing.
Fully Replicated
In this design alternative, at each site, one copy of all the database tables is stored. Since,
each site has its own copy of the entire database, queries are very fast requiring negligible
communication cost. On the contrary, the massive redundancy in data requires huge cost
during update operations. Hence, this is suitable for systems where a large number of
queries is required to be handled whereas the number of database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the
tables is done in accordance to the frequency of access. This takes into consideration the
fact that the frequency of accessing the tables vary considerably from site to site. The
number of copies of the tables (or portions) depends on how frequently the access queries
execute and the site which generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at different sites. This considers the fact that it seldom
happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of
each fragment in the system, i.e. no redundant data.
The three fragmentation techniques are −
• Vertical fragmentation
• Horizontal fragmentation
• Hybrid fragmentation
Mixed Distribution: This is a combination of fragmentation and partial replications. Here, the
tables are initially fragmented in any form (horizontal or vertical), and then these fragments
are partially replicated across the different sites according to the frequency of accessing the
fragments.
Design Strategies
In the last chapter, we had introduced different design alternatives. In this chapter, we will
study the strategies that aid in adopting the designs. The strategies can be broadly divided
into replication and fragmentation. However, in most cases, a combination of the two is
used.
Data Replication
Data replication is the process of storing separate copies of the database at two or more
sites. It is a popular fault tolerance technique of distributed databases.
Advantages of Data Replication
• Reliability − In case of failure of any site, the database system continues to work
since a copy is available at another site(s).
• Reduction in Network Load − Since local copies of data are available, query
processing can be done with reduced network usage, particularly during prime hours.
Data updating can be done at non-prime hours.
• Quicker Response − Availability of local copies of data ensures quick query
processing and consequently quick response time.
• Simpler Transactions − Transactions require less number of joins of tables located at
different sites and minimal coordination across the network. Thus, they become
simpler in nature.
Disadvantages
1. Applications whose views are defined on more than one fragment may suffer
performance degradation, if applications have conflicting requirements.
2. Simple tasks like checking for dependencies, would result in chasing after data in a
number of sites
3. When data from different fragments are required, the access speeds may be very
high.
4. In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.
5. Lack of back-up copies of data in different sites may render the database ineffective in
case of failure of a site.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In
order to maintain reconstructiveness, each fragment should contain the primary key field(s)
of the table. Vertical fragmentation can be used to enforce privacy of data.
Grouping
• Starts by assigning each attribute to one fragment
o At each step, joins some of the fragments until some criteria is satisfied.
• Results in overlapping fragments
Splitting
• Starts with a relation and decides on beneficial partitioning based on the access
behavior of applications to the attributes
• Fits more naturally within the top-down design
• Generates non-overlapping fragments
For example, let us consider that a University database keeps records of all registered
students in a Student table having the following schema.
STUDENT
Regd_No Name Course Address Semester Fees Ma
rks
Now, the fees details are maintained in the accounts section. In this case, the designer will
fragment
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table in accordance to values of one or more
fields. Horizontal fragmentation should also confirm to the rule of reconstructiveness. Each
horizontal fragment must have all columns of the original base table.
• Primary horizontal fragmentation is defined by a selection operation on the owner
relation of a database schema.
• Given relation Ri, its horizontal fragments are given by
Ri = σFi(R), 1<= i <= w
Fi selection formula used to obtain fragment Ri
The example mentioned in slide 20, can be represented by using the above formula as
Emp1 = σSal <= 20K (Emp)
Emp2 = σSal > 20K (Emp)
For example, in the student schema, if the details of all students of Computer Science Course
needs to be maintained at the School of Computer Science, then the designer will
horizontally fragment the database as follows −
• Link between the owner and the member relations is defined as equi-join
• Given a link L where owner (L) = S and member (L) = R, the derived horizontal
fragments of R are defined as
Ri = R α Si, 1 <= I <= w
Where,
Si = σ Fi (S)
w is the max number of fragments that will be defined on
Fi is the formula using which the primary horizontal fragment Si is defined
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques
are used. This is the most flexible fragmentation technique since it generates fragments with
minimal extraneous information. However, reconstruction of the original table is often an
expensive task.
Hybrid fragmentation can be done in two alternative ways −
At first, generate a set of horizontal fragments; then generate vertical fragments from one or
more of the horizontal fragments.
At first, generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.
Transparency
Transparency in DBMS stands for the separation of high level semantics of the system from
the low-level implementation issue. High-level semantics stands for the endpoint user, and
low level implementation concerns with complicated hardware implementation of data or
how the data has been stored in the database. Using data independence in various layers of
the database, transparency can be implemented in DBMS.
Distribution transparency is the property of distributed databases by the virtue of which the
internal details of the distribution are hidden from the users. The DDBMS designer may
choose to fragment tables, replicate the fragments and store them at different sites.
However, since users are oblivious of these details, they find the distributed database easy to
use like any centralized database.
Unlike normal DBMS, DDBMS deals with communication network, replicas and fragments
of data. Thus, transparency also involves these three factors.
Following are three types of transparency:
1. Location transparency
2. Fragmentation transparency
3. Replication transparency
Location Transparency
Location transparency ensures that the user can query on any table(s) or fragment(s) of a
table as if they were stored locally in the user’s site. The fact that the table or its fragments
are stored at remote site in the distributed database system, should be completely oblivious to
the end user. The address of the remote site(s) and the access mechanisms are completely
[Link] order to incorporate location transparency, DDBMS should have access to updated
and accurate data dictionary and DDBMS directory which contains the details of locations
of data.
Fragmentation Transparency
Fragmentation transparency enables users to query upon any table as if it were unfragmented.
Thus, it hides the fact that the table the user is querying on is actually a fragment or union of
some fragments. It also conceals the fact that the fragments are located at diverse [Link] is
somewhat similar to users of SQL views, where the user may not know that they are using a
view of a table instead of the table itself.
Replication Transparency
Replication transparency ensures that replication of databases are hidden from the users. It
enables users to query upon a table as if only a single copy of the table [Link]
transparency is associated with concurrency transparency and failure transparency. Whenever
a user updates a data item, the update is reflected in all the copies of the table. However, this
operation should not be known to the user. This is concurrency transparency. Also, in case of
failure of a site, the user can still proceed with his queries using replicated copies without
any knowledge of failure. This is failure transparency.
Combination of Transparencies
In any distributed database system, the designer should ensure that all the stated
transparencies are maintained to a considerable extent. The designer may choose to fragment
tables, replicate them and store them at different sites; all oblivious to the end user.
However, complete distribution transparency is a tough task and requires considerable design
efforts.
Database Control
Database control refers to the task of enforcing regulations so as to provide correct data to
authentic users and applications of a database. In order that correct data is available to users,
all data should conform to the integrity constraints defined in the database. Besides, data
should be screened away from unauthorized users so as to maintain security and privacy of
the database. Database control is one of the primary tasks of the database administrator
(DBA).
The three dimensions of database control are −
• Authentication
• Access Control
• Integrity Constraints
Authentication
In a distributed database system, authentication is the process through which only legitimate
users can gain access to the data resources.
Authentication can be enforced in two levels −
Controlling Access to Client Computer − At this level, user access is restricted while login
to the client computer that provides user-interface to the database server. The most common
method is a username/password combination. However, more sophisticated methods like
biometric authentication may be used for high security data.
Controlling Access to the Database Software − At this level, the database
software/administrator assigns some credentials to the user. The user gains access to the
database using these credentials. One of the methods is to create a login account within the
database server.
Access Rights
A user’s access rights refers to the privileges that the user is given regarding DBMS
operations such as the rights to create a table, drop a table, add/delete/update tuples in a
table or query upon the table.
In distributed environments, since there are large number of tables and yet larger number of
users, it is not feasible to assign individual access rights to users. So, DDBMS defines
certain roles. A role is a construct with certain privileges within a database system. Once the
different roles are defined, the individual users are assigned one of these roles. Often a
hierarchy of roles are defined according to the organization’s hierarchy of authority and
responsibility.
For example, the following SQL statements create a role "Accountant" and then assigns this
role to user "ABC".
COMMIT;
);
CAPACITY INTEGER);
Hence, in a distributed system, the target is often to find a good execution strategy for query
processing rather than the best one. The time to execute a query is the sum of the following
Query Processing
Query processing is a set of all activities starting from query placement to displaying the
results of the query. The steps are as shown in the following diagram −
Figure 2.1 step in query processing
✦ Ship-whole vs ship-as-needed
➡ Join methods
Cost-Based Optimization
• Solution space
• Search algorithm
Search Space
• For N relations, there are O(N!) equivalent join trees that can be obtained by applying
commutativity and associativity rules
SELECT ENAME,RESP
WHERE [Link]=[Link]
AND [Link]=[Link]
Cost Functions
• Given
➡ A set of sites S = {s1, s2, …,sn} with the load of each site
➡ A query Q ={q1, q2, q3, q4} such that each subqueryqiis the maximum
processing unit that accesses one relation and communicates with its
neighboring queries
➡ For each qi in Q, a feasible allocation set of sites Sq={s1, s2, …,sk} where each
site stores a copy of the relation in qi
➡ Select best site b fora (with least load and best benefit)
➡ Remove a from Q and recompute loads if needed
• Let Q = {q1, q2, q3, q4} where q1 is associated with R1, q2 is associated with R2 joined
with the result of q1, etc.
Relational Algebra :
• The Relational Algebra is used to define the ways in which relations (tables) can be
operated to manipulate their data.
• This Algebra is composed of Unary operations (involving a single table) and Binary
operations (involving multiple tables).
• Join, Semi-join these are Binary operations in Relational Algebra.
Join
• Join is a binary operation in Relational Algebra.
• It combines records from two or more tables in a database.
• A join is a means for combining fields from two tables by using values common to
each.
Semi-Join
•A Join where the result only contains the columns from one of the joined tables.
•Useful in distributed databases, so we don't have to send as much data over the network.
•Can dramatically speed up certain classes of queries.
What is “Semi-Join” ?
Semi-join strategies are technique for query processing in distributed database systems. Used
for reducing communication cost.
A semi-join between two tables returns rows from the first table where one or more matches
are found in the second table.
The difference between a semi-join and a conventional join is that rows in the first table will
be returned at most once. Even if the second table contains two matches for a row in the first
table, only one copy of the row will be returned.
Semi-joins are written using EXISTS or IN.
A Simple Semi-Join Example “Give a list of departments with at least one employee.” Query
written with a conventional join:
SELECT [Link], [Link] FROM dept D, emp E WHERE [Link] = [Link]
ORDER BY [Link];
◦ A department with N employees will appear in the list N times.
◦ We could use a DISTINCT keyword to get each department to appear only once.
A Simple Semi-Join Example “Give a list of departments with at least one employee.” Query
written with a semi-join:
SELECT [Link], [Link] FROM dept D WHERE EXISTS (SELECT 1 FROM
emp E WHERE [Link] = [Link]) ORDER BY [Link];
◦ No department appears more than once.
◦ Oracle stops processing each department as soon as the first employee in that
department is found.
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Transaction Operations
The low level operations performed in a transaction are −
• commit − A signal to specify that the transaction has been successfully completed in
its entirety and will not be undone.
• rollback − A signal to specify that the transaction has been unsuccessful and so all
temporary changes in the database are undone. A committed transaction cannot be
rolled back.
• Consistency − A transaction should take the database from one consistent state to
another consistent state. It should not adversely affect any data item in the database.
Goal:
The goal of transaction management in a distributed database is to control the execution of
transactions so that: 1. Transactions have atomicity, durability, serializability and isolation
properties.
• CPU and main memory utilization
• Control messages
• Response time
• Availability
Distributed Transactions
A distributed transaction is a database transaction in which two or more network hosts are
involved. Usually, hosts provide transactional resources, while the transaction manager is
responsible for creating and managing a global transaction that encompasses all operations
against such resources.
• After each slave has locally completed its transaction, it sends a “DONE” message to
the controlling site. When the controlling site has received “DONE” message from
all slaves, it sends a “Prepare” message to the slaves.
• The slaves vote on whether they still want to commit or not. If a slave wants to
commit, it sends a “Ready” message.
• A slave that does not want to commit sends a “Not Ready” message. This may
happen when the slave has conflicting concurrent transactions or there is a timeout.
• After the controlling site has received “Ready” message from all the slaves −
o The slaves apply the transaction and send a “Commit ACK” message to the
controlling site.
o When the controlling site receives “Commit ACK” message from all the
slaves, it considers the transaction as committed.
• After the controlling site has received the first “Not Ready” message from any slave
−
o The slaves abort the transaction and send a “Abort ACK” message to the
controlling site.
o When the controlling site receives “Abort ACK” message from all the slaves,
it considers the transaction as aborted.
Concurrency control for distributed Transactions
Locking-based concurrency control systems can use either one-phase or two-phase locking
protocols.
• Centralized
2 2
3 3
1 1 1
4 4
5 5
Prepare Ready or Abort Commit or Abort ACK
Figure3. 2 Centralized
• Hierarchial 3
3
2
2
4 1
4 1
1
5
5
Prepare Ready or Abort Commit or Abort ACK
3
3
2
2
4 1
4 1
1
5
5
• Linear
(Commit or Abort)
1 2 3 4
(Prepare or Ready)
Ordering is defined
• Distributed
2 2
1
3 3
4 4
5 5
Concurrency Control
Concurrency controlling techniques ensure that multiple transactions are executed
simultaneously while maintaining the ACID properties of the transactions and serializability
in the schedules.
• Serial Schedules − In a serial schedule, at any point of time, only one transaction is
active, i.e. there is no overlapping of transactions. This is depicted in the following
graph −
Conflicts in Schedules
In a schedule comprising of multiple transactions, a conflict occurs when two active
transactions perform non-compatible operations. Two operations are said to be in conflict,
when all of the following three conditions exists simultaneously −
• At least one of the operations is a write_item() operation, i.e. it tries to modify the
data item.
Serializability
A serializable schedule of ‘n’ transactions is a parallel schedule which is equivalent to a
serial schedule comprising of the same ‘n’ transactions. A serializable schedule contains the
correctness of serial schedule while ascertaining better CPU utilization of parallel schedule.
Equivalence of Schedules
Equivalence of two schedules can be of the following types −
• Result equivalence − Two schedules producing identical results are said to be result
equivalent.
• View equivalence − Two schedules that perform similar action in a similar manner
are said to be view equivalent.
Distributed deadlocks
Distributed deadlocks can occur in distributed systems whendistributed transactions or
concurrency control is being [Link] deadlocks can be detected either by
constructing a global wait-for graph from local wait-for graphs at a deadlockdetector or by
a distributed algorithm like edge chasing.
Transaction processing in a distributed database system is also distributed, i.e. the same
transaction may be processing at more than one site. The two main deadlock handling
concerns in a distributed database system that are not present in a centralized system
are transaction location and transaction control. Once these concerns are addressed,
deadlocks are handled through any of deadlock prevention, deadlock avoidance or deadlock
detection and removal.
Transaction Location
Transactions in a distributed database system are processed in multiple sites and use data
items in multiple sites. The amount of data processing is not uniformly distributed among
these sites. The time period of processing also varies. Thus the same transaction may be
active at some sites and inactive at others. When two conflicting transactions are located in a
site, it may happen that one of them is in inactive state. This condition does not arise in a
centralized system. This concern is called transaction location issue.
This concern may be addressed by Daisy Chain model. In this model, a transaction carries
certain details when it moves from one site to another. Some of the details are the list of
tables required, the list of sites required, the list of visited tables and sites, the list of tables
and sites that are yet to be visited and the list of acquired locks with types. After a
transaction terminates by either commit or abort, the information should be sent to all the
concerned sites.
Transaction Control
Transaction control is concerned with designating and controlling the sites required for
processing a transaction in a distributed database system. There are many options regarding
the choice of where to process the transaction and how to designate the center of control,
like −
The site where the transaction enters is designated as the controlling site. The controlling
site sends messages to the sites where the data items are located to lock the items. Then it
waits for confirmation. When all the sites have confirmed that they have locked the data
items, transaction starts. If any site or communication link fails, the transaction has to wait
until they have been repaired.
• In case of site or link failure, a transaction has to wait for a long time so that the sites
recover. Meanwhile, in the running sites, the items are locked. This may prevent
other transactions from executing.
• If the controlling site fails, it cannot communicate with the other sites. These sites
continue to keep the locked data items in their locked state, thus resulting in
blocking.
• Distributed Wound-Die
• Distributed Wait-Wait
Alternatively, deadlock detection algorithms can use timers. Each transaction is associated
with a timer which is set to a time period in which a transaction is expected to finish. If a
transaction does not finish within this time period, the timer goes off, indicating a possible
deadlock.
Another tool used for deadlock handling is a deadlock detector. In a centralized system,
there is one deadlock detector. In a distributed system, there can be more than one deadlock
detectors. A deadlock detector can find deadlocks for the sites under its control. There are
three alternatives for deadlock detection in a distributed system, namely.
• Distributed Deadlock Detector − All the sites participate in detecting deadlocks and
removing them.
Timestamp is a unique identifier created by the DBMS to identify a transaction. They are
usually assigned in the order in which they are submitted to the system. Refer to the
timestamp of a transaction T as TS(T). For basics of Timestamp you may refer here.
Timestamp Ordering Protocol –
The main idea for this protocol is to order the transactions based on their Timestamps. A
schedule in which the transactions participate is then serializable and the only equivalent
serial schedule permitted has the transactions in the order of their Timestamp Values. Stating
simply, the schedule is equivalent to the particular Serial Order corresponding to the order of
the Transaction timestamps. Algorithm must ensure that, for each items accessed
by Conflicting Operations in the schedule, the order in which the item is accessed does not
violate the ordering. To ensure this, use two Timestamp Values relating to each database
item X.
These algorithms ensure that transactions commit in the order dictated by their timestamps.
An older transaction should commit before a younger transaction, since the older transaction
enters the system before the younger one.
• Late Transaction Rule − If a younger transaction has written a data item, then an
older transaction is not allowed to read or write that data item. This rule prevents the
older transaction from committing after the younger transaction has already
committed.
• Younger Transaction Rule − A younger transaction can read or write a data item
that has already been written by an older transaction.
Basic Timestamp Ordering –
Every transaction is issued a timestamp based on when it enters the system. Suppose, if an
old transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj)
such that TS(Ti) < TS(Tj).The protocol manages concurrent execution such that the
timestamps determine the serializability order. The timestamp ordering protocol ensures that
any conflicting read and write operations are executed in timestamp order. Whenever some
Transaction T tries to issue a R_item(X) or a W_item(X), the Basic TO algorithm compares
the timestamp of T with R_TS(X) & W_TS(X) to ensure that the Timestamp order is not
violated. This describe the Basic TO protocol in following two cases.
1. Whenever a Transaction T issues a W_item(X) operation, check the following
conditions:
1.
• If R_TS(X) > TS(T) or if W_TS(X) > TS(T), then abort and rollback T and reject
the operation. else,
• Execute W_item(X) operation of T and set W_TS(X) to TS(T).
2. Whenever a Transaction T issues a R_item(X) operation, check the following
conditions:
• If W_TS(X) > TS(T), then abort and reject T and reject the operation, else
• If W_TS(X) <= TS(T), then execute the R_item(X) operation of T and set
R_TS(X) to the larger of TS(T) and current R_TS(X).
Whenever the Basic TO algorithm detects twp conflicting operation that occur in incorrect
order, it rejects the later of the two operation by aborting the Transaction that issued it.
Schedules produced by Basic TO are guaranteed to be conflict serializable. Already
discussed that using Timestamp, can ensure that our schedule will be deadlock free.
One drawback of Basic TO protocol is that it Cascading Rollbackis still possible. Suppose
we have a Transaction T1 and T2 has used a value written by T1. If T1 is aborted and
resubmitted to the system then, T must also be aborted and rolled back. So the problem of
Cascading aborts still prevails.
Let’s gist the Advantages and Disadvantages of Basic TO protocol:
• Timestamp Ordering protocol ensures serializablity
• Timestamp protocol ensures freedom from deadlock as no transaction ever waits.
• But the schedule may not be cascade free, and may not even be recoverable.
Optimistic Concurrency Control Algorithm
In systems with low conflict rates, the task of validating every transaction for serializability
may lower performance. In these cases, the test for serializability is postponed to just before
commit. Since the conflict rate is low, the probability of aborting transactions which are not
serializable is also low. This approach is called optimistic concurrency control technique.
In this approach, a transaction’s life cycle is divided into the following three phases −
• Commit Phase − A transaction writes back modified data item in memory to the
disk.
Rule 1 − Given two transactions Ti and Tj, if Ti is reading the data item which Tj is writing,
then Ti’s execution phase cannot overlap with Tj’s commit phase. Tj can commit only after
Ti has finished execution.
Rule 2 − Given two transactions Ti and Tj, if Ti is writing the data item that Tj is reading,
then Ti’s commit phase cannot overlap with Tj’s execution phase. Tj can start executing only
after Ti has already committed.
Rule 3 − Given two transactions Ti and Tj, if Ti is writing the data item which Tj is also
writing, then Ti’s commit phase cannot overlap with Tj’s commit phase. Tj can start to
commit only after Ti has already committed.
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
If the site fails between the first and the second event, the state remains X, and the input
message is lost. If the site fails between the second and third event, then the site reaches state
Y, but the output message is not sent.
1. NONBLOCKING COMMITMENT PROTOCLS WITH SITE FAILURE
The termination protocol for the 2-phase-commitment protocol must allow the
transactions to be terminated at all operational sites when a failure of the coordinator site
occurs. This is possible in the following two cases:
1. At least one of the participants has received the command. In this case, the other
participants can be told by this participant of the outcome of the transactions and can
terminate it.
2. None of the participants has received the command, and only the coordinator site has
crashed, so that all participants are operational. In this case, the participants can elect
a new coordinator and resume the protocol.
In above cases, the transactions can be correctly terminated at all operational sites.
Termination is impossible when no operational participants has received the command and at
least one participant failed, because the operational participants cannot know the failed
participant has done and cannot take an independent decision. So, if a coordinator fails
termination is impossible.
This problem can be eliminated by modifying the 2-phase-commitment protocol in the 3-
phase-commitment protocol.
The 3-phase-commitment protocol
In this protocol, the participants do not directly commit the transactions during the second
phase of commitment, instead they reach in this phase a new prepared-to-commit(PC) state.
So an additional third phase is required for actually committing the transactions.
This new protocol requires three phases for committing a transaction and two phases for
aborting it.
Termination protocol for 3-phase-commitment
“If at least one operational participant has not entered the prepared-to-commit state, then
the transactions can be aborted. If at least one operational participant has entered the
prepared-to-commit state, then the transactions can be safely committed.”
Since the above two condition are not mutually exclusive, in several cases the
termination protocol can decide whether to commit or abort the transactions. The protocol
which always commits the transactions when both cases are possible is called
progressive.
The simplest termination protocol is the centralized, nonprogressive protocol. First a
new coordinator is elected by the operational participants. Then the new coordinator
behaves as follows:
1. If the new coordinator is in the prepared-to-commit state, it issues to all
operational participants the command to enter also in this state. When it has
received all the OK messages, it issues the COMMIT command.
2. If the new coordinator is in commit state, i.e. it has committed the transactions, it
issues the COMMIT command to all participants.
3. If the new coordinator is in the abort state, it issues the ABORT command to all
participants.
4. Otherwise, the new coordinator orders all participants to go back to a state
previous to the prepared-to-commit and after it has received all the
acknowledgements, it issues the ABORT command.
If the 2-phase-commitment protocol is used together with a primary site approach, then it is
possible to terminate all the transactions of the group of the primary site(the primary group),if
and only if the coordinators of all pending transactions belong to this group. This can be
achieved by assigning to the primary site the coordinator function for all transactions.
This approach is inefficient in most types of networks and it is very vulnerable to primary site
failure. To avoid this condition we can use 3-phase-commitment protocol can be used in
primary group.
B. Majority Approach and Quorum-Based Protocols
The majority approach avoids the disadvantages of the primary site approach. The basic idea
is that a majority of sites must agree on the abort or commit of a transaction before the
transaction is aborted or committed. A majority approach requires a specialized commitment
protocol. It cannot be applied with the standard 2-phase-commitment.
A straightforward generalization of the basic majority approach consists of assigning
different weights to the sites. The protocol which use a weighted majority are called
quorum-based protocols. The weights which are assigned to the sites are usually called
votes, since they are used when a site “votes” on the commit or abort of a transaction.
The basic rules of a quorum-based protocol are:
1. Each site I has associated with it a number of votes Vi, Vibeing a positive integer.
2. Let V indicate the sum of the votes of all sites of the network.
3. A transaction must collect a commit quorum Vc before committing.
4. A transaction must collect an abortquorum Va before aborting.
5. Va + Vc> V.
Rule 5 ensures that a transaction is either committed or aborted by implementing the basic
majority idea. In practice, the choice Va + Vc = V + 1 is the most convenient one.
A commitment protocol which implements this rule must guarantee that at one time a number
of sites such that the sum of their votes is greater than Vc agree to commit. It means these
sites have entered a prepared-to-commit state. Therefore a quorum based commitment
protocol can be obtained from the 3-phase-commitment protocol implementing the quorum
requirement.
Termination and restart are more complex in this protocol. So once a site has participated in
building a commit (abort) quorum, it cannot participate in an abort (commit) quorum. Since a
site cab fail after participating in building a quorum, its participation must be recorded in
stable storage.
A centralized termination protocol for the quorum-based 3-phase-commitment has the
following structure.
1. A new coordinator is elected.
2. The coordinator collects state information and acts according to the following rules:
a. If at least one site has committed (aborted), send a COMMIT (ABORT)
command to the other sites.
b. If the number of votes of sites that reached the prepare-to-commit state is
greater than or equal to Vc , send a COMMIT command.
c. If the number of votes of sites in prepare to abort state reaches the abort
quorum, send an ABORT command.
d. If the number of votes of sites that reached the prepare-to-commit state plus
the number of votes of uncertain sites is greater than or equal to Vc, send a
PREPARE-TO-COMMIT command to uncertain sites, and wait for condition 2b to
occur.
e. If the number of votes of sites that reached the prepare-to-abort plus the
number of votes of uncertain sites is greater than or equal to Va, send a PREPARE-
TO-ABORT command to uncertain sites, and wait for condition 2c to occur.
f. Otherwise, wait for the repair of some failure.
If the database is nonredundant, then it is very simple to determine which transactions can
be executed. Let us consider 2-phase-locking is used for concurrency control. A transaction
tries to lock all data items of its readand write-sets before commitment. As there is only one
copy of some data item, this copy is either available or not. If the unique copy of some data
item of the read or write-set is not available, the transaction cannot commit and must
therefore be aborted.
If we assume that only site crashes occur but no partitions, then the availability of the
items which belong only one to the write-set is not required, and it is possible to spool the
update messages for these items. All transactions which have their read-set available
executed completely, including commitment; but the updates affecting sites which are down
are stored at spooler sites. When recovery happens, the restart procedures of the failed sites
will receive this list of deferred updates and execute them. We consider a crashed site as
exclusively locked for the [Link] other transaction can read the values of data items
which are stored here. In the case of partitions the differed updated will cause inconsistent
results to be produced- the failure is catastrophic.
In conclusion, if the database is nonredundant, there is not very much to do in order to
increase its availability in the presence of failures. Therefore, most reliability techniques
consider the case of redundant databases.
B. REDUNDANT DATABASE
We deal here essentially with the second aspect; however, in designing reliable
concurrency control methods for replicated data the first goal also should be kept in mind.
There are three main approaches to concurrency control based on 2-phase-locking in a
redundant database: write-locks-all, majority locking, and primary copy locking.
I. WRITE-LOCK-ALL
For transaction with a small write-set and especially for read-only transactions, the
system is much more available than for transaction with a large write-set. For read-only
transactions sometimes can run in more than one group ,because if a data item has two copies
in two copies in two different groups, then no update transaction can write on it and read-only
transaction can use each copy consistently.
If we make the assumption that no partitions occur, but only site crashes, then the
same approach can be used as with a nonredundant database i.e., the updates of unavailable
copies of data items can be spooled. In this case, the availability of the database for update
transactions increases very much. In fact, since only the read-set matters in the case,
transaction 1,4and 7 have the same availability as transaction 10; transactions 2, 5 and 8 as
transaction 11; and transaction 3,6 and 9 as transaction 12. So the example must be carefully
interpreted. The fact that a transaction can run in a given group means now that it can be run
if all other sites are down, instead of building separate groups. The high increase in
availability is obtained at the risk of catastrophic partitions.
Requests to lock or unlock a data item and the messages of the 2-phase-commitment
protocol are required for the control of transactions. Control messages carry information and
are short. Data messages contain database information and can be long. With the write
locks-all approach, we have:
1. Benefit - For each transaction executed at site i having x in its read-set, one lock message
and one data message are saved.
2. Cost - for each transaction which is not executed at site i and has x in its write-set, one
lock message and one data message are required, plus the messages required by the
commitment protocol.
The pure majority locking approach is not very suitable for our example, because two copies
of each data item exist; hence to lock a majority we must lock both. So consider a weighted
majority approach, or quorum approach, which adopts the same rules which have been used
for quorum-based commitment and termination protocols.
These rules, applied to the locking problem, consist of assigning to each data item x a total
number of votes V(x), and assigning votes V(xi) to each copy xi in such a way that V(x) is
the sum of all V(xi). A read quorum Vr(x) and a write quorum Vw(x) are then determined,
such that:
Vr ( x ) + V w ( x ) V ( x )
Vw ( x ) V ( x ) / 2
With this assignment we can now consider the availability of the system in the case of
partitions. We choose the read and write quorums to be 2 for all data items. The availability
for the 12 transaction is shown in the figure. The following can be observed:
1. Transaction 1,2,3,4,7 and 10 have all the same availability. They are characterized by
the fact that they access all three data items either for reading or for writing or for both. Since
the read quorum is equal to the write quorum, it makes no difference whether the data item is
read or written from the viewpoint of availability. For the same reason, transaction 5,6,8and
11, which access only data items x and y, have the same availability. Also, transactions 9 and
12 have the same availability of the latter group, because the copy with highest weight for y.
2. The availability for update transactions is grater with the weighted majority approach
than with write-locks-all, while the availability for read-only transactions is smaller.
3. With this method , read-only transaction increases their availability if they can read
an inconsistent database, i.e., if they do not need to lock items, in fact, columns 10’,11,and
12, are the same for the majority approach as for the write-locks-all approach .
With the majority approach it is not reasonable to consider the assumption that
partitions do not occur. Notice that if we assume the absence of partitions, then the majority
approach is dominated by the write-locks-all approach(an approach is dominated by another
one if it is worse under all circumstances). In fact, we have seen that the majority and quorum
ideas have been developed essentially for dealing with partitions.
Consider now the locality aspect. A transaction reads a data item x at its site of origin, if
a copy is locally available. Hence, also in this case a data message is saved if a local copy is
available .However, read locks must be obtained at a number of copies corresponding to the
read quorum. Therefore, the addition of a copy of x can also force transactions which read x
to request more read locks at remote sites. This additional cost is incurred by transactions
which have x in their write-set, which must obtain write locks at a number of sites
corresponding to the write quorum. Moreover, they have to send a data message to all the
sites where there are copies of x.
It is clear that, considering only data messages, the same advantages and disadvantage
exist for the majority and the write-locks-all method. When control message are also
considered, then the situation is more complex; however, some of the locality motivations for
read-only transaction are lost.
[Link] COPY LOCKING
In the primary copy locking approach, all locks for a data item x are requested at the
site of the primary copy. We will assume first that also all he read and write operations are
performed on this copy; however, write are then propagated to all other copies.
Several enhancements of the primary copy approach exist which it more attractive.
The principal ones are:
1. Allowing consistent reads at different copies than the primary, even if real locks are
requested only at the primary; this enhances the locality of reads.
2. Allowing the migration of the primary copy if a site crash makes it unavailable; this
enhances availability.
3. Allowing the migration of the primary copy depending on its usage pattern. This also
enhances the locality aspect.
The first point deserves a comment. In order to obtain consistent reads at different
copies from the primary one, we should use the primary copy method for synchronization,
but perform the write and read operations according to the “write all/read one” method. In
this approach, the locks are all requested at the primary copy. So, at commitment all copies
are updated before releasing the write lock. A read can be performed in this way at any copy,
obtaining consistent data.
DETERMINING A CONSISTANT VIEW OF THE NETWORK
There are two aspects of this problem: Monitoring the state of the network, so that
state transitions of a site are discovered as soon as possible, and propagating new state
information to all sites consistently. Normally we use timeouts in the algorithms in order to
discover if a site was down. The use of timeouts can lead to an inconsistent view of the
network. Consider the following example in a 3-site network: site 1 sends a message to site2
requesting an answer. If no answer arrives before a given timeout, site 1 sends assumes that
sites 2 is down. If site 2 was just slow, then site 1 has a wrong view of the state of site2,
which is inconsistent with the view of site 2 about itself. Moreover, a third site 3 could try the
same operation at the same time as site 1, obtain an answer within the timeout, and assume
that site 2 is up. So it has different view that site1.
A generalized network wide mechanism is built such that all higher-level programs
are provided with the following facilities:
1. There is at each site a state tablecontaining an entry for eachsite. The entry can be up
or down. A program can send an inquiry to the state table for state information.
2. Any program can set a “watch” on any site, so that it receives an interrupt when the
site changes state.
The meaning of the state table and of a consistent view in the presence of partitions
failures is defined as follow: A site considers up only those sites with which it can
[Link] all crashed sites and all sites which belong to a different group in case of
partitions are considered down.A consistent view can be achieved only between sites of the
same [Link] of a partition there are as many consistent views as there are isolated
groups of sites. The consistency requirement is therefore that a site has the same state table as
all other sites which are up in its state table.
[Link] the state of the network
The basic mechanism for deciding whether a site is up or down is to request a message from
it and to wait for a timeout. The requesting site is called controller and the other site is called
controlled site. In a generalized monitoring algorithm, instead of having the controller
request message from the controlled site, it is more convenient to have the controlled site
send I-AM-UP messages periodical to the controller and the controlled site.
Note that if only site crashes are considered, the monitoring function essentially has to detect
transitions from up to down states, because the opposite transaction is detected by the site
which performs recovery and restart; this site will inform all the others. If, however,
partitions also are considered, then the monitoring function has also to determine transitions
from down to up states. When a partition is repaired, sites of one group must detect that sites
of the other group must detect that sites of the group become available.
Using this mechanism for detecting whether a site is up or down the problem consists of
assigning controllers to each site so that the overall message overhead is minimized and the
algorithm survives correctly the failure of a controller. The latter requirement is of extreme
importance, since in a distributed approach each site is controlled and at the same time
performs the function of controller of some other site.
A possible solution is to assign circular ordering to the sites and to assign to each site the
function of controller of its predecessor. In the absence of failures, each site periodically
sends an I-AM-UP message to its successor and controls that the I-AM-UP message from its
predecessor arrives in time. If the I-AM-UP message from the predecessor does not arrive in
time, then the controller assumes that the controlled site has failed, updates the state table and
broadcasts the updated state table to all other sites.
If the predecessor of a site is down,then the site also has to control its predecessor, and if this
one is also down, the predecessor of the predecessor, and so on backward in the ordering until
an up site is found is isolated or all other sites have crashed; this does not invalidate the
algorithm). In this way, each operational site always has a controller. For example, in site k
controls site k-3; i.e., it responsible for discovering that sites k-1 and k-2 recover from down
to up. Symmetrically, if the successor of a site is down, then this site has as a controller the
first operational site following it in the ordering. For example, site k-3 has site k as controller.
Note that in the FIG sites k-1 and k-2 is not necessarily crashed; they could belong to a
different group after a [Link], the view of the network of sites k and k-3 is not
necessarily the “real” state.
Property 1 is related to the atomicity to the transactions: either all effects of T or none
of them can appear in a consistent state. Property 2 is related to the serializability of
transactions: if a conflicting transaction T’ has preceded T, then the updates performed by T’
have affected the execution of T; Hence, if we keep the effects of T , we must keep also all
the effects of T’ . Note that durability of transaction cannot be ensured if we are forced to a
cold restart; the effect of some transactions is lost.
The simplest way to reconstruct a global consistent state in a distributed database is to
use local dumps, local logs, and global checkpoints. A global checkpoint is a set of local
checkpoints which are performed all sites of the network and are synchronized by the
following condition: if a subtransaction of a transaction T is contained in the local checkpoint
at some site, then all other subtransactions of T must be contained in the corresponding local
checkpoint at other sites.
If global checkpoints are available, the reconstruction problem is relatively easy. First,
at the failed site the latest local checkpoint which can be considered safe is determined; this
determines which earlier global state has to be reconstructed. Then all other sites are required
to reestablish the local states of the corresponding local checkpoints.
The main problem with the above approach consists in recording global checkpoints.
It is not sufficient for one site to broadcast a “write checkpoints” message to all other sites,
because it is possible that the situation of Fig arises; in this situation, T2 and T3 are
subtransactions of the same transaction T, and the local checkpoint C2 does not contain
subtransaction T2, while the local checkpoint C3 contains sub transaction T3, thus violating
the basic requirement for global checkpoints. FIGURE shows also that the fact that T
performs a 2- phase-commitment does not eliminate this problem, because the
synchronization of subtransactions during 2-phase-commitment and of sites during recording
of the global checkpoint is independent.
The simplest way to avoid the above problem is to require that all sites become
inactive before each other records its local checkpoint. Note that all sites must remain
inactive simultaneously,and therefore coordination is required. A protocol which is very
similar to 2-phase-commitment can be used for this purpose; a coordinator broadcasts “
prepare for checkpoint” to all sites, each site terminates the execution of subtransactions and
then answers READY, and then the coordinator broadcasts “ perform checkpoint”. This type
of method is unacceptable in practice because of the inactivity which is required all the sites.
A site has to remain inactive not only for the time required to record its checkpoints, but until
all other sites have finished their active transactions. Three more efficient solutions are
possible:
1. To find less expensive ways to record global checkpoints, so called loosely
synchronized checkpoints. All sites are asked by a coordinator to record a global
checkpoint; however, they are free to perform it within a large time interval. The
responsibility of guaranteeing that all subtransaction of the same transaction are
contained in the local checkpoints corresponding to the same global checkpoint is left
to transaction management. If the root agent of transaction T starts after checkpoint
Ci and before checkpoint Ci+1 , then each other subtransaction at a different site can be
started only after Ci has been recorded at its sites and before Ci+1 has been recorded .
Observing the first condition may force a subtransaction to wait; observing the second
condition can cause transaction aborts and restarts.
2. To avoid building global checkpoints at all, let the recovery procedure take the
responsibility of reconstructing a consistent global state at cold restart. With this
approach, the notion of global checkpoint is abandoned. Each site records its local
checkpoints independently from other sites, and the whole effort of building a
consistent global state is therefore performed by the cold restart procedure.
3. To use the 2-phase-commitment protocol for guaranteeing that the local checkpoints
created by each sites are ordered in a globally uniform way. The basic ideas is to
modify the 2-phase-commitment protocol so that the check points idea is to modify
the 2-phase-commitment protocol so that the checkpoints of all subtransactions which
belong to two distributed transaction T and T1 are recorded in the same order at all
sites where both transaction T and T’ are recorded in the same order at all sites where
both transactions are executed. Let Ti and Tj be subtransactions T’i and Tj’be
subtransactions of T’. If at site i the checkpoint of subtransaction Tiproceeds the
checkpoint of TJ should precede the checkpoint of subtransaction T’j.
Catalogs are usually updated when the users modify the data [Link] happens
when global relations, fragments, or images are created or moved, local access structures are
modified, or authorization rules are changed.
I. CONTENT OF CATALOGS
In the solution given to these problems in R*prototype, two types of names is used:
1. System wide names are unique names given to each object in the system.
They have four components:
a. The identifier of the user who creates the object
b. The site of that user
c. The object name
d. The birth site of the object, i.e., the site at which the object was
created.
The above requirements are met by storing catalog entries of each object as follows:
1. One entry is stored at the birth site of the object, until the object is destroyed.
If the object is still stored at its birth site, the catalog contains all the
information; otherwise, it indicates the sites at which there are copies of the
object.
2. One entry is stored at every site where there is a copy of the object.
The catalog content in R* includes relation names, column names and types,
authorization rules, low-level objects' names, available access paths, and profiles. R*
supports the "caching" of catalogs, using version numbers to verify the validity of cached
information.
AUTHORIZATION AND PROTECTION
I. Site-to-Site Protection
The first security problem which arises in a distributed database is initiating and
protecting intersite communication. When two database sites communicate, it is important to
make sure that:
1. At the other side of the communication line is the intended site (and not an
intruder).
2. No intruder can either read or manipulate the messages which are exchanged
between the sites.
When a user connects to the database system, they must be identified by the
[Link] identification is a crucial aspect of preserving security, because if an intruder
could pretend to be a valid user, then security would be violated.
In a distributed database, users could identify themselves at any site of the distributed
database. However, this feature can be implemented in two ways which both show negative
aspects.
1. Passwords could be replicated at all the sites of the distributed database. This
would allow user identification to be performed locally at each site, but would
also compromise the security of passwords, since it would they easier for an
intruder to access them.
2. Users could each have a "home" site where their identification is performed; in
this scenario, a user connecting to a different site would be identified by
sending a request to the home site and letting this site perform the
identification.
Once users are properly identified, database systems can use authorization rules to
regulate the actions performed upon database objects by them. In a distributed environment,
additional problems include the allocation of these rules, which are part of the catalog, and
the distribution of the mechanisms used for enforcing them. Two alternative, possible
solutions are:
1. Full replication of authorization rules. This solution is consistent with having fully
replicated catalogs, and requires mechanisms for distributing online updates to
them. But, this solution allows authorization to be checked either at the beginning
of compilation or at the beginning of execution.
2. Allocation of authorization rules at the same sites as the objects to which they
refer. This solution is consistent with local catalogs and does not incur the update
overhead as in the first case.
The second solution is consistent with site autonomy, while the first is consistent with
considering a distributed database as a single system.
The authorizations that can be given to users of a centralized database include the
abilities of reading, inserting, creating, and deleting object instances (tuples) and of creating
and deleting objects (relations or fragments).
For simplifying the mechanisms which deal with authorization and the amount of
stored information, individual users are grouped into classes, which are all granted the same
privileges.
In distributed databases, the following considerations apply to classes of users:
1. A "natural" classification of users is the one which is induced by the distribution
of the database to different [Link] is likely that "all users at site x" have some
common properties from the viewpoint of authorization. An explicit naming
mechanism for this class should be provided.
2. Several interesting problems arise when groups of users include users from
multiple sites. Problems are particularly complex when multiple-site user groups
are considered in the context of site autonomy. So, mechanisms involve the
consensus of the majority or of the totality of involved sites, or a decision made by
a higher-level administrator. So, multiple-site user groups contrast with pure site
autonomy.
SCHOOL OF COMPUTING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
• We consider the following three local schema. The first one is based on the relational
model, the second one on the network model (the CODEASYL network) and the third
one on the entity-relationship data model.
• First scheme, the Relational Engineering Database Representation :
SCHEMA TRANSLATION :
• One entity is created for each record. Thus, an EMPLOYEE and one DEPARTMENT
entity is created.
• The attributes of the records are taken directly into the E-R scheme.
• Finally, the links employs becomes a many-to-one relationship from the EMPLOYEE
entity to the DEPARTMENT entity. The final model looks like :
• The example relational model of the engineering database consists of four relations :
An employee could have only one salary, but a salary can belong to many employees.
• Second salary could be an attribute of E, cleaner, but the relationship between the title
and salary is lost.
• See below the result E-R scheme, with SAL as attribute of E.
• All local scheme are now translated to an intermediate scheme based on the target
model. The task of the schema integration is now to generate the global conceptual
schema (CGS), which can be queried by the user of the MDBMS.
• Ozsu defines the schema integration, as the process of identifying the components of
a database which are related to one another, selecting the best representation for the
global conceptual schema and finally integrating the components of each
intermediate schema.
• Integration methodologies are either binary or unary
In general, the one-shot approach is very complex and rarely used, mostly the binary
approach is used (Determine the best ordering!).
• Very good graphical tools exists now which help the identification and integration
approach.
• Preintegration : identify the keys and defines the ordering of the binary processing
approach.
• Comparison : Identification of naming and structural conflicts.
• Conformation : Resolution of the naming and structural conflicts.
• Restructeration and Merging of the different intermediate schema to the global
conceptual scheme (GCS).
• Interaction with an integrator is absolutely necessary.
Preintegration
• Preintegration establishes the rules of the integration process, i.e. the integration
method is selected (e.g. binary iterative n-ary) and then the order of the schema
integration (i.e. which intermediate schema is integrated with which one first).
• Candidate keys are determined. Here for each of the entities in all intermediate
schemes, the keys are determined.
• Potentially equivalent domains of attributes are detected and transformation rules
between the domains should be determined (e.g. one scheme defines the attribute
temperature in Grad Celsius, the other one in Fahrenheit, transformation rules
between the different domains should be prepared for further integration).
Comparison
1) Equivalent
2) One is subset of the other
3) Some components from any may occur in the other
4) Completely no overlap.
Structural conflicts
Conformation
• Conformation is the resolution of the conflicts that are determined at the comparison
phase.
• Naming conflicts are resolved by simply renaming conflicting ones. In the case
of homonyms, the identical entities or attributes are extended with the name of the
entity and the name of the scheme it belongs to.
• Structural conflicts are resolved by transforming entities/attributes or relationships
between them.
• All modified and non-conflicting schemes must be first merged into a single database
schema and secondly restructured to create the 'best' (see later) one.
• The merging follows the integration order ones fixed in the Preintegration. The
merging should be complete, i.e. all components of all the intermediate schema
should be find their place in the merged one.
• Now a Restructuration would take place which searches for the minimal one, thus
the redundant relationships are removed.
• Finally, the scheme could be re-transformed to be more understandable. This
process is in its great parts autonomous and this mechanism ignores all kind of
understandability, it is often necessary by the integrator to rebuild or extend some
relationships (here the minimalist can be lost) in a way that the user can understand
the scheme and thus formulate correct queries.
QUERY PROCESSING :
Query processing is a set of all activities starting from query placement to displaying the
results of the query. The steps are as shown in the following diagram
Figure 5.4 Query Processing
RELATIONAL ALGEBRA
Relational algebra defines the basic set of operations of relational database model. A
sequence of relational algebra operations forms a relational algebra expression. The result of
this expression represents the result of a database query.
• Projection
• Selection
• Union
• Intersection
• Minus
• Join
Projection
Projection operation displays a subset of fields of a table. This gives a vertical partition of
the table.
If we want to display the names and courses of all students, we will use the following
relational algebra expression −
πName,Course(STUDENT)πName,Course(STUDENT)
Selection
Selection operation displays a subset of tuples of a table that satisfies certain conditions.
This gives a horizontal partition of the table.
For example, in the Student table, if we want to display the details of all students who have
opted for MCA course, we will use the following relational algebra expression −
σCourse="BCA"(STUDENT)σCourse="BCA"(STUDENT)
Union
If P is a result of an operation and Q is a result of another operation, the union of P and Q
(p∪Qp∪Q) is the set of all tuples that is either in P or in Q or in both without duplicates.
For example, to display all students who are either in Semester 1 or are in BCA course −
Sem1Student←σSemester=1(STUDENT)Sem1Student←σSemester=1(STUDENT)
BCAStudent←σCourse="BCA"(STUDENT)BCAStudent←σCourse="BCA"(STUDENT)
Result←Sem1Student∪BCAStudentResult←Sem1Student∪BCAStudent
Intersection
If P is a result of an operation and Q is a result of another operation, the intersection of P and
Q ( p∩Qp∩Q ) is the set of all tuples that are in P and Q both.
For example, given the following two schemas −
EMPLOYEE
PROJECT
To display the names of all cities where a project is located and also an employee resides −
CityEmp←πCity(EMPLOYEE)CityEmp←πCity(EMPLOYEE)
CityProject←πCity(PROJECT)CityProject←πCity(PROJECT)
Result←CityEmp∩CityProjectResult←CityEmp∩CityProject
Minus
If P is a result of an operation and Q is a result of another operation, P - Q is the set of all
tuples that are in P and not in Q.
For example, to list all the departments which do not have an ongoing project (projects with
status = ongoing) −
AllDept←πDepartment(EMPLOYEE)AllDept←πDepartment(EMPLOYEE)
ProjectDept←πDepartment(σStatus="ongoing"(PROJECT))ProjectDept←πDepartment(σSta
tus="ongoing"(PROJECT))
Result←AllDept−ProjectDeptResult←AllDept−ProjectDept
Join
Join operation combines related tuples of two different tables (results of queries) into a
single table.
For example, consider two schemas, Customer and Branch in a Bank database as follows −
CUSTOMER
BRANCH
Page Model
Syntax
Atransactiont is a partial order of steps (actions) of the formr(x) or w(x), where x∈D and
reads and writes as well as multiple writes applied to the same object are ordered.
We write t = (op, <),
for transaction t with step set op and partial order <.
Example:r(s) w(s) r(t) w(t)
Semantics
Interpretation of jth step, pj , of t:
If pj =r(x), then interpretation is assignment vj:= x to local variable vj.
If pj=w(x), then interpretation is assignment x := fj(vj1, ..., vjk).
with unknown function fjand j1, ..., jk denoting t‘s prior read steps.
Object Model
A transaction t is a (finite) tree of labeled nodes with
• the transaction identifier as the label of the root node,
• the names and parameters of invoked operations as labels of inner nodes, and
• page-model read/write operations as labels of leaf nodes, along with a partial order <
on the leaf nodes such that for all leaf-node operations p and q with p of the form w(x)
and q of the form r(x) or w(x) or vice versa, we have
p<q ∨ q<p
Special case: layered transactions(all leaves have same distance from root)
Derived inner-node ordering: a < b ifall leaf-node descendants of a precede all leaf-node
descendants of b
Example: DBS Internal Layers
OMA is an architecture developed by the OMG (Object Management Group) that provides an
industry standard for developing object-oriented applications to run on distributed networks.
The goal of the OMG is to provide a common architectural framework for object-oriented
applications based on widely available interface specifications.
The OMA reference model identifies and characterizes components, interfaces, and protocols
that comprise the OMA. It consists of components that are grouped into application-oriented
interfaces, industry-specific vertical applications, object services, and ORBs (object request
brokers). The ORB defined by the OMG is known more commonly as CORBA (Common
Object Request Broker Architecture).
Accountancy Business Studies Economics Organisational Behaviour Human Resource Management Entrepreneurship Marketing Income Tax Finance Management
Geeky Takeaways:
Mobile Marketing is a dynamic strategy that utilizes channels like MMS, smartphones,
tablets, SMS, and apps to promote products or services.
The primary goal is to engage users consistently on their mobile devices, providing
personalized and targeted marketing experiences.
Responsive mobile websites, interactive mobile apps, social commerce, and direct SMS
marketing are key components.
Examples include IKEA's interactive initiatives, Burger King's mobile outreach, and
Swiggy's mobile engagement.
Table of Content
How does Mobile Marketing Work?
Types of Mobile Marketing
Why is Mobile Marketing important?
Advantages of Mobile Marketing
Disadvantages of Mobile Marketing
How to start a Mobile Marketing Business?
Examples of Mobile Marketing
Differentiate Mobile Marketing from Traditional Marketing
Mobile Marketing Strategy
Free & Paid Mobile Marketing Tools
How much does Mobile Marketing Cost?
Mobile Marketing - FAQs
8. Optimized Social Media Marketing: Social media platforms offer potent channels for
businesses to connect, share content, run ads, and build relationships. Mobile social media
marketing specifically tailors content for mobile viewing and engagement, optimizing for
features like live videos or stories to effectively interact with followers.
9. In-Game Advertising: Placing ads within mobile games engages a captive audience of
gamers. These ads, whether banners, videos, or sponsored content, integrate seamlessly
into the gaming experience, providing marketers with unique opportunities to connect with
users.
10. Engaging Mobile Video Ads: Leveraging the high engagement of video content on
mobile devices, mobile video ads are strategically placed on social media platforms,
websites, or apps. This approach visually compellingly promotes products or services.
11. Mobile Wallet Marketing: Mobile wallet marketing harnesses digital wallets like
Google Pay or Apple Pay to deliver loyalty cards, coupons, promotions, or payment options
directly to users' smartphones. This method offers businesses a convenient way to engage
with customers during transactions.
13. Voice Search Optimization: With the surge in voice assistants, optimizing content for
voice search is imperative. Voice search optimization involves tailoring keywords and
content to match natural language queries made through voice commands, enhancing
visibility in voice-enabled searches.
14. In-App Advertising: In-app advertising involves strategically placing ads within mobile
applications, ranging from banners to videos or native ads. This method capitalizes on
users' active engagement with the app, providing marketers with valuable opportunities to
connect with their target audience.
4. Geographically Targeted Outreach: Mobile Marketing brings forth the capability to tailor
campaigns based on users' specific locations, embracing the power of location-based
targeting. Businesses can curate customized strategies that align with users' geographic
locations and behaviors, thereby adding a layer of relevance and context to their marketing
initiatives.
5. Agile Adaptability: Mobile Marketing, driven by real-time data and user feedback, offers
businesses the flexibility to swiftly adapt and optimize their strategies. This responsiveness
ensures that businesses remain agile and able to navigate and respond effectively to the
ever-evolving landscape of the digital sphere.
3. Navigating Creative Constraints: Crafting compelling and visually appealing content for
mobile devices presents a unique set of challenges due to constraints like screen size and
design limitations. Successfully navigating these restrictions requires innovative approaches
to captivate users effectively within the mobile interface.
4. Budgeting for App Development: For businesses, especially smaller ones with limited
budgets, the high cost associated with designing and maintaining mobile applications can
pose financial challenges. Strategic budgeting and consideration of cost-effective
alternatives become essential to overcome this obstacle.
3. Building an Opt-In Database for SMS Campaigns: For effective SMS marketing
campaigns, prioritize the creation of an opt-in database. Encourage users to willingly
subscribe to your text messages, fostering a relationship built on consent. This approach
complies with regulatory requirements and ensures that your messages reach an audience
genuinely interested in your offerings.
5. Exploring Native Ads for Seamless Integration: Consider investing in native ads, a form
of advertising that seamlessly integrates with the platform's form and function. Native ads
provide a less disruptive and more engaging user experience. This approach fosters a
natural flow within the platform, increasing the likelihood of capturing the audience's
attention and interest.
7. Monitoring and Adapting with Analytics: Track the performance of your mobile
marketing endeavors using robust analytics tools. Regularly monitor and analyze the
results to gain insights into campaign effectiveness. Use the acquired data to make
informed adjustments and refinements to your strategies, ensuring continuous improvement
and optimal outcomes.
2. Burger King's Mobile Outreach: Burger King employs mobile marketing strategies to
effectively reach consumers on their mobile devices, driving both awareness and sales for
their diverse range of products. Through innovative approaches, Burger King leverages the
ubiquity of mobile devices to engage with their audience, ensuring their presence in the
dynamic digital landscape.
3. Swiggy's Mobile Engagement: Swiggy, a prominent food delivery platform, has adeptly
executed impactful mobile marketing campaigns aimed at engaging customers and
amplifying awareness of their services. Leveraging mobile channels, Swiggy strategically
connects with users, employing tailored strategies to promote their platform and foster
customer loyalty.
2. Flurry Analytics: Tailored for mobile apps, Flurry Analytics serves as a free analytics
tool providing valuable insights into user engagement, retention, and in-app behavior. Key
metrics such as session lengths, active users, and conversion rates are tracked, enabling
businesses to comprehend user-app interactions. With data-driven decisions, businesses
can improve app performance and foster heightened user engagement.
3. App Annie: A comprehensive platform, App Annie, delves into app market data,
competitor insights, and industry trends. Businesses gain access to critical information,
including app downloads, revenue estimates, user demographics, and market share. Armed
with this knowledge, businesses can strategically position themselves, identify growth
opportunities, and optimize their app marketing strategies.
4. Branch: Branch, a deep linking platform, facilitates seamless user experiences across
diverse devices and platforms. Through deep linking, users navigate directly to specific
content within an app or website, elevating user engagement and retention. Additionally,
Branch offers attribution analytics, allowing businesses to track campaign effectiveness and
refine user acquisition strategies.
5. Usability Hub: Usability Hub, a user testing platform, empowers businesses to gather
feedback on app usability and design. Employing remote usability tests, surveys, and
preference tests, businesses gain insights into user interactions with their mobile apps. This
feedback proves invaluable in identifying usability issues, enhancing the user experience,
and making informed design decisions.
Explore
1. Distributed databases allow data to be stored across multiple networked computers in a unified manner so that transactions can be processed in a distributed way. 2. Key advantages includ…
Full description
Uploaded by Mahboob AI-enhanced title and description
Reference book
Database Systems: A Practical Approach to Design, Implementation and Management,
by Thomas M. Connolly and Carolyn E. Begg. Ch 22
Centralized database
Distributed database
Site 3 Site 2
Homogeneous
DDBMS
Heterogeneous
Homogenous Heterogeneous
• The sites use identical DBMS or DBMS from the same vendor.
•Each site is aware of all other sites and cooperates with other
sites to process user requests.
Example
• homogeneous database system is an
enterprise’s nation-wide ERP system which
comprises of distributed databases, all of
which are Oracle.
Homogeneous Database
Same software
Example
• Example for a autonomous distributed
database system is Oracle based data marts
which manages data pertaining to sales,
distribution and inventory. Example for a non-
autonomous distributed database system is
Oracle based global sales database which is
partitioned across multiple databases.
Easy to use
Easy to mange
Easy to Design
In this type of database , Different data center may run different DBMS products, with
possibly different underlying data models.
Occurs when sites have implemented their own databases and integration is considered
later.
o Different hardware.
o Different DBMS products.
o Different hardware and different DBMS products.
Sql oracle
Heterogeneous DDBMS
Huge data can be stored in one Global center from different data
center
Difficult to mange
Difficult to design.
ANSI/SPARC Architecture
• In late 1972, the Computer and Information
Processing Committee (X3) of the American
National Standards Institute (ANSI) established a
Study Group on Database Management Systems
under the auspices of its Standards Planning and
Requirements Committee (SPARC).
• The mission of the study group was to study the
feasibility of setting up standards in this area, as
well as determining which aspects should be
standardized if it was feasible.
ANSI -SPARK
Architectural Models
Multi - DBMS
• Multi database refers to multiple databases
where each database has full autonomy - can be
seen as a collection of autonomous databases
(similar to federated databases).
• In relational DB context there is a separate
schema for each database.
• We talk about database integration that relates
data from multiple databases.
• For example there could be a manufacturing
database that records products and a separate
sales database that records sales. The two
database can make up multi database system.
Design Alternatives
Fully Replicated
Partially Replicated
Fragmented
34 pages
85 pages
58 pages
8 pages
4 pages
39 pages
14 pages
11 pages
2 pages
27 pages
5 pages
15 pages
27 pages
28 pages
8 pages
35 pages
130 pages
8 pages
2 pages
2 pages
62 pages
25 pages
33 pages
9 pages
18 pages
30 pages
19 pages
67 pages
5 pages
2 pages
30 pages
157 pages
6 pages
2 pages
18 pages
10 pages
10 pages
23 pages
2 pages
53 pages
2 pages
30 pages
10 pages
22 pages
11 pages
32 pages
3 pages
2 pages
18 pages
8 pages
42 pages
29 pages
19 pages
PDF 0% (1)
Dbms Unit II
68 pages
4 pages
50 pages
35 pages
PDF 0% (1)
Distributed Database Systems Guide
54 pages
19 pages
38 pages
1 page
28 pages
55 pages
1 page
65 pages
27 pages
112 pages
38 pages
25 pages
24 pages
23 pages
3 pages
3 pages
6 pages
122 pages
2 pages
14 pages
14 pages
4 pages
7 pages
6 pages
25 pages
4 pages
1 page
5 pages
5 pages
44 pages
148 pages
1,311 pages
3 pages
4 pages
8 pages
38 pages
95 pages
40 pages
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
The inherent trade-offs in networked shared-data system design make it very difficult to
create a dependable and effective system. The CAP theorem, or CAP principle, is a central
foundation for comprehending these trade-offs in distributed systems. The CAP theorem
emphasizes the limitations that system designers have while addressing distributed data
replication. It states that only two of the three properties—consistency, availability, and
partition tolerance—can be concurrently attained by a distributed system.
Developers must carefully balance these attributes according to their particular application
demands because of this underlying restriction. Designers may decide which qualities to
prioritize to obtain the best performance and reliability for their systems by knowing the
CAP theorem. This article will provide a thorough analysis of all the properties given in the
CAP theorem, investigate the associated trade-offs, and talk about how these ideas relate
to distributed systems in the real world.
1. Consistency
Consistency means that all the nodes (databases) inside a network will have the same
copies of a replicated data item visible for various transactions. It guarantees that every
node in a distributed cluster returns the same, most recent, and successful write. It refers to
every client having the same view of the data. There are various types of consistency
models. Consistency in CAP refers to sequential consistency, a very strong form of
consistency.
Note that the concept of Consistency in ACID and CAP are slightly different since in CAP, it
refers to the consistency of the values in different copies of the same data item in a
replicated distributed system. In ACID, it refers to the fact that a transaction will not violate
the integrity constraints specified on the database schema.
For example, a user checks his account balance and knows that he has 500 rupees. He
spends 200 rupees on some products. Hence the amount of 200 must be deducted
changing his account balance to 300 rupees. This change must be committed and
communicated with all other databases that hold this user's details. Otherwise, there will
be inconsistency, and the other database might show his account balance as 500 rupees
which is not true.
Consistency problem
2. Availability
Availability means that each read or write request for a data item will either be processed
successfully or will receive a message that the operation cannot be completed. Every non-
failing node returns a response for all the read and write requests in a reasonable amount
of time. The key word here is "every". In simple terms, every node (on either side of a
network partition) must be able to respond in a reasonable amount of time.
For example, user A is a content creator having 1000 other users subscribed to his channel.
Another user B who is far away from user A tries to subscribe to user A's channel. Since the
distance between both users are huge, they are connected to different database node of
the social media network. If the distributed system follows the principle of availability, user
B must be able to subscribe to user A's channel.
Availability problem
3. Partition Tolerance
Partition tolerance means that the system can continue operating even if the network
connecting the nodes has a fault that results in two or more partitions, where the nodes in
each partition can only communicate among each other. That means, the system continues
to function and upholds its consistency guarantees in spite of network partitions. Network
partitions are a fact of life. Distributed systems guaranteeing partition tolerance can
gracefully recover from partitions once the partition heals.
For example, take the example of the same social media network where two users are
trying to find the subscriber count of a particular channel. Due to some technical fault, there
occurs a network outage, the second database connected by user B losses its connection
with first database. Hence the subscriber count is shown to the user B with the help of
replica of data which was previously stored in database 1 backed up prior to network
outage. Hence the distributed system is partition tolerant.
Partition Tolerance
The CAP theorem states that distributed databases can have at most two of the three
properties: consistency, availability, and partition tolerance. As a result, database
systems prioritize only two properties at a time.
These types of system always accept the request to view or modify the data sent
by the user and they are always responded with data which is consistent among
all the database nodes of a big, distributed network.
However, such type of distributed systems is not realizable in real world because when
network failure occurs, there are two options: Either send old data which was replicated
moments ago before network failure or do not allow user to access the already moments
old data. If we choose first option, our system will become Available and if we choose
second option our system will become Consistent.
The combination of consistency and availability is not possible in distributed systems and
for achieving CA, the system has to be monolithic such that when a user updates the state
of the system, all other users accessing it are also notified about the new changes which
means that the consistency is maintained. And since it follows monolithic architecture, all
users are connected to single system which means it is also available. These types of
systems are generally not preferred due to a requirement of distributed computing which
can be only done when consistency or availability is sacrificed for partition tolerance.
CAP diagram
These types of system are distributed in nature, ensuring that the request sent by
the user to view or modify the data present in the database nodes are not
dropped and are processed in presence of a network partition.
The system prioritizes availability over consistency and can respond with possibly stale
data which was replicated from other nodes before the partition was created due to some
technical failure. Such design choices are generally used while building social media
websites such as Facebook, Instagram, Reddit, etc. and online content websites like
YouTube, blog, news, etc. where consistency is usually not required, and a bigger problem
arises if the service is unavailable causing corporations to lose money since the users may
shift to new platform. The system can be distributed across multiple nodes and is designed
to operate reliably even in the face of network partitions.
These types of system are distributed in nature, ensuring that the request sent by
the user to view or modify the data present in the database nodes are dropped
instead of responding with inconsistent data in presence of a network partition.
The system prioritizes consistency over availability and does not allow users to read crucial
data from the stored replica which was backed up prior to the occurrence of network
partition. Consistency is chosen over availability for critical applications where latest data
plays an important role such as stock market application, ticket booking application,
banking, etc. where problem will arise due to old data present to users of application.
For example, in a train ticket booking application, there is one seat which can be booked. A
replica of the database is created, and it is sent to other nodes of the distributed system. A
network outage occurs which causes the user connected to the partitioned node to fetch
details from this replica. Some user connected to the unpartitioned part of distributed
network and already booked the last remaining seat. However, the user connected to
partitioned node will still one seat which makes the available data inconsistent. It would
have been better if the user was shown error and make the system unavailable for the user
and maintain consistency. Hence consistency is chosen in such scenarios.
Conclusion
The CAP theorem provides a framework for understanding the trade-offs in designing
distributed systems. It highlights that only two out of three properties—Consistency,
Availability, and Partition Tolerance—can be achieved simultaneously. Depending on the
application requirements, developers must prioritize the properties that best meet their
needs. It's also important to note that many modern databases offer configurations to
balance these properties dynamically based on specific use cases.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
• Key-value pair
• Document-oriented
• Column-oriented
• Graph-based
• Time series
These types of databases are optimized specically for applications that need large data volumes,
exible data models, and low latency. To achieve these objectives, NoSQL databases employ
various techniques, and it's important to note that not all database options prioritize the same set
of factors that are mentioned here:
• Consistency
• Query language and data access richness (simplied Create, Read, Update, and Delete (CRUD)-
style operations with known predictable cost)
• Shifting the burden of data and schema validation to the application (removing referential
integrity enforcement by the database and so on)
1. Key-value — A key-value data store is a type of database that stores data as a collection of key-
value pairs. In this type of data store, each data item is identied by a unique key, and the value
associated with that key can be anything, such as a string, number, object, or even another data
structure.
3
Choosing an AWS NoSQL Database AWS Whitepaper
Documents are retrieved by unique keys. It may also be possible to retrieve only parts of a
document--for example, the cost of an item--to run queries such as aggregation, querying using
examples based on a text string, or even full-text search. Most document databases also allow
you to dene secondary indexes.
You can transfer the application code object model directly into a document using several
dierent formats. The most commonly used are JavaScript Object Notation (JSON), Binary
JavaScript Object Notation (BSON), and Extensible Markup Language (XML).
4
Choosing an AWS NoSQL Database AWS Whitepaper
AWS oers a specialized document database service called Amazon DocumentDB (with
MongoDB compatibility).
3. Wide-column — A wide column data store is a type of NoSQL database that stores data in
columns rather than rows, making it highly scalable and exible. In a wide column data store,
data is organized into column families, which are groups of columns that share the same
attributes. Each row in a wide column data store is identied by a unique row key, and the
columns in that row are further divided into column names and values.
Unlike traditional relational databases, which have a xed number of columns and data types,
wide column data stores allow for a variable number of columns and support multiple data
types. The most signicant benet of having column-oriented databases is that you can store
5
Choosing an AWS NoSQL Database AWS Whitepaper
large amounts of data within a single column. This feature allows you to reduce disk resources
and the time it takes to retrieve information from it.
An example of the kind of data you might store in a wide-column data store
AWS oers Amazon Keyspaces (for Apache Cassandra) as a wide-column managed database
service.
4. Graph — Graph databases are used to store and query highly connected data. Data can be
modeled in the form of entities (also referred to as nodes, or vertices) and the relationships
between those entities (also referred to as edges). The strength or nature of the relationships
also carry signicant meaning in graph databases.
Users can then traverse the graph structure by starting at a dened set of nodes or edges and
travel across the graph, along dened relationship types or strengths, until they reach some
dened condition. Results can be returned in the form of literals, lists, maps, or graph traversal
paths. Graph databases provide a set of query languages that contain syntax designed for
traversing a graph structure, or matching a certain structural pattern.
6
Choosing an AWS NoSQL Database AWS Whitepaper
An example of a social network graph. Given the people (nodes) and their relationships (edges), you
can find out who the "friends of friends" of a particular person are—for example, the friends of
Howard's friends.
5. Time series — A time series database is designed to store and retrieve data records that are
sequenced by time, which are sets of data points that are associated with timestamps and stored
in time sequence order. Time series databases make it easy to measure how measurements or
7
Choosing an AWS NoSQL Database AWS Whitepaper
events change over time; for example, temperature readings from weather sensors or intraday
stock prices.
Amazon Keyspaces (for High scalable apps for: • Extreme write speeds • Tables are encr
Apache Cassandra) with relatively less default
• Equipment maintenance
velocity reads • Capability to e
• Data format: JSON • Fleet management
• Being serverless, encryption at r
• NoSQL type: wide • Route optimization allocates storage and transit
column
read/write throughput
• Consistency: one, directly to tables
local_one, local-quo
rum
Amazon Timestream* • Server metrics Analytics over time series Encrypts all data
• Application performance data default
• NoSQL type: TimeSerie
monitoring
s
• Network data
• Consistency: eventual
• IoT apps
• Sensor data
• Events
• Clicks
• Financial forecasting
• Many other types of
analytics data
Amazon ElastiCache • Caching repeat requests • Simple caching model Capability to ena
(Memcached) • Sticky sessions (to store • Multi-threaded encryption at res
session state) performance transit
• Low TB range
• NoSQL type: in-
memory, key-value
• Consistency: strong/ev
entual
* ACID compliant
NoSQL databases
• Amazon DynamoDB
• Amazon Neptune
• Amazon Timestream
• Amazon ElastiCache
• Amazon MemoryDB
Amazon DynamoDB
Amazon DynamoDB is a fully managed NoSQL database service. Some key capabilities of
DynamoDB include:
• High performance — Designed to provide single-digit millisecond latency for read and write
operations at any scale.
Amazon DynamoDB 11
Choosing an AWS NoSQL Database AWS Whitepaper
Amazon Keyspaces (for Apache Cassandra) (Amazon Keyspaces) is a fully managed, Apache
Cassandra-compatible database service. Some key features of Amazon Keyspaces include:
• Apache Cassandra compatibility ‑— Full compatibility with Cassandra, allowing you to use your
existing Cassandra applications and tools with minimal changes.
• Scalability — Designed to handle millions of requests per second and terabytes of data, making
it suitable for high-scale applications.
• Serverless — Instead of deploying, managing, and maintaining storage and compute resources
for your workload through nodes in a cluster, Amazon Keyspaces allocates storage and read/
write throughput resources directly to tables.
• Global distribution — Supports global distribution of data, allowing you to store and access
data from multiple Regions, reducing latency and improving application performance.
• Monitoring and management — Provides an easy-to-use, web-based console for monitoring
and managing your database, as well as integration with Amazon CloudWatch for metrics and
alerts.
• Integration with other AWS services — Integration with other AWS services such as Amazon S3,
Amazon Redshift, and Amazon EMR, making it easy to build data-driven applications.
• Highly available and secure — Data is replicated automatically across multiple AWS Availability
Zones using a replication factor of three. Amazon Keyspaces encrypts all customer data at rest by
default, and is integrated with AWS IAM to help you manage access to your tables and data.
Amazon Neptune
Amazon Neptune is a fully managed graph database service. Neptune makes it easy to build and
run applications that work with highly connected datasets, including for ID, graph/C360, security,
fraud, and knowledge graph applications. Some key features of Amazon Neptune include:
• High performance — Provides low-latency and high-throughput performance for both read and
write operations, making it suitable for real-time applications.
• Scalability — Neptune can handle billions of vertices and edges, and is designed to
automatically scale to meet the demands of your application.
• Compatibility — Supports the popular graph query languages, including Apache TinkerPop
Gremlin and SPARQL, making it easy to use with existing applications and tools.
• Durability — Automatically replicates data across multiple Availability Zones (AZs) for high
availability (HA) and data durability.
• Integration with other AWS services — Integration with other AWS services such as Amazon
S3, Amazon OpenSearch Service, and Amazon SageMaker AI, making it easy to build data-driven
applications.
Amazon Timestream
Amazon Timestream is a managed time series database. It is specically designed to handle time-
stamped data, such as IoT device data and operational logs, and provides a fast, scalable and cost-
eective way to store and analyze large amounts of time series data.
• Scalable storage — Automatically scales storage as your data grows, so you don’t have to worry
about running out of space.
Amazon Neptune 13
Choosing an AWS NoSQL Database AWS Whitepaper
• Fast querying — Provides fast and ecient querying of your time series data, allowing you to
quickly and easily analyze your data.
• Integrations — Integration with other AWS services, such as Amazon Kinesis, Amazon S3, and
QuickSight, making it easy to collect, store and analyze your data.
• Cost-eective — Provides cost-eective storage and analysis of your time series data, with the
ability to choose between standard and memory-optimized performance tiers.
Amazon ElastiCache
Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory
cache in the cloud. It provides a simple way to cache frequently-used data in memory, reducing
the need to repeatedly fetch this data from a slower disk-based data store such as a relational
database.
ElastiCache supports two popular in-memory cache engines: Memcached and Redis. These engines
can be used to signicantly improve the performance of web applications, mobile apps, gaming
platforms
• Easy setup —Providing a simple, one-click setup process, making it easy to get started with in-
memory caching.
• Scalable performance —Automatically scales cache nodes as your application's needs change, so
you can easily adjust cache performance to meet the demands of your application.
• High availability —Provides built-in replication and failover capabilities, ensuring high
availability and durability of your cache data.
• Integrations —integration with other AWS services, such as Amazon Elastic Compute Cloud
(Amazon EC2), Amazon Relational Database Service (Amazon RDS), and Amazon S3, making it
easy to use caching in your overall application architecture.
Amazon ElastiCache 14
Choosing an AWS NoSQL Database AWS Whitepaper
DocumentDB supports a range of use cases, such as content management systems, e-commerce
applications, and mobile backends. It allows you to store and retrieve JSON-like documents, with
support for complex queries, indexing, and aggregation. It also supports transactions, allowing you
to group multiple write operations into a single atomic unit of work.
• Compatibility — DocumentDB is compatible with existing MongoDB drivers and tools, and
applications can be used with DocumentDB with little or no change.
• Scalability — DocumentDB Elastic Clusters scale within minutes to handle millions of reads and
writes with petabytes of storage capacity, helping you cost-eectively meet the needs of your
most demanding document workloads.
• Flexibility — Supports a exible data model that allows you to store and retrieve JSON-like
documents with varying structure and complexity. This makes it well-suited for a wide range of
use cases, such as content management, user proles, product catalogs, and more.
• Durability — Designed to provide high durability and availability for your data. It provides
automatic backup and recovery, point-in-time recovery, and data replication across multiple
Availability Zones for high availability and disaster recovery (DR). DocumentDB automatically
backs up your data and transaction logs to Amazon S3, which is designed for 99.999999999%
durability. This helps ensure that your data is protected against data loss or corruption, even in
the event of a disaster or outage.
• Global clusters — DocumentDB global clusters provides disaster recovery from Region-wide
outages and enables low-latency global reads.
• Integration with other AWS services — You can integrate with AWS Glue to import and export
data from and to DocumentDB to other AWS services such as Amazon S3, Amazon Redshift,
and Amazon OpenSearch Service.
Amazon MemoryDB
Amazon MemoryDB (MemoryDB) is a fully managed, in-memory database service. It is designed to
provide high performance and low latency for applications that require fast and frequent access to
data.
• Compatibility — Compatibility with Redis, an open-source, in-memory data store that is widely
used for caching, near real-time analytics, and other high-performance applications. MemoryDB
Amazon MemoryDB 15
Choosing an AWS NoSQL Database AWS Whitepaper
supports the same set of Redis data types and parameters, and requires no code change to
migrate from Redis.
• Scalability — Designed to be highly available and durable, with automatic failover and data
replication across multiple Availability Zones for high availability and disaster recovery.
• Data durability — Data is stored across multiple Availability Zones, while ensuring single-digit
millisecond response using AWS proprietary architecture design.
• Support for security features such as Amazon Virtual Private Cloud (Amazon VPC), encryption
with AWS Key Management Service (AWS KMS), and authentication and authorization with Redis
ACLs.
• Flexibility — Provides a number of features and capabilities to help you optimize your
application's performance, including read and write replicas, Multi-AZ deployments, automatic
scaling, and exible pricing options based on usage. MemoryDB is well-suited for a wide range of
use cases, including caching, near real-time analytics, and session stores. It is particularly useful
for applications that require fast and frequent access to data, such as gaming, e-commerce, and
advertising.
Amazon NoSQL databases integrate with AWS Identity and Access Management (AWS IAM) for
access control and security. IAM allows you to manage access to your NoSQL databases by creating
policies that dene permissions for specic users, groups, or roles. You can use IAM to control
access to specic tables or resources within your NoSQL databases, as well as to enforce ne-
grained permissions for read and write operations.
Amazon MemoryDB 16
Choosing an AWS NoSQL Database AWS Whitepaper
Decision making
This section outlines a decision framework you can use to help you determine which AWS-managed
NoSQL database service, or combination of database services would t your workload needs best.
While there is no simple formula you can follow that is comprehensive enough to apply generally,
there are a few important questions related to your application that should be answered during the
selection process:
What type of data is your application planning to persist (such as JSON structures, telemetry
data, image les, geospatial data)?
Dierent databases allow you to access stored data dierently. If you plan to store unstructured
data such as images or encoded payloads, you need a data store that can store and retrieve
the entire unstructured binary payload fast, but may need a rich set of data access features to
introspect the unstructured data.
Conversely, a catalog system needs a richer feature set to access data based on patterns, but also
allow for exibility to expand the set of attributes collected for each item in the catalog. These
capabilities may be more important than the absolute fastest way to retrieve data access.
What performance requirements and service-level commitments have you made to your end
users (for example, a service level agreement that guarantees microsecond or millisecond-level
response latency for queries)?
If your workload requires extremely high read performance with a response time measured in
microseconds rather than single-digit milliseconds, then you may consider using in-memory
caching solutions alongside your database, or a database that supports in-memory data access.
Also consider how predictable your performance needs to be. A database such as Amazon
DynamoDB can deliver consistent, predictable response latencies to reads and writes, but it does
so because it supports a small number of query patterns that have a known cost. It’s a great t for
point queries accessing one or a very small number of records at high frequency.
If your data access patterns require accessing a variable or unpredictable number of records or
volume of data, your performance will also have more variability. Consider also that modern
architectures are implemented using decoupled microservices, each with dierent date access
requirements, compounding the end-to-end latency or performance of the end user request.
17
Choosing an AWS NoSQL Database AWS Whitepaper
Workloads with high availability requirements (such as mission-critical applications that can’t
tolerate any downtime) can span multiple Regions to provide further resiliency in case a specic
AWS region becomes unreachable. For example, you can use DynamoDB global tables to deploy
globally across supported Regions and read or write to the local copies of the tables concurrently.
Amazon DocumentDB also supports multiple Regions through Amazon DocumentDB global
clusters, but you can only write in the primary Region cluster.
After you address these questions, you can use the following decision tree for further direction on
how to narrow down your choices. The decision tree covers two scenarios:
1. If you’re already using a NoSQL database on premises and would like to consider migrating to a
fully managed scalable, highly available AWS NoSQL database service, start your review of our
decision tree at Step 1.
2. If you want to modernize your application and are considering a NoSQL database, you can use
the decision tree to choose the most appropriate AWS-managed NoSQL database service for
your use case based on your requirements by starting at Step 2, You can start eith the data
model that is appropriate for your use case.
18
Choosing an AWS NoSQL Database AWS Whitepaper
19
Choosing an AWS NoSQL Database AWS Whitepaper
Considerations
• If your use cases use relatively static schemas, perform complex table lookups, require accessing
data across multiple keys and might experience high service throughputs it might be a better t
for Amazon RDS oerings.
References
• Scale and performance characteristics of Timestream – Deriving near real-time insights over
petabytes of time series data with Amazon Timestream.
• This blog post provides you with a quick summary and set of resources for common topics so you
can quickly ramp up on Amazon DocumentDB.
• This blog post provides improved performance characteristics of Amazon Keyspaces, lightweight
transactions API, advanced design patterns, and operational best practices.
• Getting started with Amazon Neptune by creating a graph of all of your AWS resources.
• How to migrate an application from using GridFS to using Amazon S3 and Amazon DocumentDB.
• Graph data model lets you traverse through relationships without requiring joins and indexes.
For more information, refer to the "How Do I Know I Need an Amazon Neptune Graph
Database?” video.
• Graph data model lets you traverse through relationships without requiring joins and indexes.
For more information, refer to "How Do I Know I Need an Amazon Neptune Graph Database?”.
• Complex data models (such as arrays, nested elds, and deep relationships) let you consider a
wider range of application needs. For more information, refer to the “When to use DocumentDB
vs DynamoDB” video.
• DynamoDB provides extreme scale for certain data access patterns. For more information, refer
to “How to determine if Amazon DynamoDB is appropriate for your needs”.
• Refer to this tech talk to learn about DocumentDB use cases, and how Amazon DocumentDB
cluster architecture provides better performance, scalability, and availability.
• Amazon MemoryDB is a durable, in-memory database for workloads that require an ultra-fast
Redis-compatible primary database. If you require sub-millisecond performance and need to add
persistence and durability, consider using MemoryDB rather than in-memory cache for Redis.
Refer to this tech talk to learn about Amazon MemoryDB.
Developer references
• Why purpose-built database? This hands-on tutorial will help you get an idea of how AWS
NoSQL databases can help run your specic workloads.
References 21
Choosing an AWS NoSQL Database AWS Whitepaper
• To ensure that development teams were comfortable with transitioning to Amazon, it essential
to train the teams on AWS NoSQL databases and cloud-based design patterns (tech talks,
workshops, and Immersion Days.)
Conclusion
NoSQL databases have become increasingly popular over the years due to their scalability,
exibility, and ability to handle large volumes of complex data. This whitepaper has provided
an overview of the dierent types of NoSQL databases, including document-based, key-value,
column-family, and graph databases, as well as their unique strengths and weaknesses.
It has also explored the various NoSQL database services oered by Amazon Web Services,
including Amazon DynamoDB, Amazon Keyspaces, Amazon Neptune, Amazon DocumentDB, and
Amazon MemoryDB. We hope it helps you make an informed decision on which database to choose
based on your specic needs.
23
Choosing an AWS NoSQL Database AWS Whitepaper
Contributors
Contributors to this document include:
24
Choosing an AWS NoSQL Database AWS Whitepaper
Document revisions
To be notied about updates to this whitepaper, subscribe to the RSS feed.
25
Choosing an AWS NoSQL Database AWS Whitepaper
Notices
Customers are responsible for making their own independent assessment of the information in
this document. This document: (a) is for informational purposes only, (b) represents current AWS
product oerings and practices, which are subject to change without notice, and (c) does not create
any commitments or assurances from AWS and its aliates, suppliers or licensors. AWS products or
services are provided “as is” without warranties, representations, or conditions of any kind, whether
express or implied. The responsibilities and liabilities of AWS to its customers are controlled by
AWS agreements, and this document is not part of, nor does it modify, any agreement between
AWS and its customers.
© 2023 Amazon Web Services, Inc. or its aliates. All rights reserved.
26
Search... Sign In
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Neo4j is the most famous database management system and it is also a NoSQL database
system. Neo4j is different from Mysql or MongoDB it has its own features and it is
designed to efficiently store and query highly interconnected data that's makes it special
compared to other Database Management System.
Neo4j is a cutting-edge database designed to handle and analyze connected data more
efficiently than traditional databases. Instead of using tables, it uses graph structures to
store and query data, making it ideal for applications with complex relationships. Neo4j is
known for its high performance, scalability, and flexibility, and is used in various fields like
social network analysis, fraud detection, recommendation systems, and knowledge graphs.
This article will explore Neo4j's key features, benefits, and uses, showing how it transforms
data management and helps uncover deeper insights
Table of Content
What is Neo4j?
Neo4j structure
What is a Graph Database?
Features of Neo4J
Neo4j Usage
What is Neo4j?
Neo4j is a powerful, high-performance, open-source graph database that enables the
efficient management and querying of highly connected data. Unlike traditional relational
databases, Neo4j uses graph structures to represent and store data, making it uniquely
suited for applications involving complex relationships and dynamic, interconnected data.
As the world's leading graph database, Neo4j has become essential for organizations
looking to leverage the power of graph technology for a variety of use cases.
Neo4j structure
Neo4j stores and present the data in the form of graph not in tabular format or not in a Json
format. Here the whole data is represented by nodes and there you can create a
relationship between nodes. That means the whole database collection will look like a
graph, that's why it is making it unique from other database management system.
A graph Database
MS Access, SQL server all the relational database management system use tables to store
or present the data with the help of column and row but Neo4j doesn't use tables, row or
columns like old school style to store or present the data.
This structure allows graph databases to model real-world scenarios more naturally and
intuitively than traditional relational databases.
Features of Neo4J
Neo4j is designed to handle massive amounts of data and complex queries quickly and
efficiently. Its native graph storage and processing engine ensure high performance and
scalability, even with billions of nodes and relationships.
Neo4j uses Cypher, a powerful and expressive query language tailored for graph
databases. Cypher makes it easy to create, read, update, and delete data, allowing users
to perform complex queries with concise and readable syntax.
ACID Compliance
Neo4j ensures data integrity and reliability through ACID (Atomicity, Consistency, Isolation,
Durability) compliance. This guarantees that all database transactions are processed
reliably and ensures the consistency of the database even in the event of failures.
Flexible Schema
Unlike traditional databases, Neo4j offers a flexible schema, allowing users to add or
modify data models without downtime. This adaptability makes it ideal for evolving data
structures and rapidly changing business requirements.
Neo4j Usage
If your Database Management System has so many interconnecting relationships then you
can use Neo4j that will be the best choice. Neo4j is highly preferable to store data that
contains multiple connections between nodes. This is where the Neo4j comes in it's more
comfortable to use with relational data than the relational database. Because Neo4j
doesn't require a predefined schema, you just need to load the data here the data is the
main structure. It is schema optional Database Management System.
There are some unique features that will make you choose Neo4j over any other Database
Management System. Neo4j is surrounded by relationships but there is no need to set up
primary key or foreign key constraints to any data. Here you can add any relation between
any nodes you want. That makes the Neo4j extremely suited for Networking data, below is
the list of data areas where you can use this Database Management System.
Advantages of Neo4j:
Disadvantages of Neo4j:
Conclusion
Neo4j stands out as a leading graph database solution, offering unparalleled capabilities
for managing and querying highly interconnected data. Its native graph architecture
provides flexibility and performance advantages over traditional relational databases,
especially in scenarios involving complex relationships and real-time queries. As
organizations increasingly recognize the value of understanding relationships within their
data, Neo4j continues to play a crucial role in enabling advanced analytics, recommendation
engines, and knowledge graphs.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Figure 26.1 A simplified COMPANY database used for active rule examples
Figure 26.6 Trigger T1 illustrating the syntax for defining triggers in SQL-99
Figure 26.7 Different types of temporal relational databases (a) Valid time database
schema (b) Transaction time database schema (c) Bitemporal database schema
Figure 26.8 Some tuple versions in the valid time relations EMP_VT and DEPT_VT
Figure 26.16
Predicates for
illustrating
relational operations
This document discusses active databases and database triggers. It defines a trigger as a procedure that is automatically invoked by the
database management system in response to specified changes made to the database. An active database is one that has associated …
triggers. Triggers have three parts - an event that activates the trigger, an optional condition, and an action that is executed if the
condition evaluates to true. Triggers allow maintaining database integrity and performing additional actions in response to insert, update,
Science
or delete statements. They can also be used for auditing and logging changes made to the database.
Read more
1 / 12
2 / 12
3 / 12
4 / 12
5 / 12
6 / 12
7 / 12
8 / 12
9 / 12
10 / 12
11 / 12
12 / 12
Recommended
Query optimization
by Pooja Dixit
PPTX
14 slides ◦ 2.9K views
Architecture of-dbms-
and-data-independence
by Anuj Modi
PPS
12 slides ◦ 23.6K views
by arifmollick8578
PPTX
10 slides ◦ 3.1K views
Integrity Constraints
by madhav bansal
PPTX
21 slides ◦ 21.3K views
joins in database
by Sultan Arshad
PPTX
32 slides ◦ 20.9K views
by Chetan Mahawar
PPTX
9 slides ◦ 11.3K views
Trigger
by VForce Infotech
PPTX
29 slides ◦ 12.4K views
Concurrency Control in
Distributed Database.
by Meghaj Mallick
PPTX
20 slides ◦ 4.5K views
SQL Views
by [Link] - No. 1 s…
PPT
28 slides ◦ 16.4K views
by koolkampus
PPT
76 slides ◦ 31.4K views
by Felipe Costa
PDF
29 slides ◦ 6.2K views
Transaction Processing
in [Link]
by Lovely Professional U…
PPTX
33 slides ◦ 2.4K views
by Anusha sivakumar
PPTX
17 slides ◦ 2.3K views
Introduction to method
overloading & …
by Harshal Misalkar
PPT
10 slides ◦ 3.8K views
3 Level Architecture
by Adeel Rasheed
PPTX
8 slides ◦ 6.3K views
by Dhananjaysinh Jhala
PPTX
12 slides ◦ 13.1K views
by NITISH KUMAR
PPTX
57 slides ◦ 15K views
serializability in dbms
by Saranya Natarajan
PPTX
24 slides ◦ 27K views
Applications of
DBMS(Database …
by chhinder kaur
PPTX
9 slides ◦ 2.9K views
by GirdharRatne
PPTX
17 slides ◦ 31K views
Programming in c Arrays
by janani thirupathi
PPTX
31 slides ◦ 29.6K views
Unit I - Evaluation of
expression
by DrkhanchanaR
PPTX
22 slides ◦ 2.8K views
Packages in java
by Elizabeth alexander
PPTX
14 slides ◦ 12K views
Deadlock dbms
by Vardhil Patel
PPTX
9 slides ◦ 12.7K views
Transaction
management DBMS
by Megha Patel
PPTX
20 slides ◦ 20.7K views
Structure of dbms
by Megha yadav
PPTX
10 slides ◦ 6.2K views
Exception Handling in
[Link]
by rishisingh190
PPTX
20 slides ◦ 1.7K views
Concurrent transactions
by Sajan Sahu
PPT
15 slides ◦ 8.8K views
Multimedia Databases
Concepts: Managing …
by COSMOS58
PPTX
75 slides ◦ 18 views
basicofunit-4-
250728105436-…
by meetpathak040
PPTX
17 slides ◦ 10 views
AD
SQL Views 11. Storage and File Nested Queries Lecture Transaction Processing in basic structure of SQL Introduc
Structure in DBMS [Link] [Link] overload
by [Link] - No. 1 supp… by koolkampus by Felipe Costa by Lovely Professional Univ… by Anusha sivakumar by Harsha
28 slides ◦ 16.4K views 76 slides ◦ 31.4K views 29 slides ◦ 6.2K views 33 slides ◦ 2.4K views 17 slides ◦ 2.3K views 10 slides ◦
AD
Multimedia Databases basicofunit-4- basicofunit-4- Basic information of unit-4 Trigger in mysql Triggers
Concepts: Managing … 250728105436-… 250728105436-… form of ppt. yo datab
by COSMOS58 by meetpathak040 by meetpathak040 by meetpathak040 by [Link] Magar by MrSush
75 slides ◦ 18 views 17 slides ◦ 10 views 17 slides ◦ 6 views 17 slides ◦ 9 views 23 slides ◦ 957 views 9 slides ◦ 9
AD
Recently uploaded
Metamorphism - How Rocks Rotkotoe_ A Unified Science 5 Quarter 2 Living 1. GP1- Kinematics and its Metallic Crystals The Sola
Change Due to Heat and … Framework of Reality … Things Discussion on Sele… [Link] presentation (chemistry) collectio
by EricsonBueza by Rotko Rotko Toe by PeteraBotea by rolandrogerdelatorre by jasminecookiejasytt by Shreya
48 slides ◦ 0 views 16 slides ◦ 0 views 71 slides ◦ 0 views 13 slides ◦ 0 views 10 slides ◦ 0 views 12 slides ◦
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Multimedia Database
Last Updated : 02 Jan, 2025
There are still many challenges to multimedia databases, some of which are :
1. Modelling - Working in this area can improve database versus information retrieval
techniques thus, documents constitute a specialized area and deserve special
consideration.
2. Design - The conceptual, logical and physical design of multimedia databases has not
yet been addressed fully as performance and tuning issues at each level are far more
complex as they consist of a variety of formats like JPEG, GIF, PNG, MPEG which is not
easy to convert from one form to another.
3. Storage - Storage of multimedia database on any standard disk presents the problem
of representation, compression, mapping to device hierarchies, archiving and buffering
during input-output operation. In DBMS, a ”BLOB”(Binary Large Object) facility allows
untyped bitmaps to be stored and retrieved.
4. Performance - For an application involving video playback or audio-video
synchronization, physical limitations dominate. The use of parallel processing may
alleviate some problems but such techniques are not yet fully developed. Apart from
this multimedia database consume a lot of processing time as well as bandwidth.
5. Queries and retrieval -For multimedia data like images, video, audio accessing data
through query opens up many issues like efficient query formulation, query execution
and optimization which need to be worked upon.
Documents and record management : Industries and businesses that keep detailed
records and variety of documents. Example: Insurance claim record.
Knowledge dissemination : Multimedia database is a very effective tool for knowledge
dissemination in terms of providing several resources. Example: Electronic books.
Education and training : Computer-aided learning materials can be designed using
multimedia sources which are nowadays very popular sources of learning. Example:
Digital libraries.
Marketing, advertising, retailing, entertainment and travel. Example: a virtual tour of
cities.
Real-time control and monitoring : Coupled with active database technology,
multimedia presentation of information can be very effective means for monitoring and
controlling complex tasks Example: Manufacturing operation control.
The database must support large objects, since multimedia data such as videos can
occupy up to a few gigabytes of storage. Many database systems do not support
objects larger than a few gigabytes. Larger objects could be split into smaller pieces
and stored in the database. Alternatively, the multimedia object may be stored in a file
system, but the database may contain a pointer to the object; the pointer would typically
be a file name. The SQL/MED standard (MED stands for Management of External Data)
allows external data, such as files, to be treated as if they are part of the database.
With SQL/MED, the object would appear to be part of the database, but can be stored
externally.
The retrieval of some types of data, such as audio and video, has the requirement that
data delivery must proceed at a guaranteed steady rate. Such data are sometimes
called isochronous data, or continuous-media data. For example, if audio data are not
supplied in time, there will be gaps in the sound. If the data are supplied too fast,
system buffers may overflow, resulting in loss of data.
Similarity-based retrieval is needed in many multimedia database applications. For
example, in a database that stores fingerprint images, a query fingerprint image is
provided, and fingerprints in the database that are similar to the query fingerprint must
be retrieved. Index structures such as B+- trees and R-trees cannot be used for this
purpose; special index structures need to be created.
Reference -
Comment H Himanshi_Singh 34
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
Pre-requisites: Database
A temporal database is a database that needs some aspect of time for the organization of
information. In the temporal database, each tuple in relation is associated with time. It
stores information about the states of the real world and time. The temporal database
does store information about past states it only stores information about current states.
Whenever the state of the database changes, the information in the database gets
updated. In many fields, it is very necessary to store information about past states. For
example, a stock database must store information about past stock prizes for analysis.
Historical information can be stored manually in the schema.
Valid Time: The valid time is a time in which the facts are true with respect to the real
world.
Transaction Time: The transaction time of the database is the time at which the fact is
currently present in the database.
Decision Time: Decision time in the temporal database is the time at which the decision
is made about the fact.
Temporal databases use a relational database for support. But relational databases have
some problems in temporal database, i.e. it does not provide support for complex
operations. Query operations also provide poor support for performing temporal queries.
1. It can be used in Factory Monitoring System for storing information about current and
past readings of sensors in the factory.
2. Healthcare: The histories of the patient need to be maintained for giving the right
treatment.
3. Banking: For maintaining the credit histories of the user.
1. An EMPLOYEE table consists of a Department table that the employee is assigned to.
If an employee is transferred to another department at some point in time, this can be
tracked if the EMPLOYEE table is an application time-period table that assigns the
appropriate time periods to each department he/she works for.
Temporal Relation
A temporal relation is defined as a relation in which each tuple in a table of the database is
associated with time, the time can be either transaction time or valid time.
1. Uni-Temporal Relation: The relation which is associated with valid or transaction time is
called Uni-Temporal relation. It is related to only one time.
2. Bi-Temporal Relation: The relation which is associated with both valid time and
transaction time is called a Bi-Temporal relation. Valid time has two parts namely start time
and end time, similar in the case of transaction time.
3. Tri-Temporal Relation: The relation which is associated with three aspects of time
namely Valid time, Transaction time, and Decision time called as Tri-Temporal relation.
The temporal database provides built-in support for the time dimension.
Temporal database stores data related to the time aspects.
A temporal database contains Historical data instead of current data.
It provides a uniform way to deal with historical data.
1. Data Storage: In temporal databases, each version of the data needs to be stored
separately. As a result, storing the data in temporal databases requires more storage as
compared to storing data in non-temporal databases.
2. Schema Design: The temporal database schema must accommodate the time
dimension. Creating such a schema is more difficult than creating a schema for non-
temporal databases.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions
11.2.2019.
After this lecture, you’ll...
2
Part one
What is information retrieval?
Text representations and preprocessing
General information retrieval model
Part two
About the course (IE 663 + IE 681)
Topics
Organization
Part one
What is information retrieval?
Text representations and preprocessing
General information retrieval model
Part two
About the course
Topics
Organization
1996 2009
Part one
What is information retrieval?
Text representations and preprocessing
General information retrieval model
Part two
About the course
Topics
Organization
1. Unstructured representation
Text represented as an unordered set of terms (the so-called bag of words)
Considerable oversimplification
We are ignoring the syntax, semantics, and pragmatics of text
Is this problematic?
Q: „Revenue of Apple”
D: „Apple Pencil 2 'to launch in March 2017‘... Microsoft faces drop in revenue in the 3rd quarter...”
2. Weakly-structured representations
Certain groups of terms given more importance – e.g., nouns or named entities
Other terms’ contribution is either downscaled or completely ignored
Some natural language processing (NLP) tools required
Part-of-speech (POS) tagger to identify nouns or named entity recognizer (NER) to
identify named entities
Additional preprocessing can be costly
3. Structured representations
For example, graphs in which nodes represent some terms/concepts and edges
semantic relations between them
Sophisticated information extraction (IE) and NLP tools needed to induce structure
IE models typically not accurate enough and time-costly
Structured representations are virtually not used in IR
Document snippet
„One evening Frodo and Sam were walking together in the cool twilight. Both of
them felt restless again. On Frodo suddenly the shadow of parting had falling: the
time to leave Lothlorien was near. ”
Weakly-structured representations
Bag of nouns
{(evening, 1), (Frodo, 2), (Sam, 1), (twilight, 1), (shadow, 1), (parting, 1), (time, 1),
(Lothlorien, 1)}
„One evening Frodo and Sam were walking together in the cool twilight. Both of them felt restless
again. On Frodo suddenly the shadow of parting had falling: the time to leave Lothlorien was near. ”
Structured representation
For example, event-based structure
walking
Same time
shadow
Frodo felt restless
Before
leave
The preprocessing (i.e., preparing text for the retrieval process) usually involves
the following steps:
1. Extracting pure textual content (e.g., from HTML, PDF, Word)
2. Language detection
Optional – if you’re dealing with multilingual document collections
3. Tokenization (separating text into character sequences)
4. Morphological normalization (lemmatization or stemming)
5. Stopword removal
Morphological normalization
Reducing different forms of the „same” word to a common representative form
Examples of rules
„-ing” -> „”
„ly” -> „”
„sses” -> „ss”
„ational” -> „ate”
„tional” -> „tion”
Rules are sensitive to the measure of „how much of a word” a string is
Rules consider sequences of consonants and vowels, e.g., [C][VC]m[V]
Rules also often take into account the length of the remaining „root”
E.g., „ement” -> „” is valid only if the remaining word has more than one syllable
„replacement” -> „replac” but „cement” -> „cement”
Part one
What is information retrieval?
Text representations and preprocessing
General information retrieval model
Part two
About the course
Topics
Organization
2. fq is a function that maps queries (raw text) to their representation for retrieval,
i.e., fq(q) = sq, where sq is the retrieval representation of the document q;
Depending on the IR model, fd and fq may or may not be the same function
Index terms are all terms in the collection (i.e., the vocabulary)
Except those we ignore in preprocessing (like stopwords)
The set of all index terms: K = {k1, k2, ..., kt}
Each term ki is, for each document dj, assigned a weight wij
The weight of the index terms not appearing in the document is 0
Document dj is represented by term vector [w1j, w2j, ..., wtj] where t is the number
of index terms
Let g be the function that computes the weights, i.e., wij = g(ki, dj)
Different choices for the weight-computation function g and the ranking function
r define different IR models
Part one
What is information retrieval?
Text representations and preprocessing
General information retrieval model
Part two
About the course
Topics
Organization
Tentative schedule
Topics published: approx. March 1
Topics selected and confirmed: approx. March 15
Project coaching:
Two sessions, on April 8 and May 13
We check the progress of your projects
Help you resolve dilemas and problems you might be facing
This course is powered by the Data and Web Science (DWS) group
Your IR & WS teachers
Prof. Dr. Goran Glavaš (lecturer)
Robert Litschko (teaching assistant)
Office hours (Goran)
Fridays at 15:00 (in lecture weeks only)
B6 29, building C, Room C1.02
Visits should be previously announced via email
E-mail communication
Only for really urgent matters, otherwise come in office hours
If you’re wondering whether your matter is urgent or not, it probably isn’t
All relevant information will be posted timely in ILIAS
Data Science Data Science Projects Data Analysis Data Visualization Machine Learning ML Projects Deep Learning NLP Computer Vision Artificial Intelligence
Information Retrieval (IR) helps to find relevant information from large collections of
documents. It can be defined as a software program that deals with the organization,
storage, retrieval and evaluation of information from documents. It is like a smart
librarian who doesn’t give you direct answers but tells you where to find the right book like
this IR system scans them and pulls out the ones that match your query.
When you search for something Information Retrieval (IR) model helps find the most
relevant documents and ranks them based on your query. It works by comparing your query
with documents in the system using a matching function. This function gives each
document a retrieval status value (RSV) which helps rank the most relevant results first.
To do this IR systems represent documents using descriptors i.e most important keywords
from vocabulary (V).
Estimation of the probability of user’s relevance rel for each document d and query q
with respect to a set R q of training documents: Prob ( rel ∣ d, q, R q )
Problem Identification: A student wants to learn about machine learning and types a
query into a search engine.
Representation: The user converts their need into a search query using keywords or
phrases like instead of asking "How do machines learn?" the student types "machine
learning basics" into Google and the problem is converted into a query (keywords or
phrases).
Query: The user submits the search query into IR system.
Feedback: User can refine or modify the search based on the retrieved results.
Acquisition: The system collects and stores a large number of documents or data
sources. It can includes web pages, books, research papers or any text-based
information.
Representation: Each document in the system is analyzed and represented in a
structured way using keywords (terms). Example: If the document talks about "machine
learning" it is tagged with relevant terms like "AI, deep learning, algorithms, models" to
help retrieval.
File Organization: The documents are indexed and stored efficiently so the system can
quickly find relevant ones. Like organizing a library so books can be found easily based
on topics.
Matching: The system compares the user's search query with stored documents to
find the best matches. It uses matching functions that rank documents based on
relevance.
Retrieved Object: The system returns the most relevant documents to the user. These
documents are ranked so the most useful ones appear at the top.
The user reviews the retrieved results and may provide feedback to refine the search.
The system then processes the updated query and retrieves better results.
Acquisition: In this step the selection of documents and other objects from various web
resources that consist of text-based documents takes place. The required data is
collected by web crawlers and stored in the database.
Representation: It consists of indexing that contains free-text terms, controlled
vocabulary, manual and automatic techniques as well. Example: Abstracting contains
summarizing and Bibliographic description that contains author, title, sources, data and
metadata.
File Organization: There are two types of file organization methods. i.e. Sequential that
contains documents by document data and Inverted: that contains list of records under
each term.
Query: An IR process starts when a user enters a query into the system. Queries are
formal statements of information needs. For example, search strings in web search
engines. In IR a query does not uniquely identify a single object in the collection. Instead
several objects may match the query, perhaps with different degrees of relevancy.
The software program that deals with Data retrieval deals with obtaining data from a
the organization, storage, retrieval database management system such as ODBMS. It is
and evaluation of information from A process of identifying and retrieving the data
document repositories particularly from the database based on the query provided by
textual information. user or application.
Small errors are likely to go unnoticed. A single error object means total failure.
Does not provide a solution to the user Provides solutions to the user of the database
of the database system. system.
Efficient Access: Information retrieval techniques make it possible for users to easily
locate and retrieve vast amounts of data or information.
Personalization of Results: User profiling and personalization techniques are used to
tailor search results to individual preferences and behaviors.
Scalability: They are capable of handling increasing data volumes.
Precision: These systems can provide highly accurate and relevant search results and
reducing the likelihood of irrelevant information appearing in search results.
Explore
Introduction to NLP
Software Engineering Tutorial Software Development Life Cycle Waterfall Model Software Requirements Software Measurement and Metrics Software Design Process System configuration management Soft
Search Engine
Last Updated : 23 Jul, 2025
Imagine you are in a library and are looking for a particular book. Now if you have to go
through every book in each category, it will be a tedious and difficult task. Moreover, if the
library has more than a million books then this task seems next to impossible. You are
definitely going to need a librarian who can bring the relevant books for you without any
delay. Well, that’s where a search engine comes in.
Search engine spamming refers to the practice of creating Web pages, or sets of Web
pages, designed to get a high relevance rank for some queries, even though the sites are
not popular sites. Popularity ranking schemes such as PageRank make the job of search
engine spamming more difficult, since just repeating words to get a high TF– IDF score was
no longer sufficient. However, even these techniques can be spammed, by creating a
collection of Web pages that point to each other, increasing their popularity rank.
Techniques such as using sites instead of pages as the unit of ranking (with appropriately
normalized jump probabilities) have been proposed to avoid some spamming techniques,
but are not fully effective against other spamming techniques. The war between search
engine spammers and search engines continues even today.
The hubs and authorities approach of the HITS algorithm is more susceptible to spamming.
A spammer can create a Web page containing links to good authorities on a topic, and
gains a high hub score as a result. In addition, the spammer’s Web page includes links to
pages that they wish to popularize, which may not have any relevance to the topic. Because
these linked pages are pointed to by a page with high hub score, they get a high but
undeserved authority score.
Table of Content
What is a Search Engine?
History of search engines
Working of a search engine
Architecture Of Search Engine
How queries are processed in search engine?
Search Engine Advantages:
Examples Of Popularly Used Search Engines
We all use search engine in our day to day life or should I say daily in our lives! I guess we
all use Google a number of times in a day even to search basic things. Well Google is one
of the most widely used search engine all around the world due to its variety of services
like web search, image and video search, etc.
If we look back to earlier example the search engine acts as a librarian that gathers
relevant books which is required information from the library of data available on the
internet.
To summarize, when user searches for a particular data the web crawlers scan or crawl
through the data available on web and gather all the relevant information (Crawling). After
this, the gathered information is organized in the form of catalog or database so that the
relevant web pages can be selected quickly. The search engine then picks up the most
relevant results according to the ranking and finally displays it in the results page or SERP.
It is quite a technical process, but all this happens so quickly that user gets the results as
soon as they search something on the search engine.
• Web crawlers – As the name suggests these acts as spiders which crawl all over the
web to collect required information. These are special bots that search throughout the
internet and accumulates data using various links.
• Search Interface – It provides a medium or interface for users so that they can access
and search on the database for required information.
Indexing
• The indexing process begins with web crawling where the so-called spiders crawl across
the world wide web and collect data.
• The data collected is stored in the form of a database for the process of indexing. This is
also termed as text acquisition.
• Then the collected data is broken down into tokens or keywords. These tokens are used
by the search engine in creating indexes. Each keyword is associated to a particular
document and through indexing the data becomes organized and it helps the search engine
to quickly retrieve a particular information.
Querying
• When a user searches something on the search engine a query input is generated.
• Then the search engine parses the generated query and searches at the indexes for the
matching documents.
• Using a ranking algorithm, the search engine ranks the documents based on their
relevance. Finally, the generated list is presented to the user with most relevant results on
the top.
• The quick and efficient responses of search engine make it easier for a user to
immediately get result for their search.
• A search engine not only supports text results but also results like images, videos, maps,
documents and various other formats, hence offering users a one stop solution.
• In today’s time people are using search engines not only for technical purposes but also
for researches, educational purposes and even in day-to-day life because of its diversified
result generation.
• Google - Founded in 1998 by Larry Page and Sergey Brin, Google is the most popularly
and widely used search engine. It has an attractive an user-friendly interface with a
versatile library of features which makes it first choice for maximum users when it comes to
search engines.
• Bing - Founded in 2009 by Microsoft, Bing is quite similar to other search engines. It also
allows users to search through images.
• Yahoo - Founded in 1994 by Jerry Yang and David Filo, Yahoo was among the earliest
used search engines. However, its popularity has declined over time. Earlier Yahoo offered
a platform called "Yahoo answers" where users could ask or answer various questions.
• Baidu, DuckDuckGo and Yandex are some other popular emerging browsers in today's
time.
Explore
Practice Questions
Aptitude Engineering Mathematics Discrete Mathematics Operating System DBMS Computer Networks Digital Logic and Design C Programming Data Structures Algorithms
We classify the relation in Datalog Program or deductive database as either output relation
or input relation. output relations are defined by rules and input relations have a set of
tuples explicitly listed (e.g. assembly) given the instance of the input relation we must
compute instances for the output relations.
The major advantage of a deductive database is the ability to write queries. we can
understand deductive databases more easily using the following diagram.
deductive database
The meaning Datalog programming usually defines deductive database in two different
ways both of which essentially describe the relation instance for output relation. technically
a query is a section over one of the output relations. however, the meaning of the query is
clear once we understand how relation instances are associated with output relation in
Datalog Program
This model gives users a way to understand the Program without thinking about how
that Program will be executed.
This semantics is declarative works like relational calculus and not practical like
relational algebra semantics.
It is comparatively simpler due to recursive rules making it difficult to understand the
Program in terms of evaluation strategy.
least fix point semantics will give a conceptual evaluation strategy to compute the
relation.
It works as the basis for recursive query evaluation.
The efficient query evaluation strategy is used in actually for better implementation.
The correctness of the model is demonstrated by equivalence to the least fixed point
approach.
The deductive database stores rules and facts on datalog formulas in clausal form
It contains quantifiers like existential and universals
Clausal forms of the formula are made up of a number of clauses each clause is
composed of a number of literals connected by OR logical connection or AND logical
connection
The deductive database then interprets all rules using various methods.
Interpretation of rules the fact is considered as axioms. Rules are also called deductive
axioms and are used to construct a proof that derives new facts from existing facts.
Another method of interpretation we have given is an infinite domain of constant values
with an assigned predicate for each combination of values for an argument.
There are many deductive prototypes are available many such systems are memory based.
it assumes all required permanent relations are stored in the main memory and during the
computation process, temporary relations generated can be stored in memory.
Explore
Basics of DBMS
Relational Algebra
Advanced DBMS
Practice Questions