DBMS Unit 2: Relational Algebra & SQL
DBMS Unit 2: Relational Algebra & SQL
For better understanding watch this youtube playlist along with the topics covered
in this written material--
[Link]
Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to obtain the
result of the query. It uses operators to perform queries.
1. Select Operation:
● The select operation selects tuples that satisfy a given predicate.
Notation: σ p(r)
Where:
Output:
2. Project Operation:
● This operation shows the list of those attributes that we wish to appear in the result.
● It is denoted by ∏.
Where
Input:
NAME CITY
Jones Harrison
Smith Rye
Hays Harrison
Curry Rye
Johnson Brooklyn
Brooks Brooklyn
3. Union Operation:
● Suppose there are two tuples R and S. The union operation contains all the tuples that
Notation: R ∪ S
Example:
DEPOSITOR RELATION
CUSTOMER_NAME ACCOUNT_NO
Johnson A-101
Smith A-121
Mayes A-321
Turner A-176
Johnson A-273
Jones A-472
Lindsay A-284
BORROW RELATION
CUSTOMER_NAME LOAN_NO
Jones L-17
Smith L-23
Hayes L-15
Jackson L-14
Curry L-93
Smith L-11
Williams L-17
Input:
Output:
CUSTOMER_NAME
Johnson
Smith
Hayes
Turner
Jones
Lindsay
Jackson
Curry
Williams
Mayes
4. Set Intersection:
● Suppose there are two tuples R and S. The set intersection operation contains all
Notation: R ∩ S
Input:
Output:
CUSTOMER_NAME
Smith
Jones
5. Set Difference:
● Suppose there are two tuples R and S. The set intersection operation contains all
Notation: R - S
Input:
Output:
CUSTOMER_NAME
Jackson
Hayes
Willians
Curry
6. Cartesian product
● The Cartesian product is used to combine each row in one table with each row in the
● It is denoted by X.
Notation: E X D
Example:
EMPLOYEE
1 Smith A
2 Harry C
3 John B
DEPARTMENT
DEPT_NO DEPT_NAME
A Marketing
B Sales
C Legal
Input:
1. EMPLOYEE X DEPARTMENT
Output:
1 Smith A A Marketing
1 Smith A B Sales
1 Smith A C Legal
2 Harry C A Marketing
2 Harry C B Sales
2 Harry C C Legal
3 John B A Marketing
3 John B B Sales
3 John B C Legal
7. Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
Example: We can use the rename operator to rename STUDENT relation to STUDENT1.
ρ(STUDENT1, STUDENT)
Note: Apart from these common operations Relational algebra can be used in Join
operations
Relational Calculus
● Relational calculus is a non-procedural query language. In the non-procedural query
language, the user is concerned with the details of how to obtain the end results.
● The relational calculus tells what to do but never explains how to do.
Notation:
Where
OUTPUT: This query selects the tuples from the AUTHOR relation. It returns a tuple with
'name' from Author who has written an article on 'database'.
TRC (tuple relational calculus) can be quantified. In TRC, we can use Existential (∃) and
Universal Quantifiers (∀).
For example:
Output: This query will yield the same result as the previous one.
● Domain relational calculus uses the same operators as tuple calculus. It uses logical
● It uses Existential (∃) and Universal Quantifiers (∀) to bind the variable.
Notation:
Where
For example:
SQL
SQL (Structured Query Language) is used to perform operations on the records stored in the
database such as updating records, deleting records, creating and modifying tables, views,
etc.
SQL is just a query language; it is not a database. To perform SQL queries, you need to
install any database, for example, Oracle, MySQL, MongoDB, PostGreSQL, SQL Server,
DB2, etc.
What is SQL
● SQL stands for Structured Query Language.
(RDBMS).
● SQL is a database language, it is used for database creation, deletion, fetching rows,
All DBMS like MySQL, Oracle, MS Access, Sybase, Informix, PostgreSQL, and SQL
● With SQL, a user can access data from a relational database management system.
● It allows the user to define the data in the database and manipulate it when needed.
SQL Commands
● SQL commands are instructions. It is used to communicate with the database. It is also
used to perform specific tasks, functions, and queries of data.
● SQL can perform various tasks like create a table, add data to tables, drop the table,
table, etc.
● All the command of DDL are auto-committed that means it permanently save all the
● CREATE
● ALTER
● DROP
● TRUNCATE
Syntax:
CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);
Example:
b. DROP: It is used to delete both the structure and record stored in the table.
Syntax
DROP TABLE ;
Example
c. ALTER: It is used to alter the structure of the database. This change could be either to
modify the characteristics of an existing attribute or probably to add a new attribute.
Syntax:
EXAMPLE
d. TRUNCATE: It is used to delete all the rows from the table and free the space containing
the table.
Syntax:
Example:
TRUNCATE TABLE EMPLOYEE;
● DML commands are used to modify the database. It is responsible for all form of
● The command of DML is not auto-committed that means it can't permanently save all
● INSERT
● UPDATE
● DELETE
a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the
row of a table.
Syntax:
INSERT INTO TABLE_NAME (col1, col2, col3,.... col N) VALUES (value1, value2,
value3, .... valueN);
Or
For example:
b. UPDATE: This command is used to update or modify the value of a column in the table.
Syntax:
[WHERE CONDITION]
For example:
UPDATE students SET User_Name = 'Sonoo' WHERE Student_Id = '3'
Syntax:
For example:
● Grant
● Revoke
Example
Example
These operations are automatically committed in the database that's why they cannot be used
while creating tables or dropping them.
● ROLLBACK
● SAVEPOINT
a. Commit: Commit command is used to save all the transactions to the database.
Syntax: COMMIT;
Example:
COMMIT;
b. Rollback: Rollback command is used to undo transactions that have not already
Syntax: ROLLBACK;
Example:
ROLLBACK;
c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling
back the entire transaction.
● SELECT
a. SELECT: This is the same as the projection operation of relational algebra. It is used to
select the attribute based on the condition described by WHERE clause.
Syntax:
SELECT expressions FROM TABLES WHERE conditions;
For example:
[Link]
[Link]
[Link]
● Define relations/attributes
● Define primary keys
● Define relationships
● Normalization
Domain
A domain is the original sets of atomic values used to model data. By atomic value, we mean
that each value in the domain is indivisible as far as the relational model is concerned. For
example:
● The domain of Marital Status has a set of possibilities: Married, Single, Divorced.
● The domain of Shift has the set of all possible days: {Mon, Tue, Wed…}.
● The domain of Salary is the set of all floating-point numbers greater than 0 and less
than 200,000.
● The domain of First Name is the set of character strings that represents names of
people.
In summary, a domain is a set of acceptable values that a column is allowed to contain. This is
based on various properties and the data type for the column. We will discuss data types in
another chapter.
Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically
exists between the primary key and non-key attribute within a table.
X → Y
The left side of FD is known as a determinant, the right side of the production is known as a
dependent.
For example:
Here Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table
because if we know the Emp_Id, we can tell that employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
Example:
Example:
1. ID → Name,
2. Name → DOB
database.
● Using the inference rule, we can derive additional functional dependency from
If X ⊇ Y then X → Y
Example:
1. X = {a, b, c, d, e}
2. Y = {a, b, c}
2. Augmentation Rule (IR2 )
The augmentation is also called a partial dependency. In augmentation, if X determines Y,
then XZ determines YZ for any Z.
If X → Y then XZ → YZ
Example:
If X → Y and Y → Z then X → Z
If X → Y and X → Z then X → YZ
Proof:
X → Y (given)
X → Z (given)
X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
This Rule says, if X determines Y and Z, then X determines Y and X determines Z separately.
If X → YZ then X → Y and X → Z
Proof:
X → YZ (given)
YZ → Y (using IR1 Rule)
If X → Y and YZ → W then XZ → W
Proof:
X → Y (given)
WY → Z (given)
WX → WY (using IR2 on 1 by augmenting with W)
Integrity Constraints
● Integrity constraints are a set of rules. It is used to maintain the quality of information.
● Integrity constraints ensure that the data insertion, updating, and other processes have to be
● Thus, integrity constraint is used to guard against accidental damage to the database.
● The data type of domain includes string, character, integer, time, date, currency, etc. The value of
Example:
● This is because the primary key value is used to identify individual rows in relation and if the
primary key has a null value, then we can't identify those rows.
● A table can contain a null value other than the primary key field.
Example:
● In the Referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of
Table 2, then every value of the Foreign Key in Table 1 must be null or be available in Table 2.
Example:
4. Key constraints
● Keys are the entity set that is used to identify an entity within its entity set uniquely.
● An entity set can have multiple keys, but out of which one key will be the primary key. A primary
key can contain a unique and null value in the relational table.
Example:
Normalization
● Normalization is the process of organizing the data in the database.
● Normalization is used to minimize the redundancy from a relation or set of relations. It is also used
to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.
● Normalization divides the larger table into the smaller table and links them using relationship.
● The normal form is used to reduce redundancy from the database table.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional
dependent on the primary key.
4NF A relation will be in 4NF if it is in Boyce Codd normal form and has no multivalued
dependency.
5NF A relation is in 5NF if it is in 4NF and does not contain any join dependency and joining
should be lossless.
● It states that an attribute of a table cannot hold multiple values. It must hold only a single-valued
attribute.
● First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
EMPLOYEE table:
9064738238
8589830302
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 7272826385 UP
14 John 9064738238 UP
● In the second normal form, all non-key attributes are fully functional dependent on the primary
key
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a
teacher can teach more than one subject.
TEACHER table
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper
subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
● 3NF is used to reduce the data duplication. It is also used to achieve data integrity.
● If there is no transitive dependency for non-prime attributes, then the relation must be in third
normal form.
A relation is in third normal form if it holds at least one of the following conditions for every non-trivial
functional dependency X → Y.
1. X is a super key.
Example:
EMPLOYEE_DETAIL table:
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY depend on EMP_ZIP and EMP_ZIP dependent on EMP_ID.
The non-prime attributes (EMP_STATE, EMP_CITY) transitively depend on super
key(EMP_ID). It violates the rule of third normal form. That's why we need to move the
EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a
Primary key. EMPLOYEE table:
EMPLOYEE_ZIP table:
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
● A table is in BCNF if every functional dependency X → Y, X is the super key of the table.
● For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
1. EMP_ID → EMP_COUNTRY
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
1. EMP_ID → EMP_COUNTRY
Now, this is in BCNF because the left side part of both the functional dependencies is a key.
● For a dependency A → B, if for a single value of A, multiple values of B exists, then the relation
Example
STUDENT
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence,
there is no relationship between COURSE and HOBBY.
Computer and Math and two hobbies, Dancing and Singing. So there is a
Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
lossless.
● 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid
redundancy.
Example
In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class
for Semester 2. In this case, combination of all these fields required to identify a valid data.
Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking
that subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary
key, so we can't leave other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
P2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
P3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Dependency Preserving:
● It is an important constraint of the database.
● In the dependency preservation, at least one decomposed table must satisfy every dependency.
● If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a
R2.
● For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC).
The relational R is decomposed into R1(ABC) and R2(AD) which is dependency preserving
Lossless Decomposition:
Decomposition is lossless if it is feasible to reconstruct relation R from decomposed tables
using Joins. This is the preferred choice. The information will not lose from the relation when
decomposed. The join would result in the same original relation.
<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
<EmpDetails>
Emp_ID Emp_Name Emp_Age Emp_Location
<DeptDetails>
Dept_ID Emp_ID Dept_Name
Dpt2 E002 HR
Therefore, the above relation had lossless decomposition i.e. no loss of information.
Lossy Decomposition
As the name suggests, when a relation is decomposed into two or more relational schemas, the
loss of information is unavoidable when the original relation is retrieved.
<EmpInfo>
Emp_ID Emp_Name Emp_Age Emp_Location Dept_ID Dept_Name
<DeptDetails>
Dept_ID Dept_Name
Dpt1 Operations
Dpt2 HR
Dpt3 Finance
Now, you won’t be able to join the above tables, since Emp_ID isn’t part of the
DeptDetails relation.
2. Optimization
3. Evaluation
Thus, we can understand the working of a query processing in the below-described diagram:
Suppose a user executes a query. As we have learned that there are various methods of extracting the data
from the database. In SQL, a user wants to fetch the records of the employees whose salary is greater than
or equal to 10000. For doing this, the following query is undertaken:
Thus, to make the system understand the user query, it needs to be translated in the form of relational algebra.
We can bring this query in the relational algebra form as:
After translating the given query, we can execute each relational algebra operation by using different
algorithms. So, in this way, query processing begins working.
Evaluation
For this, with addition to the relational algebra translation, it is required to annotate the translated relational
algebra expression with the instructions used for specifying and evaluating each operation. Thus, after
translating the user query, the system executes a query evaluation plan.
● The annotations in the evaluation plan may refer to the algorithms to be used for the particular
● Such relational algebra with annotations is referred to as Evaluation Primitives. The evaluation
primitives carry the instructions needed for the evaluation of the operation.
● Thus, a query evaluation plan defines a sequence of primitive operations used for evaluating a
query. The query evaluation plan is also referred to as the query execution plan.
● A query execution engine is responsible for generating the output of the given query.
It takes the query execution plan, executes it, and finally makes the output for the user query.
Optimization
● The cost of the query evaluation can vary for different types of queries. Although the system is
responsible for constructing the evaluation plan, the user does need not to write their query
efficiently.
● Usually, a database system generates an efficient query evaluation plan, which minimizes its cost.
This type of task is performed by the database system and is known as Query Optimization.
● For optimizing a query, the query optimizer should have an estimated cost analysis of each
operation. It is because the overall operation cost depends on the memory allocations to several
Finally, after selecting an evaluation plan, the system evaluates the query and produces the output of the
query.
Query Equivalence Rules
The equivalence rule says that expressions of two forms are the same or equivalent because both expressions
produce the same outputs on any legal database instance. It means that we can possibly replace the expression
of the first form with that of the second form and replace the expression of the second form with an expression
of the first form. Thus, the optimizer of the query-evaluation plan uses such an equivalence rule or method
for transforming expressions into the logically equivalent one.
The optimizer uses various equivalence rules on relational-algebra expressions for transforming the relational
expressions. For describing each rule, we will use the following symbols:
L1 , L ,L
2 3 … : Used for denoting the list of attributes. E, E1,
Rule 1: Cascade of σ
This rule states the deconstruction of the conjunctive selection operations into a sequence of individual
selections. Such a transformation is known as a cascade of σ. σθ1 ᴧ θ 2 (E) = σθ1 (σθ2 (E))
However, in the case of theta join, the equivalence rule does not work if the order of attributes is considered.
Natural join is a special case of Theta join, and natural join is also commutative.
However, in the case of theta join, the equivalence rule does not work if the order of attributes is considered.
Natural join is a special case of Theta join, and natural join is also commutative.
Rule 3: Cascade of ∏
This rule states that we only need the final operations in the sequence of the projection operations, and other
operations are omitted. Such a transformation is referred to as a cascade of ∏ .
Rule 4: We can combine the selections with Cartesian products as well as theta joins
In the theta associativity, θ2 involves the attributes from E2 and E3 only. There may be chances of empty
conditions, and thereby it concludes that Cartesian Product is also associative.
Note: The equivalence rules of associativity and commutatively of join operations are
essential for join reordering in query optimization.
Under two following conditions, the selection operation gets distributed over the theta-join operation:
a) When all attributes in the selection condition θ0 include only attributes of one of the expressions
which are being joined.
b) When the selection condition θ1 involves the attributes of E1 only, and θ2 includes the attributes of
E2 only.
Under two following conditions, the selection operation gets distributed over the theta-join operation:
a) Assume that the join condition θ includes only in L1 υ L2 attributes of E1 and E2 Then, we get the
following expression:
b) Assume a join as E1 ⋈ E2. Both expressions E1 and E2 have sets of attributes as L1 and L2. Assume
two attributes L3 and L4 where L3 be attributes of the expression E1, involved in the θ join condition but not
in L1 υ L2 Similarly, an L4 be attributes of the expression E2 involved only in the θ join condition and not in
L1 υ L2 attributes. Thus, we get the following expression:
E 1 υ E 2 = E 2 υ E1
E1 E2 = E2 E1
Rule 10: Distribution of selection operation on the intersection, union, and set difference operations.
The below expression shows the distribution performed over the set difference operation.
We can similarly distribute the selection operation on υ and by replacing with -. Further, we get:
Rule 11: Distribution of the projection operation over the union operation.
This rule states that we can distribute the projection operation on the union operation for the given
expressions.
Apart from these discussed equivalence rules, there are various other equivalence rules also.
SQL JOIN
As the name shows, JOIN means to combine something. In case of SQL, JOIN means "to combine two or
more tables".
In SQL, JOIN clause is used to combine the records from two or more tables in a database.
LEFT JOIN
3. RIGHT JOIN
4. FULL JOIN
Sample Table
EMPLOYEE
PROJECT
101 1 Testing
102 2 Development
103 3 Designing
104 4 Development
1. INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables as long as the condition is
satisfied. It returns the combination of all rows from both the tables where the condition satisfies.
Syntax
Query
Output
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
2. LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the right table. If there
is no matching join value, it will return NULL.
Syntax
Query
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL
3. RIGHT JOIN
In SQL, RIGHT JOIN returns all the values from the values from the rows of right table and the matched
values from the left table. If there is no matching in both tables, it will return NULL.
Syntax
Query
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
4. FULL JOIN
In SQL, FULL JOIN is the result of a combination of both left and right outer join. Join tables have all the
records from both tables. It puts NULL on the place of matches not found.
Syntax
Query
EMP_NAME DEPARTMENT
Angelina Testing
Robert Development
Christian Designing
Kristen Development
Russell NULL
Marry NULL