0% found this document useful (0 votes)
11 views73 pages

Understanding Trees in Data Structures

Uploaded by

SURESH M
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views73 pages

Understanding Trees in Data Structures

Uploaded by

SURESH M
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

UNIT-IV

TREES AND GRAPHS

TREES

INTRODUCTION
In non-linear data structure data is organized in random order. A tree is a very popular non-
linear data structure used in a wide range of applications. Tree is a non-linear data structure
which organizes data in hierarchical structure and this is a recursive definition.

Trees Basic Concepts:

A tree is a non-empty set one element of which is designated the root of the tree while the remaining
elements are partitioned into non-empty sets each of which is a sub-tree of the root.

A tree T is a set of nodes storing elements such that the nodes have a parent-child relationship that
satisfies the following:

• If T is not empty, T has a special tree called the root that has no parent.

• Each node v of T different than the root has a unique parent node w; each node with parent w
is a child of w.

Tree nodes have many useful properties. The depth of a node is the length of the path (or the number
of edges) from the root to that node. The height of a node is the longest path from that node to its
leaves. The height of a tree is the height of the root. A leaf node has no children -- its only path is up
to its parent.

Application of Trees:

1. One reason to use trees might be because you want to store information that naturally forms a
1
hierarchy. For example, the file system on a computer:
file system
———–
/ <-- root
/ \
... home
/ \
ugrad course
/ / | \
... cs101 cs112 cs113

2. If we organize keys in form of a tree (with some ordering e.g., BST), we can search for a given
key in moderate time (quicker than Linked List and slower than arrays). Self-balancing search
trees like AVL and Red-Black trees guarantee an upper bound of O(logn) for search.

3) We can insert/delete keys in moderate time (quicker than Arrays and slower than Unordered
Linked Lists). Self-balancing search trees like AVL and Red-Black trees guarantee an upper bound
of O(logn) for insertion/deletion.

4) Like Linked Lists and unlike Arrays, Pointer implementation of trees don’t have an upper
limit on number of nodes as nodes are linked using pointers.

The following are the common uses of tree.


1. Manipulate hierarchical data.
2. Make information easy to search (see tree traversal).
3. Manipulate sorted lists of data.
4. As a workflow for compositing digital images for visual effects.
5. Router algorithms

TREE TERMINOLOGIES:
1. Root Node: In a Tree data structure, the first node is called as Root Node. Every tree
must have a root node. We can say that the root node is the origin of the tree data structure. In
any tree, there must be only one root node. We never have multiple root nodes in a tree.

2. Edge: In a Tree, the connecting link between any two nodes is called as EDGE. In a tree
with 'N' number of nodes there will be a maximum of 'N-1' number of edges.

2
3. Parent Node: In a Tree, the node which is a predecessor of any node is called as
PARENT NODE. In simple words, the node which has a branch from it to any other node is
called a parent node. Parent node can also be defined as "The node which has child /
children".

Here, A is parent of B&C. B is the parent of D,E&F and so on…


4. Child Node: In a Tree data structure, the node which is descendant of any node is called
as CHILD Node. In simple words, the node which has a link from its parent node is called as
child node. In a tree, any parent node can have any number of child nodes. In a tree, all the
nodes except root are child nodes.

5. Siblings: In a Tree data structure, nodes which belong to same Parent are called as
SIBLINGS. In simple words, the nodes with the same parent are called Sibling nodes.

3
6. Leaf Node: In a Tree data structure, the node which does not have a child is called
as LEAF Node. In simple words, a leaf is a node with no child. In a tree data structure, the
leaf nodes are also called as External Nodes. External node is also a node with no child. In a
tree, leaf node is also called as 'Terminal' node.

7. Internal Nodes: In a Tree data structure, the node which has atleast one child is called as
INTERNAL Node. In simple words, an internal node is a node with atleast one child.

In a Tree data structure, nodes other than leaf nodes are called as Internal Nodes. The root
node is also said to be Internal Node if the tree has more than one node. Internal nodes are
also called as 'Non-Terminal' nodes.

8. Degree: In a Tree data structure, the total number of children of a node is called as
DEGREE of that Node. In simple words, the Degree of a node is total number of children it
has. The highest degree of a node among all the nodes in a tree is called as 'Degree of Tree'

Degree of Tree is: 3

9. Level: In a Tree data structure, the root node is said to be at Level 0 and the children of
root node are at Level 1 and the children of the nodes which are at Level 1 will be at Level 2

5
and so on... In simple words, in a tree each step from top to bottom is called as a Level and
the Level count starts with '0' and incremented by one at each level (Step).

10. Height: In a Tree data structure, the total number of edges from leaf node to a particular
node in the longest path is called as HEIGHT of that Node. In a tree, height of the root node
is said to be height of the tree. In a tree, height of all leaf nodes is '0'.

11. Depth: In a Tree data structure, the total number of egdes from root node to a particular
node is called as DEPTH of that Node. In a tree, the total number of edges from root node to
a leaf node in the longest path is said to be Depth of the tree. In simple words, the highest
depth of any leaf node in a tree is said to be depth of that tree. In a tree, depth of the root
node is '0'.

12. Path: In a Tree data structure, the sequence of Nodes and Edges from one node to
another node is called as PATH between that 6two Nodes. Length of a Path is total number
of nodes in that path. In below example the path A - B - E - J has length 4.
13. Sub Tree: In a Tree data structure, each child from a node forms a subtree recursively.
Every child node will form a subtree on its parent node.

TREE REPRESENTATIONS:
A tree data structure can be represented in two methods. Those methods are as follows...

1. List Representation
2. Left Child - Right Sibling Representation

Consider the following tree...

1. List Representation
In this representation, we use two types of nodes7 one for representing the node with data
called 'data node' and another for representing only references called 'reference node'. We
start with a 'data node' from the root node in the tree. Then it is linked to an internal node
through a 'reference node' which is further linked to any other node directly. This process
repeats for all the nodes in the tree.
The above example tree can be represented using List representation as follows...

2. Left Child - Right Sibling Representation


In this representation, we use a list with one type of node which consists of three fields
namely Data field, Left child reference field and Right sibling reference field. Data field
stores the actual value of a node, left reference field stores the address of the left child and
right reference field stores the address of the right sibling node. Graphical representation of
that node is as follows...

In this representation, every node's data field stores the actual value of that node. If that node
has left a child, then left reference field stores the address of that left child node otherwise
stores NULL. If that node has the right sibling, then right reference field stores the address of
right sibling node otherwise stores NULL.

The above example tree can be represented using Left Child - Right Sibling representation as
follows...

BINARY TREE:

In a normal tree, every node can have any number 8 of children. A binary tree is a special type
of tree data structure in which every node can have a maximum of 2 children. One is known
as a left child and the other is known as right child.

A tree in which every node can have a maximum of two children is called Binary Tree.
In a binary tree, every node can have either 0 children or 1 child or 2 children but not more
than 2 children.

In general, tree nodes can have any number of children. In a binary tree, each node can have
at most two children. A binary tree is either empty or consists of a node called the root
together with two binary trees called the left subtree and the right subtree. A tree with no
nodes is called as a null tree

Example:

Properties of Binary Trees:

Some of the important properties of a binary tree are as follows:

1. If h = height of a binary tree, then

a. Maximum number of leaves = 2h

b. Maximum number of nodes = 2h + 1 - 1

2. If a binary tree contains m nodes at level l, it contains at most 2m nodes at level l + 1.

3. Since a binary tree can contain at most one node at level 0 (the root), it can contain at most 2l
node at level l.

4. The total number of edges in a full binary tree with n node is n - 1.

TYPES OF BINARY TREE:


1. Strictly Binary Tree:
In a binary tree, every node can have a maximum of two children. But in strictly binary tree,
every node should have exactly two children or none. That means every internal node must
have exactly two children. A strictly Binary Tree can be defined as follows...
A binary tree in which every node has either two or zero number of children is called
Strictly Binary Tree

9
Strictly binary tree is also called as Full Binary Tree or Proper Binary Tree or 2-Tree.

Strictly binary tree data structure is used to represent mathematical expressions.

10
Example

2. Complete Binary Tree:


In a binary tree, every node can have a maximum of two children. But in strictly binary tree,
every node should have exactly two children or none and in complete binary tree all the
nodes must have exactly two children and at every level of complete binary tree there must be
2level number of nodes. For example at level 2 there must be 2 2 = 4 nodes and at level 3 there
must be 23 = 8 nodes.
A binary tree in which every internal node has exactly two children and all leaf nodes are
at same level is called Complete Binary Tree.

Complete binary tree is also called as Perfect Binary Tree.

3. Extended Binary Tree:


A binary tree can be converted into Full Binary tree by adding dummy nodes to existing
nodes wherever required.
The full binary tree obtained by adding dummy nodes to a binary tree is called as
Extended Binary Tree.

11
In above figure, a normal binary tree is converted into full binary tree by adding dummy
nodes.
4. Skewed Binary Tree:
If a tree which is dominated by left child node or right child node, is said to be a Skewed
Binary Tree.
In a skewed binary tree, all nodes except one have only one child node. The remaining
node has no child.

In a left skewed tree, most of the nodes have the left child without corresponding right child.
In a right skewed tree, most of the nodes have the right child without corresponding left
child.
BINARY TREE REPRESENTATIONS:
A binary tree data structure is represented using two methods. Those methods are as
follows...
1. Array Representation
2. Linked List Representation
Consider the following binary tree...

12
1. Array Representation of Binary Tree
In array representation of a binary tree, we use one-dimensional array (1-D Array) to
represent a binary tree.
Consider the above example of a binary tree and it is represented as follows...

To represent a binary tree of depth 'n' using array representation, we need one dimensional
array with a maximum size of 2n + 1.
2. Linked List Representation of Binary Tree
We use a double linked list to represent a binary tree. In a double linked list, every node
consists of three fields. First field for storing left child address, second for storing actual
data and third for the right child address.
In this linked list representation, a node has the following structure...

The above example of the binary tree represented using Linked list representation is shown as
follows...

BINARY TREE TRAVERSALS:


Unlike linear data structures (Array, Linked List, Queues, Stacks, etc) which have only one
logical way to traverse them, binary trees can be traversed in different ways. Following are
the generally used ways for traversing binary trees.
When we wanted to display a binary tree, we need to follow some order in which all the
nodes of that binary tree must be displayed. In any binary tree, displaying order of nodes
depends on the traversal method.
Displaying (or) visiting order of nodes in a binary tree is called as Binary Tree Traversal.
There are three types of binary tree traversals.
1. In - Order Traversal
2. Pre - Order Traversal 13
3. Post - Order Traversal
1. In - Order Traversal (left Child - root - right Child):
In In-Order traversal, the root node is visited between the left child and right child. In this
traversal, the left child node is visited first, then the root node is visited and later we go for
visiting the right child node. This in-order traversal is applicable for every root node of all
sub trees in the tree. This is performed recursively for all nodes in the tree.
Algorithm:
Step-1: Visit the left subtree, using inorder.
Step-2: Visit the root.
Step-3: Visit the right subtree, using inorder.

void Inorder( TREE T )


{ if( T != NULL )
{
Inorder( T->left );
print_element( T->element );
Inorder( T->right );
}
}
In the above example of a binary tree, first we try to visit left child of root node 'A', but A's
left child 'B' is a root node for left subtree. so we try to visit its (B's) left child 'D' and again D
is a root for subtree with nodes D, I and J. So we try to visit its left child 'I' and it is the
leftmost child. So first we visit 'I' then go for its root node 'D' and later we visit D's right
child 'J'. With this we have completed the left part of node B. Then visit 'B' and next B's
right child 'F' is visited. With this we have completed left part of node A. Then visit root
node 'A'. With this we have completed left and root parts of node A. Then we go for the right
part of the node A. In right of A again there is a subtree with root C. So go for left child of C
and again it is a subtree with root G. But G does not have left part so we visit 'G' and then
visit G's right child K. With this we have completed the left part of node C. Then visit root
node 'C' and next visit C's right child 'H' which is the rightmost child in the tree. So we stop
the process.
That means here we have visited in the order of I - D - J - B - F - A - G - K - C - H using In-
Order Traversal.
2. Pre - Order Traversal ( root - leftChild - rightChild ):
In Pre-Order traversal, the root node is visited before the left child and right child nodes. In
this traversal, the root node is visited first, then its left child and later its right child. This pre-
order traversal is applicable for every root node of all subtrees in the tree. Preorder search is
also called backtracking.
Algorithm:
Step-1: Visit the root. 14
Step-2: Visit the left subtree, using preorder.
Step-3: Visit the right subtree, using preorder.
void Preorder( TREE T )
{ if( T != NULL )
{
print_element( T->element );
Preoder( T->left );
Preorder( T->right ); }}

In the above example of binary tree, first we visit root node 'A' then visit its left
child 'B' which is a root for D and F. So we visit B's left child 'D' and again D is a root for I and
J. So we visit D's left child 'I' which is the leftmost child. So next we go for visiting D's right
child 'J'. With this we have completed root, left and right parts of node D and root, left parts
of node B. Next visit B's right child 'F'. With this we have completed root and left parts of
node A. So we go for A's right child 'C' which is a root node for G and H. After visiting C,
we go for its left child 'G' which is a root for node K. So next we visit left of G, but it does
not have left child so we go for G's right child 'K'. With this, we have completed node C's
root and left parts. Next visit C's right child 'H' which is the rightmost child in the tree. So we
stop the process.
That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order
Traversal.

3. Post - Order Traversal ( leftChild - rightChild - root ):


In Post-Order traversal, the root node is visited after left child and right child. In this
traversal, left child node is visited first, then its right child and then its root node. This is
recursively performed until the right most nodes are visited.
Algorithm:
Step-1: Visit the left subtree, using postorder.
Step-2: Visit the right subtree, using postorder
Step-3: Visit the root.

15
void Postorder( TREE T )
{ if( T != NULL )
{
Postorder( T->left );
Postorder( T->right );
print_element( T->element ); }}

Here we have visited in the order of I - J - D - F - B - K - G - H - C - A using Post-


Order Traversal.

16
17
Expression Tree
The expression tree is a binary tree in which each internal node corresponds to the operator and
each leaf node corresponds to the operand so for example expression tree for 3 + ((5+9)*2) would
be:

Inorder traversal of expression tree produces infix version of given postfix expression (same with
postorder traversal it gives postfix expression)
Use of Expression tree
1. The main objective of using the expression trees is to make complex expressions and can be
easily be evaluated using these expression trees.
2. It is also used to find out the associativity of each operator in the expression.
3. It is also used to solve the postfix, prefix, and infix expression evaluation.
For example, the postfix notation a b + c d e + * * results in the following expression tree. The
corresponding infix notation is (a+b)*(c*(d+e)) which can be produced by traversing the
expression tree in an inorder fashion.

Binary Search Tree


Binary search tree is a data structure that quickly allows us to maintain a sorted list of numbers.
 It is called a binary tree because each tree node has a maximum of two children.
 It is called a search tree because it can be used to search for the presence of a number
in O(log(n)) time.

18
Properties and Operations:
A BST is a binary tree of nodes ordered in the following way:

1. Each node contains one key (also unique)


2. The keys in the left subtree are < (less) than the key in its parent node
3. The keys in the right subtree > (greater) than the key in its parent node
4. Duplicate node keys are not allowed.

The reason binary-search trees are important is that the following operations can be implemented
efficiently using a BST:

 insert a key value


 determine whether a key value is in the tree
 remove a key value from the tree
 print all of the key values in sorted order

Let's illustrate what happens using the following BST:

19
and searching for 12:

What if we search for 15:

20
Inserting a node
A naïve algorithm for inserting a node into a BST is that, we start from the root node, if the node
to insert is less than the root, we go to left child, and otherwise we go to the right child of the
root. We continue this process (each node is a root for some sub tree) until we find a null pointer
(or leaf node) where we cannot go any further. We then insert the node as a left or right child of
the leaf node based on node is less or greater than the leaf node. We note that a new node is
always inserted as a leaf node. A recursive algorithm for inserting a node into a BST is as
follows. Assume we insert a node N to tree T. if the tree is empty, the we return new node N as
the tree. Otherwise, the problem of inserting is reduced to inserting the node N to left of right sub
trees of T, depending on N is less or greater than T. A definition is as follows.

Insert(N, T) = N if T is empty
= insert(N, [Link]) if N < T
= insert(N, [Link]) if N > T

Insertion routine for binary search trees


struct BinaryNode
{
data element;
BinaryNode *left;
BinaryNode *right;
};
void insert ( element_type x, BinaryNode T )
{
if( T == NULL )
{ /* Create and return a one-node tree */
T =new BinaryNode();
if( T == NULL )
fatal_error("Out of space!!!");
else
21
{
T->element = x;
T->left = T->right = NULL;
}
}
else
if( x < T->element )
insert( x, T->left );
else
if( x > T->element )
insert( x, T->right );
/* else x is in the tree already. We'll do nothing */
}
Searching for a node
Searching for a node is similar to inserting a node. We start from root, and then go left or right
until we find (or not find the node). A recursive definition of search is as follows. If the node is
equal to root, then we return true. If the root is null, then we return false. Otherwise we
recursively solve the problem for [Link] or [Link], depending on N < T or N > T. A recursive
definition is as follows.
Search should return a true or false, depending on the node is found
or not. Search(N, T) = false if T is empty
= true if T = N
= search(N, [Link]) if N < T
= search(N, [Link]) if N > T
Deleting a node
A BST is a connected structure. That is, all nodes in a tree are connected to some other node. For
example, each node has a parent, unless node is the root. Therefore deleting a node could affect
all sub trees of that node. For example, deleting node 5 from the tree could result in losing sub
trees that are rooted at 1 and 9.

Hence we need to be careful about deleting nodes from a tree. The best way to deal with deletion
seems to be considering special cases. What if the node to delete is a leaf node? What if the node
is a node with just one child? What if the node is an internal node (with two children). The latter
case is the hardest to resolve. But we will find a way to handle this situation as well.
Case 1 : The node to delete is a leaf node
This is a very easy case. Just delete the node 46. We are done

22
Case 2 : The node to delete is a node with one child.
This is also not too bad. If the node to be deleted is a left child of the parent, then we connect the
left pointer of the parent (of the deleted node) to the single child. Otherwise if the node to be
deleted is a right child of the parent, then we connect the right pointer of the parent (of the

deleted node) to single child.


Case 3: The node to delete is a node with two children
This is a difficult case as we need to deal with two sub trees. But we find an easy way to handle
it. First we find a replacement node (from leaf node or nodes with one child) for the node to be
deleted. We need to do this while maintaining the BST order property. Then we swap leaf node
or node with one child with the node to be deleted (swap the data) and delete the leaf node or
node with one child (case 1 or case 2)

Next problem is finding a replacement leaf node for the node to be deleted. We can easily find
this as follows. If the node to be deleted is N, the find the largest node in the left sub tree of N or
the smallest node in the right sub tree of N. These are two candidates that can replace the node to
be deleted without losing the order property. For example, consider the following tree and
suppose we need to delete the root 38.

23
Then we find the largest node in the left sub tree (15) or smallest node in the right sub tree (45)
and replace the root with that node and then delete that node. The following set of images
demonstrates this process.

Let’s see when we delete 13 from that tree.

24
25
Deletion routine for binary search trees
void delete( element_type x, BinaryNode T )
{
BinaryNode tmp_cell;
if( T == NULL )
error("Element not found");
else
if( x < T->element ) /* Go left */
delete( x, T->left );
else
if( x > T->element ) /* Go right */
delete( x, T->right );
else /* Found element to be deleted */
if( T->left !=NULL&& T->right!=NULL ) /* Two children */
{ /* Replace with smallest in right subtree */
T->element = find_min( T->right->element );
delete( T->element, T->right );
}
else /* One child */
}
tmp_cell = T;
if( T->left == NULL ) /* Only a right child */
T = T->right;
if( T->right == NULL ) /* Only a left child */
T = T->left;
free( tmp_cell );
}
}

Balanced Search Trees:


A self-balancing (or height-balanced) binary search tree is any node-based binary search tree
that automatically keeps its height (maximal number of levels below the root) small in the face of
arbitrary item insertions and deletions. The red–black tree, which is a type of self-balancing
binary search tree, was called symmetric binary B-tree. Self-balancing binary search trees can be
used in a natural way to construct and maintain ordered lists, such as priority queues. They can
also be used for associative arrays; key-value pairs are simply inserted with an ordering based on
the key alone. In this capacity, self-balancing BSTs have a number of advantages and
disadvantages over their main competitor, hash tables. One advantage of self-balancing BSTs is
that they allow fast (indeed, asymptotically optimal) enumeration of the items in key order,
which hash tables do not provide. One disadvantage is that their lookup algorithms get more
complicated when there may be multiple items with the same key. Self-balancing BSTs have
better worst-case lookup performance than hash tables (O(log n) compared to O(n)), but have
worse average-case performance (O(log n) compared to O(1)).

26
Self-balancing BSTs can be used to implement any algorithm that requires mutable ordered lists,
to achieve optimal worst-case asymptotic performance. For example, if binary tree sort is
implemented with a self-balanced BST, we have a very simple-to-describe yet asymptotically
optimal O(n log n) sorting algorithm. Similarly, many algorithms in computational geometry
exploit variations on self-balancing BSTs to solve problems such as the line segment intersection
problem and the point location problem efficiently. (For average-case performance, however,
self-balanced BSTs may be less efficient than other solutions. Binary tree sort, in particular, is
likely to be slower than merge sort, quicksort, or heapsort, because of the tree-balancing
overhead as well as cache access patterns.)

Self-balancing BSTs are flexible data structures, in that it's easy to extend them to efficiently
record additional information or perform new operations. For example, one can record the
number of nodes in each subtree having a certain property, allowing one to count the number of
nodes in a certain key range with that property in O(log n) time. These extensions can be used,
for example, to optimize database queries or other list-processing algorithms.

Threaded Binary Tree

In the linked representation of binary trees, more than one half of the link fields contain NULL values
which results in wastage of storage space. If a binary tree consists of n nodes then n+1 link fields
contain NULL values. The NULL links are replaced with special links known as threads. Such binary
trees with threads are known as threaded binary trees. Each node in a threaded binary tree either
contains a link to its child node or thread to other nodes in the tree.

Types of Threaded Binary Tree

There are two types of threaded binary trees.

Single Threaded(One-way threaded Binary Tree): Where a NULL right pointer is made to point to
the inorder successor (if successor exists)

27
In one-way threaded binary trees, a thread will appear either in the right or left link field of a node.
If it appears in the right link field of a node then it will point to the next node that will appear on
performing in order traversal. Such trees are called Right threaded binary trees. If thread appears
in the left field of a node then it will point to the nodes inorder predecessor. Such trees are
called Left threaded binary trees. Left threaded binary trees are used less often as they don't
yield the last advantages of right threaded binary trees. In one-way threaded binary trees, the right
link field of last node and left link field of first node contains a NULL. In order to distinguish
threads from normal links they are represented by dotted lines.

The above figure shows the inorder traversal of this binary tree yields D, B, E, A, C, F. When this
tree is represented as a right threaded binary tree, the right link field of leaf node D which contains
a NULL value is replaced with a thread that points to node B which is the inorder successor of a
node D. In the same way other nodes containing values in the right link field will contain NULL
value.

Two-way threaded Binary Trees:

28
In two-way threaded Binary trees, the right link field of a node containing NULL values is
replaced by a thread that points to nodes inorder successor and left field of a node containing
NULL values is replaced by a thread that points to nodes inorder predecessor.

The above figure shows the inorder traversal of this binary tree yields D, B, E, G, A, C, F. If we
consider the two-way threaded Binary tree, the node E whose left field contains NULL is replaced
by a thread pointing to its inorder predecessor i.e. node B. Similarly, for node G whose right and
left linked fields contain NULL values are replaced by threads such that right link field points to
its inorder successor and left link field points to its inorder predecessor. In the same way, other
nodes containing NULL values in their link fields are filled with threads.

29
In the above figure of two-way threaded Binary tree, we noticed that no left thread is possible for
the first node and no right thread is possible for the last node. This is because they don't have any
inorder predecessor and successor respectively. This is indicated by threads pointing nowhere. So
in order to maintain the uniformity of threads, we maintain a special node called the header node.
The header node does not contain any data part and its left link field points to the root node and its
right link field points to itself. If this header node is included in the two-way threaded Binary tree
then this node becomes the inorder predecessor of the first node and inorder successor of the last
node. Now threads of left link fields of the first node and right link fields of the last node will point
to the header node.

30
Height balanced trees

A height-balanced binary tree is defined as a binary tree in which the height of the left and the right
subtree of any node differ by not more than 1. AVL tree, red-black tree are examples of height-
balanced trees.

Height Balanced tree


Conditions for Height-Balanced Binary Tree:
Following are the conditions for a height-balanced binary tree:
 The difference between the heights of the left and the right subtree for any node is not more
than one.
 The left subtree is balanced.
 The right subtree is balanced.

To check if a Binary tree is balanced we need to check three conditions:


31
1. The absolute difference between heights of left and right subtrees at any node should be less
than 1.
2. For each node, its left subtree should be a balanced binary tree.
3. For each node, its right subtree should be a balanced binary tree.

Self-Balancing Binary Search Trees


In data structure and programming, we mainly discuss two self-balancing binary search trees, which
are as follows:
AVL Trees
AVL tree is a self-balancing Binary Search Tree (BST) where the difference between heights of left
and right subtrees cannot be more than one for all nodes.
Example of AVL Trees:

The above tree is AVL because the differences between the heights of left and right subtrees for every
node are less than or equal to 1.

AVL Trees
An AVL (Adelson-Velskii and Landis) tree is a binary search tree with a balance condition. The balance
condition must be easy to maintain, and it ensures that the depth of the tree is O(log N). An AVL tree
is identical to a binary search tree, except that for every node in the tree, the height of the left and
right subtrees can differ by at most 1.
For example, here are some trees:

32
Yes this is an AVL tree. Examination shows that each left sub-tree has a height 1 greater than each
right sub-tree.

No this is not an AVL tree. Sub-tree with root 8 has height 4 and sub-tree with root 18 has height 2.
we define the balance factor for a node as the difference between the height of the left subtree and the
height of the right subtree.
balanceFactor=height(leftSubTree)−height(rightSubTree)
we will define a tree to be in balance if the balance factor is -1, 0, or 1. Once the balance factor of a
node in a tree is outside this range we will need to have a procedure to bring the tree back into
balance. Figure shows an example of an unbalanced tree and the balance factors of each node.

Operations on an AVL Tree:


 Insertion
 Deletion
 Searching [It is similar to performing a search in BST]

When we do an insertion, we need to update all the balancing information for the nodes on the path
33
back to the root, but the reason that insertion is potentially difficult is that inserting a node could
violate the AVL tree property. The violation might occur in four cases:

1. An
insertion into the left subtree of the left child of α
2. An
insertion into the right subtree of the left child of α
3. An
insertion into the left subtree of the right child of α
4. An
insertion into the right subtree of the right child of α
Cases 1 and 4 are mirror image symmetries with respect to α, as are cases 2 and 3.

The case1 and case4, in which the insertion occurs on the “outside” (i.e., left–left or right– right),
is fixed by a single rotation of the tree. The case2 and case3, in which the insertion occurs on the
“inside” (i.e., left–right or right–left) is handled by the slightly more complex double rotation.
Rotating the subtrees in an AVL Tree:
An AVL tree may rotate in one of the following four ways to keep itself balanced:
Case 1: Right Rotation:
If a node is added to the left subtree of the left subtree, the AVL tree may get out of balance, we do a
single right rotation.

Right Rotation in AVL tree

Example 1( create avl tree 3,2,1)

1 3

after

34
Example2:

35
Case 4: Left Rotation:
When a node is added into the right subtree of the right subtree, if the tree gets out of balance, we do
a single left rotation.

Left-Rotation in AVL tree

36
Case 2: Right-Left Rotation:
When a node is added into the right subtree of the left subtree, if the tree gets out of balance, we do a
double rotation A right-left rotation is a combination in which first right rotation takes place after that
left rotation executes.

Right-Left Rotation in AVL tree

37
Case 3: Left-Right Rotation:
When a node is added into the left subtree of the right subtree, if the tree gets out of balance, we do a
double rotation. A left-right rotation is a combination in which first left rotation takes place after that
right rotation executes.

Left-Right Rotation in AVL tree

38
Example

calculate new balance factor


Insert 9 for the given avl tree

39
Routine to perform single rotation

void rotateWithLeftChild( AvlNode * & k2 )


{
AvlNode *k1 = k2->left;
k2->left = k1->right;
k1->right = k2;
k2->height = max( height( k2->left ), height( k2->right ) ) + 1;
k1->height = max( height( k1->left ), k2->height ) + 1;
k2 = k1;
}

Routine to perform double rotation

void doubleWithLeftChild( AvlNode * & k3 )


{
rotateWithRightChild( k3->left );
rotateWithLeftChild( k3 );
}

Deletion in an AVL tree


40
void remove( const Comparable & x, AvlNode *&t)
{
if( t == nullptr )
return; // Item not found; do nothing
if( x < t->element )
remove( x, t->left );
else if( t->element < x )
remove( x, t->right );
else if( t->left != nullptr && t->right != nullptr ) // Two children
{
t->element = findMin( t->right )->element;
remove( t->element, t->right );
}
else
{
AvlNode *oldNode = t;
t = ( t->left != nullptr ) ? t->left : t->right;
delete oldNode;
}
balance( t );
}

Trie
Trie data structure is defined as a Tree based data structure that is used for storing a collection of
strings and performing efficient search, insert, delete, prefix search and sorted-traversal-of-all
operations on them. The word Trie is derived from reTRIEval, which means finding something or
obtaining it.
Trie data structure follows a property that If two strings have a common prefix then they will have
the same ancestor in the trie. This particular property allows to find all words with a given prefix.

A Trie data structure is used for storing and retrieval of data and the same operations could be
done using another data structure which is Hash Table but Trie data structure can perform these
operations more efficiently than a Hash Table. A Trie data structure can be used for prefix-
based searching and a sorted traversal of all words. So a Trie has advantages of both hash table and
self balancing binary search trees. However the main issue with Trie is extra memory space required
to store words and the space may become huge for long list of words and/or for long words.
Advantages of Trie Data Structure over a Hash Table:
The A trie data structure has the following advantages over a hash table:
 We can efficiently do prefix search (or auto-complete) with Trie.
 We can easily print all words in alphabetical order which is not easily possible with hashing.
 There is no overhead of Hash functions in a Trie data structure.
 Searching for a String even in the large collection of strings in a Trie data structure can be
done in O(L) Time complexity, Where L is the number of words in the query string. This
searching time could be even less than O(L) if the query string does not exist in the trie.
Properties of a Trie Data Structure
Below are some important properties of the Trie data structure:
41
 Each Trie has an empty root node, with links (or references) to other nodes
 Each node of a Trie represents a string and each edge represents a character.
 Every node consists of hashmaps or an array of pointers, with each index representing a
character and a flag to indicate if any string ends at the current node.
 Trie data structure can contain any number of characters including alphabets, numbers,
and special characters. But for this article, we will discuss strings with characters a-z.
Therefore, only 26 pointers need for every node, where the 0th index represents ‘a’ and
the 25th index represents ‘z’ characters.
 Each path from the root to any node represents a word or string.
Below is a simple example of Trie data structure.

Trie Data Structure

Heap
A Heap is a complete binary tree data structure that satisfies the heap property: for every node, the
value of its children is greater than or equal to its own value. Heaps are usually used to implement
priority queues, where the smallest (or largest) element is always at the root of the tree. There are two
types of heaps:
 Max-Heap: The value of each node is greater than or equal to the values of its children,
ensuring that the root node contains the maximum value. As you move down the tree, the
42
values decrease.
 Min-Heap: The value of each node is less than or equal to the values of its children, ensuring
that the root node contains the minimum value. As you move down the tree, the values
increase.

Heap Operations
Common heap operations are:
 Insert: Adds a new element to the heap while maintaining the heap property.
 Extract Max/Min: Removes the maximum or minimum element from the heap and returns it.
 Heapify: Converts an arbitrary binary tree into a heap.

43
Heap Data Structure

44
Hashing and Collision:

What is Hashing?

Hashing is the process of converting an input (or key) into a hash value using a hash function. This
hash value is typically a fixed-size integer that is used to index into a data structure, such as a hash
table. The main goal of hashing is to achieve constant-time average complexity for common
operations like insertion, deletion, and search.

Hash Tables

A hash table, also known as a hash map, is a data structure that implements an associative array
abstract data type. It uses a combination of hashing and an underlying array to store key-value pairs.

The hash function determines the index of the array where the key-value pair should be placed. The
index is calculated based on the hash value of the key. The beauty of a hash table is that it allows for
fast insertion, deletion, and retrieval of values given a key.

Hash Functions

A hash function is a mathematical function that takes an input (key) and produces a fixed-size output
(hash value). It should have the following properties:

1. Deterministic: Given the same input, the hash function should always produce the same
output.

2. Fast: The hash function should be computationally efficient.

3. Uniform: The hash function should uniformly distribute the keys across the range of hash
values, reducing the chance of collisions (more on this later).

Hashing is the technique used for performing almost constant time search in case of insertion,
deletion and find operation. Taking a very simple example of it, an array with its index as key
is the example of hash table. So each index (key) can be used for accessing the value in a
constant search time. This mapping key must be simple to compute and must helping in
identifying the associated value. Function which helps us in generating such kind of key-value
mapping is known as Hash Function.
45
In a hashing system the keys are stored in an array which is called the Hash Table. A perfectly
implemented hash table would always promise an average insert/delete/retrieval time of O(1).

Collision Resolution

One of the main challenges in hashing is handling collisions, which occur when two or more input
values produce the same hash value. There are various techniques used to resolve collisions,
including:

o Chaining: In this technique, each hash table slot contains a linked list of all the values that
have the same hash value. This technique is simple and easy to implement, but it can lead to
poor performance when the linked lists become too long.
o Open addressing: In this technique, when a collision occurs, the algorithm searches for an
empty slot in the hash table by probing successive slots until an empty slot is found. This
technique can be more efficient than chaining when the load factor is low, but it can lead to
clustering and poor performance when the load factor is high.
o Double hashing: This is a variation of open addressing that uses a second hash function to
determine the next slot to probe when a collision occurs. This technique can help to reduce
clustering and improve performance.

46
47
Hash collision resolution techniques:

Open Hashing (Separate chaining):

Open Hashing, is a technique in which the data is not directly stored at the hash key index (k) of
the Hash table. Rather the data at the key index (k) in the hash table is a pointer to the head of the
data structure where the data is actually stored. In the most simple and common implementations
the data structure adopted for storing the element is a linked-list.

in this technique when a data needs to be searched, it might become necessary (worst case) to
traverse all the nodes in the linked list to retrieve the data.
Note that the order in which the data is stored in each of these linked lists (or other data
structures) is completely based on implementation requirements. Some of the popular criteria
are insertion order, frequency of access etc.

Closed hashing (open Addressing)


In this technique a hash table with pre-identified size is considered. All items are stored in the
hash table itself. In addition to the data, each hash bucket also maintains the three states:
EMPTY, OCCUPIED, DELETED. While inserting, if a collision occurs, alternative cells are
tried until an empty bucket is found. For which one of the following technique is adopted.
1. Liner Probing
In linear probing, the hash table is searched sequentially that starts from the original
location of the hash. If in case the location that we get is already occupied, then we check
for the next location.
The function used for rehashing is as follows: rehash(key) = (n+1)%table-size.

48
2. Quadratic probing

If you observe carefully, then you will understand that the interval between probes will increase
proportionally to the hash value. Quadratic probing is a method with the help of which we can solve
the problem of clustering that was discussed above. This method is also known as the mid-
square method. In this method, we look for the i2‘th slot in the ith iteration. We always start from
the original hash location. If only the location is occupied then we check the other slots.
et hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
Example: Let us consider table Size = 7, hash function as Hash(x) = x % 7 and collision resolution
strategy to be f(i) = i 2 . Insert = 22, 30, and 50.
 Step 1: Create a table of size 7.

49
Hash table

 Step 2 – Insert 22 and 30


 Hash(22) = 22 % 7 = 1, Since the cell at index 1 is empty, we can easily
insert 22 at slot 1.
 Hash(30) = 30 % 7 = 2, Since the cell at index 2 is empty, we can easily
insert 30 at slot 2.

50
Insert keys 22 and 30 in the hash table

 Step 3: Inserting 50
 Hash(50) = 50 % 7 = 1
 In our hash table slot 1 is already occupied. So, we will search for slot 1+1 2,
i.e. 1+1 = 2,
 Again slot 2 is found occupied, so we will search for cell 1+2 2, i.e.1+4 = 5,
 Now, cell 5 is not occupied so we will place 50 in slot 5.

51
Insert key 50 in the hash table

3. Double Hashing
The intervals that lie between probes are computed by another hash function. Double hashing is a
technique that reduces clustering in an optimized way. In this technique, the increments for the
probing sequence are computed by using another hash function. We use another hash function
hash2(x) and look for the i*hash2(x) slot in the ith rotation.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
…………………………………………..
…………………………………………..
Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where first hash-function
is h1(k) = k mod 7 and second hash-function is h2(k) = 1 + (k mod 5)
 Step 1: Insert 27
 27 % 7 = 6, location 6 is empty so insert 27 into 6 slot.

52
Insert key 27 in the hash table

 Step 2: Insert 43
 43 % 7 = 1, location 1 is empty so insert 43 into 1 slot.

Insert key 43 in the hash table

 Step 3: Insert 692


53
 692 % 7 = 6, but location 6 is already being occupied and this is a collision
 So we need to resolve this collision using double hashing.
hnew = [h1(692) + i * (h2(692)] % 7
= [6 + 1 * (1 + 692 % 5)] % 7
= 9% 7
=2
Now, as 2 is an empty slot,
so we can insert 692 into 2nd slot.

Insert key 692 in the hash table

 Step 4: Insert 72
 72 % 7 = 2, but location 2 is already being occupied and this is a collision.
 So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * (h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
=5%7
= 5,
Now, as 5 is an empty slot,
so we can insert 72 into 5th slot.

54
Insert key 72 in the hash table

See this for step-by-step diagrams:

A comparative analysis of Closed Hashing vs Open Hashing

Open Addressing Closed Addressing


All elements would be Additional Data structure
stored in the Hash table needs to be used to
itself. No additional data accommodate collision
structure is needed. data.
Simple and effective
In cases of collisions, a
approach to collision
unique hash key must be
resolution. Key may or may
obtained.
not be unique.
Determining size of the Performance deterioration
hash table, adequate enough of closed addressing much
for storing all the data is slower as compared to
difficult. Open addressing.
State needs be maintained No state data needs to be
for the data (additional maintained (easier to
work) maintain)
Uses space efficiently Expensive on space

55
Applications of Hashing:
A hash function maps a variable length input string to fixed length output string -- its hash
value, or hash for short. If the input is longer than the output, then some inputs must map to the
same output -- a hash collision. Comparing the hash values for two inputs can give us one of
two answers: the inputs are definitely not the same, or there is a possibility that they are the
same. Hashing as we know it is used for performance improvement, error checking, and
authentication. One example of a performance improvement is the common hash table, which
uses a hash function to index into the correct bucket in the hash table, followed by comparing
each element in the bucket to find a match. In error checking, hashes (checksums, message
digests, etc.) are used to detect errors caused by either hardware or software. Examples are TCP
checksums, ECC memory, and MD5 checksums on downloaded files . In this case, the hash
provides additional assurance that the data we received is correct. Finally, hashes are used to
authenticate messages. In this case, we are trying to protect the original input from tampering,
and we select a hash that is strong enough to make malicious attack infeasible or unprofitable.
 Construct a message authentication code (MAC)
 Digital signature
 Make commitments, but reveal message later
 Timestamping
 Key updating: key is hashed at specific intervals resulting in new key

What is Hashing?
Hashing is the process of converting an input (or key) into a hash value using a hash function. This
hash value is typically a fixed-size integer that is used to index into a data structure, such as a hash
table. The main goal of hashing is to achieve constant-time average complexity for common
operations like insertion, deletion, and search.
Hash Tables
A hash table, also known as a hash map, is a data structure that implements an associative array
abstract data type. It uses a combination of hashing and an underlying array to store key-value pairs.
The hash function determines the index of the array where the key-value pair should be placed. The
index is calculated based on the hash value of the key. The beauty of a hash table is that it allows for
fast insertion, deletion, and retrieval of values given a key.
Hash Functions
A hash function is a mathematical function that takes an input (key) and produces a fixed-size output
(hash value). It should have the following properties:
1. Deterministic: Given the same input, the hash function should always produce the same
output.
2. Fast: The hash function should be computationally efficient.
3. Uniform: The hash function should uniformly distribute the keys across the range of hash
56
values, reducing the chance of collisions (more on this later).

Graph

Graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes
also referred to as nodes and the edges are lines or arcs that connect any two nodes in the graph.
More formally a Graph is composed of a set of vertices (V) and a set of edges (E).
The graph is denoted by G (V, E).
Example:

◦ V = {A,B,C,D}
◦ E = {(A,B),(A,C),(A,D),(B,D),(C,D)}
Basic Operations:

57
 Insertion of Nodes/Edges in the graph – Insert a node into the graph.
 Deletion of Nodes/Edges in the graph – Delete a node from the graph.
 Searching on Graphs – Search an entity in the graph.
 Traversal of Graphs – Traversing all the nodes in the graph.
More Operations:
 Shortest Paths : From a source to a destination, a source to all other nodes and between all
pairs.
 Minimum Spanning Tree : In a weighted, connected undirected graph, finding the minimum
weight edges to connect all.
Applications of Graph:
Following are the real-life applications:
 If we recall all the previous data structures that we have studied like array, linked list, tree, etc.
All these had some restrictions on structure (mostly linear and tree hierarchical which means
no loops). Graph allows random connections between nodes which is useful in many real
world problems where do have restrictions of previous data structures.
 Used heavily in social networks. Everyone on the network is a vertex (or node) of the graph
and if connected, then there is an edge. Now imagine all the features that you see, mutual
friends, people that follow you, etc can seen as graph problems.
 Neural Networks: Vertices represent neurons and edges represent the synapses between them.
Neural networks are used to understand how our brain works and how connections change
when we learn. The human brain has about 10^11 neurons and close to 10^15 synapses.
 Compilers: Graph Data Structure is used extensively in compilers. They can be used for type
inference, for so-called data flow analysis, register allocation, and many other purposes.
They are also used in specialized compilers, such as query optimization in database
languages.
 Robot planning: Vertices represent states the robot can be in and the edges the possible
transitions between the states. Such graph plans are used, for example, in planning paths for
autonomous vehicles.
 In GPS. The problems like finding the closest route, closest petrol pumps, etc are all soled using
graph problems.
 For optimizing the cost of connecting all locations of a network. For example, minimizing
wire length in a wired network to make sure all devices are connected is a standard Graph
problem called Minimum Spanning Tree.
 Can be used to represent the interactions between players on a team, such as passes, shots, and
tackles. Analyzing these interactions can provide insights into team dynamics and areas for
improvement.
 Can be used to represent the topology of computer networks, such as the connections between
routers and switches.
 Graphs are used to represent the connections between different places in a transportation
network, such as roads and airports.

58
 Dependencies in a software project (or any other type of project) can be seen as graph and
generating a sequence to solve all tasks before dependents is a standard graph topological
sorting algorithm.
Types of Graphs
1. Undirected Graph
A graph in which edges do not have any direction. That is the nodes are unordered pairs in the
definition of every edge.

2. Directed Graph(Digraph)
A graph in which edge has direction. That is the nodes are ordered pairs in the definition of every
edge.

3. Weighted Graphs: A graph in which edges have weights or costs associated with them.
Example: A road network graph where the weights can represent the distance between two
cities.

4. Unweighted Graphs: A graph in which edges have no weights or costs associated with them.
Example: A social network graph where the edges represent friendships.

59
5. Complete Graphs: A graph in which each vertex is connected to every other vertex

6. Cycle Graph: A Cycle in a graph is a path that starts and ends at the same vertex, with no
repetitions of vertices (except the starting and ending vertex, which are the same).

7. Degree of a Vertex: The Degree of a Vertex in a graph is the number of edges


incident to that vertex.

8. Indegree of a Graph: Indegree of a vertex is defined as the number of incoming edges incident
on a vertex in a directed graph.

9. Outdegree of a Graph: Outdegree of a vertex is defined as the number of outgoing edges from
a vertex in a directed graph.

60
Degree(V1)=2
Degree(V2)=4
Degree(V3)=2
Degree(V4)=3
Degree(V5)=3

Indegree(V5) =1
Indegree(V1) =1
Indegree(V2) =2
Indegree(V3) =1
Indegree(V4) =2

Outdegree (V1) =1
Outdegree (V2) = 2
Outdegree (V3) = 1
Outdegree (V4) = 1
Outdegree (V5) = 2

Graph and its representations:

Graphs are used to represent many real life applications: Graphs are used to represent networks.
The networks may include paths in a city or telephone network or circuit network. Graphs are
also used in social networks like linkedIn, facebook. For example, in facebook, each person is
represented with a vertex(or node). Each node is a structure and contains information like person
id, name, gender and locale.

Following is an example undirected graph with 5 vertices.

61
Following two are the most commonly used representations of graph.
1. Adjacency Matrix
2. Adjacency List

There are other representations also like, Incidence Matrix and Incidence List. The choice of the
graph representation is situation specific. It totally depends on the type of operations to be
performed and ease of use.

Adjacency Matrix:

Adjacency Matrix is a 2D array of size V x V where V is the number of vertices in a graph. Let
the 2D array be adj[][], a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex j.
Adjacency matrix for undirected graph is always symmetric. Adjacency Matrix is also used to
represent weighted graphs. If adj[i][j] = w, then there is an edge from vertex i to vertex j with
weight w.

The adjacency matrix for the above example graph is:

Adjacency Matrix Representation of the above graph

Pros: Representation is easier to implement and follow. Removing an edge takes O(1) time.
Queries like whether there is an edge from vertex ‘u’ to vertex ‘v’ are efficient and can be done
O(1).

Cons: Consumes more space O(V^2). Even if the graph is sparse (contains less number of
62
edges), it consumes the same space. Adding a vertex is O(V^2) time.

Adjacency List:

An array of linked lists is used. Size of the array is equal to number of vertices. Let the array be
array[]. An entry array[i] represents the linked list of vertices adjacent to the ith vertex. This
representation can also be used to represent a weighted graph. The weights of edges can be
stored in nodes of linked lists. Following is adjacency list representation of the above graph.

Adjacency List Representation of the above Graph

Topological Sort
A topological sort is an ordering of vertices in a directed acyclic graph, such that if there is a path
from vi to vj, then vj appears after vi in the ordering. To formalize this, we define the indegree of a
vertex v as the number of edges (u, v). We compute the indegrees of all vertices in the graph.
Assuming that the indegree for each vertex is stored, and that the graph is read into an adjacency list,
we can then apply the algorithm to generate a topological ordering.

v1 v2

v3 v4 v5

v6 v7

Indegree Before Dequeue #

Vertex 1 2 3 4 5 6 7
v1 0 0 0 0 0 0 0
v2 1 0 0 0 0 0 0
63
v3 2 1 1 1 0 0 0
v4 3 2 1 0 0 0 0
v5 1 1 0 0 0 0 0
v6 3 3 3 3 2 1 0
v7 2 2 2 1 0 0 0
Enqueu v1 v2 v5 v4 v3, v7 v
e 6
Dequeu v1 v2 v5 v4 v v7 v
3 6

Topological Sort: V1, V2, V5, V4, V3, V7, V6

Pseudocode to perform topological sort

void Graph::topsort( )
{
Queue<Vertex> q;
int counter = 0;

[Link]( );
for each Vertex v
if( [Link] == 0 )
[Link]( v );

while( ![Link]( ) )
{
Vertex v = [Link]( );
[Link] = ++counter; // Assign next number

for each Vertex w adjacent to


v if( --[Link] == 0 )
[Link]( w );
}

if( counter != NUM_VERTICES )


throw CycleFoundException();
}

64
Example 2

Indegree Before Dequeue #

Vertex 0 1 2 3 4 5
V0 0 0 0 0 0 0
V1 1 0 0 0 0 0
V2 1 1 0 0 0 0
V3 3 2 1 0 0 0
V4 2 2 2 1 0 0
V5 3 3 2 1 1 0

Enqueu V V V4 V5
e 0 1 V2 V3
Dequeu V V V2 V3 V4 V5
0 1

Topological Sort: V0, V1, V2, V3, V4, V5

Breadth First Traversal for a Graph

Breadth First Traversal (or Search) for a graph is similar to Breadth First Traversal of a tree The
only catch here is, unlike trees, graphs may contain cycles, so we may come to the same node
again. To avoid processing a node more than once, we use a boolean visited array.

For example, in the following graph, we start traversal from vertex 2. When we come to vertex 0,
65
we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don’t mark visited
vertices, then 2 will be processed again and it will become a non-terminating process. Breadth
First Traversal of the following graph is 2, 0, 3, 1.

66
Example 2:

BFS: 0,1,2,3,4,5,6

Example 3
graph = { {1, 2}, // Node 0 is connected to nodes 1 and 2
{0, 3, 4}, // Node 1 is connected to nodes 0, 3, and 4
{0, 5}, // Node 2 is connected to nodes 0 and 5
{1}, // Node 3 is connected to node 1
{1, 5}, // Node 4 is connected to nodes 1 and 5
{2, 4} // Node 5 is connected to nodes 2 and 4 }

67
Algorithm for BFS Traversal

1. Select a starting vertex


2. Create a queue and enqueue the
starting vertex.
3. First, mark the selected start vertex
as visited and add it to the visited
array.
4. While the queue is not empty
perform the following operations:
 Dequeue a vertex, print it.
 Take the
dequeued node
and for each
unvisited
neighbor of
that node
o Insert all its unvisited
adjacent vertices in the queue
and mark them as visited.
5. Repeat the above step 4 until the queue is
empty.

#include <iostream>
#include <vector>
#include <queue>

using namespace std;

// Function to add an edge to the graph (undirected graph)


void addEdge(vector<int> adj[], int u, int v) {
adj[u].push_back(v);
adj[v].push_back(u); // Since the graph is undirected
}

// Function to perform BFS on the graph


void BFS(vector<int> adj[], int startVertex, int numVertices) {
vector<bool> visited(numVertices + 1, false); // +1 for 1-based index
queue<int> queue;

// Start from the given starting vertex


visited[startVertex] = true;
[Link](startVertex);

68
while (![Link]()) {
int vertex = [Link]();
[Link]();
cout << vertex << " ";

// Visit all the adjacent vertices


for (int adjVertex : adj[vertex]) {
if (!visited[adjVertex]) {
visited[adjVertex] = true;
[Link](adjVertex);
}
}
}
cout << endl;
}

int main() {
int numVertices = 6; // 6 vertices labeled from 1 to 6
vector<int> adj[numVertices + 1]; // +1 for 1-based index

// Adding edges as per your graph


addEdge(adj, 1, 2);
addEdge(adj, 1, 3);
addEdge(adj, 2, 6);
addEdge(adj, 2, 4);
addEdge(adj, 3, 4);
addEdge(adj, 3, 5);

cout << "Breadth First Search starting from vertex 1:" << endl;
BFS(adj, 1, numVertices);

return 0;
}

69
Applications of Breadth First Traversal

1) Shortest Path and Minimum Spanning Tree for unweighted graph In unweighted graph,
the shortest path is the path with least number of edges. With Breadth First, we always reach a
vertex from given source using minimum number of edges. Also, in case of unweighted graphs,
any spanning tree is Minimum Spanning Tree and we can use either Depth or Breadth first
traversal for finding a spanning tree.

2) Peer to Peer Networks. In Peer to Peer Networks like BitTorrent, Breadth First Search is
used to find all neighbor nodes.

3) Crawlers in Search Engines: Crawlers build index using Bread First. The idea is to start
from source page and follow all links from source and keep doing same. Depth First Traversal
can also be used for crawlers, but the advantage with Breadth First Traversal is, depth or levels
of built tree can be limited.

4) Social Networking Websites: In social networks, we can find people within a given distance
‘k’ from a person using Breadth First Search till ‘k’ levels.

5) GPS Navigation systems: Breadth First Search is used to find all neighboring locations.

6) Broadcasting in Network: In networks, a broadcasted packet follows Breadth First Search to


reach all nodes.

7) In Garbage Collection: Breadth First Search is used in copying garbage collection using
Cheney’s algorithm.

8) Cycle detection in undirected graph: In undirected graphs, either Breadth First Search or
Depth First Search can be used to detect cycle. In directed graph, only depth first search can be
used.

9) Ford–Fulkerson algorithm In Ford-Fulkerson algorithm, we can either use Breadth First or


Depth First Traversal to find the maximum flow. Breadth First Traversal is preferred as it
reduces worst case time complexity to O(VE2).

10) To test if a graph is Bipartite We can either use Breadth First or Depth First Traversal.

11) Path Finding We can either use Breadth First or Depth First Traversal to find if there is a
path between two vertices.

12) Finding all nodes within one connected component: We can either use Breadth First or
Depth First Traversal to find all nodes reachable from a given node.

70
Depth First Traversal for a Graph

Depth First Traversal (or Search) for a graph is similar to Depth First Traversal of a tree. The
only catch here is, unlike trees, graphs may contain cycles, so we may come to the same node
again. To avoid processing a node more than once, we use a boolean
visited array. For example, in the following graph, we start traversal from vertex 2. When
we come to vertex 0, we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we
don’t mark visited vertices, then 2 will be processed again and it will become a non-terminating
process. Depth First Traversal of the

71
following graph is 2, 0, 1, 3

Algorithm Depth-First Search

The DFS forms a depth-first forest comprised of more than one depth-first trees. Each tree is
made of edges (u, v) such that u is gray and v is white when edge (u, v) is explored. The
following pseudocode for DFS uses a global timestamp time.

void Graph::dfs( Vertex v )


{
[Link] = true;
for each Vertex w adjacent to v
if( ![Link] )
dfs( w );
}
Applications of Depth First Search

Depth-first search (DFS) is an algorithm (or technique) for traversing a

graph. Following are the problems that use DFS as a building block.

72
1) For an unweighted graph, DFS traversal of the graph produces the minimum spanning tree
and all pair shortest path tree.

2) Detecting cycle in a graph


A graph has cycle if and only if we see a back edge during DFS. So we can run DFS for the
graph and check for back edges. (See this for details)

3) Path Finding
We can specialize the DFS algorithm to find a path between two given vertices u and z.
i) Call DFS(G, u) with u as the start vertex.
ii) Use a stack S to keep track of the path between the start vertex and the current vertex.
iii) As soon as destination vertex z is encountered, return the path
as the contents of the stack

4) Topological Sorting

5) To test if a graph is bipartite


We can augment either BFS or DFS when we first discover a new vertex, color it opposite its
parents, and for each other edge, check it doesn’t link two vertices of the same color. The first
vertex in any connected component can be red or black! See this for details.

6) Finding Strongly Connected Components of a graph A directed graph is called strongly


connected if there is a path from each vertex in the graph to every other vertex. (See this for
DFS based also for finding Strongly Connected Components)

73

You might also like