Understanding Trees in Data Structures
Understanding Trees in Data Structures
TREES
INTRODUCTION
In non-linear data structure data is organized in random order. A tree is a very popular non-
linear data structure used in a wide range of applications. Tree is a non-linear data structure
which organizes data in hierarchical structure and this is a recursive definition.
A tree is a non-empty set one element of which is designated the root of the tree while the remaining
elements are partitioned into non-empty sets each of which is a sub-tree of the root.
A tree T is a set of nodes storing elements such that the nodes have a parent-child relationship that
satisfies the following:
• If T is not empty, T has a special tree called the root that has no parent.
• Each node v of T different than the root has a unique parent node w; each node with parent w
is a child of w.
Tree nodes have many useful properties. The depth of a node is the length of the path (or the number
of edges) from the root to that node. The height of a node is the longest path from that node to its
leaves. The height of a tree is the height of the root. A leaf node has no children -- its only path is up
to its parent.
Application of Trees:
1. One reason to use trees might be because you want to store information that naturally forms a
1
hierarchy. For example, the file system on a computer:
file system
———–
/ <-- root
/ \
... home
/ \
ugrad course
/ / | \
... cs101 cs112 cs113
2. If we organize keys in form of a tree (with some ordering e.g., BST), we can search for a given
key in moderate time (quicker than Linked List and slower than arrays). Self-balancing search
trees like AVL and Red-Black trees guarantee an upper bound of O(logn) for search.
3) We can insert/delete keys in moderate time (quicker than Arrays and slower than Unordered
Linked Lists). Self-balancing search trees like AVL and Red-Black trees guarantee an upper bound
of O(logn) for insertion/deletion.
4) Like Linked Lists and unlike Arrays, Pointer implementation of trees don’t have an upper
limit on number of nodes as nodes are linked using pointers.
TREE TERMINOLOGIES:
1. Root Node: In a Tree data structure, the first node is called as Root Node. Every tree
must have a root node. We can say that the root node is the origin of the tree data structure. In
any tree, there must be only one root node. We never have multiple root nodes in a tree.
2. Edge: In a Tree, the connecting link between any two nodes is called as EDGE. In a tree
with 'N' number of nodes there will be a maximum of 'N-1' number of edges.
2
3. Parent Node: In a Tree, the node which is a predecessor of any node is called as
PARENT NODE. In simple words, the node which has a branch from it to any other node is
called a parent node. Parent node can also be defined as "The node which has child /
children".
5. Siblings: In a Tree data structure, nodes which belong to same Parent are called as
SIBLINGS. In simple words, the nodes with the same parent are called Sibling nodes.
3
6. Leaf Node: In a Tree data structure, the node which does not have a child is called
as LEAF Node. In simple words, a leaf is a node with no child. In a tree data structure, the
leaf nodes are also called as External Nodes. External node is also a node with no child. In a
tree, leaf node is also called as 'Terminal' node.
7. Internal Nodes: In a Tree data structure, the node which has atleast one child is called as
INTERNAL Node. In simple words, an internal node is a node with atleast one child.
In a Tree data structure, nodes other than leaf nodes are called as Internal Nodes. The root
node is also said to be Internal Node if the tree has more than one node. Internal nodes are
also called as 'Non-Terminal' nodes.
8. Degree: In a Tree data structure, the total number of children of a node is called as
DEGREE of that Node. In simple words, the Degree of a node is total number of children it
has. The highest degree of a node among all the nodes in a tree is called as 'Degree of Tree'
9. Level: In a Tree data structure, the root node is said to be at Level 0 and the children of
root node are at Level 1 and the children of the nodes which are at Level 1 will be at Level 2
5
and so on... In simple words, in a tree each step from top to bottom is called as a Level and
the Level count starts with '0' and incremented by one at each level (Step).
10. Height: In a Tree data structure, the total number of edges from leaf node to a particular
node in the longest path is called as HEIGHT of that Node. In a tree, height of the root node
is said to be height of the tree. In a tree, height of all leaf nodes is '0'.
11. Depth: In a Tree data structure, the total number of egdes from root node to a particular
node is called as DEPTH of that Node. In a tree, the total number of edges from root node to
a leaf node in the longest path is said to be Depth of the tree. In simple words, the highest
depth of any leaf node in a tree is said to be depth of that tree. In a tree, depth of the root
node is '0'.
12. Path: In a Tree data structure, the sequence of Nodes and Edges from one node to
another node is called as PATH between that 6two Nodes. Length of a Path is total number
of nodes in that path. In below example the path A - B - E - J has length 4.
13. Sub Tree: In a Tree data structure, each child from a node forms a subtree recursively.
Every child node will form a subtree on its parent node.
TREE REPRESENTATIONS:
A tree data structure can be represented in two methods. Those methods are as follows...
1. List Representation
2. Left Child - Right Sibling Representation
1. List Representation
In this representation, we use two types of nodes7 one for representing the node with data
called 'data node' and another for representing only references called 'reference node'. We
start with a 'data node' from the root node in the tree. Then it is linked to an internal node
through a 'reference node' which is further linked to any other node directly. This process
repeats for all the nodes in the tree.
The above example tree can be represented using List representation as follows...
In this representation, every node's data field stores the actual value of that node. If that node
has left a child, then left reference field stores the address of that left child node otherwise
stores NULL. If that node has the right sibling, then right reference field stores the address of
right sibling node otherwise stores NULL.
The above example tree can be represented using Left Child - Right Sibling representation as
follows...
BINARY TREE:
In a normal tree, every node can have any number 8 of children. A binary tree is a special type
of tree data structure in which every node can have a maximum of 2 children. One is known
as a left child and the other is known as right child.
A tree in which every node can have a maximum of two children is called Binary Tree.
In a binary tree, every node can have either 0 children or 1 child or 2 children but not more
than 2 children.
In general, tree nodes can have any number of children. In a binary tree, each node can have
at most two children. A binary tree is either empty or consists of a node called the root
together with two binary trees called the left subtree and the right subtree. A tree with no
nodes is called as a null tree
Example:
3. Since a binary tree can contain at most one node at level 0 (the root), it can contain at most 2l
node at level l.
9
Strictly binary tree is also called as Full Binary Tree or Proper Binary Tree or 2-Tree.
10
Example
11
In above figure, a normal binary tree is converted into full binary tree by adding dummy
nodes.
4. Skewed Binary Tree:
If a tree which is dominated by left child node or right child node, is said to be a Skewed
Binary Tree.
In a skewed binary tree, all nodes except one have only one child node. The remaining
node has no child.
In a left skewed tree, most of the nodes have the left child without corresponding right child.
In a right skewed tree, most of the nodes have the right child without corresponding left
child.
BINARY TREE REPRESENTATIONS:
A binary tree data structure is represented using two methods. Those methods are as
follows...
1. Array Representation
2. Linked List Representation
Consider the following binary tree...
12
1. Array Representation of Binary Tree
In array representation of a binary tree, we use one-dimensional array (1-D Array) to
represent a binary tree.
Consider the above example of a binary tree and it is represented as follows...
To represent a binary tree of depth 'n' using array representation, we need one dimensional
array with a maximum size of 2n + 1.
2. Linked List Representation of Binary Tree
We use a double linked list to represent a binary tree. In a double linked list, every node
consists of three fields. First field for storing left child address, second for storing actual
data and third for the right child address.
In this linked list representation, a node has the following structure...
The above example of the binary tree represented using Linked list representation is shown as
follows...
In the above example of binary tree, first we visit root node 'A' then visit its left
child 'B' which is a root for D and F. So we visit B's left child 'D' and again D is a root for I and
J. So we visit D's left child 'I' which is the leftmost child. So next we go for visiting D's right
child 'J'. With this we have completed root, left and right parts of node D and root, left parts
of node B. Next visit B's right child 'F'. With this we have completed root and left parts of
node A. So we go for A's right child 'C' which is a root node for G and H. After visiting C,
we go for its left child 'G' which is a root for node K. So next we visit left of G, but it does
not have left child so we go for G's right child 'K'. With this, we have completed node C's
root and left parts. Next visit C's right child 'H' which is the rightmost child in the tree. So we
stop the process.
That means here we have visited in the order of A-B-D-I-J-F-C-G-K-H using Pre-Order
Traversal.
15
void Postorder( TREE T )
{ if( T != NULL )
{
Postorder( T->left );
Postorder( T->right );
print_element( T->element ); }}
16
17
Expression Tree
The expression tree is a binary tree in which each internal node corresponds to the operator and
each leaf node corresponds to the operand so for example expression tree for 3 + ((5+9)*2) would
be:
Inorder traversal of expression tree produces infix version of given postfix expression (same with
postorder traversal it gives postfix expression)
Use of Expression tree
1. The main objective of using the expression trees is to make complex expressions and can be
easily be evaluated using these expression trees.
2. It is also used to find out the associativity of each operator in the expression.
3. It is also used to solve the postfix, prefix, and infix expression evaluation.
For example, the postfix notation a b + c d e + * * results in the following expression tree. The
corresponding infix notation is (a+b)*(c*(d+e)) which can be produced by traversing the
expression tree in an inorder fashion.
18
Properties and Operations:
A BST is a binary tree of nodes ordered in the following way:
The reason binary-search trees are important is that the following operations can be implemented
efficiently using a BST:
19
and searching for 12:
20
Inserting a node
A naïve algorithm for inserting a node into a BST is that, we start from the root node, if the node
to insert is less than the root, we go to left child, and otherwise we go to the right child of the
root. We continue this process (each node is a root for some sub tree) until we find a null pointer
(or leaf node) where we cannot go any further. We then insert the node as a left or right child of
the leaf node based on node is less or greater than the leaf node. We note that a new node is
always inserted as a leaf node. A recursive algorithm for inserting a node into a BST is as
follows. Assume we insert a node N to tree T. if the tree is empty, the we return new node N as
the tree. Otherwise, the problem of inserting is reduced to inserting the node N to left of right sub
trees of T, depending on N is less or greater than T. A definition is as follows.
Insert(N, T) = N if T is empty
= insert(N, [Link]) if N < T
= insert(N, [Link]) if N > T
Hence we need to be careful about deleting nodes from a tree. The best way to deal with deletion
seems to be considering special cases. What if the node to delete is a leaf node? What if the node
is a node with just one child? What if the node is an internal node (with two children). The latter
case is the hardest to resolve. But we will find a way to handle this situation as well.
Case 1 : The node to delete is a leaf node
This is a very easy case. Just delete the node 46. We are done
22
Case 2 : The node to delete is a node with one child.
This is also not too bad. If the node to be deleted is a left child of the parent, then we connect the
left pointer of the parent (of the deleted node) to the single child. Otherwise if the node to be
deleted is a right child of the parent, then we connect the right pointer of the parent (of the
Next problem is finding a replacement leaf node for the node to be deleted. We can easily find
this as follows. If the node to be deleted is N, the find the largest node in the left sub tree of N or
the smallest node in the right sub tree of N. These are two candidates that can replace the node to
be deleted without losing the order property. For example, consider the following tree and
suppose we need to delete the root 38.
23
Then we find the largest node in the left sub tree (15) or smallest node in the right sub tree (45)
and replace the root with that node and then delete that node. The following set of images
demonstrates this process.
24
25
Deletion routine for binary search trees
void delete( element_type x, BinaryNode T )
{
BinaryNode tmp_cell;
if( T == NULL )
error("Element not found");
else
if( x < T->element ) /* Go left */
delete( x, T->left );
else
if( x > T->element ) /* Go right */
delete( x, T->right );
else /* Found element to be deleted */
if( T->left !=NULL&& T->right!=NULL ) /* Two children */
{ /* Replace with smallest in right subtree */
T->element = find_min( T->right->element );
delete( T->element, T->right );
}
else /* One child */
}
tmp_cell = T;
if( T->left == NULL ) /* Only a right child */
T = T->right;
if( T->right == NULL ) /* Only a left child */
T = T->left;
free( tmp_cell );
}
}
26
Self-balancing BSTs can be used to implement any algorithm that requires mutable ordered lists,
to achieve optimal worst-case asymptotic performance. For example, if binary tree sort is
implemented with a self-balanced BST, we have a very simple-to-describe yet asymptotically
optimal O(n log n) sorting algorithm. Similarly, many algorithms in computational geometry
exploit variations on self-balancing BSTs to solve problems such as the line segment intersection
problem and the point location problem efficiently. (For average-case performance, however,
self-balanced BSTs may be less efficient than other solutions. Binary tree sort, in particular, is
likely to be slower than merge sort, quicksort, or heapsort, because of the tree-balancing
overhead as well as cache access patterns.)
Self-balancing BSTs are flexible data structures, in that it's easy to extend them to efficiently
record additional information or perform new operations. For example, one can record the
number of nodes in each subtree having a certain property, allowing one to count the number of
nodes in a certain key range with that property in O(log n) time. These extensions can be used,
for example, to optimize database queries or other list-processing algorithms.
In the linked representation of binary trees, more than one half of the link fields contain NULL values
which results in wastage of storage space. If a binary tree consists of n nodes then n+1 link fields
contain NULL values. The NULL links are replaced with special links known as threads. Such binary
trees with threads are known as threaded binary trees. Each node in a threaded binary tree either
contains a link to its child node or thread to other nodes in the tree.
Single Threaded(One-way threaded Binary Tree): Where a NULL right pointer is made to point to
the inorder successor (if successor exists)
27
In one-way threaded binary trees, a thread will appear either in the right or left link field of a node.
If it appears in the right link field of a node then it will point to the next node that will appear on
performing in order traversal. Such trees are called Right threaded binary trees. If thread appears
in the left field of a node then it will point to the nodes inorder predecessor. Such trees are
called Left threaded binary trees. Left threaded binary trees are used less often as they don't
yield the last advantages of right threaded binary trees. In one-way threaded binary trees, the right
link field of last node and left link field of first node contains a NULL. In order to distinguish
threads from normal links they are represented by dotted lines.
The above figure shows the inorder traversal of this binary tree yields D, B, E, A, C, F. When this
tree is represented as a right threaded binary tree, the right link field of leaf node D which contains
a NULL value is replaced with a thread that points to node B which is the inorder successor of a
node D. In the same way other nodes containing values in the right link field will contain NULL
value.
28
In two-way threaded Binary trees, the right link field of a node containing NULL values is
replaced by a thread that points to nodes inorder successor and left field of a node containing
NULL values is replaced by a thread that points to nodes inorder predecessor.
The above figure shows the inorder traversal of this binary tree yields D, B, E, G, A, C, F. If we
consider the two-way threaded Binary tree, the node E whose left field contains NULL is replaced
by a thread pointing to its inorder predecessor i.e. node B. Similarly, for node G whose right and
left linked fields contain NULL values are replaced by threads such that right link field points to
its inorder successor and left link field points to its inorder predecessor. In the same way, other
nodes containing NULL values in their link fields are filled with threads.
29
In the above figure of two-way threaded Binary tree, we noticed that no left thread is possible for
the first node and no right thread is possible for the last node. This is because they don't have any
inorder predecessor and successor respectively. This is indicated by threads pointing nowhere. So
in order to maintain the uniformity of threads, we maintain a special node called the header node.
The header node does not contain any data part and its left link field points to the root node and its
right link field points to itself. If this header node is included in the two-way threaded Binary tree
then this node becomes the inorder predecessor of the first node and inorder successor of the last
node. Now threads of left link fields of the first node and right link fields of the last node will point
to the header node.
30
Height balanced trees
A height-balanced binary tree is defined as a binary tree in which the height of the left and the right
subtree of any node differ by not more than 1. AVL tree, red-black tree are examples of height-
balanced trees.
The above tree is AVL because the differences between the heights of left and right subtrees for every
node are less than or equal to 1.
AVL Trees
An AVL (Adelson-Velskii and Landis) tree is a binary search tree with a balance condition. The balance
condition must be easy to maintain, and it ensures that the depth of the tree is O(log N). An AVL tree
is identical to a binary search tree, except that for every node in the tree, the height of the left and
right subtrees can differ by at most 1.
For example, here are some trees:
32
Yes this is an AVL tree. Examination shows that each left sub-tree has a height 1 greater than each
right sub-tree.
No this is not an AVL tree. Sub-tree with root 8 has height 4 and sub-tree with root 18 has height 2.
we define the balance factor for a node as the difference between the height of the left subtree and the
height of the right subtree.
balanceFactor=height(leftSubTree)−height(rightSubTree)
we will define a tree to be in balance if the balance factor is -1, 0, or 1. Once the balance factor of a
node in a tree is outside this range we will need to have a procedure to bring the tree back into
balance. Figure shows an example of an unbalanced tree and the balance factors of each node.
When we do an insertion, we need to update all the balancing information for the nodes on the path
33
back to the root, but the reason that insertion is potentially difficult is that inserting a node could
violate the AVL tree property. The violation might occur in four cases:
1. An
insertion into the left subtree of the left child of α
2. An
insertion into the right subtree of the left child of α
3. An
insertion into the left subtree of the right child of α
4. An
insertion into the right subtree of the right child of α
Cases 1 and 4 are mirror image symmetries with respect to α, as are cases 2 and 3.
The case1 and case4, in which the insertion occurs on the “outside” (i.e., left–left or right– right),
is fixed by a single rotation of the tree. The case2 and case3, in which the insertion occurs on the
“inside” (i.e., left–right or right–left) is handled by the slightly more complex double rotation.
Rotating the subtrees in an AVL Tree:
An AVL tree may rotate in one of the following four ways to keep itself balanced:
Case 1: Right Rotation:
If a node is added to the left subtree of the left subtree, the AVL tree may get out of balance, we do a
single right rotation.
1 3
after
34
Example2:
35
Case 4: Left Rotation:
When a node is added into the right subtree of the right subtree, if the tree gets out of balance, we do
a single left rotation.
36
Case 2: Right-Left Rotation:
When a node is added into the right subtree of the left subtree, if the tree gets out of balance, we do a
double rotation A right-left rotation is a combination in which first right rotation takes place after that
left rotation executes.
37
Case 3: Left-Right Rotation:
When a node is added into the left subtree of the right subtree, if the tree gets out of balance, we do a
double rotation. A left-right rotation is a combination in which first left rotation takes place after that
right rotation executes.
38
Example
39
Routine to perform single rotation
Trie
Trie data structure is defined as a Tree based data structure that is used for storing a collection of
strings and performing efficient search, insert, delete, prefix search and sorted-traversal-of-all
operations on them. The word Trie is derived from reTRIEval, which means finding something or
obtaining it.
Trie data structure follows a property that If two strings have a common prefix then they will have
the same ancestor in the trie. This particular property allows to find all words with a given prefix.
A Trie data structure is used for storing and retrieval of data and the same operations could be
done using another data structure which is Hash Table but Trie data structure can perform these
operations more efficiently than a Hash Table. A Trie data structure can be used for prefix-
based searching and a sorted traversal of all words. So a Trie has advantages of both hash table and
self balancing binary search trees. However the main issue with Trie is extra memory space required
to store words and the space may become huge for long list of words and/or for long words.
Advantages of Trie Data Structure over a Hash Table:
The A trie data structure has the following advantages over a hash table:
We can efficiently do prefix search (or auto-complete) with Trie.
We can easily print all words in alphabetical order which is not easily possible with hashing.
There is no overhead of Hash functions in a Trie data structure.
Searching for a String even in the large collection of strings in a Trie data structure can be
done in O(L) Time complexity, Where L is the number of words in the query string. This
searching time could be even less than O(L) if the query string does not exist in the trie.
Properties of a Trie Data Structure
Below are some important properties of the Trie data structure:
41
Each Trie has an empty root node, with links (or references) to other nodes
Each node of a Trie represents a string and each edge represents a character.
Every node consists of hashmaps or an array of pointers, with each index representing a
character and a flag to indicate if any string ends at the current node.
Trie data structure can contain any number of characters including alphabets, numbers,
and special characters. But for this article, we will discuss strings with characters a-z.
Therefore, only 26 pointers need for every node, where the 0th index represents ‘a’ and
the 25th index represents ‘z’ characters.
Each path from the root to any node represents a word or string.
Below is a simple example of Trie data structure.
Heap
A Heap is a complete binary tree data structure that satisfies the heap property: for every node, the
value of its children is greater than or equal to its own value. Heaps are usually used to implement
priority queues, where the smallest (or largest) element is always at the root of the tree. There are two
types of heaps:
Max-Heap: The value of each node is greater than or equal to the values of its children,
ensuring that the root node contains the maximum value. As you move down the tree, the
42
values decrease.
Min-Heap: The value of each node is less than or equal to the values of its children, ensuring
that the root node contains the minimum value. As you move down the tree, the values
increase.
Heap Operations
Common heap operations are:
Insert: Adds a new element to the heap while maintaining the heap property.
Extract Max/Min: Removes the maximum or minimum element from the heap and returns it.
Heapify: Converts an arbitrary binary tree into a heap.
43
Heap Data Structure
44
Hashing and Collision:
What is Hashing?
Hashing is the process of converting an input (or key) into a hash value using a hash function. This
hash value is typically a fixed-size integer that is used to index into a data structure, such as a hash
table. The main goal of hashing is to achieve constant-time average complexity for common
operations like insertion, deletion, and search.
Hash Tables
A hash table, also known as a hash map, is a data structure that implements an associative array
abstract data type. It uses a combination of hashing and an underlying array to store key-value pairs.
The hash function determines the index of the array where the key-value pair should be placed. The
index is calculated based on the hash value of the key. The beauty of a hash table is that it allows for
fast insertion, deletion, and retrieval of values given a key.
Hash Functions
A hash function is a mathematical function that takes an input (key) and produces a fixed-size output
(hash value). It should have the following properties:
1. Deterministic: Given the same input, the hash function should always produce the same
output.
3. Uniform: The hash function should uniformly distribute the keys across the range of hash
values, reducing the chance of collisions (more on this later).
Hashing is the technique used for performing almost constant time search in case of insertion,
deletion and find operation. Taking a very simple example of it, an array with its index as key
is the example of hash table. So each index (key) can be used for accessing the value in a
constant search time. This mapping key must be simple to compute and must helping in
identifying the associated value. Function which helps us in generating such kind of key-value
mapping is known as Hash Function.
45
In a hashing system the keys are stored in an array which is called the Hash Table. A perfectly
implemented hash table would always promise an average insert/delete/retrieval time of O(1).
Collision Resolution
One of the main challenges in hashing is handling collisions, which occur when two or more input
values produce the same hash value. There are various techniques used to resolve collisions,
including:
o Chaining: In this technique, each hash table slot contains a linked list of all the values that
have the same hash value. This technique is simple and easy to implement, but it can lead to
poor performance when the linked lists become too long.
o Open addressing: In this technique, when a collision occurs, the algorithm searches for an
empty slot in the hash table by probing successive slots until an empty slot is found. This
technique can be more efficient than chaining when the load factor is low, but it can lead to
clustering and poor performance when the load factor is high.
o Double hashing: This is a variation of open addressing that uses a second hash function to
determine the next slot to probe when a collision occurs. This technique can help to reduce
clustering and improve performance.
46
47
Hash collision resolution techniques:
Open Hashing, is a technique in which the data is not directly stored at the hash key index (k) of
the Hash table. Rather the data at the key index (k) in the hash table is a pointer to the head of the
data structure where the data is actually stored. In the most simple and common implementations
the data structure adopted for storing the element is a linked-list.
in this technique when a data needs to be searched, it might become necessary (worst case) to
traverse all the nodes in the linked list to retrieve the data.
Note that the order in which the data is stored in each of these linked lists (or other data
structures) is completely based on implementation requirements. Some of the popular criteria
are insertion order, frequency of access etc.
48
2. Quadratic probing
If you observe carefully, then you will understand that the interval between probes will increase
proportionally to the hash value. Quadratic probing is a method with the help of which we can solve
the problem of clustering that was discussed above. This method is also known as the mid-
square method. In this method, we look for the i2‘th slot in the ith iteration. We always start from
the original hash location. If only the location is occupied then we check the other slots.
et hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S
Example: Let us consider table Size = 7, hash function as Hash(x) = x % 7 and collision resolution
strategy to be f(i) = i 2 . Insert = 22, 30, and 50.
Step 1: Create a table of size 7.
49
Hash table
50
Insert keys 22 and 30 in the hash table
Step 3: Inserting 50
Hash(50) = 50 % 7 = 1
In our hash table slot 1 is already occupied. So, we will search for slot 1+1 2,
i.e. 1+1 = 2,
Again slot 2 is found occupied, so we will search for cell 1+2 2, i.e.1+4 = 5,
Now, cell 5 is not occupied so we will place 50 in slot 5.
51
Insert key 50 in the hash table
3. Double Hashing
The intervals that lie between probes are computed by another hash function. Double hashing is a
technique that reduces clustering in an optimized way. In this technique, the increments for the
probing sequence are computed by using another hash function. We use another hash function
hash2(x) and look for the i*hash2(x) slot in the ith rotation.
let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
…………………………………………..
…………………………………………..
Example: Insert the keys 27, 43, 692, 72 into the Hash Table of size 7. where first hash-function
is h1(k) = k mod 7 and second hash-function is h2(k) = 1 + (k mod 5)
Step 1: Insert 27
27 % 7 = 6, location 6 is empty so insert 27 into 6 slot.
52
Insert key 27 in the hash table
Step 2: Insert 43
43 % 7 = 1, location 1 is empty so insert 43 into 1 slot.
Step 4: Insert 72
72 % 7 = 2, but location 2 is already being occupied and this is a collision.
So we need to resolve this collision using double hashing.
hnew = [h1(72) + i * (h2(72)] % 7
= [2 + 1 * (1 + 72 % 5)] % 7
=5%7
= 5,
Now, as 5 is an empty slot,
so we can insert 72 into 5th slot.
54
Insert key 72 in the hash table
55
Applications of Hashing:
A hash function maps a variable length input string to fixed length output string -- its hash
value, or hash for short. If the input is longer than the output, then some inputs must map to the
same output -- a hash collision. Comparing the hash values for two inputs can give us one of
two answers: the inputs are definitely not the same, or there is a possibility that they are the
same. Hashing as we know it is used for performance improvement, error checking, and
authentication. One example of a performance improvement is the common hash table, which
uses a hash function to index into the correct bucket in the hash table, followed by comparing
each element in the bucket to find a match. In error checking, hashes (checksums, message
digests, etc.) are used to detect errors caused by either hardware or software. Examples are TCP
checksums, ECC memory, and MD5 checksums on downloaded files . In this case, the hash
provides additional assurance that the data we received is correct. Finally, hashes are used to
authenticate messages. In this case, we are trying to protect the original input from tampering,
and we select a hash that is strong enough to make malicious attack infeasible or unprofitable.
Construct a message authentication code (MAC)
Digital signature
Make commitments, but reveal message later
Timestamping
Key updating: key is hashed at specific intervals resulting in new key
What is Hashing?
Hashing is the process of converting an input (or key) into a hash value using a hash function. This
hash value is typically a fixed-size integer that is used to index into a data structure, such as a hash
table. The main goal of hashing is to achieve constant-time average complexity for common
operations like insertion, deletion, and search.
Hash Tables
A hash table, also known as a hash map, is a data structure that implements an associative array
abstract data type. It uses a combination of hashing and an underlying array to store key-value pairs.
The hash function determines the index of the array where the key-value pair should be placed. The
index is calculated based on the hash value of the key. The beauty of a hash table is that it allows for
fast insertion, deletion, and retrieval of values given a key.
Hash Functions
A hash function is a mathematical function that takes an input (key) and produces a fixed-size output
(hash value). It should have the following properties:
1. Deterministic: Given the same input, the hash function should always produce the same
output.
2. Fast: The hash function should be computationally efficient.
3. Uniform: The hash function should uniformly distribute the keys across the range of hash
56
values, reducing the chance of collisions (more on this later).
Graph
Graph is a non-linear data structure consisting of vertices and edges. The vertices are sometimes
also referred to as nodes and the edges are lines or arcs that connect any two nodes in the graph.
More formally a Graph is composed of a set of vertices (V) and a set of edges (E).
The graph is denoted by G (V, E).
Example:
◦ V = {A,B,C,D}
◦ E = {(A,B),(A,C),(A,D),(B,D),(C,D)}
Basic Operations:
57
Insertion of Nodes/Edges in the graph – Insert a node into the graph.
Deletion of Nodes/Edges in the graph – Delete a node from the graph.
Searching on Graphs – Search an entity in the graph.
Traversal of Graphs – Traversing all the nodes in the graph.
More Operations:
Shortest Paths : From a source to a destination, a source to all other nodes and between all
pairs.
Minimum Spanning Tree : In a weighted, connected undirected graph, finding the minimum
weight edges to connect all.
Applications of Graph:
Following are the real-life applications:
If we recall all the previous data structures that we have studied like array, linked list, tree, etc.
All these had some restrictions on structure (mostly linear and tree hierarchical which means
no loops). Graph allows random connections between nodes which is useful in many real
world problems where do have restrictions of previous data structures.
Used heavily in social networks. Everyone on the network is a vertex (or node) of the graph
and if connected, then there is an edge. Now imagine all the features that you see, mutual
friends, people that follow you, etc can seen as graph problems.
Neural Networks: Vertices represent neurons and edges represent the synapses between them.
Neural networks are used to understand how our brain works and how connections change
when we learn. The human brain has about 10^11 neurons and close to 10^15 synapses.
Compilers: Graph Data Structure is used extensively in compilers. They can be used for type
inference, for so-called data flow analysis, register allocation, and many other purposes.
They are also used in specialized compilers, such as query optimization in database
languages.
Robot planning: Vertices represent states the robot can be in and the edges the possible
transitions between the states. Such graph plans are used, for example, in planning paths for
autonomous vehicles.
In GPS. The problems like finding the closest route, closest petrol pumps, etc are all soled using
graph problems.
For optimizing the cost of connecting all locations of a network. For example, minimizing
wire length in a wired network to make sure all devices are connected is a standard Graph
problem called Minimum Spanning Tree.
Can be used to represent the interactions between players on a team, such as passes, shots, and
tackles. Analyzing these interactions can provide insights into team dynamics and areas for
improvement.
Can be used to represent the topology of computer networks, such as the connections between
routers and switches.
Graphs are used to represent the connections between different places in a transportation
network, such as roads and airports.
58
Dependencies in a software project (or any other type of project) can be seen as graph and
generating a sequence to solve all tasks before dependents is a standard graph topological
sorting algorithm.
Types of Graphs
1. Undirected Graph
A graph in which edges do not have any direction. That is the nodes are unordered pairs in the
definition of every edge.
2. Directed Graph(Digraph)
A graph in which edge has direction. That is the nodes are ordered pairs in the definition of every
edge.
3. Weighted Graphs: A graph in which edges have weights or costs associated with them.
Example: A road network graph where the weights can represent the distance between two
cities.
4. Unweighted Graphs: A graph in which edges have no weights or costs associated with them.
Example: A social network graph where the edges represent friendships.
59
5. Complete Graphs: A graph in which each vertex is connected to every other vertex
6. Cycle Graph: A Cycle in a graph is a path that starts and ends at the same vertex, with no
repetitions of vertices (except the starting and ending vertex, which are the same).
8. Indegree of a Graph: Indegree of a vertex is defined as the number of incoming edges incident
on a vertex in a directed graph.
9. Outdegree of a Graph: Outdegree of a vertex is defined as the number of outgoing edges from
a vertex in a directed graph.
60
Degree(V1)=2
Degree(V2)=4
Degree(V3)=2
Degree(V4)=3
Degree(V5)=3
Indegree(V5) =1
Indegree(V1) =1
Indegree(V2) =2
Indegree(V3) =1
Indegree(V4) =2
Outdegree (V1) =1
Outdegree (V2) = 2
Outdegree (V3) = 1
Outdegree (V4) = 1
Outdegree (V5) = 2
Graphs are used to represent many real life applications: Graphs are used to represent networks.
The networks may include paths in a city or telephone network or circuit network. Graphs are
also used in social networks like linkedIn, facebook. For example, in facebook, each person is
represented with a vertex(or node). Each node is a structure and contains information like person
id, name, gender and locale.
61
Following two are the most commonly used representations of graph.
1. Adjacency Matrix
2. Adjacency List
There are other representations also like, Incidence Matrix and Incidence List. The choice of the
graph representation is situation specific. It totally depends on the type of operations to be
performed and ease of use.
Adjacency Matrix:
Adjacency Matrix is a 2D array of size V x V where V is the number of vertices in a graph. Let
the 2D array be adj[][], a slot adj[i][j] = 1 indicates that there is an edge from vertex i to vertex j.
Adjacency matrix for undirected graph is always symmetric. Adjacency Matrix is also used to
represent weighted graphs. If adj[i][j] = w, then there is an edge from vertex i to vertex j with
weight w.
Pros: Representation is easier to implement and follow. Removing an edge takes O(1) time.
Queries like whether there is an edge from vertex ‘u’ to vertex ‘v’ are efficient and can be done
O(1).
Cons: Consumes more space O(V^2). Even if the graph is sparse (contains less number of
62
edges), it consumes the same space. Adding a vertex is O(V^2) time.
Adjacency List:
An array of linked lists is used. Size of the array is equal to number of vertices. Let the array be
array[]. An entry array[i] represents the linked list of vertices adjacent to the ith vertex. This
representation can also be used to represent a weighted graph. The weights of edges can be
stored in nodes of linked lists. Following is adjacency list representation of the above graph.
Topological Sort
A topological sort is an ordering of vertices in a directed acyclic graph, such that if there is a path
from vi to vj, then vj appears after vi in the ordering. To formalize this, we define the indegree of a
vertex v as the number of edges (u, v). We compute the indegrees of all vertices in the graph.
Assuming that the indegree for each vertex is stored, and that the graph is read into an adjacency list,
we can then apply the algorithm to generate a topological ordering.
v1 v2
v3 v4 v5
v6 v7
Vertex 1 2 3 4 5 6 7
v1 0 0 0 0 0 0 0
v2 1 0 0 0 0 0 0
63
v3 2 1 1 1 0 0 0
v4 3 2 1 0 0 0 0
v5 1 1 0 0 0 0 0
v6 3 3 3 3 2 1 0
v7 2 2 2 1 0 0 0
Enqueu v1 v2 v5 v4 v3, v7 v
e 6
Dequeu v1 v2 v5 v4 v v7 v
3 6
void Graph::topsort( )
{
Queue<Vertex> q;
int counter = 0;
[Link]( );
for each Vertex v
if( [Link] == 0 )
[Link]( v );
while(  )
{
Vertex v = [Link]( );
[Link] = ++counter; // Assign next number
64
Example 2
Vertex 0 1 2 3 4 5
V0 0 0 0 0 0 0
V1 1 0 0 0 0 0
V2 1 1 0 0 0 0
V3 3 2 1 0 0 0
V4 2 2 2 1 0 0
V5 3 3 2 1 1 0
Enqueu V V V4 V5
e 0 1 V2 V3
Dequeu V V V2 V3 V4 V5
0 1
Breadth First Traversal (or Search) for a graph is similar to Breadth First Traversal of a tree The
only catch here is, unlike trees, graphs may contain cycles, so we may come to the same node
again. To avoid processing a node more than once, we use a boolean visited array.
For example, in the following graph, we start traversal from vertex 2. When we come to vertex 0,
65
we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we don’t mark visited
vertices, then 2 will be processed again and it will become a non-terminating process. Breadth
First Traversal of the following graph is 2, 0, 3, 1.
66
Example 2:
BFS: 0,1,2,3,4,5,6
Example 3
graph = { {1, 2}, // Node 0 is connected to nodes 1 and 2
{0, 3, 4}, // Node 1 is connected to nodes 0, 3, and 4
{0, 5}, // Node 2 is connected to nodes 0 and 5
{1}, // Node 3 is connected to node 1
{1, 5}, // Node 4 is connected to nodes 1 and 5
{2, 4} // Node 5 is connected to nodes 2 and 4 }
67
Algorithm for BFS Traversal
#include <iostream>
#include <vector>
#include <queue>
68
while (![Link]()) {
int vertex = [Link]();
[Link]();
cout << vertex << " ";
int main() {
int numVertices = 6; // 6 vertices labeled from 1 to 6
vector<int> adj[numVertices + 1]; // +1 for 1-based index
cout << "Breadth First Search starting from vertex 1:" << endl;
BFS(adj, 1, numVertices);
return 0;
}
69
Applications of Breadth First Traversal
1) Shortest Path and Minimum Spanning Tree for unweighted graph In unweighted graph,
the shortest path is the path with least number of edges. With Breadth First, we always reach a
vertex from given source using minimum number of edges. Also, in case of unweighted graphs,
any spanning tree is Minimum Spanning Tree and we can use either Depth or Breadth first
traversal for finding a spanning tree.
2) Peer to Peer Networks. In Peer to Peer Networks like BitTorrent, Breadth First Search is
used to find all neighbor nodes.
3) Crawlers in Search Engines: Crawlers build index using Bread First. The idea is to start
from source page and follow all links from source and keep doing same. Depth First Traversal
can also be used for crawlers, but the advantage with Breadth First Traversal is, depth or levels
of built tree can be limited.
4) Social Networking Websites: In social networks, we can find people within a given distance
‘k’ from a person using Breadth First Search till ‘k’ levels.
5) GPS Navigation systems: Breadth First Search is used to find all neighboring locations.
7) In Garbage Collection: Breadth First Search is used in copying garbage collection using
Cheney’s algorithm.
8) Cycle detection in undirected graph: In undirected graphs, either Breadth First Search or
Depth First Search can be used to detect cycle. In directed graph, only depth first search can be
used.
10) To test if a graph is Bipartite We can either use Breadth First or Depth First Traversal.
11) Path Finding We can either use Breadth First or Depth First Traversal to find if there is a
path between two vertices.
12) Finding all nodes within one connected component: We can either use Breadth First or
Depth First Traversal to find all nodes reachable from a given node.
70
Depth First Traversal for a Graph
Depth First Traversal (or Search) for a graph is similar to Depth First Traversal of a tree. The
only catch here is, unlike trees, graphs may contain cycles, so we may come to the same node
again. To avoid processing a node more than once, we use a boolean
visited array. For example, in the following graph, we start traversal from vertex 2. When
we come to vertex 0, we look for all adjacent vertices of it. 2 is also an adjacent vertex of 0. If we
don’t mark visited vertices, then 2 will be processed again and it will become a non-terminating
process. Depth First Traversal of the
71
following graph is 2, 0, 1, 3
The DFS forms a depth-first forest comprised of more than one depth-first trees. Each tree is
made of edges (u, v) such that u is gray and v is white when edge (u, v) is explored. The
following pseudocode for DFS uses a global timestamp time.
graph. Following are the problems that use DFS as a building block.
72
1) For an unweighted graph, DFS traversal of the graph produces the minimum spanning tree
and all pair shortest path tree.
3) Path Finding
We can specialize the DFS algorithm to find a path between two given vertices u and z.
i) Call DFS(G, u) with u as the start vertex.
ii) Use a stack S to keep track of the path between the start vertex and the current vertex.
iii) As soon as destination vertex z is encountered, return the path
as the contents of the stack
4) Topological Sorting
73