Understanding Tree Data Structures
Understanding Tree Data Structures
Operations- Searching, Insertion and Deletion, Binary tree traversals, threaded binary trees,
AVL Trees : Definition, Height of an AVL Tree, Operations – Insertion, Deletion and Searching
B-Trees: B-Tree of order m, height of a B-Tree, insertion, deletion and searching, B+ Tree.
TREES
A Tree is a data structure in which each element is attached to one or more elements directly beneath it.
Level 0
A
B 1
C D
E F G
H I J
2
K L 3
Terminology
The connections between elements are called branches.
A tree has a single root, called root node, which is shown at the top of the tree. i.e. root is always
at the highest level 0.
Each node has exactly one node above it, called parent. Eg: A is the parent of B,C and D.
The nodes just below a node are called its children. ie. child nodes are one level lower than the
parent node.
A node which does not have any child called leaf or terminal node. Eg: E, F, K, L, H, I and M are
lNeaovdes. with at least one child are called non terminal or internal nodes.
The child nodes of same parent are said to be siblings.
A path in a tree is a list of distinct nodes in which successive nodes are connected by branches in
the tree.
The length of a particular path is the number of branches in that path. The degree of a node
of a tree is the number of children of that node.
The maximum number of children a node can have is often referred to as the order of a
tree. The height or depth of a tree is the length of the longest path from root to any leaf.
1. Root: This is the unique node in the tree to which further sub trees are attached. Eg: A
Degree of the node: The total number of sub-trees attached to the node is called the degree of the
[Link]: For node A degree is 3. For node K degree is 0
3. Leaves: These are the terminal nodes of the tree. The nodes with degree 0 are always the leaf nodes.
Eg: E, F, K, L,H, I, J
Page 1 115
4. Internal nodes: The nodes other than the root node and the leaves are called the internal nodes. Eg:
B, C, D, G
5. Parent nodes: The node which is having further sub-trees(branches) is called the parent node of
those sub-trees. Eg: B is the parent node of E and F.
6. Predecessor: While displaying the tree, if some particular node occurs previous to some other node
then that node is called the predecessor of the other node. Eg: E is the predecessor of the node B.
7. Successor: The node which occurs next to some other node is a successor node. Eg: B is the
successor of E and F.
8. Level of the tree: The root node is always considered at level 0, then its adjacent children are
supposed to be at level 1 and so on. Eg: A is at level 0, B,C,D are at level 1, E,F,G,H,I,J are at level 2,
K,L are at level 3.
9. Height of the tree: The maximum level is the height of the tree. Here height of the tree is 3. The
height if the tree is also called depth of the tree.
10. Degree of tree: The maximum degree of the node is called the degree of the tree.
BINARY TREES
Binary tree is a tree in which each node has at most two children, a left child and a right child. Thus the
order of binary tree is 2.
A binary tree is a finite set of nodes which is either empty or consists of a root and two disjoint
trees called left sub-tree and right sub-tree.
In binary tree each node will have one data field and two pointer fields for representing the
sub-branches. The degree of each node in the binary tree will be at the most two.
1. Left skewed binary tree: If the right sub-tree is missing in every node of a tree we call it as left
skewed tree.
A
Page 2 116
2. Right skewed binary tree: If the left sub-tree is missing in every node of a tree we call it is right
sub-tree.
B C
D E F G
Note:
n
1. A binary tree of depth n will have maximum 2 -1 nodes.
2. A complete binary tree of level l will have maximum 2l nodes at each level, where l starts from 0.
3. Any binary tree with n nodes will have at the most n+1 null branches.
4. The total number of edges in a complete binary tree with n terminal nodes are 2(n-1).
a) Sequential Representation
b) Linked Representation
a) Sequential Representation
The simplest way to represent binary trees in memory is the sequential representation that uses one-
dimensional array.
1) The root of binary tree is stored in the 1 st location of array
th
2) If a node is in the j location of array, then its left child is in the location 2J+1 and its right
child in the location 2J+2
d+1
The maximum size that is required for an array to store a tree is 2 -1, where d is the depth of the tree.
Page 3 117
Advantages of sequential representation:
The only advantage with this type of representation is that the
direct access to any node can be possible and finding the parent or left children of any particular node
is fast because of the random access.
2. Insertions and deletions which are the most common operations can be done without
moving the nodes.
Page 4 118
Disadvantages of linked representation:
1. This representation does not provide direct access to a node and special algorithms are
required.
2. This representation needs additional space in each node for storing the left and right sub-
trees.
Traversing a tree means that processing it so that each node is visited exactly once. A binary
tree can be
traversed a number of [Link] most common tree traversals are
In-order
Pre-order and
Post-order
B C
D E F G
H I J
K
The pre-order traversal is: ABDEHCFGIKJ
The in-order traversal is : DBHEAFCKIGJ
The post-order traversal is:DHEBFKIJGCA
Page 5 119
Inorder Traversal:
rd
Print 3
A
nd th
Print 2 Print 4
B D
C Print this
E
at the last
st
Print 1
C-B-A-D-E is the inorder traversal i.e. first we go towards the leftmost node. i.e. C so print that node
C. Then go back to the node B and print B. Then root node A then move towards the right sub-tree
print D and finally E. Thus we are following the tracing sequence of Left|Root|Right. This type of
traversal is called inorder traversal. The basic principle is to traverse left sub-tree then root and then the
right sub-tree.
Pseudo Code:
is the preorder traversal of the above fig. We are following Root|Left|Right path i.e. data at the
root node will be printed first then we move on the left sub-tree and go on printing the data till
we reach to the left most node. Print the data at that node and then move to the right sub- tree.
Follow the same principle at each sub-tree and go on printing the data accordingly.
Page 6 120
{
if(temp!=NULL)
{
cout<<”temp->data”; preorder(temp->left);
preorder(temp->right);
}
}
From figure the postorder traversal is C-D-B-E-A. In the postorder traversal we are following the
Left|Right|Root principle i.e. move to the leftmost node, if right sub-tree is there or not if not then
print the leftmost node, if right sub-tree is there move towards the right most node. The key idea
here is that at each sub-tree we are following the Left|Right|Root principle and print the data
accordingly.
Pseudo Code:
Page 7 121
Operations On Binary Search Tree:
The basic operations which can be performed on binary search tree are.
1. Insertion of a node in binary search tree.
2. Deletion of a node from binary search tree.
3. Searching for a particular node in binary search tree.
Insertion of a node in binary search tree.
While inserting any node in binary search tree, look for its appropriate position in the binary search
tree. We start comparing this new node with each node of the tree. If the value of the node which is
to be inserted is greater than the value of the current node we move on to the right sub-branch
otherwise we move on to the left sub-branch. As soon as the appropriate position is found we
attach this new node as left or right child appropriately.
Before Insertion
In the above fig, if we wan to insert 23. Then we will start comparing 23 with value of root node
i.e. 10. As 23 is greater than 10, we will move on right sub-tree. Now we will compare 23 with 20
and move right, compare 23 with 22 and move right. Now compare 23 with 24 but it is less than
24. We will move on left branch of 24. But as there is node as left child of 24, we can attach 23 as
left child of 24.
Page 8 122
Deletion of a node from binary search tree.
For deletion of any node from binary search tree there are three which are possible.
i. Deletion of leaf node.
ii. Deletion of a node having one child.
iii. Deletion of a node having two children.
This is the simplest deletion, in which we set the left or right pointer of parent node as NULL.
10
7 15
Before deletion
5 9 12 18
From the above fig, we want to delete the node having value 5 then we will set left pointer of its parent
node as NULL. That is left pointer of node having value 7 is set to NULL.
Page 9 123
To explain this kind of deletion, consider a tree as given below.
Page 10 124
Let us consider that we want to delete node having value 7. We will then find out the inorder successor
of node 7. We will then find out the inorder successor of node 7. The inorder successor will be simply
copied at location of node 7.
That means copy 8 at the position where value of node is 7. Set left pointer of 9 as NULL. This
completes the deletion procedure.
Page 11 125
In the above tree, if we want to search for value 9. Then we will compare 9 with root node 10. As 9 is
less than 10 we will search on left sub branch. Now compare 9 with 5, but 9 is greater than 5. So we
will move on right sub tree. Now compare 9 with 8 but 9 is greater than 8 we will move on right sub
branch. As the node we will get holds the value 9. Thus the desired node can be searched.
AVL TREES
Adelsion Velski and Lendis in 1962 introduced binary tree structure that is balanced with
respect to height of sub trees. The tree can be made balanced and because of this retrieval
of any node can be done in Ο(log n) times, where n is total number of nodes. From the
name of these scientists the tree is called AVL tree.
Definition:
An empty tree is height balanced if T is a non empty binary tree with T L and TR as
its left and right sub trees. The T is height balanced if and only if
i. TL and TR are height balanced.
ii. hL-hR <= 1 where hL and hR are heights of TL and TR.
The idea of balancing a tree is obtained by calculating the balance factor of a tree.
The balance factor BF(T) of a node in binary tree is defined to be hL-hR where hL and hR
are heights of left and right sub trees of T.
For any node in AVL tree the balance factor i.e. BF(T) is -1, 0 or +1.
Page 12 126
Height of AVL Tree:
Theorem: The height of AVL tree with n elements (nodes) is O(log n).
Proof: Let an AVL tree with n nodes in it. Nh be the minimum number of nodes in an AVL tree of
height h.
In worst case, one sub tree may have height h-1 and other sub tree may have height h-2. And both these
sub trees are AVL trees. Since for every node in AVL tree the height of left and right sub trees differ
by at most 1.
Hence
Nh = Nh-1+Nh-2+1
N0=0 N1=2
Page 13 127
We can also write it as
N > Nh = Nh-1+Nh-2+1
> 2Nh-2
> 4Nh-4
.
.
> 2iNh-2i
N > 2h/2-1N2
= O(log N)
This proves that height of AVL tree is always O(log N). Hence search, insertion and deletion can
be carried out in logarithmic time.
The AVL tree follows the property of binary search tree. In fact AVL trees are
basically binary search trees with balance factors as -1, 0, or +1.
After insertion of any node in an AVL tree if the balance factor of any node
becomes other than -1, 0, or +1 then it is said that AVL property is violated. Then
we have to restore the destroyed balance condition. The balance factor is denoted at
right top corner inside the node.
Page 14 128
After insertion of a new node if balance condition gets destroyed, then the nodes on that
path(new node insertion point to root) needs to be readjusted. That means only the affected sub
tree is to be rebalanced.
The rebalancing should be such that entire tree should satisfy AVL property.
In above given example-
Page 15 129
Insertion of a node.
There are four different cases when rebalancing is required after insertion of new node.
1. An insertion of new node into left sub tree of left child. (LL).
2. An insertion of new node into right sub tree of left child. (LR).
3. An insertion of new node into left sub tree of right child. (RL).
4. An insertion of new node into right sub tree of right child.(RR).
Some modifications done on AVL tree in order to rebalance it is called rotations of AVL tree
Insertion Algorithm:
1. Insert a new node as new leaf just as an ordinary binary search tree.
2. Now trace the path from insertion point(new node inserted as leaf) towards root. For each node
‘n’ encountered, check if heights of left (n) and right (n) differ by at most 1.
a) If yes, move towards parent (n).
b) Otherwise restructure by doing either a single rotation or a double rotation.
Thus once we perform a rotation at node ‘n’ we do not require to perform any rotation at any
ancestor on ‘n’.
Page 16 130
When node ‘1’ gets inserted as a left child of node ‘C’ then AVL property gets destroyed i.e. node
A has balance factor +2.
The LL rotation has to be applied to rebalance the nodes.
2. RR rotation:
When node ‘4’ gets attached as right child of node ‘C’ then node ‘A’ gets unbalanced. The rotation
which needs to be applied is RR rotation as shown in fig.
Page 17 131
When node ‘3’ is attached as a right child of node ‘C’ then unbalancing occurs because of LR.
Hence LR rotation needs to be applied.
When node ‘2’ is attached as a left child of node ‘C’ then node ‘A’ gets unbalanced as its balance
factor becomes -2. Then RL rotation needs to be applied to rebalance the AVL tree.
Example:
Page 18 132
Insert 1
To insert node ‘1’ we have to attach it as a left child of ‘2’. This will unbalance the tree as follows.
We will apply LL rotation to preserve AVL property of it.
Insert 25
We will attach 25 as a right child of 18. No balancing is required as entire tree preserves the AVL
property
Page 19 133
Insert 28
The node ‘28’ is attached as a right child of 25. RR rotation is required to rebalance.
Page 20 134
Insert 12
Page 21 135
Deletion:
Even after deletion of any particular node from AVL tree, the tree has to be restructured in order to
preserve AVL property. And thereby various rotations need to be applied.
Page 22 136
The tree becomes
Page 23 137
Searching:
The searching of a node in an AVL tree is very simple. As AVL tree is basically binary search tree, the
algorithm used for searching a node from binary search tree is the same one is used to search a node
from AVL tree.
BTREES
Multi-way trees are tree data structures with more than two branches at a node. The data
structures of m-way search trees, B trees and Tries belong to this category of tree
structures.
AVL search trees are height balanced versions of binary search trees, provide efficient
retrievals and storage operations. The complexity of insert, delete and search operations on
AVL search trees id O(log n).
Applications such as File indexing where the entries in an index may be very large,
maintaining the index as m-way search trees provides a better option than AVL search trees
which are but only balanced binary search trees.
While binary search trees are two-way search trees, m-way search trees are extended binary
search trees and hence provide efficient retrievals.
B trees are height balanced versions of m-way search trees and they do not recommend
representation of keys with varying sizes.
Tries are tree based data structures that support keys with varying sizes.
Page 24 138
UNIT -5
Definition:
A B tree of order m is an m-way search tree and hence may be empty. If non empty, then the following
properties are satisfied on its extended tree representation:
i. The root node must have at least two child nodes and at most m child nodes.
ii. All internal nodes other than the root node must have at least |m/2 | non empty child nodes and at most
m non empty child nodes.
iii. The number of keys in each internal node is one less than its number of child nodes and these keys
partition the keys of the tree into sub trees.
iv. All external nodes are at the same level.
v.
Example:
F K O B tree of order 4
Level 1
C D G M N P Q W
S T X Y Z
Level
3
Insertion
For example construct a B-tree of order 5 using following numbers. 3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12,
20, 26, 4, 16, 18, 24, 25, 19
The order 5 means at the most 4 keys are allowed. The internal node should have at least 3 non empty
children and each leaf node must contain at least 2 keys.
1 3 7 14
Page 25
139
UNIT -5
Step 2: Insert 8, Since the node is full split the node at medium 1, 3, 7, 8, 14
1 3 8 14
1 3 5 8 11 14 17
Step 4: Now insert 13. But if we insert 13 then the leaf node will have 5 keys which is not allowed. Hence
8,
11, 13, 14, 17 is split and medium node 13 is moved up.
7 13
1 3 5 8 11 14 17
Page 26
140
UNIT -5
7 13
1 3 5 6 8 11 12 14 17 20 23
Step 6: The 26 is inserted to the right most leaf node. Hence 14, 17, 20, 23, 26 the node is split and 20 will be
moved up.
7 13 20
1 3 5 6 8 11 12 14 17 23 26
Page 27
141
UNIT -5
Step 7: Insertion of node 4 causes left most node to split. The 1, 3, 4, 5, 6 causes key 4 to move up.
Then insert 16, 18, 24, 25.
4 7 13 20
1 3 5 6 8 11 12 14 16 17 18 23 24 25 26
Step 8: Finally insert 19. Then 4, 7, 13, 19, 20 needs to be split. The median 13 will be moved up to
from a root node.
The tree then will be -
13
4 7 17 20
1 3 5 6 8 11 12 14 16 18 19 23 24 25 26
4 7 17 20
1 3 5 6 8 11 12 14 16 18 19 23 24 25 26
Page 28
142
UNIT -5
4 7 17 20
1 3 5 6 11 12 14 16 18 19 23 24 25 26
Now we will delete 20, the 20 is not in a leaf node so we will find its successor which is 23, Hence 23
will be moved up to replace 20.
13
4 7 17 23
1 3 5 6 11 12 14 16 18 19 24 25 26
Next we will delete 18. Deletion of 18 from the corresponding node causes the node with only one
key, which is not desired (as per rule 4) in B-tree of order 5. The sibling node to immediate right has
an extra key. In such a case we can borrow a key from parent and move spare key of sibling up.
Page 29
143
UNIT -5
13
4 7 17 24
1 3 5 6 11 12 14 16 19 23 25 26
Now delete 5. But deletion of 5 is not easy. The first thing is 5 is from leaf node. Secondly this leaf
node has no extra keys nor siblings to immediate left or right. In such a situation we can combine this
node with one of the siblings. That means remove 5 and combine 6 with the node 1, 3. To make the tree
balanced we have to move parent’s key down. Hence we will move 4 down as 4 is between 1, 3, and 6.
The tree will be-
13
7 17 24
1 3 4 6 11 12 14 16 19 23 25 26
But again internal node of 7 contains only one key which not allowed in B-tree. We then will try to borrow
a key from sibling. But sibling 17, 24 has no spare key. Hence we can do is that, combine 7 with 13 and 17,
24. Hence the B-tree will be
Page 30
144
UNIT -5
7 13 17 24
1 3 4 6 11 12 14 16 19 23 25 26
Searching
The search operation on B-tree is similar to a search to a search on binary search tree. Instead of choosing
between a left and right child as in binary tree, B-tree makes an m-way choice. Consider a B-tree as given
below.
13
4 7
17
20
1 3 5 6 8 11 12 14 16 18 19 23 24 25 26
Page 31
145
UNIT -5
The running time of search operation depends upon the height of the tree. It is O(log n).
Height of B-tree
The maximum height of B-tree gives an upper bound on number of disk access. The maximum number of
keys in a B-tree of order 2m and depth h is
2 h-1
1 + 2m + 2m(m+1) + 2m(m+1) + . . .+ 2m(m+1)
h
i-1
= 1 + ∑ 2m(m+1)
i=1
The maximum height of B-tree with n keys
Page 32
146
B+ Trees
B+ Tree Example
3 7 8
1 3 5 6 7 8 9 12
147
B+ Tree Leaf Node Structure
1. Each leaf node is of the form, <<K 1, Pr1>, <K2, Pr2>,… ,<Kq-1, Prq-1>, Pnext> where
q<=p, each Pri is a data pointer, and Pnext points to the next leaf node of the B+
tree.
2. Within each leaf node, K1 < K2 < …< Kq-1, q<=p
3. Each Pri is a data pointer that points to the record whose search field value is Ki,
or to a file block containing the record (or a block of pointers if the search field is
not a key field)
4. Each leaf node has at least p/2 values.
5. All leaf nodes are at the same level.
B+ Tree Information
By starting at the leftmost block, it is possible to traverse leaf nodes as a linked
list using the Pnext pointers. This provides ordered access to the data records on
the indexing field.
Entries in internal nodes of a B+ tree include search values and tree pointers,
without any data pointers, more entries can be stored into an internal node of a
B+ tree, than for a B-tree.
Therefore the order p will be larger for a B+ tree, which leads to fewer B+ tree
levels, improving the search time.
The order p can be different for the internal and leaf nodes, because of the
structural differences of the nodes.
5 6 bytes
9 bytes
1 3
6 bytes
9 bytes
7 bytes
(pleaf)*((Pr + V)) + P <= 512
16pleaf + 6 <= 512
pleaf <= 506/16
148
pleaf = 31 which means each leaf node can hold up to p leaf = 31 value/data pointer
combinations, assuming data pointers are record pointers.
When we compare this result with the previous B-tree example (Example 5), we can
see that the B+ tree can hold up to 255,507 record pointers, whereas a
corresponding B-tree can only hold 65,535 entries.
Points to Note:
Every key value must exist at the leaf level, because all data pointers are at the
leaf level,
Every value appearing in an internal node, also appears as the rightmost value in
the leaf level of the subtree pointed at by the tree pointer to the left of the value.
When a leaf node is full, and a new entry is inserted there, the node overflows
and must be split. The first j = (p leaf+1)/2 entries (in the example 2 entries) in the
original node are kept there, and the remaining entries are moved to the new leaf
node. The entry at position j is copied/replicated and moved to the parent node.
When an internal node is full, and a new entry is to be inserted, the node
overflows and must be split into 2 nodes. The entry at position j is moved to the
parent node. The first j-1 entries are kept in the original node, and the last j+1
entries are moved to the new node.
149