0% found this document useful (0 votes)
104 views35 pages

Understanding Tree Data Structures

The document provides an overview of tree data structures, specifically focusing on binary trees, binary search trees, AVL trees, and B-trees. It covers definitions, representations, operations such as searching, insertion, and deletion, as well as tree traversals and their characteristics. Additionally, it discusses the advantages and disadvantages of sequential and linked representations of binary trees.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
104 views35 pages

Understanding Tree Data Structures

The document provides an overview of tree data structures, specifically focusing on binary trees, binary search trees, AVL trees, and B-trees. It covers definitions, representations, operations such as searching, insertion, and deletion, as well as tree traversals and their characteristics. Additionally, it discusses the advantages and disadvantages of sequential and linked representations of binary trees.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Binary Search Trees: Various Binary tree representation, definition, BST ADT, Implementation,

Operations- Searching, Insertion and Deletion, Binary tree traversals, threaded binary trees,
AVL Trees : Definition, Height of an AVL Tree, Operations – Insertion, Deletion and Searching
B-Trees: B-Tree of order m, height of a B-Tree, insertion, deletion and searching, B+ Tree.

TREES
A Tree is a data structure in which each element is attached to one or more elements directly beneath it.

Level 0
A

B 1
C D

E F G
H I J
2
K L 3

Terminology
 The connections between elements are called branches.
 A tree has a single root, called root node, which is shown at the top of the tree. i.e. root is always
at the highest level 0.
 Each node has exactly one node above it, called parent. Eg: A is the parent of B,C and D.
 The nodes just below a node are called its children. ie. child nodes are one level lower than the
parent node.
 A node which does not have any child called leaf or terminal node. Eg: E, F, K, L, H, I and M are
 lNeaovdes. with at least one child are called non terminal or internal nodes.
 The child nodes of same parent are said to be siblings.
 A path in a tree is a list of distinct nodes in which successive nodes are connected by branches in
the tree.
 The length of a particular path is the number of branches in that path. The degree of a node
of a tree is the number of children of that node.
 The maximum number of children a node can have is often referred to as the order of a
tree. The height or depth of a tree is the length of the longest path from root to any leaf.

1. Root: This is the unique node in the tree to which further sub trees are attached. Eg: A
Degree of the node: The total number of sub-trees attached to the node is called the degree of the
[Link]: For node A degree is 3. For node K degree is 0
3. Leaves: These are the terminal nodes of the tree. The nodes with degree 0 are always the leaf nodes.
Eg: E, F, K, L,H, I, J

Page 1 115
4. Internal nodes: The nodes other than the root node and the leaves are called the internal nodes. Eg:
B, C, D, G
5. Parent nodes: The node which is having further sub-trees(branches) is called the parent node of
those sub-trees. Eg: B is the parent node of E and F.
6. Predecessor: While displaying the tree, if some particular node occurs previous to some other node
then that node is called the predecessor of the other node. Eg: E is the predecessor of the node B.
7. Successor: The node which occurs next to some other node is a successor node. Eg: B is the
successor of E and F.
8. Level of the tree: The root node is always considered at level 0, then its adjacent children are
supposed to be at level 1 and so on. Eg: A is at level 0, B,C,D are at level 1, E,F,G,H,I,J are at level 2,
K,L are at level 3.
9. Height of the tree: The maximum level is the height of the tree. Here height of the tree is 3. The
height if the tree is also called depth of the tree.
10. Degree of tree: The maximum degree of the node is called the degree of the tree.
BINARY TREES

Binary tree is a tree in which each node has at most two children, a left child and a right child. Thus the
order of binary tree is 2.

A binary tree is either empty or consists


of a) a node called the root
b) left and right sub trees are themselves binary trees.

A binary tree is a finite set of nodes which is either empty or consists of a root and two disjoint
trees called left sub-tree and right sub-tree.
In binary tree each node will have one data field and two pointer fields for representing the
sub-branches. The degree of each node in the binary tree will be at the most two.

Types Of Binary Trees:


There are 3 types of binary trees:

1. Left skewed binary tree: If the right sub-tree is missing in every node of a tree we call it as left
skewed tree.
A

Page 2 116
2. Right skewed binary tree: If the left sub-tree is missing in every node of a tree we call it is right
sub-tree.

3. Complete binary tree:


The tree in which degree of each node is at the most two is called a complete binary tree. In
a complete binary tree there is exactly one node at level 0, two nodes at level 1 and four nodes at level
l
2 and so on. So we can say that a complete binary tree depth d will contain exactly 2 nodes at each
level l, where l is from 0 to d.

B C

D E F G
Note:
n
1. A binary tree of depth n will have maximum 2 -1 nodes.
2. A complete binary tree of level l will have maximum 2l nodes at each level, where l starts from 0.
3. Any binary tree with n nodes will have at the most n+1 null branches.
4. The total number of edges in a complete binary tree with n terminal nodes are 2(n-1).

Binary Tree Representation


A binary tree can be represented mainly in 2 ways:

a) Sequential Representation
b) Linked Representation

a) Sequential Representation
The simplest way to represent binary trees in memory is the sequential representation that uses one-
dimensional array.
1) The root of binary tree is stored in the 1 st location of array
th
2) If a node is in the j location of array, then its left child is in the location 2J+1 and its right
child in the location 2J+2
d+1
The maximum size that is required for an array to store a tree is 2 -1, where d is the depth of the tree.

Page 3 117
Advantages of sequential representation:
The only advantage with this type of representation is that the
direct access to any node can be possible and finding the parent or left children of any particular node
is fast because of the random access.

Disadvantages of sequential representation:


1. The major disadvantage with this type of representation is wastage of memory. For example in
the skewed tree half of the array is unutilized.
2. In this type of representation the maximum depth of the tree has to be fixed. Because we have
decide the array size. If we choose the array size quite larger than the depth of the tree, then it
will be wastage of the memory. And if we coose array size lesser than the depth of the tree then
we will be unable to represent some part of the tree.
3. The insertions and deletion of any node in the tree will be costlier as other nodes has to be
adjusted at appropriate positions so that the meaning of binary tree can be preserved.
As these drawbacks are there with this sequential type of representation, we will search for more
flexible representation. So instead of array we will make use of linked list to represent the tree.
b) Linked Representation
Linked representation of trees in memory is implemented using pointers. Since each node in a
binary tree can have maximum two children, a node in a linked representation has two pointers for both
left and right child, and one information field. If a node does not have any child, the corresponding
pointer field is made NULL pointer.

In linked list each node will look like this:

Left Child Data Right Child


Advantages of linked representation:
1. This representation is superior to our array representation as there is no wastage of
memory. And so there is no need to have prior knowledge of depth of the tree.
Using dynamic memory concept one can create as much memory(nodes) as
required. By chance if some nodes are unutilized one can delete the nodes by
making the address free.

2. Insertions and deletions which are the most common operations can be done without
moving the nodes.

Page 4 118
Disadvantages of linked representation:

1. This representation does not provide direct access to a node and special algorithms are
required.
2. This representation needs additional space in each node for storing the left and right sub-
trees.

TRAVERSING A BINARY TREE

Traversing a tree means that processing it so that each node is visited exactly once. A binary
tree can be
traversed a number of [Link] most common tree traversals are

 In-order
 Pre-order and
 Post-order

Pre-order [Link] the root Root | Left | Right


[Link] the left sub tree in pre-order
[Link] the right sub tree in pre-order.
In-order [Link] the left sub tree in in-order Left | Root | Right
[Link] the root
[Link] the right sub tree in in-order.
Post-order [Link] the left sub tree in post-order Left | Right | Root
[Link] the right sub tree in post-order.
[Link] the root

B C

D E F G

H I J

K
The pre-order traversal is: ABDEHCFGIKJ
The in-order traversal is : DBHEAFCKIGJ
The post-order traversal is:DHEBFKIJGCA

Page 5 119
Inorder Traversal:
rd
Print 3
A

nd th
Print 2 Print 4
B D

C Print this
E
at the last
st
Print 1

C-B-A-D-E is the inorder traversal i.e. first we go towards the leftmost node. i.e. C so print that node
C. Then go back to the node B and print B. Then root node A then move towards the right sub-tree
print D and finally E. Thus we are following the tracing sequence of Left|Root|Right. This type of
traversal is called inorder traversal. The basic principle is to traverse left sub-tree then root and then the
right sub-tree.

Pseudo Code:

template <class T>


void inorder(bintree<T> *temp)
{
if(temp!=NULL)
{
inorder(temp->left);
cout<<”temp->data”;
inorder(temp->right);
}
}

is the preorder traversal of the above fig. We are following Root|Left|Right path i.e. data at the
root node will be printed first then we move on the left sub-tree and go on printing the data till
we reach to the left most node. Print the data at that node and then move to the right sub- tree.
Follow the same principle at each sub-tree and go on printing the data accordingly.

template <class T>


void preorder(bintree<T> *temp)

Page 6 120
{
if(temp!=NULL)
{
cout<<”temp->data”; preorder(temp->left);
preorder(temp->right);
}
}

From figure the postorder traversal is C-D-B-E-A. In the postorder traversal we are following the
Left|Right|Root principle i.e. move to the leftmost node, if right sub-tree is there or not if not then
print the leftmost node, if right sub-tree is there move towards the right most node. The key idea
here is that at each sub-tree we are following the Left|Right|Root principle and print the data
accordingly.
Pseudo Code:

template <class T>


void postorder(bintree<T> *temp)
{
if(temp!=NULL)
{
postorder(temp->left);
postorder(temp->right);
cout<<”temp->data”;
}
}

BINARY SEARCH TREE


In the simple binary tree the nodes are arranged in any fashion. Depending on user’s desire
the new nodes can be attached as a left or right child of any desired node. In such a case finding for
any node is a long cut procedure, because in that case we have to search the entire tree. And thus
the searching time complexity will get increased unnecessarily. So to make the searching
algorithm faster in a binary tree we will go for building the binary search tree. The binary search
tree is based on the binary search algorithm. While creating the binary search tree the data is
systematically arranged. That means values at left sub-tree < root node value < right sub-tree
values.

Page 7 121
Operations On Binary Search Tree:
The basic operations which can be performed on binary search tree are.
1. Insertion of a node in binary search tree.
2. Deletion of a node from binary search tree.
3. Searching for a particular node in binary search tree.
Insertion of a node in binary search tree.
While inserting any node in binary search tree, look for its appropriate position in the binary search
tree. We start comparing this new node with each node of the tree. If the value of the node which is
to be inserted is greater than the value of the current node we move on to the right sub-branch
otherwise we move on to the left sub-branch. As soon as the appropriate position is found we
attach this new node as left or right child appropriately.

Before Insertion

In the above fig, if we wan to insert 23. Then we will start comparing 23 with value of root node
i.e. 10. As 23 is greater than 10, we will move on right sub-tree. Now we will compare 23 with 20
and move right, compare 23 with 22 and move right. Now compare 23 with 24 but it is less than
24. We will move on left branch of 24. But as there is node as left child of 24, we can attach 23 as
left child of 24.

Page 8 122
Deletion of a node from binary search tree.
For deletion of any node from binary search tree there are three which are possible.
i. Deletion of leaf node.
ii. Deletion of a node having one child.
iii. Deletion of a node having two children.

Deletion of leaf node.

This is the simplest deletion, in which we set the left or right pointer of parent node as NULL.

10

7 15

Before deletion

5 9 12 18

From the above fig, we want to delete the node having value 5 then we will set left pointer of its parent
node as NULL. That is left pointer of node having value 7 is set to NULL.

Deletion of a node having one child.

Page 9 123
To explain this kind of deletion, consider a tree as given below.

If we want to delete the node 15, then we


will simply copy node 18 at place of 16
and then set the node free

Deletion of a node having two children.


Consider a tree as given below.

Page 10 124
Let us consider that we want to delete node having value 7. We will then find out the inorder successor
of node 7. We will then find out the inorder successor of node 7. The inorder successor will be simply
copied at location of node 7.
That means copy 8 at the position where value of node is 7. Set left pointer of 9 as NULL. This
completes the deletion procedure.

Searching for a node in binary search tree.


In searching, the node which we want to search is called a key node. The key node will be compared
with each node starting from root node if value of key node is greater than current node then we
search for it on right sub branch otherwise on left sub branch. If we reach to leaf node and still we do
not get the value of key node then we declare “node is not present in the tree”.

Page 11 125
In the above tree, if we want to search for value 9. Then we will compare 9 with root node 10. As 9 is
less than 10 we will search on left sub branch. Now compare 9 with 5, but 9 is greater than 5. So we
will move on right sub tree. Now compare 9 with 8 but 9 is greater than 8 we will move on right sub
branch. As the node we will get holds the value 9. Thus the desired node can be searched.

AVL TREES
Adelsion Velski and Lendis in 1962 introduced binary tree structure that is balanced with
respect to height of sub trees. The tree can be made balanced and because of this retrieval
of any node can be done in Ο(log n) times, where n is total number of nodes. From the
name of these scientists the tree is called AVL tree.

Definition:

An empty tree is height balanced if T is a non empty binary tree with T L and TR as
its left and right sub trees. The T is height balanced if and only if
i. TL and TR are height balanced.
ii. hL-hR <= 1 where hL and hR are heights of TL and TR.
The idea of balancing a tree is obtained by calculating the balance factor of a tree.

Definition of Balance Factor:

The balance factor BF(T) of a node in binary tree is defined to be hL-hR where hL and hR
are heights of left and right sub trees of T.

For any node in AVL tree the balance factor i.e. BF(T) is -1, 0 or +1.

Page 12 126
Height of AVL Tree:
Theorem: The height of AVL tree with n elements (nodes) is O(log n).

Proof: Let an AVL tree with n nodes in it. Nh be the minimum number of nodes in an AVL tree of
height h.

In worst case, one sub tree may have height h-1 and other sub tree may have height h-2. And both these
sub trees are AVL trees. Since for every node in AVL tree the height of left and right sub trees differ
by at most 1.

Hence

Nh = Nh-1+Nh-2+1

Where Nh denotes the minimum number of nodes in an AVL tree of height h.

N0=0 N1=2

Page 13 127
We can also write it as
N > Nh = Nh-1+Nh-2+1

> 2Nh-2

> 4Nh-4
.
.
> 2iNh-2i

If value of h is even, let i = h/2-1

Then equation becomes

N > 2h/2-1N2

= N > 2(h-1)/2x4 (N2 = 4)

= O(log N)

If value of h is odd, let I = (h-1)/2 then equation becomes


N > 2(h-1)/2 N1
N > 2(h-1)/2 x 1
H = O(log N)

This proves that height of AVL tree is always O(log N). Hence search, insertion and deletion can
be carried out in logarithmic time.

Representation of AVL Tree

 The AVL tree follows the property of binary search tree. In fact AVL trees are
basically binary search trees with balance factors as -1, 0, or +1.
 After insertion of any node in an AVL tree if the balance factor of any node
becomes other than -1, 0, or +1 then it is said that AVL property is violated. Then
we have to restore the destroyed balance condition. The balance factor is denoted at
right top corner inside the node.

Page 14 128
 After insertion of a new node if balance condition gets destroyed, then the nodes on that
path(new node insertion point to root) needs to be readjusted. That means only the affected sub
tree is to be rebalanced.
 The rebalancing should be such that entire tree should satisfy AVL property.
In above given example-

Page 15 129
Insertion of a node.

There are four different cases when rebalancing is required after insertion of new node.

1. An insertion of new node into left sub tree of left child. (LL).
2. An insertion of new node into right sub tree of left child. (LR).
3. An insertion of new node into left sub tree of right child. (RL).
4. An insertion of new node into right sub tree of right child.(RR).

Some modifications done on AVL tree in order to rebalance it is called rotations of AVL tree

There are two types of rotations:

Single rotation Double rotation


Left-Left(LL rotation) Left-Right(LR rotation)

Right-Right(RR rotation) Right-Left(RL rotation)

Insertion Algorithm:
1. Insert a new node as new leaf just as an ordinary binary search tree.
2. Now trace the path from insertion point(new node inserted as leaf) towards root. For each node
‘n’ encountered, check if heights of left (n) and right (n) differ by at most 1.
a) If yes, move towards parent (n).
b) Otherwise restructure by doing either a single rotation or a double rotation.
Thus once we perform a rotation at node ‘n’ we do not require to perform any rotation at any
ancestor on ‘n’.

Page 16 130
When node ‘1’ gets inserted as a left child of node ‘C’ then AVL property gets destroyed i.e. node
A has balance factor +2.
The LL rotation has to be applied to rebalance the nodes.

2. RR rotation:

When node ‘4’ gets attached as right child of node ‘C’ then node ‘A’ gets unbalanced. The rotation
which needs to be applied is RR rotation as shown in fig.

Page 17 131
When node ‘3’ is attached as a right child of node ‘C’ then unbalancing occurs because of LR.
Hence LR rotation needs to be applied.

When node ‘2’ is attached as a left child of node ‘C’ then node ‘A’ gets unbalanced as its balance
factor becomes -2. Then RL rotation needs to be applied to rebalance the AVL tree.
Example:

Insert 1, 25, 28, 12 in the following AVL tree.

Page 18 132
Insert 1

To insert node ‘1’ we have to attach it as a left child of ‘2’. This will unbalance the tree as follows.
We will apply LL rotation to preserve AVL property of it.

Insert 25

We will attach 25 as a right child of 18. No balancing is required as entire tree preserves the AVL
property

Page 19 133
Insert 28
The node ‘28’ is attached as a right child of 25. RR rotation is required to rebalance.

Page 20 134
Insert 12

To rebalance the tree we have to apply LR rotation.

Page 21 135
Deletion:

Even after deletion of any particular node from AVL tree, the tree has to be restructured in order to
preserve AVL property. And thereby various rotations need to be applied.

Algorithm for deletion:

The deletion algorithm is more complex than insertion algorithm.


1. Search the node which is to be deleted.
2. a) If the node to be deleted is a leaf node then simply make it NULL to remove.
b) If the node to be deleted is not a leaf node i.e. node may have one or two children, then the
node must be swapped with its inorder successor. Once the node is swapped, we can remove
this node.
3. Now we have to traverse back up the path towards root, checking the
balance factor of every node along the path. If we encounter unbalancing
in some sub tree
then balance that sub tree using appropriate single or double
rotations. The deletion algorithm takes O(log n) time to delete any node.

Page 22 136
The tree becomes

Page 23 137
Searching:

The searching of a node in an AVL tree is very simple. As AVL tree is basically binary search tree, the
algorithm used for searching a node from binary search tree is the same one is used to search a node
from AVL tree.

The searching of a node from AVL tree takes O(log n) time.

BTREES
 Multi-way trees are tree data structures with more than two branches at a node. The data
structures of m-way search trees, B trees and Tries belong to this category of tree
structures.
 AVL search trees are height balanced versions of binary search trees, provide efficient
retrievals and storage operations. The complexity of insert, delete and search operations on
AVL search trees id O(log n).
 Applications such as File indexing where the entries in an index may be very large,
maintaining the index as m-way search trees provides a better option than AVL search trees
which are but only balanced binary search trees.
 While binary search trees are two-way search trees, m-way search trees are extended binary
search trees and hence provide efficient retrievals.
 B trees are height balanced versions of m-way search trees and they do not recommend
representation of keys with varying sizes.
 Tries are tree based data structures that support keys with varying sizes.

Page 24 138
UNIT -5

Definition:

A B tree of order m is an m-way search tree and hence may be empty. If non empty, then the following
properties are satisfied on its extended tree representation:
i. The root node must have at least two child nodes and at most m child nodes.
ii. All internal nodes other than the root node must have at least |m/2 | non empty child nodes and at most
m non empty child nodes.
iii. The number of keys in each internal node is one less than its number of child nodes and these keys
partition the keys of the tree into sub trees.
iv. All external nodes are at the same level.

v.

Example:
F K O B tree of order 4

Level 1

C D G M N P Q W

S T X Y Z
Level
3

Insertion
For example construct a B-tree of order 5 using following numbers. 3, 14, 7, 1, 8, 5, 11, 17, 13, 6, 23, 12,
20, 26, 4, 16, 18, 24, 25, 19
The order 5 means at the most 4 keys are allowed. The internal node should have at least 3 non empty
children and each leaf node must contain at least 2 keys.

Step 1: Insert 3, 14, 7, 1

1 3 7 14

Page 25
139
UNIT -5

Step 2: Insert 8, Since the node is full split the node at medium 1, 3, 7, 8, 14

1 3 8 14

Step 3: Insert 5, 11, 17 which can be easily inserted in a B-tree.

1 3 5 8 11 14 17

Step 4: Now insert 13. But if we insert 13 then the leaf node will have 5 keys which is not allowed. Hence
8,
11, 13, 14, 17 is split and medium node 13 is moved up.

7 13

1 3 5 8 11 14 17

Page 26
140
UNIT -5

Step 5: Now insert 6, 23, 12, 20 without any split.

7 13

1 3 5 6 8 11 12 14 17 20 23

Step 6: The 26 is inserted to the right most leaf node. Hence 14, 17, 20, 23, 26 the node is split and 20 will be
moved up.

7 13 20

1 3 5 6 8 11 12 14 17 23 26

Page 27
141
UNIT -5
Step 7: Insertion of node 4 causes left most node to split. The 1, 3, 4, 5, 6 causes key 4 to move up.
Then insert 16, 18, 24, 25.

4 7 13 20

1 3 5 6 8 11 12 14 16 17 18 23 24 25 26

Step 8: Finally insert 19. Then 4, 7, 13, 19, 20 needs to be split. The median 13 will be moved up to
from a root node.
The tree then will be -
13

4 7 17 20

1 3 5 6 8 11 12 14 16 18 19 23 24 25 26

Thus the B tree is constructed. 13


Deletion
Consider a B-tree

4 7 17 20

1 3 5 6 8 11 12 14 16 18 19 23 24 25 26

Page 28
142
UNIT -5

Delete 8, then it is very simple.


13

4 7 17 20

1 3 5 6 11 12 14 16 18 19 23 24 25 26

Now we will delete 20, the 20 is not in a leaf node so we will find its successor which is 23, Hence 23
will be moved up to replace 20.
13

4 7 17 23

1 3 5 6 11 12 14 16 18 19 24 25 26

Next we will delete 18. Deletion of 18 from the corresponding node causes the node with only one
key, which is not desired (as per rule 4) in B-tree of order 5. The sibling node to immediate right has
an extra key. In such a case we can borrow a key from parent and move spare key of sibling up.

Page 29
143
UNIT -5

13

4 7 17 24

1 3 5 6 11 12 14 16 19 23 25 26

Now delete 5. But deletion of 5 is not easy. The first thing is 5 is from leaf node. Secondly this leaf
node has no extra keys nor siblings to immediate left or right. In such a situation we can combine this
node with one of the siblings. That means remove 5 and combine 6 with the node 1, 3. To make the tree
balanced we have to move parent’s key down. Hence we will move 4 down as 4 is between 1, 3, and 6.
The tree will be-

13

7 17 24

1 3 4 6 11 12 14 16 19 23 25 26

But again internal node of 7 contains only one key which not allowed in B-tree. We then will try to borrow
a key from sibling. But sibling 17, 24 has no spare key. Hence we can do is that, combine 7 with 13 and 17,
24. Hence the B-tree will be

Page 30
144
UNIT -5

7 13 17 24

1 3 4 6 11 12 14 16 19 23 25 26

Searching

The search operation on B-tree is similar to a search to a search on binary search tree. Instead of choosing
between a left and right child as in binary tree, B-tree makes an m-way choice. Consider a B-tree as given
below.

13

4 7
17
20

1 3 5 6 8 11 12 14 16 18 19 23 24 25 26

If we want to search 11 then

i. 11 < 13 ; Hence search left node

ii. 11 > 7 ; Hence right most node

iii. 11 > 8 ; move in second block

iv. node 11 is found

Page 31
145
UNIT -5

The running time of search operation depends upon the height of the tree. It is O(log n).

Height of B-tree

The maximum height of B-tree gives an upper bound on number of disk access. The maximum number of
keys in a B-tree of order 2m and depth h is
2 h-1
1 + 2m + 2m(m+1) + 2m(m+1) + . . .+ 2m(m+1)

h
i-1
= 1 + ∑ 2m(m+1)
i=1
The maximum height of B-tree with n keys

log m+1 n = O(log n)


2m

Page 32
146
B+ Trees

 Most implementations use the B-tree variation, the B+-tree.


 In the B-tree, every value of the search field appears once at some level in the
tree, along with the data pointer to the record, or block where the record is stored.
 In a B+ tree, data pointers are stored only at the leaf nodes, therefore the
structure of the leaf nodes vary from the structure of the internal (non leaf) nodes.
 If the search field is a key field, the leaf nodes have a value for every value of the
search field, along with the data pointer to the record or block.
 If the search field is a non key field, the pointer points to a block containing
pointers to the data file records, creating an extra level of indirection (similar to
option 3 for the secondary indexes)
 The leaf nodes of the B+ Trees are linked to provide ordered access on the
search field to the record. The first level is similar to the base level of an index.
 Some search field values in the leaf nodes are repeated in the internal nodes of
the B+ trees, in order to guide the search.

B+ Tree Example

3 7 8

1 3 5 6 7 8 9 12

B+ Tree Internal Node Structure


1. Each internal node is of the form <P 1, K1, P2, K2, …., Pq-1, Kq-1, Pq>, where q<=p
and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ….< Kq-1.
3. For all search field values X in the subtree pointed at by Pi, we have:
 Ki-1 < X <= Ki for 1 < i < q;
 X <=K for i = 1;
 and Ki-1 < X for i = q.
4. Each internal node has at most, p tree pointers.
5. Each internal node, except the root, has at least p/2 tree pointers. The root
node has at least two tree pointers if it is an internal node.
6. An internal node with q pointers, q <=p, has q-1 search field values.

147
B+ Tree Leaf Node Structure
1. Each leaf node is of the form, <<K 1, Pr1>, <K2, Pr2>,… ,<Kq-1, Prq-1>, Pnext> where
q<=p, each Pri is a data pointer, and Pnext points to the next leaf node of the B+
tree.
2. Within each leaf node, K1 < K2 < …< Kq-1, q<=p
3. Each Pri is a data pointer that points to the record whose search field value is Ki,
or to a file block containing the record (or a block of pointers if the search field is
not a key field)
4. Each leaf node has at least p/2 values.
5. All leaf nodes are at the same level.

B+ Tree Information
 By starting at the leftmost block, it is possible to traverse leaf nodes as a linked
list using the Pnext pointers. This provides ordered access to the data records on
the indexing field.
 Entries in internal nodes of a B+ tree include search values and tree pointers,
without any data pointers, more entries can be stored into an internal node of a
B+ tree, than for a B-tree.
 Therefore the order p will be larger for a B+ tree, which leads to fewer B+ tree
levels, improving the search time.
 The order p can be different for the internal and leaf nodes, because of the
structural differences of the nodes.

Example 6 from Text


To calculate the order p of a B+ Tree. suppose the search key field is V = 9 bytes
long, the block size is B = 512 bytes, a record pointer is Pr = 7 bytes and a block
pointer is P = 6 bytes. An internal node of the B+ trees can have up to p tree
pointers and p – 1 search field values, which must fit into a single block.

Calculate the value of p for an internal node:

5 6 bytes
9 bytes

p*P + (p-1)*V <= 512


p*6 + (p-1)*9 <= 512
6p + 9p –9 <= 512
15p <= 522
p = 34 which means that each internal node can hold up to 34 tree pointers, and 33
search key values.

Calculate the value of p for a leaf node:

1 3
6 bytes
9 bytes
7 bytes
(pleaf)*((Pr + V)) + P <= 512
16pleaf + 6 <= 512
pleaf <= 506/16

148
pleaf = 31 which means each leaf node can hold up to p leaf = 31 value/data pointer
combinations, assuming data pointers are record pointers.

Example 7 from Text

Suppose that we construct a B+ tree on the field of Example 6. To calculate the


approximate number of entries of the B+ tree we assume that each node is 69
percent full. On average, each internal node will have 34 * 0.69 or approximately 23
pointers, and hence 22 values. Each leaf node, on the average will hold 0.69*pleaf =
0.69*31 or approximately 21 data record pointers. A B+ tree will have the following
average number of entries at each level.

Root: 1 node 22 entries 23 pointers


Level 1: 23 nodes 506 entries 529 pointers
Level 2: 529 nodes 11,638 entries 12,167 pointers
Leaf Level: 12,167 nodes 255,507 record pointers

When we compare this result with the previous B-tree example (Example 5), we can
see that the B+ tree can hold up to 255,507 record pointers, whereas a
corresponding B-tree can only hold 65,535 entries.

Insertion and Deletion with B+-trees.

The following example has p = 3, and pleaf = 2

Points to Note:
 Every key value must exist at the leaf level, because all data pointers are at the
leaf level,
 Every value appearing in an internal node, also appears as the rightmost value in
the leaf level of the subtree pointed at by the tree pointer to the left of the value.
 When a leaf node is full, and a new entry is inserted there, the node overflows
and must be split. The first j = (p leaf+1)/2 entries (in the example 2 entries) in the
original node are kept there, and the remaining entries are moved to the new leaf
node. The entry at position j is copied/replicated and moved to the parent node.
 When an internal node is full, and a new entry is to be inserted, the node
overflows and must be split into 2 nodes. The entry at position j is moved to the
parent node. The first j-1 entries are kept in the original node, and the last j+1
entries are moved to the new node.

To practice B+ Tree insertion, complete Exercise 14.15 in Chapter 14 of the course


text.

149

You might also like