Data Indexing
Herbert A. Evans
Unit V : Indexing and Multiway
Trees
Indexing and Multiway Trees- Indexing, indexing
techniques-primary, secondary, dense, sparse, Multiway
search trees, B-Tree- insertion, deletion, B+Tree -
insertion, deletion, use of B+ tree inIndexing, Trie Tree.
Purposes of Data Indexing
• What is Data Indexing?
• Why it is important?
Concept of File Systems
• Stores and organizes data into computer
files.
• Makes it easier to find and access data at
any given time.
Database Management Systems
• The file system that manages a database.
• Database - is an organized collection of
logically related data.
How DBMS Accesses Data?
• The operations read, modify, update, and
delete are used to access data from
database.
• DBMS must first transfer the data
temporarily to a buffer in main memory.
• Data is then transferred between disk and
main memory into units called blocks.
Purpose of Data Indexing
• It is a data structure that is added to a file
to provide faster access to the data.
• It reduces the number of blocks that the
DBMS has to check.
Properties of Data Index
• It contains a search key and a pointer.
• Search key - an attribute or set of attributes that is
used to look up the records in a file.
• Pointer - contains the address of where the data
is stored in memory.
• It can be compared to the card catalog system
used in public libraries of the past.
Two Types of Indices
• Ordered index (Primary index or clustering
index) – which is used to access data
sorted by order of values.
• Hash index (secondary index or non-
clustering index ) - used to access data
that is distributed uniformly across a range
of buckets.
Ordered Index
Hash Index
Choosing Indexing Technique
• Five Factors involved when choosing the
indexing technique:
• access type
• access time
• insertion time
• deletion time
• space overhead
Indexing Definitions
• Access type- is the type of access being used.
• Access time - time required to locate the data.
• Insertion time - time required to insert the new
data.
• Deletion time - time required to delete the data.
• Space overhead - the additional space occupied
by the added data structure.
Types of Ordered Indices
• Dense index - an index record appears for
every search-key value in the file.
• Sparse index - an index record that
appears for only some of the values in the
file.
Dense Index
Sparse Index
Choosing Multi-Level Index
• In some cases an index may be too large for efficient
processing.
• In that case use multi-level indexing.
• In multi-level indexing, the primary index is treated as a
sequence file and sparse index is created on it.
• The outer index is a sparse index of the primary index
whereas the inner index is the primary index.
Multi-Level Index
M -Way Tree
M- Way search trees are generalized versions of binary
search trees. The goal of m-way search tree is to minimize
the accesses while retrieving a key from a file.
However an m-way search tree of height h calls for O(h)
number of accesses for an insert/delete/retrieval operation.
B-Tree
A B-tree of order m is an m-way tree (i.e., a tree
where each node may have up to m children) in
which:
Every node has at most m children.
Every non-leaf node (except root) has at least ⌈m/2⌉ child
nodes. The root has at least two children if it is not a leaf node.
A non-leaf node with k children contains k − 1 keys.
B-Tree(Cont.)
All leaves appear at the same level.
It generalizes Binary Search Tree.
An example B-Tree
26 A B-tree of order 5
containing 26 items
6 12
42 51 62
1 2 4 7 8 13 15 18 25
27 29 45 46 48 53 55 60 64 70 90
Note that all the leaves are at the same level
B-Trees
Construction of a B-Tree
• Attempt to insert the new key into a leaf
• If this would result in that leaf becoming too big, split
the leaf into two, promoting the middle key to the leaf’s
parent
• If this would result in the parent becoming too big,
split the parent into two, promoting the middle key
• This strategy might have to be repeated all the way to
the top
• If necessary, the root is split in two and the middle key
is promoted to a new root, making the tree one level
higher
Example of B-Tree
Construct a B-tree of order 3 where keys arrive in the
following order:
20 10 50 30 40 70 60 80 90
Solution
20 10 50 30 40 70 60 80 90
20
• 1
• 2.
10,20
• 3.
10,20
20 ,50 20
10 50
Solution
• 4.
20
10 30,50
• 5. 20 20,40
10 30,40
40 ,50 10 30 50
Solution
20,40
• 6.
10 30 50,70
• 7.
20,40 40
20, 40, 60
10 30 60 70
50,60, 10 30 50 70
Solution
40
20 60
10 30 50 70
Solution
• 8.
40
20 60
10 30 50 70,80
Solution
• 9.
40
20 60
10 30 50 70, 80
80, 90
Solution
40
20 60,80
10 30 50 70 90
Constructing a B-tree
• Suppose we start with an empty B-tree and keys
arrive in the following order:1 12 8 2 25 6 14
28 17 7 52 16 48 68 3 26 29 53 55 45
• We want to construct a B-tree of order 5
• The first four items go into the root:
1 2 8 12
• To put the fifth item in the root would violate
condition 5
• Therefore, when 25 arrives, pick the middle key to
make a new root
B-Trees
Constructing a B-tree (contd.)
1 2 12 25
6, 14, 28 get added to the leaf nodes:
8
1 2 6 12 14 25 28
B-Trees
Constructing a B-tree (contd.)
Adding 17 to the right leaf node would over-fill it, so we take the
middle key, promote it (to the root) and split the leaf
8 17
1 2 6 12 14 25 28
7, 52, 16, 48 get added to the leaf nodes
8 17
1 2 6 7 12 14 16 25 28 48 52
B-Trees
Constructing a B-tree (contd.)
Adding 68 causes us to split the right most leaf, promoting 48 to the
root, and adding 3 causes us to split the left most leaf, promoting 3
to the root; 26, 29, 53, 55 then go into the leaves
3 8 17 48
1 2 6 7 12 14 16 25 26 28 29 52 53 55 68
Adding 45 causes a split of 25 26 28 29
and promoting 28 to the root then causes the root to split
B-Trees
Constructing a B-tree (contd.)
17
3 8 28 48
1 2 6 7 12 14 16 25 26 29 45 52 53 55 68
B-Trees
Inserting into a B-Tree
• Attempt to insert the new key into a leaf
• If this would result in that leaf becoming too big, split the
leaf into two, promoting the middle key to the leaf’s
parent
• If this would result in the parent becoming too big, split
the parent into two, promoting the middle key
• This strategy might have to be repeated all the way to
the top
• If necessary, the root is split in two and the middle key is
promoted to a new root, making the tree one level higher
B-Trees
Exercise in Inserting a B-Tree
• Insert the following keys to a 5-way B-tree:
• 3, 7, 9, 23, 45, 1, 5, 14, 25, 24, 13, 11, 8,
19, 4, 31, 35, 56
B-Trees
Removal from a B-tree
• During insertion, the key always goes into a leaf. For
deletion we wish to remove from a leaf. There are three
possible ways we can do this:
• 1 - If the key is already in a leaf node, and removing it
doesn’t cause that leaf node to have too few keys, then
simply remove the key to be deleted.
• 2 - If the key is not in a leaf then it is guaranteed (by the
nature of a B-tree) that its predecessor or successor will
be in a leaf -- in this case we can delete the key and
promote the predecessor or successor key to the non-
leaf deleted key’s position.
B-Trees
Removal from a B-tree (2)
• If (1) or (2) lead to a leaf node containing less than the minimum
number of keys then we have to look at the siblings immediately
adjacent to the leaf in question:
– 3: if one of them has more than the min. number of keys then we
can promote one of its keys to the parent and take the parent
key into our lacking leaf
– 4: if neither of them has more than the min. number of keys then
the lacking leaf and one of its neighbours can be combined with
their shared parent (the opposite of promoting a key) and the
new leaf will have the correct number of keys; if this step leave
the parent with too few keys then we repeat the process up to
the root itself, if required
B-Trees
Type #1: Simple leaf deletion
Assuming a 5-way
B-Tree, as before... 12 29 52
2 7 9 15 22 31 43 56 69 72
Delete 2: Since there are enough
keys in the node, just delete it
Note when printed: this slide is animated
B-Trees
Type #2: Simple non-leaf
deletion
Delete 52
12 29 56
52
7 9 15 22 31 43 56 69 72
Borrow the predecessor
or (in this case) successor
Note when printed: this slide is animated
B-Trees
Type #4: Too few keys in node
and its siblings
12 29 56
Join back together
7 9 15 22 31 43 69 72
Too few keys!
Delete 72
Note when printed: this slide is animated
B-Trees
Type #4: Too few keys in node
and its siblings
12 29
7 9 15 22 31 43 56 69
Note when printed: this slide is animated
B-Trees
Type #3: Enough siblings
12 29
Demote root key and
promote leaf key
7 9 15 22 31 43 56 69
Delete 22
Note when printed: this slide is animated
B-Trees
Type #3: Enough siblings
12 31
7 9 15 29 43 56 69
Note when printed: this slide is animated
B-Trees
B+ Tree
• B+ Tree is an extension of B Tree which allows efficient
insertion, deletion and search operations.
• In B Tree, Keys and records both can be stored in the
internal as well as leaf nodes. Whereas, in B+ tree,
records (data) can only be stored on the leaf nodes while
internal nodes can only store the key values.
• The leaf nodes of a B+ tree are linked together in the
form of a singly linked lists to make the search queries
more efficient.
B+Tree Advantages
Comparison of B Tree and B+Tree
B+Tree Example
Trie Tree
• What is Trie-
• The word "Trie" is an excerpt from the word
"retrieval". Trie is a sorted tree-based data-structure
that stores the set of strings.
• A trie (derived from retrieval) is a multiway tree data
structure used for storing strings over an alphabet. It
is used to store a large amount of strings. The
pattern matching can be done efficiently using tries.
• The trie shows words like allot, alone, ant, and, are,
bat, bad. The idea is that all strings sharing common
prefix should come from a common node. The tries
are used in spell checking programs.
Trie Tree
Properties of the Trie for a set of the string:
• The root node of the trie always represents the null
node.
• Each child of nodes is sorted alphabetically.
• Each node can have a maximum of 26 children (A to
Z).
• Each node (except the root) can store one letter of
the alphabet.
Trie Tree
• Create Trie tree for string and, ant, dad, do
Trie Tree
• The diagram below depicts a trie representation for the bell, bear,
bore, bat, ball, stop, stock, and stack.
Thank You