Indexing
Database System Concepts, 7th Ed.
©Silberschatz, Korth and Sudarshan
See [Link] for conditions on re-use
Outline
Basic Concepts
Ordered Indices
B+-Tree Index Files
B-Tree Index Files
Database System Concepts - 7th Edition 14.2 ©Silberschatz, Korth and Sudarshan
Basic Concepts
Indexing mechanisms used to speed up access to desired data.
E.g., author catalog in library
Search Key - attribute to set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form
search-key pointer
Index files are typically much smaller than the original file
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly across “buckets” using
a “hash function”.
Database System Concepts - 7th Edition 14.3 ©Silberschatz, Korth and Sudarshan
Index Evaluation Metrics
Access types supported efficiently. E.g.,
Records with a specified value in the attribute
Records with an attribute value falling in a specified range of values.
Access time
Insertion time
Deletion time
Space overhead
Database System Concepts - 7th Edition 14.4 ©Silberschatz, Korth and Sudarshan
Ordered Indices
In an ordered index, index entries are stored sorted on the search key value.
Clustering index: in a sequentially ordered file, the index whose search key
specifies the sequential order of the file.
Also called primary index
The search key of a primary index is usually but not necessarily the primary
key.
Secondary index: an index whose search key specifies an order different from
the sequential order of the file. Also called
nonclustering index.
Index-sequential file: sequential file ordered on a search key, with a clustering
index on the search key.
Database System Concepts - 7th Edition 14.5 ©Silberschatz, Korth and Sudarshan
Dense Index Files
Dense index — Index record appears for every search-key value in the file.
E.g. index on ID attribute of instructor relation
Database System Concepts - 7th Edition 14.6 ©Silberschatz, Korth and Sudarshan
Dense Index Files (Cont.)
Dense index on dept_name, with instructor file sorted on dept_name
Database System Concepts - 7th Edition 14.7 ©Silberschatz, Korth and Sudarshan
Sparse Index Files
Sparse Index: contains index records for only some search-key values.
Applicable when records are sequentially ordered on search-key
To locate a record with search-key value K we:
Find index record with largest search-key value < K
Search file sequentially starting at the record to which the index record
points
Database System Concepts - 7th Edition 14.8 ©Silberschatz, Korth and Sudarshan
Sparse Index Files (Cont.)
Compared to dense indices:
Less space and less maintenance overhead for insertions and deletions.
Generally slower than dense index for locating records.
Good tradeoff:
for clustered index: sparse index with an index entry for every block in file,
corresponding to least search-key value in the block.
Database System Concepts - 7th Edition 14.9 ©Silberschatz, Korth and Sudarshan
Secondary Indices Example
Secondary index on salary field of instructor
Index record points to a bucket that contains pointers to all the actual records with
that particular search-key value.
Secondary indices have to be dense
Database System Concepts - 7th Edition 14.10 ©Silberschatz, Korth and Sudarshan
Clustering vs Nonclustering Indices
Indices offer substantial benefits when searching for records.
BUT: indices imposes overhead on database modification
when a record is inserted or deleted, every index on the relation must be
updated
When a record is updated, any index on an updated attribute must be updated
Sequential scan using clustering index is efficient, but a sequential scan using a
secondary (nonclustering) index is expensive on magnetic disk
Each record access may fetch a new block from disk
Each block fetch on magnetic disk requires about 5 to 10 milliseconds
Database System Concepts - 7th Edition 14.11 ©Silberschatz, Korth and Sudarshan
Multilevel Index
If index does not fit in memory, access becomes expensive.
Solution: treat index kept on disk as a sequential file and construct a sparse index
on it.
outer index – a sparse index of the basic index
inner index – the basic index file
If even outer index is too large to fit in main memory, yet another level of index can
be created, and so on.
Indices at all levels must be updated on insertion or deletion from the file.
Database System Concepts - 7th Edition 14.12 ©Silberschatz, Korth and Sudarshan
Multilevel Index (Cont.)
Database System Concepts - 7th Edition 14.13 ©Silberschatz, Korth and Sudarshan
Index Update: Deletion
If deleted record was the only
record in the file with its
particular search-key value,
the search-key is deleted from
the index also.
Single-level index entry deletion:
Dense indices – deletion of search-key is similar to file record deletion.
Sparse indices –
if an entry for the search key exists in the index, it is deleted by replacing
the entry in the index with the next search-key value in the file (in search-
key order).
If the next search-key value already has an index entry, the entry is
deleted instead of being replaced.
Database System Concepts - 7th Edition 14.14 ©Silberschatz, Korth and Sudarshan
Index Update: Insertion
Single-level index insertion:
• Perform a lookup using the search-key value of the record to be inserted.
• Dense indices – if the search-key value does not appear in the index, insert it
Indices are maintained as sequential files
Need to create space for new entry, overflow blocks may be required
• Sparse indices – if index stores an entry for each block of the file, no change
needs to be made to the index unless a new block is created.
If a new block is created, the first search-key value appearing in the new
block is inserted into the index.
Multilevel insertion and deletion: algorithms are simple extensions of the
single-level algorithms
Database System Concepts - 7th Edition 14.15 ©Silberschatz, Korth and Sudarshan
Indices on Multiple Keys
Composite search key
• E.g., index on instructor relation on attributes (name, ID)
• Values are sorted lexicographically
E.g. (John, 12121) < (John, 13514) and
(John, 13514) < (Peter, 11223)
• Can query on just name, or on (name, ID)
Database System Concepts - 7th Edition 14.16 ©Silberschatz, Korth and Sudarshan
B+-Tree Index Files
Disadvantage of indexed-sequential files
Performance degrades as file grows, since many overflow blocks get
created.
Periodic reorganization of entire file is required.
Advantage of B+-tree index files:
Automatically reorganizes itself with small, local, changes, in the face of
insertions and deletions.
Reorganization of entire file is not required to maintain performance.
(Minor) disadvantage of B+-trees:
Extra insertion and deletion overhead, space overhead.
Advantages of B+-trees outweigh disadvantages
B+-trees are used extensively
Database System Concepts - 7th Edition 14.17 ©Silberschatz, Korth and Sudarshan
Example of B+-Tree
Database System Concepts - 7th Edition 14.18 ©Silberschatz, Korth and Sudarshan
B+-Tree Index Files (Cont.)
A B+-tree is a rooted tree satisfying the following properties:
All paths from root to leaf are of the same length
Each node that is not a root or a leaf has between
n/2and n children.
A leaf node has between
(n–1)/2and n–1 values
Special cases:
If the root is not a leaf, it has at least 2 children.
If the root is a leaf (that is, there are no other nodes in the tree), it can
have between 0 and (n–1) values.
Database System Concepts - 7th Edition 14.19 ©Silberschatz, Korth and Sudarshan
B+-Tree Node Structure
Typical node
Ki are the search-key values
Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets
of records (for leaf nodes).
The search-keys in a node are ordered
K1 < K2 < K3 < . . . < Kn–1
(Initially assume no duplicate keys, address duplicates later)
Database System Concepts - 7th Edition 14.20 ©Silberschatz, Korth and Sudarshan
Leaf Nodes in B+-Trees
Properties of a leaf node:
For i = 1, 2, . . ., n–1, pointer Pi points to a file record with search-key value Ki,
If Li, Lj are leaf nodes and i < j, Li’s search-key values are less than or equal to Lj’s
search-key values
Pn points to next leaf node in search-key order
Database System Concepts - 7th Edition 14.21 ©Silberschatz, Korth and Sudarshan
Non-Leaf Nodes in B+-Trees
Non leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf
node with m pointers:
All the search-keys in the subtree to which P1 points are less than K1
For 2 i n – 1, all the search-keys in the subtree to which Pi points have values
greater than or equal to Ki–1 and less than Ki
All the search-keys in the subtree to which Pn points have values greater than or
equal to Kn–1
General structure
Database System Concepts - 7th Edition 14.22 ©Silberschatz, Korth and Sudarshan
Example of B+-tree
B+-tree for instructor file (n = 6)
Leaf nodes must have between 3 and 5 values
(
(n–1)/2and n –1, with n = 6).
Non-leaf nodes other than root must have between 3 and 6 children (
(n/2
and n with n =6).
Root must have at least 2 children.
Database System Concepts - 7th Edition 14.23 ©Silberschatz, Korth and Sudarshan
Observations about B+-trees
Since the inter-node connections are done by pointers, “logically” close blocks need
not be “physically” close.
The non-leaf levels of the B+-tree form a hierarchy of sparse indices.
The B+-tree contains a relatively small number of levels
Level below root has at least 2*
n/2values
Next level has at least 2*
n/2*
n/2values
.. etc.
If there are K search-key values in the file, the tree height is no more than
logn/2(K)
thus searches can be conducted efficiently.
Insertions and deletions to the main file can be handled efficiently, as the index can
be restructured in logarithmic time (as we shall see).
Database System Concepts - 7th Edition 14.24 ©Silberschatz, Korth and Sudarshan
Queries on B+-Trees
function find(v)
1. C=root
2. while (C is not a leaf node)
1. Let i be least number s.t. V Ki.
2. if there is no such number i then
3. Set C = last non-null pointer in C
4. else if (v = [Link] ) Set C = Pi +1
5. else set C = [Link]
3. if for some i, Ki = V then return [Link]
4. else return null /* no record with search-key value v exists. */
Database System Concepts - 7th Edition 14.25 ©Silberschatz, Korth and Sudarshan
Queries on B+-Trees (Cont.)
Range queries find all records with search key values in a given range
• See book for details of function findRange(lb, ub) which returns set of all
such records
• Real implementations usually provide an iterator interface to fetch matching
records one at a time, using a next() function
Database System Concepts - 7th Edition 14.26 ©Silberschatz, Korth and Sudarshan
Queries on B+-Trees (Cont.)
If there are K search-key values in the file, the height of the tree is no more than
logn/2(K)
.
A node is generally the same size as a disk block, typically 4 kilobytes
• and n is typically around 100 (40 bytes per index entry).
With 1 million search key values and n = 100
• at most log50(1,000,000) = 4 nodes are accessed in a lookup traversal from
root to leaf.
Contrast this with a balanced binary tree with 1 million search key values —
around 20 nodes are accessed in a lookup
• above difference is significant since every node access may need a disk I/O,
costing around 20 milliseconds
Database System Concepts - 7th Edition 14.27 ©Silberschatz, Korth and Sudarshan
Non-Unique Keys
If a search key ai is not unique, create instead an index on a composite key (ai ,
Ap), which is unique
• Ap could be a primary key, record ID, or any other attribute that guarantees
uniqueness
Search for ai = v can be implemented by a range search on composite key, with
range (v, - ∞) to (v, + ∞)
But more I/O operations are needed to fetch the actual records
• If the index is clustering, all accesses are sequential
• If the index is non-clustering, each record access may need an I/O operation
Database System Concepts - 7th Edition 14.28 ©Silberschatz, Korth and Sudarshan
Updates on B+-Trees: Insertion
Assume record already added to the file. Let
pr be pointer to the record, and let
v be the search key value of the record
1. Find the leaf node in which the search-key value would appear
1. If there is room in the leaf node, insert (v, pr) pair in the leaf node
2. Otherwise, split the node (along with the new (v, pr) entry) as discussed in the
next slide, and propagate updates to parent nodes.
Database System Concepts - 7th Edition 14.29 ©Silberschatz, Korth and Sudarshan
Updates on B+-Trees: Insertion (Cont.)
Splitting a leaf node:
take the n (search-key value, pointer) pairs (including the one being inserted)
in sorted order. Place the first
n/2in the original node, and the rest in a new
node.
let the new node be p, and let k be the least key value in p. Insert (k,p) in the
parent of the node being split.
If the parent is full, split it and propagate the split further up.
Splitting of nodes proceeds upwards till a node that is not full is found.
In the worst case the root node may be split increasing the height of the tree
by 1.
Result of splitting node containing Brandt, Califieri and Crick on inserting Adams
Next step: insert entry with (Califieri, pointer-to-new-node) into parent
Database System Concepts - 7th Edition 14.30 ©Silberschatz, Korth and Sudarshan
B+-Tree Insertion
Affected nodes
B+-Tree before and after insertion of “Adams”
Database System Concepts - 7th Edition 14.31 ©Silberschatz, Korth and Sudarshan
B+-Tree Insertion
B+-Tree before and after insertion of “Lamport”
Affected nodes
Affected nodes
Database System Concepts - 7 Edition
th
14.32 ©Silberschatz, Korth and Sudarshan
Insertion in B+-Trees (Cont.)
Splitting a non-leaf node: when inserting (k,p) into an already full internal node N
Copy N to an in-memory area M with space for n+1 pointers and n keys
Insert (k,p) into M
Copy P1,K1, …, K
n/2
-1 ,P n/2from M back into node N
Copy Pn/2+1,K n/2+1,…,Kn,Pn+1 from M into newly allocated node N'
Insert (K n/2,N') into parent N
Example
Read pseudocode in book!
Database System Concepts - 7th Edition 14.33 ©Silberschatz, Korth and Sudarshan
Examples of B+-Tree Deletion
Before and after deleting “Srinivasan”
Affected nodes
Deleting “Srinivasan” causes merging of under-full leaves
Database System Concepts - 7th Edition 14.34 ©Silberschatz, Korth and Sudarshan
Examples of B+-Tree Deletion (Cont.)
Before and after deleting “Singh” and “Wu”
Affected nodes
Leaf containing Singh and Wu became underfull, and borrowed a value Kim from
its left sibling
Search-key value in the parent changes as a result
Database System Concepts - 7th Edition 14.35 ©Silberschatz, Korth and Sudarshan
Example of B+-tree Deletion (Cont.)
Before and after deletion of “Gold”
Node with Gold and Katz became underfull, and was merged with its sibling
Parent node becomes underfull, and is merged with its sibling
• Value separating two nodes (at the parent) is pulled down when merging
Root node then has only one child, and is deleted
Database System Concepts - 7th Edition 14.36 ©Silberschatz, Korth and Sudarshan
Updates on B+-Trees: Deletion
Assume record already deleted from file. Let V be the search key value of the record,
and Pr be the pointer to the record.
Remove (Pr, V) from the leaf node
If the node has too few entries due to the removal, and the entries in the node and
a sibling fit into a single node, then merge siblings:
Insert all the search-key values in the two nodes into a single node (the one on
the left), and delete the other node.
Delete the pair (Ki–1, Pi), where Pi is the pointer to the deleted node, from its
parent, recursively using the above procedure.
Database System Concepts - 7th Edition 14.37 ©Silberschatz, Korth and Sudarshan
Updates on B+-Trees: Deletion
Otherwise, if the node has too few entries due to the removal, but the entries in the
node and a sibling do not fit into a single node, then redistribute pointers:
Redistribute the pointers between the node and a sibling such that both have
more than the minimum number of entries.
Update the corresponding search-key value in the parent of the node.
The node deletions may cascade upwards till a node which has n/2or more
pointers is found.
If the root node has only one pointer after deletion, it is deleted and the sole child
becomes the root.
Database System Concepts - 7th Edition 14.38 ©Silberschatz, Korth and Sudarshan
Complexity of Updates
Cost (in terms of number of I/O operations) of insertion and deletion of a single
entry proportional to height of the tree
• With K entries and maximum fanout of n, worst case complexity of
insert/delete of an entry is O(logn/2(K))
In practice, number of I/O operations is less:
• Internal nodes tend to be in buffer
• Splits/merges are rare, most insert/delete operations only affect a leaf node
Average node occupancy depends on insertion order
• 2/3rds with random, ½ with insertion in sorted order
Database System Concepts - 7th Edition 14.39 ©Silberschatz, Korth and Sudarshan
Non-Unique Search Keys
Alternatives to scheme described earlier
• Buckets on separate block (bad idea)
• List of tuple pointers with each key
Extra code to handle long lists
Deletion of a tuple can be expensive if there are many duplicates on
search key (why?)
• Worst case complexity may be linear!
Low space overhead, no extra cost for queries
• Make search key unique by adding a record-identifier
Extra storage overhead for keys
Simpler code for insertion/deletion
Widely used
Database System Concepts - 7th Edition 14.40 ©Silberschatz, Korth and Sudarshan
B+-Tree File Organization
B+-Tree File Organization:
• Leaf nodes in a B+-tree file organization store records, instead of pointers
• Helps keep data records clustered even when there are
insertions/deletions/updates
Leaf nodes are still required to be half full
• Since records are larger than pointers, the maximum number of records that
can be stored in a leaf node is less than the number of pointers in a nonleaf
node.
Insertion and deletion are handled in the same way as insertion and deletion of
entries in a B+-tree index.
Database System Concepts - 7th Edition 14.41 ©Silberschatz, Korth and Sudarshan
B+-Tree File Organization (Cont.)
Example of B+-tree File Organization
Good space utilization important since records use more space than pointers.
To improve space utilization, involve more sibling nodes in redistribution during
splits and merges
2n / 3where
Involving 2 siblings in redistribution (to avoid split / merge possible)
results in each node having at least entries
Database System Concepts - 7th Edition 14.42 ©Silberschatz, Korth and Sudarshan
Other Issues in Indexing
Record relocation and secondary indices
• If a record moves, all secondary indices that store record pointers have to be
updated
• Node splits in B+-tree file organizations become very expensive
• Solution: use search key of B+-tree file organization instead of record pointer in
secondary index
Add record-id if B+-tree file organization search key is non-unique
Extra traversal of file organization to locate record
• Higher cost for queries, but node splits are cheap
Database System Concepts - 7th Edition 14.43 ©Silberschatz, Korth and Sudarshan
Indexing Strings
Variable length strings as keys
Variable fanout
Use space utilization as criterion for splitting, not number of pointers
Prefix compression
Key values at internal nodes can be prefixes of full key
Keep enough characters to distinguish entries in the subtrees separated by
the key value
– E.g., “Silas” and “Silberschatz” can be separated by “Silb”
Keys in leaf node can be compressed by sharing common prefixes
Database System Concepts - 7th Edition 14.44 ©Silberschatz, Korth and Sudarshan
Bulk Loading and Bottom-Up Build
Inserting entries one-at-a-time into a B+-tree requires 1 IO per entry
assuming leaf level does not fit in memory
can be very inefficient for loading a large number of entries at a time (bulk
loading)
Efficient alternative 1:
sort entries first (using efficient external-memory sort algorithms discussed later
in Section 12.4)
insert in sorted order
insertion will go to existing page (or cause a split)
much improved IO performance, but most leaf nodes half full
Efficient alternative 2: Bottom-up B+-tree construction
As before sort entries
And then create tree layer-by-layer, starting with leaf level
details as an exercise
Database System Concepts - 7th Edition 14.45 ©Silberschatz, Korth and Sudarshan
B-Tree Index Files
Similar to B+-tree, but B-tree allows search-key values to appear only once;
eliminates redundant storage of search keys.
Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional
pointer field for each search key in a nonleaf node must be included.
Generalized B-tree leaf node
Nonleaf node – pointers Bi are the bucket or file record pointers.
Database System Concepts - 7th Edition 14.46 ©Silberschatz, Korth and Sudarshan
B-Tree Index Files (Cont.)
Advantages of B-Tree indices:
May use less tree nodes than a corresponding B+-Tree.
Sometimes possible to find search-key value before reaching leaf node.
Disadvantages of B-Tree indices:
Only small fraction of all search-key values are found early
Non-leaf nodes are larger, so fan-out is reduced. Thus, B-Trees typically have
greater depth than corresponding B+-Tree
Insertion and deletion more complicated than in B+-Trees
Implementation is harder than B+-Trees.
Typically, advantages of B-Trees do not out weigh disadvantages.
Database System Concepts - 7th Edition 14.47 ©Silberschatz, Korth and Sudarshan
B-Tree Index File Example
B-tree (above) and B+-tree (below) on same data
Database System Concepts - 7th Edition 14.48 ©Silberschatz, Korth and Sudarshan
Indexing on Flash
Random I/O cost much lower on flash
• 20 to 100 microseconds for read/write
Writes are not in-place, and (eventually) require a more expensive erase
Optimum page size therefore much smaller
Bulk-loading still useful since it minimizes page erases
Write-optimized tree structures (discussed later) have been adapted to minimize
page writes for flash-optimized search trees
Database System Concepts - 7th Edition 14.49 ©Silberschatz, Korth and Sudarshan
Indexing in Main Memory
Random access in memory
• Much cheaper than on disk/flash
• But still expensive compared to cache read
• Data structures that make best use of cache preferable
• Binary search for a key value within a large B+-tree node results in many
cache misses
B+- trees with small nodes that fit in cache line are preferable to reduce cache
misses
Key idea: use large node size to optimize disk access, but structure data within a
node using a tree with small node size, instead of using an array.
Database System Concepts - 7th Edition 14.50 ©Silberschatz, Korth and Sudarshan