Summary of our course so far
Introduction to databases
Relational Algebra
Basic SQL commands (select, insert,delete,inner
join, set operations)
Advanced SQL commands (stored procedures,
views, aggregate functions with group by)
How to design a database ( 0 NF, 1NF, 2NF,3NF)
Secondary Storage file structures, B+ Trees
Indexing
Transactions
Summary of our course so
far
Introduction to databases
Relational Algebra
Basic SQL commands (select, insert,delete,inner
join, set operations)
Advanced SQL commands (stored procedures,
views, aggregate functions with group by)
How to design a database ( 0 NF, 1NF, 2NF,3NF)
Secondary Storage file structures, B+ Trees
Indexing
Transactions
B Trees
• B-tree: Multiway search tree, used for efficient
data storage and retrieval.
• B+ tree: A type of B-tree where data records are
in leaf nodes, enabling faster sequential access.
• Importance:
• Optimizes disk access.
• Used in database indexing and large file systems
• Ideal for range queries and large sequential access
data.
• The "B" in B-tree does not stand for binary but is
thought to derive from "Bayer” founder of the
algorithm or “Balanced”
What is Seek Time? Rotational
Latency? Transmission Time?
Formal definition of B+ Tree
Properties
Properties of a B+ Tree of order v :
– All internal nodes (except root) has at least v keys
and at most 2v keys .
– The root has at least 2 children unless it’s a leaf..
– All leaves are on the same level.
– An internal node with k keys has k+1 children
– Very similar to 2-4 keys we learned in Algorithms
class last semester
Please watch
[Link]
Example: B+ tree with order of 1
Each node must hold at least 1 entry, and at
most 2 entries
Root
40
20 33 51 63
10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*
Example: Search in a B+ tree
order 2
Search: how to find the records with a given
search key value?
– Begin at root, and use key comparisons to go to leaf
Examples: search for 5*, 16*, all data entries >=
24* ...
– The last one is a range search, we need to do the sequential
scan, starting from Root
the first leaf containing a value >= 24.
13 17 24 30
2* 3* 5* 7* 14* 15* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
B+ Trees in Practice
Typical order: 100. Typical fill-factor:
67%.
– average fanout = 133 (i.e, # of pointers in
internal node)
Can often hold top levels in buffer
pool:
– Level 1 = 1 page = 8 Kbytes
– Level 2 = 133 pages = 1 Mbyte
– Level 3 = 17,689 pages = 133 MBytes
Suppose there are 1,000,000,000 data
entries.
– H = log133(1000000000/132) < 4
– The cost is 5 pages read
How to Insert a Data Entry into
a B+ Tree?
Let’s look at several examples
first.
1
Inserting 16*, 8* into Example
B+ tree
Root
13 17 24 30
2* 3* 5* 7* 8* 14* 15* 16*
You overflow
13 17 24 30
2* 3* 5* 7* 8*
One new child (leaf node)
generated; must add one more
pointer to its parent, thus one more
key value as well.
1
Inserting 8* (cont.)
Copy up 13 17 24 30
the middle Entry to be inserted in parent node.
5 (Note that 5 is
s copied up and
value (leaf continues to appear in the leaf.)
split)
2* 3* 5* 7* 8*
5 13 17 24 30 You overflow!
1
Insertion into B+ tree (cont.)
• Understand
difference 5 13 17 24 30
between copy-up
and push-up
• Observe how We split this node, redistribute entries evenly,
and push up middle key.
minimum
occupancy is
guaranteed in
both leaf and Entry to be inserted in parent node.
17 (Note that 17 is pushed up and only
index pg splits. appears once in the index. Contrast
this with a leaf split.)
5 13 24 30
1
Example B+ Tree After Inserting
8*
Root
17
5 13 24 30
2* 3* 5* 7* 8* 14* 15* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
Notice that root was split, leading to increase in height.
1
Inserting a Data Entry into a B+ Tree:
Summary
Find correct leaf L.
Put data entry onto L.
– If L has enough space, done!
– Else, must split L (into L and a new node L2)
• Redistribute entries evenly, put middle key in L2
• copy up middle key.
• Insert index entry pointing to L2 into parent of L.
This can happen recursively
– To split index node, redistribute entries evenly,
but push up middle key. (Contrast with leaf
splits.)
Splits “grow” tree; root split increases
height.
– Tree growth: gets wider or one level taller at 1
Deleting a Data Entry from a
B+ Tree
Examine examples first …
1
Delete 19* and 20*
Root
17
5 13 24 30
2* 3* 5* 7* 8* 14* 16* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*
ow
nde r fl
u 22*
You
22* 24* 27* 29*
Have we still forgot something?
1
Deleting 19* and 20* (cont.)
Root
17
5 13 27 30
2* 3* 5* 7* 8* 14* 16* 22* 24* 27* 29* 33* 34* 38* 39*
Notice how 27 is copied up.
But can we move it up?
Now we want to delete 24
Underflow again! But can we redistribute
this time? 1
w n g!
i
flo sibl
Deleting 24* un
u ew
r
de ith
Observe the two o
Y erg
M
leaf nodes are
merged, and 27 is 30
discarded from
their parent, but … 22* 27* 29* 33* 34* 38* 39*
Observe `pull
down’ of index
entry (below).
New root 5 13 17 30
2* 3* 5* 7* 8* 14* 16* 22* 27* 29* 33* 34* 38* 39*
1
Deleting a Data Entry from a B+
Tree: Summary
Start at root, find leaf L where entry
belongs.
Remove the entry.
– If L is at least half-full, done!
– If L has only d-1 entries,
• Try to re-distribute, borrowing from sibling (adjacent
node with same parent as L).
• If re-distribution fails, merge L and sibling.
If merge occurred, must delete entry
(pointing to L or sibling) from parent of L.
Merge could propagate to root, decreasing
height.
2
Example of Non-leaf Re-
distribution
Tree is shown below during deletion of 24*.
(What could be a possible initial tree?)
In contrast to previous example, can re-
distribute entry from left child of root to
right child. Root
22
5 13 17 20 30
2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*
2
After Re-distribution
Intuitively, entries are re-distributed by
`pushing through’ the splitting entry in the
parent node.
It suffices to re-distribute index entry with
key 20; we’ve re-distributed
Root 17 as well for
illustration. 17
5 13 20 22 30
2* 3* 5* 7* 8* 14* 16* 17* 18* 20* 21* 22* 27* 29* 33* 34* 38* 39*
2
Terminology
Bucket Factor: the number of records which
can fit in a leaf node.
Fan-out : the average number of children of
an internal node.
A B+tree index can be used either as a
primary index or a secondary index.
– Primary index: determines the way the records
are actually stored (also called a sparse index)
– Secondary index: the records in the file are not
grouped in buckets according to keys of secondary
indexes (also called a dense index)
2
Summary
Tree-structured indexes are ideal for range-
searches, also good for equality searches.
B+ tree is a dynamic structure.
– Inserts/deletes leave tree height-balanced; High
fanout (F) means depth rarely more than 3 or 4.
– Almost always better than maintaining a sorted file.
– Typically, 67% occupancy on average.
– If data entries are data records, splits can change
rids!
Most widely used index in database
management systems because of its versatility.
One of the most optimized components of a
DBMS.
2
B+ tree index vs Hash Index
The difference between using a b-tree and a
hash table is that
B+ tree is better with RANGE comparisons ( =,
>, >=, <, <=, or BETWEEN operators)
Hash index is used ONLY FOR EQUALITY
comparisons that use the = or <=> operators.
2
2