B-trees
Eduardo Laber
David Sotelo
What are B-trees?
• Balanced search trees designed for secondary
storage devices
• Similar to AVL-trees but better at minimizing
disk I/O operations
• Main data structure used by DBMS to store
and retrieve information
What are B-trees?
• Nodes may have many children (from a few to
thousands)
• Branching factor can be quite large
• Every B-tree of n keys has height O(log n)
• In practice, its height is smaller than the height of
an AVL-Tree
B-trees and Branching Factor
Definition of B-trees
B-tree is a rooted tree containing the following
five properties:
1. Every node x has the following attributes:
a) x.n, the number of keys stored in node x
b) The x.n keys: x.key1 ≤ x.key2 ≤ ... ≤ [Link].n
c) The boolen [Link] indicating if x is a leaf or an
internal node
Definition of B-trees
2. If x is an internal node it contains x.n + 1
pointers x.p1 , x.p2 , ... , x.p(x.n + 1) to its children
3. The keys [Link] separate ranges of trees stored
in each subtree ([Link] , [Link]+1 )
4. All leaves have the same depth == tree’s
height.
Definition of B-trees
5. Bounds on the number of keys of a node:
• Let B be a positive integer representing the order of
the B-tree.
• Every node (except the root) must have at least B keys.
• Every node (except the root) must have at most 2B
keys.
• Root is free to contain between 1 and 2B nodes (why?)
Example of a B-tree
Exercise 1
Enumerate all valid B-trees of order 2
that represent the set {1, 2, ... , 8}
Exercise 1
Solution:
4 5
1 2 3 5 6 7 8 1 2 3 4 6 7 8
3 6
1 2 4 5 7 8
The height of a B-tree
Theorem : Let h be the height of a B-tree of n
keys and order B > 1. Then: h ≤ log B (n+1)/2
Proof:
• Root contains at least one key.
• All other nodes contain at least B keys
• At least one key at depth 0
• At least 2B keys at depth 1
• At least 2B2 + B keys at depth 2
• At least 2Bi + Bi-1 + Bi-2 + ... + B keys at depth i
Proof (continued)
h
n 1 2 B i
i 1
h 1
B 1
n 1 2
B 1
h n 1 h
n 2 B 1 B
2
n 1
h log B ■
2
Searching a B-tree
• Similar to searching a binary search tree.
• Multiway branching decision according to the
number of the node’s chidren.
• Recursive procedure with a time complexity of
O(B log B n) for a tree of n keys and order B.
Searching a B-tree
B-TREE-SEARCH (x, k)
1 i=1
2 while i ≤ x.n and k > [Link] do i = i + 1
3 if i ≤ x.n and k == [Link] then return (x, i)
4 if [Link] then return NIL
5 else
DISK-READ([Link])
return B-TREE-SEARCH ([Link] , k)
Searching a B-tree
• Search for the key F
C G K P S
A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key F
C G K P S
A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key F
C G K P S
A B D E F H I L M N O Q R T U
Searching a B-tree
• Search for the key N
C G K P S
A B D E F H I L M N O Q R T U
Searching a B-tree
Lemma: The time complexity of procedure
B-TREE-SEARCH is O(B log B n)
Proof:
• Number of recursive calls is equal to tree’s
height.
• The height of a B-tree is O(log B n)
• Cost between B and 2B iterations per call.
• Total of O(B log B n) steps. ■
Exercise 2
• Suppose that B-TREE-SEARCH is implemented
to use binary search rather than linear search
within each node.
• Show that this changes makes the time
complexity O(lg n), independently of how B
might be chosen as a function of n.
Exercise 2
Solution:
• By using binary search the number of steps of
the algorithm becomes O(lg B log B n) .
• Observe that log B n = lg n / lg B .
• Therefore O(lg B log B n) = O(lg n).
Linear or Binary B-tree search ?
Lemma: If 1 < B < n then lg n ≤ B logB n
Proof: lg n B
B
B log B n log B n
lg B
B
lg n B
n B B n B / B
B
B
lg n B lg n / B lg n / n B
lg n B
/ n lg n lg n
B 1
Inserting a key into a B-tree
• The new key is always inserted into an existing leaf node (why?)
• Firstly we search for the leaf position at which to insert the new
key.
• If such a node is full we split it.
• A split operation splits a full node around its median key into
two nodes having B keys each.
• Median key moves up into splitted node’s parent
(insertion recursive call).
Split operation
• Inserting key F into a full node (B = 2)
A C E G K M O Q
Split operation
• Node found but already full
A C E FG K M O Q
Split operation
• Median key identified
A C E FG K M O Q
Split operation
• Splitting the node
E J
A C F G K M O Q
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)
E J T X
A C F G K M O Q UW Y Z
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)
E J T X
A C F G K MNO Q UW Y Z
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)
E J N T X
A C F G K M O Q UW Y Z
SPLIT
Inserting a key into a B-tree
• Insertion can be propagated upward (B = 2)
E J SPLIT T X
A C F G K M O Q U W Y Z
Inserting a key into a B-tree
B-TREE-INSERT (x, k, y)
1 i=1
2 while i ≤ x.n and k < [Link] do i = i + 1
3 x.n = x.n + 1
4 [Link] = k
5 [Link]+1 = y
6 for j = x.n downto i+1 do
7 [Link] = [Link]-1
8 [Link] = [Link]-1
9 end-for
10 DISK-WRITE(x)
Inserting a key into a B-tree
B-TREE-INSERT (x, k)
11 if x.n > 2*B then
12 [m, z] = SPLIT (x)
13 if [Link] != NIL then
14 DISK-READ ([Link])
15 end-if
16 else
17 [Link] = ALLOCATE-NODE()
18 DISK-WRITE (x)
19 root = [Link]
20 end-else
21 B-TREE-INSERT ([Link], m, z)
22 end-if
Inserting a key into a B-tree
SPLIT (x)
1 z = ALLOCATE-NODE()
2 m = FIND-MEDIAN (x)
3 COPY-GREATER-ELEMENTS(x, m, z)
4 DISK-WRITE (z)
5 COPY-SMALLER-ELEMENTS(x, m, x)
6 DISK-WRITE (x)
7 return [m, z]
Inserting a key into a B-tree
• Function B-TREE-INSERT has three arguments:
– The node x at which an element of key k should be inserted
– The key k to be inserted
– A pointer y to the left child of k to be used as one of the
pointers of x during insertion process.
• There is a global variable named root which is a pointer to the root
of the B-Tree.
• Observe that the field [Link] was not defined as an original B-
tree attribute, but is considered just to simplify the process.
• The fields [Link] should also be updated accordingly.
Inserting a key into a B-tree
Lemma: The time complexity of B-TREE-INSERT is O(B log B n)
Proof:
• Recall that B-TREE-SEARCH function is called first and costs
O(log n) by using binary search. Then, B-TREE-INSERT starts by
visiting a node and proceeds upward.
• At most one node is visited per level/depth and only visited nodes
can be splitted. A most one node is created during the insertion
process. Cost for splitting is proportional to 2B.
• Number of visited nodes is equal to tree’s height and the height of
a B-tree is O(log B n). Cost between B and 2B iterations per visited
node. Total of O(B log B n) steps. ■
Some questions on insertion
• Which split operation increases the tree’s height?
The split of the root of the tree.
• How many DISK-READ operations are executed by the
insertion algorithm?
Every node was read at least twice.
• Does binary search make sense here?
Not exactly. We already pay O(B) to split a node (for
finding the median).
Drawbacks of our insertion method
• Once that the key’s insertion node is found it may be
necessary to read its parent node again (due to splitting).
• DISK-READ/WRITE operations are expensive and would be
executed al least twice for each node in the key’s path.
• It would be necessary to store a nodes’s parent or to use
the recursion stack to keep its reference.
• (Mond and Raz, 1985) provide a solution that spends one
DISK-READ/WRITE per visited node (See at CLRS)
Exercise 3
• Show the results of inserting the keys
E, H, B, A, F, G, C, J, D, I
in order into an empty B-tree of order 1.
Exercise 3
Solution: (final configuration)
E
B G I
A C D F H J
Exercise 4
• Does a B-tree of order 1 is a good choice for a balanced
search tree?
• What about the expression h ≤ log B (n+1)/2 when B =
1?
Deleting a key from a B-tree
• Analogous to insertion but a little more complicated.
• A key can be deleted from any node (not just a leaf) and
can affect its parent and its children (insertion operation
just affect parents).
• One must ensure that a node does not get to small
during deletion (less than B keys).
• As a result deleting a node is the most complex operation
on B-trees. It will be considered in 4 particular cases.
Deleting a key from a B-tree
• Case 1: The key is in a leaf node with more
than B elements.
Procedure:
Just remove the key from the node.
Deleting a key from a B-tree
• Case 1: The key is in a leaf node with more
than B elements (B = 2)
E J T X
A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 1: The key is in a leaf node with more
than B elements (B = 2)
E J T X
A D F G K M O Q U W Y Z
Deleting a key from a B-tree
Case 2: The join procedure
• The key k1 to be deleted is in a leaf x with exactly B elements.
• Let y be a node that is an “adjacent brother” of x.
• Suppose that y has exactly B elements.
Procedure:
Remove the key k1.
Let k2 be the key that separates nodes x and y in their parent.
Join the the nodes x and y and move the key k2 from the parent
to the new joined node.
If the parent of x becomes with B-1 elements and also has an
“adjacent brother” with B elements, apply the join procedure
recursively for the parent of x (seen as x) and its adjacent
brother (seen as y).
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)
K T X
...
H I O Q U W Y Z
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)
F
Parent
K T X
...
H I O Q U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)
F
Parent
K T X
...
H I O U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)
F
Parent
K T X
...
H I O U W Y Z
Node x Node y
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)
F
Parent
K X
...
H I O T U W Y Z
Join
Deleting a key from a B-tree
• Case 2: Delete key Q (B = 2)
K X
...
H I O T U W Y Z
Deleting a key from a B-tree
Case 3: join and split
• The key k1 to be deleted is in a leaf x with exactly B elements.
• Let y be a node that is an “adjacent brother” of x.
• Suppose that y has more than B elements.
Procedure:
Remove the key k1.
Let k2 be the key that separates nodes x and y in their parent.
Join the the nodes x and y and move the key k2 from the parent
to the new joined node z.
Find the median key m of z
Determine the new nodes x and y by splitting z around m.
Insert m into the parent of x and y.
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
E J T X
A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
E J T X
A C D F G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
E J T X
A C D G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
N
Parent
E J T X
A C D G K M O Q U W Y Z
Node y Node x
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
N
Parent
E J T X
A C D G K M O Q U W Y Z
Node y Node x
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
J T X
Join
A C D E G
K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
J T X
Median key
A C D E G
K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
D J T X
Split
A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 3: Delete key F (B = 2)
D J T X
A C E G K M O Q U W Y Z
Deleting a key from a B-tree
Case 4: internal node
• The key k1 to be deleted is in a node x that is not a leaf or
a root.
Procedure:
Let k2 be the smallest key that is greater than k1.
Let y be the node of k2, which will be a leaf.
Insert key k2 into x.
Remove the key k1 from x.
Solve now the problem of removing k2 from a leaf y,
previously considered.
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
D J T X
A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J T X
A C E G K M O Q U W Y Z
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J T X
A C E G K M O Q U W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U X
A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U X
A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U X
A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U X
A C E G K M O Q W Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U
A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U
A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U
A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U
A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
N
Node x
D J U
A C E G K M O Q W X Y Z
Node y
Deleting a key from a B-tree
• Case 4: Delete key T (B = 2)
D J N U
A C E G K M O Q W X Y Z