# B-Trees

B-trees are balanced search trees designed to work well on magnetic disks or direct-access secondary storage devices. B-trees differ significantly from red-black trees in that B-tree nodes may have many children, from a handful to thousands.
Figure 1: B-Tree where $t=2$ and the sequence 9,0,8,1,7,2,6,3,5,4 inserted.
It often takes more time to access a page of information and read it from a disk than it takes for the computer to examine all the information read. For this reason, we look at the two principal components of the running time:
• the number of disk accesses, and
• the CPU time.
We model these disk operations in our pseudo-code as follows. Let $x$ be a pointer to an object. DISK-READ( $x$) to read object $x$ into main memory; DISK-WRITE( $x$) to write object $x$ to the disk.

## 1  Definition

A B-tree is a rooted tree having the following properties.
1. Every node $x$ has the following fields:
• $n\left[x\right]$, the number of keys currently stored in node $x$
• the $n\left[x\right]$ keys themselves, stored in nondecreasing order: ${\mathrm{key}}_{1}\left[x\right]\le {\mathrm{key}}_{2}\left[x\right]\le \dots \le {\mathrm{key}}_{n\left[x\right]}\left[x\right]$, and
• $\mathrm{leaf}\left[x\right]$, a boolean value that is TRUE if $x$ is a leaf and FALSE if $x$ is an internal node.
2. If $x$ is an internal node, it also contains $n\left[x\right]+1$ pointers ${c}_{1}\left[x\right],{c}_{2}\left[x\right],\dots ,{c}_{n\left[x\right]+1}\left[x\right]$ to its children. Leaf nodes have no children, so their ${c}_{i}$ fields are undefined.
3. The keys ${\mathrm{key}}_{i}\left[x\right]$ separate the ranges of keys stored in each subtree: if ${k}_{i}$ is any key stored in the subtree with root ${c}_{i}\left[x\right]$, then

 ${k}_{1}\le {\mathrm{key}}_{1}\left[x\right]\le {k}_{2}\le {\mathrm{key}}_{2}\left[x\right]\le \dots \le {\mathrm{key}}_{n\left[x\right]}\left[x\right]\le {k}_{n\left[x\right]+1}.$

4. Every leaf has the same depth, which is the tree's height $h$.
5. There are lower and upper bounds on the number of keys a node can contain. These bounds can be expressed in terms of a fixed integer $t\ge 2$ called the minimum degree of the B-tree:
• Every node other than the root must have at least $t-1$ keys. Every internal node other than the root thus has at least $t$ children. If the tree is non-empty, the root must have at least one key.
• Every node can contain at most $2t-1$ keys. Therefore, an internal node can have at most $2t$ children. We say that a node is full if it contains exactly $2t-1$ keys.
THEOREM If $n\ge 1$, then for any $n$-key B-tree $T$ of height $h$ and minimum degree $t\ge 2$, then
 $h\le {\mathrm{log}}_{t}\frac{n+1}{2}.$

## 2  Operations

The operations for B-trees include B-TREE-SEARCH, B-TREE-CREATE, B-TREE-INSERT, B-TREE-DELETE, etc. We assume that:
• the root of the B-tree is always in main memory.
• any nodes that are passed as parameters must already have had a DISK-READ operation performed on them.

### 2.1  Search

B-TREE-SEARCH $\left(x,k\right)$
1     $i$ $←$  $1$
2    while  $i\le n\left[x\right]$ and $k>{\mathrm{key}}_{i}\left[x\right]$
3                  do  $i$ $←$  $i+1$
4    if  $i\le n\left[x\right]$ and $k={\mathrm{key}}_{i}\left[x\right]$
5          then return  $\left(x,i\right)$
6    if  $\mathrm{leaf}\left[x\right]$
7          then return  $\text{NIL}$
8          else   DISK-READ $\left({c}_{i}\left[x\right]\right)$
9                      return  B-TREE-SEARCH $\left({c}_{i}\left[x\right],k\right)$

### 2.2  Creation

B-TREE-CREATE $\left(T\right)$
1     $x$ $←$  ALLOCATE-NODE $\left(\right)$
2     $\mathrm{leaf}\left[x\right]$ $←$  $\text{TRUE}$
3     $n\left[x\right]$ $←$  $0$
4     DISK-WRITE $\left(x\right)$
5     $\mathrm{root}\left[T\right]$ $←$  $x$

### 2.3  Insertion

A fundamental operation used during insertion is the splitting of a full node $y$ (having $2t-1$ keys) around its median key ${\mathrm{key}}_{t}\left[y\right]$ into two nodes having $t-1$ keys each. The median key moves up into $y$'s parent, which must be nonfull prior to the splitting of $y$.
Figure 2: The arguments for the B-TREE-SPLIT-CHILD algorithm.
Figure 3: The result of child splitting.
B-TREE-SPLIT-CHILD $\left(x,i,y\right)$
1     $z$ $←$  ALLOCATE-NODE $\left(\right)$
2     $\mathrm{leaf}\left[z\right]$ $←$  $\mathrm{leaf}\left[y\right]$
3     $n\left[z\right]$ $←$  $t-1$
4    for  $j$ $←$  $1$ to  $t-1$
5             do  ${\mathrm{key}}_{j}\left[z\right]$ $←$  ${\mathrm{key}}_{j+t}\left[y\right]$
6    if not $\mathrm{leaf}\left[y\right]$
7          then for  $j$ $←$  $1$ to  $t$
8                               do  ${c}_{j}\left[z\right]$ $←$  ${c}_{j+t}\left[y\right]$
9     $n\left[y\right]$ $←$  $t-1$
10    for  $j$ $←$  $n\left[x\right]+1$ downto  $i+1$
11             do  ${c}_{j+1}\left[x\right]$ $←$  ${c}_{j}\left[x\right]$
12     ${c}_{i+1}\left[x\right]$ $←$  $z$
13    for  $j$ $←$  $n\left[x\right]$ downto  $i$
14             do  ${\mathrm{key}}_{j+1}\left[x\right]$ $←$  ${\mathrm{key}}_{j}\left[x\right]$
15     ${\mathrm{key}}_{i}\left[x\right]$ $←$  ${\mathrm{key}}_{t}\left[y\right]$
16     $n\left[x\right]$ $←$  $n\left[x\right]+1$
17     DISK-WRITE $\left(y\right)$
18     DISK-WRITE $\left(z\right)$
19     DISK-WRITE $\left(x\right)$
Inserting a key $k$ into a B-tree $T$ of height $h$ is done in a single pass down the tree, requiring $O\left(h\right)$ disk accesses. The CPU time required is $O\left(\mathrm{th}\right)=O\left(t{\mathrm{log}}_{t}n\right)$.
B-TREE-INSERT $\left(T,k\right)$
1     $r$ $←$  $\mathrm{root}\left[T\right]$
2    if  $n\left[r\right]=2t-1$
3          then  $s$ $←$  ALLOCATE-NODE $\left(\right)$
4                       $\mathrm{root}\left[T\right]$ $←$  $s$
5                       $\mathrm{leaf}\left[s\right]$ $←$  $\text{FALSE}$
6                       $n\left[s\right]$ $←$  $0$
7                       ${c}_{1}\left[s\right]$ $←$  $r$
8                       B-TREE-SPLIT-CHILD $\left(s,1,r\right)$
9                       B-TREE-INSERT-NONFULL $\left(s,k\right)$
10          else   B-TREE-INSERT-NONFULL $\left(r,k\right)$
Figure 4: See Fig 18.6 (page 446) - create a new node for root if root is $2t-1$
B-TREE-INSERT-NONFULL $\left(x,k\right)$
1     $i$ $←$  $n\left[x\right]$
2    if  $\mathrm{leaf}\left[x\right]$
3          then while  $i\ge 1$ and $k<{\mathrm{key}}_{i}\left[x\right]$
4                                    do  ${\mathrm{key}}_{i+1}\left[x\right]$ $←$  ${\mathrm{key}}_{i}\left[x\right]$
5                                           $i$ $←$  $i-1$
6                       ${\mathrm{key}}_{i+1}\left[x\right]$ $←$  $k$
7                       $n\left[x\right]$ $←$  $n\left[x\right]+1$
8                       DISK-WRITE $\left(x\right)$
9          else  while  $i\ge 1$ and $k<{\mathrm{key}}_{i}\left[x\right]$
10                                    do  $i$ $←$  $i-1$
11                       $i$ $←$  $i+1$
12                       DISK-READ $\left({c}_{i}\left[x\right]\right)$
13                      if  $n\left[{c}_{i}\left[x\right]\right]=2t-1$
14                            then  B-TREE-SPLIT-CHILD $\left(x,i,{c}_{i}\left[x\right]\right)$
15                                        if  $k>{\mathrm{key}}_{i}\left[x\right]$
16                                              then  $i$ $←$  $i+1$
17                       B-TREE-INSERT-NONFULL $\left({c}_{i}\left[x\right],k\right)$
Figure 5: Inserting the sequence 9,0,8,1,7,2,6,3,5,4 into a B-Tree.

### 2.4  Deletion

1. If the key $k$ is in node $x$ and $x$ is a leaf, delete the key $k$ from $x$.
2. If the key $k$ is in node $x$ and $x$ is an internal node , do the following:
• If the child $y$ that precedes $k$ in node $x$ and $x$ has at least $t$ keys, then find the predecessor $k\text{'}$ of $k$ in the subtree rooted at $y$. Recursively delete $k\text{'}$, and replace $k$ by $k\text{'}$ in $x$.
• Symmetrically, if the child $z$ that follows $k$ in node $x$ has at least $t$ keys, then find the successor $k\text{'}$ of $k$ in the subtree rooted at $z$. Recursively delete $k\text{'}$, and replace $k$ by $k\text{'}$ in $x$.
• Otherwise, if both $y$ and $z$ have only $t-1$ keys, merge $k$ and all of $z$ into $y$, so that $x$ loses both $k$ and the pointer to $z$, and $y$ now contains $2t-1$ keys. Then free $z$ and recursively delete $k$ from $y$.
3. If the key $k$ is not present in internal node $x$, determine the root ${c}_{i}\left[x\right]$ of the appropriate subtree that must contain $k$, if $k$ is in the tree at all. If ${c}_{i}\left[x\right]$ has only $t-1$ keys, execute step 3a or 3b as necessary to guarantee that we descend to a node containing at least $t$ keys. Then, finish by recursing on the appropriate child of $x$.
1. If ${c}_{i}\left[x\right]$ has only $t-1$ keys but has a sibling with $t$ keys, give ${c}_{i}\left[x\right]$ an extra key by moving a key from $x$ down into ${c}_{i}\left[x\right]$, moving a key from ${c}_{i}\left[x\right]$'s immediate left or right sibling up into $x$, and moving the appropriate child from the sibling into ${c}_{i}\left[x\right]$.
2. If ${c}_{i}\left[x\right]$ and both of ${c}_{i}\left[x\right]$'s immediate siblings have $t-1$ keys, merge ${c}_{i}$ with one sibling., which involves moving a key from $x$ down into the new merged node to become the median key from that node.

File translated from TEX by TTM, version 3.67.
On 31 Mar 2006, 18:12.