- SEARCH( $\mathrm{table},\mathrm{key}$) $\to $ $\mathrm{value}$
- INSERT( $\mathrm{table},\mathrm{key},\mathrm{value}$)
- DELETE( $\mathrm{table},\mathrm{key}$)

DIRECT-ADDRESS-SEARCH $(T,k)$

1 return $T[k]$

DIRECT-ADDRESS-INSERT $(T,k,x)$

1 $T[k] \leftarrow x$

DIRECT-ADDRESS-DELETE $(T,k)$

1 $T[k] \leftarrow \text{NIL}$

When $U$ is large and $K$, the set of keys actually stored, is small, direct addressing wastes a great deal of space; in many applications $U$ is very large relative to $K$. A database index is a good example. Indexing a 32-bit integer column requires one slot for each of the $2^{32}$ possible values, over 4 billion slots. Even in a large database with one million rows, we would be using only about 0.02% of the table to index that column. Furthermore, every integer in that column must be unique for a direct-address table to work. Hash tables solve this problem.
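The three direct-address operations above can be sketched as follows; this is a minimal illustration (the class name is hypothetical), assuming keys are integers drawn from $U = \{0, 1, \dots, m-1\}$.

```python
class DirectAddressTable:
    def __init__(self, m):
        self.slots = [None] * m   # one slot per possible key; None plays the role of NIL

    def search(self, k):
        return self.slots[k]      # DIRECT-ADDRESS-SEARCH: return T[k]

    def insert(self, k, x):
        self.slots[k] = x         # DIRECT-ADDRESS-INSERT: T[k] <- x

    def delete(self, k):
        self.slots[k] = None      # DIRECT-ADDRESS-DELETE: T[k] <- NIL
```

Every operation is a single array access, so all three run in $O(1)$ worst-case time; the cost is the $|U|$ slots of storage discussed above.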

$h : U \to \{0,1,\dots ,m-1\}$

We say that an element with key $k$ hashes to slot $h(k)$; we also say that $h(k)$ is the hash value of key $k$.

CHAINED-HASH-SEARCH $(T,k)$

1 Search for an element with key $k$ in list $T[h(k)]$

CHAINED-HASH-INSERT $(T,x)$

1 Insert $x$ at the head of list $T[h(\mathrm{key}[x])]$

CHAINED-HASH-DELETE $(T,x)$

1 Delete $x$ from the list $T[h(\mathrm{key}[x])]$

Given a hash table with $m$ slots storing $n$ elements, the load factor $\alpha$ is defined as

$\alpha = \frac{n}{m}$

The worst-case behaviour for searching for an element in the table occurs when every element hashes to the same slot; in that case, the search time is $O(n)$. However, the average case is much better, at $\Theta (1+\alpha )$.
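The chained-hash operations above can be sketched in a few lines; this is a hypothetical minimal implementation (the class name and the use of the division-method hash are my choices for illustration), with each slot holding a list of key/value pairs that hash to it.

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]   # one list ("chain") per slot

    def _h(self, k):
        return k % self.m                     # division-method hash, for illustration

    def insert(self, k, v):
        # CHAINED-HASH-INSERT: insert at the head of list T[h(k)]
        self.table[self._h(k)].insert(0, (k, v))

    def search(self, k):
        # CHAINED-HASH-SEARCH: scan the list T[h(k)]
        for key, val in self.table[self._h(k)]:
            if key == k:
                return val
        return None

    def delete(self, k):
        # CHAINED-HASH-DELETE: remove k's pair from the list T[h(k)]
        slot = self._h(k)
        self.table[slot] = [(key, v) for key, v in self.table[slot] if key != k]
```

Insertion at the head of the chain is $O(1)$; search and delete scan one chain, whose expected length is $\alpha = n/m$ under simple uniform hashing.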

Theorem: In a hash table in which collisions are resolved by chaining, an unsuccessful search (and likewise a successful search) takes expected time $\Theta (1+\alpha )$ under the assumption of simple uniform hashing: any given element is equally likely to hash into any of the $m$ slots, independently of where any other element has hashed.

- Hashing by division
- Hashing by multiplication
- Universal hashing

$h(k) = k \bmod m$

For example, if $m=23$ and the key $k=107$, then $h(k)=15$.

- Multiply the key $k$ by a constant $A$ in the range $0<A<1$ and extract the fractional part of $kA$.
- Multiply this value by $m$ and take the floor of the result.

$h(k)=\lfloor m(kA \bmod 1)\rfloor$

where $kA \bmod 1$ denotes the fractional part of $kA$. For example, let $k = 123456$, $m = 10000$, and $A = (\sqrt{5}-1)/2 \approx 0.61803\ldots$ (a value suggested by Knuth). Then:

$\begin{aligned} h(k) &= \lfloor 10000\times (123456\times 0.61803\ldots \bmod 1)\rfloor \\ &= \lfloor 10000\times (76300.0041151\ldots \bmod 1)\rfloor \\ &= \lfloor 10000\times 0.0041151\ldots \rfloor \\ &= \lfloor 41.151\ldots \rfloor \\ &= 41 \end{aligned}$

The advantage of this method is that the value of $m$ is not critical.
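The worked example above can be checked directly; this sketch uses Knuth's suggested constant $A = (\sqrt{5}-1)/2$ (the function name is my own).

```python
import math

# Multiplication-method hash: h(k) = floor(m * (k*A mod 1)).
def h_mult(k, m, A=(math.sqrt(5) - 1) / 2):
    frac = (k * A) % 1.0           # fractional part of k*A
    return math.floor(m * frac)    # scale by m and take the floor
```

Note that a production implementation would use fixed-point arithmetic (multiplying by a word-sized integer and extracting high bits) rather than floating point, but the float version suffices to reproduce the example.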

$h : U\times \{0,1,\dots ,m-1\} \to \{0,1,\dots ,m-1\}$

With open addressing, for each $k$, the probe sequence

$(h(k,0),h(k,1),\dots ,h(k,m-1))$ |

is a permutation of $\{0,1,\dots ,m-1\}$ so that every hash table position is eventually considered as a slot for a new key as the table fills up.

HASH-INSERT $(T,k)$

1 $i \leftarrow 0$
2 repeat $j \leftarrow h(k,i)$
3     if $T[j]=\text{NIL}$
4         then $T[j] \leftarrow k$
5             return $j$
6         else $i \leftarrow i+1$
7 until $i=m$
8 error "Hash table overflow."

HASH-SEARCH $(T,k)$

1 $i \leftarrow 0$
2 repeat $j \leftarrow h(k,i)$
3     if $T[j]=k$
4         then return $j$
5     $i \leftarrow i+1$
6 until $T[j]=\text{NIL}$ or $i=m$
7 return NIL

$h(k,i)=(h'(k)+i) \bmod m$

for $i=0,1,\dots ,m-1$. Given key $k$, the first slot probed is $T[h'(k)]$, the second is $T[h'(k)+1]$, and so on up to slot $T[m-1]$. Then we wrap around to slots $T[0],T[1],\dots $ until we finally probe slot $T[h'(k)-1]$.
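HASH-INSERT and HASH-SEARCH with linear probing can be sketched as below; this is a minimal illustration (class name hypothetical, auxiliary hash $h'(k) = k \bmod m$ chosen for simplicity, and deletion omitted since it requires extra care with open addressing).

```python
class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m            # None plays the role of NIL

    def _h(self, k, i):
        # h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m
        return (k % self.m + i) % self.m

    def insert(self, k):                   # HASH-INSERT: return the slot used
        for i in range(self.m):
            j = self._h(k, i)
            if self.slots[j] is None:
                self.slots[j] = k
                return j
        raise OverflowError("Hash table overflow.")

    def search(self, k):                   # HASH-SEARCH: return the slot or None
        for i in range(self.m):
            j = self._h(k, i)
            if self.slots[j] == k:
                return j
            if self.slots[j] is None:      # an empty slot ends the probe sequence
                return None
        return None
```

Note how search stops early at the first NIL slot: if $k$ were in the table, it would have been inserted no later than that point in the probe sequence.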

$h(k,i)=(h'(k)+c_1 i+c_2 i^2) \bmod m$

where $h'$ is an auxiliary hash function and $c_1$ and $c_2$ are constants. This method works much better than linear probing, but to make full use of the hash table, the values of $c_1$, $c_2$, and $m$ are constrained. Also, if two keys have the same initial probe position, then their probe sequences are identical, since $h(k_1,0)=h(k_2,0)$ implies $h(k_1,i)=h(k_2,i)$. This leads to a milder form of clustering, called secondary clustering.
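As an illustration of the constraint on $c_1$, $c_2$, and $m$: one well-known workable choice is $c_1 = c_2 = 1/2$ with $m$ a power of 2, for which the probe sequence visits every slot exactly once. A sketch (function name hypothetical, $h'(k) = k \bmod m$ assumed):

```python
# Quadratic probing: h(k, i) = (h'(k) + c1*i + c2*i^2) mod m.
# With c1 = c2 = 1/2 the offset is the triangular number i*(i+1)/2, which is
# always an integer; for m a power of 2 the probe sequence is a permutation.
def quad_probe(k, i, m, c1=0.5, c2=0.5):
    hp = k % m                            # auxiliary hash h'(k)
    return int(hp + c1 * i + c2 * i * i) % m
```

Trying all $i$ from $0$ to $m-1$ for a power-of-two $m$ confirms that each slot is probed exactly once.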

$h(k,i)=({h}_{1}(k)+i\,{h}_{2}(k)) \bmod m$

where ${h}_{1}$ and ${h}_{2}$ are auxiliary hash functions. The initial position is $T[{h}_{1}(k)]$; from then on, successive positions are offset from the previous position by ${h}_{2}(k)$, modulo $m$.

$\begin{aligned} {h}_{1}(k) &= k \bmod m, \\ {h}_{2}(k) &= 1+(k \bmod m') \end{aligned}$

where $m'$ is chosen to be slightly less than $m$ (say $m-1$ or $m-2$). For example, with $k=123456$, $m=701$, and $m'=700$, we have ${h}_{1}(k)=80$ and ${h}_{2}(k)=257$. Double hashing improves over linear or quadratic probing in that $\Theta ({m}^{2})$ probe sequences are used rather than $\Theta (m)$: each possible $({h}_{1}(k),{h}_{2}(k))$ pair yields a distinct probe sequence, and as the key varies, the initial probe position and the offset may vary independently.
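The double-hashing probe function, using the auxiliary functions above with $m' = m-1$, can be sketched as follows (function name hypothetical); the test values match the worked numbers in the text.

```python
# Double hashing: h(k, i) = (h1(k) + i * h2(k)) mod m,
# with h1(k) = k mod m and h2(k) = 1 + (k mod m'), where m' = m - 1.
def double_hash_probe(k, i, m):
    h1 = k % m
    h2 = 1 + (k % (m - 1))    # always in 1..m-1, so the offset is never 0
    return (h1 + i * h2) % m
```

Choosing $m$ prime (such as 701) guarantees that $h_2(k)$ is relatively prime to $m$, so the probe sequence visits all $m$ slots.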

File translated on 31 Mar 2006, 18:12.