0% found this document useful (0 votes)
46 views75 pages

Understanding Hashing in Data Structures

detailed notes on hash tables

Uploaded by

lkithlik
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
46 views75 pages

Understanding Hashing in Data Structures

detailed notes on hash tables

Uploaded by

lkithlik
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Hashing:

Hashing in Data Structure-


In data structures,
• Hashing is a well-known technique to search any
particular element among several elements.
• It minimizes the number of comparisons while
performing the search.

Advantage-
• Unlike other searching techniques,
• Hashing is extremely efficient.
• The time taken by it to perform the search does not
depend upon the total number of elements.
• It completes the search with constant time complexity
O(1).
2
Hashing Mechanism-
In hashing,
• An array data structure called as Hash table is used to
store the data items.
• Based on the hash key value, data items are inserted into
the hash table.

Hash Key Value-


• Hash key value is a special value that serves as an index
for a data item.
• It indicates where the data item should be be stored in the
hash table.
• Hash key value is generated using a hash function.

3
4
Hash Function-
• Hash function is a function that maps any big number or
string to a small integer value.
• Hash function takes the data item as an input and returns
a small integer value as an output.
• The small integer value is called as a hash value.
• Hash value of the data item is then used as an index for
storing it into the hash table.

Types of Hash Functions-


• There are various types of hash functions available such
as-
• Mid Square Hash Function
• Division Hash Function
• Folding Hash Function etc

• It depends on the user which hash function he wants to 5


use.
The properties of a good hash function are-
• It is efficiently computable.
• It minimizes the number of collisions.
• It distributes the keys uniformly over the table.

6
Hashing: Collision Resolution Schemes
• Collision Resolution Techniques

• Separate Chaining
– Separate Chaining with String Keys
– The class hierarchy of Hash Tables
– Implementation of Separate Chaining

• Introduction to Collision Resolution using Open


Addressing
– Linear Probing
– Quadratic Probing
– Double Hashing
– Rehashing

• Algorithms for insertion, searching, and deletion in Open


Addressing
• Separate Chaining versus Open-addressing
7
Collision Resolution Techniques

• There are two broad ways of collision resolution:

1. Separate Chaining:: An array of linked list implementation.

2. Open Addressing: Array-based implementation.

(i) Linear probing (linear search)


(ii) Quadratic probing (nonlinear search)
(iii) Double hashing (uses two hash functions)

8
Collision Resolution Techniques

9
Collision Resolution Techniques
Which is the Preferred Technique?

The performance of both the techniques depend on the kind


of operations that are required to be performed on the keys
stored in the hash table-

10
Collision Resolution Techniques

11
12
13
14
15
16
17
18
Separate Chaining
• The hash table is implemented as an array of linked lists.

• Inserting an item, r, that hashes at index i is simply insertion into the linked list at
position i.

• Synonyms are chained in the same linked list.

19
Separate Chaining (cont’d)
• Retrieval of an item, r, with hash address, i, is simply retrieval from the linked list
at position i.
• Deletion of an item, r, with hash address, i, is simply deleting r from the linked list
at position i.
• Example: Load the keys 23, 13, 21, 14, 7, 8, and 15 , in this order, in a hash table
of size 7 using separate chaining with the hash function: h(key) = key % 7
h(23) = 23 % 7 = 2
h(13) = 13 % 7 = 6
h(21) = 21 % 7 = 0
h(14) = 14 % 7 = 0 collision
h(7) = 7 % 7 = 0 collision
h(8) = 8 % 7 = 1
h(15) = 15 % 7 = 1 collision

20
Separate Chaining with String Keys
• Recall that search keys can be numbers, strings or some other object.
• A hash function for a string s = c0c1c2…cn-1 can be defined as:
hash = (c0 + c1 + c2 + … + cn-1) % tableSize
this can be implemented as:
public static int hash(String key, int tableSize){
int hashValue = 0;
for (int i = 0; i < [Link](); i++){
hashValue += [Link](i);
}
return hashValue % tableSize;
}

• Example: The following class describes commodity items:

class CommodityItem {
String name; // commodity name
int quantity; // commodity quantity needed
double price; // commodity price
}
21
Separate Chaining with String Keys (cont’d)
• Use the hash function hash to load the following commodity items into a
hash table of size 13 using separate chaining:
onion 1 10.0
tomato 1 8.50
cabbage 3 3.50
carrot 1 5.50
okra 1 6.50
mellon 2 10.0
potato 2 7.50
Banana 3 4.00
olive 2 15.0
salt 2 2.50
cucumber 3 4.50
mushroom 3 5.50
orange 2 3.00
• Solution:

hash(onion) = (111 + 110 + 105 + 111 + 110) % 13 = 547 % 13 = 1


hash(salt) = (115 + 97 + 108 + 116) % 13 = 436 % 13 = 7
hash(orange) = (111 + 114 + 97 + 110 + 103 + 101)%13 = 636 %13 = 12 22
Separate Chaining with String Keys (cont’d)
0 okra potato

1 onion carrot
2 Item Qty Price h(key)
onion 1 10.0 1
3
tomato 1 8.50 10
4 cabbage 3 3.50 4
cabbage carrot 1 5.50 1
5 okra 1 6.50 0
mellon 2 10.0 10
6 mushroom potato 2 7.50 0
Banana 3 4.0 11
7 olive 2 15.0 10
salt salt 2 2.50 7
8 cucumber 3 4.50 9
mushroom 3 5.50 6
9 cucumber orange 2 3.00 12

10 tomato mellon olive


11 banana
12 orange 23
Separate Chaining with String Keys (cont’d)
• Alternative hash functions for a string
s = c0c1c2…cn-1
exist, some are:
• hash = (c0 + 27 * c1 + 729 * c2) % tableSize
• hash = (c0 + cn-1 + [Link]()) % tableSize
s .length ()−1
• hash = [  26 * k + [Link](k)−' ']%tableSize
k =0

24
Implementing Hash Tables: The Hierarchy Tree

Container AbstractContainer

SearchableContainer

HashTable AbstractHashTable

ChainedHashTable

OpenScatterTable
25
Implementation of Separate Chaining
public class ChainedHashTable extends AbstractHashTable {
protected MyLinkedList [ ] array;
public ChainedHashTable(int size) {
array = new MyLinkedList[size];
for(int j = 0; j < size; j++)
array[j] = new MyLinkedList( );
}
public void insert(Object key) {
array[h(key)].append(key); count++;
}
public void withdraw(Object key) {
array[h(key)].extract(key); count--;
}
public Object find(Object key){
int index = h(key);
[Link] e = array[index].getHead( );
while(e != null){
if([Link]([Link]()) return [Link]();
e = [Link]();
}
return null;
}
26
}
Introduction to Open Addressing
• All items are stored in the hash table itself.
• In addition to the cell data (if any), each cell keeps one of the three states: EMPTY,
OCCUPIED, DELETED.
• While inserting, if a collision occurs, alternative cells are tried until an empty cell is found.

• Deletion: (lazy deletion): When a key is deleted the slot is marked as DELETED rather than
EMPTY otherwise subsequent searches that hash at the deleted cell will fail.

• Probe sequence: A probe sequence is the sequence of array indexes that is followed in
searching for an empty cell during an insertion, or in searching for a key during find or delete
operations.

• The most common probe sequences are of the form:


hi(key) = [h(key) + c(i)] % n, for i = 0, 1, …, n-1.
where h is a hash function and n is the size of the hash table
• The function c(i) is required to have the following two properties:
Property 1: c(0) = 0
Property 2: The set of values {c(0) % n, c(1) % n, c(2) % n, . . . , c(n-1) % n} must be a
permutation of {0, 1, 2,. . ., n – 1}, that is, it must contain every integer between 0 and n - 1
inclusive.

27
Introduction to Open Addressing

28
Introduction to Open Addressing

29
Introduction to Open Addressing

30
Introduction to Open Addressing

31
Introduction to Open Addressing (cont’d)
• The function c(i) is used to resolve collisions.

• To insert item r, we examine array location h0(r) = h(r). If there is a collision, array locations
h1(r), h2(r), ..., hn-1(r) are examined until an empty slot is found.

• Similarly, to find item r, we examine the same sequence of locations in the same order.

• Note: For a given hash function h(key), the only difference in the open addressing collision
resolution techniques (linear probing, quadratic probing and double hashing) is in the
definition of the function c(i).
• Common definitions of c(i) are:

Collision resolution technique c(i)


Linear probing i
Quadratic probing ±i2
Double hashing i*hp(key)

where hp(key) is another hash function. 32


Introduction to Open Addressing (cont'd)
• Advantages of Open addressing:
– All items are stored in the hash table itself. There is no need for
another data structure.
– Open addressing is more efficient storage-wise.

• Disadvantages of Open Addressing:


– The keys of the objects to be hashed must be distinct.
– Dependent on choosing a proper table size.
– Requires the use of a three-state (Occupied, Empty, or Deleted)
flag in each cell.

33
Open Addressing Facts
• In general, primes give the best table sizes.

• With any open addressing method of collision resolution,


as the table fills, there can be a severe degradation in the table performance.

• Load factors between 0.6 and 0.7 are common.

• Load factors > 0.7 are undesirable.

• The search time depends only on the load factor, not on the table size.

• We can use the desired load factor to determine appropriate table size:

34
Open Addressing Facts

35
Open Addressing Facts

36
Open Addressing: Linear Probing
• c(i) is a linear function in i of the form c(i) = a*i.
• Usually c(i) is chosen as:
c(i) = i for i = 0, 1, . . . , tableSize – 1

• The probe sequences are then given by:


hi(key) = [h(key) + i] % tableSize for i = 0, 1, . . . , tableSize – 1

• For c(i) = a*i to satisfy Property 2, a and n must be relatively


prime.

37
Open Addressing: Linear Probing

38
Open Addressing: Linear Probing

39
Open Addressing: Linear Probing

40
Open Addressing: Linear Probing

41
Open Addressing: Linear Probing

42
Linear Probing (cont’d)
Example: Perform the operations given below, in the given order, on
an initially empty hash table of size 13 using linear probing with
c(i) = i and the hash function: h(key) = key % 13:

insert(18), insert(26), insert(35), insert(9), find(15), find(48),


delete(35), delete(40), find(9), insert(64), insert(47), find(35)

• The required probe sequences are given by:


hi(key) = (h(key) + i) % 13 i = 0, 1, 2, . . ., 12

43
Linear Probing (cont’d)
a

Index Status Value

0 O 26
1 E
2 E
3 E
4 E
5 O 18
6 E
7 E
8 O 47
9 D 35
10 O 9
11 E
12 O 64
44
Disadvantage of Linear Probing: Primary Clustering
• Linear probing is subject to a primary clustering phenomenon.
• Elements tend to cluster around table locations that they originally hash to.
• Primary clusters can combine to form larger clusters. This leads to long probe
sequences and hence deterioration in hash table efficiency.

Example of a primary cluster: Insert keys: 18, 41, 22, 44, 59, 32, 31, 73, in this order, in an
originally empty hash table of size 13, using the hash function h(key) = key % 13 and c(i) = i:
h(18) = 5
h(41) = 2
h(22) = 9
h(44) = 5+1
h(59) = 7
h(32) = 6+1+1
h(31) = 5+1+1+1+1+1
h(73) = 8+1+1+1
45
Open Addressing: Quadratic Probing
• Quadratic probing eliminates primary clusters.
• c(i) is a quadratic function in i of the form c(i) = a*i2 + b*i. Usually c(i) is chosen
as:
c(i) = i2 for i = 0, 1, . . . , tableSize – 1
or
c(i) = i2 for i = 0, 1, . . . , (tableSize – 1) / 2

• The probe sequences are then given by:


hi(key) = [h(key) + i2] % tableSize for i = 0, 1, . . . , tableSize – 1
or
hi(key) = [h(key)  i2] % tableSize for i = 0, 1, . . . , (tableSize – 1) / 2

• Note for Quadratic Probing:


➢ Hashtable size should not be an even number; otherwise Property 2 will not be
satisfied.
➢ Ideally, table size should be a prime of the form 4j+3, where j is an integer. This
choice of table size guarantees Property 2. 46
Quadratic Probing (cont’d)
• Example: Load the keys 23, 13, 21, 14, 7, 8, and 15, in this order,
in a hash table of size 7 using quadratic probing with c(i) = i2 and
the hash function: h(key) = key % 7
• The required probe sequences are given by:
hi(key) = (h(key)  i2) % 7 i = 0, 1, 2, 3

47
Quadratic Probing (cont’d)
h0(23) = (23 % 7) % 7 = 2 hi(key) = (h(key)  i2) % 7 i = 0, 1, 2, 3
h0(13) = (13 % 7) % 7 = 6
h0(21) = (21 % 7) % 7 = 0
h0(14) = (14 % 7) % 7 = 0 collision
0 O 21
h1(14) = (0 + 12) % 7 = 1
h0(7) = (7 % 7) % 7 = 0 collision
h1(7) = (0 + 12) % 7 = 1 collision 1 O 14
h-1(7) = (0 - 12) % 7 = -1
NORMALIZE: (-1 + 7) % 7 = 6 collision 2 O 23
h2(7) = (0 + 22) % 7 = 4
h0(8) = (8 % 7)%7 = 1 collision 3 O 15
h1(8) = (1 + 12) % 7 = 2 collision
h-1(8) = (1 - 12) % 7 = 0 collision 4 O 7
h2(8) = (1 + 22) % 7 = 5
h0(15) = (15 % 7)%7 = 1 collision 5 O 8
2
h1(15) = (1 + 1 ) % 7 = 2 collision
h-1(15) = (1 - 12) % 7 = 0 collision 6 O 13
2
h2(15) = (1 + 2 ) % 7 = 5 collision
h-2(15) = (1 - 22) % 7 = -3
NORMALIZE: (-3 + 7) % 7 = 4 collision 48
h3(15) = (1 + 32)%7 = 3
Secondary Clusters
• Quadratic probing is better than linear probing because it eliminates primary
clustering.
• However, it may result in secondary clustering: if h(k1) = h(k2) the probing
sequences for k1 and k2 are exactly the same. This sequence of locations is called
a secondary cluster.
• Secondary clustering is less harmful than primary clustering because secondary
clusters do not combine to form large clusters.
• Example of Secondary Clustering: Suppose keys k0, k1, k2, k3, and k4 are
inserted in the given order in an originally empty hash table using quadratic
probing with c(i) = i2. Assuming that each of the keys hashes to the same array
index x. A secondary cluster will develop and grow in size:

49
Double Hashing
• To eliminate secondary clustering, synonyms must have different probe sequences.

• Double hashing achieves this by having two hash functions that both depend on the
hash key.

• c(i) = i * hp(key) for i = 0, 1, . . . , tableSize – 1


where hp (or h2) is another hash function.

• The probing sequence is:


hi(key) = [h(key) + i*hp(key)]% tableSize for i = 0, 1, . . . , tableSize – 1

• The function c(i) = i*hp(r) satisfies Property 2 provided hp(r) and tableSize are
relatively prime.

• To guarantee Property 2, tableSize must be a prime number.

• Common definitions for hp are :


➢ hp(key) = 1 + key % (tableSize - 1)
➢ hp(key) = q - (key % q) where q is a prime less than tableSize
➢ hp(key) = q*(key % q) where q is a prime less than tableSize
50
Double Hashing (cont'd)
Performance of Double hashing:
– Much better than linear or quadratic probing because it eliminates both primary
and secondary clustering.
– BUT requires a computation of a second hash function hp.

Example: Load the keys 18, 26, 35, 9, 64, 47, 96, 36, and 70 in this order, in an
empty hash table of size 13
(a) using double hashing with the first hash function: h(key) = key % 13 and the
second hash function: hp(key) = 1 + key % 12
(b) using double hashing with the first hash function: h(key) = key % 13 and
the second hash function: hp(key) = 7 - key % 7
Show all computations.

51
Double Hashing (cont’d)

h0(18) = (18%13)%13 = 5 hi(key) = [h(key) + i*hp(key)]% 13


h0(26) = (26%13)%13 = 0 h(key) = key % 13
h0(35) = (35%13)%13 = 9
h0(9) = (9%13)%13 = 9 collision hp(key) = 1 + key % 12
hp(9) = 1 + 9%12 = 10
h1(9) = (9 + 1*10)%13 = 6
h0(64) = (64%13)%13 = 12
h0(47) = (47%13)%13 = 8
h0(96) = (96%13)%13 = 5 collision
hp(96) = 1 + 96%12 = 1
h1(96) = (5 + 1*1)%13 = 6 collision
h2(96) = (5 + 2*1)%13 = 7
h0(36) = (36%13)%13 = 10
h0(70) = (70%13)%13 = 5 collision
hp(70) = 1 + 70%12 = 11
h1(70) = (5 + 1*11)%13 = 3
52
Double Hashing (cont'd)

h0(18) = (18%13)%13 = 5 hi(key) = [h(key) + i*hp(key)]% 13


h0(26) = (26%13)%13 = 0 h(key) = key % 13
h0(35) = (35%13)%13 = 9
h0(9) = (9%13)%13 = 9 collision hp(key) = 7 - key % 7
hp(9) = 7 - 9%7 = 5
h1(9) = (9 + 1*5)%13 = 1
h0(64) = (64%13)%13 = 12
h0(47) = (47%13)%13 = 8
h0(96) = (96%13)%13 = 5 collision
hp(96) = 7 - 96%7 = 2
h1(96) = (5 + 1*2)%13 = 7
h0(36) = (36%13)%13 = 10
h0(70) = (70%13)%13 = 5 collision
hp(70) = 7 - 70%7 = 7
h1(70) = (5 + 1*7)%13 = 12 collision
h2(70) = (5 + 2*7)%13 = 6
53
Implementation of Open Addressing
public class OpenScatterTable extends AbstractHashTable {
protected Entry array[];
protected static final int EMPTY = 0;
protected static final int OCCUPIED = 1;
protected static final int DELETED = 2;

protected static final class Entry {


public int state = EMPTY;
public Comparable object;
// …
}

public OpenScatterTable(int size) {


array = new Entry[size];
for(int i = 0; i < size; i++)
array[i] = new Entry();
}
// …
}

54
Implementation of Open Addressing (Con’t.)
/* finds the index of the first unoccupied slot
in the probe sequence of obj */
protected int findIndexUnoccupied(Comparable obj){
int hashValue = h(obj);
int tableSize = getLength();
int indexDeleted = -1;
for(int i = 0; i < tableSize; i++){
int index = (hashValue + c(i)) % tableSize;
if(array[index].state == OCCUPIED
&& [Link](array[index].object))
throw new IllegalArgumentException(
"Error: Duplicate key");
else if(array[index].state == EMPTY ||
(array[index].state == DELETED &&
[Link](array[index].object)))
return indexDeleted ==-1?index:indexDeleted;
else if(array[index].state == DELETED &&
indexDeleted == -1)
indexDeleted = index;
}
if(indexDeleted != -1) return indexDeleted;
throw new IllegalArgumentException(
"Error: Hash table is full");
}
55
Implementation of Open Addressing (Con’t.)
protected int findObjectIndex(Comparable obj){
int hashValue = h(obj);
int tableSize = getLength();

for(int i = 0; i < tableSize; i++){


int index = (hashValue + c(i)) % tableSize;
if(array[index].state == EMPTY
|| (array[index].state == DELETED
&& [Link](array[index].object)))
return -1;
else if(array[index].state == OCCUPIED
&& [Link](array[index].object))
return index;
}
return -1;
}

public Comparable find(Comparable obj){


int index = findObjectIndex(obj);
if(index >= 0)return array[index].object;
else return null; 56
}
Implementation of Open Addressing (Con’t.)
public void insert(Comparable obj){
if(count == getLength()) throw new ContainerFullException();
else {
int index = findIndexUnoccupied(obj);
// throws exception if an UNOCCUPIED slot is not found
array[index].state = OCCUPIED;
array[index].object = obj;
count++;
}
}

public void withdraw(Comparable obj){


if(count == 0) throw new ContainerEmptyException();
int index = findObjectIndex(obj);
if(index < 0)
throw new IllegalArgumentException("Object not found");
else {
array[index].state = DELETED;
// lazy deletion: DO NOT SET THE LOCATION TO null
count--;
}
} 57
Separate Chaining versus Open-addressing
Separate Chaining has several advantages over open addressing:
• Collision resolution is simple and efficient.
• The hash table can hold more elements without the large
performance deterioration of open addressing (The load factor can
be 1 or greater)
• The performance of chaining declines much more slowly than
open addressing.
• Deletion is easy - no special flag values are necessary.
• Table size need not be a prime number.
• The keys of the objects to be hashed need not be unique.
Disadvantages of Separate Chaining:
• It requires the implementation of a separate data structure for
chains, and code to manage it.
• The main cost of chaining is the extra space required for the
linked lists.
• For some languages, creating new nodes (for linked lists) is
expensive and slows down the system.
58
REHASHING
1. Rehashing is a collision resolution technique.

2. Rehashing is a technique in which the table is resized, i.e., the


size of table is doubled by creating a new table. It is preferable is
the total size of table is a prime number. There are situations in
which the rehashing is required.

• When table is completely full

• With quadratic probing when the table is filled half.

• When insertions fail due to overflow.

In such situations, we have to transfer entries from old table to the


new table by re computing their positions using hash functions.

59
REHASHING
Consider we have to insert the elements 37, 90, 55, 22, 17, 49, and
87. the table size is 10 and will use hash function:

H(key) = key mod tablesize

37 % 10 = 7

90 % 10= 0

55 % 10 = 5

22 % 10 = 2

17 % 10 = 7

49 % 10 = 9
60
REHASHING
Now this table is almost full and if we try to insert more elements
collisions will occur and eventually further insertions will fail.
Hence we will rehash by doubling the table size. The old table
size is 10 then we should double this size for new table, that
becomes 20. But 20 is not a prime number, we will prefer to make
the table size as 23. And new hash function will be

H(key)= key mod 23


37 % 23 = 14
90 % 23 = 21
55 % 23 = 9
22 % 23 = 22
17 % 23 = 17
49 % 23 = 3
87 % 23 = 18
Now the hash table is sufficiently large to accommodate new
insertions.
61
REHASHING
Advantages:
1. This technique provides the programmer a flexibility to enlarge
the table size if required.
2. Only the space gets doubled with simple hash function which
avoids occurrence of collisions.

62
EXTENDIBLE HASHING
• Extendible Hashing is a dynamic hashing method wherein directories, and
buckets are used to hash data. It is an aggressively flexible method in which the
hash function also experiences dynamic changes.

• Main features of Extendible Hashing: The main features in this hashing


technique are:
– Directories: The directories store addresses of the buckets in pointers. An id is
assigned to each directory which may change each time when Directory Expansion
takes place.
– Buckets: The buckets are used to hash the actual data.
– Basic Structure of Extendible Hashing:

63
EXTENDIBLE HASHING
Frequently used terms in Extendible Hashing:
• Directories: These containers store pointers to buckets. Each directory is given a unique
id which may change each time when expansion takes place. The hash function returns
this directory id which is used to navigate to the appropriate bucket. Number of
Directories = 2^Global Depth.
• Buckets: They store the hashed keys. Directories point to buckets. A bucket may
contain more than one pointers to it if its local depth is less than the global depth.
• Global Depth: It is associated with the Directories. They denote the number of bits
which are used by the hash function to categorize the keys. Global Depth = Number of
bits in directory id.
• Local Depth: It is the same as that of Global Depth except for the fact that Local Depth
is associated with the buckets and not the directories. Local depth in accordance with
the global depth is used to decide the action that to be performed in case an overflow
occurs. Local Depth is always less than or equal to the Global Depth.
• Bucket Splitting: When the number of elements in a bucket exceeds a particular size,
then the bucket is split into two parts.
• Directory Expansion: Directory Expansion Takes place when a bucket overflows.
Directory Expansion is performed when the local depth of the overflowing bucket is
equal to the global depth.

64
EXTENDIBLE HASHING
Basic Working of Extendible Hashing:

65
EXTENDIBLE HASHING
Basic Working of Extendible Hashing:
Step 1 – Analyze Data Elements: Data elements may exist in various forms eg. Integer,
String, Float, etc.. Currently, let us consider data elements of type integer. eg: 49.

Step 2 – Convert into binary format: Convert the data element in Binary form. For string
elements, consider the ASCII equivalent integer of the starting character and then convert
the integer into binary form. Since we have 49 as our data element, its binary form is
110001.

Step 3 – Check Global Depth of the directory. Suppose the global depth of the Hash-
directory is 3.

Step 4 – Identify the Directory: Consider the ‘Global-Depth’ number of LSBs in the binary
number and match it to the directory id.
Eg. The binary obtained is: 110001 and the global-depth is 3. So, the hash function will
return 3 LSBs of 110001 viz. 001.

Step 5 – Navigation: Now, navigate to the bucket pointed by the directory with directory-id
001.

Step 6 – Insertion and Overflow Check: Insert the element and check if the bucket
overflows. If an overflow is encountered, go to step 7 followed by Step 8, otherwise, go to
step 9.
66
EXTENDIBLE HASHING
Step 7 – Tackling Over Flow Condition during Data Insertion: Many times, while inserting
data in the buckets, it might happen that the Bucket overflows. In such cases, we need to
follow an appropriate procedure to avoid mishandling of data.
First, Check if the local depth is less than or equal to the global depth. Then choose one of
the cases below.
• Case1: If the local depth of the overflowing Bucket is equal to the global depth, then
Directory Expansion, as well as Bucket Split, needs to be performed. Then increment
the global depth and the local depth value by 1. And, assign appropriate pointers.
Directory expansion will double the number of directories present in the hash structure.
• Case2: In case the local depth is less than the global depth, then only Bucket Split takes
place. Then increment only the local depth value by 1. And, assign appropriate
pointers.

67
EXTENDIBLE HASHING
Step 8 – Rehashing of Split Bucket Elements: The Elements present in the overflowing
bucket that is split are rehashed w.r.t the new global depth of the directory.
Step 9 – The element is successfully hashed.

68
EXTENDIBLE HASHING

69
EXTENDIBLE HASHING

70
EXTENDIBLE HASHING

71
EXTENDIBLE HASHING

72
EXTENDIBLE HASHING

73
EXTENDIBLE HASHING

74
EXTENDIBLE HASHING

75
EXTENDIBLE HASHING

76

You might also like