0% found this document useful (0 votes)

5 views50 pages

Data Structure and Algorithm Hashing Tables

The document discusses hashing as a data structure that uses hash functions to map values to keys for efficient data access. It outlines various methods for implementing hashing, including direct access tables, open addressing, and chaining, along with their advantages and limitations. The document also covers performance metrics and provides examples of hash functions and collision handling techniques.

Uploaded by

Edward K Baah Quainoo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views50 pages

Data Structure and Algorithm Hashing Tables

Uploaded by

Edward K Baah Quainoo

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

HASHING

Lecture 6
Recommended Text
• Tomas H. Cormen et al., Introduction to
Algorithms, MIT press
Hashing Data Structure
Hashing is an important Data Structure which is
designed to use a special function called the
Hash function which is used to map a given
value with a particular key for faster access of
elements.
The efficiency of mapping depends of the
efficiency of the hash function used.
Hashing Data Structure
Let a hash function H(x) maps the value x at
the index x%10 in an Array.

For example if the list of values is

[11,12,13,14,15] it will be stored at positions
{1,2,3,4,5} in the array or Hash table
respectively.
Hashing Data Structure
Hashing Data Structure
Suppose we want to design a system for storing
employee records keyed using phone numbers.
And we want following queries to be performed
efficiently:
i. Insert a phone number and corresponding
information.
ii. Search a phone number and fetch the
information.
iii. Delete a phone number and related
information.
Hashing Data Structure
We can think of using the following data
structures to maintain information about
different phone numbers.

i. Array of phone numbers and records.

ii. Linked List of phone numbers and records.
iii. Balanced binary search tree with phone
numbers as keys.
iv. Direct Access Table.
For arrays and linked lists
For arrays and linked lists, we need to search
in a linear fashion, which can be costly in
practice.
If we use arrays and keep the data sorted, then
a phone number can be searched in O(Logn)
time using Binary Search, but insert and delete
operations become costly as we have to
maintain sorted order.
Balanced Binary Search
Tree
With balanced binary search tree, we get
moderate search, insert and delete times.
All of these operations can be guaranteed to be
in O(Logn) time.
Direct Access Table
Another solution that one can think of is to use a
DIRECT ACCESS TABLE where we make a big
array and use phone numbers as index in the
array.
An entry in array is NIL if phone number is not
present, else the array entry stores pointer to
records corresponding to phone number.
Time complexity wise this solution is the best
among all, we can do all operations in O(1)
time.
Direct Access Table
Direct Address Table is a data structure that
has the capability of mapping records to their
corresponding keys using arrays.
In direct address tables, records are placed
using their key values directly as indexes.
They facilitate fast searching, insertion and
deletion operations.
Direct Access Table
Example:
Create an array of size equal to maximum value
plus one (assuming 0 based index) and then use
values as indexes.
For example, in the following diagram key 21 is
used directly as index.
Direct Access Table
Advantages
Searching in O(1) Time: Direct address tables
use arrays which are random access data
structure, so, the key values (which are also the
index of the array) can be easily used to search
the records in O(1) time.
Insertion in O(1) Time: We can easily insert
an element in an array in O(1) time. The same
thing follows in a direct address table also.
Advantages
Deletion in O(1) Time: Deletion of an element
takes O(1) time in an array. Similarly, to delete
an element in a direct address table we need
O(1) time.
Advantages
Deletion in O(1) Time: Deletion of an element
takes O(1) time in an array. Similarly, to delete
an element in a direct address table we need
O(1) time.
Limitations
i. Prior knowledge of maximum key value
ii. Practically useful only if the maximum value
is very less.
iii. It causes wastage of memory space if there
is a significant difference between total
records and maximum value.
Limitations
This solution has many practical limitations.
First problem with this solution is that a huge
extra space is required.
For example if phone number is n digits, we need
O(m * 10n) space for table where m is size of a
pointer to record and n, digits of the telephone
number
Another problem is an integer in a
programming language may not store n digits.
Hashing Data Structure
Due to above limitations Direct Access Table
cannot always be used. Hashing is the solution
that can be used in almost all such situations
and performs extremely well compared to above
data structures like Array, Linked List, Balanced
BST in practice.
With hashing we get O(1) search time on
average (under reasonable assumptions) and
O(n) in worst case.
Hashing Data Structure
Hashing is an improvement over Direct Access
Table.
The idea is to use hash function that converts
a given phone number or any other key to a
smaller number and uses the small number as
index in a table called hash table.
Hashing Data Structure
Application
i. Database Indexing
ii. Caching
iii. Error Checking
iv. Programme Compilation
v. Password Authentication
Hash Function (Algorithm)
A function that converts a given big phone
number to a small practical integer value.
The mapped integer value is used as an index in
hash table. In simple terms, a hash function
maps a big number or string to a small integer
that can be used as index in hash table.
Properties of a Good Hash
Function
i. Efficiently computable.
ii. Should uniformly distribute the keys (Each
table position equally likely for each key)

For example for phone numbers a bad hash

function is to take first three digits.
A better function is consider last three digits.
NB: This may not be the best hash function.
There may be better ways.
Hash Table
An array that stores pointers to records
corresponding to a given phone number.
An entry in hash table is NIL if no existing phone
number has hash function value equal to the
index for the entry.
Collision Handling
Since a hash function gets us a small number
for a big key, there is possibility that two keys
result in same value.
The situation where a newly inserted key maps
to an already occupied slot in hash table is
called collision and must be handled using
some collision handling technique.
Collision Handling
Technique
i. Open Addressing
ii. Chaining
Open Addressing
In open addressing, all elements are stored in
the hash table itself.
Each table entry contains either a record or NIL.
Any point, size of the table must be greater than
or equal to the total number of keys (Note that
we can increase table size by copying old data if
needed)
Open Addressing
When searching for an element, we one by one
examine table slots until the desired element is
found or it is clear that the element is not in the
table.
Open Addressing
Insert(k): Keep probing until an empty slot is
found. Once an empty slot is found, insert k.
Search(k): Keep probing until slot’s key doesn’t
become equal to k or an empty slot is reached
Delete(k): Delete operation is interesting. If we
simply delete a key, then search may fail.
So slots of deleted keys are marked specially as
“deleted”.
Insert can insert an item in a deleted slot, but
the search doesn’t stop at a deleted slot.
Open Addressing (Types)
i. Linear Probing
ii. Quadratic Probing
iii. Double Hashing
Linear Probing
In linear probing, we linearly probe for next slot.
For example, typical gap between two probes is
1.

let hash(x) be the slot index computed using

hash function and S be the table size
Linear Probing
If slot hash(x) % S is full,
then we try (hash(x) + 1) % S

If (hash(x) + 1) % S is also full,

then we try (hash(x) + 2) % S

If (hash(x) + 2) % S is also full,

then we try (hash(x) + 3) % S
..................................................
..................................................
Linear Probing
Let us consider a simple hash function as “key
mod 7” and sequence of keys as 50, 700, 76,
85, 92, 73, 101.
Linear Probing
Let us consider a simple hash function as “key
mod 7” and sequence of keys as 50, 700, 76,
85, 92, 73, 101.
Linear Probing
Clustering: The main problem with linear
probing is clustering, many consecutive
elements form groups and it starts taking time
to find a free slot or to search an element.
Quadratic Probing
We look for i2th slot in ith iteration.
let hash(x) be the slot index computed using hash
function.
If slot hash(x) % S is full,
then we try (hash(x) + 1*1) % S
If (hash(x) + 1*1) % S is also full,
then we try (hash(x) + 2*2) % S
If (hash(x) + 2*2) % S is also full,
then we try (hash(x) + 3*3) % S
..................................................
..................................................
Double Hashing
We use another hash function hash2(x) and look for
i*hash2(x) slot in i’th rotation.

let hash(x) be the slot index computed using hash function.

If slot hash(x) % S is full,
then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full,
then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full,
then we try (hash(x) + 3*hash2(x)) % S
.............................................
.............................................
Comparison
Linear probing has the best cache
performance but suffers from clustering.
One more advantage of Linear probing is
easy to compute.
Quadratic probing lies between the two in
terms of cache performance and clustering.
Double hashing has poor cache performance
but no clustering.
Double hashing requires more computation
time as two hash functions need to be
computed.
Performance of Open
Addressing
m = Number of slots in the hash table
n = Number of keys to be inserted in the hash table

Load factor α = n/m ( < 1 )

Expected time to search/insert/delete < 1/(1 - α)

So Search, Insert and Delete take (1/(1 - α)) time

Chaining
The idea is to make each cell of hash table point
to a linked list of records that have same hash
function value.
Chaining is simple, but requires additional
memory outside the table.
Chaining
Consider a simple hash function as “key mod 7”
and sequence of keys as 50, 700, 76, 85, 92, 73,
101.
Chaining
Advantages
i. Simple to implement.
ii. Hash table never fills up, we can always add
more elements to chain.
iii. Less sensitive to the hash function or load
factors.
iv. It is mostly used when it is unknown how
many and how frequently keys may be
inserted or deleted.
Disadvantages
i. Cache performance of chaining is not good
as keys are stored using linked list. Open
addressing provides better cache
performance as everything is stored in same
table.
ii. Wastage of Space (Some Parts of hash table
are never used)
iii. If the chain becomes long, then search time
can become O(n) in worst case.
iv. Uses extra space for links.
Performance of Chaining
Performance of hashing can be evaluated under
the assumption that each key is equally likely to
be hashed to any slot of table (simple uniform
hashing).
Performance of Chaining
m = Number of slots in hash table
n = Number of keys to be inserted in hash table

Load factor α = n/m

Expected time to search = O(1 + α)

Expected time to insert/delete = O(1 + α)

Time complexity of search insert and delete is

O(1) if α is O(1)
Chaining Open Addressing
Chaining is Simpler to Open Addressing requires more
implement. computation.

In chaining, Hash table never

In open addressing, table may
fills up, we can always add more
become full.
elements to chain.
Open addressing requires extra
Chaining is Less sensitive to the
care for to avoid clustering and
hash function or load factors.
load factor.
Chaining is mostly used when it
Open addressing is used when
is unknown how many and how
the frequency and number of
frequently keys may be inserted
keys is known.
or deleted.
Chaining Open Addressing
Open addressing provides better
Cache performance of chaining is
cache performance as
not good as keys are stored using
everything is stored in the same
linked list.
table.
Wastage of Space (Some Parts of In Open addressing, a slot can
hash table in chaining are never be used even if an input doesn’t
used). map to it.
Chaining uses extra space for
No links in Open addressing
links.
Quiz
1. Given the following input (4322,
1334, 1471, 9679, 1989, 6171, 6173,
4199) and the hash function x mod
10, show which of the inputs have the
same hash value.
2. Differentiate between Hashing &
Direct Access Table.
3. The keys 22, 8, 12, 3, 13, 15 and 25
are inserted into an initially empty
hash table of length 9 using open
addressing with hash function h(k) =
k mod 10 and quadratic probing.
What is the resultant hash table?
End of Lecture 6

Understanding Hashing Techniques
No ratings yet
Understanding Hashing Techniques
18 pages
Hashing and Open Addressing Explained
No ratings yet
Hashing and Open Addressing Explained
31 pages
Hashing Techniques: Theory & Implementation
No ratings yet
Hashing Techniques: Theory & Implementation
10 pages
Understanding Hashing in Data Structures
No ratings yet
Understanding Hashing in Data Structures
53 pages
Understanding Hashing and Its Functions
No ratings yet
Understanding Hashing and Its Functions
30 pages
Hash Tables and Collision Strategies
No ratings yet
Hash Tables and Collision Strategies
46 pages
Hash Functions and Collision Resolution
No ratings yet
Hash Functions and Collision Resolution
47 pages
Hashing Data Structures Notes
No ratings yet
Hashing Data Structures Notes
27 pages
Efficient Hash Table Techniques
No ratings yet
Efficient Hash Table Techniques
47 pages
Understanding Hash Tables and Collisions
No ratings yet
Understanding Hash Tables and Collisions
92 pages
Understanding Hash Functions and Collisions
No ratings yet
Understanding Hash Functions and Collisions
73 pages
Understanding Hash Tables in Python
No ratings yet
Understanding Hash Tables in Python
33 pages
Understanding Hashing in Data Structures
No ratings yet
Understanding Hashing in Data Structures
36 pages
ADS Practical
No ratings yet
ADS Practical
67 pages
Double Hashing in Collision Resolution
No ratings yet
Double Hashing in Collision Resolution
22 pages
Searching Algorithms and Hashing Techniques
No ratings yet
Searching Algorithms and Hashing Techniques
37 pages
Understanding Hashing in Data Structures
No ratings yet
Understanding Hashing in Data Structures
78 pages
Hashing Techniques and Collision Resolution
No ratings yet
Hashing Techniques and Collision Resolution
55 pages
Understanding Hash Tables and Hashing
No ratings yet
Understanding Hash Tables and Hashing
16 pages
Hashing La
No ratings yet
Hashing La
24 pages
Understanding Hash Tables and Collision Resolution
No ratings yet
Understanding Hash Tables and Collision Resolution
48 pages
Understanding Hashing Techniques
No ratings yet
Understanding Hashing Techniques
32 pages
Module IIIDS
No ratings yet
Module IIIDS
21 pages
Hashing Techniques in Algorithms
No ratings yet
Hashing Techniques in Algorithms
36 pages
Understanding Hash Functions and Tables
No ratings yet
Understanding Hash Functions and Tables
65 pages
Hashing Techniques and Probing Methods
No ratings yet
Hashing Techniques and Probing Methods
78 pages
Understanding Hash Tables and Functions
No ratings yet
Understanding Hash Tables and Functions
78 pages
Hash Tables and Collision Handling Techniques
No ratings yet
Hash Tables and Collision Handling Techniques
25 pages
Direct Addressing in Hash Tables
No ratings yet
Direct Addressing in Hash Tables
26 pages
Understanding Hash Tables and Functions
No ratings yet
Understanding Hash Tables and Functions
26 pages
Hashing Techniques in Data Structures
No ratings yet
Hashing Techniques in Data Structures
4 pages
Static vs Dynamic Hashing Overview
No ratings yet
Static vs Dynamic Hashing Overview
32 pages
Understanding Hashing Structures
No ratings yet
Understanding Hashing Structures
47 pages
Hashing
100% (1)
Hashing
28 pages
Double Hashing in Open Addressing
No ratings yet
Double Hashing in Open Addressing
15 pages
Understanding Hashing in Computer Science
No ratings yet
Understanding Hashing in Computer Science
38 pages
Direct Address and Hash Tables Explained
No ratings yet
Direct Address and Hash Tables Explained
10 pages
Module 8
No ratings yet
Module 8
15 pages
Hashing and Linked Lists Explained
No ratings yet
Hashing and Linked Lists Explained
7 pages
Understanding Hashing in Data Structures
No ratings yet
Understanding Hashing in Data Structures
39 pages
Understanding Hashing Techniques and Functions
No ratings yet
Understanding Hashing Techniques and Functions
32 pages
Understanding Hashing in Data Structures
No ratings yet
Understanding Hashing in Data Structures
4 pages
Hash Table Implementation and Techniques
100% (1)
Hash Table Implementation and Techniques
30 pages
Chapter 6 Hashing in Data Structure
No ratings yet
Chapter 6 Hashing in Data Structure
17 pages
Hashing Techniques and Time Complexity
No ratings yet
Hashing Techniques and Time Complexity
23 pages
Hashing Techniques and Collision Handling
No ratings yet
Hashing Techniques and Collision Handling
16 pages
Understanding Hashing and Hash Functions
No ratings yet
Understanding Hashing and Hash Functions
60 pages
Hash Table Search Complexity Explained
No ratings yet
Hash Table Search Complexity Explained
43 pages
Understanding Hashing Techniques and Functions
No ratings yet
Understanding Hashing Techniques and Functions
48 pages
Priority Queue Implementation in C Arrays
No ratings yet
Priority Queue Implementation in C Arrays
3 pages
Understanding Hash Functions in DSA
No ratings yet
Understanding Hash Functions in DSA
60 pages
Hashing Techniques and Applications
No ratings yet
Hashing Techniques and Applications
53 pages
Efficient Hashing Techniques Explained
No ratings yet
Efficient Hashing Techniques Explained
23 pages
Hashing Techniques Overview
No ratings yet
Hashing Techniques Overview
30 pages
Open vs Closed Hashing Techniques
No ratings yet
Open vs Closed Hashing Techniques
16 pages
Hashing Techniques in Algorithm Design
No ratings yet
Hashing Techniques in Algorithm Design
38 pages
Understanding Hash Tables and Hashing
No ratings yet
Understanding Hash Tables and Hashing
14 pages
E.V.S Sem 1 Class 2
No ratings yet
E.V.S Sem 1 Class 2
3 pages
Document Control in Quality Assurance
No ratings yet
Document Control in Quality Assurance
2 pages
USA Import Demurrage Rates 2025
No ratings yet
USA Import Demurrage Rates 2025
10 pages
Stalin Triumph and Tragedy 2nd Impression. Edition Volkogonov Full
No ratings yet
Stalin Triumph and Tragedy 2nd Impression. Edition Volkogonov Full
81 pages
Chocolate Market Research Report 2022
No ratings yet
Chocolate Market Research Report 2022
2 pages
Hydraulic Horsepower Calculation Guide
No ratings yet
Hydraulic Horsepower Calculation Guide
4 pages
EC102 Macroeconomics Course Overview
No ratings yet
EC102 Macroeconomics Course Overview
14 pages
Understanding Intake Manifold Functions
No ratings yet
Understanding Intake Manifold Functions
9 pages
Nonlinear Integral Equations Overview
No ratings yet
Nonlinear Integral Equations Overview
350 pages
Gym Startup Cost Breakdown Guide
No ratings yet
Gym Startup Cost Breakdown Guide
9 pages
Simultaneous Equations in Econometrics
No ratings yet
Simultaneous Equations in Econometrics
52 pages
Fire Safety Inspection Report 2025
No ratings yet
Fire Safety Inspection Report 2025
3 pages
Nadi Dasha Systems Overview
No ratings yet
Nadi Dasha Systems Overview
9 pages
Yoshihiro Ginsan Sujihiki Slicer Knife
No ratings yet
Yoshihiro Ginsan Sujihiki Slicer Knife
1 page
Vegetable Farming Patterns in Haryana
No ratings yet
Vegetable Farming Patterns in Haryana
8 pages
Advanced Legal Skills - Introduction To Business Law, Legal Systems, and The Commercial Courts
No ratings yet
Advanced Legal Skills - Introduction To Business Law, Legal Systems, and The Commercial Courts
6 pages
Sustainable Tourism: Principles & Benefits
No ratings yet
Sustainable Tourism: Principles & Benefits
18 pages
Antimicrobial Activity Surfactants
No ratings yet
Antimicrobial Activity Surfactants
8 pages
The Neural Basis of Improved Cognitive Performance by Threat of Shock
No ratings yet
The Neural Basis of Improved Cognitive Performance by Threat of Shock
10 pages
Test Bank For Managerial Economics and Strategy 3rd Edition by by Jeffrey M Perloff
No ratings yet
Test Bank For Managerial Economics and Strategy 3rd Edition by by Jeffrey M Perloff
61 pages
Bible Films: Comprehensive Movie List
No ratings yet
Bible Films: Comprehensive Movie List
15 pages
Technology Trends in Business: Term IX 1 Credit Course, 20 Sessions
No ratings yet
Technology Trends in Business: Term IX 1 Credit Course, 20 Sessions
55 pages
Learning Needs Assessment Plan for Math Teachers
100% (4)
Learning Needs Assessment Plan for Math Teachers
3 pages
Lesson Plan on Biomolecules for Biology
50% (2)
Lesson Plan on Biomolecules for Biology
7 pages
LEWA Ecoflow Eccentric Pump Overview
100% (1)
LEWA Ecoflow Eccentric Pump Overview
2 pages
Toxic and Acquired Metabolic Diseases
No ratings yet
Toxic and Acquired Metabolic Diseases
10 pages
Philippine Airlines E-Ticket Receipt
50% (4)
Philippine Airlines E-Ticket Receipt
3 pages
UAPA English 2 Unit 7 Exercises Guide
No ratings yet
UAPA English 2 Unit 7 Exercises Guide
7 pages
CSS Solar System Animation Project
No ratings yet
CSS Solar System Animation Project
14 pages
Lesson 9 - Osmoregulation
No ratings yet
Lesson 9 - Osmoregulation
18 pages

Data Structure and Algorithm Hashing Tables

Uploaded by

Data Structure and Algorithm Hashing Tables

Uploaded by

HASHING

For example if the list of values is

i. Array of phone numbers and records.

For example for phone numbers a bad hash

let hash(x) be the slot index computed using

If (hash(x) + 1) % S is also full,

If (hash(x) + 2) % S is also full,

let hash(x) be the slot index computed using hash function.

Load factor α = n/m ( < 1 )

Expected time to search/insert/delete < 1/(1 - α)

So Search, Insert and Delete take (1/(1 - α)) time

Load factor α = n/m

Expected time to search = O(1 + α)

Expected time to insert/delete = O(1 + α)

Time complexity of search insert and delete is

In chaining, Hash table never

You might also like