0% found this document useful (0 votes)

12 views16 pages

Unit 4

This document provides comprehensive notes on hash tables, covering their structure, collision resolution techniques, and performance analysis. It discusses direct addressing versus hashing, various collision resolution methods like chaining and open addressing, and the design of effective hash functions. The document also analyzes the time complexity of hash table operations, emphasizing the importance of load factors and the efficiency of different hashing strategies.

Uploaded by

Parul

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

12 views16 pages

Unit 4

Uploaded by

Parul

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

UNIT 4: HASH TABLES - COMPREHENSIVE

NOTES

TABLE OF CONTENTS
1. Introduction to Hash Tables
2. Direct Addressing vs Hashing
3. Collision Resolution Techniques
4. Hash Function Design
5. Open Addressing Methods
6. Performance Analysis
7. Practical Considerations

1. INTRODUCTION TO HASH TABLES

1.1 What is a Hash Table?
Hash Table (Hash Map): A data structure that implements an associative
array—a structure that maps keys to values using a hash function.
Key Components: 1. Hash Function (h): Maps keys from a large universe
U to slots in a table T of size m 2. Hash Table T: Array of m slots (0 to
m-1) 3. Collision Resolution: Method to handle multiple keys mapping to
the same slot
Basic Idea:
Universe of Keys U (size possibly very large)
↓ (apply hash function h)
Hash Table T (size m, usually much smaller than |U|)

1.2 Problem Context

Motivation: - Cannot store all keys from universe U (too large) - Need to store
only the actual keys K present - Want O(1) average-case search, insert, delete
times
Trade-off: - Direct Addressing: O(1) worst-case, but requires space O(|U|)
(wasteful) - Hash Tables: O(1) average-case, requires space O(|K|) (eﬀicient)

1.3 Key Definitions

Hash Function: h: U → {0, 1, …, m-1}
Hash Value: h(k) is the slot where key k maps

1
Collision: When two distinct keys k� and k� have h(k�) = h(k�)
Load Factor: � = n/m (ratio of elements to table size)

2. DIRECT ADDRESSING VS HASHING

2.1 Direct Addressing
Concept: Store element with key k directly in slot k
Table Size: |U| (potentially huge)
Operations:
Direct-Address-Search(T, k)
return T[k]

Direct-Address-Insert(T, x)
T[[Link]] = x

Direct-Address-Delete(T, x)
T[[Link]] = NIL
Advantages: - O(1) worst-case for search, insert, delete - Simple to understand
and implement
Disadvantages: - Space requirement: O(|U|) (wasteful if |U| » |K|) - Requires
keys to be small integers - Cannot handle infinite or large key universes
Example: If U = {0, 1, …, 10�}, need array of size 10�, but only 100 keys stored

2.2 Hashing Overview

Concept: Use hash function to map keys to smaller table of size m
Table Size: m (where m « |U|)
Key Insight: - Use function h to compute slot: T[h(k)] - Storage reduced to
O(|K|) � O(n) - Trade worst-case for good average-case performance
Basic Hash Table Operations:
Hash-Search(T, k)
return T[h(k)] ← Simple, but collisions complicate this

Hash-Insert(T, x)
T[h([Link])] = x ← Must handle collisions

Hash-Delete(T, x)
T[h([Link])] = NIL ← Must handle collisions

2
Challenge: Collisions—multiple keys hashing to same slot

3. COLLISION RESOLUTION TECHNIQUES

3.1 Chaining (Separate Chaining)
3.1.1 Concept
• Each slot in hash table points to a linked list (chain)
• All elements hashing to same slot are in that slot’s list
• Each chain stores elements independently
Visualization:
Hash Table T (size m):
T[0] → [k�] → [k�] → NIL
T[1] → NIL
T[2] → [k�] → NIL
T[3] → [k�] → [k�] → [k�] → NIL
...
T[m-1] → [k�] → NIL

3.1.2 Operations Chained-Hash-Search(T, k)

return List-Search(T[h(k)], k)
Chained-Hash-Insert(T, x)
List-Prepend(T[h([Link])], x) ← Insert at front of list
Chained-Hash-Delete(T, x)
List-Delete(T[h([Link])], x) ← Delete element x from its list

3.1.3 Advantages and Disadvantages Advantages: - Simple to imple-

ment - Deletion is straightforward (O(1) if lists are doubly-linked) - Works well
with unknown load factors - Load factor � can be > 1
Disadvantages: - Extra space for pointers - May be slower in practice due to
cache misses (following pointers) - Deletion requires access to element, not just
key

3.1.4 When to Use Doubly-Linked Lists Doubly-Linked: - Deletion:

O(1) (no need to search list) - Requires extra “previous” pointer
Singly-Linked: - Deletion: O(length of list) (must find predecessor) - Saves
one pointer per element
Recommendation: Use doubly-linked if deletion is frequent

3
3.1.5 Expected Chain Length If there are n elements and m slots: - Ex-
pected length of each chain: E[|chain|] = n/m = � (load factor) - Longer
chains → worse performance - Keep � = O(1) for good performance

3.2 Open Addressing

3.2.1 Concept
• All elements stored directly in hash table (no external lists)
• When collision occurs, probe alternative slots
• Must find empty slot to insert element
• Load factor � = n/m < 1 (table cannot fill completely)
Basic Idea:
Hash-Insert(T, k)
i = 0
repeat
q = h(k, i) ← Probe sequence depends on key and probe number
if T[q] == NIL
T[q] = k
return q
else
i = i + 1
until i == m
error "hash table overflow"

3.2.2 Probe Sequence Requirement For every key k, the probe sequence:
h(k, 0), h(k, 1), ..., h(k, m-1)
must be a permutation of {0, 1, …, m-1}
This ensures every slot is eventually examined as the table fills.

3.2.3 Types of Open Addressing Three common methods differ in how

probe sequences are determined:
1. Linear Probing
2. Quadratic Probing
3. Double Hashing

3.3 Linear Probing

3.3.1 Definition
h(k, i) = (h�(k) + i) mod m
Probe sequence: h�(k), h�(k)+1, h�(k)+2, …, wrapping around
Example:

4
If h�(k) = 5 and m = 10:
Probe sequence: 5, 6, 7, 8, 9, 0, 1, 2, 3, 4

3.3.2 Advantages and Disadvantages Advantages: - Simple to imple-

ment - No need for secondary hash function - Good cache locality (sequential
access)
Disadvantages: - Primary Clustering: Tends to create long runs of occu-
pied slots - Search scans entire cluster, even if key is not present - Degrades
performance as table fills
Primary Clustering Example:
After insertions, creates pattern:
[k�][k�][k�][empty]...
Inserting new key hashing to position 1 must scan through all 3 elements

3.3.3 When to Use

• Small tables where clustering is minimal
• When cache locality is important
• Simple applications where implementation ease matters

3.4 Double Hashing

3.4.1 Definition
h(k, i) = (h�(k) + i·h�(k)) mod m
Two auxiliary hash functions: - h�(k): Initial probe position - h�(k): Step
size for subsequent probes
Probe sequence: h�(k), h�(k)+h�(k), h�(k)+2·h�(k), …, all mod m
Example (from Unit 4 PDF):
m = 13, h�(k) = k mod 13, h�(k) = 1 + (k mod 11)

Inserting key 14:

h�(14) = 14 mod 13 = 1
h�(14) = 1 + (14 mod 11) = 1 + 3 = 4

Probe sequence: 1, 5, 9, 0, 4, 8, 12, 3, 7, 11, 2, 6, 10

Position 1: occupied
Position 5: occupied
Position 9: empty → insert key 14 here

3.4.2 Key Requirements For probe sequence to visit all m slots: -

h�(k) must be relatively prime to m

5
Strategies: - m is prime: Choose h�(k) < m (automatically relatively prime)
- m is power of 2: Choose h�(k) to always be odd

3.4.3 Number of Probe Sequences Double Hashing: Produces m differ-

ent probe sequences (each h�(k), h�(k) pair is distinct)
Ideal (Independent Uniform Permutation Hashing): Would produce m!
permutations
Comparison: - m! » m (double hashing limited, but still good) - Better than
linear probing (only m sequences) - Approximates ideal behavior well in practice

3.4.4 Advantages and Disadvantages Advantages: - Many possible

probe sequences (m different ones) - Avoids primary clustering of linear probing
- Good empirical performance - No secondary clustering
Disadvantages: - Requires good secondary hash function h�(k) - More complex
than linear probing - Two hash function computations per probe

3.4.5 When to Use

• General-purpose applications
• When good hash functions are available
• Performance more important than simplicity
• Deletions are infrequent (or handled specially)

3.5 Quadratic Probing

3.5.1 Definition
h(k, i) = (h�(k) + c�·i + c�·i²) mod m
where c�, c� are constants
Probe sequence: h�(k), h�(k)+c�+c�, h�(k)+c�+2c�+4c�, …

3.5.2 Characteristics Primary Clustering: Avoided (unlike linear prob-

ing)
Secondary Clustering: Can still occur if h�(k) produces same hash for two
different keys
Probe Sequence Permutation: May not visit all m slots (depends on m and
constants)

3.5.3 When to Use

• Middle ground between linear and double hashing
• When good secondary hash function not available
• Table size constraints allow careful constant selection

6
4. HASH FUNCTION DESIGN
4.1 Properties of Good Hash Functions
4.1.1 Independent Uniform Hashing Assumption Definition: A hash
function h satisfies independent uniform hashing if:
1. Uniformity: Each key is equally likely to hash to any of the m slots
2. Independence: Hashing of one key is independent of another
Mathematically: For any two distinct keys k�, k�: - Pr[h(k�) = j] = 1/m for
all slots j - Pr[h(k�) = j � h(k�) = q] = 1/m² for all j, q
In Practice: Diﬀicult to achieve perfectly, but we can approximate it

4.1.2 Desirable Properties

1. Eﬀiciency: Computable in O(1) time
2. Uniformity: Distributes keys evenly across slots
3. Independence: Unrelated to patterns in data
4. Determinism: Same key always maps to same slot
5. Avalanche Effect: Small change in key → large change in hash value

4.2 Static Hashing Methods

Static Hashing: Single fixed hash function, randomness comes from data
distribution

4.2.1 Division Method Formula:

h(k) = k mod m
Example: - m = 12, k = 100: h(100) = 100 mod 12 = 4
Advantages: - Very fast (single division operation) - Simple to implement
Disadvantages: - Quality depends on choice of m - May not distribute keys
uniformly
Best Practices: - Choose m to be prime - Avoid powers of 2 (keys with
patterns may cluster) - Avoid powers of 10 (for decimal keys)
When m is Prime: - Better distribution of keys - Unrelated patterns less
likely to cause clustering

4.2.2 Multiplication Method Formula:

h(k) = �m(kA mod 1)�

7
where A is a constant 0 < A < 1, and kA mod 1 is fractional part of kA
Alternative (Multiply-Shift):
h(k) = �(a·k mod 2^w) / 2^(w-l)�
where a is random w-bit constant, extract l highest-order bits
Advantages: - Value of m is not critical (can be any power of 2) - Works
well with various data distributions - Eﬀicient to compute with bit operations
Disadvantages: - Requires careful choice of constant A - More complex than
division method
When to Use: - m is power of 2 (eﬀicient in binary) - Need good performance
across different data distributions - Constant A can be pre-computed

4.2.3 Example Hash Functions For integer keys:

h(k) = k mod m (division)
h(k) = �m(k·� mod 1)� (multiplication with � = golden ratio)
For string keys:
h(s) = (Σ s[i]·2^i) mod m
or
h(s) = (Σ s[i]·p^i) mod m where p is prime

4.3 Random Hashing (Universal Hashing)

4.3.1 Motivation Problem with Static Hashing: - Fixed hash function
may work poorly on certain data - No guarantee of good average-case perfor-
mance - Adversarial input could cause worst-case behavior
Solution: Choose hash function randomly from family of functions

4.3.2 Universal Hash Family Definition: A family H of hash functions is

universal if for any two distinct keys k�, k�:
|{h � H : h(k�) = h(k�)}| � |H|/m
Interpretation: For random h chosen from H, collision probability is at most
1/m
Probability bound: Pr[h(k�) = h(k�)] � 1/m

4.3.3 Benefits
1. Works well for any input distribution
2. Randomization comes from algorithm, not data
3. Guarantees good average-case performance

8
4. No adversarial inputs

4.3.4 Example: Universal Hashing for Integers Choose: - Prime p �

|U| - Random integers a, b with 0 � a < p, 0 � b < p, a � 0 - m = size of hash
table
Hash function:
h(k) = ((a·k + b) mod p) mod m
Property: Family of all such functions is universal (with appropriate parame-
ters)

5. ANALYSIS OF HASH TABLE OPERATIONS

5.1 Chaining Analysis
5.1.1 Unsuccessful Search (11.1) Theorem - Unsuccessful Search
with Chaining:
Under independent uniform hashing, an unsuccessful search takes Θ(1 + �)
time on average.
Proof Sketch: - Element searched for is not in table - Search examines entire
chain at h(k) - Expected chain length = � = n/m - Time: O(1) for hash function
+ O(�) for searching chain - Total: Θ(1 + �)

5.1.2 Successful Search (11.2) Theorem - Successful Search with

Chaining:
Under independent uniform hashing, a successful search takes Θ(1 + �) time
on average.
Proof Sketch: - Search is for element x (definitely in table) - Examines x plus
elements inserted before x - On average, half the chain length before x - But
accounting carefully: still O(1 + �)
Key Insight: Same asymptotic time whether search is successful or unsuccess-
ful!

5.1.3 Insertion and Deletion Insertion: - Time: O(1) worst-case

(prepend to chain) - No search needed if element not already there
Deletion (doubly-linked list): - Time: O(1) worst-case - If given element x
(not just key k) - Must use singly-linked if given only key → O(chain length)

9
5.1.4 Overall Performance If � = n/m = O(1) (number of elements pro-
portional to table size): - Search (successful/unsuccessful): O(1) average-case -
Insert: O(1) worst-case - Delete: O(1) worst-case
Conclusion: All dictionary operations in O(1) average-case time with chain-
ing!

5.2 Open Addressing Analysis

5.2.1 Unsuccessful Search (11.6) Theorem - Unsuccessful Search:
Given an open-address hash table with load factor � = n/m < 1, the expected
number of probes in an unsuccessful search is at most 1/(1-�), assuming inde-
pendent uniform permutation hashing.
Proof Sketch: - Let X = number of probes - Probability first i-1 probes hit
occupied slots, i-th hits empty - Bound: Pr[X � i] � �^(i-1) - Expected value:
E[X] = Σ Pr[X � i] � Σ �^(i-1) = 1/(1-�)
Examples: - � = 0.5: E[X] � 2 (on average 2 probes) - � = 0.9: E[X] � 10 (on
average 10 probes) - � = 1.0: E[X] = ∞ (table full)

5.2.2 Insertion (11.7) Corollary - Insertion:

Inserting an element into an open-address table with load factor � < 1 requires
at most 1/(1-�) probes on average.
Proof: Insertion = unsuccessful search + place in first empty slot

5.2.3 Successful Search (11.8) Theorem - Successful Search:

A successful search in open-address table with load factor � < 1 takes at most
(1/�)·ln(1/(1-�)) probes on average.
Proof Sketch: - Element k was i-th to be inserted (for i = 1..n) - Load fac-
tor when inserted: (i-1)/m - Expected probes for insertion: 1/(1-(i-1)/m) -
Average over all: (1/n)·Σ(i=1 to n) 1/(1-(i-1)/m) - This sum evaluates to
(m/(m-n))·ln(m/(m-n))
Interpretation: As load factor increases, successful search becomes much
slower
Examples: - � = 0.5: E � 1.387 probes - � = 0.9: E � 2.559 probes - � = 1.0: E
= H� (harmonic number, very large)

5.2.4 Summary of Open Addressing

Operation Expected Time

Unsuccessful search O(1/(1-�))
Successful search O((1/�)·ln(1/(1-�)))

10
Operation Expected Time
Insertion O(1/(1-�))
Deletion Complex (need DELETED marker)

Keep Load Factor Small: - � = 0.5 gives O(1) expected time - As � → 1,

times increase significantly - Rehash when � gets too large

5.3 Comparison: Chaining vs Open Addressing

Aspect Chaining Open Addressing

Load factor � Can be > 1 Must be < 1
Search (unsuccessful) Θ(1 + �) O(1/(1-�))
Search (successful) Θ(1 + �) O((1/�)·ln(1/(1-�)))
Insertion O(1) O(1/(1-�))
Deletion O(1) or O(�) Diﬀicult (DELETED needed)
Space overhead Pointers None (in-place)
Cache behavior Poor Good
When � = 0.5 O(1.5) O(1) to O(2)
When � = 0.9 O(1.9) O(10) to O(2.6)

Choose Chaining When: - Load factor varies unpredictably - Deletions are

frequent - Simplicity is important
Choose Open Addressing When: - Load factor stays small - Cache locality
matters - Extra space for pointers unacceptable

6. HANDLING DELETIONS IN OPEN ADDRESSING

6.1 The Deletion Problem
Issue: Deleting element by setting slot to NIL breaks probe sequences
Example:
Insert sequence: 10, 22, 15 all hash to slot 5
Using linear probing:
T[5] = 10
T[6] = 22
T[7] = 15

Deleting 10 (set T[5] = NIL):

T[5] = NIL, T[6] = 22, T[7] = 15

11
Search for 22:
Probe T[5] (NIL) → conclude 22 not in table (WRONG!)

6.2 Solution: DELETED Marker

Approach: - Mark deleted slot with special value DELETED (not NIL) - Insert
treats DELETED same as NIL (can reuse slot) - Search passes over DELETED
(key may be further along)
Modified HASH-SEARCH:
Hash-Search(T, k)
i = 0
repeat
q = h(k, i)
if T[q] = k
return q
if T[q] = NIL ← Only stop at NIL, not DELETED
return NIL
i = i + 1
until i = m
return NIL
Modified HASH-INSERT:
Hash-Insert(T, k)
i = 0
repeat
q = h(k, i)
if T[q] = NIL or T[q] = DELETED ← Treat DELETED as empty
T[q] = k
return q
i = i + 1
until i = m
error "table overflow"

6.3 Drawback of DELETED Marker

Problem: Search times no longer depend on load factor �
• Even with mostly empty table, must search past many DELETED slots
• Over time, accumulation of DELETED increases search cost
• Eventually must rehash entire table
When to Rehash: - When too many DELETED entries accumulate - Practical
rule: When DELETED count > some threshold - Rebuild table with new size
and hash function

12
7. PRACTICAL CONSIDERATIONS
7.1 Load Factor Management
7.1.1 Choosing Table Size For Chaining: - Keep m roughly equal to
expected n - Allows � = n/m � 1 = good performance - Can handle � > 1
without major degradation
For Open Addressing: - Keep m > n, so � < 1 - Typical: m � 2n (� � 0.5) -
Ensures 1/(1-�) � 2
Dynamic Resizing: - When � exceeds threshold (e.g., 0.75) - Create new table
of size 2m - Rehash all elements into new table - O(n) time, amortized O(1) per
insertion

7.1.2 Alpha and Performance

� = 0.25: Very sparse, wasted space
� = 0.50: Good balance (Open: 1/(1-�)=2, Chaining: 1.5)
� = 0.75: Getting dense (Open: 1/(1-�)=4, Chaining: 1.75)
� = 1.00: Full (Open: impossible, Chaining: slow)

7.2 Handling Different Key Types

7.2.1 Integer Keys Direct: h(k) = k mod m
Multiply: h(k) = �m·(k·A mod 1)�

7.2.2 String Keys Polynomial Rolling Hash:

h(s) = (s[0] + s[1]·p + s[2]·p² + ... + s[n-1]·p^(n-1)) mod m
where p is prime (e.g., p=31, p=37)
Benefits: - Works well for variable-length strings - Can be computed eﬀiciently
with rolling hash - Good distribution properties

7.2.3 Composite Keys For objects with multiple fields:

h(object) = (h�(field1) + h�(field2) + h�(field3)) mod m
or
h(object) = (h�(field1) XOR h�(field2) XOR h�(field3)) mod m

7.3 Collision Resolution Trade-offs

7.3.1 Space Considerations Chaining: - Extra space for pointers - ~8
bytes per element on 64-bit system - Acceptable trade-off for simplicity and
flexibility
Open Addressing: - No extra pointers - Uses only array space - Tight coupling
of data and structure

13
7.3.2 Time Considerations Chaining: - Consistent O(1 + �) across opera-
tions - Good for unpredictable load factors - Cache misses from pointer-chasing
Open Addressing: - Better for � < 0.5 - Degrades quickly as � → 1 - Cache-
friendly sequential access

7.3.3 Deletion Considerations Chaining: - Simple O(1) deletion (if lists

doubly-linked) - No special markers needed - List can be singly-linked if deletions
rare
Open Addressing: - Requires DELETED marker (complicates search) - Over
time, accumulates DELETED entries - Eventually must rehash entire table

7.4 Implementation Tips

7.4.1 Universal Hash Functions
For prime p � |U|:
h(k) = ((a·k + b) mod p) mod m

where a, b chosen randomly with 0 � a < p, 0 � b < p, a � 0

Advantage: Theoretically provable good behavior
Disadvantage: Needs large prime p, two modulo operations

7.4.2 Collision Avoidance Best Effort: - Choose hash function carefully

- Size table appropriately (not too small, not too large) - Keep load factor in
good range
Can’t Avoid Entirely: - By pigeonhole principle, collisions unavoidable - Must
handle gracefully with good collision resolution

7.4.3 Testing Hash Functions Uniformity Test: - Insert many random

keys - Check distribution across slots - Variance should be low
Collision Test: - Count actual collisions - Compare with expected (n·(1-(1-
1/m)^n)) - Function is good if actual � expected

8. IMPORTANT THEOREMS & RESULTS

Chaining
(11.1) Unsuccessful search: Θ(1 + �) average time
(11.2) Successful search: Θ(1 + �) average time

14
Open Addressing
(11.6) Unsuccessful search: O(1/(1-�)) average probes
(11.7) Insertion: O(1/(1-�)) average probes
(11.8) Successful search: O((1/�)·ln(1/(1-�))) average probes

Properties
Independent Uniform Hashing: Collision probability = 1/m for any two
distinct keys
Universal Hash Family: Pr[collision for random h] � 1/m

KEY CONCEPTS SUMMARY

Essential Ideas
1. Hash Function: Maps large universe to small table via h: U → {0, 1, …,
m-1}
2. Collision Resolution:
• Chaining: Each slot has linked list
• Open Addressing: Find alternative slots (linear, quadratic, double
hashing)
3. Load Factor: � = n/m determines performance
4. Analysis:
• Chaining: O(1 + �) for all operations
• Open Addressing: O(1/(1-�)) for unsuccessful search
5. Hash Function Design:
• Static: Division, multiplication methods
• Random: Universal hashing for provable performance
6. Practical Issues:
• Dynamic resizing when � exceeds threshold
• Handling deletions (DELETED marker)
• Choosing between chaining and open addressing

PRACTICE PROBLEMS
1. Implement hash table with chaining (doubly-linked list)
2. Implement hash table with linear probing

15
3. Compare performance of different hash functions on various input distri-
butions
4. Implement dynamic resizing (grow table when � > 0.75)
5. Analyze collision patterns for given input and hash function
6. Design hash function for composite keys (e.g., (name, id) pairs)
7. Implement universal hashing and verify 1/m collision probability
8. Trace operations (insert, search, delete) with collisions

END OF UNIT 4 COMPREHENSIVE NOTES

Distributed Database Concepts
No ratings yet
Distributed Database Concepts
35 pages
Overview of Hashing in Data Structures
No ratings yet
Overview of Hashing in Data Structures
25 pages
DHTML and CSS: Styling Web Pages
No ratings yet
DHTML and CSS: Styling Web Pages
13 pages
Recurrence Relations in DAA Explained
No ratings yet
Recurrence Relations in DAA Explained
8 pages
Database Indexing Explained: Types & Benefits
No ratings yet
Database Indexing Explained: Types & Benefits
19 pages
Principles of Conventional Software Engineering
No ratings yet
Principles of Conventional Software Engineering
5 pages
Python Web Development Essentials
No ratings yet
Python Web Development Essentials
243 pages
Key Features of Data Mining Explained
No ratings yet
Key Features of Data Mining Explained
1 page
Informal Guidelines for Relational Schemas
No ratings yet
Informal Guidelines for Relational Schemas
24 pages
Inference Rules in DBMS Explained
No ratings yet
Inference Rules in DBMS Explained
19 pages
Resolving Grammar Conflicts in Parsing
No ratings yet
Resolving Grammar Conflicts in Parsing
18 pages
Machine Learning Lab Manual: FIND-S & Candidate-Elimination
No ratings yet
Machine Learning Lab Manual: FIND-S & Candidate-Elimination
25 pages
Technical Interview Questions Summary
No ratings yet
Technical Interview Questions Summary
11 pages
DSA in Java Course Syllabus Overview
No ratings yet
DSA in Java Course Syllabus Overview
15 pages
Fundamentals of Algorithmic Problem Solving
No ratings yet
Fundamentals of Algorithmic Problem Solving
4 pages
Data Stream Mining and Architecture Insights
No ratings yet
Data Stream Mining and Architecture Insights
22 pages
C++ OOP Concepts Handwritten Notes
100% (1)
C++ OOP Concepts Handwritten Notes
20 pages
Client-Server Computing Overview
No ratings yet
Client-Server Computing Overview
23 pages
Book Exchange Combinations Modulo 1000000007
No ratings yet
Book Exchange Combinations Modulo 1000000007
5 pages
Computer Science Interview Questions Guide
No ratings yet
Computer Science Interview Questions Guide
45 pages
Concurrent Execution in DBMS
No ratings yet
Concurrent Execution in DBMS
18 pages
Software Requirements Specification Guide
No ratings yet
Software Requirements Specification Guide
21 pages
Python Machine Learning Lab Manual
No ratings yet
Python Machine Learning Lab Manual
9 pages
Challenges of Mutual Exclusion in Concurrency
No ratings yet
Challenges of Mutual Exclusion in Concurrency
32 pages
Prolog AI Lab Manual for IT Students
No ratings yet
Prolog AI Lab Manual for IT Students
17 pages
Advanced Data Structures Syllabus
No ratings yet
Advanced Data Structures Syllabus
2 pages
3-Tier Architecture in DBMS Explained
No ratings yet
3-Tier Architecture in DBMS Explained
7 pages
358 33 Powerpoint Slides DSC Chapter 15
No ratings yet
358 33 Powerpoint Slides DSC Chapter 15
55 pages
Digital Search Trees and Binary Tries
100% (1)
Digital Search Trees and Binary Tries
25 pages
State Transition Testing in Reservation Systems
100% (6)
State Transition Testing in Reservation Systems
33 pages
Java Event Handling Overview
No ratings yet
Java Event Handling Overview
16 pages
Schema Refinement in DBMS Explained
No ratings yet
Schema Refinement in DBMS Explained
29 pages
Semaphore Synchronization in C Lab
No ratings yet
Semaphore Synchronization in C Lab
6 pages
OOPS Concepts in Java Explained
No ratings yet
OOPS Concepts in Java Explained
5 pages
Understanding Hashing Techniques
No ratings yet
Understanding Hashing Techniques
10 pages
Pitfalls in Relational Database Design
No ratings yet
Pitfalls in Relational Database Design
80 pages
Inverses of Triangular Matrices Explained
100% (1)
Inverses of Triangular Matrices Explained
5 pages
Features of Distributed Database Transparency
No ratings yet
Features of Distributed Database Transparency
6 pages
Simple Calculator with HTML, CSS, JS
No ratings yet
Simple Calculator with HTML, CSS, JS
6 pages
C Standard Library Functions Overview
No ratings yet
C Standard Library Functions Overview
13 pages
Problem Solving Methods in AI
No ratings yet
Problem Solving Methods in AI
52 pages
Understanding SDLC and QA Roles
No ratings yet
Understanding SDLC and QA Roles
18 pages
Handwritten Notes on Stack Operations
No ratings yet
Handwritten Notes on Stack Operations
12 pages
Understanding Data Models in DBMS
No ratings yet
Understanding Data Models in DBMS
23 pages
CBB1201 - Flat QB - 2 Marks
No ratings yet
CBB1201 - Flat QB - 2 Marks
5 pages
Composite Objects in VDM-SL
No ratings yet
Composite Objects in VDM-SL
24 pages
Cooperative Task Scheduling in RTOS
No ratings yet
Cooperative Task Scheduling in RTOS
25 pages
NoSQL Database Models Explained
No ratings yet
NoSQL Database Models Explained
55 pages
Distributed vs. Multiprocessor Systems
100% (1)
Distributed vs. Multiprocessor Systems
3 pages
Essential Python Data Structures Guide
No ratings yet
Essential Python Data Structures Guide
15 pages
Static vs Dynamic Hashing in DBMS
100% (1)
Static vs Dynamic Hashing in DBMS
8 pages
Timestamp Protocols in DBMS
No ratings yet
Timestamp Protocols in DBMS
15 pages
Understanding Hashing and Its Functions
No ratings yet
Understanding Hashing and Its Functions
30 pages
Hashing and Hash Tables Overview
No ratings yet
Hashing and Hash Tables Overview
6 pages
Hash Tables and Collision Handling Techniques
No ratings yet
Hash Tables and Collision Handling Techniques
25 pages
Lecture 5
No ratings yet
Lecture 5
26 pages
Understanding Hash Tables and Hashing
No ratings yet
Understanding Hash Tables and Hashing
5 pages
Understanding Hashing and Collision Resolution
No ratings yet
Understanding Hashing and Collision Resolution
33 pages
Understanding Hash Tables for Fast Data Storage
No ratings yet
Understanding Hash Tables for Fast Data Storage
4 pages
Understanding Hashing in Data Structures
No ratings yet
Understanding Hashing in Data Structures
36 pages
Ethical Notes
No ratings yet
Ethical Notes
20 pages
ML Important Notes
No ratings yet
ML Important Notes
11 pages
International Cyber Laws and Case Studies 451
No ratings yet
International Cyber Laws and Case Studies 451
26 pages
Real Analysis Question Paper 2023
No ratings yet
Real Analysis Question Paper 2023
46 pages
Rachnatamak Lekhan Aur Jansanchar Madhyam VAC
No ratings yet
Rachnatamak Lekhan Aur Jansanchar Madhyam VAC
29 pages
Understanding Cytokine Storms
No ratings yet
Understanding Cytokine Storms
16 pages
Computer Organization Past Papers Guide
No ratings yet
Computer Organization Past Papers Guide
2 pages
Leishman Staining Procedure for Blood Smears
No ratings yet
Leishman Staining Procedure for Blood Smears
7 pages
H.264 / MPEG-4 Part 10 White Paper Variable-Length Coding 1
No ratings yet
H.264 / MPEG-4 Part 10 White Paper Variable-Length Coding 1
7 pages
Algorithm Design and Analysis Guide
No ratings yet
Algorithm Design and Analysis Guide
15 pages
EE 353 Midterm Exam I: Microprocessors
No ratings yet
EE 353 Midterm Exam I: Microprocessors
6 pages
Python Programming Lab Course Overview
No ratings yet
Python Programming Lab Course Overview
6 pages
Visual Guide to Mastering Git Basics
No ratings yet
Visual Guide to Mastering Git Basics
288 pages
Types and Handling of Interrupts in OS
No ratings yet
Types and Handling of Interrupts in OS
27 pages
Subtrees of Node C in Trees
No ratings yet
Subtrees of Node C in Trees
18 pages
Employee Registration Form
No ratings yet
Employee Registration Form
9 pages
Mobile App Development Question Bank
No ratings yet
Mobile App Development Question Bank
4 pages
Exp 1
No ratings yet
Exp 1
13 pages
Introduction to Programming Concepts
No ratings yet
Introduction to Programming Concepts
7 pages
React Native JWT Client with AsyncStorage
No ratings yet
React Native JWT Client with AsyncStorage
24 pages
CPP Cheatsheet
No ratings yet
CPP Cheatsheet
10 pages
Essential Excel Functions Guide
No ratings yet
Essential Excel Functions Guide
8 pages
Delphi Internet Programming Overview
100% (1)
Delphi Internet Programming Overview
14 pages
C•CURE 9000 SDK Integration Guide
No ratings yet
C•CURE 9000 SDK Integration Guide
175 pages
Complete DSA Patterns Reference Guide
No ratings yet
Complete DSA Patterns Reference Guide
9 pages
Introduction To Web Dynpro
No ratings yet
Introduction To Web Dynpro
46 pages
KBTU Lab1: Programming Problems
No ratings yet
KBTU Lab1: Programming Problems
4 pages
C++ Math Operations: Built-in vs User-defined
No ratings yet
C++ Math Operations: Built-in vs User-defined
9 pages
JavaScript Basics: Alerts, Input, Loops
No ratings yet
JavaScript Basics: Alerts, Input, Loops
8 pages
Custom Toast and Date Picker Example
No ratings yet
Custom Toast and Date Picker Example
21 pages
DBMS Model Question Paper 2025
No ratings yet
DBMS Model Question Paper 2025
4 pages
Object-Oriented Programming Basics
100% (1)
Object-Oriented Programming Basics
27 pages
Data Warehousing and Mining Course Overview
No ratings yet
Data Warehousing and Mining Course Overview
124 pages
RFI Dorks by - KIIT HACKERS
No ratings yet
RFI Dorks by - KIIT HACKERS
7 pages
Introduction to Object-Oriented Programming
No ratings yet
Introduction to Object-Oriented Programming
58 pages
Introduction to NumPy in Python
No ratings yet
Introduction to NumPy in Python
29 pages
TCAX Changelog: Version Updates & Fixes
No ratings yet
TCAX Changelog: Version Updates & Fixes
4 pages
Linux Kernel and Device Driver Concepts
100% (1)
Linux Kernel and Device Driver Concepts
114 pages

Unit 4

Uploaded by

Unit 4

Uploaded by

UNIT 4: HASH TABLES - COMPREHENSIVE

1. INTRODUCTION TO HASH TABLES

1.2 Problem Context

1.3 Key Definitions

2. DIRECT ADDRESSING VS HASHING

2.2 Hashing Overview

3. COLLISION RESOLUTION TECHNIQUES

3.1.2 Operations Chained-Hash-Search(T, k)

3.1.3 Advantages and Disadvantages Advantages: - Simple to imple-

3.1.4 When to Use Doubly-Linked Lists Doubly-Linked: - Deletion:

3.2 Open Addressing

3.2.3 Types of Open Addressing Three common methods differ in how

3.3 Linear Probing

3.3.2 Advantages and Disadvantages Advantages: - Simple to imple-

3.3.3 When to Use

3.4 Double Hashing

Inserting key 14:

Probe sequence: 1, 5, 9, 0, 4, 8, 12, 3, 7, 11, 2, 6, 10

3.4.2 Key Requirements For probe sequence to visit all m slots: -

3.4.3 Number of Probe Sequences Double Hashing: Produces m differ-

3.4.4 Advantages and Disadvantages Advantages: - Many possible

3.4.5 When to Use

3.5 Quadratic Probing

3.5.2 Characteristics Primary Clustering: Avoided (unlike linear prob-

3.5.3 When to Use

4.1.2 Desirable Properties

4.2 Static Hashing Methods

4.2.1 Division Method Formula:

4.2.2 Multiplication Method Formula:

4.2.3 Example Hash Functions For integer keys:

4.3 Random Hashing (Universal Hashing)

4.3.2 Universal Hash Family Definition: A family H of hash functions is

4.3.4 Example: Universal Hashing for Integers Choose: - Prime p �

5. ANALYSIS OF HASH TABLE OPERATIONS

5.1.2 Successful Search (11.2) Theorem - Successful Search with

5.1.3 Insertion and Deletion Insertion: - Time: O(1) worst-case

5.2 Open Addressing Analysis

5.2.2 Insertion (11.7) Corollary - Insertion:

5.2.3 Successful Search (11.8) Theorem - Successful Search:

5.2.4 Summary of Open Addressing

Operation Expected Time

Keep Load Factor Small: - � = 0.5 gives O(1) expected time - As � → 1,

5.3 Comparison: Chaining vs Open Addressing

Aspect Chaining Open Addressing

Choose Chaining When: - Load factor varies unpredictably - Deletions are

6. HANDLING DELETIONS IN OPEN ADDRESSING

Deleting 10 (set T[5] = NIL):

6.2 Solution: DELETED Marker

6.3 Drawback of DELETED Marker

7.1.2 Alpha and Performance

7.2 Handling Different Key Types

7.2.2 String Keys Polynomial Rolling Hash:

7.2.3 Composite Keys For objects with multiple fields:

7.3 Collision Resolution Trade-offs

7.3.3 Deletion Considerations Chaining: - Simple O(1) deletion (if lists

7.4 Implementation Tips

where a, b chosen randomly with 0 � a < p, 0 � b < p, a � 0

7.4.2 Collision Avoidance Best Effort: - Choose hash function carefully

7.4.3 Testing Hash Functions Uniformity Test: - Insert many random

8. IMPORTANT THEOREMS & RESULTS

KEY CONCEPTS SUMMARY

END OF UNIT 4 COMPREHENSIVE NOTES

You might also like