Hashing Techniques in Data Structures

This document discusses hashing and hash tables. It begins by explaining why hashing is useful for storing and retrieving key-value pairs efficiently. It then discusses hash functions and how they are used to map keys to indices in a hash table. The document covers collision resolution techniques like chaining and open addressing (linear probing, quadratic probing, double hashing). It also discusses concepts like load factor and provides examples of hash functions and their use in Java.

Uploaded by

Haire Kahfi Maa Takaful


CSC508

DATA STRUCTURES
Zulaile Mabni

TOPIC 6 Hashing

CHAPTER OBJECTIVES

Learn about hashing
Learn about hash methods
    Mid-square
    Folding
    Division
Learn about collision resolution
    Open addressing
    Chaining

Malik, Data Structures Using Java

WHY HASHING?
We need a data structure in which finds/searches are very fast.
Insert and delete operations should be fast too.
Objects have unique keys.
    A key may be a single property/attribute value,
    or it may be created from multiple properties/values.

HASH TABLES VS. OTHER DATA STRUCTURES

Goal: maximize efficiency by implementing the operations Insert(), Delete() and Search()/Find() efficiently.

Arrays:
    Not space efficient (assumes we leave empty space for keys not currently in the structure)

Linked lists:
    Space efficient
    Insert(), Delete() and Search()/Find() are not very efficient

Hash tables:
    Better than the above in terms of both space and efficiency

HASH TABLES
A very useful data structure
    Good for storing and retrieving key-value pairs
    Not good for iterating through a list of items

Example applications:
    Storing objects according to ID numbers
        When the ID numbers are widely spread out
        When you don't need to access items in ID order

HASH INDEX/VALUE
A hash value or hash index is used to index the hash table (array).
A hash function takes a key and returns a hash value/index.
    The hash index is an integer (to index an array).

The key is a specific value associated with a specific object being stored in the hash table.
    It is important that the key remain constant for the lifetime of the object.

Hash Tables Conceptual View

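[Figure: a hash table drawn as an array with indices 0 to 7. Each hash value/index points to a bucket (b1 to b4) holding the stored objects: obj1 (key=15), Obj2 (key=36), Obj3 (key=4), Obj4 (key=2) and Obj5 (key=1).]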

HASH TABLES

Hash tables solve these problems by using a much smaller array and mapping keys to array positions with a hash function. Let U be the universe of keys and let the array have size m. A hash function h is a function from U to the array indices 0 to m - 1, that is:

    h : U → {0, 1, ..., m - 1}

[Figure: keys k1, k2, k3, k4 and k6 from U (the universe of keys) mapped into an array with indices 0 to 7: h(k2) = 2, h(k1) = h(k3) = 3, h(k6) = 5, h(k4) = 7.]

HASH FUNCTION

You want a hash function/algorithm that:
    Is fast
    Is easy to compute
    Minimizes the number of collisions
    Creates a good distribution of hash values, so that the items (based on their keys) are distributed evenly through the array

Hash functions can use as input:
    Integer key values
    String key values
    Multipart key values
        Multipart fields, and/or
        Multiple fields

CHOOSING A HASH FUNCTION

The performance of a hash table depends on having a hash function that distributes the keys evenly: uniform hashing is the ideal target.
Choosing a good hash function requires taking into account the kind of data that will be used.
    E.g., hashing on the first letter of a last name will likely cause many collisions, and which letters collide most depends on the nationality of the population.
Most programming languages (including Java) have hash functions built in.

COMMONLY USED HASH METHODS

Mid-Square
    The hash method, h, is computed by squaring the identifier and using an appropriate number of bits from the middle of the square to obtain the bucket address.
    Because the middle bits of a square usually depend on all the characters of the key, different keys are expected to yield different hash addresses with high probability, even if some of the characters are the same.
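
A minimal sketch of the mid-square idea, assuming an int key and a table with 2^r buckets. The class name, the method name and the way the "middle" bits are located are illustrative choices, not taken from the textbook.

```java
// Mid-square hashing sketch for a table with 2^r buckets (hypothetical names).
final class MidSquareHash {
    /** Returns r bits taken from the middle of key * key, a value in [0, 2^r). */
    static int midSquare(int key, int r) {
        long square = (long) key * key;                  // never negative, at most 2^62
        int bitLength = 64 - Long.numberOfLeadingZeros(square);
        int shift = Math.max(0, (bitLength - r) / 2);    // discard the low-order bits
        return (int) ((square >>> shift) & ((1L << r) - 1));
    }

    public static void main(String[] args) {
        // 12 * 12 = 144 = 10010000 in binary; the middle 4 bits are 0100.
        System.out.println(midSquare(12, 4));            // prints 4
    }
}
```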


COMMONLY USED HASH METHODS

Folding
    Key X is partitioned into parts such that all the parts, except possibly the last part, are of equal length.
    The parts are then added, in a convenient way, to obtain the hash address.
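
One possible reading of folding, sketched under stated assumptions: the key is a short string of decimal digits, the parts are at most 9 digits long, and the "convenient way" of combining them is a plain sum reduced modulo the table size. All names are hypothetical.

```java
// Folding sketch: split a digit string into equal-length parts and add them.
final class FoldingHash {
    /** Assumes digits contains only 0-9 and partLength is between 1 and 9. */
    static int fold(String digits, int partLength, int tableSize) {
        int sum = 0;
        for (int start = 0; start < digits.length(); start += partLength) {
            // the last part may be shorter than the others
            int end = Math.min(start + partLength, digits.length());
            sum += Integer.parseInt(digits.substring(start, end));
        }
        return sum % tableSize;
    }

    public static void main(String[] args) {
        // 123 + 456 + 789 = 1368, and 1368 mod 1000 = 368
        System.out.println(fold("123456789", 3, 1000));
    }
}
```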


Division (modular arithmetic)
    h(key) = key mod m
    m is the array size; in general, it should be a prime number.
    Key X is converted into an integer iX.
    This integer is divided by the size of the hash table to get the remainder, which gives the address of X in HT.

THE MOD FUNCTION

Mod stands for modulo.
When you divide x by y, you get a result and a remainder; mod is the remainder.
    8 mod 5 = 3
    9 mod 5 = 4
    10 mod 5 = 0
    15 mod 5 = 0

Thus, for key-value mod M, multiples of M all give the same result, 0.
But multiples of other numbers do not all give the same result.
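
The division method can be sketched in Java as follows. One detail is an addition to the slides: Java's % operator returns a negative remainder for a negative key, so this sketch uses Math.floorMod to keep the index inside the array. The class and method names are hypothetical.

```java
// Division-method sketch: index = key mod tableSize.
final class DivisionHash {
    static int hash(int key, int tableSize) {
        // floorMod always returns a value in [0, tableSize) for tableSize > 0
        return Math.floorMod(key, tableSize);
    }

    public static void main(String[] args) {
        System.out.println(hash(8, 5));   // 3
        System.out.println(hash(15, 5));  // 0, like every other multiple of 5
        System.out.println(hash(-3, 5));  // 2, whereas -3 % 5 would be -3
    }
}
```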

COMMONLY USED HASH METHODS

Multiplication method
    index = floor(((key * someFraction) mod 1) * arraySize)
    where someFraction is typically 0.618
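
A small sketch of the multiplication method using the fraction 0.618 quoted above. Taking "mod 1" as "keep only the fractional part" is the usual reading; the names are hypothetical.

```java
// Multiplication-method sketch: scale the fractional part of key * 0.618.
final class MultiplicationHash {
    static final double FRACTION = 0.618;

    static int hash(int key, int arraySize) {
        double product = key * FRACTION;
        double fractionalPart = product - Math.floor(product);   // "mod 1"
        return (int) Math.floor(fractionalPart * arraySize);
    }

    public static void main(String[] args) {
        // 10 * 0.618 = 6.18; fractional part 0.18; 0.18 * 8 = 1.44; floor gives 1
        System.out.println(hash(10, 8));
    }
}
```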

Java HashMap method
    Create a hash by performing a series of shifts, adds, and xors on the key.
    index = hash mod arraySize
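
To make the idea of mixing bits with shifts and xors concrete, here is a simplified illustration. The single step h ^ (h >>> 16) is an assumption chosen for this sketch; it is not presented as the library's source code, and the exact series of operations in a given Java release may differ.

```java
// Illustrative bit-mixing sketch (hypothetical names, not library source).
final class SpreadHash {
    /** Folds the high 16 bits into the low 16 bits so they influence the index. */
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    /** index = hash mod arraySize, kept non-negative with floorMod. */
    static int indexFor(Object key, int arraySize) {
        return Math.floorMod(spread(key.hashCode()), arraySize);
    }

    public static void main(String[] args) {
        System.out.println(indexFor("hashing", 16));
    }
}
```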

COMMONLY USED HASH METHODS

Suppose that each key is a string. The following Java method uses the division method to compute the address of the key:

    int hashmethod(String insertKey)
    {
        int sum = 0;

        // add up the character codes of the key
        for (int j = 0; j < insertKey.length(); j++)
            sum = sum + (int)(insertKey.charAt(j));

        return (sum % HTSize);   // HTSize is the size of the hash table
    }//end hashmethod

HASH FUNCTIONS & INSERT()

Usage summary:
    int hashValue = hashFunction(int key);
    Or hashValue = hashFunction(String key);
    Or hashValue = hashFunction(itemType item);

Insert method:
    public void insert(int key, itemType item)
    {
        hashValue = hashFunction(key);
        table[hashValue] = item;
    }

HASH TABLES: INSERT EXAMPLE

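For example, if we hash keys in the range 0 to 1000 into a hash table with 5 entries and use h(key) = key mod 5, we get the following sequence of events:

    Insert 2:  2 mod 5 = 2, stored at index 2
    Insert 21: 21 mod 5 = 1, stored at index 1
    Insert 34: 34 mod 5 = 4, stored at index 4
    Insert 54: 54 mod 5 = 4, but index 4 already holds 34

    index   key
    0
    1       21
    2       2
    3
    4       34    <- 54 ???

There is a collision at array entry #4.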
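
The insert example can be replayed with a small sketch. It assumes a table that simply refuses an insert when the slot is already taken, which is enough to show where the collision occurs; the class name and the boolean return value are hypothetical.

```java
// Replays the insert example: h(key) = key mod 5, no collision handling yet.
final class NaiveHashTable {
    private final Integer[] table;   // null marks an empty slot

    NaiveHashTable(int size) {
        table = new Integer[size];
    }

    /** Stores the key and returns true, or returns false if its slot is occupied. */
    boolean insert(int key) {
        int index = Math.floorMod(key, table.length);
        if (table[index] != null) {
            return false;            // collision: another key already lives here
        }
        table[index] = key;
        return true;
    }

    public static void main(String[] args) {
        NaiveHashTable t = new NaiveHashTable(5);
        for (int key : new int[] {2, 21, 34, 54}) {
            System.out.println("Insert " + key + ": " + (t.insert(key) ? "stored" : "collision"));
        }
    }
}
```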

COLLISION RESOLUTION
Algorithms to handle collisions
Two categories of collision resolution techniques
    Open addressing (closed hashing)
    Chaining (open hashing)

DEALING WITH COLLISIONS

A problem arises when two keys hash to the same array entry; this is called a collision. There are two ways to resolve a collision:

    Hashing with chaining (a.k.a. separate chaining): every hash table entry contains a pointer to a linked list of the keys that hash to that entry.
    Hashing with open addressing: every hash table entry contains only one key. If a new key hashes to a table entry that is already filled, systematically examine other table entries until you find an empty entry in which to place the new key.

HASHING WITH CHAINING

The problem is that keys 34 and 54 hash to the same entry (4). We resolve this collision by placing all keys that hash to the same hash table entry in a chain (linked list) or bucket (array) pointed to by this entry.

[Figure: the table after Insert 54 and then Insert 101. Index 1 holds the chain for keys 21 and 101 (101 mod 5 = 1), index 2 holds key 2, and index 4 holds the chain for keys 34 and 54.]
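
A minimal chaining sketch for integer keys, using h(key) = key mod 5 as in the example. Placing each new key at the head of its chain and ignoring duplicates are assumptions of this sketch; the class and method names are hypothetical.

```java
// Separate chaining sketch: each table entry points to a linked list of keys.
final class ChainedHashTable {
    private static final class Node {
        final int key;
        final Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    private final Node[] table;

    ChainedHashTable(int size) {
        table = new Node[size];
    }

    private int hash(int key) {
        return Math.floorMod(key, table.length);
    }

    /** Adds the key at the head of its chain unless it is already stored. */
    void insert(int key) {
        if (contains(key)) return;
        int index = hash(key);
        table[index] = new Node(key, table[index]);
    }

    /** Walks only the chain that the key hashes to. */
    boolean contains(int key) {
        for (Node n = table[hash(key)]; n != null; n = n.next) {
            if (n.key == key) return true;
        }
        return false;
    }

    int chainLength(int index) {
        int length = 0;
        for (Node n = table[index]; n != null; n = n.next) length++;
        return length;
    }
}
```

With a table of size 5, inserting 2, 21, 34, 54 and 101 leaves two keys in the chain at index 4 and two in the chain at index 1.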

OPEN ADDRESSING

Collisions are resolved by systematically examining other table indexes, i0, i1, i2, ..., until an empty slot is located.
The key is first mapped to an array cell using the hash function (e.g. key % array-size).
If there is a collision, find an available array cell.
There are different algorithms to find (to probe for) the next array cell:
    Linear probing
    Quadratic probing
    Random probing
    Double hashing

OPEN ADDRESSING: LINEAR PROBING

Suppose that an item with key X is to be inserted in HT.
Use the hash function to compute the index h(X) of the item in HT.
Suppose h(X) = t. If HT[t] is empty, store the item into this array slot.
Suppose HT[t] is already occupied by another item; a collision occurs.
Linear probing: starting at location t, search the array sequentially to find the next available array slot:
    (t + 1) % HTSize, (t + 2) % HTSize, ..., (t + j) % HTSize
Be sure to wrap around the end of the array!
Stop when you have tried all possible array indices.
If the array is full, you need to throw an exception or, better yet, resize the array.


COLLISION RESOLUTION: OPEN ADDRESSING

Pseudocode implementing linear probing:
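    hIndex = hashmethod(insertKey);
    found = false;

    // probe until an empty slot or the key itself is found
    // (assumes the table is not full)
    while (HT[hIndex] != emptyKey && !found)
        if (HT[hIndex].key == insertKey)
            found = true;
        else
            hIndex = (hIndex + 1) % HTSize;   // wrap around the end of the array

    if (found)
        System.err.println("Duplicate items not allowed");
    else
        HT[hIndex] = newItem;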

HASH TABLES OPEN ADDRESSING (LINEAR PROBING)

h(key) = key mod 8

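[Figure: a table with indices 0 to 7 holding obj1 (key=15) at index 7, Obj4 (key=2) at index 2 and Obj5 (key=1) at index 1. Obj2 (key=28) and Obj3 (key=4) both hash to index 4 (28 mod 8 = 4 mod 8 = 4), so one is stored at index 4 and the other, by linear probing, at index 5.]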
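
A runnable sketch of linear probing with h(key) = key mod 8. Inserting key 28 before key 4 is an assumption made here so that the second of the two colliding keys lands at index 5; the names, the -1 return value for duplicates and the exception for a full table are choices of this sketch.

```java
// Linear probing sketch for integer keys.
final class LinearProbingTable {
    private final Integer[] table;   // null marks an empty slot
    private int count = 0;

    LinearProbingTable(int size) {
        table = new Integer[size];
    }

    /** Inserts the key and returns the index used, or -1 if the key is already stored. */
    int insert(int key) {
        if (count == table.length) {
            throw new IllegalStateException("hash table is full");
        }
        int index = Math.floorMod(key, table.length);
        while (table[index] != null) {
            if (table[index] == key) return -1;      // duplicate
            index = (index + 1) % table.length;      // wrap around the end of the array
        }
        table[index] = key;
        count++;
        return index;
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(8);
        for (int key : new int[] {15, 28, 4, 2, 1}) {
            System.out.println("key " + key + " stored at index " + t.insert(key));
        }
    }
}
```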

RANDOM PROBING
Uses a random number generator to find the next available slot.
The ith slot in the probe sequence is:
    (h(X) + ri) % HTSize
where ri is the ith value in a random permutation of the numbers 1 to HTSize - 1.
Suppose HTSize = 101, h(X) = 26, and r1 = 2, r2 = 5, r3 = 8. The probe sequence of X has the elements 26, 28, 31, 34.
All insertions and searches use the same sequence of random numbers.
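
The probe sequence from the example can be reproduced with a short sketch. It takes the values r1, r2, r3 as given rather than generating a permutation; in practice they would come from one fixed permutation shared by all insertions and searches. The names are hypothetical.

```java
// Random probing sketch: slot i of the sequence is (h(X) + r[i]) % tableSize.
final class RandomProbing {
    /** Returns the home slot followed by one probe per entry of r. */
    static int[] probeSequence(int home, int[] r, int tableSize) {
        int[] sequence = new int[r.length + 1];
        sequence[0] = home;
        for (int i = 0; i < r.length; i++) {
            sequence[i + 1] = (home + r[i]) % tableSize;
        }
        return sequence;
    }

    public static void main(String[] args) {
        // HTSize = 101, h(X) = 26, r = 2, 5, 8 gives 26, 28, 31, 34
        for (int slot : probeSequence(26, new int[] {2, 5, 8}, 101)) {
            System.out.println(slot);
        }
    }
}
```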

QUADRATIC PROBING
In quadratic probing, starting at position t, check the array locations (t + 1) % HTSize, (t + 2^2) % HTSize, ..., (t + i^2) % HTSize.
We do not know if it probes all the positions in the table.
When HTSize is prime, quadratic probing probes about half the table before repeating the probe sequence.
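
The "about half the table" claim can be checked with a small sketch that counts how many distinct slots the sequence (t + i^2) % HTSize visits. The names are hypothetical and t is assumed to be non-negative.

```java
// Quadratic probing sketch: counts the distinct slots reached from position t.
final class QuadraticProbing {
    /** The ith probe location starting from position t. */
    static int probe(int t, int i, int tableSize) {
        return (int) ((t + (long) i * i) % tableSize);
    }

    /** Number of different slots visited by probes i = 0 .. tableSize - 1. */
    static int distinctPositions(int t, int tableSize) {
        boolean[] seen = new boolean[tableSize];
        int distinct = 0;
        for (int i = 0; i < tableSize; i++) {
            int index = probe(t, i, tableSize);
            if (!seen[index]) {
                seen[index] = true;
                distinct++;
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        // For the prime table size 11, only 6 of the 11 slots are ever probed.
        System.out.println(distinctPositions(0, 11));
    }
}
```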


DOUBLE HASHING
Apply a second hash function after the first.
The second hash function, like the first, is dependent on the key.
The secondary hash function must:
    Be different from the first
    And, obviously, not generate a zero

Good algorithm:
    arrayIndex = (arrayIndex + stepSize) % arraySize;
    where stepSize = constant - (key % constant)
    and constant is a prime less than the array size
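
A sketch of the step-size rule, assuming an array of size 11 and the constant 7 purely for illustration. Because key % constant is at most constant - 1, the step is never zero. The names are hypothetical.

```java
// Double hashing sketch: the step between probes depends on the key.
final class DoubleHashing {
    /** Second hash function: a value between 1 and constant, never 0. */
    static int stepSize(int key, int constant) {
        return constant - Math.floorMod(key, constant);
    }

    /** Returns the first few probe locations for the key. */
    static int[] probeSequence(int key, int arraySize, int constant, int probes) {
        int[] sequence = new int[probes];
        int arrayIndex = Math.floorMod(key, arraySize);   // first hash function
        int step = stepSize(key, constant);
        for (int i = 0; i < probes; i++) {
            sequence[i] = arrayIndex;
            arrayIndex = (arrayIndex + step) % arraySize;
        }
        return sequence;
    }
}
```

Keys 25 and 14 both start at index 3 in this setting, but their step sizes are 3 and 7, so their probe sequences separate immediately: 3, 6, 9, 1 versus 3, 10, 6, 2.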

LOAD FACTOR
Understanding the expected load factor will help you determine the efficiency of your hash table implementation and hash functions.
Load factor = number of items in hash table / array size

For open addressing:
    If < 0.5, wasting space
    If > 0.8, overflows significant

For chaining:
    If < 1.0, wasting space
    If > 2.0, then search time to find a specific item may factor in significantly to the [relative] performance
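
The load factor formula is simple enough to sketch directly. The resize check and its threshold parameter are additions for illustration; the names are hypothetical.

```java
// Load factor sketch: number of items divided by array size.
final class LoadFactor {
    static double of(int itemCount, int arraySize) {
        return (double) itemCount / arraySize;
    }

    /** True when the table is fuller than the chosen threshold. */
    static boolean shouldResize(int itemCount, int arraySize, double maxLoadFactor) {
        return of(itemCount, arraySize) > maxLoadFactor;
    }

    public static void main(String[] args) {
        System.out.println(of(6, 8));                  // 0.75
        System.out.println(shouldResize(7, 8, 0.8));   // true, since 0.875 > 0.8
    }
}
```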

HASH TABLES IN JAVA

Java supports a number of hash table classes:
    Hashtable, HashMap, LinkedHashMap, HashSet, ...
    See the Sun Java API documentation [Link]
As a programmer, you don't see the collision detection, chaining, etc.
You can set:
    The initial table size
    The load factor (default is .75)
    The hashCode() hash function
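
A short usage sketch of these three settings with java.util.HashMap. The StudentId key class and the stored values are hypothetical; the two-argument constructor takes the initial capacity and the load factor.

```java
import java.util.HashMap;
import java.util.Map;

// Usage sketch: setting table size, load factor and hashCode() for a HashMap.
final class StudentDirectory {
    /** Hypothetical key class: equal IDs must produce equal hash codes. */
    static final class StudentId {
        private final int id;

        StudentId(int id) { this.id = id; }

        @Override
        public boolean equals(Object other) {
            return other instanceof StudentId && ((StudentId) other).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    static Map<StudentId, String> build() {
        // initial table size 16 and load factor 0.75, written out explicitly
        Map<StudentId, String> students = new HashMap<>(16, 0.75f);
        students.put(new StudentId(15), "obj1");
        students.put(new StudentId(36), "obj2");
        return students;
    }

    public static void main(String[] args) {
        System.out.println(build().get(new StudentId(36)));   // prints obj2
    }
}
```

Overriding equals() together with hashCode() matters here: a lookup with a new StudentId(36) only finds the stored entry because both objects hash to the same value and compare as equal.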

HASHING ANIMATION

[Link] [Link]

REFERENCES

Malik, D.S., Nair, P.S., Data Structures Using Java, Course Technology, 2003.

Weiss, Mark Allen, Data Structures & Algorithm Analysis in C++, Pearson Education International Inc, 2003.
