0% found this document useful (0 votes)
8 views21 pages

Space-for-Time Tradeoffs in Algorithms

Chapter 7 discusses space-for-time tradeoffs in algorithms, particularly focusing on input enhancement and prestructuring techniques for efficient problem-solving. It reviews string searching algorithms, including brute force, Knuth-Morris-Pratt, Boyer-Moore, and Horspool's algorithm, highlighting their preprocessing methods and time complexities. Additionally, the chapter covers hashing techniques for implementing dictionaries, addressing collision handling through open and closed hashing methods.

Uploaded by

Jibril hassan
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views21 pages

Space-for-Time Tradeoffs in Algorithms

Chapter 7 discusses space-for-time tradeoffs in algorithms, particularly focusing on input enhancement and prestructuring techniques for efficient problem-solving. It reviews string searching algorithms, including brute force, Knuth-Morris-Pratt, Boyer-Moore, and Horspool's algorithm, highlighting their preprocessing methods and time complexities. Additionally, the chapter covers hashing techniques for implementing dictionaries, addressing collision handling through open and closed hashing methods.

Uploaded by

Jibril hassan
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd

Chapter 7

Space and Time


Tradeoffs

Copyright © 2007 Pearson Addison-Wesley. All rights reserved.


Space-for-time tradeoffs

Two varieties of space-for-time algorithms:


 input enhancement — preprocess the input (or its part) to
store some info to be used later in solving the problem
• counting sorts (Ch. 7.1)
• string searching algorithms

 prestructuring — preprocess the input to make accessing its


elements easier
• hashing
• indexing schemes (e.g., B-trees)

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-2
Review: String searching by brute force

pattern: a string of m characters to search for


text: a (long) string of n characters to search in

Brute force algorithm


Step 1 Align pattern at beginning of text
Step 2 Moving from left to right, compare each character of
pattern to the corresponding character in text until
either all characters are found to match (successful
search) or a mismatch is detected
Step 3 While a mismatch is detected and the text is not yet
exhausted, realign pattern one position to the right and
repeat Step 2
Time complexity (worst-case): O(mn)
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-3
String searching by preprocessing
Several string searching algorithms are based on the input
enhancement idea of preprocessing the pattern

 Knuth-Morris-Pratt (KMP) algorithm preprocesses


pattern left to right to get useful information for later
searching
O(m+n) time in the worst case
 Boyer -Moore algorithm preprocesses pattern right to left
and store information into two tables
O(m+n) time in the worst case
 Horspool’s algorithm simplifies the Boyer-Moore algorithm
by using just one table

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-4
Horspool’s Algorithm

A simplified version of Boyer-Moore algorithm:

• preprocesses pattern to generate a shift table that


determines how much to shift the pattern when a
mismatch occurs

• always makes a shift based on the text’s character c


aligned with the last compared (mismatched) character
in the pattern according to the shift table’s entry for c

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-5
How far to shift?
Look at first (rightmost) character in text that was compared:
 The character is not in the pattern
.....c...................... (c not in pattern)
BAOBAB

 The character is in the pattern (but not the rightmost)


.....O...................... (O occurs once in pattern)
BAOBAB
.....A...................... (A occurs twice in pattern)
BAOBAB

 The rightmost characters do match


.....B......................
BAOBAB

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-6
Shift table
 Shift sizes can be precomputed by the formula

{
distance from c’s rightmost occurrence in pattern
among its first m-1 characters to its right end
t(c) =
pattern’s length m, otherwise
by scanning pattern before search begins and stored in a
table called shift table. After the shift, the right end of pattern is t(c)
positions to the right of the last compared character in text.

 Shift table is indexed by text and pattern alphabet


Eg, for BAOBAB:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-7
Example of Horspool’s algorithm

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _

1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 6

BARD LOVED BANANAS


BAOBAB
BAOBAB
BAOBAB
BAOBAB (unsuccessful search)

If k characters are matched before the mismatch, then the shift distance is
d1 = t(c) – k. k

{
……………..czyx……….
…c.…bzyx

}
t(c)
Note that the shift could be negative! …c….bzyx
E.g. if text = …ABABAB... A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. nd
ed., Ch. 7 7-8
Boyer-Moore algorithm

Based on the same two ideas:


• comparing pattern characters to text from right to left

• precomputing shift sizes in two tables

– bad-symbol table indicates how much to shift based on


text’s character causing a mismatch

– good-suffix table indicates how much to shift based on


matched part (suffix) of the pattern (taking advantage
of the periodic structure of the pattern)

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-9
Bad-symbol shift in Boyer-Moore algorithm

 If the rightmost character of the pattern doesn’t match, BM


algorithm acts as Horspool’s
 If the rightmost character of the pattern does match, BM
compares preceding characters right to left until either all
pattern’s characters match or a mismatch on text’s
character c is encountered after k > 0 matches
text c
k matches
pattern

bad-symbol shift d1 = max{t(c ) - k, 1}

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-10
Good-suffix shift in Boyer-Moore algorithm
 Good-suffix shift d2 is applied after 0 < k < m last characters
were matched
 d2(k) = the distance between (the last letter of) the matched
suffix of size k and (the last letter of ) its rightmost
occurrence in the pattern that is not preceded by the same
k

{
character preceding the suffix………………czyx……….
.…azyx….bzyx
yx….bzyx

}
d2(k)
Example: CABABA d2(1) = 4
-- -- ….azyx.…bzyx
yx.…bzyx
 If there is no such occurrence, match the longest part (tail)
of the k-character suffix with corresponding prefix;
if there are no such suffix-prefix matches, d2 (k) = m

Example: WOWWOW d2(2) = 5, d2(3) = 3, d2(4) = 3, d2(5) = 3


---
---- --
----
---
-----
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-11
Boyer-Moore Algorithm

After matching successfully 0 < k < m characters, the algorithm


shifts the pattern right by
d = max {d1, d2}
where d1 = max{t(c) - k, 1} is bad-symbol shift
d2(k) is good-suffix shift

Example: Find pattern AT_THAT in


WHICH_FINALLY
| _| HALTS.
| | | _| _|| AT
|| || _| THAT
|
AT_THAT AT_THAT
AT_THAT AT_THAT
AT_THAT
d1 = 7-1 = 6d1 = 4 -2 = 2
t AHT_? d2 123456
1 2 347 355555
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-12
Boyer-Moore Algorithm (cont.)
Step 1 Fill in the bad-symbol shift table
Step 2 Fill in the good-suffix shift table
Step 3 Align the pattern against the beginning of the text
Step 4 Repeat until a matching substring is found or text ends:
Compare the corresponding characters right to left.
If no characters match, retrieve entry t1(c) from the bad-
symbol table for the text’s character c causing the
mismatch and shift the pattern to the right by t1(c).
If 0 < k < m characters are matched, retrieve entry t1(c)
from the bad-symbol table for the text’s character c
causing the mismatch and entry d2(k) from the good-
suffix table and shift the pattern to the right by
d = max {d1, d2}
where d1 = max{t1(c) - k, 1}.
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-13
Example of Boyer-Moore alg. application

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _

1 2 6 6 6 6 6 6 6 6 6 6 6 6 3 6 6 6 6 6 6 6 6 6 6 6 6

B E S S _ K N E W _ A B O U T _ B A O B A B S
B A O B A B
d1 = t(K) = 6 B A O B A B
d1 = t(_)-2 = 4
k pattern d2 d2(2) = 5
1 BAOBAB 2 B A O B A B
d1 = t(_)-1 = 5
2 BAOBAB 5 d2(1) = 2
3 BAOBAB 5 B A O B A B
(success)
4 BAOBAB 5
5 BAOBAB 5 Worst-case time complexity: O(n+m).
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-14
Hashing
 A very efficient method for implementing a
dictionary, i.e., a set with the operations:
– find
– insert
– delete

 Based on representation-change and space-for-time


tradeoff ideas

 Important applications:
– symbol tables
– databases (extendible hashing)
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-15
Hash tables and hash functions
The idea of hashing is to map keys of a given file of size n into
a table of size m, called the hash table, by using a predefined
function, called the hash function,
h: K  location (cell) in the hash table

Example: student records, key = SSN. Hash function:


h(K) = K mod m where m is some integer (typically, prime)
If m = 1000, where is record with SSN= 314159265 stored?

Generally, a hash function should:


• be easy to compute
• distribute keys about evenly throughout the hash table

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-16
Collisions

If h(K1) = h(K2), there is a collision

 Good hash functions result in fewer collisions but some


collisions should be expected (birthday paradox)

 Two principal hashing schemes handle collisions differently:


• Open hashing
– each cell is a header of linked list of all keys hashed to it
• Closed hashing
– one key per cell
– in case of collision, finds another cell by
– linear probing: use next free bucket
– double hashing: use second hash function to compute increment
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-17
Open hashing (Separate chaining)

Keys are stored in linked lists outside a hash table whose


elements serve as the lists’ headers.
Example: A, FOOL, AND, HIS, MONEY, ARE, SOON, PARTED
h(K) = sum of K’s letters’ positions in the alphabet MOD 13

Key A FOOL AND HIS MONEY ARE SOON PARTED


h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

A AND MONEY FOOL HIS ARE PARTED

SOON
Search for KID
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-18
Open hashing (cont.)
 If hash function distributes keys uniformly, average length of
linked list will be α = n/m. This ratio is called load factor.

 For ideal hash functions, the average numbers of probes in


successful, S, and unsuccessful searches, U:
S  1+α/2, U = α (CLRS, Ch. 11)

 Load α is typically kept small (ideally, about 1)

 Open hashing still works if n > m

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-19
Closed hashing (Open addressing)
Keys are stored inside a hash table.
Key A FOOL AND HIS MONEY ARE SOON PARTED
h(K) 1 9 6 10 7 11 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12
A
A FOOL
A AND FOOL
A AND FOOL HIS
A AND MONEY FOOL HIS
A AND MONEY FOOL HIS ARE
A AND MONEY FOOL HIS ARE SOON
PARTED A AND MONEY FOOL HIS ARE SOON
Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-20
Closed hashing (cont.)
 Does not work if n > m
 Avoids pointers
 Deletions are not straightforward
 Number of probes to find/insert/delete a key depends on
load factor α = n/m (hash table density) and collision
resolution strategy. For linear probing:
S = (½) (1+ 1/(1- α)) and U = (½) (1+ 1/(1- α)²)
 As the table gets filled (α approaches 1), number of probes
in linear probing increases dramatically:

Copyright © 2007 Pearson Addison-Wesley. All rights reserved. A. Levitin “Introduction to the Design & Analysis of Algorithms,” 2nd ed., Ch. 7 7-21

You might also like