String Matching Algorithms Explained

String matching involves finding occurrences of a pattern within a text, with applications in various fields like web searching and document processing. Several algorithms are used for this purpose, including the Naive String Matcher, Knuth-Morris-Pratt, Rabin-Karp, and string matching with finite automata, each with different efficiencies and methods of operation. The document provides detailed explanations of these algorithms, including their implementations and complexities.

Uploaded by

shreyathakur4637


String Matching?

String matching consists of finding all occurrences of a given string (the pattern) in a text.
Some of the applications are:
• Finding patterns in documents formed using a large alphabet
– Word processing
– Web searching
– Desktop search (Google, MSN)
• Matching strings of bytes containing
– Graphical data
– Machine code
• grep in Unix
– grep searches for lines matching a pattern.
Pattern Matching
• Given a text string T[0..n-1] and a pattern P[0..m-1], find all occurrences of the pattern within the text.

• Example: for T = ababcabdabcaabc and P = abc, the occurrences are:
– first occurrence starts at T[2]
– second occurrence starts at T[8]
– third occurrence starts at T[12]
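
The problem statement can be checked with a few lines of C before looking at any dedicated algorithm. The following is a minimal sketch, assuming 0-based indices as in T[0..n-1]; the function name find_all and its output-array interface are illustrative choices, not part of the slides. It relies only on the standard library function strstr.

```c
#include <string.h>

/* Store the 0-based start index of each occurrence of P in T in out[]
   (at most max_out entries) and return how many were stored.
   Overlapping occurrences are included. */
int find_all(const char *T, const char *P, int out[], int max_out)
{
    int count = 0;
    if (*P == '\0')
        return 0;                 /* ignore the empty pattern */
    const char *hit = strstr(T, P);
    while (hit != NULL && count < max_out) {
        out[count++] = (int)(hit - T);
        hit = strstr(hit + 1, P); /* resume one character past this hit */
    }
    return count;
}
```

Called with T = "ababcabdabcaabc" and P = "abc", this stores the start indices 2, 8 and 12.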
Let Σ denote the alphabet, a finite set of characters.

• Given:

A text T[1..n] of length n and a pattern P[1..m] of length m, where m ≤ n.

• To find:

Whether the pattern P occurs in the text T. If it does, report the first occurrence of P in T.

The characters of both T and P are drawn from the finite set Σ.

ALGORITHMS USED FOR STRING MATCHING

• The NAIVE STRING-MATCHER ALGORITHM

• The KNUTH-MORRIS-PRATT ALGORITHM
• The RABIN-KARP ALGORITHM
• STRING MATCHING WITH FINITE AUTOMATA ALGORITHM
The NAIVE-STRING-MATCHER algorithm
The naive algorithm finds all valid shifts using a loop that checks
the condition P[1 . . m] = T[s + 1 . . s + m] for each of the n - m + 1
possible values of s.

NAIVE-STRING-MATCHER(T, P)

n ← length[T]
m ← length[P]
for s ← 0 to n – m
    do if P[1 . . m] = T[s + 1 . . s + m]
        then print "Pattern occurs with shift" s

NAÏVE APPROACH

T: a b c a b d a a b c d e
P: a b d

Example (Step 1), s = 0

T: a b c a b d a a b c d e
P: a b d

Mismatch after 3 comparisons

Example (Step 2), s = 1

T: a b c a b d a a b c d e
P:   a b d

Mismatch after 1 comparison

Example (Step 3), s = 2

T: a b c a b d a a b c d e
P:     a b d

Mismatch after 1 comparison

Example (Step 4), s = 3

T: a b c a b d a a b c d e
P:       a b d

Match found after 3 more comparisons (8 comparisons in total)

Thus, after 8 comparisons in total, the pattern P is found in T at shift 3.
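
The comparison count in the walkthrough can be reproduced mechanically. The sketch below is an illustrative variant of the naive matcher that stops at the first match and counts character comparisons; the name naive_first_match and the comparisons output parameter are assumptions made for this example.

```c
#include <string.h>

/* Return the first shift s at which P occurs in T, or -1 if there is none.
   *comparisons receives the number of character comparisons performed. */
int naive_first_match(const char *T, const char *P, int *comparisons)
{
    int n = (int)strlen(T);
    int m = (int)strlen(P);
    *comparisons = 0;
    for (int s = 0; s + m <= n; s++) {
        int j = 0;
        while (j < m) {
            (*comparisons)++;      /* one comparison of T[s+j] with P[j] */
            if (T[s + j] != P[j])
                break;
            j++;
        }
        if (j == m)
            return s;              /* all m characters matched */
    }
    return -1;
}
```

For T = "abcabdaabcde" and P = "abd" it returns shift 3 after 3 + 1 + 1 + 3 = 8 comparisons.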
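#include <stdio.h>
#include <string.h>

/* Print every shift s at which P occurs in T. */
void Naive(const char *T, const char *P)
{
    int n = strlen(T);
    int m = strlen(P);
    int s;
    for (s = 0; s < n - m + 1; s++)
    {
        int j;
        /* compare P with the window of T that starts at s */
        for (j = 0; j < m; j++)
            if (T[s + j] != P[j])
                break;
        if (j == m)   /* no mismatch: all m characters matched */
            printf("pattern is found at shift %d\n", s);
    }
}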
Trace of Naive(T, P) for the example:

T: a b c a b d a a b c d e
P: a b d
n = 12, m = 3

s = 0, j = 0: T[0] = P[0]
       j = 1: T[0+1] = P[1]
       j = 2: T[0+2] != P[2]
s = 1, j = 0: T[1+0] != P[0]
s = 2, j = 0: T[2+0] != P[0]
s = 3, j = 0: T[3+0] = P[0]
       j = 1: T[3+1] = P[1]
       j = 2: T[3+2] = P[2]
j reaches m, so the pattern is found at shift 3.
The Knuth-Morris-Pratt (KMP) Algorithm
Knuth, Morris, and Pratt introduced a linear-time algorithm for the string matching problem. A matching time of O(n) is achieved because the position in the string 'S' never moves backwards: characters of 'S' that have already been matched against the pattern 'P' are not examined again, i.e., backtracking on the string 'S' never occurs.
Components of KMP Algorithm:

1. The Prefix Function (Π): The prefix function Π for a pattern encapsulates knowledge about how the pattern matches against shifts of itself. This information can be used to avoid useless shifts of the pattern 'P'. In other words, it enables avoiding backtracking on the string 'S'.

2. The KMP Matcher: With string 'S', pattern 'P' and prefix function 'Π' as inputs, it finds the occurrences of 'P' in 'S' and reports the shifts of 'P' at which they are found.
For each position q, Π[q] is the length of the longest proper prefix of P[1..q] that is also a suffix of P[1..q]:
P a b c d a b c a b f
Π 0 0 0 0 1 2 3 1 2 0

P a b c d e a b f a b c
Π 0 0 0 0 0 1 2 0 1 2 3

P a a a a b a a c d
Π 0 1 2 3 0 1 2 0 0

P a b a b a b a b c a
Π 0 0 1 2 3 4 5 6 0 1
The Prefix Function (Π)
The following pseudocode computes the prefix function Π:

COMPUTE- PREFIX- FUNCTION (P)

1. m ←length [P] //'p' pattern to be matched
2. Π [1] ← 0
3. k ← 0
4. for q ← 2 to m
5. do while k > 0 and P [k + 1] ≠ P [q]
6. do k ← Π [k]
7. If P [k + 1] = P [q]
8. then k← k + 1
9. Π [q] ← k
10. Return Π
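
The pseudocode above uses 1-based indices. A minimal C sketch of the same computation with 0-based arrays is given below; the name compute_prefix and the caller-supplied pi[] array are illustrative assumptions. In this form pi[q] holds the value that the pseudocode stores in Π[q + 1].

```c
/* Fill pi[0..m-1] for the pattern P of length m.
   pi[q] is the length of the longest proper prefix of P[0..q]
   that is also a suffix of P[0..q]. */
void compute_prefix(const char *P, int m, int pi[])
{
    if (m == 0)
        return;
    pi[0] = 0;
    int k = 0;                         /* length of the current border */
    for (int q = 1; q < m; q++) {
        while (k > 0 && P[k] != P[q])
            k = pi[k - 1];             /* fall back to the next shorter border */
        if (P[k] == P[q])
            k++;
        pi[q] = k;
    }
}
```

For P = "abcdabcabf" this fills pi with 0 0 0 0 1 2 3 1 2 0, and for P = "aaaabaacd" with 0 1 2 3 0 1 2 0 0.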
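Running Time Analysis:

In the above pseudocode for calculating the prefix function, the for loop of steps 4 to 9 runs m − 1 times, and steps 1 to 3 take constant time. The while loop of steps 5 and 6 does not change this bound: k increases by at most 1 per iteration of the for loop (step 8) and every execution of step 6 decreases k, so step 6 runs at most m − 1 times in total. Hence the running time of computing the prefix function is O(m).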
Example: Compute Π for the pattern ‘P' below:

Solution: Initially: m = length [P] = 7 Π [1] = 0 k=0

After iteration 6 times, the prefix function computation is complete:

The KMP matcher takes the pattern 'P', the text 'T' (called 'S' above) and the prefix function 'Π' as input and finds the matches of P in T. The following pseudocode computes the matching component of the KMP algorithm:

KMP-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. Π ← COMPUTE-PREFIX-FUNCTION (P)
4. q ← 0 // number of characters matched
5. for i ← 1 to n // scan T from left to right
6. do while q > 0 and P [q + 1] ≠ T [i]
7. do q ← Π [q] // next character does not match
8. If P [q + 1] = T [i]
9. then q ← q + 1 // next character matches
10. If q = m // is all of P matched?
11. then print "Pattern occurs with shift" i – m
12. q ← Π [q] // look for the next match
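Running Time Analysis:

The for loop beginning in step 5 runs n times, once per character of the text 'T'. Within it, q increases by at most 1 per iteration (step 9) and every execution of step 7 decreases q, so step 7 runs at most n times in total. Steps 1, 2 and 4 take constant time and step 3 takes O(m) time. Thus the running time of the matching loop is O(n), and O(m + n) including the computation of Π.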
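
As a concrete illustration of the matcher, here is a minimal C sketch with 0-based indexing. The name kmp_search, the out[] array and the max_out limit are assumptions made for this example; the prefix function is computed inline so that the block stands alone.

```c
#include <stdlib.h>
#include <string.h>

/* Store every shift s (0-based start index) at which P occurs in T.
   Returns the number of shifts written to out[], or -1 if allocation fails. */
int kmp_search(const char *T, const char *P, int out[], int max_out)
{
    int n = (int)strlen(T);
    int m = (int)strlen(P);
    if (m == 0 || m > n)
        return 0;

    int *pi = malloc((size_t)m * sizeof *pi);
    if (pi == NULL)
        return -1;

    /* Prefix function, 0-based */
    pi[0] = 0;
    for (int q = 1, k = 0; q < m; q++) {
        while (k > 0 && P[k] != P[q]) k = pi[k - 1];
        if (P[k] == P[q]) k++;
        pi[q] = k;
    }

    int count = 0;
    int q = 0;                          /* number of characters matched */
    for (int i = 0; i < n; i++) {       /* i never moves backwards */
        while (q > 0 && P[q] != T[i]) q = pi[q - 1];
        if (P[q] == T[i]) q++;
        if (q == m) {                   /* all of P matched, ending at T[i] */
            if (count < max_out) out[count++] = i - m + 1;
            q = pi[q - 1];              /* look for the next match */
        }
    }
    free(pi);
    return count;
}
```

With 0-based i, the start index i - m + 1 is the same number as the shift i − m of the 1-based pseudocode. For T = "ababcabcabababd" and P = "ababd" the only shift found is 10.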

Example: Given a string 'T' and pattern 'P' as follows:

i: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
T: a b a b c a b c a b  a  b  a  b  d

j: 0 1 2 3 4 5
P:   a b a b d
Π:   0 0 1 2 0

1. Take two variables i and j.
2. Initialize i = 1 and j = 0.
3. Compare T[i] and P[j+1].
If they match, move both i and j to the right;
else set j to its Π value, Π[j].
If T[i] and P[j+1] mismatch and j = 0, then increase i by 1.
Let us execute the KMP algorithm to find whether 'P' occurs in 'T'.
For 'P', the prefix function Π was computed previously.

Solution:
Initially: n = size of T = 15, m = size of P = 7
1. Take two variables i and q.
2. Initialize i = 1 and q = 0.
3. Compare T[i] and P[q+1].
If they match, move both i and q to the right;
else set q to its Π value, Π[q].
If T[i] and P[q+1] mismatch and q = 0, then increase i by 1.

Rabin-Karp Algorithm
The Rabin-Karp algorithm is a string searching algorithm created by Richard M. Karp and Michael O. Rabin that uses hashing to find any one of a set of pattern strings in a text.
In the Rabin-Karp algorithm, we generate a hash of the pattern that we are looking for and check whether the rolling hash of each length-m window of the text matches it.

If the hashes don't match, we can guarantee that the pattern doesn't occur at that shift. However, if they do match, the pattern may be present at that shift: the window must still be compared with the pattern character by character, because two different strings can have the same hash (a spurious hit).
Complexity:
The running time of RABIN-KARP-MATCHER is O((n−m+1)m) in the worst case, but it has a good average-case running time. If the expected number of valid shifts is small (O(1)) and the prime q is chosen to be quite large, then the Rabin-Karp algorithm can be expected to run in time O(n+m) plus the time required to process spurious hits.
Rabin-Karp(T, P)
n = length(T)
m = length(P)
h(P) = hash(P[0..m-1]), i.e. P mod q
h(T) = hash(T[0..m-1]), i.e. T[0..m-1] mod q
for s = 0 to n-m
    if (h(P) = h(T))
        if (P[0..m-1] = T[s..s+m-1])
            print "Pattern found with shift" s
    if (s < n-m)
        h(T) = hash(T[s+1..s+m])
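
The pseudocode leaves the hash function unspecified. One common choice, assumed here rather than taken from the slides, treats each window as a number in base 256 and reduces it modulo a prime, which lets the next window's hash be derived from the current one in constant time. The names rabin_karp, RK_BASE and RK_PRIME, and the small prime 101, are illustrative.

```c
#include <string.h>

#define RK_BASE 256      /* assumed alphabet size (one byte per character) */
#define RK_PRIME 101     /* assumed small prime modulus, for illustration only */

/* Store every shift at which P occurs in T; return how many were stored. */
int rabin_karp(const char *T, const char *P, int out[], int max_out)
{
    int n = (int)strlen(T);
    int m = (int)strlen(P);
    if (m == 0 || m > n)
        return 0;

    int h = 1;                        /* RK_BASE^(m-1) mod RK_PRIME */
    for (int i = 0; i < m - 1; i++)
        h = (h * RK_BASE) % RK_PRIME;

    int hp = 0, ht = 0;               /* hash of P and of the current window */
    for (int i = 0; i < m; i++) {
        hp = (hp * RK_BASE + (unsigned char)P[i]) % RK_PRIME;
        ht = (ht * RK_BASE + (unsigned char)T[i]) % RK_PRIME;
    }

    int count = 0;
    for (int s = 0; s <= n - m; s++) {
        /* Equal hashes may be a spurious hit, so verify the characters too */
        if (hp == ht && memcmp(P, T + s, (size_t)m) == 0 && count < max_out)
            out[count++] = s;
        if (s < n - m) {
            /* Roll the window: drop T[s], then append T[s+m] */
            ht = (ht - (unsigned char)T[s] * h % RK_PRIME + RK_PRIME) % RK_PRIME;
            ht = (ht * RK_BASE + (unsigned char)T[s + m]) % RK_PRIME;
        }
    }
    return count;
}
```

A modulus as small as 101 produces many spurious hits on long texts; a larger prime reduces them, provided the intermediate products still fit in the integer type used.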
String Matching with Finite Automata

This string matching algorithm is efficient because it examines each text character exactly once. Hence the matching time is O(n), once the automaton for the pattern has been built.

A finite automaton is a quintuple (Q, Σ, δ, s, F):

Q: the finite set of states
Σ: the finite input alphabet
δ: the "transition function" from Q × Σ to Q
s ∈ Q: the start state
F ⊆ Q: the set of final (accepting) states
How it works

A finite automaton accepts strings in a specific language. It begins in the start state q0 and reads characters one at a time from the input string. It makes transitions based on these characters, and if it is in one of the accept states when it reaches the end of the input, the string is accepted by the automaton.
The Suffix Function

In order to properly search for the string, the program must define a suffix function (σ), which checks how much of what it is reading matches the search string at any given moment.

P = abcdabc
Prefixes are: a, ab, abc, abcd, abcda, abcdab
Suffixes are: c, bc, abc, dabc, cdabc, bcdabc
The suffix function finds the longest substring of the pattern that is both a prefix and a suffix. Here it is abc.
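
A direct, unoptimized way to evaluate the suffix function is to try the longest candidate first and shrink until a prefix of P lines up with the end of the string read so far. The sketch below assumes that σ(x) is the length of the longest prefix of P that is also a suffix of x; the name sigma and its two-argument form are illustrative.

```c
#include <string.h>

/* sigma(x): length of the longest prefix of P that is also a suffix of x. */
int sigma(const char *P, const char *x)
{
    int m = (int)strlen(P);
    int len = (int)strlen(x);
    int k = m < len ? m : len;      /* a match cannot be longer than either string */
    /* k = 0 (the empty prefix) always matches, so the loop terminates */
    while (k > 0 && memcmp(P, x + len - k, (size_t)k) != 0)
        k--;
    return k;
}
```

For P = "abcdabc", reading x = "dabc" gives 3 (the prefix abc), and reading x = "abcdabca" gives 1 (only the prefix a survives).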
String-Matching Automata
• For any pattern P of length m, we can define its string-matching automaton:
Q = {0,…,m} (states)
q0 = 0 (start state)
F = {m} (accepting state)
δ(q, a) = σ(P_q a), where P_q is the prefix of P of length q
The transition function chooses the next state to maintain the invariant:
φ(T_i) = σ(T_i), where φ(T_i) is the state reached after reading T_i, the first i characters of the text
After scanning i characters, the state number is the length of the longest prefix of P that is also a suffix of T_i.
Finite-Automaton-Matcher
The simple loop structure implies that the running time for a text of length n is O(n).
However, this is only the running time for the actual string matching. It does not include the time it takes to compute the transition function.
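
Both costs can be seen in a small C sketch: a deliberately naive construction of the transition table straight from the definition δ(q, a) = σ(P_q a), followed by the single-pass matching loop. The names build_delta and fa_match, the byte-sized alphabet and the MAX_M bound are assumptions made for this example, and the static table means the sketch is not reentrant.

```c
#include <string.h>

#define ALPHABET 256     /* assumed: one table column per byte value */
#define MAX_M 63         /* assumed upper bound on pattern length for this sketch */

/* delta[q][a] = sigma(P_q a), computed directly from the definition. */
void build_delta(const char *P, int m, int delta[][ALPHABET])
{
    char buf[MAX_M + 2];
    for (int q = 0; q <= m; q++) {
        memcpy(buf, P, (size_t)q);            /* buf = P_q */
        for (int a = 0; a < ALPHABET; a++) {
            buf[q] = (char)a;                 /* buf = P_q followed by a */
            int k = (q + 1 < m) ? q + 1 : m;  /* longest candidate prefix */
            while (k > 0 && memcmp(P, buf + q + 1 - k, (size_t)k) != 0)
                k--;
            delta[q][a] = k;
        }
    }
}

/* Store every shift at which P occurs in T; return how many were stored. */
int fa_match(const char *T, const char *P, int out[], int max_out)
{
    static int delta[MAX_M + 1][ALPHABET];
    int n = (int)strlen(T);
    int m = (int)strlen(P);
    if (m == 0 || m > MAX_M)
        return 0;
    build_delta(P, m, delta);

    int count = 0, q = 0;
    for (int i = 0; i < n; i++) {             /* each text character read once */
        q = delta[q][(unsigned char)T[i]];
        if (q == m && count < max_out)        /* accepting state reached */
            out[count++] = i - m + 1;
    }
    return count;
}
```

The matching loop does constant work per text character, while this table construction is far more expensive than the match itself; faster constructions exist but are not shown here.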