0% found this document useful (0 votes)
8 views13 pages

Understanding Suffix Arrays in Text Processing

All about suffix array data structure

Uploaded by

nav21techi
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views13 pages

Understanding Suffix Arrays in Text Processing

All about suffix array data structure

Uploaded by

nav21techi
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Suffix Array

Suffix Array
▪ A suffix array for a text T is an array of all suffixes of T arranged in sorted order
▪ It is an array of integers that represents the starting indices of all the suffixes of a
string, arranged in lexicographically sorted order
▪ It is a data structure used in string processing, especially for fast searching, pattern
matching and text indexing
▪ Applications:
▪ Solving the DNA sequencing problem in bioinformatics
▪ Finding a substring in a string
▪ Finding number of occurrences of a substring in a string
▪ Finding longest repeated substring in a string
▪ Comparing Two Substrings of a String

2
Suffix Array
▪ Example: Text string S = banana

3
SUFFIX ARRAY EXAMPLE
▪ In a suffix array, all suffixes of S are in the non-decreasing lexical order.
▪ For example, S=“ATCACATCATCA”

i 0 1 2 3 4 5 6 7 8 9 10 11
A 11 3 8 0 5 10 2 7 4 9 1 6
3 ATCACATCATCA S(0) 0 A S(11)
10 TCACATCATCA S(1) 1 ACATCATCA S(3)
6 CACATCATCA S(2) 2 ATCA S(8)
1 ACATCATCA S(3) 3 ATCACATCATCA S(0)
8 CATCATCA S(4) 4 ATCATCA S(5)
4 ATCATCA S(5) 5 CA S(10)
11 TCATCA S(6) 6 CACATCATCA S(2)
7 CATCA S(7) 7 CATCA S(7)
2 ATCA S(8) 8 CATCATCA S(4)
9 TCA S(9) 9 TCA S(9)
5 CA S(10) 10 TCACATCATCA S(1)
0 A S(11) 11 TCATCA S(6) 4
Search in Suffix Array
▪ A binary search of the suffix array would be enough to determine if the pattern P is in
the text
▪ O(m log n) where the log n comparisons for binary search on the suffix array, and m
time for each comparison during the binary search involves comparing the pattern
(of length m) with a suffix from the array

5
Linear Time Suffix Array Construction
Skew-algorithm
▪ Step 1:
SA≠ 0 = sort the suffixes starting at position i ≠ 0 mod 3.
▪ Step 2:
SA= 0 = sort the suffixes starting at position i = 0 mod 3.
▪ Step 3:
SA = merge SA= 0 and SA≠ 0 .

6
Construction of Suffix Array in Linear Time
Radix sort sorts keys by processing them character by character (usually from rightmost to
leftmost

"caa", "cab", "aba", "aac"

7
Construction of Suffix Array in Linear Time

8
Construction of Suffix Array in Linear Time

9
Construction of Suffix Array in Linear Time

10
Construction of Suffix Array in Linear Time
S = ABRACADABRA

11
Construction of Suffix Array in Linear Time

12
Time Complexity Analysis

▪ Step1: O(n) + T(2n/3)


▪ Step2: O(n)
▪ Step3: O(n)
▪ T(n) = O(n) + T(2n/3) = O(n)

You might also like