Suffix trees

**seminar flower** · 06-10-2012, 03:02 PM

Suffix trees

.pdf

Suffix tree.pdf (Size: 456.79 KB / Downloads: 49)

Data structures for string pattern
matching: Suffix trees
• Linear algorithms for exact string matching
– KMP
– Z-value algorithm
• What is suffix tree?
– A tree-like data structure for solving problems
involving strings.
– Related data structures: Trie (retrieval) & PATRICIA
(radix tree)
– Allow the storage of all substrings of a given string in
linear space
– Simple algorithm to solve string pattern matching
problem in linear time

Better than hash tables?

• Hash tables are certainly easier to understand.
And, one can produce a hash table of all length
k strings in O(m) time and look up a k-length
string x in O(k) time, finding all p places where
string x is found in O(p) time. This is the same
as the bound for suffix trees.
• What if you don’t know how long the string x is
going to be?
• And most other string matching tricks don’t work
for it either.

Suffix Tree: definition

• A suffix tree ST for an m-character string S
is a rooted directed tree with exactly m
leaves numbered 1 to m.
• Each internal node, other than the root,
has at least two children and each edge is
labeled with a nonempty substring of S.
• No two edges out of a node can have
edge-labels beginning with the same
character.
• The key feature of the suffix tree is that for
any leaf i, the concatenation of the edgelabels
on the path from the root to the leaf
i exactly spells out the suffix of S that
starts at position i.

Exact string matching problem

• Given a pattern P of length m, find all
occurrences of P in text T
– O(n+m) algorithm
• Solution: Build a suffix tree ST for text T in
O(m) time. Then, match the characters of
P along the unique path in ST until either
P is exhausted or no more matches are
possible.

Comparing to the other algorithms

• KMP and Boyer-Moore both achieve this worst
case bound.
– O(m+n) when the text and pattern are presented
together.
• Suffix trees are much faster when the text is fixed
and known first while the patterns vary.
– O(m) for single time processing the text, then only O(n)
for each new pattern.
• Based on suffix trees, is faster for searching a
number of patterns at one time against a single
text (exact set matching problem)
– Aho-Corasick algorithm: preprocessing P instead of T.

Exact string matching

• Both P (|P|=n) and T (|T|=m) are known:
– Suffix tree method achieves same worst-case bound
O(n+m) as KMP.
• T is fixed and build suffix tree, then P is input, k
is the number of occurrences of P
– Using suffix tree: O(n+k)
– In contrast (KMP, preprocess P): O(n+m) for any
single P
• P is fixed, then T is input
– Selecting KMP rather than suffix tree
– or Aho-Corasick algorithm (exact set matching
problem)

Applications

• Problems
– linear-time longest common substring
– constant-time least common ancestor
– maximally repetitive structures
– all-pairs suffix-prefix matching
– compression
– inexact matching
– conversion to suffix arrays

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	B-Trees report	project girl	1	913	19-09-2017, 01:21 PM Last Post: jaseela123
	Point range trees	nit_cal	0	9,557,413	25-08-2017, 09:32 PM Last Post: nit_cal
	Light Trees	presentation Abstract	0	227	11-05-2015, 03:04 PM Last Post: presentation Abstract
	Huffman Coding: An Application of Binary Trees and Priority Queues PPT	seminar projects maker	0	429	09-01-2014, 03:15 PM Last Post: seminar projects maker
	NEAREST NEIGHBOUR MATCHING USING K D-TREES pdf	seminar projects maker	0	401	28-09-2013, 01:05 PM Last Post: seminar projects maker
	R-trees Report	study tips	0	696	09-05-2013, 04:07 PM Last Post: study tips
	Binary Search Trees PPT	study tips	0	549	03-05-2013, 03:10 PM Last Post: study tips
	(Linear, Circular, Doubly) Linked Lists, Stacks, Queues, Trees	study tips	0	688	13-04-2013, 04:52 PM Last Post: study tips
	ppt on Trees	project girl	0	502	24-01-2013, 02:33 PM Last Post: project girl
	Trees Report	project girl	0	620	21-01-2013, 10:15 AM Last Post: project girl

Quick Reply
Message Type your reply to this message here. Disable Smilies	You have selected one or more posts to quote. Quote these posts now or deselect them.