The Boyer-Moore Algorithm pdf

**project girl** · 19-01-2013, 12:33 PM

The Boyer-Moore Algorithm

.pdf

The Boyer-Moore.pdf (Size: 157.45 KB / Downloads: 180)

Introduction

This chapter develops a number of classical comparison-based matching algorithms
for the exact matching problem. With suitable extensions, all of
these algorithms can be implemented to run in linear worst case time, and
all achieve this performance by preprocessing pattern P. (Methods that
preprocess T will be considered in Part II of the book.) The original preprocessing
methods for these various algorithms are related in spirit, but quite
dierent in conceptual diculty. Some of the original preprocessing methods
are quite dicult1. This chapter does not follow the original preprocessing
methods but instead exploits fundamental preprocessing, developed in the
previous chapter, to implement the needed preprocessing for each specic
matching algorithm.
Also, in contrast to previous expositions, we emphasize the Boyer-Moore
method over the Knuth-Morris-Pratt method, since Boyer-Moore is the practical
method of choice for exact matching. Knuth-Morris-Pratt is nonetheless
completely developed, partly for historical reasons but mostly because
it generalizes to problems such as real-time string matching and matching
against a set of patterns more easily than Boyer-Moore does. These two
topics will be described in this chapter and the next.

The Boyer-Moore Algorithm

As in the naive algorithm, the Boyer-Moore algorithm successively aligns P
with T and then checks whether P matches the opposing characters of T.
Further, after the check is complete, P is shifted right relative to T just as
in the naive algorithm. However, the Boyer-Moore algorithm contains three
clever ideas not contained in the naive algorithm { the right to left scan, the
bad character shift rule and the good sux shift rule. Together, these ideas
lead to a method that typically examines fewer than m + n characters (an
expected sublinear-time method), and that (with a certain extension) runs
in linear worst case time. Our discussion of the Boyer-Moore algorithm, and
extensions of it, concentrates on provable aspects of its behavior. Extensive
experimental and practical studies of Boyer-Moore and variants have been
reported in [?, ?, ?, ?, ?].

Right to left scan

For any alignment of P with T the Boyer-Moore algorithm checks for an
occurrence of P by scanning characters from right to left rather than from
left to right as in the naive algorithm. For example consider the alignment
of P against T shown below.

Bad character rule

To get the idea of the bad character rule, suppose that the last (rightmost)
character of P is y and the character in T it aligns with is x 6= y. When this
initial mismatch occurs, if we know the rightmost position in P of character
x, we can safely shift P to the right so that the rightmost x in P is below
the mismatched x in T. Any shorter shift would only result in an immediate
mismatch. So, the longer shift is correct, i.e., it will not shift past any
occurrence of P in T. Further, if x never occurs in P, then we can shift P
completely past the point of mismatch in T. In these cases, some characters
of T will never be examined and the method will actually run in \sub-linear"
time. This observation is formalized below.

Extended bad character rule

The bad character rule is a useful heuristic for mismatches near the right end
of P, but it has no eect if the mismatching character from T occurs in P to
the right of the mismatch point. This may be common when the alphabet
is small and the text contains many similar, but not exact, substrings. That
situation is typical of DNA which has an alphabet of size four, and even
protein, which has an alphabet of size twenty, often contains dierent regions
of high similarity.

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Software Crisis pdf	study tips	1	2,117	21-09-2017, 04:31 PM Last Post: jaseela123
	HOW EMAIL WORKS pdf	project girl	1	3,067	20-09-2017, 11:39 AM Last Post: jaseela123
	Cyber crime detection, investigation and prosecution pdf	seminar projects maker	1	958	20-09-2017, 11:31 AM Last Post: jaseela123
	Review: Context Aware Tools for Smart Home Development pdf	study tips	1	1,227	20-09-2017, 11:22 AM Last Post: jaseela123
	Getting Started with the MAXQ1103 Evaluation Kit and the CrossWorks Compiler pdf	project girl	1	969	15-09-2017, 03:11 PM Last Post: jaseela123
	Wireless Application Protocol (WAP) pdf	project girl	1	1,531	15-09-2017, 02:42 PM Last Post: jaseela123
	MAC Protocol for Reliable Multicast over Multi-Hop Wireless Ad Hoc Networks pdf	study tips	1	1,029	15-09-2017, 12:39 PM Last Post: jaseela123
	Wireless Automotive Communications pdf	seminar projects maker	1	637	14-09-2017, 01:27 PM Last Post: jaseela123
	Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data pdf	study tips	1	2,018	13-09-2017, 12:59 PM Last Post: jaseela123
	Internetworking connectionless and connection-oriented networks pdf	project girl	1	1,151	13-09-2017, 11:03 AM Last Post: jaseela123

Quick Reply
Message Type your reply to this message here. Disable Smilies	You have selected one or more posts to quote. Quote these posts now or deselect them.