26-07-2012, 12:18 PM
Knuth-Morris-Pratt
Knuth-Morris-Pratt.pdf (Size: 183.61 KB / Downloads: 106)
Abstract
KMP is a string searching algorithm. The problem is to find the
occurrence of P in S, where S is the given string and P is the Pattern string.
Firstly, we must compare each character of P with S, if it is matched then
we must continue else we must shift P one position to the right and repeat
the matching procedure. The running time of the algorithm is O(m + n)
where m is the length of pattern P and n is the length of the string S.
Since we know that
m < n (1)
Therefore running time is O(n).
1 Statement of Problem
This problem is to find a string inside an another string. In a pattern P for
each position i, spi(p) is said to be the length of longest suffix of P[1, 2i], which
matches to the prefix P. It is similar to that of naive algorithm which performs
its comparisons from left to right. It also calculates the maximum possible shifts
from left to right for the pattern P.
History
The main reason behind the slowness of the naive algorithm is that after every
mismatch the pattern is shifted towards right by one character position to the
text so that when we again compare those pairs of characters which has aready
been scanned. But Knuth-Morris-Pratt observes that it is not necessary to side
the pattern by one character position only. They used the information that is
gained during previously compared characters and developed a algorithm knows
after the names as Knuth-Morris-Pratt algorithm.
Knuth, Morris and Pratt discovered first linear time string-matching algorithm
by following a tight analysis of the naive algorithm. Knuth-Morris-Pratt
algorithm keeps the information that naive approach wasted gathered during
Knuth-Morris-Pratt
the scan of the text. By avoiding this waste of information, it achieves a running
time of O(n+m), which is optimal in the worst case sense. That is, in the
worst case Knuth-Morris-Pratt algorithm, we have to examine all the characters
in the text and pattern at least once. It uses a minimum of comparison through
the use of a pre-computed table. The implementation of the Knuth-Morris-
Pratt is efficient because it minimizes the total number of comparisons of the
pattern against the input string, while consuming characters is a predictable
way from the buffer. There are string matching algorithms that have better
average-case running times than Knuth-Morris-Pratt algorithm, but there is no
single comparator algorithm with better worst-case characteristics. The worst
case performance of the consistent consumption of input characters makes a
buffered implementation possible.
Time Complexity
The prefix-function values can be computed in O(m). Number of iterations for
finding the match takes 2n iterations. Totally, the running time of the knuthmorris-
pratt algorithm will be O(m + n). In preprocessing phase it is O(m).
In searching phase it would be O(n + m) (since it is independent from the
alphabet size).
Comparison with other string algorithms
Karp-Rabin Algorithm
Statement of the problem
The Rabin-Karp algorithm is a string searching algorithm that uses hashing
function to find the set of pattern strings in a text. The key to performance
of Rabin-Karp algorithm is the efficient computations of hash values of the
successive substrings of the text. Compared to other algorithms it is inferior
to single pattern matching because of its slow worst case behaviour. It is best
suits for multiple pattern search.