Algorithm Implementation/String searching/Knuth-Morris-Pratt pattern matcher
From Wikibooks, the open-content textbooks collection
Contents |
[edit] Pascal
[edit] Python
# Knuth-Morris-Pratt string matching # David Eppstein, UC Irvine, 1 Mar 2002 #from http://code.activestate.com/recipes/117214/ def KnuthMorrisPratt(text, pattern): '''Yields all starting positions of copies of the pattern in the text. Calling conventions are similar to string.find, but its arguments can be lists or iterators, not just strings, it returns all matches, not just the first one, and it does not need the whole text in memory at once. Whenever it yields, it will have read the text exactly up to and including the match that caused the yield.''' # allow indexing into pattern and protect against change during yield pattern = list(pattern) # build table of shift amounts shifts = [1] * (len(pattern) + 1) shift = 1 for pos in range(len(pattern)): while shift <= pos and pattern[pos] != pattern[pos-shift]: shift += shifts[pos-shift] shifts[pos+1] = shift # do the actual search startPos = 0 matchLen = 0 for c in text: while matchLen == len(pattern) or \ matchLen >= 0 and pattern[matchLen] != c: startPos += shifts[matchLen] matchLen -= shifts[matchLen] matchLen += 1 if matchLen == len(pattern): yield startPos
[edit] Ada
The following Ada implementation contains both the algorithms as well as a test program to test correctness of implementation.
[edit] C++
The following C++ implementation contains only the algorithms without a test program to test correctness of implementation.
#include <iostream> #include <vector> using namespace std; //---------------------------- //Returns a vector containing the zero based index of // the start of each match of the string K in S. // Matches may overlap //---------------------------- vector<int> KMP(string S, string K) { vector<int> T(K.size() + 1, -1); for(int i = 1; i <= K.size(); i++) { int pos = T[i - 1]; while(pos != -1 && K[pos] != K[i - 1]) pos = T[pos]; T[i] = pos + 1; } vector<int> matches; int sp = 0; int kp = 0; while(sp < S.size()) { while(kp != -1 && (kp == K.size() || K[kp] != S[sp])) kp = T[kp]; kp++; sp++; if(kp == K.size()) matches.push_back(sp - K.size()); } return matches; }
[edit] C and Java
C and Java implementations can be found in the history of the Wikipedia page on the same topic. The correctness and licensing status of these implementations are not clear at this time.
[edit] Lua
This implementation requires Lua version 5.1 or better.
-- Implementation of the Knuth Morris Pratt algorithm to find -- substrings within strings. -- Sean Brewer -- Berry College CS 320 -- March 25, 2008 -- Generate the failure table using the substring and the length -- of the substring function generate_fail_table(substring,sublen) comparisons = 0 fail = {0} for i=2,sublen do temp = fail[i - 1] while temp > 0 and string.sub(substring,temp,temp) ~= string.sub(substring,i - 1,i - 1) do comparisons = comparisons + 1 temp = fail[temp] end fail[i] = temp + 1 end return {fail, comparisons + 1} end --The actual kmp algorithm which takes --the substring and text as arguments function kmp_algorithm(substring,text) --starting positions... subloc = 1 textloc = 1 --get the length of substring and text.. sublen = string.len(substring) textlen = string.len(text) --generate the fail links... failure = generate_fail_table(substring,sublen) --store failure comparisons comparisons = failure[2] --store fail table fail = failure[1] --the main loop.. while textloc <= textlen and subloc <= sublen do if subloc == 0 or string.sub(text,textloc,textloc) == string.sub(substring,subloc,subloc) then comparisons = comparisons + 1 textloc = textloc + 1 subloc = subloc + 1 else --no match found so follow fail link subloc = fail[subloc] end end if subloc > sublen then return {textloc - sublen, comparisons} else return {0, comparisons} end end