Artificial Intelligence/Search/Exhaustive search/Finite state automata

From Wikibooks, open books for an open world
Jump to navigation Jump to search

Searching with Finite State Automata[edit | edit source]

A common way of searching for a pattern in a string of symbols is by using Finite State Automata, also known as finite state machines or FSMs. FSMs are abstract structures which can be implemented in a number of different ways, but all of them share some common properties. Each FSM has[1]:

  • a finite set of states
  • a way of receiving inputs
  • transitions, which are paths from one state to another triggered by inputs
  • optional actions, which come in a few different flavours depending on what kind of event triggers them:
    • entry actions occur when entering a particular state
    • exit actions occur when exiting a particular state
    • input actions only occur when a particular class of input is received while the machine is in a particular state
    • transition actions occur during specified transitions

Acceptors[edit | edit source]

When used for pattern matches within strings, an FSM is given each successive character of the string as its input. For simple substring matching, the FSM would have one state for each number of characters in the substring matched so far.

Finite state automata which simply accept or reject an input string with a yes/no response are called acceptors [2]. Here is a state diagram of an acceptor that returns a "yes" response when a string contains the substring "aba":

The machine starts in the "empty" state, representing an empty substring buffer. The "other" transitions represent inputs of any character besides "a" or "b", so the only input that will get the machine out of the empty state is the first character of the substring to be matched, "a". From then on, inputs are converted into transitions from state to state.

The double circle around "aba" indicates that it is an accept state of this FSM, meaning that the machine will give a "yes" result if it is in this state when it has no more inputs to process. All other states are non-accpting states and would result in a "no" result at the end of processing.

Transducers[edit | edit source]

An FSM with actions that provide output is instead called a finite state transducer[2]. Here is an example of a transducer written in Ruby which strips C++/Java-style comments out of an input stream:

#!/usr/bin/env ruby

# comment_stripper.rb
# 
# Requires Ruby 1.8.7
# 
# Strips C++/Java style comments from an input stream using a finite state
# transducer. Single-line comments begin with two forward slashes ("//") and
# extend to the end of the line. Block comments begin with a slash followed by an
# asterisk ("/*") and end with an asterisk followed by a slash ("*/").
# 
# When run from the command line, performs the transformation on standard input.
# 
# Example:
# 
# cat hello_world.cpp | ruby comment_stripper.rb

class CommentStripper
  def initialize()
    @current_state = :code
  end
  
  def input(c)
    output = INPUT_ACTIONS[c][@current_state] || ''
    @current_state = TRANSITIONS[c][@current_state] || @current_state
    return output
  end
  
  TRANSITIONS = Hash.new do |hash,input|
    # triggered by any characters other than the ones explicitly defined below
    {
      :slash  => :code,
      :star   => :block
    }
  end.merge(
    # overrides the default transitions defined above
    "/" => {
      :code   => :slash,
      :slash  => :line,
      :star   => :code
    },
    "*" => {
      :slash  => :block,
      :block  => :star
    },
    "\n" => {
      :line   => :code,
      :slash  => :code,
      :star   => :block
    }
  )
  
  INPUT_ACTIONS = Hash.new do |hash,input|
    {
      :code   => input,
      :slash  => ("/" + input)
    }
  end.merge(
    "/" => {},
    "*" => {
      :code   => "*"
    },
    "\n" => {
      :code   => "\n",
      :slash  => "/\n",
      :line   => "\n"
    }
  )
end

if $0 == __FILE__
  cs = CommentStripper.new
  $stdin.each_char do |c|
    print cs.input(c)
  end
end

The transitions and input actions are both defined in nested hash tables keyed by input and state. The input method takes in a single character of the input string, performs any necessary transition to a new state, and returns the transduced character based on the original state. This diagram shows the transitions and states defined in this machine:

References[edit | edit source]

  1. Wagner, Ferdinand (2006). Modeling Software with Finite State Machines. CRC Press. p. 369. ISBN 0849380863. {{cite book}}: Unknown parameter |coauthors= ignored (|author= suggested) (help)
  2. a b Roche, Emmanuel (1997). Finite-state language processing. MIT Press. p. 464. ISBN 0262181827.

Further Reading[edit | edit source]

  • Booth, Taylor L. (1967). Sequential Machines and Automata Theory (1st ed.). New York: John Wiley and Sons, Inc. Library of Congress Card Catalog Number 67-25924. {{cite book}}: Cite has empty unknown parameter: |coauthors= (help)