Computability and Complexity/Formal Languages/Chomsky Hierarchy/Context Sensitive Languages

Jump to: navigation, search

Context Sensitive Languages

Context sensitive languages are those languages whose strings can be generated by a context sensitive grammar. This type of grammar is similar in form to a context free grammar (see Context Free Languages), but instead of rules of the form A -> X, where A is a non-terminal symbol and X is a string of terminals and non-terminals, a context sensitive grammar contains rules of the form X -> Y, where X and Y are strings of terminals and non-terminals, with the constraint that Y be at least as long as X. There is an exception to this rule, that if S is the starting symbol, and S never appears on the right side of a transition, then the transition S -> ε is allowed. This makes it possible for a grammar to generate the empty string.

An example of such a grammar is:

• S -> ε
• S -> aSBC
• S -> aBC
• CB -> BC
• aB -> ab
• bB -> bb
• bC -> bc
• cC -> cc

This grammar generates the language $\{a^nb^nc^n|n \ge 0\}$, a language which is not context free.

An alternative definition for a context sensitive grammar is a grammar where each of transitions has the form YAZ -> YXZ, where A is a non-terminal, and X, Y and Z are all strings of terminals and non-terminals (Y and Z may empty). Because Y and Z remain unchanged, this fills the same function as a transition A -> X, but can use the context of A as a requirement for the transition. In 1, Noam Chomsky shows that these two definitions are equivalent, and produce the same languages.

Any context free grammar is also a context sensitive grammar under the second definition which only uses empty contexts, so the set of context free languages is a subset of the set of context sensitive languages. Sine, in the example above, a context sensitive grammar can produce the language $\{a^{n}b^{n}c^{n}|n \ge 0\}$ while a PDA cannot, the context free languages must be a proper subset of the context sensitive languages.

Linear Bounded Automata

The class of context sensitive languages is equivalent to the class of languages that can be recognized by linear bounded automata. A linear bounded automaton (LBA) is a deterministic machine based on a state, a "tape" that contains the input string, and a read/write head that moves left and right along the tape. The machine compares its current state and the symbol on the tape at the head's position with a finite number of rules to determine its next state, what symbol to write on the tape, and which direction (left/right) to move the tape head in. Similarly to both finite automata and pushdown automata, linear bounded automata accept or reject an input based on whether they halt in an accept state or not. An LBA halts when there are no rules for its current combination of state and character being read.

The only other restriction on an LBA is that the tape must be finite, with a size linearly derived from the size of the input string. For example, if a particular machine uses tapes 2*s+5 long, where s is the size of the input, then when processing a string of size 10, it would have a tape of length 25.

Abilities

This design gives the LBA a great deal of computational freedom, and so it can recognize any languages who's space requirements for string recognition grow linearly. One example is the language $\{a^{n}b^{n}c^{n}d^{n}|n \ge 0\}$. This language can be recognized in space s+1, meaning that it only requires the section of the tape it was written on, plus a single blank character at the end. This is because unlike a PDA or DFA, an LBA need not deal with the input in sequence - it can (and does in the sample machine available below) mark off the characters in sets, first an 'a', then a 'b', then a 'c', then a 'd', and repeat. It will not accept if it does not exhaust all character types on the same run.

This language cannot be recognized by DFAs and PDAs, because they would have to count and store the number of 'a's as they went by. A DFA could only remember a number of 'a's equal to or fewer than its number of states, and so could not check the number of 'b's in an arbitrarily long string. A PDA could store the number of 'a's on the stack, but would necessarily destroy that information when comparing them to the 'b's, and thus could not compare it to the number of 'c's.

Looping

Note that while it is possible for an LBA to enter an infinite loop, since there is only a finite tape, a finite number of symbols to put on it, and a finite number of states it can be in, the total number of conditions for a given machine and input is finite. That means that any infinite loop must, within a finite number of steps, return to a set of conditions it has already encountered. Also, since LBAs are deterministic, a machine that repeats its set of conditions will continue to repeat the loop it has just traversed indefinitely. These two statements show that an LBA is caught in an infinite loop (and thus will never accept or reject) if and only if it repeats its total conditions (including state, head position, and tape contents). Since $S \cdot T \cdot A^T$ is the total number of possible conditions, where S is the number of states, T is the size of the tape, and A is the size of the alphabet, any machine that runs through more than S*T*A^T steps must repeat a set of conditions, and thus be in an infinite loop. While this number may be quite large, it provides a definite, finite-time method of determining whether a given LBA will loop on a given input. The similar but more advanced Turing Machine (see Unrestricted Languages) will have no such method.

Examples

The code below is a sample LBA emulator in Perl. Given the description of a machine and an input string, it simulates the machine processing the input string, and shows whether the machine accepts or not.

The syntax is: progname.pl LBAFile inputFile, where LBAFile is a text file containing the LBA instructions, and inputFile is a text file containing the input string. Some sample inputs, including a set of LBA instructions for a machine to recognize $\{a^nb^nc^nd^n|n \ge 0\}$, can be found under sample LBA inputs

#!usr/bin/perl
use Text::ParseWords;
use strict;
use warnings;

my (@tape, $tapeIndex,$tapeMax, %accepts, %alphabet, @rules);

# Grabs the filenames for the machine and the word to be run on it.
my $lbaFile =$ARGV[0];
my $input =$ARGV[1];

# We use subroutines to parse and verify the data in the input files.
# The machine data is stored in the $machine structure as the keys rules, accepts, alphabet, and startState. my$machine = readLBA($lbaFile); # Rules and accepts are extracted from the$machine structure for ease of access.
@rules = @{$machine->{rules}}; %accepts = %{$machine->{accepts}};
# This reads the input file and parses it into an array of strings, with each element being one input symbol.
# It checks to make sure the elements are all in the machine's alphabet.
($tapeMax, @tape) = readInput($input, $machine->{alphabet},$machine->{tapeBound});
# $changed records whether or not a rule has been used when running through the rules list to make transitions. my$changed = 1;
# $state is the state the Turing Machine is in, and is initialized to the start state from the machine file. my$state = $machine->{startState}; #$tapeIndex is the position of the machine's head on the tape.
$tapeIndex = 0; # Now that the machine is initialized, we can begin making transitions # As long as things keep changing, keep cycling through the rules. while($changed)
{
# Unless it changes while going through the rules, the machine will terminate.
$changed = 0; # The current tape is printed, with the symbol under the head highlighted. print "@tape[0..$tapeIndex-1]<".$tape[$tapeIndex].">@tape[$tapeIndex+1..@tape-1]\n"; # The current state of the machine is printed. print "$state\n";
# A new state is calculated by checking conditions against the list of rules
for my $rNum (0..@rules-1) { # print "::$rules[$rNum][0]??$branches[$i][0]"; # print "::$rules[$rNum][1]??$string[$branches[$i][1]]";
#            print "::$rules[$rNum][2]??".${$branches[$i][2]}[@{$branches[$i][2]}-1]."::\n"; # Checks the current state and tape symbol against the rule being examined if (($rules[$rNum][0] eq$state) and
($rules[$rNum][1] eq $tape[$tapeIndex]))
{
# The state transition is printed.
#            print "State: ".$state." -> ".$rules[$rNum][2]."\n\n"; # Set the new state,$state = $rules[$rNum][2];
# Write the new symbol to the tape,
$tape[$tapeIndex] = $rules[$rNum][3];
# Shift the tape to the new index,
$tapeIndex +=$rules[$rNum][4]; # and make sure it hasn't run past the left edge of the tape. if ($tapeIndex < 0) { $tapeIndex = 0; } # If the machine nears the end of the allocated tape, and more tape is allowed, expand the tape. if (($tapeIndex >= @tape-2) and (@tape < $tapeMax)) { push(@tape, "_"); } # If the head runs past the right end of the tape, keep it on the right end. if ($tapeIndex >= $tapeMax-1) {$tapeIndex = $tapeMax-1; }$changed = 1;
# Once we've made a transition, we can stop and begin looking for the next one.
last;
}
}
}
# When there are no more possible transitions, if the machine is in an accepting state,
if (defined($accepts{$state}))
{
# Print that it accepts and quit.
print "The machine accepts the string.\n";
exit;
}
# Otherwise, print that it does not accept, and quit.
print "The machine does not accept the string.\n";
exit;

###################################################

sub readLBA
# This subroutine reads the machine data from the specified file into variables (mostly hashes).
{
my (%states, %accepts, %alphabet, @rules, @tapeBound);
open(INFILE, shift) or die "Can't open machine file: $!"; # This block reads the list of states from the machine file. # Discards the section header, <INFILE>; my$line = <INFILE>;
chomp($line); my @words = &parse_line('\s+', 0,$line);
for (@words)
{
# records the state names for checking the rules,
$states{$_} = 0;
}

# This block reads the start state from the machine file.
# Discards the header,
<INFILE>;
my $startState = <INFILE>; # takes the whole line as the start state, chomp($startState);
# and makes sure that the start state is defined in the list of states.
defined($states{$startState}) or die "The start state $startState isn't a state!"; # This block reads the list of accepting states from the machine file. # Discards the header, <INFILE>;$line = <INFILE>;
chomp($line); # breaks up the line into state names, @words = &parse_line('\s+', 0,$line);
for (@words)
{
# checks to make sure that the accept states are defined states,
defined($states{$_}) or die "$_ isn't a state!"; # and defines those names in a new hash. The use of a hash makes it easier to determine later if a specific state name accepts or not.$accepts{$_} = 1; } # This block reads the list of symbols in the alphabet from the machine file. # Discards the header, <INFILE>;$line = <INFILE>;
chomp($line); # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length), @words = &parse_line('\s+', 0,$line);
# e is used as the empty symbol in the rules.
$alphabet{e} = 1; for (@words) { # This records which symbols are in the alphabet for checking the rules.$alphabet{$_} = 0; } # This block reads the linear bound on the tape size from the machine file. # Discards the header, <INFILE>;$line = <INFILE>;
chomp($line); # breaks up the line into two parts - the "slope" and the "intercept". @words = &parse_line('\s+', 0,$line);
# and stores them in the @tapeBound array
$tapeBound[0] =$words[0];
$tapeBound[1] =$words[1];

# This block reads the state transition rules from the machine file.
# Discards the header,
<INFILE>;
# This variable synchronizes the position of each rule in the rules array.
my $rulesCounter=0; while(<INFILE>) { # breaks each rule into start state, input symbol, stack symbol, end state, and new stack symbol. chomp($_);
@words = &parse_line('\s+', 0, $_); # checks that the first four pieces are defined in the state and alphabet hashes, defined($states{$words[0]}) or die "$words[0] isn't a defined state!";
defined($alphabet{$words[1]}) or die "$words[1] isn't defined in the alphabet!"; defined($states{$words[2]}) or die "$words[2] isn't a defined state!";
defined($alphabet{$words[3]}) or die "$words[3] isn't defined in the alphabet!"; # and converts the left/right instruction into a number to be added to the position counter. if (($words[4] eq "left") or ($words[4] eq "-1")) {$words[4]=-1;
}
elsif (($words[4] eq "right") or ($words[4] eq "+1"))
{
$words[4]=1; } else { (die "$words[4] isn't left, right, -1, or +1!");
}
# then creates an array of each rule.
for (0..4)
{
$rules[$rulesCounter][$_] =$words[$_]; } # The synchronization variable has to be updated.$rulesCounter++;
}

# Reading complete, the subroutine closes the file and returns the name of the start state.
close INFILE;
# The relevant data is stored in the $machine structure and returned to the main routine. my$machine =
{
rules      => \@rules,
accepts    => \%accepts,
alphabet   => \%alphabet,
startState => $startState, tapeBound => \@tapeBound }; return$machine;
}

sub readInput
# This subroutine reads the starting tape from the specified file into an array of symbols.
{
my (@tape, %alphabet, $alphaRef, @tapeBound,$boundRef);
open(INFILE, shift) or die "Can't open ".$input.":$!";
$alphaRef = shift; %alphabet = %{$alphaRef};
$boundRef = shift; @tapeBound = @{$boundRef};
# The first line of the file is read as the initial state of the tape, with symbols delimited by spaces.
my $line = <INFILE>.""; chomp($line);
@tape = &parse_line('\s+', 0, $line); # This makes sure every symbol in the input string was defined in the machine's alphabet. for (@tape) { defined($alphabet{$_}) or die "$_ in $input isn't in this machine's alphabet!"; } close INFILE; my$tapeMax = @tape*$tapeBound[0]+$tapeBound[1];
return (\$tapeMax, @tape);
}