Computability and Complexity/Formal Languages/Chomsky Hierarchy/Context Free Languages

From Wikibooks, open books for an open world
Jump to: navigation, search

Context Free Languages[edit]

Context free languages are the second most restricted class of languages in the Chomsky Hierarchy. Languages in this class can be described by a set of generation rules using 'non-terminal' symbols and 'terminal symbols', where the terminal symbols are the alphabet of the language. These rules replace non-terminal symbols with strings of terminals or non-terminals, or with an empty string. It is common practice to use capital letters for the non-terminal symbols, and lower-case letters for the symbols in the alphabet (and thus for the terminal symbols). This set of replacement rules is called a context-free grammar. A language is context free if there is a context-free grammar which generates every string in the language (and no other strings) from a starting non-terminal symbol.

As an example, consider the following grammar:

  • S -> ε (ε here stands for the empty string)
  • S -> A
  • A -> aAb
  • A -> ab

From the starting symbol S, this grammar can generate either the empty string or a word composed of any number of as followed by an equal number of bs. You may recall from the section on Regular Languages that this language, which can also be written as \{a^n b^n|n \ge 0\}, is not part of the class of regular languages, but we now see that it is context-free.

Note, in the example, that the grammar provides a choice with what rule to apply to both S and A non-terminals. It is these choices that allow the grammar to generate all the strings in a language, and not just a single string.

Pushdown Automata[edit]

The class of context free languages is the same as the class of languages recognized by machines called pushdown automata. A pushdown automaton (PDA) is a non-deterministic machine comprised of a finite number of states with transitions between them, much like an NFA (see Regular Languages), but with the addition of a stack of unlimited size. As part of its transitions, the machine can pop the top item off of the stack and use its contents as part of its transition, and can also push a new item onto the stack. Its transitions can be written as {A,x,y} -> {B,z}, where A and B are the from and to states, x is the next input character, y is what is popped off the stack, and z is what is placed on the stack. Any of x, y, or z can be ε, meaning that nothing is placed or consumed as part of that transition.

Abilties[edit]

The addition of the stack provides a functionally limited but arbitrarily large amount of memory to the machine, enabling it to recognize more complex languages than finite automata. As above, the language \{a^nb^n|n \ge 0\} is a context-free language, and so there is a PDA which recognizes it. It can do this by adding an item to the stack for each a in the string, thus storing a count of them, and removing an item from the stack for each b. If the number of as and bs is the same, the stack will empty when there are no more bs, and so the two numbers can be effectively compared. For a more detailed example of a machine to recognize this language, see the sample machine below.

If a PDA simply doesn't use the stack, it is the same as an NFA and thus equivalent to a DFA, so any languages recognized by a finite automaton can be recognized by a PDA as well. Because of this, and because there are non-regular languages which are context-free, the class of regular languages is a proper subset of the class of context-free languages.

Limitations[edit]

As with the regular languages, there are many languages which are not context-free. The stack on the PDA, while it provides infinite storage capacity, is still a stack, and so only the last element placed on it can be access at any given time. Accessing earlier elements requires removing and thus losing the later elements, since there is no other stack on which to place them. The PDA is also limited in that it must consume the input characters in the order in which they are received, and cannot access them again, except by placing them on the stack.

These limitations make it impossible for a PDA to recognize the language \{a^nb^nc^n|n \ge 0\}. By using the stack, a PDA can count the number of as, and compare them to the number of bs, but in doing so, it must consume its record of the as, so the cs cannot be compared in the same way. The machine cannot create a record of the bs to compare with the cs without obscuring the count of the as, so that method is equally unsuccessful. In short, the lack of any sort of random or direct access to the memory of the PDA prevents it from recognizing this and many other languages, and since no PDA can recognize them, they cannot be context-free languages.

Examples[edit]

The code below is a sample PDA emulator in Perl. Given the description of a machine and an input string, it simulates the machine processing the input string, and shows whether the machine accepts or not.

The syntax is: progname.pl PDAFile inputFile, where PDAFile is a text file containing the TM instructions, and inputFile is a text file containing the input string. Some sample inputs, including a set of PDA instructions for a machine to recognize \{a^nb^n|n \ge 0\}, can be found under sample PDA inputs

#!usr/bin/perl
use Text::ParseWords;
use strict;
use warnings;
 
my (@branches, @stack, %alphabet);
 
# Grabs the filenames for the machine and the word to be run on it.
my $pdaFile = $ARGV[0];
my $input = $ARGV[1];
 
# We use subroutines to parse and verify the data in the input files.
# The machine data is stored in the $machine structure as the keys rules, accepts, alphabet, and startState.
my $machine = readPDA($pdaFile);
# Rules and accepts are extracted from the $machine structure for ease of access.
my @rules = @{$machine->{rules}};
my %accepts = %{$machine->{accepts}};
# This reads the input file and parses it into an array of strings, with each element being one input symbol.
# It checks to make sure the elements are all in the machine's alphabet.
my @string = readInput($input, $machine->{alphabet});
# The newstate is a temporary storage point for when a new state is calculated.
my $newstate;
# The usedRule is a temporary storage point for when a new state is calculated.
my $usedRule;
# The changed variable represents whether or the current branch is unfinished.
my $changed = 1;
push(@stack, "");
# The top level of the branches array corresponds to each branch of possibilities created by the non-determinism of the PDA.
# Each element contains the conditions of the machine for that branched possibility.
# The first element of each collection is the state of the branch.
$branches[0][0] = $machine->{startState};
# The second element is how much of the input string the branch has read.
$branches[0][1] = 0;
# The third element is an array containing the stack for that branch.
$branches[0][2][0] = "";
# Now that the first branch is initialized, the processing can begin
 
for my $i (0..$#branches)
{
   # When we start a branch, print the branch number
    print "\nBeginning branch ".$i.".\n";
   # As long as things keep changing, keep cycling through the rules.
    while($changed)
    {
       # Unless it changes while going through the rules, this branch will quit.
        $changed = 0;
       # The input word is printed, with the next symbol highlighted.
        print "Input: @string[0..$branches[$i][1]-1] <".$string[$branches[$i][1]]."> @string[$branches[$i][1]+1..@string-1]\n";
       # The current state of the stack is printed.
        print "Stack: @{$branches[$i][2]}\n";
       # A new state is calculated by checking conditions against the list of rules
        for my $rNum (0..$#rules)
        {
#            print "::$rules[$rNum][0]??$branches[$i][0]";
#            print "::$rules[$rNum][1]??$string[$branches[$i][1]]";
#            print "::$rules[$rNum][2]??".$branches[$i][2]->[-1]."::\n";
           # Checks the current state, input, and top stack item against the rule
            if ($rules[$rNum][0] eq $branches[$i][0] &&
                ($rules[$rNum][1] eq "e" || $rules[$rNum][1] eq $string[$branches[$i][1]]) &&
                ($rules[$rNum][2] eq "e" || $rules[$rNum][2] eq $branches[$i][2]->[-1]))
            {
                if ($changed == 0)
                {
                   # Set the new state.
                    $newstate = $rules[$rNum][3];
                   # The state transition is printed.
                    print "State: ".$branches[$i][0]." -> ".$newstate."\n\n";
                    $changed = 1;
                   # Because possible branched depend on this state, we can't update it yet.
                   # When we can update this state, $usedRule will help us remember which rule to base those updates on.
                    $usedRule = $rNum;
                }
                else
                {
                   # Set the new state.
                    my $branchState = $rules[$rNum][3];
                   # The state transition is printed.
                    print "(branching) State: ".$branches[$i][0]." -> ".$branchState."\n\n";
                    my $newBranch = @branches;
                   # The state in the new branch is set.
                    $branches[$newBranch][0] = $branchState;
                   # The new branch starts with the same string position as the old branch,
                    $branches[$newBranch][1] = $branches[$i][1];
                   # and the same stack, so the stack has to be replicated.
                    @{$branches[$newBranch][2]} = @{$branches[$i][2]};
                   # If we read a symbol from the input to make the transition,
                    unless ($rules[$rNum][1] eq "e")
                    {
                       # then we should move to the next symbol.
                        $branches[$newBranch][1]++;
                    }
                   # If we used an element from the stack to make the transition,
                    unless ($rules[$rNum][2] eq "e")
                    {
                       # then it's used up and should be removed.
                        pop(@{$branches[$newBranch][2]});
                    }
                   # If the rule adds something to the stack,
                    unless ($rules[$rNum][4] eq "e")
                    {
                       # then it gets added.
                        push(@{$branches[$newBranch][2]}, $rules[$rNum][4]);
                    }
                }
            }
        }
       # Now that any branching has been finished, we can update the original branch.
        if ($changed)
        {
           # If we read a symbol from the input to make the transition,
            unless ($rules[$usedRule][1] eq "e")
            {
               # then we should move to the next symbol.
                $branches[$i][1]++;
            }
           # If we used an element from the stack to make the transition,
            unless ($rules[$usedRule][2] eq "e")
            {
               # then it's used up and should be removed.
                pop(@{$branches[$i][2]});
            }
           # If the rule adds something to the stack,
            unless ($rules[$usedRule][4] eq "e")
            {
               # then it gets added.
                push(@{$branches[$i][2]}, $rules[$usedRule][4]);
            }
           # The state changes to the new state.
            $branches[$i][0] = $newstate;
        }
    }
   # When the input is exhausted, the branch is in its final state.
    print "Final state of branch ".$i." is ".$branches[$i][0]."\n";
   # If that state is in the accept states list, the machine accepts the input and halts.
    if (defined($accepts{$branches[$i][0]}) && $branches[$i][1] == $#string)
    {
        print "The machine accepts the string.\n";
        exit;
    }
   # If that state doesn't, point it out.
    else { print "The branch does not accept the string.\n"; }
   # And move on.
    $changed = 1;
}
print "The machine does not accept the string.\n";
 
###################################################
 
 
sub readPDA
# This subroutine reads the machine data from the specified file into variables (mostly hashes).
{
    my (%states, %stackAlphabet, %accepts, %alphabet, @rules);
    open(INFILE, shift) or die "Can't open machine file: $!";
 
# This block reads the list of states from the machine file.
   # Discards the section header,
    <INFILE>;
    my $line = <INFILE>;
    chomp($line);
    my @words = &parse_line('\s+', 0, $line);
    for (@words)
    {
       # records the state names for checking the rules,
        $states{$_} = 0;
    }
 
# This block reads the start state from the machine file.
   # Discards the header,
    <INFILE>;
    my $startState = <INFILE>;
   # takes the whole line as the start state,
    chomp($startState);
   # and makes sure that the start state is defined in the list of states.
    defined($states{$startState}) or die "The start state $startState isn't a state!";
 
# This block reads the list of accepting states from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into state names,
    @words = parse_line('\s+', 0, $line);
    for (@words)
    {
   # checks to make sure that the accept states are defined states,
        defined($states{$_}) or die "$_ isn't a state!";
   # and defines those names in a new hash.  The use of a hash makes it easier to determine later if a specific state name accepts or not.
        $accepts{$_} = 1;
    }
 
# This block reads the list of symbols in the alphabet from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
    @words = parse_line('\s+', 0, $line);
   # e is used as the empty symbol in the rules.
    $alphabet{e} = 1;
    for (@words)
    {
       # This records which symbols are in the alphabet for checking the rules.
        $alphabet{$_} = 0;
    }
 
# This block reads the list of symbols in the stack alphabet from the machine file.
   # Discards the header,
    <INFILE>;
    $line = <INFILE>;
    chomp($line);
   # breaks up the line into alphabet symbols (note that the symbols can be of arbitrary length),
    @words = parse_line('\s+', 0, $line);
   # e is used as the empty symbol in the rules.
    $stackAlphabet{e} = 1;
    for (@words)
    {
       # This records which symbols are in the alphabet for checking the rules.
        $stackAlphabet{$_} = 0;
    }
 
# This block reads the state transition rules from the machine file.
   # Discards the header,
    <INFILE>;
   # This variable synchronizes the position of each rule in the rules array.
    my $rulesCounter=0;
    while(<INFILE>)
    {
   # breaks each rule into start state, input symbol, stack symbol, end state, and new stack symbol.
        chomp;
        @words = parse_line('\s+', 0, $_);
   # checks that all five pieces are defined in the state and alphabet hashes,
        defined($states{$words[0]}) or die "$words[0] isn't a defined state!";
        defined($alphabet{$words[1]}) or die "$words[1] isn't defined in the alphabet!";
        defined($stackAlphabet{$words[2]}) or die "$words[2] isn't defined in the stack alphabet!";
        defined($states{$words[3]}) or die "$words[3] isn't a defined state!";
        defined($stackAlphabet{$words[4]}) or die "$words[4] isn't defined in the stack alphabet!";
   # then creates an array of each rule.
        for (0..4)
        {
            $rules[$rulesCounter][$_] = $words[$_];
        }
       # The synchronization variable has to be updated.
        $rulesCounter++;
    }
 
   # Reading complete, the subroutine closes the file and returns the name of the start state.
    close INFILE;
   # The relevant data is stored in the $machine structure and returned to the main routine.
    my $machine = 
    {
        rules      => \@rules,
        accepts    => \%accepts,
        alphabet   => \%alphabet,
        startState => $startState
    };
    return $machine;
}
 
 
sub readInput
# This subroutine reads the input string from the specified file into an array of symbols.
{
    open(INFILE, shift) or die "Can't open ".$input.": $!";
    my $alphaRef = shift;
   # The first line of the file is read as the input, with symbols delimited by spaces.
    my $line = <INFILE>."";
    chomp($line);
    my @string = parse_line('\s+', 0, $line);
   # This makes sure every symbol in the input string was defined in the machine's alphabet.
    for (@string)
    { exists($alphaRef->{$_}) or die "$_ in $input isn't in this machine's alphabet!"; }
    close INFILE;
   # Since the machine can continue to make transitions after the string is exhausted, this adds a blank element to keep the string from overrunning.
    push(@string, "");
    return @string;
}

Previous | Next