Theory of Formal Languages, Automata, and Computation


Author

Douglas H. Fisher, with original and editorial contributions from Kyle Moore and Jesse Roberts.

Preface

Why study the theory of automata, languages, and computation (ALC)? It is not an area of high-profile research, and it’s not regarded as an applied area either, though some of the material is bread and butter, nuts and bolts computing – so pervasive that it’s invisible to many. That is, it’s a foundational course. Happily, I think, we study ALC because it’s engaging and fun for those interested in computation. In its abstractions you will find exercises in pure computational theory and computational thinking. As with a humanities class that you take as a computing student, if you are strongly inclined only towards the vocational thrusts of computing and you still take a course on this material, I suggest that you take on faith that some-to-much of what you learn will come in handy later. I say “take on faith” because the links between theory and application probably won’t be as overt in this text as you would see in an algorithms text, for example. There are, however, conceptual links between ALC and algorithms of course, as well as programming languages and artificial intelligence, which this book addresses.

I took “this class” in 1979 from a PhD student, Dov Harel, and I thought it was beautiful – I thought that the Pumping Lemma was a metaphor for life. The field hasn’t changed much (foundations don’t change much, or perhaps only rarely), though the onslaught of AI on the computational fields, and on other publics, has led to new analyses of how deep learning frameworks, which are dominant in AI and machine learning right now, fit into classical computational theory. While the book touches upon the recent work on AI, as well as classical material from AI, it is largely conventional in its treatment of the theory of formal languages, grammars, automata, and computation.

Maybe you’ll have a reaction to ALC something like mine (as well as other reactions perhaps) – an adoration of computing as a field that offers greater wisdom, not one that simply manifests as mastery of the latest gadgetry, an adoration that stems largely from an appreciation of computing’s earliest core abstractions.

The goal of the book is to "streamline" treatment of ALC to what I generally cover in a one semester course, with an organization that I prefer, with added treatments of applications too, including artificial intelligence (AI). My knowledge of this area resulted from undergraduate learning and from teaching undergraduates in the area, using primarily the texts of Hopcroft and Ullman[1][2][3], though I bring personal insights to the treatment, as well as treatment of the relationship of ALC to AI and machine learning. The exercises are often taken from other sources, and are cited as such. I add other exercises as well.

My original exercises uniquely include some that require students to interact with AIs to discuss material, and to vet these interactions. Instructors like me, who are not researchers in the area but love it nonetheless, might recognize my interest in talking to AIs about this material -- who else would I be able to talk to about it in the normal course of my day!?! The interactions with large language models of AI are an opportunity for students to discuss ALC -- its methods, concepts, and philosophies -- in ways that they might never have done before. These AIs change the way that humans can now interact with computers, and they have the potential to rob us of something that is (or was) one of the unique skill sets of computer scientists -- the ability to communicate with the world's stupidest entity capable of response! Computer scientists are, or should be, among the world's best communicators in my opinion. The traditional mode of that communication, namely programming languages, is no longer necessarily needed, but at least in the foreseeable future some of the principles of good communication, such as the danger of making assumptions about what your conversation partner knows or believes, are still germane. The exercises on conversations with AIs about ALC material ask students to analyze, debug, expand, specialize, and otherwise interrogate their discussions.

Introduction

This textbook is fundamentally on finite representations of infinite languages, where the representations are amenable to computational analysis and characterization. That's it in a nutshell. The rest of this book fleshes out this nutshell in considerable detail, with pointers to still more detailed treatments.

Languages

A formal language is a set of strings over some alphabet. A language, call it L, can be finite or infinite. The alphabet, which is often denoted as Σ, is a finite set of primitive symbols, such as Σ = {a,…,z, A,…,Z} or Σ = {0,…,9}. Each string is a sequence (i.e., ordered) of symbols from the alphabet. Just as sets generally can be represented intensionally or extensionally, languages can too. An extensional representation is an explicit listing of the strings in the language, and is only relevant to the representation of finite languages. For example, the finite language of binary representations of natural numbers less than 4 is L = {0, 1, 10, 11}, or alternatively it could be {00, 01, 10, 11}, over the alphabet Σ = {0, 1}.

In contrast, intensional representations of languages are finite descriptions of the set of strings in the language. These finite-length descriptions are used to describe infinite, as well as finite, languages. The natural language description “binary representations of natural numbers less than 4” is an intensional representation of a language, which can also be written as L = {w | w is a binary representation of an integer less than 4}. The phrase “binary representation” suggests the relevant alphabet of {0, 1}. There are other intensional representations of finite sets above. What are they? While I used Σ = {a,…,z, A,…,Z} or Σ = {0,…,9} as representing alphabets, I could also say L = {0,…,9}, for example, or L = “the single digits in common arithmetic, base 10”.

Intensional descriptions are also absolutely needed to finitely represent infinite languages. Consider L = {w | w is a binary representation of an integer greater than 3}, or perhaps “L is the set of binary representations of natural numbers greater than 3”, with Σ = {0, 1}. An interesting observation about intensional descriptions is that there are implicit (typically unstated) inferences that are required to interpret them. There are inferences required of extensional representations too, but I would say that the inferences in the latter case are much closer to a mechanical level and are unambiguous.

Sometimes the intensional description, upon further thought, is insufficient for unambiguous understanding. Fortunately, in the intensional description of the digits, most of us will know how to interpret the ‘…’ because of our prior training, but others will not know how to interpret that. Even for the best trained among us, the description of the binary representations of natural numbers greater than 3 may, upon reflection, seem incomplete. Perhaps we should add a phrase, as in L = {w | w is a binary representation of an integer greater than 3, where w is represented by the least number of binary digits necessary} = {100, 101, 110, 111, 1000, 1001, …}. Why? Even here we might want to say w has no leading 0s.
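To make the role of such implicit inferences concrete, here is a minimal recognizer sketch in Python for the last description above (the function name in_L and its checks are my own illustration, not notation used elsewhere in this book); each explicit test spells out an inference that the English description leaves implicit.

 def in_L(w):
     """Recognizer sketch for L = {w | w is the binary representation, with no
     leading 0s, of an integer greater than 3}."""
     if w == "" or any(c not in "01" for c in w):
         return False              # not a non-empty string over {0, 1}
     if len(w) > 1 and w[0] == "0":
         return False              # leading 0s are disallowed
     return int(w, 2) > 3          # the numeric condition itself

 # in_L("101") -> True; in_L("0101") -> False (leading 0); in_L("11") -> False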

To my knowledge, no course on formal languages will seek to formalize the nature of implicit inferences in the general case, though this would be required to fully understand the complexity of language representation and comprehension, and is undoubtedly of importance to pure mathematics. We will touch on some of these issues, particularly when we talk about the relationship between AI and formal languages.

As we have already mentioned, an intensional description (+ the inference mechanism, which may be a computer program) is a finite representation of what is often an infinite language. In fact, much of this course is about two classes of finite representations of languages – grammars and automata (or machines) – that are both amenable to mechanical, unambiguous, automated inference. Roughly speaking, a grammar specifies how to generate strings in a language, and an automaton specifies how to recognize strings in a language. This distinction between generators and recognizers of languages is not as sharp as suggested here, but it’s a good starting point.

Proof

A traditional goal in automata and formal language courses is to practice skills of deductive proof, which may influence and benefit thought for a lifetime. The word ‘proof’ is often misused, as when it is used to state that there is no proof of this or that natural or social phenomenon (e.g., no proof that climate change is human made). Such areas are not appropriate for deductive proof, but as we will see when discussing AI, the logical mechanisms for carrying out deductive reasoning can be augmented with probabilities and utilities, and be reinterpreted, to become the skeletal structure for other important forms of reasoning: possibilistic reasoning (i.e., what is possible?), probabilistic reasoning (i.e., what is probable?), evidential reasoning (i.e., what is probable given evidence/data/observations?), and rational reasoning (i.e., what are the expected consequences of actions, which includes probabilities but also utilities/costs of outcomes?).

Returning now to deductive proof, Hopcroft, Motwani, and Ullman (2007) state

  • "In the USA of the 1990's it became popular to teach proof as a matter of personal feelings about the statement." (p. 5, [3])

But they continue

  • "While it is good to feel the truth of a statement you need to use, important techniques of proof are no longer mastered in high school. Yet proof is something that every computer scientist needs to understand.” (p. 5, [3])

Contrast this statement with the breakout box on page 12 of the same text, "How Formal Do Proofs Have to Be?", which acknowledges that "persuasion" of an audience member is very relevant, but that this will depend on what knowledge that audience member has. It goes on to say, in effect, that a proof should persuade an audience (one that will include knowledgeable members too) and not just individuals.

Another point is that proof (which I will characterize as "deductive" and Polya[4] calls "demonstrative" and "finished", and which includes all forms of proof that we discuss) is the result of a search for a proof through intelligent, knowledgeable guessing -- that is, mathematics "in the making".

  • "Mathematics is regarded as a demonstrative science. Yet this is only one of its aspects. Finished mathematics presented in a finished form appears as purely demonstrative consisting of proofs only. Yet mathematics in the making resembles any other human knowledge in the making. You have to guess a mathematical theorem before you can prove it; you have to guess the ideas of a proof before you carry through the details. " (p. vi, [4])

Polya makes clear that the "guessing" behind the search for a proof should not be arbitrary but should follow principles of "plausible" reasoning. Textbooks often present "finished" proofs, but embedded in the discussion sometimes there will be insights into the search and rules of thumb for easing and directing the search for the finished product.

As an example, consider Goldbach’s Conjecture – that every even number greater than 4 is the sum of two prime numbers. But how do I come up with this statement to begin with? How did Goldbach come up with Goldbach’s conjecture? I do so through what Polya also includes as part of plausible reasoning – I notice that every even number that I can think of is the sum of two primes. This is an empirical exercise of identifying a pattern based on data. After identifying a plausible pattern, I can attempt a proof of the statement (e.g., if N is even and greater than 4, then there exists a k such that N-k and k are both prime). No one, by the way, has ever proven Goldbach’s Conjecture. It is so easily stated but as yet unproved as true or false. That’s one of the attractions of number theory – easily comprehended and difficult to prove hypotheses, at least in many cases. A classic program of early AI and machine learning is known as AM, for "Automated Mathematician", by Douglas Lenat[5]. Rather than proving theorems, which was an early focus of AI research predating AM, Lenat's system searches for empirically supported patterns in number theory, which are then presented by the system as candidate theorems. AM illustrates the tasks of mathematics in the making, which are integral to computational theory as well as mathematics.
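The pattern-noticing, "mathematics in the making" side of this story is easy to simulate. The following Python sketch (with hypothetical helper names, written here only for illustration) gathers evidence for the conjecture over a finite range; it proves nothing.

 def is_prime(n):
     return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

 def goldbach_holds(n):
     """For one even n > 4, is there a k such that k and n-k are both prime?"""
     return any(is_prime(k) and is_prime(n - k) for k in range(2, n // 2 + 1))

 # Empirical support, not a proof: check the even numbers 6, 8, ..., 998.
 print(all(goldbach_holds(n) for n in range(6, 1000, 2)))   # True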

Once a candidate theorem is identified, a search for a proof of it (or not) can begin. Generally, finding a valid proof is a much more difficult problem than validating a given proof. An interesting hybrid problem of finding and validating arises when grading a proof, where ostensibly the task is to validate or invalidate the "proof" submitted by a researcher, students included, but this will sometimes require that a focused search evaluates "how far" the submission is from a valid proof (e.g., for the purpose of assigning a grade). This grading of a proof attempt is a good example of an AI task, rather than ALC territory per se.

Proof by Induction

Figure InductionProof1: A simple inductive proof, demonstrating a basis case and the use of an inductive step.
Figure InductionProof2: A second inductive proof demonstrating a base case and use of an inductive step.
Figure InductionProof3: This illustrates a proof by induction on strings. The problem is taken from Hopcroft and Ullman (1979)[6]. This figure shows Part 1 of a proof of equivalence between the two definitions; that is, that satisfaction of Definition 1 implies satisfaction of Definition 2. While this is only part of a proof of equivalence (implication in both directions), it is an example of an entire proof of implication in one direction.

You may have covered proof by induction in a course on discrete structures, and if so, our treatment here is review. If we want to prove a statement St of all natural numbers, then we explicitly show base cases -- that St(0) is true, and perhaps other base cases -- and then we show the truth of an inductive step, that ‘if St(k) is true then St(k+1) is true’ for all k ≥ 0 (or for all k at least as large as the other base cases). The inductive step suggests the importance of proofs by induction in formal languages, where strings have lengths and the truth of statements about strings of length k+1 is demonstrated by building on the assumption of truth for strings of length k. Figures InductionProof1, InductionProof2, and InductionProof3 give sample proofs, two arithmetic and the third involving strings.
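For readers of a text-only rendering in which the figures are not visible, here is one standard arithmetic example of the pattern, written in LaTeX notation; it is meant only to illustrate the basis/inductive-step structure, not to reproduce any particular figure.

 \textbf{Claim.} For all natural numbers $n$, $\sum_{i=0}^{n} i = \frac{n(n+1)}{2}$.

 \textbf{Basis} ($n = 0$): $\sum_{i=0}^{0} i = 0 = \frac{0(0+1)}{2}$.

 \textbf{Inductive step:} assume $St(k)$: $\sum_{i=0}^{k} i = \frac{k(k+1)}{2}$. Then
 \[
   \sum_{i=0}^{k+1} i \;=\; \frac{k(k+1)}{2} + (k+1) \;=\; \frac{(k+1)(k+2)}{2},
 \]
 which is $St(k+1)$. By the basis and the inductive step, $St(n)$ holds for all $n \ge 0$.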

The deductive validity of proof by induction rests on the axioms of number theory -- one in particular. But to review, the Peano axioms of number theory are as follows.

1. 0 is a natural number.

2. For every natural number x, x = x. That is, equality is reflexive.

3. For all natural numbers x and y, if x = y, then y = x. That is, equality is symmetric.

4. For all natural numbers x, y and z, if x = y and y = z, then x = z. That is, equality is transitive.

5. For all a and b, if b is a natural number and a = b, then a is also a natural number. That is, the natural numbers are closed under equality.

6. For every natural number n, the successor of n, n+1, is a natural number. That is, the natural numbers are closed under the successor function.

7. For all natural numbers m and n, m = n if and only if the successor of m is the successor of n, m+1 = n+1.

8. For every natural number n, n+1 = 0 is false. That is, there is no natural number whose successor is 0.

9. If K is a set such that: 0 is in K, and for every natural number n, n being in K implies that n+1 is in K, then K contains every natural number.

It is the 9th axiom that justifies proof by induction. It says precisely what we have been illustrating above, which is that if a statement is true of 0 (i.e., St(0) is true; 0 is in the set K of natural numbers for which St is true), and St(n) → St(n+1) for every natural number n, then the statement, St, is true of every natural number.

The 9th axiom, instantiated for particular statements, St, is itself a finite representation of the countably infinite set of natural numbers.

Proof by Contradiction

There are other forms of proof too, like proof by contradiction. Suppose you want to prove proposition p; then the equivalent statement ¬p → false (or simply that ¬p is false) may be more amenable to proof.

For example, prove “If x is greater than 2 (p1) and x is prime (p2), then x is odd (q)” by contradiction. Abbreviate this as proving (p1 ⋀ p2) → q by contradiction, so negate the statement resulting in

¬((p1 ⋀ p2) → q), which is equivalent to (equivalence represented as ≡)

¬(¬(p1 ⋀ p2) ⋁ q)

≡ ¬((¬p1 ⋁ ¬p2) ⋁ q)

≡ (¬(¬p1 ⋁ ¬p2) ⋀ ¬q)

≡ ((¬¬p1 ⋀ ¬¬p2) ⋀ ¬q)

≡ ((p1 ⋀ p2) ⋀ ¬q)

≡ (p1 ⋀ p2 ⋀ ¬q) // ¬(x is odd) ≡ (x is even)

Does (x is greater than 2) and (x is prime) and (x is even) create a contradiction? If x is even then x = 2y for some integer y, and since x is greater than 2, y > 1, so x is composite by definition, contradicting x is prime (p2).

Here is another example of proof by contradiction. This example is due to Hopcroft, Motwani, and Ullman (2007)[7]. Let S be a finite subset of an infinite set U. Let T be the complement of S with respect to U. Prove by contradiction that T is infinite. Note that |U| and |S| represent the cardinality or size of sets U and S, respectively.

Prove “IF S ⊂ U (p1) and

  there is no integer n s.t. |U| = n (p2) and

  there is an integer m s.t. |S| = m (p3) and

  S ⋃ T = U (p4) and

  S ∩ T = {} (p5)

  THEN there is no integer p s.t. |T| = p (q)”

by contradiction.

Similar to the prior example, negate the entire expression and simplify, getting p1 ⋀ p2 ⋀ p3 ⋀ p4 ⋀ p5 ⋀ ¬q.

¬q ≡ there is an integer p s.t. |T| = p

  p4 and p5 → |U| = |S| + |T| = m + p, contradicting p2 (that there is no integer n s.t. |U| = n)

Diagonalization

Figure Diagonalization: A special case of proof by contradiction is diagonalization. This figure illustrates Cantor's diagonal argument that the real numbers between 0 and 1 are not countably infinite.

Special cases of proof by contradiction include proof by counterexample and proof by diagonalization. This latter strategy is used to prove exclusion of items from an infinite set of like items. In diagonalization one defines a matrix in which the rows, of which there may be a countably infinite number, represent members of a set. Each row is defined by its values in each of the columns of the matrix. One defines the matrix and uses the diagonal of the matrix to construct a hypothetical candidate row, but it is a candidate row that cannot belong to the set of rows in the matrix by virtue of the way the candidate row was constructed.

Cantor’s proof that the real numbers are not countably infinite introduced the diagonal argument. It assumes a matrix where the rows correspond to the (countably infinite) natural numbers (i.e., 0, 1, 2, 3, …) and the columns correspond to digits that are to the right of the decimal point, as in 0.x1x2x3… (e.g., 0.3271…). We can then take the diagonal of this matrix and create a candidate row by taking each digit xj,j on the diagonal and changing it, say to (xj,j+1) mod 10. This candidate sequence cannot match any row, by definition, since at least one digit in it, the one that was changed along the diagonal, differs from that row. So, there are syntactically valid sequences of digits to the right of the decimal point that cannot correspond to any row in the countably infinite set of rows. This is illustrated in Figure Diagonalization.
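The diagonal construction itself is mechanical enough to sketch in a few lines of Python (the names below are my own; a real table has infinitely many rows and columns, so this only treats a finite prefix of such a table).

 def diagonal_counterexample(table):
     """Given row i as a list of the digits of the i-th listed real in (0,1),
     build a digit sequence that differs from row j in position j,
     so it can equal no row of the table."""
     return [(row[j] + 1) % 10 for j, row in enumerate(table)]

 # A finite prefix of a hypothetical enumeration of digit expansions:
 table = [[3, 2, 7, 1],
          [1, 4, 1, 5],
          [9, 9, 9, 9],
          [0, 5, 0, 5]]
 print(diagonal_counterexample(table))   # [4, 5, 0, 6] differs from every row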

When this strategy was introduced in the late 19th century there was some controversy around it because of the novelty of creating an infinite-length counterexample, and Wittgenstein was reported to have called the diagonal argument “hocus pocus”, but it is now accepted as mathematically valid. We will use it later in proving that some languages are not members of certain language classes.

Proof with Construction

The most common form of "proof" or demonstration that you will see in this textbook involves construction. For example, if we want to show that two descriptions, X and Y, define the same language, we show that we can construct/translate Y from X: X → Y, and vice versa, Y → X. But we can't construct just any old thing and expect that to be a sufficient demonstration of equivalence. You have to be convinced that the translation or construction from one representation to the other is not buggy and is otherwise valid. If we want to be formally rigorous we might therefore follow construction with a proof, say by induction or contradiction, of the construction's validity. We will typically only allude to or sketch the proof that the construction is valid, if that, spending most of our time explaining the construction and its validity.

Automated Proof

In AI, automatic proof has a rich history, and we take time to talk about AI proof as a way of demonstrating important principles of thinking. An important insight that is rarely taught in lecture is that to prove something requires that one search for a valid proof. When was the last time that you saw a lecturer prove something live and previously unseen by the lecturer? This search can take a long time and it may not be successful at all. Rather, the search for a proof (or the search for a correct computer program for a given specification) typically happens beforehand, perhaps from scratch or by looking up the results of someone else’s search. Demonstrating the validity of an existing proof, which is what is typically done in lecture, is generally much easier than finding a valid proof through search! This distinction between validating and finding will relate to our study of P and NP problems, respectively. You may have seen P and NP elsewhere, perhaps in an algorithms course, but if not we address it in Properties of Language Classes. Our talk of AI proof will make explicit the importance of search, or what Polya includes as part of plausible reasoning for theory "in the making".

If you know a statement that you want to prove, then it often makes sense to reason backwards from that statement to a known set of axioms by inverting deductively valid steps (so that when read forward the steps are valid). Backwards reasoning is an important strategy in AI and in thinking generally. An early, if not the first, AI paper on this as a thought strategy was Shachter and Heckerman (1987)[8], but a search on Google for backwards reasoning or the like turns up numerous business articles and talks in the last decade. I am reminded of an old adage in motorcycle circles -- "when you are making a turn, look to where you want to be" -- which I am sure is a metaphor for much that you will read in business and elsewhere.

A more technical example is that if I am planning a route to a particular destination, I might start with the destination on a road map and work backwards along the roads leading to it, selecting those that are in the direction of my current location. As a matter of interest, automated route planners might work in both directions, from destination to start location, and vice versa, in search of intersections in the search "frontiers".

Automated proof (aka theorem proving) is grounded in formal logic. You saw propositions (e.g., 'p's and 'q's), as used in propositional logic, under Proof by Contradiction above. Two basic operations of inference in propositional logic are modus ponens and resolution. The modus ponens inference rule says that if you know 'p' (i.e., proposition p is true) and you know 'p → q' (i.e., p implies q) then you can deductively conclude proposition 'q' (i.e., q is true). You now know both p and q, i.e., (p ⋀ q). Modus ponens is the best known deductive inference rule.

The resolution rule relies on the equivalence between implication and disjunction, (p → q) ≡ (¬p ⋁ q). If you know 'p' and you know '(¬p ⋁ q)' then you know q of course. It's as if the p and the ¬p cancel each other, and if you think of it logically you can see why: p ⋀ (¬p ⋁ q) ≡ (p ⋀ ¬p) ⋁ (p ⋀ q) ≡ false ⋁ (p ⋀ q), which is again (p ⋀ q).

Modus ponens and resolution are equivalent rules of inference at this simple level, but they offer different perspectives on deductive reasoning, and we will see them again when discussing computational theory, and again in the last chapter on applications, where we will also talk about automated theorem proving in both propositional and first-order logic.
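As a concrete, if toy, illustration of the two rules, here is a small Python sketch. The representation of clauses as tuples of literals with '~' marking negation is an assumption made just for this example, not a standard library interface.

 def modus_ponens(known_facts, implication):
     """From p and (p -> q), conclude q; implication is a (p, q) pair."""
     p, q = implication
     return q if p in known_facts else None

 def resolve(clause1, clause2):
     """Resolve two clauses (tuples of literals) on one complementary pair."""
     for lit in clause1:
         neg = lit[1:] if lit.startswith("~") else "~" + lit
         if neg in clause2:
             return tuple(l for l in clause1 + clause2 if l not in (lit, neg))
     return None

 print(modus_ponens({"p"}, ("p", "q")))   # 'q'
 print(resolve(("p",), ("~p", "q")))      # ('q',): the p and ~p cancel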

Problems, Procedures, and Algorithms

Before getting to the various types of grammars and automata to finitely represent languages, let's briefly preview some theory of computation with concepts that we already know – procedures and algorithms. A procedure is a finite sequence of steps that can be carried out automatically, that is, by a computer. An algorithm is a procedure that is guaranteed to halt on all inputs.

Decidability and Undecidability

Procedures, be they algorithms or not, can be written to recognize languages. The question of whether a string w is a member of a language L is a yes/no question, and a procedure for answering the question is a finite representation of L if and only if (a) the procedure says 'yes' for every w in L (and halts), and (b) the procedure never says 'yes' for any w that is not in L. If in addition the procedure says 'no' for every w not in L (and halts) then the procedure is an algorithm. Notice that these conditions collectively allow for a procedure that is a finite representation of L but that is not an algorithm. Such a non-algorithm procedure says 'yes' and halts if and only if w is in L, but there exist strings w not in L for which the procedure will not answer 'no' and will not halt. If for language L there is no algorithm for correctly answering the yes/no question of membership in the language, then membership in L is said to be undecidable. If an algorithm does exist for correctly answering the question of membership in L for all possible input strings, members and non-members, then membership in L is decidable.

Presumably most readers can write a procedure that recognizes whether an input integer is a prime number or not, outputting yes or no, respectively. If written correctly the procedure would certainly halt in all cases, so it's an algorithm and it recognizes the language of prime numbers. An algorithm can also be written to determine whether an integer is a perfect number or not. A perfect number is a number that is the sum of all its divisors (excepting itself, but including 1). So, membership in the language of prime numbers is decidable, as is membership in the language of perfect numbers.
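For concreteness, here is one possible rendering in Python of the two recognizers just described; both always halt, so both are algorithms in the sense above.

 def is_prime(n):
     """Recognizer for the language of prime numbers; it halts on every input."""
     if n < 2:
         return False
     for d in range(2, int(n ** 0.5) + 1):
         if n % d == 0:
             return False
     return True

 def is_perfect(n):
     """Recognizer for the language of perfect numbers (e.g., 6 = 1 + 2 + 3)."""
     return n > 1 and sum(d for d in range(1, n) if n % d == 0) == n

 # is_prime(7) -> True; is_perfect(28) -> True; is_perfect(12) -> False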

Decision Problems

Formally, decision problems refer to yes/no questions. The question of membership in a language is one type of decision problem, for which we have given examples of testing for primality and testing for "perfection". Let's consider two other decision problems. You can write an algorithm for answering whether there is a prime number that is greater than a number that you supply as input. Since the prime numbers are known to be infinite, it's a simple algorithm indeed -- just answer 'yes' regardless of input! But for the sake of illustration, assume that in addition to yes, you wanted to give the smallest prime that is greater than the number that is passed as input. You can write an algorithm for doing this that could use the algorithm for membership testing for primes as a subroutine.

You can also write a procedure for answering the question of whether there is a perfect number greater than a number that is passed as input, but it is not known whether the perfect numbers are infinite, and so it's not known whether this procedure is an algorithm or not – it's not known whether it will halt on all inputs. In the case of an input that is greater than the largest currently known perfect number, the procedure will halt with a yes if a larger perfect number is eventually found, but the procedure will not halt if there is no larger perfect number. This decision problem of the next perfect number is not known to be decidable or undecidable, but as in all examples of undecidable decision problems thus far, the 'undecidability aspect' of a question lies in the handling of 'no' cases.

Notice that the knowledge that the primes are infinite and the uncertainty on whether the perfect numbers are (or are not) infinite lies outside the procedures for next-prime and next-perfect, respectively. We can write the procedures for each next-type problem in very much the same way, and the fact that one is an algorithm and one is not known to be an algorithm is not revealed in the procedures themselves.
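The point is easy to see in code. The two procedures below (a sketch, reusing the is_prime and is_perfect recognizers from the previous section as subroutines) have exactly the same shape; nothing in the text of next_perfect reveals that it might run forever.

 def next_prime(n):
     """An algorithm: guaranteed to halt because the primes are infinite."""
     candidate = n + 1
     while not is_prime(candidate):     # is_prime as defined earlier
         candidate += 1
     return candidate

 def next_perfect(n):
     """A procedure not known to be an algorithm: if no perfect number greater
     than n exists, this loop never terminates."""
     candidate = n + 1
     while not is_perfect(candidate):   # is_perfect as defined earlier
         candidate += 1
     return candidate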

As other examples of decision problems let's consider inspirations from Goldbach's conjecture. Consider the language of even numbers greater than 4. Call it LEven. Consider the language, LEvenSum, of even numbers greater than 4 that are the sum of two primes. Now, consider the language, LEvenNotSum, of even numbers greater than 4 that are not the sum of two primes. One decision problem is whether LEvenSum = LEven. A second decision problem is whether LEvenNotSum = ∅, i.e., the empty language. Intuitively, these two questions seem similar, and perhaps equivalent. It's left as an exercise to show that the two questions are indeed equivalent.

Given that Goldbach's Conjecture is still a conjecture, it is not known whether LEvenSum = LEven and LEvenNotSum = ∅ are decidable or undecidable. Not coincidentally, it is not apparent how to write a procedure for answering the questions! Such a procedure would constitute the main part in a proof with construction of the decidability or undecidability of these decision problems.

LEvenSum = LEven is just one example of a decision problem that tests the equality of two languages. Likewise, LEvenNotSum = ∅ is just one example of a decision problem testing for the emptiness of a language. In the chapter on Properties and Language Classes we will study decision problems in the context of classes (aka sets) of languages rather than specific languages only. For example, is the question of L = ∅ decidable or undecidable for all languages L in a given class of languages?

Interestingly, decision problems themselves define languages -- languages of "languages"! Let Lmeta be the set (aka language) of recognition procedures for languages that are empty. Convince yourself that this makes sense. A recognition procedure for a language is after all a finite sequence of steps, which is a finite string in some programming language. Is Lmeta = ∅? Is membership in Lmeta decidable? Does a recognition procedure for Lmeta even exist that can correctly answer in all of the yes cases and no cases?

Languages that cannot be Finitely Represented

If no recognition procedure for Lmeta exists (i.e., no procedure exists that can recognize all, and only, members of Lmeta), then recognition for Lmeta would be undecidable in an even stronger sense than other undecidable decision problems that we've discussed thus far, in which the 'undecidability aspect' of the problem rested with the 'no' cases only.

Intuitively, examples of languages that cannot be finitely represented at all will be those for which there is no basis by which a recognizer can be constructed. A language in which the members are selected randomly would be an intuitive illustrative example: LR = {w | w in Σ* and random(w) == true} could be formed by enumerating the strings over alphabet Σ in some systematic order and randomly deciding in each case whether a string, w, is in the language or not. The reason that a recognizer can't be built for this language assumes that random(w) is applied in the recognizer with no memory of its application in the creation of the language. And we can't simply remember the members, since there are an infinite number. In contrast, if random(w) were a pseudorandom process then such procedures are repeatable and we could implement a recognizer in that case, but if we have a truly random process then no recognizer can be built to say yes correctly in all cases.

Random languages, as above, are posed to give you an example of languages that are not finitely representable that might be less mind numbing than Lmeta. We will return to these questions when we talk more on the theory of computation.

Generating Languages

Recognition algorithms like those for prime and perfect numbers can be used as subroutines for generating languages as well. The language of prime numbers can be generated by initializing the set of "strings" to {2}, and embedding the recognition procedure in a loop that iterates over increasingly large odd integers, starting with 3, testing each for primeness, and adding only those numbers that are found to be prime. Note that because the primes are known to be infinite this generation procedure won’t terminate, so it’s not an algorithm, but of course it’s not answering a yes/no question either. We could create a non-equivalent algorithm from the generator by using a counter that artificially stops the iterative process after a specified number of iterations, though the result is no longer a generator of the complete language, but only a generator of the strings in the language up to a prescribed length.

We can use the same strategy of systematically generating larger numbers, not only odd ones in this case, and using the perfect number recognition algorithm on each. This would be a generator of the perfect numbers. This too would run forever unless it was constrained by a maximum count on iterations.
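In Python the generation strategy can be sketched naturally as a generator function; the bounded variant described above corresponds to slicing off only finitely many elements. This sketch reuses the is_prime recognizer from earlier as a subroutine.

 from itertools import islice

 def generate_primes():
     """Generator for the (infinite) language of primes: it never terminates on
     its own, but callers can keep asking for the next member."""
     yield 2
     candidate = 3
     while True:
         if is_prime(candidate):        # recognition procedure as a subroutine
             yield candidate
         candidate += 2                 # only odd candidates after 2

 # The 'artificially stopped' variant: take only the first 10 members.
 print(list(islice(generate_primes(), 10)))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]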

A procedure that is designed to "run forever" is called an anytime algorithm because at any time it can be paused and the output to that point can be examined before continuing to play the algorithm forward. Anyone who has experienced a system crash knows that an anytime algorithm is best regarded as a theoretical construct, though even in the case of a crash, computer systems generally save their state and can be restarted from that saved state, so an anytime algorithm could proceed indefinitely, if not forever.

There are ways to convert a generation procedure for a language to one that generates one string, w, of a language, L, on each run of the algorithm. An exercise asks you to sketch an approach to do so, which will guarantee that each string in the language will be generated with non-zero probability.

Exercises, Projects, and Discussions

Figure InductionExercise1: Prove by induction. Clearly state the Basis case(s) and the use of the inductive assumption/step, as well as other work.

Induction Exercise 1: Prove the equality given in Figure InductionExercise1 by induction.

Induction Exercise 2: (Due to Hopcroft and Ullman (1979)[9]). Briefly explain what is wrong with the following inductive “proof” that all elements in any set must be identical. The basis is that for sets with one element the statement is trivially true. Assume the statement is true for sets with n-1 elements, and consider a set S with n elements. Let a be an element of S. Let S = S1 ⋃ S2, where S1 and S2 each have n-1 elements, and each contains a. By the inductive hypothesis (assumption) all elements in S1 are identical to a and all elements in S2 are identical to a. Thus all elements in S are identical to a.

Induction Exercise 3: Prove by induction that the number of leaves in a non-empty full binary tree is exactly one greater than the number of internal nodes.

Proof Discussion 1: Have a discussion with an AI large language model on the history and philosophy of "proof", potentially including the role of plausible reasoning, infinity, and/or other issues that you are particularly interested in. Include your prompts and the responses, and/or ask the LLM to summarize the discussion in X words or less (e.g., X = 500).

Decision Problem Exercise 1: Show that LEvenSum = LEven and LEvenNotSum = ∅ are equivalent decision problems.

Decision Problems Exercise 2: There are ways to convert a generation procedure for a language to one that generates one string, w, of a language, L, on each run of the algorithm. Sketch an approach to do so, which will guarantee that each string in the language will be generated with non-zero probability. Your "one-shot" version of generate can use the recognition procedure for L as a subroutine.

Proof Project 1: Learn about AI proof assistants and use these throughout the semester to address problems that you are assigned, and unassigned. Report the results and experiences in a proof portfolio. https://www.nytimes.com/2023/07/02/science/ai-mathematics-machine-learning.html

Grammars and the Chomsky Hierarchy

Figure GrammarDefinition: This is a succinct definition of a grammar of the most general type -- Type 0 or unrestricted.

Grammars specify how the strings in a language can be generated. Grammars are finite representations of formal languages. In this section we describe four broad categories of grammars and corresponding categories of languages that the grammar categories represent. The grammars and languages are ancestrally related and were proposed by Chomsky as potential models for natural language. Thus, the four language (and grammar) classes are known as the Chomsky hierarchy, which are summarized in Figure ChomskyOverview. The broadest class of languages, those with the least restrictive grammars, are the unrestricted or Type 0 languages (grammars).

Figure ChomskyOverview: The Chomsky hierarchy consists of four classes of languages (i.e., Unrestricted, Context Sensitive, Context Free, and Regular), each defined by a class of grammars. Chomsky introduced and considered these language classes as possible models of natural language.

Unrestricted (Type 0) Grammars and Languages

A grammar is specified by a 4-tuple G = (V, Σ, P, S), where V is a finite set of variables, also called non-terminal symbols; Σ is the finite alphabet of the language being represented, also called the terminal symbols of the grammar; P is a finite set of productions; and S is the start symbol and is a member of V. V and Σ have a null intersection. Each production in P takes the form α → β, where α is a string of symbols from V ∪ Σ with |α| > 0, and where β is also a string of symbols from V ∪ Σ, but with |β| ≥ 0 (i.e., β can be the empty string, denoted by ε). This, and no other restrictions, is the definition of unrestricted or type 0 grammars (Figure GrammarDefinition). A grammar G specifies the strings of a language L(G) or LG. This definition of a type 0 grammar seems to allow α to be a string of only alphabet (terminal) symbols, but I have never seen an example of this in textbooks. An exercise below asks you to reflect on this issue further.

Figure SimpleGrammar: The grammar for some number of 0s followed by an equal number of 1s, with one example derivation shown.

Consider this simple grammar, GSimple = (V, Σ, P, S), where V = {S}, Σ = {0, 1}, P = {S → ε, S → 0S1}, and the start symbol is S. There are two productions in this grammar, and we'll often use the 'or' symbol, '|', to refer to multiple productions with the same left-hand side. So, S → ε and S → 0S1 are abbreviated S → ε | 0S1. ε signifies the empty string, and is typically not part of the alphabet of a grammar (else it would be in Σ, which is not the case in this grammar). We just need some way of representing 'nothing'. In this example, there is only one variable, and the left-hand side of each production, α, is a single variable.

Figure SimpleDerivationTree: This illustrates a breadth-first enumeration of strings in the language {0^n1^n | n ≥ 0}. Nodes are sentential forms. The root is the start symbol. Leaves are strings in the language. Arcs correspond to productions. A path from the root to a leaf is a derivation.

Beginning with only the start symbol, we can rewrite it using any of the productions that have only the start symbol on the left-hand side. This rewrite will create a string that contains variables and/or terminals. This resulting string is called a sentential form. We can then look at the sentential form created and ask whether any productions can be applied to it. A production can be applied to a sentential form if the production's left-hand side (α) is a substring of the sentential form. In case of such a match, a new sentential form is created by rewriting the matched substring with the matching production's right-hand side (β). And this can continue indefinitely, where a sequence of sentential forms (that are the result of a sequence of production applications) is called a derivation; when the last string in the derivation contains only terminal symbols, it is a string of the language generated by the grammar (Figures SimpleGrammar and SimpleDerivationTree).

Because there may be multiple productions that have left-hand sides that match a given sentential form, we can write an enumeration procedure, like depth-first or breadth-first enumeration, that tries all the relevant productions, and that traces out possible derivation sequences. Enumeration as it is often taught in introductory courses is probably a procedure for enumerating or systematically visiting the nodes of an explicit graph, but the enumeration procedure for generating a language is better thought of as enumerating an implicit graph (i.e., the “vertices”, that is sentential forms, are generated on demand). Enumerating implicit graphs is also characteristic of many AI problems.
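A breadth-first enumerator of this implicit graph can be sketched directly. In the sketch below (my own conventions, not the book's formal notation) a grammar is a list of (α, β) string pairs, variables are upper-case letters, and enumeration is cut off after a fixed number of expansions since it would otherwise run forever.

 from collections import deque

 def enumerate_language(productions, start, max_steps=200):
     """Breadth-first enumeration of sentential forms; terminal-only forms are
     collected as strings of the language."""
     queue, found = deque([start]), []
     for _ in range(max_steps):                 # artificial cutoff
         if not queue:
             break
         form = queue.popleft()
         if not any(c.isupper() for c in form):
             found.append(form)                 # terminals only: a member
             continue
         for alpha, beta in productions:
             i = form.find(alpha)
             while i != -1:                     # rewrite every occurrence of alpha
                 queue.append(form[:i] + beta + form[i + len(alpha):])
                 i = form.find(alpha, i + 1)
     return found

 # GSimple, with epsilon written as "":
 print(enumerate_language([("S", ""), ("S", "0S1")], "S"))
 # begins ['', '01', '0011', '000111', ...]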

Consider a more complicated grammar, GComplex = (V = {S, A, B, C, D, E}, Σ = {a}, P, S), due to Hopcroft and Ullman (1979)[10], where P is this set of productions.

1) S → ACaB

2) Ca → aaC

3) CB → DB

4) CB → E // FYI: we could summarize productions (3) and (4) as CB → DB | E

5) aD → Da

6) AD → AC

7) aE → Ea

8) AE → ε // ε is the empty string

What language could GComplex represent? Since 'a' is the only terminal symbol, it must be a language of strings that only contain a's. It's a fair guess that the number of a's in each string is what determines membership. Let's start seeing what strings of a's we are able to derive. Can we derive the string of zero a's, i.e., the empty string? No, since the start symbol is on the left-hand side of only production (1), and an 'a' is introduced right off the bat, and importantly there is no production that removes a's once one is introduced.

Can we derive a string with exactly one 'a'? S ⇒ ACaB using production (1) ⇒ AaaCB using production (2). Indeed, production (2) is the only production that is applicable; you should confirm that the left-hand side of production (2), 'Ca', is the only left-hand side that matches a substring of 'ACaB'. Again, there is no way to remove an 'a' from 'AaaCB' in order to arrive at a string with a single 'a'.

Can we derive a string with exactly two a's? S ⇒ ACaB using production (1) ⇒ AaaCB using production (2) ⇒ AaaDB using production (3) ⇒ AaDaB using production (5) ⇒ ADaaB using production (5) ⇒ ACaaB using production (6) ⇒ AaaCaB using production (2), but again too many a's if two a's is our target. Again, failing to find ways of combining productions to remove a's, this path of rewrites is a dead end. But notice that when we applied production (3) during this sequence of rewrites we had another option available to us -- we could have applied production (4) instead of (3). Let's look at that option instead. S ⇒ ACaB using production (1) ⇒ AaaCB using production (2) ⇒ AaaE using production (4). Continuing from AaaE we have AaaE ⇒ AaEa using production (7) ⇒ AEaa using production (7) ⇒ aa using production (8).

So we can derive 'aa' from GComplex. Can we derive 'aaa'? If we continue on from our previous dead end of 'AaaCaB' above we should quickly conclude that only production (2) will apply, which will result in 'AaaaaCB'. We can't derive 'aaa', but you should confirm that we can derive 'aaaa'. Up until this point we have been hand simulating an enumeration procedure, which is probably best left to automation. As an exercise, work with a large language model such as chatGPT to code a procedure for performing an enumeration of strings generated by GComplex. But let's also step back and reason about the generation process. We have a variable 'C' that works its way from left to right through successive sentential forms, doubling a's as it goes. When the 'C' reaches the right end, it either changes to an 'E' using production (4) and goes left repeatedly using production (7) until it reaches the left end and 'exits' using production (8); or the 'C' at the right end becomes a 'D' using production (3), again going left repeatedly using production (5), but this time leading to a sentential form that will again result in doubling the a's. Rather than the language of an even number of a's, confirm your understanding that GComplex generates the strings with a positive power of 2 a's. Followup questions in our analysis include (i) why must E go left at all? (ii) Could the grammar be altered, creating an equally valid equivalent grammar, so that D doubled a's as it was going left? (iii) And if so, what other revisions would have to be made?

What is made particularly clear in this latter example of a grammar is that there is a procedure -- an iterative, looping procedure -- that underlies the design of the grammar, and though each production can be interrogated in isolation, each rule only makes sense in its relation to the entire grammar.

Markov Algorithms

Markov algorithms make the procedure explicit in the formalism.

The Markov of 'Markov algorithms' is the son of the Markov of 'Markov chain' fame. Both are named ...

Non-Contracting aka Context Sensitive (Type 1) Grammars and Languages

If |α| ≤ |β| for every production α → β, then the grammar is said to be a non-contracting grammar, since sentential forms never shrink along a derivation path. The class of non-contracting grammars defines a class of languages that is identical to the class of languages defined by the context sensitive grammars. A context sensitive grammar has productions of the form α1Aα2 → α1βα2, where A is a variable, and α1, α2, and β are arbitrary strings of variables and alphabet symbols, but β cannot be the empty string. The name context sensitive is because α1 and α2 give the context in which A can be rewritten as β.

It should be clear that every context sensitive grammar is a non-contracting grammar, but it's also the case that every non-contracting grammar can be converted to a context sensitive grammar that defines the same language.[11][12] Thus, the non-contracting grammars and the context sensitive grammars define the same class of languages. You can think of context-sensitive as a normal form for non-contracting, but context sensitive is so widely used that even when talking about non-contracting grammars, we'll often use the equivalent label of context sensitive grammars (CSGs) and languages (CSLs). In any case, these grammars and languages are also referenced as type 1.

Consider this CSG, Gabc = ({S, B, C}, {a, b, c}, P, S), which is due to Hopcroft and Ullman (1969)[13], where P is the set containing the following productions.

1) S → aSBC

2) S → aBC

3) CB → BC

4) aB → ab

5) bB → bb

6) bC → bc

7) cC → cc

Let's understand the procedure that is specified by this grammar. Production (1) can be used repeatedly to add a positive number of a's, as well as 'placeholders' for an equal number of b's (Bs) and c's (Cs). After applying production (1) a desired number of times, we can 'exit' with production (2), then move Cs to the right with production (3), and convert the placeholders of Bs and Cs to their lower case equivalents. Confirm your understanding that the language generated by Gabc is {a^n b^n c^n | n ≥ 1} (i.e., a positive number of a's followed by an equal number of b's and finally an equal number of c's).

Consider the grammar GComplex presented earlier. This is clearly not a CSG, since productions (4) and (8) are contracting. Does this mean that L(GComplex) is not a CSL? Not necessarily, since it's possible that there is a CSG that is equivalent to GComplex. Finding an alternative and equivalent CSG for GComplex is left as an exercise.

A small point is that a grammar that is non-contracting cannot define a language that includes the empty string, since there can be no production of the form α → ε, since |ε| = 0. So technically the context-sensitive languages do not include the empty string. But it’s a technicality that is easily overcome and we won’t let it necessarily prevent us from talking about languages that include ε but that are “otherwise” context-sensitive. Suppose we have a grammar G (with start symbol S) that is context-sensitive and thereby excludes ε from L(G). If we want to include ε then we can create a new grammar G’ with a new start symbol S’ that is identical to G, except with an additional variable S’ and two new productions, S’ → S and S’ → ε. S’ is never used otherwise. If there is something theoretical that relies on exclusion of ε then we can focus on L(G), but for purposes of allowing for ε we can use L(G’). In fact, we will relax this a bit more and allow a production of the form S → ε, so long as S is the start symbol. Note that if S appears on the right hand side of any production then this allowance will allow a rewrite to ε in intermediate sentential forms, but again that’s easily remedied if there is a theoretical point in doing so.

We will talk more about the computational implications of context sensitive languages later, but we will preview an important point now. Suppose we ask whether w is in a language L(G) – yes or no? We can construct an algorithm to answer this question for a CSL. We can enumerate the strings in the language using the CSG, and whenever we derive a string of terminals only, we check to see if it’s equal to w and answer ‘yes’ if so. In contrast, when a sentential form is created that is longer than w, we know that w cannot lie along that path, since the sentential forms along that path will never shrink. If all paths in the breadth-first enumeration have sentential forms that are longer than w then we know that w cannot be in the language and the algorithm can answer ‘no’.
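Here is a sketch of that decision procedure in Python, under the same conventions as the earlier enumerator (productions as (α, β) pairs, variables upper case). Because the grammar is non-contracting, pruning every sentential form longer than w leaves only finitely many forms to visit, so the procedure always halts.

 def csl_member(productions, start, w):
     """Decide whether w is in L(G) for a non-contracting grammar G."""
     frontier, seen = {start}, {start}
     while frontier:
         next_frontier = set()
         for form in frontier:
             if form == w:
                 return True                          # answer 'yes'
             for alpha, beta in productions:
                 i = form.find(alpha)
                 while i != -1:
                     new = form[:i] + beta + form[i + len(alpha):]
                     if len(new) <= len(w) and new not in seen:
                         seen.add(new)                # never revisit a form
                         next_frontier.add(new)
                     i = form.find(alpha, i + 1)
         frontier = next_frontier
     return False                                     # every path exceeded |w|: 'no'

 # Gabc from above:
 Gabc = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
 print(csl_member(Gabc, "S", "aabbcc"), csl_member(Gabc, "S", "aabbc"))   # True False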

Context Free (Type 2) Grammars and Languages

If α in each of G’s productions α → β is a single variable, then G is a context-free grammar (CFG) and L(G) is a context-free language (CFL). That is, regardless of the symbols in a sentential form that surround the variable on a production’s left-hand side, the variable can be rewritten to β. Context-free grammars and languages are subsets of context-sensitive (or non-contracting) grammars and languages. Every context free language is also a context sensitive language, but not vice versa. Similarly, every context sensitive language is an unrestricted language, but not vice versa. Thus, because inclusion is transitive, every CFG is an unrestricted grammar and every CFL is an unrestricted language.

Consider the grammar GSimple from a previous section on unrestricted languages. GSimple is a CFG and L(GSimple) is a CFL. A similarly structured CFG is GPalindromes, over Σ = {a, b}.

S → aSa | bSb | a | b | ε

Note that in this example and others, we will make an allowance for the empty string ε to be in the language. Technically, the empty string is not in a CFL because that would mean that there was at least one contracting production, which is disallowed in a CSL, and thus a CFL too. Nonetheless, the exception is widely used and we'll allow the start symbol of a CSG (and this CFG) to produce the empty string.

A derivation for 'bbaabaabb', S ⇒* bbaabaabb, using GPalindromes, is S ⇒ bSb ⇒ bbSbb ⇒ bbaSabb ⇒ bbaaSaabb ⇒ bbaabaabb.

The CFGs and CFLs are probably the most important class of grammars and languages to us, at least for practical purposes. After Chomsky’s treatment of grammars was published, Algol became the first programming language with a syntax that was defined by a context free grammar. These grammars have been used to define other programming languages, as well as markup languages, and the grammars are adapted to define parsers for these languages. You have probably seen what amounts to context free grammars in syntax diagrams for various languages.

Leftmost and Rightmost Derivations

In a derivation of a string using a CFG, every sentential form before the final string will have one or more variables in it. If at every step we rewrite the leftmost variable in the sentential form next, then the derivation is a leftmost derivation. Similarly, if we rewrite the rightmost variable at each step then the derivation is a rightmost derivation. We will typically be most concerned with leftmost derivations because of the use of CFGs in programming languages. Though we won’t interrogate the relationship deeply, suffice it to say that because a programming language syntax is largely based on CFGs and because a language translator parses a program left-to-right, leftmost derivations are more practically relevant.

In all derivations of strings in L(GPalindromes) there will always be exactly one variable (i.e., S) in every sentential form of the derivation, except the last, so that there is no difference between leftmost and rightmost derivations in that example. Let's consider a grammar for balanced parentheses instead, where the productions of GBalanced are as follows.

B → (B) | BB | ( ) | ε

A leftmost derivation for '(( ))( )' is B ⇒ BB ⇒ (B)B ⇒ (( ))B ⇒ (( ))( ).

In contrast, a rightmost derivation for the same string is B ⇒ BB ⇒ B( ) ⇒ (B)( ) ⇒ (( ))( ).

Note that these derivations appear different, but the difference is entirely due to the order in which the variables are rewritten, and not due to something more fundamental.

Parse Trees

This is the single parse tree for the derivation of 'bbaabaabb' using GPalindromes. The dashed red enclosure highlights the actual string in L(GPalindromes).

A parse tree is another representation of a derivation of a string using a CFG. The root of a parse tree is the start variable of the grammar, the internal nodes each correspond to a variable to be rewritten, the children of an internal node represent the various symbols, terminal and non-terminal, in the right-hand side of the production that is used to rewrite the variable representing the parent, and the leaves throughout the parse tree are the characters in a string of the language. The derived string can be written by doing a depth-first enumeration of the tree, writing leaf/terminal symbols only.
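The read-off of the derived string can be sketched with a tiny recursive function; the nested-tuple encoding of a parse tree used here is an illustration of my own, not a notation used elsewhere in the book.

 def yield_of(tree):
     """Depth-first, left-to-right read-off of the leaves of a parse tree.
     A tree is either a terminal string (a leaf) or a (variable, children) pair."""
     if isinstance(tree, str):
         return tree
     _, children = tree
     return "".join(yield_of(child) for child in children)

 # The GPalindromes parse tree for 'bbaabaabb' (S -> bSb -> ... -> b):
 tree = ("S", ["b", ("S", ["b", ("S", ["a", ("S", ["a", ("S", ["b"]), "a"]), "a"]), "b"]), "b"])
 print(yield_of(tree))   # bbaabaabb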

Three different ways of representing the derivation of '(( ))( )'. Note that the parse tree does not vary as a result of the order, leftmost or rightmost or mixed for that matter, in which variables are rewritten.

Parse trees are invariant relative to the order of variable rewrites, so that the leftmost and rightmost derivation for '(( ))( )' using GBalanced correspond to the same parse tree.

Ambiguity of a CFG

A grammar is ambiguous if there are two or more leftmost derivations for at least one string in the language generated by the grammar. Alternatively, we could say that if there are two or more rightmost derivations, or two or more distinct parse trees, for a string in the language then the generating grammar is ambiguous. These three definitions of ambiguity are equivalent. Consider a CFG, GExp = (V, Σ, P, S), for simplified arithmetic expressions, where V = {Exp}, Σ = {id, +, *, (, )}, S = Exp, and P equals the set containing these productions:

Exp → id

Exp → Exp + Exp

Exp → Exp * Exp

Exp → (Exp)

Two leftmost derivations for the same string that demonstrate the ambiguity of grammar GExp for simple arithmetic expressions.
Two parse trees for same string that are an alternate and equivalent demonstration for ambiguity of grammar GExp for simple arithmetic expressions.

This grammar is ambiguous since there are two distinct leftmost derivations for the string id + id * id. There are two distinct parse trees for this string as well. You can confirm that there are two distinct rightmost derivations as well. Can you find other strings in L(GExp) that have more than one leftmost (rightmost) derivation and more than one parse tree?

Ambiguity of context-free grammars is particularly important because the syntax of so many programming and markup languages is specified by context-free grammars. As an interpreter or compiler processes a program or document, it should know precisely what to do in response, and an ambiguous grammar can be problematic in guiding the interpreter on the right course of action (e.g., on what micro-instructions should be executed or written as the higher level program is being processed). It's possible to add decision rules on what to do that lie outside the grammar, but removing the ambiguity in the grammar itself is an elegant option. The grammar for arithmetic expressions is the classic educational example of revising an ambiguous grammar to remove ambiguity.

It would be nice if there was a general algorithm that accepted an ambiguous grammar and created an unambiguous grammar that generated the same language as the input grammar. Alas, no such algorithm exists. There is not even an algorithm that identifies an arbitrary input grammar as ambiguous or not! The picture is not that dismal however since in practice there are rules of thumb for disambiguating a grammar based on ‘domain’ knowledge. Programming language constructs are one kind of domain knowledge that can guide and constrain disambiguation, as in the case of the arithmetic expression example, where the domain knowledge is the constraints provided by precedence conventions, with identifiers and parenthesized expressions having the highest precedence, addition having the lowest, with multiplication in between. In the case of GExp a variation we'll call GOpPrec = (V, Σ, P, S) generates the same language (i.e., L(GExp) = L(GOpPrec)), but is not ambiguous. For GOpPrec, V = {Factor, Term, Exp}, Σ = {id, +, *, (, )}, S = Exp, and P equals the set containing these productions:

Factor → id | (Exp)

Term → Factor | Term * Factor

Exp → Term | Exp + Term
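
To make the connection to parsing concrete, below is a minimal recursive-descent recognizer sketch (in Python) for GOpPrec, in which the precedence constraints are built into the grammar itself: Exp handles '+', Term handles '*', and Factor handles identifiers and parenthesized expressions. The left-recursive productions are handled with iteration, and the whitespace tokenization and function names are illustrative assumptions, not part of the grammar.

def tokenize(s):
    return s.split()               # expects 'id', '+', '*', '(', ')' separated by spaces

def parse_exp(tokens, i):          # Exp -> Term | Exp + Term, handled iteratively
    node, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] == '+':
        right, i = parse_term(tokens, i + 1)
        node = ('+', node, right)  # '+' groups left to right
    return node, i

def parse_term(tokens, i):         # Term -> Factor | Term * Factor
    node, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] == '*':
        right, i = parse_factor(tokens, i + 1)
        node = ('*', node, right)  # '*' binds more tightly than '+'
    return node, i

def parse_factor(tokens, i):       # Factor -> id | (Exp)
    if tokens[i] == 'id':
        return 'id', i + 1
    if tokens[i] == '(':
        node, i = parse_exp(tokens, i + 1)
        assert tokens[i] == ')', "expected ')'"
        return node, i + 1
    raise SyntaxError('unexpected token: ' + tokens[i])

tree, _ = parse_exp(tokenize('id + id * id'), 0)
print(tree)                        # ('+', 'id', ('*', 'id', 'id')), the only tree for this string

Because GOpPrec is unambiguous, the parser never has to choose between competing parses of 'id + id * id'; the '*' subtree is necessarily nested below the '+'.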

The grammar GBalanced is ambiguous because there is more than one way to derive '(( ))( )' and other strings by alternatively using production B → ( ) and B → ε. Notice that in the parse tree, ε is shown as a leaf, but is not considered an explicit symbol in the final string.
An expression grammar based on conventional binary operator precedence nests the highest-precedence operator (i.e., *) more deeply, as on the left, and in a postorder traversal of the tree deeper operators will be evaluated before their ancestors. In contrast, operators of the same precedence, as on the right, are evaluated left to right, and so those on the left will be deeper in the parse tree.

We will return to this grammar when we talk about the relevance of CFLs to programming languages. Consider another example of an ambiguous grammar, GBalanced, from above. Can you see why this grammar is ambiguous? Consider the string that we have already addressed, '(( ))( )'. There are at least two leftmost derivations for this string. We previously only looked at one, which is repeated here: B ⇒ BB ⇒ (B)B ⇒ (( ))B ⇒ (( ))( ). This derivation uses the production B → ( ) to transition from the third to the fourth sentential form. But consider this leftmost derivation, which uses the production B → ε: B ⇒ BB ⇒ (B)B ⇒ ((B))B ⇒ (( ))B ⇒ (( ))( ). Are there any other leftmost derivations for the same string? Naturally, if there are distinct leftmost derivations, then there are distinct rightmost derivations and distinct parse trees too.

In the case of this last example of ambiguity, there is a domain independent strategy for removing the source of ambiguity, while still retaining ε as a member of the language. We create a new grammar, GBalanced', by introducing a new start variable, B', and two productions, B' → B | ε, and removing the production B → ε. Note that ε is still in the language and that L(GBalanced') = L(GBalanced), but there is now only one leftmost derivation of '(( ))( )' and other strings as well. This domain independent strategy can be applied in any similar circumstance to remove the same form of ambiguity.

Inherent ambiguity of a CFL[edit | edit source]

If all grammars that generate a language are ambiguous then the language is inherently ambiguous. There is no algorithm in the general case that can identify whether a language is inherently ambiguous or not, but clearly if an unambiguous grammar that generates the language is identified then the language is not inherently ambiguous.

Remember that ambiguity is a property of grammars, and inherent ambiguity is a property of languages.

Normal Forms[edit | edit source]

There are two normal forms for context free grammars that have been used for practical and theoretical purposes. Any CFL that excludes ε can be expressed by a grammar in both normal forms.

Chomsky normal form (or CNF, not to be confused with conjunctive normal form in logic) has productions of the form X → YZ or X → x, where X, Y, and Z are variables and x is a terminal. Greibach normal form (or GNF) has productions of the form X → xβ, where X is a variable, x is a terminal, and β is a (possibly empty) string of variables only.

There are algorithms for translating an arbitrary CFG into an equivalent CFG in CNF and into an equivalent CFG in GNF. The conversion to CNF is the easier of the two. For example, if we want to find a CNF grammar that is equivalent to Gsimplelogic, with productions S → ~S | [S ⊃ S] | p | q, then first (1) replace all terminals with terminal-specific variables when the right-hand side (rhs) is larger than 1 terminal:

S → NS | LSISR | p | q

N → ~

L → [

R → ]

I → ⊃

and then (2) accumulate variables in any right-hand side of length greater than 2 into new variables:

S → NS | p | q | XLSIS R

XLSIS → XLSI S

XLSI → XLS I

XLS → L S
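
A small sketch of step (2) follows. It introduces fresh variables to break any right-hand side longer than two symbols into a chain of binary productions, reproducing the chain above; the representation of productions as (left-hand side, list of right-hand-side symbols) pairs and the naming scheme for new variables are illustrative assumptions.

def binarize(productions):
    # productions: list of (lhs, rhs) pairs, where rhs is a list of symbols
    work = list(productions)
    result = []
    while work:
        lhs, rhs = work.pop()
        if len(rhs) <= 2:
            result.append((lhs, rhs))
        else:
            # Fold all but the last symbol into a fresh variable, e.g.
            # S -> L S I S R becomes S -> X_LSIS R plus X_LSIS -> L S I S.
            new_var = 'X_' + ''.join(rhs[:-1])
            result.append((lhs, [new_var, rhs[-1]]))
            work.append((new_var, rhs[:-1]))   # the new production may itself need binarizing
    return result

rules = [('S', ['N', 'S']), ('S', ['L', 'S', 'I', 'S', 'R']), ('S', ['p']), ('S', ['q'])]
for lhs, rhs in binarize(rules):
    print(lhs, '->', ' '.join(rhs))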

The Pumping Lemma for CFLs[edit | edit source]

If there is a string w that is generated by a CFG in Chomsky Normal Form with m variables, and if the string is long enough (|w| > 2^(m-1)), then there is a variable that must have been used twice in the longest path of the parse tree for w, and w can be written as w = uvxyz
If w = uvxyz is in the CFL L and |w| > 2^(m-1) with |vxy| ≤ 2^m and v and y are not both ε, then uv^i x y^i z for i ≥ 0 is in L too

There are identifiable patterns in CFLs that are useful to know, for purposes of theory at least. An important such pattern is given by the Pumping Lemma (PL) for CFLs. The PL says that for any CFL there is a length threshold, call it n. If a string w in the language is of length n or greater, then w can be written as uvxyz, and the substrings v and y can be repeated (in place) indefinitely and the resulting string will be in the language too. Substrings v and y can be omitted too, signified by v^0 and y^0, and the resulting string will still be in the language. The lemma can be expressed with quantifiers as follows. If L is a CFL then

∃n (∀w∈L, |w| ≥ n (∃uvxyz=w, |vxy|≤n, |vy|≥1 (∀i≥0 uv^i x y^i z ∈ L)))

“∃n (∀w∈L, |w| ≥ n” says that there is a threshold n that applies to all sufficiently long strings in the language; “(∃uvxyz=w, |vxy|≤n, |vy|≥1” says that w can be rewritten in terms of 5 substrings, where the central three substrings together have length less than or equal to the threshold (i.e., they are sufficiently close together), and v and y together consist of at least one character between them.

“(∀i≥0 uv^i x y^i z ∈ L)))” says that we can repeat v and y zero or more times and the resulting string will still be in the language.

The reasoning behind this construction stems from considering the parse tree for a sufficiently long string generated by a Chomsky Normal Form grammar, which can represent any CFL. If the CNF grammar has k variables, then let the threshold n be 2^k. As the accompanying figures illustrate for the case of a string of at least n in length, at least one variable must appear at least twice along a path of the parse tree. It is the identification of this necessarily repeating variable that is the basis of being able to repeat or ‘pump’ substrings with the expectation that the result will be in the language. If the pumping lemma seems somewhat convoluted, give it time and it may seem elegant, and maybe it does already.

Proofs using the Pumping Lemma[edit | edit source]

The Pumping Lemma can be used to show by contradiction that certain languages are not CFLs. Our job is to find a sufficiently long string w that is undeniably in the language of interest and then to show that there is no way of partitioning the string into uvxyz under the PL's constraints such that pumping v and y always results in a string that is also in the language. The outline of a proof that L is not a CFL follows the lemma as expressed with quantifiers above.

  1. “∃n (∀w∈L,|w| ≥ n” Pick an n. It must exist if L is a CFL. We don't have to tie n down to a particular value because we will next select a sufficiently long w that is defined in terms of n.
  2. “(∃uvxyz=w,|vxy|≤n,|vy|≥1” w was selected in step 1 so as to constrain the partitioning into uvxyz in a manner that satisfies the constraints |vxy|≤n, |vy|≥1.
  3. “(∀i≥0 uv^i x y^i z ∈ L)))” Find a value i (we only need to find one value of i, but again it can be a function of n) such that uv^i x y^i z ∉ L, which contradicts ∀i≥0 uv^i x y^i z ∈ L.

Example: Show that L = {a^i | i is a prime number} is not a CFL.

Step 1: Let n be the pumping lemma constant. Select w = a^m where m is the first prime greater than or equal to n.

Step 2: w = a^m = uvxyz, where |vxy|≤n and |vy|≥1, though the former constraint is not used in this example

Step 3: Without loss of generality, assume v has j a's and y has k a's (j+k ≥ 1), so v = a^j, y = a^k, uxz = a^(m-(j+k))

Now consider successive values of i

|w| =   |uvxyz| = j + k + m – (j + k) = m

  |uv^2 x y^2 z| = 2j + 2k + m – (j+k) = m + j + k

  |uv^3 x y^3 z| = 3j + 3k + m – (j+k) = m + 2j + 2k

  …

  |uv^(m+1) x y^(m+1) z| = (m+1)j + (m+1)k + m – (j+k)

          = m + mj + mk

          = m (1 + j + k)   which can’t be prime, so a^(m(1 + j + k)) is not in the language, which violates the PL condition that (∀i≥0 uv^i x y^i z ∈ L) if L is a CFL.

In step 3 it can be the case that we have to consider a number of different cases in order to show a contradiction across all i.
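
The arithmetic in the example can be sanity checked mechanically. The sketch below verifies, for an illustrative prime m and a range of (j, k) splits, that pumping with i = m+1 yields a string of length m(1 + j + k), which is never prime; the particular values of m, j, and k are illustrative assumptions.

def is_prime(x):
    if x < 2:
        return False
    return all(x % d for d in range(2, int(x ** 0.5) + 1))

m = 13                                  # an illustrative prime length for w = a^m
for j in range(0, 4):
    for k in range(0, 4):
        if j + k < 1:                   # the PL requires |vy| >= 1
            continue
        pumped = (m + 1) * (j + k) + (m - (j + k))   # |u v^(m+1) x y^(m+1) z|
        assert pumped == m * (1 + j + k)
        assert not is_prime(pumped)     # so the pumped string falls outside L
print('all checks passed')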

Example: Show that {a^i b^j c^k | i+1 < j, j+1 < k} is not a CFL

Step 1: Let n be the pumping lemma constant. Let w = a^n b^(n+2) c^(n+4). w is certainly long enough whatever the particular value of n.

Step 2: Let w = a^n b^(n+2) c^(n+4) = uvxyz ; |vy| ≥ 1; |vxy| ≤ n.

Step 3: Case of v, y all a’s and/or b’s: pump up; eventually there are too many a’s relative to b’s (violating i+1 < j), too many b’s relative to c’s (violating j+1 < k), or, if v or y straddles the a/b boundary, symbols appear out of order

Step 3: Case of v some number of a’s; y some number of c’s – can’t happen, since all n+2 b’s would then lie within vxy, violating the constraint that |vxy| ≤ n

Step 3: Case of v containing at least one b and y containing c’s (or v or y straddling the b/c boundary): consider uv^0 x y^0 z; removing a b leaves at most n+1 b’s, violating i+1 < j (and any straddling v or y, when pumped up, puts symbols out of order)

Step 3: Case of v, y all c’s: consider uv^0 x y^0 z; at least one c is removed, so the c’s no longer outnumber the b’s by at least two, violating j+1 < k

Regular (Type 3) Grammars and Languages[edit | edit source]

Figure ParseRegular: A parse tree for '1110' using Gregular.

Finally, regular languages are a subset of the context free languages, and regular grammars put a further restriction on context free grammars. A regular grammar is a context free grammar where every production is of the form A → aB or A → a, where A and B are variables and a is a terminal. At first glance a regular grammar looks like a grammar in GNF. But where GNF allows any number of variables to follow the first terminal in the righthand side of a production, a regular grammar allows at most one. We get to applications of regular grammars and languages later when discussing the broader applications of context free languages for programming and markup language syntax.

Consider this grammar Gregular = ({S, A, B}, {0, 1}, P, S), where P is the set of productions:

S → 0A

S → 1B

A → 0A

A → 0S

A → 1B

B → 1B

B → 1

B → 0

S → 0

You are asked to characterize the language generated by Gregular as an exercise, but clearly it is regular. Figure ParseRegular gives a parse tree for the string 1110, which is a right linear tree. Every parse tree for a string generated by a regular grammar will be right linear. For this reason regular grammars are sometimes called right linear grammars, and the languages they generate are sometimes called right linear languages in other sources. This book will use 'regular' consistently when referring to grammars and languages, however.

Every regular grammar is a context free grammar. Note that regular grammars, by definition, are in GNF. A regular grammar can also be easily changed to a grammar in CNF that generates the same language. In particular, replace each production of the form A → aB in a regular grammar with A → A'B, where A' → a is also added as a production.
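
The substitution just described is mechanical enough to sketch in a few lines of Python. The encoding of productions and the naming of the new terminal-specific variables (V_a for terminal a) are illustrative assumptions.

def regular_to_cnf(productions):
    # productions: list of (lhs, rhs) with rhs either [a] or [a, B]
    cnf, helpers = [], {}
    for lhs, rhs in productions:
        if len(rhs) == 2:                       # A -> a B becomes A -> V_a B
            a, B = rhs
            helper = helpers.setdefault(a, 'V_' + a)
            cnf.append((lhs, [helper, B]))
        else:                                   # A -> a is already in CNF
            cnf.append((lhs, rhs))
    cnf.extend((v, [a]) for a, v in helpers.items())   # add V_a -> a
    return cnf

for lhs, rhs in regular_to_cnf([('S', ['0', 'A']), ('A', ['1', 'B']), ('B', ['1'])]):
    print(lhs, '->', ' '.join(rhs))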

The Pumping Lemma for Regular Languages[edit | edit source]

The Pumping Lemma for regular languages is a special case of the Pumping Lemma for context free languages.

Since every regular language is also a context free language, the pumping lemma for CFLs also applies to RLs, but in this latter case of a regular language only the single substring v needs to be pumped (in effect, y and z can be taken to be empty), and the pumped region can be taken to lie within the first n symbols. Thus, the Pumping Lemma for regular languages can be simplified to ∃n (∀w∈L, |w| ≥ n (∃uvx=w, |uv|≤n, |v|≥1 (∀i≥0 uv^i x ∈ L))).

Let's revisit L = {0^m | m is prime}. We already know that this is not a CFL, so it's obviously not an RL either; this example is just to illustrate the RL form of the Pumping Lemma. Assume L is regular, so L is accepted by a DFA with n states (aka pumping lemma constant n). Consider a string of m 0s where m is the first prime number greater than n.

0^m = 0^j 0^k 0^(m-(j+k)), where u = 0^j, v = 0^k, and x = 0^(m-(j+k)).

|0^m|  =  j+k+(m-(j+k))   =  m

Pump v once  j+2k+(m-(j+k))  =  k+m

Pump v twice  j+3k+(m-(j+k))  =  2k+m

Pump v m times  j+(m+1)k+(m-(j+k))  =  mk+m  = m(k+1) is not prime

0^(m(k+1)) not in L by definition – a contradiction

Exercises, Projects, and Discussions[edit | edit source]

Type 0 Discussion 1: We noted in the first paragraph of this section that the definition of a grammar did not seem to overtly preclude the possibility of a production with a left-hand side of only terminal symbols (i.e., with no variables). List the implications of allowing productions with left-hand sides of only terminal symbols. As you reflect on the matter, consider discussing the issue with a large language model -- you are welcome to do so.

Type 0 Project 1: Research and discuss the relationship between Type 0 grammars and Markov Algorithms.

CSL Exercise 1: Give a derivation for 'aabbcc' using Gabc.

CSL Exercise 2: Give an attempted derivation for 'aabc' using Gabc. Clearly indicate deadend paths in a breadth-first enumeration of paths.

CSL Exercise 3: Find a context sensitive (non-contracting) grammar that is equivalent to GComplex -- that is, your answer should generate the same language as GComplex, but have no contracting productions. Hint: such a grammar can be found. You are welcome to use a large language model to gain insights into the process of converting GComplex to an equivalent CSG, GComplex'.

CSG Exercise 4: Give a context-sensitive grammar, Gww, for the language L = {ww | w is a non-empty binary string of 0s and 1s, and ww is such a string concatenated with itself}.

PL Exercise 1: Show {a^(j^2) b^j | j ≥ 1} is not a CFL. Note that the number of a's is j^2, not j*2, and that the number of b's is j. That is, the number of a's is the square of the number of b's.

PL Exercise 2: Show that {b_i # b_(i+1) | b_i is i in binary} is not a CFL

PL Exercise 3: Show that {a^i b^i c^i | i ≥ 1} is not a CFL.

PL Exercise 4: Can v^0 and y^0 be used in an alternative proof that L = {a^i | i is a prime number} is not a CFL? If so, how?

CFG Exercise 1: Give a context-free grammar for the language of strings over {a,b} with exactly twice as many a’s as b’s. Due to Hopcroft and Ullman (1979)[14] and Hopcroft, Motwani, and Ullman (2007)[15]. Hint: you may have to consider different special cases, allowing for the a's to be positioned differently relative to the b's.

CFG Exercise 2: Give a context-free grammar for the language {a^i b^j c^k | i ≠ j or j ≠ k}. Due to Hopcroft and Ullman (1979) and Hopcroft, Motwani, and Ullman (2007).

CFG Exercise 3: Give a Chomsky Normal Form grammar (a) for L(GExp), (b) for balanced parens, (c) for palindromes over {a, b}.

CFG Exercise 4: Give a Greibach Normal Form grammar (a) for L(GExp), (b) for balanced parens, (c) for palindromes over {a, b}.

Type 3 Exercise 1: Give an English characterization of the language generated by Gregular. If you use a large language model, then give the prompt that you used, and the response that you received, along with any edits or corrections that you made.

Type 3 Exercise 2: Give a regular grammar that generates allowable identifiers in a fictional programming language, where the identifiers must begin with a letter, followed by any finite number of letters and digits. Special characters _ (underscore) and - (hyphen) can be used but no two of these special symbols can appear consecutively, nor can an identifier end in a special symbol.

Type 3 Exercise 3: (Due to ?) Give a regular grammar generating L = {w | w is a string of one or more 0s and 1s and w does not contain two consecutive 1s}

PL Exercise 5: Show that the set of strings of balanced parentheses is not regular using the Pumping Lemma.

PL Exercise 6: Show that L = {0^(i*i) | i is an integer and i ≥ 1} is not regular. I used i*i instead of i^2 for reasons of formatting.

Automata and the Chomsky Hierarchy[edit | edit source]

Automata (or machines) specify how the strings in a language can be recognized. Automata are another form of finite representation of formal, typically infinite, languages. We describe four broad categories of automata and the corresponding categories of languages that the automata categories represent. The close correspondence between generators and recognizers should not surprise us, since we have already seen in the form of programming language parsers how a generator (grammar) can be adapted to a recognizer, but interestingly, and I think reassuringly, the language classes defined by the major types of automata correspond exactly to the language classes of the Chomsky Hierarchy as defined by grammar classes!

Finite automata (FAs) (aka finite state machines or FSMs) are recognizers of the regular languages within the Chomsky hierarchy.

In contrast to our treatment of grammars, which proceeded from broadest class (Type 0 or unrestricted) to the most constrained class (Type 3 or regular), when presenting classes of automata for language classes we work in the opposite direction, from simplest and most constrained automata (Type 3, regular), known as finite automata (FAs), to the computationally most powerful theoretical automata, which are Turing machines for Type 0, unrestricted languages. The most important reason for this change in direction is that FAs can be regarded as standalone computational devices, but they are also integral to all the automata types that we will be discussing.

Finite Automata: Recognizers of Regular Languages[edit | edit source]

As we will see soon, finite state automata or simply finite automata are recognizers of regular languages. A finite automaton (FA), M, over an alphabet Σ is a system (Q, Σ, δ, q0, A), where Q is a finite, nonempty set of states; Σ is a finite input alphabet; δ is a mapping of Q × Σ into Q (that is, δ is a set of transitions between states on particular Σ symbols); q0 is in Q and is the start state of M; and A ⊆ Q is the set of accepting states. FAs are nicely summarized visually as state transition diagrams (see Figure FAexample1), where states (Q) are shown as circles, transitions (δ) from one state to another state are labeled by an alphabet (Σ) symbol, an unlabelled arc that emanates from outside the diagram indicates the start state (q0), and concentric circles identify the accepting states. In other sources you may see "final" states used instead of or in addition to "accepting" states. This book uses "accepting", however, rather than "final", since a so-called "final" state need not be final at all -- transitions from "final" states to other states are possible and common.

Figure FAexample1: This finite automaton recognizes (or accepts) strings of 0s and 1s that start with '00'. Any state with concentric circles, only q2 in this example, is an accepting state. The start state, q0 in this case, is indicated by an arc that originates external to the automaton. A trap state, q3 in this case, is a state for which there is no exit.
Figure FAexample2: This automaton is a 'shorthand' for the automaton in FAexample1. Since there are no transitions explicitly shown on '1' out of states q0 and q1, we take this to imply that there are transitions to (unnamed) trap states from q0 and q1 on '1'.

To determine whether a string w of symbols from Σ is in the language recognized by an FA, M, the string is processed from left to right, one symbol at a time. Before processing of the input string begins, the FA is in its start state. As each symbol of w is read, the transition indicated by the current state and that symbol is taken to the next state. In addition to a state transition diagram, the transition function is often represented as a list of transitions or as a table of transitions. Both are shown in Figure FAexample1, which represents/accepts the strings over 0s and 1s that start with '00'.

The language accepted by an FA is the set of strings that place the FA in an accepting state when the string has been completely processed (i.e., read). In the case of the example FA, there will be exactly one path through the FA from the start state that corresponds to a given string. The string either ends up in an accepting state (i.e., it's in the language defined by the FA) or it ends up in a non-accepting state (i.e., it's not in the language defined by the FA).

When presenting an FA transition diagram (or list or table) it is a common convention, for purposes of comprehensibility, to omit explicit inclusion of trap states, and any transition to a trap state is omitted as well. If in processing a string there is no transition shown for a (state, input) pair, we might call the transition undefined, but literally it signifies a transition to a non-accepting trap state, so we know the string being processed is not in the language. A "shorthand" version of the FA in Figure FAexample1 is shown in Figure FAexample2. It should be clear that from the standpoint of acceptance and non-acceptance, an FA never needs more than one trap state, if any at all.
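
Processing by a DFA is easy to sketch in code. The Python fragment below encodes the language of Figure FAexample1 (strings over {0,1} that start with '00'), following the 'shorthand' convention: a missing (state, input) entry stands for a transition to an implicit, non-accepting trap state. The dictionary encoding and state names are illustrative assumptions.

delta = {('q0', '0'): 'q1', ('q1', '0'): 'q2',
         ('q2', '0'): 'q2', ('q2', '1'): 'q2'}
start, accepting = 'q0', {'q2'}

def accepts(w):
    state = start
    for symbol in w:
        state = delta.get((state, symbol))   # None models the implicit trap state
        if state is None:
            return False
    return state in accepting

for w in ['00', '0010', '0', '10', '']:
    print(repr(w), accepts(w))               # only '00' and '0010' are accepted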

Deterministic and NonDeterministic FAs[edit | edit source]

If there is exactly one transition defined for every (state, input) pair in an FA, then the FA is deterministic. The FA in FAexample1 is a deterministic FA (DFA), as is the FA in Figure FAexample2, remembering it is but shorthand, with implied (but real) transitions to a trap state. The FA of Figure FAExample3 is also a deterministic automaton.

Figure NFAExample1: In a nondeterministic finite automaton (NFA) a (state, input) pair (qi, σj) can map to one or more states qk in δ(qi, σj). In this example, all (state, input) pairs map to exactly one state, except (p, 0), for which there are two transitions defined. This figure shows the shorthand version of the NFA too, where transition to a trap state is left implicit.
Figure NFASearch1: The points of nondeterminism in the NFA in this case are at (p,0). A breadth first search for a path on input 00101 leads to one accepting state, so the NFA accepts.

If zero, one, or more transitions can be taken from each (state, input) pair, then the FA is a nondeterministic finite automaton (NFA). Notice that by this definition every DFA is an NFA, but not vice versa. Figure NFAExample1 illustrates an NFA which is not also a DFA. In this example there are two transitions defined for (p, 0).

In the case of an NFA, there may be more than one path from the start state to accepting states after processing a string. You can think of each path being enumerated using a breadth first enumeration strategy. An NFA accepts a string if at least one path ends with an accepting state after the string is processed. This is illustrated in Figure NFASearch1 for the input string 00101.
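
The breadth first idea can be sketched directly: track the frontier (set) of NFA states reachable after each symbol, and accept if the final frontier contains an accepting state. The transition table below is an illustrative reconstruction in the spirit of Figure NFAExample1 (strings over {0,1} containing two 0s separated by exactly one other symbol); it is not necessarily the figure's exact automaton.

delta = {('p', '0'): {'p', 'q'}, ('p', '1'): {'p'},   # in p, guess whether this 0 starts the pattern
         ('q', '0'): {'r'},      ('q', '1'): {'r'},   # one arbitrary symbol in the middle
         ('r', '0'): {'s'},                           # the second 0 of the pattern
         ('s', '0'): {'s'},      ('s', '1'): {'s'}}   # pattern found; consume the rest
start, accepting = 'p', {'s'}

def nfa_accepts(w):
    frontier = {start}                                # all states reachable so far
    for symbol in w:
        frontier = {t for state in frontier
                      for t in delta.get((state, symbol), set())}
    return bool(frontier & accepting)

for w in ['00101', '010', '0110', '11']:
    print(repr(w), nfa_accepts(w))                    # '00101' and '010' are accepted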

Equivalence of NFAs and DFAs[edit | edit source]

Figure NFAFrontiers: Frontiers of possible NFA recognition trajectories correspond to states in an equivalent DFA.

The classes of languages that are recognized by NFAs and DFAs are the same. That is, for any NFA there is a corresponding DFA that recognizes the same language as the NFA, and as we have already noted, every DFA is by definition an NFA. In the former case, we can translate an arbitrary NFA to a DFA, where the DFA is constructed by considering combinations of states in the NFA to be single states in the DFA. Intuitively, each state in the DFA will correspond to a possible frontier in the breadth first enumeration of strings/paths recognized/expanded by the NFA. Figure NFAFrontiers shows some of the frontiers, in ellipses, in the search for a particular string, but that process is generalizable to enumeration of all strings, not just 00101. In the worst case, if there are |K| states in an NFA, there will be O(2^|K|) states in a corresponding DFA. While the NFA description of a language is often simpler and more elegant than the DFA equivalent, the NFA will require enumeration in practice when actually processing strings, and thus the NFA has higher runtime costs than a corresponding DFA.

Liberal Expansion then Pruning[edit | edit source]

One approach to the translation from NFA to DFA is liberal expansion then pruning. We won't ultimately use this approach, but it's a good example of algorithm development, and I'll try to demonstrate the process of developing an algorithm. (1) Given an NFA, let's take the power set of the NFA states (i.e., the set of all possible combinations of NFA states), where each member of the power set will correspond to a state of the new, equivalent DFA. (2) The start state of the DFA will correspond to the start state of the NFA. Consider the top NFA in Figure NFAExample1. The start state of the DFA will be {p}, where p is the start state of the NFA and {p} is a member of the power set of NFA states.

(3) To determine whether there is a transition between any two DFA states, let's look at the NFA states that are used in the DFA state composition, and create a transition between the DFA states, call them X and Y, for which there is a corresponding transition between an NFA state in X and an NFA state in Y. Two DFA states would correspond to X={q, r} and Y={r, t}, since each is a combination of NFA states. To determine if there is a transition on a '1' from X to Y, we note that in the NFA there is a transition from q (one member of X) to r (one member of Y). This observation isn't sufficient for adding a transition from X (i.e., {q,r}) to Y (i.e., {r,t}) on '1' in the DFA, however, since that same reasoning would also result in a transition from {q, r} to {r} in the DFA, and from {q, r} to {r, t, s} too, and to every Y of which r was a member. This is just a bad way of creating another NFA, rather than translating an NFA to a DFA. Rather, let's assume that a transition is made between X and Y on a symbol x only if every member of X participates in an NFA transition to a member of Y on x, and if every member of Y is the result of a transition on x from a member of X in the NFA. So, returning to our example, we would create a transition from {q, r} to {r, t} on '1' in the DFA because, in the NFA, q transitions to r on '1' and r transitions to t on '1', but {q, r} would not transition to any other state on '1' in the DFA.

(4) After adding transitions using this strategy, we prune any states and subgraphs that are not reachable from the start state. For example, {r, t} is reachable from {q,r}, but is {q,r} reachable from {p}? No, since p is part of every reachable DFA state because of the initial loop on 0,1. This is seen in the top-most path in Figure NFAFrontiers for '00101', but confirm your understanding that p is part of any frontier of a breadth first enumeration of all reachable states. Having found that {q,r} is not reachable from {p}, we prune {q,r}, then eliminate any states for which there are no incoming links, and repeat the process until all unreachable states are eliminated.

Liberal expansion and pruning, as we have sketched it, works for NFA to DFA translation, but it initially creates many more DFA states than necessary in most cases (i.e., O(2^|K|) of them). Nonetheless, variations on liberal expansion and pruning are useful in many circumstances, which we address further when discussing AI.

Another strategy for NFA to DFA translation is to grow the number of DFA states more judiciously than naive creation and pruning of the power set of NFA states.

Systematic and Judicious Growth[edit | edit source]

Figure NFAtoDFAExample1a: Steps described in main text.

The preferred process for NFA to DFA translation essentially does what we started with -- a breadth first expansion of possible paths taken through the NFA. We can use a table representation of this expansion, where each cell in the table is a DFA state represented as a possible NFA frontier. Again, the worst-case table size will be exponential in the number of NFA states, but in most cases it is far smaller. Using our running example (but with the 'shorthand' FA), we see in Figure NFAtoDFAExample1a that {p} is the DFA start state. p and q are on the frontier of the breadth-first enumeration after processing a 0, so {p,q} is added as a DFA state. Only p is on the frontier after processing the first 1, and so {p} transitions to itself in the DFA.

Continuing, {p, q} is a DFA state that must be investigated. From {p, q} on a 0 we have to identify states that can be next from either p or q (inclusive or). From p on a 0 we can go to p or q in the NFA. From q on a 0 we can go to r. So {p,q,r} becomes a new state in the growing DFA. Note that {p,q,r} corresponds to a frontier in Figure NFAfrontiers.

Still on Figure NFAtoDFAExample1a, from {p,q} on a 1 we can stay in p, or from q we can go to r in the NFA, so {p,r} is a new DFA state.

Figure NFAtoDFAExample1b: Steps described in main text.
Figure NFAtoDFAExample1c: Steps described in main text.

We continue to process those DFA states that were created in the previous step. In Figure NFAtoDFAExample1b we take {p, r} next and then {p,q,r}, though either order would yield the same results. As we know, there is an implicit trap state that can be reached from r too, but we can keep the trap state implicit for simplicity's sake (and the resulting DFA may be a 'shorthand' too). Continuing further we arrive at the table of Figure NFAtoDFAExample1c, with all DFA states generated from all NFA frontiers that are reachable from {p}.

Figure NFAtoDFA1d: Final DFA equivalent to the NFA of Figures NFAtoDFA1a - NFAtoDFA1c. Note that the transition from pr to pqs on 0 is mistakenly missing, and will be corrected.

The final DFA, expressed as a transition diagram, is shown in Figure NFAtoDFAExample1d. Note that the accepting states in the final DFA correspond to all NFA frontiers that include an accepting state of the NFA, since each of these frontiers includes at least one path that accepts a string. At this point we could rename the DFA states to simple atoms if we wished, but we can leave the names as is, which is certainly helpful for illustration here; just remember that the names, while appearing to be 'composite', are actually atomic in the final DFA.
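
The same idea, generalized from one input string to all (state, symbol) possibilities, gives a sketch of this subset construction: starting from {start}, generate only those frontier sets that are actually reachable. The NFA encoding repeats the illustrative reconstruction used in the earlier frontier-tracking sketch, so the particular states are assumptions rather than the figure's exact automaton.

delta = {('p', '0'): {'p', 'q'}, ('p', '1'): {'p'},
         ('q', '0'): {'r'},      ('q', '1'): {'r'},
         ('r', '0'): {'s'},
         ('s', '0'): {'s'},      ('s', '1'): {'s'}}
start, accepting = 'p', {'s'}

def nfa_to_dfa(delta, start, accepting, alphabet=('0', '1')):
    start_set = frozenset({start})
    dfa_delta, worklist, seen = {}, [start_set], {start_set}
    while worklist:
        current = worklist.pop()
        for symbol in alphabet:
            nxt = frozenset(t for s in current
                              for t in delta.get((s, symbol), set()))
            if nxt:                                   # the empty set is the implicit trap state
                dfa_delta[(current, symbol)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    worklist.append(nxt)
    dfa_accepting = {S for S in seen if S & accepting}
    return dfa_delta, start_set, dfa_accepting

dfa_delta, dfa_start, dfa_accepting = nfa_to_dfa(delta, start, accepting)
for (S, symbol), T in sorted(dfa_delta.items(), key=str):
    print(set(S), symbol, '->', set(T))               # each reachable frontier is one DFA state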

Connection to Machine Learning[edit | edit source]

If we compare the DFA of NFAtoDFAExample1d with the NFA that we began with, we are reminded of an observation made earlier: that the "NFA description of a language is often simpler and more elegant than the DFA equivalent" but the "NFA will require enumeration in practice when actually processing strings, and thus the NFA has higher runtime costs than a corresponding DFA." So a development strategy that suggests itself is that human software developers start by specifying a nondeterministic software solution to a problem, and an AI translates this nondeterministic solution to a deterministic solution. In real world problems, however, it is probably not possible to translate to a completely deterministic solution because of uncertainties in an environment, but in real life settings it is often possible to eliminate or otherwise mitigate the costs of nondeterministic choices with the help of probabilities of transitions that are seen in practice. We will delve further into this in later chapters.

Retrospection[edit | edit source]

Systematic and judicious growth is another general strategy for problem solving. We generate only those states that we need to, again by using an enumeration strategy. "When we need to" is open to interpretation, however, and yet another strategy, sometimes called a lazy enumeration strategy, will only enumerate the space necessary to identify paths for a given string, one string at a time (e.g., much like Figure NFAFrontiers). Such lazy strategies are used in AI systems and algorithms more generally, and can be particularly important when strings (and derivations and recognition paths) vary in the probability with which they occur in some environment. Again, this will be covered more when we discuss AI and machine learning, as well as probabilistic grammars and probabilistic automata, in more depth.

Finally, while we used Figure NFAExample1 to illustrate the equivalence of NFAs and DFAs, we didn't actually say what language the NFA and the DFA of Figure NFAtoDFA1d represent. You could reflect on this before reading on, but the answer could help comprehension of the example. Spoiler: the language recognized is the set of strings of 0s and 1s in which there are two 0s separated by one other symbol. In state p of the NFA there are two possible actions when a 0 is encountered. One action is to guess that the 0 is the first of the two paired 0s by transitioning from p to q, and the other action is to guess that the 0 is not the first of two suitably paired 0s and stay in state p.

Equivalence of Regular Grammars and FAs[edit | edit source]

Figure FA-RegularExample1: Equivalent regular grammar and NFA.

We defined the regular languages as those that could be generated by regular grammars. But the regular languages are exactly those that are recognized by FAs as well. We show this by showing how any regular grammar can be converted to an NFA (which can then be translated into a DFA), and by showing how a DFA (which is also an NFA) can be translated to a regular grammar. Both processes are straightforward.

To translate a regular grammar to an NFA, ensure first that ε only appears, if at all, on the righthand side of the start symbol, S, and that if ε does so appear (because ε is in the language generated by the grammar), then S does not appear on the right hand side of any production. Let G = (VG, ΣG, PG, SG) be a regular (type 3) grammar. Then there exists a finite automaton M = (QM, ΣM, δM, SM, AM) with L(M) = L(G) (i.e., the language recognized by M and the language generated by G are the same), and thus ΣG = ΣM.

(1) Define the states of the NFA, QM, to correspond to the variables of the grammar, VG, plus an additional state H, so the states of the NFA are QM = VG ∪ {H}.

(2) The initial state of the NFA corresponds to the start symbol, SM = SG, of the regular grammar.

(3) If there is a production SG → ε, then the accepting states of the NFA are AM = {SM, H}, otherwise AM = {H}.

(4) δM(X, d) contains the accepting state H if X → d is in PG. Recognize that there is always a single variable in each sentential form of a regular grammar derivation, except the last one, which results from removing the remaining variable.

(5) δM(X, d) contains Y for each Y that appears in a production X → dY.

In short, the NFA will follow very closely from the grammar, with states corresponding to non-terminal symbols and transitions in the NFA corresponding to productions in the grammar. Figure FA-RegularExample1 shows a regular grammar and an NFA representing the same language. Nondeterminism occurs at state A on an 'a' and at state B on a 'b'.
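
Steps (1) through (5) are mechanical, as the sketch below illustrates for Gregular from earlier in this chapter. The tuple encoding of productions, with None standing in for a production of the form X → d, is an illustrative assumption.

productions = [('S', '0', 'A'), ('S', '1', 'B'), ('A', '0', 'A'),
               ('A', '0', 'S'), ('A', '1', 'B'), ('B', '1', 'B'),
               ('B', '1', None), ('B', '0', None), ('S', '0', None)]
# (X, d, Y) encodes X -> dY; (X, d, None) encodes X -> d

def grammar_to_nfa(productions, start='S'):
    delta = {}
    for X, d, Y in productions:
        target = Y if Y is not None else 'H'   # X -> d transitions to the new accepting state H
        delta.setdefault((X, d), set()).add(target)
    accepting = {'H'}                          # add the start state too if S -> epsilon were present
    return delta, start, accepting

delta, q0, accepting = grammar_to_nfa(productions)
print(delta[('S', '0')])                       # {'A', 'H'}: nondeterminism from S -> 0A and S -> 0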

Now we wish to show that an FA can be translated to a grammar that represents the same language as the FA. In this direction we assume the FA is a DFA, M, knowing that we can always translate an NFA to a DFA if needed. Again, the correspondence between FA and grammar is very close. Define a regular grammar G = (VG, ΣG, PG, SG), where PG includes X → dY if δM(X, d) = Y, and X → d if δM(X, d) = H and H is in AM. Again, see Figure FA-RegularExample1 to confirm your understanding, remembering that there is something you need to do before applying the translation procedure from FA to regular grammar.

NFAs with epsilon Transitions[edit | edit source]

Figure NFAExample2: An NFA with ε transitions. See the main text for an explanation of ε transitions. The top transition diagram is 'shorthand' since it does not explicitly show the trap state, but otherwise both NFAs accept the strings of zero or more 0s, followed by zero or more 1s, followed by zero or more 2s. In the lower version, when in state q0 a transition on a 1 goes to the trap state, but note that strings that start with 1 (because there are no 0s) can still be accepted by taking the ε transition to q1 before reading a 1.

An important source of nondeterminism in FAs is when multiple transitions are defined for a (state, input) pair; we have already discussed that source of nondeterminism. A second source of nondeterminism is epsilon (ε) transitions. An ε transition from one state to another can be taken without reading an input symbol. Consider the NFA in Figure NFAExample2. In the top, shorthand NFA, from q0 we can take an ε transition to q1 before reading an input symbol. This choice of taking ε transitions can be applied repeatedly. If we are just starting to read a string from left to right using NFAExample2, we can either (1) read the first symbol in q0 and stay in q0 if it's a '0', or go to a trap state otherwise; or (2) take an ε transition to q1 and read the first symbol in q1 and stay in q1 if it's a '1', or go to a trap state otherwise; or (3) take an ε transition to q1 and then another ε transition to q2 and read the first symbol in q2 and stay in q2 if it's a '2', or go to a trap state otherwise. In general, ε transitions can be a source of considerable nondeterminism. Any of these 3 options will be the first steps of different recognition trajectories. In each of these three options, we chose to take the ε transitions BEFORE reading an input symbol, but it's also allowed to take ε transitions AFTER reading a symbol. So, for example, we could expand option (2): from q0 we could take an ε transition to q1, read a '1', and take an ε transition to q2, all before reading the second input symbol.

Figure NFAProcess2a: ε transitions between states can occur without consuming input, but when an input symbol is processed it must be consumed. Note that when ε transitions are included, the frontier is regarded as all those states that occur after the input symbol (i.e., '0' in this case) is read, even if ε transitions are subsequently taken along a path. So the frontier in this case includes all states -- q0, q1, q2, and the implicit trap state qt -- rather than simply those states that are end points along a path.
Figure NFAProcess2b: Trajectories after processing the first '1'. The frontier in this case is all states that occur after reading the '1' along some path, so the frontier is composed of states q1, q2, and the trap state qt.

Epsilon-closure[edit | edit source]

An important construct is the epsilon-closure or ε-closure of a state, say p. ε-closure(p) is the set of all states that are reachable from p using only ε transitions. ε-closure(p) includes p itself. In Figure NFAExample2, ε-closure(q0) = {q0, q1, q2}, ε-closure(q1) = {q1, q2}, and ε-closure(q2) = {q2}.
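
Computing an ε-closure is a small graph reachability problem, as sketched below for one plausible encoding of the ε-NFA of Figure NFAExample2 (zero or more 0s, then 1s, then 2s); the label 'eps' marking ε transitions is an illustrative convention.

delta = {('q0', '0'): {'q0'}, ('q0', 'eps'): {'q1'},
         ('q1', '1'): {'q1'}, ('q1', 'eps'): {'q2'},
         ('q2', '2'): {'q2'}}

def eps_closure(state):
    closure, stack = {state}, [state]
    while stack:
        s = stack.pop()
        for t in delta.get((s, 'eps'), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

print(eps_closure('q0'))   # {'q0', 'q1', 'q2'}
print(eps_closure('q1'))   # {'q1', 'q2'}
print(eps_closure('q2'))   # {'q2'}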

Updating frontiers[edit | edit source]

Figure NFAProcess2c: This repeats Figure NFAProcess2b, still after processing the first '1', but not showing all paths for the sake of comprehensibility. While the second '1' has not been read yet, it should be clear that when it is, only paths extending state q1 will avoid the trap state in this example.

In general, when processing a string from left to right, the process traces out multiple trajectories, and if at least one trajectory or path ends in an accepting state then the string is accepted. As each symbol in the string is processed, the trajectories so far are expanded, taking as many ε transitions as are allowed to extend paths, as well as the transition on the current input symbol. Figures NFAProcess2a - NFAProcess2c illustrate the process for the first two symbols of '0112'. The captions explain the conventions used, notably on what constitutes a frontier at each step. Processing the last two symbols of '0112' is left as an exercise.

Translation to DFAs[edit | edit source]

Figure NFAtoDFAExample2: When converting an ε-NFA to a DFA, the start state of the DFA is ε-closure(q0), where q0 is the start state of the ε-NFA.

The process of translating NFAs with ε transitions (ε-NFAs) to DFAs is much like the previously presented procedure of translating NFAs without ε transitions to DFAs, with one important additional step. In the previous translation procedure, the start state of the NFA is used as the start state of the DFA being constructed. But when epsilon transitions are present, the start state of the DFA will correspond to the epsilon closure of the start state of the NFA. In the example of Figure NFAExample2, ε-closure(q0) = {q0, q1, q2}. So (q0, q1, q2) is the start state of the DFA, and because q2 is an accepting state in the NFA, (q0, q1, q2) will be an accepting state in the DFA too. Figure NFAtoDFAExample2 shows the DFA that is equivalent to the NFA.

As a thought experiment consider what would have resulted without taking the first step. q0 would have been the DFA start state, but it would not have been an accepting state, so the DFA would not have accepted the empty string, though NFAExample2 does accept the empty string. Thus, the two representations would not have been equivalent in terms of the language that they represented. In general, always do a sanity check when doing translations between representations, testing on selected example strings, but of course placing your highest reliance on understanding and using equivalence-preserving operations for translation.

Equivalent Representations of Regular Languages[edit | edit source]

Figure NFAExample3: An NFA for strings over {0, 1}, such that two 0s are separated by a string with length 4i, for some i ≥ 1. There is an ε transition, as well as two transitions defined for (q0, 0).

To recap, there are two possible sources of nondeterminism or choice in an NFA: ε transitions and multiple possible transitions on the same (state, input) pair. In general, both kinds of nondeterminism can appear in the same NFA, as in Figure NFAExample3, and both can be removed by the procedure we have covered for translation to a DFA. Given the discussion so far, there are the class of languages defined by NFAs with no ε transitions and the class of languages defined by NFAs in which ε transitions are allowed. By definition, the latter (ε transitions allowed) are a superset of the former (no ε transitions). But are they a proper superset? Does the addition of ε transitions add any computational power at all beyond the elegance of expression? No, because all NFAs, ε transitions or not, are translatable to DFAs.

Figure RegularRepresentations1: These are ways of finitely representing the regular languages that we have described so far. All four formalisms in this figure are equivalent. See main text for more details.

We have described a number of ways of representing regular languages, which are summarized in Figure RegularRepresentations1. Equivalence between two representations is demonstrated if there is a path from one representation to another, and vice versa. Thus, the class of languages of regular grammars is equivalent to the class of languages of DFAs since there is a path from one to the other in each direction. The DFAs are trivially NFAs, and the NFAs with no ε transitions are trivially NFAs with ε transitions allowed. FYI, we could eliminate the central two-way arc between NFAs (no ε transitions) and DFAs (i.e., we could have chosen not to demonstrate those translations), and the ablated graph that resulted would still be sufficient to indicate the equivalence of all four representations.

Regular Expressions[edit | edit source]

Regular expressions (REs) are yet another way of representing regular languages. An RE is a declarative representation, which expresses a pattern that strings in a language adhere to, whereas an FA specifies a process for recognizing strings in a language and a grammar specifies a process to generate strings in a language.

An RE is an expression over an alphabet Σ, using binary operators for concatenation (×) and for choice (+), and the unary operator of repetition, often known as Kleene Closure or simply closure.

All symbols in Σ are REs, which are atomic. If Σ = {a, b, c} then 'a' is an RE representing the finite language {a}, 'b' represents {b}, and 'c' represents {c}.

If r is an RE and s is an RE then r + s is an RE representing the language represented by r ORed (union) with the language represented by s. Strings in the language specified by r + s would be strings that matched r or strings that matched s. If r = a and s = b, then L(r+s) = {a, b}.

If r is an RE and s is an RE then r × s (or simply rs) is an RE representing the language represented by r concatenated with the language represented by s. Concatenating two languages results in another language with strings that are a concatenation of a string in the first language with a string in the second language. If r = a + b and s = c + b then L((a+b) × (c + b)), alternatively L((a+b)(c+b)), equals {ac, ab, bc, bb}.

As another example, suppose r = (a+b)(b+c) and s = c × b; then L(rs) = {abcb, accb, bbcb, bccb}. (I use c × b instead of cb to make it clear that I am using concatenation, so that cb is not confused with an atomic identifier.)

And another example: r = (a × b × c) + (c × a) and s = (b + a). L(rs) = {abcb, abca, cab, caa}.

Note that when using '+' the order of the arguments does not matter. In contrast, when using '×' the order does matter.

The repetition operator is unary and signified by *. If r is an RE then r* is an RE that represents the infinite language of strings in which strings in L(r) are repeated 0 or more times. So if r = a, then L(r*) = L(a*) = {ε, a, aa, aaa, aaaa, ...}. Notice that the empty string is a member of L(r*), for the case where r is "repeated" 0 times.

Assume r = (a+b)(b+c); then L(r*) = L(((a+b)(b+c))*) = {ε, ab, ac, bb, bc, abab, abac, abbb, abbc, acab, acac, acbb, acbc, bbab, bbac, bbbb, bbbc, bcab, bcac, bcbb, bcbc, ...}. Occasionally, you might see notation that fixes the number of repetitions to some constant. For example, L(r^0) = {ε}, L(r^1) = {ab, ac, bb, bc}, L(r^2) = {abab, abac, abbb, abbc, acab, acac, acbb, acbc, bbab, bbac, bbbb, bbbc, bcab, bcac, bcbb, bcbc}. This notation could be helpful to some students in understanding the repetition operator itself. For example, L(r*) = L(r^0) ∪ L(r^1) ∪ L(r^2) ∪ L(r^3) ∪ ... .

If we wish to express repetition 1 or more times we can write rr* or equivalently r(r*), since * has higher precedence than concatenation. But it is also common to write repetition 1 or more times with a superscripted +, so if r = a+b then L(r^+) = {a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, aaaa, ...} = L(rr*), with ε excluded.

Consider the set of all strings over {0, 1} with at most one pair of consecutive 0s. 0^n, where n is greater than 0, would count as n-1 consecutive pairs. A first stab at a solution might be (01)* 00 (10)*, BUT it doesn’t allow a string that starts with 1, it requires exactly one 00, doesn’t allow a string that ends with 1, and it doesn’t allow for any repeating 1s. (1+ε) (01)* 00 (10)* (1+ε) is a second attempt, but it still requires exactly one 00, and it doesn’t allow for any repeating 1s. How about (1+ε) (01+1)* (00+ε) (10+1)* (1+ε)? What about a single 0? So, how about (1+ε) (01+1)* (00+0+ε) (10+1)* (1+ε)? Consider whether this last example represents the language that's requested. If so, can you simplify the RE so that the revised version still represents the same language? In general, finding a correct (and elegant) RE will require some search.
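
One way to build confidence in (or find counterexamples to) a candidate RE is a brute-force comparison against the defining property. Python's re module writes choice as '|' and an ε alternative as an optional group, so the last candidate above becomes the pattern below; the check exhausts all short strings, which is evidence of correctness rather than a proof.

import re
from itertools import product

candidate = re.compile(r'1?(01|1)*(00|0)?(10|1)*1?\Z')   # (1+eps)(01+1)*(00+0+eps)(10+1)*(1+eps)

def at_most_one_pair_of_zeros(w):
    return sum(w[i] == w[i + 1] == '0' for i in range(len(w) - 1)) <= 1

for length in range(10):
    for chars in product('01', repeat=length):
        w = ''.join(chars)
        assert bool(candidate.match(w)) == at_most_one_pair_of_zeros(w), w
print('the candidate RE agrees with the property on all strings up to length 9')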

REs represent the same class of languages as those of Figure RegularRepresentations1. Since the four previously studied schemes for regular languages are known to be equivalent, we can show the equivalence of REs to these other representations by showing how any of the previous schemes can be translated to REs, and showing how REs can be translated to any of the other representations.

Converting DFAs to Regular Expressions[edit | edit source]

Figure DFAtoRE1: A DFA to RE example. Sequences such as state A to C and C to D are translated to a concatenation; looping translates to the repetition operator (*); and alternative branching translates to the choice operator (+).
Figure DFAtoRE1a: Adding an artificial start state and an artificial end state can simplify automated translation of DFAs to REs, ensuring that all states that need to be eliminated are intermediate between other states, even if only between the artificial additional states.

Converting DFAs to Regular Expressions (REs) uses a process of state elimination. At the beginning, single terminal symbols label the arcs of the DFA. Of course, each terminal symbol is a primitive RE. The state elimination algorithm eliminates a state at a time, and as it does so, REs of increasing complexity will label the arcs of the shrinking DFA. This calls for a generalization of what we mean by a DFA, but it is a natural generalization, in which arbitrary REs can label arcs instead of simply primitive, single-symbol REs.

For example, in the simplest case, suppose that states P, Q, and R, are arranged such that there is a transition from P to Q on an input ‘a’ and there is a transition from Q to R on an input ‘b’. Then if we want to eliminate Q next, we will replace arcs from P to Q and from Q to R with a single transition directly from P to R, and the transition will be labeled by the RE ‘ab’. If that is the only involvement of Q in the DFA, then Q is eliminated. But typically a state to be eliminated like Q will be involved with many other states, and for each pair of states Pi and Rk with transitions into and out of Q respectively, the algorithm will create new direct transitions between Pi and Rk with suitable REs labeling the transitions. A Pi and Rk may actually be the same state, so a loop is introduced from Pi (Rk) back to itself when eliminating Q.

Figure DFAtoRE2: Translating a DFA to an RE through repeated state elimination. When first eliminating state B we look at all the ways that we can go from state A to state C. All paths from A to C through B in this example, take a '1' from A to B, then take a '0' from B to C, but in between these starting and ending steps, respectively, we can take a '1' or a '01' repeatedly zero or more times.

Figure DFAtoRE1 illustrates the various steps in translation from a DFA to an RE. An interesting anomaly occurs in this example when eliminating state A. Since A originates two paths, there is no path in which A is intermediate. We have shown the result as an RE that has no source, but transitions to E, the start state after eliminating A. The meaning of such a sourceless transition should be straightforward, as external context that is relevant to all strings recognized by the remaining generalized FA. We are reminded of giving external context to a large language model, for example, in the form of a prompt. Nonetheless, a practice that remedies this anomaly, particularly if we were to automate the translation procedure from an FA M, is to introduce two "dummy" states, Sstart and Send, to M, yielding M', where Sstart is the start state of M' and has an ε transition to the start state of M, and Send is the single accepting state of M', and all accepting states of M have an ε transition to Send. Figure DFAtoRE1a illustrates the effect of this policy on the previous example. It's true that the addition of ε transitions makes M' an NFA, particularly because of these ε transitions from all of the accepting states of M to the single accepting state of M', but if the FA is otherwise deterministic then you would be right to call the process of Figure DFAtoRE1a a DFA to RE translation.

Figure DFAtoRE2 illustrates another translation. In principle it does not matter what the order of state elimination is, but I find it helpful to start by eliminating states that are less involved with other states. One of the exercises asks you to translate Figure DFAtoRE2 using a different order of state elimination.

Converting Regular Expressions to NFAs with ε transitions[edit | edit source]

Figure REtoNFA1a: This shows the correspondence between a simple regular expression that includes concatenation (i.e., the lower path between states Q and R), choice (i.e., between the upper and lower paths between Q and R), and repetition (i.e., the transition from R to Q).

Regular expressions are most easily translated into NFAs with ε transitions, which in turn can be translated to DFAs if desired, but that latter step is not necessary for demonstrating the equivalence of REs and Regular languages. While the basic steps of the algorithm are intuitive (i.e., a concatenation in the RE translates to a strict sequence of states and transitions, a ‘+’ leads to branching into two or more transitions from a state; and repetition (*) in the RE leads to a loop in the NFA), the very liberal introduction of ε transitions into a growing NFA makes the algorithm more easily implementable, but makes the construction very cumbersome for illustration, and so you will see me simplifying the growing NFA at intermediate steps.
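
The fragment-by-fragment flavor of the construction can be sketched in code: each RE operator builds a small ε-NFA from the ε-NFAs of its subexpressions, in the spirit of (though not identical to) the figures that follow. A fragment here is a (transitions, start, accept) triple, 'eps' marks ε transitions, and the global state counter is an illustrative implementation choice.

from itertools import count
new_state = count()

def symbol(a):                        # primitive RE 'a'
    s, f = next(new_state), next(new_state)
    return {(s, a): {f}}, s, f

def concat(A, B):                     # rs: link r's accept state to s's start state
    dA, sA, fA = A; dB, sB, fB = B
    d = {**dA, **dB}
    d.setdefault((fA, 'eps'), set()).add(sB)
    return d, sA, fB

def union(A, B):                      # r + s: a new start branches to both; both feed a new accept
    dA, sA, fA = A; dB, sB, fB = B
    s, f = next(new_state), next(new_state)
    d = {**dA, **dB}
    d[(s, 'eps')] = {sA, sB}
    d.setdefault((fA, 'eps'), set()).add(f)
    d.setdefault((fB, 'eps'), set()).add(f)
    return d, s, f

def star(A):                          # r*: loop back to repeat, and allow zero repetitions
    dA, sA, fA = A
    s, f = next(new_state), next(new_state)
    d = dict(dA)
    d[(s, 'eps')] = {sA, f}
    d.setdefault((fA, 'eps'), set()).update({sA, f})
    return d, s, f

def accepts(frag, w):                 # frontier simulation with eps-closures
    d, s, f = frag
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for t in d.get((stack.pop(), 'eps'), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen
    frontier = closure({s})
    for a in w:
        frontier = closure({t for q in frontier for t in d.get((q, a), set())})
    return f in frontier

frag = star(union(symbol('1'), concat(symbol('0'), symbol('1'))))   # (1 + 01)*
print([w for w in ['', '1', '01', '101', '0011', '00'] if accepts(frag, w)])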

Figure REtoNFA1b: This represents the RE 1(1+01)*00. The yellow portion of the image is copied over from Figure REtoNFA1a, which illustrates (1+01)*.

Let's start with an example from the previous section by translating (0 + 1(1+01)* 00)* to an NFA with ε transitions, and simplifying along the way. I will illustrate by working from the inside out, translating simpler subexpressions first. Figure REtoNFA1a illustrates the translation of (1 + 01)*, which is a subexpression of (0 + 1(1 + 01)* 00)*. The initial NFA of the figure shows liberal introduction of ε transitions before most subexpressions. In an automated algorithm an ε transition would be introduced between the 0 and 1 transitions in the lower path between states Q and R as well, but I used shorthand and jumped ahead on the simplification step for reasons of space. Following simplification we are left with essentially a DFA (excepting the ε transitions from the artificial start state and to the artificial end state).

Figure REtoNFA1b continues to add to the NFA to include additional subcomponents of (0 + 1(1 + 01)*00)*, now 1(1 + 01)*00. In this image, (b), an additional '1' has been placed in front of the earlier NFA, with added ε transitions. It gets ridiculous, particularly the consecutive ε transitions, but again, the rationale is that the liberal addition eases the algorithm implementation, since taking 'shortcuts' can lead to mistakes, and since the very last step in such an algorithm would be to translate to a DFA, getting rid of all the ε transitions, if that were important. Nonetheless, I ran out of room on the page and used shorthand when adding '00' at the end.

Figure REtoNFA1c: The NFA in yellow is copied from image (b), with the final steps in the translation to an NFA added at the top. The 0 choice is added, and repetition is added as an ε transition. The NFA for (0 + 1(1 + 01)*00)* that results is then translated to a DFA, which is shown at bottom.
Figure DFAsComparison: Two DFAs that represent the same language. The two accepting states in the right DFA can be combined into a single accepting state so that the underlying directed, labeled graph is identical to the DFA on the left.

Figure REtoNFA1c shows the final step in the RE to NFA translation at the top, and this is followed by an NFA to DFA translation. Remember that we started with an RE that was the result of a translation from an NFA, as shown in Figure DFAtoRE2. We might expect that translation of that RE back to a DFA would end up as the same DFA that we started with in Figure DFAtoRE2. Obviously, as Figure DFAsComparison makes clear, the two DFAs are not the same. But more importantly, do these two DFAs represent the same language? The answer, thankfully, is yes. As we will see in the next section on DFA minimization, the DFA on the right can be minimized so that it is the same as the DFA on the left, except for the naming of states. In this example, the minimization of the DFA on the right results in states L and M,L being combined into a single accepting state.

Having demonstrated that REs are translatable to and from other regular language representations, we give the updated representational equivalencies in Figure RegularRepresentations2.

Figure RegularRepresentations2: The equivalence of representations for the regular languages, now including regular expressions (REs), as demonstrated in the text.

DFA Minimization[edit | edit source]

DFAs can be minimized into a unique minimal-state DFA, differing only in the naming of states between two “versions” of the minimal DFA. The minimization algorithm operates by identifying which pairs of states are ‘distinguishable’ or not. After looking at all pairs, if two states are not distinguishable, then they are merged in the final, minimal state DFA. It seems to me that a remarkable property of the minimization algorithm is that it looks only at the transitions out of two states to determine whether those states are ‘distinguishable’ or not. That is, transitions INTO the states being compared are not an overt part of the algorithm. Moreover, it's worth noting that the minimum state DFAs are identical, except in the naming of states – the transitions between states are identical in structure too, though we don’t get into the proof of this.

Figure MinimizingDFA1a: Minimizing a DFA by identifying distinguishable states (with an 'X').

It is conventional to create a table like that on the right of Figure MinimizingDFA1a that can be used to compare pairs of states. The DFA on the left of the figure is the same as we left off with in NFA to DFA translation in Figures REtoNFA1c and DFAsComparison, but with states renamed. The first step in the minimization process is to mark every pair of states in which one is an accepting state and one is not an accepting state as "distinguishable". This is the state of the table in Figure MinimizingDFA1a. Only one pair of states remains, L and P. Are L and P distinguishable or are they equivalent? They both transition to the same states on '0' and on '1', respectively, and we conclude that they are indistinguishable, aka equivalent. If we combine L and P into a single accepting state, LP, and update transitions into and out of LP (i.e., so state N transitions to LP on a '0', LP transitions to LP on a '0', LP transitions to M on a '1') then we have the DFA at the left of Figure DFAsComparison.

The general algorithm for filling in a table is below. The basic idea is to find states that cannot be equivalent, and then states that are equivalent, and merge equivalent states.

Algorithm for marking pairs of inequivalent states (“Table-filling” algorithm)

FOR p in A and q in Q−A DO mark(p, q) // accepting states cannot be equivalent to non-accepting states

FOR each pair of unmarked, distinct states (p, q) in A × A or (Q−A) × (Q−A) DO

    IF for some input symbol ‘a’, (δ(p,a), δ(q,a)) is marked

    THEN  mark(p, q)

    ELSE /* no pair (δ(p,a), δ(q,a)) is marked */

         FOR all input symbols ‘a’ DO

              put (p, q) on the follow-up list for (δ(p,a), δ(q,a)) UNLESS δ(p,a) = δ(q,a)


Repeat the second FOR loop until no changes in what is marked (this amounts to processing follow-up lists)
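
The table-filling idea is easy to prototype. Below is a minimal sketch in Python, assuming a DFA is encoded as a collection of states, an alphabet, a total transition dict delta mapping (state, symbol) to a state, and a set of accepting states; these names and the encoding are illustrative assumptions, not notation from the text. Pairs of distinct states left unmarked are equivalent and can be merged to form the minimal-state DFA.

from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, accepting):
    # Step 1: mark every pair in which exactly one state is accepting.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in accepting) != (q in accepting)}
    # Step 2: repeatedly mark (p, q) if some symbol leads to an already-marked
    # pair; iterating to a fixed point plays the role of the follow-up lists.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                successors = frozenset((delta[(p, a)], delta[(q, a)]))
                if len(successors) == 2 and successors in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked   # unmarked pairs of distinct states are equivalent and can be merged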

The DFA minimization algorithm can be used to determine whether two DFAs accept the same language or not – minimize each of the DFAs being compared and then examine the structure of the resulting minimum state DFAs; if the minimal state DFAs are identical (up to the renaming of states) then the two original DFAs accept the same language. The DFA minimization algorithm can be used more broadly to judge whether any two descriptions using any of the representations that we studied (RGs, REs, DFAs, NFAs with and without ε transitions) denote the same language – translate to a DFA and minimize and compare the minimal DFAs.

The algorithm for answering whether two regular language representations are the same by translating each representation to a minimal DFA and comparing the two DFAs to see if they are isomorphic directed graphs is an example of a decision algorithm for regular languages.

Pumping Lemma for Regular Languages Revisited[edit | edit source]

An illustration of the Pumping Lemma for regular languages.

Though we have described the Pumping Lemma for regular languages previously in terms of grammars, it's helpful to also consider the typical way of introducing the PL using DFAs. This is equivalent to our earlier presentation and serves to further highlight the equivalence of FAs and regular (or right linear) grammars.

The Pumping Lemma says that there must be at least one loop in a DFA accepting an infinite regular language, and this loop can be repeated indefinitely or removed from an accepted string that is long enough to have been forced into taking a loop. We can show a language is not regular by assuming that it is regular, carefully selecting a string that is long enough, and pumping a suitable substring 0 or more times (recall that 0 times is the same as deleting the substring) until the resulting string is known not to be in the language, thus violating the Pumping Lemma property of regular languages.
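
To make the DFA reading of the lemma concrete, here is a small Python sketch, using a hypothetical DFA encoding (delta as a dict keyed by (state, symbol)): given an accepted string at least as long as the number of states, it finds a repeated state along the string's path and returns the x, y, z decomposition whose middle piece can be pumped.

def find_pumpable_split(start, delta, w):
    """Return (x, y, z) with w = x + y + z and y nonempty, where y traverses a
    loop in the DFA; x + y*i + z is accepted for every i >= 0 whenever w is
    accepted and len(w) is at least the number of DFA states."""
    state = start
    seen = {state: 0}                  # state -> position in w where first seen
    for i, ch in enumerate(w):
        state = delta[(state, ch)]
        if state in seen:              # pigeonhole: some state repeats
            j = seen[state]
            return w[:j], w[j:i + 1], w[i + 1:]
        seen[state] = i + 1
    return None                        # no repeat: w was shorter than the state count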

Pushdown Automata: Recognizers of Context Free Languages[edit | edit source]

PDAs are basically FAs with an additional form of memory – a stack. A pushdown automaton M = (Q, Σ, Γ, δ, q0, X0, A), where Q, Σ, δ, q0, and A have roles similar to those in FAs, and Γ and X0 are new constructs associated with the PDA's stack memory.

  • Q is a finite set of states,
  • Σ is the finite input alphabet,
  • Γ is the finite stack alphabet,
  • q0 is in Q and is the start state,
  • X0 is in Γ and is the symbol that is initially on the stack,
  • A ⊆ Q is the set of accepting states.

δ is a mapping from Q × (Σ ∪ {ε}) × Γ to finite subsets of Q × Γ*. That is, δ is a finite set of transitions of the form δ(qi, a, X) = {..., (qk, α), ...}, where the PDA transitions from state qi to qk when the current input symbol is a (or ε) and the top of stack symbol is X. Upon transition the top of stack symbol X is replaced with a string of stack symbols, α. That is, if in state qi, 'a' is the input, and X is the top of stack symbol, then the transition is made to state qk and X is replaced by popping X and pushing a sequence of stack symbols denoted by α, where 'X' might be among the stack symbols pushed back on. Note that if α = ‘WYZ’, then after popping X, Z is pushed first, then Y, then W, and W (i.e., the leftmost stack symbol) is the new top of stack.

Figure PDARecognitionPath: This illustrates a path or trajectory of instantaneous descriptions (IDs) or configurations.
Figure PDAExample1a: An example PDA.
Figure PDAExample1b: An example PDA.

The definition of a PDA allows for nondeterminism in that each δ(qi, a, X) is a set of one or more possible moves, and ε transitions are allowed. Similar to representations of NFA simulations as breadth-first enumerations of recognition paths or trajectories (e.g., Figure NFASearch1), PDA simulations can be represented as a breadth-first enumeration too, but because of the increased complexity of PDAs relative to FAs, each node in a PDA enumeration is called an instantaneous description (ID) or configuration. Each PDA ID is made up of a state, qk, the top of stack, and for convenience we will typically include the remaining input string as well. Figure PDARecognitionPath shows one path only in what would be a larger enumeration of IDs in search of recognition paths for a string 'abc', using a PDA with transitions that include those at the bottom (e.g., δ(q0, a, X0) ∋ (q1, X1)). Figures PDAExample1a and PDAExample1b give an example of a single PDA definition, along with in-line comments and illustrations of the function of transitions. In-line comments are generally the minimum that would be desirable in presenting a PDA definition.
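
The breadth-first enumeration of IDs can be sketched directly in Python. The encoding below is an illustrative assumption, not the text's notation: transitions are a dict mapping (state, input symbol or '', stack top) to a set of (new state, pushed string) pairs, '' plays the role of ε, and stacks are strings whose leftmost character is the top. The by_empty_stack flag anticipates the next section.

from collections import deque

def pda_accepts(delta, start, start_stack, accepting, w, by_empty_stack=False):
    # An ID (instantaneous description) is (state, remaining input, stack).
    start_id = (start, w, start_stack)
    frontier, seen = deque([start_id]), {start_id}
    while frontier:
        state, rest, stack = frontier.popleft()
        if not rest and (not stack if by_empty_stack else state in accepting):
            return True
        if not stack:
            continue                      # no move is possible on an empty stack
        top, below = stack[0], stack[1:]
        moves = []
        if rest:                          # moves that consume one input symbol
            moves += [(rest[1:], m) for m in delta.get((state, rest[0], top), ())]
        moves += [(rest, m) for m in delta.get((state, '', top), ())]   # epsilon moves
        for new_rest, (new_state, push) in moves:
            new_id = (new_state, new_rest, push + below)
            if new_id not in seen:        # avoid revisiting identical IDs
                seen.add(new_id)
                frontier.append(new_id)
    return False   # caveat: a PDA whose epsilon moves grow the stack forever may not terminate here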

Acceptance by Empty Stack and by Accepting State[edit | edit source]

In addition to accepting by accepting state, PDAs can also be designed to accept by empty stack. In particular, if the input is exhausted and the PDA's stack is empty then the PDA accepts. When accepting by empty stack the accepting states, A, are irrelevant and A = { } is the indication that a PDA accepts by empty stack. We may sometimes use notation to denote the language accepted by a PDA M = (Q, Σ, Γ, δ, q0, X0, { }) by empty stack: L(M) = {w | (q0, w, X0) ⊢* (qi, ε, ε)} indicates that there is a sequence of 0 or more transitions from (q0, w, X0) to (qi, ε, ε), where the first ε indicates that the input is exhausted and the second ε indicates that the stack is empty. The notation can also be used to denote a language accepted by accepting state: for PDA M = (Q, Σ, Γ, δ, q0, X0, A ≠ { }), L(M) = {w | (q0, w, X0) ⊢* (p, ε, γ)}, where p is an accepting state and γ indicates 0 or more stack symbols.

The PDA of Figures PDAExample1a and PDAExample1b accepts by accepting state (i.e., state q3). One of the exercises asks you to revise this PDA to accept by empty stack. In general, any PDA that accepts by empty stack can be translated to a PDA that accepts by accepting state, and vice versa.

Equivalence of PDAs and CFGs[edit | edit source]

PDAs and CFGs are equivalent finite representations of the CFLs. Since CFGs were used to delimit the CFLs, we would want to show that PDAs and CFGs are equivalent. We can do this by showing that any PDA can be translated to a CFG, and any CFG can be translated to a PDA. We only briefly sketch the translation strategies here.

Thm: If L is a CFL with alphabet Σ, then there exists a PDA M such that L = L(M).

Assume that G = (V, Σ, P, S) is a context free grammar in Greibach Normal Form (i.e., all productions are of the form A → aβ, where 'a' is a terminal symbol, 'A' is a variable, and β is a possibly empty string of variables). We assume that ε is not in L(G). The translation strategy is to construct a PDA M = ({q1}, Σ, V, δ, q1, S, { }), accepting by empty stack, where the initial stack symbol of M is the start symbol of G (i.e., S), and where δ(q1, a, A) contains (q1, γ) whenever A → aγ is a production of G. Can you confirm that (q1, w, S) ⊢* (q1, ε, ε) iff S ⇒* w?
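
A sketch of this construction in Python, using the same hypothetical PDA encoding as the earlier simulator sketch: each GNF production A → aβ becomes a move that consumes a, pops A, and pushes β. The grammar dict format and the example grammar in the comments are illustrative assumptions.

def gnf_to_pda(productions, start_symbol):
    """Build PDA transitions from a Greibach-Normal-Form grammar given as a
    dict {variable: [(terminal, string_of_variables), ...]}. The result is the
    single-state, accept-by-empty-stack machine described in the text."""
    delta = {}
    for A, bodies in productions.items():
        for a, beta in bodies:
            delta.setdefault(('q1', a, A), set()).add(('q1', beta))
    return delta, 'q1', start_symbol   # transitions, start state, initial stack symbol

# For example (a hypothetical GNF grammar for {a^n b^n | n >= 1}):
#   S -> aSB | aB,  B -> b
# delta, q0, X0 = gnf_to_pda({'S': [('a', 'SB'), ('a', 'B')], 'B': [('b', '')]}, 'S')
# pda_accepts(delta, q0, X0, set(), 'aabb', by_empty_stack=True)  # expect True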

Thm: If L is L(M) for a PDA M,  then L is a CFL.

Construct a CFG G from PDA M, such that the leftmost derivation of any string w using G simulates the PDA on w (i.e., the variables in a sentential form correspond to the stack symbols). Can you flesh this argument out?

Deterministic PDAs[edit | edit source]

As with FAs, deterministic PDAs (DPDAs) are a special case of (nondeterminism-allowed) PDAs, where the righthand side of every transition is a singleton set, and ε transitions are restricted so that a configuration never offers a choice between an ε move and an input-consuming move. Unlike the case of FAs, however, the deterministic and nondeterminism-allowed PDAs are not equivalent. Whenever you see 'PDA' with no qualifier, assume the default, nondeterminism-allowed PDA definition.

How can we determine whether a CFL cannot be recognized by a DPDA? We will be satisfied by intuition. Consider that you are playing the role of a PDA, and are scanning from left to right a string in the language {wwR | w and its reverse are defined over (0+1)*}. How do you know when you are at the boundary between w and wR and that you should therefore switch from remembering (pushing) symbols of w to checking off (popping) symbols from wR? There is no other way than to guess that you are at the boundary, or not, and then follow the implications of that guess along different recognition paths. The language {wwR | w and its reverse are defined over (0+1)*} cannot be recognized by a DPDA. In contrast, consider the language {wcwR | w and its reverse are defined over (0+1)*, and symbol c marks the boundary between them}. There is no need to guess, and the language wcwR can be recognized by a DPDA. As another example, consider the language {wwR | w and its reverse are defined over (0+1)^10}, that is, w is exactly 10 symbols. Again, there is no need for the PDA to guess, and this language can be recognized by a DPDA. In fact, the language is regular and it can be recognized by a DFA, a huge DFA to be sure, but a DFA nonetheless.

A language for which there exists a DPDA that recognizes it is a DPDAL or simply a DCFL.

Deterministic and Inherently Ambiguous (IA) Languages[edit | edit source]

Figure CFLsubclasses: The class of CFLs is further divided into a variety of subclasses based on inherent ambiguity and (non)determinism of their automata.

A question that may occur to some is the relationship between inherently ambiguous CFLs, as defined when discussing grammars, and DCFLs, as defined just now by DPDAs. One might suspect that DCFLs and IACFLs are complements, but there was no mention of this rather obvious question in reference texts, most notably the 1969 and 1979 editions of Hopcroft and Ullman, where I would have expected at least something like "you might wonder about the relationship between DCFLs and IACFLs, but no relationships are proven as yet". After discussing the issue with chatGPT, with several followups to correct contradictions by the AI, it pointed out that Hopcroft, Motwani, and Ullman (pp. 255-256, Section 6.4.4) addressed the issue. All deterministic languages (recognized by DPDAs) have unambiguous grammars, but DCFLs and IACFLs are not perfect complements since counterexamples can be found. For instance, S → 0S0 | 1S1 | ε is an unambiguous grammar, G, but L(G) = {wwR | w and its reverse are defined over (0+1)*} is not a DCFL. Thus, it might be ok to colloquially refer to the relationship as "somewhat complementary", but not formally as complements.

An interesting additional point is that in the realm of CFLs generally, the languages recognized by PDAs accepting by empty stack and the languages recognized by PDAs accepting by accepting state are the same sets, but in the realm of DCFLs only, the DCFLs defined by empty stack are a proper subset of the DCFLs defined by accepting state. All DCFLs are recognized by a DPDA accepting by accepting state, but only some DCFLs (those with the prefix property) are recognized by DPDAs accepting by empty stack.

Here is a convenient summary of relationships involving deterministic languages.

• If there is a deterministic PDA that accepts L then L is a deterministic (CF) language or DCFL; we will also call a DCFL a DPDA language.

• All regular languages are also deterministic languages (DCFLs or DPDALs), but not vice versa. That is, regular languages are a proper subset of the DCFLs, which in turn are a proper subset of the CFLs.

• All deterministic languages have unambiguous grammars that generate them, but not vice versa. That is, deterministic languages are a proper subset of the CFLs that are not inherently ambiguous.

• Languages accepted by PDA accepting/final state and languages accepted by PDA empty stack are the same, and both are exactly the CFLs.

• This is not the case with DCFLs – within the class of DCFLs, the languages accepted by empty stack are a proper subset of the languages accepted by accepting/final state.

The subclass relationships present in the CFLs are illustrated in Figure CFLsubclasses.

Turing Machines: Recognizers of Unrestricted Languages[edit | edit source]

Figure TMDefinition: A TM is a finite automaton plus an infinite memory in the form of a tape.

Turing machines (TMs) are recognizers of unrestricted languages, which include all the other language classes that we have described (i.e., CSLs, CFLs, RLs). TMs are representationally more expressive than FAs or PDAs. TMs can also enumerate languages and compute functions, and can simulate digital computers and other automata. In fact, TMs are taken to delimit what is computable.

TMs are specified by M = (Q, Σ, Γ, δ, q0, B, A), as summarized in Figure TMDefinition. The set of states in the finite control (Q), a start state, q0, a set of accepting states A ⊆ Q, an input alphabet, Σ, and a transition function, δ, have similar roles as in other automata. A tape alphabet, Γ, contains all symbols that can appear on the infinite tape memory. A special symbol, B ∈ Γ, represents a 'blank' on the tape. An input string, w, over Σ, is the initial contents of the tape, with Bs on both sides of w. Thus, Σ ⊂ Γ. Note that this convention of having the tape be an explicit part of the TM notation is different from the conventions for FAs and PDAs, in which the input string's presence on a READ-ONLY tape is often implicit. The TM tape is a read-AND-WRITE memory, similar to a PDA stack, but different from a PDA stack, which does not initially contain the input string. Finally, the TM can move its read-and-write head one tape cell in either direction (denoted Left or Right) or keep it unchanged (U), rather than in simply one direction as in the case of an FA's and PDA's (implicit) read-only tape.

Textbooks typically define the transition function, δ, to be deterministic by default in TMs. That is, for each (state, tape symbol) pair, the TM makes exactly one move (aka transition), going to another state, rewriting the current tape symbol (perhaps to the same value), and moving the read-write tape head left, right, or keeping it unchanged. Succinctly, a transition is δ(qi, X) = (qk, Y, D), where qi, qk ∈ Q; X, Y ∈ Γ; D ∈ {L, R, U}.

TM's as recognizers[edit | edit source]

Figure TMRecognizer1: A simple TM as a recognizer. This language, L = {0^n1^n | n ≥ 1}, is actually a CFL. The solution is due to Hopcroft, Motwani, and Ullman (2007).

In keeping with our treatment of other kinds of automata, we first consider some examples of TMs as recognizers of languages. Because a TM can move to the right and left an indeterminate number of times, there is no telling when its input is exhausted. Rather, we will assume that the transitions are defined so that as soon as a TM enters an accepting state, it accepts, and if a TM ever encounters a (state, tape symbol) configuration that is not the lefthand side of any transition then the TM immediately rejects.

Figure TMRecognizer1a: This trace of instantaneous descriptions on input '000111' illustrates that the procedure ticks off 0s and corresponding 1s one at a time from left to right. State q3 is entered when the last 0 is read, and from q3 the TM goes to q4 on the last corresponding 1. You should confirm that on strings that don't match the 0^n1^n pattern, n ≥ 1, particularly with more 0s than 1s or more 1s than 0s, the TM will reach an undefined configuration and thus reject.

An example of the TM as a recognizer is shown in Figure TMRecognizer1. TM programming can be quite tedious, even more tedious than programming a computer in assembly or machine language, but one typically gives high level pseudocode descriptions of procedures first, then translates to TM code, as illustrated in bold red in the Figure. Figure TMRecognizer1a traces the TM procedure on input '000111', shown at the top of the leftmost column.

As with PDAs we define instantaneous descriptions (IDs) or configurations for TMs. An ID for a TM includes (a) the current state, (b) the location of the read/write head on the tape, (c) the tape symbol under the read/write head, and (d) for convenience of illustration we will often include all the current tape symbols as part of the ID, though this is not typically required. Figure TMRecognizer1a shows the IDs in a recognition path for '000111'.
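
A deterministic TM and its IDs are also straightforward to simulate. The sketch below assumes a transition dict mapping (state, tape symbol) to (new state, symbol to write, head direction), a sparse tape, and a step bound so a runaway simulation stops; these encoding choices are assumptions, not the text's notation. The example transition table follows the Hopcroft, Motwani, and Ullman construction for {0^n1^n | n ≥ 1} and may differ in state names and details from Figure TMRecognizer1.

def tm_accepts(delta, start, accepting, w, blank='B', max_steps=10_000):
    """Simulate a deterministic TM. An undefined (state, symbol) pair means the
    machine halts and rejects; entering an accepting state means accept."""
    tape = dict(enumerate(w))          # sparse tape: position -> symbol
    state, head = start, 0
    for _ in range(max_steps):
        if state in accepting:
            return True
        key = (state, tape.get(head, blank))
        if key not in delta:
            return False               # no applicable transition: reject
        state, write, direction = delta[key]
        tape[head] = write
        head += {'L': -1, 'R': 1, 'U': 0}[direction]
    raise RuntimeError("step bound exceeded")

# One standard transition table for L = {0^n1^n | n >= 1}:
delta_0n1n = {
    ('q0', '0'): ('q1', 'X', 'R'), ('q1', '0'): ('q1', '0', 'R'),
    ('q1', 'Y'): ('q1', 'Y', 'R'), ('q1', '1'): ('q2', 'Y', 'L'),
    ('q2', 'Y'): ('q2', 'Y', 'L'), ('q2', '0'): ('q2', '0', 'L'),
    ('q2', 'X'): ('q0', 'X', 'R'), ('q0', 'Y'): ('q3', 'Y', 'R'),
    ('q3', 'Y'): ('q3', 'Y', 'R'), ('q3', 'B'): ('q4', 'B', 'R'),
}
# tm_accepts(delta_0n1n, 'q0', {'q4'}, '000111')  # expect True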

Equivalent Variants of Turing Machines[edit | edit source]

There are variations on the Turing machine definition that neither add nor diminish the computational power of TMs. The variations that are considered here are (a) multi-track, single tape machines, (b) multi-tape TMs, (c) one-way infinite tape TMs, and (d) nondeterministic TMs.

Multi-track TMs[edit | edit source]

We can divide up a single tape into multiple tracks, say n of them, as illustrated in Figure TMmultitracks. Each track contains cells that are coincident with the cells of the other tracks. Each track j has its own finite tape alphabet, Γj, and different tracks can have the same tape alphabets or different tape alphabets. There is, however, only one input alphabet, Σ, and one set of states, Q. There is only one tape head, and at each step the tape head will point at a single tape location with tuple (X1, X2, ..., Xn) for tracks 1 through n, where each Xj ∈ Γj. The transitions of a multi-track TM are defined as δ(qi, (X1, X2, ..., Xn)) = (qk, (Y1, Y2, ..., Yn), D), where qi, qk ∈ Q; Xj, Yj ∈ Γj; D ∈ {L, R, U}. In sum, the definition of a multi-track TM with n tracks is M = (Q, Σ, (Γ1, Γ2, ..., Γn), δ, q0, B, A), where Σ is a subset of each Γj.

A multi-track TM adds no computational power to the basic TM model, which has a single-track tape. To see the equivalence in computational power, note that the standard single-tape model is a special case of the multi-track model in which n = 1. To go the other way, that any multi-track TM with n > 1 has an equivalent single-track TM that accepts the same language, recognize that we can create a single tape alphabet, Γ, by concatenating the symbol names of the different track alphabets, that is, the cross product of the individual track alphabets, Γ1 × Γ2 × ... × Γn. Transitions are redefined to use this one alphabet: δ(qi, X1X2...Xn) = (qk, Y1Y2...Yn, D), where X1X2...Xn and Y1Y2...Yn are symbols in Γ.
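
The cross product of track alphabets is a one-liner in Python; in this sketch a composite tape symbol is simply a tuple holding one symbol per track (an illustrative encoding, not the text's notation).

from itertools import product

def single_track_alphabet(track_alphabets):
    # Each composite symbol of the single-track machine is a tuple of track symbols;
    # transitions would be re-keyed over these tuples in the same way.
    return set(product(*track_alphabets))

# e.g., single_track_alphabet([{'0', '1', 'B'}, {'*', 'B'}]) contains ('0', '*'), ('1', 'B'), ...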

The advantage of the multitrack formalism is that it can make expression of many procedures easier to conceptualize and write, and that includes acting as an intermediate formalism in showing that still other formalisms are equivalent to the basic single tape, single track TM model.

Multi-tape TMs[edit | edit source]

The multi-tape TM formalism, summarized in Figure TMmultitapes, is different than the multi-track formalism, in that a multi-tape TM has multiple read/write heads, one for each of n tapes, as well as multiple tape alphabets, Γj. Each transition is defined as δ(qi, (X1, X2, ..., Xn)) = (qk, (Y1, Y2, ..., Yn), (D1, D2, ..., Dn)), where Xj and Yj are members of their respective tape alphabets Γj, and each Dj is the direction (L, R, U) of the read/write head for the jth tape. Thus, at any given time in processing, the n read/write heads need not be aligned at the same cells on their respective tapes, relative to where they all began, which is assumed to be in alignment at the start of the input string on the first tape at the very start of processing.

The multi-tape model adds no computational power to the standard single tape model. To see this, note that the standard single tape model is a special case of the multi-tape model where n = 1. Moreover, any multi-tape TM has an equivalent single tape, multi-track TM, which in turn can be converted into an equivalent single-tape, single-track TM. We just show a sketch of the first step here, and a sketch of the second step is above.

Each tape of the multi-tape TM will correspond to a track in a single tape, multi-track TM. In addition to n tracks for the n tapes, there will be n additional tracks, which give the relative location of the read/write head of the corresponding tape in the original multi-tape TM. So, if the head for tape j had moved three steps L (-3) and one step right (+1) and two steps unchanged (+0), then after the six steps the track representing its location would be -2 relative to its start location.

TMs with one-way infinite tapes[edit | edit source]

It's intuitive that the basic TM model can be simulated on a one-way (semi-) infinite tape by associating one track with the cells to one side of the standard TM’s initial head position, and another track (or a second stack) with the cells on the other side. Thus, these seeming restrictions to a semi-infinite tape, or to two stacks, don’t reduce the set of languages that can be recognized relative to the standard TM model.

Can we reformulate the multi-tape conversion, using two tracks per tape?

Nondeterministic TMs[edit | edit source]

A nondeterministic TM includes at least one configuration in which more than one transition is possible. So, we extend the transition function so that the value of δ(qi, X) is a set, such as δ(qi, X) = { ... (qk, Y, D) ...}. In many cases the set for a configuration might still be a singleton, and if we chose we could still consider a deterministic TM as a special case of this more general definition.

Nondeterministic TMs add no computational power beyond the deterministic TM model: a deterministic TM can simulate a nondeterministic one by systematically (e.g., breadth-first) exploring the tree of possible computations.

Compare TMs to the definition of PDAs, which were defined initially as inherently nondeterministic, with deterministic PDAs defined as a special case of the default definition, rather than the nondeterministic versions being defined as extensions of the deterministic machines, as is the case with TMs and FAs. Why?

Presumably, the reason for this switch in convention is that for FAs and TMs the deterministic versions accept the same set of languages as the nondeterministic versions, so why not introduce the deterministic versions first and use them as a jumping off point for the nondeterministic versions. In contrast, deterministic PDAs are not as powerful as nondeterministic PDAs – the former recognize a proper subset of the languages of the latter. So make the nondeterministic PDAs the default, thus conveying an unambiguous equivalence to the CFLs.

Why are the nondeterministic and deterministic machines of equivalent power for FAs and TMs, but not PDAs? Offhand I don’t have a definitive answer, but it seems intuitive that it is related to a ceiling effect in the case of TMs (i.e., the most powerful machines) and a floor effect for FAs (the least powerful machines). PDAs are intermediate between the extremes, and it seems to be the case in many settings, far and wide, not just ALC, that much of the most interesting, or at least most variable, behavior happens between the extremes.

TM's to compute functions[edit | edit source]

Let's start with the multiplication function.

TMs as subroutines[edit | edit source]

Subroutines are an important part of TM programming, where ‘calling’ a subroutine amounts to the TM transitioning to a network of states that have been specially set aside as implementing the subroutine, and the subroutine returns by leaving the machine in a configuration that can be picked up by the ‘calling’ routine. The use of the multiplication subroutine as a means of implementing the square function (i.e., n^2) is a good example of this.

TM's as enumerators[edit | edit source]

Turing Equivalency of Digital Computers[edit | edit source]

Finally, there is an equivalence between TMs and standard digital computers. In one direction it seems clear that we could all simulate a TM in our favorite programming language. In the other direction, we illustrate how the basic components of interpreting a low level machine language program could be simulated on a multi-tape machine (which is equivalent to some very complicated single tape TM).

The basic components of a digital computer include (a) a central processing unit, (b) a memory, and we single out

TMs are equivalent to digital computers in terms of what can be computed in principle. Examples have been given that show TMs recognizing and enumerating various languages and computing different functions. But something important is missing in a compelling argument for equivalence. Just as there are “universal” computer programs that can translate/compile/interpret other computer programs, there is a universal TM, call it MU, that can interpret or simulate another TM, Mk, on a given input string to Mk. In order for a TM, Mk, to be input to MU, we need to encode Mk as a string (because any TM, MU included, requires a string as input that is used to initialize MU’s tape). The language accepted by MU, call it LU, is of the form {(<Mk>$<w>) | <Mk> is a string encoding of a TM, Mk, <w> is a string that is input to Mk and is accepted by Mk, and $ is a delimiter separating the two substrings -- the Mk and its input}.

Turing Machines and Language Classes[edit | edit source]

As already noted, TMs are recognizers of unrestricted languages. Given an unrestricted grammar, G, we can design a TM, M, such that L(G) = L(M). The easiest strategy for constructing such a recognizer, though not the most efficient, is to design a TM recognizer that calls a subroutine TM that is an enumerator of L(G), as already discussed, such that given a string w, if the TM enumerator eventually generates w, then the TM recognizer accepts w as a member of L(G), else the TM recognizer rejects w as a member of L(G).

Recursively enumerable (RE) languages[edit | edit source]

Because TMs can enumerate the members of any language defined by an unrestricted grammar using, for example, a breadth-first enumeration strategy, the languages recognized by TMs are also referred to as the recursively enumerable languages. The recursively enumerable (RE) languages are synonymous with the unrestricted languages. But what happens when a TM recognizer of an RE (aka unrestricted) language is given a string that is not in the language? Ideally, and in most cases with a properly designed TM, the TM will reject w and halt. But in the case of some strings that are not in the language, L(G) = L(M), the recognizer may not halt, but may run forever. To intuit this possibility, recall that an unrestricted grammar need not be a noncontracting grammar, and if it is contracting, a TM constructed from it will produce paths in a breadth-first enumeration along which sentential forms can shrink in size. This complicates the test of when a string w of length |w| cannot be a future sentential form on an enumeration path, and in some cases there will be no such test.

To repeat, for some RE languages there is no TM that will halt in all cases where an input string is not in the TM's language.

Recursive languages[edit | edit source]

A TM (or more generally, a procedure) that halts on all inputs, be the input string a member of the TM's language or not a member, is called an algorithm. The language of a TM that always halts is called a recursive language. The recursive languages are a proper subset of the RE languages. The recursive languages are said to be decidable or solvable, since TMs to recognize them always halt with a decision of accept or reject.

RE languages that are not recursive are said to be undecidable or unsolvable, since there are some cases in which their TM recognizers will not halt with a reject decision. Non-RE languages are undecidable (aka unsolvable) in a second, even stronger sense -- there is no TM at all for recognizing all strings in the language in both the accept and reject cases.

We return to issues of undecidability/unsolvability and decidability/solvability when we discuss the properties of the RE languages and properties of the recursive languages, respectively.

Languages that are not RE[edit | edit source]

There are languages that are not recognized by any Turing machine, that is, there are languages that are not recursively enumerable (not RE, and thus not recursive either).

One might ask what languages could possibly be non-RE. Intuitively, examples of non-RE languages will be those for which there is no basis by which a recognizer (or enumerator) can be constructed. A language in which the members are selected randomly would be an intuitive illustrative example: LR = {w | w in Σ* and random(w) == true} could be formed by enumerating the strings in Σ* in canonical order and randomly deciding in each case whether a string, w, is in the language or not. The reason that a recognizer can't be built for this language assumes that random(w) is applied in the recognizer with no memory of its application in the creation of the language. And we can't simply remember the members, since there are an infinite number. In contrast, if random(w) were a pseudorandom process then such procedures are repeatable and we could implement a recognizer in that case, but if we have a truly random process then no TM recognizer can be built and the language is not RE.

Figure TableofTMLanguages: Each row represents the strings that are accepted by the TM that heads that row. The diagonal represents the set of encodings of TMs that accept their own encodings. That is, a '1' in (4, 4) says that M4 accepts M4's encoding. If a row does not represent a valid TM encoding then it's assumed to be an encoding of a "dummy" machine that represents the empty language (i.e., a row of all 0s for j ≥ 1).

In addition to "random languages" there are demonstrably other non-RE languages. A demonstration of non-RE languages uses a diagonalization argument. Consider Figure TableofTMLanguages and recall that we can encode TMs as strings. Each row corresponds to the language represented by one TM encoding -- the actual encoding is not shown but is not important to the argument. We only show a label for the TM. If a row does not represent a valid TM encoding then it's assumed to be an encoding of a "dummy" machine that represents the empty language (i.e., a row of all 0s for j ≥ 1). The '1' values in cells of the table, (i, j), indicate that TMi accepts the jth string, which is the encoding of TMj. The diagonal represents the set of encodings of TMs that accept their own encodings. That is, a '1' in (4, 4) says that TM4 accepts TM4's encoding.

Figure DiagonalLanguage: The complement of the diagonal of TM languages cannot equal any row – there is a language Ld with no TM representation, thus Ld is not RE.

It's possible that the diagonal corresponds to a row in the table, which would therefore represent the TM that represents the language of TM encodings that accept "themselves". But if we complement the diagonal, as shown in Figure DiagonalLanguage, then that complemented diagonal, Ld, differs from every language in the table in at least one cell -- and so Ld cannot correspond to a row in the table, and therefore there is no TM that represents Ld, the language of TM encodings that do not accept themselves.

Ld = {<Mk> | <Mk> is not accepted by Mk} is not RE.

There are other languages that are not RE. The language of string encodings of TMs for recursive languages is not RE. And thus the language of string encodings of TMs for languages that are not recursive is also not RE (p. 187 second edition).

Languages that are not RE are said to be undecidable or unsolvable, but in an even stronger sense than RE languages that are not recursive. In the former case, there is no TM recognizer that halts in all cases with either accept or reject decisions. There might be TMs that are approximate representations of a non-RE language, but there is no TM that precisely represents a non-RE language.

~Ld and Lu are RE but not recursive[edit | edit source]

The language Lu is recursively enumerable, but is not recursive. This means that the universal TM halts on accept, but does not halt on reject in all cases. Intuitively, if the TM given as argument to the universal TM, Mu, doesn't halt, then Mu doesn't halt either. Any programmer who has ever coded an infinite loop will recognize this. But to be convincing, can we identify a TM that doesn't halt in all cases, thus representing an RE language (because there is a TM) that is not recursive (doesn't halt in all cases)?

Recall from the section above that Ld = {<Mk> | <Mk> is NOT accepted by Mk} is not RE. The complement of this language is ~Ld = {<Mk> | <Mk> is accepted by Mk} (i.e., the diagonal in Figure TableofTMLanguages and the left table of Figure DiagonalLanguage). In the previous section we noted that the diagonal, ~Ld, could correspond to a row in Figure TableofTMLanguages, and indeed we can build a TM, Md, that recognizes ~Ld by providing the universal TM with the input (<Mk>$<Mk>); if the universal TM accepts, then <Mk> is in ~Ld = {<Mk> | <Mk> is accepted by Mk}, else <Mk> is not in ~Ld.

The existence of Md indicates that ~Ld is RE. Is ~Ld recursive however? The answer must be no, because if ~Ld were recursive then Ld would necessarily be recursive as well. To preview a discussion later in "Properties of Recursive Languages", if L is a recursive language, then there is a TM, call it ML, that recognizes L and that is guaranteed to halt in all accept cases and in all reject cases. We can create a new TM, ML', that calls ML as a subroutine. If ML returns accept for an input string, then ML' returns reject. If ML returns reject for an input string, then ML' returns accept. ML' accepts the complement of L, and always halts, so the complement of L is recursive.

So ~Ld is a language that is RE but not recursive, since if ~Ld were recursive then Ld would be recursive too, and we know Ld isn't recursive (it's not even RE). We can say therefore that Lu isn't recursive either, since it won't halt when its argument is a non-halting TM like Md.

Reductions[edit | edit source]

A reduction is a kind of translation. When we have talked about translations previously, as in the case of inductive proofs or ..., we generally used steps in the translation that were symmetric, and each step yielded an equivalent form, so the direction of a demonstration is arbitrary, at least formally. But a reduction doesn't necessarily require equivalence, and direction is important. Rather, you can think of a reduction as showing a one-way implication. If you hear “Reduce problem P1 to problem P2” in computational theory it means “find a means of determining a solution for problem P1 using a means of determining a solution for P2.”

Figure ReductionExample1: Reducing the problem of membership in ~Ld to the problem of membership in Lu.

In the previous section showing that ~Ld and Lu are not recursive, we reduced the problem of recognizing ~Ld to the problem of recognizing Lu, as illustrated in Figure ReductionExample1. In this example, input to TM Md was an encoded TM <Mk>. This input was preprocessed into <Mk>$<Mk>, which was passed to Mu.
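
The preprocessing step is trivial, which is part of the point: nearly all of the work is done by Mu. A toy Python rendering of the reduction follows, where universal_tm stands in for a hypothetical decider for Lu (an assumption for illustration only; no such always-halting decider actually exists).

def member_of_complement_Ld(tm_encoding, universal_tm):
    # Membership of <Mk> in ~Ld is decided by asking the (hypothetical)
    # universal machine whether Mk accepts its own encoding.
    return universal_tm(tm_encoding + '$' + tm_encoding)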

Figure ReductionInProof: Four different but logically equivalent ways of thinking about how reduction can be used in a proof by contradiction that P2 (e.g., Lu) is not recursive (i.e., ∉ R, or undecidable). The four interpretations are expressed in terms of modus ponens or unit resolution, as described in the first chapter.

Whatever means for determining solutions you devise, the reduction should have the property that for every instance of the P1 problem (e.g., membership in ~Ld) that should return a ‘yes/accept’, the corresponding instance of the P2 problem (e.g., membership in Lu) returns a ‘yes/accept’, and for all instances of P1 that should return a ‘no/reject’, so does P2 for the corresponding instances.

In a reduction from P1 to P2, you can think about the P2 solution procedure (e.g., Mu) as a subroutine of the P1 solution procedure (e.g., Md), albeit a subroutine that does the bulk of the work. Again, we are not showing that P1 and P2 are equivalent, but only that one problem's solution procedure can be found using another problem's solution procedure. In a reduction from P1 to P2, we can also say colloquially that P2 (e.g., membership in Lu) is at least as hard as P1 (e.g., membership in ~Ld).

Figure ReductionInProof also gives us an alternative way to prove, by contradiction, that Lu is not recursive (i.e., undecidable) given that ~Ld is not recursive. Of the four logically equivalent ways of thinking about how reduction can be used in a proof by contradiction, the first (i.e., P2 ∈ R → P1 ∈ R) follows most intuitively from the reduction in Figure ReductionExample1, where we reduce the problem of membership in ~Ld to the problem of membership in Lu. In that reduction we assume that Lu is recursive, and thus that Mu halts in all cases. But if that were so then the reduction indicates that Md would also halt in all cases, which we know cannot be the case. We are left with a contradiction, and so our assumption that Lu is recursive must be wrong.

Linear bounded TMs and the CSLs[edit | edit source]

Linear bounded Turing machines are restricted TMs that exactly recognize the class of context sensitive languages (CSLs). To the set of input symbols Σ a linear bounded TM adds two special symbols, one that delimits the left end of the input and one that delimits the right end; these special symbols are used only in that way and they cannot be overwritten. A linear bounded TM is a nondeterministic single tape TM that never leaves the tape cells on which its input was placed. The input is any w in Σ*, bracketed by the two special end markers (which are not themselves part of w).

Thm: If L is a CSL, then L is accepted by a linear bounded TM. Intuitively, we know that L can be enumerated by a CSG in canonical order and the derivation of a string cannot include a sentential form that is longer than the string.

Thm: If L = L(M) where M is a linear bounded TM then L (excluding ε, if necessary) is a CSL.

The CSLs are a proper subset of the recursive languages. A language that is recursive but not context sensitive, thus showing the proper subset relationship, is described next, following Hopcroft and Ullman.

A Recursive Language that is not a CSL[edit | edit source]

Develop a binary encoding scheme for the type 0 grammars (the grunt work is not important to understanding the argument), and these can be numbered from 1 to infinity (which will include “dummy” grammars, just as an earlier argument acknowledged “dummy” TMs). Of the type 0 grammars it's easy to recognize whether the ith grammar in the ordering is a CSG (its productions’ right-hand sides will never be smaller in length than the corresponding left-hand sides).

Define language L = {wi | wi is NOT in L(Gi)}. That is, L is the language of binary strings wi that encode CSGs Gi where wi is not generated by Gi (i.e., the language of grammars that do not generate “themselves”, so to speak). Since Gi is a CSG, there is an algorithm (always halts) that determines whether wi is in L(Gi). Thus, L is recursive, since given a string wi the test for membership of wi in L will always halt -- if wi does not encode a CSG then reject wi as a member of L, and if wi encodes a CSG then if Gi generates wi then reject wi as a member of L (and halt), else accept wi as a member of L and halt. But L itself is not generated by any CSG.

A proof by contradiction shows that L cannot be generated by a CSG. Suppose that there were a CSG Gk that generated L, so L = L(Gk). If wk were in L(Gk) we get a contradiction, because then wk would be in L, yet L is the language of binary encoded CSGs that are not generated by “themselves”. Consider then the case that wk is not in L(Gk): then wk is not in L either, and again a contradiction, because L is defined so as to include wk in that case.

Thus, L is a recursive language that is not a CSL.

If the reasoning sounds like an Escher painting initially, I understand, but reflect on it. Many proofs by contradiction at this level have the feel of paradoxes, though they are not.

Rather than simply becoming comfortable with the logic of paradoxical-sounding proofs by contradiction, however, it's often instructive to dig deeper. To say that L is not a CSL is to say that there is no non-contracting grammar that generates L -- none, nil, nada! Remember that CSGs are a normal form of non-contracting grammars. Is there something about L = {wi | wi is NOT in L(Gi) and binary encoded Gi = wi} that precludes a non-contracting grammar? Carrying the implications further, for all Type 0 (unrestricted) grammars that do generate L, there must exist a wj in L

How do we connect this reasoning to the fact that the CSLs are exactly the languages of non-contracting grammars; any grammar for a language that is recursive but not a CSL must be contracting. Does this suggest other recursive but not CSL languages, like planning, in which sometimes you must undo subgoals?

Exercises, Projects, and Discussions[edit | edit source]

Figure FAExample3: A FA transition diagram.

FA Exercise 1: Consider the FA of Figure FAExample3. Rewrite this FA to explicitly show all states and transitions.

FA Exercise 2: Describe the language that the FA of Figure FAExample3 represents.

FA Exercise 3: (adapted from Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 53) Define an FA that accepts strings of 0s and 1s that contain three consecutive 1s.

FA Exercise 4: (due to Hopcroft and Ullman, 1969, p. 44) Define an FA that accepts strings such that every 0 is followed immediately by a 1.

FA Exercise 5: Have a discussion with an AI large language model (LLM) on formalizing and generalizing the discussion under "Equivalence of NFAs and DFAs/Systematic and Judicious Growth". Your goal is to obtain a concise algorithm, or runnable program if you wish, for translating an NFA to a DFA using the strategy that is discussed. You should not accept out of hand whatever description that you get from the LLM, but check it and iterate until you feel confident that you can explain the algorithm and inputs/outputs to the instructor, TA, or other student who is familiar with the area, but needs help with this particular algorithm.

FA Exercise 6: (adapted from Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 54) Construct an NFA that accepts the set of all strings of 0s and 1s such that the 6th symbol from the right end is 1. Hint: Because the 6th from the right end will be read by the NFA BEFORE it reaches the right end, the NFA cannot know that the current '1' is 6th from the right or not. So use the guessing strategy described in "Equivalence of NFAs and DFAs/Retrospection".

FA Exercise 7: (adapted from Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 54, pp. 64-65) Translate the NFA that you develop in FA Exercise 6 to a DFA. Choose an unambiguous way of representing the DFA.

FA Exercise 8: Translate the DFA you construct in FA Exercise 7 to a regular grammar.

FA Exercise 9: (due to Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 54) Give an FA that accepts all strings beginning with a 1 that, when interpreted as binary integers, are multiples of 5. For example 101, 1010, and 1111 are accepted and 0, 100, and 111 are not.

FA Exercise 10: (due to Hopcroft and Ullman, 1979, p. 48; Hopcroft, Motwani, and Ullman, 2007, p. 53) Construct a FA over {0, 1} that accepts all strings such that each block of 5 consecutive symbols contains at least two 0s. Use a sliding window interpretation of the problem, so that there are two blocks of 5, for example, in a 6 symbol string: symbols 1-5 and symbols 2-6.

FA Exercise 11: Continue processing the input of '0112' in Figure NFAProcess2c, and confirm '0112' is accepted by the NFA. Also confirm that '122' is accepted, and that '120' is not accepted.

FA Exercise 12: (a) (due to Hopcroft and Ullman, 1979, p. 48) Give an NFA (which is not also a DFA) for the language described in Figure NFAExample3, except for i ≥ 0, instead of i ≥ 1; (b) Give a DFA for the language of (a).

FA Exercise 13: Give an NFA (that is not also a DFA) that accepts the same language as Figure NFAExample2, but with no ε transitions. Can you write a general procedure for translating an NFA with ε transitions to an NFA without ε transitions, and that is not necessarily a DFA?

RE Exercise 1: There is a pedagogical case, as well as a practical case, to be made for starting this chapter with regular expressions, showing their equivalence to regular grammars, then presenting the generalized version of FAs with transitions labeled by arbitrary REs, showing that these are equivalent to FAs with only primitive REs (i.e., alphabet symbols) labeling the transitions, and continuing on. Carry out a thought experiment in which you redesign the chapter with REs labeling FA transitions. How would NFAs be defined by presenting the generalized FAs first? How might demonstrations of equivalence be affected? How might design and development of FAs be impacted by an encouraged allowance to start with generalized FAs?

RE Exercise 2: Translate Figure DFAtoRE2 by eliminating state C first.

RE Exercise 3: (due to Hopcroft and Ullman, 1979, p. 50; Hopcroft, Motwani, and Ullman, 2007, p. 92) Construct an RE for the set of all strings over {0,1} such that the number of 0s and number of 1s are equal and no prefix has two more 0s than 1s and no prefix has two more 1s than 0s  (i.e., for any prefix the number of 0s and number of 1s differ by no more than one).

RE Exercise 4: Using the textual descriptions and examples as a guide, write an algorithm in pseudocode or a programming language of your choice to translate DFAs to REs. Demonstrate the algorithm on a few test cases. If you use a large language model, insure that you understand the LLM's result, and revise it as necessary. Show the prompts you used.

PDA Exercise 1: Give a pushdown automaton for {w | w is a string in (0+1)* and w consists of an equal number of 0s and 1s}. Hint: you can do this with a deterministic PDA.

PDA Exercise 2: Give a pushdown automaton for {wcwR | c is a character and w is a string in (a+b)*}.

PDA Exercise 3: Give a pushdown automaton for {cwwR | c is a character and w is a string in (a+b)*} that accepts by empty stack. Hint: you can revise the PDA defined by Figures PDAExample1a and PDAExample1b if you wish.

PDA Exercise 4: Construct a PDA for palindromes over {a, b}.

PDA Exercise 5: Just as we gave an FA explanation of the Pumping Lemma for RLs, give a PDA explanation of the PL for CFLs.

TM Exercise 1: Define a formalism for a PDA+ that has two stacks instead of only one stack. Show that the two-stack PDA+ is equivalent in computational power to a TM.

Properties of Language Classes[edit | edit source]

A comprehensive picture of the hierarchy of language classes is shown in Figure LanguageHierarchy, along with the briefest reference to concepts and factoids that have been covered. It is intended as a reference that can be reviewed quickly to good effect, possibly before an exam, or perhaps years later to confirm a fleeting memory.

Figure LanguageHierarchy: The languages classes studied thus far, but with a few references on some of the topics to come, most notably algorithm complexity and more on undecidability.

Kinds of Properties[edit | edit source]

A property of a given language is a statement or predicate that is true of the language. Suppose the language is the set of prime numbers. One property is that there is an algorithm for recognizing any input as a prime number or not. That is, the language of prime numbers is recursive. We've spent a lot of time on such properties -- that a language is in a class of languages, or that a particular language includes a particular string or not. We've covered other properties of specific languages too, perhaps only briefly, such as the property that a given language is inherently ambiguous or not.

A property of a language class is a statement or predicate that is true of all members of the class. We have not spent much time thus far on properties of language classes.

Closure Properties[edit | edit source]

To preview one example, let's consider the class of CFLs. ∀L (L ∈ CFLs → P(L)), where L is a language and P is a property. A property of the CFLs is that if L is a CFL then L* is a CFL. This is an example of a closure property -- that the CFLs are closed under Kleene closure, aka repetition. As another example of a closure property, again of the CFL class, ∀L1,L2 ((L1 ∈ CFLs ∧ L2 ∈ CFLs) → R(L1, L2)), where L1 and L2 are languages and R is a property, such as union: if L1 and L2 are CFLs then L1 ∪ L2 is a CFL; the CFLs are closed under union.

Figure ClosurePropertiesSummary: Closure properties of language classes. A check in a cell indicates that the language class in the corresponding column is closed under the operation of the corresponding row. No check in a cell indicates that the language class is not closed under the operation.

But if we say that a property does not hold for a language class, such as the CFLs, this means that the property is not true of all CFLs, or equivalently, there exists a CFL for which the property does not hold. That is, for CFLs, ¬(∀L (L ∈ CFLs → P(L))) ≡ ∃L ¬(L ∈ CFLs → P(L)) ≡ ∃L (L ∈ CFLs ∧ ¬P(L)), where L is a language and P is a property. For example, if L is a CFL, then L's complement, ~L, is not necessarily a CFL; the CFLs are not closed under complement, or put another way, the class of CFLs does not possess the closed-under-complement property. It's important to recognize that this is a statement about a class of languages, the CFLs, not a statement about a particular language. Thus, it's not inconsistent to say that the class of CFLs does not possess the closed-under-complement property, but that the complement of a particular CFL is also a CFL.

We have talked extensively about properties of various language classes already -- the type of grammar that generates languages in the class, the kind of automata that recognize them, and other representations and patterns of languages in the class (e.g., as stated in the Pumping Lemma). These are all examples of definitional properties of selected language classes. A new focus in this chapter is closure properties (or not) of language classes. This text focuses on six closure properties, but there are many other closure properties that are addressed in other texts. A preview of our coverage is shown in Figure ClosurePropertiesSummary.

Decision Properties[edit | edit source]

In addition to closure properties, we will also discuss selected decision properties in this chapter. A decision property corresponds to a yes/no question about a language class that is decidable in all cases. We have already covered in considerable detail, for example, membership decision properties. Every language class we have studied, except RE, has the decision property that a test of membership in a language of the class is an algorithm. A test of membership is decidable for regular languages, DCFLs, CFLs, CSLs, and recursive languages, and thus each of those classes has the test-of-membership decision property. The RE languages generally, notably to include those languages that are not recursive, do not have the test-of-membership decision property because the question of membership of an arbitrary RE language is undecidable. We've spent so much time on this question already that we don't repeat the analysis in discussing each language class in this chapter, but we do include it in the summary of decision properties in Figure DecisionPropertiesSummary.

Recall that a question is decidable if there exists a TM (or computer program) that, given a requisite number of language specifications (e.g., as grammars or automata) as input, correctly answers the question that the TM was designed to answer. As yet another example of a decision property, in this case of the regular languages, recall we have already sketched an algorithm for answering whether two regular languages, as represented finitely through DFAs, NFAs, REs, or RGs, are the same language. So if we are given an RE that represents a language and an NFA that represents a language, then clearly the languages represented by each are regular, and we can translate each language specification into a DFA, minimize each of the resulting DFAs, and see if the two minimal state DFAs are identical, aside from state names. This process can be implemented as a TM or computer program that correctly answers whether its two input language specifications represent the same language or not. In my example I seemed to suggest that the inputs can be different in form -- one as an RE and one as an NFA -- but for purposes of implementation we could insist on both inputs as REs, or both as NFAs, or both as DFAs, or both as RGs; or we could use TMs as the way we represent all languages that are input to decision questions.

The class of regular languages has the property that the question of equivalence is decidable because we can test for equivalence by using a single correct TM for all (pairs of) regular languages that are inputs to the TM.
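
The final comparison step of that decision procedure (checking whether two minimized DFAs are identical aside from state names) can be sketched as follows, assuming both DFAs have already been minimized (so every state is reachable) and are encoded as dicts with 'delta', 'start', and 'accepting' entries; the encoding is an illustrative assumption.

from collections import deque

def same_language_minimal_dfas(m1, m2, alphabet):
    # Pair up states reachable by the same input strings, starting from the two
    # start states; the DFAs match up to state names iff the pairing is
    # consistent and paired states always agree on acceptance.
    pairing = {m1['start']: m2['start']}
    queue = deque([(m1['start'], m2['start'])])
    while queue:
        p, q = queue.popleft()
        if (p in m1['accepting']) != (q in m2['accepting']):
            return False
        for a in alphabet:
            p2, q2 = m1['delta'][(p, a)], m2['delta'][(q, a)]
            if p2 in pairing:
                if pairing[p2] != q2:
                    return False
            else:
                pairing[p2] = q2
                queue.append((p2, q2))
    return True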

In contrast, if we say that a decision question is undecidable for a class of languages then that means that there does not exist a TM that correctly answers the question (and halts) for all languages in the class. For example, it is undecidable whether two CFLs, as represented finitely by CFGs or PDAs, or TMs, are the same language. But we want to be careful about language here. We've said earlier that a property of a class of languages is a statement that is true of all languages in the class. So rather than saying that the class of CFLs has the property that the question of equivalence is undecidable, which you might see in some sources, we'll say that the class of CFLs does not have the property of decidability on the question of equivalence. This is consistent with the discussion on closure properties as well. It is still the case that particular pairs of CFL specifications can be shown to be equivalent using a TM created to answer the equivalence questions for CFLs, but no TM can be found that is always correct and always halts.

As with closure properties we will only reference a limited number of decision properties, though when we reach the RE languages we will describe Rice's theorem and its astounding conclusion that an infinite number of decision questions are undecidable in the case of the RE languages generally.

Properties of Regular Languages[edit | edit source]

If we want to show that a language is regular it will be possible to construct an FA, regular expression, or regular grammar that demonstrably represents the language, perhaps verified by a proof by induction or contradiction. A construction argument, whether followed by an auxiliary proof or not, often represents a kind of gold standard of demonstration.

Closure Properties of Regular Languages[edit | edit source]

Closure properties can also be used to show a language, L, is regular (or in some other class of languages for that matter). We can do this by applying transformations known to preserve regularity to a language, X, known to be regular (e.g., by construction), until reaching the target language, L, thus demonstrating that X ∈ RL → L ∈ RL. In addition, closure properties can be handy in showing that a language, L, is not regular, by applying transformations known to preserve regularity to L, until a language, X, is derived that is known through some other demonstration to not be regular. Thus, the original language, L, must not have been regular either (i.e., demonstrating X ∉ RL → L ∉ RL).

The regular languages are closed under

  • Complement
  • Union
  • Intersection
  • Concatenation
  • Repetition or Kleene Closure
  • Substitution

It will be interesting to compare the closure properties of the regular languages with those closure properties of more inclusive languages – context free, context sensitive, and unrestricted.

Regular Languages are closed under Complement[edit | edit source]

Theorem: If L is a regular language then the complement of L, ~L, is regular.

Since L is regular, there is a DFA, M, that recognizes L. Construct the DFA for ~L, call it M', by simultaneously changing all accepting states in M to non-accepting states, and changing all non-accepting states in M to accepting states. If M' accepts a string w then w must have taken a path to an accepting state of M', which was a non-accepting state in M, so w is not in L. If M' does not accept a string w then w must have taken a path to a non-accepting state of M', which was an accepting state in M, so w is in L.
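
A minimal sketch of this construction in Python follows, assuming a DFA is represented by its state set, alphabet, a total transition dictionary, a start state, and a set of accepting states; the representation, names, and example are illustrative assumptions rather than conventions of this text.

def complement_dfa(states, alphabet, delta, start, accepting):
    # Flip accepting and non-accepting states; states, alphabet, transitions,
    # and the start state are unchanged.
    return states, alphabet, delta, start, set(states) - set(accepting)

# Example: a DFA over {'0'} accepting an even number of 0s; its complement
# accepts an odd number of 0s.
states = {'even', 'odd'}
delta = {('even', '0'): 'odd', ('odd', '0'): 'even'}
m_complement = complement_dfa(states, {'0'}, delta, 'even', {'even'})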

Regular Languages are closed under Union[edit | edit source]

Theorem: If L1 and L2 are both regular languages, then L1 ∪ L2 is a regular language.

Since L1 and L2 are each regular languages, there are DFAs M1 and M2, respectively, that recognize each. Construct an NFA with ε-transitions, M12, for L1 ∪ L2 as follows.

  • Create copies of M1 and M2, call the copies M1' and M2'.
  • Create a start state for M12, and add ε-transitions from that start state to each of the start states of M1' and M2'.
  • Create a single accepting state for M12, and add an ε-transition from each of the accepting states of M1' and of M2' to the accepting state of M12.
  • Change the accepting states of M1' and M2' to non-accepting states.

M12 is an NFA that recognizes L1 ∪ L2 and thus L1 ∪ L2 is a regular language.
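
Under the same illustrative DFA representation as in the sketch above (with disjoint state names), the following Python sketch builds the ε-NFA just described; the function name, the 'eps' marker for ε-transitions, and the new state names are assumptions for illustration only.

def union_nfa(m1, m2):
    (q1, d1, s1, f1), (q2, d2, s2, f2) = m1, m2        # (states, delta, start, accepting)
    states = q1 | q2 | {'new_start', 'new_accept'}
    delta = {}                                         # NFA: (state, symbol) -> set of states
    for (q, a), r in {**d1, **d2}.items():
        delta.setdefault((q, a), set()).add(r)
    delta[('new_start', 'eps')] = {s1, s2}             # branch to each copy's start state
    for q in f1 | f2:                                  # old accepting states feed the new one...
        delta.setdefault((q, 'eps'), set()).add('new_accept')
    return states, delta, 'new_start', {'new_accept'}  # ...and lose their accepting status

# Example: 0s divisible by 2, unioned with 0s divisible by 3.
m_even = ({'e0', 'e1'}, {('e0', '0'): 'e1', ('e1', '0'): 'e0'}, 'e0', {'e0'})
m_by3 = ({'t0', 't1', 't2'}, {('t0', '0'): 't1', ('t1', '0'): 't2', ('t2', '0'): 't0'}, 't0', {'t0'})
nfa = union_nfa(m_even, m_by3)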

Regular Languages are closed under Intersection[edit | edit source]

Figure DFAofIntersection: The language {0^m | m is evenly divisible by 2 and evenly divisible by 3} is regular.

Theorem: If L1 and L2 are both regular languages, then L1 ∩ L2 is a regular language.

Suppose L1 and L2 are regular languages over alphabet Σ. Then L1 ∩ L2 is regular. This must be so since L1 ∩ L2 = ~(~L1 ∪ ~L2) and the regular languages are closed under complement and union. But we can also directly construct a DFA that accepts L1 ∩ L2 from the DFAs that must exist for L1 (M1 = (Q1, Σ, δ1, q1, F1)) and L2 (M2 = (Q2, Σ, δ2, q2, F2)), respectively. The following is adapted from (pp. 59-60, [2]; p. 137, [3]).

Define M1∩2 = (Q1×Q2, Σ, δ, [q1, q2], F1×F2), which accepts L1 ∩ L2. For each pair of states from M1 and M2 respectively, define a state in M1∩2. For each of these pairs of states, on each symbol a in Σ, define a transition in M1∩2:

                                                                          δ([q1i, q2k], a) = [δ1(q1i, a), δ2(q2k, a)]

As an example, consider {0^m | m is evenly divisible by 2 and evenly divisible by 3}. It's easy to build a DFA that accepts an even number of 0s, and it's easy to build a DFA that accepts an integer multiple of 3 number of 0s. A DFA for their intersection is shown in Figure DFAofIntersection, and thus the intersection of the two regular languages is regular.
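
A Python sketch of the product construction follows, again assuming total DFA transition functions given as dictionaries; the function name and representation are illustrative assumptions. The example rebuilds the DFA of Figure DFAofIntersection from its two component DFAs.

from itertools import product

def intersect_dfas(q1, q2, alphabet, d1, d2, s1, s2, f1, f2):
    states = set(product(q1, q2))                      # one state per pair of component states
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])     # run both DFAs in lockstep
             for (p, q) in states for a in alphabet}
    return states, delta, (s1, s2), set(product(f1, f2))

# 0s divisible by 2 (2 states) intersected with 0s divisible by 3 (3 states):
d_mod2 = {(i, '0'): (i + 1) % 2 for i in range(2)}
d_mod3 = {(i, '0'): (i + 1) % 3 for i in range(3)}
M = intersect_dfas({0, 1}, {0, 1, 2}, {'0'}, d_mod2, d_mod3, 0, 0, {0}, {0})
# M has 6 states and accepts 0^m exactly when m is divisible by 6.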

Regular Languages are closed under Concatenation[edit | edit source]

Theorem: If L1 and L2 are both regular languages, then L1L2 is a regular language.

The concatenation of languages L1 and L2, written L1L2, is {w1w2 | w1 ∈ L1 and w2 ∈ L2}. That is, any word from L1 followed immediately by any word of L2, is a string in the language of the concatenation.

Since L1 and L2 are each regular languages, there are DFAs M1 and M2, respectively, that recognize each. Construct an NFA M12 with ε-transitions by

  • copying M1 and M2, calling the copies M1' and M2',
  • make the start state of M12 be the start state of M1',
  • make the accepting states of M12 be the accepting states of M2',
  • connect M1' and M2' by adding an ε-transition from each accepting state of M1' to the start state of M2', and
  • change every accepting state in M1' to be a non-accepting state.

M12 is an NFA that recognizes L1L2 and thus L1L2 is a regular language.

As a matter of interest and subsequent utility, we can extend concatenation to a sequence of more than two languages, so that L1L2...Lm = {w1w2...wm | w1 ∈ L1 and w2 ∈ L2 and ... and wm ∈ Lm}.

Regular Languages are closed under Kleene Closure[edit | edit source]

Theorem: If L is a regular language then L* is a regular language.

We introduced Kleene closure in defining regular expressions, and also called it the repetition operator. The Kleene closure of a language L, L*, equals {w1w2w3...wm | m ≥ 0 and each wk ∈ L}. That is, strings in L* are each a concatenation of an indefinite number of strings from L.

If L is a regular language then there is a DFA, M, that recognizes it. To construct an NFA with ε-transitions that recognizes L*,

  • copy M as M',
  • add an ε-transition from each accepting state of M' to the existing start state of M',
  • add a new start state to M' with an ε-transition to the original start state of M', and
  • make the new start state of M' an accepting state as well (so that the NFA accepts the empty string).

The resulting NFA accepts L* and so L* is a regular language.

Regular Languages are closed under Substitution[edit | edit source]

Substitution is the most complicated of the operations that we will consider. Suppose L is a language, regular or not, over alphabet Σ. Suppose further that for each symbol in Σ, xk, we associate a language Lxk. We will speak of the substitution operation, Subst, as being applied to each symbol of Σ, to each string of L, and to L itself. If xk is a symbol in Σ then Subst(xk) is the set of strings in Lxk, i.e., simply Lxk itself. If w = a1a2...am is a string in L, then Subst(w) = Subst(a1)Subst(a2)...Subst(am), that is, Subst(w) is the concatenation of the languages (see above) associated with the various symbols in w. Finally, Subst(L) = {Subst(w) | w ∈ L}, that is, Subst(L) is the set of strings resulting from the substitution applied to all strings in L. Importantly, the alphabets for the various languages, L and the Lxk's, don't have to be the same. The alphabet for the language Subst(L) is the union of the alphabets of all the Lxk languages, and that alphabet may or may not share any symbols with the alphabet of L.

If L is the set of strings of 0s and 1s with at least two consecutive 0s, and L0 = {w | w matches (c+d)+} and L1 = {ε, a, ba}, then Subst(0) = L0 = {w | w matches (c+d)+} and Subst(1) = L1 = {ε, a, ba}. Subst(100) = {w | w in Subst(1)Subst(0)Subst(0)} = {cc, cd, dc, dd, acc, acd, adc, add, bacc, bacd, ..., ccc, ...}. That is, Subst(100) is all possible ways of drawing a string from L1, i.e., {ε, a, ba}, followed by two draws from L0 = {w | w matches (c+d)+}. Subst(L) is the set of strings over an alphabet of {a, b, c, d} where each b must be followed immediately by an 'a' (why?), and there must be at least one consecutive pair of c's and/or d's (why?).

Theorem: If L is a regular language over an alphabet Σ and for each symbol, xk in Σ, there is an associated regular language Lxk, then Subst(L) is a regular language.

If L and all Lxk's are regular languages then there are DFAs that recognize L and each of the Lxk's. Call these DFAs M (for L) and Mxk (for each Lxk). To construct an NFA with ε-transitions for Subst(L), do as follows.

  • Make a copy of M called M'
  • For each transition in M' from state qi to qj on input symbol xk, splice in the DFA for Lxk (replacing the original transition on xk), by
    • make a copy of Mxk, call it Mxk'
    • add an ε-transition from qi to the start state of Mxk'
    • add an ε-transition from each accepting state of Mxk' to qj.

The resulting NFA recognizes Subst(L) and thus Subst(L) is a regular language.

Decision Properties of Regular Languages[edit | edit source]

The decision properties of the regular languages include

  • equivalence test of languages
  • test for emptiness of a language
  • test for all strings over an alphabet

Equivalence test of two regular languages is decidable[edit | edit source]

Theorem: If L1 is a regular language and L2 is a regular language then there is an algorithm that determines whether L1 and L2 are the same language.

We described the proof sketch above under Kinds_of_properties/Decision_properties and do not repeat that here.

Test of whether a regular language is empty is decidable[edit | edit source]

Theorem: If L is a regular language then there is an algorithm that determines whether L is the empty language ({ } or ∅).

There is a unique minimal-state DFA over an alphabet, Σ, that accepts the empty language -- it is a DFA with only one state, which is a non-accepting state, and all transitions for all alphabet members loop back to that one state. Given a finite representation of a regular language, use the same process as described earlier of translating the input to a DFA, minimize the DFA, and see if it's identical to the single-state DFA just described. If the DFAs are identical then L is empty, else it's not empty.
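
A small Python sketch of an emptiness test follows. Rather than minimizing a DFA and comparing it to the one-state DFA described above, it uses an equivalent test: L is non-empty exactly when some accepting state is reachable from the start state. The DFA representation is an illustrative assumption.

def is_empty(delta, start, accepting):
    # Explore the states reachable from the start state.
    frontier, seen = [start], {start}
    while frontier:
        q = frontier.pop()
        if q in accepting:
            return False                       # some string leads to acceptance
        for (p, a), r in delta.items():
            if p == q and r not in seen:
                seen.add(r)
                frontier.append(r)
    return True                                # no accepting state is reachable

# Example: a one-state DFA with no accepting states accepts the empty language.
print(is_empty({('q0', '0'): 'q0'}, 'q0', set()))   # True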

Test of whether a regular language is Σ* is decidable[edit | edit source]

Theorem: If L is a regular language then there is an algorithm that determines whether L is Σ*.

Do very much as described immediately above with one small change.

There is a unique minimal-state DFA over an alphabet, Σ, that accepts Σ* -- it is a DFA with only one state, which is an accepting state, and all transitions for all alphabet members loop back to that one state. Given a finite representation of a regular language, translate the input to a DFA, minimize the DFA, and see if it's identical to the single-state DFA just described. If the DFAs are identical then L is Σ*, else it's not.

Properties of Deterministic Context Free Languages[edit | edit source]

Closure Properties of DCFLs[edit | edit source]

Decision Properties of DCFLs[edit | edit source]

Properties of Context Free Languages[edit | edit source]

Closure Properties of CFLs[edit | edit source]

The CFLs are closed under

  • Union
  • Concatenation
  • Kleene Closure
  • Substitution

In contrast to the regular languages, the CFLs are not closed under complementation and the CFLs are not closed under intersection.

CFLs are closed under Union[edit | edit source]

Theorem: If L1 and L2 are both CFLs, then L1 ∪ L2 is a CFL.

If L1 and L2 are CFLs then each has an associated CFG, GL1 and GL2, that generates the respective languages. Assume that the names of the variables in the two CFGs are disjoint (so no possibility of confusion). To construct a CFG for L1 ∪ L2, call it GL1∪L2, create a new start symbol, SL1∪L2, with two associated productions, one to the start symbol of GL1 and one to the start symbol of GL2 (i.e., add SL1∪L2 → SL1 | SL2). GL1∪L2 is a CFG that generates L1 ∪ L2, so L1 ∪ L2 is a CFL.
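
A minimal Python sketch of this grammar construction follows, assuming a CFG is represented as a dictionary mapping each variable to a list of right-hand sides (each a list of symbols) and that variable names are disjoint; the new start symbol name S_union and the example grammars are illustrative assumptions.

def union_cfg(g1, start1, g2, start2):
    g = {**g1, **g2}                        # variable names assumed disjoint
    g['S_union'] = [[start1], [start2]]     # S_union -> S_L1 | S_L2
    return g, 'S_union'

# Example: G1 generates {a^n b^n}, G2 generates {c^n d^n}.
g1 = {'S1': [['a', 'S1', 'b'], []]}         # S1 -> a S1 b | epsilon
g2 = {'S2': [['c', 'S2', 'd'], []]}         # S2 -> c S2 d | epsilon
g_union, start = union_cfg(g1, 'S1', g2, 'S2')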

CFLs are closed under Concatenation[edit | edit source]

Theorem: If L1 and L2 are both CFLs, then L1L2 is a CFL.

If L1 and L2 are CFLs then each has an associated CFG, GL1 and GL2, that generates the respective languages. Assume that the names of the variables in the two CFGs are disjoint (so no possibility of confusion). Assume the name of the start symbol for GL1 is SL1 and that the name of the start symbol for GL2 is SL2. To create a CFG for L1L2, create a start symbol SL1L2 with a single production to SL1SL2 (i.e., add SL1L2 → SL1SL2). GL1L2 is a CFG that generates L1L2, so L1L2 is a CFL.

CFLs are closed under Kleene Closure[edit | edit source]

Theorem: If L is a CFL, then L* is a CFL.

We've previously used the Kleene closure or repetition operator only in reference to regular languages, but the operation applies to languages of any class. L* is the set of strings w that are composed of 0 or more concatenated substrings, wi, where each wi is a string in L.

If L is a CFL then there is a CFG, GL, that generates L, with a start symbol that we'll call SL. To create a CFG for L*, create a new start symbol, call it SL*, with two productions, one to ε and one to SLSL* (i.e., add SL* → ε | SLSL*). GL* is a CFG that generates L*, so L* is a CFL.

CFLs are closed under Substitution[edit | edit source]

Theorem: If L is a CFL over an alphabet Σ and for each symbol, xk in Σ, there is an associated CFL Lxk, then Subst(L) is a CFL.

If L is a CFL then there is a CFG that generates L, call it GL. If all Lxk's are CFLs then there are CFGs that generate each, call them GLxk respectively. To create a CFG that generates Subst(L), GS(L), replace every instance of an alphabet/terminal symbol in the productions of GL with the start symbol for the grammar of the language corresponding to that symbol, and make the productions of GS(L) be the union of all the revised productions of GL and all the productions of all the GLxk's. GS(L) is a CFG that generates Subst(L), so Subst(L) is a CFL.

CFLs are not closed under complementation[edit | edit source]

Intuitively, you might think that given a PDA for a language, we can just invert the accepting and non-accepting states as we did with FAs, but the added complexity of the stack makes this insufficient for showing the CFLs are closed under complement.

To say that a language class is not closed under an operation is to say that there exists at least one language (for a unary operation such as complementation), or at least two languages (for a binary operation such as intersection), for which the operation does not yield a language of the specified class. So, finding such a counterexample is sufficient for showing that the CFLs are not closed under complementation, for example. But finding such a counterexample can be nontrivial. How do we show that the complement of a particular CFL is not a CFL? We would have to show that there is no CFG or PDA for it.

Consider the language L = {a^i b^j c^k | i,j,k ≥ 0 and (i ≠ j or j ≠ k)}. CFG Exercise 2 in chapter "Context Free (Type 2) Grammars and Languages" asked you to give a CFG for this language, thus demonstrating that it is a CFL. The complement of L, intersected with the regular language a*b*c*, is {a^n b^n c^n | n ≥ 0}, which is not a CFL because ...

Decision Properties of CFLs[edit | edit source]

Properties of Context Sensitive Languages[edit | edit source]

Closure Properties of CSLs[edit | edit source]

The CSLs are closed under

  • Complement
  • Union
  • Intersection
  • Concatenation
  • Kleene Closure

However, the CSLs are not closed under substitution.

The CSLs are closed under Complement[edit | edit source]

If L is a CSL then its complement, Lc, is a CSL. If L is a CSL then there is a linear bounded TM, ML, that recognizes L, and that halts on all inputs. A linear bounded TM for Lc can be created that calls ML as a subroutine and that accepts if ML rejects, and that rejects if ML accepts.

The CSLs are closed under Union[edit | edit source]

If L1 and L2 are CSLs then L1 ∪ L2 is a CSL. If L1 and L2 are CSLs then there are CSGs, GL1 and GL2, that generate L1 and L2 respectively. To construct a CSG to generate L1 ∪ L2, copy all the productions of GL1 and GL2 with variables renamed as necessary so as to not confuse variables from different grammars, and create a start symbol SL1∪L2 for the new grammar, GL1∪L2, with two productions SL1∪L2 → SL1 and SL1∪L2 → SL2. This new grammar is a CSG since the new productions are all noncontracting, and it generates L1 ∪ L2.

The CSLs are closed under Intersection[edit | edit source]

If L1 and L2 are CSLs then L1 ∩ L2 is a CSL.

The CSLs are closed under Concatenation[edit | edit source]

If L1 and L2 are CSLs then L1L2 is a CSL.

The CSLs are closed under Kleene Closure[edit | edit source]

If L is a CSL then L* is a CSL. We can say instead that L+ = L* - {ε} is a CSL if it is important to exclude ε.

The CSLs are not closed under Substitution[edit | edit source]

A table from the first edition of Hopcroft and Ullman on closure properties of languages in the Chomsky hierarchy shows that the CSLs are not closed under substitution, but the table from the second edition shows that the CSLs ARE closed under substitution. The discrepancy hinges on whether the substituted languages may contain ε: the CSLs are closed under ε-free substitution, but a substitution that introduces ε can take us outside the class.

Decision Properties of CSLs[edit | edit source]

Properties of Recursive Languages[edit | edit source]

As I have already noted under section "Turing Machines and Language Classes/Recursive languages", the definitional characteristic of the class of recursive languages is that there is a TM that recognizes the language and that is guaranteed to halt in both accept and reject cases. Because of this guarantee, recognition of a recursive language is said to be decidable. We also say that the recursive languages are recognized by an algorithm. Unlike all the other language classes discussed so far, there is no class of grammar that precisely delimits the recursive languages, though of course there are grammars that generate a subset of the recursive languages, notably CSLs.

Closure Properties of Recursive Languages[edit | edit source]

The recursive languages are closed under

  • Complement
  • Union
  • Intersection
  • Concatenation
  • Repetition or Kleene Closure

However, the recursive languages are not closed under substitution.

Since there is no equivalent class of grammars for the recursive languages, all our arguments about closure (or not) of recursive languages will be based on TMs.

The Recursive languages are closed under Complement[edit | edit source]

If L is a recursive language, then there is a TM, call it ML, that recognizes L and that is guaranteed to halt in all accept and all reject cases. Create a new TM, ML', that calls ML as a subroutine. If ML returns accept for an input string, then ML' returns reject. If ML returns reject for an input string, then ML' returns accept. ML' accepts the complement of L, and always halts, so the complement of L is recursive.

The Recursive languages are closed under Union[edit | edit source]

If L1 is a recursive language and L2 is a recursive language, then L1 ∪ L2 is recursive. Let ML1 and ML2 be the TMs recognizing L1 and L2 respectively, each guaranteed to halt. From ML1 and ML2 we construct an always halting TM, ML1∪L2, for L1 ∪ L2. Since TMs and computers are equivalent in terms of what can be computed, we represent ML1∪L2 in programming language pseudocode with the understanding that this can be translated to a TM.

<ML1∪L2, w> = BOOLEAN FUNCTION Union (ML1, ML2, w) { IF ML1(w) == accept OR ML2(w) == accept THEN RETURN accept ELSE RETURN reject }

Each of the component TMs for L1 and L2 is being used as a subroutine. The construction of ML1∪L2's behavior covers all conditions and is guaranteed to halt.

The Recursive languages are closed under Intersection[edit | edit source]

If L1 is a recursive language and L2 is a recursive language, then L1 ∩ L2 is recursive. Let ML1 and ML2 be the TMs recognizing L1 and L2 respectively, each guaranteed to halt.

<ML1∩L2, w> = BOOLEAN FUNCTION Intersection (ML1, ML2, w) { IF ML1(w) == accept AND ML2(w) == accept THEN RETURN accept ELSE RETURN reject }

The Recursive languages are closed under Concatenation[edit | edit source]

If L1 is a recursive language and L2 is a recursive language, then L1L2 is recursive. Let ML1 and ML2 be the TMs recognizing L1 and L2 respectively, each guaranteed to halt. At a high level,

<ML1L2, w> = BOOLEAN FUNCTION Concatenation (ML1, ML2, w) {

    FOR each of the |w|+1 ways to partition w into xy

        IF ML1(x) == accept AND ML2(y) == accept THEN RETURN accept

    RETURN reject

}

The Recursive languages are closed under Kleene Closure[edit | edit source]

If L is a recursive language then L* is recursive. Let ML be the TM recognizing L, guaranteed to halt.

<ML*, w> = BOOLEAN FUNCTION Repetition (ML, w) {

    IF w == ε THEN RETURN accept    // the empty string is always in L*

    FOR each of the 2^(|w|-1) ways to partition w into x1x2…xk

        IF ML(xi) == accept for all xi in x1x2…xk THEN RETURN accept // can implement as a loop

    RETURN reject

}

The Recursive Languages are not closed under Substitution[edit | edit source]

If you are given a recursive language R, and each symbol in R’s alphabet corresponds to a recursive language too, then the language that results from applying the substitution to R is not necessarily recursive. See the end-of-chapter exercises below for more.

Decision Properties of Recursive languages[edit | edit source]

Runtime Complexity[edit | edit source]

Many students who are taking a class on formal languages, automata, and computation will have already studied algorithm efficiency, perhaps in the form of big-O notation. Runtime efficiency is typically regarded as a characteristic of recursive languages (i.e., membership algorithms defined by always-halting TMs or computer programs), which is why we address efficiency in this chapter on properties of the recursive languages. It would make less sense to talk about efficiency in the case of undecidable problems, but a project asks you to investigate the relevance of efficiency to undecidable problems.

Big-O notation is used to indicate an upper bound on runtime cost, or space requirements or some other resource, but we will focus on time initially. Precisely, if n is a measure of the size of the input to a procedure, and I say that the procedure is O(f(n)), then that means there exist constants c and t such that the procedure’s actual runtime g(n) ≤ c*f(n) for all n ≥ t; t tells us what value of n is “big enough”. In English, this says that the actual runtime of a procedure, g(n), never exceeds some constant c times f(n) for big enough n.

Where do the values of constants c and t come from? For practical purposes they could come from experiments with a procedure on different size n, or from analysis, but for theoretical purposes (at least to many theoreticians much of the time) we don’t care. It's enough to know that these constants exist, and that they depend on the algorithm’s implementation details (i.e., what is the language of implementation, the hardware the procedure is run on, etc.), and this is precisely why we don’t care about their particular values from a theoretical standpoint. And so we typically don’t fret about the constants and simply say the procedure is O(f(n)), where f(n) is typically a member of some general and simply-stated class of functions, like n^0 or 1 (constant), log n (logarithmic), n (linear), n log n, n^2 (quadratic), n^3, 2^n (exponential), and n! (combinatoric). I have listed these simply-stated function classes in order of increasing complexity or growth rate. That is, n log n will always be less than n^2 for sufficiently large n, for example.

If I say that an algorithm runs in O(n^3) time, then that means the actual run time, g(n), of the procedure will be bounded above for big enough n: g(n) ≤ c*n^3, for n ≥ t. Notice that if an algorithm is O(n), for example, then it is also O(n^2) and O(n^3) and … and O(n!). Convince yourself that this is true given the formal definition of big-O notation. The problem, however, with saying that an algorithm’s runtime is O(n^3) when it is also O(n^2) is that the former is misleading, because n^3 is not as "tight" an upper bound as is possible. This concern with tightly characterizing the run time complexity of an algorithm is one reason that we also like to characterize algorithms by lower bounds. Ω(f(n)) means that for big enough values of n, the algorithm’s actual runtime g(n) ≥ c*f(n), for n ≥ t. The constants for a big-Omega characterization of an algorithm may be different than the constants for a big-O characterization of the same algorithm, but again, we don’t typically care about the constants. If an algorithm can be characterized by both O(f(n)) and Ω(f(n)) (i.e., upper and lower bounds for the same f(n)) then we consider f(n) a tight characterization of the algorithm’s runtime, and we signify this with big-Theta notation: Θ(f(n)) means c1*f(n) ≥ g(n) ≥ c2*f(n) for n ≥ t (with t = max(t1, t2)).
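
As a small illustrative example (not drawn from the text's figures), suppose a procedure's actual runtime is g(n) = 3n^2 + 5n + 2. Taking c = 4 and t = 6 witnesses that g(n) is O(n^2): for all n ≥ 6 we have 5n + 2 ≤ n^2, so g(n) ≤ 3n^2 + n^2 = 4n^2. Taking c = 3 and t = 1 witnesses that g(n) is Ω(n^2), since g(n) ≥ 3n^2 for all n ≥ 1. With matching upper and lower bounds, g(n) is Θ(n^2).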

It would be easy to dive still deeper into complexity theory, but under the assumption that you have or will look at this more deeply in an advanced course on algorithm analysis, I’ll sum up with three miscellaneous points. First, it's common to say that if an algorithm is characterized by O(f(n)) behavior (or Omega or Theta), then the actual runtime, g(n) = O(f(n)). This may seem like an odd use of the equality symbol and it is indeed an odd convention (if I had my druthers I might have overloaded the membership symbol instead, so g(n) ∈ O(f(n))). Second, when using any of the notations it is appropriate to say “in the worst case” or “the best case” or the “average case”, though you might think that big-O naturally corresponds to the worst case, big-Omega to the best, and big-Theta to the average, and you wouldn’t be wrong in some sense. But imagine cases, in say AI, where there is a partial ordering on run times, and in one subset of instances performance ranges from best to worst for that subset, while a different subset has different values for best, average, and worst. In any case, you’ll hear and read that worst case for insertion sort is O(n^2) and best case is O(n), as but one example where O-notation serves double duty.

Third, in theory settings, particularly in introductory texts like this one, worst case performance is taken to be of most importance – just how bad can this algorithm be!?! Worst case performance that is greater than polynomial (i.e., with growth rate greater than n^p for any positive value of p), notably O(2^n) and O(n!), will determine whether a problem is "intractable".

In short, using big-O (or big-Omega or big-Theta) there is a hierarchy of language (problem) classes: O(1) ⊂ O(log n) ⊂ O(n) ⊂ O(n log n) ⊂ O(n^2) ⊂ O(n^3) ⊂ O(2^n) ⊂ O(n!). There are many other language classes defined in terms of runtime of course.

Intractability[edit | edit source]

A dictionary definition of an "intractable problem" is one that is extremely hard or impossible to solve. Problems that are in RE but not recursive are undecidable, and are intractable in a strong sense. Problems that don’t have solutions that are representable by Turing machines at all (i.e., non-RE) are intractable in the strongest sense.

But what we will usually mean by intractability are problems for which the only known solution procedures are greater than polynomial in time and/or space as a function of input size (e.g., exponential, combinatorial).

P and NP[edit | edit source]

P and NP are two broad classes of languages (problems). Any deterministic algorithm that runs in O(n^p) time, where p is any real number constant, belongs to class P(olynomial). Remember that O(n^p) will include other, smaller growth rate functions, like n log n, as well. An example of a problem in P is finding the minimum weight spanning tree (MWST) of a weighted graph. Kruskal’s algorithm is a polynomial time algorithm that finds the MWST.

Any algorithm that cannot be characterized by polynomial deterministic runtime is not in P.

Figure NPprocedure: A non-deterministic polynomial algorithm is one that pursues multiple paths (of instantaneous descriptions) “simultaneously” towards a solution (brute-force, breadth-first), where each path length is a polynomial function of input size.

NP, standing for Nondeterministic Polynomial, is the class of algorithms which, when run on a nondeterministic TM that is capable of simulating all possible solution paths in parallel, run in time that is polynomial as a function of the input size. Of course, no real machine can simulate all paths in parallel, so a more practical expression of the NP class is as those algorithms whose individual solution paths are each a polynomial function of the input size in length. NP problems have a hard-to-solve, "easy"-to-validate character. Finding a solution may require exploring a large number of paths, perhaps O(2^n) or O(n!) paths, but each path is polynomial in length, and thus a given solution can be validated in polynomial time, as illustrated in Figure NPprocedure.

An example of an NP problem is the Traveling Salesperson Problem (TSP) on a weighted graph. That is, expressed as a language, the TSP asks whether there is a cycle through all nodes in a weighted graph with a total weight that is at most a specified maximum. If so, the encoded graph is a member of the language of graphs for which there is an acceptable tour, otherwise it is not. It is standard that a satisfying tour accompany a 'yes' answer. There are O(n!) candidate tours (hard to find a satisfying one), where n is the number of vertices, and each tour is O(n) in length (easy to validate).

Every problem in P is also in NP, so P ⊆ NP. What is strongly suspected, but not yet proved, is that P is a proper subset of NP, P ⊂ NP. Despite suspicions to the contrary, you will hear that demonstrating that P = NP (or P ≠ NP) is one of the biggest open problems in computational theory. In light of all this, we will call a problem intractable if it is in NP, and not known to be in P.

The Satisfiability Problem is NP[edit | edit source]

The TSP is one example of an NP problem that is not known to be in P. A second, and especially important example of an NP problem, which is not known to be in P, is satisfiability, or SAT for short.

The SAT problem is

  • Given: a Boolean expression over n variables, say in conjunctive normal form
  • Find: an assignment of truth values to all n variables such that the entire expression is true.

For example, say we are given (¬x0 ⋁ x1 ⋁ x2) ⋀ (x0 ⋁ ¬x1 ⋁ x2) ⋀ (¬x0 ⋁ ¬x1 ⋁ ¬x2) ⋀ (¬x0 ⋁ ¬x1 ⋁ x3).

Since each clause has three literals, this is an instance of 3-SAT; and since there are 4 variables, x0 through x3, there are 2^4 combinations of assignments to the four variables. An assignment that renders the entire expression true is x0 = false, x1 = true, x2 = true, x3 = false: (¬false ⋁ true ⋁ true) ⋀ (false ⋁ ¬true ⋁ true) ⋀ (¬false ⋁ ¬true ⋁ ¬true) ⋀ (¬false ⋁ ¬true ⋁ false) = (true ⋁ true ⋁ true) ⋀ (false ⋁ false ⋁ true) ⋀ (true ⋁ false ⋁ false) ⋀ (true ⋁ false ⋁ false). Since there is at least one assignment that renders the entire expression true, the expression is satisfiable. If all assignments led to a true expression, then the expression would be a tautology, but that need not be the case for the expression to be satisfiable.

Figure SATisNP: In the worst case the algorithm at top may need to iterate through the FOR loop 2^n times.

Consider this 2-SAT problem: (¬x0 ⋁ ¬x1) ⋀ (x0 ⋁ ¬x1) ⋀ (¬x0 ⋁ x1) ⋀ (x0 ⋁ x1). There are 2^2 assignments. Confirm that none of the four assignments results in the entire expression being true. This expression is unsatisfiable (aka the expression is a contradiction).

Figure SAThillclimbing: Max_restarts and Max_tweeks are set by the user. "Tweeks" are small changes to the truth assignment, such as flipping the truth value of one variable (e.g., flip the value of xk from true to false). When this algorithm answers 'no' it will only be correct with some probability, since the procedure will typically stop short of examining all possible assignments. Nonetheless, enough assignments will typically have been examined to ensure a high probability of correctness in the 'no' case (and certainty of correctness in the 'yes' case).

SAT is NP because there are an exponential number of possible solutions, 2^n, for n variables, but each solution is size n. In the worst case, an algorithm like that shown in Figure SATisNP might have to examine all the assignments, but in point of fact, it's not generally that bad. Even for problems for which a solution is nondeterministic polynomial, and there is no known polynomial time solution, it's very often the case that answers can be found quickly. The SATisfiability problem is NP, but a very simple procedure that works well in practice is hill-climbing in search of a value assignment that makes an entire Boolean expression true. In this approach, a solution is guessed randomly, checked for satisfaction of the expression, and revised or simply guessed again. Details of a procedure are shown in Figure SAThillclimbing. Hill climbing, by the way, is a popular greedy search procedure used in many experimentally oriented fields, like AI; such procedures are often fast and yield satisfactory results. In general, hill-climbing is one of many ways of effectively dealing with intractability in practice.
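
Below is a minimal Python sketch in the spirit of Figure SAThillclimbing, with clauses represented as lists of (variable index, polarity) pairs. The parameters play the roles of Max_restarts and Max_tweeks, but the particular tweak rule used here (flip the single variable whose flip satisfies the most clauses), the representation, and the names are illustrative assumptions, not the figure's exact procedure.

import random

def num_satisfied(clauses, assignment):
    # A clause is satisfied if any of its literals agrees with the assignment.
    return sum(any(assignment[v] == pos for (v, pos) in clause) for clause in clauses)

def hill_climb_sat(clauses, n_vars, max_restarts=10, max_tweaks=100):
    for _ in range(max_restarts):
        assignment = {v: random.choice([True, False]) for v in range(n_vars)}   # random guess
        for _ in range(max_tweaks):
            if num_satisfied(clauses, assignment) == len(clauses):
                return assignment              # satisfying assignment found: answer 'yes'
            # Tweak: flip the variable whose flip satisfies the most clauses.
            best = max(range(n_vars),
                       key=lambda v: num_satisfied(clauses, {**assignment, v: not assignment[v]}))
            assignment[best] = not assignment[best]
    return None                                # answer 'no', correct only with some probability

# The satisfiable example from the text, with variables x0..x3:
clauses = [[(0, False), (1, True), (2, True)],
           [(0, True), (1, False), (2, True)],
           [(0, False), (1, False), (2, False)],
           [(0, False), (1, False), (3, True)]]
print(hill_climb_sat(clauses, 4))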

Deterministic Polynomial Time Reductions[edit | edit source]

We have previously studied reductions of a language/problem Q to a language/problem R. Recall that if we say Q reduces to R then if we have a solution for R we can use it to create a solution for Q; R is at least as hard as Q (else a solution for R would not assure a solution for Q). As one implication of this hardness observation, to show that R has a certain hardness characteristic (i.e., undecidability, intractability), or something “worse”, show a correct reduction to R from a problem that we know has that hardness characteristic. Previously, this hardness characteristic was undecidability. Now, it can be intractability too.

We know that SATisfiability is in NP (with no known solution in P) – we constructed an algorithm for it that was clearly NP in Figure SATisNP. We can reduce SAT to another problem, say solving simultaneous integer linear inequalities, thereby showing that this latter problem is "at least" NP. Consider the earlier SAT problem: (¬x0 ⋁ x1 ⋁ x2) ⋀ (x0 ⋁ ¬x1 ⋁ x2) ⋀ (¬x0 ⋁ ¬x1 ⋁ ¬x2) ⋀ (¬x0 ⋁ ¬x1 ⋁ x3). This is easily translated to an integer linear program (ILP) by replacing each Boolean variable xk with an integer variable tk constrained to take only the values 0 or 1, replacing each negated variable ¬xk with (1 - tk), and requiring each clause's sum to be at least 1. That's the reduction! So, the example SAT problem becomes the ILP:

(1-t0) + t1 + t2 >= 1

t0 + (1-t1) + t2 >= 1

(1-t0) + (1-t1) + (1-t2) >= 1

(1-t0) + (1-t1) + t3 >= 1

A solution to this ILP is

t0 = 0

t1 = 1

t2 = 1

t3 = 0

which can be mapped to a solution to the SAT problem, again through an easy substitution: xk = false iff tk = 0, and xk = true iff tk = 1. Confirm the correctness of this substitution to the SAT problem. The significance of all this is, again, that an algorithm for solving the ILP can be adapted to solving SAT. Moreover, the adaptation of the solution is easy in this case -- it's O(n), where n is the number of source variables tk, and also the number of target variables xk. So, if we should ever find an efficient, deterministic polynomial time algorithm, O(n^p), for solving ILPs, we will have a polynomial time algorithm for solving SAT, since O(n^p) + O(n) = O(n^p)!
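
A small Python sketch of this reduction follows, producing one inequality per clause from the same (variable index, polarity) clause representation used in the earlier SAT sketch; the function name and the textual output format are illustrative assumptions.

def clause_to_inequality(clause):
    # A positive literal x_k contributes t_k; a negated literal contributes (1 - t_k).
    terms = [f"t{v}" if pos else f"(1 - t{v})" for (v, pos) in clause]
    return " + ".join(terms) + " >= 1"

clauses = [[(0, False), (1, True), (2, True)],
           [(0, True), (1, False), (2, True)],
           [(0, False), (1, False), (2, False)],
           [(0, False), (1, False), (3, True)]]
for clause in clauses:
    print(clause_to_inequality(clause))
# Each t_k is additionally constrained to take only the values 0 or 1.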

In general, then, to the idea of reduction we add a constraint: in cases where we care about algorithm complexity, the reduction must be computable in polynomial time by a deterministic algorithm, so that if efficient polynomial time algorithms are found for what are currently regarded as intractable problems, the cost of adapting the more efficient solutions to still other problems does not push the adaptation back into intractable territory.

In the case of SAT and ILP, we have a deterministic polynomial time (i.e., O(n)) reduction from SAT to ILP. Confirm that we can also construct a deterministic polynomial time reduction from ILP to SAT (also of deterministic O(n) time). So these problems are of comparable complexity. This need not always be the case, and in general the reduction in one direction may be more costly than the other direction, but still O(np) for it to be a deterministic polynomial time reduction.

NP Completeness[edit | edit source]

A problem Q is NP hard if every problem in NP deterministic polynomial time reduces to Q (i.e., Q is at least as hard as every NP problem). Moreover, if (1) Q is also in the class NP, and (2) every problem in NP deterministic polynomial time reduces to Q, then Q is said to be NP complete. Note that there are two conditions, as just stated, that have to be satisfied for a problem to be NP complete.

So that this is concrete, let’s say that Q is SAT. If we can show that every problem in NP reduces to SAT, then that says that a solution to SAT can be adapted to solve every other NP problem as well. And moreover, because the reductions from all NP problems to SAT will be polynomial time reductions, then if an efficient deterministic polynomial time solution is ever found for SAT (i.e., if SAT ∈ P), then every problem in NP can be solved in deterministic polynomial time, and P = NP. That's significant!

But how do we show that every problem in NP reduces to SAT in polynomial time? There are an infinite number of problems/languages in NP !

We know that every problem in NP is solved by a non-deterministic Turing Machine that always halts, and we can express a "generic" NDTM for any NP problem in terms of SAT. This is one insight of Cook's Theorem.

Cook's Theorem[edit | edit source]

Cook’s Theorem says that every problem in NP (i.e., a question of membership of an input w in the language of any non-deterministic Turing Machine, NTM, assured of halting) can be deterministically polynomial-time reduced to SAT (i.e., a question of whether a Boolean expression is satisfiable).

How can we reduce <NTM, w> to a Boolean expression, ENTM,w, where ENTM,w is judged satisfiable iff NTM accepts w?

Any NP problem, by the definition of NP, has paths of configurations or instantaneous descriptions (i.e., solution or nonsolution paths) that are each O(n^p) in length (i.e., a (non)solution is polynomial in the size of the input, n -- remember that this is why NP solutions are 'easy' to validate). Because each transition of NTM can move its read/write head at most one cell on its tape, and can write at most one symbol per move, the longest that a single instantaneous description (ID) can become is also O(n^p), since there is a maximum of O(n^p) moves along any path of IDs by NTM. Cook therefore posited a matrix of O(n^p) rows, each an ID, and O(n^p) columns, each corresponding to a tape cell in an ID. Thus, each cell of the matrix is an (ID, Tape Cell) pair. Moving down the rows corresponds to making moves along a path of IDs, and making a change in a tape cell is the change to that cell in moving from one ID to the next. Without loss of generality, we assume that NTM uses a one-way infinite tape.

Figure Cook'sThmMatrix: An O(n^2p) matrix that is a schema for all possible trajectories of a nondeterministic TM that represents an NP language, on an input of length n.

The size of this matrix is O(n^p) × O(n^p) = O(n^2p) as shown in Figure Cook'sThmMatrix.

Importantly, this matrix is strictly a theoretical construct, important because it puts bounds on the size of the problem to be reduced. The matrix does not represent an actual run of NTM along any one path -- it couldn't, because if running NTM were part of the reduction, then the reduction would not necessarily be deterministic polynomial time! Rather, the matrix is a schema that represents the totality of the set of possible paths that can be pursued by NTM on w, just as the reduction to ENTM,w will imply the set of possible n-tuple truth value assignments to the n variables of ENTM,w. Also, recognize that the O(n^p) × O(n^p) matrix is an upper bound on the dimensions of a possible path followed by NTM. Some problems will require fewer rows/IDs than O(n^p) and/or fewer tape cells in an ID than O(n^p). But for convenience of demonstration, we think of the upper bounds as the actual dimensions, and we assume that every row is padded with blanks in the unused rightmost tape cells of an ID up to the O(n^p) limit on columns, and we assume that the row/ID corresponding to entering an accepting state is duplicated up to the O(n^p) limit on rows.

Ask yourself, if you were given the definition of NTM, and an input w, how would you confirm that an arbitrary matrix of IDs, as described above, followed from the NTM definition and that NTM accepted w?

Your procedure for analyzing a matrix would necessarily have to look at the following:

  1. Does the first row/ID of the matrix correspond to the initial ID, that is to NTM being in its start state, q0, and NTM's read/write head being at the left end of the input w, followed by blanks?
  2. Does the last row/ID of the matrix indicate that the NTM is in an accepting state? If so, the matrix indicates acceptance, and non-acceptance otherwise.
  3. For each pair of consecutive rows/IDs, does the second of the pair follow from the first given the definition of NTM's transitions, δ?

You could define this as an automated procedure for taking NTM and w, and you could embed this in a loop to look at all possible matrices. If any one of those matrices indicated acceptance of the input, then your procedure would indicate that NTM accepted w, and if all matrices corresponded to non-acceptance, then your procedure would reject w as a member of L(NTM).

Presumably you recognize the analog between determining whether one of many NTM paths accepts and determining whether one assignment of truth values satisfies a Boolean expression. So, let's convert the algorithm into a SAT problem, ENTM,w, which you can think of as a "logic program", for those who have used Prolog or another logic programming language.

Figure Cook'sInitialConditions: A Boolean subexpression representing initial conditions that must be satisfied in a valid solution path by an NTM accepting w.

Step 1: To translate step 1 above into a Boolean subexpression, the first row/ID of the matrix corresponds to NTM being in its start state, q0, and NTM's read/write head being at the left end of the input w, followed by blanks. Introduce Boolean variables corresponding to the following

  • One Boolean condition is that the current state of ID 0 is the start state, q0.
  • One Boolean condition is that the read/write head of ID 0 is at tape location 0.
  • For each of the symbols in the input, w, from i = 0 to i = |w|-1, create a Boolean condition that the value of the ith tape cell in ID 0 is the ith symbol of w.
  • For each of the tape locations in ID 0 from |w| to the end (i.e., O(n^p)), create a Boolean condition that the value of the cell location is the blank.

Conjoin these Boolean variables into one Boolean subexpression. Call it ENTM,w,initial. This is illustrated in Figure Cook'sInitialConditions. Note that the cost of creating this subexpression is linear in the input size, O(n).

Figure Cook'sFinalConditions: Boolean subexpression representing the final conditions that must be satisfied in a valid solution path by an NTM accepting w.

Step 2: To translate step 2 into a Boolean subexpression, that the last row/ID of the matrix indicates that the NTM state is an accepting state,

  • For each accepting state in NTM's definition, create a Boolean variable that reflects whether the current state of the last ID (row O(n^p)) is that accepting state.

Disjoin these Boolean variables into one Boolean subexpression. Call it ENTM,accepting. This is illustrated in Figure Cook'sFinalConditions. Note that this subexpression does not depend on a particular w, and the cost of creating this subexpression is therefore constant time relative to the input size, O(1).

Figure Cook'sIntermediateConditions: Each transition, such as that for δ(q3, a) in red, may have more than one outcome in a nondeterministic TM; of the two possible outcomes in the figure, one is reflected in the matrix, in green, and one is not reflected, in purple.

Step 3: To translate step 3, that for each pair of consecutive rows/IDs, the second of the pair should be immediately obtainable from the first of the pair given the definition of NTM's transitions, δ. The subexpression that we write should specify conditions that must apply to each row/ID of the matrix (and its successor row/ID), except the last row, starting with row/ID 0.

The translation requires going through the transitions of NTM's δ function and composing subexpressions for each. Figure Cook'sIntermediateConditions illustrates the translation of one transition, for δ(q3, a). Each possible outcome, say (q5, b, R) for example, becomes a set of conjoined conditions, (csk+1 = q5 ∧ Lock+1 = j+1 ∧ Zk+1,j = b). This must be repeated for each possible outcome of δ(q3, a) listed in the transition function, as well as for each (State, Input Symbol) pair listed in δ. In blue, the Figure additionally shows that for all cells other than the one under the read/write head, the values must remain the same between ID k and ID k+1.

Given what we have said so far, we would then have a general Boolean expression, of which Figure Cook'sIntermediateConditions only shows a small part, that states conditions necessary to the validity of the move from an ID k to ID k+1. Given a matrix we could then loop through the consecutive IDs and check them against this Boolean expression. But to fully reduce to SAT, we cannot use explicit looping, but must explicitly replicate the Boolean expression for each value of k, from ID 0 to the penultimate ID/row.

While the process of building the subexpression does not directly depend on a particular word, w, the number of terms we must write in the subexpression, call it ENTM,|w|,intermediate, depends on the size of w, i.e., the size of the matrix, O(n^2p).

Taking Steps 1-3 together, the Boolean expression representing the SAT problem can be written as

  • ENTM,w = ENTM,w,initial ∧ ENTM,accepting ∧ ENTM,|w|,intermediate

There are factors that have not been directly addressed here, notably

  • that rows after entering an accepting state can simply repeat in order to fill up to the O(n^p) limit on rows;
  • that the situations in which the R/W head is at the leftmost location (i.e., so that there is no j-1 location) or the rightmost cell (i.e., so that there is no j+1 location) must be handled specially;
  • that there must be exactly one value (tape symbol) for each cell in the matrix, which precludes impossible states that could otherwise be considered by a SAT solver.

While important to the formal proof, these further details aren't needed to appreciate the genius of Cook's theorem, and for that matter the genius of conceiving of the concept of NP completeness generally. While the Boolean expression representing the SAT problem corresponding to an arbitrary NP problem is long and involved, it is so because it is general to an infinite number of NP problems. As we saw in relation to ILP and SAT, polynomial reductions between two specific problems are often much simpler.

Again, one consequence of demonstrating that SAT is NP complete is that if it is discovered that SAT has a deterministic polynomial time solution, and thus a member of P, then all NP problems have a polynomial time solution, if by no other means than using SAT as a subroutine, and thus P = NP.

Other NP Complete Problems[edit | edit source]

Knowing that SAT is NP complete, we can now show that other NP problems are NP complete by polynomial reducing SAT to these other problems. You’ll see that we are essentially reducing both ways. Cook’s theorem polynomial reduced from every NP problem to SAT, indicating that SAT was at least as hard as any other NP problem. This is illustrated in Figure AllNPsReducedtoSAT. And now, by polynomial reducing SAT to another NP problem, we are showing that the other problem is at least as hard as SAT. For example, we already showed that SAT polynomial reduces to ILP. With both directions of reduction demonstrated, we can say that the other problem (e.g., ILP) is comparable to SAT in complexity – it too is NP complete. In Figure ExtendingNPComplete, assume that ILP is Lk in row (1) column (a), and the two-way arrow indicates that SAT polynomial reduces to Lk, and vice versa. Since Lk has now been shown to be NP complete, all NP problems reduce to Lk as well, as shown in row (1) column (b).

Figure AllNPsReducedtoSAT: Cook's theorem shows that all NP problems polynomial reduced to SAT. Thus SAT is NP hard; all NP problems can use a procedure for solving SAT as a subroutine. And because SAT is itself NP, SAT is NP complete.

One important point to emphasize is that the other problem, Lk, can now be used in subsequent reductions to show that still other problems are NP complete, since the demonstrations of equivalent complexity are transitive. Furthermore, as illustrated in row (2) of Figure ExtendingNPComplete we can take another problem in NP, say Lj, and by showing that Lk reduces to Lj we have shown that every NP problem reduces to Lj as well, and so Lj is NP complete.

Figure ExtendingNPComplete: An explanation for expanding the known NP complete problems, as illustrated in this figure, is given in the main text.

By now a large number of problems have been shown to be NP complete. A question that you might be asking yourself is whether there is any problem in NP (and not known to be in P) that is not NP complete?

Again, one consequence of demonstrating a large class of NP complete problems that are equivalent in terms of complexity is that if any one of them turns out to have a deterministic polynomial time solution, and thus a member of P, then they all have a deterministic polynomial time solution and P = NP.

Space Complexity[edit | edit source]

Properties of Recursively Enumerable (or Unrestricted) Languages[edit | edit source]

The RE languages are the broadest class of languages that we will address, and the class of RE languages includes all other language classes that we have considered -- recursive languages, CSLs, CFLs, DCFLs, and regular languages. The RE languages are equivalently defined by unrestricted grammars and by TMs that may not halt on their input in the case where that input is not a member of the language defined by the TM.

We will start with the closure properties of RE languages, then talk about the inherent undecidability of other decision questions of RE languages. Because membership in RE languages that are not also recursive is undecidable, and therefore not implementable by an algorithm, we don't include issues of algorithmic runtime or space efficiency in this section.

Closure Properties of RE languages[edit | edit source]

The RE (aka unrestricted) languages are closed under

  • Union
  • Intersection
  • Concatenation
  • Repetition or Kleene Closure
  • Substitution

However, the RE languages are not closed under Complement.

The RE languages are closed under Union[edit | edit source]

If L1 is a RE language and L2 is a RE language, then L1 ∪ L2 is RE. It is left as an exercise to show by construction of an unrestricted grammar or a TM for the language represented by L1 ∪ L2 that the RE languages are closed under union.

The RE languages are closed under Intersection[edit | edit source]

If L1 is a RE language and L2 is a RE language, then L1 ∩ L2 is RE. It is left as an exercise to show by construction of an unrestricted grammar or a TM for the language represented by L1 ∩ L2 that the RE languages are closed under intersection.

The RE languages are closed under Concatenation[edit | edit source]

If L1 is a RE language and L2 is a RE language, then L1L2 is RE. It is left as an exercise to show by construction of an unrestricted grammar or a TM for the language represented by L1L2 that the RE languages are closed under concatenation.

The RE languages are closed under Kleene Closure[edit | edit source]

If L is a RE language, then L* is RE. It is left as an exercise to show by construction of an unrestricted grammar or a TM for the language represented by L* that the RE languages are closed under Kleene Closure.

The RE languages are closed under Substitution[edit | edit source]

If L is a RE language and for each symbol, xi, in the alphabet of L there is an associated RE language, then Subst(L) is RE. It is left as an exercise to show by construction of an unrestricted grammar or a TM for the language represented by Subst(L) that the RE languages are closed under substitution.

The RE languages are not closed under Complement[edit | edit source]

If L is a RE language then its complement, ~L, is not necessarily RE. We can show this by example of an RE language with a complement that is definitely not RE. We have already given such an example earlier in the text. It's left as an exercise to find and understand the example.

Decision Properties of RE languages[edit | edit source]

We covered decidability and undecidability of selected computational problems, notably questions on the membership in recursively enumerable languages. The universal language, which we re-expressed as the halting problem (i.e., will the universal Turing machine always halt on its input) is undecidable (because MU may not halt when its argument TM does not accept its input).

What are properties of the RE class (of all RE languages)?

Rice's Theorem[edit | edit source]

Every non-trivial question of the RE languages is undecidable. A question is trivial if the correct answer is either 'yes' in the case of all RE languages (i.e., it's a property of the RE languages) or 'no' in the case of all RE languages. Otherwise, it's a non-trivial question.

  • A property of the RE languages is the subset of the RE languages with that property.
  • For example,
  • the property of being a CFL is the set of RE languages that are CFLs.
  • the property of being empty is the set {∅}.
  • A property is trivial if it is either the set of all RE languages (universally true) or the empty set of languages (universally false). Otherwise the property is non-trivial.
  • Note that the empty set (no languages at all), ∅, is different from {∅}, the set containing only the empty language (the language accepted by a TM that accepts no strings).
  • Reminder: a set of languages can be represented by a set of strings, each encoding a TM.
  • Suppose LP is the set of languages with non-trivial property P.
  • Is the question of a language L’s membership in the LP languages decidable or undecidable?
  • More particularly, is the question of whether a Turing machine encoding of a language L is among the Turing machine encodings of the LP languages decidable or undecidable?
  • Prove that the question of a language L’s membership in the LP languages is undecidable.
  • Assume that membership in LP is decidable. Then there is an algorithm, TMP, that recognizes LP.
  • Regardless of the non-trivial P, reduce LU (the Universal Language, aka the Halting Problem, known to be undecidable) to LP, thus showing a contradiction with the assumption that LP is decidable.

Rice’s Theorem implies an infinite number of undecidable properties for recursively enumerable languages, and it does it in one fell swoop. Rice’s Theorem tells us that for sufficiently expressive languages undecidability is the norm rather than the exception.

Of the problems implied by Rice’s Theorem to be undecidable, there are different kinds. It's undecidable whether a TM accepts the empty set, or a non-empty set. Given intuitions about complementation, we might think of these questions as essentially the same, but in fact when dealing with RE languages that are not recursive our intuitions regarding complement may be misleading. The question of whether an arbitrary TM accepts the empty language is not even RE – it's not a property that can be tested by any TM. Whereas the question of whether a TM accepts a non-empty language is RE (but not recursive).

Expanding the Classes of Languages[edit | edit source]

Any characteristic defines a set of languages for which the characteristic is true of all members. Any question defines a set of languages for which the answer to the question is 'yes'.

Exercises, Projects, and Discussions[edit | edit source]

RL Props Exercise 1: For each of the constructions described under Regular Languages are closed under Union, Concatenation, Kleene Closure, and Substitution, draw a visualization of the construction, choosing a way of visually representing the component DFAs in the construction, as well as the constructed NFA. Why do this exercise? Because many learners are visual learners, and undoubtedly all learners benefit from some visualization, be it mental imagery or manifest on 'paper'. An important skill for many or most in CS generally is an ability to visualize algorithms and data structures. This will undoubtedly continue to be a desirable skill even as AIs take over much of the software development burden. As an aside, if you are interested in societal benefits of your efforts, consider and potentially act on the creation of educational materials in CS for the blind and visually impaired.

RL Props Exercise 2: For each of the constructions described under Regular Languages are closed under Union, Concatenation, Kleene Closure, and Substitution, give alternative arguments using regular expressions.

RL Props Exercise 3: For each of the constructions described under Regular Languages are closed under Union, Concatenation, Kleene Closure, and Substitution, give alternative arguments using regular grammars.

RL Props Exercise 4: Regular expressions were defined in terms of three operations -- concatenation, choice, and Kleene closure. Expand regular expressions based on closure properties for regular languages. The intent is not that you increase the representational power of REs, but that the expansion provides syntactic convenience and comprehensibility.

RL Props Exercise 5: Show that the regular languages are closed under reversal. That is, if L is a regular language then LR is a regular language.

CFL Exercise 1: Show that the CFLs are not closed under intersection.

Project 1: Investigate the properties of the class of inherently ambiguous CFLs.

Project 2: Investigate NP problems that are not NP-complete.

Project 3: Investigate superpolynomial algorithms that are subexponential.

Project 4: Investigate closure properties of P, NP, and NP-complete problems.

Exercise RecursiveSubstitution1: Demonstrate that if you are given a recursive language R, and each symbol in R’s alphabet is substituted by a recursive language, then the language that results from applying the substitution to R is not necessarily recursive.

Exercises (Closure Properties of RE Languages): For each of the subsections above, demonstrate whether the closure property holds as described.

Applications of Language Classes[edit | edit source]

Context Free Languages, Parsing, Lexical Analysis, and Translation[edit | edit source]

Recursive Descent Parsing and Translation[edit | edit source]

Three grammars representing the same language of arithmetic expressions. (a) is an ambiguous grammar. (b) is an unambiguous, operator-precedence grammar, but is also left recursive, a problem for recursive descent parsing. (c) is an equivalent right recursive grammar and suits the parsing application well.

CFGs have been used for some time in defining the syntax of programming languages, starting with Algol. Ideally, we can directly translate a grammar into mutually recursive procedures for parsing a computer program, and ultimately for compiling it. Consider the grammars of Figure ExpressionGrammars; representative versions of all three are sketched below. The first of these grammars for arithmetic expressions is simple, yet ambiguous, since id + id * id (and other strings) can be generated by two (or more) distinct leftmost derivations, or equivalently by distinct parse trees. So it is unsuitable as the basis for automated parsing. The second grammar is not ambiguous, having enforced operator precedence rules to ensure a desirable, single parse tree for every string. This is as far as we got when introducing this CFL of arithmetic expressions. As it turns out, grammar (b) presents another problem for the recursive procedures we have in mind: this grammar is left recursive. When this grammar is translated into mutually recursive procedures, we have a function such as Expression (i.e., E) potentially calling itself, as indicated in E → E + T, with no change in the argument that is passed. This can lead to infinite recursion.
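Figure ExpressionGrammars is not reproduced in the text, but grammars fitting its description are standard; the versions below are given only as a representative illustration and may differ in detail from the figure.

  (a)  E → E + E | E * E | ( E ) | id        ambiguous

  (b)  E → E + T | T                         unambiguous, operator precedence,
       T → T * F | F                         but left recursive
       F → ( E ) | id

  (c)  E → T + E | T                         unambiguous, right recursive
       T → F * T | F
       F → ( E ) | id

One caveat: the right-recursive grammar (c) groups repeated + and * to the right, so a translator that cares about left associativity typically restores it during code generation or by parsing the repetition with a loop.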

Figure AlgolGrammar: Simplified Algol 60 grammar as given by chatGPT with some minor revisions by the author. The bold red shows the arithmetic expression portion of the grammar in right recursive form, and the bold blue shows those parts of the grammar that correspond to the lexical analyzer, in close-to regular grammar form. The notation varies slightly and straightforwardly from that used in the remainder of the textbook.
Figure AlgolTranslator: A recursive descent parser and translator for a small part of Algol-60.

Figure AlgolGrammar shows a simplified CFG for the Algol-60 programming language. Algol-60, as in the year 1960, was the first programming language with syntax defined by a context free grammar. The variables in the grammar translate rather directly into mutually recursive functions in a recursive descent parser. Figure AlgolTranslator shows the parsing routines corresponding to the grammar variables of expression, term, and factor. From parse_expression(), a term is what is first expected, and so parse_term() is called to identify a portion of code that was generated by the term part of the grammar. Parse_term() then calls parse_factor() to identify the Algol code that was derived from the factor variable of the grammar.

In parse_factor(), actual symbols in the program are scanned by the part of the parser called the lexical analyzer, which is concerned with identifying basic tokens, such as constants, variables, operators, and punctuation in the Algol code. In the sample translator code, the match() function is highlighted in bold blue, and this function contains a call to the next_token() function, which is essentially the lexical analyzer and performs the recognition procedures associated with the parts of the grammar in Figure AlgolGrammar that are highlighted in bold blue. Those productions, in bold blue, correspond to a regular grammar (almost), and you should confirm that productions not conforming exactly to the regular grammar constraints of A → aB or A → a can be easily translated to productions that do adhere to the constraints (though it’s a bit unwieldy to do so because of the space it would require). Though the next_token() code is not shown in this illustration, it is an important part of a parser implementation since the lexical analyzer is what actually scans the input string/program. A minimal sketch of these routines appears below.
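Figure AlgolTranslator itself is not reproduced in the text, so the following Python sketch is offered only as an illustration of the general shape of parse_expression(), parse_term(), parse_factor(), match(), and next_token(); the token categories, the tiny right-recursive expression grammar it parses, and the emitted instructions are this sketch’s own simplifications, not the figure’s code.

  import re

  # next_token()/match() play the role of the lexical analyzer; the tokenizer
  # recognizes numeric constants, identifiers, and single-character punctuation.
  TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(.))")

  def tokenize(text):
      tokens = []
      for number, name, other in TOKEN_RE.findall(text):
          if number:
              tokens.append(("CONST", number))
          elif name:
              tokens.append(("ID", name))
          elif other.strip():
              tokens.append((other, other))          # '+', '*', '(', ')'
      tokens.append(("EOF", ""))
      return tokens

  class Parser:
      def __init__(self, text):
          self.tokens = tokenize(text)
          self.pos = 0
          self.code = []                             # emitted stack-machine instructions

      def next_token(self):                          # the lexical-analyzer interface
          return self.tokens[self.pos]

      def match(self, kind):                         # consume a token of the expected kind
          found_kind, text = self.next_token()
          if found_kind != kind:
              raise SyntaxError(f"expected {kind}, found {found_kind}")
          self.pos += 1
          return text

      def parse_expression(self):                    # E -> T + E | T   (right recursive)
          self.parse_term()
          if self.next_token()[0] == "+":
              self.match("+")
              self.parse_expression()
              self.code.append(("ADD",))

      def parse_term(self):                          # T -> F * T | F
          self.parse_factor()
          if self.next_token()[0] == "*":
              self.match("*")
              self.parse_term()
              self.code.append(("MULTIPLY",))

      def parse_factor(self):                        # F -> ( E ) | id | constant
          kind = self.next_token()[0]
          if kind == "(":
              self.match("(")
              self.parse_expression()
              self.match(")")
          elif kind == "ID":
              self.code.append(("LOAD_VAR", self.match("ID")))
          else:
              self.code.append(("LOAD_CONST", float(self.match("CONST"))))

  p = Parser("1.2 + Var21 * 3")
  p.parse_expression()
  print(p.code)   # [('LOAD_CONST', 1.2), ('LOAD_VAR', 'Var21'), ('LOAD_CONST', 3.0), ('MULTIPLY',), ('ADD',)]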

In Figure AlgolTranslator, the generation of machine code is shown in bold purple. The machine code language assumes that arguments are pushed (loaded) onto a stack, and binary operations such as addition and multiplication pop the top two items from the stack, apply the operation, and push the result back onto the stack. So in the illustration of Figure AlgolTranslator, the machine code translation of the Algol expression '1.2 + Var21 * 3' reads as follows and would execute as indicated in the comments (a small interpreter that executes such code is sketched after the listing).

  1. LOAD_CONST 1.2
  2. LOAD_VAR Var21 // (i.e., push the value of Var21)
  3. LOAD_CONST 3
  4. MULTIPLY // pop 3 and pop value of Var21, compute 3 * value of Var21, and push the result, P, onto the runtime stack
  5. ADD // pop P and pop 1.2, compute P + 1.2, and push the result, R, onto the runtime stack

If I had asked chatGPT to show more code for the parser, then it would include parser code for other grammar variables such as 'assignment statement' and additional code generation commands such as STORE_VAR for storing the top of the runtime stack at a relative memory address associated with an Algol program variable name. There are other important parts of a translator, such as the symbol table for associating Algol program identifiers with (relative) computer memory addresses, that are not tied to any underlying grammar.

In sum, a recursive descent parser and translator follows elegantly from the specification of a programming language grammar. The recursive routines, as with any function calls, push records onto the parser's runtime stack (not to be confused with the runtime stack assumed by the machine code above), and this reliance on a stack by the parser makes it comparable to a PDA.

Table-Based Parsing[edit | edit source]

Relationships to Artificial Intelligence[edit | edit source]

There are many informal and formal connections between AI and formal languages, automata, and computation. This chapter delves into these connections, as well as the requisite and optional AI material. Notes embedded in the main text indicate how I often present the material over a semester. While this material is at the end of the textbook, it is typically scattered in my lectures throughout the semester.

Probabilistic Grammars and Automata[edit | edit source]

Figure ProbabilisticMechanisms: The probability of a derivation (and string) can be computed as the product of the transition (production) probabilities used in the derivation. For example, S → A with probability 0.5 and A → a with probability 0.8. Thus, S derives a with probability 0.5 * 0.8 = 0.4.

In AI, non-determinism is prevalent, but it cannot always be removed entirely while guaranteeing the same results, because (a) operations/productions/transitions have uncertain outcomes, and (b) the world model/IDs are not typically complete or correct (i.e., the AI doesn’t know some things, or it is wrong about them).

Measures of uncertainty are used in AI, most notably probabilities or fuzzy membership.

Probabilistic grammars are a type of formal grammar where each production rule has a probability associated with it.

These probabilities represent the likelihood that a particular rule will be used to generate a string; a small computational sketch of the resulting derivation probabilities follows the list of applications below.

Natural Language Processing: Probabilistic grammars can be used to model the probability distribution of sentences in a natural language, making them useful for tasks such as speech recognition and machine translation.

Information Retrieval: Probabilistic grammars can be used to generate queries that match documents in a database, allowing for more accurate search results.

DNA Sequencing: Probabilistic grammars can be used to model the probability distribution of DNA sequences, making them useful for tasks such as genome assembly and alignment.
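Below is a minimal sketch of the product-of-probabilities calculation from Figure ProbabilisticMechanisms. The grammar is hypothetical: the rules S → A (0.5) and A → a (0.8) come from the figure’s example, while the extra rules S → b and A → ab are invented here only so that each variable’s rule probabilities sum to 1.

  # Probabilistic grammar: each variable maps to a list of (right-hand side, probability) pairs
  pcfg = {
      "S": [(("A",), 0.5), (("b",), 0.5)],
      "A": [(("a",), 0.8), (("a", "b"), 0.2)],
  }

  def derivation_probability(rules_applied):
      # rules_applied is the sequence of (variable, right-hand side) rules used in a derivation
      prob = 1.0
      for var, rhs in rules_applied:
          prob *= dict(pcfg[var])[rhs]
      return prob

  # S => A => a uses S -> A and A -> a, so the probability is 0.5 * 0.8 = 0.4
  print(derivation_probability([("S", ("A",)), ("A", ("a",))]))   # 0.4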

NFA-to-DFA translation and speedup learning[edit | edit source]

The first point of inspiration is the elimination of nondeterminism when translating an NFA to a DFA (or a nondeterministic PDA to a deterministic PDA, if the latter exists). I’ve said that NFAs, and nondeterminism generally, can lead to simpler and more elegant solutions, and can be much easier for a human to specify, but these automata require enumeration (depth first, breadth first, or some heuristic approach) when they are actually used (to see whether a string reaches an accepting configuration along at least one path of the machine), and that is computationally costly. So ideally, we can imagine that in the case of a complex problem, someone can specify an NFA solution, but then use automated translation (sketched below) to acquire a more efficient-to-execute deterministic recognizer. The same can be said about AI and machine learning generally. I regard it as definitional of AI that an AI system explores alternatives, requiring enumeration or search – that is nondeterminism. Machine learning can then be used to eliminate nondeterminism, and thus “speed up” problem solving, planning, and other processes by the AI.
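As a reminder of what that automated translation looks like, here is a compact sketch of the subset construction for an NFA without epsilon transitions. The dictionary-based representation of the transition function and the example NFA are assumptions of the sketch, not drawn from earlier figures.

  from itertools import chain

  def nfa_to_dfa(nfa_delta, start, accepting, alphabet):
      # nfa_delta maps (state, symbol) to a set of states;
      # each DFA state is a frozenset of NFA states (the "subset" in subset construction).
      start_set = frozenset([start])
      dfa_delta, seen, worklist = {}, {start_set}, [start_set]
      while worklist:
          current = worklist.pop()
          for sym in alphabet:
              target = frozenset(chain.from_iterable(
                  nfa_delta.get((q, sym), ()) for q in current))
              dfa_delta[(current, sym)] = target
              if target not in seen:
                  seen.add(target)
                  worklist.append(target)
      dfa_accepting = {s for s in seen if s & accepting}
      return dfa_delta, start_set, dfa_accepting

  # NFA for binary strings ending in 1: loop in q0 on anything, guess that the current 1 is last
  nfa = {("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"}}
  dfa_delta, dfa_start, dfa_accept = nfa_to_dfa(nfa, "q0", {"q1"}, ["0", "1"])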

It’s rarely, if ever, the case that all sources of nondeterminism can be eliminated in an AI program, however. The reasons are several. One, operators or “transitions” in AI often do not have perfectly predictable outcomes. A robot may try to pick up a cup, for example, and the cup slips from the robot’s grasp – there may always be at least two outcomes, I’ve got the cup or I don’t, and probably more outcomes if there can be liquid in the cup or not, hot versus cold liquid, etc. While some outcomes may be rare, we still want the robot to be able to respond to them – this possibility of multiple outcomes is a clear manifestation of nondeterminism. Another source of nondeterminism, related to the first, is that the AI may not know with certainty all relevant aspects of the environment. For example, an intelligent vehicle may not know that a pedestrian is skateboarding behind a building and is about to abruptly enter traffic, yet the AI had better anticipate that possibility, as well as mundane circumstances. Nondeterminism again.

Even though it’s unlikely that machine learning can get rid of all nondeterminism, it can reduce the nondeterminism. In a real AI application we might measure the extent to which nondeterminism is reduced by the expected number of states that an AI system has to enumerate, before learning, while learning, and after learning. The expected number of states generated during an enumeration or search is the sum of the states generated across all possible problems, weighted by the probability of each problem. So an expected number is a weighted average, as expressed below.
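In symbols, assuming (hypothetically) a finite set of possible problems p1, ..., pn with probabilities P(p1), ..., P(pn), and writing states(pi) for the number of states enumerated while solving pi:

  expected states = P(p1) * states(p1) + P(p2) * states(p2) + ... + P(pn) * states(pn)

Learning reduces nondeterminism to the degree that this expected value drops from its before-learning value to its after-learning value.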

DFA-to-RE Translation and Macro Learning[edit | edit source]

The second learning task that I want to relate is called macro learning, which is used in AI to learn to solve problems more effectively. In particular, if an AI observes that transitions are repeatedly sequenced together, or it discovers that certain transitions sequenced together lead to solutions, then the AI might form a macro operator (or macro transition) that concatenates the transitions so that they can be applied as a single packet, or macro. This process of macro formation reduces the nondeterminism found in enumeration by taking large steps along individual paths or derivations, but it also adds to nondeterminism by creating additional choice points in an enumeration or search for a string’s derivation, since the macro is added to, rather than replacing, the operators (or transitions) used in its composition. The management of the tradeoffs in macro learning led to a good deal of research on learning choice preferences or biases, which involved probabilities placed on choices. I don’t get into probabilities here, but the slides do touch on their use in probabilistic grammars, which are related to AI approaches to solving problems or, analogously, to deriving strings.

Machine Learning, No-Free Lunch, Undecidability, Intractability[edit | edit source]

AI theorem proving[edit | edit source]

AI Planning[edit | edit source]

While the implication operator in logic theorem proving looks the same as the production symbol, →, theorem proving is additive. When making an inference, nothing in the current world model is overwritten.

In contrast, in AI planning a “mental” model of the current world state is maintained and changed through the application of operators (or productions). A common representation of operators shows PREconditions, which have to be true before the operator can be applied, and EFFects, which are the conditions that are true after the operator is applied. For our purposes we can think of an operator as a production, as in a grammar, PRE → EFF, where effects rewrite preconditions. There is more nuance to it than that, but the details are left to an AI course.

The slides show two examples of the derivations of single plans, analogous to a single derivation of a string, but as with languages, there is a search for the derivation (plan), which requires an enumeration of many paths, most of which are unsuccessful derivations (and that’s ok – we are looking for just one plan/derivation).

Generating a plan can be modeled as generating a string with a grammar. The operators can be regarded as skeletal productions, that is, productions with variables, as illustrated in the blocks-world operators of the slides – Pickup(?X), Putdown(?X), Unstack(?X, ?Y), Stack(?X, ?Y); one such operator is written out below. The last slides of Friday’s lecture give a brief glimpse at this equivalence, though AI planning generally operates with additional pattern matching capabilities beyond what we have seen with respect to strings. The equivalence also makes reference to ridiculous computational (storage and runtime) requirements in the case where we interpret AI states as strings and AI operators as productions, but computational cost is not an issue we are concerned with at this point, and similar equivalence arguments that are not concerned with costs are made by Hopcroft, Motwani, and Ullman 3rd Edition (2007) when comparing Turing Machines and computers (e.g., breakout boxes on pp. 322, 346, 364).
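For concreteness, here is one blocks-world operator written in the PRE → EFF style described above, using common STRIPS-like predicates; the exact predicate names on the slides may differ.

  Pickup(?X)
      PRE:  ontable(?X), clear(?X), handempty
      EFF:  holding(?X)     -- and ontable(?X), clear(?X), handempty no longer hold

  Viewed as a skeletal production:  ontable(?X) clear(?X) handempty → holding(?X)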

The grammars that are equivalent to AI planning problems would typically be context sensitive, but a general or universal AI planning system is unrestricted. I don’t show this, but have a chat with chatGPT to learn more. Here are some handy references too, not just to AI planning but to other material at the intersection of CS 3252 and AI, notably forms of machine learning that have fallen out of favor but are still very interesting – these references are optional material.

Solomonoff, R. J. (1964). “A Formal Theory of Inductive Inference, Part 1”, Information and Control, Vol. 7.

Biermann, A.W. (1972). “On the Inference of Turing machines from sample computations”, Artificial Intelligence, Vol 3.

Angluin, D. (1980). “Inductive Inference of Formal Languages from Positive Data”, Information and Control, Vol. 45.

Moore, R.C. (1984). “A Formal Theory of Knowledge and Action” Technical Report 320, SRI.

Tate, Austin. AI Planning MOOC.

Also, plenty of other references online (search for ‘AI Planning’) or take the AI course.

Turing Equivalency of Selected Computational Platforms[edit | edit source]

Author Kyle Moore[edit | edit source]

Large Language Models[edit | edit source]

Author Jesse Roberts[edit | edit source]

Exercises, Projects, and Discussions[edit | edit source]

Project 1: Build a compiler for a non-trivial subset of a programming language of choice.

Appendices[edit | edit source]

Course Syllabi and Slides[edit | edit source]


Doug's Fall 2023[16] offering of the Theory of Automata, Formal Languages, and Computation.

Student contributions for consideration[edit | edit source]

Additional Exercises[edit | edit source]

Index[edit | edit source]


AM system

Automated proof (aka Automated Theorem Proving)

Backwards reasoning

Cantor

Goldbach

Wittgenstein

References[edit | edit source]

  1. Hopcroft, John E., Ullman, Jeffrey D. (1969). Formal Languages and Their Relation to Automata. Addison-Wesley Publishing Company.
  2. a b Hopcroft, John E., Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley Publishing Company.
  3. a b c d Hopcroft, John E., Motwani, Rajeev, Ullman, Jeffrey D. (2007). Introduction to Automata Theory, Languages, and Computation 3rd Edition. Pearson Education, Inc/Addison-Wesley Publishing Company.
  4. a b Polya, George (1954). Induction and Analogy in Mathematics. Princeton University Press. ISBN 0-691-08005-4.
  5. Lenat, Douglas. "Automated Theory Formation in Mathematics" (PDF). Proceedings of the 5th International Joint Conference on Artificial Intelligence: 833–842.
  6. Hopcroft, John E; Ullman, Jeffrey D (1979). Introduction to Automata Theory, Languages, and Computation. Addison Wesley. p. 11. ISBN 0-201-02988-X.
  7. Hopcroft, John. E (2007). Introduction to Automata Theory, Languages, and Computation (3rd ed.). Pearson Education, Inc. p. 9. ISBN 0-321-45536-3.
  8. Shachter, Ross; Heckerman, David (Fall 1987). "Thinking Backwards for Knowledge Acquisition" (PDF). AI Magazine. 8 (3): 55–61.
  9. Hopcroft, John E; Ullman, Jeffrey D (1979). Introduction to Automata Theory, Languages, and Computation. Addison Wesley. p. 11. ISBN 0-201-02988-X.
  10. Hopcroft, John E; Ullman, Jeffrey D (1979). Introduction to Automata Theory, Languages, and Computation. Addison Wesley. p. 220. ISBN 0-201-02988-X.
  11. Mateescu & Salomaa (1997), Theorem 2.1, p. 187
  12. John E. Hopcroft, Jeffrey D. Ullman (1979). Introduction to Automata Theory, Languages, and Computation. Addison-Wesley. ISBN 0-201-02988-X. Exercise 9.9, p.230.
  13. Hopcroft, John E; Ullman, Jeffrey D (1969). Introduction to Automata Theory, Languages, and Computation. Addison Wesley. p. 12. ISBN 0-201-02983-9.
  14. Hopcroft, John E; Ullman, Jeffrey D (1979). Introduction to Automata Theory, Languages, and Computation. Addison Wesley. p. 103. ISBN 0-201-02988-X.
  15. Hopcroft, John E; Motwani, Rajeev; Ullman, Jeffrey D (2007). Introduction to Automata Theory, Languages, and Computation. Pearson Education, Inc. pp. 181–182. ISBN 0-201-02983-9.
  16. Fisher, Douglas. "Theory of Automata, Formal Languages, and Computation".