Foundations of Computer Science/Limits of Computing
Limits of Computing
We have studied some big ideas of computing, which allows us to perform amazing tasks through information process. You might have gotten the impression that if we can quantify information and design an algorithm we can solve any problem using computing. In this chapter we will study the theory about the limits of computing. There are theoretical and practical limits on what computing can do for us.
Turing Machine Revisited
When we talk about the formal definition of algorithm we introduced the Turing machine, which is a mathematical model for computation. With this model we can study the nature of computing including what it is can possibly do. Alan Turing contributed greatly to the computability theory with his model machine. He proved that Turing machine is as powerful as any computer we can ever build (Turing completeness). If we can find a problem the Turing machine cannot solve, we just proved the solution is not computable, i.e. the problem is not solvable through computing on any computer.
In 1936 Alan Turing proved that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist, i.e. the halting problem is unsolvable using computing. More specifically, the halting problem is undecidable because it is the simplest type of question that answers a yes or no (decision) question: whether a program will eventually halt (stop) given an input. Obviously it is infeasible to actually run the program with the input to see whether it will halt because it may take forever if there is an infinite loop in the program. The idea is to analyze a program with given input to determine whether it will halt or not.
Alan Turing proved the halting problem is undecidable by a proof technique called proof by contradiction. It would be very helpful to review some of the sample proofs on wikipedia. Another classic example is the proof of Euclid's theorem in number theory, which asserts that there are infinitely many prime numbers. All proofs by contradiction start with an assumption that the proposition we are trying to proof is false, follow a logical sequence of valid steps, and arrive at a conclusion that is clearly false or contradicts, which is also false because a statement can be either true or false but never both.
If we assume the halting problem is decidable/solvable we should be able to design an algorithm and implement it as a program that solve the problem for us. The program should take a program and the input to the program as a input parameters and return the answer - whether the program will halt or not on the input. Such a program may sound strange because it takes a program as input, but we have seen such programs, e.g. the higher order function blocks in Snap! takes a block (program) as input date and an interpreter program takes the source code of a program as data and runs the program. Programs are data and there no intrinsic difference between the tow. The following proof shows that a program that answers the halting question cannot exist.
- Assume such a program A exists. A(P, D) -> whether program P halt on input data D.
- We can create another program B that takes a program as input. Inside program B we can call program A with the given input to determine whether the input program will halt on itself as input data and if the answer is no (will not halt) we halt (return) and if the answer is yes (will halt) loop forever.
B(P): if A(P, P) = yes infinite loop else return
- What happens if we feed program B to itself as the input? or more simply, would program B halt on itself? There are two possible outcomes of the exercise: program B halts on itself or program B runs forever when itself is the input. The actual outcome/result depends on the outcome of program A called inside program B. If program B halts on itself, according to the design of program B it should run forever because the program A with program B as both inputs should run an infinite loop. On the other hand, if program B will not halt on itself, the output from program B with itself as input should return right away. Either way it is a contradiction.
- So far, we have made an assumption, followed a set of logically sound steps, and finally arrived at contradictions. What went wrong? The contradictions cannot be true because they are logically impossible. The steps are logically sound and the only part that could be wrong is the assumption. Therefore, the assumption cannot be true, i.e. the halting problem is not decidable.
Here is a YouTube video illustrating the proof https://www.youtube.com/watch?v=92WHN-pAFCs
Halting problem is hard because it is not solvable algorithmically even in principle. There are other hard problems that are solvable in principle but in practice they are close to being impossible to solve. As you can see we can categorize problems by the performance of the best known algorithms. If a problem can be solved using a fast algorithm, the problem is easy because we can use a computer to solve it fast. On the contrary if the best known algorithm we know takes a long time to solve a problem, it is hard because computers cannot solve it fast.
Using algorithm complexity theory we can put each algorithm into a particular category according to the algorithm's complexity. If the big-O notation of a category contains only polynomial terms, the problems solvable using algorithms in this category are called P problems (Polynomial time solutions exist), such as , , and . The P problems are the easy problems to computers. Among the problems without a polynomial time solution there are problems that if we can guess the solution it can be verified in polynomial time. For example, the integer factorization (or prime decomposition) problem has no known polynomial time solution but given an answer we can verify it very quickly by simply multiplying the factors and comparing the result to the integer. These types of problems are called NP (Non-deterministic Polynomial) problems.
Collectively we call problems that take too long to solve intractable problems, which include problems with best algorithm in exponential time () or those with polynomial time solutions but the exponent is too larger, e.g. .
If a problem's best algorithmic solution is in the , when and a computer does operations per second it would take years (the age of the universe) to solve the problem on the computer.
P v.s. NP
Obviously P is a subset of NP because NP is defined only by polynomial answer verification time and being able to solve a problem in polynomial time (P) certainly qualifies for that. Whether P is a proper subset of NP or, in other words, whether remains one of the open questions in computer science. Nobody knows the answer. You can win a million dollars if you can solve it as one of the Millennium Prize Problems.
To attack this P v.s. NP problem theoretical computer scientist have defined another category called NP-complete problems. The relationships among the three categories are illustrated in the following figure.
All problems in this category are NP problems sharing one special property - ALL other NP-complete problems can be translated/reduced to each of the NP-complete problems in polynomial time. Because of the nature of NP-completeness if we can solve ANY single problem in this category we have proved all NP problems are solvable in polynomial time, i.e. . We can take any NP problem, translate it to the solved NP-complete problem, and solve the problem in polynomial time. The overall time is still polynomial because polynomial + polynomial is polynomial. Thousands of NP-complete problems have been discovered but none of them has been solved. NP-complete problems are, in a sense, the most difficult known problems.
Most computer scientist believe because the implications otherwise. The "creative leaps" will disappear because solving a problem is as easy as being able to recognize the right answer. Most encryption algorithms are computationally secure because breaking them are NP-problems - there is no known efficient polynomial time solution. If then all encryptions are broken.
There are other unsolved problems in computer science. You can find a list of such problems at https://en.wikipedia.org/wiki/List_of_unsolved_problems_in_computer_science