Algorithms/Print version

From Wikibooks, the open-content textbooks collection

Jump to: navigation, search
Image:AlgorithmsTitleImage.png

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Contents

If you have saved this file to your computer, click on a link in the contents to go to that section.

Introduction

This book covers techniques for the design and analysis of algorithms. The algorithmic techniques covered include: divide and conquer, backtracking, dynamic programming, greedy algorithms, and hill-climbing.

Any solvable problem generally has at least one algorithm of each of the following types:

  1. the obvious way;
  2. the methodical way;
  3. the clever way; and
  4. the miraculous way.

On the first and most basic level, the "obvious" solution might try to exhaustively search for the answer. Intuitively, the obvious solution is the one that comes easily if you're familiar with a programming language and the basic problem solving techniques.

The second level is the methodical level and is the heart of this book: after understanding the material presented here you should be able to methodically turn most obvious algorithms into better performing algorithms.

The third level, the clever level, requires more understanding of the elements involved in the problem and their properties or even a reformulation of the algorithm (e.g., numerical algorithms exploit mathematical properties that are not obvious). A clever algorithm may be hard to understand by being non-obvious that it is correct, or it may be hard to understand that it actually runs faster than what it would seem to require.

The fourth and final level of an algorithmic solution is the miraculous level: this is reserved for the rare cases where a breakthrough results in a highly non-intuitive solution.

Naturally, all of these four levels are relative, and some clever algorithms are covered in this book as well, in addition to the methodical techniques. Let's begin.

Prerequisites

To understand the material presented in this book you need to know a programming language well enough to translate the pseudocode in this book into a working solution. You also need to know the basics about the following data structures: arrays, stacks, queues, linked-lists, trees, heaps (also called priority queues), disjoint sets, and graphs.

Additionally, you should know some basic algorithms like binary search, a sorting algorithm (merge sort, heap sort, insertion sort, or others), and breadth-first or depth-first search.

If you are unfamiliar with any of these prerequisites you should review the material in the Data Structures book first.

When is Efficiency Important?

Not every problem requires the most efficient solution available. For our purposes, the term efficient is concerned with the time and or space needed to perform the task. When either time or space is abundant and cheap, it may not be worth it to pay a programmer to spend a day or so working to make a program faster.

However, here are some cases where efficiency matters:

  • When resources are limited, a change in algorithms could create great savings and allow limited machines (like cell phones, embedded systems, and sensor networks) to be stretched to the frontier of possibility.
  • When the data is large a more efficient solution can mean the difference between a task finishing in two days versus two weeks. Examples include physics, genetics, web searches, massive online stores, and network traffic analysis.
  • Real time applications: the term "real time applications" actually refers to computations that give time guarantees, versus meaning "fast." However, the quality can be increased further by choosing the appropriate algorithm.
  • Computationally expensive jobs, like fluid dynamics, partial differential equations, VLSI design, and cryptanalysis can sometimes only be considered when the solution is found efficiently enough.
  • When a subroutine is common and frequently used, time spent on a more efficient implementation can result in benefits for every application that uses the subroutine. Examples include sorting, searching, pseudorandom number generation, kernel operations (not to be confused with the operating system kernel), database queries, and graphics.

In short, it's important to save time when you do not have any time to spare.

When is efficiency unimportant? Examples of these cases include prototypes that are used only a few times, cases where the input is small, when simplicity and ease of maintenance is more important, when the area concerned is not the bottle neck, or when there's another process or area in the code that would benefit far more from efficient design and attention to the algorithm(s).

Inventing an Algorithm

Because we assume you have some knowledge of a programming language, let's start with how we translate an idea into an algorithm. Suppose you want to write a function that will take a string as input and output the string in lowercase:

// tolower -- translates all alphabetic, uppercase characters in str to lowercase
function tolower(string str): string

What first comes to your mind when you think about solving this problem? Perhaps these two considerations crossed your mind:

  1. Every character in str needs to be looked at
  2. A routine for converting a single character to lower case is required

The first point is "obvious" because a character that needs to be converted might appear anywhere in the string. The second point follows from the first because, once we consider each character, we need to do something with it. There are many ways of writing the tolower function for characters:

function tolower(character c): character

There are several ways to implement this function, including:

  • look c up in a table -- a character indexed array of characters that holds the lowercase version of each character.
  • check if c is in the range 'A' ≤ c ≤ 'Z', and then add a numerical offset to it.

These techniques depend upon the character encoding. (As an issue of separation of concerns, perhaps the table solution is stronger because it's clearer you only need to change one part of the code.)

However such a subroutine is implemented, once we have it, the implementation of our original problem comes immediately:

// tolower -- translates all alphabetic, uppercase characters in str to lowercase
function tolower(string str): string
  let result := ""
  for-each c in str:
    result.append(tolower(c))
  repeat
  return result
end
This code sample is also available in Ada.

The loop is the result of our ability to translate "every character needs to be looked at" into our native programming language. It became obvious that the tolower subroutine call should be in the loop's body. The final step required to bring the high-level task into an implementation was deciding how to build the resulting string. Here, we chose to start with the empty string and append characters to the end of it.

Now suppose you want to write a function for comparing two strings that tests if they are equal, ignoring case:

// equal-ignore-case -- returns true if s or t are equal, ignoring case
function equal-ignore-case(string s, string t): boolean

These ideas might come to mind:

  1. Every character in strings s and t will have to be looked at
  2. A single loop iterating through both might accomplish this
  3. But such a loop should be careful that the strings are of equal length first
  4. If the strings aren't the same length, then they cannot be equal because the consideration of ignoring case doesn't affect how long the string is
  5. A tolower subroutine for characters can be used again, and only the lowercase versions will be compared

These ideas come from familiarity both with strings and with the looping and conditional constructs in your language. The function you thought of may have looked something like this:

// equal-ignore-case -- returns true if s or t are equal, ignoring case
function equal-ignore-case(string s[1..n], string t[1..m]): boolean
  if n != m:
    return false               \if they aren't the same length, they aren't equal\
  fi
  
  for i := 1 to n:
    if tolower(s[i]) != tolower(t[i]):
      return false
    fi
  repeat
  return true
end
This code sample is also available in Ada.

Or, if you thought of the problem in terms of functional decomposition instead of iterations, you might have thought of a function more like this:

// equal-ignore-case -- returns true if s or t are equal, ignoring case
function equal-ignore-case(string s, string t): boolean
  return tolower(s).equals(tolower(t))
end

Alternatively, you may feel neither of these solutions is efficient enough, and you would prefer an algorithm that only ever made one pass of s or t. The above two implementations each require two-passes: the first version computes the lengths and then compares each character, while the second version computes the lowercase versions of the string and then compares the results to each other. (Note that for a pair of strings, it is also possible to have the length precomputed to avoid the second pass, but that can have its own drawbacks at times.) You could imagine how similar routines can be written to test string equality that not only ignore case, but also ignore accents.

Already you might be getting the spirit of the pseudocode in this book. The pseudocode language is not meant to be a real programming language: it abstracts away details that you would have to contend with in any language. For example, the language doesn't assume generic types or dynamic versus static types: the idea is that it should be clear what is intended and it should not be too hard to convert it to your native language. (However, in doing so, you might have to make some design decisions that limit the implementation to one particular type or form of data.)

There was nothing special about the techniques we used so far to solve these simple string problems: such techniques are perhaps already in your toolbox, and you may have found better or more elegant ways of expressing the solutions in your programming language of choice. In this book, we explore general algorithmic techniques to expand your toolbox even further. Taking a naive algorithm and making it more efficient might not come so immediately, but after understanding the material in this book you should be able to methodically apply different solutions, and, most importantly, you will be able to ask yourself more questions about your programs. Asking questions can be just as important as answering questions, because asking the right question can help you reformulate the problem and think outside of the box.

Understanding an Algorithm

Computer programmers need an excellent ability to reason with multiple-layered abstractions. For example, consider the following code:

function foo(integer a):
  if (a / 2) * 2 == a:
     print "The value " a " is even."
  fi
end

To understand this example, you need to know that integer division uses truncation and therefore when the if-condition is true then the least-significant bit in a is zero (which means that a must be even). Additionally, the code uses a string printing API and is itself the definition of a function to be used by different modules. Depending on the programming task, you may think on the layer of hardware, on down to the level of processor branch-prediction or the cache.

Often an understanding of binary is crucial, but many modern languages have abstractions far enough away "from the hardware" that these lower-levels are not necessary. Somewhere the abstraction stops: most programmers don't need to think about logic gates, nor is the physics of electronics necessary. Nevertheless, an essential part of programming is multiple-layer thinking.

But stepping away from computer programs toward algorithms requires another layer: mathematics. A program may exploit properties of binary representations. An algorithm can exploit properties of set theory or other mathematical constructs. Just as binary itself is not explicit in a program, the mathematical properties used in an algorithm are not explicit.

Typically, when an algorithm is introduced, a discussion (separate from the code) is needed to explain the mathematics used by the algorithm. For example, to really understand a greedy algorithm (such as Dijkstra's algorithm) you should understand the mathematical properties that show how the greedy strategy is valid for all cases. In a way, you can think of the mathematics as its own kind of subroutine that the algorithm invokes. But this "subroutine" is not present in the code because there's nothing to call. As you read this book try to think about mathematics as an implicit subroutine.

Overview of the Techniques

The techniques this book covers are highlighted in the following overview.

  • Divide and Conquer: Many problems, particularly when the input is given in an array, can be solved by cutting the problem into smaller pieces (divide), solving the smaller parts recursively (conquer), and then combining the solutions into a single result. Examples include the merge sort and quicksort algorithms.
  • Randomization: Increasingly, randomization techniques are important for many applications. This chapter presents some classical algorithms that make use of random numbers.
  • Backtracking: Almost any problem can be cast in some form as a backtracking algorithm. In backtracking, you consider all possible choices to solve a problem and recursively solve subproblems under the assumption that the choice is taken. The set of recursive calls generates a tree in which each set of choices in the tree is considered consecutively. Consequently, if a solution exists, it will eventually be found.

    Backtracking is generally an inefficient, brute-force technique, but there are optimizations that can be performed to reduce both the depth of the tree and the number of branches. The technique is called backtracking because after one leaf of the tree is visited, the algorithm will go back up the call stack (undoing choices that didn't lead to success), and then proceed down some other branch. To be solved with backtracking techniques, a problem needs to have some form of "self-similarity," that is, smaller instances of the problem (after a choice has been made) must resemble the original problem. Usually, problems can be generalized to become self-similar.

  • Dynamic Programming: Dynamic programming is an optimization technique for backtracking algorithms. When subproblems need to be solved repeatedly (i.e., when there are many duplicate branches in the backtracking algorithm) time can be saved by solving all of the subproblems first (bottom-up, from smallest to largest) and storing the solution to each subproblem in a table. Thus, each subproblem is only visited and solved once instead of repeatedly. The "programming" in this technique's name comes from programming in the sense of writing things down in a table; for example, television programming is making a table of what shows will be broadcast when.
  • Greedy Algorithms: A greedy algorithm can be useful when enough information is known about possible choices that "the best" choice can be determined without considering all possible choices. Typically, greedy algorithms are not challenging to write, but they are difficult to prove correct.
  • Hill Climbing: The final technique we explore is hill climbing. The basic idea is to start with a poor solution to a problem, and then repeatedly apply optimizations to that solution until it becomes optimal or meets some other requirement. An important case of hill climbing is network flow. Despite the name, network flow is useful for many problems that describe relationships, so it's not just for computer networks. Many matching problems can be solved using network flow.


Algorithm and code example

Level 1 (easiest)

1. Find maximum With algorithm and several different programming languages

2. Find minimum With algorithm and several different programming language

3. Find average With algorithm and several different programming language

4. Find mode With algorithm and several different programming language

5. Find total With algorithm and several different programming language

6. Counting With algorithm and several different programming language

Level 2

1. Talking to computer Lv 1 With algorithm and several different programming language

2. Sorting-bubble sort With algorithm and several different programming language

3.

Level 3

1. Talking to computer Lv 2 With algorithm and several different programming language

Level 4

1. Talking to computer Lv 3 With algorithm and several different programming language

2. Find approximate maximum With algorithm and several different programming language

Level 5

1. Quicksort

Mathematical Background

Before we begin learning algorithmic techniques, we take a detour to give ourselves some necessary mathematical tools. First, we cover mathematical definitions of terms that are used later on in the book. By expanding your mathematical vocabulary you can be more precise and you can state or formulate problems more simply. Following that, we cover techniques for analysing the running time of an algorithm. After each major algorithm covered in this book we give an analysis of its running time as well as a proof of its correctness

Mathematical Definitions

A set can be intuitively thought of as a collection of different objects. Such an object can be anything, including another set. The fundamental operation on a set is "member of", written "\in". Two sets are equal if and only if they contain the same members. The following are all sets:

{1,2}
The set containing the integers "1" and "2".
\mathbb{N}
The set of all natural numbers (including 0).
\mathbb{Z}
The set of all integers.
\mathbb{Q}
The set of all rational numbers.
\mathbb{R}
The set of all real numbers.
\empty = \{\}
The set that contains nothing, known as the empty set.

We can also describe a new set using a set comprehension of the form

\{i\in S : P(i)\}

where S is an already-defined set and P is an arbitrary predicate (something that is either true or false of everything it is applied to). You get the new set by going through all of the elements i in S (whose elements you already know), and picking out the ones for which P(i) is true. For example,

\{i\in\mathbb{Z} : i \bmod 2 = 0\}

is the set of all even integers. Frequently in this book, we will leave out the \in S and expect it to be implied from the context.

An ordered n-tuple, often called just an n-tuple, is an ordered group of n objects, possibly not all of the same type. It is written "\langle a, b, c,\ldots\rangle". The same object can appear more than once in a given n-tuple. Two n-tuples are considered equal when they have exactly the same objects in exactly the same positions. The most common n-tuple is the "ordered pair," written "\langle a, b\rangle".

We can compare sets based on their elements. The fundamental comparisons are true when one set contains all of the members of another set. In the following definitions, A and B, and C are sets.

Comparison name Notation Mathematical definition English definition
Subset A\subseteq B x\in A\Rightarrow x\in B All members of A are also members of B. For any set A, \empty\subseteq A and A\subseteq A.
Proper subset A\subset B A\subseteq B\and A\ne B This comparison is used when we want to exclude the set itself.
Superset A\supseteq B B\subseteq A
Proper superset A\supset B B\subset A

There are also many operations that apply to sets.

Operation name Notation Mathematical definition English definition
Union A\cup B \{x:x\in A \or x\in B\} The set of things that are in either A or B.
Intersection A\cap B \{x:x\in A\and x\in B\} The set of things that are in both A and B.
Set difference A\setminus B or sometimes A - B \{x:x\in A\and x\notin B\} The set of things in A but not B.
Complement A - 1, AC, or A' \{x\in U:x\notin A\} The set of things not in A. This is only well-defined with respect to some implied universal set U.
Power set \wp(A) or 2A \{S:S\subset A\} The set of subsets of A. For any set A, \empty\in\wp(A) and A\in\wp(A).
Cartesian product A\times B \{\langle a,b\rangle:a\in A\and b\in B\} The set of every pairing of an element from A with an element from B.

We can extend the Cartesian product definition to include n-ary products, defined as

A_1\times\ldots\times A_n = \{\langle a_1,\ldots,a_n\rangle:a_i\in A_i\quad\forall 1\le i\le n\}.

Using this extension, we can also define Cartesian exponentiation:

A^n = \begin{matrix}\underbrace{A\times\ldots\times A} \\ \mbox{n times} \end{matrix}.

A relation R on a set S is a subset of S\times S. We often write xRy to mean \langle x, y\rangle\in R. Some common relations are: "is taller than", "knows about", "is derived from", "lives near", and "<". We can classify relations by what objects they relate:

A relation R on S is when (for all a, b, and c in S)
reflexive aRa
irreflexive \neg aRa
symmetric if aRb then bRa
antisymmetric if aRb and bRa, then a = b
asymmetric if aRb then \neg bRa
transitive if aRb and bRc then aRc
intransitive if aRb and bRc then \neg aRc
a weak partial order it is reflexive, antisymmetric, and transitive
a weak total order it is a weak partial order and either aRb or bRa
a strict partial order it is irreflexive, asymmetric, and transitive
a strict total order it is a strict partial order and either aRb, bRa, or a = b
well founded there are no infinite descending chains in S

We usually use the symbol "≤" for a weak order and "<" for a strict order. The set of integers, the set of floating-point numbers, and the set of strings can all have a total ordering defined on them.

Note that none of the pairs of reflexive and irreflexive, symmetric and asymmetric, or transitive and intransitive are opposites of each other. For example, the relation \{\langle 1,1\rangle,\langle 1,2\rangle\} is neither reflexive nor irreflexive.

A sequence is an ordered list, ordered multiset, or an ordered array of items of the same type. The following variables are all sequences:

let A := array {"a", "tree", "that", "is", "inflexible", "will", "snap"}
let B := list {1, 4, 1, 5, 9, 2, 6, 5, 3, 5}
let C := array {}

Here, A is an array of strings, B is a list of integers, and C is an empty array. Note that B contains the values "1" twice and "5" three times.

A subsequence of a sequence is a new sequence formed by deleting some elements from the old sequence and leaving the remaining elements in the same relative order. For example,

let A_T_words:= array {"tree", "that"}
let B_evens := list {4, 2, 6}
let B_odds := list {1, 1, 5, 9, 5, 3, 5}
let B_primes := list {5, 2, 5, 3, 5}
let B_empty := list {}

Note that the empty sequence (C and B_empty, above) is a subsequence of every sequence. Also note that a sequence can have many different subsequences.

A median of a set of values is the value that separates the highest half of the values from the lowest half. To find the median, arrange all the observations from lowest value to highest value and pick the middle one. If there are an even number of values, the median can be defined as the mean of the two middle values.

The closed form of a function is a description of the function in mathematical terms that use a bounded number of well-known operations. For example, expressions using only addition, subtraction, multiplication, division, exponentiation, negation, absolute-value, and the factorial function would be considered closed form. Specifically not allowed is when the number of operations depends upon the values of the function's variable. For example, here is a non-closed form version of the function sum(n), the sum of the first n positive integers:

\textrm{sum}(n) =\sum_{i=1}^{n} i = 1 + 2 + \cdots + (n-1) + n.

And, here is the closed-form version of that same function:

\textrm{sum}(n) = \frac{n(n + 1)}{2}.

Asymptotic Notation

In addition to correctness another important characteristic of a useful algorithm is its time and memory consumption. Time and memory are both valuable resources and there are important differences (even when both are abundant) in how we can use them.

How can you measure resource consumption? One way is to create a function that describes the usage in terms of some characteristic of the input. One commonly used characteristic of an input dataset is its size. For example, suppose an algorithm takes as input an array of n integers. We can describe the time this algorithm takes as a function f written in terms of n. For example, we might write:

f(n) = n2 + 3n + 14

where the value of f(n) is some unit of time (in this discussion the main focus will be on time, but we could do the same for memory consumption). Rarely are the units of time actually in seconds, because that would depend on the machine itself, the system it's running, and its load. Instead, the units of time typically used are in terms of the number of some fundamental operation performed. For example, some fundamental operations we might care about are: the number of additions or multiplications needed; the number of element comparisons; the number of memory-location swaps performed; or the raw number of machine instructions executed. In general we might just refer to these fundamental operations performed as steps taken.

Is this a good approach to determine an algorithm's resource consumption? Yes and no. When two different algorithms are similar in time consumption a precise function might help to determine which algorithm is faster under given conditions. But in many cases it is either difficult or impossible to calculate an analytical description of the exact number of operations needed, especially when the algorithm performs operations conditionally on the values of its input. Instead, what really is important is not the precise time required to complete the function, but rather the degree that resource consumption changes depending on its inputs. Concretely, consider these two functions, representing the computation time required for each size of input dataset:

f(n) = n3 - 12n2 + 20n + 110
g(n) = n3 + n2 + 5n + 5

They look quite different, but how do they behave? Let's look at a few plots of the function (f(n) is in red, g(n) in blue):

Plot of f and g, in range 0 to 5
Plot of f and g, in range 0 to 5
Plot of f and g, in range 0 to 15
Plot of f and g, in range 0 to 15
Plot of f and g, in range 0 to 100
Plot of f and g, in range 0 to 100
Plot of f and g, in range 0 to 1000
Plot of f and g, in range 0 to 1000

In the first, very-limited plot the curves appear somewhat different. In the second plot they start going in sort of the same way, in the third there is only a very small difference, and at last they are virtually identical. In fact, they approach n3, the dominant term. As n gets larger, the other terms become much less significant in comparison to n3.

As you can see, modifying a polynomial-time algorithm's low-order coefficients doesn't help much. What really matters is the highest-order coefficient. This is why we've adopted a notation for this kind of analysis. We say that:

f(n) = n3 - 12n2 + 20n + 110 = O(n3)

We ignore the low-order terms. We can say that:

O(\log {n}) \le O(\sqrt{n}) \le O(n) \le O(n \log {n}) \le O(n^2) \le O(n^3) \le O(2^n)

This gives us a way to more easily compare algorithms with each other. Running an insertion sort on n elements takes steps on the order of O(n2). Merge sort sorts in O(nlogn) steps. Therefore, once the input dataset is large enough, merge sort is faster than insertion sort.

In general, we write

f(n) = O(g(n))

when

\exists c>0, \exists n_0> 0, \forall n\ge n_{0}: 0\le f(n)\le c\cdot g(n).

That is, f(n) = O(g(n)) holds if and only if there exists some constants c and n0 such that for all n > n0 f(n) is positive and less than or equal to cg(n).

Note that the equal sign used in this notation describes a relationship between f(n) and g(n) instead of reflecting a true equality. In light of this, some define Big-O in terms of a set, stating that:

f(n)\in O(g(n))

when

f(n)\in \{f(n) : \exists c>0, \exists n_0> 0, \forall n\ge n_0: 0\le f(n)\le c\cdot g(n)\}.

Big-O notation is only an upper bound; these two are both true:

n3 = O(n4)
n4 = O(n4)

If we use the equal sign as an equality we can get very strange results, such as:

n3 = n4

which is obviously nonsense. This is why the set-definition is handy. You can avoid these things by thinking of the equal sign as a one-way equality, i.e:

n3 = O(n4)

does not imply

O(n4) = n3

Always keep the O on the right hand side.

Big Omega

Sometimes, we want more than an upper bound on the behavior of a certain function. Big Omega provides a lower bound. In general, we say that

f(n) = Ω(g(n))

when

\exists c>0, \exists n_0> 0, \forall n\ge n_{0}: 0\le c\cdot g(n)\le f(n).

i.e. f(n) = O(g(n)) if and only if there exist constants c and n0 such that for all n>n0 f(n) is positive and greater than or equal to cg(n).

So, for example, we can say that

n2 - 2n = Ω(n2) - (c=1/2, n0=4) or
n2 - 2n = Ω(n) - (c=1, n0=3),

but it is false to claim that

n2 - 2n = Ω(n3).

Big Theta

When a given function is both O(g(n)) and Ω(g(n)), we say it is Θ(g(n)), and we have a tight bound on the function. A function f(n) is Θ(g(n)) when

\exists c_1>0, \exists c_2>0, \exists n_0> 0, \forall n\ge n_0 : 0\le c_1\cdot g(n)\le f(n)\le c_2\cdot g(n),

but most of the time, when we're trying to prove that a given f(n) = Θ(g(n)), instead of using this definition, we just show that it is both O(g(n)) and Ω(g(n)).

Little-O and Omega

When the asymptotic bound is not tight, we can express this by saying that f(n) = o(g(n)) or f(n) = ω(g(n)). The definitions are:

f(n) is o(g(n)) iff \forall c>0, \exists n_0> 0, \forall n\ge n_0: 0\le f(n) < c\cdot g(n) and
f(n) is ω(g(n)) iff \forall c>0, \exists n_0> 0, \forall n\ge n_0: 0\le c\cdot g(n) < f(n).

Note that a function f is in o(g(n)) when for any coefficient of g, g eventually gets larger than f, while for O(g(n)), there only has to exist a single coefficient for which g eventually gets at least as big as f.

[TODO: define what T(n,m) = O(f(n,m)) means. That is, when the running time of an algorithm has two dependent variables. Ex, a graph with n nodes and m edges. It's important to get the quantifiers correct!]

Algorithm Analysis: Solving Recurrence Equations

Merge sort of n elements: T(n) = 2 * T(n / 2) + c(n) This describes one iteration of the merge sort: the problem space n is reduced to two halves (2 * T(n / 2)), and then merged back together at the end of all the recursive calls (c(n)). This notation system is the bread and butter of algorithm analysis, so get used to it.

There are some theorems you can use to estimate the big Oh time for a function if its recurrence equation fits a certain pattern.

[TODO: write this section]

Substitution method

Formulate a guess about the big Oh time of your equation. Then use proof by induction to prove the guess is correct.

[TODO: write this section]

Summations

[TODO: show the closed forms of commonly needed summations and prove them]

Draw the Tree and Table

This is really just a way of getting an intelligent guess. You still have to go back to the substitution method in order to prove the big Oh time.

[TODO: write this section]

The Master Theorem

Consider a recurrence equation that fits the following formula:

T(n) = a T\left({\frac{n}{b}}\right) + O(n^k)

for a ≥ 1, b > 1 and k ≥ 0. Here, a is the number of recursive calls made per call to the function, n is the input size, b is how much smaller the input gets, and k is the polynomial order of an operation that occurs each time the function is called (except for the base cases). For example, in the merge sort algorithm covered later, we have

T(n) = 2 T\left({\frac{n}{2}}\right) + O(n)

because two subproblems are called for each non-base case iteration, and the size of the array is divided in half each time. The O(n) at the end is the "conquer" part of this divide and conquer algorithm: it takes linear time to merge the results from the two recursive calls into the final result.

Thinking of the recursive calls of T as forming a tree, there are three possible cases to determine where most of the algorithm is spending its time ("most" in this sense is concerned with its asymptotic behaviour):

  1. the tree can be top heavy, and most time is spent during the initial calls near the root;
  2. the tree can have a steady state, where time is spread evenly; or
  3. the tree can be bottom heavy, and most time is spent in the calls near the leaves

Depending upon which of these three states the tree is in T will have different complexities:

The Master Theorem

Given T(n) = a T\left({\frac{n}{b}}\right) + O(n^k) for a ≥ 1, b > 1 and k ≥ 0:

  • If a < bk, then T(n) = O(n^k)\ (top heavy)
  • If a = bk, then T(n) = O(n^k\cdot \log n) (steady state)
  • If a > bk, then T(n) = O(n^{\log_b a}) (bottom heavy)

For the merge sort example above, where

T(n) = 2 T\left({\frac{n}{2}}\right) + O(n)

we have

a=2, b=2, k=1\implies b^k = 2

thus, a = bk and so this is also in the "steady state": By the master theorem, the complexity of merge sort is thus

T(n) = O(n1logn) = O(nlogn).

Amortized Analysis

[Start with an adjacency list representation of a graph and show two nested for loops: one for each node n, and nested inside that one loop for each edge e. If there are n nodes and m edges, this could lead you to say the loop takes O(nm) time. However, only once could the innerloop take that long, and a tighter bound is O(n+m).]


Divide and Conquer

The first major algorithmic technique we cover is divide and conquer. Part of the trick of making a good divide and conquer algorithm is determining how a given problem could be separated into two or more similar, but smaller, subproblems. More generally, when we are creating a divide and conquer algorithm we will take the following steps:


Divide and Conquer Methodology
  1. Given a problem, identify a small number of significantly smaller subproblems of the same type
  2. Solve each subproblem recursively (the smallest possible size of a subproblem is a base-case)
  3. Combine these solutions into a solution for the main problem

The first algorithm we'll present using this methodology is the merge sort.

Merge Sort

The problem that merge sort solves is general sorting: given an unordered array of elements that have a total ordering, create an array that has the same elements sorted. More precisely, for an array a with indexes 1 through n, if the condition

for all i, j such that 1 ≤ i < jn then a[i] ≤ a[j]

holds, then a is said to be sorted. Here is the interface:

// sort -- returns a sorted copy of array a
function sort(array a): array

Following the divide and conquer methodology, how can a be broken up into smaller subproblems? Because a is an array of n elements, we might want to start by breaking the array into two arrays of size n/2 elements. These smaller arrays will also be unsorted and it is meaningful to sort these smaller problems; thus we can consider these smaller arrays "similar". Ignoring the base case for a moment, this reduces the problem into a different one: Given two sorted arrays, how can they be combined to form a single sorted array that contains all the elements of both given arrays:

// merge -- given a and b (assumed to be sorted) returns a merged array that
// preserves order
function merge(array a, array b): array

So far, following the methodology has led us to this point, but what about the base case? The base case is the part of the algorithm concerned with what happens when the problem cannot be broken into smaller subproblems. Here, the base case is when the array only has one element. The following is a sorting algorithm that faithfully sorts arrays of only zero or one elements:

// base-sort -- given an array of one element (or empty), return a copy of the
// array sorted
function base-sort(array a[1..n]): array
  assert (n <= 1)
  return a.copy()
end

Putting this together, here is what the methodology has told us to write so far:

// sort -- returns a sorted copy of array a
function sort(array a[1..n]): array
  if n <= 1: return a.copy()
  else:
    let sub_size := n / 2
    let first_half := sort(a[1,..,sub_size])
    let second_half := sort(a[sub_size + 1,..,n])
    
    return merge(first_half, second_half)
  fi
end

And, other than the unimplemented merge subroutine, this sorting algorithm is done! Before we cover how this algorithm works, here is how merge can be written:

// merge -- given a and b (assumed to be sorted) returns a merged array that
// preserves order
function merge(array a[1..n], array b[1..m]): array
  let result := new array[n + m]
  let i, j := 0
  
  for k := 1 to n + m:
    if i >= n: result[k] := b[j]; j += 1
    else-if j >= m: result[k] := a[i]; i += 1
    else:
      if a[i] < b[j]:
        result[k] := a[i]; i += 1
      else:
        result[k] := b[j]; j += 1
      fi
    fi
  repeat
end

[TODO: how it works; including correctness proof]

Analysis

Let T(n) be the number of steps the algorithm takes to run on input of size n.

Merging takes linear time and we recurse each time on two sub-problems of half the original size, so

T(n) = 2\cdot T(\frac{n}{2}) + O(n).

By the master theorem, we see that this recurrence has a "steady state" tree. Thus, the runtime is:

T(n) = O(n \cdot \log n).

Iterative Version

This merge sort algorithm can be turned into an iterative algorithm by starting with all pairs, then all fours, et cetera... However, because the recursive version's call tree is logarithmically deep, it does not require much run-time stack space: Even sorting 4 gigs of items would only require 32 call entries on the stack, a very modest amount considering if even each call required 256 bytes on the stack, it would only require 8 kilobytes. In addition, iterative algorithms tend to be faster in practice due to lack of function overhead.

The iterative version of mergesort is a minor modification to the recursive version - in fact we can reuse the earlier merging function. The algorithm works by merging small, sorted subsections of the original array to create larger subsections of the array which are sorted. To accomplish this, we iterate through the array with successively larger "strides".

// sort -- returns a sorted copy of array a
function sort_iterative(array a[1..n]): array     
   let result := a.copy()
   for power := 0 to log2(n-1)
     let unit := 2^power
     for i := 1 to n by unit*2
       let a1[1..unit] := result[i..i+unit-1]
       let a2[1..unit] := result[i+unit..min(i+unit*2-1, n)]
       result[i..i+unit*2-1] := merge(a1,a2)
     repeat
   repeat
   
   return result
end

This works because the array starts out as a set of chunks length 1 which are "sorted." Each iteration through the array (using counting variable i) doubles the size of sorted chunks by merging adjacent chunks into sorted larger versions. The current size of sorted chunks in the algorithm is represented by the unit variable.

Binary Search

Once an array is sorted, we can quickly locate items in the array by doing a binary search. Binary search is different from other divide and conquer algorithms in that it is mostly divide based (nothing needs to be conquered). The concept behind binary search will be useful for understanding the partition and quicksort algorithms, presented in the randomization chapter.

Finding an item in an already sorted array is similar to finding a name in a phonebook: you can start by flipping the book open toward the middle. If the name you're looking for is on that page, you stop. If you went too far, you can start the process again with the first half of the book. If the name you're searching for appears later than the page, you start from the second half of the book instead. You repeat this process, narrowing down your search space by half each time, until you find what you were looking for (or, alternatively, find where what you were looking for would have been if it were present).

The following algorithm states this procedure precisely:

// binary-search -- returns the index of value in the given array, or
// -1 if value cannot be found. Assumes array is sorted in ascending order
function binary-search(value, array A[1..n]): integer
  return search-inner(value, A, 1, n + 1)
end

// search-inner -- search subparts of the array; end is one past the
// last element 
function search-inner(value, array A, start, end): integer
  if start == end: 
     return -1                   // not found
  fi

  let length := end - start
  if length == 1:
    if value == A[start]:
      return start
    else:
      return -1 
    fi
  fi
  
  let mid := start + (length / 2)
  if value == A[mid]:
    return mid
  else-if value > A[mid]:
    return search-inner(value, A, mid + 1, end)
  else:
    return search-inner(value, A, start, mid)
  fi
end

Note that all recursive calls made are tail-calls, and thus the algorithm is iterative. We can explicitly remove the tail-calls if our programming language does not do that for us already by turning the argument values passed to the recursive call into assignments, and then looping to the top of the function body again:

// binary-search -- returns the index of value in the given array, or
// -1 if value cannot be found. Assumes array is sorted in ascending order
function binary-search(value, array A[1,..n]): integer
  let start := 1
  let end := n + 1
  
  loop:
    if start == end: return -1 fi                 // not found
  
    let length := end - start
    if length == 1:
      if value == A[start]: return start
      else: return -1 fi
    fi
  
    let mid := start + (length / 2)
    if value == A[mid]:
      return mid
    else-if value > A[mid]:
      start := mid + 1
    else:
      end := mid
    fi
  repeat
end

Even though we have an iterative algorithm, it's easier to reason about the recursive version. If the number of steps the algorithm takes is T(n), then we have the following recurrence that defines T(n):

T(n) = 1\cdot T(\frac{n}{2}) + O(1).

The size of each recursive call made is on half of the input size (n), and there is a constant amount of time spent outside of the recursion (i.e., computing length and mid will take the same amount of time, regardless of how many elements are in the array). By the master theorem, this recurrence has values a = 1,b = 2,k = 0, which is a "steady state" tree, and thus we use the steady state case that tells us that

T(n) = \Theta(n^k\cdot\log n) = \Theta(\log n).

Thus, this algorithm takes logarithmic time. Typically, even when n is large, it is safe to let the stack grow by logn activation records through recursive calls.

Integer Multiplication

If you want to perform arithmetic with small integers, you can simply use the built-in arithmetic hardware of your machine. However, if you wish to multiply integers larger than those that will fit into the standard "word" integer size of your computer, you will have to implement a multiplication algorithm in software. For example, RSA encryption needs to work with integers of very large size (that is, large relative to the 64-bit word size of many machines) and utilizes special multiplication algorithms.

Grade School Multiplication

How do we represent a large, multi-word integer? We can have a binary representation by using an array (or an allocated block of memory) of words to represent the bits of the large integer. Suppose now that we have two integers, X and Y, and we want to multiply them together. For simplicity, let's assume that both X and Y have n bits each (if one is shorter than the other, we can always pad on zeros at the beginning). The most basic way to multiply the integers is to use the grade school multiplication algorithm. This is even easier in binary, because we only multiply by 1 or 0:

         x6 x5 x4 x3 x2 x1 x0
      ×  y6 y5 y4 y3 y2 y1 y0
      -----------------------
         x6 x5 x4 x3 x2 x1 x0 (when y0 is 1; 0 otherwise)
      x6 x5 x4 x3 x2 x1 x0  0 (when y1 is 1; 0 otherwise)
   x6 x5 x4 x3 x2 x1 x0  0  0 (when y2 is 1; 0 otherwise)
x6 x5 x4 x3 x2 x1 x0  0  0  0 (when y3 is 1; 0 otherwise)
  ... et cetera

As an algorithm, here's what multiplication would look like:

// multiply -- return the product of two binary integers, both of length n
function multiply(bitarray x[1,..n], bitarray y[1,..n]): bitarray
  bitarray p = 0
  for i:=1 to n:
    if y[i] == 1:
      p := add(p, x)
    fi
    x := pad(x, 0)         // add another zero to the end of x
  repeat
  return p
end

The subroutine add adds two binary integers and returns the result, and the subroutine pad adds an extra digit to the end of the number (padding on a zero is the same thing as shifting the number to the left; which is the same as multiplying it by two). Here, we loop n times, and in the worst-case, we make n calls to add. The numbers given to add will at most be of length 2n. Further, we can expect that the add subroutine can be done in linear time. Thus, if n calls to a O(n) subroutine are made, then the algorithm takes O(n2) time.

Divide and Conquer Multiplication

As you may have figured, this isn't the end of the story. We've presented the "obvious" algorithm for multiplication; so let's see if a divide and conquer strategy can give us something better. One route we might want to try is breaking the integers up into two parts. For example, the integer x could be divided into two parts, xh and xl, for the high-order and low-order halves of x. For example, if x has n bits, we have

x = x_{h}\cdot 2^{n/2} + x_{l}

We could do the same for y:

y = y_{h}\cdot 2^{n/2} + y_{l}

But from this division into smaller parts, it's not clear how we can multiply these parts such that we can combine the results for the solution to the main problem. First, let's write out x\times y would be in such a system:

x\times y = x_h\times y_h\cdot (2^{n/2})^2 + (x_h\times y_l + x_l\times y_h)\cdot (2^{n/2}) + x_l\times y_l

This comes from simply multiplying the new hi/lo representations of x and y together. The multiplication of the smaller pieces are marked by the "\times" symbol. Note that the multiplies by 2n / 2 and (2n / 2)2 = 2n does not require a real multiplication: we can just pad on the right number of zeros instead. This suggests the following divide and conquer algorithm:

// multiply -- return the product of two binary integers, both of length n
function multiply(bitarray x[1,..n], bitarray y[1,..n]): bitarray
  if n == 1: return x[1] * y[1] fi          // multiply single digits: O(1)
  
  let xh := x[n/2 + 1, .., n]               // array slicing, O(n)
  let xl := x[0, .., n / 2]                 // array slicing, O(n)
  let yh := y[n/2 + 1, .., n]               // array slicing, O(n)
  let yl := y[0, .., n / 2]                 // array slicing, O(n)
  
  let a := multiply(xh, yh)                 // recursive call; T(n/2)
  let b := multiply(xh, yl)                 // recursive call; T(n/2)
  let c := multiply(xl, yh)                 // recursive call; T(n/2)
  let d := multiply(xl, yl)                 // recursive call; T(n/2)
  
  b := add(b, c)                            // regular addition; O(n)
  a := shift(a, n)                          // pad on zeros; O(n)
  b := shift(b, n/2)                        // pad on zeros; O(n)
  return add(a, b, d)                       // regular addition; O(n)
end

We can use the master theorem to analyze the running time of this algorithm. Assuming that the algorithm's running time is T(n), the comments show how much time each step takes. Because there are four recursive calls, each with an input of size n / 2, we have:

T(n) = 4T(n / 2) + O(n)

Here, a = 4,b = 2,k = 1, and given that 4 > 21 we are in the "bottom heavy" case and thus plugging in these values into the bottom heavy case of the master theorem gives us:

T(n)=O(n^{\log_2 4}) = O(n^2).

Thus, after all of that hard work, we're still no better off than the grade school algorithm! Luckily, numbers and polynomials are a data set we know additional information about. In fact, we can reduce the running time by doing some mathematical tricks.

First, let's replace the 2n / 2 with a variable, z:

x\times y = x_h*y_h z^2 + (x_h*y_l + x_l*y_h)z + x_l*y_l

This appears to be a quadratic formula, and we know that you only need three co-efficients or points on a graph in order to uniquely describe a quadratic formula. However, in our above algorithm we've been using four multiplications total. Let's try recasting x and y as linear functions:

P_x(z) = x_h\cdot z + x_l
P_y(z) = y_h\cdot z + y_l

Now, for x\times y we just need to compute (P_x\cdot P_y)(2^{n/2}). We'll evaluate Px(z) and Py(z) at three points. Three convenient points to evaluate the function will be at (P_x\cdot P_y)(1), (P_x\cdot P_y)(0), (P_x\cdot P_y)(-1):

[TODO: show how to make the two-parts breaking more efficient; then mention that the best multiplication uses the FFT, but don't actually cover that topic (which is saved for the advanced book)]

Base Conversion

[TODO: Convert numbers from decimal to binary quickly using DnC.]

Along with the binary, the science of computers employs bases 8 and 16 for it's very easy to convert between the three while using bases 8 and 16 shortens considerably number representations.

To represent 8 first digits in the binary system we need 3 bits. Thus we have, 0=000, 1=001, 2=010, 3=011, 4=100, 5=101, 6=110, 7=111. Assume M=(2065)8. In order to obtain its binary representation, replace each of the four digits with the corresponding triple of bits: 010 000 110 101. After removing the leading zeros, binary representation is immediate: M=(10000110101)2. (For the hexadecimal system conversion is quite similar, except that now one should use 4-bit representation of numbers below 16.) This fact follows from the general conversion algorithm and the observation that 8=23 (and, of course, 16=24). Thus it appears that the shortest way to convert numbers into the binary system is to first convert them into either octal or hexadecimal representation. Now let see how to implement the general algorithm programmatically.

For the sake of reference, representation of a number in a system with base (radix) N may only consist of digits that are less than N.

More accurately, if

(1)M = akNk + ak − 1Nk − 1 + ... + a1N1 + a0

with 0 < = ai < N we have a representation of M in base N system and write

M = (akak − 1...a0)N

If we rewrite (1) as

(2)M = a0 + N * (a1 + N * (a2 + N * ...))

the algorithm for obtaining coefficients ai becomes more obvious. For example, a_0=M\ modulo\ n and a_1=(M/N)\ modulo\ n, and so on.

Recursive Implementation

Let's represent the algorithm mnemonically: (result is a string or character variable where I shall accumulate the digits of the result one at a time)

result = "" 
if M < N, result = 'M' + result. Stop. 
S = M mod N, result = 'S' + result
M = M/N 
goto 2 

A few words of explanation.

"" is an empty string. You may remember it's a zero element for string concatenation. Here we check whether the conversion procedure is over. It's over if M is less than N in which case M is a digit (with some qualification for N>10) and no additional action is necessary. Just prepend it in front of all other digits obtained previously. The '+' plus sign stands for the string concatenation. If we got this far, M is not less than N. First we extract its remainder of division by N, prepend this digit to the result as described previously, and reassign M to be M/N. This says that the whole process should be repeated starting with step 2. I would like to have a function say called Conversion that takes two arguments M and N and returns representation of the number M in base N. The function might look like this

1 String Conversion(int M, int N) // return string, accept two integers 
2 {  
3     if (M < N) // see if it's time to return 
4         return new String(""+M); // ""+M makes a string out of a digit 
5     else // the time is not yet ripe 
6         return Conversion(M/N, N) +
           new String(""+(M mod N)); // continue 
7 }  

This is virtually a working Java function and it would look very much the same in C++ and require only a slight modification for C. As you see, at some point the function calls itself with a different first argument. One may say that the function is defined in terms of itself. Such functions are called recursive. (The best known recursive function is factorial: n!=n*(n-1)!.) The function calls (applies) itself to its arguments, and then (naturally) applies itself to its new arguments, and then ... and so on. We can be sure that the process will eventually stop because the sequence of arguments (the first ones) is decreasing. Thus sooner or later the first argument will be less than the second and the process will start emerging from the recursion, still a step at a time.

Iterative Implementation

Not all programming languages allow functions to call themselves recursively. Recursive functions may also be undesirable if process interruption might be expected for whatever reason. For example, in the Tower of Hanoi puzzle, the user may want to interrupt the demonstration being eager to test his or her understanding of the solution. There are complications due to the manner in which computers execute programs when one wishes to jump out of several levels of recursive calls.

Note however that the string produced by the conversion algorithm is obtained in the wrong order: all digits are computed first and then written into the string the last digit first. Recursive implementation easily got around this difficulty. With each invocation of the Conversion function, computer creates a new environment in which passed values of M, N, and the newly computed S are stored. Completing the function call, i.e. returning from the function we find the environment as it was before the call. Recursive functions store a sequence of computations implicitly. Eliminating recursive calls implies that we must manage to store the computed digits explicitly and then retrieve them in the reversed order.

In Computer Science such a mechanism is known as LIFO - Last In First Out. It's best implemented with a stack data structure. Stack admits only two operations: push and pop. Intuitively stack can be visualized as indeed a stack of objects. Objects are stacked on top of each other so that to retrieve an object one has to remove all the objects above the needed one. Obviously the only object available for immediate removal is the top one, i.e. the one that got on the stack last.

Then iterative implementation of the Conversion function might look as the following.

 1 String Conversion(int M, int N) // return string, accept two integers 
 2 {  
 3     Stack stack = new Stack(); // create a stack 
 4     while (M >= N) // now the repetitive loop is clearly seen 
 5     {  
 6         stack.push(M mod N); // store a digit 
 7         M = M/N; // find new M 
 8     }  
 9     // now it's time to collect the digits together  
10     String str = new String(""+M); // create a string with a single digit M 
11     while (stack.NotEmpty())  
12         str = str+stack.pop() // get from the stack next digit 
13     return str;  
14 }  

The function is by far longer than its recursive counterpart; but, as I said, sometimes it's the one you want to use, and sometimes it's the only one you may actually use.

Closest Pair of Points

For a set of points on a two-dimensional plane, if you want to find the closest two points, you could compare all of them to each other, at O(n2) time, or use a divide and conquer algorithm.

[TODO: describe the problem better, and show the n^2 algorithm]

[TODO: write the algorithm, include intuition, proof of correctness, and runtime analysis]

Use this link for the original document.

http://www.cs.mcgill.ca/~cs251/ClosestPair/ClosestPairDQ.html

Closest Pair: A Divide-and-Conquer Approach

Introduction

The brute force approach to the closest pair problem (i.e. checking every possible pair of points) takes quadratic time. We would now like to introduce a faster divide-and-conquer algorithm for solving the closest pair problem. Given a set of points in the plane S, our approach will be to split the set into two roughly equal halves (S1 and S2) for which we already have the solutions, and then to merge the halves in linear time to yield an O(nlogn) algorithm. However, the actual solution is far from obvious. It is possible that the the desired pair might have one point in S1 and one in S2, does this not force us once again to check all possible pairs of points? The divide-and-conquer approach presented here generalizes directly from the one dimensional algorithm we presented in the previous section.

Closest Pair in the Plane

Alright, we'll generalize our 1-D algorithm as directly as possible (see figure 3.2). Given a set of points S in the plane, we partition it into two subsets S1 and S2 by a vertical line l such that the points in S1 are to the left of l and those in S2 are to the right of l.

We now recursively solve the problem on these two sets obtaining minimum distances of d1 (for S1), and d2 (for S2). We let d be the minimum of these.

Now, identical to the 1-D case, if the closes pair of the whole set consists of one point from each subset, then these two points must be within d of l. This area is represented as the two strips P1 and P2 on either side of l

Up to now, we are completely in step with the 1-D case. At this point, however, the extra dimension causes some problems. We wish to determine if some point in say P1 is less than d away from another point in P2. However, in the plane, we don't have the luxury that we had on the line when we observed that only one point in each set can be within d of the median. In fact, in two dimensions, all of the points could be in the strip! This is disastrous, because we would have to compare n2 pairs of points to merge the set, and hence our divide-and-conquer algorithm wouldn't save us anything in terms of efficiency. Thankfully, we can make another life saving observation at this point. For any particular point p in one strip, only points that meet the following constraints in the other strip need to be checked:

  • those points within d of p in the direction of the other strip
  • those within d of p in the positive and negative y directions

Simply because points outside of this bounding box cannot be less than d units from p (see figure 3.3). It just so happens that because every point in this box is at least d apart, there can be at most six points within it (I won't let myself get away with that scot-free, click here to see the proof). Well this is simply fantastic news, because now we don't need to check all n2 points. All we have to do is sort the points in the strip by their y-coordinates and scan the points in order, checking each point against a maximum of 6 of its neighbors. This means at most 6*n comparisons are required to check all candidate pairs. However, since we sorted the points in the strip by their y-coordinates the process of merging our two subsets is not linear, but in fact takes O(nlogn) time. Hence our full algorithm is not yet O(nlogn), but it is still an improvement on the quadratic performance of the brute force approach (as we shall see in the next section). In section 3.4, we will demonstrate how to make this algorithm even more efficient by strengthening our recursive sub-solution.

Summary and Analysis of the 2-D Algorithm

We present here a step by step summary of the algorithm presented in the previous section, followed by a performance analysis. The algorithm is simply written in list form because I find pseudo-code to be burdensome and unnecessary when trying to understand an algorithm. Note that we pre-sort the points according to their x coordinates, and maintain another structure which holds the points sorted by their y values(for step 4), which in itself takes O(nlogn) time.

ClosestPair of a set of points:

  1. Divide the set into two equal sized parts by the line l, and recursively compute the minimal distance in each part.
  2. Let d be the minimal of the two minimal distances.
  3. Eliminate points that lie farther than d apart from l.
  4. Consider the remaining points according to their y-coordinates, which we have precomputed.
  5. Scan the remaining points in the y order and compute the distances of each point to all of its neighbors that are distanced no more than d(that's why we need it sorted according to y). Note that there are no more than 5 such points(see previous section).
  6. If any of these distances is less than d then update d.

Analysis:

  • Let us note T(n) as the efficiency of out algorithm
  • Step 1 takes 2T(n/2) (we apply our algorithm for both halves)
  • Step 3 takes O(n) time
  • Step 5 takes O(n) time (as we saw in the previous section)

so,

T(n) = 2T(n / 2) + O(n)

which, according the the Master Theorem, result

T(n) \isin O(nlogn)

Hence the merging of the sub-solutions is dominated by the sorting at step 4, and hence takes O(nlogn) time.

This must be repeated once for each level of recursion in the divide-and-conquer algorithm,

hence the whole of algorithm ClosestPair takes O(logn*nlogn) = O(nlog2n) time.

Improving the Algorithm

We can improve on this algorithm slightly by reducing the time it takes to achieve the y-coordinate sorting in Step 4. This is done by asking that the recursive solution computed in Step 1 returns the points in sorted order by their y coordinates. This will yield two sorted lists of points which need only be merged (a linear time operation) in Step 4 in order to yield a complete sorted list. Hence the revised algorithm involves making the following changes: Step 1: Divide the set into..., and recursively compute the distance in each part, returning the points in each set in sorted order by y-coordinate. Step 4: Merge the two sorted lists into one sorted list in O(n) time. Hence the merging process is now dominated by the linear time steps thereby yielding an O(nlogn) algorithm for finding the closest pair of a set of points in the plane.

Towers Of Hanoi Problem

[TODO: Write about the towers of hanoi algorithm and a program for it]

There are n distinct sized discs and three pegs such that discs are placed at the left peg in the order of their sizes. The smallest one is at the top while the largest one is at the bottom. This game is to move all the discs from the left peg

Rules

1) Only one disc can be moved in each step.

2) Only the disc at the top can be moved.

3) Any disc can only be placed on the top of a larger disc.

Solution

Intuitive Idea

In order to move the largest disc from the left peg to the middle peg, the smallest discs must be moved to the right peg first. After the largest one is moved. The smaller discs are then moved from the right peg to the middle peg.

Recurrence

Suppose n is the number of discs.

To move n discs from peg a to peg b,

1) If n>1 then move n-1 discs from peg a to peg c

2) Move n-th disc from peg a to peg b

3) If n>1 then move n-1 discs from peg c to peg a

Pseudocode

void hanoi(n,src,dst){
  if (n>1)
    hanoi(n-1,src,pegs-{src,dst});
  print "move n-th disc from src to dst";
  if (n>1)
    hanoi(n-1,pegs-{src,dst},dst);
}

Analysis

The analysis is trivial. T(n) = 2T(n − 1) + O(1) = O(2n)


Randomization

As deterministic algorithms are driven to their limits when one tries to solve hard problems with them, a useful technique to speed up the computation is randomization. In randomized algorithms, the algorithm has access to a random source, which can be imagined as tossing coins during the computation. Depending on the outcome of the toss, the algorithm may split up its computation path.

There are two main types of randomized algorithms: Las Vegas algorithms and Monte-Carlo algorithms. In Las Vegas algorithms, the algorithm may use the randomness to speed up the computation, but the algorithm must always return the correct answer to the input. Monte-Carlo algorithms do not have the former restriction, that is, they are allowed to give wrong return values. However, returning a wrong return values must have a small probability, otherwise that Monte-Carlo algorithm would not be of any use.

Many approximation algorithms use randomization.

Ordered Statistics

Before covering randomized techniques, we'll start with a deterministic problem that leads to a problem that utilitizes randomization. Suppose you have an unsorted array of values and you want to find

  • the maximum value,
  • the minimum value, and
  • the median value.

In the immortal words of one of our former computer science professors, "How can you do?"

find-max

First, it's relatively straightforward to find the largest element:

// find-max -- returns the maximum element
function find-max(array vals[1..n]): element
  let result := vals[1]
  for i from 2 to n:
    result := max(result, vals[i])
  repeat
  
  return result
end

As assignment of -\infty to result will work as well but this is a useless call to the max function since the first element compared gets set to result. By initializing result as such the function only requires n-1 comparisons. (Moreover, in languages capable of metaprogramming, the data type may not be strictly numerical and there might be no good way of assigning -\infty; using vals[1] is type-safe.)

A similar routine to find the minimum element can be done by calling the min function instead of the max function.

find-min-max

But now suppose you want to find the min and the max at the same time; here's one solution:

// find-min-max -- returns the minimum and maximum element of the given array
function find-min-max(array vals): pair
  return pair {find-min(vals), find-max(vals)}
end

Because find-max and find-min both make n-1 calls to the max or min functions (when vals has n elements), the total number of comparisons made in find-min-max is 2n − 2.

However, some redundant comparisons are being made. These redundancies can be removed by "weaving" together the min and max functions:

// find-min-max -- returns the minimum and maximum element of the given array
function find-min-max(array vals[1..n]): pair
  let min := \infty
  let max := -\infty
  
  if n is odd:
    min := max := vals[1]
    vals := vals[2,..,n]          // we can now assume n is even
    n := n - 1
  fi
  
  for i:=1 to n by 2:             // consider pairs of values in vals
    if vals[i] < vals[i + 1]:
      let a := vals[i]
      let b := vals[i + 1]
    else:
      let a := vals[i + 1]
      let b := vals[i]            // invariant: a <= b
    fi
    
    if a < min: min := a fi
    if b > max: max := b fi
  repeat
  
  return pair {min, max}
end

Here, we only loop n / 2 times instead of n times, but for each iteration we make three comparisons. Thus, the number of comparisons made is (3 / 2)n = 1.5n, resulting in a 3 / 4 speed up over the original algorithm.

Only three comparisons need to be made instead of four because, by construction, it's always the case that a\le b. (In the first part of the "if", we actually know more specifically that a < b, but under the else part, we can only conclude that a\le b.) This property is utilized by noting that a doesn't need to be compared with the current maximum, because b is already greater than or equal to a, and similarly, b doesn't need to be compared with the current minimum, because a is already less than or equal to b.

In software engineering, there is a struggle between using libraries versus writing customized algorithms. In this case, the min and max functions weren't used in order to get a faster find-min-max routine. Such an operation would probably not be the bottleneck in a real-life program: however, if testing reveals the routine should be faster, such an approach should be taken. Typically, the solution that reuses libraries is better overall than writing customized solutions. Techniques such as open implementation and aspect-oriented programming may help manage this contention to get the best of both worlds, but regardless it's a useful distinction to recognize.

find-median

Finally, we need to consider how to find the median value. One approach is to sort the array then extract the median from the position vals[n/2]:

// find-median -- returns the median element of vals
function find-median(array vals[1..n]): element
  assert (n > 0)
  
  sort(vals)
  return vals[n / 2]
end

If our values are not numbers close enough in value (or otherwise cannot be sorted by a radix sort) the sort above is going to require O(nlogn) steps.

However, it is possible to extract the nth-ordered statistic in O(n) time. The key is eliminating the sort: we don't actually require the entire array to be sorted in order to find the median, so there is some waste in sorting the entire array first. One technique we'll use to accomplish this is randomness.

Before presenting a non-sorting find-median function, we introduce a divide and conquer-style operation known as partitioning. What we want is a routine that finds a random element in the array and then partitions the array into three parts:

  1. elements that are less than or equal to the random element;
  2. elements that are equal to the random element; and
  3. elements that are greater than or equal to the random element.

These three sections are denoted by two integers: j and i. The partitioning is performed "in place" in the array:

// partition -- break the array three partitions based on a randomly picked element
function partition(array vals): pair{j, i}

Note that when the random element picked is actually represented three or more times in the array it's possible for entries in all three partitions to have the same value as the random element. While this operation may not sound very useful, it has a powerful property that can be exploited: When the partition operation completes, the randomly picked element will be in the same position in the array as it would be if the array were fully sorted!

This property might not sound so powerful, but recall the optimization for the find-min-max function: we noticed that by picking elements from the array in pairs and comparing them to each other first we could reduce the total number of comparisons needed (because the current min and max values need to be compared with only one value each, and not two). A similar concept is used here.

While the code for partition is not magical, it has some tricky boundary cases:

// partition -- break the array into three ordered partitions from a random element
function partition(array vals): pair{j, i}
  let m := 0
  let n := vals.length - 1
  let irand := random(m, n)   // returns any value from m to n
  let x := vals[irand]
  
  TODO: Still working on the code! (Testing it in C and Python, and I *sigh* have bugs)
end

We can use partition as a subroutine for a general find operation:

// find -- moves elements in vals such that location k holds the value it would when sorted
function find(array vals, integer k)
  assert (0 <= k < vals.length)        // k it must be a valid index
  if vals.length <= 1:
    return
  fi
  
  let pair (j, i) := partition(vals)
  if k <= i:
    find(a[0,..,i], k)
  else-if j <= k:
    find(a[j,..,n], k - j)
  fi
  TODO: debug this!
end

Which leads us to the punch-line:

 // find-median -- returns the median element of vals
function find-median(array vals): element
  assert (vals.length > 0)
  
  let median_index := vals.length / 2;
  find(vals, median_index)
  return vals[median_index]
end

One consideration that might cross your mind is "is the random call really necessary?" For example, instead of picking a random pivot, we could always pick the middle element instead. Given that our algorithm works with all possible arrays, we could conclude that the running time on average for all of the possible inputs is the same as our analysis that used the random function. The reasoning here is that under the set of all possible arrays, the middle element is going to be just as "random" as picking anything else. But there's a pitfall in this reasoning: Typically, the input to an algorithm in a program isn't random at all. For example, the input has a higher probability of being sorted than just by chance alone. Likewise, because it is real data from real programs, the data might have other patterns in it that could lead to suboptimal results.

To put this another way: for the randomized median finding algorithm, there is a very small probability it will run suboptimally, independent of what the input is; while for a deterministic algorithm that just picks the middle element, there is a greater chance it will run poorly on some of the most frequent input types it will receive. This leads us to the following guideline:

Randomization Guideline:
If your algorithm depends upon randomness, be sure you introduce the randomness yourself instead of depending upon the data to be random.

Note that there are "derandomization" techniques that can take an average-case fast algorithm and turn it into a fully deterministic algorithm. Sometimes the overhead of derandomization is so much that it requires very large datasets to get any gains. Nevertheless, derandomization in itself has theoretical value.

The randomized find algorithm was invented by C. A. R. "Tony" Hoare. While Hoare is an important figure in computer science, he may be best known in general circles for his quicksort algorithm, which we discuss in the next section.

Quicksort

The median-finding partitioning algorithm in the previous section is actually very close to the implementation of a full blown sorting algorithm.

Shuffling an Array

  This keeps data in during shuffle
  temporaryArray = { }
  This records if an item has been shuffled
  usedItemArray = { }
  Number of item in array
  itemNum = 0
  while ( itemNum != lengthOf( inputArray) ){
      usedItemArray[ itemNum ] = false None of the items have been shuffled
      itemNum = itemNum + 1
  }
  itemNum = 0 we'll use this again
  itemPosition = randdomNumber( 0 --- (lengthOf(inputArray) - 1 ))
  while( itemNum != lengthOf( inputArray ) ){
      while( usedItemArray[ itemPosition ] != false ){
          itemPosition = randdomNumber( 0 --- (lengthOf(inputArray) - 1 ))
      }
      temporaryArray[ itemPosition ] = inputArray[ itemNum ]
  }
  inputArray = temporaryArray


Equal Multivariate Polynomials

[TODO: as of now, there is no known deterministic polynomial time solution, but there is a randomized polytime solution. The canonical example used to be IsPrime, but a deterministic, polytime solution has been found.]

Skip Lists

[TODO: Talk about skips lists. The point is to show how randomization can sometimes make a structure easier to understand, compared to the complexity of balanced trees.]

Derandomization

[TODO: Deterministic algorithms for Quicksort exist that perform as well as quicksort in the average case and are guaranteed to perform at least that well in all cases. Best of all, no randomization is needed. Also in the discussion should be some perspective on using randomization: some randomized algorithms give you better confidence probabilities than the actual hardware itself! (e.g. sunspots can randomly flip bits in hardware, causing failure, which is a risk we take quite often)]

[Main idea: Look at all blocks of 5 elements, and pick the median (O(1) to pick), put all medians into an array (O(n)), recursively pick the medians of that array, repeat until you have < 5 elements in the array. This recursive median constructing of every five elements takes time T(n)=T(n/5) + O(n), which by the master theorem is O(n). Thus, in O(n) we can find the right pivot. Need to show that this pivot is sufficiently good so that we're still O(n log n) no matter what the input is. This version of quicksort doesn't need rand, and it never performs poorly. Still need to show that element picked out is sufficiently good for a pivot.]

Exercises

  1. Write a find-min function and run it on several different inputs to demonstrate its correctness.


Backtracking

Backtracking is a general algorithmic technique that considers searching every possible combination in order to solve an optimization problem. Backtracking is also known as depth-first search or branch and bound. By inserting more knowledge of the problem, the search tree can be pruned to avoid considering cases that don't look promising. While backtracking is useful for hard problems to which we do not know more efficient solutions, it is a poor solution for the everyday problems that other techniques are much better at solving.

However, dynamic programming and greedy algorithms can be thought of as optimizations to backtracking, so the general technique behind backtracking is useful for understanding these more advanced concepts. Learning and understanding backtracking techniques first provides a good stepping stone to these more advanced techniques because you won't have to learn several new concepts all at once.

Backtracking Methodology
  1. View picking a solution as a sequence of choices
  2. For each choice, consider every option recursively
  3. Return the best solution found

This methodology is generic enough that it can be applied to most problems. However, even when taking care to improve a backtracking algorithm, it will probably still take exponential time rather than polynomial time. Additionally, exact time analysis of backtracking algorithms can be extremely difficult: instead, simpler upperbounds that may not be tight are given.

Longest Common Subsequence (exhaustive version)

Note that the solution to the longest common subsequence (LCS) problem discussed in this section is not efficient. However, it is useful for understanding the dynamic programming version of the algorithm that is covered later.

The LCS problem is similar to what the Unix "diff" program does. The diff command in Unix takes two text files, A and B, as input and outputs the differences line-by-line from A and B. For example, diff can show you that lines missing from A have been added to B, and lines present in A have been removed from B. The goal is to get a list of additions and removals that could be used to transform A to B. An overly conservative solution to the problem would say that all lines from A were removed, and that all lines from B were added. While this would solve the problem in a crude sense, we are concerned with the minimal number of additions and removals to achieve a correct transformation. Consider how you may implement a solution to this problem yourself.

The LCS problem, instead of dealing with lines in text files, is concerned with finding common items between two different arrays. For example,

let a := array {"The", "great", "square", "has", "no", "corners"}
let b := array {"The", "great", "image", "has", "no", "form"}

We want to find the longest subsequence possible of items that are found in both a and b in the same order. The LCS of a and b is

"The", "great", "has", "no"

Now consider two more sequences:

let c := array {1, 2, 4, 8, 16, 32}
let d := array {1, 2, 3, 32, 8}

Here, there are two longest common subsequences of c and d:

1, 2, 32; and
1, 2, 8

Note that

1, 2, 32, 8

is not a common subsequence, because it is only a valid subsequence of d and not c (because c has 8 before the 32). Thus, we can conclude that for some cases, solutions to the LCS problem are not unique. If we had more information about the sequences available we might prefer one subsequence to another: for example, if the sequences were lines of text in computer programs, we might choose the subsequences that would keep function definitions or paired comment delimiters intact (instead of choosing delimiters that were not paired in the syntax).

On the top level, our problem is to implement the following function

// lcs -- returns the longest common subsequence of a and b
function lcs(array a, array b): array

which takes in two arrays as input and outputs the subsequence array.

How do you solve this problem? You could start by noticing that if the two sequences start with the same word, then the longest common subsequence always contains that word. You can automatically put that word on your list, and you would have just reduced the problem to finding the longest common subset of the rest of the two lists. Thus, the problem was made smaller, which is good because it shows progress was made.

But if the two lists do not begin with the same word, then one, or both, of the first element in a or the first element in b do not belong in the longest common subsequence. But yet, one of them might be. How do you determine which one, if any, to add?

The solution can be thought in terms of the back tracking methodology: Try it both ways and see! Either way, the two sub-problems are manipulating smaller lists, so you know that the recursion will eventually terminate. Whichever trial results in the longer common subsequence is the winner.

Instead of "throwing it away" by deleting the item from the array we use array slices. For example, the slice

a[1,..,5]

represents the elements

{a[1], a[2], a[3], a[4], a[5]}

of the array as an array itself. If your language doesn't support slices you'll have to pass beginning and/or ending indices along with the full array. Here, the slices are only of the form

a[1,..]

which, when using 0 as the index to the first element in the array, results in an array slice that doesn't have the 0th element. (Thus, a non-sliced version of this algorithm would only need to pass the beginning valid index around instead, and that value would have to be subtracted from the complete array's length to get the pseudo-slice's length.)

// lcs -- returns the longest common subsequence of a and b
function lcs(array a, array b): array
  if a.length == 0 OR b.length == 0:
    // if we're at the end of either list, then the lcs is empty
    
    return new array {}
  else-if a[0] == b[0]:
    // if the start element is the same in both, then it is on the lcs,
    // so we just recurse on the remainder of both lists.
    
    return append(new array {a[0]}, lcs(a[1,..], b[1,..]))
  else
    // we don't know which list we should discard from. Try both ways,
    // pick whichever is better.
    
    let discard_a := lcs(a[1,..], b)
    let discard_b := lcs(a, b[1,..])
    
    if discard_a.length > discard_b.length:
      let result := discard_a
    else
      let result := discard_b
    fi
    return result
  fi
end

Shortest Path Problem (exhaustive version)

To be improved as Dijkstra's algorithm in a later section.

Largest Independent Set

Bounding Searches

If you've already found something "better" and you're on a branch that will never be as good as the one you already saw, you can terminate that branch early. (Example to use: sum of numbers beginning with 1 2, and then each number following is a sum of any of the numbers plus the last number. Show performance improvements.)

Constrained 3-Coloring

This problem doesn't have immediate self-similarity, so the problem first needs to be generalized. Methodology: If there's no self-similarity, try to generalize the problem until it has it.

Traveling Salesperson Problem

Here, backtracking is one of the best solutions known.


Dynamic Programming

Dynamic programming can be thought of as an optimization technique for particular classes of backtracking algorithms where subproblems are repeatedly solved. Note that the term dynamic in dynamic programming should not be confused with dynamic programming languages, like Scheme or Lisp. Nor should the term programming be confused with the act of writing computer programs. In the context of algorithms, dynamic programming always refers to the technique of filling in a table with values computed from other table values. (It's dynamic because the values in the table are fill