AP Computer Science/Sorting

Sorting and searching are two commonly used operations in computer science. For the AP exam, you will be required to know certain methods of sorting through collections, called sorting algorithms, as well as certain ways of searching for given items in collections, called searching algorithms.

Sorting

There are many common algorithms used for sorting through collections. Here are some important ones to know:

Selection Sort

Selection sort is an iterative sort algorithm that uses a "search and swap" approach to sort a collection. For each pass through the collection, the algorithm finds the smallest element to be sorted and swaps it with the first unsorted element in the collection. The algorithm continues in this manner, finding the next smallest element in the collection and swapping it with the next unsorted element. Finally, when just two unsorted elements remain, they are compared (and if necessary, swapped) and the sort is complete.

Efficiency

The efficiency of the selection sort algorithm and the number of comparisons made do not depend on the initial order of the elements in the collection; the same number of comparisons is made, regardless of initial order. For a collection of n elements, the collection is sorted after n-1 passes. After the kth pass, the first k elements have been sorted and are placed in their final sorted positions. In the kth pass, n-k comparisons are made. The total number of comparisons made after the sort is complete is n(n-1)/2. The selection sort algorithm has an efficiency of O(n²) in the best, average, and worst cases. In other words, there is no best or worst case for selection sort; every case is considered average.

Insertion Sort

Insertion sort is also an iterative sort algorithm. When using this algorithm, the first element is "sorted" with respect to itself. The algorithm then takes the second element and compares it with the first, inserting it in front of the first one if necessary. From this point forward, the algorithm takes the next unsorted element and compares it with the sorted elements, inserting the element in the correct position as needed. The algorithm continues in this manner, shifting sorted elements as needed to make room for the next element to be inserted.

Efficiency

The efficiency of the insertion sort algorithm and the number of comparisons made do depend on the initial order of the elements in the collection. For an array of n elements, the array is sorted after n-1 passes. The efficiency differs in the best, average, and worst cases.

The best case for insertion cases occurs if the collection is already initially sorted in order. Each pass through the array will require only one comparison, which will indicate that the element being compared is already in its correct position with respect to the sorted list. No elements will need to be moved. The total number of comparisons made is (n-1). In this case, the insertion sort algorithm has an efficiency of O(n).
The worst case for insertion sort occurs if the collection is initially sorted in reverse order, which will result in the maximum possible number of comparisons and moves being needed to sort the collection. After the kth pass, the first k+1 elements are sorted with respect to each other but are not placed in their final sorted positions. In the kth pass, k comparisons are made. The total number of comparisons made after the sort is complete is n(n-1)/2. In this case, the insertion sort algorithm has an efficiency of O(n²).

Mergesort

Mergesort is a recursive algorithm that uses a "divide and conquer" approach to sorting collections. Each time the algorithm is called, the algorithm checks to see if there is more than one element in the collection, and if there is, the collection is "broken" into two halves. If there is even number of elements, the halves are equal, but if there is an odd number of elements, the left half will contain one more element than the right half. At this point, the algorithm uses recursion, calling itself to first mergesort the left half of the collection, then mergesort the right half. When the algorithm calls itself to mergesort one of the halves, it again "breaks" the half into two halves, then calls itself to sort each half. This process is repeated until the entire collection is "broken" into individual elements, or subcollections of length 1. When the method calls itself to sort one of these, the initial test that sees if there is more than one element in the subcollection fails, since there is only one element in the subcollection. This is where the recursion stops, and the algorithm returns to sorting the half by comparing the two adjacent elements, sorting them, and then recombining them into a sorted subcollection. This process continues until all of the individual elements have been sorted and recombined into sorted subcollections. Then the subcollections themselves are compared, sorted, and recombined. This process continues until the subcollections have been recombined to form two sorted halves. Finally, the left and right halves are compared to each other, sorted, and recombined to create the final, fully sorted collection.

Pseudocode

Here is a pseudocode representation of the mergesort algorithm:

If there is more than one element in the collection
     Break the collection into two halves
     Mergesort the left half
     Mergesort the right half
     Compare the two halves
     Merge the two subcollections into a sorted collection

Efficiency

The efficiency of mergesort is not affected by the initial ordering of the elements. Because mergesort requires the use of temporary collections that are as large as the collections to be sorted, it should not be used if only limited memory space is available. This algorithm consists of multiple parts:

The part of the algorithm that merges the subcollections first compares each element in the subcollections, which is an O(n) process, then copies the elements from the temporary collection back into the original collection, which is also an O(n) process. This makes the actual merging of the algorithm requires a total of 2n operations, making this part of the algorithm an O(n) process. (Remember that constants are ignored in Big-O notation.)
The "breaking" of the collection of n elements into n subcollections of one element each requires log₂n divisions, which is an O(logn) process.

Remember that for each division of the collection into subcollections, the merging part of the algorithm must be called. This makes the mergesort algorithm have an efficiency of O(nlogn) in the best, average, and worst cases. In other words, there is no best or worst case because every case is considered average.

Quicksort

Quicksort is, on average, the fastest sorting algorithm for sorting collections with a large number of elements. Quicksort is recursive and also uses a "divide and conquer" approach to sorting. This algorithm starts by partitioning the collection, selecting a pivot point, which is usually either the first element in the collection or an element selected at random, then moving all elements less than the pivot point value to the left of the pivot point, and all elements greater than or equal to the pivot point value to the right of the pivot point. The algorithm then uses recursion, splitting the array into two halves and calling itself to quicksort each half. Finally, the sort is complete when every element has been moved to its correct location.

Pseudocode

Here is a pseudocode representation of the quicksort algorithm:

If there are at least two elements in the collection
     Partition the collection
     Quicksort the left collection
     Quicksort the right collection

Searching

Linear Search Involves going through each element of a collection until the appropriate value is found. This is often implemented using a simple for loop, where you begin at index 0 and count up to the last index in the collection. This is the simplest search method, and becomes less effective if the collection is very large.

Binary Search Involves going through a collection and comparing the middle index against the value. This requires a sorted collection in order to work.

For Example: You are trying to find the number 12 in an array of 100 integers, with values equal to the index+1 (1,2,3,4,5,..,99,100). Notice that this array is sorted with the smallest values on the left and the largest at the right. You would compare the middle value (50) against the search value (12). Since 50>12, you would exclude all the values 50 to 100 since they are also greater than the search value. You would then check 25 against the search value since that is the new middle between 49 and 1. Once again this value is larger than the search term and so all number 25 to 49 are excluded. This process continues until you finally arrive with 12 as the center value.