Searching algorithms

From Wikibooks, open books for an open world
Jump to navigation Jump to search

PAPER 1 - ⇑ Fundamentals of algorithms ⇑

← Reverse Polish Searching algorithms Sorting algorithms →


A searching algorithm looks for a given item in a given data structure. The algorithm used depends on how the data is structured.

Linear Search[edit | edit source]

If you have a list (or array) that is not sorted, then the simplest searching algorithm is linear search: go through the list item by item and compare to the searched item. If a comparison succeeds, the algorithm has found the item. If all comparisons fail, the item doesn't exist in the array or list.

In the simplest variant, the algorithm returns a boolean to indicate success or failure. Here is the pseudo-code:

for each item in the list:
    if that item has the desired value then
        stop the search and return true
return false

which can be directly translated to Python:

def exists(soughtItem, aList):
  """Return True if and only if soughtItem occurs in aList."""
  for item in aList:
    if item == soughtItem:
      return True
  return False

# automatic tests
assert exists(2, [2, 1, 3]) # sought item in first position
assert exists(3, [2, 1, 3]) # sought item in last position
assert not exists(3, []) # list is empty
assert not exists(0, [2, 1, 3]) # sought item doesn't exist

A second variant returns the position of the item in the list, if it exists. If it doesn't, the algorithm returns an impossible position, like -1. Here's the pseudo-code:

For each position in the list:
    If the item at that position has the desired value then
        stop the search and return the position
Return -1

Here is the Python code:

def index(soughtItem, aList):
  """Return the position of soughtItem in aList if it exists, otherwise return -1."""
  for position in range(len(aList)):
    if aList[position] == soughtItem:
      return position
  return -1

# automatic tests
assert position(2, [2, 1, 3]) == 0 # sought item in first position
assert position(3, [2, 1, 3]) == 2 # sought item in last position
assert position(3, []) == -1 # list is empty
assert position(0, [2, 1, 3]) == -1 # sought item doesn't exist

The following complete VB program asks the user for a letter and searches it in an array.

dim items() = {"h","g","a","d","w","n","o","q","l","b","c"}
dim searchItem as string

console.write("What are you searching for: ")
searchItem = console.readline()

For x = 0 to 10
  If items(x) = searchItem Then
    console.writeline("Found item " & searchItem & " at position " & x)
    Exit For
  End If
Next
console.writeline(-1)

Try the code above searching for letter "w" and then for letter "z":

   Code Output

What are you searching for: w
Found item w at position 4

   Code Output

What are you searching for: z
-1

Exercise: Linear Search

Consider the list of strings "Cat","Mouse","Frog","Lion","Panda","Llama","Bee", in this order. How many comparisons would it take to find "Panda"?

Answer:

5

And how many when searching for "Camel"?

Answer:

It would take 7 to reach the conclusion the string is not in the list.

Make a trace table for the code above, where searchItem = "d".

Answer:

searchItem x Output item
0 1 2 3 4 5 6 7 8 9 10
d 0 h g a d w n o q l b c
1
2
3 Found item d at position 3

For a list with items, what is the maximum number of comparisons it would take to see if an item is there or not?

Answer:

It would take comparisons if the item is in the last position or not at all in the list.

Binary Search[edit | edit source]

Binary Search
ClassSearch algorithm
Data structureArray
Worst case performanceO(log n)
Best case performanceO(1)
Average case performanceO(log n)
Worst case space complexityO(1)

As the last question points out, a linear search may take as many comparisons as there are items in the list, so searching for a name among a list of several million names (e.g. the electoral register of a country) could take a very long time.

If your list was put in ascending order by a sorting algorithm then you can perform a binary search. This involves splitting the data into half at each comparison, thereby 'zooming in' more quickly into the part of the list where the item must be, if it exists in the list. This results in much better performance, as the side box shows (the Big-O notation is explained in Unit 4).

  • Let's search for the name Miles in the following sorted list:
Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne
  • We compare Miles to the middle name, Mohammed:
Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne
  • Miles comes alphabetically before Mohammed, so we know that Miles won't be to the right of Mohammed. We can thus 'throw away' the right half of the list:
Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne
  • We now compare Miles to the middle name in the remaining list, Bernie:
Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne
  • Miles comes alphabetically after Bernie, so we can throw the left hand side away:
Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne
  • Finally we compare Miles to the middle name of this single item list, Claire:
Ali, Bernie, Claire, Mohammed, Peter, Simon, Yvonne
  • Miles isn't the same as Claire, there are no more items to compare so we know that Miles isn't in the list.

This only took 3 comparisons using binary search, it would have taken 7 using linear search. It gets even better when the list is large. For example, a 1,000,000,000 item list would only take a maximum of 30 comparisons using binary search! It's very useful to have sorted data.

Exercise:Linear vs Binary Search

Sorted data is also useful for linear search. How could a modified linear search algorithm make fewer than 7 comparisons when searching Miles?

Answer:

The modified linear search would know after 4 comparisons (against Ali, Bernie, Claire and Mohammed) that Miles is not in the sorted list, because Miles should have appeared before Mohammed.

For the list of 7 names shown above, can you think of a case where linear search is faster than binary search?

Answer:

If we search for the first item in the list, Ali, binary search still takes 3 comparisons (against Mohammed, Bernie and Ali) but linear search only needs 1 comparison.

For linear search of a large list, the best case is if the sought item is in the first position. What is the best case for binary search of a large list?

Answer:

Binary search only requires 1 comparison If the sought item is in the middle of the list.

After each unsuccessful comparison, binary search reduces the search space by half. The sublist that is being searched can be represented by two integers, with the start and end positions of the sublist. The Python code is:

def index(soughtItem, sortedList):
  """Return the position of soughtItem in sortedList if it exists, otherwise return -1.
  sortedList must be in ascending order."""
  # Initially, the sublist is the whole list of N items, from positions 0 to N-1
  start = 0
  end = len(sortedList) - 1
  while start <= end:                    # while the sublist is not empty
    middle = (start + end) // 2
    if soughtItem == sortedList[middle]: # the item is in the middle of the sublist
        return middle
    if soughtItem >  sortedList[middle]: # the item is in the right half
        start = middle + 1  
    if soughtItem <  sortedList[middle]: # the item is in the left half
        end = middle - 1   
  return -1                              # empty sublist, the item doesn't exist
  
# tests
assert index(3, [1, 2, 3]) == 2
assert index(1, [1, 2, 3]) == 0
assert index(1, []) == -1
assert index(0, [1, 2, 3]) == -1


UNIT 3 - ⇑ Fundamentals of algorithms ⇑

← Reverse Polish Searching algorithms Sorting algorithms →