Skip to main content

Example 2: Analysis of Binary Search

The analysis of a binary search is not the same as that of linear search because the loop of a binary search does not follow the pattern of going from the start of the array all the way to the end. It starts in the middle of an array and jump around. Not every element will be considered during the search process so this will be a bit different.

info

A binary search can only be done to lists if two criteria are true:

  1. the list must be sorted
  2. access to any element given its position must have a constant run time.
def binary_search(my_list, key):
rc = -1
low = 0
high = len(my_list) - 1

while rc == -1 and low <= high:
mid = (low + high) // 2
if key < my_list[mid]:
high = mid - 1
elif key > my_list[mid]:
low = mid + 1
else:
rc = mid

return rc

In the above code, there are two variables low and high. These represent the first and last index of the elements where key may be located. In other words, the only possible places key may exist is somewhere between my_list[low] and my_list[high] inclusive. In each iteration of the while loop we check the middle element between them. If key is smaller than the middle element, we move the high marker so that it is to one element before the middle one. If key is bigger, we move the low marker one element after the middle one. In other words we eliminate half of the remaining possibilities of key's location after each iteration of the while loop. We stop when we either find it or we have eliminated all possible locations for key.

Similar to a linear search we should analyze the run time for what happens in its worst case. In this case, it is also performing an unsuccessful search

Let n represent the size of the array

Let T(n) represent the number of operations needed to find key within the array

In a binary search the following operators are done exactly one time:

    rc = -1                   # 1
low = 0 # 1
high = len(my_list) - 1 # 1 operator here + 1 function call + 1 operation for assignment

return rc # 1

The following are done for each iteration within the loop:

    rc == -1 and low <= high   # 3 ops per iteration of loop
mid = (low + high) // 2 # 3 ops per iteration of loop

Within the loop there is also an if/else if/else statement. there are 3 paths through this statement. The path that would would lead the the highest operation count would be to follow the else-if. The reason is that we would end up doing both checks. and there are more operations in the else-if than the else statement. Thus, we should use that pathway

if key < my_list[mid]:          # <---check this (1 op)
high = mid - 1 # <---skip, check fails
elif key > my_list[mid]: # <---check (1 op)
low = mid + 1 # <---do this (2 ops)
else: # <---skip
rc = mid # <---skip

Thus what we can say is that each time the loop runs, it will perform 10 operations at worst

Number of Loop Iterations

The next question then is to figure out how many times this loop will run. We are of course assuming that the search will fail. Thus, the only way for this loop to stop is for low<=high to become untrue.

When we started off we have n elements to search. After one iteration through the loop we have approximately half of the original number of elements. After one more iteration we eliminate another half of the remaining elements.

the last iteration would involve only 1 element because low == high == mid, after that we would end up with low > high

Lets look at the pattern of the relationship between number elements within the search set and the iteration number.

Loop iterationNumber of elements between low and high inclusive (approx)Denominator
1n=n1n = {n \over 1}1=201 = 2^0
2n2n\over22=212 = 2^1
3n2/2=n4{n\over2}/2 = {n\over4}4=224 = 2^2
4n4/2=n8{n\over4}/2 = {n\over8}8=238 = 2^3
5n8/2=n16{n\over8}/2 = {n\over16}16=2416 = 2^4
.........
??1=nn1 = {n \over n}n=2?n = 2^ ?

Now look at the exponent 2 was raised to in the denominator and the relationship with its loop iteration. If we can find what we need to raise 2 to in order to get n we will be able to find the number of iterations the loop will run.

By definition:

bx=a​{b^x} = a iff logba=xlog_ba = x

In other words, xx is the exponent we need to raise bb to in order to get aa . That value is logbalog_ba. Applied to our situation, the exponent we must raise 22 to in order to get nn would be log2nlog_2n.

Our loop runs one time more than the exponent... thus our loop will run 1+logn1+logn time.

Putting these ideas together, we can summarize the operation counts as follows:

def binary_search(my_list, key):
rc = -1 # 1
low = 0 # 1
high = len(my_list) - 1 # 3

while rc == -1 and low <= high: # 3 * (1 + log_n)
mid = (low + high) // 2 # 3 * (1 + log_n)
if key < my_list[mid]: # (1 + log_n)
high = mid - 1
elif key > my_list[mid]: # (1 + log_n)
low = mid + 1 # 2 * (1 + log_n)
else:
rc = mid

return rc # 1

Adding it all up we find the following expression for T(n)T(n)

T(n)=10(1+logn)+6=10+10logn+6=(10(logn))+16\begin{aligned} T(n) &= 10 ( 1 + logn) + 6 \\ &= 10 + 10logn + 6 \\ &= (10( log n)) + 16 \end{aligned}

The dominating term is 10(logn)10(logn) . Thus, T(n)T(n) is O(logn)O(logn)