Relearning C - Mastering Algorithms with C - Chapter 4


Chapter 4 - Analysis of Algorithms

This chapter covers:

Worst-Case Analysis

Most algorithms do not perform the same in all cases; normally an algorithm’s performance varies with the data passed to it. Typically, three cases are recognised: the best case, worst case, and average case. For any algorithm, understanding what constitutes each of these cases is an important part of analysis because performance can vary significantly between them. Consider even a simple algorithm such as linear search. Linear search is a natural but inneficient search technique in which we look for an element simply by traversing a set from one end to the other. In the best case, the element we are looking for is the first element we inspect, so we end up traversing only a single element. In the worst case, however, the desired element is the last one we inspect, in which case we end up traversing all of the elements. On average, we can expect to find the element somewhere in the middle.

Reasons for Worst-Case Analysis

A basic understanding of how an algorithm performs in all cases is important, but usually we are most interested in how an algorithm performs in the worst case. There are four reasons why algorithms are generally analysed by their worst case: - Many algorithms perform to their worst case a large part of the time. For example, the worst case in searching occurs when we do not find what we are looking for at all. Imagine how frequently this takes place in some database applications. - The best case is not very informative because many algorithms perform exactly the same in the best case. For example, nearly all searching algorithms can locate an element in one inspection at best, so analysing this case does not tell us much. - Determining average-case performance is not always easy Often it is difficult to determine exactly what the “average case” even is. Since we can seldom guarantee precisely how an algorithm will be exercised, usually we cannot obtain an average-case measurement that is likely to be accurate. - The worst case gives us an upport bound on performance. Analysing an algorithm’s worst case guarantees that it will never perform worse than what we determine. Therefore, we know that the other cases must perform at least as well.

Although worst-case analysis is the metric for many algorithms, it is worth noting that there are exceptions. Sometimes special circumstances let us base performance on the average case. For example, randomised algorithms such as quicksort use principles of probability to virtually guarantee average-case performance.

\(\mathcal{O}\)-Notation

\(\mathcal{O}\)-notation is the most common notation used to express an algorithm’s performance in a formal manner. Formally, \(\mathcal{O}\)-notation expresses the upper bound of a function within a constant factor. Specifically, if \(g(n)\) is an upper bound of \(f(n)\), then for some constant \(c\), it is possible to find a value of \(n\), call it \(n_0\), for which any value of \(n >= n_0\) will result in \(f(n) <= cg(n)\).

Normally we express an algorithm’s performance as a function of the size of the data it processes. That is, for some data of size \(n\), we describe its performance with some function \(f(n)\). However, while in many cases we can determine \(f\) exactly, usually it is not necessary to be this precise Primarily we are interested only in the growth rate of \(f\), which describes how quickly the algorithm’s performance will degrade as the size of the data it processes becomes arbitrarily large. An algorithm’s growth rate, or order of growth, is significant because ultimately it describes how efficient the algorithm inputs. \(\mathcal{O}\)-notation reflects an algorithm’s order of growth.

Simple Rules for \(\mathcal{O}\)-Notation

When we look at some function \(f(n)\) in terms of its growth rate, a few things become apparent. First, we can ignore constant terms because as the value of n becomes larger and larger, eventually constant terms will become insignificant. Second, we can ignore constant multipliers of terms because they too will become insignificant as the value of \(n\) increases. Finally, we need only consider the highest-order term because, again, as \(n\) increases, higher-order terms quickly outweight the lower-order ones. These ideas are formalised in the following simple rules for expressing functions in \(\mathcal{O}\)-notation.