INTRODUCTION

Parallel computation offers the promise of great improvements in the solution of problems that, if we were restricted to sequential computation, would take so much time that solution would be impractical.

A drawback to the use of parallel computers is that they are harder to program. For this reason, parallel computation is often restricted to simple problems such as matrix multiplication. Certainly this is useful, and in fact we shall see later some non-obvious uses of matrix manipulation, but many important problems are more complex. In particular, problems may be structured as graphs or trees, rather than in the regular order of a matrix.

We describe a number of techniques that have been developed for solving such combinatorial problems. Our intent is to show how these tools can be used as building blocks for higher-level algorithms, and to provide pointers to the literature for the details of these algorithms. We make no claim to completeness; a number of techniques have been omitted for brevity or because their chief application is not combinatorial. In particular, we give very little attention to sorting, although it is used as a subroutine in a number of the algorithms we describe.

We use a shared memory model of parallelism; nevertheless, we hope that the techniques described will be useful not only on shared memory machines, but also on other types of parallel computers, either through simulations of shared memory [48,70,35,57] or through analogous techniques for different models of parallel computation (e.g., see [30]).
THE MODEL OF PARALLELISM

The model of parallel computation we use is the shared memory parallel random access machine (PRAM). This model consists of a collection of identical processors and a separate collection of memory cells; any processor can access any memory cell in unit time. Processors are allowed to perform any other operation that a typical sequential processor could do. Processors can know the size of their input, and the number of processors is assumed to be a function of that size, but the program controlling the processors is the same for all input sizes.

Practically, one would like a PRAM algorithm to be as efficient as possible; that is, we want to minimize the number of operations, which may be computed as the product of the algorithm's running time and the number of processors. Clearly there must be at least as many operations as the best known sequential time; a parallel algorithm that meets this bound is called optimal. A less restrictive condition is that the number of operations is within a polylogarithmic factor of optimality; we call such algorithms almost optimal.

Theoretically, we would like as small a running time as possible. In particular, an important class of algorithms, NC, is defined to be those taking polylogarithmic time with polynomially many processors.

Many PRAM algorithms are both in NC and almost optimal. For a number of other problems, the best known NC algorithm takes a number of processors that is a small polynomial of the input size, and so for small problem ...
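To make the operation-count measure concrete, the following minimal sketch (in Python; the names pram_sum and its round-based emulation are our own illustrative assumptions, not an algorithm from the literature cited above) simulates a PRAM summing n values by pairwise combination. With p = n/2 processors the summation finishes in ceil(log2 n) synchronous time steps, so its operation count, time times processors, is O(n log n): almost optimal, since the best sequential time is O(n).

    def pram_sum(values):
        """Simulate a synchronous PRAM summing n values in shared memory.

        In time step t (t = 0, 1, ...), processor i adds cell i + 2^t
        into cell i for every active index i; the inner loop below only
        emulates, one processor at a time, what the PRAM performs in a
        single simultaneous step.
        """
        if not values:
            return 0, 0
        cells = list(values)    # the shared memory
        n = len(cells)
        time_steps = 0
        stride = 1              # 2^t, the distance combined at step t
        while stride < n:
            for i in range(0, n - stride, 2 * stride):
                cells[i] += cells[i + stride]   # one processor's unit of work
            stride *= 2
            time_steps += 1
        return cells[0], time_steps

    total, t = pram_sum(list(range(16)))   # total == 120, t == 4 time steps

A standard refinement (Brent's scheduling) first has each of n/log n processors sum a block of log n values sequentially and only then runs the combining tree on the partial sums; the time remains O(log n) while the operation count drops to O(n), yielding an optimal algorithm.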