This paper presents a general methodology for the communication-efficient parallelization of graph algorithms using the divide-and-conquer approach and shows that this class of problems can be solved in cluster environments with good communication efficiency. Specifically, the first practical parallel algorithm, based on a general coarse-grained model, for finding Hamiltonian paths in tournaments is presented. On any such parallel machines, this algorithm uses only (3 log p + 1), where p is the number of processors, communication rounds, which is independent of the tournament size, and can reuse the existing linear-time algorithm in the sequential setting. For theoretical completeness, the algorithm is revised for fine-grained models, where the ratio of computation and communication throughputs is low or the local memory size, Owith O(log p) communication rounds, while the hidden constant grows with the scalability factor 1/ . Experiments have been carried out on a Linux cluster of 32 Sun Ultra5 computers and an SGI Origin 2000 with 32 R10000 processors. The algorithm performance on the Linux Cluster reaches 75% of the performance on the SGI Origin 2000 when the tournament size is about one million.