Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random.

The generation of a multiple sequence alignment (MSA) is standard practice during most comparative analyses of homologous genes or proteins. Since the mid-1980s, most automated MSAs have been made using a heuristic approach that Feng and Doolittle (1) called "progressive alignment." This involves clustering the sequences into a tree or dendrogram-like structure, called a "guide tree" in Higgins et al. (2). This guide tree is then used to align the sequences into progressively larger and larger alignments, following the branching order in the tree. Variations on the method were described by various groups in the 1980s [e.g., Taylor (3) and Barton and Sternberg (4)], but the earliest clear description of the approach is from Hogeweg and Hesper (5). Progressive alignment is a heuristic approach and is not guaranteed to find the best possible alignment for any given scoring scheme. It does, however, allow alignments of many sequences to be made quickly, even on personal computers (6).
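To make the notion of a chained guide tree concrete, the following is a minimal sketch, not the construction used by any particular alignment package. It assumes the tree is written in Newick format and that sequences are identified by plain labels; the function name `chained_guide_tree` and its `shuffle` and `seed` parameters are hypothetical and chosen only for illustration.

```python
import random


def chained_guide_tree(names, shuffle=False, seed=None):
    """Return a Newick string for a fully chained (caterpillar) guide tree.

    The first two sequences are joined, and every later sequence is added
    one at a time to the growing group. No distance matrix is computed.
    """
    order = list(names)
    if len(order) < 2:
        raise ValueError("a guide tree needs at least two sequences")
    if shuffle:
        # Optional random ordering of the leaves (illustrative only).
        random.Random(seed).shuffle(order)

    tree = f"({order[0]},{order[1]})"
    for name in order[2:]:
        # Each step nests the previous subtree one level deeper.
        tree = f"({tree},{name})"
    return tree + ";"


print(chained_guide_tree(["A", "B", "C", "D"]))  # prints (((A,B),C),D);
```

Read from the innermost bracket outward, the string gives the alignment order: A with B, then C added to that pair, then D added to the group of three. Building the tree takes a single pass over the labels, since no sequence comparisons are needed.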
The quality of the alignments is good enough for the alignments to be used automatically in many analysis pipelines.

The computational complexity of the alignment process, once a guide tree is created, is approximately $O(N)$ for N sequences of the same length. The creation of the guide tree involves comparing all N sequences to each other to generate a distance matrix, which is clearly going to require $O(N^2)$ time and computer memory. Once the distance matrix is made, it will require a further clustering step that is usually $O(N^2)$ but can be more expensive. For large N, the construction of the guide tree becomes limiting and prevents the routine alignment of more than a few thousand sequences. Over the years, various attempts have been made to get around this problem. One solution is to quickly make a crude guide tree initially and to iterate that from an initial MSA. This approach is adopted in the widely used Muscle (7) and Mafft (8) packages. Barton and Sternberg were the first authors to use iteration, but they used a simple "chained" guide tree topology, effectively aligning the sequences one at a time to a growing...