Kieran Boyce scite author profile

Kieran Boyce

3 Publications

59 Citation Statements Received

36 Citation Statements Given


Affiliations

University College Dublin

Publications


Simple chained guide trees give high-quality protein multiple sequence alignments

Boyce

Sievers

Higgins

2014

Proc. Natl. Acad. Sci. U.S.A.


Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random.

The generation of a multiple sequence alignment (MSA) is standard practice during most comparative analyses of homologous genes or proteins. Since the mid-1980s, most automated MSAs have been made using a heuristic approach that Feng and Doolittle (1) called "progressive alignment." This involves clustering the sequences into a tree or dendrogram-like structure, called a "guide tree" in Higgins et al. (2). This guide tree is then used to align the sequences into progressively larger and larger alignments, following the branching order in the tree. Variations on the method were described by various groups in the 1980s [e.g., Taylor (3) and Barton and Sternberg (4)], but the earliest clear description of the approach is from Hogeweg and Hesper (5). Progressive alignment is a heuristic approach and is not guaranteed to find the best possible alignment for any given scoring scheme. It does, however, allow alignments of many sequences to be made quickly, even on personal computers (6). The quality of the alignments is good enough for the alignments to be used automatically in many analysis pipelines.

The computational complexity of the alignment process, once a guide tree is created, is approximately O(N) for N sequences of the same length. The creation of the guide tree involves comparing all N sequences to each other to generate a distance matrix, which is clearly going to require O(N^2) time and computer memory. Once the distance matrix is made, it will require a further clustering step that is usually O(N^2) but can be more expensive. For large N, the construction of the guide tree becomes limiting and prevents the routine alignment of more than a few thousand sequences. Over the years, various attempts have been made to get around this problem. One solution is to quickly make a crude guide tree initially and to iterate that from an initial MSA. This approach is adopted in the widely used Muscle (7) and Mafft (8) packages. Barton and Sternberg were the first authors to use iteration, but they used a simple "chained" guide tree topology, effectively aligning the sequences one at a time to a growing...
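As a rough illustration of the chained topology described in the abstract above, the following Python sketch builds a fully chained (caterpillar) guide tree as a Newick string directly from a list of sequence names, without computing any distance matrix. The function names, the Newick output format, and the shuffle option are assumptions made for this example; they are not taken from the paper or from any alignment package. The second helper simply counts the all-against-all comparisons that a distance-matrix guide tree would need, to show where the O(N^2) cost comes from.

```python
import random


def chained_guide_tree(names, shuffle=False, seed=None):
    """Return a Newick string for a fully chained (caterpillar) guide tree.

    Sequences are joined one at a time to a growing cluster:
    (((s1,s2),s3),s4)... No sequence comparison is performed, so the
    construction is linear in the number of sequences.
    """
    names = list(names)
    if len(names) < 2:
        raise ValueError("need at least two sequences")
    if shuffle:
        # Hypothetical option: random chaining order, as mentioned in the abstract.
        random.Random(seed).shuffle(names)
    opening = "(" * (len(names) - 1)
    first_pair = f"{names[0]},{names[1]})"
    rest = "".join(f",{name})" for name in names[2:])
    return opening + first_pair + rest + ";"


def pairwise_comparisons(n):
    """Number of sequence pairs a full distance matrix must evaluate."""
    return n * (n - 1) // 2


print(chained_guide_tree(["seqA", "seqB", "seqC", "seqD"]))
# (((seqA,seqB),seqC),seqD);
print(pairwise_comparisons(1000))
# 499500
```

The resulting string could in principle be handed to any aligner that accepts a user-supplied guide tree in Newick format; which packages accept one, and under what option, is left to their documentation.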


Instability in progressive multiple sequence alignment algorithms

Boyce

Sievers

Higgins

2015

Algorithms Mol Biol


Background: Progressive alignment is the standard approach used to align large numbers of sequences. As with all heuristics, this involves a tradeoff between alignment accuracy and computation time.

Results: We examine this tradeoff and find that, because of a loss of information in the early steps of the approach, the alignments generated by the most common multiple sequence alignment programs are inherently unstable, and simply reversing the order of the sequences in the input file will cause a different alignment to be generated. Although this effect is more obvious with larger numbers of sequences, it can also be seen with data sets in the order of one hundred sequences. We also outline the means to determine the number of sequences in a data set beyond which the probability of instability will become more pronounced.

Conclusions: This has major ramifications for both the designers of large-scale multiple sequence alignment algorithms, and for the users of these alignments.
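The reversal test mentioned in the abstract above can be sketched in a few lines of Python. This is only an assumed setup, not the authors' procedure: it reverses the record order of a FASTA file and offers a strict comparison of two aligned outputs, treating alignments as equal only when every sequence receives exactly the same gapped row. The helper names are hypothetical, headers are assumed to be unique, and running the aligner itself is left out.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples, in file order."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def format_fasta(records):
    """Write (header, sequence) tuples back out, one sequence per line."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


def reverse_fasta(text):
    """Return the same FASTA records in reversed input order."""
    return format_fasta(reversed(parse_fasta(text)))


def same_alignment(aligned_a, aligned_b):
    """True if both aligned FASTA texts give every sequence the same gapped row.

    Row order in the output is ignored; headers are assumed to be unique.
    """
    return dict(parse_fasta(aligned_a)) == dict(parse_fasta(aligned_b))


fasta = ">s1\nMKV\n>s2\nMRV\n>s3\nMKL\n"
print(reverse_fasta(fasta), end="")
# >s3
# MKL
# >s2
# MRV
# >s1
# MKL is not repeated here; the last record printed is s1 with MKV
```

A typical use would be to align both the original and the reversed file with the same program and settings, then pass the two outputs to `same_alignment`; a `False` result would indicate the kind of order dependence the abstract describes.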


Reply to Tan et al.: Differences between real and simulated proteins in multiple sequence alignments

Boyce

Sievers

Higgins

2015

Proc. Natl. Acad. Sci. U.S.A.


