Jumpstarting phylogenetic analysis

Mecham, Jesse; Clement, Mark J.; Snell, Quinn; Freestone, Todd; Seppi, Kevin; Crandall, Keith A.

doi:10.1504/ijbra.2006.009191

Cited by 4 publications

(4 citation statements)

References 18 publications


“…Generating trees also costs time and money (Mecham et al 2006). There are several published articles praising on the computation effort invested into generating trees and the use of computer clusters for phylogenetic analyses is growing exponentially because investigators need more sophisticated analyses to avoid problems of local optima when analyzing large data sets.…”

Section: Heuristic Methods and Efficient Tree Searches—strategies; citation type: mentioning

confidence: 99%

“…Here, recycling of previous analyses is proposed as starting points for new phylogenetic analysis, even if the previous analyses contain fewer taxa than the newly analyzed data sets. This strategy has been recently called “jumpstarting phylogenetics” (Mecham et al 2006). …”

Section: Heuristic Methods and Efficient Tree Searches—strategies; citation type: mentioning

confidence: 99%

“…While the strategy discussed in the previous section could be called constrained searches, the strategy described here could be referred to as pre-processed searches. Although this technique has been in use for a number of years by POY users (since the incorporation of the command –terminalsfile), it has also been described in a more general context recently as a mode of “jumpstarting” phylogenetic analyses (Mecham et al 2006) (see flowchart for a preprocessed search in Fig. 3).…”

Section: Heuristic Methods and Efficient Tree Searches—strategies; citation type: mentioning

confidence: 99%


Efficient Tree Searches with Available Algorithms

Giribet

2007

Evol Bioinform Online


Phylogenetic methods based on optimality criteria are highly desirable for their logic properties, but time-consuming when compared to other methods of tree construction. Traditionally, researchers have been limited to exploring tree space by using multiple replicates of Wagner addition followed by typical hill climbing algorithms such as SPR and/or TBR branch swapping, but these methods have been shown to be insufficient for "large" data sets (or even for small data sets with a complex tree space). Here, I review different algorithms and search strategies used for phylogenetic analysis with the aim of clarifying certain aspects of this important part of the phylogenetic inference exercise. The techniques discussed here apply to both major families of methods based on optimality criteria (parsimony and maximum likelihood) and allow the thorough analysis of complex data sets with hundreds to thousands of terminal taxa. A new technique, called pre-processed searches, is proposed for reusing phylogenetic results obtained in previous analyses, to increase the applicability of the previously proposed jumpstarting phylogenetics method. This article is aimed to serve as an educational and algorithmic reference to biologists interested in phylogenetic analysis.

Rationale

In phylogenetic analysis, numerical methods are preferred over other methods because of their efficiency and repeatability. Within numerical methods, those based on optimality criteria are to be preferred because they allow for hypothesis testing and tree comparisons based on objective measures. However, methods based on optimality criteria are more time consuming than most other numerical methods (e.g. UPGMA, neighbor-joining). The reason for this is simple: in order to choose an optimal solution, multiple trees need to be compared. The two main optimality criteria are parsimony and maximum likelihood. While their limits on efficient searches differ due to the computation requirements by each method (e.g. Sanderson and Kim, 2000; Goloboff, 2003), the issues discussed in this article apply, at least in principle, to both methodologies.

Finding the optimal tree(s) for a given optimality criterion (the so-called "tree search") is an NP-complete problem (Chor and Tuller, 2005), a problem that is unlikely to have a solution in polynomial time. Tree searches are difficult due to the exponential growth of possible trees when increasing the number of terminals (OTUs) (Felsenstein, 1978). If a method were to compare all the possible trees using an explicit enumeration technique, an optimality value (tree length for parsimony or −lnL score for maximum likelihood) would be assigned to each tree and those that optimize the selected criterion would be chosen. However, explicit enumeration is not a very efficient method and there are many algorithmic speedups that will find the optimal solution without the burden of evaluating all possible trees. An alternative solution to explicit enumeration is the use of shortcuts that guarantee finding all optimal trees. ...
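The growth in the number of possible trees that this abstract refers to can be made concrete with a small calculation. The sketch below is illustrative only and is not taken from the cited paper: it assumes fully bifurcating, unrooted trees with labeled terminals, for which the standard count is (2n − 5)!! for n ≥ 3 terminals. The function name `num_unrooted_binary_trees` is hypothetical.

```python
def num_unrooted_binary_trees(n_taxa: int) -> int:
    """Count distinct unrooted, fully bifurcating trees on n labeled terminals.

    Assumption (not from the source page): uses the standard double-factorial
    formula (2n - 5)!! = 3 * 5 * ... * (2n - 5), valid for n >= 3.
    """
    if n_taxa < 1:
        raise ValueError("n_taxa must be a positive integer")
    if n_taxa < 3:
        # With one or two terminals there is only a single possible tree.
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Print the count for a few data set sizes to show how fast it grows.
    for n in (4, 5, 10, 20, 50):
        print(f"{n:>3} terminals: {num_unrooted_binary_trees(n)} unrooted trees")
```

Under these assumptions the count grows faster than exponentially with the number of terminals. This is why explicit enumeration is impractical beyond small data sets, and why the heuristic and reuse-based strategies discussed in the citing papers matter.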


“…This can be thought of as 'jump-starting' alignment (cf. Mecham et al 2006), where all of the hard work done to produce previous alignments is not wasted but is instead used as the starting point for later work. Personally, I (and others, e.g.…”

Section: Incorporating Structure Information Into Alignment; citation type: mentioning

confidence: 99%

Multiple sequence alignment for phylogenetic purposes

Morrison

2006

Aust. Systematic Bot.



I have addressed the biological rather than bioinformatics aspects of molecular sequence alignment by covering a series of topics that have been under-valued, particularly within the context of phylogenetic analysis. First, phylogenetic analysis is only one of the many objectives of sequence alignment, and the most appropriate multiple alignment may not be the same for all of these purposes. Phylogenetic alignment thus occupies a specific place within a broader context. Second, homology assessment plays an intricate role in phylogenetic analysis, with sequence alignment consisting of primary homology assessment and tree building being secondary homology assessment. The objective of phylogenetic alignment thus distinguishes it from other sorts of alignment. Third, I summarise what is known about the serious limitations of using phenetic similarity as a criterion for automated multiple alignment, and provide an overview of what is currently being done to improve these computerised procedures. This synthesises information that is apparently not widely known among phylogeneticists. Fourth, I then consider the recent development of automated procedures for combining alignment and tree building, thus integrating primary and secondary homology assessment. Finally, I outline various strategies for increasing the biological content of sequence alignment procedures, which consists of taking into account known evolutionary processes when making alignment decisions. These procedures can be objective and repeatable, and can involve computerised algorithms to automate much of the work. Perhaps the most important suggestion is that alignment should be seen as a process where new sequences are added to a pre-existing alignment that has been manually curated by the biologist.


Increasing the Efficiency of Searches for the Maximum Likelihood Tree in a Phylogenetic Analysis of up to 150 Nucleotide Sequences

Morrison

2007

Systematic Biology


Even when the maximum likelihood (ML) tree is a better estimate of the true phylogenetic tree than those produced by other methods, the result of a poor ML search may be no better than that of a more thorough search under some faster criterion. The ability to find the globally optimal ML tree is therefore important. Here, I compare a range of heuristic search strategies (and their associated computer programs) in terms of their success at locating the ML tree for 20 empirical data sets with 14 to 158 sequences and 411 to 120,762 aligned nucleotides. Three distinct topics are discussed: the success of the search strategies in relation to certain features of the data, the generation of starting trees for the search, and the exploration of multiple islands of trees. As a starting tree, there was little difference among the neighbor-joining tree based on absolute differences (including the BioNJ tree), the stepwise-addition parsimony tree (with or without nearest-neighbor-interchange (NNI) branch swapping), and the stepwise-addition ML tree. The latter produced the best ML score on average but was orders of magnitude slower than the alternatives. The BioNJ tree was second best on average. As search strategies, star decomposition and quartet puzzling were the slowest and produced the worst ML scores. The DPRml, IQPNNI, MultiPhyl, PhyML, PhyNav, and TreeFinder programs with default options produced qualitatively similar results, each locating a single tree that tended to be in an NNI suboptimum (rather than the global optimum) when the data set had low phylogenetic information. For such data sets, there were multiple tree islands with very similar ML scores. The likelihood surface only became relatively simple for data sets that contained approximately 500 aligned nucleotides for 50 sequences and 3,000 nucleotides for 100 sequences. The RAxML and GARLI programs allowed multiple islands to be explored easily, but both programs also tended to find NNI suboptima. 
A newly developed version of the likelihood ratchet using PAUP* successfully found the peaks of multiple islands, but its speed needs to be improved.

