The statistical selection of best-fit models of nucleotide substitution is routine in the phylogenetic analysis of DNA sequence alignments. The programs ModelTest 1 and jModelTest 2 are very popular tools to accomplish this task, with thousands of users and citations. The latter uses PhyML 3 to obtain maximum likelihood estimates of model parameters, and implements different statistical criteria to select among 88 models of nucleotide substitution, including hierarchical and dynamical likelihood ratio tests, Akaike's and Bayesian information criteria (AIC and BIC) and a performance-based decision theory method (see ref. 4 ). jModelTest also provides estimates of model selection uncertainty, parameter importances and model-averaged parameter estimates, including model-averaged phylogenies 4 .However, in recent years the advent of NGS technologies has changed the field, and most researchers are now moving from phylogenetics to phylogenomics, where large sequence alignments typically include hundreds or thousands of loci. Phylogenetic resources therefore need to be adapted to a High Performance Computing (HPC) paradigm, allowing demanding analyses at the genomic level. Here we introduce jModelTest 2, which incorporates more models, new heuristics, efficient technical optimizations and multithreaded and MPI-based implementations for statistical model selection. jModelTest 2 includes several important new features (Supplementary Table 1). We have expanded the set of candidate models from 88 to 1624, resulting from the consideration of the 203 different partitions of the 4 ×4 nucleotide substitution rate matrix (R-matrix) combined with rate variation among sites and equal/unequal base frequencies. Indeed, likelihood computations for a large number of models or for large data sets can be extremely time-consuming, so we have also implemented two different heuristics for the selection of the best-fit model. The first one is a greedy hill-climbing hierarchical clustering that searches the set of 1624 models optimizing at most 288 models (Supplementary Note 1) with almost the same accuracy as an exhaustive search. The second is a heuristic filtering dposada@uvigo.es
MotivationPhylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets.ResultsWe present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQTree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric.Availability and implementationThe code is available under GNU GPL at . RAxML-NG web service (maintained by Vital-IT) is available at .Supplementary information Supplementary data are available at Bioinformatics online.
Supplementary data are available at Bioinformatics online.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.