Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site dN/dS ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com. [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.]
Bayesian analysis of macroevolutionary mixtures (BAMM) has recently taken the study of lineage diversification by storm. BAMM estimates the diversification-rate parameters (speciation and extinction) for every branch of a study phylogeny and infers the number and location of diversification-rate shifts across branches of a tree. Our evaluation of BAMM reveals two major theoretical errors: (i) the likelihood function (which estimates the model parameters from the data) is incorrect, and (ii) the compound Poisson process prior model (which describes the prior distribution of diversification-rate shifts across branches) is incoherent. Using simulation, we demonstrate that these theoretical issues cause statistical pathologies; posterior estimates of the number of diversification-rate shifts are strongly influenced by the assumed prior, and estimates of diversification-rate parameters are unreliable. Moreover, the inability to correctly compute the likelihood or to correctly specify the prior for rate-variable trees precludes the use of Bayesian approaches for testing hypotheses regarding the number and location of diversification-rate shifts using BAMM. Evolutionary biologists have long sought to detect patterns and understand the causes of variation in rates of lineage diversification (speciation − extinction). This has motivated the development of several statistical methods for detecting whether (and where) diversification rates have changed across the branches of a phylogeny (1-4). A recent approach, Bayesian analysis of macroevolutionary mixtures (BAMM) (5), promises to greatly enhance our ability to study this problem. This important new method offers several key advantages. (i) BAMM is based on an explicit model that describes how diversification rates shift across the branches of a tree. (ii) The underlying branching process is more complex (and presumably more realistic) than those used in previous methods.
Specifically, BAMM not only includes parameters for the rate of speciation and extinction, but also accommodates possible time-dependent effects (where the age of a lineage may affect its diversification rate). This is intended to approximate the phenomenon of diversity-dependent diversification (where the number of species in a lineage may affect its diversification rate), which is believed to be a prevalent feature of empirical phylogenies (6). (iii) By virtue of developing this method in a Bayesian statistical framework, BAMM allows us to gauge the uncertainty in our inferences by providing marginal posterior probability densities rather than point estimates of parameters. (iv) By averaging inferences over any number of diversification-rate shifts, BAMM both accommodates uncertainty in the choice of model and avoids potential complications associated with model selection. BAMM provides estimates of the number and location of diversification-rate shifts across the branches of a tree and also estimates the diversification-rate parameters (speciation, extinction, and time dependence) on each branch of the tree. ...
Increasingly, large data sets pose a challenge for computationally intensive phylogenetic methods such as Bayesian Markov chain Monte Carlo (MCMC). Here, we investigate the performance of common MCMC proposal distributions in terms of median and variance of run time to convergence on 11 data sets. We introduce two new Metropolized Gibbs Samplers for moving through "tree space." MCMC simulation using these new proposals shows faster average run time and dramatically improved predictability in performance, with a 20-fold reduction in the variance of the time to estimate the posterior distribution to a given accuracy. We also introduce conditional clade probabilities and demonstrate that they provide a superior means of approximating tree topology posterior probabilities from samples recorded during MCMC.
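The idea of conditional clade probabilities can be illustrated with a minimal Python sketch. The abstract does not give the formula, so the estimator below is an assumption: it scores a tree as the product, over its internal nodes, of how often each clade was split into those two child clades among the sampled trees that contain the clade. The nested-tuple tree encoding, the function names, and the six-taxon sample are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def clade_and_splits(tree):
    """Return (leaf set, splits) for a binary tree given as nested 2-tuples.

    Each split is a pair (parent clade, unordered pair of child clades).
    """
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    left, right = tree
    left_set, left_splits = clade_and_splits(left)
    right_set, right_splits = clade_and_splits(right)
    parent = left_set | right_set
    split = (parent, frozenset([left_set, right_set]))
    return parent, left_splits + right_splits + [split]

def ccp_estimator(sampled_trees):
    """Build a function that scores any tree from clade/split frequencies."""
    clade_counts = Counter()
    split_counts = Counter()
    for tree in sampled_trees:
        for parent, children in clade_and_splits(tree)[1]:
            clade_counts[parent] += 1
            split_counts[(parent, children)] += 1

    def probability(tree):
        p = 1.0
        for parent, children in clade_and_splits(tree)[1]:
            if clade_counts[parent] == 0:
                return 0.0  # clade never observed in the sample
            p *= split_counts[(parent, children)] / clade_counts[parent]
        return p

    return probability

# Hypothetical sample of two trees on six taxa.
s1 = ((("A", "B"), "C"), (("D", "E"), "F"))
s2 = ((("A", "C"), "B"), (("D", "F"), "E"))
ccp = ccp_estimator([s1, s2])

# A tree that was never sampled but combines subtrees that were.
unsampled = ((("A", "B"), "C"), (("D", "F"), "E"))
p_unsampled = ccp(unsampled)
p_s1 = ccp(s1)
```

In this toy sample the unsampled tree has a simple sample frequency of zero, yet the sketch gives it a nonzero score because each of its clades was seen in some sampled tree. That is one way to read the abstract's claim that the approach approximates topology posteriors better than raw frequencies.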
In macroevolution, the Red Queen (RQ) model posits that biodiversity dynamics depend mainly on species-intrinsic biotic factors such as interactions among species or life-history traits, while the Court Jester (CJ) model states that extrinsic environmental abiotic factors have a stronger role. Until recently, a lack of relevant methodological approaches has prevented the unraveling of contributions from these 2 types of factors to the evolutionary history of a lineage. Herein, we take advantage of the rapid development of new macroevolution models that tie diversification rates to changes in paleoenvironmental (extrinsic) and/or biotic (intrinsic) factors. We inferred a robust and fully-sampled species-level phylogeny, as well as divergence times and ancestral geographic ranges, and related these to the radiation of Apollo butterflies (Parnassiinae) using both extant (molecular) and extinct (fossil/morphological) evidence. We tested whether their diversification dynamics are better explained by an RQ or CJ hypothesis, by assessing whether speciation and extinction were mediated by diversity-dependence (niche filling) and clade-dependent host-plant association (RQ) or by large-scale continuous changes in extrinsic factors such as climate or geology (CJ). For the RQ hypothesis, we found significant differences in speciation rates associated with different host-plants but detected no sign of diversity-dependence. For CJ, the role of Himalayan-Tibetan building was substantial for biogeography but not a driver of high speciation, while positive dependence between warm climate and speciation/extinction was supported by continuously varying maximum-likelihood models. We find that rather than a single factor, the joint effect of multiple factors (biogeography, species traits, environmental drivers, and mass extinction) is responsible for current diversity patterns and that the same factor might act differently across clades, emphasizing the notion of opportunity. 
This study confirms the importance of the confluence of several factors rather than single explanations in modeling diversification within lineages.
The birth-death process is widely used in phylogenetics to model speciation and extinction. Recent studies have shown that the inferred rates are sensitive to assumptions about the sampling probability of lineages. Here, we examine the effect of the method used to sample lineages. Whereas previous studies have assumed random sampling (RS), we consider two extreme cases of biased sampling: "diversified sampling" (DS), where tips are selected to maximize diversity, and "cluster sampling" (CS), where sample diversity is minimized. DS appears to be standard practice, for example, in analyses of higher taxa, whereas CS may occur under special circumstances, for example, in studies of geographically defined floras or faunas. Using both simulations and analyses of empirical data, we show that inferred rates may be heavily biased if the sampling strategy is not modeled correctly. In particular, when a diversified sample is treated as if it were a random or complete sample, the extinction rate is severely underestimated, often close to 0. Such dramatic errors may lead to serious consequences, for example, if estimated rates are used in assessing the vulnerability of threatened species to extinction. Using Bayesian model testing across 18 empirical data sets, we show that DS is commonly a better fit to the data than complete, random, or cluster sampling. Inappropriate modeling of the sampling method may at least partly explain anomalous results that have previously been attributed to variation over time in birth and death rates.
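A small Python sketch may help make diversified sampling concrete. The abstract says only that tips are selected to maximize diversity. The formalization below is an assumption: a diversified sample of n tips is taken to retain the n − 1 oldest divergence events of the complete tree. The function name and the example ages are hypothetical.

```python
def diversified_sample_ages(divergence_ages, n_tips):
    """Divergence ages retained when n_tips are chosen to maximize diversity.

    Assumes (for illustration) that a diversified sample keeps the
    n_tips - 1 oldest divergence events of the complete tree.
    """
    if not 2 <= n_tips <= len(divergence_ages) + 1:
        raise ValueError("n_tips must be between 2 and the number of tips")
    return sorted(divergence_ages, reverse=True)[: n_tips - 1]

# Hypothetical complete tree with 8 tips, hence 7 divergence ages.
complete_ages = [10.0, 7.5, 6.0, 3.2, 1.1, 0.4, 0.2]
retained = diversified_sample_ages(complete_ages, n_tips=4)
```

Under this assumption every retained divergence is old, so the sampled tree contains none of the recent splits. The abstract reports that treating such a sample as random or complete leads to severely underestimated extinction rates.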
Summary. 1. The paleontological record chronicles numerous episodes of mass extinction that severely culled the Tree of Life. Biologists have long sought to assess the extent to which these events may have impacted particular groups. We present a novel method for detecting the impact of mass-extinction events on molecular phylogenies, even in the presence of tree-wide diversification-rate variation and in the absence of additional information from the fossil record. 2. Our approach is based on an episodic stochastic-branching process model in which rates of speciation and extinction are constant between events. We model three types of events: (i) instantaneous tree-wide shifts in speciation rate; (ii) instantaneous tree-wide shifts in extinction rate and (iii) instantaneous tree-wide mass-extinction events. Each type of event is modelled as an independent compound Poisson process (CPP), where the waiting times between events are exponentially distributed with event-specific rate parameters. The magnitude of each event is drawn from an event-specific prior distribution. Parameters of the model are then estimated in a Bayesian statistical framework using a reversible-jump Markov chain Monte Carlo algorithm. This Bayesian approach enables us to distinguish between tree-wide diversification-rate variation and mass-extinction events by specifying a biologically informed prior on the magnitude of mass-extinction events and empirical hyperpriors on the diversification-rate parameters. 3. We demonstrate via simulation that this method has substantial power to detect the number of mass-extinction events and provides unbiased estimates of the timing of mass-extinction events, while exhibiting an appropriate (i.e. <5%) false-discovery rate, even when background diversification rates vary. Finally, we provide an empirical demonstration of this approach, which reveals that conifers experienced a major episode of mass extinction ~23 Ma. 4. This new approach, the CPP on Mass-Extinction Times (CoMET) model, provides an effective tool for detecting the impact of mass-extinction events on molecular phylogenies, even when the history of those groups includes temporal variation in diversification rates and when the fossil history of those groups is poorly known.
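The compound Poisson process prior described above can be sketched as a short forward simulation in Python. This illustrates the prior on event times and magnitudes only, not the reversible-jump MCMC inference. The function name, the rate and tree-age values, and the Beta(2, 18) distribution on survival probability are assumptions made for the example and are not taken from the paper.

```python
import random

def simulate_cpp_events(rate, tree_age, draw_magnitude, rng):
    """Draw (time, magnitude) pairs for one event type over (0, tree_age).

    Waiting times between successive events are exponential with the given
    rate; each event's magnitude comes from the supplied prior sampler.
    """
    events = []
    if rate <= 0:
        return events  # no events are possible at zero rate
    t = rng.expovariate(rate)
    while t < tree_age:
        events.append((t, draw_magnitude(rng)))
        t += rng.expovariate(rate)
    return events

rng = random.Random(1)

# Hypothetical settings: about 5 expected events over 100 time units, with
# a Beta(2, 18) prior on the fraction of lineages surviving each event.
mass_extinctions = simulate_cpp_events(
    rate=0.05,
    tree_age=100.0,
    draw_magnitude=lambda r: r.betavariate(2.0, 18.0),
    rng=rng,
)
```

Calling the same function with a different rate and magnitude sampler for each of the three event types would mirror the abstract's description of independent processes with event-specific parameters.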
The source code for TESS is freely available at http://cran.r-project.org/web/packages/TESS/ CONTACT: Sebastian.Hoehna@gmail.com.