We propose a software package, PhyloBayes 3, which can be used for conducting Bayesian phylogenetic reconstruction and molecular dating analyses, using a large variety of amino acid replacement and nucleotide substitution models, including empirical mixtures or non-parametric models, as well as alternative clock relaxation processes.
Several models have been proposed to relax the molecular clock in order to estimate divergence times. However, it is unclear which model has the best fit to real data and should therefore be used to perform molecular dating. In particular, we do not know whether rate autocorrelation should be considered or which prior on divergence times should be used. In this work, we propose a general benchmark of alternative relaxed clock models. We have reimplemented most of the already existing models, including the popular lognormal model, as well as various prior choices for divergence times (birth-death, Dirichlet, uniform), in a common Bayesian statistical framework. We also propose a new autocorrelated model, called the "CIR" process, with well-defined stationary properties. We assess the relative fitness of these models and priors, when applied to 3 different protein data sets from eukaryotes, vertebrates, and mammals, by computing Bayes factors using a numerical method called thermodynamic integration. We find that the 2 autocorrelated models, CIR and lognormal, have a similar fit and clearly outperform uncorrelated models on all 3 data sets. In contrast, the optimal choice for the divergence time prior is more dependent on the data investigated. Altogether, our results provide useful guidelines for model choice in the field of molecular dating while opening the way to more extensive model comparisons.
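As a rough illustration of thermodynamic integration, the sketch below estimates a log marginal likelihood by integrating the expected log-likelihood under the "power posterior" (prior times likelihood raised to a power beta) from beta = 0 to beta = 1. It is not the implementation used in the study: the toy model (normal data with unit variance and a standard normal prior on the mean), the function names, and the trapezoidal grid are all assumptions chosen so that the expectation is available in closed form and the result can be checked against the exact answer. In a real analysis the expectation at each beta would be estimated by MCMC, and a Bayes factor would be the difference between two such log marginal likelihoods.

```python
import math


def power_posterior_expected_loglik(y, beta):
    """E[log L(theta)] when theta ~ prior(theta) * L(theta)**beta.

    Toy model (an assumption for illustration): y_i ~ Normal(theta, 1),
    theta ~ Normal(0, 1). The power posterior is then normal with
    precision 1 + beta * n, so the expectation is analytic.
    """
    n = len(y)
    s = sum(y)
    var = 1.0 / (1.0 + beta * n)   # power-posterior variance of theta
    mean = beta * s * var          # power-posterior mean of theta
    expected_sq = sum((yi - mean) ** 2 for yi in y) + n * var
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * expected_sq


def log_marginal_ti(y, steps=1000):
    """Trapezoidal estimate of the integral over beta in [0, 1]."""
    h = 1.0 / steps
    vals = [power_posterior_expected_loglik(y, k * h) for k in range(steps + 1)]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])


def log_marginal_exact(y):
    """Closed-form log marginal likelihood of the same toy model."""
    n = len(y)
    s = sum(y)
    quad = sum(yi * yi for yi in y) - s * s / (1.0 + n)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * math.log(1.0 + n)
            - 0.5 * quad)


if __name__ == "__main__":
    data = [0.3, -1.2, 0.8, 1.5, -0.4]  # made-up numbers, not real data
    print(log_marginal_ti(data), log_marginal_exact(data))
```

The two printed values should agree closely, which is the only point of the toy model: it shows that the integral over beta recovers the log marginal likelihood.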
We propose a continuous model for evolutionary rate variation across sites and over the tree and derive exact transition probabilities under this model. Changes in rate are modelled using the CIR process, a diffusion widely used in financial applications. The model directly extends the standard gamma-distributed rates-across-sites model, with one additional parameter governing changes in rate down the tree. The parameters of the model can be estimated directly from two well-known statistics: the index of dispersion and the gamma shape parameter of the rates-across-sites model. The CIR model can be readily incorporated into probabilistic models for sequence evolution. We provide here an exact formula for the likelihood of a three-taxon tree. Larger trees can be evaluated using Monte Carlo methods.
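To make the CIR process more concrete, here is a minimal sketch of its standard textbook form, dX = theta * (mu - X) dt + sigma * sqrt(X) dW. The parameter names `theta`, `mu`, and `sigma`, and all function names, are assumptions for illustration and may not match the parametrization used in the paper. The sketch shows the gamma stationary distribution (which is what links the process to gamma-distributed rates across sites), the conditional mean and variance of the rate after time t, and exact sampling of the transition through the scaled noncentral chi-square representation, written as a Poisson mixture of gamma draws.

```python
import math
import random


def stationary_gamma(theta, mu, sigma):
    """Shape and scale of the gamma stationary distribution of the CIR process."""
    shape = 2.0 * theta * mu / sigma ** 2
    scale = sigma ** 2 / (2.0 * theta)
    return shape, scale


def cir_moments(x0, t, theta, mu, sigma):
    """Conditional mean and variance of X_t given X_0 = x0."""
    decay = math.exp(-theta * t)
    mean = x0 * decay + mu * (1.0 - decay)
    var = (x0 * sigma ** 2 / theta) * (decay - decay ** 2) \
        + (mu * sigma ** 2 / (2.0 * theta)) * (1.0 - decay) ** 2
    return mean, var


def _poisson(lam, rng):
    """Knuth's multiplication method; adequate only for moderate lam."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_cir(x0, t, theta, mu, sigma, rng):
    """Draw X_t given X_0 = x0 from the exact transition distribution.

    X_t / c is noncentral chi-square with df = 4*theta*mu/sigma**2 and
    noncentrality x0*exp(-theta*t)/c, where c = sigma**2*(1-exp(-theta*t))/(4*theta).
    """
    decay = math.exp(-theta * t)
    c = sigma ** 2 * (1.0 - decay) / (4.0 * theta)
    df = 4.0 * theta * mu / sigma ** 2
    nonc = x0 * decay / c
    n = _poisson(nonc / 2.0, rng)
    # chi-square with (df + 2n) degrees of freedom = Gamma(shape=(df+2n)/2, scale=2)
    return c * rng.gammavariate((df + 2.0 * n) / 2.0, 2.0)


if __name__ == "__main__":
    rng = random.Random(1)
    # Mean rate 1 with stationary gamma shape 2 (illustrative values only)
    print(stationary_gamma(theta=1.0, mu=1.0, sigma=1.0))
    print(cir_moments(2.0, 0.5, 1.0, 1.0, 1.0))
    print(sample_cir(2.0, 0.5, 1.0, 1.0, 1.0, rng))
```

With the mean rate fixed at 1, the stationary shape 2 * theta / sigma**2 plays the role of the gamma shape parameter, and `theta` controls how quickly the rate at a site forgets its starting value along a branch. The simple Poisson sampler here is unsuitable for very short time steps, where the noncentrality becomes large.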