We propose a software package, PhyloBayes 3, which can be used for conducting Bayesian phylogenetic reconstruction and molecular dating analyses, using a large variety of amino acid replacement and nucleotide substitution models, including empirical mixtures or non-parametric models, as well as alternative clock relaxation processes.
Fossils of organisms dating from the origin and diversification of cellular life are scant and difficult to interpret. For this reason, alternative means to investigate the ecology of the last universal common ancestor (LUCA) and of the ancestors of the three domains of life are of great scientific value. It was recently recognized that the effects of temperature on ancestral organisms left 'genetic footprints' that could be uncovered in extant genomes. Accordingly, analyses of resurrected proteins predicted that the bacterial ancestor was thermophilic and that Bacteria subsequently adapted to lower temperatures. As the archaeal ancestor is also thought to have been thermophilic, the LUCA was parsimoniously inferred as thermophilic too. However, an analysis of ribosomal RNAs supported the hypothesis of a non-hyperthermophilic LUCA. Here we show that both rRNA and protein sequences analysed with advanced, realistic models of molecular evolution provide independent support for two environmental-temperature-related phases during the evolutionary history of the tree of life. In the first period, thermotolerance increased from a mesophilic LUCA to thermophilic ancestors of Bacteria and of Archaea-Eukaryota; in the second period, it decreased. Therefore, the two lineages descending from the LUCA and leading to the ancestors of Bacteria and Archaea-Eukaryota convergently adapted to high temperatures, possibly in response to a climate change of the early Earth, and/or aided by the transition from an RNA genome in the LUCA to organisms with more thermostable DNA genomes. This analysis unifies apparently contradictory results into a coherent depiction of the evolution of an ecological trait over the entire tree of life.
We combined the category (CAT) mixture model (Lartillot N, Philippe H. 2004) and the nonstationary break point (BP) model (Blanquart S, Lartillot N. 2006) into a new model, CAT-BP, accounting for variations of the evolutionary process both along the sequence and across lineages. As in CAT, the model implements a mixture of distinct Markovian processes of substitution distributed among sites, thus accommodating site-specific selective constraints induced by protein structure and function. Furthermore, as in BP, these processes are nonstationary, and their equilibrium frequencies are allowed to change along lineages in a correlated way, through discrete shifts in global amino acid composition distributed along the phylogenetic tree. We implemented the CAT-BP model in a Bayesian Markov Chain Monte Carlo framework and compared its predictions with those of 3 simpler models, BP, CAT, and the site- and time-homogeneous general time-reversible (GTR) model, on a concatenation of 4 mitochondrial proteins of 20 arthropod species. In contrast to GTR, BP, and CAT, which all display a phylogenetic reconstruction artifact positioning the bees Apis mellifera and Melipona bicolor among chelicerates, the CAT-BP model is able to recover the monophyly of insects. Using posterior predictive tests, we further show that the CAT-BP combination yields better anticipations of site- and taxon-specific amino acid frequencies and that it better accounts for the homoplasies that are responsible for the artifact. Altogether, our results show that the joint modeling of heterogeneities across sites and along time results in a synergistic improvement of the phylogenetic inference, indicating that it is essential to disentangle the combined effects of both sources of heterogeneity, in order to overcome systematic errors in protein phylogenetic analyses.
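The site-mixture idea behind CAT can be illustrated with a deliberately small sketch. It is not the CAT-BP implementation: it assumes a toy 4-state alphabet, an F81-style substitution process with unit rate, and only two sequences separated by a branch of length t. The function names, the two profiles, and the weights are hypothetical and chosen only for illustration. Each mixture category has its own equilibrium frequency profile, and the likelihood of a site is the weighted sum of its likelihood under each category.

```python
import math


def f81_transition(freqs, t):
    """Transition probabilities P[i][j] after time t under an F81-style
    process whose equilibrium frequencies are `freqs` (assumed unit rate)."""
    decay = math.exp(-t)
    n = len(freqs)
    return [
        [decay * (1.0 if i == j else 0.0) + (1.0 - decay) * freqs[j]
         for j in range(n)]
        for i in range(n)
    ]


def pair_likelihood(freqs, i, j, t):
    """Probability of observing state i in one sequence and state j in the
    other, separated by branch length t, under a single profile."""
    return freqs[i] * f81_transition(freqs, t)[i][j]


def mixture_site_likelihood(weights, profiles, i, j, t):
    """Site likelihood under a finite mixture of profiles: the weighted
    sum of the per-category likelihoods."""
    return sum(w * pair_likelihood(p, i, j, t)
               for w, p in zip(weights, profiles))


# Hypothetical two-category mixture over a toy 4-state alphabet.
profiles = [[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]]
weights = [0.6, 0.4]

if __name__ == "__main__":
    same = mixture_site_likelihood(weights, profiles, 0, 0, 0.5)
    diff = mixture_site_likelihood(weights, profiles, 0, 1, 0.5)
    print("identical states:", same)
    print("different states:", diff)
```

In a full analysis the same weighted sum would be taken over likelihoods computed on the whole tree, and the number of categories, their profiles, and the site allocations would be sampled by MCMC rather than fixed by hand.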
Variations of nucleotidic composition affect phylogenetic inference conducted under stationary models of evolution. In particular, they may cause unrelated taxa sharing similar base composition to be grouped together in the resulting phylogeny. To address this problem, we developed a nonstationary and nonhomogeneous model accounting for compositional biases. Unlike previous nonstationary models, which are branchwise, that is, assume that base composition only changes at the nodes of the tree, in our model, the process of compositional drift is totally uncoupled from the speciation events. In addition, the total number of events of compositional drift distributed across the tree is directly inferred from the data. We implemented the method in a Bayesian framework, relying on Markov Chain Monte Carlo algorithms, and applied it to several nucleotidic data sets. In most cases, the stationarity assumption was rejected in favor of our nonstationary model. In addition, we show that our method is able to resolve a well-known artifact. By Bayes factor evaluation, we compared our model with 2 previously developed nonstationary models. We show that the coupling between speciations and compositional shifts inherent to branchwise models may lead to an overparameterization, resulting in a lesser fit. In some cases, this leads to incorrect conclusions, concerning the nature of the compositional biases. In contrast, our compound model more flexibly adapts its effective number of parameters to the data sets under investigation. Altogether, our results show that accounting for nonstationary sequence evolution may require more elaborate and more flexible models than those currently used.
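The contrast with branchwise models can be made concrete with a small simulation sketch. It assumes, purely for illustration, that compositional shift events fall along branches as a Poisson process with a fixed rate; the abstract does not specify the prior, and the branch names, lengths, and rate below are hypothetical. The point is that events can land anywhere along a branch, and that a branch can carry zero, one, or several of them, so their total number is not tied to the number of nodes.

```python
import random


def sample_breakpoints(branch_lengths, rate, rng):
    """Place compositional shift events along each branch as a Poisson
    process with intensity `rate` per unit branch length (an assumption).
    Returns a dict mapping branch name to sorted event positions."""
    events = {}
    for name, length in branch_lengths.items():
        positions = []
        if rate > 0:
            # Waiting times between events are exponential.
            pos = rng.expovariate(rate)
            while pos < length:
                positions.append(pos)
                pos += rng.expovariate(rate)
        events[name] = positions
    return events


# Hypothetical three-branch tree; lengths in expected substitutions per site.
branches = {"root-A": 0.30, "root-B": 0.45, "B-C": 0.10}

if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is reproducible
    events = sample_breakpoints(branches, rate=5.0, rng=rng)
    for name, positions in events.items():
        print(name, [round(p, 3) for p in positions])
    # If each event introduces a new base composition, the number of
    # distinct compositions is one more than the number of events.
    print("compositions:", 1 + sum(len(p) for p in events.values()))
```

This only draws event placements from an assumed prior. In the Bayesian setting described above, the number and positions of events would instead be sampled from the posterior given the alignment.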
Background: Tunicates represent a key metazoan group as the sister-group of vertebrates within chordates. The six complete mitochondrial genomes available so far for tunicates have revealed distinctive features. Extensive gene rearrangements and particularly high evolutionary rates have been evidenced with regard to other chordates. This peculiar evolutionary dynamics has hampered the reconstruction of tunicate phylogenetic relationships within chordates based on mitogenomic data. Results: In order to further understand the atypical evolutionary dynamics of the mitochondrial genome of tunicates, we determined the complete sequence of the solitary ascidian Herdmania momus. This genome from a stolidobranch ascidian presents the typical tunicate gene content with 13 protein-coding genes, 2 rRNAs and 24 tRNAs which are all encoded on the same strand. However, it also presents a novel gene arrangement, highlighting the extreme plasticity of gene order observed in tunicate mitochondrial genomes. Probabilistic phylogenetic inferences were conducted on the concatenation of the 13 mitochondrial protein-coding genes from representatives of major metazoan phyla. We show that whereas standard homogeneous amino acid models support an artefactual sister position of tunicates relative to all other bilaterians, the CAT and CAT+BP site- and time-heterogeneous mixture models place tunicates as the sister-group of vertebrates within monophyletic chordates. Moreover, the reference phylogeny indicates that tunicate mitochondrial genomes have experienced a drastic acceleration in their evolutionary rate that equally affects protein-coding and ribosomal-RNA genes. Conclusion: This is the first mitogenomic study supporting the new chordate phylogeny revealed by recent phylogenomic analyses. It illustrates the beneficial effects of an increased taxon sampling coupled with the use of more realistic amino acid substitution models for the reconstruction of animal phylogeny.
Supplementary data are available at Bioinformatics online.
Due to the lack of macromolecular fossils, the enzymatic repertoire of extinct species has remained largely unknown to date. In an attempt to solve this problem, we have characterized a cyclase subunit (HisF) of the imidazole glycerol phosphate synthase (ImGP-S), which was reconstructed from the era of the last universal common ancestor of cellular organisms (LUCA). As observed for contemporary HisF proteins, the crystal structure of LUCA-HisF adopts the (βα)8-barrel architecture, one of the most ancient folds. Moreover, LUCA-HisF (i) resembles extant HisF proteins with regard to internal 2-fold symmetry, active site residues, and a stabilizing salt bridge cluster, (ii) is thermostable and shows a folding mechanism similar to that of contemporary (βα)8-barrel enzymes, (iii) displays high catalytic activity, and (iv) forms a stable and functional complex with the glutaminase subunit (HisH) of an extant ImGP-S. Furthermore, we show that LUCA-HisF binds to a reconstructed LUCA-HisH protein with high affinity. Our findings suggest that the evolution of highly efficient enzymes and enzyme complexes has already been completed in the LUCA era, which means that sophisticated catalytic concepts such as substrate channeling and allosteric communication existed already 3.5 billion years ago.