Most current models of sequence evolution assume that all sites of a protein evolve under the same substitution process, characterized by a 20 x 20 substitution matrix. Here, we propose to relax this assumption by developing a Bayesian mixture model that allows the amino-acid replacement pattern at different sites of a protein alignment to be described by distinct substitution processes. Our model, named CAT, assumes the existence of distinct processes (or classes) differing by their equilibrium frequencies over the 20 residues. Through the use of a Dirichlet process prior, the total number of classes and their respective amino-acid profiles, as well as the affiliations of each site to a given class, are all free variables of the model. In this way, the CAT model is able to adapt to the complexity actually present in the data, and it yields an estimate of the substitutional heterogeneity through the posterior mean number of classes. We show that a significant level of heterogeneity is present in the substitution patterns of proteins, and that the standard one-matrix model fails to account for this heterogeneity. By evaluating the Bayes factor, we demonstrate that the standard model is outperformed by CAT on all of the data sets which we analyzed. Altogether, these results suggest that the complexity of the pattern of substitution of real sequences is better captured by the CAT model, offering the possibility of studying its impact on phylogenetic reconstruction and its connections with structure-function determinants.
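The role of the Dirichlet process prior in the abstract above can be illustrated with a small prior simulation. This is a minimal sketch under stated assumptions, not the CAT implementation: it uses the Chinese restaurant process (a standard constructive view of a Dirichlet process prior on partitions), and the concentration parameter `alpha`, the flat Dirichlet base distribution over the 20 amino acids, the number of sites, and all function names are hypothetical choices made for illustration. It only draws from the prior; the posterior sampling over classes, profiles, and site affiliations that the abstract describes is not shown.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 residues


def sample_crp_allocations(n_sites, alpha, rng):
    """Assign each site to a class with a Chinese restaurant process.

    A site joins existing class k with weight class_sizes[k] and opens
    a new class with weight alpha (hypothetical concentration value).
    """
    allocations = []
    class_sizes = []
    for _ in range(n_sites):
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):
            class_sizes.append(1)   # open a new class
        else:
            class_sizes[k] += 1     # join an existing class
        allocations.append(k)
    return allocations, class_sizes


def sample_profile(rng):
    """Draw one equilibrium-frequency profile from a flat Dirichlet.

    Normalizing independent Gamma(1, 1) draws gives Dirichlet(1, ..., 1);
    the flat base distribution is an assumption of this sketch.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in AMINO_ACIDS]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
allocations, class_sizes = sample_crp_allocations(n_sites=200, alpha=2.0, rng=rng)
profiles = [sample_profile(rng) for _ in class_sizes]
print("number of classes in this prior draw:", len(class_sizes))
```

Because the number of classes is not fixed in advance, it grows with the data and with `alpha`, which is the property that lets a model of this kind adapt its complexity to the alignment.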
Recombination is a powerful evolutionary force that merges historically distinct genotypes. But the extent of recombination within many organisms is unknown, and even determining its presence within a set of homologous sequences is a difficult question. Here we develop a new statistic, Φw, that can be used to test for recombination. We show through simulation that our test can discriminate effectively between the presence and absence of recombination, even in diverse situations such as exponential growth (star-like topologies) and patterns of substitution rate correlation. A number of other tests, Max χ², NSS, a coalescent-based likelihood permutation test (from LDHat), and correlation of linkage disequilibrium (both r² and |D′|) with distance, all tend to underestimate the presence of recombination under strong population growth. Moreover, both Max χ² and NSS falsely infer the presence of recombination under a simple model of mutation rate correlation. Results on empirical data show that our test can be used to detect recombination between closely as well as distantly related samples, regardless of the suspected rate of recombination. The results suggest that Φw is one of the best approaches to distinguish recurrent mutation from recombination in a wide variety of circumstances.
Reconstructing the origin and evolution of land plants and their algal relatives is a fundamental problem in plant phylogenetics, and is essential for understanding how critical adaptations arose, including the embryo, vascular tissue, seeds, and flowers. Despite advances in molecular systematics, some hypotheses of relationships remain weakly resolved. Inferring deep phylogenies with bouts of rapid diversification can be problematic; however, genome-scale data should significantly increase the number of informative characters for analyses. Recent phylogenomic reconstructions focused on the major divergences of plants have resulted in promising but inconsistent results. One limitation is sparse taxon sampling, likely resulting from the difficulty and cost of data generation. To address this limitation, transcriptome data for 92 streptophyte taxa were generated and analyzed along with 11 published plant genome sequences. Phylogenetic reconstructions were conducted using up to 852 nuclear genes and 1,701,170 aligned sites. Sixty-nine analyses were performed to test the robustness of phylogenetic inferences to permutations of the data matrix or to phylogenetic method, including supermatrix, supertree, and coalescent-based approaches, maximum-likelihood and Bayesian methods, partitioned and unpartitioned analyses, and amino acid versus DNA alignments. Among other results, we find robust support for a sister-group relationship between land plants and one group of streptophyte green algae, the Zygnematophyceae.
Strong and robust support for a clade comprising liverworts and mosses is inconsistent with a widely accepted view of early land plant evolution, and suggests that phylogenetic hypotheses used to understand the evolution of fundamental plant traits should be reevaluated.
land plants | Streptophyta | phylogeny | phylogenomics | transcriptome
The origin of embryophytes (land plants) in the Ordovician period roughly 480 Mya (1-4) marks one of the most important events in the evolution of life on Earth. The early evolution of embryophytes in terrestrial environments was facilitated by numerous innovations, including parental protection for the developing embryo, sperm and egg production in multicellular protective structures, and an alternation of phases (often referred to as generations) in which a diploid sporophytic life history stage gives rise to a multicellular haploid gametophytic phase. With
Significance
Early branching events in the diversification of land plants and closely related algal lineages remain fundamental and unresolved questions in plant evolutionary biology. Accurate reconstructions of these relationships are critical for testing hypotheses of character evolution: for example, the origins of the embryo, vascular tissue, seeds, and flowers. We investigated relationships among streptophyte algae and land plants using the largest set of nuclear genes that has been applied to this problem to date. Hypothesized relationships were rigorously tested through a series of analyses to assess systematic er...
Tunicates or urochordates (appendicularians, salps and sea squirts), cephalochordates (lancelets) and vertebrates (including lamprey and hagfish) constitute the three extant groups of chordate animals. Traditionally, cephalochordates are considered as the closest living relatives of vertebrates, with tunicates representing the earliest chordate lineage. This view is mainly justified by overall morphological similarities and an apparently increased complexity in cephalochordates and vertebrates relative to tunicates. Despite their critical importance for understanding the origins of vertebrates, phylogenetic studies of chordate relationships have provided equivocal results. Taking advantage of the genome sequencing of the appendicularian Oikopleura dioica, we assembled a phylogenomic data set of 146 nuclear genes (33,800 unambiguously aligned amino acids) from 14 deuterostomes and 24 other slowly evolving species as an outgroup. Here we show that phylogenetic analyses of this data set provide compelling evidence that tunicates, and not cephalochordates, represent the closest living relatives of vertebrates. Chordate monophyly remains uncertain because cephalochordates, albeit with a non-significant statistical support, surprisingly grouped with echinoderms, a hypothesis that needs to be tested with additional data. This new phylogenetic scheme prompts a reappraisal of both morphological and palaeontological data and has important implications for the interpretation of developmental and genomic studies in which tunicates and cephalochordates are used as model animals.
Correspondence to H.P. email: herve.philippe@umontreal.ca
Preface
As more complete genomes are sequenced, phylogenetic analysis is entering a new era: that of phylogenomics. One branch of this expanding field aims to reconstruct the evolutionary history of organisms based on the analysis of their genomes. Recent studies have demonstrated the power of this approach, which has the potential to provide answers to a number of fundamental evolutionary questions. However, challenges for the future have also been revealed. The very nature of the evolutionary history of organisms and the limitations of current phylogenetic reconstruction methods mean that part of the tree of life may prove difficult, if not impossible, to resolve with confidence.
Introductory paragraph
Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. The notion of phylogeny follows directly from the theory of evolution presented by Charles Darwin in "The Origin of Species" [1]: the only illustration in his famous book is the first representation of evolutionary relationships among species, in the form of a phylogenetic tree. The subsequent enthusiasm of biologists for the phylogenetic concept is illustrated by the publication of Ernst Haeckel's famous "trees" as early as 1866 [2].
Today, phylogenetics (the reconstruction of evolutionary history) relies on using mathematical methods to infer the past from features of contemporary species, with only the fossil record to provide a window on the evolutionary past of life on our planet. This reconstruction involves the identification of HOMOLOGOUS CHARACTERS that are shared among different organisms, and the inference of phylogenetic trees from the comparison of these characters using reconstruction methods (BOX 1). The accuracy of the inference is therefore heavily dependent upon the quality of models for the evolution of such characters. Because the underlying mechanisms are not yet well understood, reconstructing the evolutionary history of life on Earth based solely on the information provided by living organisms has turned out to be difficult.
Until the 1970s, which brought the dawn of molecular techniques for sequencing proteins and DNA, phylogenetic reconstruction was essentially based on the analysis of morphological or ultrastructural characters. The comparative anatomy of fossils and extant species has proved powerful in some respects; for example, the main groups of animals and plants have been delineated fairly easily using these methods. However, this approach is hampered by the limited number of reliable homologous characters available; these are almost non-existent in micro-organisms [3] and are rare even in complex organisms.
The introduction of the use of molecular data in phylogenetics [4] led to a revolution. In the late 1980s, access to DNA sequences increased the number of homologous characters that could be compared from less than 100 to more than 1,000, ...
The origin of many of the defining features of animal body plans, such as symmetry, nervous system, and the mesoderm, remains shrouded in mystery because of major uncertainty regarding the emergence order of the early branching taxa: the sponge groups, ctenophores, placozoans, cnidarians, and bilaterians. The "phylogenomic" approach [1] has recently provided a robust picture for intrabilaterian relationships [2, 3] but not yet for more early branching metazoan clades. We have assembled a comprehensive 128-gene data set including newly generated sequence data from ctenophores, cnidarians, and all four main sponge groups. The resulting phylogeny yields two significant conclusions reviving old views that have been challenged in the molecular era: (1) that the sponges (Porifera) are monophyletic and not paraphyletic as repeatedly proposed [4-9], thus undermining the idea that ancestral metazoans had a sponge-like body plan; (2) that the most likely position for the ctenophores is together with the cnidarians in a "coelenterate" clade. The Porifera and the Placozoa branch basally with respect to a moderately supported "eumetazoan" clade containing the three taxa with nervous system and muscle cells (Cnidaria, Ctenophora, and Bilateria). This new phylogeny provides a stimulating framework for exploring the important changes that shaped the body plans of the early diverging phyla.
In the Bayesian paradigm, a common method for comparing two models is to compute the Bayes factor, defined as the ratio of their respective marginal likelihoods. In recent phylogenetic works, the numerical evaluation of marginal likelihoods has often been performed using the harmonic mean estimation procedure. In the present article, we propose to employ another method, based on an analogy with statistical physics, called thermodynamic integration. We describe the method, propose an implementation, and show on two analytical examples that this numerical method yields reliable estimates. In contrast, the harmonic mean estimator leads to a strong overestimation of the marginal likelihood, which is all the more pronounced as the model is higher dimensional. As a result, the harmonic mean estimator systematically favors more parameter-rich models, an artefact that might explain some recent puzzling observations, based on harmonic mean estimates, suggesting that Bayes factors tend to overscore complex models. Finally, we apply our method to the comparison of several alternative models of amino-acid replacement. We confirm our previous observations, indicating that modeling pattern heterogeneity across sites tends to yield better models than standard empirical matrices.
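Thermodynamic integration, as named in the abstract above, rests on the identity log Z = ∫₀¹ E_β[log L] dβ, where Z is the marginal likelihood, L the likelihood, and the expectation is taken under the "power posterior" proportional to L^β times the prior. The sketch below checks this on a toy conjugate model where the answer is known in closed form. It is an illustration under stated assumptions, not the authors' implementation: the model (i.i.d. N(θ, 1) data with a N(0, 1) prior on θ), the data values, the uniform grid over β, and all function names are hypothetical. Because the power posterior is itself Gaussian here, it is sampled directly; in a real phylogenetic setting those draws would have to come from MCMC.

```python
import math
import random

# Hypothetical toy data, assumed i.i.d. N(theta, 1).
DATA = [1.2, 0.4, 1.9, 0.7, 1.1, 1.6, 0.3, 0.9]


def log_likelihood(theta, data):
    """Log-likelihood of i.i.d. N(theta, 1) observations."""
    n = len(data)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * sum((y - theta) ** 2 for y in data))


def exact_log_marginal(data):
    """Closed-form log marginal likelihood under the prior theta ~ N(0, 1)."""
    n, s, ss = len(data), sum(data), sum(y * y for y in data)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n)
            - 0.5 * (ss - s * s / (1 + n)))


def thermodynamic_integration(data, n_temps=101, n_draws=2000, seed=0):
    """Estimate log Z by integrating E_beta[log L] over beta in [0, 1]."""
    rng = random.Random(seed)
    n, s = len(data), sum(data)
    betas = [k / (n_temps - 1) for k in range(n_temps)]
    means = []
    for beta in betas:
        # For this model, likelihood^beta x prior is Gaussian with
        # precision 1 + beta*n and mean beta*s / (1 + beta*n).
        prec = 1.0 + beta * n
        mu, sd = beta * s / prec, math.sqrt(1.0 / prec)
        total = 0.0
        for _ in range(n_draws):
            total += log_likelihood(rng.gauss(mu, sd), data)
        means.append(total / n_draws)
    # Trapezoidal rule over the beta grid.
    return sum(0.5 * (means[k] + means[k + 1]) * (betas[k + 1] - betas[k])
               for k in range(n_temps - 1))


estimate = thermodynamic_integration(DATA)
exact = exact_log_marginal(DATA)
print("thermodynamic integration:", estimate)
print("closed form:              ", exact)
```

Given log marginal likelihoods for two models, the log Bayes factor is simply their difference, so the same routine run on each model would support the kind of comparison the abstract describes.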