Modeling across-site variation of the substitution process is increasingly recognized as important for obtaining more accurate phylogenetic reconstructions. Both finite and infinite mixture models have been proposed and have been shown to significantly improve on classical single-matrix models. Compared with their finite counterparts, infinite mixtures have greater expressivity. However, they are computationally more challenging. This has resulted in practical compromises in the design of infinite mixture models. In particular, a fast but simplified version of a Dirichlet process model over equilibrium frequency profiles implemented in PhyloBayes has often been used in recent phylogenomics studies, while more refined model structures, more realistic and empirically better fitting, have been practically out of reach. We introduce a Message Passing Interface (MPI) version of PhyloBayes, implementing the Dirichlet process mixture models as well as more classical empirical matrices and finite mixtures. The parallelization is made efficient by combining two algorithmic strategies: a partial Gibbs sampling update of the tree topology and a truncated stick-breaking representation of the Dirichlet process prior. The implementation shows close to linear gains in computational speed for up to 64 cores, thus allowing faster phylogenetic reconstruction under complex mixture models. PhyloBayes MPI is freely available from our website www.phylobayes.org.
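To make the truncated stick-breaking idea concrete, here is a minimal sketch of how mixture weights can be drawn from a Dirichlet process prior truncated to a fixed number of components. It is an illustration only and does not reproduce PhyloBayes MPI code; the function name, the parameters `alpha` and `n_components`, and the choice of 20 components are assumptions made for this example.

```python
import random


def truncated_stick_breaking(alpha, n_components, rng):
    """Draw mixture weights from a stick-breaking prior truncated at n_components.

    alpha is the Dirichlet process concentration parameter (hypothetical name).
    Each break takes a Beta(1, alpha) fraction of the stick that is left; the
    last component takes whatever remains, so the weights sum to 1.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for k in range(n_components):
        if k == n_components - 1:
            fraction = 1.0  # truncation: final component absorbs the remainder
        else:
            fraction = rng.betavariate(1.0, alpha)
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return weights


rng = random.Random(0)
weights = truncated_stick_breaking(alpha=1.0, n_components=20, rng=rng)
```

Fixing the number of components in advance gives a weight vector of constant size, which is plausibly what makes this representation convenient to share across parallel processes, although the abstract does not spell out that reasoning.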
Modeling the interplay between mutation and selection at the molecular level is key to evolutionary studies. To this end, codon-based evolutionary models have been proposed as pertinent means of studying long-range evolutionary patterns and are widely used. However, these approaches have not yet consolidated results from amino acid level phylogenetic studies showing that selection acting on proteins displays strong site-specific effects, which translate into heterogeneous amino acid propensities across the columns of alignments; related codon-level studies have instead focused on either modeling a single selective context for all codon columns, or a separate selective context for each codon column, with the former strategy deemed too simplistic and the latter deemed overparameterized. Here, we integrate recent developments in nonparametric statistical approaches to propose a probabilistic model that accounts for the heterogeneity of amino acid fitness profiles across the coding positions of a gene. We apply the model to a dozen real protein-coding gene alignments and find it to produce biologically plausible inferences, for instance, as pertaining to site-specific amino acid constraints, as well as distributions of scaled selection coefficients.
In their account of mutational features as well as the heterogeneous regimes of selection at the amino acid level, the modeling approaches studied here can form a backdrop for several extensions, accounting for other selective features, for variable population size, or for subtleties of mutational features, all with parameterizations couched within population-genetic theory.
codon substitution | Dirichlet process | phylogeny | selection coefficients
Following the seminal works of Muse and Gaut (1) and Goldman and Yang (2), most early applications of codon-based evolutionary models were focused on evaluations of selective effects operating at different positions along a gene or at different time points along the phylogeny (see refs. 3, 4 for reviews). Many of these approaches have modeled selective effects using a parameter representing the nonsynonymous/synonymous rate ratio. However, this may not be ideal, in particular because it amounts to ignoring differences between different pairs of possible amino acid replacements resulting from nonsynonymous point mutations. In recent years, questions regarding selective effects have diversified, such as in the work of Yang and Nielsen (5), who propose a test for selection on codon usage. This test is based on models that invoke a multidimensional specification of scaled selection coefficients, based on either 20 or 61 (under the universal genetic code) scaled fitness parameters (adding 19 or 60 degrees of freedom to the underlying codon substitution model), in contrast with the more conventional use of the single nonsynonymous/synonymous rate ratio parameter, viewing all nonsynonymous events as equivalent (e.g., see ref. 6). By assigning scaled fitness parameters to each of the 20 amino acids, or to the 61 sense codons, Yang and Nie...
Adaptation is likely to be an important determinant of the success of many pathogens, for example when colonizing a new host species, when challenged by antibiotic treatment, or in governing the establishment and progress of long-term chronic infection. Yet, the genomic basis of adaptation is poorly understood in general, and for pathogens in particular. We investigated the genetics of adaptation to cystic fibrosis-like culture conditions in the presence and absence of fluoroquinolone antibiotics using the opportunistic pathogen Pseudomonas aeruginosa. Whole-genome sequencing of experimentally evolved isolates revealed parallel evolution at a handful of known antibiotic resistance genes. While the level of antibiotic resistance was largely determined by these known resistance genes, the costs of resistance were instead attributable to a number of mutations that were specific to individual experimental isolates. Notably, stereotypical quinolone resistance mutations in DNA gyrase often co-occurred with other mutations that, together, conferred high levels of resistance but no consistent cost of resistance. This result may explain why these mutations are so prevalent in clinical quinolone-resistant isolates. In addition, genes involved in cyclic-di-GMP signalling were repeatedly mutated in populations evolved in viscous culture media, suggesting a shared mechanism of adaptation to this CF–like growth environment. Experimental evolutionary approaches to understanding pathogen adaptation should provide an important complement to studies of the evolution of clinical isolates.
Across the great diversity of life, there are many compelling examples of parallel and convergent evolution, that is, similar evolutionary changes arising in independently evolving populations. Parallel evolution is often taken to be strong evidence of adaptation occurring in populations that are highly constrained in their genetic variation. Theoretical models suggest a few potential factors driving the probability of parallel evolution, but experimental tests are needed. In this study, we quantify the degree of parallel evolution in 15 replicate populations of Pseudomonas fluorescens evolved in five different environments that varied in resource type and arrangement. We identified repeat changes across multiple levels of biological organization, from phenotype to gene to nucleotide, and tested the impact of 1) selection environment, 2) the degree of adaptation, and 3) the degree of heterogeneity in the environment on the degree of parallel evolution at the gene level. We saw, as expected, that parallel evolution occurred more often between populations evolved in the same environment; however, the extent of parallel evolution varied widely. The degree of adaptation did not significantly explain variation in the extent of parallelism in our system, but the number of available beneficial mutations correlated negatively with parallel evolution. In addition, the degree of parallel evolution was significantly higher in populations evolved in a spatially structured, multiresource environment, suggesting that environmental heterogeneity may be an important factor constraining adaptation. Overall, our results stress the importance of environment in driving parallel evolutionary changes and point to a number of avenues for future work on understanding when evolution is predictable.
Experimental evolution (EE) combined with whole-genome sequencing (WGS) has become a compelling approach to study the fundamental mechanisms and processes that drive evolution. Most EE-WGS studies published to date have used microbes, owing to their ease of propagation and manipulation in the laboratory and relatively small genome sizes. These experiments are particularly suited to answer long-standing questions such as: How many mutations underlie adaptive evolution, and how are they distributed across the genome and through time? Are there general rules or principles governing which genes contribute to adaptation, and are certain kinds of genes more likely to be targets than others? How common is epistasis among adaptive mutations, and what does this reveal about the variety of genetic routes to adaptation? How common is parallel evolution, where the same mutations evolve repeatedly and independently in response to similar selective pressures? Here, we summarize the significant findings of this body of work, identify important emerging trends and propose promising directions for future research. We also outline an example of a computational pipeline for use in EE-WGS studies, based on freely available bioinformatics tools.
Significance: The bacterium Pseudomonas aeruginosa is an opportunistic pathogen of humans and is the leading cause of death in patients with cystic fibrosis (CF). We sequenced the genomes of P. aeruginosa isolated from respiratory tracts of patients with CF to investigate general patterns of adaptation associated with chronic infection. Selection imposed by the CF lung environment has had a major influence on genomic evolution and the genetic characteristics of isolates causing contemporary infection. Many of the genes and pathways implicated in adaptive evolution within the host had obvious roles in the pathogenic lifestyle of this bacterium. Genome sequence data indicated that an epidemic strain, with increased virulence and multidrug resistance, has spread between clinics in the United Kingdom and North America.
Background: Probabilistic methods have progressively supplanted the Maximum Parsimony (MP) method for inferring phylogenetic trees. One of the major reasons for this shift was that MP is much more sensitive to the Long Branch Attraction (LBA) artefact than is Maximum Likelihood (ML). However, recent work by Kolaczkowski and Thornton suggested, on the basis of simulations, that MP is less sensitive than ML to tree reconstruction artefacts generated by heterotachy, a phenomenon that corresponds to shifts in site-specific evolutionary rates over time. These results led these authors to recommend that the results of ML and MP analyses should be both reported and interpreted with the same caution. This specific conclusion revived the debate on the choice of the most accurate phylogenetic method for analysing real data in which various types of heterogeneities occur. However, variation of evolutionary rates across species was not explicitly incorporated in the original study of Kolaczkowski and Thornton, and in most of the subsequent heterotachous simulations published to date, where all terminal branch lengths were kept equal, an assumption that is biologically unrealistic.