Dana Azouri scite author profile

Model selection may not be a mandatory step for phylogeny reconstruction

Determining the most suitable model for phylogeny reconstruction is a fundamental step in numerous evolutionary studies. Over the years, various model-selection criteria have been proposed, leading to debate over which criterion is preferable. However, the necessity of this procedure has not been questioned to date. Here, we demonstrate that although the criteria frequently disagree on the selected model across both empirical and simulated data, they all lead to very similar inferences. When topologies and ancestral sequence reconstructions are the desired output, choosing one criterion over another is not crucial. Moreover, skipping model selection and instead using the most parameter-rich model, GTR+I+G, leads to similar inferences, rendering this time-consuming step nonessential, at least under current model-selection strategies.
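As a minimal sketch of how two common criteria can disagree on the selected model, the snippet below scores four standard nucleotide substitution models with AIC and BIC. The log-likelihoods are made-up illustrative numbers, not results from the study, and taking the BIC sample size as the number of alignment sites is an assumed (though common) convention. Only substitution-model parameters are counted, because branch-length parameters are shared by every candidate and shift all scores by the same constant.

```python
import math

# Hypothetical maximized log-likelihoods (illustrative only) and the number of
# free substitution-model parameters for each candidate model.
CANDIDATES = {
    "JC": (-5200.0, 0),
    "HKY": (-5100.0, 4),       # kappa + 3 free base frequencies
    "GTR": (-5095.0, 8),       # 5 free exchangeabilities + 3 free frequencies
    "GTR+I+G": (-5088.0, 10),  # GTR + invariant-site proportion + gamma shape
}


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_sites):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n_sites) - 2 * log_lik


def select(scores):
    """Return the model name with the lowest criterion score."""
    return min(scores, key=scores.get)


n_sites = 1000  # assumed alignment length
aic_scores = {m: aic(ll, k) for m, (ll, k) in CANDIDATES.items()}
bic_scores = {m: bic(ll, k, n_sites) for m, (ll, k) in CANDIDATES.items()}
print("AIC picks", select(aic_scores))  # AIC picks GTR+I+G
print("BIC picks", select(bic_scores))  # BIC picks HKY
```

BIC penalizes each extra parameter by ln(1000) ≈ 6.9 rather than 2, so with these numbers it stops at HKY while AIC keeps adding parameters up to GTR+I+G.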

A Probabilistic Model for Indel Evolution: Differentiating Insertions from Deletions

Loewenthal, Rapoport, Avram, et al. 2021

Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are underdeveloped because of their computational complexity. Here we introduce several improvements to indel modeling: (1) while previous models of indel evolution assumed that insertions and deletions share the same rate and length distribution, we propose a richer model that explicitly distinguishes between the two; (2) we introduce numerous summary statistics that enable parameter estimation by Approximate Bayesian Computation (ABC); (3) we develop a method to correct for the biases that alignment programs introduce when indel parameters are inferred from empirical datasets; and (4) using a model-selection scheme, we test whether the richer model fits biological data better than the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical datasets and that, for the majority of these datasets, the deletion rate is higher than the insertion rate.
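To illustrate the general idea of ABC parameter estimation with separate insertion and deletion rates, here is a minimal rejection-ABC sketch under strongly simplifying assumptions. It is not the paper's model or its summary statistics: the hypothetical `simulate` function treats insertions and deletions as independent Poisson processes on a single branch, ignores indel lengths and alignments entirely, and uses only the two event counts as summary statistics. The uniform priors, tolerance, and branch length are likewise arbitrary choices for illustration.

```python
import random


def poisson_count(rate, t, rng):
    """Count events of a homogeneous Poisson process with `rate` on [0, t]."""
    if rate <= 0:
        return 0
    count = 0
    elapsed = rng.expovariate(rate)  # exponential waiting times between events
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(rate)
    return count


def simulate(r_ins, r_del, t, rng):
    """Hypothetical toy simulator: the only summary statistics are the numbers
    of insertion and deletion events along one branch of length t."""
    return poisson_count(r_ins, t, rng), poisson_count(r_del, t, rng)


def abc_rejection(observed, t, n_draws, tolerance, prior_max, rng):
    """Plain ABC rejection: keep prior draws whose simulated summaries fall
    within `tolerance` (L1 distance) of the observed summaries."""
    accepted = []
    for _ in range(n_draws):
        # Independent uniform priors; insertion and deletion rates are separate
        # parameters rather than one shared indel rate.
        r_ins = rng.uniform(0.0, prior_max)
        r_del = rng.uniform(0.0, prior_max)
        sim = simulate(r_ins, r_del, t, rng)
        distance = abs(sim[0] - observed[0]) + abs(sim[1] - observed[1])
        if distance <= tolerance:
            accepted.append((r_ins, r_del))
    return accepted


rng = random.Random(1)
t = 40.0
observed = simulate(0.5, 1.5, t, rng)  # pseudo-observed data from known rates
posterior = abc_rejection(observed, t, n_draws=10_000, tolerance=6,
                          prior_max=3.0, rng=rng)
mean_ins = sum(r for r, _ in posterior) / len(posterior)
mean_del = sum(r for _, r in posterior) / len(posterior)
print(len(posterior), round(mean_ins, 2), round(mean_del, 2))
```

The accepted draws approximate the posterior over both rates; tightening the tolerance improves the approximation at the cost of fewer accepted samples.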

Harnessing machine learning to guide phylogenetic-tree search algorithms

et al. 2021

Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on costly likelihood optimizations. To make tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches that evaluate only a subset of all potential trees. Consequently, existing methods suffer from the well-known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm on an extensive cohort of empirical data to predict which neighboring trees increase the likelihood, without actually computing their likelihoods. This provides a means to safely discard a large portion of the search space, potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.
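The "predict, then evaluate only the promising neighbors" idea can be sketched as a generic rank-and-prune step. This is an assumed simplification, not the study's implementation: the trained model is replaced by a hypothetical `predict_gain` callable, the likelihood optimization by a hypothetical `log_likelihood` callable, and the toy usage stands in integers for neighboring trees.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Tree = TypeVar("Tree")  # stands in for whatever tree representation is used


def guided_best_neighbor(
    neighbors: Sequence[Tree],
    predict_gain: Callable[[Tree], float],
    log_likelihood: Callable[[Tree], float],
    keep_fraction: float = 0.1,
) -> Tuple[Tree, int]:
    """Rank neighbors by a cheap learned predictor, then compute the expensive
    likelihood only for the top `keep_fraction` of them.

    Returns the best neighbor among those evaluated and the number of exact
    likelihood evaluations spent.
    """
    ranked = sorted(neighbors, key=predict_gain, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]  # everything else is discarded unevaluated
    best = max(shortlist, key=log_likelihood)
    return best, n_keep


# Toy usage: integers stand in for 100 neighboring trees (e.g., SPR moves).
# The "true" likelihood peaks at 73; the imperfect predictor peaks at 70.
def true_ll(x: int) -> float:
    return -float((x - 73) ** 2)


def predictor(x: int) -> float:
    return -float(abs(x - 70))


best, n_evaluated = guided_best_neighbor(range(100), predictor, true_ll)
print(best, n_evaluated)  # 73 10
```

Here the predictor is slightly off, yet the true best neighbor still lands in the top 10%, so 10 exact evaluations replace 100. `keep_fraction` directly exposes the accuracy versus running-time tradeoff described in the abstract.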

A probabilistic model for indel evolution: differentiating insertions from deletions

Loewenthal, Rapoport, Avram, et al. 2020 (Preprint)

Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are underdeveloped because of their computational complexity. Here we introduce several improvements to indel modeling: (1) while previous models of indel evolution assumed that insertions and deletions share the same rate and length distribution, we propose a richer model that explicitly distinguishes between the two; (2) we introduce numerous summary statistics that enable parameter estimation by Approximate Bayesian Computation (ABC); and (3) we develop a neural-network model-selection scheme to test whether the richer model fits biological data better than the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed indel model better fits a large number of empirical datasets and that, for the majority of these datasets, the deletion rate is higher than the insertion rate. Finally, we demonstrate that indel rates are negatively correlated with effective population size across various phylogenomic clades.

Heterogeneity in the rate of molecular sequence evolution substantially impacts the accuracy of detecting shifts in diversification rates

et al. 2020

As species richness varies along the tree of life, there is great interest in identifying factors that affect the rates at which lineages speciate or go extinct. To this end, theoretical biologists have developed a suite of phylogenetic comparative methods that aim to identify where shifts in diversification rates have occurred along a phylogeny and whether these shifts are associated with particular traits. Using these methods, numerous studies have predicted that speciation and extinction rates vary across the tree of life. In this study, we show that asymmetric rates of sequence evolution lead to systematic biases in the inferred phylogeny, which in turn lead to erroneous inferences regarding lineage diversification patterns. The results demonstrate that as the asymmetry in sequence evolution rates increases, so does the tendency to select more complex models that include diversification rate shifts. These results thus suggest that any inference regarding shifts in diversification patterns should be treated with great caution, at least until any biases related to the molecular substitution rate have been ruled out.
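To show how a bias in inferred branch lengths can push model selection toward a rate-shift model, here is a deliberately simplified sketch. It assumes a pure-birth (Yule) process with no extinction, whose log-likelihood, up to a rate-independent constant, is m ln λ minus λL for m speciation events over total branch length L, and it compares one shared rate against separate rates for two clades by AIC. The clade counts and branch lengths are hypothetical, and this is not one of the comparative methods evaluated in the study.

```python
import math


def pure_birth_loglik(n_events, exposure, rate):
    """Pure-birth log-likelihood up to a rate-independent constant:
    n_events * ln(rate) - rate * exposure (exposure = total branch length)."""
    if n_events == 0:
        return -rate * exposure
    return n_events * math.log(rate) - rate * exposure


def max_loglik(n_events, exposure):
    """Maximized log-likelihood at the MLE rate n_events / exposure."""
    rate = n_events / exposure
    return pure_birth_loglik(n_events, exposure, rate)


def aic_one_vs_two_rates(clade_a, clade_b):
    """Return (AIC with one shared rate, AIC with a rate shift between clades).

    Each clade is a hypothetical (speciation events, total branch length) pair.
    """
    pooled = (clade_a[0] + clade_b[0], clade_a[1] + clade_b[1])
    aic_one = 2 * 1 - 2 * max_loglik(*pooled)
    aic_two = 2 * 2 - 2 * (max_loglik(*clade_a) + max_loglik(*clade_b))
    return aic_one, aic_two


# Equal speciation counts; clade B's total branch length is either correct
# (20.0) or compressed to 10.0, e.g. by a branch-length estimation bias.
unbiased = aic_one_vs_two_rates((20, 20.0), (20, 20.0))
biased = aic_one_vs_two_rates((20, 20.0), (20, 10.0))
print(unbiased)  # (82.0, 84.0): the single-rate model is preferred
print(biased)    # the shift model now has the lower AIC
```

Nothing about speciation differs between the two scenarios; only the estimated branch lengths of clade B change, yet AIC switches to the more complex shift model.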
