2020
DOI: 10.1101/2020.01.09.899906
Preprint

ModelTeller: model selection for optimal phylogenetic reconstruction using machine learning

Abstract: Statistical criteria have long been the standard for selecting the best model for phylogenetic reconstruction and downstream statistical inference. While model selection is regarded as a fundamental step in phylogenetics, existing methods for this task consume computational resources for long processing time, they are not always feasible, and sometimes depend on preliminary assumptions which do not hold for sequence data. Moreover, while these methods are dedicated to revealing the processes that underlie the …

Citations: cited by 8 publications (14 citation statements)
References: 80 publications
“…Yet another alternative approach is to train a machine-learning algorithm to assess historical signal. Machine learning has recently been proposed in phylogenetics for substitution model selection (Abadi et al 2020), inference of tree topology (Suvorov et al 2019), species delimitation (Derkarabetian et al 2019), and analyses of molecular rates across lineages (Tao et al 2019). A random forest or an artificial neural network might prove to be highly effective for identifying the factors that are associated with accurate inferences.…”
Section: Discussion (mentioning)
confidence: 99%
“…The only other tool currently employing a machine learning approach for model estimation is ModelTeller, which uses random forests for identifying the correct model of sequence evolution (Abadi et al 2020). However, ModelTeller and our approach are sufficiently different in aim and methodology to make direct comparison meaningless.…”
Section: Discussion (mentioning)
confidence: 99%
“…While it may be possible to ameliorate the influence of MSA uncertainty on relative model selection, we must also ask: Do we need to mitigate this issue in the first place? For example, recent studies have shown that, for both nucleotide and amino-acid models, the model selection procedure itself may not be a critical step in phylogenetic reconstruction, since different models with extreme differences in relative fit may not actually result in systematically different results (Spielman and Kosakovsky Pond 2018; Abadi et al 2019; Spielman 2020) although how the precise model used may influence branch length and/or divergence estimation remains an important question (Abadi et al 2019, 2020). As such, if distinct models may yield highly similar inferences, optimizing the model selection procedure itself has diminishing returns, akin to optimizing a mouse trap in a house without mice.…”
Section: Discussion (mentioning)
confidence: 99%
“…We note that one limitation of this study is that we only explore the influence of MSA uncertainty on relative model selection and not other approaches to identifying best-fitting models, including tests of model adequacy (Goldman 1993a,b;Duchêne et al 2018), Bayesian assessments of absolute model fit (Brown 2014;Lewis et al 2014), or more recently-developed machine-learning methods for model selection (Abadi et al 2020). Indeed, it may be possible that other approaches to model selection are more robust to MSA uncertainty, but the computational demands of these methods prohibit similar large-scale benchmarking.…”
Section: Discussion (mentioning)
confidence: 99%