2020
DOI: 10.48550/arxiv.2001.07844
Preprint

NJst and ASTRID are not statistically consistent under a random model of missing data

John A. Rhodes,
Michael G. Nute,
Tandy Warnow

Abstract: Species tree estimation from multi-locus datasets is statistically challenging for multiple reasons, including gene tree heterogeneity across the genome due to incomplete lineage sorting (ILS). Species tree estimation methods have been developed that operate by estimating gene trees and then using those gene trees to estimate the species tree. Several of these methods (e.g., ASTRAL, ASTRID, and NJst) are provably statistically consistent under the multi-species coalescent (MSC) model, provided that the gene tr…

Cited by 3 publications (5 citation statements)
References 9 publications
“…Under the i.i.d. model, ASTRAL is statistically consistent while ASTRID is not [9,47], but both methods are statistically consistent under the clade-based model. That said, the condition under which ASTRAL is consistent for the clade-based model requires that it be run in exact mode, rather than the heuristic mode where a set of allowed bipartitions is computed and the returned tree must draw its bipartitions from that set [47].…”
Section: Impact Of Missing Data (mentioning)
confidence: 82%
“…In recent years, many summary methods that are statistically consistent under the MSC have been developed, such as MP-EST [8], NJst [9], ASTRAL [6], ASTRID [10], FASTRAL [11], and wQFM [12]. Many of these methods are scalable to thousands of species with genomic-scale data (i.e., with thousands of genes).…”
Section: Introduction (mentioning)
confidence: 99%
“…Nonetheless, it is impossible to know how many gene trees are sufficient to alleviate the effects of missing data. In addition, it has been shown that, under specific conditions and for some models of taxon deletion, ASTRID is guaranteed to converge to an incorrect species tree when the number of gene trees increases asymptotically (Rhodes et al., 2020).…”
Section: Discussion (mentioning)
confidence: 99%
“…We refer to the models describing this stochastic process as models of taxon deletion. It has been shown that under some very specific models of taxon deletion, ASTRAL-I and ASTRAL-II are still consistent under the MSC model (Nute and Chou, 2017), but that ASTRID is inconsistent (Rhodes et al., 2020). However, the consistency of ASTRAL as well as other methods such as FastRFS has not yet been established for more general models of taxon deletion.…”
Section: Introduction (mentioning)
confidence: 99%
“…In recent years, many accurate summary methods statistically consistent under the MSC have been developed, such as MP-EST [20], NJst [37], ASTRAL [26], ASTRID [46], FASTRAL [8], wQFM [23]. Many of these methods are scalable to genomic-scale data, and under sufficient gene signal and ILS tend to be more accurate than concatenation [31].…”
Section: Introduction (mentioning)
confidence: 99%