2019
DOI: 10.1101/821439
Preprint

Polynomial-Time Statistical Estimation of Species Trees under Gene Duplication and Loss

Abstract: Phylogenomics, the estimation of species trees from multilocus datasets, is a common step in many biological studies. However, this estimation is challenged by the fact that genes can evolve under processes, including incomplete lineage sorting (ILS) and gene duplication and loss (GDL), that make their trees different from the species tree. In this paper, we address the challenge of estimating the species tree under GDL. We show that species trees are identifiable under a standard stochastic model for GDL, and t…


Cited by 26 publications (54 citation statements). References 55 publications.
“…Legried et al [74] found that ASTRID had similar or higher accuracy than all other methods evaluated. Overall, distance-based methods appear to be a generally accurate and efficient method for inferring species trees using paralogs.…”
Section: Methods Based On Neighbor Joining (And Other Clustering Appr… · mentioning
confidence: 94%
“…In fact, ASTRAL-multi is the only method that has been proven statistically consistent under any GDL model. Yet, a comparison reported by Legried et al (2020) between ASTRAL-multi and three earlier species tree estimation methods, including DupTree, STAG ( Emms and Kelly, 2018 ), and MulRF, showed that ASTRAL-multi had good but not exceptional accuracy; specifically, when the duplication and loss rates were both high, ASTRAL-multi was more accurate than DupTree (except when GTEE was low) and STAG (which often failed to complete), but was less accurate than MulRF.…”
Section: Introduction · mentioning
confidence: 97%
“…In a very recent advance, Legried et al (2020) proved that ASTRAL-multi ( Rabiee et al , 2019 ), an extension of ASTRAL ( Mirarab et al , 2014 ) to address multi-allele inputs, is statistically consistent under the standard stochastic model of GDL proposed by Arvestad et al (2009) in which all the genes evolve independently and identically distributed ( i.i.d. ) within a species tree, with duplication and loss rates fixed across the edges of the species tree.…”
Section: Introduction · mentioning
confidence: 99%
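The quoted description of the Arvestad et al. (2009) model can be made concrete with a small simulation. The sketch below assumes that, along a single species-tree edge of length t, each gene copy independently duplicates at rate lam and is lost at rate mu (a linear birth-death process). The function and parameter names are hypothetical, and this is not code from the preprint or from any published simulator. Under this assumption the expected number of copies at the end of the edge is n0 * exp((lam - mu) * t), which the usage block compares against.

```python
import math
import random


def simulate_copies(t, lam, mu, n0=1, rng=None):
    """Hypothetical helper: number of gene copies alive at the end of one
    species-tree edge of length t, ASSUMING each copy independently
    duplicates at rate lam and is lost at rate mu (linear birth-death)."""
    if rng is None:
        rng = random.Random()
    n, elapsed = n0, 0.0
    while n > 0:
        total_rate = n * (lam + mu)
        if total_rate == 0:
            break  # no events can occur, so the copy count is frozen
        elapsed += rng.expovariate(total_rate)  # waiting time to next event
        if elapsed > t:
            break  # next event would fall past the end of the edge
        # the event is a duplication with probability lam / (lam + mu)
        if rng.random() < lam / (lam + mu):
            n += 1
        else:
            n -= 1  # a loss; n == 0 means the gene family went extinct here
    return n


if __name__ == "__main__":
    rng = random.Random(1)
    t, lam, mu, reps = 1.0, 0.5, 0.3, 20000
    mean = sum(simulate_copies(t, lam, mu, rng=rng) for _ in range(reps)) / reps
    print(f"simulated mean {mean:.3f} vs expected {math.exp((lam - mu) * t):.3f}")
```

Running one such process per edge, with the same lam and mu on every edge and independent draws per gene, mirrors the i.i.d., fixed-rate setting the quotation describes; a full gene-tree simulator would also record which copies descend from which.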
“…Second, we presented FastMulRFS, a polynomial time algorithm to find an exact solution to the RFS-multree problem within a constrained search space, and we proved that the default version is statistically consistent under generic duplication-only or loss-only models. Thus, FastMulRFS is the second of only two methods proven to be statistically consistent under scenarios with gene duplication and/or loss (ASTRAL-multi, which was proven consistent under a parametric model of GDL in [24], is the other). Third, we showed that FastMulRFS maintains high accuracy even under conditions where both duplication and loss occur, where moderate incomplete lineage sorting (ILS) is present, where there is substantial gene tree estimation error (GTEE), and for 25 to 500 genes.…”
Section: Discussion · mentioning
confidence: 99%
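MulRF and FastMulRFS both take their names from the Robinson-Foulds (RF) distance, which counts the bipartitions (splits) found in one tree but not in the other. As a hedged illustration only, the sketch below computes the ordinary RF distance between two singly-labeled unrooted trees written as nested Python tuples. It does not handle MUL-trees and is neither the RFS-multree objective nor FastMulRFS's algorithm; the helper names are hypothetical.

```python
def collect_clades(node, out):
    """Append the leaf set of every internal node to out; return this node's leaf set."""
    if isinstance(node, str):
        return frozenset([node])
    clade = frozenset().union(*(collect_clades(child, out) for child in node))
    out.append(clade)
    return clade


def nontrivial_splits(tree):
    """Splits of the unrooted tree, each stored as the side WITHOUT a fixed
    reference leaf, so the result does not depend on where the tree is rooted."""
    clades = []
    everything = collect_clades(tree, clades)
    ref = min(everything)
    splits = set()
    for clade in clades:
        side = clade if ref not in clade else everything - clade
        # keep only splits with at least two leaves on each side
        if 1 < len(side) < len(everything) - 1:
            splits.add(side)
    return splits


def rf_distance(t1, t2):
    """Hypothetical helper: RF distance between two trees on the same leaf set."""
    return len(nontrivial_splits(t1) ^ nontrivial_splits(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))
```

Because each split is normalized to the side that excludes a reference leaf, re-rooting does not change a tree's split set: (((A,B),C),(D,E)) and ((A,B),(C,(D,E))) are at distance 0, while t1 and t2 above differ in one split each and are at distance 2.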
“…In a recent study, Legried et al [24] showed that ASTRAL-multi [28], a recent extension of ASTRAL [27] (a method that was developed for the problem of species tree estimation in the presence of ILS) is statistically consistent under a standard stochastic model of GDL. They also compared ASTRAL-multi to MulRF and DupTree, and found that it was more accurate than DupTree but not quite as accurate as MulRF, given estimated gene trees where true gene tree heterogeneity was due to GDL.…”
Section: Introduction · mentioning
confidence: 99%
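The quartet-based idea behind ASTRAL-multi can be illustrated on a single gene tree. Assuming, roughly as in ASTRAL-multi's treatment of multi-allele inputs, that each gene copy is labeled only by its species, the sketch below fixes four species and counts how often each of the three unrooted quartet topologies is induced when one copy per species is chosen in every possible way. It decides each topology with the four-point condition on edge counts. The function names are hypothetical, and this is not ASTRAL-multi's search, which optimizes a total quartet score over candidate species trees.

```python
from collections import Counter
from itertools import product


def leaf_paths(tree, path=()):
    """Yield (species, root-to-leaf child-index path) for each leaf of a nested-tuple tree."""
    if isinstance(tree, str):
        yield tree, path
    else:
        for i, child in enumerate(tree):
            yield from leaf_paths(child, path + (i,))


def edge_dist(p, q):
    """Number of edges between two leaves, from their root-to-leaf paths."""
    k = 0  # depth of the lowest common ancestor
    while k < min(len(p), len(q)) and p[k] == q[k]:
        k += 1
    return (len(p) - k) + (len(q) - k)


def quartet_counts(gene_tree, a, b, c, d):
    """Hypothetical helper: topology counts over all one-copy-per-species choices."""
    leaves = list(leaf_paths(gene_tree))
    copies = {s: [p for lab, p in leaves if lab == s] for s in (a, b, c, d)}
    labels = (f"{a}{b}|{c}{d}", f"{a}{c}|{b}{d}", f"{a}{d}|{b}{c}")
    counts = Counter()
    for pa, pb, pc, pd in product(copies[a], copies[b], copies[c], copies[d]):
        # four-point condition: the induced split has the strictly smallest sum
        sums = (
            edge_dist(pa, pb) + edge_dist(pc, pd),
            edge_dist(pa, pc) + edge_dist(pb, pd),
            edge_dist(pa, pd) + edge_dist(pb, pc),
        )
        best = min(sums)
        if sums.count(best) == 1:  # skip unresolved quartets (polytomies)
            counts[labels[sums.index(best)]] += 1
    return counts


# Toy gene tree: one duplication at the root; the second copy of species D was lost.
gene_tree = ((("A", "B"), ("C", "D")), (("A", "B"), "C"))
print(quartet_counts(gene_tree, "A", "B", "C", "D"))
```

In this toy tree, AB|CD is induced by 6 of the 8 one-copy-per-species choices and each alternative topology by one, so the dominant quartet matches the pre-duplication topology ((A,B),(C,D)) even though some choices pair copies across the duplication.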