Colby Long scite author profile

The Effect of Gene Flow on Coalescent-based Species-Tree Inference

Most current methods for inferring species-level phylogenies under the coalescent model assume that no gene flow occurs following speciation. Several studies have examined the impact of gene flow (e.g., Eckert and Carstens 2008; Chung and Ané 2011; Leaché et al. 2014; Solís-Lemus et al. 2016) and of ancestral population structure (DeGiorgio and Rosenberg 2016) on the performance of species-level phylogenetic inference, and analytic results have been proven for network models of gene flow (e.g., Solís-Lemus et al. 2016; Zhu et al. 2016). However, there are few analytic results for a continuous model of gene flow following speciation, despite the development of mathematical tools that could facilitate such study (e.g., Hobolth et al. 2011; Andersen et al. 2014; Tian and Kubatko 2016). In this article, we consider a three-taxon isolation-with-migration model that allows gene flow between sister taxa for a brief period following speciation, as well as variation in the effective population sizes across the species tree. We derive the probabilities of each of the three gene tree topologies under this model, and show that for certain choices of the gene flow and effective population size parameters, anomalous gene trees (i.e., gene trees that are discordant with the species tree but that have higher probability than the gene tree concordant with the species tree) exist. We characterize the region of parameter space producing anomalous trees and show that the probability of the gene tree that is concordant with the species tree can be arbitrarily small. We then show that there is theoretical support for using SVDQuartets with an outgroup to infer the rooted three-taxon species tree in a model of gene flow between sister taxa. We study the performance of SVDQuartets on simulated data and compare it to three other commonly used methods for species tree inference: ASTRAL, MP-EST, and concatenation. The simulations show that ASTRAL, MP-EST, and concatenation can be statistically inconsistent when gene flow is present, while SVDQuartets performs well, though large sample sizes may be required for certain parameter choices.
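
As a point of comparison for the gene-flow model, the standard multispecies coalescent with no gene flow gives closed-form rooted gene tree probabilities for three taxa: if the species tree is ((A,B),C) with internal branch length t in coalescent units, the concordant gene tree has probability 1 - (2/3)e^(-t) and each of the two discordant trees has probability (1/3)e^(-t). The minimal Python sketch below encodes that baseline together with the abstract's definition of an anomalous gene tree. The function names are hypothetical, and the gene-flow probabilities derived in the paper are not reproduced here.

```python
import math

def msc_triple_probs(t):
    """Rooted gene tree probabilities for three taxa under the standard
    multispecies coalescent with NO gene flow (baseline only).

    t: internal branch length of the species tree ((A,B),C), in coalescent units.
    """
    if t < 0:
        raise ValueError("internal branch length must be non-negative")
    discordant = math.exp(-t) / 3.0        # each of ((A,C),B) and ((B,C),A)
    concordant = 1.0 - 2.0 * discordant    # matches the species tree
    return {"((A,B),C)": concordant, "((A,C),B)": discordant, "((B,C),A)": discordant}

def is_anomalous(probs, species_tree="((A,B),C)"):
    """True if some discordant gene tree is strictly more probable than the
    gene tree concordant with the species tree."""
    return any(p > probs[species_tree] for topo, p in probs.items() if topo != species_tree)

for t in (0.1, 1.0):
    print(t, msc_triple_probs(t), is_anomalous(msc_triple_probs(t)))
```

In this baseline the concordant tree is strictly most probable for every t > 0, so any anomaly detected by `is_anomalous` on the gene-flow model's probabilities would have to come from the added gene flow and population-size parameters.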

Distinguishing Phylogenetic Networks

Gross

Long

2018

SIAM J. Appl. Algebra Geometry

Phylogenetic networks are becoming increasingly popular in phylogenetics since they can describe a wider range of evolutionary events than their tree counterparts. In this paper, we study Markov models on phylogenetic networks and their associated geometry. We restrict our attention to large-cycle networks, networks with a single undirected cycle of length at least four. Using tools from computational algebraic geometry, we show that the semi-directed network topology is generically identifiable for Jukes-Cantor large-cycle network models.

arXiv:1706.03060v1 [q-bio.PE] 9 Jun 2017

[…] do not assume any knowledge about which sites were produced by the same subtree of the network. The two-state Cavender-Farris-Neyman model may seem the more natural starting point for our exploration of network identifiability. However, as is evident from our computations in Proposition 4.7, the restricted coordinate space for this model makes it impossible to distinguish small networks from one another, which is our main strategy for eventually proving identifiability in the Jukes-Cantor case. Since the Jukes-Cantor model is time-reversible, the precise location of the root within the network will be unidentifiable from the distribution. However, we cannot simply study the unrooted topology of networks without orientation, since reticulation edges (edges directed into vertices of indegree two) play a special role in defining the distribution. Thus, our results concern the identifiability of the semi-directed network topology: the unrooted, undirected network with distinguished reticulation edges. We will also restrict our attention to networks with only a single reticulation vertex, which we call cycle-networks. We will refer to the set of all cycle-networks with cycle length at least four as the class of large-cycle networks. The main result of this paper is the following theorem.

Theorem 1.1. The semi-directed network topology parameter of large-cycle Jukes-Cantor network models is generically identifiable.

Markov models on networks with a single reticulation vertex are very closely related to 2-tree mixture models, but with some subtle differences that we discuss in Section 2.1. Using techniques from algebraic statistics, it is shown in [2] that the tree parameters of a 2-tree Jukes-Cantor mixture are generically identifiable. Here we adopt a similar approach. We associate to each network N an algebraic variety V_N that is the Zariski closure of the set of probability distributions attained by varying the numerical parameters in the model on N. We then study the associated ideals of the networks to find algebraic invariants that distinguish networks from one another. The two networks in Figure 1 demonstrate that the generic identifiability results for 2-tree mixtures do not apply to phylogenetic networks. These networks have different semi-directed network topologies and induce different multisets of embedded trees. Surprisingly, however, the algebraic variety for the network on the left is properly contained in that of the network on th...
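
To make the link between single-reticulation networks, 2-tree mixtures, and algebraic invariants more concrete, here is a minimal numpy sketch under loudly stated simplifications. It models a 4-leaf cycle network with cycle order 0-1-2-3 as a γ-weighted mixture of its two displayed quartet trees (01|23 and 03|12) under Jukes-Cantor, ignoring the subtle differences from plain 2-tree mixtures that the paper discusses in Section 2.1. It then applies a standard flattening-rank invariant: the 16 × 16 flattening of a single tree's distribution along that tree's own split has rank 4, while the mixture's flattening along the same split has higher rank. All function names, branch lengths, and the value of γ are hypothetical; this is a toy invariant check, not the argument used to prove Theorem 1.1.

```python
import numpy as np

def jc_matrix(t):
    """Jukes-Cantor transition matrix for a branch of length t
    (expected substitutions per site)."""
    e = np.exp(-4.0 * t / 3.0)
    return np.full((4, 4), 0.25 - 0.25 * e) + e * np.eye(4)

def quartet_distribution(split, lengths):
    """Site-pattern probabilities P[x0, x1, x2, x3] on the unrooted quartet tree ab|cd.

    split:   ((a, b), (c, d)) with leaves labeled 0..3.
    lengths: (t_a, t_b, t_c, t_d, t_internal).
    """
    (a, b), (c, d) = split
    Ma, Mb, Mc, Md, Mi = (jc_matrix(t) for t in lengths)
    # Root at the internal node u joining a and b (uniform root distribution);
    # v is the internal node joining c and d. Sum over both hidden states.
    R = 0.25 * np.einsum("ua,ub,uv,vc,vd->abcd", Ma, Mb, Mi, Mc, Md)
    # R's axes follow leaf order (a, b, c, d); reorder them to leaves 0, 1, 2, 3.
    order = [a, b, c, d]
    return np.transpose(R, [order.index(i) for i in range(4)])

def flatten_distribution(P, split):
    """16 x 16 matrix with rows indexed by (x_a, x_b) and columns by (x_c, x_d)."""
    (a, b), (c, d) = split
    return np.transpose(P, (a, b, c, d)).reshape(16, 16)

def cycle_network_distribution(gamma, tree1, tree2):
    """Hypothetical simplification: gamma-weighted mixture of the two displayed
    trees; the paper's shared-parameter structure is NOT enforced here."""
    return gamma * quartet_distribution(*tree1) + (1.0 - gamma) * quartet_distribution(*tree2)

# Same pendant length per leaf in both displayed trees (leaf0 0.1, leaf1 0.2, leaf2 0.15, leaf3 0.25).
T1 = (((0, 1), (2, 3)), (0.1, 0.2, 0.15, 0.25, 0.3))
T2 = (((0, 3), (1, 2)), (0.1, 0.25, 0.2, 0.15, 0.3))
split = ((0, 1), (2, 3))
P_tree = quartet_distribution(*T1)
P_net = cycle_network_distribution(0.3, T1, T2)
print(np.linalg.matrix_rank(flatten_distribution(P_tree, split), tol=1e-10))
print(np.linalg.matrix_rank(flatten_distribution(P_net, split), tol=1e-10))
```

The rank condition is one example of the kind of polynomial invariant (here, vanishing 5 × 5 minors) that can separate the varieties attached to different models.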

Species Tree Inference from Genomic Sequences Using the Log-Det Distance

Allman

Long

Rhodes

2019

SIAM J. Appl. Algebra Geometry

The log-det distance between two aligned DNA sequences was introduced as a tool for statistically consistent inference of a gene tree under simple non-mixture models of sequence evolution. Here we prove that the log-det distance, coupled with a distance-based tree construction method, also permits consistent inference of species trees under mixture models appropriate to aligned genomic-scale sequence data. Data may include sites from many genetic loci, which evolved on different gene trees due to incomplete lineage sorting on an ultrametric species tree, with different time-reversible substitution processes. The simplicity and speed of distance-based inference suggest that log-det-based methods should serve as benchmarks for judging more elaborate and computationally intensive species tree inference methods.
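
The log-det distance is computed from the 4 × 4 divergence matrix F of joint base frequencies at aligned sites. The minimal numpy sketch below uses the paralinear normalization, d = -1/4 [ln det F - 1/2 (ln det Π_x + ln det Π_y)], where Π_x and Π_y are diagonal matrices of each sequence's base frequencies; other normalizations exist, and the abstract does not say which one the paper uses. For four taxa, a distance-based method can pick the quartet split by the four-point condition (the split ab|cd minimizing d(a,b) + d(c,d)), which stands in here for whatever tree-construction method the paper pairs with log-det. All function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"

def divergence_matrix(x, y):
    """4x4 joint frequencies: F[i, j] = fraction of sites where x has base i and y has base j."""
    idx = {base: i for i, base in enumerate(BASES)}
    F = np.zeros((4, 4))
    for bx, by in zip(x, y):
        if bx in idx and by in idx:  # this sketch skips gaps and ambiguity codes
            F[idx[bx], idx[by]] += 1
    return F / F.sum()

def logdet_distance(x, y):
    """Paralinear form: d = -1/4 * [ln det F - 1/2 (ln det Pi_x + ln det Pi_y)]."""
    F = divergence_matrix(x, y)
    sign, logdet_F = np.linalg.slogdet(F)
    if sign <= 0:
        raise ValueError("det F is not positive; the distance is undefined")
    log_det_pi_x = np.sum(np.log(F.sum(axis=1)))  # Pi_x is diagonal: sum of logs
    log_det_pi_y = np.sum(np.log(F.sum(axis=0)))
    return -0.25 * (logdet_F - 0.5 * (log_det_pi_x + log_det_pi_y))

def quartet_split(seqs):
    """Four-point condition: the split ab|cd minimizing d(a,b) + d(c,d)."""
    def d(i, j):
        return logdet_distance(seqs[i], seqs[j])
    splits = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]
    return min(splits, key=lambda s: d(*s[0]) + d(*s[1]))
```

Because the paralinear form is symmetric in x and y and vanishes when the two sequences are identical, it can be plugged directly into any distance-based tree builder.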

Identifiability and Reconstructibility of Species Phylogenies Under a Modified Coalescent

Long

Kubatko

2018

Bull Math Biol

Coalescent models of evolution account for incomplete lineage sorting by specifying a species tree parameter which determines a distribution on gene trees, and consequently, a site pattern probability distribution. It has been shown that the unrooted topology of the species tree parameter of the multispecies coalescent is generically identifiable, and a reconstruction method called SVDQuartets has been developed to infer this topology. In this paper, we describe a modified multispecies coalescent model that allows for varying effective population size and violations of the molecular clock. We show that the unrooted topology of the species tree parameter for these models is generically identifiable and that SVDQuartets can still be used to infer this topology.
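
SVDQuartets scores each of the three possible unrooted splits of a quartet by arranging site-pattern frequencies into a 16 × 16 flattening matrix (rows indexed by the bases of one pair of taxa, columns by the bases of the other pair). Under the multispecies coalescent, the flattening for the true split has rank at most 10, so each split is scored by the distance from its flattening to the nearest rank-10 matrix, and the abstract states that SVDQuartets remains applicable under the modified model. The sketch below is a minimal numpy illustration of that scoring idea, not the SVDQuartets implementation; function names are hypothetical, and sites with gaps or ambiguity codes are simply skipped.

```python
import numpy as np

BASES = "ACGT"
INDEX = {base: i for i, base in enumerate(BASES)}

# The three unrooted splits of a quartet with taxa indexed 0..3.
SPLITS = {
    "01|23": ((0, 1), (2, 3)),
    "02|13": ((0, 2), (1, 3)),
    "03|12": ((0, 3), (1, 2)),
}

def site_pattern_flattening(seqs, split):
    """16 x 16 site-pattern frequencies for split ab|cd:
    rows indexed by (x_a, x_b), columns by (x_c, x_d)."""
    (a, b), (c, d) = split
    mat = np.zeros((16, 16))
    for site in zip(*seqs):
        if not all(ch in INDEX for ch in site):
            continue  # skip gaps and ambiguity codes in this sketch
        row = 4 * INDEX[site[a]] + INDEX[site[b]]
        col = 4 * INDEX[site[c]] + INDEX[site[d]]
        mat[row, col] += 1
    return mat / max(mat.sum(), 1.0)

def svd_score(mat, rank=10):
    """Frobenius distance from mat to the nearest matrix of the given rank."""
    sing = np.linalg.svd(mat, compute_uv=False)  # singular values, descending
    return float(np.sqrt(np.sum(sing[rank:] ** 2)))

def score_quartet(seqs):
    """SVD score per split; the lowest-scoring split is the inferred quartet."""
    return {name: svd_score(site_pattern_flattening(seqs, split)) for name, split in SPLITS.items()}

# Toy alignment: taxa 0 and 1 share one sequence, taxa 2 and 3 share another.
rng = np.random.default_rng(0)
x = "".join(rng.choice(list(BASES), size=2000))
y = "".join(rng.choice(list(BASES), size=2000))
scores = score_quartet([x, x, y, y])
print(min(scores, key=scores.get))  # → 01|23
```

In the toy alignment the 01|23 flattening has only four nonzero rows and columns, so its score is numerically zero; in practice, quartet scores like these are typically computed for many sampled quartets and then assembled into a full tree, a step omitted here.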

Bounds on the Expected Size of the Maximum Agreement Subtree

Bernstein

Ho

Long

et al.

2015

SIAM J. Discrete Math.
