Abstract. A parameterized version of the Steiner tree problem in phylogeny is defined, where the parameter measures the amount by which a phylogeny differs from "perfection." This problem is shown to be solvable in polynomial time for any fixed value of the parameter.
“…In defining formal models for parsimony-based phylogeny construction, we borrow definitions and notations from Fernandez-Baca and Lagergren [6]. The input to the BNPP problem is an n × m matrix I where rows R represent taxa and are strings over states.…”
Section: Preliminaries (mentioning)
confidence: 99%
“…Fernandez-Baca and Lagergren recently considered the problem of reconstructing optimal near-perfect phylogenies [6], which assume that the size of the optimal phylogeny is at most q larger than that of a perfect phylogeny for the same input size. They developed an algorithm to find the most parsimonious tree in time…”
Abstract. We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number q of additional mutations. In this paper, we develop an algorithm for constructing optimal phylogenies and provide empirical evidence of its performance. The algorithm runs in time $O((72\kappa)^q nm + nm^2)$, where n is the number of taxa, m is the number of characters and κ is the number of characters that share four gametes with some other character. This is fixed parameter tractable when q and κ are constants and significantly improves on the previous asymptotic bounds by reducing the exponent to q. Furthermore, the complexity of the previous work makes it impractical and in fact no known implementation of it exists. We implement our algorithm and demonstrate it on a selection of real data sets, showing that it substantially outperforms its worst-case bounds and yields far superior results to a commonly used heuristic method in at least one case. Our results therefore describe the first practical phylogenetic tree reconstruction algorithm that finds guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
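The parameter κ in the abstract above counts characters that share all four gametes with some other character. A minimal Python sketch of that count follows, assuming the input is a list of equal-length 0/1 rows (one row per taxon). The function name `conflicting_characters` and the sample matrix `taxa` are hypothetical, chosen for illustration only; this is not the authors' implementation.

```python
from itertools import combinations


def conflicting_characters(matrix):
    """Return the column indices that share all four gametes
    (0,0), (0,1), (1,0), (1,1) with at least one other column."""
    if not matrix:
        return set()
    num_chars = len(matrix[0])
    conflicting = set()
    for i, j in combinations(range(num_chars), 2):
        # Collect the distinct state pairs observed in columns i and j.
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            conflicting.update((i, j))
    return conflicting


# Hypothetical 4-taxon, 3-character binary matrix (rows are taxa).
taxa = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]

kappa = len(conflicting_characters(taxa))
print(kappa)
```

In this sample, only the first two characters exhibit all four gametes together, so they are the only ones counted. The pairwise scan takes O(nm²) time, which matches the additive nm² term in the stated bound, though whether the paper computes κ this way is an assumption.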
“…Rather, the major determinant of run time appears to be a dataset's imperfection, i.e., the difference between the optimal length and the number of variant sites. It has recently been shown that the phylogeny problem under various assumptions is fixed parameter tractable in imperfection [6,13,31,32] possibly suggesting why it is a critical factor in run time determination. The pars program of phylip, despite providing no guarantees of optimality, does indeed find optimal phylogenies in all of the above instances.…”
Section: Results (mentioning)
confidence: 99%
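The excerpt above defines imperfection as the difference between the optimal tree length and the number of variant sites. A short Python sketch of that arithmetic follows, assuming a binary taxa-by-character matrix and an optimal length supplied by some external solver. The names `variant_sites` and `imperfection`, the sample matrix, and the length value 4 are hypothetical illustrations, not taken from the cited work.

```python
def variant_sites(matrix):
    """Count characters (columns) in which more than one state is observed."""
    return sum(1 for column in zip(*matrix) if len(set(column)) > 1)


def imperfection(matrix, optimal_length):
    """Imperfection = optimal tree length minus the number of variant sites.

    optimal_length is assumed to come from an exact parsimony solver;
    this sketch does not compute it.
    """
    return optimal_length - variant_sites(matrix)


# Hypothetical data: 4 taxa, 3 binary characters.
taxa = [
    [0, 0, 0],
    [0, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]

# Assumed optimal length of 4 mutations for this matrix.
q = imperfection(taxa, optimal_length=4)
print(variant_sites(taxa), q)
```

A perfect phylogeny has imperfection zero, since every variant character then mutates exactly once and the tree length equals the number of variant sites.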
“…[3,12,27]). Some theoretical advances have recently been made in the efficient solution of near-perfect phylogenies, those that deviate only by a fixed amount from the assumption of perfection [6,13,31,32]. These methods can provide provably efficient solutions in many instances, but still struggle with some moderate-size data sets in practice.…”
Abstract. Reconstruction of phylogenetic trees is a fundamental problem in computational biology. While excellent heuristic methods are available for many variants of this problem, new advances in phylogeny inference will be required if we are to be able to continue to make effective use of the rapidly growing stores of variation data now being gathered. In this paper, we introduce an integer linear programming formulation to find the most parsimonious phylogenetic tree from a set of binary variation data. The method uses a flow-based formulation that could use exponential numbers of variables and constraints in the worst case. The method has, however, proved extremely efficient in practice on datasets that are well beyond the reach of the available provably efficient methods. The program solves several large mtDNA and Y-chromosome instances within a few seconds, giving provably optimal results in times competitive with fast heuristics that cannot guarantee optimality.
“…• Generalized Steiner tree problem [73,69,93]: Given an undirected graph $G = (V, E, w)$, a subset $Y$ of $V$, and partitions $\{Y_1, Y_2, \dots, Y_k\}$, find a shortest tree $T$, such that at least one point from each $Y_i$ is in $T$.…”