Abstract: In this paper, we examine two related problems of inferring the evolutionary history of n objects, either from present characters of the objects or from several partial estimates of their evolutionary history. The first problem is called the Phylogeny problem, and the second is the Tree Compatibility problem. Both of these problems are central in algorithmic approaches to the study of evolution and in other problems of historical reconstruction. In this paper, we show that both of these problems can be solved by g…
“…that they can be placed at the leaves of an evolutionary tree within which each site mutates at most once. Haplotype matrices admitting a perfect phylogeny are completely characterised [8] [9] by the absence of the forbidden submatrix…”
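In the unrooted binary case, this forbidden-submatrix characterisation amounts to the four-gamete test: a 0/1 matrix admits a perfect phylogeny exactly when no pair of sites shows all four combinations 00, 01, 10 and 11. The following is a minimal Python sketch. It assumes one row per haplotype, one column per site, and no missing data. The function names are illustrative and not taken from the cited papers.

```python
from itertools import combinations

def has_four_gametes(matrix, i, j):
    """True if columns i and j together contain all four pairs 00, 01, 10, 11."""
    return {(row[i], row[j]) for row in matrix} >= {(0, 0), (0, 1), (1, 0), (1, 1)}

def admits_perfect_phylogeny(matrix):
    """Four-gamete test on a binary haplotype matrix (rows = haplotypes, columns = sites)."""
    if not matrix:
        return True
    m = len(matrix[0])
    # Any single conflicting pair of sites rules out a perfect phylogeny.
    return not any(has_four_gametes(matrix, i, j) for i, j in combinations(range(m), 2))
```

For example, `[[0, 0], [0, 1], [1, 1]]` passes the test, while adding the row `[1, 0]` introduces the fourth gamete and the test fails.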
The problem Parsimony Haplotyping (PH) asks for the smallest set of haplotypes which can explain a given set of genotypes, and the problem Minimum Perfect Phylogeny Haplotyping (MPPH) asks for the smallest such set which also allows the haplotypes to be embedded in a perfect phylogeny evolutionary tree, a well-known biologically-motivated data structure. For PH we extend recent work of [16] by further mapping the interface between "easy" and "hard" instances, within the framework of (k, l)-bounded instances. By exploring, in the same way, the tractability frontier of MPPH we provide the first concrete, positive results for this problem, and the algorithms underpinning these results offer new insights about how MPPH might be further tackled in the future. In both PH and MPPH intriguing open problems remain.
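To make "explain" concrete, the sketch below checks whether a candidate haplotype set explains a genotype set. It assumes the common 0/1/2 genotype encoding: 0 and 1 are homozygous sites, and 2 is a heterozygous site. Under this encoding, a pair of haplotypes resolves a genotype if both agree with every homozygous site and differ at every heterozygous site. The names `explains` and `explains_all` are hypothetical.

```python
from itertools import combinations_with_replacement

def explains(h1, h2, genotype):
    """True if haplotypes h1 and h2 resolve the genotype (0/1 homozygous, 2 heterozygous)."""
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            if a == b:  # heterozygous site: the two haplotypes must differ
                return False
        elif not (a == b == g):  # homozygous site: both haplotypes must carry g
            return False
    return True

def explains_all(haplotypes, genotypes):
    """True if every genotype is resolved by some pair from the haplotype set
    (the same haplotype may be used twice)."""
    return all(
        any(explains(h1, h2, g) for h1, h2 in combinations_with_replacement(haplotypes, 2))
        for g in genotypes
    )
```

PH then asks for a minimum-size set H with `explains_all(H, genotypes)`. MPPH additionally requires H to admit a perfect phylogeny. This check only verifies a given H and does not search for a minimum one.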
“…He showed that the BPP problem can be solved in linear time [8]. The problem we consider is an extension called the binary near perfect phylogeny reconstruction (BNPP).…”
Section: Preliminaries
“…Input Assumptions: If no pair of characters in input I contains the four-gamete property, we can use Gusfield's elegant algorithm [8] to reconstruct a perfect phylogeny. We assume that the all-zeros taxon is present in the input.…”
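The construction step can be sketched in the spirit of Gusfield's algorithm [8], with the tree rooted at the all-zeros taxon. The sketch relies on a standard characterisation. Order the characters by decreasing number of 1s, using a fixed tie-break. A perfect phylogeny then exists iff, for each character, every taxon carrying it reaches it through the same preceding character. This is not Gusfield's linear-time implementation. It uses Python's comparison sort, so it runs in O(nm + m log m). The output format (`parent`, `attach`) is a hypothetical choice made for illustration.

```python
def perfect_phylogeny(matrix):
    """Gusfield-style sketch for a binary matrix (rows = taxa, columns = characters),
    rooted at the all-zeros taxon.

    Returns (parent, attach): parent[c] is the character directly above character c
    in the tree (None = root), and attach[r] is the deepest character carried by
    taxon r (None = root). Returns None if no rooted perfect phylogeny exists.
    """
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    ones = [sum(row[c] for row in matrix) for c in range(m)]
    # Fixed global order: decreasing number of 1s, ties broken by column index.
    order = sorted((c for c in range(m) if ones[c] > 0), key=lambda c: (-ones[c], c))
    parent, attach = {}, {}
    for r, row in enumerate(matrix):
        prev = None  # walking down from the root along this taxon's path
        for c in order:
            if row[c] == 1:
                # All taxa carrying c must reach it through the same predecessor.
                if parent.setdefault(c, prev) != prev:
                    return None
                prev = c
        attach[r] = prev
    return parent, attach
```

For example, on `[[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1]]`, character 1 hangs below character 0, and characters 0 and 2 hang from the root. A matrix with all four gametes in two columns returns `None`.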
Section: Lemma 1 [8]: The most parsimonious phylogeny for input I is
“…Only characters corresponding to non-isolated vertices can mutate more than once in any optimal phylogeny (a simple proof follows from Buneman graphs [16]). Since all characters of C \ M mutate exactly once, the algorithm constructs a perfect phylogeny on this character set using Gusfield's linear time algorithm [8]. The perfect phylogeny is unique because of Lemma 2.…”
Section: Let G(V, E) be the conflict graph of I. 2. Let V_nis ⊆ V be th…
Abstract. We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states (referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally efficient but not robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number q of additional mutations. In this paper, we develop an algorithm for constructing optimal phylogenies and provide empirical evidence of its performance. The algorithm runs in time O((72κ)^q nm + nm^2), where n is the number of taxa, m is the number of characters, and κ is the number of characters that share four gametes with some other character. This is fixed-parameter tractable when q and κ are constants and significantly improves on the previous asymptotic bounds by reducing the exponent to q. Furthermore, the complexity of the previous work makes it impractical; in fact, no known implementation of it exists. We implement our algorithm and demonstrate it on a selection of real data sets, showing that it substantially outperforms its worst-case bounds and yields far superior results to a commonly used heuristic method in at least one case. Our results therefore describe the first practical phylogenetic tree reconstruction algorithm that finds guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.
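The parameter κ above equals the number of non-isolated vertices in the conflict graph mentioned in the quoted snippets. That graph has one vertex per character, with an edge between two characters that share all four gametes. The sketch below assumes a binary 0/1 matrix with rows as taxa and columns as characters. The function names are hypothetical.

```python
from itertools import combinations

def conflict_graph(matrix):
    """Edges (i, j) between characters whose columns show all four gametes 00, 01, 10, 11."""
    m = len(matrix[0]) if matrix else 0
    edges = set()
    for i, j in combinations(range(m), 2):
        # In a binary matrix, four distinct (row[i], row[j]) pairs means all four gametes.
        if len({(row[i], row[j]) for row in matrix}) == 4:
            edges.add((i, j))
    return edges

def non_isolated(matrix):
    """Characters that conflict with at least one other character; kappa is its size."""
    return {c for edge in conflict_graph(matrix) for c in edge}
```

Characters outside `non_isolated(matrix)` correspond to the set C \ M in the snippet above, the characters that mutate exactly once and can be placed with a perfect phylogeny.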
“…Numerous phylogenetic inference methods, e.g. maximum parsimony, maximum likelihood, distance matrix fitting, subtrees consistency, and quartet based methods have been proposed over the years [15,1,14,26,17,27,4]; furthermore, it is rather common to compare the same set of species w.r.t. different biological sequences or different genes, hence obtaining various trees.…”
Abstract. A phylogenetic tree is a rooted tree of unbounded degree in which each leaf is uniquely labelled with an integer from 1 to n. A descendent subtree of a phylogenetic tree T is the subtree composed of all edges and nodes of T descending from a given vertex. Given a set of phylogenetic trees, we present linear-time algorithms for finding all leaf-agree descendent subtrees as well as all isomorphic descendent subtrees.
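A descendent subtree is determined by the leaf set (cluster) below its root. The sketch below computes all clusters of a tree and intersects them across two trees. It assumes "leaf-agree" means two descendent subtrees with the same leaf set. It also assumes trees are given as nested tuples, where an int is a leaf label and a tuple is an internal node. The approach is a simple hashing method that can take quadratic time, not the linear-time algorithm of the abstract.

```python
def clusters(tree):
    """Leaf sets of every descendent subtree, in postorder.
    An int is a leaf label; a tuple is an internal node listing its children."""
    out = []

    def visit(node):
        if isinstance(node, int):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        out.append(leaves)
        return leaves

    visit(tree)
    return out

def leaf_agree(tree_a, tree_b):
    """Leaf sets that occur as a descendent subtree in both trees."""
    return set(clusters(tree_a)) & set(clusters(tree_b))
```

For `((1, 2), (3, 4))` and `((1, 2), 3, 4)`, the shared clusters are the singletons, {1, 2}, and the full leaf set. The cluster {3, 4} exists only in the first tree.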
The normalized cluster distance of two sets A and B is defined as d(A, B) = |∆(A, B)|/(|A| + |B|), where ∆(A, B) denotes the symmetric difference of the two sets. We show that the normalized cluster distances between all pairs of descendent subtrees of two phylogenetic trees can be computed in O(n^2) time. Since the total size of the output is Θ(n^2), the algorithm is optimal. A nearest subtree of a subset of leaves is a descendent subtree with the smallest normalized cluster distance to those leaves. We show that nearest subtrees for a collection of pairwise disjoint subsets of leaves can be found in O(n) time. Several applications of these algorithms in bioinformatics are considered; among them, we discuss the functional analysis and classification of 2CS (two-component systems) in bacterial genomes.
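The definitions of d(A, B) and of a nearest subtree can be checked directly by brute force. The sketch below does this, using the same assumed nested-tuple tree encoding (int = leaf, tuple = internal node). It takes O(n^2) time per query rather than achieving the bounds stated above. When several subtrees tie, `min` returns the first one in postorder. The function names are hypothetical.

```python
def clusters(tree):
    """Leaf sets of every descendent subtree of a nested-tuple tree, in postorder."""
    out = []

    def visit(node):
        if isinstance(node, int):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        out.append(leaves)
        return leaves

    visit(tree)
    return out

def normalized_cluster_distance(a, b):
    """d(A, B) = |A ∆ B| / (|A| + |B|): 0 for equal sets, 1 for disjoint sets."""
    a, b = set(a), set(b)
    return len(a ^ b) / (len(a) + len(b))

def nearest_subtree(tree, leaves):
    """Cluster of the descendent subtree closest to the given leaf subset."""
    return min(clusters(tree), key=lambda c: normalized_cluster_distance(c, leaves))
```

For example, d({1, 2}, {2, 3}) = 2/4 = 0.5. In `((1, 2), (3, 4))`, the nearest subtree to {1, 3, 4} is the whole tree, with d = 1/7.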