Background: A variety of methods, based on sequence similarity, reconciliation, synteny or functional characteristics, can be used to infer orthology and paralogy relations between genes of a given gene family G. But is a given set C of orthology/paralogy constraints possible, i.e., can they simultaneously co-exist in an evolutionary history for G? While previous studies have focused on full sets of constraints, here we consider the general case where C does not necessarily involve a constraint for each pair of genes. The problem is subdivided into two parts: (1) Is C satisfiable, i.e., can we find an event-labeled gene tree G inducing C? (2) Is there such a G which is consistent, i.e., such that all displayed triplet phylogenies are included in a species tree? Results: Previous results on the graph sandwich problem can be used to answer (1), and we provide polynomial-time algorithms for satisfiability and consistency with a given species tree. We also describe a new polynomial-time algorithm for the case of consistency with an unknown species tree and full knowledge of pairwise orthology/paralogy relationships, as well as a branch-and-bound algorithm for the case when unknown relations are present. We show that our algorithms can be used in combination with ProteinOrtho, a sequence similarity-based orthology detection tool, to extract a set of robust orthology/paralogy relationships.
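To make the notion of a gene tree "inducing" a set of constraints concrete, here is a minimal sketch. It assumes the standard convention that two genes are orthologs when their lowest common ancestor is a speciation node and paralogs when it is a duplication node. The tree encoding and the function names (`induced_relations`, `satisfies`) are illustrative assumptions, not taken from the paper, and the code only checks one given tree against a partial constraint set; it does not search for a satisfying tree.

```python
from itertools import combinations

# Hypothetical encoding: a leaf is a gene name (str); an internal node is a
# tuple (label, children) with label "S" (speciation) or "D" (duplication).

def leaves(tree):
    """Return the list of gene names below a node."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [gene for child in children for gene in leaves(child)]

def induced_relations(tree):
    """Map each unordered gene pair to 'orthologs' or 'paralogs'."""
    relations = {}
    if isinstance(tree, str):
        return relations
    label, children = tree
    kind = "orthologs" if label == "S" else "paralogs"
    # Genes taken from two different children have this node as their LCA.
    for left, right in combinations(children, 2):
        for a in leaves(left):
            for b in leaves(right):
                relations[frozenset((a, b))] = kind
    for child in children:
        relations.update(induced_relations(child))
    return relations

def satisfies(tree, constraints):
    """Check a (possibly partial) constraint set against one labeled tree."""
    induced = induced_relations(tree)
    return all(induced.get(frozenset(pair)) == kind
               for pair, kind in constraints.items())

# a1 and a2 arise from a duplication; b1 is separated from both by a speciation.
gene_tree = ("S", [("D", ["a1", "a2"]), "b1"])
partial_constraints = {("a1", "a2"): "paralogs", ("a1", "b1"): "orthologs"}
print(satisfies(gene_tree, partial_constraints))  # True
```

Because the constraint set is partial, the pair (a2, b1) is left unconstrained here; any tree that agrees on the listed pairs passes the check.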
Motivations: Gene trees inferred solely from multiple alignments of homologous sequences often contain weakly supported and uncertain branches. Information for their full resolution may lie in the dependency between gene families and their genomic context. Integrative methods, using species tree information in addition to sequence information, often rely on a computationally intensive tree space search, which precludes their application to large genomic databases. Results: We propose a new method, called ProfileNJ, that takes a gene tree with statistical supports on its branches and corrects its weakly supported parts by using a combination of information from a species tree and a distance matrix. Its low running time enabled us to use it on the whole Ensembl Compara database, for which we propose an alternative, arguably more plausible set of gene trees. This allowed us to perform a genome-wide analysis of duplication and loss patterns on the history of 63 eukaryote species, and to predict ancestral gene content and order for all ancestors along the phylogeny. Availability: A web interface called RefineTree, including ProfileNJ as well as other gene tree correction methods, which we also test on the Ensembl gene families, is available at: http://www-ens.iro.umontreal.ca/~adbit/polytomysolver.html. The code of ProfileNJ as well as the set of gene trees corrected by ProfileNJ from Ensembl Compara version 73 families are also made available.
Phylogenetic tree reconstruction is usually done by local search heuristics that explore the space of the possible tree topologies via simple rearrangements of their structure. Tree rearrangement heuristics have been used in combination with practically all optimization criteria in use, from maximum likelihood and parsimony to distance-based principles, and in a Bayesian context. Their basic components are rearrangement moves that specify all possible ways of generating alternative phylogenies from a given one, and whose fundamental property is to be able to transform, by repeated application, any phylogeny into any other phylogeny. Despite their long tradition in tree-based phylogenetics, very little research has gone into studying similar rearrangement operations for phylogenetic networks, that is, phylogenies explicitly representing scenarios that include reticulate events such as hybridization, horizontal gene transfer, population admixture, and recombination. To fill this gap, we propose “horizontal” moves that ensure that every network of a certain complexity can be reached from any other network of the same complexity, and “vertical” moves that ensure reachability between networks of different complexities. When applied to phylogenetic trees, our horizontal moves, named rNNI and rSPR, reduce to the best-known moves on rooted phylogenetic trees: nearest-neighbor interchange and rooted subtree pruning and regrafting. Besides a number of reachability results, which separate the contributions of horizontal and vertical moves, we prove that rNNI moves are local versions of rSPR moves, and provide bounds on the sizes of the rNNI neighborhoods. The paper focuses on the most biologically meaningful versions of phylogenetic networks, where edges are oriented and reticulation events clearly identified. Moreover, our rearrangement moves are robust to the fact that networks with higher complexity usually allow a better fit with the data. Our goal is to provide a solid basis for practical phylogenetic network reconstruction.
Background: While tree-oriented methods for inferring orthology and paralogy relations between genes are based on reconciling a gene tree with a species tree, many tree-free methods are also available (usually based on sequence similarity). Recently, the link between orthology relations and gene trees has been formally considered from the perspective of reconstructing phylogenies from orthology relations. In this paper, we consider this link from a correction point of view. Indeed, a gene tree induces a set of relations, but the converse is not always true: a set of relations is not necessarily in agreement with any gene tree. A natural question is thus how to minimally correct an infeasible set of relations. Another natural question, given a gene tree and a set of relations, is how to minimally correct the gene tree so that the resulting gene tree fits the set of relations. Results: We consider four variants of relation and gene tree correction problems, and provide hardness results for all of them. More specifically, we show that it is NP-hard to edit a minimum set of relations to make them consistent with a given species tree. We also show that the problem of finding a maximum subset of genes that share consistent relations is hard to approximate. We then demonstrate that minimally editing a gene tree to satisfy a given set of relations is NP-hard, where “minimum” refers either to the number of modified relations depicted by the gene tree or to the number of clades that are lost. We also discuss some of the algorithmic perspectives given these hardness results.
Background: Reconciled gene trees yield orthology and paralogy relationships between genes. This information may, however, contradict other information on orthology and paralogy provided by other footprints of evolution, such as conserved synteny. Results: We explore a way to include external information on orthology in the process of gene tree construction. Given an initial gene tree and a set of orthology constraints on pairs of genes or on clades, we give polynomial-time algorithms for producing a modified gene tree that satisfies the set of constraints and is as close as possible to the original one according to the Robinson-Foulds distance. We assess the validity of the modifications we propose by computing the likelihood ratio between initial and modified trees according to sequence alignments on Ensembl trees, showing that the two trees are often statistically equivalent. Availability: Software and data are available upon request from the corresponding author.
In 1990, Hendry conjectured that every Hamiltonian chordal graph is cycle extendable; that is, the vertices of any non-Hamiltonian cycle are contained in a cycle of length one greater. We disprove this conjecture by constructing counterexamples on n vertices for any n ≥ 15. Furthermore, we show that there exist counterexamples where the ratio of the length of a nonextendable cycle to the total number of vertices can be made arbitrarily small. We then consider cycle extendability in Hamiltonian chordal graphs where certain induced subgraphs are forbidden, notably P_n and the bull.
Introduction. All graphs considered here are simple, finite, and undirected. A graph is Hamiltonian if it has a cycle containing all vertices; such a cycle is a Hamiltonian cycle. A graph G on n vertices is pancyclic if G contains a cycle of length m for every integer 3 ≤ m ≤ n. Let C and C′ be cycles in G of length m and m + 1, respectively, such that V(C′) \ V(C) = {v}. We say that C′ is an extension of C and that C is extendable (or, C extends through v to C′). If every non-Hamiltonian cycle of G is extendable, then G is cycle extendable. If, in addition, every vertex of G is contained in a triangle, then G is fully cycle extendable. The study of pancyclic graphs was initiated by Bondy [3], who recognized that most of the sufficient conditions for Hamiltonicity known at the time in fact implied a more complex cycle structure. Hendry [12] introduced the concept of cycle extendability and proved that many known sufficient conditions for a graph to be pancyclic were in fact sufficient for a graph to be (fully) cycle extendable.
Given a graph G and a set of vertices U ⊆ V(G), we denote by G[U] the subgraph obtained by deleting from G all vertices except those in U; G[U] is the subgraph induced by U, and a subgraph of G is an induced subgraph if it is induced by some U ⊆ V(G). A graph is chordal if it contains no induced cycles of length 4 or greater.
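The definition of an extendable cycle can be illustrated with a brute-force check, sketched below under simple assumptions: the graph is a dictionary mapping each (integer) vertex to its set of neighbours, and a cycle is a list of at least three vertices. The helper names are hypothetical, and the search is exponential, so it is only meant for very small graphs; it is not the construction used in the paper.

```python
from itertools import permutations

def is_cycle(adj, order):
    """True if consecutive vertices of `order` (taken cyclically) are adjacent."""
    n = len(order)
    return all(order[(i + 1) % n] in adj[order[i]] for i in range(n))

def has_cycle_on(adj, vertices):
    """Brute force: is there a cycle whose vertex set is exactly `vertices`?"""
    first, *rest = sorted(vertices)
    # Fix the first vertex to avoid re-testing rotations of the same cycle.
    return any(is_cycle(adj, (first, *perm)) for perm in permutations(rest))

def is_extendable(adj, cycle):
    """Is there an outside vertex v and a cycle on V(cycle) plus v?"""
    on_cycle = set(cycle)
    return any(has_cycle_on(adj, on_cycle | {v})
               for v in adj if v not in on_cycle)

# Two triangles sharing the edge {0, 1}: the triangle 0-1-2 extends through 3.
two_triangles = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1}, 3: {0, 1}}
print(is_extendable(two_triangles, [0, 1, 2]))  # True

# A triangle with a pendant vertex: no cycle can pass through vertex 3.
pendant = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(is_extendable(pendant, [0, 1, 2]))  # False
```

The second graph is not Hamiltonian, so it is not a counterexample to anything; it only shows what a failed extension looks like.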
It is not hard to show that every Hamiltonian chordal graph is pancyclic (see Proposition 3.4); however, the question of whether every Hamiltonian chordal graph is cycle extendable has remained open since 1990.
Conjecture 1.1 (Hendry's Conjecture [12]). If G is a Hamiltonian chordal graph, then G is fully cycle extendable.
In this paper, we settle Hendry's Conjecture in the negative. In Section 2, we show that (a) for any n ≥ 15 there exists a counterexample to Hendry's Conjecture on n vertices and (b) for every real number α > 0 there exists a counterexample G with a nonextendable cycle C such that |V(C)| < α|V(G)|. The question then remains: *
Background: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc.). Results: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees. Conclusions: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations are available at https://github.com/DessimozLab/pylabeledrf.
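For orientation, the classic unlabeled RF distance that this work extends can be sketched in a few lines. The version below is an assumption-laden illustration, not the labeled distance or the 2-approximation from the paper, and it does not use the pylabeledrf API: trees are rooted, encoded as nested tuples with string leaves, and the distance is taken as the number of clades present in exactly one of the two trees. The function names are hypothetical.

```python
def collect_clades(tree, out):
    """Add every clade of `tree` (a frozenset of leaf names) to `out`.

    Returns the leaf set of the subtree rooted at `tree`.
    """
    if isinstance(tree, str):
        return frozenset([tree])
    below = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.add(below)
    return below

def rf_distance(t1, t2):
    """Rooted, unlabeled RF: size of the symmetric difference of clade sets."""
    clades1, clades2 = set(), set()
    collect_clades(t1, clades1)
    collect_clades(t2, clades2)
    return len(clades1 ^ clades2)

balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
print(rf_distance(balanced, caterpillar))  # 2
```

In this example the two trees share the clades {a, b} and {a, b, c, d}; the clade {c, d} appears only in the first tree and {a, b, c} only in the second, which corresponds to one edge contraction followed by one edge extension.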