Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family. The inferred duplication-loss histories provide valuable information for a broad range of biological applications, including ortholog identification, estimation of gene duplication times, and rooting and correcting gene trees. While reconciliation with binary trees is a tractable and well-studied problem, there are no algorithms for reconciliation with non-binary species trees. Yet a striking proportion of species trees are non-binary: for example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and incomplete lineage sorting. We present the first algorithms for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Our algorithms use an efficient mapping from gene to species trees to infer the minimum number of duplications in O(|V(G)| × (k(S) + h(S))) time, where |V(G)| is the number of nodes in the gene tree, h(S) is the height of the species tree, and k(S) is the size of its largest polytomy. We present a dynamic programming algorithm that also minimizes the total number of losses. Although this algorithm is exponential in the size of the largest polytomy, it performs well in practice for polytomies with outdegree of 12 or less. We also present a heuristic that estimates the minimum number of losses in polynomial time; in empirical tests, this heuristic finds an optimal loss history 99% of the time. Our algorithms have been implemented in NOTUNG, a robust, production-quality tree-fitting program that provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
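The contrast between the classic mapping rule and polytomy-aware duplication inference can be illustrated with a minimal Python sketch. It maps each gene node g to M(g), the least common ancestor (LCA) of the species below it, and reports two sets of duplications. The first uses the classic binary-tree rule (M(g) equals the mapping of one of its children). The second uses a simplified criterion assumed here for illustration: g is flagged only if its two subtrees contain species from a common child of M(g), so that no binary resolution of the polytomy could make g a speciation. This is not presented as NOTUNG's algorithm. It uses naive parent-pointer LCA, makes no attempt at the stated O(|V(G)| × (k(S) + h(S))) bound, and does not count losses. `Node`, `reconcile`, and all tree and gene names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can go in sets
class Node:
    """Rooted tree node, used for both gene and species trees (hypothetical)."""
    name: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def add(self, *kids: Node) -> Node:
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def _index(root: Node) -> tuple[dict[str, Node], dict[Node, int]]:
    """Map species names to nodes and record each node's depth."""
    by_name: dict[str, Node] = {}
    depth: dict[Node, int] = {}
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        by_name[node.name], depth[node] = node, d
        stack.extend((c, d + 1) for c in node.children)
    return by_name, depth


def _lca(a: Node, b: Node, depth: dict[Node, int]) -> Node:
    """Naive LCA: climb parent pointers, O(h(S)) per query."""
    while depth[a] > depth[b]:
        a = a.parent
    while depth[b] > depth[a]:
        b = b.parent
    while a is not b:
        a, b = a.parent, b.parent
    return a


def _child_toward(anc: Node, desc: Node) -> Node:
    """Return the child of `anc` whose subtree contains `desc` (desc below anc)."""
    while desc.parent is not anc:
        desc = desc.parent
    return desc


def reconcile(gene_root: Node, species_root: Node,
              species_of: dict[str, str]) -> tuple[list[Node], list[Node]]:
    """Return (duplications under the assumed polytomy-aware test,
    duplications under the classic binary LCA rule); gene tree must be binary."""
    by_name, depth = _index(species_root)
    required: list[Node] = []
    naive: list[Node] = []

    def visit(g: Node) -> tuple[Node, set[Node]]:
        """Return M(g) and the set of species leaves below g (post-order)."""
        if not g.children:
            s = by_name[species_of[g.name]]
            return s, {s}
        (m_l, sp_l), (m_r, sp_r) = (visit(c) for c in g.children)
        m = _lca(m_l, m_r, depth)
        if m is m_l or m is m_r:
            naive.append(g)  # classic rule: M(g) equals a child's mapping
        if not m.children:
            required.append(g)  # both sides come from the same extant species
        else:
            hit_l = {_child_toward(m, s) for s in sp_l}
            hit_r = {_child_toward(m, s) for s in sp_r}
            if hit_l & hit_r:
                required.append(g)  # no resolution of M(g) separates the sides
        return m, sp_l | sp_r

    visit(gene_root)
    return required, naive


# Hypothetical example: R is a polytomy over four species A, B, C, D.
star = Node("R").add(Node("A"), Node("B"), Node("C"), Node("D"))
species_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "d1": "D"}
gene = Node("g").add(Node("x").add(Node("a1"), Node("b1")),
                     Node("y").add(Node("c1"), Node("d1")))
required, naive = reconcile(gene, star, species_of)
print(len(required), len(naive))  # 0 1
```

In this example the classic rule reports a duplication at the gene root only because both of its children map to the polytomy R; under the assumed criterion the four genes come from four different children of R, so the node is explained as a speciation once the polytomy is resolved, which mirrors the over-counting problem described in the abstract.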
P-POD, the Princeton Protein Orthology Database, classifies proteins from model organisms and medically important organisms into families of homologs and provides curated evidence from the literature addressing these relationships. The web page for each protein family includes a phylogenetic tree, a sequence alignment, and cross-references to disease-related papers from SGD, papers describing complementation experiments, and OMIM gene and disease information. As participants in the Gene Ontology Consortium’s Reference Genome project, we seek to provide a consistent, centralized method to identify orthologous proteins. We have expanded P-POD to include the protein complement of the twelve Reference Genomes. In addition, we have added new tools and search options to provide greater depth, breadth, and flexibility. Users may view families from multiple analyses generated by different methods and/or based on different sets of proteins. Using Notung, a software package that uses duplication-loss parsimony to resolve uncertainty in protein family trees, we have improved P-POD’s phylogenetic trees by fitting the protein trees to an established species phylogeny and annotating them with duplications and losses. A Notung applet on the P-POD web site identifies orthologous and paralogous relationships within each family and allows users to perform custom analyses on the phylogenies. These and other improvements make P-POD, in conjunction with the PANTHER database, an ideal tool for predicting the function of new, uncharacterized genes on the basis of their orthologous relationships to characterized ones. All the data in P-POD are freely and publicly available at "http://ortholog.princeton.edu/", both through the web interface and as a download of the entire database system. This work is supported by supplemental funds (Kara Dolinski, subcontract PI) to NHGRI grant HG002273 (PIs Judith Blake, Michael Ashburner, J. Michael Cherry and Suzanna Lewis).
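Once every internal node of a reconciled gene tree is labeled as a duplication or a speciation, ortholog and paralog calls follow from the standard definition: two genes are orthologs if their most recent common ancestor in the gene tree is a speciation, and paralogs if it is a duplication. The Python sketch below illustrates that rule, assuming a binary gene tree whose event labels have already been computed. It is not the code behind the Notung applet or P-POD, and `GeneNode`, `classify_pairs`, and the gene names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneNode:
    """Node of a reconciled binary gene tree (hypothetical structure)."""
    name: str
    children: list[GeneNode] = field(default_factory=list)
    is_duplication: bool = False  # event label assumed to come from reconciliation


def classify_pairs(root: GeneNode) -> dict[tuple[str, str], str]:
    """Label every pair of extant genes as 'ortholog' or 'paralog'.

    Each pair is split at exactly one internal node (their LCA); the pair is
    orthologous if that node is a speciation, paralogous if a duplication.
    """
    labels: dict[tuple[str, str], str] = {}

    def visit(node: GeneNode) -> list[str]:
        if not node.children:
            return [node.name]
        left, right = (visit(child) for child in node.children)
        kind = "paralog" if node.is_duplication else "ortholog"
        for a in left:
            for b in right:
                labels[tuple(sorted((a, b)))] = kind
        return left + right

    visit(root)
    return labels


# Hypothetical family: one duplication (root) followed by human/mouse speciations.
family = GeneNode("dup", is_duplication=True, children=[
    GeneNode("spA", children=[GeneNode("human_A"), GeneNode("mouse_A")]),
    GeneNode("spB", children=[GeneNode("human_B"), GeneNode("mouse_B")]),
])
pairs = classify_pairs(family)
print(pairs[("human_A", "mouse_A")])  # ortholog
print(pairs[("human_A", "mouse_B")])  # paralog
```

Pairs are keyed by sorted gene names so that each unordered pair is stored once; a tree with n leaves yields n(n - 1)/2 labeled pairs.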