Aiton Goldman scite author profile

Reconciliation with Non-Binary Species Trees

Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family. The inferred duplication-loss histories provide valuable information for a broad range of biological applications, including ortholog identification, estimating gene duplication times, and rooting and correcting gene trees. While reconciliation for binary trees is a tractable and well-studied problem, there are no algorithms for reconciliation with non-binary species trees. Yet a striking proportion of species trees are non-binary: for example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and incomplete lineage sorting. We present the first algorithms for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Our algorithms use an efficient mapping from gene to species trees to infer the minimum number of duplications in O(|V(G)| × (k(S) + h(S))) time, where |V(G)| is the number of nodes in the gene tree G, h(S) is the height of the species tree S, and k(S) is the size of its largest polytomy. We present a dynamic programming algorithm that also minimizes the total number of losses. Although this algorithm is exponential in the size of the largest polytomy, it performs well in practice for polytomies with outdegree 12 or less. We also present a heuristic that estimates the minimal number of losses in polynomial time. In empirical tests, this heuristic finds an optimal loss history 99% of the time. Our algorithms have been implemented in NOTUNG, a robust, production-quality tree-fitting program that provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
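
The duplication-inference step builds on the classic LCA mapping used for binary species trees. The Python sketch below shows only that binary baseline, not the paper's non-binary algorithm: each gene-tree node maps to the lowest common ancestor (LCA) of its children's images in the species tree, and a node counts as a duplication when it maps to the same species node as one of its children. The tree encodings (species tree as a child-to-parent dictionary, gene tree as nested tuples of species names) and the function names `node_depths`, `lca`, and `reconcile` are hypothetical. Run against a polytomy, the baseline reports a duplication that a compatible binary refinement of the same species tree does not require, which illustrates the overestimate described in the abstract.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical encodings (not NOTUNG's data model):
#   species tree: child -> parent dictionary, root maps to None
#   gene tree: a species name for a leaf, or a (left, right) tuple
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]
SpeciesTree = Dict[str, Optional[str]]


def node_depths(parent: SpeciesTree) -> Dict[str, int]:
    """Depth of every species-tree node; the root has depth 0."""
    memo: Dict[str, int] = {}

    def depth(v: str) -> int:
        if v not in memo:
            p = parent[v]
            memo[v] = 0 if p is None else depth(p) + 1
        return memo[v]

    for v in parent:
        depth(v)
    return memo


def lca(u: str, v: str, parent: SpeciesTree, depth: Dict[str, int]) -> str:
    """Lowest common ancestor: climb to equal depth, then climb in lockstep."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def reconcile(gene: GeneTree, parent: SpeciesTree) -> Tuple[str, int]:
    """Binary LCA reconciliation: (image of the gene-tree root, duplication count)."""
    depth = node_depths(parent)

    def walk(g: GeneTree) -> Tuple[str, int]:
        if isinstance(g, str):  # a leaf maps to its own species
            return g, 0
        (m_left, d_left), (m_right, d_right) = walk(g[0]), walk(g[1])
        m = lca(m_left, m_right, parent, depth)
        # A node is a duplication if it maps to the same species node as a child.
        is_dup = 1 if m in (m_left, m_right) else 0
        return m, d_left + d_right + is_dup

    return walk(gene)


if __name__ == "__main__":
    star = {"root": None, "A": "root", "B": "root", "C": "root"}  # polytomy (A,B,C)
    refined = {"root": None, "AB": "root", "A": "AB", "B": "AB", "C": "root"}  # ((A,B),C)
    gene = (("A", "B"), "C")
    print(reconcile(gene, star))     # ('root', 1)
    print(reconcile(gene, refined))  # ('root', 0)
```

The gene tree ((A,B),C) agrees with one binary resolution of the polytomy (A,B,C) and needs no duplication there, yet the LCA baseline charges one duplication against the polytomy itself; telling such cases apart from true duplications is the problem the non-binary algorithms address.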

P-POD, The Princeton Protein Orthology Database, as a Tool for Identifying Gene Function

Livstone, Oughtred, Heinicke, et al. (2009). Nat Prec.

P-POD, the Princeton Protein Orthology Database, classifies proteins from model organisms and medically important organisms into families of homologs and provides curated evidence from the literature addressing these relationships. The web page for each protein family includes a phylogenetic tree, a sequence alignment, and cross-references to disease-related papers from SGD, to papers describing complementation experiments, and to OMIM gene and disease information. As participants in the Gene Ontology Consortium’s Reference Genome project, we seek to provide a consistent, centralized method for identifying orthologous proteins. We have expanded P-POD to include the protein complements of the twelve Reference Genomes, and we have added new tools and search options that provide greater depth, breadth, and flexibility. Users may view families from multiple analyses generated by different methods and/or based on different sets of proteins. Using Notung, a software package that uses duplication-loss parsimony to resolve uncertainty in protein family trees, we have improved P-POD’s phylogenetic trees by fitting them to an established species phylogeny and annotating them with duplications and losses. A Notung applet on the P-POD web site identifies orthologous and paralogous relationships within each family and allows users to perform custom analyses on the phylogenies. These improvements and others make P-POD, in conjunction with the PANTHER database, an ideal tool for predicting the function of new, uncharacterized genes on the basis of their orthologous relationships to characterized ones.

All the data in P-POD are freely and publicly available through the web and by downloading the entire database system via the URL "http://ortholog.princeton.edu/". This work is supported by supplemental funds (Kara Dolinski, subcontract PI) to NHGRI grant HG002273 (PIs Judith Blake, Michael Ashburner, J. Michael Cherry and Suzanna Lewis).
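
Once a family tree has been annotated with duplications, orthologs and paralogs can be read off with the standard rule: two genes are orthologs when their most recent common ancestor (MRCA) is a speciation node and paralogs when it is a duplication node. The Python sketch below assumes an already-reconciled tree; the tree encoding, the gene names, and the functions `leaves` and `classify_pairs` are hypothetical and are not Notung's or P-POD's code.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical encoding of a reconciled gene tree: a leaf is a gene name,
# an internal node is (event, left, right) with event "S" (speciation)
# or "D" (duplication).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]


def leaves(t: Tree) -> List[str]:
    """All gene names below node t."""
    if isinstance(t, str):
        return [t]
    return leaves(t[1]) + leaves(t[2])


def classify_pairs(t: Tree) -> Dict[FrozenSet[str], str]:
    """Label every gene pair by the event at its most recent common ancestor."""
    if isinstance(t, str):
        return {}
    event, left, right = t
    label = "ortholog" if event == "S" else "paralog"
    # Every pair (a, b) with a on the left and b on the right has t as its MRCA.
    pairs = {frozenset((a, b)): label for a in leaves(left) for b in leaves(right)}
    # Pairs within one subtree have a deeper MRCA; recurse to label them.
    pairs.update(classify_pairs(left))
    pairs.update(classify_pairs(right))
    return pairs


if __name__ == "__main__":
    # Hypothetical family: one duplication, then a speciation in each copy.
    family = ("D", ("S", "human_a", "mouse_a"), ("S", "human_b", "mouse_b"))
    for pair, label in classify_pairs(family).items():
        print(sorted(pair), label)
```

In this toy family, human_a and mouse_a are orthologs because they diverged at a speciation, while human_a and human_b (or mouse_b) are paralogs because they diverged at the duplication.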

Aiton Goldman

Reconciliation with Non-Binary Species Trees

P-POD, The Princeton Protein Orthology Database, as a Tool for Identifying Gene Function