The identification of genomic loci associated with human genetic syndromes has been significantly facilitated through the generation of high density SNP arrays. However, optimal selection of candidate genes from within such loci is still a tedious labor-intensive bottleneck. Syndrome to Gene (S2G) is based on novel algorithms which allow an efficient search for candidate genes in a genomic locus, using known genes whose defects cause phenotypically similar syndromes. S2G (http://fohs.bgu.ac.il/s2g/index.html) includes two components: a phenotype Online Mendelian Inheritance in Man (OMIM)-based search engine that alleviates many of the problems in the existing OMIM search engine (negation phrases, overlapping terms, etc.). The second component is a gene prioritizing engine that uses a novel algorithm to integrate information from 18 databases. When the detailed phenotype of a syndrome is inserted to the webbased software, S2G offers a complete improved search of the OMIM database for similar syndromes. The software then prioritizes a list of genes from within a genomic locus, based on their association with genes whose defects are known to underlie similar clinical syndromes. We demonstrate that in all 30 cases of novel disease genes identified in the past year, the disease gene was within the top 20% of candidate genes predicted by S2G, and in most cases-within the top 10%. Thus, S2G provides clinicians with an efficient tool for diagnosis and researchers with a candidate gene prediction tool based on phenotypic data and a wide range of gene data resources. S2G can also serve in studies of polygenic diseases, and in finding interacting molecules for any gene of choice.
BackgroundThe OMIM database is a tool used daily by geneticists. Syndrome pages include a Clinical Synopsis section containing a list of known phenotypes comprising a clinical syndrome. The phenotypes are in free text and different phrases are often used to describe the same phenotype, the differences originating in spelling variations or typing errors, varying sentence structures and terminological variants.These variations hinder searching for syndromes or using the large amount of phenotypic information for research purposes. In addition, negation forms also create false positives when searching the textual description of phenotypes and induce noise in text mining applications.DescriptionOur method allows efficient and complete search of OMIM phenotypes as well as improved data-mining of the OMIM phenome. Applying natural language processing, each phrase is tagged with additional semantic information using UMLS and MESH. Using a grammar based method, annotated phrases are clustered into groups denoting similar phenotypes. These groups of synonymous expressions enable precise search, as query terms can be matched with the many variations that appear in OMIM, while avoiding over-matching expressions that include the query term in a negative context. On the basis of these clusters, we computed pair-wise similarity among syndromes in OMIM. Using this new similarity measure, we identified 79,770 new connections between syndromes, an average of 16 new connections per syndrome. Our project is Web-based and available at http://fohs.bgu.ac.il/s2g/csiomimConclusionsThe resulting enhanced search functionality provides clinicians with an efficient tool for diagnosis. This search application is also used for finding similar syndromes for the candidate gene prioritization tool S2G.The enhanced OMIM database we produced can be further used for bioinformatics purposes such as linking phenotypes and genes based on syndrome similarities and the known genes in Morbidmap.
This paper defines and studies a new, interesting, and challenging benchmark problem that originates in systems biology. The minimal seed-set problem is defined as follows: given a description of the metabolic reactions of an organism, characterize the minimal set of nutrients with which it could synthesize all nutrients it is capable of synthesizing. Current methods used in systems biology yield only approximate solutions. And although it is natural to cast it as a planning problem, current optimal planners are unable to solve it, while non-optimal planners return plans that are very far from optimal. As a planning problem, it is inherently delete-free, has many zero-cost actions, all propositions are landmarks, and many legal permutations of the plan exist. We show how a simple uninformed search algorithm that exploits inherent independence between sub-goals can solve it optimally by reducing the branching factor drastically.
Delete-free planning underlies many popular relaxation (h+) based heuristics used in state-of-the-art planners; it provides a simpler setting for exploring new pruning methods and other ideas; and a number of interesting recent planning domains are naturally delete-free. In this paper we explore new pruning methods for planning in delete-free planning domains. First, we observe that optimal delete-free plans can be composed from contiguous sub-plans that focus on one fact landmark at a time. Thus, instead of attempting to achieve the goal, the planner can focus on more easily achievable landmarks at each stage. Then, we suggest a number of complementary pruning techniques that are made more powerful with this observation. To carry out these pruning techniques efficiently, we make heavy use of an And/Or graph depicting the planning problem. We empirically evaluate these ideas using the FD framework, and show that they lead to clear improvements.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.