Gene regulatory pathways converge at the level of transcription, where interactions among regulatory genes and between regulators and target genes result in the establishment of spatiotemporal patterns of gene expression. The growing identification of direct target genes for key transcription factors (TFs) through traditional and high-throughput experimental approaches has facilitated the elucidation of regulatory networks at the genome level. To integrate this information into a Web-based knowledgebase, we have developed the Arabidopsis Gene Regulatory Information Server (AGRIS). AGRIS, which contains all Arabidopsis (Arabidopsis thaliana) promoter sequences, TFs, and their target genes and functions, provides the scientific community with a platform to establish regulatory networks. AGRIS currently houses three linked databases: AtcisDB (Arabidopsis thaliana cis-regulatory database), AtTFDB (Arabidopsis thaliana transcription factor database), and AtRegNet (Arabidopsis thaliana regulatory network). AtTFDB contains 1,690 Arabidopsis TFs and their sequences (protein and DNA) grouped into 50 (October 2005) families with information on available mutants in the corresponding genes. AtcisDB consists of 25,806 (September 2005) promoter sequences of annotated Arabidopsis genes with a description of putative cis-regulatory elements. AtRegNet links, in direct interactions, several hundred genes with the TFs that control their expression. The current release of AtRegNet contains a total of 187 (September 2005) direct targets for 66 TFs. AGRIS can be accessed at http://Arabidopsis.med.ohio-state.edu.
The Unified Medical Language System (UMLS) is the largest thesaurus in the biomedical informatics domain. Previous work has shown that knowledge constructs composed of transitively associated UMLS concepts are effective for discovering potentially novel biomedical hypotheses. However, the extremely large size of the UMLS is a major challenge for these applications. To address this problem, we designed a k-neighborhood Decentralization Labeling Scheme (kDLS) for the UMLS, together with a corresponding method for evaluating the kDLS indexing results. kDLS provides a comprehensive solution for indexing the UMLS for efficient large-scale knowledge discovery. We demonstrated that kDLS paths are highly effective for prioritizing disease-gene relations across the whole genome, with extremely high fold-enrichment values. To our knowledge, this is the first indexing scheme capable of supporting efficient large-scale knowledge discovery on the UMLS as a whole. We expect that kDLS will become a vital engine for retrieving information and generating hypotheses from the UMLS in future medical informatics applications.
The conduct of clinical and translational research regularly involves the use of a variety of heterogeneous and large-scale data resources. Scalable methods for the integrative analysis of such resources, particularly when attempting to leverage computable domain knowledge in order to generate actionable hypotheses in a high-throughput manner, remain an open area of research. In this report, we describe both a generalizable design pattern for such integrative knowledge-anchored hypothesis discovery operations and our experience in applying that design pattern in the experimental context of a set of driving research questions related to the publicly available Osteoarthritis Initiative data repository. We believe that this 'test bed' project and the lessons learned during its execution are both generalizable and representative of common clinical and translational research paradigms.