PERL (ActivePerl, version 5.8) is available at activestate.com, the PERLMagick libraries are available at imagemagick.org, and the IMAJIN_COLOC source code and user documentation can be downloaded from http://www.fda.gov/cber/research/imaging/imageanalysis.htm.
Abstract: The bioscience field has seen spectacular advances in genomic and proteomic technologies that deliver vast quantities of information on cellular activity. Such technologies are of critical importance to biology, medical science and drug discovery. However, living systems are highly complex, and fully exploiting these technologies requires knowledge at many different levels. Information such as genome sequence data, gene expression data, protein-to-protein interactions and metabolic pathways is required to understand the complexity of biological processes. The challenge for bioinformatics is to tackle the fragmentation of knowledge by integrating the many sources of heterogeneous information into a coherent entity. A further problem is that the high level of biological complexity and the fragmented nature of biological research make it difficult to keep fully conversant with the latest research and discoveries. Progress in one area of biology may have implications for other areas, but the dissemination of this knowledge is not straightforward; difficulties such as differences in naming conventions for genes and biological processes have led to confusion and lost productivity. This paper reviews the most recent research aimed at overcoming the fragmentation problem, in which technologies such as text mining and ontologies are used within the knowledge discovery process, and the specific technical challenges they address.
This paper describes how high-level biological knowledge obtained from ontologies such as the Gene Ontology (GO) can be integrated with low-level information extracted from a Bayesian network trained on protein interaction data. We can automatically generate a biological ontology by text mining the type II diabetes research literature. The ontology is populated with the entities and relationships from protein-to-protein interactions. New, previously unrelated information is extracted from the growing body of research literature and combined with what is already known on this subject from the Gene Ontology and databases such as BIND and BioGRID. We integrate the ontology within the probabilistic framework of Bayesian networks, which enables reasoning and prediction of protein function.
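The abstract does not specify the network structure or its parameters, so the following is only a minimal sketch of the general idea, assuming the simplest possible Bayesian network (a naive Bayes model): an ontology-derived prior that a protein has a given function is updated with independent pieces of interaction evidence. The function name, the evidence names and all probabilities are hypothetical values chosen for illustration, not figures from the paper.

```python
def posterior_function(prior, likelihoods, observations):
    """Return P(function | evidence) under a naive Bayes assumption.

    prior        -- P(protein has the function), e.g. taken from an
                    ontology annotation (hypothetical value here)
    likelihoods  -- evidence name -> (P(e | function), P(e | no function))
    observations -- evidence name -> True if observed, False if not
    """
    p_yes, p_no = prior, 1.0 - prior
    for name, seen in observations.items():
        l_yes, l_no = likelihoods[name]
        # Multiply in the likelihood of what was actually observed.
        p_yes *= l_yes if seen else 1.0 - l_yes
        p_no *= l_no if seen else 1.0 - l_no
    # Normalise so the two hypotheses sum to one.
    return p_yes / (p_yes + p_no)


# Hypothetical conditional probabilities for two evidence sources.
LIKELIHOODS = {
    "interacts_with_annotated_partner": (0.8, 0.1),
    "co_mentioned_in_literature": (0.6, 0.3),
}

OBSERVED = {
    "interacts_with_annotated_partner": True,
    "co_mentioned_in_literature": True,
}

if __name__ == "__main__":
    print(posterior_function(0.2, LIKELIHOODS, OBSERVED))
```

With these illustrative numbers, a prior of 0.2 rises to a posterior of about 0.8 once both pieces of evidence are observed. A full Bayesian network would also model dependencies between evidence sources, which this sketch deliberately ignores.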