This paper investigates the role of outliers in literature-based knowledge discovery. It shows that detecting interesting outliers in the literature on a given phenomenon can help the expert find implicit relationships among concepts from different domains. The underlying assumption is that, while the majority of articles in a given scientific domain describe matters related to a common understanding of that domain, the exploration of outliers may lead to the detection of scientifically interesting bridging concepts among disjoint sets of scientific articles. The proposed approach contributes to cross-context link discovery by demonstrating the utility of outlier detection for finding bisociative links in the exploration of the autism literature, as well as by uncovering implicit relationships in articles from the migraine domain.
The aim of this chapter is to present the role of outliers in literature-based knowledge discovery, which can be used to explore potential bisociative links between different domains of expertise. The proposed approach upgrades the RaJoLink method, which provides a novel framework for effectively guiding knowledge discovery from literature, based on the principle of rare terms from scientific articles. This chapter shows that outlier documents can be successfully used as a means of detecting bridging terms that connect documents from two different literature sources. This linking process, also known as closed discovery, is incorporated as one of the steps of the RaJoLink methodology and is performed using the publicly available topic ontology construction tool OntoGen. We chose scientific articles about autism as the application example with which to demonstrate the proposed approach.
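The closed discovery step described above can be illustrated with a minimal sketch, assuming the simplest possible reading: a bridging term is any term that occurs in both literatures. This is not the RaJoLink or OntoGen implementation, and it does not perform outlier detection; the function names `tokenize` and `bridging_terms`, the stopword list, and the toy documents are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase word-like tokens in one document."""
    return set(re.findall(r"[a-z][a-z0-9\-]+", text.lower()))


def bridging_terms(docs_a, docs_c, stopwords=frozenset()):
    """Candidate bridging terms: terms present in both document sets.

    Assumption (not from the source): candidates are ranked by the
    smaller of their two document frequencies, ties broken alphabetically.
    """
    df_a = Counter(t for d in docs_a for t in tokenize(d))
    df_c = Counter(t for d in docs_c for t in tokenize(d))
    shared = (set(df_a) & set(df_c)) - set(stopwords)
    return sorted(shared, key=lambda t: (-min(df_a[t], df_c[t]), t))


# Toy documents, invented for this example (not data from the study).
literature_a = [
    "calcineurin activity in autism",
    "autism and immune response",
]
literature_c = [
    "calcineurin regulates immune signalling",
    "calcineurin inhibitors",
]

candidates = bridging_terms(literature_a, literature_c, stopwords={"in", "and"})
print(candidates)
```

In practice the candidate list would be far longer and would need expert review; restricting the search to outlier documents, as the chapter proposes, is one way to shorten it.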
In the field of autism, an enormous increase in available information makes it very difficult to connect fragments of knowledge into a more coherent picture. We present a literature mining method, RaJoLink, that searches for matching themes in unrelated literature that may contribute to a better understanding of complex pathological conditions, such as autism. A total of 214 full-text articles on autism, published in PubMed, served as the data source. Using ontology construction, we identified the main concepts of what is already known about autism. Then the RaJoLink method, based on Swanson's ABC model, was used to reveal potentially interesting, but not yet investigated, connections between different concepts in research. Among the more interesting concepts identified with RaJoLink in our study were calcineurin and NF-kappaB. Both terms can be linked to neuro-immune abnormalities in the brains of patients with autism. Further research is needed to provide stronger evidence of calcineurin and NF-kappaB involvement in autism. However, the analysis presented confirms that this method could support experts in discovering hidden relationships and in reaching a better understanding of the disorder.
Text mining methods can facilitate the generation of biomedical hypotheses by suggesting novel associations between diseases and genes. Previously, we developed a rare-term model called RaJoLink (Petric et al., J. Biomed. Inform. 42(2): 219-227, 2009), in which hypotheses are formulated on the basis of terms rarely associated with a target domain. Since many current medical hypotheses are formulated in terms of molecular entities and molecular mechanisms, here we extend the methodology to proteins and genes, using a standardized vocabulary as well as a gene/protein network model. The proposed enhanced RaJoLink rare-term model combines text mining and gene prioritization approaches. Its utility is illustrated by finding known as well as potential gene-disease associations in ovarian cancer using MEDLINE abstracts and the STRING database.
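The idea of "terms rarely associated with a target domain" can be sketched as a document-frequency filter. This is a minimal illustration under stated assumptions, not the published RaJoLink procedure: the threshold `max_df`, the function name `rare_terms`, and the toy abstracts are hypothetical, and the standardized vocabulary and STRING network steps mentioned above are not modelled.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase word-like tokens in one document."""
    return set(re.findall(r"[a-z][a-z0-9\-]+", text.lower()))


def rare_terms(docs, max_df=1, stopwords=frozenset()):
    """Terms that appear in at most `max_df` documents of the target literature.

    Assumption (not from the source): rarity is measured by raw document
    frequency with a fixed cut-off.
    """
    df = Counter(t for d in docs for t in tokenize(d))
    return sorted(t for t, n in df.items() if n <= max_df and t not in stopwords)


# Toy abstracts, invented for this example (not MEDLINE data).
target_literature = [
    "ovarian cancer gene expression",
    "ovarian cancer gene pathway",
    "ovarian cancer kinase-x signalling",
]

rare = rare_terms(target_literature, max_df=1)
print(rare)
```

A raw frequency cut-off like this will also surface noise such as typos and incidental words, which is one reason to restrict candidates to a controlled vocabulary of genes and proteins.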