BackgroundThe large and growing number of published studies, and their increasing rate of publication, makes the task of identifying relevant studies in an unbiased way for inclusion in systematic reviews both complex and time consuming. Text mining has been offered as a potential solution: through automating some of the screening process, reviewer time can be saved. The evidence base around the use of text mining for screening has not yet been pulled together systematically; this systematic review fills that research gap. Focusing mainly on non-technical issues, the review aims to increase awareness of the potential of these technologies and promote further collaborative research between the computer science and systematic review communities.MethodsFive research questions led our review: what is the state of the evidence base; how has workload reduction been evaluated; what are the purposes of semi-automation and how effective are they; how have key contextual problems of applying text mining to the systematic review field been addressed; and what challenges to implementation have emerged?We answered these questions using standard systematic review methods: systematic and exhaustive searching, quality-assured data extraction and a narrative synthesis to synthesise findings.ResultsThe evidence base is active and diverse; there is almost no replication between studies or collaboration between research teams and, whilst it is difficult to establish any overall conclusions about best approaches, it is clear that efficiencies and reductions in workload are potentially achievable.On the whole, most suggested that a saving in workload of between 30% and 70% might be possible, though sometimes the saving in workload is accompanied by the loss of 5% of relevant studies (i.e. a 95% recall).ConclusionsUsing text mining to prioritise the order in which items are screened should be considered safe and ready for use in ‘live’ reviews. The use of text mining as a ‘second screener’ may also be used cautiously. The use of text mining to eliminate studies automatically should be considered promising, but not yet fully proven. In highly technical/clinical areas, it may be used with a high degree of confidence; but more developmental and evaluative work is needed in other disciplines.Electronic supplementary materialThe online version of this article (doi:10.1186/2046-4053-4-5) contains supplementary material, which is available to authorized users.
Abstract. Technical terms (henceforth called terms),are important elements for digital libraries. In this paperwe present a domain-independent method for the automatic extraction of multi-word terms, from machinereadable special language corpora.The method, (C-value/NC-value), combines linguistic and statistical information. The rst part, C-value enhances the common statistical measure of frequency of occurrence for term extraction, making it sensitive t o a particular ty p e o f m ulti-word terms, the nested terms. The second part, NC-value, gives: 1) a method for the extraction of term context words (words that tend to appear with terms), 2) the incorporation of information from term context words to the extraction of terms.
Entity mentions embedded in longer entity mentions are referred to as nested entities. Most named entity recognition (NER) systems deal only with the flat entities and ignore the inner nested ones, which fails to capture finer-grained semantic information in underlying texts. To address this issue, we propose a novel neural model to identify nested entities by dynamically stacking flat NER layers. Each flat NER layer is based on the state-ofthe-art flat NER model that captures sequential context representation with bidirectional long short-term memory (LSTM) layer and feeds it to the cascaded CRF layer. Our model merges the output of the LSTM layer in the current flat NER layer to build new representation for detected entities and subsequently feeds them into the next flat NER layer. This allows our model to extract outer entities by taking full advantage of information encoded in their corresponding inner entities, in an inside-to-outside way. Our model dynamically stacks the flat NER layers until no outer entities are extracted. Extensive evaluation shows that our dynamic model outperforms state-ofthe-art feature-based systems on nested NER, achieving 74.7% and 72.2% on GENIA and ACE2005 datasets, respectively, in terms of Fscore.
Systematic reviews are a widely accepted research method. However, it is increasingly difficult to conduct them to fit with policy and practice timescales, particularly in areas which do not have well indexed, comprehensive bibliographic databases. Text mining technologies offer one possible way forward in reducing the amount of time systematic reviews take to conduct. They can facilitate the identification of relevant literature, its rapid description or categorization, and its summarization. In this paper, we describe the application of four text mining technologies, namely, automatic term recognition, document clustering, classification and summarization, which support the identification of relevant studies in systematic reviews. The contributions of text mining technologies to improve reviewing efficiency are considered and their strengths and weaknesses explored. We conclude that these technologies do have the potential to assist at various stages of the review process. However, they are relatively unknown in the systematic reviewing community, and substantial evaluation and methods development are required before their possible impact can be fully assessed. Copyright © 2011 John Wiley & Sons, Ltd.
The volume of biomedical literature is increasing at such a rate that it is becoming difficult to locate, retrieve and manage the reported information without text mining, which aims to automatically distill information, extract facts, discover implicit links and generate hypotheses relevant to user needs. Ontologies, as conceptual models, provide the necessary framework for semantic representation of textual information. The principal link between text and an ontology is terminology, which maps terms to domain-specific concepts. This paper summarises different approaches in which ontologies have been used for text-mining applications in biomedicine.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.