We investigate the application of deep learning to biocuration tasks that involve classifying text associated with biomedical evidence in primary research articles. We developed a large-scale corpus of molecular papers derived from PubMed and PubMed Central open-access records and used it to train word embeddings under the GloVe, FastText, and ELMo algorithms. We applied those models to a distantly supervised method-classification task based on text from figure captions, or from fragments surrounding references to figures in the main text, using a variety of models and parameterizations. We then developed document classification (triage) methods for molecular interaction papers, using deep learning attention mechanisms to aggregate classification decisions over selected paragraphs of a document. We obtained triage performance with an accuracy of 0.82 using a combined convolutional neural network and bidirectional long short-term memory architecture, augmented by attention, to produce a single triage decision. With this work, we hope to encourage developers of biocuration systems to apply deep learning methods to their specialized tasks by repurposing large-scale word embeddings for their own data.
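The core aggregation idea, attention pooling paragraph-level representations into one document-level triage decision, can be illustrated with a minimal numpy sketch. This is not the paper's architecture (which uses CNN and BiLSTM encoders); the paragraph vectors and query vector here are hypothetical stand-ins for learned encodings.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_aggregate(paragraph_vecs, query):
    """Combine per-paragraph encodings into one document vector
    via dot-product attention (illustrative sketch only)."""
    scores = paragraph_vecs @ query            # one relevance score per paragraph
    weights = softmax(scores)                  # attention distribution over paragraphs
    return weights @ paragraph_vecs, weights   # weighted sum + the weights themselves

rng = np.random.default_rng(0)
paras = rng.normal(size=(5, 8))   # 5 paragraphs, 8-dim encodings (toy values)
query = rng.normal(size=8)        # hypothetical learned attention query
doc_vec, w = attention_aggregate(paras, query)
```

The resulting `doc_vec` would feed a final classifier that emits the single triage decision; the weights `w` indicate which paragraphs drove it.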
Evidence plays a crucial role in any biomedical research narrative, providing justification for some claims and refutation for others. We seek to build models of scientific argument from full-text papers using information extraction methods. We present a capability for automatically extracting text fragments from primary research papers that describe the evidence presented in a paper's figures, which arguably provides the raw material of any scientific argument made within the paper. We apply richly contextualized deep representation learning, pre-trained on a biomedical domain corpus, to the analysis of scientific discourse structures and to the extraction of "evidence fragments" (i.e., the text in the results section describing data presented in a specified subfigure) from a set of biomedical experimental research articles. We first demonstrate our state-of-the-art scientific discourse tagger on two scientific discourse tagging datasets and show its transferability to new datasets. We then show the benefit of leveraging scientific discourse tags for downstream tasks such as claim extraction and evidence fragment detection. Our work demonstrates the potential of using evidence fragments derived from figure spans to improve the quality of scientific claims by cataloging, indexing, and reusing evidence fragments as independent documents.
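A crude heuristic version of evidence fragment detection can be sketched by filtering discourse-tagged clauses for result-labeled text that references a (sub)figure. The regex and tag names below are illustrative assumptions, not the paper's learned detector.

```python
import re

# Matches figure references such as "Figure 2A" or "Fig. 3" (heuristic).
FIGREF = re.compile(r"\b(?:Fig(?:ure)?\.?\s*\d+[A-Za-z]?)", re.IGNORECASE)

def evidence_fragments(tagged_clauses):
    """Keep result-tagged clauses that mention a figure, returning
    (figure reference, clause text) pairs. A stand-in for a trained
    evidence-fragment detector."""
    out = []
    for text, tag in tagged_clauses:
        m = FIGREF.search(text)
        if tag == "result" and m:
            out.append((m.group(0), text))
    return out

tagged = [
    ("As shown in Figure 2A, expression levels rose.", "result"),
    ("We hypothesize that binding is cooperative.", "hypothesis"),
    ("Binding affinity increased twofold (Fig. 3).", "result"),
]
frags = evidence_fragments(tagged)
```

Each recovered pair anchors a fragment of results text to the subfigure it describes, which is the unit the abstract proposes cataloging and indexing as an independent document.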
The biomedical scientific literature comprises a crucial, sometimes life-saving, natural language resource whose size is accelerating over time. The information in this resource tends to follow a style of discourse intended to provide scientific explanations for various pieces of evidence derived from experimental findings. Studying the rhetorical structure of this narrative discourse could enable more powerful information extraction methods to automatically construct models of scientific argument from full-text papers. In this paper, we apply richly contextualized deep representation learning to the analysis of scientific discourse structures as a clause-tagging task. We improve the current state of the art in clause-level sequence tagging over text clauses for a set of discourse types (e.g., "hypothesis", "result", "implication") on scientific paragraphs. Our model uses contextualized embeddings, a word-to-clause encoder, and clause-level sequence tagging models, and achieves an F1 of 0.784.
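The pipeline described (encode each clause from its word embeddings, then tag it with a discourse type) can be sketched minimally in numpy. The mean-pooling encoder and random linear scorer below are hypothetical placeholders for the paper's learned word-to-clause encoder and sequence tagger.

```python
import numpy as np

LABELS = ["hypothesis", "method", "result", "implication"]

def encode_clause(word_vecs):
    # Word-to-clause encoder stub: mean-pool the word embeddings.
    return np.mean(word_vecs, axis=0)

def tag_paragraph(clauses, W, b):
    """Assign one discourse label per clause by scoring its encoding
    against each label (toy linear scorer, no sequence model)."""
    tags = []
    for word_vecs in clauses:
        logits = W @ encode_clause(word_vecs) + b
        tags.append(LABELS[int(np.argmax(logits))])
    return tags

rng = np.random.default_rng(1)
W = rng.normal(size=(len(LABELS), 8))       # untrained weights, illustration only
b = np.zeros(len(LABELS))
clauses = [rng.normal(size=(4, 8)),         # clause of 4 words, 8-dim embeddings
           rng.normal(size=(6, 8))]         # clause of 6 words
tags = tag_paragraph(clauses, W, b)
```

A real tagger would replace the independent per-clause argmax with a sequence model over the whole paragraph, since adjacent discourse types are strongly correlated.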
Academic research is an exploratory activity aimed at solving problems that have never been resolved before. By this nature, each academic research work is required to perform a literature review to distinguish the novelties that have not been addressed by prior work. In natural language processing, this literature review is usually presented in the "Related Work" section. The task of automatic related work generation aims to automatically generate the "Related Work" section given the rest of the research paper and a list of cited papers. Although this task was proposed over 10 years ago, it received little attention until very recently, when it was cast as a variant of the scientific multi-document summarization problem. However, even today, the problems of automatic related work and citation text generation are not yet standardized. In this survey, we conduct a meta-study comparing the existing literature on related work generation from the perspectives of problem formulation, dataset collection, methodological approach, performance evaluation, and future prospects, to give the reader insight into the progress of state-of-the-art studies as well as how future studies can be conducted. We also survey relevant fields of study that we suggest future work consider integrating.