Knowledge base construction is crucial for summarising, understanding and inferring relationships between biomedical entities. However, for many practical applications such as drug discovery, the scarcity of relevant facts (e.g. gene X is a therapeutic target for disease Y) severely limits a domain expert's ability to create a usable knowledge base, either directly or by training a relation extraction model. In this paper, we present a simple and effective method of extracting new facts with a pre-specified binary relationship type from the biomedical literature, without requiring any training data or hand-crafted rules. Our system discovers, ranks and presents the most salient patterns to domain experts in an interpretable form. By marking patterns as compatible with the desired relationship type, experts indirectly batch-annotate candidate pairs whose relationship is expressed with such patterns in the literature. Even with a complete absence of seed data, experts are able to discover thousands of high-quality pairs with the desired relationship within minutes. When a small number of relevant pairs do exist, even when their relationship is more general than the relationship of interest (e.g. gene X is biologically associated with disease Y), our system leverages them in order to i) learn a better ranking of the patterns to be annotated or ii) generate weakly labelled pairs in a fully automated manner. We evaluate our method both intrinsically and via a downstream knowledge base completion task, and show that it is an effective way of constructing knowledge bases when few or no relevant facts are already available.
* Equal contribution. Listing order is alphabetical. Theodosia proposed and co-ordinated the research project, built the early prototypes and contributed the different methods for extracting and lexicalising patterns. Ashok provided conceptual work on the metrics for ranking simplifications and for the intrinsic evaluation, developed the simplification extraction module, ran the experiments for the automated workflow (with all the parameter variations) and performed all the extrinsic evaluations. Julien was mainly responsible for the system architecture and workflow, the intrinsic evaluation (including interacting with the experts), handling negation and speculation and the clustering algorithm.
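
The batch-annotation idea in the knowledge base construction abstract above (an expert accepts a pattern, and every candidate pair expressed with that pattern is labelled at once) can be illustrated with a minimal sketch. Everything here is a hypothetical illustration rather than the authors' system: the toy extractions, the pattern strings, the function names `rank_patterns` and `batch_annotate`, and the use of distinct-pair counts as a stand-in for pattern salience are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy data: (gene, disease, pattern) triples, as if each had
# been extracted from one sentence in the literature.
extractions = [
    ("GENE_A", "DISEASE_X", "inhibition of GENE reduces DISEASE"),
    ("GENE_B", "DISEASE_Y", "inhibition of GENE reduces DISEASE"),
    ("GENE_D", "DISEASE_W", "inhibition of GENE reduces DISEASE"),
    ("GENE_C", "DISEASE_Z", "GENE is expressed in DISEASE"),
    ("GENE_A", "DISEASE_X", "GENE is expressed in DISEASE"),
]


def rank_patterns(extractions):
    """Order patterns by how many distinct pairs they cover.

    Pair coverage is an assumed proxy for salience; the abstract does not
    say which ranking metric the real system uses.
    """
    pairs_by_pattern = defaultdict(set)
    for gene, disease, pattern in extractions:
        pairs_by_pattern[pattern].add((gene, disease))
    # Most-covering pattern first; ties broken alphabetically for determinism.
    return sorted(pairs_by_pattern.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def batch_annotate(extractions, accepted_patterns):
    """Return every pair expressed with at least one expert-accepted pattern."""
    return {(g, d) for g, d, p in extractions if p in accepted_patterns}


ranked = rank_patterns(extractions)
# The expert reviews patterns in ranked order and accepts only the first one.
accepted = {ranked[0][0]}
positive_pairs = batch_annotate(extractions, accepted)
```

One decision on a pattern labels three pairs in this toy example, which is the sense in which the annotation is done in batches rather than pair by pair.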
This paper presents our participation in the AGAC Track from the 2019 BioNLP Open Shared Tasks. We provide a solution for Task 3, which aims to extract "gene - function change - disease" triples, where "gene" and "disease" are mentions of particular genes and diseases respectively and "function change" is one of four pre-defined relationship types. Our system extends BERT (Devlin et al., 2018), a state-of-the-art language model, which learns contextual language representations from a large unlabelled corpus and whose parameters can be fine-tuned to solve specific tasks with minimal additional architecture. We encode the pair of mentions and their textual context as two consecutive sequences in BERT, separated by a special symbol. We then use a single linear layer to classify their relationship into five classes (four pre-defined, as well as 'no relation'). Despite considerable class imbalance, our system significantly outperforms a random baseline while relying on an extremely simple setup with no specially engineered features.
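
The input layout and classification head described in the AGAC abstract above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the label names are placeholders (the abstract does not name the four relationship types), the mention pair is assumed to come before the context, the pooled vector is random rather than produced by BERT, and `encode_pair` and `linear_softmax` are hypothetical helper names.

```python
import math
import random

# Placeholder label names: four relationship types plus 'no relation'.
LABELS = ["type_1", "type_2", "type_3", "type_4", "no_relation"]


def encode_pair(gene, disease, context_tokens):
    """Lay out two consecutive sequences in the BERT sentence-pair format.

    Assumed order: [CLS] gene disease [SEP] context ... [SEP]
    Segment id 0 marks the first sequence, 1 marks the second.
    """
    first = [gene, disease]
    tokens = ["[CLS]"] + first + ["[SEP]"] + list(context_tokens) + ["[SEP]"]
    segment_ids = [0] * (len(first) + 2) + [1] * (len(context_tokens) + 1)
    return tokens, segment_ids


def linear_softmax(pooled, weights, bias):
    """Single linear layer followed by a numerically stable softmax."""
    logits = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, bias)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
hidden = 8  # toy size; BERT-base uses 768
# Stand-in for the encoder's pooled [CLS] representation (random, not real BERT).
pooled = [random.uniform(-1, 1) for _ in range(hidden)]
weights = [[random.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in LABELS]
bias = [0.0] * len(LABELS)

tokens, segment_ids = encode_pair(
    "GENE_A", "DISEASE_X", "mutation in GENE_A causes DISEASE_X".split()
)
probs = linear_softmax(pooled, weights, bias)
predicted = LABELS[max(range(len(LABELS)), key=probs.__getitem__)]
```

In a real fine-tuning setup the pooled vector would come from the encoder and the linear layer's weights would be trained jointly with it; only the sequence layout and the five-way output are taken from the abstract.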
We study slow-roll inflation on a three-brane in a five-dimensional bulk where the effects of energy loss from the brane due to graviton emission are included in a self-consistent manner. We explicitly derive the form of the energy loss term due to inflaton-to-graviton scattering and thus determine the precise dynamics of the two resulting inflationary solutions. Remarkably, nonconservation of energy on the brane means that the curvature perturbation is not conserved on superhorizon scales, even for the purely adiabatic perturbations produced in single-field inflation. Thus the standard method of calculating the power spectrum of inflaton fluctuations at Hubble exit and equating it to the power spectrum at horizon reentry no longer holds. Instead, the superhorizon evolution of the perturbations must be tracked from horizon exit until the modes reenter the horizon in order to calculate the late-time power spectrum. We also develop the methodology for doing so in this paper.