Analyzing the relationships among drugs is an essential problem in computational biology. Different kinds of useful knowledge, such as candidates for drug repurposing, can be extracted from drug-drug relationships. The scientific literature is a rich source of knowledge about the relationships between biological concepts, mainly drug-drug, disease-disease, and drug-disease relationships. In this paper, we propose DDREL, a general-purpose method that applies deep learning to the scientific literature to automatically extract a graph of syntactic and semantic relationships among drugs. DDREL remarkably outperforms the existing human drug network method and a random network with respect to the average similarity of drugs' Anatomical Therapeutic Chemical (ATC) codes. DDREL also sheds light on existing deficiencies of the ATC codes in various drug groups. The history of drug discovery becomes visible in the DDREL graph. In addition, the drugs with a repurposing score of 1 (diflunisal, pargyline, fenofibrate, guanfacine, chlorzoxazone, doxazosin, oxymetholone, azathioprine, drotaverine, demecarium, omifensine, yohimbine) are already used for additional indications. The proposed DDREL method demonstrates the predictive power of textual data in PubMed abstracts: such data can be used to (1) predict drugs for repurposing with high accuracy and (2) reveal existing deficiencies of the ATC codes in various drug groups.
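The evaluation criterion mentioned above, the average similarity of ATC codes over a drug graph, can be illustrated with a minimal sketch. The abstract does not state how DDREL measures ATC similarity, so the shared-prefix measure below, the function names, and the toy graph are assumptions made for illustration only. The sketch relies on the standard five-level layout of a seven-character ATC code and assumes one code per drug, although a drug may have several in practice.

```python
# Character offsets at which each of the five ATC levels ends in a
# seven-character code, e.g. C | 10 | A | A | 01.
ATC_LEVEL_ENDS = (1, 3, 4, 5, 7)


def atc_similarity(code_a, code_b):
    """Fraction of leading ATC levels shared by two codes (assumed measure)."""
    shared = 0
    for end in ATC_LEVEL_ENDS:
        if (len(code_a) >= end and len(code_b) >= end
                and code_a[:end] == code_b[:end]):
            shared += 1
        else:
            break  # levels are hierarchical, so stop at the first mismatch
    return shared / len(ATC_LEVEL_ENDS)


def average_edge_similarity(edges, atc_codes):
    """Mean ATC similarity over the edges whose endpoints both have a code."""
    scores = [
        atc_similarity(atc_codes[u], atc_codes[v])
        for u, v in edges
        if u in atc_codes and v in atc_codes
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Toy graph (hypothetical edges, not taken from the DDREL results).
atc_codes = {
    "simvastatin": "C10AA01",
    "atorvastatin": "C10AA05",
    "metformin": "A10BA02",
}
edges = [("simvastatin", "atorvastatin"), ("simvastatin", "metformin")]
print(average_edge_similarity(edges, atc_codes))
```

Under this assumed measure, a graph whose edges mostly connect drugs from the same ATC subgroup scores higher than a random graph over the same drugs, which is the kind of comparison the abstract describes.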
Entity resolution refers to the process of identifying, matching, and integrating records that belong to the same entity in a data set. However, an exhaustive comparison of all pairs of records leads to quadratic matching complexity. Therefore, blocking methods are used to group similar entities into small blocks before matching. Available blocking methods typically do not consider semantic relationships among records. In this paper, we propose a Semantic-aware Meta-Blocking approach called SeMBlock. SeMBlock considers the semantic similarity of records by applying locality-sensitive hashing (LSH) based on word embeddings to achieve fast and reliable blocking in large-scale data environments. To improve the quality of the resulting blocks, SeMBlock builds a weighted graph of semantically similar records and prunes the graph edges. We extensively compare SeMBlock with 16 existing blocking methods on three real-world data sets. The experimental results show that SeMBlock significantly outperforms all 16 methods with respect to two relevant measures, F-measure and pair quality. The F-measure and pair quality of SeMBlock are approximately 7% and 27% higher, respectively, than those of recently released blocking methods.
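The two steps described above, LSH-based blocking over record embeddings followed by pruning of a weighted graph, can be sketched as follows. This is a minimal illustration and not the SeMBlock implementation: the random-hyperplane LSH family, the cosine edge weights, the mean-weight pruning rule, and all function names are assumptions, and the toy vectors stand in for real word embeddings.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(n_planes, dim, seed=0):
    """Random hyperplanes for sign-based (cosine) LSH."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def lsh_blocks(embeddings, planes):
    """Group record ids whose embeddings share the same LSH signature."""
    blocks = defaultdict(list)
    for record_id, vec in embeddings.items():
        blocks[signature(vec, planes)].append(record_id)
    return dict(blocks)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def weighted_edges(blocks, embeddings):
    """Edges between records that co-occur in a block, weighted by cosine."""
    weights = {}
    for ids in blocks.values():
        ids = sorted(ids)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                weights[(ids[i], ids[j])] = cosine(embeddings[ids[i]], embeddings[ids[j]])
    return weights


def prune_edges(weights):
    """Keep edges whose weight is at least the mean weight (assumed rule)."""
    if not weights:
        return set()
    mean = sum(weights.values()) / len(weights)
    return {pair for pair, w in weights.items() if w >= mean}


# Toy record embeddings (hypothetical values).
embeddings = {
    "a": [1.0, 0.5, -0.25],
    "b": [2.0, 1.0, -0.5],     # same direction as "a"
    "c": [-1.0, -0.5, 0.25],   # opposite direction to "a"
}
planes = make_hyperplanes(n_planes=8, dim=3)
blocks = lsh_blocks(embeddings, planes)
candidates = prune_edges(weighted_edges(blocks, embeddings))
print(candidates)
```

Only pairs that share a block are ever compared, which is how blocking avoids the quadratic number of comparisons. The pruning step then discards low-weight edges so that fewer unlikely pairs reach the matcher.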