Broad-coverage biomedical relation extraction with SemRep

Kilicoglu, Halil; Rosemblat, Graciela; Fiszman, Marcelo; Shin, Dongwook

doi:10.1186/s12859-020-3517-7

Cited by 74 publications

(57 citation statements)

References 102 publications


“…There are methods aimed at NER that have been developing during the last years (Kaewphan et al, 2018; Korvigo et al, 2018; Hemati and Mehler, 2019; Hong and Lee, 2020; Huang et al, 2020; Kilicoglu et al, 2020). Most of them are based on algorithms for NER related either to chemicals or biological objects.…”

Section: Introduction (mentioning)

confidence: 99%

Automated Extraction of Information From Texts of Scientific Publications: Insights Into HIV Treatment Strategies

et al. 2020


Text analysis can help to identify named entities (NEs) of small molecules, proteins, and genes. Such data are very important for the analysis of molecular mechanisms of disease progression and the development of new strategies for the treatment of various diseases and pathological conditions. The texts of publications are a primary source of information and are especially important for collecting high-quality data because they make information available sooner than databases do. In our study, we aimed to develop and test an approach to named entity recognition in the abstracts of publications. More specifically, we have developed and tested an algorithm based on conditional random fields, which recognizes NEs of (i) genes and proteins and (ii) chemicals. Careful selection of abstracts strictly related to the subject of interest makes it possible to extract the NEs strongly associated with the subject. To test the applicability of our approach, we applied it to the extraction of (i) potential HIV inhibitors and (ii) a set of proteins and genes potentially responsible for viremic control in HIV-positive patients. The computational experiments performed provide estimates of the accuracy of recognition of chemical NEs and of proteins (genes). The precision of chemical NE recognition is over 0.91, recall is 0.86, and the F1-score (harmonic mean of precision and recall) is 0.89; the precision of recognition of protein and gene names is over 0.86, recall is 0.83, and the F1-score is above 0.85. Evaluation of the algorithm on two case studies related to HIV treatment confirms that it is possible to extract the NEs strongly relevant to (i) HIV inhibitors and (ii) a group of patients, i.e., HIV-positive individuals able to maintain an undetectable HIV-1 viral load over time in the absence of antiretroviral therapy. Analysis of the results obtained provides insights into the function of proteins that can be responsible for viremic control. Our study demonstrated the applicability of the developed approach for the extraction of useful data on HIV treatment.
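The F1-scores quoted in the abstract above are harmonic means of precision and recall. The following is a minimal Python sketch of that calculation, not code from the cited study. The function name `f1_score` is an illustrative choice, and because the abstract gives precision only as a lower bound ("over 0.91", "over 0.86"), the inputs used here are those bounds and the results are lower bounds as well.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Assumption: precision is set to the lower bound stated in the abstract,
# so each result is a lower bound on the reported F1-score.
chemical_f1 = f1_score(precision=0.91, recall=0.86)
protein_f1 = f1_score(precision=0.86, recall=0.83)

print(f"chemical NEs:       F1 >= {chemical_f1:.3f}")
print(f"genes and proteins: F1 >= {protein_f1:.3f}")
```

Precision slightly above the stated bounds would raise both values, which is consistent with the rounded figures reported in the abstract.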


“…The INDRA system can translate scientific prose directly into executable graphical models [108, 109]. The SemRep system (soon to be released in Java) is being upgraded with exciting features, including factuality levels (potentially useful for improving “knowledge hygiene” and identifying contradictory claims [110]) and end-user extensibility [46].…”

Section: Discussion (mentioning)

confidence: 99%

“…SemMedDB is a knowledge database deployed extensively in biomedical research and developed at the US National Library of Medicine. The knowledge contained in SemMedDB consists of subject-predicate-object triples (or predications) extracted from titles and abstracts in MEDLINE [44] using the SemRep biomedical NLP system [44, 45, 46]. SemRep can be thought of as a machine reading utility for transforming biomedical literature into computable knowledge.…”

Section: Introduction (mentioning)

confidence: 99%
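The subject-predicate-object triples (predications) described in the statement above can be pictured as a small record type. This is a hedged Python sketch only: the class name `Predication`, its field names, and the sample rows are hypothetical placeholders, not SemMedDB's actual schema or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    """One subject-predicate-object triple (field names are hypothetical)."""
    subject: str
    predicate: str
    obj: str


# Placeholder rows for illustration only; they are not real extracted predications.
predications = [
    Predication("drug_A", "TREATS", "condition_X"),
    Predication("condition_X", "CAUSES", "event_E"),
    Predication("drug_B", "TREATS", "condition_Y"),
]


def with_predicate(rows, predicate):
    """Return the triples whose predicate matches the given label."""
    return [row for row in rows if row.predicate == predicate]


for row in with_predicate(predications, "TREATS"):
    print(f"{row.subject} {row.predicate} {row.obj}")
```

Keeping the record immutable (`frozen=True`) makes triples hashable, so they can be stored in sets when deduplicating.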

Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance

Malec, Bernstam, Wei, et al. 2020

Preprint


Introduction: Confounding bias threatens the reliability of observational studies and poses a significant scientific challenge. This paper introduces a framework for identifying confounding factors by exploiting literature-derived computable knowledge. In previous work, we have shown that semantic constraint search over computable knowledge extracted from the literature can be useful for reducing confounding bias in statistical models of EHR-derived observational clinical data. We hypothesize that adjustment sets of literature-derived confounders could also improve causal inference. Methods: We introduce two methods (semantic vectors and string-based confounder search) that query the literature for potential confounders and use this information to build models from EHR-derived data to more accurately estimate causal effects. These methods search SemMedDB for indications that are TREATED BY the drug and are also known to CAUSE the adverse event. For evaluation, we attempt to rediscover associations in a publicly available reference dataset containing expected pairwise relationships between drugs and adverse events from empirical data derived from a corpus of 2.2M EHR-derived clinical notes. For our knowledge base, we use SemMedDB, a database of computable knowledge mined from the biomedical literature. Using standard adjustment and causal inference procedures on dichotomous drug exposures, confounders, and adverse event outcomes, varying numbers of literature-derived confounders are combined with EHR data to predict and estimate causal effects in light of the literature-derived confounders. We then compare the performance of the new methods with naive (χ², reporting odds ratio) measures of association. Results and Conclusions: Logistic regression with ten vector space-derived confounders achieved the most improvement, with an AUROC of 0.628 (95% CI: [0.556, 0.720]), compared with a baseline χ² of 0.507 (95% CI: [0.431, 0.583]). Bias reduction was improved more often in modeling methods using more rather than less information, and using semantic vector rather than string-based search. We found computable knowledge useful for improving automated causal inference, and identified opportunities for further improvement, including a role for adjudicating literature-derived confounders by subject matter experts.
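The confounder search described in the abstract above looks for indications linked to both the drug and the adverse event. One way to picture the string-based variant is a set intersection over triples, sketched below in Python. This is an assumption-laden illustration, not the authors' implementation: the function `find_confounders`, the `TREATS` and `CAUSES` labels, and the toy triples are hypothetical, and the semantic vector method is not shown.

```python
# Each triple is (subject, predicate, object). Placeholder data only.
triples = [
    ("drug_A", "TREATS", "condition_X"),
    ("drug_A", "TREATS", "condition_Y"),
    ("condition_X", "CAUSES", "event_E"),
    ("condition_Z", "CAUSES", "event_E"),
]


def find_confounders(rows, drug, adverse_event):
    """Return conditions that the drug treats and that also cause the event."""
    treated_by_drug = {
        obj for subj, pred, obj in rows
        if subj == drug and pred == "TREATS"
    }
    causes_of_event = {
        subj for subj, pred, obj in rows
        if pred == "CAUSES" and obj == adverse_event
    }
    # A candidate confounder must appear in both sets.
    return treated_by_drug & causes_of_event


print(find_confounders(triples, "drug_A", "event_E"))
```

In this toy data only `condition_X` is both treated by `drug_A` and a cause of `event_E`, so it is the single candidate returned.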


“…There is disagreement as to whether cooccurrence based methods are too noisy. More complex methods such as relation extraction (Kilicoglu et al, 2020) models exist, however there is a trade-off between precision and recall with these models. Co-occurrence-based models inherently have a high recall since all co-occurrences are considered a relation, but this high recall comes at the expense of precision since many co-occurrences do not in actuality constitute a relationship.…”

Section: Limitations (mentioning)

confidence: 99%

Using Literature Based Discovery to Gain Insights Into the Metabolomic Processes of Cardiac Arrest

Henry, Wijesinghe, Myers, et al. 2021

Front. Res. Metr. Anal.


In this paper, we describe how we applied LBD techniques to discover lecithin cholesterol acyltransferase (LCAT) as a druggable target for cardiac arrest. We fully describe our process, which includes the use of high-throughput metabolomic analysis to identify metabolites significantly related to cardiac arrest, and how we used LBD to gain insights into how these metabolites relate to cardiac arrest. These insights led to our proposal (for the first time) of LCAT as a druggable target, the effects of which are supported by in vivo studies that this work prompted. Metabolites are the end products of many biochemical pathways within the human body. Observed changes in metabolite levels are indicative of changes in these pathways, and provide valuable insights into the cause, progression, and treatment of diseases. Following cardiac arrest, we observed changes in metabolite levels pre- and post-resuscitation. We used LBD to help discover diseases implicitly linked via these metabolites of interest. The results of LBD indicated a strong link between fish eye disease and cardiac arrest. Since fish eye disease is characterized by an LCAT deficiency, this prompted an investigation into the relationship between LCAT and cardiac arrest survival. In the investigation, we found that decreased LCAT activity may increase cardiac arrest survival rates by increasing ω-3 polyunsaturated fatty acid availability in circulation. We verified the effect of ω-3 polyunsaturated fatty acids on increasing survival rate following cardiac arrest in vivo in rat models.
