2022
DOI: 10.1093/database/baac098

Chemical–protein relation extraction with ensembles of carefully tuned pretrained language models

Abstract: The identification of chemical–protein interactions described in the literature is an important task with applications in drug design, precision medicine and biotechnology. Manual extraction of such relationships from the biomedical literature is costly and often prohibitively time-consuming. The BioCreative VII DrugProt shared task provides a benchmark for methods for the automated extraction of chemical–protein relations from scientific text. Here we describe our contribution to the shared task and report on…

Citations: cited by 8 publications (15 citation statements)
References: 38 publications
“…If we employ an ensemble method for the DrugProt task, e.g. Weber et al (2021) and Luo et al (2021), we can expect better performance; however, it is beyond the scope of this article.…”
Section: Results (mentioning)
confidence: 99%
“…We approached the drug-target relation classification task as a multi-label problem similar to [5]. However, we employed various setup configurations, including the use of Focal Loss instead of binary cross-entropy loss.…”
Section: Methods (mentioning)
confidence: 99%
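The statement above contrasts Focal Loss with binary cross-entropy in a multi-label setup. As a rough illustration only, and not the citing authors' implementation, the sketch below computes a sigmoid-based focal loss over independent labels in plain Python. The function names, the default `gamma=2.0`, and the omission of a class-balancing weight are all assumptions made for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binary_focal_loss(logits, targets, gamma=2.0, eps=1e-12):
    """Mean focal loss over independent binary labels (hypothetical helper).

    logits  -- one raw score per label
    targets -- 0/1 ground-truth value per label
    gamma   -- focusing parameter (assumed default); gamma = 0 gives
               plain binary cross-entropy
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        # p_t is the probability the model assigns to the true value of the label.
        p_t = p if y == 1 else 1.0 - p
        # The (1 - p_t)^gamma factor down-weights labels that are already
        # predicted confidently, so hard examples dominate the loss.
        total += -((1.0 - p_t) ** gamma) * math.log(max(p_t, eps))
    return total / len(logits)


if __name__ == "__main__":
    logits = [3.0, -3.0, 0.0]
    targets = [1, 0, 1]
    print("cross-entropy (gamma=0):", binary_focal_loss(logits, targets, gamma=0.0))
    print("focal loss    (gamma=2):", binary_focal_loss(logits, targets, gamma=2.0))
```

With `gamma=0` the modulating factor equals 1 and the function reduces to ordinary binary cross-entropy, which makes the two losses easy to compare on the same inputs.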
“…Figure 1 We conducted experiments using various model configurations, and subsequently linked chemical and gene to their ontologies. To ensure consistency, we adopted the same unique gene identifiers as in [5] for both the training and development datasets. These identifiers were normalized using a BioSync model [25] for chemicals, leveraging the comprehensive BioCreative V CDR (BC5CDR) dataset [26], and for proteins, utilizing the BioCreative II Gene Normalization (BC2GN) dataset [27].…”
Section: Datasets (mentioning)
confidence: 99%