2019
DOI: 10.1093/database/baz045

An effective biomedical document classification scheme in support of biocuration: addressing class imbalance

Abstract: Published literature is an important source of knowledge supporting biomedical research. Given the large and increasing number of publications, automated document classification plays an important role in biomedical research. Effective biomedical document classifiers are especially needed for bio-databases, in which the information stems from many thousands of biomedical publications that curators must read in detail and annotate. In addition, biomedical document classification often amounts to identifying a s…

Cited by 16 publications (10 citation statements)
References 41 publications

“…Other work focuses on the task of associating genes with diseases [267] or finding associations between metabolites, proteins, genes, and diseases [258]. Another challenge is document triage, such as finding documents relevant to a context or field of study [268][269][270]. For a thorough review of NLP techniques, as applied to the biomedical literature, we invite the reader to reference [271].…”
Section: Computationally Predicted Resources (mentioning)
confidence: 99%
“…The achieved F1 score for this method was 84.0% while using a multi-layer perceptron with sigmoid functions. A biomedical document classification was carried out in [18], where an imbalanced bio-dataset was used for a cluster-based classification on the under-sampled dataset GXD. Overall precision of 0.72 was achieved.…”
Section: Literature Review (mentioning)
confidence: 99%
“…However, they did not lead to overall improvement. An alternative would be to use a classifier ensemble, as proposed in [15]. However, this approach is too expensive for deep learning models due to the learning cost.…”
Section: Classification Performance Per Category Document Size and O (mentioning)
confidence: 99%
“…LocText, for example, implements a NER and RE for proteins based on SVM, achieving 86% precision (56% F1-score) [14]. To address the common issue of class imbalance in biocuration, an ensemble of SVM classifiers along with random under-sampling were proposed for automatically identifying relevant papers for curation in the Gene Expression Database [15].…”
Section: Introduction (mentioning)
confidence: 99%