2020
DOI: 10.1093/database/baaa043

Effective biomedical document classification for identifying publications relevant to the mouse Gene Expression Database (GXD)

Cited by 7 publications (13 citation statements)
References 1 publication
“…Captions associated with figures provide another important source of information for biomedical document classification. In order to make use of captions, we employ a standard preprocessing procedure that includes named-entity recognition (NER), stemming and stop-words removal as we have done in our earlier work (Jiang et al., 2017, 2020). For NER, we first identify all gene, disease, chemical, species, mutation and cell-line concepts using PubTator, which is widely used for annotations of biomedical concepts (Wei et al., 2019).…”
Section: Methods (mentioning)
confidence: 99%
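
The caption preprocessing described in the statement above (entity recognition, stemming, stop-word removal) can be pictured with a minimal Python sketch. It is not the citing authors' pipeline. The stop-word list, the `ENTITY_LEXICON` lookup (a toy stand-in for PubTator annotations), the `crude_stem` suffix stripper (a stand-in for a real stemmer) and the choice to replace recognized entities with type tags are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list and entity lexicon, for illustration only.
# A real pipeline would take entity annotations from a tool such as PubTator.
STOP_WORDS = {"the", "of", "in", "and", "a", "an", "is", "are", "to", "with"}
ENTITY_LEXICON = {"pax6": "GENE", "mouse": "SPECIES"}


def crude_stem(token):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess_caption(caption):
    """Lowercase, tokenize, tag known entities, drop stop words, stem the rest."""
    tokens = re.findall(r"[a-z0-9]+", caption.lower())
    processed = []
    for tok in tokens:
        if tok in ENTITY_LEXICON:
            # Assumption: recognized entities are replaced by their type tag.
            processed.append(ENTITY_LEXICON[tok])
        elif tok not in STOP_WORDS:
            processed.append(crude_stem(tok))
    return processed


print(preprocess_caption("Expression of Pax6 in the developing mouse eye"))
```

The resulting token list could then feed a bag-of-words or similar feature extractor for a document classifier.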
“…Image captions have been shown effective for document classification in several studies (Burns et al., 2019; Jiang et al., 2017, 2020; Regev et al., 2002). For instance, Burns et al. (2019) compared classification performance under different information sources, when identifying publications containing molecular interaction information, relevant to the IntAct Molecular Interaction database (Kerrien et al., 2012).…”
Section: Introduction (mentioning)
confidence: 99%
“…As we demonstrated in our earlier work (13), image captions in biomedical publications, which form brief summaries of the images, contain significant and useful information for determining the topic discussed in the publications. As part of future work, we plan to integrate image captions into the classification scheme.…”
Section: Discussion (mentioning)
confidence: 86%
“…Much work over the past two decades aimed to address biomedical document classification. Most of the proposed methods are trained and tested over balanced data sets, in which all classes are similar in size (13–16). However, biomedical data sets are typically highly imbalanced, where relatively few publications within a large volume of literature are actually relevant to any specific topic of interest (17).…”
Section: Introduction (mentioning)
confidence: 99%