2020
DOI: 10.1093/bib/bbaa054

Biomedical named entity recognition and linking datasets: survey and our recent development

Abstract: Natural language processing (NLP) is widely applied in biological domains to retrieve information from publications. Systems to address numerous applications exist, such as biomedical named entity recognition (BNER), named entity normalization (NEN) and protein–protein interaction extraction (PPIE). High-quality datasets can assist the development of robust and reliable systems; however, due to the endless applications and evolving techniques, the annotations of benchmark datasets may become outdated and inapp…

Citations: cited by 35 publications (14 citation statements)
References: 41 publications
“…There are methods aimed at NER that have been developing during the last years (Kaewphan et al, 2018; Korvigo et al, 2018; Hemati and Mehler, 2019; Hong and Lee, 2020; Huang et al, 2020; Kilicoglu et al, 2020). Most of them are based on algorithms for NER related either to chemicals or biological objects.…”
Section: Introduction (mentioning)
confidence: 99%
“…JLNPBA (Huang et al, 2020) dataset is formed from MEDLINE by using MeSH terms “human,” “blood cells” and “transcription factors.” Two thousand abstracts have been selected from this search and hand-annotated, based on a small 48 class's taxonomic classification. Out of 48 classes, 36 were used to annotate the GENIA corpus.…”
Section: Experimentation (mentioning)
confidence: 99%
“…Information extraction systems have also been intensively researched and developed (9), allowing the automatic mining of key knowledge that helps to keep biomedical databases updated and alleviating the need for manual efforts (10, 11). For instance, previous research efforts on information extraction have focused on identifying biomedical entities such as genes, proteins, chemical compounds (12) and clinical entities including laboratory procedures, diseases and adverse effects (13, 14). Identification of these entities of interest in the text is commonly paired with a normalization step, where the entities are grounded to unique identifiers from standard vocabularies or databases.…”
Section: Related Work (mentioning)
confidence: 99%