Background: A large collection of dialogues between patients and doctors needs to be annotated with medical named entities to build intelligence for telemedicine. However, because patients in telemedicine consultations often express medical concepts informally, as long multi-word expressions that span much of a sentence, tagging named entities in telemedicine dialogue data is challenging. This study aims to address this issue.
Methods: On telemedicine dialogue data from Haodf, we developed annotation guidelines and followed a two-round procedure to tag six types of named entities: disease, symptom, time, pharmaceutical, operation, and examination. Moreover, we evaluated four deep-learning models on the dataset to establish a benchmark for named entity recognition.
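To illustrate the tagging task, the sketch below extracts entity spans from a character-level BIO-tagged sentence. BIO is a common annotation scheme for named entity recognition; the abstract does not specify the dataset's actual scheme, and the sentence and labels here are hypothetical illustrations, not drawn from the dataset.

```python
# Minimal sketch: recover (entity_text, entity_type) spans from
# character-level BIO tags over the six label types (disease, symptom,
# time, pharmaceutical, operation, examination). Hypothetical example,
# not taken from the actual dataset.

def extract_entities(chars, tags):
    """Collect (entity_text, entity_type) spans from BIO tags."""
    entities, current, etype = [], [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            if current:  # close the previous entity before starting a new one
                entities.append(("".join(current), etype))
            current, etype = [ch], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(ch)
        else:  # "O" tag (or stray "I-") ends any open entity
            if current:
                entities.append(("".join(current), etype))
            current, etype = [], None
    if current:  # flush an entity that runs to the end of the sentence
        entities.append(("".join(current), etype))
    return entities

# Hypothetical patient utterance: "头疼三天了" ("headache for three days"),
# containing one symptom entity and one time entity.
chars = list("头疼三天了")
tags = ["B-symptom", "I-symptom", "B-time", "I-time", "O"]
print(extract_entities(chars, tags))
# → [('头疼', 'symptom'), ('三天', 'time')]
```

Because the dataset's entities average only a few characters but can stretch into long multi-word expressions, span-recovery logic like this must handle entities that end only at a sentence boundary, which the final flush covers.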
Results: The distilled dataset contains 2,383 consultations between doctors and patients, with 13,411 sentences from doctors and 17,929 from patients. Each consultation contains 1,100 characters on average. The dataset holds 63,560 named entities in total, with an average length of 4.33 characters per entity. Moreover, the experimental results suggest that LatticeLSTM performs best on our dataset across all metrics, including accuracy, precision, and F1.
Conclusion: Compared with other existing datasets, the novelty of this dataset is reflected in three facets. First, this study tackles the intricate tagging of long multi-word expressions for medical named entities. Second, it is one of the first attempts to mark temporal entities. Third, the dataset is balanced across the six label types. We believe that this dataset will play a considerable role in advancing AI for telemedicine.