2020
DOI: 10.3390/ijerph17082687
Improving the Named Entity Recognition of Chinese Electronic Medical Records by Combining Domain Dictionary and Rules

Abstract: Electronic medical records are an integral part of medical texts. Entity recognition of electronic medical records has triggered many studies that propose many entity extraction methods. In this paper, an entity extraction model is proposed to extract entities from Chinese Electronic Medical Records (CEMR). In the input layer of the model, we use word embedding and dictionary features embedding as input vectors, where word embedding consists of a character representation and a word representation. Then, the in…
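The abstract describes an input layer that combines word embedding (itself a character representation plus a word representation) with a dictionary-feature embedding. A minimal sketch of that combination is shown below; the dimensions and the one-hot dictionary feature are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical dimensions; the paper does not state these sizes here.
CHAR_DIM, WORD_DIM, DICT_DIM = 4, 6, 3

rng = np.random.default_rng(0)

def input_vector(char_emb, word_emb, dict_feat):
    """Concatenate the character representation, word representation,
    and dictionary-feature embedding into one input vector, mirroring
    the input layer the abstract describes."""
    return np.concatenate([char_emb, word_emb, dict_feat])

char_emb = rng.normal(size=CHAR_DIM)    # character representation
word_emb = rng.normal(size=WORD_DIM)    # word representation
dict_feat = np.array([1.0, 0.0, 0.0])   # e.g. token matched a "disease" dictionary entry

vec = input_vector(char_emb, word_emb, dict_feat)
print(vec.shape)  # (13,)
```

Downstream layers (e.g. a BiLSTM-CRF tagger) would consume one such vector per token position.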

Cited by 19 publications (11 citation statements)
References 30 publications (35 reference statements)
“…Based on the evaluation tasks of the previous two years, CCKS 2019 was interested in disease and diagnosis, examination, inspection, surgery, medication, and anatomy. From the related studies of clinical information extraction, it can be found that admission notes, disease course notes, and discharge summaries are the main objects of clinical information extraction [ 20 , 21 ], which contain a large number of clinical entities.…”
Section: Related Workmentioning
confidence: 99%
“…Most of the current text classification methods combine models such as CNN and LSTM [ 15 , 16 , 17 ], which prioritize sequential and local information and can capture semantic and syntactic information in continuous word sequences very well. However, they ignore global word co-occurrence, which carries discontinuous as well as long-distance semantic information.…”
Section: Related Workmentioning
confidence: 99%
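The statement above contrasts CNN/LSTM models, which capture local and sequential features, with global word co-occurrence information that fixed windows miss. A minimal numpy sketch (not any of the cited models) illustrates the locality: each output of a 1-D convolutional filter depends only on one window-sized slice of the token sequence.

```python
import numpy as np

rng = np.random.default_rng(1)

seq_len, emb_dim, window = 8, 5, 3        # illustrative sizes
tokens = rng.normal(size=(seq_len, emb_dim))  # one embedded sentence
filt = rng.normal(size=(window, emb_dim))     # a single convolutional filter

def conv1d(x, w):
    """Slide the filter over the sequence; each output value sees
    only a `window`-sized local slice, never distant co-occurrences."""
    n = x.shape[0] - w.shape[0] + 1
    return np.array([np.sum(x[i:i + w.shape[0]] * w) for i in range(n)])

feats = conv1d(tokens, filt)
print(feats.shape)  # (6,) — one feature per 3-gram position
```

Tokens farther apart than the window never interact within a single filter, which is exactly the long-distance information the passage says these models ignore.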
“…The reason the BERT pre-trained language model is chosen is that its text vectors serve as the model input, and Chinese input can be divided at either the character level or the word level. Existing research shows that character-level pre-training schemes perform better [37,39], and the BERT pre-trained language model is a character-level scheme.…”
Section: Bert-bigru-crf Model Constructionmentioning
confidence: 99%
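The passage above contrasts character-level and word-level division of Chinese input, noting that BERT operates at the character level. A small sketch of the two granularities, using an illustrative sentence and a hand-segmented word-level split:

```python
# Two input granularities for Chinese text. BERT's Chinese models
# tokenize one character at a time, so no external word segmenter
# is needed; the word-level split below is illustrative only.
sentence = "患者头痛三天"  # "the patient has had a headache for three days"

char_level = list(sentence)           # character-level: one token per character
word_level = ["患者", "头痛", "三天"]  # word-level: segmenter output (illustrative)

print(char_level)  # ['患', '者', '头', '痛', '三', '天']
```

Character-level input avoids segmentation errors propagating into the tagger, which is one common reason character-level schemes score better on Chinese sequence labeling.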