Information Extraction for Intestinal Cancer Electronic Medical Records

Wang, Sufen; Pang, Minmin; Pan, Changqing; Yuan, Junyi; Xu, Bo; Du, Ming; Zhang, Hong

doi:10.1109/access.2020.3005684

Cited by 7 publications

(2 citation statements)

References 31 publications

(33 reference statements)

“…Our comparison of embedding features and unigram features clearly demonstrates the added value of lexically-abstracted embedding features, which enable data-driven models to capitalize on similar and related words beyond exact matches (46,73). As observed in prior literature (30,74,75), word embeddings that balance a training corpus that is representative of the target information with corpus size achieve the best performance for specialized tasks. While our results led us to use the most specialized PT-OT corpus for our word2vec embeddings, the performance of our more general NIHCC corpus (approximately 155,000 documents) was comparable to PT-OT results, and MIMIC embeddings were not far behind.…”

Section: A Template For Expanding Automated Coding To New Concept Dom (supporting)

confidence: 60%

Automated Coding of Under-Studied Medical Concept Domains: Linking Physical Activity Reports to the International Classification of Functioning, Disability, and Health

Newman-Griffis, Fosler-Lussier

2021

Front. Digit. Health

Linking clinical narratives to standardized vocabularies and coding systems is a key component of unlocking the information in medical text for analysis. However, many domains of medical concepts, such as functional outcomes and social determinants of health, lack well-developed terminologies that can support effective coding of medical text. We present a framework for developing natural language processing (NLP) technologies for automated coding of medical information in under-studied domains, and demonstrate its applicability through a case study on physical mobility function. Mobility function is a component of many health measures, from post-acute care and surgical outcomes to chronic frailty and disability, and is represented as one domain of human activity in the International Classification of Functioning, Disability, and Health (ICF). However, mobility and other types of functional activity remain under-studied in the medical informatics literature, and neither the ICF nor commonly-used medical terminologies capture functional status terminology in practice. We investigated two data-driven paradigms, classification and candidate selection, to link narrative observations of mobility status to standardized ICF codes, using a dataset of clinical narratives from physical therapy encounters. Recent advances in language modeling and word embedding were used as features for established machine learning models and a novel deep learning approach, achieving a macro-averaged F-1 score of 84% on linking mobility activity reports to ICF codes. Both classification and candidate selection approaches present distinct strengths for automated coding in under-studied domains, and we highlight that the combination of (i) a small annotated data set; (ii) expert definitions of codes of interest; and (iii) a representative text corpus is sufficient to produce high-performing automated coding systems. 
This research has implications for continued development of language technologies to analyze functional status information, and the ongoing growth of NLP tools for a variety of specialized applications in clinical care and research.
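
The macro-averaged F-1 score reported in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition (per-class F1, then an unweighted mean over classes). The function name `macro_f1` and the placeholder labels are hypothetical and are not taken from the cited paper.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p, but it was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class with no true positives, false positives, or false negatives scores 0.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder labels (hypothetical, standing in for coding targets).
gold = ["code_A", "code_A", "code_B", "code_C"]
pred = ["code_A", "code_B", "code_B", "code_C"]
score = macro_f1(gold, pred)
```

Because every class contributes equally, a rare code that is often misclassified lowers the macro average as much as a frequent one would.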

“…Syndrome differentiation of Yin and Yang deficiency is based on the physiological and pathological characteristics of the Yin and the Yang, and involves analyzing and summarizing a variety of disease-related information that is collected according to four diagnostics for identification [6]. A large amount of critical information on healthcare is buried in unstructured narratives, such as medical records, which makes its computational analysis difficult [7]. Moreover, mastering syndrome differentiation in TCM is a complicated and time-consuming process.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Task Joint Learning Model for Chinese Word Segmentation and Syndrome Differentiation in Traditional Chinese Medicine

Yan et al. 2022

IJERPH

Evidence-based treatment is the basis of traditional Chinese medicine (TCM), and the accurate differentiation of syndromes is important for treatment in this context. The automatic differentiation of syndromes of unstructured medical records requires two important steps: Chinese word segmentation and text classification. Due to the ambiguity of the Chinese language and the peculiarities of syndrome differentiation, these tasks pose a daunting challenge. We use text classification to model syndrome differentiation for TCM, and use multi-task learning (MTL) and deep learning to accomplish the two challenging tasks of Chinese word segmentation and syndrome differentiation. Two classic deep neural networks, bidirectional long short-term memory (Bi-LSTM) and text-based convolutional neural networks (TextCNN), are fused into MTL to simultaneously carry out these two tasks. We used our proposed method to conduct a large number of comparative experiments. The experimental comparisons showed that it was superior to other methods on both tasks. Our model yielded values of accuracy, specificity, and sensitivity of 0.93, 0.94, and 0.90, and 0.80, 0.82, and 0.78 on the Chinese word segmentation task and the syndrome differentiation task, respectively. Moreover, statistical analyses showed that the accuracies of the non-joint and joint models were both within the 95% confidence interval, with p-value < 0.05. The experimental comparison showed that our method is superior to prevalent methods on both tasks. The work here can help modernize TCM through intelligent differentiation.
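
The three metrics quoted in the abstract above (accuracy, specificity, sensitivity) can be sketched from confusion counts. This is a minimal illustration that assumes a binary labeling with one positive class; how the cited paper aggregates the metrics over its classes is not stated here. The function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(gold, pred, positive=1):
    """Return (accuracy, specificity, sensitivity) for a binary task."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    tn = sum(1 for g, p in pairs if g != positive and p != positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    # Specificity: share of true negatives among all gold negatives.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Sensitivity (recall): share of true positives among all gold positives.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, specificity, sensitivity


# Toy labels (hypothetical): 1 marks the positive class.
gold = [1, 1, 1, 0, 0, 0, 0, 1]
pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, specificity, sensitivity = binary_metrics(gold, pred)
```

Reporting specificity and sensitivity alongside accuracy shows whether errors fall mostly on the positive or the negative class, which accuracy alone hides.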
