Clinical text classification research trends: Systematic literature review and open issues

Mujtaba, Ghulam; Shuib, Liyana; Idris, Norisma; Hoo, Wai Lam; Raj, Ram Gopal; Khowaja, Kamran; Shaikh, Khairunisa; Nweke, Henry Friday

doi:10.1016/j.eswa.2018.09.034

Cited by 94 publications

(116 citation statements)

References 79 publications

Supporting

Mentioning

110

Contrasting

“…The data sources used in various research studies can be categorised into two types: homogeneous sources and heterogeneous sources, which can further be divided into three subtypes: binary class, multi-class single labeled, multi-class multilabeled datasets (Mujtaba et al, 2019). There are few datasets that are publicly available such as PhysioNet, i2b2 NLP dataset, and OHSUMED.…”

Section: Datasets Available (mentioning)

confidence: 99%

“…Preprocessing is done to remove meaningless information from the dataset as the clinical narratives may contain high level of noise, sparsity, misspelled words, grammatical errors (Nguyen and Patrick, 2016; Mujtaba et al, 2019). Different preprocessing techniques are applied in research studies including sentence splitting, tokenisation, spell error detection and correction, stemming and lemmatisation, normalisation (Manning et al, 2008), removal of stop words, removal of punctuation or special symbols, abbreviation expansion, chunking, named entity recognition (Bird et al, 2009), negation detection (Chapman et al, 2001).…”

Section: Preprocessing (mentioning)

confidence: 99%
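
A few of the preprocessing steps named in the statement above (sentence splitting, tokenisation, punctuation removal, abbreviation expansion, stop-word removal) can be sketched in plain Python. This is only an illustrative sketch: the stop-word set, the abbreviation table, and the function names `split_sentences` and `preprocess` are hypothetical and do not come from the cited papers.

```python
import re

# Hypothetical, deliberately tiny lookup tables for illustration only;
# a real system would rely on curated clinical resources.
STOP_WORDS = {"the", "a", "an", "of", "and", "with", "was", "is", "to"}
ABBREVIATIONS = {"pt": "patient", "hx": "history", "sob": "shortness of breath"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def preprocess(sentence):
    """Lowercase, drop punctuation, expand abbreviations, remove stop words."""
    # Tokenise on runs of letters and digits; punctuation is discarded here.
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    expanded = []
    for token in tokens:
        # An abbreviation may expand to several words, so split the expansion.
        expanded.extend(ABBREVIATIONS.get(token, token).split())
    return [t for t in expanded if t not in STOP_WORDS]


if __name__ == "__main__":
    note = "Pt has a hx of SOB. No fever!"
    for sentence in split_sentences(note):
        print(preprocess(sentence))
```

Stemming, spelling correction, named entity recognition, and negation detection are left out because they need language resources beyond a short sketch.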

“…Feature engineering is the combination of feature extraction, feature representation, and feature selection (Mujtaba et al, 2019). Feature extraction is the process of extracting useful features which includes Bag of Words (BoW), n-gram, Word2Vec, and GloVe.…”

Section: Feature Engineering (mentioning)

confidence: 99%
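
To make the Bag of Words and n-gram features mentioned above concrete, the following minimal sketch builds count vectors from already tokenised documents. The function names `ngrams` and `bag_of_words` and the toy documents are assumptions made for illustration; Word2Vec and GloVe are not covered because they require trained embedding models.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(documents, n=1):
    """Build a sorted vocabulary and one count vector per document.

    `documents` is a list of token lists; `n` selects unigrams (1),
    bigrams (2), and so on.
    """
    counts = [Counter(ngrams(doc, n)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for terms a document does not contain.
    vectors = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, vectors


if __name__ == "__main__":
    docs = [["chest", "pain"], ["chest", "pain", "chest"]]
    print(bag_of_words(docs))        # unigram counts
    print(bag_of_words(docs, n=2))   # bigram counts
```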

“…The performance of clinical text classification models can be measured using standard evaluation metrics which include precision, recall, F-measure (or F-score), accuracy, precision (micro and macro-average), recall (micro and macro-average), F-measure (micro and macro-average), and area under the curve (AUC). These metrics can be computed by using values of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) in the standard confusion matrix (Mujtaba et al, 2019).…”

Section: Evaluation Metrics (mentioning)

confidence: 99%
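
As a sketch of how the metrics listed above follow from confusion-matrix counts, the code below computes precision, recall, F-measure, and accuracy from TP, FP, TN, and FN, plus micro and macro averages over per-class counts. The function names and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here, not something stated in the source.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def macro_average(per_class):
    """Average the per-class metrics; every class weighs the same."""
    scores = [prf(tp, fp, fn) for tp, fp, fn in per_class]
    return tuple(sum(metric) / len(scores) for metric in zip(*scores))


def micro_average(per_class):
    """Pool the counts over classes first, then compute the metrics once."""
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    return prf(tp, fp, fn)


if __name__ == "__main__":
    # Hypothetical (TP, FP, FN) counts for a frequent and a rare class.
    counts = [(8, 2, 2), (1, 1, 3)]
    print("macro:", macro_average(counts))
    print("micro:", micro_average(counts))
```

Macro averaging lets rare classes pull the score down as much as frequent ones, while micro averaging is dominated by the frequent classes. This matters for imbalanced clinical label sets.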

Distributed Knowledge Based Clinical Auto-Coding System

Kaur

2019

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

Codification of free-text clinical narratives has long been recognised to be beneficial for secondary uses such as funding, insurance claim processing and research. In recent years, many researchers have studied the use of Natural Language Processing (NLP) and related Machine Learning (ML) methods and techniques to resolve the problem of manual coding of clinical narratives. Most of the studies are focused on classification systems relevant to the U.S., and there is a scarcity of studies relevant to Australian classification systems such as ICD-10-AM and ACHI. Therefore, we aim to develop a knowledge-based clinical auto-coding system that utilises appropriate NLP and ML techniques to assign ICD-10-AM and ACHI codes to clinical records, while adhering to both local coding standards (Australian Coding Standard) and international guidelines that get updated and validated continuously.

“…With respect to automated text classification, in this work, we compared the approaches from the two main paradigms: (1) symbolic text classification, in which texts are represented with sparse vectors of TF-IDF weights, used as input features for traditional machine learning algorithms, such as Logistic Regression (LR) or Support Vector Machine (SVM); and (2) a more recent semantic text classification paradigm, in which dense semantic representations of words (word embeddings) are introduced as input to a neural architecture. Different deep learning architectures have been tried in a number of medical text classification tasks [25][26][27], including automated classification of radiology reports [6,28,29]. While recurrent [29,30] and attention-based neural networks [27,31] may present a viable solution, convolutional neural networks (CNN) seem to generally offer an edge in classification performance as well as faster training times [6,29].…”

Section: Introduction (mentioning)

confidence: 99%
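
The sparse TF-IDF representation referred to in the statement above can be sketched as follows. This is one common textbook variant, with term frequency normalised by document length and inverse document frequency taken as log(N / df). It is an assumption for illustration and not necessarily the weighting used by the cited authors, and `tf_idf` is a hypothetical function name.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one sparse {term: weight} dict per tokenised document.

    Weight = (term count / document length) * log(N / document frequency).
    Terms that occur in every document therefore receive weight 0.0.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    docs = [["knee", "pain"], ["knee", "injury"]]
    for vector in tf_idf(docs):
        print(vector)
```

Only terms present in a document are stored, which is what makes the representation sparse. The resulting vectors would then be fed to a classifier such as logistic regression or an SVM.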

Automatic Annotation of Narrative Radiology Reports

et al. 2020

Narrative texts in electronic health records can be efficiently utilized for building decision support systems in the clinic, only if they are correctly interpreted automatically in accordance with a specified standard. This paper tackles the problem of developing an automated method of labeling free-form radiology reports, as a precursor for building query-capable report databases in hospitals. The analyzed dataset consists of 1295 radiology reports concerning the condition of a knee, retrospectively gathered at the Clinical Hospital Centre Rijeka, Croatia. Reports were manually labeled with one or more labels from a set of 10 most commonly occurring clinical conditions. After primary preprocessing of the texts, two sets of text classification methods were compared: (1) traditional classification models, namely Naive Bayes (NB), Logistic Regression (LR), Support Vector Machine (SVM), and Random Forests (RF), coupled with Bag-of-Words (BoW) features (i.e., symbolic text representation) and (2) Convolutional Neural Network (CNN) coupled with dense word vectors (i.e., word embeddings as a semantic text representation) as input features. We resorted to nested 10-fold cross-validation to evaluate the performance of competing methods using accuracy, precision, recall, and F1 score. The CNN with semantic word representations as input yielded the overall best performance, having a micro-averaged F1 score of 86.7%. The CNN classifier yielded particularly encouraging results for the most represented conditions: degenerative disease (95.9%), arthrosis (93.3%), and injury (89.2%). As a data-hungry deep learning model, the CNN, however, performed notably worse than the competing models on underrepresented classes with fewer training instances such as multicausal disease or metabolic disease. LR, RF, and SVM performed comparably well, with the obtained micro-averaged F1 scores of 84.6%, 82.2%, and 82.1%, respectively.
