Objective. To accelerate the use of outcome measures in rheumatology, we developed and evaluated a natural language processing (NLP) pipeline for extracting these measures from free-text outpatient rheumatology notes within the American College of Rheumatology's Rheumatology Informatics System for Effectiveness (RISE) registry. Methods. We included all patients in RISE (2015–2018). The NLP pipeline extracted scores corresponding to 8 measures of rheumatoid arthritis (RA) disease activity (DA) and functional status (FS) documented in outpatient rheumatology notes. Score extraction performance was evaluated by chart review, and we assessed agreement with scores documented in structured data. We conducted an external validation of our NLP pipeline using data from rheumatology notes from an academic medical center that is not included in the RISE registry. Results. We processed over 34 million notes from 854,628 patients, 158 practices, and 24 electronic health record (EHR) systems in RISE. Manual chart review revealed a sensitivity, positive predictive value (PPV), and F1 score of 95%, 87%, and 91%, respectively. Substantial agreement was observed between scores extracted from RISE notes and scores derived from structured data (κ = 0.43–0.68 among DA measures and 0.86–0.98 among FS measures). In the external validation, we found a sensitivity, PPV, and F1 score of 92%, 69%, and 79%, respectively. Conclusion. We developed an NLP pipeline to extract RA outcome measures from a national registry of notes from multiple EHR systems and found it to have good internal and external validity. This pipeline can facilitate measurement of clinical- and patient-reported outcomes for use in research and quality measurement.
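Extracting a documented score from free text typically reduces to pattern matching on the measure's name and value. A minimal sketch of that idea is shown below; the two measures (CDAI, RAPID3) and the regex patterns are illustrative assumptions, not the RISE pipeline's actual rule set.

```python
import re

# Hypothetical patterns for two RA measures; real pipelines need far more
# variants (templated phrases, tables, ranges, negation of "not assessed", etc.).
PATTERNS = {
    "CDAI": re.compile(r"\bCDAI\b[:\s=]*(\d{1,2}(?:\.\d)?)", re.IGNORECASE),
    "RAPID3": re.compile(r"\bRAPID[- ]?3\b[:\s=]*(\d{1,2}(?:\.\d)?)", re.IGNORECASE),
}

def extract_scores(note_text):
    """Return {measure: [scores]} for every score mention found in one note."""
    found = {}
    for measure, pattern in PATTERNS.items():
        scores = [float(m.group(1)) for m in pattern.finditer(note_text)]
        if scores:
            found[measure] = scores
    return found

note = "Exam today. CDAI: 12.5, improved from 22. RAPID3 = 3.2."
print(extract_scores(note))  # {'CDAI': [12.5], 'RAPID3': [3.2]}
```

Chart review, as in the abstract, would then compare such extracted scores against what a human abstractor reads in the same note.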
Experts have noted a concerning gap between clinical natural language processing (NLP) research and real-world applications, such as clinical decision support. To help address this gap, in this viewpoint, we enumerate a set of practical considerations for developing an NLP system to support real-world clinical needs and improve health outcomes. They include determining (1) the readiness of the data and compute resources for NLP, (2) the organizational incentives to use and maintain the NLP systems, and (3) the feasibility of implementation and continued monitoring. These considerations are intended to benefit the design of future clinical NLP projects and can be applied across a variety of settings, including large health systems or smaller clinical practices that have adopted electronic medical records in the United States and globally.
Structuring clinical text with AI: Old versus new natural language processing techniques evaluated on eight common cardiovascular diseases
Highlights
- Five NLP word vectorization models predict 8 ICD-10 codes with high AUROC and AUPRC
- The best-performing TF-IDF models showed full interpretability with important words
- The models showed high transferability when tested on the MIMIC-III ICU dataset
Background. Automated extraction of symptoms from clinical notes is a challenging task owing to the multidimensional nature of symptom descriptions. The availability of labeled training data is extremely limited because the data contain protected health information. Natural language processing and machine learning have great potential for processing clinical text for such a task. However, supervised machine learning requires a large amount of labeled data to train a model, which is the main bottleneck in model development. Objective. This study addresses the lack of labeled data by proposing 2 alternatives to manual labeling for generating training labels for supervised machine learning with English clinical text. We aim to demonstrate that training on lower-quality labels still leads to good classification results. Methods. We addressed the lack of labels with 2 strategies. The first approach took advantage of the structured part of electronic health records and used diagnosis codes (International Classification of Diseases, 10th Revision [ICD-10]) to derive training labels. The second approach used weak supervision and data programming principles to derive training labels. We applied the developed framework to the extraction of symptom information from outpatient visit progress notes of patients with cardiovascular diseases. Results. We used >500,000 notes for training our classification model with ICD-10 codes as labels and >800,000 notes for training with labels derived from weak supervision. We show that the dependence between prevalence and recall becomes flat provided a sufficiently large training set is used (>500,000 documents). We further demonstrate that training on weak labels, rather than on the diagnosis codes recorded at the patient encounter, leads to an overall improved recall score (10% improvement, on average). Finally, external validation of our models shows excellent predictive performance and transferability, with an overall increase of 20% in the recall score. Conclusions. This work demonstrates the power of a weak labeling pipeline for annotating and extracting symptom mentions in clinical text, with the prospect of facilitating symptom information integration for downstream clinical tasks such as clinical decision support.
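The weak-supervision strategy described above can be sketched with a few labeling functions whose votes are aggregated into a training label. The symptom (dyspnea), the keyword lists, and the "first non-abstain vote wins, negation first" aggregation rule here are simplified assumptions for illustration; data programming frameworks aggregate many such functions probabilistically.

```python
# Vote values used by each labeling function.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

def lf_keyword(note):
    # Vote POSITIVE if a dyspnea-related keyword appears anywhere in the note.
    keywords = ("shortness of breath", "dyspnea")
    return POSITIVE if any(k in note.lower() for k in keywords) else ABSTAIN

def lf_negation(note):
    # Vote NEGATIVE when the symptom mention is explicitly negated.
    return NEGATIVE if "denies shortness of breath" in note.lower() else ABSTAIN

def weak_label(note, lfs=(lf_negation, lf_keyword)):
    """Aggregate labeling-function votes: first non-abstain vote wins,
    with negation functions ordered first so they override keyword hits."""
    for lf in lfs:
        vote = lf(note)
        if vote != ABSTAIN:
            return vote
    return ABSTAIN

print(weak_label("Patient reports dyspnea on exertion."))       # 1
print(weak_label("Denies shortness of breath or chest pain."))  # 0
```

Labels produced this way are noisy, but as the abstract reports, a sufficiently large weakly labeled corpus can still train a strong supervised classifier.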
Mining the structured data in electronic health records (EHRs) enables many clinical applications, while the information in free-text clinical notes often remains untapped. Free-text notes are unstructured and harder to use in machine learning, while structured diagnostic codes can be missing or even erroneous. To improve the quality of diagnostic codes, this work extracts structured diagnostic codes from unstructured notes concerning cardiovascular diseases. Five old and new word embeddings were used to vectorize over 5 million progress notes from the Stanford EHR, and logistic regression was used to predict eight ICD-10 codes of common cardiovascular diseases. The models were interpreted through the important words in predictions and analyses of false positive cases. Trained on Stanford notes, model transferability was tested by predicting the corresponding ICD-9 codes of the MIMIC-III discharge summaries. The word embeddings and logistic regression performed well in diagnostic code extraction, with TF-IDF as the best word embedding model, showing AUROC ranging from 0.9499 to 0.9915 and AUPRC ranging from 0.2956 to 0.8072. The models also showed transferability when tested on the MIMIC-III dataset, with AUROC ranging from 0.7952 to 0.9790 and AUPRC ranging from 0.2353 to 0.8084. Model interpretability was shown by important words with clinical meanings matching each disease. This study shows the feasibility of accurately extracting structured diagnostic codes, imputing missing codes, and correcting erroneous codes from free-text clinical notes with models that are interpretable for clinicians, which helps improve the data quality of diagnostic codes for information retrieval and downstream machine-learning applications.
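The best-performing configuration in this study, TF-IDF vectorization followed by logistic regression (one binary classifier per ICD-10 code), can be sketched in a few lines with scikit-learn. The note snippets, the toy labels, and the single code shown (I50, heart failure) are invented for illustration and are not the study's data or code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for progress notes; 1 = ICD-10 code I50 assigned.
notes = [
    "acute decompensated heart failure with reduced ejection fraction",
    "chronic systolic heart failure, continue diuretics",
    "routine follow-up, no acute complaints",
    "knee pain after fall, x-ray negative",
]
has_i50 = [1, 1, 0, 0]

# TF-IDF over unigrams and bigrams feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(notes, has_i50)

# Predicted probability of I50 for unseen text; a note sharing heart-failure
# vocabulary should score higher than one sharing none of the training terms.
p_hf = clf.predict_proba(["worsening heart failure symptoms"])[0, 1]
p_other = clf.predict_proba(["ankle sprain, rest and ice"])[0, 1]
print(round(p_hf, 3), round(p_other, 3))
```

Interpretability of the kind the study reports comes from inspecting the fitted coefficients: the TF-IDF features with the largest positive weights are the "important words" for each code.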