Abstract: While much of the data within a patient's electronic health record (EHR) is coded, crucial information concerning the patient's care and management remains buried in unstructured clinical notes, making it difficult and time-consuming for physicians to review during their usual clinical workflow. In this paper, we present our clinical note processing pipeline, which extends beyond basic medical natural language processing (NLP) with concept recognition and relation detection to also include components specific to EHR d…
“…NLP for medical notes. The NLP community has worked extensively on medical notes to alleviate information overload, ranging from summarization (McInerney et al., 2020; Liang et al., 2019; Alsentzer and Kim, 2018) to information extraction (Wiegreffe et al., 2019; Zheng et al., 2014; Wang et al., 2018). For instance, information extraction aims to automatically extract valuable information from existing medical notes.…”
Machine learning models depend on the quality of their input data. As electronic health records are widely adopted, the amount of data in health care is growing, along with complaints about the quality of medical notes. We use two prediction tasks, readmission prediction and in-hospital mortality prediction, to characterize the value of information in medical notes. We show that, taken as a whole, medical notes provide additional predictive power over structured information only in readmission prediction. We further propose a probing framework to select the parts of notes that enable more accurate predictions than using all notes, even though the selected information leads to a distribution shift from the training data ("all notes"). Finally, we demonstrate that models trained on the selected valuable information achieve even better predictive performance, using only 6.8% of all the tokens for readmission prediction.
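The probing idea in this abstract can be sketched in miniature. The following is a hypothetical illustration, not the paper's method: the section names, toy notes, and bag-of-words classifier are all invented. A classifier is fit on each candidate note section alone, and the section whose predictions score best on held-out data is kept as the "valuable" subset.

```python
# Hypothetical sketch of section-level probing: fit a classifier on each
# candidate note section and keep the section that predicts best on
# held-out data. Section names and toy notes are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def make_note(label):
    # Only the "assessment" section carries signal in this toy corpus.
    assessment = ("worsening symptoms readmit risk" if label
                  else "stable improving discharged well")
    boilerplate = "patient seen today vitals recorded routine exam"
    return {"assessment": assessment, "boilerplate": boilerplate}

labels = [1, 0] * 20                      # alternating readmit / no-readmit
notes = [make_note(y) for y in labels]

def probe(section):
    """Train on the first 30 notes using one section; score on the rest."""
    texts = [note[section] for note in notes]
    X = CountVectorizer().fit_transform(texts)
    clf = LogisticRegression().fit(X[:30], labels[:30])
    return accuracy_score(labels[30:], clf.predict(X[30:]))

scores = {s: probe(s) for s in ("assessment", "boilerplate")}
best_section = max(scores, key=scores.get)
```

In the paper's setting the probe compares selected note segments against full notes on readmission labels; in this toy version the informative section wins the probe while the uninformative one scores at chance.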
“…Similar investigations into latent EHR data have identified benefits to extracting cardiovascular data, 1 pulmonary function tests, 16 health maintenance history, immunizations, and other clinical data that may exist unstructured within patient notes. 17 In the current generation of commercial EHRs, this information does not necessarily trigger or satisfy health maintenance reminders, and unless it is manually read and entered, what is contained in these scanned records may not be reflected in the EHR past medical history, patient problem lists, or lists of allergies. As others have noted, the literature devoted to scanned documents and images within EHRs is smaller than we expected given the importance of this commonly used means for HIE in the early decades of EHR use in our country.…”
Background Clinicians express concern that they may be unaware of important information contained in the voluminous scanned and other outside documents within electronic health records (EHRs). An example is “unrecognized EHR risk factor information,” defined as risk factors for heritable cancer that exist within a patient's EHR but are not known to current treating providers. In a related study using manual EHR chart review, we found that half of the women whose EHRs contained risk factor information met criteria for further genetic risk evaluation for heritable forms of breast and ovarian cancer, yet were not referred for genetic counseling.
Objectives The purpose of this study was to compare the use of automated methods (optical character recognition with natural language processing) versus human review in their ability to identify risk factors for heritable breast and ovarian cancer within EHR scanned documents.
Methods We evaluated the accuracy of chart review by comparing our criterion standard (physician chart review) with an automated method involving Amazon's Textract service (Amazon.com, Seattle, Washington, United States), the Clinical Language Annotation, Modeling, and Processing toolkit (CLAMP; Center for Computational Biomedicine, The University of Texas Health Science Center, Houston, Texas, United States), and a custom-written Java application.
Results We found that automated methods identified most of the cancer risk factor information that would otherwise require manual clinician review and would therefore be at risk of being missed.
Conclusion The use of automated methods for identification of heritable risk factors within EHRs may provide an accurate yet rapid review of patients' past medical histories. These methods could be further strengthened via improved analysis of handwritten notes, tables, and colloquial phrases.
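The pipeline described in this abstract pairs OCR with clinical NLP. As a minimal stand-in for the final extraction step only (the study's actual stack is Amazon Textract plus CLAMP; the phrase list below is a hypothetical simplification), a pattern matcher over OCR output might look like:

```python
# Illustrative stand-in for the risk-factor extraction step applied to
# OCR output. The real study used Textract for OCR and CLAMP for
# annotation; this regex list is a hypothetical simplification.
import re

RISK_PATTERNS = [
    r"family history of (?:breast|ovarian) cancer",
    r"BRCA[12]",
    r"ashkenazi jewish",
]

def find_risk_factors(ocr_text):
    """Return hereditary-cancer risk phrases found in OCR'd text."""
    hits = []
    for pattern in RISK_PATTERNS:
        for match in re.finditer(pattern, ocr_text, flags=re.IGNORECASE):
            hits.append(match.group(0))
    return hits
```

A full clinical annotator also handles negation, section context, and misspellings from OCR noise, which is where toolkit-based NLP earns its keep over plain keyword matching.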
“…Generating a medical summary from a clinician-patient conversation can be cast as a supervised learning task, 32 where an ML algorithm is trained with a large set of past medical conversation transcripts along with the gold standard summary associated with each conversation. 7,33 The input to the summarization model would be a clinician-patient transcript and the output would be an appropriate summary. 34,35 However, obtaining the gold standard summary of each conversation is costly because of the medical expertise required to complete the task 14 and the high variability in clinician notes' content, style, organization, and quality.…”
Clinicians spend a large amount of time on clinical documentation of patient encounters, often impacting quality of care and clinician satisfaction, and causing physician burnout. Advances in artificial intelligence (AI) and machine learning (ML) open the possibility of automating clinical documentation with digital scribes, using speech recognition to eliminate manual documentation by clinicians or medical scribes. However, developing a digital scribe is fraught with problems due to the complex nature of clinical environments and clinical conversations. This paper identifies and discusses major challenges associated with developing automated speech-based documentation in clinical settings: recording high-quality audio, converting audio to transcripts using speech recognition, inducing topic structure from conversation data, extracting medical concepts, generating clinically meaningful summaries of conversations, and obtaining clinical data for AI and ML algorithms.

npj Digital Medicine (2019) 2:114; https://doi.
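The supervised-learning cast described in the quoted passage above can be made concrete with a simple data layout. The field names and the example pair below are invented for illustration, not drawn from any real corpus: each training example pairs a conversation transcript (model input) with its gold-standard summary (training target).

```python
# Illustrative layout for supervised conversation summarization:
# input = clinician-patient transcript, target = gold-standard summary.
# The example pair below is invented for illustration only.
from dataclasses import dataclass

@dataclass
class SummarizationExample:
    transcript: str  # model input: the full conversation
    summary: str     # training target: the gold-standard clinical summary

corpus = [
    SummarizationExample(
        transcript="Doctor: How is the cough? Patient: Better since starting the inhaler.",
        summary="Cough improving on inhaler.",
    ),
]
```

The expense the passage describes is in producing the `summary` field at scale: each target requires clinical expertise to write, and different clinicians write it differently.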