Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.187

Characterizing the Value of Information in Medical Notes

Abstract: Machine learning models depend on the quality of input data. As electronic health records are widely adopted, the amount of data in health care is growing, along with complaints about the quality of medical notes. We use two prediction tasks, readmission prediction and in-hospital mortality prediction, to characterize the value of information in medical notes. We show that as a whole, medical notes only provide additional predictive power over structured information in readmission prediction. We further propos…
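The comparison the abstract describes — a model on structured features alone versus one that also uses note-derived features — can be sketched with a toy AUC calculation. This is a minimal, illustrative example with synthetic labels and scores, not data or code from the paper.

```python
# Hedged sketch: does adding note-derived features improve discrimination?
# All data below is synthetic and purely illustrative.

def auc(labels, scores):
    """Mann-Whitney AUC: probability a positive case outranks a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic readmission labels and predicted risks from two hypothetical models:
labels                = [1, 0, 1, 0, 1, 0, 0, 1]
structured_only       = [0.7, 0.4, 0.6, 0.5, 0.8, 0.3, 0.6, 0.5]  # baseline
structured_plus_notes = [0.8, 0.3, 0.7, 0.4, 0.9, 0.2, 0.5, 0.6]  # + note features

gain = auc(labels, structured_plus_notes) - auc(labels, structured_only)
print(f"AUC gain from notes: {gain:.3f}")  # → AUC gain from notes: 0.125
```

In the paper's actual setup, such a gain appears for readmission prediction but not for in-hospital mortality; the sketch only shows the form of the comparison.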

Cited by 11 publications (6 citation statements)

References 25 publications
“…Comparing the distribution of the ‘Baseline + Patient-Level Clinician’ model (mean: 0.80; SE: 0.004) to the distribution of the ‘Baseline’ model (mean: 0.70; SE: 0.006), the former distribution significantly dominates the latter, evidencing that there are features predictive of prostate cancer recurrence in patient notes. This is in line with Hsu et al’s work 52 on the readmission prediction task and other studies 17,51 . Further comparing the distribution of the ‘Baseline + Patient-Level Clinician’ model to the ‘Baseline + Automated NLP’ model (mean: 0.74; SE: 0.006), we conclude that the patient-level CFG process of leveraging extensive clinical expertise to identify and create patient-level features is able to extract more signal from the progress notes, compared to the AFG method via NLP.…”
Section: Results (supporting)
Confidence: 92%
“…The increasing trend of using textual data for summarization might be attributed to the improvement of NLP, the improved computing power required for some NLP tasks, and the results published by Van Vleck et al [ 158 ], who claimed that a significant portion of patient information lies in clinical notes. In contrast, Hsu et al [ 111 ] challenged this hypothesis by presenting experiments to predict some clinical measures (eg, hospital readmission and mortality) using textual and structured patient information sources. They concluded that textual sources have little predictive power for the outcomes.…”
Section: Discussion (mentioning)
Confidence: 99%
“…Therefore, we will explore various approaches to preprocessing the clinical notes, focusing on different ways to select the terms used to train the topic models (e.g., based on term frequency, term frequency-inverse document frequency weights, and named entity recognition). We will also train topic models on different clinical note types and sections, based on evidence that predicting clinical outcomes from medical notes may be improved when more relevant note types or parts are used [ 188 ]. To evaluate these approaches, we will examine associations between emerging topics and the available structured data (i.e., as a form of external validation), as well as perform human-in-the loop evaluations of topic coherence [ 189 ].…”
Section: Methods (mentioning)
Confidence: 99%