Objectives Patient-generated health data (PGHD) are important for tracking and monitoring out-of-clinic health events and supporting shared clinical decisions. Unstructured text as PGHD (eg, medical diary notes and transcriptions) may encapsulate rich information through narratives, which can be critical to better understanding a patient’s condition. We propose a natural language processing (NLP)-supported data synthesis pipeline for unstructured PGHD, focusing on children with special health care needs (CSHCN), and demonstrate it with a case study on cystic fibrosis (CF). Materials and Methods The proposed unstructured data synthesis and information extraction pipeline extracts a broad range of health information by combining rule-based approaches with pretrained deep-learning models. In particular, we build on the scispaCy biomedical model suite, leveraging its named entity recognition capabilities to identify and link clinically relevant entities to established ontologies such as the Systematized Nomenclature of Medicine (SNOMED) and RxNorm. We then use scispaCy’s syntax (grammar) parsing tools to retrieve phrases associated with the entities in the medication, dose, therapy, symptom, bowel movement, and nutrition ontological categories. The pipeline is illustrated and tested with simulated CF patient notes. Results The proposed hybrid deep-learning, rule-based approach can operate over a variety of natural language note types and allows customization for a given patient or cohort. Viable information was successfully extracted from simulated CF notes. The hybrid pipeline is robust to misspellings and varied word representations and can be tailored to accommodate the needs of a specific patient, cohort, or clinician. Discussion The NLP pipeline can extract predefined or ontology-based entities from free-text PGHD, aiming to facilitate remote care and improve chronic disease management.
Our implementation makes use of open-source models, allowing this solution to be easily replicated and integrated into different health systems. Outside the clinic, the NLP pipeline may increase the amount of clinical data recorded by families of CSHCN and ease the process of identifying health events from the notes. Similarly, care coordinators, nurses, and clinicians would be able to track adherence to medications, identify symptoms, and intervene effectively to improve clinical care. Furthermore, visualization tools can be applied to digest the structured data produced by the pipeline in support of the decision-making process for a patient, caregiver, or provider. Conclusion Our study demonstrated that an NLP pipeline can be used to create an automated analysis and reporting mechanism for unstructured PGHD. Further studies with real-world data are suggested to assess pipeline performance and broader implications.
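The rule-based side of such a hybrid pipeline can be illustrated without the scispaCy models themselves. The following is a minimal sketch of dose-phrase retrieval only, using a hypothetical regular-expression pattern and a simulated note; the deep-learning NER and ontology-linking stages the abstract describes are omitted here.

```python
import re

# Hypothetical pattern for dose phrases such as "500 mg" or "2 puffs";
# a real pipeline would pair this with scispaCy NER output.
DOSE_PATTERN = re.compile(
    r"(?P<quantity>\d+(?:\.\d+)?)\s*"
    r"(?P<unit>mg|mcg|ml|units?|puffs?|tablets?|capsules?)\b",
    re.IGNORECASE,
)

def extract_doses(note: str):
    """Return (quantity, unit) pairs found in a free-text note."""
    return [(m.group("quantity"), m.group("unit").lower())
            for m in DOSE_PATTERN.finditer(note)]

# Example simulated note, similar in spirit to the CF case study
note = "Gave 2 puffs of albuterol this morning and 500 mg Creon with lunch."
print(extract_doses(note))  # → [('2', 'puffs'), ('500', 'mg')]
```

Patterns like this are what make the rule-based half customizable per patient or cohort: the unit vocabulary can be extended without retraining any model.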
Background Parental justice involvement (eg, prison, jail, parole, or probation) is an unfortunately common and disruptive household adversity for many US youths, disproportionately affecting families of color and rural families. Data on this adversity are not routinely captured in pediatric health care settings, and when they are, they are neither discrete nor readily analyzable for research purposes. Objective In this study, we outline our process of training a state-of-the-art natural language processing model on unstructured clinician notes from one large pediatric health system to identify patients who have experienced a justice-involved parent. Methods Using the electronic health record database of a large Midwestern pediatric hospital-based institution from 2011-2019, we located clinician notes (of any type and written by any type of provider) likely to contain evidence of family justice involvement via a justice-keyword search (eg, prison and jail). To train and validate the model, we used a labeled data set of 7500 clinician notes indicating whether the patient was ever exposed to parental justice involvement. We calculated the precision and recall of the model and compared those rates to the keyword search. Results The machine learning model increased the precision (positive predictive value) of locating children affected by parental justice involvement in the electronic health record from 61% (a simple keyword search) to 92%. Conclusions The use of machine learning may be a feasible approach to addressing the gaps in our understanding of the health and health services of underrepresented youth who encounter childhood adversities not routinely captured—particularly for children of justice-involved parents.
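The keyword-search step that anchored this workflow can be sketched in a few lines. This is a toy illustration with a hypothetical keyword set and notes; the trained classifier that raised precision from 61% to 92% is not shown.

```python
# Keywords from the abstract (prison, jail, parole, probation) plus a
# hypothetical addition; a real list would be clinically curated.
KEYWORDS = {"prison", "jail", "parole", "probation", "incarcerated"}

def keyword_hits(note: str) -> bool:
    """Flag a clinician note that mentions any justice keyword."""
    words = {w.strip(".,;:()").lower() for w in note.split()}
    return not KEYWORDS.isdisjoint(words)

notes = [
    "Father is currently in prison; mother is primary caregiver.",
    "Well-child visit, no concerns noted.",
]
flagged = [n for n in notes if keyword_hits(n)]
print(len(flagged))  # → 1
```

A prefilter like this recalls many candidate notes cheaply; the downstream model then removes the false positives (eg, "prison" appearing in a non-parental context), which is where the precision gain comes from.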
Background Patient-generated health data (PGHD) captured via smart devices or digital health technologies can reflect an individual’s health journey. PGHD enable tracking and monitoring of personal health conditions, symptoms, and medications outside the clinic, which is crucial for self-care and shared clinical decisions. In addition to self-reported measures and structured PGHD (eg, self-screening, sensor-based biometric data), free-text and unstructured PGHD (eg, patient care notes, medical diaries) can provide a broader view of a patient’s journey and health condition. Natural language processing (NLP) is used to process and analyze unstructured data to create meaningful summaries and insights, showing promise for improving the utilization of PGHD. Objective Our aim is to understand and demonstrate the feasibility of an NLP pipeline to extract medication and symptom information from real-world patient and caregiver data. Methods We report a secondary data analysis, using a data set collected from 24 parents of children with special health care needs (CSHCN) who were recruited via a nonrandom sampling approach. Participants used a voice-interactive app for 2 weeks, generating free-text patient notes (audio transcription or text entry). We built an NLP pipeline using a zero-shot approach (adaptive to low-resource settings). We used named entity recognition (NER) and medical ontologies (RxNorm and SNOMED CT [Systematized Nomenclature of Medicine Clinical Terms]) to identify medications and symptoms. Sentence-level dependency parse trees and part-of-speech tags were used to extract additional entity information from the syntactic properties of a note. We assessed the data; evaluated the pipeline with the patient notes; and reported the precision, recall, and F1 scores. Results In total, 87 patient notes were included (audio transcriptions, n=78; text entries, n=9) from 24 parents who have at least one CSHCN. The participants were between the ages of 26 and 59 years.
The majority were White (n=22, 92%), had more than one child (n=16, 67%), lived in Ohio (n=22, 92%), had a mid or upper-mid household income (n=15, 62.5%), and had higher-level education (n=24, 58%). Of the 87 notes, 30 were drug and medication related, and 46 were symptom related. We captured medication instances (medication, unit, quantity, and date) and symptoms satisfactorily (precision >0.65, recall >0.77, F1 >0.72). These results indicate the potential of NER and dependency parsing in an NLP pipeline for information extraction from unstructured PGHD. Conclusions The proposed NLP pipeline was found to be feasible for use with real-world unstructured PGHD to accomplish medication and symptom extraction. Unstructured PGHD can be leveraged to inform clinical decision-making, remote monitoring, and self-care, including medication adherence and chronic disease management. With customizable information extraction methods using NER and medical ontologies, NLP models can feasibly extract a broad range of clinical information from unstructured PGHD in low-resource settings (eg, a limited number of patient notes or training data).
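The reported precision, recall, and F1 thresholds relate in the standard way. The following is a quick sketch of the computation from hypothetical true-positive, false-positive, and false-negative counts (not the study's actual confusion counts, which the abstract does not give).

```python
def prf1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts roughly consistent with the reported thresholds
p, r, f = prf1(tp=40, fp=15, fn=10)
print(round(p, 2), round(r, 2), round(f, 2))  # → 0.73 0.8 0.76
```

Because F1 is the harmonic mean of precision and recall, it always falls between the two, consistent with the reported precision >0.65, recall >0.77, and F1 >0.72.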
Background Many people use apps to help understand and manage their depression symptoms. App-administered questionnaires for the symptoms of depression, such as the Patient Health Questionnaire-9, are easy to score and implement in an app, but may not be accompanied by essential resources and access needed to provide proper support and avoid potential harm. Objective Our primary goal was to evaluate the differences in risks and helpfulness associated with using an app to self-diagnose depression, comparing assessment-only apps with multifeatured apps. We also investigated whether, what, and how additional app features may mitigate potential risks. Methods In this retrospective observational study, we identified apps in the Google Play store that provided a depression assessment as a feature and had at least five user comments. We separated apps into two categories based on those having only a depression assessment versus those that offered additional supportive features. We conducted theoretical thematic analyses over the user reviews, with thematic coding indicating the helpfulness of the app, the presence of suicidal ideation, and how and why the apps were used. We compared the results across the two categories of apps and analyzed the differences using chi-square statistical tests. Results We evaluated 6 apps; 3 provided only a depression assessment (assessment only), and 3 provided features in addition to self-assessment (multifeatured). User comments for assessment-only apps indicated significantly more suicidal ideation or self-harm (n=31, 9.4%) compared to comments for multifeatured apps (n=48, 2.3%; χ²₁=43.88, P<.001). Users of multifeatured apps were over three times more likely than assessment-only app users to comment in favor of the app’s helpfulness, likely due to features like mood tracking, journaling, and informational resources (n=56, 17% vs n=1223, 59%, respectively; χ²₁=200.36, P<.001).
The number of users under the age of 18 years was significantly higher among assessment-only app users (n=40, 12%) than multifeatured app users (n=9, 0.04%; χ²₁=189.09, P<.001). Conclusions Apps that diagnose depression by self-assessment without context or other supportive features are more likely to be used by those under 18 years of age and more likely to be associated with increased user distress and potential harm. Depression self-assessments in apps should be implemented with caution and accompanied by evidence-based capabilities that establish proper context, increase self-empowerment, and encourage users to seek clinical diagnostics and outside help.
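The 2×2 chi-square comparisons above follow the standard formula for a fourfold table. The following is a minimal sketch with hypothetical counts (the study's full contingency tables are not given in the abstract, so the example numbers are placeholders).

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (1 df) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical table: rows = app category, cols = comments with/without a theme
stat = chi_square_2x2(10, 20, 30, 40)
print(round(stat, 3))  # → 0.794
```

With 1 degree of freedom, a statistic as large as those reported (43.88, 189.09, 200.36) corresponds to P<.001, matching the abstract.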
UNSTRUCTURED Patient-generated health data (PGHD) are becoming a necessary component of remote monitoring and clinical decision-making, as well as of self-care and chronic condition management. Medical diaries and patient notes have been the primary source of unstructured PGHD, enabling patients to collect and communicate their health information remotely and in real time to their providers. Yet there is no established mechanism or pipeline for processing free-text patient notes (“unstructured PGHD”) or integrating these notes into clinical decision-making in a low-resource setting (ie, not depending on a large data set or processing power). In the literature, a number of studies report NLP applications on clinical notes to identify symptoms and conditions, but these studies have been limited to public data (eg, social media posts by patients). In this paper, we evaluate the performance of a hybrid (deep learning + rule-based) NLP pipeline on a low-resource PGHD data set through an empirical evaluation of automatic entity extraction, measuring the model’s ability to extract entities using ontologies (medication-dose: RxNorm; symptoms: SNOMED) from patient notes.
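Once an NER stage proposes candidate spans, the ontology-linking step reduces to a normalized lookup against the target vocabulary. The following is a minimal sketch using a hypothetical in-memory mapping with placeholder codes; real linking would query the actual RxNorm or SNOMED ontologies (eg, via scispaCy's entity linker).

```python
# Hypothetical ontology fragment; codes are placeholders, NOT real RxNorm IDs
RXNORM_LOOKUP = {
    "albuterol": "RX-0001",
    "creon": "RX-0002",
}

def link_entity(surface_form: str):
    """Map a candidate medication mention to a (placeholder) ontology code,
    normalizing case and whitespace; unmatched mentions return None."""
    return RXNORM_LOOKUP.get(surface_form.strip().lower())

print(link_entity("Albuterol "))  # → RX-0001
print(link_entity("tylenol"))     # → None (unlinked, left for review)
```

Keeping the lookup separate from extraction is what lets the same pipeline target different ontologies (RxNorm for medications, SNOMED for symptoms) per entity category.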