Objective User-generated content (UGC) in online environments provides opportunities to learn an individual’s health status outside of clinical settings. However, the nature of UGC brings challenges in both data collecting and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. Materials and Methods We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. Results We identified 103 eligible studies which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. Conclusions The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving clinical creditability of UGC, and model interpretability.
Importance: The All of Us Research Program hypothesizes that accruing one million or more diverse participants engaged in a longitudinal research cohort will advance precision medicine and ultimately improve human health. Launched nationally in 2018, to date All of Us has recruited more than 345,000 participants. All of Us plans to open beta access to researchers in May 2020. Objective: To demonstrate the quality, utility, and diversity of the All of Us Research Programs initial data release and beta launch of the cloud-based analysis platform, the cloud-based Researcher Workbench. Evidence: We analyzed the initial All of Us data release, comprising surveys, physical measurements (PM), and electronic health record (EHR) data, to characterize All of Us participants including self-reported descriptors of diversity. Data depth, density, and quality were evaluated using medication sequencing analyses for depression and type 2 diabetes. Replication of known oncologic associations with smoking exposure ascertained by EHR and survey data and calculation of population-based atherosclerotic cardiovascular disease risk scores demonstrated the utility of data and platform capability. Findings: The beta launch of the All of Us Researcher Workbench contains data on 224,143 participants. Seventy-seven percent of this cohort were identified as Underrepresented in Biomedical Research (UBR) including over forty-eight percent self-reporting non-White race. Medication usage patterns in common diseases depression and type 2 diabetes replicated prior findings previously reported in the literature and showed differences based on race. Oncologic associations with smoking were replicated and effect sizes compared for EHR and survey exposures finding general agreement. A cardiovascular disease score was calculated utilizing multiple data elements curated across sources. The cloud-based architecture built in the Researcher Workbench provided secure access and powerful computational resources at a low cost. All analyses have been made available for replication and reuse by registered researchers. Conclusions and Relevance: The All of Us Research Programs initial release of cohort data contains longitudinal and multidimensional data on diverse participants that replicate known associations. This dataset and the cloud-based Researcher Workbench advance the mission of All of Us to make data widely and securely available to researchers to improve human health and advance precision medicine.
Additional research is necessary to assess patient and physician perceptions of reviewing immediately released test results, which warrants future research.
Word2Vec and graph representations improved the accuracy of classifying portal messages compared to features that lacked semantic information such as bag of words, and bag of phrases. Furthermore, using Word2Vec along with a CNN model, which provide a higher order representation, improved the classification of portal messages.
Objective Family health history is important to clinical care and precision medicine. Prior studies show gaps in data collected from patient surveys and electronic health records (EHRs). The All of Us Research Program collects family history from participants via surveys and EHRs. This Demonstration Project aims to evaluate availability of family health history information within the publicly available data from All of Us and to characterize the data from both sources. Materials and Methods Surveys were completed by participants on an electronic portal. EHR data was mapped to the Observational Medical Outcomes Partnership data model. We used descriptive statistics to perform exploratory analysis of the data, including evaluating a list of medically actionable genetic disorders. We performed a subanalysis on participants who had both survey and EHR data. Results There were 54 872 participants with family history data. Of those, 26% had EHR data only, 63% had survey only, and 10.5% had data from both sources. There were 35 217 participants with reported family history of a medically actionable genetic disorder (9% from EHR only, 89% from surveys, and 2% from both). In the subanalysis, we found inconsistencies between the surveys and EHRs. More details came from surveys. When both mentioned a similar disease, the source of truth was unclear. Conclusions Compiling data from both surveys and EHR can provide a more comprehensive source for family health history, but informatics challenges and opportunities exist. Access to more complete understanding of a person’s family health history may provide opportunities for precision medicine.
Differences in obesity and body fat distribution across gender and race/ethnicity have been extensively described. We sought to replicate these differences and evaluate newly emerging data from the All of Us Research Program (AoU). We compared body mass index (BMI), waist circumference, and waist-to-hip ratio from the baseline physical examination, and alanine aminotransferase (ALT) from the electronic health record in up to 88,195 Non-Hispanic White (NHW), 40,770 Non-Hispanic Black (NHB), 35,640 Hispanic, and 5,648 Asian participants. We compared AoU sociodemographic variable distribution to National Health and Nutrition Examination Survey (NHANES) data and applied the pseudo-weighting method for adjusting selection biases of AoU recruitment. Our findings replicate previous observations with respect to gender differences in BMI. In particular, we replicate the large gender disparity in obesity rates among NHB participants, in which obesity and mean BMI are much higher in NHB women than NHB men (33.34 kg/m2 versus 28.40 kg/m2 respectively; p<2.22x10-308). The overall age-adjusted obesity prevalence in AoU participants is similar overall but lower than the prevalence found in NHANES for NHW participants. ALT was higher in men than women, and lower among NHB participants compared to other racial/ethnic groups, consistent with previous findings. Our data suggest consistency of AoU with national averages related to obesity and suggest this resource is likely to be a major source of scientific inquiry and discovery in diverse populations.
Background: Patient portals are consumer health applications that allow patients to view their health information. Portals facilitate the interactions between patients and their caregivers by offering secure messaging. Patients communicate different needs through portal messages. Medical needs contain requests for delivery of care (e.g. reporting new symptoms). Automating the classification of medical decision complexity in portal messages has not been investigated. Materials and methods:We trained two multiclass classifiers, multinomial Naïve Bayes and random forest on 500 message threads, to quantify and label the complexity of decisionmaking into four classes: no decision, straightforward, low, and moderate. We compared the performance of the models to using only the number of medical terms without training a machine learning model.Results: Our analysis demonstrated that machine learning models have better performance than the model that did not use machine learning. Moreover, machine learning models could quantify the complexity of decision-making that the messages contained with 0.59, 0.45, and 0.58 for macro, micro, and weighted precision and 0.63,0.41, and 0.63 for macro, micro, and weighted recall.Conclusions: This study is one of the first to attempt to classify patient portal messages by whether they involve medical decision-making and the complexity of that decisionmaking. Machine learning classifiers trained on message content resulted in better message thread classification than classifiers that employed medical terms in the messages alone.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.