Lina Sulieman scite author profile

Objective: User-generated content (UGC) in online environments provides opportunities to learn an individual’s health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. Materials and Methods: We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. Results: We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. Conclusions: The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving the clinical credibility of UGC, and model interpretability.


The All of Us Research Program: data quality, utility, and diversity

Ramirez

Sulieman

Schlueter

et al. 2020

Preprint


Importance: The All of Us Research Program hypothesizes that accruing one million or more diverse participants engaged in a longitudinal research cohort will advance precision medicine and ultimately improve human health. Launched nationally in 2018, All of Us has to date recruited more than 345,000 participants and plans to open beta access to researchers in May 2020. Objective: To demonstrate the quality, utility, and diversity of the All of Us Research Program's initial data release and the beta launch of its cloud-based analysis platform, the Researcher Workbench. Evidence: We analyzed the initial All of Us data release, comprising surveys, physical measurements (PM), and electronic health record (EHR) data, to characterize All of Us participants, including self-reported descriptors of diversity. Data depth, density, and quality were evaluated using medication sequencing analyses for depression and type 2 diabetes. Replication of known oncologic associations with smoking exposure ascertained by EHR and survey data and calculation of population-based atherosclerotic cardiovascular disease risk scores demonstrated the utility of data and platform capability. Findings: The beta launch of the All of Us Researcher Workbench contains data on 224,143 participants. Seventy-seven percent of this cohort were identified as Underrepresented in Biomedical Research (UBR), including over forty-eight percent self-reporting non-White race. Medication usage patterns in two common diseases, depression and type 2 diabetes, replicated findings previously reported in the literature and showed differences based on race. Oncologic associations with smoking were replicated, and effect sizes for EHR and survey exposures were compared and found to be in general agreement. A cardiovascular disease score was calculated utilizing multiple data elements curated across sources. The cloud-based architecture built in the Researcher Workbench provided secure access and powerful computational resources at a low cost. All analyses have been made available for replication and reuse by registered researchers. Conclusions and Relevance: The All of Us Research Program's initial release of cohort data contains longitudinal and multidimensional data on diverse participants that replicate known associations. This dataset and the cloud-based Researcher Workbench advance the mission of All of Us to make data widely and securely available to researchers to improve human health and advance precision medicine.


Association of Immediate Release of Test Results to Patients With Implications for Clinical Workflow

et al. 2021


The All of Us Research Program: Data quality, utility, and diversity

et al. 2022


Classifying patient portal messages using Convolutional Neural Networks

Sulieman

Gilmore

French

et al. 2017

Journal of Biomedical Informatics


Comparison of family health history in surveys vs electronic health record data mapped to the observational medical outcomes partnership data model in the All of Us Research Program

Cronin

Halvorson

Springer

et al. 2021


Objective: Family health history is important to clinical care and precision medicine. Prior studies show gaps in data collected from patient surveys and electronic health records (EHRs). The All of Us Research Program collects family history from participants via surveys and EHRs. This Demonstration Project aims to evaluate availability of family health history information within the publicly available data from All of Us and to characterize the data from both sources. Materials and Methods: Surveys were completed by participants on an electronic portal. EHR data was mapped to the Observational Medical Outcomes Partnership data model. We used descriptive statistics to perform exploratory analysis of the data, including evaluating a list of medically actionable genetic disorders. We performed a subanalysis on participants who had both survey and EHR data. Results: There were 54 872 participants with family history data. Of those, 26% had EHR data only, 63% had survey only, and 10.5% had data from both sources. There were 35 217 participants with reported family history of a medically actionable genetic disorder (9% from EHR only, 89% from surveys, and 2% from both). In the subanalysis, we found inconsistencies between the surveys and EHRs. More details came from surveys. When both mentioned a similar disease, the source of truth was unclear. Conclusions: Compiling data from both surveys and EHR can provide a more comprehensive source for family health history, but informatics challenges and opportunities exist. Access to more complete understanding of a person’s family health history may provide opportunities for precision medicine.


Racial, ethnic, and gender differences in obesity and body fat distribution: An All of Us Research Program demonstration project

et al. 2021


Differences in obesity and body fat distribution across gender and race/ethnicity have been extensively described. We sought to replicate these differences and evaluate newly emerging data from the All of Us Research Program (AoU). We compared body mass index (BMI), waist circumference, and waist-to-hip ratio from the baseline physical examination, and alanine aminotransferase (ALT) from the electronic health record in up to 88,195 Non-Hispanic White (NHW), 40,770 Non-Hispanic Black (NHB), 35,640 Hispanic, and 5,648 Asian participants. We compared AoU sociodemographic variable distribution to National Health and Nutrition Examination Survey (NHANES) data and applied the pseudo-weighting method for adjusting selection biases of AoU recruitment. Our findings replicate previous observations with respect to gender differences in BMI. In particular, we replicate the large gender disparity in obesity rates among NHB participants, in which obesity and mean BMI are much higher in NHB women than NHB men (33.34 kg/m² versus 28.40 kg/m², respectively; p < 2.22 × 10⁻³⁰⁸). The age-adjusted obesity prevalence in AoU participants is similar overall to the prevalence found in NHANES, but lower for NHW participants. ALT was higher in men than women, and lower among NHB participants compared to other racial/ethnic groups, consistent with previous findings. Our data suggest consistency of AoU with national averages related to obesity and suggest this resource is likely to be a major source of scientific inquiry and discovery in diverse populations.


Automating the Classification of Complexity of Medical Decision-Making in Patient-Provider Messaging in a Patient Portal

Sulieman

Robinson

Jackson

2020

Journal of Surgical Research


Background: Patient portals are consumer health applications that allow patients to view their health information. Portals facilitate the interactions between patients and their caregivers by offering secure messaging. Patients communicate different needs through portal messages. Medical needs contain requests for delivery of care (e.g., reporting new symptoms). Automating the classification of medical decision complexity in portal messages has not been investigated. Materials and methods: We trained two multiclass classifiers, multinomial Naïve Bayes and random forest, on 500 message threads to quantify and label the complexity of decision-making into four classes: no decision, straightforward, low, and moderate. We compared the performance of the models to using only the number of medical terms without training a machine learning model. Results: Our analysis demonstrated that machine learning models have better performance than the model that did not use machine learning. Moreover, machine learning models could quantify the complexity of decision-making that the messages contained with 0.59, 0.45, and 0.58 for macro, micro, and weighted precision and 0.63, 0.41, and 0.63 for macro, micro, and weighted recall. Conclusions: This study is one of the first to attempt to classify patient portal messages by whether they involve medical decision-making and the complexity of that decision-making. Machine learning classifiers trained on message content resulted in better message thread classification than classifiers that employed medical terms in the messages alone.
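For readers unfamiliar with the macro, micro, and weighted averages reported in this abstract, the following is a minimal Python sketch of how those three averages of precision and recall are commonly defined for a four-class problem. It is not the authors' code: the function name `averaged_scores` is hypothetical, and the eight labels in `y_true` and `y_pred` are made-up toy data, not results from the study.

```python
from collections import Counter

# Class names taken from the abstract; everything else here is illustrative.
CLASSES = ["no decision", "straightforward", "low", "moderate"]


def averaged_scores(y_true, y_pred, classes=CLASSES):
    """Return macro, micro, and weighted precision and recall."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[pred] += 1       # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    def ratio(num, den):
        # Convention assumed here: an undefined ratio counts as 0.0.
        return num / den if den else 0.0

    precision = {c: ratio(tp[c], tp[c] + fp[c]) for c in classes}
    recall = {c: ratio(tp[c], tp[c] + fn[c]) for c in classes}
    support = Counter(y_true)   # number of true examples per class
    n = len(y_true)
    tp_all, fp_all, fn_all = sum(tp.values()), sum(fp.values()), sum(fn.values())

    return {
        # Macro: unweighted mean of the per-class scores.
        "macro_precision": sum(precision.values()) / len(classes),
        "macro_recall": sum(recall.values()) / len(classes),
        # Micro: pool the counts over all classes, then take the ratio.
        "micro_precision": ratio(tp_all, tp_all + fp_all),
        "micro_recall": ratio(tp_all, tp_all + fn_all),
        # Weighted: mean of per-class scores weighted by class support.
        "weighted_precision": sum(precision[c] * support[c] for c in classes) / n,
        "weighted_recall": sum(recall[c] * support[c] for c in classes) / n,
    }


# Toy labels (invented for illustration only).
y_true = ["no decision"] * 4 + ["straightforward"] * 2 + ["low", "moderate"]
y_pred = ["no decision", "no decision", "no decision", "straightforward",
          "straightforward", "low", "low", "low"]

scores = averaged_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

In this single-label toy setting, micro precision and micro recall coincide because every misclassification counts once as a false positive and once as a false negative; the evaluation setup in the study may differ.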



Lina Sulieman

A systematic literature review of machine learning in online personal health data
