Jake Luo scite author profile

Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care.

EliXR: an approach to eligibility criteria extraction and representation

Weng

Luo

et al. 2011

Journal of the American Medical Informatics Association

Telemedicine Adoption during the COVID-19 Pandemic: Gaps and Inequalities

et al. 2021

Background The telemedicine industry has been experiencing fast growth in recent years. The outbreak of coronavirus disease 2019 (COVID-19) further accelerated the deployment and utilization of telemedicine services. An analysis of the socioeconomic characteristics of telemedicine users to understand potential socioeconomic gaps and disparities is critical for improving the adoption of telemedicine services among patients. Objectives This study aims to measure the correlation of socioeconomic determinants with the use of telemedicine services in the Milwaukee metropolitan area. Methods Electronic health record review of patients using telemedicine services compared with those not using telemedicine services within an academic-community health system: patient demographics (e.g., age, gender, race, and ethnicity), insurance status, and socioeconomic determinants obtained through block-level census data in the Milwaukee area. The telemedicine users were compared with all other patients using regression analysis. The telemedicine adoption rates were calculated across regional ZIP codes to analyze the geographic patterns of telemedicine adoption. Results A total of 104,139 patients used telemedicine services during the study period. Patients who used video visits were younger (median age 48.12), more likely to be White (odds ratio [OR] 1.34; 95% confidence interval [CI], 1.31–1.37), and more likely to have private insurance (OR 1.43; CI, 1.41–1.46); patients who used telephone visits were older (median age 57.58), more likely to be Black (OR 1.31; CI, 1.28–1.35), and more likely to have public insurance (OR 1.30; CI, 1.27–1.32). In general, Latino and Asian populations were less likely to use telemedicine; women used more telemedicine services in general than men. In the multiple regression analysis of social determinant factors across 126 ZIP codes, college education (coefficient 1.41, p = 0.01) had a strong correlation with video telemedicine adoption rate. Conclusion Adoption of telemedicine services was significantly impacted by the social determinant factors of health, such as income, education level, race, and insurance type. The study reveals the potential inequities and disparities in telemedicine adoption.

Dynamic categorization of clinical research eligibility criteria by hierarchical clustering

Luo

Yetisgen-Yildiz

Weng

2011

Journal of Biomedical Informatics

Objective To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity. Design The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers. Measurements We induced 27 categories and measured the prevalence of the categories in 27,278 eligibility criteria from 1,578 clinical trials and compared the classification performance (i.e., precision, recall, and F1-score) between the UMLS-based feature representation and the “bag of words” feature representation among five common classifiers in Weka, including J48, Bayesian Network, Naïve Bayesian, Nearest Neighbor, and Instance-based Learning Classifier. Results The UMLS semantic feature representation outperforms the “bag of words” feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2,000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency. Conclusion The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text.

A systematic approach for developing a corpus of patient reported adverse drug events: A case study for SSRI and SNRI medications

Zolnoori

Fung

Patrick

et al. 2019

Journal of Biomedical Informatics

A deep learning study on osteosarcoma detection from histological images

Anisuzzaman

Barzekar

Tong

et al. 2021

Biomedical Signal Processing and Control

A human–computer collaborative approach to identifying common data elements in clinical trial eligibility criteria

Luo

Miotto

Weng

2013

Journal of Biomedical Informatics

Objective To identify Common Data Elements (CDEs) in eligibility criteria of multiple clinical trials studying the same disease using a human-computer collaborative approach. Design A set of free-text eligibility criteria from clinical trials on two representative diseases, breast cancer and cardiovascular diseases, was sampled to identify disease-specific eligibility criteria CDEs. In this proposed approach, a semantic annotator is used to recognize Unified Medical Language System (UMLS) terms within the eligibility criteria text. The Apriori algorithm is applied to mine frequent disease-specific UMLS terms, which are then filtered by a list of preferred UMLS semantic types, grouped by similarity based on the Dice coefficient, and, finally, manually reviewed. Measurements Standard precision, recall, and F-score of the CDEs recommended by the proposed approach were measured with respect to manually identified CDEs. Results Average precision and recall of the recommended CDEs for the two diseases were 0.823 and 0.797, respectively, leading to an average F-score of 0.810. In addition, the machine-powered CDEs covered 80% of the cardiovascular CDEs published by the American Heart Association and assigned by human experts. Conclusion It is feasible and effort-saving to use a human-computer collaborative approach to augment domain experts for identifying disease-specific CDEs from free-text clinical trial eligibility criteria.
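The two measures named in this abstract, the F-score and the Dice coefficient, can be illustrated with a minimal Python sketch. It assumes the balanced F-score (harmonic mean of precision and recall) and a set-based Dice coefficient over word tokens; the abstract does not say which features the authors compared, so the function names `f_score` and `dice`, and the two example terms, are hypothetical and not taken from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def dice(a: set, b: set) -> float:
    """Dice coefficient of two sets: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Precision and recall as reported in the abstract.
reported_f = f_score(0.823, 0.797)
print(round(reported_f, 3))

# Hypothetical terms, tokenized on whitespace (an assumption for illustration).
term_a = set("myocardial infarction".split())
term_b = set("acute myocardial infarction".split())
print(dice(term_a, term_b))
```

The harmonic mean of the reported averages comes out at roughly 0.810, in line with the abstract's figure, although the paper may instead have averaged per-disease F-scores.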

Very large-scale data classification based on K-means clustering and multi-kernel SVM

et al. 2018

Big Data Application in Biomedical Research and Health Care: A Literature Review
