Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data applications in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate new genome-wide association studies of diseases, and in clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now being integrated more rapidly with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. We review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care.
This study presents a semi-automated, data-driven approach to developing a semantic network that aligns well with the top-level information structure in clinical research eligibility criteria text, and demonstrates the feasibility of using the resulting semantic role labels to generate semistructured eligibility criteria with nearly perfect interrater reliability.
Background The telemedicine industry has been experiencing fast growth in recent years. The outbreak of coronavirus disease 2019 (COVID-19) further accelerated the deployment and utilization of telemedicine services. An analysis of the socioeconomic characteristics of telemedicine users to understand potential socioeconomic gaps and disparities is critical for improving the adoption of telemedicine services among patients.
Objectives This study aims to measure the correlation of socioeconomic determinants with the use of telemedicine services in the Milwaukee metropolitan area.
Methods Electronic health record review of patients using telemedicine services compared with those not using telemedicine services within an academic-community health system: patient demographics (e.g., age, gender, race, and ethnicity), insurance status, and socioeconomic determinants obtained through block-level census data in the Milwaukee area. The telemedicine users were compared with all other patients using regression analysis. The telemedicine adoption rates were calculated across regional ZIP codes to analyze the geographic patterns of telemedicine adoption.
Results A total of 104,139 patients used telemedicine services during the study period. Patients who used video visits were younger (median age 48.12), more likely to be White (odds ratio [OR] 1.34; 95% confidence interval [CI], 1.31–1.37), and more likely to have private insurance (OR 1.43; CI, 1.41–1.46); patients who used telephone visits were older (median age 57.58), more likely to be Black (OR 1.31; CI, 1.28–1.35), and more likely to have public insurance (OR 1.30; CI, 1.27–1.32). In general, Latino and Asian populations were less likely to use telemedicine; women used more telemedicine services in general than men. In the multiple regression analysis of social determinant factors across 126 ZIP codes, college education (coefficient 1.41, p = 0.01) was strongly correlated with the video telemedicine adoption rate.
Conclusion Adoption of telemedicine services was significantly impacted by social determinants of health, such as income, education level, race, and insurance type. The study reveals potential inequities and disparities in telemedicine adoption.
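To make the odds ratios and confidence intervals reported above easier to read, here is a minimal sketch of how an unadjusted odds ratio and its 95% CI can be computed from a 2x2 table using the standard log-odds (Woolf) approximation. This is an illustration only: the abstract states that regression analysis was used, so the published estimates are likely model-based rather than computed this way. The function name `odds_ratio_ci` and all counts are hypothetical and are not data from the study.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald-type CI on the log scale.

    a: group 1 with the outcome      b: group 1 without the outcome
    c: group 2 with the outcome      d: group 2 without the outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, for illustration only (NOT from the study):
# 20 of 30 patients in group 1 used video visits, 10 of 30 in group 2.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_value:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

An interval that excludes 1, as every interval reported in the Results does, indicates an association that is statistically distinguishable from no difference at the chosen level.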
Objective
To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity.
Design
The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers.
Measurements
We induced 27 categories and measured the prevalence of the categories in 27,278 eligibility criteria from 1,578 clinical trials and compared the classification performance (i.e., precision, recall, and F1-score) between the UMLS-based feature representation and the “bag of words” feature representation among five common classifiers in Weka, including J48, Bayesian Network, Naïve Bayesian, Nearest Neighbor, and Instance-based Learning Classifier.
Results
The UMLS semantic feature representation outperforms the “bag of words” feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2,000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score and the Bayesian Network classifier achieved the best learning efficiency.
Conclusion
The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text.
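The per-category precision, recall, and F1-score used in the Measurements above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Weka evaluation: the function name `per_category_scores`, the category names, and the labels are all hypothetical.

```python
from collections import Counter

def per_category_scores(gold, predicted):
    """Return {category: (precision, recall, f1)} for two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # category p was predicted but is wrong
            fn[g] += 1  # true category g was missed
    scores = {}
    for cat in set(gold) | set(predicted):
        pred_total = tp[cat] + fp[cat]
        gold_total = tp[cat] + fn[cat]
        precision = tp[cat] / pred_total if pred_total else 0.0
        recall = tp[cat] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[cat] = (precision, recall, f1)
    return scores

# Hypothetical labels, for illustration only (not data from the study).
gold = ["age", "age", "disease", "disease", "disease", "drug"]
pred = ["age", "disease", "disease", "disease", "drug", "drug"]
scores = per_category_scores(gold, pred)
for category in sorted(scores):
    p, r, f = scores[category]
    print(f"{category}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Running the same function once on predictions from each feature representation would allow a category-by-category comparison of the kind summarized in the Results.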
Objective
To identify Common Data Elements (CDEs) in eligibility criteria of multiple clinical trials studying the same disease using a human-computer collaborative approach.
Design
A set of free-text eligibility criteria from clinical trials on two representative diseases, breast cancer and cardiovascular diseases, was sampled to identify disease-specific eligibility criteria CDEs. In this proposed approach, a semantic annotator is used to recognize Unified Medical Language System (UMLS) terms within the eligibility criteria text. The Apriori algorithm is applied to mine frequent disease-specific UMLS terms, which are then filtered by a list of preferred UMLS semantic types, grouped by similarity based on the Dice coefficient, and, finally, manually reviewed.
Measurements
Standard precision, recall, and F-score of the CDEs recommended by the proposed approach were measured with respect to manually identified CDEs.
Results
Average precision and recall of the recommended CDEs for the two diseases were 0.823 and 0.797, respectively, leading to an average F-score of 0.810. In addition, the machine-powered CDEs covered 80% of the cardiovascular CDEs published by The American Heart Association and assigned by human experts.
Conclusion
It is feasible and effort-saving to use a human-computer collaborative approach to augment domain experts for identifying disease-specific CDEs from free-text clinical trial eligibility criteria.
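Two of the automated steps described in the Design above, frequent-term mining and Dice-based grouping, can be sketched roughly as follows. Several simplifications are assumptions of this sketch rather than details from the study: only the first Apriori pass (single-term support counting) is shown, the semantic-type filter is omitted, the Dice coefficient is computed over the word sets of two terms, and grouping is a simple greedy pass. All function names, thresholds, and terms are hypothetical.

```python
from collections import Counter

def frequent_terms(criteria_terms, min_support):
    """First Apriori pass: keep terms found in at least min_support of the criteria."""
    counts = Counter(term for terms in criteria_terms for term in set(terms))
    total = len(criteria_terms)
    return {term for term, count in counts.items() if count / total >= min_support}

def dice(term_a, term_b):
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over the word sets of two terms."""
    x, y = set(term_a.lower().split()), set(term_b.lower().split())
    return 2 * len(x & y) / (len(x) + len(y)) if (x or y) else 0.0

def group_similar(terms, threshold):
    """Greedy grouping: a term joins the first group holding a similar member."""
    groups = []
    for term in sorted(terms):
        for group in groups:
            if any(dice(term, member) >= threshold for member in group):
                group.append(term)
                break
        else:
            groups.append([term])
    return groups

# Hypothetical annotated criteria, for illustration only (not study data).
criteria_terms = [
    ["breast cancer", "chemotherapy"],
    ["metastatic breast cancer", "chemotherapy", "pregnancy"],
    ["breast cancer", "radiation therapy"],
    ["metastatic breast cancer", "chemotherapy"],
]
frequent = frequent_terms(criteria_terms, min_support=0.5)
candidate_groups = group_similar(frequent, threshold=0.6)
print(candidate_groups)
```

In the approach described above, groups like these would be candidates only; the final step is manual review by domain experts.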