In the last few years, expectations have grown around the analysis of the large amounts of data often available in organizations, a topic that has been both scrutinized by the academic world and successfully exploited by industry. Nowadays, two of the most common terms heard in scientific circles are Big Data and Deep Learning. In this double review, we aim to shed some light on the current state of these different, yet related, branches of Data Science, in order to understand their current state and future evolution within the healthcare area. We start by giving a simple description of the technical elements of Big Data technologies, as well as an overview of the elements of Deep Learning techniques, according to their usual description in the scientific literature. Then, we pay attention to the application fields that can be said to have delivered relevant real-world success stories, with emphasis on examples from large technology companies and financial institutions, among others. The academic effort that has been put into bringing these technologies to the healthcare sector is then summarized and analyzed from a twofold view: first, the landscape of application examples is globally scrutinized according to the varying nature of medical data, including the data forms in electronic health records, medical time signals, and medical images; second, a specific application field is given special attention, namely electrocardiographic signal analysis, where a number of works have been published in the last two years. A set of toy application examples is provided with the publicly available MIMIC dataset, aiming to help beginners start with some principled, basic, and structured material and available code. A critical discussion is provided of current and forthcoming challenges in the use of both sets of techniques in future healthcare.
Ovarian cancer (OC) is the second most common gynecological malignancy and the gynecological tumor with the worst prognosis. To help improve this situation, Data Science technologies could be a useful tool for helping clinicians learn more about the disease. In our case, we are interested in exploring OC data to discover relationships between clinical and genetic factors and disease progression. To this end, we propose an analysis framework, based on bootstrap resampling, for simple univariate statistical descriptions of features of different types. First, we define the framework for metric, categorical, and date variables and determine the advantages and disadvantages of different bootstrap resampling strategies, based on their statistical foundations. Then, we use it to perform a univariate analysis of an OC dataset, which allows us to explore how the disease progresses, using the platinum-free interval as an indicator, in relation to clinical and genetic features of different types. The analysis also provides a first set of variables that may be relevant for survival prediction. The results show that some features exhibit individual differences between the platinum-resistant (<6 months) and platinum-sensitive (>6 months) groups. This suggests that the database could be discriminative for the hypotheses studied, although multivariate analyses are advisable to check how the relationships among features influence these results.
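As a rough illustration of the kind of univariate bootstrap comparison described in this abstract, the sketch below estimates a percentile confidence interval for the difference in a metric feature between two groups. It is not the authors' framework: the function name `bootstrap_diff_ci`, its parameters, and the toy values are all assumptions made for illustration, and the percentile interval is only one of several possible bootstrap strategies.

```python
import random
import statistics


def bootstrap_diff_ci(group_a, group_b, stat=statistics.mean,
                      n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for stat(group_a) - stat(group_b).

    Hypothetical helper: each group is resampled with replacement,
    independently of the other, and the statistic difference is recorded.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(stat(resample_a) - stat(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    # The feature is flagged when the interval excludes zero.
    excludes_zero = not (lower <= 0.0 <= upper)
    return lower, upper, excludes_zero


# Invented toy values for one metric feature in two groups
# (e.g. platinum-resistant vs. platinum-sensitive); not real patient data.
resistant = [70, 71, 72, 73, 74, 70, 72, 74]
sensitive = [50, 51, 52, 53, 54, 50, 52, 54]
lower, upper, flagged = bootstrap_diff_ci(resistant, sensitive)
print(f"{100 * (1 - 0.05):.0f}% CI for mean difference: [{lower:.2f}, {upper:.2f}], flagged={flagged}")
```

For categorical or date variables, the same resampling loop could be reused with a different `stat` (for example, a category proportion or a median of ordinal dates); that extension is likewise an assumption rather than something specified in the abstract.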
Background: The high incidence and mortality rate of colorectal cancer call for new technologies to improve its early diagnosis. This study aims to identify the medical needs related to the endoscopic technology and the colonoscopy procedure currently used for colorectal cancer diagnosis, which are essential for designing such technologies. Methods: Semi-structured interviews and an online survey were used. Results: Six endoscopists were interviewed and 103 were surveyed. The needs identified can be divided into: a) clinical needs, for better polyp detection and classification (especially of flat polyps), location, size, margins, and penetration depth; b) computer-aided diagnosis (CAD) system needs, for additional visual information supporting polyp characterization and diagnosis; and c) operational/physical needs, related to limitations of image quality, colon lighting, flexibility of the endoscope tip, and even poor bowel preparation. Conclusions: This study presents some initiatives already undertaken to meet the detected medical needs, as well as challenges still to be solved. The great potential of advanced optical technologies suggests their use for better polyp detection and classification, since they provide more functional and structural information than the image enhancement technologies currently in use. The inspection of the remaining tissue of diminutive polyps (< 5 mm) should be addressed to reduce recurrence rates. Little progress has been made in estimating the infiltration depth. Detection and classification methods should be combined into one CAD system, providing visual aids over polyps for detection and displaying a Kudo-based diagnosis suggestion to assist the endoscopist in real-time decision making. The estimated size and location of polyps should also be provided. Endoscopes with 360° vision remain a challenge not yet met by the mechanical and optical systems developed to improve colon inspection. Patients and healthcare providers should be trained to improve the patient’s bowel preparation.
Richer integrative analysis methods are needed in oncological studies to efficiently leverage the amount of clinical and genetic data available and provide clinicians with better information. However, analyses of this nature often require mixing data of different types, which are not straightforward to address jointly with classical methods. In this work, our aim is to find relationships between clinical and genetic features of different types (metric, categorical, and text) and ovarian cancer (OC) disease progression. To this end, we first propose a univariate statistical method for text data that applies bootstrap resampling to Bag of Words and Latent Dirichlet Allocation, so that the free-text fields of the health records can be included as features. Secondly, we extend bootstrap resampling to metric and categorical feature extraction with Principal Component Analysis (PCA) and Multiple Correspondence Analysis (MCA), respectively. We subsequently formulate a novel integrative method for jointly considering metric, categorical, and text features. The text analysis results indicate individual differences in some words between two groups of OC patients categorised according to their sensitivity to platinum drugs, which points to separability between the two groups for text features. In the multivariate analysis, the clinical data results also showed separability patterns, according to the degree of platinum sensitivity, for the three methods analysed. The use of these analytical tools in our OC cohort has allowed us to demonstrate their strengths by confirming the predictive and prognostic role of widely known clinical and genetic variables (BRCA status, value of adjuvant therapy and optimal resection, or family history) and by demonstrating significant associations for other variables whose role in OC development has been studied to a lesser extent (such as the PMS1, GPC3, and SLX4 genes). These results highlight the value of implementing these approaches for the identification of novel biomarkers in the context of OC.
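The idea of applying bootstrap resampling to Bag of Words features can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the helper names `word_presence` and `bootstrap_word_diff`, the whitespace tokenisation, and the toy notes are all invented here, and the Latent Dirichlet Allocation, PCA, and MCA steps are not covered.

```python
import random


def word_presence(docs, word):
    """Binary Bag-of-Words indicator: 1 if `word` occurs in a document, else 0."""
    return [1 if word in doc.lower().split() else 0 for doc in docs]


def bootstrap_word_diff(docs_a, docs_b, word, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in the fraction of
    documents containing `word` between two groups (hypothetical helper)."""
    rng = random.Random(seed)
    present_a = word_presence(docs_a, word)
    present_b = word_presence(docs_b, word)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(present_a, k=len(present_a))
        resample_b = rng.choices(present_b, k=len(present_b))
        diffs.append(sum(resample_a) / len(resample_a)
                     - sum(resample_b) / len(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    # A word is flagged when its interval excludes zero.
    return lower, upper, not (lower <= 0.0 <= upper)


# Invented toy notes for two groups; these are not real clinical records.
notes_group_a = ["relapse noted at review", "early relapse noted", "relapse after treatment"]
notes_group_b = ["follow up scheduled", "routine follow up", "stable at follow up"]

for candidate in ("relapse", "follow", "biopsy"):
    lower, upper, flagged = bootstrap_word_diff(notes_group_a, notes_group_b, candidate)
    print(candidate, lower, upper, flagged)
```

In this sketch each word is tested on its own, so screening a large vocabulary would involve many comparisons; how to account for that is a design choice outside what the abstract states.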