Background: Given the established links between an individual’s behaviors and lifestyle factors and potentially adverse health outcomes, univariate or simple multivariate health metrics and scores have been developed to quantify general health at a given point in time and estimate risk of negative future outcomes. However, these health metrics may be challenging for widespread use and are unlikely to be successful at capturing the broader determinants of health in the general population. Hence, there is a need for a multidimensional yet widely employable and accessible way to obtain a comprehensive health metric.
Objective: The objective of the study was to develop and validate a novel, easily interpretable, points-based health score (“C-Score”) derived from metrics measurable using smartphone components and iterations thereof that utilize statistical modeling and machine learning (ML) approaches.
Methods: A literature review was conducted to identify relevant predictor variables for inclusion in the first iteration of a points-based model. This was followed by a prospective cohort study in a UK Biobank population for the purposes of validating the C-Score and developing and comparatively validating variations of the score using statistical and ML models to assess the balance between expediency and ease of interpretability and model complexity. Primary and secondary outcome measures were discrimination of a points-based score for all-cause mortality within 10 years (Harrell c-statistic) and discrimination and calibration of Cox proportional hazards models and ML models that incorporate C-Score values (or raw data inputs) and other predictors to predict the risk of all-cause mortality within 10 years.
Results: The study cohort comprised 420,560 individuals. During a cohort follow-up of 4,526,452 person-years, there were 16,188 deaths from any cause (3.85%). The points-based model had good discrimination (c-statistic=0.66). There was a 31% relative reduction in risk of all-cause mortality per decile of increasing C-Score (hazard ratio of 0.69, 95% CI 0.663-0.675). A Cox model integrating age and C-Score had improved discrimination (8 percentage points; c-statistic=0.74) and good calibration. ML approaches did not offer improved discrimination over statistical modeling.
Conclusions: The novel health metric (“C-Score”) has good predictive capabilities for all-cause mortality within 10 years. Embedding the C-Score within a smartphone app may represent a useful tool for democratized, individualized health risk prediction. A simple Cox model using C-Score and age balances parsimony and accuracy of risk predictions and could be used to produce absolute risk estimations for app users.
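The discrimination measure named in the abstract, the Harrell c-statistic, can be illustrated with a minimal sketch. This is not the study's code: the function name `harrell_c`, the toy follow-up data, and the choice to skip tied follow-up times are all assumptions made for illustration. Because a higher C-Score is associated with lower mortality risk, the sketch uses the negated score as the risk value.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data (illustrative).

    A pair is comparable when the subject with the shorter follow-up time
    had an observed event. It is concordant when that subject also has the
    higher risk score; ties in risk score count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: tied follow-up times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored, so the pair is not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in years, 1 = died, 0 = censored.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
c_scores = [10, 60, 50, 70, 90]      # made-up points-based scores
risk = [-s for s in c_scores]        # higher score means lower risk

print(harrell_c(times, events, risk))  # 0.875 (7 of 8 comparable pairs concordant)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.66 and 0.74 should be read.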
We present an explainable AI framework to predict mortality after a positive COVID-19 diagnosis based solely on data routinely collected in electronic healthcare records (EHRs) obtained prior to diagnosis. We grounded our analysis on the half-million-person UK Biobank and linked NHS COVID-19 records. We developed a method to capture the complexities and large variety of clinical codes present in EHRs, and we show that these have a larger impact on risk than all other patient data except age. We use a form of clustering for natural language processing of the clinical codes, specifically topic modelling by Latent Dirichlet Allocation (LDA), to generate a succinct digital fingerprint of a patient's full secondary care clinical history, i.e. their comorbidities and past interventions. These digital comorbidity fingerprints offer immediately interpretable clinical descriptions that are meaningful, e.g. grouping cardiovascular disorders with common risk factors, but also novel groupings that are not obvious. The comorbidity fingerprints differ in both their breadth and depth from existing observational disease associations in the COVID-19 literature. Taking this data-driven approach allows us to avoid human-induction bias and confirmation bias when selecting important potential predictors of COVID-19 mortality. Together with age, these digital fingerprints are the most important factors in our predictor. This holds the potential for improving individual risk profiling for clinical decisions and the identification of groups for public health interventions such as vaccine programmes. Combining our digital precondition fingerprints with demographic characteristics allows us to match or exceed the performance of existing state-of-the-art COVID-19 mortality predictors (EHCF), which have been developed through expert consensus. Our precondition fingerprinting and entire mortality prediction analytics pipeline is designed to be rapidly redeployable, e.g. for COVID-19 variants or other pre-existing diseases.
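The idea of turning a patient's clinical codes into a topic-based fingerprint with LDA can be sketched as follows. This is a toy illustration, not the authors' pipeline: the abstract does not say how LDA was fitted, so the collapsed Gibbs sampler, the function name `lda_fingerprints`, the hyperparameters, and the ICD-10-style code strings in the toy records are all assumptions. Each patient is treated as a "document" whose "words" are clinical codes, and the fingerprint is the patient's estimated mixture over topics.

```python
import random


def lda_fingerprints(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling and return one topic mixture per document.

    docs: list of lists of code strings (one inner list per patient).
    alpha, beta: symmetric Dirichlet priors (illustrative values).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]            # counts per (patient, topic)
    topic_word = [[0] * n_words for _ in range(n_topics)]  # counts per (topic, code)
    topic_total = [0] * n_topics
    assignments = []

    # Random initial topic assignment for every code occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][n]
                # Remove the current assignment before resampling it.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][n] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Smoothed topic proportions per patient: the "fingerprint".
    fingerprints = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        fingerprints.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return fingerprints


# Hypothetical patient histories written as ICD-10-style codes.
patients = [
    ["I10", "I25", "E11", "I10"],
    ["I25", "I10", "E11"],
    ["J45", "J44", "J45"],
    ["J44", "J45", "J44", "J45"],
]
for fingerprint in lda_fingerprints(patients, n_topics=2):
    print([round(p, 3) for p in fingerprint])
```

Each printed row is a short vector of topic weights summing to 1, which could then be combined with demographic features such as age in a downstream mortality model.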
BACKGROUND: Even though established links exist between individuals' behaviours and potentially adverse health outcomes, the models developed to date are either simple univariate ones or multivariate ones that are difficult to employ. Such models are unlikely to be successful at capturing the wider determinants of health in the broader population. Hence, there is a need for a multidimensional, yet widely employable and accessible, way to obtain a comprehensive health metric.
OBJECTIVE: To develop and validate a novel, easily interpretable points-based health score ("C-Score") derived from metrics measurable using smartphone components, and iterations thereof that utilise statistical modelling and machine learning approaches.
METHODS: A comprehensive literature review was conducted to identify suitable predictor variables for inclusion in a first-iteration points-based model. This was followed by a prospective cohort study in a UK Biobank population for the purposes of validating the C-Score, and developing and comparatively validating variations of the score using statistical and machine learning models to assess the balance between expediency and ease of interpretability versus model complexity. Primary and secondary outcome measures were discrimination of a points-based score for all-cause mortality within 10 years (Harrell’s c-statistic), and discrimination and calibration of Cox proportional hazards models and machine learning models that incorporate C-Score values (or raw data inputs) and other predictors to predict risk of all-cause mortality within 10 years.
RESULTS: The cohort comprised 420,560 individuals. During a cohort follow-up of 4,526,452 person-years, there were 16,188 deaths from any cause (3.85%). The points-based model had good discrimination (c-statistic = 0.66). There was a 31% relative reduction in risk of all-cause mortality per decile of increasing C-Score (hazard ratio: 0.69, 95% CI: 0.663 to 0.675). A Cox model integrating age and C-Score had improved discrimination (8 percentage points, c-statistic = 0.74) and good calibration. Machine learning approaches did not offer improved discrimination over statistical modelling.
CONCLUSIONS: The novel health metric ("C-Score") has good predictive capabilities for all-cause mortality within 10 years. Embedding the C-Score within a smartphone application may represent a useful tool for democratised, individualised health risk prediction. A simple Cox model using C-Score and age optimally balances parsimony and accuracy of risk predictions and could be used to produce absolute risk estimations for application users.