Accuracy of hemoglobin A1c imputation using fasting plasma glucose in diabetes research using electronic health records data

Schroeder, Emily B.; Shetterly, Susan; Goodrich, Glenn K.; O’Connor, Patrick J.; Steiner, John F.; Schmittdiel, Julie A.; Desai, Jay; Pathak, Ram D.; Neugebauer, Romain; Butler, Melissa G.; Kirchner, Lester; Raebel, Marsha A.

doi:10.19139/68

Cited by 6 publications (7 citation statements); references 2 publications

“…Baseline status for comorbid conditions was assigned based on two or more outpatient diagnosis codes or one or more inpatient diagnosis codes on or before the cohort entry date for chronic kidney disease (CKD; ICD-9 code 585.xx), CVD (ICD-9 codes 410–414.xx and 429.2), heart failure (HF; ICD-9 codes 428–428.9), hemorrhagic stroke (ICD-9 codes 430–432.9), ischemic stroke (ICD-9 codes 433–434.91), and transient ischemic attack (ICD-9 code 435.xx). Multiple imputation was used for missing data on A1C, LDL-C, and HDL-C following previous work using the SUPREME-DM cohort (12). …”
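The baseline comorbidity rule quoted above (two or more outpatient codes, or one or more inpatient codes, on or before cohort entry) can be sketched in Python as follows. This is a minimal illustration, not the cited study's code. The `Encounter` record, its field names, the assumption that codes are stored as dotted ICD-9 strings, and the choice to count each encounter row once are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple

@dataclass(frozen=True)
class Encounter:
    """Hypothetical EHR diagnosis row: care setting, dotted ICD-9 code, date."""
    setting: str  # "outpatient" or "inpatient" (assumed labels)
    icd9: str     # e.g. "428.0"
    when: date

# ICD-9 prefixes transcribed from the quoted definitions
# (e.g. "410–414.xx" becomes the three-digit prefixes 410..414).
CONDITION_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "CKD": ("585",),
    "CVD": ("410", "411", "412", "413", "414", "429.2"),
    "HF": ("428",),
    "hemorrhagic_stroke": ("430", "431", "432"),
    "ischemic_stroke": ("433", "434"),
    "TIA": ("435",),
}

def has_condition(encounters: Iterable[Encounter],
                  prefixes: Tuple[str, ...],
                  cohort_entry: date) -> bool:
    """True if >= 2 outpatient or >= 1 inpatient matching codes on/before entry."""
    outpatient = inpatient = 0
    for e in encounters:
        # Skip codes recorded after cohort entry or outside this condition's code set.
        if e.when > cohort_entry or not e.icd9.startswith(prefixes):
            continue
        if e.setting == "inpatient":
            inpatient += 1
        elif e.setting == "outpatient":
            outpatient += 1
    return inpatient >= 1 or outpatient >= 2

def baseline_comorbidities(encounters: Iterable[Encounter],
                           cohort_entry: date) -> Dict[str, bool]:
    """Apply the rule to every condition in CONDITION_PREFIXES."""
    rows = list(encounters)  # materialize so each condition can rescan the rows
    return {name: has_condition(rows, prefixes, cohort_entry)
            for name, prefixes in CONDITION_PREFIXES.items()}
```

A stricter version might require the two outpatient codes to fall on different dates; the quoted text does not say, so the sketch leaves that choice open.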

Section: Methods (mentioning; confidence: 99%)

Preventable Major Cardiovascular Events Associated With Uncontrolled Glucose, Blood Pressure, and Lipids and Active Smoking in Adults With Diabetes With and Without Cardiovascular Disease: A Contemporary Analysis

Vazquez‐Benitez, Desai, et al. 2015, Diabetes Care (self-citation)

OBJECTIVE: The objective of this study was to assess the incidence of major cardiovascular (CV) hospitalization events and all-cause deaths among adults with diabetes with or without CV disease (CVD) associated with inadequately controlled glycated hemoglobin (A1C), high LDL cholesterol (LDL-C), high blood pressure (BP), and current smoking.

RESEARCH DESIGN AND METHODS: Study subjects included 859,617 adults with diabetes enrolled for more than 6 months during 2005–2011 in a network of 11 U.S. integrated health care organizations. Inadequate risk factor control was classified as LDL-C ≥100 mg/dL, A1C ≥7% (53 mmol/mol), BP ≥140/90 mm Hg, or smoking. Major CV events were based on primary hospital discharge diagnoses for myocardial infarction (MI) and acute coronary syndrome (ACS), stroke, or heart failure (HF). Five-year incidence rates, rate ratios, and average attributable fractions were estimated using multivariable Poisson regression models.

RESULTS: Mean (SD) age at baseline was 59 (14) years; 48% of subjects were female, 45% were white, and 31% had CVD. Mean follow-up was 59 months. Event rates per 100 person-years for adults with diabetes and CVD versus those without CVD were 6.0 vs. 1.7 for MI/ACS, 5.3 vs. 1.5 for stroke, 8.4 vs. 1.2 for HF, 18.1 vs. 4.0 for all CV events, and 23.5 vs. 5.0 for all-cause mortality. The percentages of CV events and deaths associated with inadequate risk factor control were 11% and 3%, respectively, for those with CVD and 34% and 7%, respectively, for those without CVD.

CONCLUSIONS: Additional attention to traditional CV risk factors could yield further substantive reductions in CV events and mortality in adults with diabetes.
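The thresholds for inadequate risk factor control listed in this abstract translate directly into a per-patient check. The Python sketch below is illustrative only. The `RiskFactors` fields are hypothetical, BP ≥140/90 mm Hg is read as systolic ≥140 or diastolic ≥90, and a missing measurement yields `None` rather than a guess (the citation statement quoted above describes multiple imputation for missing A1C, LDL-C, and HDL-C instead). The A1C unit conversion uses the standard IFCC-NGSP master equation, mmol/mol = 10.929 × (% − 2.15), which reproduces the 7% ≈ 53 mmol/mol pairing in the abstract.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class RiskFactors:
    """Hypothetical per-patient snapshot; None marks a missing measurement."""
    a1c_percent: Optional[float]   # NGSP units (%)
    ldl_mg_dl: Optional[float]
    sbp_mm_hg: Optional[float]
    dbp_mm_hg: Optional[float]
    current_smoker: Optional[bool]

def ngsp_to_ifcc(a1c_percent: float) -> float:
    """Convert A1C from NGSP % to IFCC mmol/mol (master equation)."""
    return 10.929 * (a1c_percent - 2.15)

def _at_least(value: Optional[float], cutoff: float) -> Optional[bool]:
    """Compare against a cutoff, propagating missingness as None."""
    return None if value is None else value >= cutoff

def inadequate_control(rf: RiskFactors) -> Dict[str, Optional[bool]]:
    """True = inadequately controlled, False = controlled, None = unknown."""
    if rf.sbp_mm_hg is None or rf.dbp_mm_hg is None:
        bp = None  # assumption: require both readings before classifying BP
    else:
        bp = rf.sbp_mm_hg >= 140 or rf.dbp_mm_hg >= 90
    return {
        "A1C": _at_least(rf.a1c_percent, 7.0),
        "LDL-C": _at_least(rf.ldl_mg_dl, 100),
        "BP": bp,
        "smoking": rf.current_smoker,
    }
```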

“…An account of available software facilitated modelling using MI in diabetes studies is given in ref. [17] Despite regarded as "state of the art", EM and MI techniques are computationally very intensive, especially MI, which is rather a statistical experiment featuring an imputation method. Apart from the design, the biggest contributor to the problem is the multitude of model parameters as their number is dependent on the number of problem dimensions and can grow explosively with model complexity.…”
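The description of multiple imputation (MI) as a "statistical experiment" can be made concrete: the analysis is repeated on m imputed data sets and the m results are then combined. The usual combining step is Rubin's rules. The Python sketch below pools one scalar estimate (for example, a regression coefficient) together with its squared standard error. It is a generic illustration of that pooling step, not code from any of the works listed here, and it omits the degrees-of-freedom adjustment used for confidence intervals.

```python
from statistics import mean, variance
from typing import Sequence, Tuple

def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float]:
    """Combine m completed-data results with Rubin's rules.

    estimates: point estimate from each imputed data set
    variances: squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (denominator m - 1)
    total = w + (1 + 1 / m) * b  # Rubin's total variance
    return q_bar, total
```

For three imputations with estimates 7.1, 7.3, 7.2 and variances 0.04, 0.05, 0.03, the pooled estimate is 7.2 and the total variance is 0.04 + (4/3)(0.01) ≈ 0.0533. The between-imputation term is the part a single imputation silently drops, and obtaining it is why every analysis must be rerun m times.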

Section: Related Work (mentioning; confidence: 99%)

Diagnostic with incomplete nominal/discrete data

Jelinek, Yatsko, Stranieri, et al. 2015, AIR

Missing values may be present in data without undermining its use for diagnostic or classification purposes, but they compromise the application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; in particular, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation, and some classifiers can be reinvented as data completion methods. In this work, the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the substituted values are not necessarily a close match to the true data; instead, they are intended to cause the least hindrance to classification. The proposed techniques find application particularly in medical diagnostics. Where clinical data represent a number of related conditions, taking the Cartesian product of the class values of the underlying sub-problems narrows down the selection of missing value substitutes. Real-world data examples, some publicly available, are used for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
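As a rough illustration of using a classifier as a data completion method on nominal data, the Python sketch below fills each missing attribute by majority vote among the k most similar complete records, with similarity measured as the fraction of mismatching values over attributes observed in both records. It is a generic k-nearest-neighbour imputer written for illustration, not the authors' implementation: it does not choose substitutes to minimize classification hindrance, nor does it use the Cartesian-product step described in the abstract. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

Record = Sequence[Optional[str]]  # None marks a missing nominal value

def mismatch_rate(a: Record, b: Record) -> float:
    """Share of mismatching values over attributes observed in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 1.0  # nothing to compare: treat as maximally distant
    return sum(x != y for x, y in shared) / len(shared)

def knn_impute_nominal(records: Sequence[Record],
                       k: int = 3) -> List[List[Optional[str]]]:
    """Fill missing nominal values by majority vote of the k nearest complete records."""
    complete = [r for r in records if None not in r]
    if not complete:
        raise ValueError("at least one complete record is required")
    result = []
    for r in records:
        filled = list(r)
        if None in r:
            # sorted() is stable, so equally distant records keep their input order.
            neighbours = sorted(complete, key=lambda c: mismatch_rate(r, c))[:k]
            for j, value in enumerate(r):
                if value is None:
                    votes = Counter(c[j] for c in neighbours)
                    filled[j] = votes.most_common(1)[0][0]
        result.append(filled)
    return result
```

Because every attribute is nominal, the same distance and vote work regardless of what the attribute originally measured, which mirrors the abstract's point that discretization removes the need for per-type handling.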

“…A study by Rose et al [18] discussed the correlation between RBS and HbA1c levels. Stanley et al [19] used a linear regression model for imputation of missing HbA1c data. Their model calculates HbA1c levels for patient records with missing HbA1c values as continuous and categorical values and uses 4 predictors extracted from an EHR system: RBS, FBS, along with age and gender, as predictors to calculate the level of HbA1c for a diabetic population.…”
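The regression approach described in this quotation can be sketched with ordinary least squares: fit HbA1c on the predictors using records where HbA1c was measured, then predict it where it is missing. In the Python/NumPy sketch below, the column layout (RBS, FBS, age, and a 0/1 sex indicator) is a hypothetical encoding, not the cited model's specification, and a single column of fasting glucose values is handled the same way. Filling in the fitted value alone (deterministic imputation) understates uncertainty; a stochastic version used inside multiple imputation would add random draws to each prediction and repeat the analysis.

```python
import numpy as np

def regression_impute(predictors, target):
    """Impute missing target values (np.nan) by OLS on rows where the target is observed.

    predictors: (n, p) array with no missing values,
                e.g. hypothetical columns [RBS, FBS, age, female]
    target:     (n,) array of HbA1c values with np.nan where missing
    Returns (completed target, coefficients with intercept first).
    """
    X = np.asarray(predictors, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # allow a single predictor such as fasting glucose alone
    y = np.asarray(target, dtype=float)
    observed = ~np.isnan(y)
    design = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design[observed], y[observed], rcond=None)
    completed = y.copy()
    completed[~observed] = design[~observed] @ coef  # deterministic fitted values
    return completed, coef
```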

Section: Related Work (mentioning; confidence: 99%)

Improving Current Glycated Hemoglobin Prediction in Adults: Use of Machine Learning Algorithms With Electronic Health Records

et al. 2021

Background: Predicting the risk of glycated hemoglobin (HbA1c) elevation can help identify patients with the potential for developing serious chronic health problems, such as diabetes. Early preventive interventions based upon advanced predictive models using electronic health records data for identifying such patients can ultimately help provide better health outcomes.

Objective: Our study investigated the performance of predictive models to forecast HbA1c elevation levels by employing several machine learning models. We also examined the use of patient electronic health record longitudinal data in the performance of the predictive models. Explainable methods were employed to interpret the decisions made by the black box models.

Methods: This study employed multiple logistic regression, random forest, support vector machine, and logistic regression models, as well as a deep learning model (multilayer perceptron), to classify patients with normal (<5.7%) and elevated (≥5.7%) levels of HbA1c. We also integrated current visit data with historical (longitudinal) data from previous visits. Explainable machine learning methods were used to interrogate the models and provide an understanding of the reasons behind the decisions made by the models. All models were trained and tested using a large data set from Saudi Arabia with 18,844 unique patient records.

Results: The machine learning models achieved promising results for predicting current HbA1c elevation risk. When coupled with longitudinal data, the machine learning models outperformed the multiple logistic regression model used in the comparative study. The multilayer perceptron model achieved an accuracy of 83.22% for the area under the receiver operating characteristic curve when used with historical data. All models showed a close level of agreement on the contribution of the random blood sugar and age variables, with and without longitudinal data.

Conclusions: This study shows that machine learning models can provide promising results for the task of predicting current HbA1c levels (≥5.7% vs <5.7%). Using patients' longitudinal data improved performance and affected the relative importance of the predictors used. The models showed results that are consistent with comparable studies.
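The classification target and the headline metric in this abstract can both be written down in a few lines. The Python sketch below labels HbA1c values as elevated at the 5.7% cut-off and computes the area under the ROC curve (AUC) as the probability that a randomly chosen elevated case receives a higher model score than a randomly chosen normal case, with ties counted as one half. AUC is a ranking measure and is distinct from classification accuracy at a fixed threshold. The function names are illustrative; for large data sets, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity far more efficiently than this quadratic loop.

```python
from typing import List, Sequence

def label_elevated(a1c_percent: Sequence[float], cutoff: float = 5.7) -> List[int]:
    """1 = elevated (HbA1c >= cutoff %), 0 = normal."""
    return [int(v >= cutoff) for v in a1c_percent]

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the Mann-Whitney probability P(score_pos > score_neg), ties counted as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one elevated and one normal case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```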
