Missing values may be present in data without undermining its use for diagnostic / classification purposes but compromise application of readily available software. Surrogate entries can remedy the situation, although the outcome is generally unknown. Discretization of continuous attributes renders all data nominal and is helpful in dealing with missing values; particularly, no special handling is required for different attribute types. A number of classifiers exist or can be reformulated for this representation. Some classifiers can be reinvented as data completion methods. In this work the Decision Tree, Nearest Neighbour, and Naive Bayesian methods are demonstrated to have the required aptness. An approach is implemented whereby the entered missing values are not necessarily a close match of the true data; however, they intend to cause the least hindrance for classification. The proposed techniques find their application particularly in medical diagnostics. Where clinical data represents a number of related conditions, taking Cartesian product of class values of the underlying sub-problems allows narrowing down of the selection of missing value substitutes. Real-world data examples, some publically available, are enlisted for testing. The proposed and benchmark methods are compared by classifying the data before and after missing value imputation, indicating a significant improvement.
Glycated haemoglobin (HbA1c) is now more commonly used as an alternative test to the fasting plasma glucose and oral glucose tolerance tests for the identification of Type 2 Diabetes Mellitus (T2DM) because it is easily obtained using the point-of-care technology and represents long-term blood sugar levels. According to WHO guidelines, HbA1c values of 6.5% or above are required for a diagnosis of T2DM. However outcomes of a large number of trials with HbA1c have been inconsistent across the clinical spectrum and further research is required to determine the efficacy of HbA1c testing in identification of T2DM. Medical records from a diabetes screening program in Australia illustrate that many patients could be classified as diabetics if other clinical indicators are included, even though the HbA1c result does not exceed 6.5%. This suggests that a cutoff for the general population of 6.5% may be too simple and miss individuals at risk or with already overt, undiagnosed diabetes. In this study, data mining algorithms have been applied to identify markers that can be used with HbA1c. The results indicate that T2DM is best classified by HbA1c at 6.2% -a cutoff level lower than the currently recommended one, which can be even less, having assumed the threshold flexibility, if additionally to HbA1c being high the rule is conditioned on oxidative stress or inflammation being present, atherogenicity or adiposity being high, or hypertension being diagnosed, etc.
To date a vast evidence exists that the waist circumference to height ratio (WCHR) provides a better measure of obesity comparing to the body mass index (BMI). While weight and height are routinely obtained to calculate BMI, waist circumference, despite easily acquired, is often overlooked because the screening protocols, particularly for diabetes, demand BMI. This creates an obstacle for application of WCHR -a more definite measure than BMI for diagnostic of many linked to obesity metabolic disorders such as diabetes, cardiovascular disease and hypertension. This article is intended to fill the gap in the literature by providing a conversion from BMI to WCHR for five adult age categories. A strong linearity between the measures is demonstrated and equivalent to BMI WCHR thresholds are provided to identify normality, overweight as well as obesity and other points. The analysis is based on the data from National Health and Nutrition Examination Survey (NHANES). Different forms of BMI are also discussed and a strong linearity between them is demonstrated. An obesity index based on simple weight to height ratio to match the standard levels is proposed. The equivalence between the proposed and existing obesity indices is tested on the original data with promising results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.