We propose a robust diabetes prediction model by examining how predictions from several learning algorithms performing the same task can be combined to outperform the best individual algorithm. The task was to forecast the onset of non-insulin-dependent diabetes within a five-year period using information from previous vital-sign examinations. The experimental data form a 768 × 9 array of row vectors: in each row, every column but the last holds an observed input, and the last column holds the output. Five well-known models were trained with their associated learning algorithms (Sequential Minimal Optimization (SMO), Radial Basis Function (RBF), C4.5, Naïve Bayes and RIPPER) on the same dataset, and their performance was compared using Accuracy, area under the Receiver Operating Characteristic curve (aROC) and Speed as metrics. After comparison, a combiner (Meta) model using a simple Logistic Regression algorithm was trained to make a final prediction, taking the outputs of the best and worst performing algorithms (ranked in the order Accuracy, aROC, Speed) as additional inputs. C4.5 performed best, with an Accuracy of 77.9% and an aROC of 83.1%. RBF performed worst, with an Accuracy of 73.6% and an aROC of 80.5%. The Meta model achieved a classification Accuracy of 77.0% with an aROC of 84.9%. The slight decline in Accuracy relative to C4.5 arose because aROC (not Accuracy) was used as the performance metric during selection.
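Because the abstract reports both Accuracy and aROC, and the two metrics rank the models differently, a minimal sketch of how each can be computed may help. The function names and the toy labels and scores below are assumptions made for illustration; they are not taken from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = onset, 0 = no onset).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.7]

# Accuracy needs hard labels, so the scores are thresholded at 0.5 here.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

toy_accuracy = accuracy(y_true, y_pred)
toy_auc = roc_auc(y_true, scores)
```

Accuracy depends on a single decision threshold, whereas aROC summarises how well the scores rank the cases across all thresholds, so a model selected on one metric need not be the best on the other.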
Separating household waste into categories such as organic and recyclable is a critical part of waste management systems, ensuring that valuable materials are recycled and reused. This benefits human health and the environment because less risky treatments are used at landfill and/or incineration, ultimately leading to an improved circular economy. Conventional waste separation relies heavily on manual sorting by humans, which is inefficient, expensive, time-consuming, and prone to subjective errors caused by limited knowledge of waste classification. However, advances in artificial intelligence research have led to the adoption of machine learning algorithms to improve the accuracy of waste classification from images. In this paper, we used a waste classification dataset to evaluate the performance of a bespoke five-layer convolutional neural network when trained with two different image resolutions. The dataset is publicly available and contains 25,077 images, of which 13,966 show organic waste and 11,111 show recyclable waste. Many researchers have used the same dataset to evaluate their proposed methods, reporting varying accuracy. However, these results are not directly comparable to ours because of fundamental issues observed in their methods and validation approaches, including a lack of transparency in the experimental setup, which makes it impossible to replicate the results. Another common issue in image classification is high computational cost, which often results in long development time and a large prediction model. Therefore, a lightweight model with high accuracy and a transparent methodology is of particular importance in this domain. To investigate the computational cost issue, we used two image resolutions (i.e., 225×264 and 80×45) to explore the performance of our bespoke five-layer convolutional neural network in terms of development time, model size, predictive accuracy, and cross-entropy loss.
Our intuition is that a smaller image resolution will lead to a lightweight model whose accuracy is comparable to, or higher than, that of the model trained with the higher image resolution. In the absence of reliable baseline studies against which to compare our bespoke convolutional network in terms of accuracy and loss, we trained a random-guess classifier as a baseline. The results show that the small image resolution leads to a lighter model with less training time, and its accuracy (80.88%) is better than the 76.19% yielded by the larger model. Both the small and large models outperformed the baseline, which produced 50.05% accuracy. To encourage reproducibility of our results, all experimental artifacts, including the preprocessed dataset and the source code used in our experiments, are made available in a public repository.
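The scale of the difference between the two resolutions, and the accuracy to expect from a random-guess baseline, can be checked with a few lines of arithmetic. The sketch below uses only the class counts and resolutions stated in the abstract. The uniform coin-flip guesser, the seed, the function names and the majority-class comparison are assumptions; the abstract does not describe how its baseline classifier was implemented.

```python
import random

# Figures stated in the abstract.
N_ORGANIC, N_RECYCLABLE = 13_966, 11_111
LARGE_RESOLUTION, SMALL_RESOLUTION = (225, 264), (80, 45)


def pixels(resolution):
    """Number of pixels per image channel for a given resolution."""
    first, second = resolution
    return first * second


def random_guess_accuracy(n_pos, n_neg, seed=0):
    """Accuracy of a guesser that picks each class with probability 0.5."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    labels = [1] * n_pos + [0] * n_neg
    hits = sum(rng.randint(0, 1) == label for label in labels)
    return hits / len(labels)


# How many times more input pixels the larger resolution carries.
pixel_ratio = pixels(LARGE_RESOLUTION) / pixels(SMALL_RESOLUTION)

# Accuracy of always predicting the larger class, for comparison.
majority_accuracy = max(N_ORGANIC, N_RECYCLABLE) / (N_ORGANIC + N_RECYCLABLE)

# Simulated accuracy of uniform random guessing on the same class counts.
guess_accuracy = random_guess_accuracy(N_ORGANIC, N_RECYCLABLE)
```

Under these assumptions the larger images carry 16.5 times as many pixels as the smaller ones, uniform guessing lands close to 50%, which is in line with the 50.05% reported for the baseline, and always predicting the majority class would score about 55.7%.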
An exploratory study is presented to gauge the impact of feature selection on heterogeneous ensembles. The task is to predict diabetes onset using healthcare data obtained from the UC Irvine (UCI) database. Evidence suggests that accuracy and diversity are the two vital requirements for good ensembles. Therefore, the research presented in this paper exploits the diversity of heterogeneous base classifiers and the optimisation effect of feature subset selection in order to improve accuracy. Five widely used classifiers are employed for the ensembles, and a meta-classifier is used to aggregate their outputs. The results are presented and compared with those of similar studies in the literature that used the same dataset. It is shown that the proposed method predicts diabetes onset with higher accuracy.
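The abstract names diversity as a requirement for good ensembles but does not say how it is quantified. One common choice, used here purely as an assumption, is the pairwise disagreement measure: the fraction of cases on which two classifiers predict different labels. The classifier names and predictions below are hypothetical.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of cases on which two classifiers predict different labels."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def mean_pairwise_disagreement(all_preds):
    """Average disagreement over every pair of classifiers in the ensemble."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical predictions from three base classifiers on six cases.
toy_predictions = {
    "clf_a": [1, 0, 1, 1, 0, 0],
    "clf_b": [1, 0, 0, 1, 0, 1],
    "clf_c": [1, 1, 1, 1, 0, 0],
}

ensemble_diversity = mean_pairwise_disagreement(list(toy_predictions.values()))
```

A value of 0 means the base classifiers always agree, so combining them adds nothing; larger values indicate that the classifiers err on different cases, which gives a meta-classifier something to exploit.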