In Medical IoT-based healthcare systems, effective and efficient analysis of long-term health data is essential for sound clinical decision-making, yet handling the high dimensionality and volume of the data these systems generate is an increasingly important problem in IoT healthcare data exploration and management. A classifier or predictor equipped with good feature selection achieves better classification and prediction performance. This paper proposes a novel bagging C4.5 algorithm based on wrapper feature selection to support clinical decision-making in the medical and healthcare fields. In particular, the proposed sampling method, S-C4.5-SMOTE, overcomes the problem of data distortion and also improves overall system performance, because it reduces the data size without distortion by keeping datasets balanced and technically smooth. This in turn allows the wrapper method to select features effectively without being hindered by the sheer volume of data, which is a novel contribution of this work.
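To make the idea of wrapper feature selection concrete, here is a minimal sketch. The abstract does not describe the search strategy or give implementation details, so everything here is an assumption for illustration: greedy forward search is used as one common wrapper strategy, and a leave-one-out 1-nearest-neighbour scorer stands in for the bagged C4.5 classifier. The function names `loo_1nn_accuracy` and `forward_wrapper_selection` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def loo_1nn_accuracy(X: Sequence[Row], y: Sequence[int], features: List[int]) -> float:
    """Leave-one-out accuracy of a 1-NN classifier restricted to `features`.

    Assumption: this simple scorer stands in for the paper's bagged C4.5.
    """
    if not features:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_j, best_d = -1, float("inf")
        for j in range(len(X)):
            if j == i:
                continue
            # Squared Euclidean distance over the candidate feature subset only.
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += int(y[best_j] == y[i])
    return correct / len(X)


def forward_wrapper_selection(
    X: Sequence[Row],
    y: Sequence[int],
    score: Callable[[Sequence[Row], Sequence[int], List[int]], float] = loo_1nn_accuracy,
) -> Tuple[List[int], float]:
    """Greedy forward search: add the feature that most improves the score."""
    remaining = list(range(len(X[0])))
    selected: List[int] = []
    best_score = 0.0
    while remaining:
        # Score each candidate subset; ties are broken toward the lower index.
        cand_score, neg_f = max((score(X, y, selected + [f]), -f) for f in remaining)
        if cand_score <= best_score:
            break  # no candidate improves the wrapped classifier, so stop
        selected.append(-neg_f)
        remaining.remove(-neg_f)
        best_score = cand_score
    return selected, best_score
```

The key property of a wrapper method shows up in `forward_wrapper_selection`: every candidate subset is judged by training and evaluating the wrapped classifier itself. That is why the cost grows quickly with data size, and why reducing the data beforehand matters.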
Because of high-dimensional, strongly correlated features, the classification accuracy of medical data often falls short of expectations. Feature selection is a common approach to this problem: it reduces the dimensionality of high-dimensional data by selecting effective features. However, traditional feature selection algorithms set thresholds blindly, and their search algorithms are liable to fall into local optima. To address this, this paper proposes a hybrid feature selection algorithm combining ReliefF and Particle Swarm Optimization. The algorithm has three parts. First, ReliefF is used to calculate feature weights, and the features are ranked by weight. Then the ranked features are grouped by density equalization, so that the density of features in each group is the same. Finally, the Particle Swarm Optimization algorithm searches the ranked feature groups, and feature selection is performed according to a new fitness function. Experimental results show that random forest achieves the highest classification accuracy on the selected features. More importantly, it does so with the fewest features. In addition, experimental results on 2 medical datasets show that the average accuracy of random forest reaches 90.20%, which indicates that the hybrid algorithm has practical application value.
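The first two steps of this pipeline (weighting and ranking, then grouping) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation. It uses the basic two-class Relief update with one nearest hit and one nearest miss per instance, rather than full ReliefF. Because the abstract does not define "density equalization", the grouping here simply splits the ranking into groups of near-equal size. The Particle Swarm Optimization search and its fitness function are omitted because the abstract does not specify them. All function names are hypothetical.

```python
from typing import List, Sequence

Row = Sequence[float]


def relief_weights(X: Sequence[Row], y: Sequence[int]) -> List[float]:
    """Basic Relief weights (a simplification of ReliefF, assumed here).

    Requires at least two classes and at least two instances per class.
    """
    n, m = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    span = [max(r[f] for r in X) - min(r[f] for r in X) or 1.0 for f in range(m)]

    def diff(a: Row, b: Row, f: int) -> float:
        return abs(a[f] - b[f]) / span[f]

    def dist(a: Row, b: Row) -> float:
        return sum(diff(a, b, f) for f in range(m))

    w = [0.0] * m
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(m):
            # Reward features that differ on the nearest miss, and
            # penalise features that differ on the nearest hit.
            w[f] += (diff(X[i], X[miss], f) - diff(X[i], X[hit], f)) / n
    return w


def rank_features(weights: Sequence[float]) -> List[int]:
    """Feature indices sorted by descending weight (ties keep index order)."""
    return sorted(range(len(weights)), key=lambda f: -weights[f])


def group_ranked(ranked: Sequence[int], n_groups: int) -> List[List[int]]:
    """Split a ranking into contiguous groups of near-equal size.

    Assumption: equal counts stand in for the paper's density equalization.
    """
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(list(ranked[start:end]))
        start = end
    return groups
```

Grouping the ranked features shrinks the search space for the later optimization step: a search over a handful of groups has far fewer candidate subsets than a search over every individual feature.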