As biomedical databases continue to expand, identifying the crucial features for a classification task becomes increasingly difficult because of data size and sparsity. Traditional feature subset models rely on fixed-size dimensions for feature ranking and classification, which makes them poorly suited to handling sparsity, missing values, and imbalance when selecting crucial features. To improve disease prediction, this article proposes a hybrid ensemble feature selection method that employs an advanced cluster-based classification model. The model uses an ensemble of ranked features to classify the disease with high accuracy and true positive rate. To improve the effectiveness of tree pruning and classification, we introduce a novel cluster-based classification model. We ran experiments on various training datasets to evaluate prediction accuracy. Our results demonstrate that the gene-chemical disease clustering-based classification framework outperforms traditional methods, statistical metrics, and classification models in terms of optimization.
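The abstract does not specify how the ensemble combines feature scores, so the following is only a minimal sketch of one common approach: score every feature with several criteria, convert each set of scores to ranks, and keep the features with the best mean rank. The two scorers (`variance_score`, `class_gap_score`), the function `ensemble_rank`, and the toy data are illustrative assumptions, not the authors' method.

```python
from statistics import mean, pvariance


def rank_scores(scores):
    """Map scores to rank positions (0 = best); a higher score is better."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def variance_score(column, labels):
    """Hypothetical scorer: population variance of the feature."""
    return pvariance(column)


def class_gap_score(column, labels):
    """Hypothetical scorer: gap between class means (binary labels 0/1)."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    return abs(mean(pos) - mean(neg))


def ensemble_rank(X, labels, scorers, top_k):
    """Return indices of the top_k features by mean rank across scorers."""
    columns = list(zip(*X))
    all_ranks = []
    for scorer in scorers:
        scores = [scorer(col, labels) for col in columns]
        all_ranks.append(rank_scores(scores))
    mean_rank = [mean([r[j] for r in all_ranks]) for j in range(len(columns))]
    return sorted(range(len(columns)), key=lambda j: mean_rank[j])[:top_k]


# Toy data (assumed): 4 samples, 3 features, binary labels.
X = [
    [5.0, 1.0, 0.2],
    [4.8, 9.0, 0.1],
    [1.1, 2.0, 0.2],
    [0.9, 7.0, 0.1],
]
y = [1, 1, 0, 0]
selected = ensemble_rank(X, y, [variance_score, class_gap_score], top_k=2)
print(sorted(selected))
```

Mean-rank aggregation is chosen here because it needs no score normalization across criteria. A real pipeline would also have to handle missing values and ties explicitly.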
As biomedical databases grow day by day, finding an essential feature set for a classification problem becomes complex because of large data size and sparsity. Microarray feature ranking and classification is a major challenge for scientific and medical researchers because of the high-dimensional feature space and the limited number of samples. Feature transformation, feature ranking, and data classification are the essential components for improving microarray cancer prediction on high-dimensional datasets. In this work, a novel framework is designed and implemented to classify high-dimensional data with a high true positive rate. The proposed work implements a hybrid feature transformation, a hybrid feature selection, and an advanced classification approach to raise the true positive rate and lower the error rate of disease prediction. A novel principal component ranking measure is integrated to find the subset of features for the classification problem. Finally, a hybrid decision tree classifier is used to classify the data on the selected feature set. Experimental results show that the present framework performs better than the traditional models on a variety of microarray datasets.
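The abstract does not define its principal component ranking measure. As a generic stand-in, the sketch below ranks features by the absolute value of their loading on the first principal component, found by power iteration on the sample covariance matrix. All function names and the toy matrix are assumptions made for illustration.

```python
import math


def center(X):
    """Subtract each column mean from the data matrix."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]


def covariance(Xc):
    """Sample covariance matrix of already-centered data."""
    n, d = len(Xc), len(Xc[0])
    return [
        [sum(row[i] * row[j] for row in Xc) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(C, iters=200):
    """Approximate the leading eigenvector of symmetric C by power iteration.

    Assumes the uniform start vector is not orthogonal to that eigenvector.
    """
    d = len(C)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def rank_by_loading(X):
    """Feature indices ordered by |loading| on the first component."""
    v = first_component(covariance(center(X)))
    return sorted(range(len(v)), key=lambda j: -abs(v[j]))


# Toy data (assumed): feature 0 varies most, feature 2 is nearly constant.
X = [
    [2.0, 1.0, 0.5],
    [4.0, 1.5, 0.5],
    [6.0, 2.5, 0.6],
    [8.0, 3.0, 0.5],
]
ranking = rank_by_loading(X)
print(ranking)
```

Loadings depend on feature scale, so on real microarray data the columns would normally be standardized first. The top-ranked subset could then be passed to any classifier, such as a decision tree.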