In this study, a breast cancer prediction model based on a decision tree and adaptive boosting (AdaBoost) is proposed, and an extensive experimental evaluation of its predictive performance is conducted. The study uses a breast cancer dataset collected from the Kaggle data repository. The dataset consists of 569 observations, of which 212 (37.26%) are malignant (breast cancer positive) and 357 (62.74%) are benign (breast cancer negative). This class distribution shows that the dataset is imbalanced, so a learning algorithm such as a decision tree is biased toward the benign observations and performs poorly at predicting the malignant ones. To improve the performance of the decision tree on the malignant observations, a boosting algorithm, namely adaptive boosting, is employed. Finally, the predictive performance of the decision tree and of adaptive boosting is analyzed. On the Kaggle breast cancer dataset, adaptive boosting achieves 92.53% accuracy and the decision tree 88.80%; overall, the AdaBoost algorithm performs better than the decision tree.
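The boosting idea above can be illustrated with a minimal NumPy sketch of discrete AdaBoost over depth-1 decision trees (stumps). It is not the authors' implementation: the stump base learner, the synthetic two-feature data with a diagonal class boundary, the 30 boosting rounds and the helper names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) are assumptions made for illustration, and the sketch does not reproduce the reported 92.53% and 88.80% accuracies. Each round fits the stump with the lowest weighted error ε, gives it a vote weight α = ½ ln((1 − ε)/ε), and up-weights the samples it misclassifies so that the next stump concentrates on them.

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best one-split tree.
    Labels y are -1/+1 and the sample weights w sum to 1."""
    best = (np.inf, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                pred = np.where(polarity * (X[:, j] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best

def predict_stump(X, j, t, polarity):
    return np.where(polarity * (X[:, j] - t) >= 0, 1, -1)

def adaboost_fit(X, y, n_rounds):
    w = np.full(len(y), 1.0 / len(y))            # start from uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, p = fit_stump(X, y, w)
        err = max(err, 1e-10)                    # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)    # vote weight: lower error, larger vote
        w = w * np.exp(-alpha * y * predict_stump(X, j, t, p))  # boost misclassified rows
        w /= w.sum()
        ensemble.append((alpha, j, t, p))
    return ensemble

def adaboost_predict(X, ensemble):
    score = sum(a * predict_stump(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)           # sign of the weighted vote

# Hypothetical synthetic data: a diagonal boundary that no single axis split can match.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(600, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

stump_acc = (adaboost_predict(X_test, adaboost_fit(X_train, y_train, 1)) == y_test).mean()
boost_acc = (adaboost_predict(X_test, adaboost_fit(X_train, y_train, 30)) == y_test).mean()
print(f"single stump: {stump_acc:.3f}  AdaBoost (30 stumps): {boost_acc:.3f}")
```

A single stump splits on one axis only, so on this boundary its expected accuracy is about 75%; the weighted vote of many stumps builds a staircase that follows the diagonal much more closely. This mirrors, in a toy setting, how boosting lifts a weak tree learner.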
Explaining why a model classifies an instance as diabetes positive or negative is crucial for diabetes diagnosis, because the reasoning behind a predictive outcome helps to understand why the model assigned the instance to the diabetes positive or negative class. In recent years, high predictive accuracy and promising results have been achieved with models ranging from simple linear models to complex deep neural networks. However, complex models such as ensembles and deep learning models trade interpretability for accuracy. In response to this problem, different approaches have been proposed to explain the predictive outcomes of complex models. However, how these approaches relate to one another, and which approach is preferable for diabetes prediction, is not clear. To address this problem, the authors implement and compare three existing model interpretation approaches, local interpretable model-agnostic explanations (LIME), Shapley additive explanations (SHAP) and permutation feature importance, applied to an extreme gradient boosting (XGBoost) model. Experiments are conducted on a diabetes dataset with the aim of identifying the features that most influence the model output. Overall, the experimental results indicate that blood glucose has the highest impact on the model's prediction outcome.
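Of the three interpretation approaches, permutation feature importance is the simplest to sketch: shuffle one feature column, re-score the fitted model, and read the drop in accuracy as that feature's importance. The NumPy sketch below is a hypothetical illustration rather than the authors' pipeline: it swaps XGBoost for a plain logistic regression trained by gradient descent (permutation importance is model-agnostic, so any fitted classifier works), and the feature names `glucose`, `bmi` and `noise` label synthetic data invented here, not the study's diabetes dataset.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient descent; y holds 0/1 labels."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probability of class 1
        grad = p - y                              # gradient of log loss w.r.t. the logit
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(X, w, b):
    return (X @ w + b > 0).astype(int)

def permutation_importance(X, y, w, b, n_repeats=10, seed=0):
    """Mean accuracy drop when one column is shuffled, breaking its link to y."""
    rng = np.random.default_rng(seed)
    baseline = (predict(X, w, b) == y).mean()
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - (predict(X_perm, w, b) == y).mean())
        importances[j] = np.mean(drops)
    return importances

# Hypothetical synthetic data: the label depends mostly on column 0.
rng = np.random.default_rng(42)
names = ["glucose", "bmi", "noise"]
X = rng.normal(size=(1000, 3))
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

w, b = train_logreg(X[:700], y[:700])                  # fit on training rows
imp = permutation_importance(X[700:], y[700:], w, b)   # score on held-out rows
for name, score in sorted(zip(names, imp), key=lambda t: -t[1]):
    print(f"{name:8s} {score:.3f}")
```

Because the synthetic label is driven mostly by the first column, shuffling `glucose` should cost far more accuracy than shuffling the other two columns. Importance is measured on held-out rows so that it reflects what the model relies on when generalizing rather than what it memorized.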
Breast cancer is the most common type of cancer, occurring mostly in females. In recent years, many researchers have worked to automate the diagnosis of breast cancer by developing different machine learning models. However, the quality and quantity of the features in a breast cancer diagnostic dataset have a significant effect on the accuracy and efficiency of a predictive model. Feature selection is an effective method for reducing dimensionality and improving the accuracy of predictive models: it determines the features required for training a model and removes irrelevant and duplicate features, where a duplicate feature is one that is highly correlated with another feature. The objective of this study is to experimentally compare three feature selection methods for breast cancer prediction, namely sequential, embedded and chi-square feature selection, all implemented on a breast cancer diagnostic dataset. The study compares the performance of the three methods on a test set. The experimental results show that sequential feature selection outperforms both chi-square (χ²) statistics and embedded feature selection, achieving the best accuracy of 98.3%.
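Sequential (forward) feature selection is a wrapper method: starting from an empty set, it repeatedly adds the feature whose inclusion most improves a classifier's validation accuracy. A minimal NumPy sketch follows, assuming a nearest-centroid classifier as the wrapped model and a fixed target size `k`; both choices, the helper names and the synthetic data (two informative features, a near-duplicate of the first, and two noise columns) are illustrative assumptions, not the study's configuration.

```python
import numpy as np

def nearest_centroid_accuracy(X_tr, y_tr, X_va, y_va):
    """Wrapper evaluator: label each validation row with its closest class mean."""
    classes = np.unique(y_tr)
    centroids = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_va[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return (pred == y_va).mean()

def sequential_forward_selection(X_tr, y_tr, X_va, y_va, k):
    """Greedily add the feature whose inclusion gives the best validation accuracy."""
    selected, remaining = [], list(range(X_tr.shape[1]))
    while len(selected) < k:
        scores = {j: nearest_centroid_accuracy(X_tr[:, selected + [j]], y_tr,
                                               X_va[:, selected + [j]], y_va)
                  for j in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical synthetic data with a near-duplicate column.
rng = np.random.default_rng(7)
n = 4000
y = rng.integers(0, 2, size=n)
sign = 2 * y - 1                           # class 0 -> -1, class 1 -> +1
x0 = 1.2 * sign + rng.normal(size=n)       # strongly informative
x1 = 0.8 * sign + rng.normal(size=n)       # moderately informative
x2 = x0 + rng.normal(scale=0.1, size=n)    # near-duplicate of x0 (highly correlated)
x3, x4 = rng.normal(size=(2, n))           # pure noise
X = np.column_stack([x0, x1, x2, x3, x4])

selected = sequential_forward_selection(X[:2000], y[:2000], X[2000:], y[2000:], k=2)
print("selected feature indices:", selected)
```

Once one of the two near-duplicate columns is selected, adding the other barely changes the validation accuracy, so the search picks the second informative feature instead. In this way a wrapper method tends to skip duplicate features without computing correlations explicitly.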
Chronic Kidney Disease (CKD) is a lifelong kidney disease that leads to a gradual loss of kidney function over time; the main function of the kidney is to filter waste in the human body. When the kidney malfunctions, waste accumulates in the body, which can lead to complete kidney failure. Machine learning algorithms can be used to predict kidney disease at an early stage by analyzing the symptoms. The aim of this paper is to propose an ensemble learning technique for predicting CKD: a new hybrid classifier called ABC4.5. The proposed hybrid classifier is compared with machine learning classifiers such as Support Vector Machine (SVM), Decision Tree (DT), C4.5 and Particle Swarm Optimized Multi Layer Perceptron (PSO-MLP). The proposed classifier accurately predicts the occurrence of kidney disease by analyzing various medical factors. The work comprises two stages: the first stage obtains weak decision tree classifiers from C4.5, and in the second stage the weak classifiers are added to a weighted sum that represents the final output, improving the performance of the classifier.
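The first stage builds weak decision trees with C4.5, which chooses each split attribute by gain ratio: the information gain of a split divided by its split information, which penalizes attributes with many distinct values. A minimal sketch of that criterion for categorical attributes follows. The toy rows and the attribute names `albumin` and `appetite` are hypothetical and not taken from the paper's CKD data, and the sketch covers only split selection, not full C4.5 tree growth, pruning, or the weighted combination of the second stage.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 split criterion for a categorical attribute: gain / split information."""
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)   # partition by attribute value
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0  # single-valued attribute: no split

# Hypothetical toy records (not the paper's CKD data).
rows = [
    {"albumin": "high", "appetite": "poor"},
    {"albumin": "high", "appetite": "good"},
    {"albumin": "high", "appetite": "poor"},
    {"albumin": "normal", "appetite": "good"},
    {"albumin": "normal", "appetite": "poor"},
    {"albumin": "normal", "appetite": "good"},
]
labels = ["ckd", "ckd", "ckd", "notckd", "notckd", "notckd"]

scores = {a: gain_ratio(rows, labels, a) for a in ("albumin", "appetite")}
best = max(scores, key=scores.get)
print(best, {a: round(s, 3) for a, s in scores.items()})
```

Here `albumin` separates the two classes perfectly and gets a gain ratio of 1.0, while `appetite` yields only about 0.08, so a C4.5 stump would split on `albumin`.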