Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study investigates the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six models widely used in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, f-score, specificity, and area under the ROC curve (AUC). Our results indicate that the overall performance of a classifier depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a model's robustness to limited data does not necessarily imply that it provides the best performance compared to other models.
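As a rough illustration of the threshold-based metrics named above, the following sketch derives accuracy, precision, recall, specificity, and f-score from a binary confusion matrix. The function name `binary_metrics` and the toy labels are assumptions made for this example and are not taken from the study; AUC is left out because it requires ranked scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute confusion-matrix metrics from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num, den):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,                      # also called sensitivity
        "specificity": ratio(tn, tn + fp),
        "f_score": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting specificity alongside recall matters on small, imbalanced medical datasets, where accuracy alone can hide poor performance on the minority class.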
The visible success of the Peer to Peer (P2P) paradigm is associated with many challenges in finding trustworthy peers as reliable communication partners. Reputation management systems are emerging in the face of these challenges. The EigenTrust reputation management system is among the best known and most successful reputation systems. However, a main drawback of this system is its reliance on a set of pre-trusted peers, which causes nodes to center around them. As a consequence, other peers are ranked low despite being honest, marginalizing their effect in the system. To tackle this problem, this paper proposes enhancing the EigenTrust algorithm by giving peers with high reputation values (honest peers) a role in calculating the global reputation of other peers. Rather than depending solely on the static group of pre-trusted peers, the proposed algorithm, HonestPeer, dynamically selects the most reputable nodes (honest peers) based on the quality of the files they provide. This makes HonestPeer more robust to increases in the number of files and nodes in the system. Simulation shows that HonestPeer maintains a higher success rate and a lower percentage of inauthentic downloads than the original algorithm. © 2015 The Author. Production and hosting by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
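For orientation, the sketch below shows the basic EigenTrust iteration with a static pre-trusted set, which is the baseline the abstract describes, not HonestPeer itself. The function names, the damping weight `alpha`, and the toy score matrix are assumptions made for this example; the abstract does not specify how HonestPeer selects honest peers, so that step is not sketched.

```python
def normalize_local_trust(scores, pretrusted):
    """Turn raw local scores into a row-stochastic trust matrix C.

    scores[i][j] is peer i's net satisfaction with peer j.
    Negative scores are clipped to zero; a peer with no positive
    scores falls back to the pre-trusted distribution p.
    """
    n = len(scores)
    p = [1.0 / len(pretrusted) if i in pretrusted else 0.0 for i in range(n)]
    C = []
    for row in scores:
        clipped = [max(s, 0.0) for s in row]
        total = sum(clipped)
        C.append([s / total for s in clipped] if total > 0 else p[:])
    return C, p


def eigentrust(scores, pretrusted, alpha=0.15, tol=1e-10, max_iter=1000):
    """Iterate t <- (1 - alpha) * C^T t + alpha * p until t stops changing."""
    C, p = normalize_local_trust(scores, pretrusted)
    n = len(C)
    t = p[:]  # start from the pre-trusted distribution
    for _ in range(max_iter):
        new_t = [
            (1 - alpha) * sum(C[i][j] * t[i] for i in range(n)) + alpha * p[j]
            for j in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(new_t, t)) < tol:
            return new_t
        t = new_t
    return t


# Hypothetical 4-peer network: peer 0 is pre-trusted, peer 3 serves bad files.
scores = [
    [0, 4, 3, 0],
    [5, 0, 2, -1],
    [4, 3, 0, 0],
    [1, 1, 1, 0],
]
trust = eigentrust(scores, pretrusted={0})
print([round(v, 3) for v in trust])
```

The `alpha * p` term is where the dependence on pre-trusted peers enters: every iteration pulls reputation mass back toward them, which is the centering effect the abstract identifies as a drawback.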
The Middle East Respiratory Syndrome Coronavirus (MERS-CoV) causes a viral respiratory disease that is spreading worldwide, which calls for a diagnosis system that accurately predicts infections. Data mining classifiers can greatly assist in enhancing the prediction accuracy of diseases in general. In this paper, classifier performance for two classification types, (1) binary and (2) multi-class, was tested on a MERS-CoV dataset that consists of all reported cases in Saudi Arabia between 2013 and 2017. Cross-validation was applied to measure the accuracy of the Support Vector Machine (SVM), Decision Tree, and k-Nearest Neighbor (k-NN) classifiers. Experimental results demonstrate that the SVM and Decision Tree classifiers achieved the highest accuracy, 86.44%, for binary classification based on the healthcare personnel class. For multi-class classification based on the city class, the Decision Tree classifier had the highest accuracy of the three classifiers, although at 42.80% it did not reach a satisfactory level. This work is intended to be part of a MERS-CoV prediction system to enhance the diagnosis of MERS-CoV disease.
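The cross-validation procedure mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the fold count, the interleaved fold assignment, the 1-nearest-neighbor classifier, and the one-dimensional toy data are all choices made for this example, since the abstract does not give the study's actual configuration.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k interleaved, disjoint folds."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]


def cross_val_accuracy(xs, ys, k):
    """Hold out each fold once and average accuracy over all samples."""
    correct = 0
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        train_y = [ys[i] for i in range(len(xs)) if i not in held_out]
        for i in test_idx:
            if nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]:
                correct += 1
    return correct / len(xs)


# Hypothetical, well-separated data for illustration only.
xs = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
folds = k_fold_indices(len(xs), 4)
accuracy = cross_val_accuracy(xs, ys, k=4)
print(folds, accuracy)
```

Because every sample is held out exactly once, the reported accuracy is measured only on data the classifier did not see during training.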
Emerging grids could help bridge the gap between grid technologies and users. This classification of grid systems aims to motivate research and help establish a foundation in this developing field.