Abstract. Several methods have been proposed to reduce the number of instances (vectors) in the learning set. Some of them remove only bad vectors, while others try to remove as many instances as possible without significantly degrading the usefulness of the reduced dataset for learning. Several strategies for shrinking training sets are compared here using different neural and machine learning classification algorithms. In Part II (the accompanying paper), results on benchmark databases are presented.
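As an illustration of the first strategy (removing only bad vectors), the sketch below uses Wilson-style editing (Edited Nearest Neighbor, ENN). The abstract does not say which algorithms are compared, so ENN is an assumed, well-known example chosen for illustration, and the function name `enn_filter` and its parameter `k` are hypothetical. An instance is kept only if the majority of its k nearest neighbours (itself excluded) share its label.

```python
import math
from collections import Counter


def enn_filter(X, y, k=3):
    """Hypothetical helper: return indices of instances kept by ENN editing.

    X is a list of numeric tuples, y the matching list of labels.
    Naive O(n^2 d) implementation, written for clarity only.
    """
    keep = []
    n = len(X)
    for i in range(n):
        # Euclidean distance from instance i to every other instance;
        # ties in distance are broken by index j.
        dists = sorted((math.dist(X[i], X[j]), j) for j in range(n) if j != i)
        neighbour_labels = [y[j] for _, j in dists[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        # Keep the instance only if its neighbours agree with its label.
        if majority == y[i]:
            keep.append(i)
    return keep


# Two clusters plus one mislabeled point (index 4) inside the class-0 cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(enn_filter(X, y, k=3))  # [0, 1, 2, 3, 5, 6, 7, 8]
```

Editing methods like this one remove few vectors; condensing methods, which try to remove as many instances as possible, would instead keep mainly points near class borders.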
Abstract. This paper is a continuation of the accompanying paper with the same main title. The first paper reviewed instance selection algorithms; here, the results of an empirical comparison and comments on them are presented. Several tests were performed, mostly on benchmark datasets from the UCI Machine Learning Repository. Instance selection algorithms were tested with neural networks and machine learning algorithms.
Classification methods with computational complexity O(nd), linear in the number of samples n and their dimensionality d, often give results that are better than, or at least not statistically significantly worse than, those of slower algorithms. This is demonstrated here for many benchmark datasets downloaded from the UCI Machine Learning Repository. The results provided in this paper should be used as a reference for estimating the usefulness of new learning algorithms: higher-complexity methods should provide significantly better results to justify their use.
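To make the O(nd) bound concrete, the sketch below shows one classifier of this kind, a nearest-centroid (nearest class mean) rule. The abstract does not list which linear-complexity methods were evaluated, so this choice and the helper names `fit_centroids` and `predict` are assumptions for illustration. Training is a single pass over n samples of dimension d, and prediction costs O(cd) for c classes.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Hypothetical helper: per-class mean vectors in one pass, O(nd) time."""
    sums = {}
    counts = defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        acc = sums[label]
        for k, v in enumerate(x):  # d additions per sample
            acc[k] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


X = [(0, 0), (0, 2), (4, 4), (4, 6)]
y = ["a", "a", "b", "b"]
centroids = fit_centroids(X, y)
print(centroids)                   # {'a': [0.0, 1.0], 'b': [4.0, 5.0]}
print(predict(centroids, (1, 1)))  # a
```

A slower method, such as a kernel machine with superlinear training cost, would need to beat a baseline like this by a statistically significant margin to justify its cost.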