Instance selection in supervised machine learning, often referred to as data reduction, aims at deciding which instances from the training set should be retained for further use during the learning process. Instance selection can improve the capabilities and generalization properties of the learning model, shorten the learning process, or help in scaling up to large data sources. The paper proposes a cluster-based instance selection approach in which the learning process is executed by a team of agents, and discusses its four variants. The basic assumption is that instance selection is carried out after the training data have been grouped into clusters. To validate the proposed approach and to investigate the influence of the clustering method on the quality of classification, a computational experiment has been carried out.

Keywords Machine learning · Data mining · Instance selection · Multi-agent system
Introduction

Learning from examples remains the most important paradigm of machine learning. The problem of learning from data, according to [7], can be formulated as follows: given a dataset D, a set of hypotheses H, and a performance criterion P, the learning algorithm L outputs a hypothesis h ∈ H that optimizes P. The data D consists of N training examples, also called instances. Each example is described by a set A of n attributes. The goal of learning is to produce a hypothesis that optimizes the performance criterion. In the pattern classification application, h is a classifier (e.g. a decision tree, artificial neural network, naive Bayes, or k-nearest neighbor classifier) that has been induced from the training set D.

Research in the field of machine learning has resulted in the development of numerous approaches and algorithms for classification problems [46,51]. One recent focus of such research is methods of selecting relevant information to be used within the

I. Czarnowski (B)