Incremental and online learning algorithms are increasingly relevant in data mining because of the growing need to process data streams. In this setting, the target function may change over time, a problem inherent to online learning known as concept drift. To handle concept drift regardless of the learning model, we propose new methods that monitor the performance metrics measured during the learning process and trigger drift signals when a significant variation is detected. To monitor this performance, we apply probability inequalities that assume only independent, univariate, and bounded random variables, which yields theoretical guarantees for the detection of such distributional changes. Some common restrictions on online change detection, as well as the relevant types of change (abrupt and gradual), are considered. Two main approaches are proposed. The first involves moving averages and is better suited to detecting abrupt changes. The second follows a widespread, intuitive idea for dealing with gradual changes: weighted moving averages. The simplicity of the proposed methods, together with their computational efficiency, makes them advantageous. We use a Naïve Bayes classifier and a Perceptron to evaluate the performance of the methods on synthetic and real data.
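The moving-average idea can be illustrated with a minimal sketch. It is not the authors' exact algorithm: it assumes the monitored metric lies in [0, 1] (for example, a 0/1 error indicator), uses Hoeffding's inequality as the bound for independent bounded variables, and tests only for an increase in the mean. The class name `HoeffdingDriftDetector`, the helper `first_drift_index`, and the confidence parameter `delta` are hypothetical names introduced for illustration.

```python
import math


class HoeffdingDriftDetector:
    """One-sided drift test on the running mean of values in [0, 1].

    Illustrative sketch only: compares the mean before a remembered
    "cut point" with the mean after it, using a Hoeffding bound.
    """

    def __init__(self, delta=0.001):
        self.delta = delta  # assumed significance level for a drift signal
        self.reset()

    def reset(self):
        self.n = 0               # values seen since the last reset
        self.total = 0.0         # their sum
        self.n_cut = 0           # size of the prefix ending at the cut point
        self.total_cut = 0.0     # sum of that prefix
        self.cut_bound = math.inf

    def _eps(self, n, m=None):
        # Hoeffding deviation for one mean (m is None) or for the
        # difference of two independent means of sizes n and m.
        inv = 1.0 / n if m is None else 1.0 / n + 1.0 / m
        return math.sqrt(inv * math.log(1.0 / self.delta) / 2.0)

    def update(self, x):
        """Add one value in [0, 1]; return True if drift is signalled."""
        self.n += 1
        self.total += x
        bound = self.total / self.n + self._eps(self.n)
        if bound < self.cut_bound:
            # The whole prefix is the best reference seen so far.
            self.n_cut, self.total_cut, self.cut_bound = self.n, self.total, bound
            return False
        m = self.n - self.n_cut
        mean_before = self.total_cut / self.n_cut
        mean_after = (self.total - self.total_cut) / m
        if mean_after - mean_before >= self._eps(self.n_cut, m):
            self.reset()
            return True
        return False


def first_drift_index(stream, delta=0.001):
    """Return the 0-based index of the first drift signal, or None."""
    detector = HoeffdingDriftDetector(delta)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Abrupt change: error rate jumps from 0 to 1 at position 200.
    stream = [0.0] * 200 + [1.0] * 50
    print(first_drift_index(stream))
```

Because the detector stores only a few counters and sums, each update costs constant time and memory, which fits the streaming restrictions mentioned above. A weighted variant for gradual change would replace the plain sums with exponentially weighted ones and would need a correspondingly adjusted bound.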
Abstract: Imbalanced classification deals with learning from data in which the classes contain disproportionate numbers of samples. Traditional classifiers perform poorly on this kind of data because they do not take the imbalanced class distribution into account. Four main kinds of solutions exist for this problem: modifying the data distribution, modifying the learning algorithm to account for the imbalanced representation, assigning costs to data samples, and ensemble methods. In this paper, we adopt the second type of solution and introduce a classification algorithm for imbalanced data that uses fuzzy rough set theory and ordered weighted average aggregation. The proposal considers different strategies for building a weight vector that takes data imbalance into account. Our methods are validated by an extensive experimental study, showing statistically better results than 13 other state-of-the-art methods.
Index Terms: Fuzzy rough sets, imbalanced classification, machine learning, ordered weighted average (OWA).
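Ordered weighted average aggregation itself is simple to state: the values are sorted in descending order and combined with a fixed weight vector that sums to 1. The sketch below shows the standard OWA operator together with one assumed weighting scheme, linearly increasing weights that emphasize the smallest values and so act as a soft minimum. The function names `owa` and `additive_soft_min_weights` are hypothetical, and this scheme is an illustration rather than one of the paper's imbalance-aware strategies.

```python
import math


def owa(values, weights):
    """Ordered weighted average: weights apply to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def additive_soft_min_weights(p):
    """Assumed scheme: w_i = 2i / (p(p+1)) for i = 1..p.

    The largest weight falls on the last (smallest) sorted value,
    so the aggregate behaves like a softened minimum.
    """
    return [2.0 * i / (p * (p + 1)) for i in range(1, p + 1)]


if __name__ == "__main__":
    values = [0.2, 0.9, 0.5]
    print(owa(values, additive_soft_min_weights(3)))  # between min and max
    print(owa(values, [0.0, 0.0, 1.0]))               # reduces to the minimum
    print(owa(values, [1.0, 0.0, 0.0]))               # reduces to the maximum
```

The two degenerate weight vectors recover the strict minimum and maximum, which is why an OWA operator can stand in for them while being less sensitive to a single outlying value.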