Incremental and online learning algorithms are increasingly relevant in data mining because of the growing need to process data streams. In this context, the target function may change over time, an inherent problem of online learning known as concept drift. To handle concept drift regardless of the learning model, we propose new methods that monitor the performance metrics measured during the learning process and trigger drift signals when a significant variation has been detected. To monitor this performance, we apply probability inequalities that assume only independent, univariate and bounded random variables, obtaining theoretical guarantees for the detection of such distributional changes. Common restrictions of online change detection, as well as the relevant types of change (abrupt and gradual), are considered. Two main approaches are proposed: the first involves moving averages and is more suitable for detecting abrupt changes; the second follows a widespread intuitive idea for dealing with gradual changes, using weighted moving averages. The simplicity of the proposed methods, together with their computational efficiency, makes them very advantageous. We use a Naïve Bayes classifier and a Perceptron to evaluate the performance of the methods on synthetic and real data.
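The first approach can be illustrated with a minimal sketch. The abstract does not specify the exact decision rule, so the following makes illustrative assumptions: errors are bounded in [0, 1], a fixed-size moving window (`window`) tracks recent performance, and drift is signalled when the recent moving average exceeds the best historical mean by the sum of two Hoeffding confidence radii at confidence `delta`. All names and parameters here are hypothetical, not taken from the paper.

```python
import math
from collections import deque

def hoeffding_bound(n, delta):
    # Radius eps such that P(|empirical mean - true mean| > eps) <= delta
    # for n independent samples bounded in [0, 1] (Hoeffding's inequality).
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))

class MovingAverageDriftDetector:
    """Illustrative drift detector over a stream of bounded errors.

    Assumed rule (not from the paper): signal drift when the moving
    average of the last `window` errors exceeds the lowest running mean
    seen so far by the sum of the two Hoeffding radii.
    """
    def __init__(self, window=100, delta=0.05):
        self.window = window
        self.delta = delta
        self.recent = deque(maxlen=window)
        self.n_total = 0
        self.sum_total = 0.0
        self.best_mean = float("inf")  # lowest running error mean so far
        self.best_n = 0

    def update(self, error):
        """error in [0, 1], e.g. 1.0 if the classifier misclassified."""
        self.recent.append(error)
        self.n_total += 1
        self.sum_total += error
        mean_all = self.sum_total / self.n_total
        if mean_all <= self.best_mean:
            self.best_mean, self.best_n = mean_all, self.n_total
        if len(self.recent) < self.window:
            return False
        recent_mean = sum(self.recent) / len(self.recent)
        eps = (hoeffding_bound(len(self.recent), self.delta)
               + hoeffding_bound(self.best_n, self.delta))
        return recent_mean - self.best_mean > eps  # drift signal
```

The second approach described in the abstract would replace the plain moving average with a weighted one, giving more weight to recent errors so that slow, gradual changes accumulate evidence over a longer horizon.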
Studying the temporal evolution of a parameter in a system is essential to many applications. In these applications, if the value of the parameter at every moment can be represented by a symbol, then the evolution of the system over time can be represented by a sequence of symbols. Statistical modeling of these complex sequences is a fundamental goal of machine learning owing to its wide variety of natural applications, for example statistical models of biological sequences such as DNA (Krogh, Mian, and Haussler 1993).

In many applications it is impossible to determine exactly the next symbol given the previous symbols in the sequence. Thus, learning models try to compute the probability of each symbol appearing next given the preceding sequence of symbols. The set of these probabilities is called the probability distribution of the next symbol. Such statistical modeling tries to predict the next symbol in the sequence given the preceding subsequence of symbols; for example, given a sequence of musical notes, a statistical model will try to predict the next pitch in a melody from the preceding subsequence of pitches.

If we consider the empirical probability distribution of the next symbol given a preceding subsequence of some given length, then there often exists a length L (the memory length) such that the conditional probability distribution does not change substantially if we condition it on preceding subsequences of length greater than L. This property can be found in many applications related to natural language processing, such as speech recognition (Jelinek 1990; Nadas 1984) and speech tagging (Brill 1994; Merialdo 1994). Markov chains (Shannon 1951) model this statistical property and have been applied to model data sequences. The last L symbols of a sequence are called the state of the Markov chain.
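The empirical next-symbol distribution described above can be sketched in a few lines: count every occurrence of each length-L context together with the symbol that follows it, then normalise the counts. The function name and signature are illustrative, not from the source.

```python
from collections import Counter, defaultdict

def next_symbol_distribution(sequence, L):
    """Empirical distribution of the next symbol given each length-L context.

    Sketch: for every position, record the length-L subsequence (the
    context) and the symbol that follows it, then turn counts into
    relative frequencies.
    """
    counts = defaultdict(Counter)
    for i in range(len(sequence) - L):
        context = tuple(sequence[i:i + L])
        counts[context][sequence[i + L]] += 1
    return {ctx: {sym: c / sum(ctr.values()) for sym, c in ctr.items()}
            for ctx, ctr in counts.items()}
```

For instance, with `next_symbol_distribution("abracadabra", 1)` the context `("a",)` is followed by `b` twice and by `c` and `d` once each, giving probabilities 0.5, 0.25 and 0.25.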
If the parameter of the system can take m different values, and the last L symbols are considered to determine the state of the system, then the system will have m^L different states. To train a Markov chain, the next-symbol probability distribution must be computed. This task is usually carried out by computing the relative frequency of each symbol given the preceding subsequence of L symbols in a training sample. The number of conditional probabilities that must be computed is m^L · m = m^(L+1). This number grows exponentially with the order L, so only lower-order Markov chains can be considered in practical applications.

An improved model, Probabilistic Suffix Automata (PSA), was developed by Dana Ron (Ron 1996; Ron, Singer, and Tishby 1996). A PSA is a variable-order Markov chain, meaning that the order, or equivalently the memory length, is variable. Unlike Markov chains, this model does not grow exponentially with its order, and hence higher-order models can be considered. Moreover, it produces more intuitive descriptions of practical problems. A simplified version of the PSA model based on the Lempel-Ziv algorithm (Weinber...