In this research we present a novel approach to the concept change detection problem. Change detection is a fundamental issue in data stream mining, as the classification models generated need to be updated when significant changes in the underlying data distribution occur. A number of change detection approaches have been proposed, but they all suffer from limitations with respect to one or more key performance factors, such as high computational complexity, poor sensitivity to gradual change, or the opposite problem of a high false positive rate. Our approach uses reservoir sampling to build a sequential change detection model that offers statistically sound guarantees on false positive and false negative rates but has much lower computational complexity than the ADWIN concept drift detector. Extensive experimentation on a wide variety of datasets reveals that the scheme also has a lower false detection rate than ADWIN while maintaining a competitive true detection rate.
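The abstract does not say how the reservoir feeds the detector. As background only, the sketch below shows the classic reservoir sampling technique (Algorithm R), which keeps a fixed-size uniform random sample of an unbounded stream with O(1) work per item. The class and method names are hypothetical, and this is not the authors' change detection model.

```python
import random
from typing import List


class Reservoir:
    """Hypothetical helper: a uniform random sample of fixed capacity over a stream (Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[float] = []
        self.seen = 0  # number of stream items observed so far
        self._rng = random.Random(seed)

    def add(self, x: float) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            # Fill the reservoir with the first `capacity` items.
            self.items.append(x)
        else:
            # Keep the new item with probability capacity / seen,
            # overwriting a uniformly chosen slot.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = x

    def mean(self) -> float:
        """Mean of the sampled values (assumes at least one item was added)."""
        return sum(self.items) / len(self.items)
```

Because every item seen so far is in the reservoir with the same probability, the reservoir mean is an unbiased estimate of the mean of the whole stream to date. That property makes such a sample a natural reference against which the mean of recent data could be compared.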
In this research we address the problem of capturing recurring concepts in a data stream environment. Recurrence capture enables the re-use of previously learned classifiers without the need for re-learning, while providing better accuracy during the concept recurrence interval. We capture concepts by applying the Discrete Fourier Transform (DFT) to Decision Tree classifiers to obtain highly compressed versions of the trees at concept drift points in the stream, and we store these trees in a repository for future use. Our empirical results on real-world and synthetic data exhibiting varying degrees of recurrence show that the Fourier-compressed trees are more robust to noise and capture recurring concepts with higher precision than a meta-learning approach that re-uses classifiers in their originally occurring form. Apart from its use in meta-learning, the DFT has a number of other properties that make it attractive for mining high-speed data streams. These include the ability to classify directly from the generated spectra, which eliminates the need for expensive traversal of a tree structure. Our experimental results in section 5 show the accuracy, processing speed and memory advantages of applying the DFT over the meta-learning approach proposed by Gama and Kosina in [4].
The rest of the paper is organized as follows. In section 2 we review work on capturing recurrences. We describe the basics of applying the DFT to decision trees in section 3. In section 4 we discuss a novel approach to optimizing the computation of the Fourier spectrum from a Decision Tree. Our experimental results are presented in section 5, and section 6 concludes the paper and discusses some directions for future research.
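To make "classify directly from the spectra" concrete, the sketch below uses one standard definition of the Fourier (Walsh) spectrum of a classifier f over d Boolean features: w_j = 2^-d Σ_x f(x) ψ_j(x), with basis functions ψ_j(x) = (-1)^(j·x), where j and x are bit vectors encoded as integers. Because f(x) = Σ_j w_j ψ_j(x), a label can be recovered from the stored coefficients without walking the tree. This is a minimal sketch under stated assumptions: the tree encoding and all function names are hypothetical, labels are assumed to be 0 or 1, and the brute-force enumeration costs O(4^d), so it is for illustration only and does not reproduce the optimized computation described in section 4.

```python
from typing import Dict, Tuple, Union

# Hypothetical tree encoding: a leaf is a class label (0 or 1); an internal node is
# (feature_index, subtree_if_feature_is_0, subtree_if_feature_is_1).
Tree = Union[int, Tuple[int, "Tree", "Tree"]]


def evaluate(tree: Tree, x: int) -> int:
    """Classify input x, whose bit i holds Boolean feature i, by traversing the tree."""
    while not isinstance(tree, int):
        feature, left, right = tree
        tree = right if (x >> feature) & 1 else left
    return tree


def parity_sign(j: int, x: int) -> int:
    """Fourier basis function psi_j(x) = (-1)^(j . x) over Boolean vectors."""
    return -1 if bin(j & x).count("1") % 2 else 1


def fourier_spectrum(tree: Tree, d: int, tol: float = 1e-12) -> Dict[int, float]:
    """Brute-force spectrum w_j = 2^-d * sum_x f(x) psi_j(x); keeps only nonzero w_j."""
    n = 1 << d
    outputs = [evaluate(tree, x) for x in range(n)]
    spectrum: Dict[int, float] = {}
    for j in range(n):
        w = sum(f * parity_sign(j, x) for x, f in enumerate(outputs)) / n
        if abs(w) > tol:
            spectrum[j] = w
    return spectrum


def classify_from_spectrum(spectrum: Dict[int, float], x: int) -> int:
    """Reconstruct f(x) = sum_j w_j psi_j(x) and round to the nearest class label."""
    value = sum(w * parity_sign(j, x) for j, w in spectrum.items())
    return int(round(value))
```

For example, the tree `(0, (1, 0, 1), 1)`, which computes x0 OR x1 over three features, has only four nonzero coefficients, none of them involving the unused feature x2, and `classify_from_spectrum` reproduces the tree's label on every input. More generally, since each leaf of a depth-k tree is a product of at most k single-feature indicators, its coefficients of order greater than k are zero, which is one reason the spectrum of a shallow tree is compact.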
Abstract. In this research we present a novel approach to the concept change detection problem. Change detection is a fundamental issue in data stream mining, as the models generated need to be updated when significant changes in the underlying data distribution occur. A number of change detection approaches have been proposed, but they all suffer from limitations such as high computational complexity, poor sensitivity to gradual change, or the opposite problem of a high false positive rate. Our approach, termed OnePassSampler, has low computational complexity because it processes data sequentially and so avoids multiple scans over its memory buffer. Extensive experimentation on a wide variety of datasets reveals that OnePassSampler has a lower false detection rate and lower computational overhead than ADWIN2 while maintaining a competitive true detection rate.
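The abstract describes OnePassSampler only at the level of avoiding repeated scans of its buffer. The hypothetical sketch below illustrates that one-pass idea with a generic technique chosen for illustration, not the authors' algorithm. Values in [0, 1] (for example, per-instance classification errors) are consumed once, and only running sums are kept. At each block boundary, the block mean is compared with the reference mean using the two-sample Hoeffding bound ε = sqrt((1/n1 + 1/n2) · ln(2/δ) / 2). The block size, δ, the reset rule and all names are assumptions.

```python
import math


def hoeffding_threshold(n1: int, n2: int, delta: float) -> float:
    """Two-sample Hoeffding bound on the difference of means of values in [0, 1]."""
    return math.sqrt((1.0 / n1 + 1.0 / n2) * math.log(2.0 / delta) / 2.0)


class OnePassMeanDetector:
    """Hypothetical one-pass detector: each value is touched once; only running sums are kept."""

    def __init__(self, block_size: int = 100, delta: float = 0.01) -> None:
        self.block_size = block_size
        self.delta = delta
        self.ref_sum, self.ref_n = 0.0, 0  # reference data since the last detected change
        self.blk_sum, self.blk_n = 0.0, 0  # current block

    def add(self, x: float) -> bool:
        """Consume one value in [0, 1]; return True if a change is flagged at a block boundary."""
        self.blk_sum += x
        self.blk_n += 1
        if self.blk_n < self.block_size:
            return False
        changed = False
        if self.ref_n > 0:
            gap = abs(self.ref_sum / self.ref_n - self.blk_sum / self.blk_n)
            changed = gap > hoeffding_threshold(self.ref_n, self.blk_n, self.delta)
        if changed:
            # Restart the reference from the new block after a detected change.
            self.ref_sum, self.ref_n = self.blk_sum, self.blk_n
        else:
            # Otherwise merge the block into the reference in O(1), with no rescan.
            self.ref_sum += self.blk_sum
            self.ref_n += self.blk_n
        self.blk_sum, self.blk_n = 0.0, 0
        return changed
```

Merging a block into the reference is a constant-time update of two sums, so the cost per value stays O(1) regardless of how much data the reference summarizes. The price of this simplicity is that only one split point (the latest block boundary) is tested, rather than many candidate split points of a stored window.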
In this research, we apply ensembles of Fourier-encoded spectra to capture and mine recurring concepts in a data stream environment. Previous research showed that applying the Discrete Fourier Transform to Decision Trees yields compact versions of the trees that accurately capture recurrent concepts in a data stream. However, in highly volatile environments where new concepts emerge often, encoding each concept in a separate spectrum is no longer viable because of memory overload. In this research we therefore present an ensemble approach that addresses this problem. Our empirical results on real-world and synthetic data exhibiting varying degrees of recurrence reveal that the ensemble approach outperforms the single-spectrum approach in terms of classification accuracy, memory and execution time.
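The abstract does not specify how ensemble members are combined. One property that makes spectra convenient to ensemble is that the Fourier transform is linear. The spectrum of a weighted average of real-valued classifier outputs equals the same weighted average of their spectra, so a weighted vote over several spectra can be stored as a single sparse spectrum. The sketch below is a minimal illustration under stated assumptions: Boolean features encoded as integer bit vectors, labels in {0, 1}, a sparse dict encoding of each spectrum, and a 0.5 voting threshold. All names are hypothetical, and this is not the authors' ensemble scheme.

```python
from collections import defaultdict
from typing import Dict, List

Spectrum = Dict[int, float]  # hypothetical sparse encoding: partition bitmask -> coefficient


def parity_sign(j: int, x: int) -> int:
    """Fourier basis function (-1)^(j . x) over Boolean feature vectors encoded as ints."""
    return -1 if bin(j & x).count("1") % 2 else 1


def evaluate_spectrum(spectrum: Spectrum, x: int) -> float:
    """Real-valued output f(x) = sum_j w_j psi_j(x)."""
    return sum(w * parity_sign(j, x) for j, w in spectrum.items())


def combine(spectra: List[Spectrum], weights: List[float]) -> Spectrum:
    """By linearity, the normalized weighted sum of spectra is the spectrum of the weighted vote."""
    total = sum(weights)
    merged: Dict[int, float] = defaultdict(float)
    for spectrum, weight in zip(spectra, weights):
        for j, w in spectrum.items():
            merged[j] += weight * w / total
    # Drop coefficients that cancel out to (numerically) zero.
    return {j: w for j, w in merged.items() if abs(w) > 1e-12}


def classify(spectrum: Spectrum, x: int) -> int:
    """Threshold the weighted vote at 0.5 for labels {0, 1}."""
    return 1 if evaluate_spectrum(spectrum, x) >= 0.5 else 0
```

For instance, combining with equal weights the spectra of x0, of x1 and of x0 OR x1 (2 + 2 + 4 = 8 stored coefficients) gives one spectrum with 4 coefficients whose thresholded output is the members' majority vote. Shared partitions are stored once, which is where the memory saving in this illustration comes from.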
Summary. In this research, we address the problem of capturing recurring concepts in a data stream environment. Recurrence capture enables the reuse of previously learned classifiers without the need for relearning, while providing better accuracy during the concept recurrence interval. We capture concepts by applying the discrete Fourier transform to decision tree classifiers to obtain highly compressed versions of the trees at concept drift points in the stream, and we store these trees in a repository for future use. In addition, we study how the choice of drift detector affects the stability of performance when capturing recurring concepts, using two drift detectors: ADWIN and SeqDrift2. Our empirical results on real-world and synthetic data exhibiting varying degrees of recurrence show that the Fourier-compressed trees are more robust to noise and capture recurring concepts with higher precision than a meta-learning approach that reuses classifiers in their originally occurring form. We also conduct a case study on a flight dataset that closely matches the target environment, a time-critical system in which concepts recur in similar form, and evaluate the benefits of applying the discrete Fourier transform.
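As a hedged illustration of storing compressed trees for later reuse, the sketch below does two things. First, for each spectrum it keeps only the largest-magnitude coefficients needed to retain a chosen fraction of the energy Σ w_j² (by Parseval's identity, this energy equals the average of f(x)² over all inputs). Second, at a drift point it returns the stored spectrum that is most accurate on a recent labelled window, provided that spectrum clears a minimum accuracy. The energy fraction, the accuracy threshold, the selection rule and all names are assumptions. The abstract does not describe the selection mechanism, and this is not the authors' method.

```python
from typing import Dict, List, Optional, Tuple

Spectrum = Dict[int, float]  # hypothetical sparse encoding: partition bitmask -> coefficient


def parity_sign(j: int, x: int) -> int:
    """Fourier basis function (-1)^(j . x) over Boolean feature vectors encoded as ints."""
    return -1 if bin(j & x).count("1") % 2 else 1


def predict(spectrum: Spectrum, x: int) -> int:
    """Evaluate f(x) = sum_j w_j psi_j(x) and threshold at 0.5 for labels {0, 1}."""
    value = sum(w * parity_sign(j, x) for j, w in spectrum.items())
    return 1 if value >= 0.5 else 0


def compress(spectrum: Spectrum, energy: float = 0.95) -> Spectrum:
    """Keep the largest-magnitude coefficients until `energy` of sum(w^2) is retained."""
    total = sum(w * w for w in spectrum.values())
    kept: Spectrum = {}
    retained = 0.0
    for j, w in sorted(spectrum.items(), key=lambda item: -abs(item[1])):
        if retained >= energy * total:
            break
        kept[j] = w
        retained += w * w
    return kept


class SpectrumRepository:
    """Hypothetical store of compressed spectra captured at drift points."""

    def __init__(self, min_accuracy: float = 0.8) -> None:
        self.min_accuracy = min_accuracy
        self.spectra: List[Spectrum] = []

    def store(self, spectrum: Spectrum, energy: float = 0.95) -> None:
        self.spectra.append(compress(spectrum, energy))

    def best_match(self, window: List[Tuple[int, int]]) -> Optional[Spectrum]:
        """Return the stored spectrum most accurate on recent (x, y) pairs, or None."""
        best: Optional[Spectrum] = None
        best_accuracy = self.min_accuracy
        for spectrum in self.spectra:
            accuracy = sum(predict(spectrum, x) == y for x, y in window) / len(window)
            if accuracy >= best_accuracy:
                best, best_accuracy = spectrum, accuracy
        return best
```

Dropping coefficients trades accuracy for memory. In the extreme case where only w_0 survives, the compressed spectrum predicts a single constant class. The energy fraction is therefore the knob that controls how aggressively stored concepts are compressed.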