Data stream mining has gained growing attentions due to its wide emerging applications such as target marketing, email filtering and network intrusion detection.In this paper, we propose a prototype-based classification model for evolving data streams, called SyncStream, which dynamically models time-changing concepts and makes predictions in a local fashion. Instead of learning a single model on a sliding window or ensemble learning, SyncStream captures evolving concepts by dynamically maintaining a set of prototypes in a new data structure called the P-tree. The prototypes are obtained by error-driven representativeness learning and synchronization-inspired constrained clustering. To identify abrupt concept drift in data streams, PCA and statistics based heuristic approaches are employed. SyncStream has several attractive benefits: (a) It is capable of dynamically modeling evolving concepts from even a small set of prototypes and is robust against noisy examples. (b) Owing to synchronization-based constrained clustering and the PTree, it supports an efficient and effective data representation and maintenance. (c) Gradual and abrupt concept drift can be effectively detected. Empirical results shows that our method achieves good predictive performance compared to state-of-the-art algorithms and that it requires much less time than another instance-based stream mining algorithm.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.