Many real‐world data mining applications have to deal with unlabeled streaming data. They are unlabeled because the sheer volume of the stream makes it impractical to label a significant portion of the data. The data streams can evolve over time and these changes are called concept drifts. Concept drifts have different characteristics, which can be used to categorize them into different types. A trade‐off between performance and cost exists among many concept drift detection approaches. On the one hand, high accuracy detection approach usually requires labeled data, possibly involving high cost for labeling. On the other hand, a variety of methods have been devoted to the topic of concept drift detection with unlabeled data, but these approaches often are most suited for only a subset of the concept drift types. The objective of this survey is to present these methods, categorize them and give recommendations of usage based on their behaviors under different types of concept drift.
This article is categorized under:
Fundamental Concepts of Data and Knowledge > Data Concepts
Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining
Explainable AI > Classification
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.