Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2012
DOI: 10.1145/2339530.2339558

Learning in non-stationary environments with class imbalance

Abstract: Learning in non-stationary environments is an increasingly important problem in a wide variety of real-world applications. In non-stationary environments, data arrives incrementally, but the underlying generating function may change over time. In addition to being non-stationary, these environments often exhibit class imbalance; that is, one class (the majority class) vastly outnumbers the other (the minority class). This combination of class imbalance with non-stationary environments poses sig…

Cited by 26 publications (30 citation statements)
References 23 publications
“…Some researchers chose to calculate AUC using entire streams [13,36], while others used periodical holdout sets [28,44]. Nevertheless, it was noticed that periodical holdout sets may not fully capture the temporal dimension of the data [33], whereas evaluation using entire streams is neither feasible for large datasets nor suitable for drift detection.…”
Section: Area Under the ROC Curve
Mentioning, confidence: 99%
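To make the contrast in this statement concrete, the following is a minimal, hypothetical sketch (in Python) of the periodic-holdout protocol: the model learns from the stream one example at a time, and a fixed held-out set is scored every `eval_every` examples, which is why the holdout set cannot reflect the temporal dimension of an evolving stream. The learner interface (`learn_one` / `score_one`) and the use of scikit-learn's `roc_auc_score` are assumptions for illustration, not details from the cited papers.

```python
from sklearn.metrics import roc_auc_score  # assumes scikit-learn is available


def periodic_holdout_auc(stream, model, holdout, eval_every=1000):
    """Sketch of periodic-holdout evaluation on a data stream.

    `stream` yields (x, y) pairs, `holdout` is a fixed list of (x, y) pairs,
    and `model` is a hypothetical incremental learner exposing
    learn_one(x, y) and score_one(x) -> positive-class score.
    """
    holdout_X = [x for x, _ in holdout]
    holdout_y = [y for _, y in holdout]
    auc_over_time = []
    for i, (x, y) in enumerate(stream, start=1):
        model.learn_one(x, y)  # incremental update on the latest example
        if i % eval_every == 0:
            # Score the same fixed holdout set at every evaluation point.
            scores = [model.score_one(hx) for hx in holdout_X]
            auc_over_time.append(roc_auc_score(holdout_y, scores))
    return auc_over_time
```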
“…This way, we maintain a structure that facilitates the calculation of AUC and ensures that the oldest score in the sliding window will be promptly found in the red-black tree. After the sliding window and tree have been updated, AUC is calculated by summing the number of positive examples occurring before each negative example (lines 18-28) and normalizing that value by all possible pairs pn (line 29), where p is the number of positives and n is the number of negatives in the window. This method of calculating AUC, proposed in [48], is equivalent to summing the area of trapezoids for each pair of sequential points on the ROC curve, but more suitable for our purposes, as it requires very little computation given a sorted collection of scores.…”
Section: Prequential AUC
Mentioning, confidence: 99%
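As an illustration of the pair-counting formulation described in this passage, here is a minimal, non-incremental sketch in Python: scores are sorted in descending order and, for each negative example, the number of positives ranked above it is accumulated and normalized by p·n. The function name `window_auc` is an assumption for illustration, and the incremental red-black-tree maintenance and tie handling of the cited approach are omitted.

```python
from typing import Iterable, Tuple


def window_auc(scored_examples: Iterable[Tuple[float, bool]]) -> float:
    """AUC over a window of (score, is_positive) pairs via pair counting:
    count the positive-negative pairs ranked correctly, divide by p * n."""
    # Sort by score, highest first (the cited approach instead keeps scores
    # ordered incrementally in a red-black tree).
    ordered = sorted(scored_examples, key=lambda sl: sl[0], reverse=True)

    positives_seen = 0       # positives encountered so far in the ordering
    correctly_ordered = 0.0  # positive-negative pairs ranked correctly
    p = n = 0

    for _, is_positive in ordered:
        if is_positive:
            positives_seen += 1
            p += 1
        else:
            correctly_ordered += positives_seen
            n += 1

    if p == 0 or n == 0:
        return 0.0  # AUC is undefined without both classes; 0.0 by convention here
    return correctly_ordered / (p * n)


# Example: three positives scored above two negatives gives AUC = 1.0
print(window_auc([(0.9, True), (0.8, True), (0.7, True), (0.4, False), (0.2, False)]))
```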