2022
DOI: 10.48550/arxiv.2204.03719
Preprint

A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework

Abstract: Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures on how to evaluate these algorithms. This work presents a taxonomy of algorithms for imbalanced data streams and proposes a standardized, exhaustive, and informative experimental testbed to evaluate algorithms in a collect…

Cited by 5 publications (8 citation statements)
References 95 publications (138 reference statements)
“…The combined problem of class imbalance and concept drift remains an open challenge [21,3]. There are two types of approaches, algorithm-level and data-level.…”
Section: Class Imbalance | mentioning
confidence: 99%
“…Cost-sensitive learning, one-class classification, and anomaly detection are other types of methods that can handle imbalance. We direct the interested reader towards these excellent surveys [21,3].…”
Section: Class Imbalance | mentioning
confidence: 99%
“…The proposed method can distinguish between normal and anomalous images with 98.52% accuracy. Recently, training/learning high-quality AI models from imbalanced data in real-time applications has become a popular research topic [333]. Vu et al [334] developed a novel collaborative data model for semi-fully distributed settings for real-time medical applications.…”
Section: B Potential Opportunities For Future Research In Privacy Domain | mentioning
confidence: 99%
“…Also all experiments were conducted in MOA command line mode similarly to ones described in [4]. Recall that our study aims at detailed investigating the impact of selected factors on the classification of multi-class imbalanced stream, not at comparing many different classifiers such as [1], so it is sufficient to select few representative classifiers only.…”
Section: An Experimental Setup | mentioning
confidence: 99%
“…However in evolving data streams they could also influence the changes in local class distributions and other local drifts. Nevertheless, the conjunctions of these data factors and drifts have not been sufficiently studied yet, see discussions in [1,6,4].…”
Section: Introduction | mentioning
confidence: 99%