2018
DOI: 10.1016/j.patcog.2018.03.008
Handling data irregularities in classification: Foundations, trends, and future challenges

Cited by 168 publications (70 citation statements) · References 108 publications
“…However, it should be mentioned that these theoretical advantages were built upon regularity assumptions on the data, such as independent and identically distributed samples. When faced with data irregularities, such as class imbalance, small disjuncts, and class distribution skew [10], the theoretical advantages may no longer hold and the algorithm itself should be modified. We refer the reader to the review [10] on modifying classifiers for irregular data.…”
Section: Conclusion and Discussion
confidence: 99%
“…When faced with data irregularities, such as class imbalance, small disjuncts, and class distribution skew [10], the theoretical advantages may no longer hold and the algorithm itself should be modified. We refer the reader to the review [10] on modifying classifiers for irregular data. We will keep working on the study of using RBoosting to generate classifiers for irregular data and report our progress in a future publication.…”
Section: Conclusion and Discussion
confidence: 99%
“…Most of the learning and classification methods used in building such ID models are based on a number of key assumptions [2,3], such as: (i) the equal representation of classes, (ii) the equal representation of sub-concepts within a specific class, (iii) similar class-conditional distributions across all classes, and (iv) the pre-definition and knowledge of all attribute values for all records in the dataset. Due to traffic evolution, most, if not all, of these assumptions are violated in real environments, as new traffic will start to exhibit statistical properties different from those of the training data.…”
Section: Problem Statement
confidence: 99%
“…The class that is under-represented, with fewer instances than the others because of rare events, abnormal patterns, unusual behaviours, or interruptions during data gathering, is known as the minority, while the remaining class or classes with an abundant number of instances are named the majority [3]. Figure 1 maps the types of imbalanced data [4], frequently suggested solutions in the literature [5], assessment metrics to evaluate the effectiveness of these solutions [6], and widespread real-world applications of imbalanced data [3].…”
Section: Introduction
confidence: 99%
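The minority/majority distinction described in the excerpt above reduces to comparing class frequencies in the label vector. A minimal sketch (the function name and the "normal"/"attack" example labels are illustrative, not from the cited paper):

```python
from collections import Counter

def split_minority_majority(labels):
    """Partition class labels into the minority class (fewest
    instances) and the remaining majority class(es), returning
    the per-class counts as well."""
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority = [c for c in counts if c != minority]
    return minority, majority, counts

# Example: a skewed binary label set, 95 "normal" vs. 5 "attack"
labels = ["normal"] * 95 + ["attack"] * 5
minority, majority, counts = split_minority_majority(labels)
# minority == "attack"; majority == ["normal"]
```

With ties in class counts, `min` picks one class arbitrarily; real datasets with several rare classes would need a threshold-based definition instead.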
“…For instance, detecting an attack is more important than detecting normal traffic, and diagnosing a disease is more critical than confirming health. The class imbalance problem is typically handled in three ways: under-/oversampling, modifying the algorithm, and reducing misclassification cost [5]. However, these approaches have several limitations, such as working well only on small data, incurring more computing and storage cost because of algorithm complexity, being slow by the algorithm's nature, handling either binary-class or multi-class problems but not both, and requiring predefined threshold values.…”
Section: Introduction
confidence: 99%