2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) 2018
DOI: 10.1109/icmla.2018.00125

An Empirical Study on Class Rarity in Big Data

Cited by 32 publications (18 citation statements)
References 29 publications
“…Finally, in [5], the impact of class rarity on big data is evaluated. The researchers use publicly available Medicare data and map known fraudulent providers, from the List of Excluded Individuals/Entities (LEIE) [23], as labels for the positive class.…”
Section: Related Work
mentioning
confidence: 99%
“…Various degrees of class imbalance exist, ranging from slightly imbalanced to rarity. Class rarity in a dataset is defined by comparatively inconsequential numbers of positive instances [5], e.g., the occurrence of 10 fraudulent transactions out of 1,000,000 total transactions generated daily for a bank. Binary classification is usually associated with class imbalance since many multi-class classification problems can be managed by breaking down the data into multiple binary classification tasks.…”
mentioning
confidence: 99%
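To put the rarity example quoted above in numbers, the following is a minimal sketch that computes the positive-class prevalence and the negative-to-positive ratio for 10 fraudulent transactions out of 1,000,000 total. The function name `class_balance` is hypothetical and not taken from any of the cited papers.

```python
from typing import Tuple


def class_balance(n_positive: int, n_total: int) -> Tuple[float, float]:
    """Return (positive-class prevalence, number of negatives per positive).

    Hypothetical helper for illustration only.
    """
    if not 0 < n_positive <= n_total:
        raise ValueError("need 0 < n_positive <= n_total")
    n_negative = n_total - n_positive
    return n_positive / n_total, n_negative / n_positive


# Figures from the quoted example: 10 fraudulent out of 1,000,000 total.
prevalence, neg_per_pos = class_balance(10, 1_000_000)
print(f"positive-class prevalence: {prevalence:.4%}")  # 0.0010%
print(f"negatives per positive:    {neg_per_pos:,.0f}:1")  # 99,999:1
```

At this prevalence, a classifier that labels every instance negative would be correct on 99.999% of instances while detecting no fraud at all, which is why overall accuracy says little under class rarity.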
“…For both RF and GBT, which share several similar parameter settings, the number of trees generated in the training process was set to 100 [6,51]. The Cache Node Ids was set to True and the maximum memory in megabytes (MB) was set to 1024 for speeding up the tree-building process.…”
Section: Classifiers
mentioning
confidence: 99%
“…There are various degrees of class imbalance, ranging from slightly imbalanced to rarity. Rarity in a dataset involves comparatively inconsequential numbers of positive instances [6], e.g., the occurrence of 40 fraudulent transactions within an insurance claims dataset of 1,000,000 normal transactions. Binary classification is frequently utilized to focus on class imbalance because many non-binary (i.e., multi-class) classification problems can be addressed by transforming the given data into multiple binary classification tasks.…”
mentioning
confidence: 99%
“…The spectrum of class imbalance ranges from "slightly imbalanced" to "rarity." Dataset rarity is associated with insignificant numbers of positive instances [4], e.g., the occurrence of 25 fraudulent transactions among 1,000,000 normal transactions within a financial security dataset of a reputable bank. Since many multi-class problems can be simplified by binary classification, data scientists frequently take the binary approach for analytics [5].…”
mentioning
confidence: 99%
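Several of the statements above note that multi-class problems can be handled as multiple binary classification tasks. The quoted passages do not say which decomposition is meant, so the sketch below assumes a one-vs-rest scheme. The function name `one_vs_rest_labels` and the sample labels are hypothetical.

```python
from typing import Dict, List, Sequence


def one_vs_rest_labels(labels: Sequence[str]) -> Dict[str, List[int]]:
    """Build one binary label vector per class (assumed one-vs-rest scheme).

    For each class c, an instance is labeled 1 if its class is c and 0 otherwise.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Hypothetical three-class labels, for illustration only.
labels = ["normal", "fraud", "normal", "error", "normal"]
binary_tasks = one_vs_rest_labels(labels)

for positive_class, targets in binary_tasks.items():
    print(positive_class, targets)
```

Each resulting binary task can be more imbalanced than the original multi-class problem, because every other class is merged into the negative class.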