2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS) 2016
DOI: 10.1109/icis.2016.7550920
A bi-directional sampling based on K-means method for imbalance text classification

Cited by 38 publications (18 citation statements). References 12 publications.
“…Cluster sampling methods were also used by [27], which introduced cluster density and boundary density thresholds to determine the clusters and the sampling boundary. The literature [28] used a method called bidirectional sampling based on K-means clustering, which performed well even on very noisy data with few samples. Each of these sampling techniques has its benefits and drawbacks, which are subjective and depend on the context of the application and usage [29].…”
Section: A Sampling-Based Techniques
confidence: 99%
“…This method identifies the regions for oversampling by using the clusters, to avoid over-generalization between the samples. Another clustering-based approach, bi-directional sampling based on the k-means method, is proposed in [41]; it uses a hybrid of both resampling techniques, oversampling and undersampling, with k-means for the imbalanced text classification problem. This method eliminates both the between-class and within-class imbalance problems, while avoiding the generation of noise in the data.…”
Section: Background Study
confidence: 99%
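The hybrid resampling idea described in the excerpt above can be sketched as follows. This is a toy illustration, not the cited paper's exact algorithm: it undersamples the majority class by clustering it with k-means and keeping only the points nearest each centroid (which tends to drop noisy boundary samples), then oversamples the minority class by random duplication. All function names, the number of clusters, and the 2:1 undersampling target are my assumptions.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    # Plain Lloyd's k-means: init centers from random data points,
    # then alternate assignment and centroid update.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def bidirectional_sample(X_maj, X_min, k=3, seed=0):
    # Undersample the majority class: cluster it, then keep from each
    # cluster only the points closest to the centroid.
    rng = np.random.default_rng(seed)
    centers, labels = kmeans(X_maj, k, seed=seed)
    target = 2 * len(X_min)            # assumed reduced majority size
    per_cluster = max(1, target // k)
    keep = []
    for j in range(k):
        members = np.where(labels == j)[0]
        if len(members) == 0:
            continue
        dist = np.linalg.norm(X_maj[members] - centers[j], axis=1)
        keep.extend(members[np.argsort(dist)[:per_cluster]])
    X_maj_down = X_maj[np.array(keep)]
    # Oversample the minority class by random duplication up to the
    # reduced majority size (duplication avoids synthesizing new,
    # possibly noisy, points).
    idx = rng.choice(len(X_min), size=len(X_maj_down), replace=True)
    X_min_up = X_min[idx]
    return X_maj_down, X_min_up
```

Keeping centroid-near majority points while only duplicating (never synthesizing) minority points is one way to read the excerpt's claim that the method avoids generating noise.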
“…Although approaches for overcoming the challenges posed by data imbalance have been proposed in many previous studies, such as [35][36][37][38][39][40][41][42][43][44][45][46], the issue of imbalanced data in machine learning remains unresolved. In some of the primary studies selected in this SLR, such as [47][48][49], resampling techniques have been applied to address this problem. In addition, reweighting has been applied in previous studies, such as [50][51][52], to address the imbalance problem.…”
Section: Class Imbalance
confidence: 99%
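The reweighting alternative mentioned in the excerpt above is commonly implemented as inverse-frequency class weights, so that rare classes contribute as much total loss as common ones. A minimal sketch of that common formulation (the cited studies' exact schemes may differ):

```python
import numpy as np

def inverse_frequency_weights(y):
    # w_c = n / (k * n_c): n samples total, k classes, n_c in class c.
    # Rare classes get proportionally larger weights.
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

y = np.array([0] * 90 + [1] * 10)   # 9:1 imbalanced labels
w = inverse_frequency_weights(y)
# class 0: 100 / (2 * 90) ≈ 0.556; class 1: 100 / (2 * 10) = 5.0
```

These weights can then be passed to a per-sample loss (each sample weighted by its class's weight) instead of resampling the data itself.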
“…However, among these primary studies, there is sufficient evidence that SLR studies on data preprocessing are lacking, as indicated by the fact that only 2% of the primary studies considered in this study followed SLR guidelines. This finding …”
[Flattened quality-assessment table omitted: each primary study, cited by reference number, is scored N/P/Y on four appraisal criteria with totals ranging from 1.0 to 4.0; only [73] scores Y on all four criteria, for 4.0.]
Section: What Are the Limitations of Current Research?
confidence: 99%