2022
DOI: 10.21108/ijoict.v8i1.622

Overcoming Data Imbalance Problems in Sexual Harassment Classification with SMOTE

Abstract: Delivery of justice with the help of artificial intelligence is a current research interest. Machine learning with natural language processing (NLP) can classify sexual harassment experiences into quid pro quo (QPQ) and hostile work environment (HWE) types. However, datasets for sexual harassment classification are often imbalanced across classes. Data imbalance can degrade a classifier's performance because the classifier tends to favor the majority class. This study pro…

Cited by 12 publications (2 citation statements)
References 27 publications (29 reference statements)
“…where $x_{1i}$ is the first variable with data item $i$, $m$ is the dataset size, $\bar{x}_1$ is the average of the first variable, $x_{2i}$ is the second variable with data item $i$, and $\bar{x}_2$ is the average of the second variable. Furthermore, we apply random oversampling [14]. The application of random oversampling is for imbalanced data.…”
Section: A Chicken Egg Harvesting Data and Pre-processing (mentioning)
confidence: 99%
“…An imbalanced dataset is when, in a dataset, the number of one label (majority label) is far greater than the other (minority labels) [30]. Imbalanced datasets can affect the performance of machine learning models, then the validity of a measurement metric [31].…”
Section: ROC Threshold Selection (mentioning)
confidence: 99%
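
The statements above note that imbalance can undermine both model performance and the validity of a metric, and that random oversampling is one remedy. A minimal sketch of both points follows, using only the Python standard library. The class names HWE and QPQ come from the abstract; the 95/5 split, the helper names, and the always-majority "classifier" are illustrative assumptions, not data or code from any of the papers. Note that this shows plain random oversampling (duplicating minority items), not SMOTE, which instead synthesizes new minority samples by interpolation.

```python
import random
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of true `positive` items that were predicted as `positive`."""
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return hits / total


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen items of each smaller class until
    every class has as many items as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy data: 95 majority (HWE) and 5 minority (QPQ) items.
labels = ["HWE"] * 95 + ["QPQ"] * 5
samples = list(range(100))

# A degenerate classifier that always predicts the majority class.
always_majority = ["HWE"] * len(labels)
acc = accuracy(labels, always_majority)       # high, despite learning nothing
rec = recall(labels, always_majority, "QPQ")  # no minority item is found

balanced_samples, balanced_labels = random_oversample(samples, labels)
balanced_counts = Counter(balanced_labels)
```

On this toy split the always-majority predictor reaches high accuracy while its minority-class recall is zero, which is why accuracy alone can be a misleading metric on imbalanced data. In practice, oversampling should be applied to the training split only, so duplicated items do not leak into evaluation.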