Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaga 2019
DOI: 10.18653/v1/d19-5018

Cost-Sensitive BERT for Generalisable Sentence Classification on Imbalanced Data

Abstract: The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed. That this task can be addressed effectively using BERT, a powerful new architecture which can be finetuned for text classification tasks, is not surprising. However, propaganda detection, like other tasks that deal with news documents and other forms of decontextualized social communication (e.g. sentiment analysis), inherently deals with data who…

Cited by 67 publications (52 citation statements)
References 21 publications (16 reference statements)
“…Hence, improved results may be obtained experimentally by using a hyper-parameter search ( 34 ). Furthermore, in addition to cost-sensitive learning, a variety of methods, such as oversampling and data augmentation, are available to address imbalance problems ( 23 , 35 ). Different imbalance strategies may lead to diverse conclusions.…”
Section: Discussion (mentioning)
confidence: 99%
“…S [11] proposes a hybrid imbalanced data learning framework (HIDLF) to deal with the imbalance of views in the movie review dataset, and then classifies the movie reviews by the proposed HIDLT-SVM algorithm. Harish [12] proposed the BERT model to deal with the problem of data imbalance in text classification. Li [13] proposes a solution to the imbalanced text problem in a multi-classification task.…”
Section: Imbalanced Text (mentioning)
confidence: 99%
“…Previous work with WLS data used oversampling of the minority class to address this imbalance, which was effective with some but not all models (Noorian et al, 2017). As recent work with BERT suggests cost-sensitive learning is an effective alternative to address class imbalance (Madabushi et al, 2019), we evaluate the utility of this method also. Cost-sensitive learning involves adjusting the loss function of a model such that changes in performance on one class are weighted more heavily.…”
Section: Introduction (mentioning)
confidence: 99%
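
The last statement above describes cost-sensitive learning as reweighting the loss so that errors on one class count more heavily. A minimal sketch of that idea, assuming a class-weighted cross-entropy normalised by the sum of the applied weights; the function name, the toy probabilities, and the weight values are hypothetical and are not taken from the cited paper:

```python
import math

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy in which each example is scaled by the weight of its true class.

    probs: per-example predicted class probabilities (illustrative values only)
    labels: true class index for each example
    class_weights: one weight per class; a larger weight makes errors on that class costlier
    """
    total = 0.0
    weight_sum = 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    # Normalising by the summed weights is an assumed convention, not the paper's stated one.
    return total / weight_sum

# Hypothetical toy batch: two majority-class examples (0), one minority-class example (1).
probs = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4]]
labels = [0, 0, 1]

uniform = weighted_cross_entropy(probs, labels, [1.0, 1.0])
costly = weighted_cross_entropy(probs, labels, [1.0, 4.0])
print(uniform, costly)
```

With the minority class weighted four times as heavily, the poorly predicted minority example dominates the loss, so a model trained against it is pushed harder to correct errors on that class.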