Proceedings - Natural Language Processing in a Deep Learning World 2019
DOI: 10.26615/978-954-452-056-4_029

Detecting Toxicity in News Articles: Application to Bulgarian

Abstract: Online media aim for reaching ever bigger audience and for attracting ever longer attention span. This competition creates an environment that rewards sensational, fake, and toxic news. To help limit their spread and impact, we propose and develop a news toxicity detector that can recognize various types of toxic content. While previous research primarily focused on English, here we target Bulgarian. We created a new dataset by crawling a website that for five years has been collecting Bulgarian news articles …

Cited by 11 publications (4 citation statements)
References 59 publications
“…This includes studying the impact of misinformation on politics [4], [15], [16], society [5], [8], economy [7], and health [18]. Different machine learning algorithms have been developed to automatically detect false news [34]-[37], propaganda [26], [38], [39], toxicity [40], and fact checking [41] from the online web content.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…To evade the problem, English language resources and datasets were translated to low resource languages. For instance, news toxicity detector was developed by translating English news text to Bulgarian language [40]. The toxicity detector showed the accuracy of 59% with stylometric, NELA, and word embedding features.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…While the analysis of the language used by the target news outlet is the most important information source, we can also consider information in Wikipedia and in social media, traffic statistics, and the structure of the target site's URL as shown in Figure 1: 1. the text of a few hundred articles published by the target news outlet, analyzing the style, subjectivity, sentiment, offensiveness [77,78,79,62], toxicity [30], morality, vocabulary richness, propagandistic content, etc.; 2. the text of its Wikipedia page (if any), including infobox, summary, content, categories, e.g., it might say that the website spreads false information and conspiracy theories; 3. metadata and statistics about its Twitter account (if any): is it an old account, is it verified, is it popular, how is the medium self-describing, is there a link to its website, etc.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…It is also useful to understand whether the post is propagandistic, what propaganda techniques are used, and how the issue is framed. While there have been studies focusing on (some of) these issues for high-resource languages such as English and Arabic (Barrón-Cedeño et al, 2020; Hossain et al, 2020; Li et al, 2020; Nakov et al, 2021a,c), there has been less work for low-resource languages such as Bulgarian (Dinkov et al, 2019; Alam et al, 2021d; Shaar et al, 2021b,c). Here, we aim to bridge this gap by analyzing tweets and Facebook posts about COVID-19 in Bulgarian, with focus on factuality, harmfulness, propaganda, and framing.…”
Section: Introduction
Citation type: mentioning
confidence: 99%