2021
DOI: 10.28995/2075-7182-2021-20-179-190
Methods for Detoxification of Texts for the Russian Language

Abstract: We introduce the first study of automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used, for instance, for processing toxic content in social media. While much work has been done in this field for English, the task has not yet been addressed for Russian. We test two types of models: an unsupervised approach based on the BERT architecture that performs local corrections, and a supervised approach based on a pretrained GPT-2 language model. …
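As a rough illustration of the unsupervised, BERT-based local-correction idea mentioned in the abstract, the sketch below masks words from a small toy toxic lexicon and lets a Russian masked language model propose neutral replacements. The model checkpoint, the lexicon, and the whole pipeline are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of BERT-style local correction for detoxification:
# mask tokens found in a toy toxic lexicon and let a masked LM
# (here an assumed Russian BERT checkpoint) fill in replacements.
# This is NOT the paper's pipeline, only an illustration of the idea.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="DeepPavlov/rubert-base-cased")

TOXIC_WORDS = {"идиот", "тупой"}  # hypothetical toy lexicon


def detoxify(sentence: str) -> str:
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok.lower().strip(".,!?") not in TOXIC_WORDS:
            continue
        # Replace the flagged word with the mask token and re-predict it.
        masked = " ".join(
            fill_mask.tokenizer.mask_token if j == i else t
            for j, t in enumerate(tokens)
        )
        for candidate in fill_mask(masked):
            word = candidate["token_str"].strip()
            if word.lower() not in TOXIC_WORDS:  # keep the first non-toxic guess
                tokens[i] = word
                break
    return " ".join(tokens)


print(detoxify("Ты идиот и ничего не понимаешь"))
```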

Cited by 2 publications (2 citation statements)
References 49 publications
“…This dataset has 127,656 training samples, 93,342 validation samples, and 31,915 testing samples. For Russian toxic classification, we used the RuToxic dataset (Dementieva et al., 2021). This two-class dataset was collected from Dvach, a Russian anonymous imageboard.…”
Section: Toxicity Classification
Mentioning, confidence: 99%
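For context on the quoted use of RuToxic for toxicity classification, a hedged baseline sketch follows; the file name, column names, and the TF-IDF plus logistic-regression setup are assumptions for illustration, not the cited paper's method.

```python
# Hedged baseline sketch for binary (toxic vs. neutral) classification on a
# RuToxic-style corpus. File name and column names ("text", "toxic") are
# assumptions; the cited work uses its own splits and models.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("rutoxic.csv")  # assumed columns: "text", "toxic" (0 or 1)
train, test = train_test_split(df, test_size=0.2, random_state=0,
                               stratify=df["toxic"])

vectorizer = TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train["text"])
X_test = vectorizer.transform(test["text"])

clf = LogisticRegression(max_iter=1000).fit(X_train, train["toxic"])
print("F1 on held-out split:", f1_score(test["toxic"], clf.predict(X_test)))
```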
“…The detoxification task is usually considered a variety of the TST task, transferring from toxic to neutral style. There already exist unsupervised approaches to detoxification (Dementieva et al., 2021a; Dale et al., 2021) for the Russian and English languages. However, the output of these models is often of poor quality.…”
Section: Introduction
Mentioning, confidence: 99%
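The quoted passage contrasts these unsupervised systems with supervised alternatives; the supervised GPT-2 route named in the abstract could look roughly like the inference sketch below, assuming a checkpoint already fine-tuned on parallel toxic/neutral pairs (the checkpoint path and prompt template are placeholders, not a published model).

```python
# Hedged inference sketch for a supervised, GPT-2-style detoxifier:
# assumes a Russian causal LM already fine-tuned on "toxic -> neutral"
# pairs. The checkpoint path and prompt template are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINT = "path/to/finetuned-rugpt-detox"  # placeholder, not a real model id
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForCausalLM.from_pretrained(CHECKPOINT)

prompt = "toxic: Ты идиот и ничего не понимаешь\nneutral:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=40, do_sample=False,
                        pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated continuation after the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```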