We explore knowledge transfer in simple multi-task encoder-agnostic transformer-based models on five dialog tasks: emotion classification, sentiment classification, toxicity classification, intent classification, and topic classification. We show that the accuracy of these models differs from that of the analogous single-task models by only ∼0.9%. These results hold across multiple transformer backbones. Because the backbone is shared across all tasks, a multi-task model has only about 0.1% more parameters than any analogous single-task model while supporting all tasks simultaneously. We also found that when the training dataset size is reduced sufficiently, multi-task models outperform single-task ones, especially on the smallest datasets. We further show that when training multilingual models on Russian data, adding English data from the same task to the training set can improve model performance in both the multi-task and single-task settings. The improvement can reach 4–5% if the Russian data are scarce enough. We have integrated these models into the DeepPavlov library and the DREAM dialogue platform.
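To make the shared-backbone design concrete, the sketch below shows one common way to realize such a multi-task classifier in PyTorch with Hugging Face Transformers: a single transformer encoder shared by all tasks, plus one lightweight linear head per task. This is a minimal illustration under assumed details, not the paper's implementation; the class name `MultiTaskClassifier` and the per-task class counts are hypothetical.

```python
# Minimal sketch of a shared-backbone multi-task classifier (illustrative,
# not the paper's code). Assumes PyTorch and Hugging Face Transformers.
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Hypothetical label counts for the five dialog classification tasks.
TASKS = {
    "emotion": 7,
    "sentiment": 3,
    "toxicity": 2,
    "intent": 10,
    "topic": 25,
}

class MultiTaskClassifier(nn.Module):
    def __init__(self, backbone_name: str, tasks: dict):
        super().__init__()
        # One transformer backbone shared by all tasks.
        self.backbone = AutoModel.from_pretrained(backbone_name)
        hidden = self.backbone.config.hidden_size
        # One linear head per task: the heads are the only task-specific
        # parameters, which is why the total parameter count exceeds a
        # single-task model's by only a fraction of a percent.
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in tasks.items()}
        )

    def forward(self, input_ids, attention_mask, task: str):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Use the [CLS] token representation as the utterance embedding.
        cls = out.last_hidden_state[:, 0]
        return self.heads[task](cls)

# Usage: route an utterance through the shared encoder and a chosen head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = MultiTaskClassifier("bert-base-uncased", TASKS)
batch = tokenizer(["could you book me a table for two?"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"], task="intent")
```

Because the backbone is loaded via `AutoModel`, the same wrapper works with different transformer encoders, matching the encoder-agnostic setting described above.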