Multilingual Offensive Language Identification with Cross-lingual Embeddings

Ranasinghe, Tharindu; Zampieri, Marcos

doi:10.18653/v1/2020.emnlp-main.470

Cited by 96 publications (77 citation statements)

References 45 publications


“…[31] employ a transfer learning approach using BERT for hate speech detection. [35] use cross-lingual embeddings to identify offensive content in a multilingual setting. Our multilingual approach is similar in spirit to the method proposed in [34], which uses the same model architecture and aligned word embeddings to solve the tasks.…”

Section: Related Work (mentioning; confidence: 99%)

Exploring Multi-Task Multi-Lingual Learning of Transformer Models for Hate Speech and Offensive Speech Identification in Social Media

2021


Hate speech has become a major content moderation issue for online social media platforms. Given the volume and velocity of online content production, it is impossible to manually moderate hate speech related content on any platform. In this paper we utilize a multi-task and multi-lingual approach based on recently proposed Transformer Neural Networks to solve three sub-tasks for hate speech. These sub-tasks were part of the 2019 shared task on hate speech and offensive content (HASOC) identification in Indo-European languages. We expand on our submission to that competition by utilizing multi-task models which are trained using three approaches: (a) multi-task learning with separate task heads, (b) back-translation, and (c) multilingual training. Finally, we investigate the performance of various models and identify instances where the Transformer-based models perform differently and better. We show that it is possible to utilize different combined approaches to obtain models that can generalize easily on different languages and tasks, while trading off slight accuracy (in some cases) for a much reduced inference-time compute cost. We open source an updated version of our HASOC 2019 code with the new improvements at https://github.com/socialmediaie/MTML_HateSpeech.



“…Pre-trained models on a language can be used for many tasks with further fine-tuning and training. On social media, many users communicate in mixed languages [166]. For example, in Asian countries like India and Pakistan, people mix English with the Urdu language.…”

Section: Handling of a Dynamic Corpus (mentioning; confidence: 99%)

A Systematic Review of Machine Learning Algorithms in Cyberbullying Detection: Future Directions and Challenges

Arif

2021

JISCR


Social media networks are becoming an essential part of life for most of the world’s population. Detecting cyberbullying using machine learning and natural language processing algorithms is getting the attention of researchers. There is a growing need for automatic detection and mitigation of cyberbullying events on social media. In this study, research directions and the theoretical foundation in this area are investigated. A systematic review of the current state-of-the-art research in this area is conducted. A framework considering all possible actors in the cyberbullying event must be designed, including various aspects of cyberbullying and its effect on the participating actors. Furthermore, future directions and challenges are also discussed.


“…Joint training of universal encoders has led to enormous progress on standard benchmarks and industrial applications such as (Ranasinghe and Zampieri, 2020; Gencoglu, 2020).…”

Section: Continual Learning (mentioning; confidence: 99%)

Language Scaling for Universal Suggested Replies Model

Ying, Bajaj, Deb et al. 2021

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies


We consider the problem of scaling automated suggested replies for the Outlook email system to multiple languages. Faced with increased compute requirements and low resources for language expansion, we build a single universal model for improving the quality and reducing run-time costs of our production system. However, restricted data movement across regional centers prevents joint training across languages. To this end, we propose a multitask continual learning framework, with auxiliary tasks and language adapters to learn universal language representation across regions. The experimental results show positive cross-lingual transfer across languages while reducing catastrophic forgetting across regions. Our online results on real user traffic show significant gains in CTR and characters saved, as well as 65% training cost reduction compared with per-language models. As a consequence, we have scaled the feature in multiple languages including low-resource markets.
