“…Some studies also exploited neural models such as LSTMs [29,92,94,135], Bi-LSTMs [29], and GRUs [6,28]. More recent works have adopted transformer-based architectures, owing to the availability of multilingual transformer models such as Multilingual BERT [1,6,48,92,100,132,135], RoBERTa [30,31], XLM [28,132], and XLM-RoBERTa [30,31,48,110]. Interestingly, we also note some works that propose a multichannel architecture based on the multilingual BERT model [20,130], which allows the model to learn the task in several languages sequentially.…”
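To make the multichannel, sequential-training idea concrete, below is a minimal sketch, not the implementation from [20,130]: it assumes a shared multilingual-BERT encoder with one language-specific classification head per "channel", fine-tuned one language at a time. The class name MultiChannelMBert, the helper train_one_language, the head-per-language routing, and the number of labels are all illustrative assumptions.

```python
# A minimal sketch of one possible "multichannel" setup: a shared
# multilingual-BERT encoder with one classification head ("channel")
# per language, fine-tuned sequentially, one language at a time.
# Names and routing are illustrative, not the architecture of [20,130].
import torch
from torch import nn
from transformers import AutoModel

class MultiChannelMBert(nn.Module):
    def __init__(self, languages, num_labels):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
        hidden = self.encoder.config.hidden_size
        # One lightweight head per language "channel"; the encoder is shared.
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(hidden, num_labels) for lang in languages}
        )

    def forward(self, input_ids, attention_mask, lang):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.heads[lang](cls)       # route through this language's channel

def train_one_language(model, batches, lang, lr=2e-5):
    """Fine-tune on a single language; calling this once per language
    in turn yields the sequential multilingual training regime."""
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for input_ids, attention_mask, labels in batches:
        optim.zero_grad()
        logits = model(input_ids, attention_mask, lang)
        loss_fn(logits, labels).backward()
        optim.step()
```

Under these assumptions, sequential learning is simply a loop such as `for lang in ["en", "ar", "hi"]: train_one_language(model, loaders[lang], lang)`; the shared encoder accumulates knowledge across languages while each head specializes on one.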