Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech

Modha, Sandip; Mandl, Thomas; Shahi, Gautam Kishore; Madhu, Hiren; Satapara, Shrey; Ranasinghe, Tharindu; Zampieri, Marcos

doi:10.1145/3503162.3503176

Cited by 57 publications (42 citation statements)

References 15 publications


“…Its performance, averaged on the two Hindi problems and the Marathi problem, ranks it in first place among the teams that proposed systems for at least two of these problems. These performances suggest that it is an interesting reference level to evaluate the benefits of using more complex approaches that are frequently used to address this type of task such as deep learning or taking into account complementary resources (Mandl et al, 2019; Mandl et al, 2020;. However, it is essential to note that the proposed system never ranked first in any specific task.…”

Section: Discussion (mentioning)

“…Not surprisingly, a lot of research is being done to develop automatic detection systems. As in many NLP domains, deep learning approaches and the use of pre-computed embeddings have proven to be the most efficient, even in languages with few resources (Mandl et al, 2019; Mandl et al, 2020). However, traditional machine learning systems have sometimes proven to be very competitive (Mujadia et al, 2019; Saroj et al, 2019).…”

(mentioning)


A simple language-agnostic yet very strong baseline system for hate speech and offensive content identification

Bestgen

2022

Preprint


For automatically identifying hate speech and offensive content in tweets, the SATLab team proposes a system based on a classical supervised algorithm fed only with character n-grams, which makes it completely language-agnostic. After optimizing the feature weighting and the classifier parameters, the system reached, in the multilingual HASOC 2021 challenge, a medium performance level in English, the language for which it is easy to develop deep learning approaches relying on many external linguistic resources, but a far better level for the two less-resourced languages, Hindi and Marathi. It even ranks first when performance is averaged over the three tasks in these languages, outperforming many deep learning approaches. These results suggest that it is an interesting reference level for evaluating the benefits of more complex approaches such as deep learning or the use of complementary resources.
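The abstract describes the approach only at a high level: character n-grams as the sole features, fed to a classical supervised learner. As a minimal sketch of why such a pipeline needs no language-specific resources, the code below counts overlapping character n-grams (so Devanagari text for Hindi and Marathi is processed exactly like Latin-script English) and trains a multinomial Naive Bayes classifier on them. The classifier, the n-gram range (1 to 5), raw-count weighting, Laplace smoothing, the class name `CharNgramNB`, and the HOF/NOT toy labels are all assumptions for illustration; the abstract does not say which algorithm or weighting scheme SATLab actually tuned.

```python
import math
from collections import Counter


def char_ngrams(text: str, n_min: int = 1, n_max: int = 5) -> Counter:
    """Count overlapping character n-grams of lengths n_min..n_max.

    No tokenizer, stemmer, or lexicon is used, so the same code applies
    to English, Hindi, or Marathi text (Python strings are Unicode).
    """
    padded = f" {text.lower()} "  # spaces mark word boundaries at the edges
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


class CharNgramNB:
    """Multinomial Naive Bayes over character n-gram counts (hypothetical stand-in classifier)."""

    def __init__(self, n_min: int = 1, n_max: int = 5, alpha: float = 1.0):
        self.n_min, self.n_max, self.alpha = n_min, n_max, alpha

    def fit(self, texts: list[str], labels: list[str]) -> "CharNgramNB":
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text, self.n_min, self.n_max))
        # Vocabulary = every n-gram seen in any class during training
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        label_freq = Counter(labels)
        self.log_prior = {c: math.log(label_freq[c] / len(labels)) for c in self.classes}
        return self

    def predict(self, text: str) -> str:
        feats = char_ngrams(text, self.n_min, self.n_max)
        scores = {}
        for c in self.classes:
            # Laplace-smoothed log-likelihood of each n-gram under class c
            denom = self.totals[c] + self.alpha * self.vocab_size
            scores[c] = self.log_prior[c] + sum(
                k * math.log((self.counts[c][g] + self.alpha) / denom)
                for g, k in feats.items()
            )
        return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy data with hypothetical HOF/NOT labels; this is not HASOC data.
    texts = ["you are an idiot", "what a stupid idiot", "have a nice day", "thanks, nice work"]
    labels = ["HOF", "HOF", "NOT", "NOT"]
    clf = CharNgramNB().fit(texts, labels)
    print(clf.predict("idiot"))  # HOF
```

Replacing the raw counts with a tuned weighting scheme, or the Naive Bayes model with another linear classifier, would leave `char_ngrams` untouched; that separation is what keeps the feature side language-agnostic.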



“…The Hate speech and Offensive Content Identification in English and Indo-Aryan Languages HASOC 2021 [5,6] proposes two different tasks in three languages: English, Hindi, and Marathi. The authors participated in both tasks for English and Hindi.…”

Section: Languages (mentioning)

“…The authors would like to thank the organizers of Hate Speech and Offensive Content Identification in Indo-Aryan Languages 2021 [5] for conducting this data challenge. The authors gratefully acknowledge Google Colab for providing GPUs for the computation.…”

Section: Acknowledgments (mentioning)


Multilingual Hate Speech and Offensive Content Detection using Modified Cross-entropy Loss

Mitra, Sankhala

2022

Preprint


The growing number of social media users has led many people to misuse these platforms to spread offensive content and hate speech. Manually tracking the vast number of posts is impractical, so automated methods are needed to identify them quickly. Large language models are trained on large amounts of data and use contextual embeddings. We fine-tune large language models for our task. The data is also quite imbalanced, so we used a modified cross-entropy loss to tackle the issue. We observed that a model fine-tuned on Hindi corpora performs better. Our team (HNLP) achieved macro F1-scores of 0.808 and 0.639 on English Subtask A and English Subtask B, respectively. For Hindi Subtask A and Hindi Subtask B, our team achieved macro F1-scores of 0.737 and 0.443, respectively, in HASOC 2021.
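The abstract says only that a "modified cross-entropy loss" was used to cope with the imbalanced labels; it does not state the modification. One common variant, shown here purely as an assumption, weights each example's loss by the inverse frequency of its true class, so errors on the rare class cost more. The sketch uses plain Python rather than a deep learning framework; the weighting formula w_c = N / (K · n_c) and the function names are illustrative choices, not HNLP's implementation.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels: list[int], num_classes: int) -> list[float]:
    """w_c = N / (K * n_c): rare classes get larger weights (0.0 if a class is absent)."""
    counts = Counter(labels)
    n = len(labels)
    return [n / (num_classes * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def weighted_cross_entropy(batch_logits, targets, weights) -> float:
    """Weighted mean of -log p(y_i): sum_i w[y_i] * loss_i / sum_i w[y_i]."""
    num = den = 0.0
    for logits, y in zip(batch_logits, targets):
        num += weights[y] * -log_softmax(logits)[y]
        den += weights[y]
    return num / den


if __name__ == "__main__":
    # Toy batch: three majority-class (0) examples, one minority-class (1) example,
    # and a model that always favours class 0.
    targets = [0, 0, 0, 1]
    logits = [[2.0, 0.0]] * 4
    uniform = weighted_cross_entropy(logits, targets, [1.0, 1.0])
    weighted = weighted_cross_entropy(logits, targets, inverse_frequency_weights(targets, 2))
    print(f"unweighted={uniform:.3f} weighted={weighted:.3f}")  # unweighted=0.627 weighted=1.127
```

With these weights the loss equals the average of the per-class mean losses, so the single misclassified minority example counts as much as all three majority examples together. This mirrors the macro F1 metric reported in the abstract, which also gives each class equal weight regardless of its size.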


On the Importance of Word Embedding in Automated Harmful Information Detection

Mohtaj, Möller

2022

Text, Speech, and Dialogue

