Sandip Modha scite author profile

With the growth of social media, the spread of hate speech is also increasing rapidly. Social media are widely used in many countries, and hate speech is spreading in these countries as well. This creates a need for multilingual hate speech detection algorithms, yet much of the current research in this area is dedicated to English. The HASOC track intends to provide a platform to develop and optimize hate speech detection algorithms for Hindi, German and English. The dataset was collected from a Twitter archive and pre-classified by a machine learning system. HASOC has two sub-tasks for all three languages: task A is a binary classification problem (Hate and Not Offensive), while task B is a fine-grained classification problem with three classes: hate speech (HATE), OFFENSIVE, and PROFANITY. Overall, 252 runs were submitted by 40 teams. For task A, the best classification algorithms achieved F1 measures of 0.51, 0.53 and 0.52 for English, Hindi, and German, respectively. For task B, the best classification algorithms achieved F1 measures of 0.26, 0.33 and 0.29 for English, Hindi, and German, respectively. This article presents the tasks and the data development as well as the results. The best performing algorithms were mainly variants of the transformer architecture BERT. However, other systems were also applied with good success.

Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages and Conversational Hate Speech

Modha, Mandl, Shahi et al. 2021

The wide spread of offensive content online, such as hate speech, poses a growing societal problem. AI tools are necessary for supporting the moderation process at online platforms. For the evaluation of these identification tools, continuous experimentation with data sets in different languages is necessary. The HASOC track (Hate Speech and Offensive Content Identification) is dedicated to developing benchmark data for this purpose. This paper presents the HASOC subtrack for English, Hindi, and Marathi. The data set was assembled from Twitter. This subtrack has two sub-tasks. Task A is a binary classification problem (Hate and Not Offensive) offered for all three languages. Task B is a fine-grained classification problem with three classes, hate speech (HATE), OFFENSIVE, and PROFANITY, offered for English and Hindi. Overall, 652 runs were submitted by 65 teams. For task A, the best classification algorithms achieved F1 measures of 0.91, 0.78 and 0.83 for Marathi, Hindi and English, respectively. This overview presents the tasks and the data development as well as the detailed results. The systems submitted to the competition applied a variety of technologies. The best performing algorithms were mainly variants of transformer architectures.

Detecting and visualizing hate speech in social media: A cyber Watchdog for surveillance

Modha, Majumder, Mandl et al. 2020

Expert Systems with Applications

Tracking Hate in Social Media: Evaluation, Challenges and Approaches

et al. 2020

This paper presents online hate speech as a societal and computational challenge. Offensive content detection in social media is treated as a multilingual, multi-level, multi-class classification problem for three Indo-European languages. This research problem is offered to the community through the HASOC shared task. HASOC intends to stimulate research and development in hate speech recognition across different languages. Three datasets (in English, German, and Hindi) were developed from Twitter and Facebook and made available. This paper describes the creation of the multilingual datasets and the annotation method. We present the numerous approaches based on traditional classifiers, deep neural models, and transfer learning models, along with the features used for classification. Results show that the best classifier for the binary classification might not perform best in the multi-class classification, and that the performance of the same classifier varies across the languages. Overall, transfer learning models such as BERT and deep neural models based on LSTMs and CNNs perform similarly to each other and better than traditional classifiers such as SVM. We conclude the discussion with a list of issues that need to be addressed for future datasets.

Overview of the HASOC Subtrack at FIRE 2021: Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages

Mandl, Modha, Shahi et al. 2021

Preprint

Differential Weight Based Hybrid Approach to Detect Software Plagiarism

Shah, Modha, Dave 2016

Detecting offensive speech in conversational code-mixed dialogue on social media: A contextual dataset and benchmark experiments

Madhu, Satapara, Modha et al. 2023

Expert Systems with Applications

Overview of the HASOC track at FIRE 2019

Overview of the HASOC Track at FIRE 2020: Hate Speech and Offensive Language Identification in Tamil, Malayalam, Hindi, English and German
