As offensive content has become pervasive in social media, there has been much research on identifying potentially offensive messages. However, previous work on this topic did not consider the problem as a whole, but rather focused on detecting very specific types of offensive content, e.g., hate speech, cyberbullying, or cyber-aggression. In contrast, here we target several different kinds of offensive content. In particular, we model the task hierarchically, identifying the type and the target of offensive messages in social media. For this purpose, we compiled the Offensive Language Identification Dataset (OLID), a new dataset with tweets annotated for offensive content using a fine-grained three-layer annotation scheme, which we make publicly available. We discuss the main similarities and differences between OLID and pre-existing datasets for hate speech identification, aggression detection, and similar tasks. We further experiment with and compare the performance of different machine learning models on OLID.
We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval). The task was based on a new dataset, the Offensive Language Identification Dataset (OLID), which contains over 14,000 English tweets. It featured three sub-tasks. In sub-task A, the goal was to discriminate between offensive and non-offensive posts. In sub-task B, the focus was on the type of offensive content in the post. Finally, in sub-task C, systems had to detect the target of the offensive posts. OffensEval attracted a large number of participants and it was one of the most popular tasks in SemEval-2019. In total, about 800 teams signed up to participate in the task, and 115 of them submitted results, which we present and analyze in this report.
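The three sub-tasks form a cascade: only posts labeled offensive in sub-task A are typed in sub-task B, and only targeted posts receive a target label in sub-task C. The rule-based sketch below illustrates this hierarchical structure; the keyword lists, label strings, and decision rules are illustrative placeholders, not the OLID annotation guidelines or any participant's actual system.

```python
# Hypothetical lexicons; real systems learn these signals from the data.
OFFENSIVE_WORDS = {"idiot", "stupid", "trash"}
TARGET_WORDS = {"you", "he", "she", "they"}

def classify(tweet: str) -> dict:
    """Toy cascade mirroring the sub-task A -> B -> C hierarchy."""
    tokens = set(tweet.lower().split())
    # Sub-task A: offensive (OFF) vs. not offensive (NOT)
    if not tokens & OFFENSIVE_WORDS:
        return {"A": "NOT"}
    # Sub-task B: targeted insult (TIN) vs. untargeted (UNT)
    targeted = bool(tokens & TARGET_WORDS)
    result = {"A": "OFF", "B": "TIN" if targeted else "UNT"}
    # Sub-task C: only targeted posts get a target label
    # (individual IND vs. group GRP, decided by a toy heuristic)
    if targeted:
        result["C"] = "IND" if tokens & {"you", "he", "she"} else "GRP"
    return result
```

A post thus either exits early at level A or accumulates labels as it descends the hierarchy, which is why sub-tasks B and C have progressively fewer instances.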
Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared task Socio-political and Crisis Events Detection at the workshop CASE @ ACL-IJCNLP 2021. Socio-political event data is utilized for national and international policy- and decision-making. Therefore, the reliability and validity of such datasets are of utmost importance. We split the shared task into three parts to address the three aspects of data collection (Task 1), fine-grained semantic classification (Task 2), and evaluation (Task 3). Task 1, which is the focus of this report, is on multilingual protest news detection and comprises four subtasks: document classification (subtask 1), sentence classification (subtask 2), event sentence coreference identification (subtask 3), and event extraction (subtask 4). All subtasks have English, Portuguese, and Spanish data for both training and evaluation. Data in Hindi is available only for the evaluation of subtask 1. The majority of the submissions, which are 238 in total, are created using multi- and cross-lingual approaches. Best scores are between 77.27 and 84.55 F1-macro for subtask 1, between 85.32 and 88.61 F1-macro for subtask 2, between 84.23 and 93.03 CoNLL 2012 average score for subtask 3, and between 66.20 and 78.11 F1-macro for subtask 4 across all evaluation settings. The performance of the best system for subtask 4 is above 66.20 F1 for all available languages. Although there is still significant room for improvement in cross-lingual and zero-shot settings, the best submissions for each evaluation scenario yield remarkable results. Monolingual models outperformed the multilingual models in a few evaluation scenarios where relatively large amounts of training data were available.
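Most of the scores above are macro-averaged F1, which weights every class equally regardless of its frequency, so rare classes count as much as common ones. A minimal re-implementation for reference (the shared task itself would have used standard scoring tools; this is only to make the metric concrete):

```python
def f1_macro(gold, pred):
    """Macro-averaged F1: compute F1 per class, then average with equal weight."""
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

Because every class contributes equally to the average, a system that ignores a minority class (e.g., the positive "protest" class in a skewed document collection) is penalized more heavily than under accuracy or micro-averaged F1.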
In recent times, verbal aggression and related phenomena such as hate speech, abusive language, and trolling have become a major problem on social media. In this paper, I present the results of a large-scale quantitative study of aggression based on a target-based typology in a manually-annotated multilingual dataset of over 20,000 Facebook comments and tweets, each written in Hindi, English, or code-mixed Hindi-English. Taking insights from this study, I develop two different classifiers for detecting aggression in Hindi, English, and Hindi-English mixed Facebook and Twitter conversations. The classifiers are developed using an annotated corpus of approximately 9,000 Facebook comments and 5,000 tweets. Since a phenomenon like aggression is highly subjective, the study shows a comparatively modest inter-annotator agreement of 0.72 and an overall F1 score of 0.64 for both Facebook and Twitter. Consequently, I also carried out two user studies, where humans were asked to evaluate the annotations by the classifier, to test the actual 'acceptance' of the classifier's judgments. I discuss the results of this user study and give an analysis of the overall performance of the system.
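The inter-annotator agreement of 0.72 cited above is a chance-corrected agreement figure; the abstract does not name the exact coefficient, but Cohen's kappa is a common choice for two annotators, and the sketch below shows how such a score is computed (the metric choice is an assumption here, not taken from the paper):

```python
def cohens_kappa(ann1, ann2):
    """Cohen's kappa: observed agreement between two annotators,
    corrected for the agreement expected by chance.
    NOTE: illustrative; the paper's exact agreement measure is an assumption."""
    n = len(ann1)
    # Proportion of items on which the two annotators agree
    observed = sum(a == b for a, b in zip(ann1, ann2)) / n
    # Agreement expected by chance, from each annotator's label distribution
    labels = set(ann1) | set(ann2)
    expected = sum((ann1.count(l) / n) * (ann2.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)
```

Values near 1 indicate near-perfect agreement and values near 0 indicate chance-level agreement, so a kappa-style score of 0.72 reflects substantial but imperfect consensus, consistent with the subjectivity of aggression judgments.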