Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021) 2021
DOI: 10.18653/v1/2021.semeval-1.27
NLRG at SemEval-2021 Task 5: Toxic Spans Detection Leveraging BERT-based Token Classification and Span Prediction Techniques

Abstract: Toxicity detection in text has been a popular NLP task in recent years. In SemEval-2021 Task 5, Toxic Spans Detection, the focus is on detecting toxic spans within English passages. Most state-of-the-art span detection approaches employ various techniques, each of which can be broadly classified as Token Classification or Span Prediction. In our paper, we explore simple versions of both of these approaches and their performance on the task. Specifically, we use BERT-based models: BERT, RoBERTa,…
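As a rough illustration of the token-classification framing described in the abstract (a minimal sketch, not the authors' code): a model labels each token as toxic or not, and the token-level labels are then mapped back to character offsets, which is the output format of SemEval-2021 Task 5. The function and example sentence below are hypothetical.

```python
def toxic_char_offsets(token_spans, labels):
    """Convert token-level toxicity predictions to character offsets.

    token_spans: list of (start, end) character offsets, one per token
    labels: parallel list of 0/1 toxicity predictions per token
    Returns the sorted list of toxic character positions, the span
    format scored in SemEval-2021 Task 5.
    """
    offsets = set()
    for (start, end), label in zip(token_spans, labels):
        if label == 1:
            offsets.update(range(start, end))
    return sorted(offsets)

# Hypothetical example: "you are an idiot", where the last token
# ("idiot", characters 11-15) is predicted toxic.
spans = [(0, 3), (4, 7), (8, 10), (11, 16)]
labels = [0, 0, 0, 1]
print(toxic_char_offsets(spans, labels))  # [11, 12, 13, 14, 15]
```

In practice the token offsets would come from a subword tokenizer's offset mapping, and the labels from a fine-tuned BERT-family token classifier; the span-prediction variant instead predicts start and end positions directly.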


Cited by 9 publications (8 citation statements)
References 20 publications
“…Since SE tasks require highly nuanced semantic understanding, most solutions leveraged large language models pre-trained using transformers, including BERT (Devlin et al 2019) and other types of transformers (Morio et al 2020; Chhablani et al 2021). These models are pre-trained on billions of words of English text data and can be easily fine-tuned to adapt to new tasks.…”
Section: Span Extraction
Mentioning confidence: 99%
“…To the best of our knowledge, this is the first work on extracting hate speech spans. In Section 2.3, we described the propaganda (Da San Martino et al 2020) and toxic (Chhablani et al 2021) SE tasks.…”
Section: Comparison With Other Work
Mentioning confidence: 99%
“…It can also be detected by ferreting out offensive and toxic spans in the texts. A toxic span detecting system was developed by leveraging token classification and span prediction techniques that are based on bidirectional encoder representations from transformers (BERT) [36]. Multi-lingual detection of offensive spans (MUDES) [37] was developed to detect offensive spans in texts.…”
Section: Offensive Language Identification
Mentioning confidence: 99%