Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue 2016
DOI: 10.18653/v1/w16-3638

Do Characters Abuse More Than Words?

Abstract: Although word and character n-grams have been used as features in different NLP applications, no systematic comparison or analysis has shown the power of character-based features for detecting abusive language. In this study, we investigate the effectiveness of such features for abusive language detection in user-generated online comments, and show that such methods outperform previous state-of-the-art approaches and other strong baselines.
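
To make the distinction between word and character n-grams concrete, here is a minimal sketch in plain Python. It is not the paper's feature pipeline; the function names (`word_ngrams`, `char_ngrams`, `overlap`) and the example strings are hypothetical, chosen only to illustrate why character-level features can be more tolerant of obfuscated spellings than word-level ones.

```python
from collections import Counter


def word_ngrams(text, n):
    """Count word n-grams over whitespace-separated, lowercased tokens."""
    tokens = text.lower().split()
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def char_ngrams(text, n):
    """Count character n-grams over the lowercased string, spaces included."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def overlap(a, b):
    """Jaccard overlap between the sets of distinct n-grams in two counters."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


# Hypothetical example: the same insult with one character swapped for a digit.
plain = "you idiot"
obfuscated = "you id1ot"

word_sim = overlap(word_ngrams(plain, 1), word_ngrams(obfuscated, 1))
char_sim = overlap(char_ngrams(plain, 3), char_ngrams(obfuscated, 3))

print(f"word unigram overlap:   {word_sim:.2f}")
print(f"char trigram overlap:   {char_sim:.2f}")
```

In this toy case the obfuscated token shares no word unigram with the original, while several character trigrams survive the substitution, so the character-level overlap is higher. A real system would feed such counts into a classifier rather than compare them directly.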

Citations: cited by 140 publications (93 citation statements)
References: 12 publications (16 reference statements)
“…Existing methods primarily cast the problem as a supervised document classification task [36]. These can be divided into two categories: one relies on manual feature engineering that are then consumed by algorithms such as SVM, Naive Bayes, and Logistic Regression [2,9,11,16,20,24,38-42] (classic methods); the other represents the more recent deep learning paradigm that employs neural networks to automatically learn multi-layers of abstract features from raw data [14,27,31,37] (deep learning methods).…”
Section: Methods Of Hate Speech Detection and Related Problems (mentioning)
confidence: 99%
“…The features used in traditional machine learning approaches are the main aspects distinguishing different methods, and surface-level features such as bag of words, word-level and character-level n-grams, etc. have proven to be the most predictive features [11,13,22]. Apart from features, different algorithms such as Support Vector Machines [10], Naive Bayes [16], and Logistic Regression [3,22], etc.…”
Section: Previous Work (mentioning)
confidence: 99%
“…To detect online hate speech, a large number of scientific studies have been dedicated by using Natural Language Processing (NLP) in combination with Machine Learning (ML) and Deep Learning (DL) methods [1,8,11,13,22,25]. Although supervised machine learning-based approaches have used different text mining-based features such as surface features, sentiment analysis, lexical resources, linguistic features, knowledge-based features or user-based and platform-based metadata [3,6,23], they necessitate a well-defined feature extraction approach.…”
Section: Introduction (mentioning)
confidence: 99%
“…Badjatiya et al. (2017) implemented Gradient Boosted Decision Trees classifiers using word representations trained by deep learning models. Other researchers have investigated character-level representations and their effectiveness compared to word-level representations (Mehdad and Tetreault, 2016; Park and Fung, 2017).…”
Section: Related Work (mentioning)
confidence: 99%