Evaluating Machine Learning Techniques for Detecting Offensive and Hate Speech in South African Tweets

Oriola, Oluwafemi; Kotzé, Eduan

doi:10.1109/access.2020.2968173

Cited by 77 publications

(36 citation statements)

References 24 publications

“…A non-exhaustive list includes Arabic (Mubarak et al, 2017;Chowdhury et al, 2020), Danish (Sigurbergsson and Derczynski, 2020), German (Jaki and De Smedt, 2019;Wiegand et al, 2018b), Hindi (Saroj and Pal, 2020), Italian (Bosco et al, 2018;Fersini et al, 2018), Polish (Ptaszynski et al, 2019), Portuguese (Fortuna et al, 2019), Dutch (Tulkens et al, 2016), and Slovene (Fišer et al, 2017). There is also some work on specific language variants, like Hindi-English code-switched language (Mathur et al, 2018a;Mathur et al, 2018b) or South African English (Oriola and Kotzé, 2020) … assurance, each test example candidate has been manually checked by the authors, and replaced by another sample if it (i) comprises only a single non-indicative word, (ii) it is not written in English, or (iii) it relies on world knowledge which is too specific or geographically localized or on contextual information which hinders proper translation. The final English XHATE-999 test set comprises 600, 300 and 99 instances from WUL, TRAC, and GAO, respectively.…”

Section: Related Work and Motivation (mentioning)

confidence: 99%

XHate-999: Analyzing and Detecting Abusive Language Across Domains and Languages

Glavaš, Karan, Vulić

2020

Proceedings of the 28th International Conference on Computational Linguistics

We present XHATE-999, a multi-domain and multilingual evaluation data set for abusive language detection. By aligning test instances across six typologically diverse languages, XHATE-999 for the first time allows for disentanglement of the domain transfer and language transfer effects in abusive language detection. We conduct a series of domain- and language-transfer experiments with state-of-the-art monolingual and multilingual transformer models, setting strong baseline results and profiling XHATE-999 as a comprehensive evaluation resource for abusive language detection. Finally, we show that domain- and language-adaptation, via intermediate masked language modeling on abusive corpora in the target language, can lead to substantially improved abusive language detection in the target language in the zero-shot transfer setups.

“…Paper [19] is about detecting hate speech in South African tweets using machine learning approach. Twitter is the most used social media in South Africa.…”

Section: Results (mentioning)

confidence: 99%

A Systematic Literature Review of Different Machine Learning Methods on Hate Speech Detection

Salim, Suhartono

2020

JOIV: Int. J. Inform. Visualization

Hate speech is one of the most challenging problems the internet faces today. This systematic literature review examines the hate speech detection problem and will inform an experimental approach to detecting hate speech and abusive language. This work also provides an overview of previous research, including the methods, algorithms, and main features used. We use two research questions in this literature review, which will be the foundation of the next experimental research. Correctly classifying a piece of text as actual hate speech requires a lot of correctly labelled data. The most common challenges include different languages, out-of-vocabulary words, and long-range dependencies.

“…eir SVM-based classifier obtained 77.6% accuracy. Oriola and Kotze [19] developed an English corpus of South African tweets and applied various machine learning algorithms to detect offensive speech.…”

Section: Related Work (mentioning)

confidence: 99%

Towards a Framework for Acquisition and Analysis of Speeches to Identify Suspicious Contents through Machine Learning

et al. 2020

The most prominent form of human communication and interaction is speech. It plays an indispensable role for expressing emotions, motivating, guiding, and cheering. An ill-intentioned speech can mislead people, societies, and even a nation. A misguided speech can trigger social controversy and can result in violent activities. Every day, there are a lot of speeches being delivered around the world, which are quite impractical to inspect manually. In order to prevent any vicious action resulting from any misguided speech, the development of an automatic system that can efficiently detect suspicious speech has become imperative. In this study, we have presented a framework for acquisition of speech along with the location of the speaker, converting the speeches into texts and, finally, we have proposed a system based on long short-term memory (LSTM) which is a variant of recurrent neural network (RNN) to classify speeches into suspicious and nonsuspicious. We have considered speeches of Bangla language and developed our own dataset that contains about 5000 suspicious and nonsuspicious samples for training and validating our model. A comparative analysis of accuracy among other machine learning algorithms such as logistic regression, SVM, KNN, Naive Bayes, and decision tree is performed in order to evaluate the effectiveness of the system. The experimental results show that our proposed deep learning-based model provides the highest accuracy compared to other algorithms.
