2020
DOI: 10.1109/access.2020.2968173

Evaluating Machine Learning Techniques for Detecting Offensive and Hate Speech in South African Tweets

Abstract: In recent times, South Africa has been witnessing insurgence of offensive and hate speech along racial and ethnic dispositions on Twitter. Popular among the South African languages used is English. Although, machine learning has been successfully used to detect offensive and hate speech in several English contexts, the distinctiveness of South African tweets and the similarities among offensive, hate and free speeches require domain-specific English corpus and techniques to detect the offensive and hate speech…

Cited by 77 publications (36 citation statements)
References 24 publications
“…A non-exhaustive list includes Arabic (Mubarak et al, 2017;Chowdhury et al, 2020), Danish (Sigurbergsson and Derczynski, 2020), German (Jaki and De Smedt, 2019;Wiegand et al, 2018b), Hindi (Saroj and Pal, 2020), Italian (Bosco et al, 2018;Fersini et al, 2018), Polish (Ptaszynski et al, 2019), Portuguese (Fortuna et al, 2019), Dutch (Tulkens et al, 2016), and Slovene (Fišer et al, 2017). There is also some work on specific language variants, like Hindi-English code-switched language (Mathur et al, 2018a;Mathur et al, 2018b) or South African English (Oriola and Kotzé, 2020) … assurance, each test example candidate has been manually checked by the authors, and replaced by another sample if it (i) comprises only a single non-indicative word, (ii) it is not written in English, or (iii) it relies on world knowledge which is too specific or geographically localized or on contextual information which hinders proper translation. The final English XHATE-999 test set comprises 600, 300 and 99 instances from WUL, TRAC, and GAO, respectively.…”
Section: Related Work and Motivation (mentioning)
confidence: 99%
“…Paper [19] is about detecting hate speech in South African tweets using machine learning approach. Twitter is the most used social media in South Africa.…”
Section: Results (mentioning)
confidence: 99%
“…Their SVM-based classifier obtained 77.6% accuracy. Oriola and Kotze [19] developed an English corpus of South African tweets and applied various machine learning algorithms to detect offensive speech.…”
Section: Related Work (mentioning)
confidence: 99%