2016
DOI: 10.1007/978-3-319-46562-3_12

Harnessing the Power of Text Mining for the Detection of Abusive Content in Social Media

Abstract: The issues of cyberbullying and online harassment have gained considerable coverage in recent years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouTube, MySpace, Kongregate, Fo…

citations
Cited by 27 publications
(15 citation statements)
references
References 22 publications
(32 reference statements)
“…In addition, our dataset is unbalanced, with the 'abusive' comments and 'undecided' comments occurring far less than 'nonabusive' comments (see Table 2). Therefore, we applied resampling [3] to randomly oversample the minority instances before training the model. To minimise the impact of conducting random oversampling, we resampled the training data 3 times, and used the average performance across the 3 result sets.…”
Section: Methods | mentioning
confidence: 99%
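
The resampling step quoted above can be illustrated with a minimal sketch. It is not the citing authors' code: the function names, the seeding scheme, and the `train_and_score` callback are hypothetical, and the sketch assumes that "randomly oversample" means duplicating randomly chosen minority-class items until every class matches the largest one. Only the three repeats and the averaging come from the quoted text.

```python
import random
from collections import defaultdict
from statistics import mean


def random_oversample(samples, labels, seed):
    """Duplicate randomly chosen items of smaller classes until all classes
    are as large as the largest class (assumed balancing target)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw with replacement to fill the gap up to the target size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def average_over_resamples(samples, labels, train_and_score, repeats=3):
    """Resample the training data `repeats` times and average the scores.

    `train_and_score` is a hypothetical callback that trains a model on the
    resampled data and returns one numeric performance value.
    """
    scores = []
    for seed in range(repeats):
        xs, ys = random_oversample(samples, labels, seed)
        scores.append(train_and_score(xs, ys))
    return mean(scores)
```

Averaging over several resampled training sets reduces the dependence of the reported score on which minority items happened to be duplicated in a single draw.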
“…Given that comments in the dataset are typically conversational in style and short, we did not apply stemming or stop word removal. Following results from our previous work [3], Document Frequency (DF) feature reduction was used to cut down on high or low frequency occurrence terms without jeopardising model performance. Term Frequency (TF) was used to normalise the feature values.…”
Section: Content-based Features | mentioning
confidence: 99%
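
The feature pipeline in the statement above (no stemming or stop word removal, Document Frequency reduction, Term Frequency values) can be sketched as follows. This is an illustration under stated assumptions, not the cited method: the whitespace tokenizer, the `min_df` and `max_df_ratio` thresholds, and the reading of TF normalisation as counts divided by comment length are all hypothetical choices, since the quoted text gives no cut-offs or formulas.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; no stemming or stop word removal."""
    return text.lower().split()


def select_vocabulary(docs, min_df=2, max_df_ratio=0.9):
    """Keep terms whose document frequency is neither too low nor too high.

    Both thresholds are assumed values chosen for illustration.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document.
        doc_freq.update(set(tokenize(doc)))
    return sorted(
        term
        for term, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    )


def tf_vector(doc, vocabulary):
    """Counts of kept terms divided by the number of kept tokens in the doc
    (one possible reading of TF normalisation)."""
    counts = Counter(t for t in tokenize(doc) if t in vocabulary)
    total = sum(counts.values())
    return [counts[t] / total if total else 0.0 for t in vocabulary]
```

For example, `select_vocabulary` would be fitted on the training comments only, and `tf_vector` then applied to each training and test comment with that fixed vocabulary.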
“…Nowadays, a considerable number of research lines and applications require accurate text classification. For example, spam emails detection (Mohamad and Selamat, 2015; Wang et al, 2011), abusive text detection (Chen et al, 2017), sentiment analysis (Pang et al, 2008), detecting suspicious activities in the Darknet network, and classifying illegal content in the Online Notepad Services (ONS). However, several factors collaborate to make this task challenging, such as the possibility of access to a sufficient number of labeled examples (Ohana et al, 2012), and the quality and the length of the addressed text (Li et al, 2016b).…”
Section: Text Classification | mentioning
confidence: 99%