Automatic Classification of Abusive Language and Personal Attacks in Various Forms of Online Communication

Bourgonje, Peter; Schneider, Julián Moreno; Srivastava, Ankit; Rehm, Georg

doi:10.1007/978-3-319-73706-5_15

Cited by 24 publications

(19 citation statements)

References 16 publications

“…In (Wang et al 2017), naive Bayesian classifier demonstrated the worst classification quality. In another study (Bourgonje et al 2018), the relatively high quality of this model was observed only with a relatively large length of texts.…”

Section: Naive Bayesian Classifier (mentioning)

confidence: 87%

“…It was noted that the average length of texts affects the result of classification. In (Bourgonje et al 2018) the authors deal with publications in social network Twitter and articles from Wikipedia with an average length of 18 and 65 words respectively. As a result of experiments, it became obvious that different classifiers give best result for different average length of texts.…”

Section: Influence Of the Length Of Text On The Quality Of Classifica… (mentioning)

confidence: 99%

“…Based on works (Abuhaiba and Dawoud, 2017;Bourgonje et al 2018;Liu et al 2017;Semberecki and Maciejewski, 2017), it is possible to identify the models of machine learning that are most suitable for classification of textual data. Such models are: logistic regression, random forest, SVM, and artificial neural network (both feedforward and LSTM).…”

Section: Influence Of Number Of Classes On the Quality Of Classification (mentioning)

confidence: 99%

Application of Natural Language Processing Algorithms to the Task of Automatic Classification of Russian Scientific Texts

Romanov

Lomotin

Kozlova

2019

Data Science Journal

This work is devoted to the study of applicability of modern methods of machine learning to the task of automatic classification of scientific articles and abstracts. For this purpose, the study of such models of machine learning as artificial neural networks, random forest, logistic regression, and support vector machine was carried out with taking into account such a feature of scientific texts as a large number of terms specific for various categories. Separately, the stages of data collection and extraction of text characteristics are considered. The results of research are used in development of a decision support system for assignment of scientific texts to the code of the department or abstract journal of All-Russian Institute of Scientific and Technical Information of Russian Academy of Sciences.

“…From an NLP perspective, the challenge of dealing with this problem is further exemplified by the fact that annotated data is hard to find, and, if present, exhibits rather low inter-annotator agreement. Approaching the "abusive language" and "hate speech" problem from an NLP angle (Bourgonje et al, 2017), (Ross et al, 2016) introduce a German corpus of tweets and annotate it for hate speech, resulting in figures for Krippendorff's α between 0.18 and 0.29, (Waseem, 2016) compare amateur (CrowdFlower) annotations and expert annotations on an English corpus of Tweets and report figures for Cohen's Kappa of 0.14, (Van Hee et al, 2015) use a Dutch corpus annotated for cyberbullying and report Kappa scores between 0.19 and 0.69, and (Kwok and Wang, 2013) investigate English racist tweets and report an overall interannotator agreement of only 33%.…”

Section: Related Work (mentioning)

confidence: 99%

From Clickbait to Fake News Detection: An Approach based on Detecting the Stance of Headlines to Articles

Bourgonje

Schneider

Rehm

2017

Proceedings of the 2017 EMNLP Workshop: Natural Language Processing Meets Journalism

Self Cite

We present a system for the detection of the stance of headlines with regard to their corresponding article bodies. The approach can be applied in fake news, especially clickbait detection scenarios. The component is part of a larger platform for the curation of digital content; we consider veracity and relevancy an increasingly important part of curating online information. We want to contribute to the debate on how to deal with fake news and related online phenomena with technological means, by providing means to separate related from unrelated headlines and further classifying the related headlines. On a publicly available data set annotated for the stance of headlines with regard to their corresponding article bodies, we achieve a (weighted) accuracy score of 89.59.

“…3); the concept has been devised in a research and technology transfer project, in which smart technologies for curating large amounts of digital content are being developed and applied by companies that cover different sectors including journalism (Rehm and Sasaki 2015;Bourgonje et al 2016a,b;Rehm et al 2017). Among others, we currently develop services aimed at the detection and classification of abusive language (Bourgonje et al 2017a) and clickbait content (Bourgonje et al 2017b). The proposed hybrid infrastructure combines automatic language technology components and user-generated annotations and is meant to empower internet users better to handle the modern online media phenomena mentioned above.…”

Section: Introduction (mentioning)

confidence: 99%

An Infrastructure for Empowering Internet Users to Handle Fake News and Other Online Media Phenomena

Rehm

2018

Lecture Notes in Computer Science

Self Cite
