Hate speech, offensive language, sexism, racism and other types of abusive behavior have become a common phenomenon in many online social media platforms. In recent years, such diverse abusive behaviors have been manifesting with increased frequency and levels of intensity. This is due to the openness and willingness of popular media platforms, such as Twitter and Facebook, to host content on sensitive or controversial topics. However, these platforms have not adequately addressed the problem of online abusive behavior, and their responsiveness to the effective detection and blocking of such inappropriate behavior remains limited. In fact, up to now, they have entered an arms race with the perpetrators, who constantly change tactics to evade the detection algorithms deployed by these platforms. Such algorithms are typically custom-designed and tuned to detect only one specific type of abusive behavior, but usually miss other related behaviors. In the present paper, we study this complex problem by following a more holistic approach, which considers the various aspects of abusive behavior. To make the approach tangible, we focus on Twitter data and analyze user and textual properties from different angles of abusive posting behavior. We propose a deep learning architecture, which utilizes a wide variety of available metadata, and combines it with automatically extracted hidden patterns within the text of the tweets, to detect multiple abusive behavioral norms which are highly inter-related. We apply this unified architecture in a seamless, transparent fashion to detect different types of abusive behavior (hate speech, sexism vs. racism, bullying, sarcasm, etc.) without the need for any tuning of the model architecture for each task. We test the proposed approach with multiple datasets addressing different and multiple abusive behaviors on Twitter.
Our results demonstrate that it largely outperforms state-of-the-art methods (between 21% and 45% improvement in AUC, depending on the dataset).
In recent years, online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifesting itself through offensive, abusive, or hateful language. Past scientific work focused on studying these forms of abusive activity in popular online social networks, such as Facebook and Twitter. Building on such work, we present an eight-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior. We propose an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels. By applying our methodology and performing statistical analysis for label merging or elimination, we identify a reduced but robust set of labels to characterize abuse-related tweets. Finally, we offer a characterization of our annotated dataset of 80 thousand tweets, which we make publicly available for further scientific exploration.
Smart cities (SCs) are becoming highly sophisticated ecosystems in which innovative solutions and smart services are being deployed. These ecosystems consider SCs as data production and sharing engines, setting new challenges for building effective SC architectures and novel services. The aim of this article is to “connect the pieces” among Data Science and SC domains, with a systematic literature review which identifies the core topics, services, and methods applied in SC data monitoring. The survey focuses on data harvesting and data mining processes over repeated SC data cycles. A survey protocol is followed to reach both quantitative and semantically important entities. The review results generate useful taxonomies for data scientists in the SC context, which offer clear guidelines for corresponding future works. In particular, a taxonomy is proposed for each of the main SC data entities, namely, the “D Taxonomy” for data production, the “M Taxonomy” for data analytics methods, and the “S Taxonomy” for smart services. Each of these taxonomies clearly places entities in a classification which is beneficial for multiple stakeholders and for multiple domains in urban smartness targeting. Such indicative scenarios are outlined and conclusions are quite promising for systemizing.