National governments now recognize online hate speech as a pernicious social problem. In the wake of political votes and terror attacks, hate incidents online and offline are known to peak in tandem. This article examines whether an association exists between the two forms of hate independent of such ‘trigger’ events. Using a computational criminology approach that draws on data science methods, we link police crime, census and Twitter data to establish a temporal and spatial association between online hate speech targeting race and religion and offline racially and religiously aggravated crimes in London over an eight-month period. The findings renew our understanding of hate crime as a process, rather than a discrete event, for the digital age.
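The core of the temporal association described above is aggregating both data streams to a common time unit and testing whether their daily counts co-vary. The sketch below is purely illustrative, assuming synthetic daily counts (the paper's actual police, census and Twitter data and its statistical tests are not reproduced here):

```python
from datetime import date, timedelta
import random

# Synthetic stand-ins: daily counts of online hate tweets and offline
# hate crimes over a study window. In the paper these come from Twitter
# and police-recorded crime data respectively.
random.seed(0)
days = [date(2024, 1, 1) + timedelta(d) for d in range(60)]
online = [random.randint(20, 40) for _ in days]            # hate tweets per day
offline = [o // 2 + random.randint(0, 5) for o in online]  # hate crimes per day

def pearson(x, y):
    """Pearson correlation between two equal-length count series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(online, offline)
```

A positive `r` on aligned daily counts is the simplest form of the temporal association the article investigates; the spatial dimension would additionally group counts by area before correlating.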
In this paper we present a proposal to address the problem of costly and unreliable human annotation, which is central to detecting hate speech in web content. In particular, we propose to use text produced by accounts suspended in the aftermath of a hateful event as a subtle and reliable source for hate speech prediction. The proposal was motivated by an emotion analysis of three data sets: suspended, active and neutral, i.e. the first two contain hateful tweets from suspended and active accounts, respectively, whereas the third contains neutral tweets only. The emotion analysis indicated that tweets from suspended accounts express more disgust, fear, sadness and overall negative emotion than those from active accounts, even though tweets from both types of accounts might be annotated as hateful by human annotators. We train two Random Forest classifiers based on the semantic meaning of tweets from suspended and active accounts, respectively, and evaluate the prediction accuracy of the two classifiers on unseen data. The results show that the classifier trained on tweets from suspended accounts outperformed the one trained on tweets from active accounts by 16% in overall F-score.
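The experimental setup above — two Random Forest classifiers, one per tweet source, evaluated on shared unseen data — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the feature extraction, so TF-IDF is used here as a stand-in for "semantic meaning", and the tweet strings are invented placeholders:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Placeholder corpora; in the paper these are real tweets from suspended
# accounts, active accounts, and a neutral set.
suspended_train = ["hateful tweet from suspended account", "another suspended hateful tweet"]
active_train = ["hateful tweet from active account", "another active hateful tweet"]
neutral_train = ["neutral tweet about sports", "neutral tweet about weather"]

def train(hateful, neutral):
    """Fit a text -> {0: neutral, 1: hateful} Random Forest pipeline."""
    X = hateful + neutral
    y = [1] * len(hateful) + [0] * len(neutral)
    model = make_pipeline(TfidfVectorizer(),
                          RandomForestClassifier(n_estimators=100, random_state=0))
    model.fit(X, y)
    return model

clf_suspended = train(suspended_train, neutral_train)
clf_active = train(active_train, neutral_train)

# Both classifiers are scored on the same unseen tweets; the paper
# compares their F-scores on such held-out data.
unseen = ["a new hateful tweet", "a new neutral tweet"]
```

With real data, comparing `f1_score` for the two classifiers on `unseen` reproduces the paper's evaluation design.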
Massive online social networks with hundreds of millions of active users are increasingly being used by cybercriminals to spread malicious software (malware) that exploits vulnerabilities on users' machines for personal gain. Twitter is particularly susceptible to such activity: with its 140-character limit, it is common for people to include URLs in their tweets to link to more detailed information, evidence, news reports and so on. URLs are often shortened, so the endpoint is not obvious before a person clicks the link. Cybercriminals can exploit this to propagate malicious URLs on Twitter, for which the endpoint is a malicious server that performs unwanted actions on the person's machine. This is known as a drive-by download. In this paper we develop a machine classification system to distinguish between malicious and benign URLs within seconds of the URL being clicked (i.e. in 'real time'). We train the classifier using machine activity logs created while interacting with URLs extracted from Twitter data collected during a large global event, the Super Bowl, and test it using data from another large sporting event, the Cricket World Cup. The results show that machine activity logs produce precision of up to 0.975 on training data from the first event and 0.747 on test data from the second event. Furthermore, we examine the properties of the learned model to explain the relationship between machine activity and malicious software behaviour, and build a learning curve for the classifier to illustrate that very small samples of training data can be used with only a small detriment to performance.
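The key idea — classifying a URL from the machine activity observed in the seconds after it is opened — can be sketched as below. The feature names (`cpu_pct`, `bytes_sent`, `new_processes`) and the training values are illustrative assumptions, not the paper's actual log schema, and the abstract does not name the classifier family, so a Random Forest is used here as one plausible choice:

```python
from sklearn.ensemble import RandomForestClassifier

# Each row is a machine activity snapshot logged shortly after a URL is
# clicked: [cpu_pct, bytes_sent, new_processes]. Values are invented.
X_train = [
    [12.0,   1_500, 0],  # ordinary browsing
    [9.5,    2_200, 0],
    [85.0, 950_000, 4],  # heavy activity typical of a drive-by download
    [78.0, 640_000, 3],
]
y_train = [0, 0, 1, 1]   # 0 = benign URL, 1 = malicious URL

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

# At click time, the live activity log is scored within seconds,
# analogous to testing on data from a second event.
verdict = clf.predict([[80.0, 700_000, 3]])[0]
```

The paper's cross-event evaluation corresponds to training on logs from one event and calling `predict` on logs gathered during a different one.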
Twitter has emerged as one of the most popular platforms for updates on entertainment and current events. However, due to its 280-character restriction and automatic shortening of URLs, it is continually targeted by cybercriminals to carry out drive-by download attacks, in which a user's system is infected by merely visiting a Web page. Popular events that attract large numbers of users are exploited by cybercriminals to infect machines and propagate malware through popular hashtags and misleading tweets that lure users to malicious Web pages. A drive-by download attack is carried out by obfuscating a malicious URL in an enticing tweet used as clickbait. In this paper we answer two questions: Why are certain malicious tweets retweeted more than others? Do the emotions reflected in a tweet drive virality? We gathered tweets from seven sporting events over three years and identified those used to carry out drive-by download attacks. From the malicious (N=105,642) and benign (N=169,178) samples identified, we built models to predict information flow size and survival. We define size as the number of retweets of an original tweet, and survival as the duration of the original tweet's presence in the study window. We selected the zero-truncated negative binomial (ZTNB) regression method for our analysis based on the distribution exhibited by our dependent size measure and a comparison of results with other predictive models. We used the Cox regression technique to model the survival of information flows, as it estimates proportional hazard rates for independent measures. Our results show that both social and content factors are statistically significant for the size and survival of information flows for both malicious and benign tweets. In the benign sample, positive emotions and positive sentiment reflected in the tweet significantly predict size and survival.
In contrast, for the malicious data sample, negative emotions, especially fear, are associated with both the size and survival of information flows.

CCS Concepts: • Security and privacy → Social network security and privacy.
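The choice of ZTNB regression reflects the dependent variable: a retweeted tweet has a size of at least one, so the zero class must be removed from the count distribution and the remaining mass renormalised. A minimal sketch of that distribution, assuming the common (r, p) parameterisation of the negative binomial (the paper's exact parameterisation and covariate structure are not given in the abstract):

```python
from math import comb

def nb_pmf(k, r, p):
    """Negative binomial pmf: probability of k failures before the r-th success."""
    return comb(k + r - 1, k) * (p ** r) * ((1 - p) ** k)

def ztnb_pmf(k, r, p):
    """Zero-truncated NB pmf: condition on k >= 1 by removing the zero
    class and renormalising, matching a size measure that starts at 1."""
    if k < 1:
        return 0.0
    return nb_pmf(k, r, p) / (1.0 - nb_pmf(0, r, p))
```

A ZTNB regression then models the parameters of this truncated distribution as functions of the social and content covariates, while the Cox model handles the survival outcome separately via proportional hazards.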