Cody Buntain scite author profile

A fundamental part of conducting cross-disciplinary web science research is having useful, high-quality datasets that provide value to studies across disciplines. In this paper, we introduce a large, handcoded corpus of online harassment data. A team of researchers collaboratively developed a codebook using grounded theory and labeled 35,000 tweets. Our resulting dataset has roughly 15% positive harassment examples and 85% negative examples. This data is useful for training machine learning models, identifying textual and linguistic features of online harassment, and for studying the nature of harassing comments and the culture of trolling.
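As a rough illustration of what the 15%/85% split means for model training, the sketch below turns the abstract's figures into approximate class counts and inverse-frequency class weights. The abstract says "roughly 15%", so the counts are approximations, and the function names and the weighting scheme are assumptions for illustration, not something the paper specifies.

```python
# Approximate class counts and balanced class weights for the corpus.
# Only the total (35,000) and the ~15% positive rate come from the abstract;
# the weighting formula is an illustrative assumption.

def class_counts(total, positive_rate):
    """Return (positives, negatives) implied by a total and a positive rate."""
    positives = round(total * positive_rate)
    return positives, total - positives


def balanced_weights(positives, negatives):
    """Inverse-frequency weights: total / (number_of_classes * class_count)."""
    total = positives + negatives
    return {
        "harassment": total / (2 * positives),
        "not_harassment": total / (2 * negatives),
    }


pos, neg = class_counts(35_000, 0.15)
weights = balanced_weights(pos, neg)
print(pos, neg, weights)
```

With weights like these, each harassment example counts several times more than a negative example in a weighted loss, which is one common way to offset this kind of imbalance.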


Automatically Identifying Fake News in Popular Twitter Threads

Buntain

Golbeck

2017

159


Information quality in social media is an increasingly important issue, but web-scale data hinders experts' ability to assess and correct much of the inaccurate content, or "fake news," present in these platforms. This paper develops a method for automating fake news detection on Twitter by learning to predict accuracy assessments in two credibility-focused Twitter datasets: CREDBANK, a crowdsourced dataset of accuracy assessments for events in Twitter, and PHEME, a dataset of potential rumors in Twitter and journalistic assessments of their accuracies. We apply this method to Twitter content sourced from BuzzFeed's fake news dataset and show models trained against crowdsourced workers outperform models based on journalists' assessment and models trained on a pooled dataset of both crowdsourced workers and journalists. All three datasets, aligned into a uniform format, are also publicly available. A feature analysis then identifies features that are most predictive for crowdsourced and journalistic accuracy assessments, results of which are consistent with prior work. We close with a discussion contrasting accuracy and credibility and why models of nonexperts outperform models of journalists for fake news detection in Twitter.


Cross-Platform State Propaganda: Russian Trolls on Twitter and YouTube during the 2016 U.S. Presidential Election

Golovchenko

Buntain

Eady

et al. 2020

The International Journal of Press/Politics


This paper investigates online propaganda strategies of the Internet Research Agency (IRA)—Russian “trolls”—during the 2016 U.S. presidential election. We assess claims that the IRA sought either to (1) support Donald Trump or (2) sow discord among the U.S. public by analyzing hyperlinks contained in 108,781 IRA tweets. Our results show that although IRA accounts promoted links to both sides of the ideological spectrum, “conservative” trolls were more active than “liberal” ones. The IRA also shared content across social media platforms, particularly YouTube—the second-most linked destination among IRA tweets. Although overall news content shared by trolls leaned moderate to conservative, we find troll accounts on both sides of the ideological spectrum, and these accounts maintain their political alignment. Links to YouTube videos were decidedly conservative, however. While mixed, this evidence is consistent with the IRA’s supporting the Republican campaign, but the IRA’s strategy was multifaceted, with an ideological division of labor among accounts. We contextualize these results as consistent with a pre-propaganda strategy. This work demonstrates the need to view political communication in the context of the broader media ecology, as governments exploit the interconnected information ecosystem to pursue covert propaganda strategies.


Content-based features predict social media influence operations

et al. 2020


We study how easy it is to distinguish influence operations from organic social media activity by assessing the performance of a platform-agnostic machine learning approach. Our method uses public activity to detect content that is part of coordinated influence operations based on human-interpretable features derived solely from content. We test this method on publicly available Twitter data on Chinese, Russian, and Venezuelan troll activity targeting the United States, as well as the Reddit dataset of Russian influence efforts. To assess how well content-based features distinguish these influence operations from random samples of general and political American users, we train and test classifiers on a monthly basis for each campaign across five prediction tasks. Content-based features perform well across period, country, platform, and prediction task. Industrialized production of influence campaign content leaves a distinctive signal in user-generated content that allows tracking of campaigns from month to month and across different accounts.


Identifying social roles in reddit using network structure

Buntain

Golbeck

2014


YouTube Recommendations and Effects on Sharing Across Online Social Platforms

Buntain

Bonneau

Nagler

et al. 2021

Proc. ACM Hum.-Comput. Interact.


In January 2019, YouTube announced its platform would exclude potentially harmful content from video recommendations while allowing such videos to remain on the platform. While this action is intended to reduce YouTube's role in propagating such content, continued availability of these videos via hyperlinks in other online spaces leaves an open question of whether such actions actually impact sharing of these videos in the broader information space. This question is particularly important as other online platforms deploy similar suppressive actions that stop short of deletion despite limited understanding of such actions' impacts. To assess this impact, we apply interrupted time series models to measure whether sharing of potentially harmful YouTube videos in Twitter and Reddit changed significantly in the eight months around YouTube's announcement. We evaluate video sharing across three curated sets of anti-social content: a set of conspiracy videos that have been shown to experience reduced recommendations in YouTube, a larger set of videos posted by conspiracy-oriented channels, and a set of videos posted by alternative influence network (AIN) channels. As a control, we also evaluate these effects on a dataset of videos from mainstream news channels. Results show conspiracy-labeled and AIN videos that have evidence of YouTube's de-recommendation do experience a significant decreasing trend in sharing on both Twitter and Reddit. At the same time, however, videos from conspiracy-oriented channels actually experience a significant increase in sharing on Reddit following YouTube's intervention, suggesting these actions may have unintended consequences in pushing less overtly harmful conspiratorial content. Mainstream news sharing likewise sees increases in trend on both platforms, suggesting YouTube's suppression of particular content types has a targeted effect. 
In summary, while this work finds evidence that reducing exposure to anti-social videos within YouTube potentially reduces sharing on other platforms, increases in the level of conspiracy-channel sharing raise concerns about how producers -- and consumers -- of harmful content are responding to YouTube's changes. Transparency from YouTube and other platforms implementing similar strategies is needed to evaluate these effects further.
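The interrupted time series idea described in this abstract can be sketched as a segmented linear regression with a level change and a trend change at the intervention point. The abstract does not give the paper's exact model specification, so the model form, the function names, and the synthetic weekly series below are all assumptions for illustration, not the authors' data or code.

```python
# Minimal interrupted time series sketch (segmented regression, pure Python).
# Assumed model: y_t = b0 + b1*t + b2*post_t + b3*(t - t0)*post_t
#   b2 = change in level at the intervention, b3 = change in trend after it.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_its(y, t0):
    """Ordinary least squares fit of the segmented model via normal equations."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= t0 else 0.0
        rows.append([1.0, float(t), post, (t - t0) * post])
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic, noise-free weekly share counts (invented for this example):
# baseline 100 rising by 2 per week, then a drop of 10 and a slope change of -3.
t0 = 16
series = [
    100 + 2 * t + (-10 - 3 * (t - t0) if t >= t0 else 0)
    for t in range(32)
]
coefs = fit_its(series, t0)
print([round(c, 6) for c in coefs])
```

In this framing, a negative trend-change coefficient would correspond to the "significant decreasing trend in sharing" reported for de-recommended videos, while a positive one would correspond to the increase seen for conspiracy-oriented channels on Reddit. A real analysis would also need noise, autocorrelation handling, and significance tests, which this sketch omits.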


Fake News vs Satire

Golbeck

Mauriello

Auxier

et al. 2018


Identifying social media user demographics and topic diversity with computational social science: a case study of a major international policy forum

et al. 2020


When the world’s countries agreed on the 2030 Agenda for Sustainable Development, they recognized that equity and inclusion should be at the center of implementing the 17 Sustainable Development Goals (SDGs). SDG 15, which calls for protecting, restoring, and promoting the sustainable use of terrestrial ecosystems, has spurred commitments to restore 350 million hectares of land by 2030. These commitments, primarily made in a top-down manner at the international scale, must be implemented by actively engaging individual landholders and local communities. Ensuring that diverse and marginalized audiences are engaged in the land restoration movement is critical to equitably distributing the economic benefits of restoration. This publication uses social network analysis and machine learning to understand how important the voices of Africans, women, and young people are in governing restoration in Africa. We analyze location- and machine learning-identified demographics from Twitter data collected during the Global Landscapes Forum (GLF), which is the world’s largest platform for promoting sustainable land use practices. Our results suggest that convening the GLF in Nairobi, Kenya elevated the voices of African leaders in comparison to the previous GLF in Bonn, Germany. We also found significant demographic differences in topic-level engagement between different ages, races, and genders. The primary contributions of this paper are a novel methodology for quantifying demographic differences in social media engagement and the application of social media and social network analysis to provide critical insights into the inclusivity of a large political conference aimed at engaging youth and African voices.



A Large Labeled Corpus for Online Harassment Research
