2014
DOI: 10.1007/978-3-319-05579-4_10
Two 1%s Don’t Make a Whole: Comparing Simultaneous Samples from Twitter’s Streaming API

Cited by 51 publications (43 citation statements)
References 7 publications
“…In the 2012-2013 period, this collection contains on average about 132 million tweets (amounting to 38 GB of compressed data) per month. The quality of Twitter data samples acquired via the publicly available APIs that offer limited access to the full Twitter stream has been studied extensively, to understand the nature of the biases of such data samples [18,19,25,31,32]. Yet, while [32] have shown biases with respect to hashtag and topic prevalence in the Streaming API (which we do not use in this study), [31] shows that the data obtained via the Sample API closely resemble the random samples over the full Twitter stream, which corroborates the specifications of this API.…”
Section: Data Sampling
confidence: 99%
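The excerpt above contrasts two findings: the Streaming API can bias hashtag and topic prevalence, while the Sample API closely resembles a random sample of the full stream. A minimal sketch of one way such a comparison can be made is to check how much the most prevalent hashtags in two simultaneously collected samples overlap; the function below (an illustration, not the method used in the cited studies) computes the Jaccard overlap of the top-k hashtags in each sample.

```python
from collections import Counter


def hashtag_overlap(sample_a, sample_b, top_k=100):
    """Crude prevalence comparison for two simultaneous tweet samples.

    sample_a, sample_b: lists of hashtag strings, one entry per occurrence.
    Returns the Jaccard overlap of each sample's top_k hashtags: 1.0 means
    both samples surface the same popular hashtags, 0.0 means none agree.
    """
    top_a = {h for h, _ in Counter(sample_a).most_common(top_k)}
    top_b = {h for h, _ in Counter(sample_b).most_common(top_k)}
    union = top_a | top_b
    if not union:
        return 1.0  # both samples empty: trivially identical
    return len(top_a & top_b) / len(union)
```

Two unbiased 1% samples of the same stream should score close to 1.0 on popular hashtags; persistently low overlap would suggest the sampling mechanisms are not drawing from the stream in the same way.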
“…In the 2012-2013 period, this collection contains on average about 132 million tweets (amounting to 38 GB of compressed data) per month. The quality of Twitter data samples acquired via the publicly available APIs that offer limited access to the full Twitter stream has been studied extensively, to understand the nature of the biases of such data samples [18,19,25,31,32]. Yet, while [32] have shown biases with respect to hashtag and topic prevalence in the Streaming API (which we do not use in this study), [31] shows that the data obtained via the Sample API closely resemble the random samples over the full Twitter stream, which corroborates the specifications of this API.…”
“…The Search API does not provide 100% of all tweets from the Twitter Firehose, but its results are typically representative of those obtained from the Firehose. 20 NCapture saves all recent tweets matching the search criteria to a dataset (with “recent” being defined by Twitter’s internal algorithms); therefore the search was repeated every day over the 6-week period and duplicate tweets were deleted so that each tweet appears only once in the dataset. The tweets were imported into the NVivo qualitative data analysis program and were content coded by two human coders independently (Cohen’s kappa=0.84, indicating strong agreement).…”
Section: Methods
confidence: 99%
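The excerpt above reports inter-coder agreement as Cohen's kappa = 0.84. As a reminder of what that statistic measures, here is a minimal sketch computing kappa from two coders' label sequences: observed agreement corrected for the agreement expected by chance from each coder's marginal label frequencies. This is a generic illustration, not the cited study's code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability the coders match if each labelled
    # independently according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

A kappa of 0.84, as reported, sits well above the common 0.8 threshold for strong agreement, since kappa ranges from 1.0 (perfect agreement) down through 0.0 (chance-level).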
“…They pointed out that these issues did not receive much attention from the reviewed studies and called for improved documentation of data sampling parameters and in-depth investigation of applied sentiment analysis methods. Other studies focused especially on the data sampling applied in Twitter research and questioned the quality of the data retrieved from Twitter [10], [11]. Following these criticisms, in this paper, we investigate the Twitter prediction research process in more detail and shed light on the involved actors and decisions.…”
Section: Related Work
confidence: 99%