2014
DOI: 10.1007/978-3-319-05579-4_10
Two 1%s Don’t Make a Whole: Comparing Simultaneous Samples from Twitter’s Streaming API

Cited by 51 publications (43 citation statements)
References 7 publications
“…In the 2012-2013 period, this collection contains on average about 132 million tweets (amounting to 38 GB of compressed data) per month. The quality of Twitter data samples acquired via the publicly available APIs that offer limited access to the full Twitter stream has been studied extensively, to understand the nature of the biases of such data samples [18,19,25,31,32]. Yet, while [32] have shown biases with respect to hashtag and topic prevalence in the Streaming API (which we do not use in this study), [31] shows that the data obtained via the Sample API closely resemble the random samples over the full Twitter stream, which corroborates the specifications of this API.…”
Section: Data Sampling
confidence: 99%
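The excerpt above contrasts two findings: the Streaming API can bias hashtag and topic prevalence, while the Sample API closely resembles a random sample of the full stream. A minimal sketch of one way such a comparison can be made is to check how much the most prevalent hashtags in two simultaneously collected samples overlap; the function below (an illustration, not the method used in the cited studies) computes the Jaccard overlap of the top-k hashtags in each sample.

```python
from collections import Counter


def hashtag_overlap(sample_a, sample_b, top_k=100):
    """Crude prevalence comparison for two simultaneous tweet samples.

    sample_a, sample_b: lists of hashtag strings, one entry per occurrence.
    Returns the Jaccard overlap of each sample's top_k hashtags: 1.0 means
    both samples surface the same popular hashtags, 0.0 means none agree.
    """
    top_a = {h for h, _ in Counter(sample_a).most_common(top_k)}
    top_b = {h for h, _ in Counter(sample_b).most_common(top_k)}
    union = top_a | top_b
    if not union:
        return 1.0  # both samples empty: trivially identical
    return len(top_a & top_b) / len(union)
```

Two unbiased 1% samples of the same stream should score close to 1.0 on popular hashtags; persistently low overlap would suggest the sampling mechanisms are not drawing from the stream in the same way.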
“…In the 2012-2013 period, this collection contains on average about 132 million tweets (amounting to 38 GB of compressed data) per month. The quality of Twitter data samples acquired via the publicly available APIs that offer limited access to the full Twitter stream has been studied extensively, to understand the nature of the biases of such data samples [18,19,25,31,32]. Yet, while [32] have shown biases with respect to hashtag and topic prevalence in the Streaming API (which we do not use in this study), [31] shows that the data obtained via the Sample API closely resemble the random samples over the full Twitter stream, which corroborates the specifications of this API.…”
“…The Search API does not provide 100% of all tweets from the Twitter Firehose, but its results are typically representative of those obtained from the Firehose. 20 NCapture saves all recent tweets matching the search criteria to a dataset (with “recent” being defined by Twitter’s internal algorithms); therefore the search was repeated every day over the 6-week period and duplicate tweets were deleted so that each tweet appears only once in the dataset. The tweets were imported into the NVivo qualitative data analysis program and were content coded by two human coders independently (Cohen’s kappa=0.84, indicating strong agreement).…”
Section: Methods
confidence: 99%
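The excerpt above reports inter-coder agreement as Cohen's kappa = 0.84. As a reminder of what that statistic measures, here is a minimal sketch computing kappa from two coders' label sequences: observed agreement corrected for the agreement expected by chance from each coder's marginal label frequencies. This is a generic illustration, not the cited study's code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items both coders labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability the coders match if each labelled
    # independently according to their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

A kappa of 0.84, as reported, sits well above the common 0.8 threshold for strong agreement, since kappa ranges from 1.0 (perfect agreement) down through 0.0 (chance-level).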
“…They pointed out that these issues did not receive much attention from the reviewed studies and called for improved documentation of data sampling parameters and in-depth investigation of applied sentiment analysis methods. Other studies focused especially on the data sampling applied in Twitter research and questioned the quality of the data retrieved from Twitter [10], [11]. Following these criticisms, in this paper, we investigate the Twitter prediction research process in more detail and shed light on the involved actors and decisions.…”
Section: Related Work
confidence: 99%