2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014) 2014
DOI: 10.1109/asonam.2014.6921610

On the endogenesis of Twitter's Spritzer and Gardenhose sample streams

Citations: cited by 30 publications (26 citation statements)
References: 14 publications
“…Lazer et al [41] revealed that Google does not store the search term typed by the user but the search term selected based on suggestions, which has tremendous implications for the analysis of human behavior based on those data. Our work focuses on issues resulting from sampling [42,43] of Twitter data. Since Twitter does not reveal how data sampling is performed, the use of Twitter data is generally regarded as highly problematic, especially in the social sciences [42,44-46].…”
Section: Related Work (mentioning)
confidence: 99%
“…Our work focuses on issues resulting from sampling [42,43] of Twitter data. Since Twitter does not reveal how data sampling is performed, the use of Twitter data is generally regarded as highly problematic, especially in the social sciences [42,44-46]. Several studies discuss working, compositions and possible biases of data [47,48] and a "reverse-engineered" model has been developed for the Sample API, which indicates that the sampling is based on a millisecond time window and that the timestamp at which the Tweet arrived at Twitter's servers is coded into the Tweet's ID [42,43].…”
Section: Related Work (mentioning)
confidence: 99%
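
The statement above says the arrival timestamp is coded into the Tweet's ID and that sampling depends on a millisecond time window. A minimal sketch of how such a check could look follows, assuming the ID uses Twitter's Snowflake layout (timestamp in the bits above the lowest 22, counted from the Snowflake epoch). The function names and the window bounds are hypothetical and supplied by the caller; the quoted statement gives no concrete bounds.

```python
# Twitter's Snowflake epoch in Unix milliseconds (2010-11-04T01:42:54.657Z).
TWITTER_EPOCH_MS = 1288834974657


def tweet_id_to_ms(tweet_id: int) -> int:
    """Return the Unix timestamp (ms) stored in the upper bits of a Snowflake ID."""
    # The lowest 22 bits hold machine and sequence numbers; drop them.
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def in_sample_window(tweet_id: int, start_ms: int, end_ms: int) -> bool:
    """Check whether the millisecond-of-second lies in [start_ms, end_ms].

    start_ms and end_ms are assumptions passed in by the caller, not
    values taken from the citation statement.
    """
    return start_ms <= tweet_id_to_ms(tweet_id) % 1000 <= end_ms
```

This applies only to IDs issued under the Snowflake scheme; older sequential IDs carry no timestamp.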
“…Figure 1(a) shows a typical Twitter-Day for English content of our data set. We are able to show the amount of total tweets in the firehose (i.e., the stream of all public tweets), taking advantage of the nature of Twitter's sample stream that we described in [19] and also captured for this analysis. Also the proportion of our data set can be derived from the figure.…”
Section: Data Source (mentioning)
confidence: 99%
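
The statement above mentions deriving total firehose volume from the sample stream. One way this could be approximated, as a sketch only and not necessarily the cited authors' method: if the sample contains every tweet that arrives inside a fixed millisecond window of each second, and arrivals are spread evenly over the second, the sample count can be scaled by the inverse of the window's share. The function name and the `window_ms` parameter are hypothetical.

```python
def estimate_firehose_count(sample_count: int, window_ms: int) -> float:
    """Scale a sample-stream count up to an estimated firehose count.

    Assumes (hypothetically) that the sample covers window_ms out of
    every 1000 ms and that tweets arrive uniformly within each second.
    """
    if not 0 < window_ms <= 1000:
        raise ValueError("window_ms must be in the range 1..1000")
    return sample_count * 1000 / window_ms
```

Under these assumptions a 100 ms window corresponds to a 10% sample, so each sampled tweet stands for ten firehose tweets.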
“…For our study, we accessed tweets archived from a Twitter feed licensed to the University of Sheffield from July 2009 to September 2014 inclusive. These comprise a random 10% sample of all tweets (Kergl et al, 2014) and are kept in hourly or daily files. The sample was searched for terms related to mephedrone by using Aho-Corasick (1975) search first to losslessly reduce the number of records processed in detail.…”
Section: Twitter Data Resource (mentioning)
confidence: 99%
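
The Aho-Corasick prefilter mentioned above can be sketched as follows. This is a compact textbook version of the algorithm, not the cited study's code; the function names are hypothetical, and any term list other than "mephedrone" would be an assumption. It matches all terms in a single pass over the text, so records with no hit can be dropped before detailed processing.

```python
from collections import deque


def build_automaton(terms):
    """Build goto, failure and output tables for lowercase search terms."""
    goto = [{}]     # goto[state][char] -> next state
    fail = [0]      # failure link for each state
    out = [set()]   # terms recognised on reaching each state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(term)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]  # inherit terms ending at the suffix
    return goto, fail, out


def find_terms(text, automaton):
    """Return the set of terms that occur anywhere in text (case-insensitive)."""
    goto, fail, out = automaton
    state, found = 0, set()
    for ch in text.lower():
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= out[state]
    return found
```

A record would be kept for detailed processing whenever `find_terms` returns a non-empty set, which keeps the reduction lossless with respect to the chosen terms.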