2018
DOI: 10.1140/epjds/s13688-018-0178-0

Tampering with Twitter’s Sample API

Abstract: Social media data is widely analyzed in computational social science. Twitter, one of the largest social media platforms, is used for research, journalism, business, and government to analyze human behavior at scale. Twitter offers data via three different Application Programming Interfaces (APIs). One of which, Twitter's Sample API, provides a freely available 1% and a costly 10% sample of all Tweets. These data are supposedly random samples of all platform activity. However, we demonstrate that, due to the n…

Citations: cited by 115 publications (96 citation statements)
References: 48 publications
“…Therefore, it is important to highlight the difficulties associated with extracting the proper data to ensure insightful scientific results. Unfortunately, there are often restrictions on the amount and type of data that can be acquired from social media platforms [45], and data quality is also a problem because of the level of bias [48]. Therefore, these features need to be considered in social media research and especially when studying important political processes.…”
Section: Discussion (mentioning)
confidence: 99%
“…In spite of scale, social media data generally entail self-selected samples, since subjects are free to choose when to participate and what content to submit. This bias is compounded by a mix of access restrictions imposed by social media platforms (174). As a result, researchers are prone to use so-called convenience samples, i.e., social media datasets that are, due to standardization efforts, more widespread, accessible, and convenient to use, although potentially not representative of the wider population.…”
Section: Limitations (mentioning)
confidence: 99%
“…Social media content may also be subject to lexical bias (123) that could cause sentiment data to overrepresent positive sentiment. In addition, platform-specific factors may alter user behavior (174,175) and lead to bias in subsequent data analysis. Indeed, users may be encouraged to engage in profile and reputation management by establishing different online personas to highlight their individuality and qualities that are perceived as desirable (176).…”
Section: Limitations (mentioning)
confidence: 99%
“…Social media has become a popular source of data for researchers mainly due to the ease with which data can be harvested using new tools and APIs (Pfeffer, Mayer & Morstatter, 2018); the sheer volume of data available (Ahmed, 2017; Murphy, 2017); and the richness of cultural and social data (Stewart, 2017). However, the matter pertaining to how to address the ethical issues related to data collection from social media in general and Twitter in particular is not yet settled (Kelley, Cranshaw & Sleeper, 2013).…”
Section: Ethical Considerations (mentioning)
confidence: 99%