2019
DOI: 10.48550/arxiv.1907.08228
Preprint

TED-On: A Total Error Framework for Digital Traces of Human Behavior on Online Platforms

Cited by 8 publications
(8 citation statements)
“…Even when we employ the "exclusion query" methodology, Facebook audience estimates consistently under-count populations in rural areas and over-count males (with the gender imbalance being more pronounced in the rural municipalities). Many sources of bias may be at play: (1) self-selection bias in the user base of Facebook, thought to be younger and more tech savvy (but which may be shifting toward older users 11 ), (2) measurement bias in the sensitivity of Facebook's user attribute extraction pipeline (for example, the gender statistic may be swayed by numerous fake accounts 12 ), and (3) financial incentives to inflate the number of users who may see an advertisement (being an important revenue stream, Facebook's advertising revenue exceeded $55 billion in 2018 13 ), among others already explored in literature on online data representativeness [37,42,46,51]. Although it is not likely that Facebook will release details of its user attribute inference code, demographers may be able to adjust for the larger sample biases of Facebook user base, as recommended in [13].…”
Section: Discussion (mentioning)
confidence: 99%
“…Finally, this evidence suggests that, just as the old 140-character limit (Gligorić, Anderson, and West 2018), the new 280-character limit impacts the writing style and content of tweets (Sen et al. 2019). The length constraint and the resulting tweet-length distribution remain an important dimension to consider in studies using Twitter data, as after the switch the number of characters remains an important variable, correlated with important properties of tweets including topics, language, device, and the likelihood of being an automated source of tweets.…”
Section: Discussion (mentioning)
confidence: 91%
“…The concept of "total error" arose from the survey methodology literature (Groves & Lyberg, 2010), where "total survey error" (TSE) is the standard framework for designing, evaluating, and optimizing data collection (Biemer, 2010; Biemer & Lyberg, 2003). Amaya et al. (2020) extended this framework to generic "big data" studies, Sen et al. (2019) extended this framework to digital trace data and Beinhauer et al. (2020) extended the framework to sensor data.…”
Section: Using Data-download Packages (DDPs) for Scientific Research (mentioning)
confidence: 99%