TED-On: A Total Error Framework for Digital Traces of Human Behavior on Online Platforms

Sen, Indira; Floeck, Fabian; Weller, Katrin; Weiß, Bernd; Wagner, Claudia

doi:10.48550/arxiv.1907.08228

Cited by 8 publications

(8 citation statements)

References 0 publications

“…Even when we employ the "exclusion query" methodology, Facebook audience estimates consistently under-count populations in rural areas and over-count males (with the gender imbalance being more pronounced in the rural municipalities). Many sources of bias may be at play: (1) self-selection bias in the user base of Facebook, thought to be younger and more tech savvy (but which may be shifting toward older users), (2) measurement bias in the sensitivity of Facebook's user attribute extraction pipeline (for example, the gender statistic may be swayed by numerous fake accounts), and (3) financial incentives to inflate the number of users who may see an advertisement (being an important revenue stream, Facebook's advertising revenue exceeded $55 billion in 2018), among others already explored in literature on online data representativeness [37,42,46,51]. Although it is not likely that Facebook will release details of its user attribute inference code, demographers may be able to adjust for the larger sample biases of Facebook user base, as recommended in [13].…”

Section: Discussion (mentioning)

confidence: 99%

Facebook Ads as a Demographic Tool to Measure the Urban-Rural Divide

Rama, Mejova, Tizzoni, et al. 2020

Proceedings of the Web Conference 2020

In the global move toward urbanization, making sure the people remaining in rural areas are not left behind in terms of development and policy considerations is a priority for governments worldwide. However, it is increasingly challenging to track important statistics concerning this sparse, geographically dispersed population, resulting in a lack of reliable, up-to-date data. In this study, we examine the usefulness of the Facebook Advertising platform, which offers a digital "census" of over two billion of its users, in measuring potential rural-urban inequalities. We focus on Italy, a country where about 30% of the population lives in rural areas. First, we show that the population statistics that Facebook produces suffer from instability across time and incomplete coverage of sparsely populated municipalities. To overcome this limitation, we propose an alternative methodology for estimating Facebook Ads audiences that nearly triples the coverage of the rural municipalities from 19% to 55% and makes fine-grained sub-population analysis feasible. Using official national census data, we evaluate our approach and confirm known significant urban-rural divides in terms of educational attainment and income. Extending the analysis to Facebook-specific user "interests" and behaviors, we provide further insights on the divide, for instance, finding that rural areas show a higher interest in gambling. Notably, we find that the most predictive features of income in rural areas differ from those for urban centres, suggesting researchers need to consider a broader range of attributes when examining rural wellbeing. The findings of this study illustrate the necessity of improving existing tools and methodologies to include under-represented populations in digital demographic studies; the failure to do so could result in misleading observations, conclusions, and most importantly, policies.

“…Finally, this evidence suggests that, just as the old 140-character limit (Gligorić, Anderson, and West 2018), the new 280-character limit impacts the writing style and content of tweets (Sen et al 2019). The length constraint and the resulting tweet-length distribution remain an important dimension to consider in studies using Twitter data, as after the switch the number of characters remains an important variable, correlated with important properties of tweets including topics, language, device, and the likelihood of being an automated source of tweets.…”

Section: Discussion (mentioning)

confidence: 91%

Adoption of Twitter's New Length Limit: Is 280 the New 140?

Gligorić, Anderson, West 2020

Preprint

In November 2017, Twitter doubled the maximum allowed tweet length from 140 to 280 characters, a drastic switch on one of the world's most influential social media platforms. In the first long-term study of how the new length limit was adopted by Twitter users, we ask: Does the effect of the new length limit resemble that of the old one? Or did the doubling of the limit fundamentally change how Twitter is shaped by the limited length of posted content? By analyzing Twitter's publicly available 1% sample over a period of around 3 years, we find that, when the length limit was raised from 140 to 280 characters, the prevalence of tweets around 140 characters dropped immediately, while the prevalence of tweets around 280 characters rose steadily for about 6 months. Despite this rise, tweets approaching the length limit have been far less frequent after than before the switch. We find widely different adoption rates across languages and client-device types. The prevalence of tweets around 140 characters before the switch in a given language is strongly correlated with the prevalence of tweets around 280 characters after the switch in the same language, and very long tweets are vastly more popular on Web clients than on mobile clients. Moreover, tweets of around 280 characters after the switch are syntactically and semantically similar to tweets of around 140 characters before the switch, manifesting patterns of message squeezing in both cases. Taken together, these findings suggest that the new 280-character limit constitutes a new, less intrusive version of the old 140-character limit. The length limit remains an important factor that should be considered in all studies using Twitter data.

“…The concept of "total error" arose from the survey methodology literature (Groves & Lyberg, 2010), where "total survey error" (TSE) is the standard framework for designing, evaluating, and optimizing data collection (Biemer, 2010; Biemer & Lyberg, 2003). Amaya et al (2020) extended this framework to generic "big data" studies, Sen et al (2019) extended this framework to digital trace data and Beinhauer et al (2020) extended the framework to sensor data.…”

Section: Using Data-download Packages (DDPs) for Scientific Research (mentioning)

confidence: 99%

Digital trace data collection through data donation

Boeschoten, Ausloos, Moeller, et al. 2020

Preprint

A potentially powerful method of social-scientific data collection and investigation has been created by an unexpected institution: the law. Article 15 of the EU's 2018 General Data Protection Regulation (GDPR) mandates that individuals have electronic access to a copy of their personal data, and all major digital platforms now comply with this law by providing users with "data download packages" (DDPs). Through voluntary donation of DDPs, all data collected by public and private entities during the course of citizens' digital life can be obtained and analyzed to answer social-scientific questions, with consent. Thus, consented DDPs open the way for vast new research opportunities. However, while this entirely new method of data collection will undoubtedly gain popularity in the coming years, it also comes with its own questions of representativeness and measurement quality, which are often evaluated systematically by means of an error framework. Therefore, in this paper we provide a blueprint for digital trace data collection using DDPs, and devise a "total error framework" for such projects. Our error framework for digital trace data collection through data donation is intended to facilitate high-quality social-scientific investigations using DDPs while critically reflecting on its unique methodological challenges and sources of error. In addition, we provide a quality control checklist to guide researchers in leveraging the vast opportunities afforded by this new mode of investigation.
