2019
DOI: 10.3389/fdata.2019.00013

Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries

Abstract: Social data in digital form (including user-generated content, expressed or implicit relations between people, and behavioral traces) are at the core of popular applications and platforms, driving the research agenda of many researchers. The promises of social data are many, including understanding "what the world thinks" about a social issue, brand, celebrity, or other entity, as well as enabling better decision-making in a variety of fields including public policy, healthcare, and economics. Many academics and…

Citations: cited by 395 publications (220 citation statements)
References: 245 publications
“…We must often repurpose born-digital data (e.g., Twitter was not designed to measure public opinion), but data biases may lead to spurious results and limit justification for generalization (Olteanu et al., 2019). In particular, data collected via black box APIs designed for commercial, not research, purposes are likely to introduce biases into the inferences we draw, and the closed nature of these APIs means we rarely know what biases are introduced, let alone how severely they might impact our research (Morstatter et al., 2013; Tromble et al., 2017).…”
Section: Data Quality (mentioning)
confidence: 99%
“…Data-driven domains in computer science, including many applied artificial intelligence/machine learning domains, might profit from psychology's rich tradition of conducting carefully designed studies with human subjects, and psychologists' focus on reliable and valid data gathering methods, therefore giving opportunities for stronger empirical scientific foundations and better data quality (e.g., Lipton & Steinhardt, 2019). Furthermore, psychologists have studied biases and fairness issues in a variety of settings for decades (e.g., in the personnel decision-making process, see for instance Harvey, 1938), which is a current and highly relevant topic for computer scientists (e.g., Olteanu, Castillo, Diaz, & Kıcıman, 2019). For psychologists, potentials unfold when employing novel data gathering tools, cleaning unstructured data from various sources for further analyses, and using alternative data analysis approaches that are still uncommon within psychological practice and research (e.g., decision trees, and deep learning approaches).…”
Section: Why Should You Be Interested In Collaborating With (mentioning)
confidence: 99%
“…On the one hand, their contribution can provide important insights in the interpretation of data. On the other hand, self-selection issues which apply to standard experimental settings (Henrich et al., 2010; Olteanu et al., 2019) are to be considered through different lenses when participation is enhanced.…”
Section: Social Impact (mentioning)
confidence: 99%