2023 · Preprint
DOI: 10.31235/osf.io/5ecfa
Synthetic Replacements for Human Survey Data? The Perils of Large Language Models

Abstract: Large Language Models (LLMs) offer new research possibilities for social scientists, but their potential as “synthetic data” is still largely unknown. In this note, we investigate the potential of using the popular closed-source LLM ChatGPT to measure human opinion. We show that although ChatGPT-generated opinions are similar to human opinion for some groups of US respondents, synthetic opinions also significantly exaggerate the extremity and certainty of partisan and social divisions. Responses from prompted …

Cited by 8 publications (18 citation statements) · References 28 publications
“…Finally, standard limitations in the everyday use of LLMs also apply to their usage for classification tasks. Biases inherent in the training of these models (Bisbee et al., 2023; Motoki et al., 2024) may seep into text annotation, especially ones more specific or contentious than the classifications done here. Researchers should be mindful of these potential biases and carefully consider their impact on potential outcomes.…”
Section: Discussion
confidence: 95%
“…Some argue such “silicon samples” could be used to produce more diverse samples than the convenience samples utilized by so many university researchers—and may also allow researchers to administer lengthier survey instruments, since LLMs have potentially unlimited attention spans (12). At the same time, more recent research indicates GPT 3.5 turbo produces accurate mean estimates of attitudes within a population, but understates variances—exaggerating extreme attitudes (13). Another study indicates LLMs exhibit an affirmative bias in yes/no questions (14).…”
Section: Opportunities for Social Science with Generative AI
confidence: 99%
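The variance-understatement pattern the quoted passage describes can be sketched with a toy simulation (all numbers below are illustrative assumptions, not figures from the cited study): a synthetic sample that matches the human mean but compresses individual spread makes a subgroup look far more uniform, and hence more uniformly partisan, than the human sample it stands in for.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-7 attitude scale for a single partisan subgroup;
# the 5.5 mean and both spread values are invented for illustration.
human = rng.normal(loc=5.5, scale=1.5, size=10_000)      # realistic individual spread
synthetic = rng.normal(loc=5.5, scale=0.4, size=10_000)  # understated variance

# The mean estimate looks "accurate" ...
print(f"means:     human={human.mean():.2f}  synthetic={synthetic.mean():.2f}")

# ... but the compressed variance makes the subgroup appear far more
# uniform in its attitudes than the human sample.
print(f"variances: human={human.var():.2f}  synthetic={synthetic.var():.2f}")
```

This is the aggregate-level failure mode: a researcher checking only subgroup means would see agreement, while any quantity that depends on the distribution's tails (share of respondents at the scale extremes, between-group polarization measures) would be distorted.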
“…Another study indicates LLMs exhibit an affirmative bias in yes/no questions (14). Studies also indicate LLMs represent some demographic subgroups more accurately than others (13, 15). Yet these studies do not employ the latest models, and only focus on one country: the United States.…”
Section: Opportunities for Social Science with Generative AI
confidence: 99%