2022
DOI: 10.31234/osf.io/6mkjy
Preprint

Human-like property induction is a challenge for large language models

Abstract: The impressive recent performance of large language models such as GPT-3 has led many to wonder to what extent they can serve as models of general intelligence or are similar to human cognition. We address this issue by applying GPT-3 to a classic problem in human inductive reasoning known as property induction. Our results suggest that while GPT-3 can qualitatively mimic human performance for some inductive phenomena (especially those that depend primarily on similarity relationships), it reasons in a qualita…

Cited by 11 publications (11 citation statements) | References 20 publications
“…Indeed, recently there has been a push towards creating large benchmarks to assess the capability of foundation models [48][49][50]. Large language models have also been studied using other methods from cognitive psychology, such as property induction [51], thinking-out-loud protocols [52], or learning causal over-hypotheses [53], where researchers have come to similar conclusions. Methods from cognitive psychology have also previously been applied to understand other deep learning models' behavior [54].…”
Section: Discussion
confidence: 99%
“…Please answer "Yes" or "No." This prompt was shown to generate the most human-like performance out of all the prompts in Han et al. (2022). We test whether the model's first five output tokens include "Yes," "yes," "YES," "No," "no," and "NO" and subsequently calculate the probability attached to "Yes," "yes" and "YES" versus "No," "no" and "NO."…”
Section: Natural Language Inference Models
confidence: 99%
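As a rough illustration of the scoring procedure described in the citation statement above, the minimal Python sketch below normalizes the probability of "yes" variants against "yes" and "no" variants among a model's first few output tokens. It is not taken from either paper: the function name `yes_probability`, the input format of (token, probability) pairs, and the example numbers are all illustrative assumptions; how per-token probabilities are obtained from a given model is left out.

```python
# Minimal sketch (illustrative, not from the cited papers) of scoring a
# yes/no prompt: sum the probability mass on yes-variants and no-variants
# among the model's first five output tokens, then normalize.

YES_TOKENS = {"Yes", "yes", "YES"}
NO_TOKENS = {"No", "no", "NO"}


def yes_probability(first_tokens):
    """Return P(yes) normalized against P(yes) + P(no).

    `first_tokens` is a list of (token, probability) pairs for the model's
    first five output tokens (an assumed input format). Tokens that are
    neither a yes nor a no variant are ignored.
    """
    p_yes = sum(p for tok, p in first_tokens if tok.strip() in YES_TOKENS)
    p_no = sum(p for tok, p in first_tokens if tok.strip() in NO_TOKENS)
    if p_yes + p_no == 0:
        return None  # no yes/no variant appeared among the first five tokens
    return p_yes / (p_yes + p_no)


# Made-up token probabilities, purely for illustration:
example = [("Yes", 0.62), (",", 0.20), ("No", 0.11), ("the", 0.05), (".", 0.02)]
print(yes_probability(example))  # 0.62 / (0.62 + 0.11) ≈ 0.849
```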
“…Most of the NLI models also captured this effect, likely because they are able to encode typicality relations and use these relations for generalization (Han et al., 2022; Misra et al., 2022). However, GPT-DaVinci and BART-MNLI failed to do so.…”
Section: Empirical Regularities
confidence: 99%
“…Assessing the capabilities of Artificial Intelligence (AI) has been an important research direction since the inception of AI, and this became more urgent after large language models, especially GPT, attracted popular attention (Bubeck et al., 2023). Most research focuses on cognitive capabilities, such as reasoning (Dasgupta et al., 2022), induction (Han et al., 2022), and creativity (Stevenson et al., 2022; Uludag, 2023). Recently, Bubeck et al. (2023) conducted a wide range of tests on GPT-4, the latest model developed by OpenAI, exploring its mathematical abilities, multimodal capabilities, tool usage, and coding.…”
Section: Introduction
confidence: 99%