2023
DOI: 10.48550/arxiv.2303.03378
Preprint

PaLM-E: An Embodied Multimodal Language Model

Cited by 90 publications
(108 citation statements)
“…Consistent with this idea, our study showed that the human-like affordance boundary became gradually obvious with the increase in the model size of the LLMs (i.e., greater information processing capacity). Future study is needed to test this possibility, possibly by providing sensorimotor information novel to humans, or instilling a different virtual body scheme through training corpus of language, or attaching LLMs to a real robot (Driess et al, 2023), to see if the affordance boundary observed here shifts with this altered body "metric". Taking our finding with physically embodied humans and linguistically disembodied LLMs, our findings suggest that the embodied cognition and symbolic processing of languages may be more closely and fundamentally related than we think: perception-action problems and language problems can be treated as the same kind of thing (Wilson & Golonka, 2013).…”
Section: Discussion (mentioning)
confidence: 99%
“…Since there are $R_t$ erroneous sequences in $L_t$, the expected number of erroneous sequences in $L_{t+1}$ (given $L_t$) is bounded above by $M\epsilon R_t + (M-1)\epsilon$. This shows the inequality in (5). Taking the expectation on both sides of (5) leads to…”
Section: A Sufficient Condition For Guaranteed Accuracy (mentioning)
confidence: 93%
“…LLMs such as and PaLM-E [5] take a sequence of tokens as their input (prompts) and generate another sequence of tokens as their output (answers). To model these, denote by I (resp.…”
Section: Mathematical Formulation For LLMs (mentioning)
confidence: 99%
“…Another potential type of future architecture is a monolithic architecture, which only contains a single big foundation model capable of performing a variety of tasks by incorporating different types of sensor data for cross-training. An example of this type of architecture is PaLM-E [5], which is used for performing language, visual-language, and reasoning tasks. In this type of architecture, no external components are required, including prompt components.…”
Section: Architecture Evolution Of AI Systems (mentioning)
confidence: 99%