PaLM-E: An Embodied Multimodal Language Model

Driess, Danny; Xia, Fei; Sajjadi, Mehdi S. M.; Lynch, Corey; Chowdhery, Aakanksha; Ichter, Brian; Wahid, Ayzaan; Tompson, Jonathan; Vuong, Quan V.; Yu, Tong; Huang, Wenlong; Chebotar, Yevgen; Sermanet, Pierre; Duckworth, Daniel; Levine, Sergey; Vanhoucke, Vincent; Hausman, Karol; Toussaint, Marc; Greff, Klaus; Zeng, Andy; Mordatch, Igor; Florence, Pete

doi:10.48550/arxiv.2303.03378

Cited by 90 publications

(108 citation statements)

References 0 publications


“…Consistent with this idea, our study showed that the human-like affordance boundary became gradually obvious with the increase in the model size of the LLMs (i.e., greater information processing capacity). Future study is needed to test this possibility, possibly by providing sensorimotor information novel to humans, or instilling a different virtual body scheme through training corpus of language, or attaching LLMs to a real robot (Driess et al, 2023), to see if the affordance boundary observed here shifts with this altered body "metric". Taking our finding with physically embodied humans and linguistically disembodied LLMs, our findings suggest that the embodied cognition and symbolic processing of languages may be more closely and fundamentally related than we think: perception-action problems and language problems can be treated as the same kind of thing (Wilson & Golonka, 2013).…”

Section: Discussion (mentioning)

confidence: 99%

Body size as a metric for the affordable world

Feng et al. 2023

Preprint

The physical body of an organism serves as a vital interface for interactions between the organism and its environment, significantly shaping its intelligence. Here we investigated the impact of human body size on the perception of affordance, that is, the action possibilities that the environment offers to humans. Our findings revealed that body size delineated a distinct boundary on affordance similarity, dividing objects with continuous real-world sizes into two discrete categories: objects on either side of the boundary afford distinct sets of actions. In addition, the boundary adjusted in accordance with changes in imagined body size, indicating a close link between body size and affordance perception. Intriguingly, the boundary may not be exclusively derived from organism-environment interactions, as ChatGPT, a large language model lacking a physical embodiment, exhibited a modest yet comparable affordance boundary at the scale of human body size. This implies that a human-like body schema may emerge in ChatGPT through exposure to linguistic materials alone. A subsequent fMRI experiment explored the functionality of the boundary, determining that only the affordances of objects within the range of body size were represented in the visual streams of the brain. This suggests that only objects that can be manipulated offer affordances in the eyes of the organism. In summary, our study presents an embodied perspective on defining object-ness in an affordable world, advocating the concept of embodied cognition for understanding the emergence of intelligence under the constraints of an organism's physical attributes.


“…Since there are $R_t$ erroneous sequences in $L_t$, the expected number of erroneous sequences in $L_{t+1}$ (given $L_t$) is bounded above by $M\epsilon R_t + (M-1)\epsilon$. This shows the inequality in (5). Taking the expectation on both sides of (5) leads to…”

Section: A Sufficient Condition For Guaranteed Accuracy (mentioning)

confidence: 93%

“…LLMs such as and PaLM-E [5] take a sequence of tokens as their input (prompts) and generate another sequence of tokens as their output (answers). To model these, denote by I (resp.…”

Section: Mathematical Formulation For LLMs (mentioning)

confidence: 99%

A Simple Explanation for the Phase Transition in Large Language Models with List Decoding

Chang 2023

Preprint

Various recent experimental results show that large language models (LLM) exhibit emergent abilities that are not present in small models. System performance is greatly improved after passing a certain critical threshold of scale. In this letter, we provide a simple explanation for such a phase transition phenomenon. For this, we model an LLM as a sequence-to-sequence random function. Instead of using instant generation at each step, we use a list decoder that keeps a list of candidate sequences at each step and defers the generation of the output sequence at the end. We show that there is a critical threshold such that the expected number of erroneous candidate sequences remains bounded when an LLM is below the threshold, and it grows exponentially when an LLM is above the threshold. Such a threshold is related to the basic reproduction number in a contagious disease.
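The threshold behaviour described in this abstract can be illustrated with the bound quoted above from the same paper, in which the expected number of erroneous sequences at step t+1 is at most Mε times the count at step t, plus (M − 1)ε. The following is a minimal sketch, not the paper's method: it treats that upper bound as an equality, uses `m` and `eps` as plain numeric stand-ins for M and ε without assuming their exact meaning in the paper, and the function names are hypothetical. Under those assumptions the iterated bound settles to a finite value when the product `m * eps` is below 1 and grows geometrically when it is above 1, which is the same role the basic reproduction number plays in a contagion model.

```python
def iterate_bound(m, eps, steps, r0=0.0):
    """Iterate r_{t+1} = m*eps*r_t + (m-1)*eps.

    Assumption: the quoted upper bound is treated as an equality, so the
    values returned are bounds on the expected count, not the count itself.
    """
    values = [r0]
    for _ in range(steps):
        values.append(m * eps * values[-1] + (m - 1) * eps)
    return values


def fixed_point(m, eps):
    """Limit of the iterated bound when m*eps < 1, otherwise None."""
    rate = m * eps
    if rate >= 1:
        return None  # the bound does not converge at or above the threshold
    return (m - 1) * eps / (1 - rate)


if __name__ == "__main__":
    # Product m*eps = 0.4 (< 1): the bound converges to a finite limit.
    print(iterate_bound(4, 0.1, 50)[-1], fixed_point(4, 0.1))
    # Product m*eps = 2.0 (> 1): the bound roughly doubles at every step.
    print(iterate_bound(4, 0.5, 20)[-1], fixed_point(4, 0.5))
```

Because the recursion is only an upper bound, the divergent case shows that the bound stops being informative, not that the true expected count must diverge.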


“…Another potential type of future architecture is a monolithic architecture, which only contains a single big foundation model capable of performing a variety of tasks by incorporating different types of sensor data for cross-training. An example of this type of architecture is PaLM-E [5], which is used for performing language, visual-language, and reasoning tasks. In this type of architecture, no external components are required, including prompt components.…”

Section: Architecture Evolution Of AI Systems (mentioning)

confidence: 99%

Responsible-AI-by-Design: A Pattern Collection for Designing Responsible Artificial Intelligence Systems

Zhu et al. 2023

IEEE Softw.

The release of ChatGPT, Bard, and other large language model (LLM)-based chatbots has drawn huge attention to foundation models worldwide. There is a growing trend that foundation models will serve as the fundamental building blocks for most future AI systems. However, incorporating foundation models in AI systems raises significant concerns about responsible AI due to their black-box nature and rapidly advancing super-intelligence. Additionally, the foundation model's growing capabilities can eventually absorb the other components of AI systems, introducing the moving boundary and interface evolution challenges in architecture design. To address these challenges, this paper proposes a pattern-oriented responsible-AI-by-design reference architecture for designing foundation model-based AI systems. Specifically, the paper first presents an architecture evolution of AI systems in the era of foundation models, from "foundation-model-as-a-connector" to "foundation-model-as-a-monolithic architecture". The paper then identifies the key design decision points and proposes a pattern-oriented reference architecture to provide reusable responsible-AI-by-design architectural solutions to address the new architecture evolution and responsible AI challenges. The patterns can be embedded as product features of foundation model-based AI systems and can enable organisations to capitalise on the potential of foundation models while minimising associated risks.
