Govind Thattai scite author profile

Language-guided robots performing home and office tasks must navigate in and interact with the world. Grounding language instructions against visual observations and actions to take in an environment is an open challenge. We present Embodied BERT (EmBERT), a transformer-based model which can attend to high-dimensional, multi-modal inputs across long temporal horizons for language-conditioned task completion (https://github.com/amazon-research/embert). Additionally, we bridge the gap between successful object-centric navigation models used for non-interactive agents and the language-guided visual task completion benchmark, ALFRED, by introducing object navigation targets for EmBERT training. We achieve competitive performance on the ALFRED benchmark, and EmBERT marks the first transformer-based model to successfully handle the long-horizon, dense, multi-modal histories of ALFRED, and the first ALFRED model to utilize object-centric navigation targets. Preprint. Under review.

Transform-Retrieve-Generate: Natural Language-Centric Outside-Knowledge Visual Question Answering

Gao, Ping, Thattai, et al. 2022

DialFRED: Dialogue-Enabled Agents for Embodied Instruction Following

Gao, Gong, et al. 2022

IEEE Robot. Autom. Lett.

We introduce Alexa Arena, a user-centric simulation platform for Embodied AI (EAI) research. Alexa Arena provides a variety of multi-room layouts and interactable objects for the creation of human-robot interaction (HRI) missions. With user-friendly graphics and control mechanisms, Alexa Arena supports the development of gamified robotic tasks readily accessible to general human users, thus opening a new venue for high-efficiency HRI data collection and EAI system evaluation. Along with the platform, we introduce a dialog-enabled instruction-following benchmark and provide baseline results for it. We make Alexa Arena publicly available to facilitate research in building generalizable and assistive embodied agents.

Interactive Teaching for Conversational AI

Ping, Niu, Thattai, et al. 2020

Preprint

Current conversational AI systems aim to understand a set of pre-designed requests and execute related actions, which limits their ability to evolve naturally and adapt based on human interactions. Motivated by how children learn their first language by interacting with adults, this paper describes a new Teachable AI system that is capable of learning new language nuggets, called concepts, directly from end users through live interactive teaching sessions. The proposed setup uses three models to: a) identify gaps in understanding automatically during live conversational interactions, b) learn the respective interpretations of such unknown concepts from live interactions with users, and c) manage a classroom sub-dialogue specifically tailored for interactive teaching sessions. We propose state-of-the-art transformer-based neural architectures for these models, fine-tuned on top of pre-trained models, and show accuracy improvements on the respective components. We demonstrate that this method is a promising step toward building more adaptive and personalized language understanding models.

Neural Architecture Search for Parameter-Efficient Fine-tuning of Large Pre-trained Language Models

Lawton, Kumar, Thattai, et al. 2023

Embodied BERT: A Transformer Model for Embodied, Language-guided Visual Task Completion
