Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction 2015
DOI: 10.1145/2696454.2696467
Embodied Collaborative Referring Expression Generation in Situated Human-Robot Interaction

Abstract: To facilitate referential communication between humans and robots and mediate their differences in representing the shared environment, we are exploring embodied collaborative models for referring expression generation (REG). Instead of a single minimum description to describe a target object, episodes of expressions are generated based on human feedback during human-robot interaction. We particularly investigate the role of embodiment such as robot gesture behaviors (i.e., pointing to an object) and human's g…

Cited by 62 publications (37 citation statements) · References 21 publications (17 reference statements)
“…Many of the early works in this space focused on relatively limited datasets, using synthesized images of objects in artificial scenes or limited sets of real-world objects in simplified environments [20,7,15]. Recently, the research focus has shifted to more complex natural image datasets and has expanded to include the Referring Expression Comprehension task [13,19,31] as well as to real-world interactions with robotics [4,3]. One reason this has become feasible is that several large-scale REG datasets have been collected at a scale where deep learning models can be applied.…”
Section: Introduction
confidence: 99%
“…These techniques have been successfully applied in HRI for children with Autism Spectrum Disorders (ASD) [100], [101]. Attention formulation could be achieved either by deictic words or vocal expressions such as "look here, see me, look right, or next one", by pointing gesture, by using line of sight, by using communication cue such as eye gazing [102], [103] or by combination of vocal and gesture commands [104], [105].…”
Section: A Joint Attention Formulation
confidence: 99%
“…Finally, previous work addressed the difficulty of common grounding due to the perceptual difference between humans and machines (Liu, Fang, and Chai 2012;Fang, Doering, and Chai 2015). However, such problems are specific to human-machine dialogues, and instead we focus on a more general difficulty of common grounding due to complex ambiguity and uncertainty introduced by continuous and partially-observable context.…”
Section: Related Work
confidence: 99%