Communication requires a common language, a lingua franca, between agents. This language could emerge via a consensus process, but that may require many generations of trial and error. Alternatively, the lingua franca can be given by the environment, with agents grounding their language in representations of the observed world. We demonstrate a simple way to ground language in learned representations, which facilitates decentralized multi-agent communication and coordination. We find that a standard representation learning algorithm, autoencoding, is sufficient for arriving at a grounded common language. When agents broadcast these representations, they learn to understand and respond to each other's utterances and achieve surprisingly strong task performance across a variety of multi-agent communication environments.
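To make the approach concrete, the following is a minimal sketch, not the authors' implementation, of the idea described above: each agent independently trains a standard autoencoder on its own observations and broadcasts the resulting latent code as its message. The network sizes, observation dimensionality, and the names AutoencoderCommAgent, speak, and reconstruction_loss are illustrative assumptions.

import torch
import torch.nn as nn

class AutoencoderCommAgent(nn.Module):
    """One agent's autoencoder; the latent code doubles as its message."""

    def __init__(self, obs_dim: int = 64, msg_dim: int = 8):
        super().__init__()
        # Encoder: observation -> latent message (hypothetical sizes)
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, msg_dim)
        )
        # Decoder: latent message -> reconstructed observation
        self.decoder = nn.Sequential(
            nn.Linear(msg_dim, 32), nn.ReLU(), nn.Linear(32, obs_dim)
        )

    def speak(self, obs: torch.Tensor) -> torch.Tensor:
        # The broadcast message is simply the encoding of the observation.
        return self.encoder(obs)

    def reconstruction_loss(self, obs: torch.Tensor) -> torch.Tensor:
        # Standard autoencoding objective; grounds messages in observations.
        return nn.functional.mse_loss(self.decoder(self.encoder(obs)), obs)

# Decentralized training: each agent optimizes its own autoencoder
# independently, with no shared parameters or centralized coordinator.
agents = [AutoencoderCommAgent() for _ in range(2)]
opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in agents]
obs_batch = torch.randn(16, 64)  # stand-in for environment observations
for agent, opt in zip(agents, opts):
    loss = agent.reconstruction_loss(obs_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
messages = [agent.speak(obs_batch) for agent in agents]  # broadcast to others

Because every agent's message is tied to the shared observed world through the reconstruction objective, the latent codes provide the environment-given grounding discussed above, without any supervision or differentiable channel between agents.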
Introduction

An essential aspect of communication is that each pair of speaker and listener must share a common understanding of the symbols being used [9]. For artificial agents interacting in an environment, with a communication channel but without an agreed-upon communication protocol, this raises the question: how can meaningful communication emerge before a common language has been established? To address this challenge, prior works have used supervised learning [19], centralized learning [15,18,30], or differentiable communication [7,18,30,34,43]. Yet none of these mechanisms is representative of how communication emerges in nature, where animals and humans have evolved communication protocols without supervision and without a centralized coordinator [37]. The communication model that most closely resembles language learning in nature is a fully decentralized one, in which agents' policies are independently optimized. However, decentralized models perform poorly even in simple communication tasks [18], even with additional inductive biases [11].

We tackle this challenge by first observing why emergent communication is difficult in a decentralized multi-agent reinforcement learning setting. A key problem that prevents agents from learning meaningful communication is the lack of a common grounding in communication symbols [3,11,20]. In nature, the emergence of a common language is thought to be aided by physical biases and embodiment [31,44]: we can only produce certain vocalizations, these sounds can only be heard a certain distance away, they bear similarity to natural sounds in the environment, and so on. Artificial communication protocols, by contrast, are not a priori grounded in aspects of the environment.

Project page, code, and videos can be found at https://toruowo.github.io/marl-ae-comm/.