Communication requires a common language, a lingua franca, between agents. This language could emerge via a consensus process, but that may require many generations of trial and error. Alternatively, the lingua franca can be given by the environment, with agents grounding their language in representations of the observed world. We demonstrate a simple way to ground language in learned representations, which facilitates decentralized multi-agent communication and coordination. We find that a standard representation learning algorithm, autoencoding, is sufficient for arriving at a grounded common language. When agents broadcast these representations, they learn to understand and respond to each other's utterances and achieve surprisingly strong task performance across a variety of multi-agent communication environments.
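To make the approach concrete, the following is a minimal sketch, not the authors' implementation, of the idea described above: each agent independently trains a standard autoencoder on its own observations and broadcasts the resulting latent code as its message. The network sizes, observation dimensionality, and the names AutoencoderCommAgent, speak, and reconstruction_loss are illustrative assumptions.

import torch
import torch.nn as nn

class AutoencoderCommAgent(nn.Module):
    """One agent's autoencoder; the latent code doubles as its message."""

    def __init__(self, obs_dim: int = 64, msg_dim: int = 8):
        super().__init__()
        # Encoder: observation -> latent message (hypothetical sizes)
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, msg_dim)
        )
        # Decoder: latent message -> reconstructed observation
        self.decoder = nn.Sequential(
            nn.Linear(msg_dim, 32), nn.ReLU(), nn.Linear(32, obs_dim)
        )

    def speak(self, obs: torch.Tensor) -> torch.Tensor:
        # The broadcast message is simply the encoding of the observation.
        return self.encoder(obs)

    def reconstruction_loss(self, obs: torch.Tensor) -> torch.Tensor:
        # Standard autoencoding objective; grounds messages in observations.
        return nn.functional.mse_loss(self.decoder(self.encoder(obs)), obs)

# Decentralized training: each agent optimizes its own autoencoder
# independently, with no shared parameters or centralized coordinator.
agents = [AutoencoderCommAgent() for _ in range(2)]
opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in agents]
obs_batch = torch.randn(16, 64)  # stand-in for environment observations
for agent, opt in zip(agents, opts):
    loss = agent.reconstruction_loss(obs_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
messages = [agent.speak(obs_batch) for agent in agents]  # broadcast to others

Because every agent's message is tied to the shared observed world through the reconstruction objective, the latent codes provide the environment-given grounding discussed above, without any supervision or differentiable channel between agents.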
Introduction

An essential aspect of communication is that each pair of speaker and listener must share a common understanding of the symbols being used [9]. For artificial agents interacting in an environment, with a communication channel but without an agreed-upon communication protocol, this raises the question: how can meaningful communication emerge before a common language has been established? To address this challenge, prior works have used supervised learning [19], centralized learning [15,18,30], or differentiable communication [7,18,30,34,43]. Yet none of these mechanisms is representative of how communication emerges in nature, where animals and humans have evolved communication protocols without supervision and without a centralized coordinator [37]. The communication model that most closely resembles language learning in nature is a fully decentralized one, in which agents' policies are independently optimized. However, decentralized models perform poorly even in simple communication tasks [18], even with additional inductive biases [11].

We tackle this challenge by first observing why emergent communication is difficult in a decentralized multi-agent reinforcement learning setting. A key problem that prevents agents from learning meaningful communication is the lack of a common grounding in communication symbols [3,11,20]. In nature, the emergence of a common language is thought to be aided by physical biases and embodiment [31,44]: we can only produce certain vocalizations, these sounds can only be heard a certain distance away, they bear similarity to natural sounds in the environment, and so on. Artificial communication protocols, by contrast, are not a priori grounded in aspects of the environment.

Project page, code, and videos can be found at https://toruowo.github.io/marl-ae-comm/.