Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue 2019
DOI: 10.18653/v1/w19-5912

Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning

Abstract: We present the first complete attempt at concurrently training conversational agents that communicate only via self-generated language. Using DSTC2 as seed data, we trained natural language understanding (NLU) and generation (NLG) networks for each agent and let the agents interact online. We model the interaction as a stochastic collaborative game where each agent (player) has a role ("assistant", "tourist", "eater", etc.) and their own objectives, and can only interact via natural language they generate. Eac…
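The setup the abstract describes can be pictured with a short sketch: two agents with distinct roles that perceive each other only through the language they generate. Everything below (the Agent class, the stub NLU/NLG, the slot names) is a hypothetical illustration of that loop, not the paper's code.

```python
# Hypothetical sketch of the interaction loop from the abstract.
# The stub NLU/NLG and slot names are illustrative placeholders.
import random

class Agent:
    def __init__(self, role, goal):
        self.role, self.goal = role, goal
        self.state = {"filled": set()}

    def nlu(self, text):                  # text -> dialogue acts (stub)
        return [("inform", word) for word in text.split()]

    def policy(self, state):              # RL-trained in the paper; stub here
        return ("request", random.choice(["food", "area", "price"]))

    def nlg(self, act):                   # dialogue act -> text (stub)
        return f"{act[0]} {act[1]}"

    def respond(self, utterance):
        for _, slot in self.nlu(utterance):
            self.state["filled"].add(slot)
        return self.nlg(self.policy(self.state))

def run_dialogue(a, b, max_turns=10):
    """The agents interact only via the natural language they generate."""
    utterance = a.nlg(a.policy(a.state))  # opening turn
    for _ in range(max_turns):
        utterance = b.respond(utterance)
        utterance = a.respond(utterance)

run_dialogue(Agent("tourist", "find restaurant"),
             Agent("assistant", "serve user"))
```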

Cited by 28 publications (24 citation statements) | References 43 publications
“…Then in the RL training phase, the dialog policy is alternately trained through learning from real users and planning with the environment model. Some other works jointly train a system policy and a user policy simultaneously [43, 44].…”
Section: Dialog Policy
confidence: 99%
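The alternating scheme this passage describes is essentially Dyna-style RL: interleave value updates from real interaction with updates replayed from a learned environment model. Below is a runnable toy sketch on a hypothetical chain MDP; the environment, hyperparameters, and tabular Q-learning are illustrative assumptions, not the cited papers' setup.

```python
# Toy Dyna-style loop: direct RL from interaction, then planning steps
# that reuse a learned model of the environment. All details illustrative.
import random
from collections import defaultdict

N_STATES, ACTIONS = 6, (0, 1)            # tiny chain MDP: 0=left, 1=right
GAMMA, ALPHA, EPS, K_PLAN = 0.95, 0.5, 0.1, 10

def step(state, action):                 # stand-in for the "real user" environment
    nxt = max(0, min(N_STATES - 1, state + (1 if action else -1)))
    return nxt, 1.0 if nxt == N_STATES - 1 else 0.0

Q = defaultdict(float)                   # action values
model = {}                               # learned environment model

def act(state):                          # eps-greedy with random tie-breaking
    if random.random() < EPS:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def q_update(s, a, r, s2):
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for episode in range(50):
    s = 0
    while s != N_STATES - 1:
        a = act(s)
        s2, r = step(s, a)
        q_update(s, a, r, s2)            # phase 1: direct RL from interaction
        model[(s, a)] = (s2, r)          # remember the observed transition
        for _ in range(K_PLAN):          # phase 2: planning with the model
            ps, pa = random.choice(list(model))
            q_update(ps, pa, *model[(ps, pa)])
        s = s2
```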
“…Several studies have demonstrated that applying MARL delivers promising results in NLP tasks in recent years. While some methods use identical rewards for all agents (Das et al., 2017; Feng et al., 2018), other studies use completely separate rewards (Georgila et al., 2014; Papangelis et al., 2019). MADPL integrates the two types of rewards by role-aware reward decomposition to train a better dialog policy in task-oriented dialog.…”
Section: Multi-agent Reinforcement Learning
confidence: 99%
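The three reward schemes contrasted here can be sketched side by side. The linear mixing used for the role-aware case below is an illustrative assumption, not MADPL's exact decomposition.

```python
# Illustrative comparison of the reward schemes named in the passage.

def identical_rewards(global_r, agents):
    """Fully cooperative: every agent sees the same signal."""
    return {a: global_r for a in agents}

def separate_rewards(role_r, agents):
    """Fully individual: each agent only sees its own role reward."""
    return {a: role_r[a] for a in agents}

def role_aware_rewards(global_r, role_r, agents, w=0.5):
    """Role-aware decomposition: blend a shared global reward with a
    per-role component (hypothetical linear mixing)."""
    return {a: w * global_r + (1 - w) * role_r[a] for a in agents}

agents = ["system", "user"]
role_r = {"system": 0.2, "user": 0.8}
print(role_aware_rewards(1.0, role_r, agents))
```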
“…Two dialog agents interact with each other and collaborate to achieve the goal, so they require no explicit domain expertise, which helps develop a dialog system without the need for a well-built user simulator. Different from existing methods (Georgila et al., 2014; Papangelis et al., 2019), our approach is based on the actor-critic framework (Barto et al., 1983) in order to facilitate pretraining and bootstrap the RL training. Following the paradigm of centralized training with decentralized execution (CTDE) (Bernstein et al., 2002) in multi-agent RL (MARL), the actor selects its action conditioned only on its local state-action history, while the critic is trained with the actions of all agents.…”
Section: Introduction
confidence: 99%
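A minimal PyTorch sketch of the CTDE pattern this passage describes: each actor conditions only on its local input, while a single centralized critic is trained on the observations and actions of all agents. The network sizes and the choice of one shared critic are assumptions for illustration.

```python
# CTDE sketch: decentralized actors, one centralized critic.
import torch
import torch.nn as nn

OBS, ACT, N_AGENTS = 16, 8, 2

class Actor(nn.Module):                  # decentralized: local input only
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS, 32), nn.Tanh(),
                                 nn.Linear(32, ACT))
    def forward(self, local_obs):
        return torch.softmax(self.net(local_obs), dim=-1)

class CentralCritic(nn.Module):          # centralized: joint obs + all actions
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS + ACT)
        self.net = nn.Sequential(nn.Linear(joint, 64), nn.Tanh(),
                                 nn.Linear(64, 1))
    def forward(self, all_obs, all_actions):
        return self.net(torch.cat(all_obs + all_actions, dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

obs = [torch.randn(1, OBS) for _ in range(N_AGENTS)]
acts = [actor(o) for actor, o in zip(actors, obs)]   # execution: local only
value = critic(obs, acts)                            # training: sees all agents
```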
“…For the Telegram integration, the python-telegram-bot API is used. The user's options, generated by the NLG, are shown as keyboard buttons in the Telegram app. The text of each button corresponds to a possible response and is linked to a specific dialogue act.…”
Section: Multi-modal Chat Interface
confidence: 99%
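The button-per-response pattern this snippet describes might look roughly as follows with python-telegram-bot (the v20+ async API is assumed; older versions use an Updater-based interface). The OPTIONS mapping from button text to dialogue acts is a hypothetical stand-in for the system's NLG output.

```python
# Sketch: NLG response options rendered as Telegram keyboard buttons,
# each button's text linked back to a dialogue act. Assumes
# python-telegram-bot v20+; OPTIONS is a hypothetical NLG stand-in.
from telegram import ReplyKeyboardMarkup, Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters

OPTIONS = {
    "Italian food, please": "inform(food=italian)",
    "Somewhere in the centre": "inform(area=centre)",
    "That's all, thanks": "bye()",
}

async def on_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    act = OPTIONS.get(update.message.text)       # button text -> dialogue act
    # ... pass `act` to the dialogue manager here ...
    keyboard = [[text] for text in OPTIONS]      # one button per option
    await update.message.reply_text(
        "How can I help?",
        reply_markup=ReplyKeyboardMarkup(keyboard, resize_keyboard=True),
    )

app = Application.builder().token("YOUR_BOT_TOKEN").build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, on_message))
app.run_polling()
```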
“…OpenDial [5] is designed to facilitate the development of agents for single-turn Q&A-style dialogues. Plato [6] and PyDial [10] attempt to model user preferences, but do not track the preference evolution over conversations. Further, most of the available domain-specific (movie) recommender systems are closed-source commercial products, such as the Facebook Messenger bot And Chill.…”
Section: Introduction
confidence: 99%