Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue 2019
DOI: 10.18653/v1/w19-5912

Collaborative Multi-Agent Dialogue Model Training Via Reinforcement Learning

Abstract: We present the first complete attempt at concurrently training conversational agents that communicate only via self-generated language. Using DSTC2 as seed data, we trained natural language understanding (NLU) and generation (NLG) networks for each agent and let the agents interact online. We model the interaction as a stochastic collaborative game where each agent (player) has a role ("assistant", "tourist", "eater", etc.) and their own objectives, and can only interact via natural language they generate. Eac…
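The setup the abstract describes can be pictured with a short sketch: two agents with distinct roles that perceive each other only through the language they generate. Everything below (the Agent class, the stub NLU/NLG, the slot names) is a hypothetical illustration of that loop, not the paper's code.

```python
# Hypothetical sketch of the interaction loop from the abstract.
# The stub NLU/NLG and slot names are illustrative placeholders.
import random

class Agent:
    def __init__(self, role, goal):
        self.role, self.goal = role, goal
        self.state = {"filled": set()}

    def nlu(self, text):                  # text -> dialogue acts (stub)
        return [("inform", word) for word in text.split()]

    def policy(self, state):              # RL-trained in the paper; stub here
        return ("request", random.choice(["food", "area", "price"]))

    def nlg(self, act):                   # dialogue act -> text (stub)
        return f"{act[0]} {act[1]}"

    def respond(self, utterance):
        for _, slot in self.nlu(utterance):
            self.state["filled"].add(slot)
        return self.nlg(self.policy(self.state))

def run_dialogue(a, b, max_turns=10):
    """The agents interact only via the natural language they generate."""
    utterance = a.nlg(a.policy(a.state))  # opening turn
    for _ in range(max_turns):
        utterance = b.respond(utterance)
        utterance = a.respond(utterance)

run_dialogue(Agent("tourist", "find restaurant"),
             Agent("assistant", "serve user"))
```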

Cited by 28 publications (24 citation statements) | References 43 publications
“…Then in the RL training phase, the dialog policy is alternately trained through learning from real users and planning with the environment model. Some other works jointly train a system policy and a user policy simultaneously [43, 44].…”
Section: Dialog Policy
confidence: 99%
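The alternating scheme this passage describes is essentially Dyna-style RL: interleave value updates from real interaction with updates replayed from a learned environment model. Below is a runnable toy sketch on a hypothetical chain MDP; the environment, hyperparameters, and tabular Q-learning are illustrative assumptions, not the cited papers' setup.

```python
# Toy Dyna-style loop: direct RL from interaction, then planning steps
# that reuse a learned model of the environment. All details illustrative.
import random
from collections import defaultdict

N_STATES, ACTIONS = 6, (0, 1)            # tiny chain MDP: 0=left, 1=right
GAMMA, ALPHA, EPS, K_PLAN = 0.95, 0.5, 0.1, 10

def step(state, action):                 # stand-in for the "real user" environment
    nxt = max(0, min(N_STATES - 1, state + (1 if action else -1)))
    return nxt, 1.0 if nxt == N_STATES - 1 else 0.0

Q = defaultdict(float)                   # action values
model = {}                               # learned environment model

def act(state):                          # eps-greedy with random tie-breaking
    if random.random() < EPS:
        return random.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

def q_update(s, a, r, s2):
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for episode in range(50):
    s = 0
    while s != N_STATES - 1:
        a = act(s)
        s2, r = step(s, a)
        q_update(s, a, r, s2)            # phase 1: direct RL from interaction
        model[(s, a)] = (s2, r)          # remember the observed transition
        for _ in range(K_PLAN):          # phase 2: planning with the model
            ps, pa = random.choice(list(model))
            q_update(ps, pa, *model[(ps, pa)])
        s = s2
```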
“…Several studies have demonstrated that applying MARL delivers promising results in NLP tasks in recent years. While some methods use identical rewards for all agents (Das et al., 2017; Feng et al., 2018), other studies use completely separate rewards (Georgila et al., 2014; Papangelis et al., 2019). MADPL integrates the two types of rewards by role-aware reward decomposition to train a better dialog policy in task-oriented dialog.…”
Section: Multi-agent Reinforcement Learning
confidence: 99%
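The three reward schemes contrasted here can be sketched side by side. The linear mixing used for the role-aware case below is an illustrative assumption, not MADPL's exact decomposition.

```python
# Illustrative comparison of the reward schemes named in the passage.

def identical_rewards(global_r, agents):
    """Fully cooperative: every agent sees the same signal."""
    return {a: global_r for a in agents}

def separate_rewards(role_r, agents):
    """Fully individual: each agent only sees its own role reward."""
    return {a: role_r[a] for a in agents}

def role_aware_rewards(global_r, role_r, agents, w=0.5):
    """Role-aware decomposition: blend a shared global reward with a
    per-role component (hypothetical linear mixing)."""
    return {a: w * global_r + (1 - w) * role_r[a] for a in agents}

agents = ["system", "user"]
role_r = {"system": 0.2, "user": 0.8}
print(role_aware_rewards(1.0, role_r, agents))
```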
“…Two dialog agents interact with each other and collaborate to achieve the goal, so they require no explicit domain expertise, which helps develop a dialog system without the need for a well-built user simulator. Different from existing methods (Georgila et al., 2014; Papangelis et al., 2019), our approach is based on the actor-critic framework (Barto et al., 1983) in order to facilitate pretraining and bootstrap the RL training. Following the paradigm of centralized training with decentralized execution (CTDE) (Bernstein et al., 2002) in multi-agent RL (MARL), the actor selects its action conditioned only on its local state-action history, while the critic is trained with the actions of all agents.…”
Section: Introduction
confidence: 99%
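A minimal PyTorch sketch of the CTDE pattern this passage describes: each actor conditions only on its local input, while a single centralized critic is trained on the observations and actions of all agents. The network sizes and the choice of one shared critic are assumptions for illustration.

```python
# CTDE sketch: decentralized actors, one centralized critic.
import torch
import torch.nn as nn

OBS, ACT, N_AGENTS = 16, 8, 2

class Actor(nn.Module):                  # decentralized: local input only
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS, 32), nn.Tanh(),
                                 nn.Linear(32, ACT))
    def forward(self, local_obs):
        return torch.softmax(self.net(local_obs), dim=-1)

class CentralCritic(nn.Module):          # centralized: joint obs + all actions
    def __init__(self):
        super().__init__()
        joint = N_AGENTS * (OBS + ACT)
        self.net = nn.Sequential(nn.Linear(joint, 64), nn.Tanh(),
                                 nn.Linear(64, 1))
    def forward(self, all_obs, all_actions):
        return self.net(torch.cat(all_obs + all_actions, dim=-1))

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

obs = [torch.randn(1, OBS) for _ in range(N_AGENTS)]
acts = [actor(o) for actor, o in zip(actors, obs)]   # execution: local only
value = critic(obs, acts)                            # training: sees all agents
```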
“…For the Telegram integration, the python-telegram-bot API is used. The user's options, generated by the NLG, are shown as keyboard buttons in the Telegram app. The text of each button corresponds to a possible response and is linked to a specific dialogue act.…”
Section: Multi-modal Chat Interface
confidence: 99%
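The button-per-response pattern this snippet describes might look roughly as follows with python-telegram-bot (the v20+ async API is assumed; older versions use an Updater-based interface). The OPTIONS mapping from button text to dialogue acts is a hypothetical stand-in for the system's NLG output.

```python
# Sketch: NLG response options rendered as Telegram keyboard buttons,
# each button's text linked back to a dialogue act. Assumes
# python-telegram-bot v20+; OPTIONS is a hypothetical NLG stand-in.
from telegram import ReplyKeyboardMarkup, Update
from telegram.ext import Application, ContextTypes, MessageHandler, filters

OPTIONS = {
    "Italian food, please": "inform(food=italian)",
    "Somewhere in the centre": "inform(area=centre)",
    "That's all, thanks": "bye()",
}

async def on_message(update: Update, context: ContextTypes.DEFAULT_TYPE):
    act = OPTIONS.get(update.message.text)       # button text -> dialogue act
    # ... pass `act` to the dialogue manager here ...
    keyboard = [[text] for text in OPTIONS]      # one button per option
    await update.message.reply_text(
        "How can I help?",
        reply_markup=ReplyKeyboardMarkup(keyboard, resize_keyboard=True),
    )

app = Application.builder().token("YOUR_BOT_TOKEN").build()
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, on_message))
app.run_polling()
```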
“…OpenDial [5] is designed to facilitate the development of agents for single-turn Q&A-style dialogues. Plato [6] and PyDial [10] attempt to model user preferences, but do not track the preference evolution over conversations. Further, most of the available domain-specific (movie) recommender systems are closed-source commercial products, such as the Facebook Messenger bot And Chill.…”
Section: Introduction
confidence: 99%