Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.129

Learning Dialog Policies from Weak Demonstrations

Abstract: Deep reinforcement learning is a promising approach to training a dialog manager, but current methods struggle with the large state and action spaces of multi-domain dialog systems. Building upon Deep Q-learning from Demonstrations (DQfD), an algorithm that scores highly in difficult Atari games, we leverage dialog data to guide the agent to successfully respond to a user's requests. We make progressively fewer assumptions about the data needed, using labeled, reduced-labeled, and even unlabeled data to train …

Cited by 13 publications (13 citation statements)
References 23 publications (13 reference statements)
“…They train the proposed SDS using a network of DQN agents, which is similar to hierarchical DRL but with more flexibility for transitioning across dialogues domains. Another work-related to faster training is proposed by Gordon-Hall et al (2020), where the behaviour of RL agents is guided by expert demonstrations.…”
Section: Spoken Dialogue Systems (SDSs)
Citation type: mentioning
confidence: 99%
“…[Nishimoto and Reali Costa 2019] extended the first work by showing that a good balance in exploration and exploitation during training can significantly improve the performance. Some other recent works also used the classical DQN algorithm to train the policy [Gordon-Hall et al 2020, Wang et al 2020], showing that despite simple, this algorithm can provide good results. [Mo et al 2018] and [Weisz et al 2018] tried out other RL algorithms to model the DM, such as SARSA and actor-critic, respectively. Finally, [Saha et al 2020] proposed a hierarchical deep reinforcement learning approach to deal with more complex dialogue systems and [Takanobu et al 2019] proposed a method to learn the reward and optimize the policy jointly.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…However, it is more complicated and needs a lot of labeled data collected from experts. [Gordon-Hall et al 2020] proposed the Deep Q-learning from Demonstrations (DQfD), which uses expert demonstrators in a weakly supervised fashion.…”
Section: Related Work
Citation type: mentioning
confidence: 99%