2014
DOI: 10.1145/2659003
Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots

Abstract: Conversational systems and robots that use reinforcement learning for policy optimization in large domains often face the problem of limited scalability. This problem has been addressed either by using function approximation techniques that estimate the approximate true value function of a policy or by using a hierarchical decomposition of a learning task into subtasks. We present a novel approach for dialogue policy optimization that combines the benefits of both hierarchical control and function approximation…


Cited by 8 publications (12 citation statements)
References 48 publications
“…Although the dialogues currently generated using the trained policies seem reasonable⁶, it is natural to ask: how good are the DRL-based policies? To answer this question we integrated a K-Nearest Neighbour (KNN) baseline [13], which aims to behave like the example demonstration dialogues; see Appendix A.…”
Section: Results
confidence: 99%
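The KNN baseline quoted above can be sketched as follows. This is a minimal illustration under assumed vector-valued dialogue states, not the implementation from [13]: the state featurization, action labels, and majority vote are all illustrative.

```python
import numpy as np

def knn_action(state, demo_states, demo_actions, k=1):
    """Pick the action taken in the k nearest demonstration states.

    state        : 1-D feature vector for the current dialogue state
    demo_states  : (N, D) array of states from example dialogues
    demo_actions : length-N list of the actions taken in those states
    """
    dists = np.linalg.norm(demo_states - state, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote over the k nearest demonstration actions
    votes = [demo_actions[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy demonstration set: two states, two actions
demo_states = np.array([[0.0, 0.0], [1.0, 1.0]])
demo_actions = ["greet", "confirm"]
print(knn_action(np.array([0.9, 0.8]), demo_states, demo_actions))  # "confirm"
```

Such a baseline selects whatever action was taken in the most similar demonstration state, so it "behaves like" the example dialogues without any learned value function.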
“…In contrast to Hierarchical DQNs [5] that follow a strict sequence of agents, an NDQN in our method allows transitions between all DQN agents except for self-transitions. The latter uses a stack-based approach as in [6]. While user responses can motivate transitions to another domain in the network, completing a subdialogue within a domain motivates a transition to the previous domain to resume the interaction.…”
Section: A Network of Deep Q-Networks (NDQN)
confidence: 99%
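The transition logic quoted above can be sketched with a small controller. The class, domain names, and methods are hypothetical, but the constraints follow the text: transitions are allowed to any other domain (no self-transitions), and completing a subdialogue pops a stack to resume the previous domain, as in [6].

```python
class NDQNController:
    """Sketch of a network of per-domain agents with a resume stack.

    Each domain would own its own DQN policy (omitted here); this
    class only models which domain is in control of the dialogue.
    """
    def __init__(self, domains, start):
        self.domains = set(domains)
        self.current = start
        self.stack = []          # domains to resume, most recent last

    def switch_to(self, domain):
        if domain == self.current:
            raise ValueError("self-transitions are not allowed")
        if domain not in self.domains:
            raise ValueError(f"unknown domain: {domain}")
        self.stack.append(self.current)   # remember where to resume
        self.current = domain

    def finish_subdialogue(self):
        # Completing a subdialogue resumes the previous domain
        if self.stack:
            self.current = self.stack.pop()
        return self.current

ctrl = NDQNController({"meta", "restaurants", "hotels"}, start="meta")
ctrl.switch_to("restaurants")     # user asks about restaurants
ctrl.switch_to("hotels")          # mid-dialogue hotel question
print(ctrl.finish_subdialogue())  # back to "restaurants"
```

The stack is what distinguishes this from a strict hierarchy: any agent can hand control to any other, yet the interaction always unwinds back to where it left off.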
“…It remains to be demonstrated how far one can go with such an approach. Future work includes to (a) compare different model architectures, training parameters and reward functions; (b) extend or improve the abilities of the proposed dialogue system; (c) train deep learning agents in other (larger scale) domains [7,8,9]; (d) evaluate end-to-end systems with real users; (e) compare or combine different types of neural nets [10]; and (f) perform fast learning based on parallel computing. Table 1 Example dialogue using the policy from Fig. 2, where states are numerical representations of the last system and noisy user inputs, actions are dialogue acts, and user responses are in brackets…”
Section: Discussion
confidence: 99%
“…This function is known as a classifier when the labels are discrete and as a regressor when the labels are continuous. All articles in this special issue make use of classifiers to predict events during human-machine interactions [Ngo et al. 2014; Benotti et al. 2014; Keizer et al. 2014; Cuayáhuitl et al. 2014]. In contrast to supervised learning that makes use of direct feedback, reinforcement learning makes use of indirect feedback typically based on numerical rewards given during the interaction, and the goal is to maximize them in the long run.…”
Section: Multimodal Interactive Learning Systems: What and Why?
confidence: 99%
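The indirect-feedback loop described above can be illustrated with a single tabular Q-learning update, in which a numerical reward is folded into a long-run value estimate rather than used as a direct supervised label. This is a generic sketch; the states, actions, and reward values are invented for illustration.

```python
def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step: nudge Q(s, a) toward the reward
    plus the discounted best value of the next state."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
    return Q[(s, a)]

actions = ["ask", "confirm"]
Q = {}
# Confirming once all slots are filled ends the dialogue with reward +1
print(q_update(Q, "slots_filled", "confirm", 1.0, "end", actions))  # 0.5
```

Repeating the update moves the estimate further toward the long-run return, which is the sense in which the reward signal is "indirect": it shapes values over many interactions rather than labeling each action correct or incorrect.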