2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
DOI: 10.1109/iros40897.2019.8968149

Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

Abstract: A common strategy to deal with the expensive reinforcement learning (RL) of complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn as well as reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat every policy learning process separately. Therefore, all these individual (composable) policies need to be learned before tackling the learning process of the complex task through policy composition. Moreover, such…

Cited by 4 publications (3 citation statements) | References 17 publications (23 reference statements)

“…Generally, the original MRL supports not only the decomposition of complex tasks into modules, but also the composition of separately learned modules into new strategies for tasks that were never solved before [4,13]. Focusing on the optimality of the composite strategy for the entire task and on the independence of learning across separate modules, [12] introduced the specific concept of a "modular reward", which consists of the actual reward received after each interaction plus a bonus for handling the task with the proper module.…”
Section: Related Work
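
The modular-reward mechanism described in this excerpt lends itself to a compact illustration. Below is a minimal sketch, assuming per-module value estimates and a soft gating signal over modules; the function name, the bonus form beta * (g_t - g_{t-1}) * Q, and all parameters are illustrative assumptions, not the exact formulation from [12].

```python
import numpy as np

def modular_reward(env_reward, q_values, gate_prev, gate_curr, beta=1.0):
    """Hypothetical modular reward: environment reward plus a gating bonus.

    Sketch of the 'modular reward' idea: each module receives the actual
    reward from the interaction plus a bonus derived from the modular
    value function and the temporal difference of the module gating
    signal. All names and the exact bonus form are assumptions for
    illustration, not the formulation from [12].

    env_reward : float           -- reward returned by the environment
    q_values   : np.ndarray (M,) -- modular value estimates, one per module
    gate_prev  : np.ndarray (M,) -- gating signal at the previous step
    gate_curr  : np.ndarray (M,) -- gating signal at the current step
    beta       : float           -- bonus scale
    """
    # Temporal difference of the gating signal: positive for modules that
    # gained responsibility for the task, negative for those that lost it.
    gate_td = gate_curr - gate_prev
    # The bonus credits a module with (part of) its value estimate when
    # the gate shifts toward it, propagating reward for overall task
    # achievement between modules.
    bonus = beta * gate_td * q_values
    return env_reward + bonus  # one modular reward per module, shape (M,)
```
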
“…Specifically, this bonus is calculated from the modular value function and the temporal difference of the module gating signal, which propagates reward for the achievement of the entire task between modules. In situations where a task requires performing its sub-tasks concurrently, [4] proposes a hierarchical RL approach that learns both compound and composable policies within the same learning process, exploiting the off-policy data generated by the compound policy. The results show that the experience collected with the compound policy not only permits solving the complex task but also yields useful composable policies that perform successfully in their corresponding sub-tasks.…”
Section: Related Work
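
To make the concurrent scheme attributed to [4] concrete, here is a minimal sketch in which only the compound policy interacts with the environment, while its off-policy transitions update both the compound policy and every composable sub-policy. The interfaces involved (an environment returning a reward dict keyed by "compound" and by sub-task name, policy objects with `act` and `update` methods) are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
import random
from collections import deque

class SharedReplayBuffer:
    """Off-policy transitions collected by the compound policy, replayed
    to train both the compound policy and every composable sub-policy."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, rewards, next_state, done):
        # `rewards` holds the compound-task reward plus one reward per
        # sub-task, so each policy can be updated from the same transition.
        self.buffer.append((state, action, rewards, next_state, done))

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

def train_concurrently(env, compound, composables, buffer, steps, batch=64):
    """Minimal sketch of concurrent discovery: only the compound policy
    acts, but its off-policy experience updates all policies."""
    state = env.reset()
    for _ in range(steps):
        action = compound.act(state)
        # Hypothetical env API: `rewards` is a dict with the compound-task
        # reward under "compound" and one entry per sub-task.
        next_state, rewards, done, _ = env.step(action)
        buffer.add(state, action, rewards, next_state, done)
        if len(buffer.buffer) >= batch:
            transitions = buffer.sample(batch)
            compound.update(transitions, reward_key="compound")
            for name, policy in composables.items():
                # Each composable policy is trained off-policy on the
                # same data, using its own sub-task reward signal.
                policy.update(transitions, reward_key=name)
        state = env.reset() if done else next_state
```

Training every sub-policy from the compound policy's experience is what makes the discovery concurrent: no separate data-collection phase per sub-task is needed, which matches the excerpt's claim that the compound policy's experience yields useful composable policies as a by-product.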