Reinforcement learning is an appropriate and successful method to robustly perform low-level robot control under noisy conditions. Symbolic action planning is useful to resolve causal dependencies and to break a causally complex problem down into a sequence of simpler high-level actions. A problem with the integration of both approaches is that action planning is based on discrete high-level action and state spaces, whereas reinforcement learning is usually driven by a continuous reward function. However, recent advances in reinforcement learning, specifically universal value function approximators and hindsight experience replay, have focused on goal-independent methods based on sparse rewards. In this article, we build on these novel methods to facilitate the integration of action planning with reinforcement learning by exploiting reward sparsity as a bridge between the high-level and low-level state and control spaces. As a result, we demonstrate that the integrated neuro-symbolic method is able to solve object manipulation problems that involve tool use and non-trivial causal dependencies under noisy conditions, exploiting both data and knowledge.
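To make the role of reward sparsity concrete, the following minimal Python sketch (our illustration under assumed names, not the paper's implementation) shows the binary goal-conditioned reward used by hindsight experience replay: a continuous subgoal either counts as reached or not, which is exactly the kind of discrete success criterion a symbolic planner can reason about.

    import numpy as np

    def sparse_reward(achieved_goal, desired_goal, eps=0.05):
        # Binary HER-style reward: 0 if the goal is reached, -1 otherwise.
        # The all-or-nothing success criterion bridges the continuous
        # low-level state space and the discrete high-level semantics.
        # 'eps' is an assumed distance threshold for success.
        reached = np.linalg.norm(achieved_goal - desired_goal) <= eps
        return 0.0 if reached else -1.0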
Solving causal puzzles from scratch, without learning from demonstration or other data sources, remains an unsolved problem. The existing learning-based approaches are either very constrained (e.g., Deisenroth and Rasmussen, 2011), or they have been applied only to low-dimensional, non-noisy control problems that do not involve complex causal dependencies (e.g., Bacon et al., 2017; Levy et al., 2019), or they build on learning from demonstration (Aytar et al., 2018).

A complementary method to address complex causal dependencies is to use pre-specified semantic domain knowledge, e.g., in the form of an action planning domain description (McDermott et al., 1998). With an appropriate problem description, a planner can provide a sequence of solvable sub-tasks in a discrete high-level action space. However, the problem with this semantic, symbolic task planning approach is that the high-level actions generated by the planner must be grounded in a low-level motion execution layer that takes the current low-level state into account. For example, executing a high-level robotic action move object to target requires precise information about the object (e.g., its location and shape) to move it to the target location along a context-specific path. This grounding problem consists of two sub-problems, P.1 and P.2, which we tackle in this article.
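To make the planner's side of this concrete before turning to the two sub-problems, the minimal Python sketch below (hypothetical action and function names, not the paper's planning domain) shows such a high-level plan as a plain sequence of discrete actions, each of which still has to be grounded before a robot can execute it.

    # A symbolic plan, as returned by a high-level planner: each step
    # is a discrete action with symbolic arguments (hypothetical names).
    plan = [
        ("grasp", "rake"),
        ("move_object_to_target", "cube"),
    ]

    def execute_grounded(action):
        # Placeholder for the grounding pipeline: the action must be
        # mapped to a continuous subgoal (P.1) and then to a low-level
        # motor trajectory that reaches it (P.2); see the sketch below.
        print("executing", action)

    for step in plan:
        execute_grounded(step)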
P.1 The first grounding sub-problem is to map the discrete symbolic action space to context-specific subgoals in the continuous state space. For instance, move object to target needs to be associated with a continuous subgoal that specifies the desired metric target coordinates of the object (see the sketch after P.2).
P.2 The second grounding sub-problem is to map the subgoals to low-level context-specific action trajectories. For instance, the low-level trajectory for move object to target is specific to the continuous start and target locations of the object.
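A short Python sketch can summarize both sub-problems. Assuming a world state that exposes metric target coordinates and a pre-trained goal-conditioned policy (all names and interfaces here are hypothetical, not the paper's implementation), P.1 is a mapping from a symbolic action to a metric subgoal, and P.2 is a goal-conditioned rollout whose termination is decided by the sparse success criterion sketched above.

    import numpy as np

    def subgoal_for(action, state):
        # P.1: map a discrete symbolic action to a continuous subgoal.
        # For "move object to target", the subgoal is the metric target
        # coordinates the object should end up at (assumed state layout).
        name = action[0]
        if name == "move_object_to_target":
            return np.asarray(state["target_pos"])
        raise ValueError(f"no grounding defined for action {name!r}")

    def reach_subgoal(policy, env, goal, eps=0.05, max_steps=200):
        # P.2: let a goal-conditioned policy produce a context-specific
        # low-level trajectory toward the subgoal; the sparse success
        # check decides when the high-level action counts as completed.
        obs, achieved = env.observe()          # assumed environment API
        for _ in range(max_steps):
            obs, achieved = env.step(policy(obs, goal))
            if np.linalg.norm(achieved - goal) <= eps:
                return True                    # sub-task solved
        return False

With an interface of this kind, the sparse success check is the only point of contact between the discrete planning layer and the continuous control layer.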