2023
DOI: 10.1109/LRA.2023.3246844
Overcoming Exploration: Deep Reinforcement Learning for Continuous Control in Cluttered Environments From Temporal Logic Specifications

Cited by 18 publications (8 citation statements)
References 25 publications
“…This approach requires the ability to plan ahead in the environment, which is not always feasible. Automaton-guided RL has been used to aid navigational exploration in robotic domains (Cai et al. 2023) and in multi-agent settings (Hammond et al. 2021). Generating a curriculum given the high-level objective (Shukla et al. 2023) requires access to the Object-Oriented MDP (Diuk, Cohen, and Littman 2008), which cannot be obtained if environment details are not known in advance.…”
Section: Related Work
confidence: 99%
“…$q_P^{K+M}$ with $q_P^K = q_P^{K+M}$ is executed infinitely often. Following prior work [6], we can decompose the optimal path $\tau_F = \tau_P^{\mathrm{pre}}[\tau_P^{\mathrm{suf}}]^{\omega}$ based on automaton components into a sequence of goal-reaching trajectories, i.e., $\tau_F = \tau_0 \tau_1 \ldots$…”
Section: A. Offline Motion Planning
confidence: 99%
“…Each $\tau_i$ can be presented as an optimal solution of a reachability navigation task expressed as a simple LTL formula $\phi_{i,F} = \square\neg O \wedge \phi_{g_i}$, where $O$ represents obstacles. We refer readers to [6] for more details about the decomposition procedure.…”
Section: A. Offline Motion Planning
confidence: 99%
“…A compositional RL algorithm that interleaves Dijkstra's algorithm for high-level task planning with learning sub-task policies using RL is developed in Jothimurugan et al. (2021). Cai et al. (2023) introduce a path-planning-guided reward design scheme for RL-based policy design with LTL-specified mission goals. Cai et al. (2021) consider motion planning under LTL task specifications in continuous state and action spaces, and develop an unsupervised one-shot and on-the-fly motion planning framework to learn the unknown state space.…”
Section: Continuous-State POMDP
confidence: 99%