Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space

Zhou, Fan; Su, Rui; Zhang, Weinan; Yu, Yong

doi:10.24963/ijcai.2019/316

Cited by 58 publications

(20 citation statements)

References 2 publications


“…To solve the stochastic optimization problem defined in Section 2.2, we need an algorithm that allows a mixture of continuous (harvesting quantities) and discrete actions (timings of clear-cuts and thinnings). We approach the problem of continuous action and state space using the notion of parameterized action spaces suggested by Fan et al (2019). The idea is to view the overall action as a hierarchical structure instead of a flat set.…”

Section: RL Algorithm With Parameterized Action Spaces (mentioning)

confidence: 99%

“…To handle the parameterized action space containing both discrete actions and continuous parameters, Fan et al (2019) have proposed a hybrid proximal policy optimization (H-PPO) algorithm.…”

Section: RL Algorithm With Parameterized Action Spaces (mentioning)

confidence: 99%
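The citation statements above describe a parameterized action as a discrete choice paired with a continuous parameter. A minimal sketch of sampling such an action follows, assuming a categorical distribution over the discrete actions and one Gaussian per action for its parameter. This is not the H-PPO implementation from the cited paper: the names `ParameterizedAction` and `sample_action` are hypothetical, and all numbers are made up for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ParameterizedAction:
    """Hypothetical container: a discrete choice plus its continuous parameter."""
    discrete: int     # index of the chosen discrete action
    parameter: float  # continuous parameter attached to that action


def softmax(logits):
    """Turn raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, means, stds, rng):
    """Sample the discrete action first, then the parameter tied to it.

    Assumption: each discrete action k has its own Gaussian (means[k], stds[k]).
    """
    probs = softmax(logits)
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    x = rng.gauss(means[k], stds[k])
    return ParameterizedAction(discrete=k, parameter=x)


# Illustrative values only: three discrete actions, one parameter each.
rng = random.Random(0)
action = sample_action(
    logits=[0.1, 1.2, -0.3],
    means=[0.0, 0.5, 0.0],
    stds=[0.1, 0.2, 0.1],
    rng=rng,
)
print(action)
```

In a learned policy the logits, means, and standard deviations would come from networks conditioned on the state; here they are fixed lists so the sampling structure stays visible.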

“…The algorithm is essentially similar to the broadly applied proximal policy optimization (PPO) algorithm with only slight modifications to allow for the use of parameterized actions that combine continuous and discrete decisions (Schulman et al 2017, Fan et al 2019).…”

Section: Appendix (mentioning)

confidence: 99%


Reinforcement learning in optimizing forest management

et al. 2021


We solve a stochastic high-dimensional optimal harvesting problem by reinforcement learning algorithms developed for agents who learn an optimal policy in a sequential decision process through repeated experience. This approach produces optimal solutions without discretization of state and control variables. Our stand-level model includes mixed species, tree size structure, optimal harvest timing, choice between rotation and continuous cover forestry, stochasticity in stand growth, and stochasticity in the occurrence of natural disasters. The optimal solution or policy maps the system state to the set of actions, i.e. clear-cut/thinning/no harvest decisions and the intensity of thinning over tree species and size classes. The algorithm repeats the solutions for deterministic problems computed earlier with time-consuming methods. Optimal policy describes harvesting choices from any initial state and reveals how the initial thinning vs. clear-cut choice depends on the economic and ecological factors. Stochasticity in stand growth increases the diversity of species composition. Despite the high variability in natural regeneration, the optimal policy closely satisfies the certainty equivalence principle. The effect of natural disasters is similar to an increase in the interest rate, but in contrast to earlier results, this tends to change the management regime from rotation forestry to continuous cover management.



“…Previous work has touched upon the idea of using trees as a formalization of action spaces with multiple components such as in [16], where the tree structure is referred to as a Hierarchical Action Space. Other works have used action spaces that are similar to those used in this paper as examples of action trees.…”

Section: A Action Trees (mentioning)

confidence: 99%

Generalising Discrete Action Spaces with Conditional Action Trees

Bamford, Ovalle

2021

Preprint


There are relatively few conventions followed in reinforcement learning (RL) environments to structure the action spaces. As a consequence, the application of RL algorithms to tasks with large action spaces with multiple components requires additional effort to adjust to different formats. In this paper we introduce Conditional Action Trees with two main objectives: (1) as a method of structuring action spaces in RL to generalise across several action space specifications, and (2) to formalise a process to significantly reduce the action space by decomposing it into multiple sub-spaces, favoring a multi-staged decision making approach. We show several proof-of-concept experiments validating our scheme, ranging from environments with basic discrete action spaces to those with large combinatorial action spaces commonly found in RTS-style games.


“…Unlike [22], we do not transfer any target samples back to the internal model. Similar to the problem settings of [23]-[25], our agent predicts the best parameters for a controller.…”

Section: Related Work (mentioning)

confidence: 99%

Causal Reasoning in Simulation for Structure and Transfer Learning of Robot Manipulation Policies

Lee, Zhao, Sawhney et al. 2021

Preprint


We present CREST, an approach for causal reasoning in simulation to learn the relevant state space for a robot manipulation policy. Our approach conducts interventions using internal models, which are simulations with approximate dynamics and simplified assumptions. These interventions elicit the structure between the state and action spaces, enabling construction of neural network policies with only relevant states as input. These policies are pretrained using the internal model with domain randomization over the relevant states. The policy network weights are then transferred to the target domain (e.g., the real world) for fine tuning. We perform extensive policy transfer experiments in simulation for two representative manipulation tasks: block stacking and crate opening. Our policies are shown to be more robust to domain shifts, more sample efficient to learn, and scale to more complex settings with larger state spaces. We also show improved zero-shot sim-to-real transfer of our policies for the block stacking task.
