2020
DOI: 10.1609/aaai.v34i04.6059

Discretizing Continuous Action Space for On-Policy Optimization

Abstract: In this work, we show that discretizing action space for continuous control is a simple yet powerful technique for on-policy optimization. The explosion in the number of discrete actions can be efficiently addressed by a policy with factorized distribution across action dimensions. We show that the discrete policy achieves significant performance gains with state-of-the-art on-policy optimization algorithms (PPO, TRPO, ACKTR) especially on high-dimensional tasks with complex dynamics. Additionally, we show tha…
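The factorized policy mentioned in the abstract is easy to sketch. Below is a minimal, illustrative PyTorch version (not the authors' code): each of the D action dimensions gets its own categorical distribution over K bins of a fixed action grid, so the network outputs D·K logits instead of K^D joint probabilities. The bin count, layer sizes, and action range are assumptions for the example.

```python
# Sketch of a factorized discrete policy for continuous control (illustrative,
# not the paper's implementation). Each action dimension has its own
# categorical over `bins` values; the joint policy factorizes across dims.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class FactorizedDiscretePolicy(nn.Module):
    def __init__(self, obs_dim, act_dim, bins=11, low=-1.0, high=1.0):
        super().__init__()
        self.act_dim, self.bins = act_dim, bins
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, act_dim * bins))
        # Fixed grid mapping each bin index back to a continuous action value.
        self.register_buffer("grid", torch.linspace(low, high, bins))

    def forward(self, obs):
        logits = self.net(obs).view(-1, self.act_dim, self.bins)
        return Categorical(logits=logits)      # one categorical per action dim

    def act(self, obs):
        dist = self(obs)
        idx = dist.sample()                    # (batch, act_dim) bin indices
        log_prob = dist.log_prob(idx).sum(-1)  # factorized: log-probs add
        return self.grid[idx], log_prob        # continuous action, log pi(a|s)

# Usage with hypothetical dimensions (e.g., a MuJoCo-style task):
policy = FactorizedDiscretePolicy(obs_dim=17, act_dim=6)
action, logp = policy.act(torch.randn(1, 17))
```

The summed log-probability is what PPO/TRPO/ACKTR consume, so the discrete policy drops into these algorithms without changing the surrounding on-policy machinery.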

Cited by 46 publications (24 citation statements)
References 18 publications
“…For example, the control actions of an on-load tap changer with 3 tap positions that correspond to turns ratios of 0.95, 1, and 1.05 can be deemed as a discretization of an ordinal variable of turns ratio. Thus, we adopt an ordinal representation [45] for all the discrete actions of a device to encode the natural ordering between the discrete actions.…”
Section: F Device-decoupled Policy Network Structure and Ordinal Encoding
confidence: 99%
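For concreteness, here is a minimal sketch of one common ordinal parameterization of a K-way discrete action; the exact formulation in [45] may differ in its details. Bin k treats the first k+1 sigmoid "threshold" units as on and the rest as off, so probability mass shifts smoothly between neighbouring bins such as the three tap positions above.

```python
# One possible ordinal parameterization (a sketch; details may differ from [45]).
# The logit for bin k sums log-sigmoid terms for thresholds i <= k and
# log(1 - sigmoid) terms for i > k, encoding the natural ordering of bins.
import torch
import torch.nn.functional as F

def ordinal_logits(raw, eps=1e-8):
    """raw: (..., K) unconstrained outputs -> (..., K) ordinal logits."""
    s = torch.sigmoid(raw)                    # per-threshold probabilities
    log_s, log_1ms = torch.log(s + eps), torch.log(1 - s + eps)
    K = raw.shape[-1]
    mask = torch.tril(torch.ones(K, K))       # mask[k, i] = 1 iff i <= k
    return log_s @ mask.T + log_1ms @ (1 - mask).T

# Example: 3 ordered tap positions (turns ratios 0.95, 1.0, 1.05).
probs = F.softmax(ordinal_logits(torch.zeros(1, 3)), dim=-1)
```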
“…, $\theta$ is the network parameter of the SPA model, $\pi_\theta(a_t \mid s_t)$ is the probability distribution of the policy under state $s_t$ and action $a_t$ at step $t$. We optimize the policy with minibatch AdamW. The estimated advantage function according to [66–70]…”
Section: Training of Salient Patch Agent
confidence: 99%
“…We optimize the policy with minibatch AdamW. The estimated advantage function according to [66, 67, 68, 69, 70] is $\hat{A}_t = \sum_{t'=t}^{T} \gamma^{\,t'-t} r_{t'} - V(s_t)$, where $\gamma$ is the discount factor, $r_{t'}$ is the SPA reward at step $t'$, $T$ is the number of steps of SPA, and $V(s_t)$ is the value output under state $s_t$.…”
Section: Asnet Framework
confidence: 99%
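Both excerpts describe the same advantage estimate. A minimal sketch under the plain discounted-return reading is below; the cited references [66–70] also cover GAE-style estimators, so the papers' exact estimator may differ.

```python
# Discounted-return advantage over one episode (illustrative; the cited works
# may instead use GAE): A_t = sum_{t'>=t} gamma^(t'-t) * r_t' - V(s_t).
import numpy as np

def advantage(rewards, values, gamma=0.99):
    T = len(rewards)
    returns = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        running = rewards[t] + gamma * running   # discounted return-to-go
        returns[t] = running
    return returns - np.asarray(values)          # subtract the value baseline

adv = advantage(rewards=[1.0, 0.0, 2.0], values=[0.5, 0.4, 1.0])
```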
“…The use of molecular fragments simplifies the search problem, while the variable-sized fragment distribution maintains the reachability of most molecular compounds. Because our search algorithm ultimately uses the latent representation of the molecules as the action space, we find that using a VQ-VAE with a categorical prior instead of the typical Gaussian prior makes RL training stable and provides good performance gains (Tang & Agrawal, 2020; Grill et al., 2020).…”
Section: Learning Distributional Fragment Vocabulary
confidence: 99%
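The categorical latent referred to here comes from vector quantization. A brief sketch of that step follows, with an assumed codebook size and latent dimension; it is not the cited implementation, only an illustration of why the latent "action" becomes a discrete codebook index rather than a Gaussian sample.

```python
# Vector-quantization step behind a categorical latent prior (illustrative).
# Each encoder output is snapped to its nearest codebook vector; the discrete
# index plays the role of the latent action used by the RL search.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=256, code_dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z):                              # z: (batch, code_dim)
        dist = torch.cdist(z, self.codebook.weight)    # pairwise L2 distances
        idx = dist.argmin(dim=-1)                      # categorical latent index
        z_q = self.codebook(idx)                       # quantized latent vector
        z_q = z + (z_q - z).detach()                   # straight-through gradient
        return z_q, idx
```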
“…Previous works optimize molecules either by generating from scratch or by a single translation from known molecules, which is inefficient in finding high-quality molecules and often discovers molecules lacking novelty/diversity. Our proposed framework addresses these deficiencies since our method is (1) very efficient in finding molecules that satisfy property constraints, as the model stays close to the high-property-score chemical manifold; and (2) able to produce highly novel molecules, because the sequence of fragment-based translations can lead to very different and diverse molecules compared to the known active set.…”
Section: Introduction
confidence: 99%