Deep reinforcement learning: a survey

Wang, Haonan; Liu, Ning; Zhang, Yiyun; Feng, Dawei; Huang, Feng; Li, Dongsheng; Zhang, Yiming

doi:10.1631/fitee.1900533

Cited by 164 publications

(64 citation statements)

References 42 publications


“…While it has been shown that one can learn a control policy end-to-end using deep reinforcement learning (DRL) given high-dimensional observations [31], a significant, sometimes prohibitive amount of data is needed. However, it is possible to take advantage of compact, low-dimensional state representation to improve data efficiency [32].…”

Section: Related Work (mentioning)

confidence: 99%

Excavation Reinforcement Learning Using Geometric Representation

Lu, Zhu, Zhang

2022

Preprint


Excavation of irregular rigid objects in clutter, such as fragmented rocks and wood blocks, is very challenging due to their complex interaction dynamics and highly variable geometries. In this paper, we adopt reinforcement learning (RL) to tackle this challenge and learn policies to plan for a sequence of excavation trajectories for irregular rigid objects, given point clouds of excavation scenes. Moreover, we separately learn a compact representation of the point cloud on geometric tasks that do not require human labeling. We show that using the representation reduces training time for RL, while achieving similar asymptotic performance compared to an end-to-end RL algorithm. When using a policy trained in simulation directly on a real scene, we show that the policy trained with the representation outperforms end-to-end RL. To the best of our knowledge, this paper presents the first application of RL to plan a sequence of excavation trajectories of irregular rigid objects in clutter.


“…Bilevel optimization method is usually a good choice when faced with a comprehensive solution of multilevel or multi-party interests [27], [28]. Traditional back propagation particle swarm optimization (BPPSO) and reinforcement learning algorithm with action-reward incentive method [29] are also widely used optimization method to realize the process. However, there are few researches in this field at home and abroad, which are mainly faced with two difficulties: computational complexity and feedback accuracy.…”

Section: Figure 1 RPS and TGC Mechanisms in China (mentioning)

confidence: 99%

Joint Optimization of Quota Policy Design and Electric Market Behavior Based on Renewable Portfolio Standard in China

Liu, Wang, et al. 2021

IEEE Access


Under the perspective of carbon neutrality, the green electricity absorption target constrained by the quota system policy plays a crucial role in reducing the carbon emission of the power industry. However, the current green certificate policy has not achieved good results. On the premise of reducing the additional market burden as much as possible, the policy parameters should take into account the influence of market behavior to formulate better policy parameters in line with China's carbon emission peak goal. This paper constructs a combined hierarchical reinforcement learning with off-policy correction and multiagent deep deterministic policy gradient algorithm (HIRO-MADDPG). It realizes the benefit analysis of the existing policy parameters joint with the solution of the optimal policy parameters. The algorithm solves the problem that benefit analysis and parameter formulation cannot be jointly trained and improves the precision. The results indicate: (1) HIRO-MADDPG algorithm can reach the highest policy benefits on the premise of maintaining market fairness. (2) Under the new optimal policy parameters, the income per kilowatt hour of thermal power generator (TPG) and renewable power generator (RPG) can be maintained at 10% under the condition of abolishing subsidies. (3) With the help of the new policy parameters, China's power sector will reach the peak of carbon emissions from coal-fired power plants in 2026 ahead of schedule, and reduce carbon emissions by a further 11% by 2030.


“…One method for solving this problem is to effectively combine deep learning with reinforcement learning. A deep neural network is used, in traditional reinforcement learning, to model solutions to continuous reinforcement learning tasks [27,28]. Based on this method, Lillicrap et al [29] proposed a depth deterministic strategy gradient algorithm based on the actor critic framework.…”

Section: Introduction (mentioning)

confidence: 99%

Adaptive Proportional Integral Robust Control of an Uncertain Robotic Manipulator Based on Deep Deterministic Policy Gradient

Huang, Xiao, et al. 2021

Mathematics


An adaptive proportional integral robust (PIR) control method based on deep deterministic policy gradient (DDPGPIR) is proposed for n-link robotic manipulator systems with model uncertainty and time-varying external disturbances. In this paper, the uncertainty of the nonlinear dynamic model, time-varying external disturbance, and friction resistance of the n-link robotic manipulator are integrated into the uncertainty of the system, and the adaptive robust term is used to compensate for the uncertainty of the system. In addition, dynamic information of the n-link robotic manipulator is used as the input of the DDPG agent to search for the optimal parameters of the proportional integral robust controller in continuous action space. To ensure the DDPG agent’s stable and efficient learning, a reward function combining a Gaussian function and the Euclidean distance is designed. Finally, taking a two-link robot as an example, the simulation experiments of DDPGPIR and other control methods are compared. The results show that DDPGPIR has better adaptive ability, robustness, and higher trajectory tracking accuracy.
