2015
DOI: 10.48550/arxiv.1511.06581
Dueling Network Architectures for Deep Reinforcement Learning

Cited by 208 publications (285 citation statements) | References 7 publications
“…The reinforcement learning method adopted in the DRLCFA method is based on the idea of the V-D D3QN method [38]. This method is a variant of the Double Dueling Q-learning Network, which changes the update mode of the Q-network in an innovative way while retaining the dueling structure [40].…”
Section: Reinforcement Learning Methods and The Agentmentioning
confidence: 99%
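For context, the D3QN family referenced in this excerpt builds on the double Q-learning target, which decouples action selection (online network, parameters θ) from action evaluation (target network, parameters θ⁻). A minimal sketch of that target, written here for illustration rather than quoted from [38] or [40]:

y_t = r_{t+1} + \gamma \, Q\big(s_{t+1}, \arg\max_{a} Q(s_{t+1}, a; \theta_t); \ \theta^{-}_t\big)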
“…To improve the data efficiency of DQN and expedite its convergence, it is shown in [51] that, instead of uniform sampling, more surprising transitions should be sampled more frequently, a method called prioritized experience replay (PER). Rainbow-DQN, which demonstrates superior performance compared with other DQN variants in several Atari games [52], combines several of the most effective DQN improvements: double Q-learning [53], PER [51], the dueling architecture [54], multi-step learning, distributional reinforcement learning [55], and noisy nets [56]. In this paper, we use a double DQN with dueling architecture and PER.…”
Section: Reinforcement Learning For Cppmentioning
confidence: 99%
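As a reminder of what PER [51] does (an illustrative summary, not a quotation from the citing paper): each transition i is sampled with probability proportional to a priority p_i, typically derived from its TD error, with an exponent α controlling how strongly prioritization is applied, and importance-sampling weights w_i correcting the bias introduced by non-uniform sampling:

P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}, \qquad w_i = \left( \frac{1}{N \cdot P(i)} \right)^{\beta}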
“…Therefore, the dueling DDQN can generalize the learning process across actions and can quickly identify the best actions and important states without learning the effect of each action in each state. The dueling DQN [35] is an improved version of DQN in which the Q-network has two streams (sequences), i.e., the state-action value function is decomposed into the state value function V^π(s) and the advantage function A^π(s, a), to speed up convergence and improve efficiency. The value function V^π(s) represents the quality of being in a particular state (the average contribution of that state to the Q-function), and the advantage function A^π(s, a) measures the relative importance of a particular action compared with the other actions in that state.…”
Section: Dueling Double Deep Q Network Algorithmmentioning
confidence: 99%
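For reference, the cited dueling architecture ([35] in this excerpt, i.e., the paper this report covers) recombines the two streams into Q-values with an aggregation module; the mean-subtracted form reported in the original paper, with θ the shared parameters and α, β the stream-specific parameters, is:

Q(s, a; \theta, \alpha, \beta) = V(s; \theta, \beta) + \left( A(s, a; \theta, \alpha) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a'; \theta, \alpha) \right)

Subtracting the mean advantage forces the advantages to have zero mean in each state, which keeps the V/A decomposition well-behaved during optimization without changing the relative ranking of actions.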