2021
DOI: 10.1155/2021/6629852

DDPG‐Based Energy‐Efficient Flow Scheduling Algorithm in Software‐Defined Data Centers

Abstract: With the rapid development of data centers, the energy consumption of a growing number of data centers cannot be underestimated. How to intelligently manage software-defined data center networks to reduce network energy consumption and improve network performance is becoming an important research subject. In this paper, for flows with deadline requirements, we study how to design a rate-variable flow scheduling scheme that saves energy and minimizes the mean completion time (MCT) of flows based o…
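The truncated abstract describes rate-variable scheduling of deadline-constrained flows. As a rough illustration of the core idea, a flow with a deadline only needs to be sent at the lowest rate that still finishes in time, which is the usual lever for saving energy. The sketch below is illustrative only and assumes a flow is described by its remaining size and time to deadline; the names `Flow` and `min_rate` are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Flow:
    remaining_bytes: float   # bytes still to transfer
    deadline_s: float        # seconds until the deadline expires

def min_rate(flow: Flow) -> float:
    """Lowest constant sending rate (bytes/s) that still meets the deadline.

    Sending at this rate instead of at line rate keeps links in low-power
    states longer, which is the intuition behind energy-aware scheduling.
    """
    if flow.deadline_s <= 0:
        raise ValueError("deadline already passed")
    return flow.remaining_bytes / flow.deadline_s

# Example: a 1 GB flow with 20 s left needs at least 50 MB/s.
print(min_rate(Flow(remaining_bytes=1e9, deadline_s=20.0)))  # 50000000.0
```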

Cited by 5 publications (5 citation statements). References 24 publications.
“…It uses the offline learning method and a Q-network for training, and it takes samples from the replay buffer to minimize the correlation between samples. The authors in [22] adapt the DDPG algorithm to find the optimal scheduling scheme for flows. The authors in [23,24] present a QoS optimization algorithm based on DDPG that improves the load-balancing degree and throughput while ensuring delay and packet-loss-rate requirements.…”
Section: Related Research (mentioning, confidence: 99%)
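The statement above mentions sampling from a replay buffer to reduce correlation between consecutive transitions. A minimal sketch of such a buffer is shown below; the class and method names are illustrative and not taken from the cited papers.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer; uniform random sampling decorrelates transitions."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform sampling breaks the temporal correlation of consecutive steps.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```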
“…Reinforcement learning opens a new way for solving complex network problems [15]. Some researchers have used traditional algorithms of reinforcement learning such as deep Q-learning network (DQN) [16][17][18][19], proximal policy optimization (PPO) [20], deep deterministic policy gradient (DDPG) [21][22][23][24][25][26][27][28], and twin delayed deep deterministic policy gradient (TD3) [29][30][31]. The DQN algorithm uses Q-tables to store value functions, but it leads to excessive memory overhead and sizeable computational complexity when the network size increases.…”
Section: Introduction (mentioning, confidence: 99%)
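The statement above contrasts table-based value storage, whose memory footprint grows with the size of the network, with policy-gradient methods such as DDPG. The sketch below illustrates the tabular case only; the state and action counts are made-up numbers, not figures from the cited papers.

```python
import numpy as np

# Tabular value storage: one table entry per (state, action) pair.
# In a data-center network the state space grows combinatorially with the
# number of switches and links, so the table grows with it.
n_states, n_actions = 10_000, 16          # illustrative sizes
Q = np.zeros((n_states, n_actions))       # memory grows as |S| x |A|

alpha, gamma = 0.1, 0.99                  # learning rate, discount factor

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """One Bellman backup on the table."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])
```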
“…Equation (5) has already been proven to hold [29,30]. Even a zero-mean error in the initial state will lead to an overestimation of the action value through the update of the value function, and the adverse effect of this error will be gradually enlarged by the calculation of the Bellman equation.…”
Section: Error Analysis It Is An Inevitable Problem For Q-… (mentioning, confidence: 99%)
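The passage above argues that even zero-mean estimation noise becomes a positive bias once the Bellman target takes a maximum over noisy action values, and that bootstrapping then propagates this bias. The snippet below is a small numerical illustration of that effect, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# True value of every action is 0; the estimates carry zero-mean noise.
n_actions, n_trials = 10, 100_000
noisy_estimates = rng.normal(0.0, 1.0, size=(n_trials, n_actions))

# The Bellman target uses max_a Q(s', a); the max over noisy, zero-mean
# estimates is positively biased, and bootstrapping propagates the bias
# through subsequent value updates.
bias = noisy_estimates.max(axis=1).mean()
print(f"mean of max over noisy estimates: {bias:.3f}")  # about 1.54, not 0
```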
“…To address the dynamic and stochastic nature of order dispatching in ride-sharing platforms, Tang X. et al. [24] proposed an order dispatching solution based on deep reinforcement learning and verified the effectiveness of the algorithm through large-scale online tests. In addition, applications of DRL to network flow control problems [25], financial market intraday trading [26], subway train dispatching [27], etc. have demonstrated the superiority and effectiveness of DRL in solving sequential decision-making problems.…”
Section: Literature Review (mentioning, confidence: 99%)