Integrating Reinforcement Learning with Multi-Agent Techniques for Adaptive Service Composition

Wang, Hongbing; Chen, Xin; Wu, Qing; Yu, Qi; Hu, Xingguo; Zheng, Zibin; Bouguettaya, Athman

doi:10.1145/3058592

Cited by 30 publications

(22 citation statements)

References 48 publications

“…Multi-agent system (a distributed system composed of multiple independent autonomous agents, which are in the same working environment, can sense the environmental information and perform their own actions) robot soccer match is a typical multi-agent system research platform, which is also a field of artificial intelligence and robotics machine learning. 1,2 At present, the subject of high challenge has received extensive attention and research. 3,4,5 However, the process of robot soccer match is complex, dynamic and uncertain, which makes the decision-making system based on expert knowledge is lack sufficient completeness and flexibility to deal the process of the complex, dynamic and uncertain of robot football game, however, the reinforcement learning method does not need accurate environment model and complete expert knowledge.…”

Section: Introduction (mentioning)

confidence: 99%

RETRACTED: Research on decision-making strategy of soccer robot based on multi-agent reinforcement learning

Liu

2020

International Journal of Advanced Robotic Systems

This article studies a multi-agent reinforcement learning algorithm based on agent action prediction. In multi-agent system, the action of learning agent selection is inevitably affected by the action of other agents, so the reinforcement learning system needs to consider the joint state and joint action of multi-agent based on this. In addition, the application of this method in the cooperative strategy learning of soccer robot is studied, so that the multi-agent system can pass through the environment. To realize the division of labour and cooperation of multi-robots, the interactive learning is used to master the behaviour strategy. Combined with the characteristics of decision-making of soccer robot, this article analyses the role transformation and experience sharing of multi-agent reinforcement learning, and applies it to the local attack strategy of soccer robot, uses this algorithm to learn the action selection strategy of the main robot in the team, and uses Matlab platform for simulation verification. The experimental results prove the effectiveness of the research method, and the superiority of the proposed method is validated compared with some simple methods.

“…First, we analyze the efficiency of the proposed long-term qualitative composition framework for a new set of requests (without history). We compare the proposed approaches with four state-of-the-art techniques: a) Global Dynamic Programming [77], b) 2-d Q-learning [43], c) Heuristics based optimization [40], and d) On-policy SARSA learning approach [61].…”

Section: Methods (mentioning)

confidence: 99%

“…Finally, long-term requests are accepted through collaborative decisions of local optimizations, i.e., based on the global utility score. • On-policy SARSA learning approach: A modified SARSA (State-Action-Reward-State-Action) algorithm is proposed as the on-policy reinforcement learning approach for adaptive service composition [61]. The difference between SARSA and Q-learning is that SARSA selects requests (action) following the same current policy and updates its Q-values.…”

Section: Methods (mentioning)

confidence: 99%

Sequential Learning-based IaaS Composition

2021

Self Cite

We propose a novel Infrastructure-as-a-Service composition framework that selects an optimal set of consumer requests according to the provider’s qualitative preferences on long-term service provisions. Decision variables are included in the temporal conditional preference networks to represent qualitative preferences for both short-term and long-term consumers. The global preference ranking of a set of requests is computed using a k -d tree indexing-based temporal similarity measure approach. We propose an extended three-dimensional Q-learning approach to maximize the global preference ranking. We design the on-policy-based sequential selection learning approach that applies the length of request to accept or reject requests in a composition. The proposed on-policy-based learning method reuses historical experiences or policies of sequential optimization using an agglomerative clustering approach. Experimental results prove the feasibility of the proposed framework.
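
The citation statement above contrasts on-policy SARSA with Q-learning. As a rough illustration of that distinction only, the sketch below shows the two textbook tabular update rules side by side. It is not the modified SARSA of the cited work or the 2-d/3-d Q-learning variants mentioned in the statements; the hyperparameter values, the state label "s1", and the action names "accept" and "reject" are hypothetical placeholders chosen for this example.

```python
import random
from collections import defaultdict

# Illustrative hyperparameters (assumed values, not taken from the cited papers).
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1


def epsilon_greedy(q, state, actions, rng=random):
    """Behaviour policy: explore with probability EPSILON, otherwise act greedily."""
    if rng.random() < EPSILON:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next):
    """On-policy update: bootstrap from the action the policy actually chose in s_next."""
    target = reward + GAMMA * q[(s_next, a_next)]
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def q_learning_update(q, s, a, reward, s_next, actions):
    """Off-policy update: bootstrap from the greedy action in s_next."""
    target = reward + GAMMA * max(q[(s_next, b)] for b in actions)
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def demo():
    """Apply both updates to the same transition and return the two new Q-values."""
    actions = ["accept", "reject"]  # hypothetical request-selection actions
    results = []
    for use_sarsa in (True, False):
        q = defaultdict(float)
        q[("s1", "accept")] = 1.0  # the greedy choice in s1 is "accept"
        if use_sarsa:
            # Suppose the behaviour policy explored and picked "reject" in s1.
            sarsa_update(q, "s0", "accept", 1.0, "s1", "reject")
        else:
            q_learning_update(q, "s0", "accept", 1.0, "s1", actions)
        results.append(q[("s0", "accept")])
    return tuple(results)


if __name__ == "__main__":
    sarsa_value, q_learning_value = demo()
    print(sarsa_value, q_learning_value)
```

On the same transition the two rules diverge whenever the action actually taken in the next state is not the greedy one: SARSA's target uses the value of the chosen action, while Q-learning's target uses the maximum over actions.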

“…Last but not least, the work of Wang et al [111] has been identified to fulfil the above-mentioned criteria. In their work, reinforcement learning techniques are combined with multiagent technology in the context of self-adaptive service composition.…”

Section: Improvement Aspects (mentioning)

confidence: 99%