2020
DOI: 10.48550/arxiv.2003.10598
Preprint
Multi-Agent Reinforcement Learning for Problems with Combined Individual and Team Reward

Abstract: Many cooperative multi-agent problems require agents to learn individual tasks while contributing to the collective success of the group. This is a challenging task for current state-of-the-art multi-agent reinforcement learning algorithms, which are designed to either maximize the global reward of the team or the individual local rewards. The problem is exacerbated when either of the rewards is sparse, leading to unstable learning. To address this problem, we present Decomposed Multi-Agent Deep Deterministic Policy Gradi…
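The decomposed design described in the abstract keeps a centralized global critic for the team reward alongside per-agent local critics for the individual rewards, and each actor is updated using both signals. The sketch below illustrates one plausible form of that combined actor update; the network sizes, the detaching of the other agents' actions, and the equal weighting of the two critic terms are assumptions made for illustration, not the paper's exact algorithm.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    """Small fully connected network used for both actors and critics (illustrative)."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

obs_dim, act_dim, n_agents = 8, 2, 3  # toy sizes, assumed for the sketch

# Deterministic actor per agent (DDPG-style), tanh-squashed continuous action.
actors = [nn.Sequential(mlp(obs_dim, act_dim), nn.Tanh()) for _ in range(n_agents)]

# Centralized global critic: sees all observations and actions, scores the team reward.
global_critic = mlp(n_agents * (obs_dim + act_dim), 1)

# Local critic per agent: sees only that agent's observation and action,
# scores the agent's individual reward.
local_critics = [mlp(obs_dim + act_dim, 1) for _ in range(n_agents)]

actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]

def actor_update(batch_obs):
    """batch_obs: tensor of shape [batch, n_agents, obs_dim]. Each actor ascends the
    sum of the global critic value and its own local critic value (the equal
    weighting here is an illustrative assumption, not the paper's exact loss)."""
    for i in range(n_agents):
        # The updated agent's action keeps gradients; other agents' actions are detached.
        actions = [actors[j](batch_obs[:, j]) if j == i
                   else actors[j](batch_obs[:, j]).detach()
                   for j in range(n_agents)]
        joint = torch.cat([batch_obs.flatten(1), torch.cat(actions, dim=1)], dim=1)
        q_global = global_critic(joint)
        q_local = local_critics[i](torch.cat([batch_obs[:, i], actions[i]], dim=1))
        loss = -(q_global + q_local).mean()
        actor_opts[i].zero_grad()
        loss.backward()
        actor_opts[i].step()
```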

Cited by 2 publications (2 citation statements)
References 14 publications (21 reference statements)
“…We then designed the execution and training steps of the algorithm. Following the previous methods of resource allocation in vehicular networks [40][41][42], we set a data packet transmission task as an episode and set the maximum transmission time duration as the maximum time step for each episode. It is worth mentioning that, for the sake of easier comparison with previous methods, we adopted the same execution and training framework and a similar MDP transition process.…”
Section: Learning Algorithm and Training Setup (mentioning)
confidence: 99%
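The episode convention quoted above, one data-packet transmission task per episode capped at the maximum transmission time, can be sketched as a simple rollout loop. This is a minimal sketch under assumed interfaces: the env.reset/env.step API, the MAX_STEPS constant, and the agent.act method are hypothetical placeholders, not the cited papers' code.

```python
MAX_STEPS = 100  # maximum transmission time duration, in environment steps (assumed)

def run_episode(env, agents):
    """Roll out one packet-transmission episode and return the collected transitions."""
    obs = env.reset()                       # start a new packet transmission task
    transitions = []
    for t in range(MAX_STEPS):              # episode length bounded by the time limit
        actions = [agent.act(o) for agent, o in zip(agents, obs)]
        next_obs, rewards, done, info = env.step(actions)
        transitions.append((obs, actions, rewards, next_obs, done))
        obs = next_obs
        if done:                            # packet delivered (or dropped) before the limit
            break
    return transitions
```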
“…Another solution is to use an RNN instead of a feedforward neural network and consider the information of the agent's state and action histories h_t for action selection [42]. Since the state of the environment of our proposed model is partially observable and the agents have two different objectives, individual and common, we utilize MASRDDPG, which is a derivative of the decomposed multi-agent deep deterministic policy gradient (DE-MADDPG) proposed in [43]. For further analysis, we consider the RDPG method as a second solution since it is suitable for partially observable and uncertain environments [42].…”
Section: Solution and Algorithm (mentioning)
confidence: 99%
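The statement above points to recurrent, RDPG-style actors that condition action selection on the observation-action history h_t under partial observability. Below is a minimal sketch of such a recurrent actor, assuming a GRU cell and a tanh-squashed continuous action; the layer sizes and the single-step interface are illustrative, not taken from the cited works.

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Sketch of an actor that conditions on the observation-action history h_t
    (carried in the GRU hidden state) rather than on a single observation."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs, prev_action, hidden_state=None):
        # obs: [batch, obs_dim], prev_action: [batch, act_dim]; one time step at a time.
        x = torch.cat([obs, prev_action], dim=-1).unsqueeze(1)
        out, hidden_state = self.gru(x, hidden_state)   # hidden_state summarizes h_t
        action = self.head(out.squeeze(1))
        return action, hidden_state
```

In use, the hidden state returned by forward would be carried across the steps of an episode, so the policy effectively acts on the accumulated history h_t rather than on the latest observation alone.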