2003
DOI: 10.1007/3-540-44826-8_2
Reinforcement Learning Approaches to Coordination in Cooperative Multi-agent Systems

Cited by 25 publications (7 citation statements). References 3 publications.
“…(c) RL enables SUs to explore a new operating environment and exploit the knowledge gained so far. One issue is that SUs may converge to a sub-optimal joint action when either of these conditions holds: actions with severe negative rewards exist, or multiple high-performance actions exist [30]. Using RL helps to overcome this issue, which is found in game-based approaches, since RL has the flexibility to fine-tune its policy as time progresses.…”
Section: Q4 (mentioning)
confidence: 98%
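The coordination failure flagged in this excerpt can be reproduced with a toy experiment. The sketch below is a rough illustration, not the cited paper's algorithm: it runs two independent Q-learners on the "climbing game", a cooperative matrix game that combines severely penalized joint actions with several high-payoff alternatives. The payoff matrix is the standard one from the coordination literature; the learning rate, exploration rate, and episode count are assumptions chosen for the demo.

```python
# Two independent Q-learners on the climbing game (illustrative sketch only).
import random

# Payoff matrix; rows = agent 1's action, columns = agent 2's action.
R = [[11, -30,  0],
     [-30,  7,  6],
     [  0,  0,  5]]

q1, q2 = [0.0] * 3, [0.0] * 3   # each agent keeps Q-values over its own actions only
alpha, eps = 0.1, 0.1           # assumed learning rate and exploration rate

def pick(q):
    # epsilon-greedy selection over the agent's own action values
    if random.random() < eps:
        return random.randrange(3)
    return max(range(3), key=lambda a: q[a])

for _ in range(5000):
    a1, a2 = pick(q1), pick(q2)
    r = R[a1][a2]                      # both agents receive the same joint reward
    q1[a1] += alpha * (r - q1[a1])
    q2[a2] += alpha * (r - q2[a2])

print("greedy joint action:",
      max(range(3), key=lambda a: q1[a]),
      max(range(3), key=lambda a: q2[a]))
```

With plain independent learners, the greedy joint action usually settles on a safe but sub-optimal cell such as (1, 1) or (2, 2) rather than the optimum (0, 0), which is the convergence problem the citing authors warn about.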
“…For example, the Q-value of 𝒟𝑗's 𝑘-th action […]. A major challenge in the selection of actions is to strike a balance between exploration and exploitation. In our scheme, we have chosen the Boltzmann strategy: each agent chooses an action to perform in the next iteration with a probability based on its current estimate of the usefulness of that action [15], [17].…”
Section: The Proposed Spectrum Allocation Scheme for the IoMT Paradigm (mentioning)
confidence: 99%
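As a concrete illustration of the Boltzmann strategy mentioned above, the following sketch draws an action with probability proportional to exp(Q/τ) over the agent's current Q-value estimates. The temperature value and the example Q-values are assumptions for the demo, not parameters from the cited scheme.

```python
# Boltzmann (softmax) action selection over current Q-value estimates.
import math, random

def boltzmann_choice(q_values, tau=0.5):
    """Pick an action with probability proportional to exp(Q / tau)."""
    m = max(q_values)                                  # subtract max for numerical stability
    weights = [math.exp((q - m) / tau) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    r, cum = random.random(), 0.0
    for action, p in enumerate(probs):
        cum += p
        if r < cum:
            return action
    return len(q_values) - 1

# Higher-valued actions are chosen more often, but never exclusively,
# which is what lets the agent keep exploring while exploiting.
print(boltzmann_choice([0.2, 1.5, 0.7]))
```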
“…If the users do not fetch the content from the neighboring SBSs, they fetch it from the core network directly. Thus, the problem for each SBS is to learn an optimal caching policy that minimizes the expected transmission delay of the network; this is modeled in a MAMAB framework with a stateless setting [128]. In this case, each SBS is an agent.…”
Section: B. Content Caching (mentioning)
confidence: 99%
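A minimal sketch of the stateless bandit viewpoint described above: each SBS agent treats candidate files as arms and receives the negative transmission delay as its reward. The UCB1 selection rule, the number of files, and the delay model are illustrative assumptions, not details from [128].

```python
# Stateless multi-armed-bandit agent for one SBS (illustrative sketch).
import math, random

N_FILES = 5                      # assumed number of candidate files to cache
counts = [0] * N_FILES
values = [0.0] * N_FILES         # running mean reward (negative delay) per file

def ucb_arm(t):
    # UCB1: try every arm once, then trade off mean reward and uncertainty.
    for a in range(N_FILES):
        if counts[a] == 0:
            return a
    return max(range(N_FILES),
               key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

for t in range(1, 2001):
    file_to_cache = ucb_arm(t)
    # Made-up environment: higher-indexed files are assumed more popular,
    # so caching them yields a shorter expected transmission delay.
    delay = random.gauss(10 - file_to_cache, 1.0)
    reward = -delay
    counts[file_to_cache] += 1
    values[file_to_cache] += (reward - values[file_to_cache]) / counts[file_to_cache]

print("file cached most often:", counts.index(max(counts)))
```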
“…The centralized critic network is deployed at the central core network to estimate an action-value function, using all the SBSs' experiences, which is then used to train an actor for each SBS. Evaluation experiments conducted within an ultra-dense network with a large number of BSs reveal that the proposed algorithm achieves a better fetching cost and cache hit rate than non-cooperative multi-agent actor-critic [138], least recently used (LRU) caching [131], and DRL [128].…”
Section: B. Content Caching (mentioning)
confidence: 99%
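The architecture described in this excerpt, one centralized critic trained on all SBSs' experiences plus a local actor per SBS, can be outlined as follows. This is a minimal PyTorch sketch under assumed sizes (N_SBS, OBS_DIM, ACT_DIM) and shows only a single actor-update step; it is not the algorithm evaluated in the cited work.

```python
# Centralized critic with per-SBS actors (centralized training, decentralized execution).
import torch
import torch.nn as nn

N_SBS, OBS_DIM, ACT_DIM = 4, 8, 3    # assumed sizes, not from the cited work

class Actor(nn.Module):
    """One actor per SBS; sees only its local observation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACT_DIM))
    def forward(self, obs):
        return torch.softmax(self.net(obs), dim=-1)   # caching-action probabilities

class CentralizedCritic(nn.Module):
    """Deployed at the core network; scores the joint observation/action of all SBSs."""
    def __init__(self):
        super().__init__()
        joint_dim = N_SBS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(nn.Linear(joint_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))

actors = [Actor() for _ in range(N_SBS)]
critic = CentralizedCritic()

# One illustrative actor-update step on a batch of pooled SBS experiences.
obs = torch.randn(32, N_SBS, OBS_DIM)                                 # all SBSs' observations
acts = torch.stack([a(obs[:, i]) for i, a in enumerate(actors)], 1)   # each actor acts locally
q = critic(obs.flatten(1), acts.flatten(1))                           # centralized action value
actor_loss = -q.mean()                                                # actors ascend the critic's estimate
actor_loss.backward()                                                 # gradients reach every local actor
```

The design point this sketch captures is that only the critic needs global information during training, while each SBS keeps a lightweight local actor for decentralized execution.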