Proceedings Fourth International Conference on MultiAgent Systems
DOI: 10.1109/icmas.2000.858525

A multiagent variant of Dyna-Q

Abstract: This paper describes a multiagent variant of Dyna-Q called M-Dyna-Q. Dyna-Q is an integrated single-agent framework for planning, reacting, and learning. Like Dyna-Q, M-Dyna-Q employs two key ideas: learning results can serve as a valuable input for both planning and reacting, and results of planning and reacting can serve as a valuable input to learning. M-Dyna-Q extends Dyna-Q in that planning, reacting, and learning are jointly realized by multiple agents.
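As a rough illustration of the single-agent loop the abstract refers to, the sketch below shows standard tabular Dyna-Q: the agent reacts (acts epsilon-greedily), learns (applies a Q-learning update and records the transition in a model), and plans (replays transitions sampled from that model). It is not M-Dyna-Q, whose multiagent details are not given in this abstract. The corridor environment, `corridor_step`, `dyna_q`, and all parameter values are hypothetical choices made for the example.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)  # hypothetical corridor actions: step left or step right
GOAL = 4            # hypothetical goal cell; states are 0..4


def corridor_step(state, action):
    """Toy deterministic environment (an assumption, not from the paper)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def dyna_q(step, start_state, actions, episodes=100, planning_steps=20,
           alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=200, seed=0):
    """Standard tabular Dyna-Q for a single agent."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q[(state, action)], initialised to 0
    model = {}              # (state, action) -> (next_state, reward, done)

    def greedy(state):
        # Pick a highest-valued action, breaking ties at random.
        best = max(q[(state, a)] for a in actions)
        return rng.choice([a for a in actions if q[(state, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup.
        target = r if done else r + gamma * max(q[(s2, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        state = start_state
        for _ in range(max_steps):
            # Reacting: epsilon-greedy action from the current estimates.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = greedy(state)
            next_state, reward, done = step(state, action)
            # Learning: update Q from real experience and record the model.
            update(state, action, reward, next_state, done)
            model[(state, action)] = (next_state, reward, done)
            # Planning: replay transitions sampled from the learned model.
            for _ in range(planning_steps):
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            state = next_state
            if done:
                break
    return q, greedy
```

Calling `dyna_q(corridor_step, 0, ACTIONS)` returns the learned Q-table and a greedy policy function. The planning loop is what separates Dyna-Q from plain Q-learning: each real step is followed by several simulated backups, so value information spreads through the table with fewer real interactions.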

Citations: cited by 19 publications (21 citation statements)
References: 4 publications
“…, namely that the other agent's strategy is convergent, the strategy model can be obtained by learning agent after observing repeatedly, convergence strategies is unknown for learning agent, in this case, the joint probability between agents can ensure search of the whole problem space, which also guarantees convergence of multi-agent Q learning algorithm according formula (5). The action selection through trial an error for the learning agent, at the same time, the statistics and learning of other agents' strategy action in the beginning learning process, with the development of the learning process, the learning agent is familiar with other agents gradually and can establish its effective strategy model with relevant knowledge, the strategies mutation of other agent (may be caused by the unexpected behavior) after many learning is given only small probability of recognition, the large probability events is the main goal of learning, so learning in undetermined Markov environment according to formula(5) is suitable.…”
Section: B Discussion Of the Feasibility Of Algorithm
Citation type: mentioning
confidence: 99%
“…The joint probability of action under the strategy * 1 of learning agent and estimation strategy 1 of other agent is described in the * 1 2 = n i i part in formula (5), which decides the probability distribution of selection action a ´ in the new state s´, it should be noted here, because the motion vector a is composed of multiple-agent decision, the realization of search strategy is also dependent on other agent's behavior for learning agent, further, if other agent's strategy to satisfy:…”
Section: B Discussion Of the Feasibility Of Algorithm
Citation type: mentioning
confidence: 99%
“…The model of Jakkula & Cook is built in a multi-agents [76] fashioned architecture where the agents perceive directly the state of the environment from sensor's output raw data. The temporal part is constructed from Allen's intervals based temporal relations presented in Chapter 2 [27].…”
Section: Jakkula and Cook
Citation type: mentioning
confidence: 99%
“…In our project [4] we develop a multi-agent system aimed at hybrid computational intelligence models represented as a collection of autonomous agents in a multi-agent system [9]. One of the goals was to develop a unifying framework allowing time complexity estimates for agents encompassing computational methods on one hand, and a computer-aided performance analysis of the real agents behavior in a distributed environment on the other.…”
Section: Introduction
Citation type: mentioning
confidence: 99%