A multiagent variant of Dyna-Q

Weiss, Gera

doi:10.1109/icmas.2000.858525

Cited by 19 publications

(21 citation statements)

References 4 publications

“…, namely that the other agent's strategy is convergent, the strategy model can be obtained by learning agent after observing repeatedly, convergence strategies is unknown for learning agent, in this case, the joint probability between agents can ensure search of the whole problem space, which also guarantees convergence of multi-agent Q learning algorithm according formula (5). The action selection through trial an error for the learning agent, at the same time, the statistics and learning of other agents' strategy action in the beginning learning process, with the development of the learning process, the learning agent is familiar with other agents gradually and can establish its effective strategy model with relevant knowledge, the strategies mutation of other agent (may be caused by the unexpected behavior) after many learning is given only small probability of recognition, the large probability events is the main goal of learning, so learning in undetermined Markov environment according to formula(5) is suitable.…”

Section: B Discussion Of the Feasibility Of Algorithm

mentioning

confidence: 99%

“…The joint probability of action under the strategy $\pi_1^*$ of learning agent and estimation strategy $\hat{\pi}_i$ of other agent is described in the $\pi_1^* \prod_{i=2}^{n} \hat{\pi}_i$ part in formula (5), which decides the probability distribution of selection action $a'$ in the new state $s'$, it should be noted here, because the motion vector a is composed of multiple-agent decision, the realization of search strategy is also dependent on other agent's behavior for learning agent, further, if other agent's strategy to satisfy:…”

Section: B Discussion Of the Feasibility Of Algorithm

mentioning

confidence: 99%

“…Secondly, other agent's strategy should be considered for strategies selection of learning agent in MAS, the changes from current state to next state aren't all decided by actions of learning agent, other agents also choose actions which change system state, the uncertainty of successor function is caused by unknown of other agent's strategy [5] [6]. In most cases, the other agent's behavior is not random, but can be considered as action strategy of probability distribution, which is random behavior that is subject to a certain probability distribution in a certain state.…”

mentioning

confidence: 99%

Study on Statistics Based Q-Learning Algorithm for Multi-agent System

Xie

Huang

2013

2013 Fourth International Conference on Intelligent Systems Design and Engineering Applications

This paper proposes a statistical-learning-based Q-learning algorithm for multi-agent systems. The agent learns other agents' action policies by observing and counting joint actions. A concise but useful hypothesis is adopted to denote the optimal policies of the other agents, and the full joint probability distribution over policies guarantees that the learning agent chooses the optimal action. The algorithm can improve learning speed because it cuts the conventional Q-learning space from exponential to linear. The convergence of the algorithm is proved, and its successful application in RoboCup shows good learning performance.
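The counting idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not reproduce their formula (5) or their reduction of the Q-learning space. The class name `CountingQAgent`, its methods, and the `alpha` and `gamma` parameters are hypothetical. The sketch assumes a single other agent, a tabular Q-function over joint actions, and an empirical frequency estimate of the other agent's policy in each state.

```python
from collections import defaultdict


class CountingQAgent:
    """Hypothetical sketch: tabular Q-learning over joint actions,
    with a count-based estimate of one other agent's policy."""

    def __init__(self, my_actions, other_actions, alpha=0.1, gamma=0.9):
        self.my_actions = list(my_actions)
        self.other_actions = list(other_actions)
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.q = defaultdict(float)     # (state, my_action, other_action) -> value
        self.counts = defaultdict(int)  # (state, other_action) -> times observed

    def other_policy(self, state):
        """Empirical action distribution of the other agent in `state`."""
        total = sum(self.counts[(state, b)] for b in self.other_actions)
        if total == 0:
            # No observations yet: fall back to a uniform guess.
            uniform = 1.0 / len(self.other_actions)
            return {b: uniform for b in self.other_actions}
        return {b: self.counts[(state, b)] / total for b in self.other_actions}

    def expected_value(self, state, a):
        """Value of own action `a`, averaged over the estimated policy."""
        policy = self.other_policy(state)
        return sum(p * self.q[(state, a, b)] for b, p in policy.items())

    def best_action(self, state):
        """Greedy own action under the current estimate."""
        return max(self.my_actions, key=lambda a: self.expected_value(state, a))

    def update(self, state, a, b, reward, next_state):
        """Record the observed joint action (a, b), then apply a Q-learning step."""
        self.counts[(state, b)] += 1
        next_value = max(self.expected_value(next_state, a2) for a2 in self.my_actions)
        target = reward + self.gamma * next_value
        self.q[(state, a, b)] += self.alpha * (target - self.q[(state, a, b)])
```

In use, each environment step would call `update` with the observed joint action and reward, and `best_action` (typically combined with some exploration scheme) to choose the next move.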

“…The model of Jakkula & Cook is built in a multi-agents [76] fashioned architecture where the agents perceive directly the state of the environment from sensor's output raw data. The temporal part is constructed from Allen's intervals based temporal relations presented in Chapter 2 [27].…”

Section: Jakkula and Cook

mentioning

confidence: 99%

Qualitative spatial reasoning for activity recognition using tools of ambient intelligence /

Bouchard

2012

The aging population represents a growing concern of governments due to the extent that it will take in the coming decades and the speed of its evolution. This problem will result in an increasing number of people affected by many diseases associated with aging, such as the various types of dementia, including the sadly famous Alzheimer's disease. People with Alzheimer's must be assisted at all times during their everyday life. Technological assistance inside what is called a smart home could bring an affordable solution to this concern. One of the key issues in smart home assistance is to recognize the ongoing activities of everyday life carried out by the patient in order to be able to provide useful services at an appropriate moment. To do so, we must build a structured knowledge base of activities from which one or many intelligent agents (communicating with each other) would use information extracted from the various sensors to take a decision on what the inhabitant could be currently doing. The best way to build such an algorithm is to exploit constraints of different natures (logical, temporal, etc.) in order to circumscribe a library of activities. Many authors have emphasized the importance of the fundamental spatial aspect in activity recognition. However, only a few works exist, and they are tested in a limited way that does not allow discerning the importance of dealing with space. Important spatial criteria, such as distance between objects, could help to reduce the number of hypotheses. Moreover, many errors can be detected only by using spatial reasoning, such as position problems (inappropriate objects are brought into the activity zone) or object orientation issues (a cup of coffee is upside down when pouring coffee). This thesis provides potential solutions to the problem outlined, which deals with spatial recognition of activities of daily living of a person with Alzheimer's disease.
It proposes to adapt a theory of spatial reasoning, developed by Egenhofer, to a new model for recognition of activities. This new model allows identifying the ongoing activity using only qualitative spatial criteria, and we demonstrate through the text that some activities could not have been identified otherwise. It also allows detection of new abnormalities related to the behavior of an individual in loss of autonomy. Finally, the model has been implemented and validated in carrying out activities in a smart home on the cutting edge of technology. These activities were derived from a clinical study with normal and mild to moderate Alzheimer subjects. The results were analyzed and compared with existing approaches to measure the contribution of this thesis.

“…In our project [4] we develop a multi-agent system aimed at hybrid computational intelligence models represented as a collection of autonomous agents in a multi-agent system [9]. One of the goals was to develop a unifying framework allowing time complexity estimates for agents encompassing computational methods on one hand, and a computer-aided performance analysis of the real agents behavior in a distributed environment on the other.…”

Section: Introduction

mentioning

confidence: 99%