2012
DOI: 10.1007/978-3-642-27216-5_23
Adaptive Multi-robot Team Reconfiguration Using a Policy-Reuse Reinforcement Learning Approach

Cited by 5 publications (3 citation statements)
References 25 publications
“…When δ = 0.25, the number of core-policies is around 14. Interestingly, this is very close to the number of rooms in the domain (15). As δ increases, the number of core-policies grows, and when δ = 1, almost all the learned policies are stored.…”
Section: Learning the Structure Of The Domain
Citation type: supporting
confidence: 53%
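The δ threshold quoted above controls how finely the policy library distinguishes core-policies. Below is a minimal sketch of one plausible admission rule, assuming a simple reuse-gain comparison against δ; the PolicyLibrary class, the gain arguments, and the rule itself are illustrative assumptions, not the criterion used in the cited paper.

```python
# Illustrative sketch: a policy library whose size is governed by a
# threshold delta, in the spirit of Policy Reuse. The admission rule here
# is an assumption for illustration only.

class PolicyLibrary:
    def __init__(self, delta):
        self.delta = delta        # threshold in [0, 1]; larger delta stores more policies
        self.core_policies = []   # stored (policy, gain) pairs

    def maybe_add(self, new_policy, gain_from_scratch, best_reuse_gain):
        """Store new_policy as a core-policy only if reusing the most similar
        stored policy recovers less than delta * gain_from_scratch."""
        if best_reuse_gain < self.delta * gain_from_scratch:
            self.core_policies.append((new_policy, gain_from_scratch))
            return True
        return False
```

Under this rule a larger δ admits new policies more readily, which is consistent with the quoted observation that δ = 1 stores almost every learned policy.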
“…The use of Policy Reuse for transfer learning among different state and action spaces (typically called inter-task transfer [11]), and its evaluation in the Keepaway domain, can be found in the literature [12][13][14]. Variations of Policy Reuse algorithms can also be found for multi-robot reconfiguration [15] and for learning from demonstration, also in the Keepaway domain [16]. In this work we use a grid-based domain that allows us to highlight some properties that are more difficult to represent in other domains.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Bayesian Reinforcement Learning (RL) allows the agents to learn the capabilities of other agents through interactions and transforms repetitive coalition formation problems into sequential decision problems [21]. This approach was validated on the soccer team formation problem [22], used to model dynamic robot formations for the area coverage problem with weighted voting games and Q-learning, and extended to formation-based navigation problems [23,24]. The formation structure is pruned using Shapley values and marginal contributions, and the transition of robots from one formation to another is represented as a Markov process, which searches for the optimal structure in the formation space using a Markov probability distribution.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
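The quoted approach frames reconfiguration as a sequential decision problem. Below is a minimal sketch of tabular Q-learning over formation transitions, treating each formation as a state and each target formation as an action; the function names, the reward signature, and this state/action framing are illustrative assumptions, not the model defined in the cited works.

```python
# Illustrative sketch: tabular Q-learning where states are formations and
# actions are reconfigurations to a target formation.

import random
from collections import defaultdict

def q_learning_formations(formations, transition, reward,
                          episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """formations: list of hashable formation identifiers.
    transition(f, a): formation reached by applying reconfiguration a in f.
    reward(f, a): scalar payoff of that reconfiguration (e.g. coverage gain)."""
    q = defaultdict(float)
    actions = formations  # an "action" is simply the target formation
    for _ in range(episodes):
        f = random.choice(formations)
        for _ in range(20):  # bounded episode length
            if random.random() < epsilon:
                a = random.choice(actions)          # explore
            else:
                a = max(actions, key=lambda a_: q[(f, a_)])  # exploit
            f_next = transition(f, a)
            r = reward(f, a)
            best_next = max(q[(f_next, a_)] for a_ in actions)
            q[(f, a)] += alpha * (r + gamma * best_next - q[(f, a)])
            f = f_next
    return q
```

A greedy policy over the learned Q-table then selects, for the current formation, the reconfiguration with the highest expected return.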