2012
DOI: 10.1007/978-3-642-27216-5_23
Adaptive Multi-robot Team Reconfiguration Using a Policy-Reuse Reinforcement Learning Approach

Cited by 5 publications (3 citation statements)
References 25 publications
“…When δ = 0.25, the number of core-policies is around 14. Interestingly, this is very close to the number of rooms in the domain (15). As δ increases, the number of core-policies grows, and when δ = 1, almost all the learned policies are stored.…”
Section: Learning the Structure Of The Domain
Citation type: supporting
confidence: 53%
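The δ threshold quoted above controls how finely the policy library distinguishes core-policies. Below is a minimal sketch of one plausible admission rule, assuming a simple reuse-gain comparison against δ; the PolicyLibrary class, the gain arguments, and the rule itself are illustrative assumptions, not the criterion used in the cited paper.

```python
# Illustrative sketch: a policy library whose size is governed by a
# threshold delta, in the spirit of Policy Reuse. The admission rule here
# is an assumption for illustration only.

class PolicyLibrary:
    def __init__(self, delta):
        self.delta = delta        # threshold in [0, 1]; larger delta stores more policies
        self.core_policies = []   # stored (policy, gain) pairs

    def maybe_add(self, new_policy, gain_from_scratch, best_reuse_gain):
        """Store new_policy as a core-policy only if reusing the most similar
        stored policy recovers less than delta * gain_from_scratch."""
        if best_reuse_gain < self.delta * gain_from_scratch:
            self.core_policies.append((new_policy, gain_from_scratch))
            return True
        return False
```

Under this rule a larger δ admits new policies more readily, which is consistent with the quoted observation that δ = 1 stores almost every learned policy.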
“…The use of Policy Reuse for transfer learning among different state and action spaces (typically called inter-task transfer [11]), and its evaluation in the Keepaway domain, can be found in the literature [12][13][14]. Variations of Policy Reuse algorithms can also be found for multi-robot reconfiguration [15] and for learning from demonstration, also in the Keepaway domain [16]. In this work we use a grid-based domain that allows us to highlight some properties that are more difficult to represent in other domains.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Bayesian Reinforcement Learning (RL) allows the agents to learn the capabilities of other agents through interactions and transforms repetitive coalition formation problems into sequential decision problems [21]. This approach was validated on the soccer team formation problem [22], used to model dynamic robot formations for the area coverage problem with weighted voting games and Q-learning, and extended to formation-based navigation problems [23,24]. The formation structure is pruned using Shapley values and marginal contributions, and the transition of robots from one formation to another is represented as a Markov process, which searches for the optimal structure in the formation space using a Markov probability distribution.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
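The quoted approach frames reconfiguration as a sequential decision problem. Below is a minimal sketch of tabular Q-learning over formation transitions, treating each formation as a state and each target formation as an action; the function names, the reward signature, and this state/action framing are illustrative assumptions, not the model defined in the cited works.

```python
# Illustrative sketch: tabular Q-learning where states are formations and
# actions are reconfigurations to a target formation.

import random
from collections import defaultdict

def q_learning_formations(formations, transition, reward,
                          episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """formations: list of hashable formation identifiers.
    transition(f, a): formation reached by applying reconfiguration a in f.
    reward(f, a): scalar payoff of that reconfiguration (e.g. coverage gain)."""
    q = defaultdict(float)
    actions = formations  # an "action" is simply the target formation
    for _ in range(episodes):
        f = random.choice(formations)
        for _ in range(20):  # bounded episode length
            if random.random() < epsilon:
                a = random.choice(actions)          # explore
            else:
                a = max(actions, key=lambda a_: q[(f, a_)])  # exploit
            f_next = transition(f, a)
            r = reward(f, a)
            best_next = max(q[(f_next, a_)] for a_ in actions)
            q[(f, a)] += alpha * (r + gamma * best_next - q[(f, a)])
            f = f_next
    return q
```

A greedy policy over the learned Q-table then selects, for the current formation, the reconfiguration with the highest expected return.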