2017
DOI: 10.1609/aaai.v31i1.10708

Collective Multiagent Sequential Decision Making Under Uncertainty

Abstract: Multiagent sequential decision making has seen rapid progress with formal models such as decentralized MDPs and POMDPs. However, scalability to large multiagent systems and applicability to real-world problems remain limited. To address these challenges, we study multiagent planning problems where the collective behavior of a population of agents affects the joint reward and environment dynamics. Our work exploits recent advances in graphical models for modeling and inference with a population of individuals…
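As a rough, illustrative reading of the abstract's count-based idea (not the paper's actual model; the class and variable names below, such as CollectiveModel and n_sa, are hypothetical), the following sketch shows a joint reward and transition step that depend only on the population's state-action counts rather than on individual agent identities:

```python
# Illustrative sketch only: a "collective" step where reward and dynamics
# depend on the state-action count table n(s, a), not on agent identities.
from collections import Counter
import random

class CollectiveModel:
    def __init__(self, states, actions):
        self.states, self.actions = list(states), list(actions)

    def counts(self, joint_state, joint_action):
        """Aggregate a joint state-action into a count table n(s, a)."""
        return Counter(zip(joint_state, joint_action))

    def reward(self, n_sa):
        # Illustrative joint reward: depends only on the counts, e.g. a
        # congestion-style penalty when many agents share a (state, action).
        return -sum(c * c for c in n_sa.values())

    def transition(self, joint_state, joint_action, n_sa):
        # Illustrative dynamics: each agent's next state depends on its own
        # (s, a) and on the population counts, not on who the other agents are.
        n = len(joint_state)
        next_state = []
        for s, a in zip(joint_state, joint_action):
            crowding = n_sa[(s, a)] / n
            stays = random.random() < crowding  # more crowding -> more likely stuck
            next_state.append(s if stays else random.choice(self.states))
        return next_state

model = CollectiveModel(states=range(3), actions=range(2))
joint_s, joint_a = [0, 0, 1, 2], [1, 1, 0, 0]
n_sa = model.counts(joint_s, joint_a)
print(dict(n_sa), model.reward(n_sa), model.transition(joint_s, joint_a, n_sa))
```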

Cited by 28 publications (13 citation statements); references 18 publications (24 reference statements).

Citation statements (ordered by relevance):
“…We maintain a buffer B where each recorded sample ξ involves the tuple ⟨r_ξ, n^{SA}_ξ⟩. Here n^{SA}_ξ is the state-action count table given that the joint state-action was (s, a) (defined in Section 2), and r_ξ is the reward signal provided by the simulator as a result of the joint action a taken in the joint state s. Given the homogeneous agent population, the reward function does not depend on the identities of the agents, as also noted in previous models (Yang et al. 2018; Nguyen, Kumar, and Lau 2017a). Therefore, we learn a function approximator r_w(n^{SA}) that takes the state-action count table as input.…”
Section: Learning System Reward Model
confidence: 94%
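A minimal sketch of the learning step described in the statement above, assuming each count table n^{SA} can be flattened into a fixed-length vector; the buffer layout, network architecture, and names (RewardModel, fit_reward_model) are assumptions for illustration, not the cited paper's actual implementation:

```python
# Sketch: fit r_w(n_SA) to (reward, count-table) samples from a buffer.
# Assumes |S| x |A| is small enough to flatten the count table directly.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, n_states, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_states * n_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, n_sa):
        # n_sa: (batch, |S|, |A|) count tables, flattened before the MLP
        return self.net(n_sa.flatten(start_dim=1)).squeeze(-1)

def fit_reward_model(buffer, n_states, n_actions, epochs=200, lr=1e-3):
    """buffer: list of (r, n_sa) pairs, each n_sa an (|S|, |A|) array of counts."""
    rewards = torch.tensor([r for r, _ in buffer], dtype=torch.float32)
    counts = torch.stack([torch.as_tensor(n, dtype=torch.float32) for _, n in buffer])
    model = RewardModel(n_states, n_actions)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(counts), rewards)
        loss.backward()
        opt.step()
    return model
```

The learned r_w can then be queried with the count tables induced by candidate joint actions, without ever referring to individual agent identities.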
“…In several large-scale multiagent systems, an agent's behavior is mainly influenced by aggregate information about neighboring agents rather than by their identities (Sonu, Chen, and Doshi 2015; Robbel et al. 2016; Subramanian et al. 2020). For example, in taxi fleet optimization, the movement behavior of a taxi agent is primarily influenced by the total demand and the count of other taxis present in city zones (Varakantham, Adulyasak, and Jaillet 2014; Nguyen, Kumar, and Lau 2017a). In air and maritime traffic control, most of the agents can be considered homogeneous (or belonging to a small number of types) (Brittain and Wei 2019; Singh, Kumar, and Lau 2020).…”
Section: Introduction
confidence: 99%
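To make the identity-independence point concrete (the zone and taxi names below are invented for illustration), this small snippet aggregates individual taxi positions into per-zone counts and checks that permuting the taxi identities leaves the aggregate view unchanged:

```python
# Illustrative only: per-zone taxi counts are invariant to agent identities.
from collections import Counter
import random

taxi_zone = {"taxi_a": "z1", "taxi_b": "z1", "taxi_c": "z2", "taxi_d": "z3"}

counts = Counter(taxi_zone.values())  # e.g. {'z1': 2, 'z2': 1, 'z3': 1}

# Reassigning which taxi is which leaves the aggregate view unchanged.
permuted_ids = random.sample(list(taxi_zone), k=len(taxi_zone))
shuffled = dict(zip(permuted_ids, taxi_zone.values()))
assert Counter(shuffled.values()) == counts
```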
“…Our developed knowledge compilation methods are general and applicable in a variety of multiagent models, e.g., in collective Dec-POMDPs where agents are identical to each other (Nguyen, Kumar, and Lau 2017), or the heterogeneous setting where each agent is unique.…”
Section: The Dec-POMDP Model and MAPF
confidence: 99%
“…In cooperative sequential multiagent decision making, agents acting in a partially observable and uncertain environment are required to take coordinated decisions towards a long-term goal (Durfee and Zilberstein 2013). Decentralized partially observable MDPs (Dec-POMDPs) provide a rich framework for multiagent planning (Bernstein et al. 2002; Oliehoek and Amato 2016), and are applicable in domains such as vehicle fleet optimization (Nguyen, Kumar, and Lau 2017), cooperative robotics (Amato et al. 2019), and multiplayer video games (Rashid et al. 2018). However, scalability remains a key challenge: even a 2-agent Dec-POMDP is NEXP-hard to solve optimally (Bernstein et al. 2002).…”
Section: Introduction
confidence: 99%
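For readers less familiar with the formalism referenced above, a Dec-POMDP is commonly written as a tuple ⟨I, S, {A_i}, P, R, {Ω_i}, O, h⟩ (Bernstein et al. 2002; Oliehoek and Amato 2016); the sketch below merely records that structure as a container, with field names and types chosen here for illustration rather than taken from any cited implementation:

```python
# Plain container for the Dec-POMDP tuple <I, S, {A_i}, P, R, {Omega_i}, O, h>.
# Concrete planners would replace the callables with tables or a simulator.
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class DecPOMDP:
    agents: List[str]                       # I: set of agents
    states: List[str]                       # S: global states
    actions: Dict[str, List[str]]           # A_i: local actions per agent
    transition: Callable[[str, Tuple[str, ...]], Dict[str, float]]   # P(s' | s, joint a)
    reward: Callable[[str, Tuple[str, ...]], float]                  # R(s, joint a)
    observations: Dict[str, List[str]]      # Omega_i: local observations per agent
    observe: Callable[[str, Tuple[str, ...]], Dict[Tuple[str, ...], float]]  # O(joint o | s', joint a)
    horizon: int                            # h: planning horizon
```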
“…Nguyen, Kumar, and Lau 2017a; 2017b). Our focus is on agent interactions with complex event-based rewards which depend on the entire state-action histories of multiple agents.…”
Section: Introduction
confidence: 99%