2020
DOI: 10.48550/arxiv.2010.08843
Preprint

Approximate information state for approximate planning and reinforcement learning in partially observed systems


Cited by 4 publications (7 citation statements) · References 0 publications
“…Many other IPMs have been considered in the literature, including the Kolmogorov distance, the bounded Lipschitz metric, and maximum mean discrepancy. See, for example, Subramanian et al. (2020). The choice of the metric often depends on the specific properties of the model.…”
Section: Approximate Game
confidence: 99%
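For readers of the quoted passage: an integral probability metric (IPM) compares two probability measures through a class of test functions. The standard definition (a well-known formula, not taken from this page) is

d_{\mathfrak{F}}(\mu, \nu) = \sup_{f \in \mathfrak{F}} \left| \int f \, d\mu - \int f \, d\nu \right|,

and the metrics named in the quote correspond to different choices of \mathfrak{F}: indicators of half-lines give the Kolmogorov distance, the unit ball of bounded Lipschitz functions gives the bounded Lipschitz metric, and the unit ball of an RKHS gives maximum mean discrepancy.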
“…For now we note that here the conditions we set for the action compression from Ω(Γ_t) to Ω(Λ_t) are on the private states instead of defining an encapsulation directly on the actions (i.e., prescriptions); moreover, the compression may depend on the common state h_t^0 as well. Hence, this falls outside of the action compression scheme studied in Subramanian et al. [2020]. We bound the error between the value functions obtained from Algorithm 2 and the optimal value functions obtained from Algorithm 1 in the following theorem, proved in Section 4.2.…”
Section: Compressing Private States
confidence: 99%
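To make the quoted distinction concrete, here is a minimal Python sketch (all names hypothetical, not from the cited paper) of how a compression map on private states, possibly depending on the common state h_t^0, induces a prescription on the original private states from one defined on the compressed space:

```python
from typing import Callable, Hashable

# Hypothetical type aliases for illustration only.
PrivateState = Hashable
CompressedState = Hashable
CommonState = Hashable
Action = Hashable

def lift_prescription(
    gamma_hat: Callable[[CompressedState], Action],  # prescription on compressed private states
    phi: Callable[[PrivateState, CommonState], CompressedState],  # private-state compression map
    h0_t: CommonState,  # common state the compression may depend on
) -> Callable[[PrivateState], Action]:
    """Route each private state through the compression, then apply
    the compressed-space prescription."""
    def gamma(s: PrivateState) -> Action:
        return gamma_hat(phi(s, h0_t))
    return gamma
```

This mirrors the quoted point: the conditions are placed on the private-state compression phi rather than on the prescriptions themselves.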
“…Kara and Yuksel [2020] consider a special type of AIS, the N-memory, which contains the information from the last N steps. Here, the compression function is fixed, but in contrast to Subramanian et al. [2020], the approximation error given each history need not be uniform. When the model is known, they provide conditions that bound the regret of N-memory policies (policies that depend on the N-memory), and an algorithm that finds optimal policies within this class.…”
Section: A Supplementary Details
confidence: 99%
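As an illustration of the N-memory idea described in the quote, here is a minimal sketch (hypothetical class, not the authors' code) of a fixed-length window over the last N observation-action pairs:

```python
from collections import deque

class NMemory:
    """Fixed-size window over the last N (observation, action) pairs,
    used as a compressed state in place of the full history."""

    def __init__(self, n: int):
        self.n = n
        self._buffer = deque(maxlen=n)

    def update(self, observation, action) -> None:
        # The oldest entry is dropped automatically once the window is full.
        self._buffer.append((observation, action))

    def state(self) -> tuple:
        # Left-pad with (None, None) so the state has fixed length n
        # even before N steps have elapsed.
        pad = [(None, None)] * (self.n - len(self._buffer))
        return tuple(pad + list(self._buffer))
```

An N-memory policy is then any map from this fixed-length tuple to actions; the regret bounds mentioned in the quote concern how much is lost by restricting attention to this class.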