A decentralized control system with linear dynamics, quadratic cost, and Gaussian disturbances is considered. The system consists of a finite number of subsystems whose dynamics and per-step cost function are coupled through their mean-field (empirical average). The system has a mean-field sharing information structure; that is, each controller observes the state of its local subsystem (either perfectly or with noise) as well as the mean-field. It is shown that the optimal control law is unique, linear, and identical across all subsystems. Moreover, the optimal gains are computed by solving two decoupled Riccati equations in the full observation model, with an additional filter Riccati equation in the noisy observation model. These Riccati equations do not depend on the number of subsystems. It is also shown that the optimal decentralized performance equals the optimal centralized performance. An example, motivated by smart grids, is presented to illustrate the result.
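Decoupled Riccati equations such as those mentioned above can each be solved by a standard backward recursion. The sketch below is purely illustrative: the matrices `A`, `B` and the cost weights are hypothetical scalar placeholders, not the coupled dynamics of the paper; it only shows that each gain sequence is computed independently, with no dependence on the number of subsystems.

```python
import numpy as np

def riccati_backward(A, B, Q, R, T):
    """Finite-horizon backward Riccati recursion.

    Returns the time-forward list of gains K_0, ..., K_{T-1} and the
    final cost-to-go matrix P_0.
    """
    P = Q.copy()
    gains = []
    for _ in range(T):
        S = R + B.T @ P @ B
        K = np.linalg.solve(S, B.T @ P @ A)   # K_t = S^{-1} B' P A
        P = Q + A.T @ P @ (A - B @ K)         # backward Riccati update
        gains.append(K)
    return gains[::-1], P

# Hypothetical scalar data: one Riccati recursion for the mean-field
# component and one for the local deviation, solved independently.
A = np.array([[1.0]])
B = np.array([[1.0]])
Kbar, _ = riccati_backward(A, B, np.array([[1.0]]), np.array([[1.0]]), 50)
Kdev, _ = riccati_backward(A, B, np.array([[2.0]]), np.array([[1.0]]), 50)
```

For these scalar data the recursions converge to the stationary gains of the corresponding algebraic Riccati equations, and neither recursion references the other or the population size.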
This paper studies a large number of homogeneous Markov decision processes whose transition probabilities and costs are coupled through the empirical distribution of states (also called the mean-field). The state of each process is not known to the others, so the information structure is fully decentralized. The objective is to minimize the average cost, defined as the empirical mean of the individual costs, for which a sub-optimal solution is proposed. This solution does not depend on the number of processes, yet it converges to the optimal solution of the so-called mean-field sharing structure as the number of processes tends to infinity. Under mild conditions, it is shown that the convergence rate of the proposed decentralized solution is inversely proportional to the square root of the number of processes. In general, finding this sub-optimal solution involves a non-smooth, non-convex optimization problem over an uncountable set. To overcome this drawback, a combinatorial optimization problem is introduced that achieves the same rate of convergence.
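The 1/√n rate quoted above matches the familiar rate at which an empirical distribution concentrates around its mean. A minimal Monte Carlo sketch (hypothetical two-state processes with an assumed occupancy probability `p`, unrelated to any specific model in the paper) illustrates the scaling:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.3        # assumed probability that a process is in state 1
trials = 500   # independent repetitions used to average the error

def mean_abs_error(n):
    """Average |empirical frequency - p| over independent trials."""
    freqs = rng.binomial(n, p, size=trials) / n
    return np.abs(freqs - p).mean()

e_small = mean_abs_error(100)
e_large = mean_abs_error(10_000)
ratio = e_small / e_large   # expected to be close to sqrt(10000/100) = 10
```

Increasing the population by a factor of 100 shrinks the mean-field estimation error by roughly a factor of 10, consistent with the inverse-square-root rate.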
We investigate team optimal control of stochastic subsystems that are weakly coupled in dynamics (through the mean-field of the system) and arbitrarily coupled in cost. The controller of each subsystem observes its local state and the mean-field of the states of all subsystems. The system has a nonclassical information structure. Exploiting the symmetry of the problem, we identify an information state and use it to obtain a dynamic programming decomposition. This dynamic program determines a globally optimal strategy for all controllers. Our solution approach works for an arbitrary number of controllers and generalizes to the setup in which the mean-field is observed with noise. The size of the information state is time-invariant; thus, the results generalize to infinite-horizon control setups as well. In addition, when the mean-field is observed without noise, the size of the corresponding information state increases polynomially (rather than exponentially) with the number of controllers, which allows us to solve problems with a moderate number of controllers. We illustrate our approach with an example, motivated by smart grids, that consists of 100 coupled subsystems.
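A simple counting argument shows why such an information state can grow polynomially rather than exponentially: when the information state is the empirical distribution of local states, it takes only as many values as there are multisets of size n over the finite local state space, i.e. C(n + |X| - 1, |X| - 1). The sketch below is illustrative; the state-space sizes and controller counts are hypothetical:

```python
from math import comb

def info_state_count(n, num_states):
    """Number of empirical distributions of n agents over a finite state
    space of size num_states: multisets of size n, C(n + |X| - 1, |X| - 1)."""
    return comb(n + num_states - 1, num_states - 1)

# With 100 controllers and binary local states, the empirical distribution
# takes only 101 values, versus 2**100 joint state profiles.
count_binary = info_state_count(100, 2)    # 101
count_ternary = info_state_count(100, 3)   # quadratic in n, still tractable
```

This polynomial growth is what makes problems with a moderate number of controllers computationally feasible.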
Inspired by the concepts of deep learning in artificial intelligence and fairness in behavioural economics, we introduce deep teams in this paper. In such systems, agents are partitioned into a few sub-populations so that the dynamics and cost of the agents in each sub-population are invariant to the indexing of agents. The goal of the agents is to minimize a common cost function in such a manner that the agents in each sub-population are neither discriminated against nor privileged by the way they are indexed. Two non-classical information structures are studied. In the first, each agent observes its local state as well as the empirical distribution of the states of the agents in each sub-population, called the deep state, whereas in the second, the deep states of a subset (possibly all) of the sub-populations are not observed. Novel dynamic programs are developed to identify globally optimal and sub-optimal solutions for the first and second information structures, respectively. The computational complexity of finding the optimal solution is polynomial (rather than exponential) in both space and time with respect to the number of agents in each sub-population, and linear (rather than exponential) with respect to the control horizon. This complexity is further reduced in time by introducing a forward equation, which we call the deep Chapman-Kolmogorov equation, described by multiple convolutional layers of binomial probability distributions. Two different prices are defined for computation and communication, and it is shown that under mild assumptions they converge to zero as the quantization level and the number of agents tend to infinity. In addition, the main results are extended to the infinite-horizon discounted model and to arbitrarily asymmetric cost functions. Finally, a service-management example with 200 users is presented.
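A forward equation of the kind described above propagates a distribution over the deep state in time via convolutions of binomial distributions. The sketch below is a minimal illustrative version for a single sub-population with binary local states; the transition probabilities `p01`, `p11` and the population size are hypothetical, and the code only sketches the convolutional structure, not the paper's exact equation:

```python
import numpy as np
from math import comb

def binom_pmf(m, p):
    """PMF of Binomial(m, p) as an array of length m + 1."""
    return np.array([comb(m, j) * p**j * (1 - p)**(m - j) for j in range(m + 1)])

def deep_ck_step(dist, n, p01, p11):
    """One forward step of the deep-state distribution.

    dist[k] = P(exactly k of the n agents are in state 1). Agents in
    state 1 stay there with prob p11; agents in state 0 switch with prob
    p01. The next count is a sum of two independent binomials, so its
    pmf is their convolution (one convolutional 'layer' per count k).
    """
    out = np.zeros(n + 1)
    for k in range(n + 1):
        if dist[k] == 0.0:
            continue
        out += dist[k] * np.convolve(binom_pmf(k, p11), binom_pmf(n - k, p01))
    return out

# Start 10 agents with exactly 4 in state 1 and take one forward step.
n = 10
dist = np.zeros(n + 1)
dist[4] = 1.0
dist = deep_ck_step(dist, n, p01=0.2, p11=0.9)
```

Each step costs a handful of length-(n+1) convolutions, which is the source of the reduced time complexity compared with enumerating joint state profiles.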