We study a general Markov game with metric switching costs: in each round, the player adaptively chooses one of several Markov chains to advance, with the objective of minimizing the expected cost for at least k chains to reach their target states. If the player decides to play a different chain, an additional switching cost is incurred. The special case with no switching cost was solved optimally by Dumitriu, Tetali, and Winkler [DTW03] via a variant of the celebrated Gittins index for the classical multi-armed bandit (MAB) problem with Markovian rewards [Git74, Git79]. However, for MAB with nontrivial switching costs, even when the switching cost is a constant, the classic paper by Banks and Sundaram [BS94] showed that no index strategy can be optimal. In this paper, we complement their result and show that there is a simple index strategy achieving a constant approximation factor when the switching cost is a constant and k = 1. To the best of our knowledge, this is the first index strategy that achieves a constant approximation factor for a general MAB variant with switching costs. For general metrics, we propose a more involved constant-factor approximation algorithm, via a nontrivial reduction to the stochastic k-TSP problem, in which each Markov chain is approximated by a random variable. Our analysis makes extensive use of various interesting properties of the Gittins index.
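The Gittins index mentioned above can be computed for a finite Markov reward chain via the standard "restart-in-state" characterization (Katehakis and Veinott): the index of a state equals (1 - beta) times the value, at that state, of the MDP in which one may at each step either continue from the current state or restart from that state. A minimal sketch under that characterization, with an illustrative `gittins_index` helper and made-up chain parameters (none of this is from the paper, which works in the cost-minimization setting rather than discounted rewards):

```python
import numpy as np

def gittins_index(P, r, beta, state, tol=1e-10, max_iter=100_000):
    """Gittins index of `state` for a finite Markov reward chain with
    transition matrix P, reward vector r, and discount factor beta,
    computed by value iteration on the restart-in-`state` MDP."""
    n = len(r)
    V = np.zeros(n)
    for _ in range(max_iter):
        cont = r + beta * (P @ V)                   # keep playing the chain
        restart = r[state] + beta * (P[state] @ V)  # restart from `state`
        V_new = np.maximum(cont, restart)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return (1.0 - beta) * V[state]

# Toy two-state chain: state 0 pays reward 1, state 1 pays 0,
# and both states move to a uniformly random state.
P = np.array([[0.5, 0.5], [0.5, 0.5]])
r = np.array([1.0, 0.0])
print(gittins_index(P, r, beta=0.9, state=0))  # 1.0: restart keeps reward 1 available
print(gittins_index(P, r, beta=0.9, state=1))  # 0.45: must wait for state 0
```

A simple sanity check on the characterization: if every state pays the same reward c, the restart option is irrelevant and the index of every state is exactly c.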