Chen-Yu Wei scite author profile

In this work, we develop linear bandit algorithms that automatically adapt to different environments. By plugging a novel loss estimator into the optimization problem that characterizes the instance-optimal strategy, our first algorithm not only achieves nearly instance-optimal regret in stochastic environments, but also works in corrupted environments with additional regret being the amount of corruption, while the state-of-the-art (Li et al., 2019) achieves neither instance-optimality nor the optimal dependence on the corruption amount. Moreover, by equipping this algorithm with an adversarial component and carefully designed tests, our second algorithm additionally enjoys minimax-optimal regret in completely adversarial environments, which is the first of its kind to our knowledge. Finally, all our guarantees hold with high probability, while existing instance-optimal guarantees only hold in expectation.


Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

Wei, Jafarnia-Jahromi, Luo, et al. 2020. Preprint.

We develop several new algorithms for learning Markov Decision Processes in an infinite-horizon average-reward setting with linear function approximation. Using the optimism principle and assuming that the MDP has a linear structure, we first propose a computationally inefficient algorithm with optimal O(√T) regret and another computationally efficient variant with O(T^(3/4)) regret, where T is the number of interactions. Next, taking inspiration from adversarial linear bandits, we develop yet another efficient algorithm with O(√T) regret under a different set of assumptions, improving the best existing result by Hao et al. [16] with O(T^(2/3)) regret. Moreover, we draw a connection between this algorithm and the Natural Policy Gradient algorithm proposed by Kakade [22], and show that our analysis improves the sample complexity bound recently given by Agarwal et al. [4].


Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games

Wei, Lee, Zhang, et al. 2021. Preprint.

We study infinite-horizon discounted two-player zero-sum Markov games, and develop a decentralized algorithm that provably converges to the set of Nash equilibria under self-play. Our algorithm is based on running an Optimistic Gradient Descent Ascent algorithm on each state to learn the policies, with a critic that slowly learns the value of each state. To the best of our knowledge, this is the first algorithm in this setting that is simultaneously rational (converging to the opponent's best response when it uses a stationary policy), convergent (converging to the set of Nash equilibria under self-play), agnostic (no need to know the actions played by the opponent), symmetric (players taking symmetric roles in the algorithm), and enjoying a finite-time last-iterate convergence guarantee, all of which are desirable properties of decentralized algorithms.


Federated Residual Learning

Agarwal, Langford, Wei. 2020. Preprint.

Irreversible Adaptive Allocation Rules

Hu, Wei. 1989. Ann. Statist.

Online Reinforcement Learning in Stochastic Games

Wei, Hong. 2017. Preprint.

We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the UCSG algorithm that achieves a sublinear regret compared to the game value when competing with an arbitrary opponent. This result improves previous ones under the same setting. The regret bound has a dependency on the diameter, which is an intrinsic value related to the mixing property of SGs. If we let the opponent play an optimistic best response to the learner, UCSG finds an ε-maximin stationary policy with a sample complexity of Õ(poly(1/ε)), where ε is the gap to the best policy.


Acute psychosis induced by mRNA-based COVID-19 vaccine in adolescents: A pediatric case report

Lien, Wei, Liang. 2023. Pediatrics & Neonatology.

Refined Regret for Adversarial MDPs with Linear Function Approximation

Yan, Luo, Wei, et al. 2023. Preprint.

We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over K episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in some known features, that is, a linear function approximation exists. The best existing regret upper bound for this setting (Luo et al., 2021b) is of order O(K^(2/3)) (omitting all other dependencies), given access to a simulator. This paper provides two algorithms that improve the regret to O(√K) in the same setting. Our first algorithm makes use of a refined analysis of the Follow-the-Regularized-Leader (FTRL) algorithm with the log-barrier regularizer. This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest. Our second algorithm develops a magnitude-reduced loss estimator, further removing the polynomial dependency on the number of actions in the first algorithm and leading to the optimal regret bound (up to logarithmic terms and dependency on the horizon). Moreover, we also extend the first algorithm to simulator-free linear MDPs, which achieves O(K^(8/9)) regret and greatly improves over the best existing bound O(K^(14/15)). This algorithm relies on a better alternative to the Matrix Geometric Resampling procedure by Neu and Olkhovskaya (2020), which could again be of independent interest.




Achieving Near Instance-Optimality and Minimax-Optimality in Stochastic and Adversarial Linear Bandits Simultaneously
