We study reinforcement learning with linear function approximation and adversarially changing cost functions, a setup that has mostly been considered under simplifying assumptions such as full-information feedback or exploratory conditions. We present a computationally efficient policy optimization algorithm for the challenging general setting of unknown dynamics and bandit feedback, featuring a combination of mirror descent and least-squares policy evaluation in an auxiliary MDP used to compute exploration bonuses. Our algorithm obtains an O(K^{6/7}) regret bound, improving significantly over the previous state of the art of O(K^{14/15}) in this setting. In addition, we present a version of the same algorithm under the assumption that a simulator of the environment is available to the learner (but otherwise no exploratory assumptions are made), and prove that it obtains a state-of-the-art regret of O(K^{2/3}).
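To make the policy-optimization step concrete, below is a minimal toy sketch of an entropy-regularized mirror-descent (exponential-weights) policy update with optimistic exploration bonuses. It is not the paper's algorithm: the linear least-squares evaluation and the auxiliary bonus MDP are replaced by hypothetical placeholder estimates (q_hat, bonus), and the state space is tabular for readability.

```python
import numpy as np

def mirror_descent_step(policy, q_hat, bonus, eta):
    """One entropy-regularized mirror-descent (exponential weights) update.

    policy : (S, A) array of action probabilities per state
    q_hat  : (S, A) estimated Q-values (cost-to-go) of the current policy
    bonus  : (S, A) optimistic exploration bonuses
    eta    : learning rate
    """
    # Optimistic update: subtract the bonus so high-uncertainty actions
    # look cheaper and get explored.
    logits = np.log(policy) - eta * (q_hat - bonus)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Toy usage: 3 states, 2 actions, uniform initial policy; the estimates
# below are random stand-ins for the paper's policy-evaluation outputs.
S, A, eta = 3, 2, 0.1
policy = np.full((S, A), 1.0 / A)
rng = np.random.default_rng(0)
for episode in range(5):
    q_hat = rng.random((S, A))        # stand-in for least-squares evaluation
    bonus = 0.1 * rng.random((S, A))  # stand-in for the auxiliary-MDP bonus
    policy = mirror_descent_step(policy, q_hat, bonus, eta)
```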
An abundance of recent impossibility results establishes that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for swap regret, and thus, along the way, imply convergence to a correlated equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.
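The following is a minimal sketch of the weighted regret minimization primitive referred to above, instantiated with exponential weights over a finite action set. The per-round weights w_t (standing in for quantities tied to the agents' policy path length) and the loss vectors are synthetic placeholders, and the sketch tracks external rather than swap regret, purely to illustrate the reduction.

```python
import numpy as np

def weighted_exp_weights(losses, weights, eta):
    """Run exponential weights on the weighted losses w_t * loss_t.

    losses  : (T, A) loss of each of A actions at each round
    weights : (T,) per-round importance weights
    eta     : learning rate
    Returns the (T, A) sequence of distributions played.
    """
    T, A = losses.shape
    cum = np.zeros(A)  # cumulative weighted losses
    plays = []
    for t in range(T):
        p = np.exp(-eta * (cum - cum.min()))  # stabilized softmax
        p /= p.sum()
        plays.append(p)
        cum += weights[t] * losses[t]  # weighted loss accumulation
    return np.array(plays)

# Toy usage with hypothetical decaying weights.
rng = np.random.default_rng(1)
T, A = 100, 4
losses = rng.random((T, A))
weights = 1.0 / np.sqrt(np.arange(1, T + 1))
plays = weighted_exp_weights(losses, weights, eta=0.3)
```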
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data. We consider the fundamental stochastic convex optimization framework, where (one-pass, without-replacement) SGD is classically known to minimize the population risk at rate O(1/√n), and prove that, surprisingly, there exist problem instances where the SGD solution exhibits both empirical risk and generalization gap of Ω(1). Consequently, it turns out that SGD is not algorithmically stable in any sense, and its generalization ability cannot be explained by uniform convergence or, for that matter, by any other currently known generalization bound technique (other than that of its classical analysis). We then move on to analyze the closely related with-replacement SGD, for which we show that an analogous phenomenon does not occur, and prove that its population risk does in fact converge at the optimal rate. Finally, we interpret our main results in the context of without-replacement SGD for finite-sum convex optimization problems, and derive upper and lower bounds for the multi-epoch regime that significantly improve upon previously known results.
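For readers less familiar with the distinction, here is a minimal sketch contrasting the two sampling schemes on a simple convex least-squares problem. It only illustrates the procedural difference between one-pass without-replacement and with-replacement SGD; it does not reproduce the paper's lower-bound construction, and the data and step size are arbitrary.

```python
import numpy as np

def sgd(X, y, with_replacement, eta=0.1, seed=0):
    """One pass of SGD over n samples, in either sampling scheme."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    # Without replacement: a random permutation of the data (one pass).
    # With replacement: n i.i.d. uniform draws from the data.
    order = rng.integers(0, n, size=n) if with_replacement else rng.permutation(n)
    for i in order:
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5*(x_i . w - y_i)^2
        w -= eta * grad
    return w

# Toy usage on synthetic linear-regression data.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
w_wo = sgd(X, y, with_replacement=False)  # one-pass, without replacement
w_wr = sgd(X, y, with_replacement=True)   # i.i.d., with replacement
```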
We study a variant of online convex optimization where the player is permitted to switch decisions at most S times in expectation throughout T rounds. Similar problems have been addressed in prior work for discrete decision sets, and more recently in the continuous setting, but only against an adaptive adversary. In this work, we aim to fill the gap and present computationally efficient algorithms in the more prevalent oblivious setting, establishing a regret bound of O(T/S) for general convex losses and O(T/S^2) for strongly convex losses. In addition, for stochastic i.i.d. losses, we present a simple algorithm that performs log T switches with only a multiplicative log T factor overhead in its regret, in both the general and strongly convex settings. Finally, we complement our algorithms with lower bounds that match our upper bounds in some of the cases we consider.
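A standard way to respect a switching budget, sketched below, is blocking: split the T rounds into S blocks and change the decision only at block boundaries, feeding the base learner the aggregated block losses. This is a generic illustration under assumed synthetic losses, not the paper's algorithm; the base learner here is plain online gradient descent over the probability simplex.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def blocked_ogd(loss_grads, S, eta=0.1):
    """Play OGD but change the decision at most S times over T rounds.

    loss_grads : (T, d) gradient of the loss at the played point, per round
    S          : switching budget (number of blocks)
    """
    T, d = loss_grads.shape
    block = int(np.ceil(T / S))
    x = np.full(d, 1.0 / d)  # initial decision: uniform
    played, g_acc = [], np.zeros(d)
    for t in range(T):
        played.append(x.copy())
        g_acc += loss_grads[t]  # accumulate gradients within the block
        if (t + 1) % block == 0:  # switch only at block boundaries
            x = project_simplex(x - eta * g_acc)
            g_acc = np.zeros(d)
    return np.array(played)

# Toy usage: 120 rounds, 3-dimensional simplex, budget of 6 switches.
rng = np.random.default_rng(7)
grads = rng.standard_normal((120, 3))
plays = blocked_ogd(grads, S=6)
```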