Teodor V. Marinov scite author profile

Teodor V. Marinov

4 Publications

13 Citation Statements Received

106 Citation Statements Given


Publications


Corralling Stochastic Bandit Algorithms

Arora, Marinov, Mohri

2020

Preprint


We study the problem of corralling stochastic bandit algorithms, that is, combining multiple bandit algorithms designed for a stochastic environment, with the goal of devising a corralling algorithm that performs almost as well as the best base algorithm. We give two general algorithms for this setting, which we show benefit from favorable regret guarantees. We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward, and depends on the gap between the highest reward and other rewards. We also provide lower bounds for this problem that further justify our approach.


Private Stochastic Convex Optimization: Efficient Algorithms for Non-smooth Objectives

Arora, Marinov, Ullah

2020

Preprint


In this paper, we revisit the problem of private stochastic convex optimization. We propose an algorithm, based on noisy mirror descent, which achieves optimal rates up to a logarithmic factor, both in terms of statistical complexity and number of queries to a first-order stochastic oracle. Unlike prior work, we do not require Lipschitz continuity of stochastic gradients to achieve optimal rates. Our algorithm generalizes beyond the Euclidean setting and yields anytime utility and privacy guarantees.


Dimension Independent Generalization of DP-SGD for Overparameterized Smooth Convex Optimization

Ma, Marinov, Zhang

2022

Preprint


This paper considers the generalization performance of differentially private convex learning. We demonstrate that the convergence analysis of Langevin algorithms can be used to obtain new generalization bounds with differential privacy guarantees for DP-SGD. More specifically, by using some recently obtained dimension-independent convergence results for stochastic Langevin algorithms with convex objective functions, we obtain $O(n^{-1/4})$ privacy guarantees for DP-SGD with the optimal excess generalization error of $\tilde{O}(n^{-1/2})$ for certain classes of overparameterized smooth convex optimization problems. This improves on previous DP-SGD results for such problems, which contain explicit dimension dependencies that make the resulting generalization bounds unsuitable for overparameterized models used in practical applications.


Beyond Value-Function Gaps: Improved Instance-Dependent Regret Bounds for Episodic Reinforcement Learning

Dann, Marinov, Mohri, et al.

2021

Preprint


We provide improved gap-dependent regret bounds for reinforcement learning in finite episodic Markov decision processes. Compared to prior work, our bounds depend on alternative definitions of gaps. These definitions are based on the insight that, in order to achieve a favorable regret, an algorithm does not need to learn how to behave optimally in states that are not reached by an optimal policy. We prove tighter upper regret bounds for optimistic algorithms and accompany them with new information-theoretic lower bounds for a large class of MDPs. Our results show that optimistic algorithms cannot achieve the information-theoretic lower bounds even in deterministic MDPs unless there is a unique optimal policy.

Recently, however, some significant progress has been achieved towards deriving more optimistic problem-dependent guarantees. This includes more refined regret bounds for the tabular episodic setting that depend on structural properties of the specific MDP considered [29,24,20,12,16]. Motivated by instance-dependent analyses in multi-armed bandits [23], these analyses derive gap-dependent regret bounds of the form $O\left(\sum_{(s,a) \in S \times A} \frac{H \log(K)}{\mathrm{gap}(s,a)}\right)$, where the sum is over state-action pairs $(s, a)$ and where the gap notion is defined as the difference between the optimal value function $V^*$ of the Bellman optimal policy $\pi^*$ and the Q-function of $\pi^*$ at a sub-optimal action: $\mathrm{gap}(s, a) = V^*(s) - Q^*(s, a)$. We will refer to this gap definition as the value-function gap in the following.

