Lipschitz Continuity in Model-based Reinforcement Learning

Asadi, Kavosh; Misra, Dipendra; Littman, Michael L.

doi:10.48550/arxiv.1804.07193

Cited by 11 publications

(17 citation statements)

References 0 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…They only assume that the optimal action-value function is Lipschitz continuous. This assumption is more general than that used in the aforementioned works as it is known that Lipschitz continuity of the reward function and the transition kernel leads to Lipschitz continuity of the optimal action-value function (Asadi et al, 2018). We use the same condition in this present paper.…”

Section: Related Workmentioning

confidence: 99%

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

Touati¹,

Taïga²,

Bellemare³

2020

Preprint

View full text Add to dashboard Cite

Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representation and thus struggle to handle exponentially or infinitely large state-action spaces. In this paper, we consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric that characterizes the proximity between different states and actions. We propose ZOOMRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space by zooming in more promising and frequently visited regions while carefully balancing the exploitationexploration trade-off. We show that ZOOMRL achieves a worst-case regret O(H) where H is the planning horizon, K is the number of episodes and d is the covering dimension of the space with respect to the metric. Moreover, our algorithm enjoys improved metric-dependent guarantees that reflect the geometry of the underlying space. Finally, we show that our algorithm is robust to small misspecification errors.

show abstract

Section: Related Workmentioning

confidence: 99%

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

Touati¹,

Taïga²,

Bellemare³

2020

Preprint

View full text Add to dashboard Cite

show abstract

“…Errors in the world model compound, and cause issues when used for control [3,63]. Amos et al [2], similar to our work, directly optimizes the dynamics model against loss by differentiating through a planning procedure, and Schmidhuber [52] proposes a similar idea of improving the internal model using an RNN, although the RNN world model is initially trained to perform forward prediction.…”

Section: Related Workmentioning

confidence: 94%

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction

Freeman¹,

Metz²,

Ha³

2019

Preprint

View full text Add to dashboard Cite

Much of model-based reinforcement learning involves learning a model of an agent's world, and training an agent to leverage this model to perform a task more efficiently. While these models are demonstrably useful for agents, every naturally occurring model of the world of which we are aware-e.g., a brain-arose as the byproduct of competing evolutionary pressures for survival, not minimization of a supervised forward-predictive loss via gradient descent. That useful models can arise out of the messy and slow optimization process of evolution suggests that forward-predictive modeling can arise as a side-effect of optimization under the right circumstances. Crucially, this optimization process need not explicitly be a forward-predictive loss. In this work, we introduce a modification to traditional reinforcement learning which we call observational dropout, whereby we limit the agents ability to observe the real environment at each timestep. In doing so, we can coerce an agent into learning a world model to fill in the observation gaps during reinforcement learning. We show that the emerged world model, while not explicitly trained to predict the future, can help the agent learn key skills required to perform well in its environment. Videos of our results available at https://learningtopredict.github.io/

show abstract

“…Our approach differs from it in three details: a) we use the absolute value of the value difference instead of the squared difference; b) we use the imaginary value function from the estimated dynamical model to define the loss, which makes the loss purely a function of the estimated model and the policy; c) we show that the iterative algorithm, using the loss function as a building block, can converge to a local maximum, partly by cause of the particular choices made in a) and b). Asadi et al (2018) also study the discrepancy bounds under Lipschitz condition of the MDP.…”

Section: Additional Related Workmentioning

confidence: 99%

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees

Luo,

Xu,

et al. 2018

Preprint

View full text Add to dashboard Cite

Model-based reinforcement learning (RL) is considered to be a promising approach to reduce the sample complexity that hinders model-free RL. However, the theoretical understanding of such methods has been rather limited. This paper introduces a novel algorithmic framework for designing and analyzing model-based RL algorithms with theoretical guarantees. We design a meta-algorithm with a theoretical guarantee of monotone improvement to a local maximum of the expected reward. The meta-algorithm iteratively builds a lower bound of the expected reward based on the estimated dynamical model and sample trajectories, and then maximizes the lower bound jointly over the policy and the model. The framework extends the optimism-in-face-of-uncertainty principle to non-linear dynamical models in a way that requires no explicit uncertainty quantification. Instantiating our framework with simplification gives a variant of model-based RL algorithms Stochastic Lower Bounds Optimization (SLBO). Experiments demonstrate that SLBO achieves stateof-the-art performance when only one million or fewer samples are permitted on a range of continuous control benchmark tasks. 1

show abstract

Lipschitz Continuity in Model-based Reinforcement Learning

Cited by 11 publications

References 0 publications

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces

Learning to Predict Without Looking Ahead: World Models Without Forward Prediction

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees

Contact Info

Product

Resources

About