2019
DOI: 10.48550/arxiv.1905.03030
Preprint

Meta-learning of Sequential Strategies

Abstract: In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently…
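A minimal sketch of the training template the abstract describes: sample a task from the target class, roll out a sequence of observations, and train a memory-based model to minimize the log loss of its next-observation predictions. The task class (biased coins), the GRU architecture, and all hyperparameters below are our illustrative assumptions, not the paper's implementation.

    import torch
    import torch.nn as nn

    # Memory-based predictor: a GRU reads the observation history and emits a
    # logit for the next binary observation at every step.
    class MemoryPredictor(nn.Module):
        def __init__(self, hidden=32):
            super().__init__()
            self.gru = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):          # x: (batch, T, 1) of 0/1 observations
            h, _ = self.gru(x)
            return self.head(h)        # per-step logits for the next observation

    model = MemoryPredictor()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    T, batch = 20, 64
    for step in range(2000):
        theta = torch.rand(batch, 1, 1)                  # one task (coin bias) per sequence
        seq = (torch.rand(batch, T, 1) < theta).float()  # observations from that task
        # Shift right so the model predicts x_t from the history x_{<t} only.
        inputs = torch.cat([torch.zeros(batch, 1, 1), seq[:, :-1]], dim=1)
        loss = nn.functional.binary_cross_entropy_with_logits(model(inputs), seq)
        opt.zero_grad()
        loss.backward()
        opt.step()

After training, the model's per-step predictions approximate the Bayes-optimal posterior predictive for the task class (here, the Laplace rule (k+1)/(n+2) under a uniform prior), which is the sense in which the meta-learned agent behaves as if it had a probabilistic model.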

Cited by 10 publications (10 citation statements)
References 20 publications (22 reference statements)
“…One study by Korovina et al., who propose a method described in the next paragraph, highlights several existing methods that all require ≥ 5 thousand evaluations for a single task, compared to their 100. Based on the machine learning community's broader interest in improving the sample efficiency of reinforcement learning algorithms [34]…”
mentioning
confidence: 99%
“…Perhaps surprisingly, perplexity-based meta-learning of history-dependent LLMs is closely related to the explicit Bayesian mixture solution described in Equation 4. In particular, one can show that in many standard meta-learning setups, the optimal perplexity-minimizing solution is exactly a Bayesian mixture distribution (Ortega et al., 2019). Provided that a sufficiently powerful history-dependent model is used to model the interaction histories (as is the case with LLMs based on Transformers), a low-perplexity solution can be seen as a learnt approximation to the explicit Bayesian construction we provided in Equation 4.…”
Section: Discussion
mentioning
confidence: 99%
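The equivalence this quote invokes can be checked numerically in a toy setting. A Monte Carlo sketch (our construction, not from either paper): for Bernoulli tasks with bias drawn uniformly at random, the log-loss-minimizing predictor given a history with k ones out of n observations is the Bayesian posterior predictive, i.e. the Laplace rule (k+1)/(n+2).

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 5, 2                                      # condition on histories with 2 ones in 5 flips
    thetas = rng.uniform(size=2_000_000)             # one task (coin bias) per sample
    flips = rng.uniform(size=(2_000_000, n + 1)) < thetas[:, None]
    match = flips[:, :n].sum(axis=1) == k            # histories with exactly k ones
    mc_estimate = flips[match, n].mean()             # empirical P(next = 1 | history)
    print(mc_estimate, (k + 1) / (n + 2))            # both are roughly 0.4286

The empirical next-observation frequency, conditioned on the history, matches the Bayesian mixture's predictive, which is why a sufficiently powerful perplexity (log-loss) minimizer ends up approximating the explicit Bayesian construction.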
“…Both the fixed payoff and the mean 𝜇 of the risky arm were drawn from a standard Gaussian distribution at the beginning of an episode, which lasted twenty rounds. To build agents that can trade off exploration versus exploitation, we used memory-based meta-learning [Santoro et al., 2016, Wang et al., 2016], which is known to produce near-optimal bandit players [Mikulik et al., 2020, Ortega et al., 2019].…”
Section: Bandits
mentioning
confidence: 99%
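The quoted bandit setup is simple enough to sketch directly. The environment below follows the description in the quote (fixed payoff and risky mean drawn from a standard Gaussian at episode start, twenty rounds per episode); the function names and the placeholder policy are ours, and a trained memory-based agent would replace the placeholder by feeding the (action, reward) history into its recurrent state.

    import numpy as np

    def play_episode(policy, rng, rounds=20):
        fixed_payoff = rng.standard_normal()   # payoff of the deterministic arm
        risky_mean = rng.standard_normal()     # unknown mean of the risky arm
        history, total = [], 0.0
        for t in range(rounds):
            arm = policy(history)              # 0 = fixed arm, 1 = risky arm
            reward = fixed_payoff if arm == 0 else risky_mean + rng.standard_normal()
            history.append((arm, reward))
            total += reward
        return total

    rng = np.random.default_rng(0)
    random_policy = lambda history: int(rng.integers(2))   # placeholder policy
    print(np.mean([play_episode(random_policy, rng) for _ in range(1_000)]))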