Roy Fox scite author profile

Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. The bias results from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the value estimates by penalizing deterministic policies early in the learning process. We show that this method reduces the bias of the value-function estimation, leading to faster convergence to the optimal value and the optimal policy. Moreover, G-learning naturally incorporates prior domain knowledge, when available. The stochastic nature of G-learning also allows it to avoid some exploration costs, a property usually attributed only to on-policy algorithms. We illustrate these ideas in several examples, where G-learning yields significant improvements in the convergence rate and the cost of the learning process.
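The source of the bias can be illustrated with a small simulation. When every action has the same true value, the maximum over noisy estimates is still positive on average, whereas a soft (log-sum-exp) aggregate that does not commit to a single action stays closer to the truth when its inverse temperature is small. This is a minimal sketch under stated assumptions, not the G-learning update from the paper: it uses rewards rather than costs, and the names `hard_max`, `soft_max`, `average_bias`, the uniform action prior, the Gaussian noise model, and the `beta` values are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def hard_max(estimates):
    # Greedy backup as in Q-learning: take the apparent optimum.
    return max(estimates)


def soft_max(estimates, beta):
    # Hypothetical soft backup under an assumed uniform prior over actions:
    # (1/beta) * log(mean_a exp(beta * Q(a))).
    # Small beta approaches the mean; large beta approaches the hard max.
    m = max(estimates)  # subtract the max for numerical stability
    total = sum(math.exp(beta * (q - m)) for q in estimates)
    return m + math.log(total / len(estimates)) / beta


def average_bias(backup, n_actions=10, noise=1.0, trials=20000, seed=0):
    # Assumption: every action has true value 0, so any nonzero average
    # of the backed-up value is pure estimation bias.
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        estimates = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        values.append(backup(estimates))
    return statistics.fmean(values)


if __name__ == "__main__":
    print("hard max bias:", round(average_bias(hard_max), 3))
    for beta in (0.1, 1.0, 10.0):
        bias = average_bias(lambda q: soft_max(q, beta))
        print(f"soft max bias (beta={beta}):", round(bias, 3))
```

Increasing `beta` moves the soft backup toward the hard maximum, which loosely mirrors the idea of penalizing deterministic policies early and relaxing that penalty as the estimates become more reliable.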


Chronic Liver Disease and Primary Liver-Cell Cancer With Hepatitis-Associated (Australia) Antigen in Serum

Sherlock

Niazi,

Fox

et al. 1970

The Lancet

246



Roy Fox

Essential role for the p110δ phosphoinositide 3-kinase in the allergic response

The Use of Goal Attainment Scaling in a Geriatric Care Setting

Cellular Immunity and Hepatitis-Associated, Australia Antigen Liver Disease

Taming the Noise in Reinforcement Learning via Soft Updates

Chronic Liver Disease and Primary Liver-Cell Cancer With Hepatitis-Associated (Australia) Antigen in Serum
