Encyclopedia of Machine Learning 2011
DOI: 10.1007/978-0-387-30164-8_817

Temporal Difference Learning

Cited by 8 publications (7 citation statements)
References 8 publications
“…It chooses the best action in each cycle except for a fixed fraction of the time when it tries a random action. We have implemented the Q(λ) algorithm (Watkins, 1989), which subsumes the simpler Q(0) algorithm as a special case, and also HLQ(λ), which is similar except that it automatically adapts its learning rate (Hutter and Legg, 2007). Finally, we have created a wrapper for MC-AIXI (Veness et al., 2010, 2011), a more advanced reinforcement learning agent that can be viewed as an approximation to Hutter's AIXI.…”
Section: Results
confidence: 99%
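The excerpt describes ε-greedy exploration combined with Watkins's Q(λ). Below is a minimal tabular sketch of that combination; the environment interface (env.reset(), env.step(), env.actions), the helper epsilon_greedy, and all parameter values are illustrative assumptions, not taken from the cited work.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_lambda(env, episodes=500, alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
    """Tabular Watkins's Q(lambda) with epsilon-greedy exploration (sketch).

    Assumes env.reset() -> state, env.step(action) -> (next_state, reward, done),
    and a list of discrete actions in env.actions (hypothetical interface).
    """
    Q = defaultdict(float)              # Q[(state, action)] -> action-value estimate
    for _ in range(episodes):
        e = defaultdict(float)          # eligibility traces
        s = env.reset()
        a = epsilon_greedy(Q, s, env.actions, epsilon)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a_greedy = max(env.actions, key=lambda b: Q[(s2, b)])
            a2 = epsilon_greedy(Q, s2, env.actions, epsilon)
            target = r if done else r + gamma * Q[(s2, a_greedy)]
            delta = target - Q[(s, a)]  # TD error toward the greedy backup
            e[(s, a)] += 1.0            # accumulating trace
            for key in list(e):
                Q[key] += alpha * delta * e[key]
                e[key] *= gamma * lam
            if a2 != a_greedy:          # Watkins's cut: exploratory action resets traces
                e.clear()
            s, a = s2, a2
    return Q
```

Setting lam=0 makes every trace vanish right after its own update, recovering one-step Q-learning, which matches the excerpt's remark that Q(λ) subsumes Q(0) as a special case.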
“…By taking an action and moving from one state to another, based on the Bellman equation and Bellman update scheme [51], the value function is gradually updated using sample transitions. This procedure is referred to as the Temporal Difference (TD) update [51]. There are two approaches to updating the policy: “on-policy learning” and “off-policy learning”.…”
Section: Problem Formulation
confidence: 99%
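The quoted passage summarizes how the value function is improved from sampled transitions via a Bellman-style bootstrap, i.e. the TD update. A minimal tabular TD(0) sketch is below; the transition format and the state names in the usage lines are illustrative assumptions.

```python
from collections import defaultdict

def td0_update(V, transition, alpha=0.1, gamma=0.99):
    """Apply one tabular TD(0) update from a single sampled transition.

    V maps state -> value estimate; `transition` is assumed to be a
    (state, reward, next_state, done) tuple (illustrative format).
    """
    s, r, s2, done = transition
    target = r if done else r + gamma * V[s2]   # Bellman-style bootstrap target
    td_error = target - V[s]                    # the temporal difference
    V[s] += alpha * td_error
    return td_error

# Usage with hypothetical states "A" and "B":
V = defaultdict(float)
td0_update(V, ("A", 1.0, "B", False))
```

Whether the bootstrap follows the behaviour policy's own next action (as in SARSA) or the greedy action (as in Q-learning) is what separates the on-policy and off-policy variants mentioned in the excerpt.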
“…None of the existing methods for step-size adaptation in TD learning satisfy both of our criteria while also performing well in practice. HL(λ) (Hutter et al., 2007) and AlphaBound (Dabney and Barto, 2012) have a single step-size, which only decreases in value. RMSprop (Tieleman and Hinton, 2012) satisfies our criteria and can be trivially generalized to TD; however, it does not perform well in TD learning, as we demonstrate in this paper.…”
Section: Step-Size Adaptation in Temporal-Difference Learning
confidence: 99%
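The excerpt contrasts single, monotonically decreasing step-sizes with per-weight adaptive schemes such as RMSprop carried over to TD. The sketch below shows one way an RMSprop-style running average of squared updates could scale a linear semi-gradient TD(0) step; it is an illustration under assumed array shapes and hyperparameters, not the adaptation method evaluated in the cited paper.

```python
import numpy as np

def rmsprop_td0_step(w, v, phi_s, phi_s2, reward, done,
                     lr=1e-3, gamma=0.99, decay=0.9, eps=1e-8):
    """One linear TD(0) update with an RMSprop-style per-weight step size (sketch).

    w: value weights; v: running average of squared updates;
    phi_s, phi_s2: feature vectors of the current and next state
    (all assumed to be 1-D NumPy arrays of equal length).
    """
    target = reward if done else reward + gamma * np.dot(w, phi_s2)
    delta = target - np.dot(w, phi_s)          # TD error
    grad = delta * phi_s                       # semi-gradient update direction
    v = decay * v + (1.0 - decay) * grad ** 2  # RMSprop running average
    w = w + lr * grad / (np.sqrt(v) + eps)     # per-weight effective step size
    return w, v, delta
```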