2020
DOI: 10.1609/aaai.v34i04.6003
On the Role of Weight Sharing During Deep Option Learning

Abstract: The options framework is a popular approach for building temporally extended actions in reinforcement learning. In particular, the option-critic architecture provides general purpose policy gradient theorems for learning actions from scratch that are extended in time. However, past work makes the key assumption that each of the components of option-critic has independent parameters. In this work we note that while this key assumption of the policy gradient theorems of option-critic holds in the tabular case, i…

Cited by 10 publications (13 citation statements) · References 12 publications
“…the agent reaches the blue zone, it obtains a reward of +20 as opposed to a reward of +10 at the red-green junction. In Figure 1, we plot the rewards obtained per cycle for both the AR-RL agent and a DR-RL agent, and show that the hierarchical AR policy gradient performs better than its DR counterpart proposed by Riemer et al. (2020). Finally, we illustrate the asymptotic convergence of the actor and critic parameters in Figure 2.…”
Section: Results
Confidence: 99%
“…Finally, we look at the susceptibility of our framework to traps, and compare it to the DR setting proposed by Riemer et al. (2020). Figure 3(b) depicts a grid world environment characterized by sparse rewards.…”
Section: Results
Confidence: 99%