2022
DOI: 10.1109/tnnls.2021.3110281
Deep Reinforcement Learning With Modulated Hebbian Plus Q-Network Architecture

Abstract: In this paper, we consider a subclass of partially observable Markov decision process (POMDP) problems, which we term confounding POMDPs. In these POMDPs, temporal difference (TD)-based RL algorithms struggle, as the TD error cannot be easily derived from observations. We solve these problems using a new bio-inspired neural architecture that combines a modulated Hebbian network (MOHN) with a DQN, which we call the modulated Hebbian plus Q-network architecture (MOHQA). The key idea is to use a Hebbian n…
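The abstract describes pairing a reward-modulated Hebbian network with a DQN. As a rough illustration of the modulated-Hebbian idea, the sketch below scales a Hebbian co-activity term by a reward signal; the function name, the exact update rule, and the learning rate are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def modulated_hebbian_update(w, pre, post, reward, lr=0.01):
    """Scale the Hebbian co-activity term (outer product of post- and
    pre-synaptic activity) by a reward-derived modulatory signal, standing
    in for the TD error that confounding POMDPs make hard to derive from
    observations. Illustrative sketch only."""
    return w + lr * reward * np.outer(post, pre)

w = np.zeros((2, 3))               # 3 input units, 2 output units
pre = np.array([1.0, 0.0, 1.0])    # pre-synaptic activity
post = np.array([0.5, 1.0])        # post-synaptic activity

# A positive reward strengthens connections between co-active units...
w = modulated_hebbian_update(w, pre, post, reward=1.0)
# ...while zero reward leaves the weights unchanged.
w2 = modulated_hebbian_update(w, pre, post, reward=0.0)
```

In the full MOHQA architecture such a modulated network is combined with a standard Q-network head; this fragment only illustrates the plasticity rule in isolation.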


Cited by 14 publications (7 citation statements); References 17 publications.
“…The continuous control environments are the simple 2D navigation, the half-cheetah direction and velocity (Finn et al., 2017) MuJoCo-based (Todorov et al., 2012) environments, and the Meta-World ML1 and ML45 environments (Yu et al., 2020). The discrete-action environment is a graph navigation environment with configurable levels of complexity called the CTgraph (Soltoggio et al., 2021; Ladosz et al., 2021; Ben-Iwhiwhu et al., 2020). The experimental setup focused on investigating the beneficial effect of the proposed neuromodulatory mechanism when augmenting existing meta-RL frameworks (i.e., neuromodulation as a tool complementary to meta-RL rather than competing with it).…”
Section: Results and Analysis
confidence: 99%
“…Importantly, STELLAR integrated 11 innovative components that solve different challenges and requirements for LL. It employed Sliced Cramer Preservation (SCP) (Kolouri et al, 2020), or the sketched version of it (SCP++) (Li et al, 2021), and Complex Synapse Optimizer (Benna and Fusi, 2016) to overcome catastrophic forgetting of old tasks; Self-Preserving World Model (Ketz et al, 2019) and Context-Skill Model (Tutum et al, 2021) for backward transfer to old tasks as well as forward transfer to their variants; Neuromodulated Attention (Zou et al, 2020) for rapid performance recovery when an old task repeats; Modulated Hebbian Network (Ladosz et al, 2022) and Plastic Neuromodulated Network (Ben-Iwhiwhu et al, 2021) for rapid adaptation to new tasks; Reflexive Adaptation (Maguire et al, 2021) and Meta-Learned Instinct Network (Grbic and Risi, 2021) to safely adapt to new tasks; and Probabilistic Program Neurogenesis (Martin and Pilly, 2019) to scale up the learning of new tasks during fielded operation. More details on the precise effect of each of these components are beyond the scope of this paper; however, this case study outlines how the integrated system dynamics demonstrated LL using the proposed metrics, and how these metrics shaped the advancement of the SG-HRL system.…”
Section: System Group HRL - CARLA 5.3.1 System Overview
confidence: 99%
“…[19]. An RL-based architecture proposed by the researchers in [20] may be used to decouple model development and implementation: the CC framework is developed offline and then deployed for online decision-making in real time.…”
Section: DRL Background
confidence: 99%
“…Model of Deep Reinforcement Learning (DRL) for congestion control: MPTCP's route-diversity-induced decrease in controllability is the topic of [19]. An RL-based architecture proposed by the researchers in [20] may be used to decouple model development and implementation: the CC framework is developed offline and then deployed for online decision-making in real time.…”
confidence: 99%