2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI)
DOI: 10.1109/ictai.2018.00015

Using State Predictions for Value Regularization in Curiosity Driven Deep Reinforcement Learning

Abstract: Learning in sparse reward settings remains a challenge in Reinforcement Learning, which is often addressed by using intrinsic rewards. One promising strategy is inspired by human curiosity, requiring the agent to learn to predict the future. In this paper, a curiosity-driven agent is extended to use these predictions directly for training. To achieve this, the agent predicts the value function of the next state at any point in time. Subsequently, the consistency of this prediction with the current value function…
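The abstract only sketches the mechanism, so the following is a minimal illustration of how a value-consistency regularizer of this kind could look. It is a sketch under assumptions, not the paper's implementation: the encoder, forward model, and value head below are placeholder MLPs, and the stop-gradient placement and loss form are illustrative choices rather than details stated in the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F


class CuriosityValueAgent(nn.Module):
    """Toy module with a feature encoder, a forward model, and a value head."""

    def __init__(self, obs_dim, act_dim, feat_dim=64):
        super().__init__()
        # Observation encoder phi(s).
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        # Forward model: predicts phi(s_{t+1}) from phi(s_t) and the action taken.
        self.forward_model = nn.Sequential(
            nn.Linear(feat_dim + act_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
        # Value head V(.) applied to state features.
        self.value_head = nn.Linear(feat_dim, 1)

    def curiosity_and_value_losses(self, obs, act_onehot, next_obs):
        phi_t = self.encoder(obs)
        phi_next = self.encoder(next_obs)
        phi_next_pred = self.forward_model(torch.cat([phi_t, act_onehot], dim=-1))

        # Curiosity-style intrinsic reward: error of the feature prediction.
        intrinsic_reward = 0.5 * (phi_next_pred - phi_next.detach()).pow(2).mean(dim=-1)

        # Value-consistency regularizer: the value assigned to the *predicted*
        # next state should agree with the value of the state actually observed.
        v_of_prediction = self.value_head(phi_next_pred).squeeze(-1)
        v_of_observation = self.value_head(phi_next).squeeze(-1).detach()
        value_reg_loss = F.mse_loss(v_of_prediction, v_of_observation)

        return intrinsic_reward, value_reg_loss

In a full agent this regularizer would simply be added, with a small weight, to the usual actor-critic objective, e.g. loss = policy_loss + value_loss + lambda_reg * value_reg_loss.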

Cited by 4 publications (4 citation statements)
References 4 publications (17 reference statements)
“…ings (Carvalho et al, 2019; Rogers et al, 2020; Braşoveanu and Andonie, 2020). Such studies range from, for example, observing attention weights (Clark et al, 2019; Kovaleva et al, 2019; Reif et al, 2019; Lin et al, 2019; Mareček and Rosa, 2019; Htut et al, 2019; Raganato and Tiedemann, 2018), gradients (Brunner et al, 2020), and value-weighted vector norms (Kobayashi et al, 2020). The analysis scope has been further extended from attention only to including RES1 (Abnar and Zuidema, 2020), RES1 and LN1 (Kobayashi et al, 2021), and RES1, LN1, and LN2 (Modarressi et al, 2022).…”
Section: All the Scopes (mentioning)
confidence: 99%
“…Besides, intrinsic rewards can be modeled based on comparisons between the current observation and the past episodic memories [54]. Moreover, the differences between the actual and predicted consequences can also be regarded as a measure of surprise [55], [56]. Generally, the latter dynamic-based rewards are straightforward to scale and parallelize [57].…”
Section: Related Work (mentioning)
confidence: 99%
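As an illustration of the episodic-memory alternative mentioned in the statement above (in contrast to the prediction-error "surprise" signal sketched earlier), here is a minimal, assumption-laden sketch: the function name, the kNN-distance formulation, and the choice to reward the mean distance to the nearest stored embeddings are illustrative and not taken from the cited papers.

import torch

def episodic_novelty_reward(embedding, memory, k=10):
    """Return a larger intrinsic reward the farther the current observation
    embedding lies from its k nearest neighbours in this episode's memory."""
    if memory.shape[0] == 0:
        return torch.tensor(1.0)  # nothing stored yet: treat as maximally novel
    dists = torch.cdist(embedding.unsqueeze(0), memory).squeeze(0)  # distances to all stored embeddings
    knn = torch.topk(dists, k=min(k, dists.numel()), largest=False).values
    return knn.mean()

The intrinsic term would typically be mixed into the environment reward, e.g. r = r_ext + beta * episodic_novelty_reward(phi(s_t), memory), with the embedding of each visited state appended to the memory afterwards.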
“…This can enable faster training, since it can preempt the need for performing expensive simulations of the environment. Predicting latent representation was also proposed in [Brunner et al, 2018] as a regularization method for reinforcement learning.…”
Section: Related Work (mentioning)
confidence: 99%