2022
DOI: 10.48550/arxiv.2207.02098
Preprint

Neural Networks and the Chomsky Hierarchy

Cited by 10 publications (15 citation statements)
References: 0 publications

“…Image-only models is reasonable, also echoing the conclusion mentioned in previous studies [7,12] that RNN-style models may outperform Transformer-style ones in the low-resource scenarios, especially formal language tasks. Hence, in the subsequent experiments, we mainly focus on GRU-style models.…”
Section: Models (supporting)
confidence: 79%
“…Recent theoretical work has pointed out that finite-depth Transformers have an issue of expressibility that will result in failure to generalize (Hahn, 2020; Hao et al., 2022; Merrill et al., 2022; Liu et al., 2022). Delétang et al. (2022) ran several neural architectures on a suite of different synthetic languages generated from different levels of the Chomsky hierarchy and empirically confirmed these results, showing that VTs have difficulty generalizing to Regular languages. Universal Transformers (UTs; Dehghani et al. 2018) are Transformers that share parameters at every layer of the architecture.…”
Section: Introduction (mentioning)
confidence: 71%
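The benchmark described in that citation spans tasks from the Regular level of the Chomsky hierarchy (solvable by a finite automaton) up to levels that require a stack or tape. As a minimal, hypothetical sketch of what a Regular-level task with a length-generalization split can look like (an illustration only, not code from Delétang et al. 2022), consider parity of a bit string:

```python
import random

# Illustrative sketch: a Regular-level synthetic task (parity of a bit string)
# with a length-generalization split. A two-state finite automaton solves
# parity exactly; testing on strings longer than any seen during training
# probes whether a trained network generalizes algorithmically.


def sample_parity_example(length: int) -> tuple:
    """Return (bit_string, label) where label = 1 iff the number of ones is odd."""
    bits = [random.randint(0, 1) for _ in range(length)]
    return "".join(map(str, bits)), sum(bits) % 2


def make_split(num_examples: int, min_len: int, max_len: int) -> list:
    """Generate examples with lengths drawn uniformly from [min_len, max_len]."""
    return [
        sample_parity_example(random.randint(min_len, max_len))
        for _ in range(num_examples)
    ]


if __name__ == "__main__":
    train = make_split(10_000, min_len=1, max_len=40)  # in-distribution lengths
    test = make_split(1_000, min_len=41, max_len=500)  # longer, held-out lengths
    print(train[0], test[0])
```

The length-based split is what separates genuine algorithmic generalization from interpolation within the training distribution, which is the failure mode the cited work reports for vanilla Transformers on Regular languages.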
“…o learns the desired end of the task, while n learns means by which it can be completed. What an architecture is suitable to describe varies according to the Chomsky hierarchy [53]. Using one architecture as opposed to another is equivalent to including different sorts of declarative programs in V .…”
Section: A Improving the Performance of Incumbent Systems (mentioning)
confidence: 99%
“…This argument amounts to the declaration that the result of any inductive bias can be learned with enough scale (that inductive bias is just foreknowledge). Acknowledging that the debate continues regarding this point [55,53], and that ‘scale is all you need’ dismisses the cost of scale, let us assume for the sake of argument that scale is a viable approach and inductive biases are unnecessary. By fitting a curve, a neural network is approximating a model of a task (albeit usually in an imperative rather than declarative form).…”
Section: Scale Is Not All You Need (mentioning)
confidence: 99%