2022
DOI: 10.48550/arxiv.2207.02098
Preprint

Neural Networks and the Chomsky Hierarchy

Cited by 10 publications (15 citation statements)
References: 0 publications

“…Image-only models is reasonable, also echoing the conclusion mentioned in previous studies [7,12] that RNN-style models may outperform Transformer-style ones in the low-resource scenarios, especially formal language tasks. Hence, in the subsequent experiments, we mainly focus on GRU-style models.…”
Section: Models (supporting)
confidence: 79%
“…Recent theoretical work has pointed out that finite-depth Transformers have an issue of expressibility that will result in failure to generalize (Hahn, 2020; Hao et al., 2022; Merrill et al., 2022; Liu et al., 2022). Delétang et al. (2022) ran several neural architectures on a suite of different synthetic languages generated from different levels of the Chomsky hierarchy and empirically confirmed these results, showing that VTs have difficulty generalizing to Regular languages. Universal Transformers (UTs; Dehghani et al. 2018) are Transformers that share parameters at every layer of the architecture.…”
Section: Introduction (mentioning)
confidence: 71%
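The benchmark described in that citation spans tasks from the Regular level of the Chomsky hierarchy (solvable by a finite automaton) up to levels that require a stack or tape. As a minimal, hypothetical sketch of what a Regular-level task with a length-generalization split can look like (an illustration only, not code from Delétang et al. 2022), consider parity of a bit string:

```python
import random

# Illustrative sketch: a Regular-level synthetic task (parity of a bit string)
# with a length-generalization split. A two-state finite automaton solves
# parity exactly; testing on strings longer than any seen during training
# probes whether a trained network generalizes algorithmically.


def sample_parity_example(length: int) -> tuple:
    """Return (bit_string, label) where label = 1 iff the number of ones is odd."""
    bits = [random.randint(0, 1) for _ in range(length)]
    return "".join(map(str, bits)), sum(bits) % 2


def make_split(num_examples: int, min_len: int, max_len: int) -> list:
    """Generate examples with lengths drawn uniformly from [min_len, max_len]."""
    return [
        sample_parity_example(random.randint(min_len, max_len))
        for _ in range(num_examples)
    ]


if __name__ == "__main__":
    train = make_split(10_000, min_len=1, max_len=40)  # in-distribution lengths
    test = make_split(1_000, min_len=41, max_len=500)  # longer, held-out lengths
    print(train[0], test[0])
```

The length-based split is what separates genuine algorithmic generalization from interpolation within the training distribution, which is the failure mode the cited work reports for vanilla Transformers on Regular languages.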
“…o learns the desired end of the task, while n learns means by which it can be completed. What an architecture is suitable to describe varies according to the Chomsky hierarchy [53]. Using one architecture as opposed to another is equivalent to including different sorts of declarative programs in V .…”
Section: A Improving the Performance of Incumbent Systems (mentioning)
confidence: 99%
“…This argument amounts to the declaration that the result of any inductive bias can be learned with enough scale (that inductive bias is just foreknowledge). Acknowledging that the debate continues regarding this point [55,53], and that ‘scale is all you need’ dismisses the cost of scale, let us assume for the sake of argument that scale is a viable approach and inductive biases are unnecessary. By fitting a curve, a neural network is approximating a model of a task (albeit usually in an imperative rather than declarative form).…”
Section: Scale Is Not All You Need (mentioning)
confidence: 99%