Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018
DOI: 10.18653/v1/d18-1503

The Importance of Being Recurrent for Modeling Hierarchical Structure

Abstract: Recent work has shown that recurrent neural networks (RNNs) can implicitly capture and exploit hierarchical information when trained to solve common natural language processing tasks (Blevins et al., 2018) such as language modeling (Linzen et al., 2016; Gulordava et al., 2018) and neural machine translation (Shi et al., 2016). In contrast, the ability to model structured data with non-recurrent neural networks has received little attention despite their success in many NLP tasks (Gehring et al., 2017; Vaswani et…
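
As a rough illustration of the comparison the paper sets up, the sketch below contrasts a recurrent (LSTM) encoder with a non-recurrent self-attention (Transformer) encoder on a toy sequence-classification task of the kind used to probe hierarchical structure (e.g., subject-verb agreement). All names, dimensions, and the random toy batch are assumptions for illustration only; this is not the authors' code, and positional encoding is omitted from the Transformer for brevity.

```python
# Illustrative sketch only: recurrent vs. non-recurrent encoders on a toy
# sequence-classification setup. Sizes and the random batch are placeholders.
import torch
import torch.nn as nn

VOCAB, DIM, CLASSES = 1000, 64, 2  # hypothetical sizes

class LSTMEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.LSTM(DIM, DIM, batch_first=True)
        self.out = nn.Linear(DIM, CLASSES)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))   # (batch, seq, DIM)
        return self.out(h[:, -1])      # classify from the final hidden state

class SelfAttentionEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(DIM, CLASSES)

    def forward(self, x):
        # Positional encoding omitted for brevity in this sketch.
        h = self.enc(self.emb(x))      # (batch, seq, DIM)
        return self.out(h.mean(dim=1)) # classify from mean-pooled states

tokens = torch.randint(0, VOCAB, (8, 20))  # toy batch of token ids
print(LSTMEncoder()(tokens).shape, SelfAttentionEncoder()(tokens).shape)
```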

Cited by 123 publications (104 citation statements)
References 13 publications

Citation statements:

“…For ELMo there is still a discernible difference for dependencies longer than 5, but for BERT the two curves are almost indistinguishable throughout the whole range. This could be related to the aforementioned intuition that a Transformer captures long dependencies more effectively than a BiLSTM (see Tran et al (2018) for contrary observations, albeit for different tasks). The overall trends for both baseline and enhanced models are quite consistent across languages, although with large variations in accuracy levels.…”
Section: Dependency Length
confidence: 99%
“…Jawahar et al (2019) extended this work to using multiple layers and tasks, supporting the claim that BERT's intermediate layers capture rich linguistic information. On the other hand, Tran et al (2018) concluded that LSTMs generalize to longer sequences better, and are more robust with respect to agreement distractors, compared to Transformers. Liu et al (2019) investigated the transferability of contextualized word representations to a number of probing tasks requiring linguistic knowledge.…”
Section: Related Work
confidence: 99%
“…The primary reason for adopting recurrent architecture for sentence-encoder is because recurrent neural networks have been shown to be essential for capturing the underlying hierarchical structure of sequential data [14]. By adopting this approach sentence-encoder is able to encode how sentences are structured in a document.…”
Section: Lexical Embedding
confidence: 99%
“…For sentence-level encoder, we employ an attention-based recurrent neural network to capture the structural patterns of sentences in the document. The primary reason for adopting recurrent architecture for sentence-encoder is because recurrent neural networks have been shown to be essential for capturing the underlying hierarchical structure of sequential data [14]. Hence, sentence-encoder in the proposed model is expected to capture the structural information of documents.…”
Section: Introduction
confidence: 99%
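
To make the cited design concrete, here is a minimal sketch of an attention-based recurrent sentence encoder of the kind this citing work describes: a bidirectional LSTM whose hidden states are attention-pooled into a single sentence vector. The class name, dimensions, and pooling scheme below are hypothetical and are not taken from the cited paper.

```python
# Minimal sketch (assumed design, not the cited paper's code): a BiLSTM sentence
# encoder with additive-style attention pooling over its hidden states.
import torch
import torch.nn as nn

class AttentiveSentenceEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * dim, 1)  # scores each hidden state for pooling

    def forward(self, tokens):
        h, _ = self.rnn(self.emb(tokens))              # (batch, seq, 2*dim)
        weights = torch.softmax(self.score(h), dim=1)  # attention over positions
        return (weights * h).sum(dim=1)                # weighted sum -> sentence vector

sentences = torch.randint(0, 1000, (4, 12))            # toy batch of token ids
print(AttentiveSentenceEncoder()(sentences).shape)     # torch.Size([4, 128])
```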