Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1030

You Only Need Attention to Traverse Trees

Abstract: In recent NLP research, a topic of interest is universal sentence encoding, sentence representations that can be used in any supervised task. At the word sequence level, fully attention-based models suffer from two problems: a quadratic increase in memory consumption with respect to the sentence length and an inability to capture and use syntactic information. Recursive neural nets can extract very good syntactic information by traversing a tree structure. To this end, we propose Tree Transformer, a model that…
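To make the idea in the abstract concrete, the following is a minimal sketch, assuming a bottom-up tree traversal in which each parent node is composed from its children via scaled dot-product attention over the child vectors. It illustrates the general recipe only, not the authors' exact Tree Transformer: the `attend` and `encode` helpers, the mean-pooling composition step, and the 8-dimensional toy embeddings are all assumptions made for this example.

```python
# Minimal sketch: composing tree nodes with attention, bottom-up (illustrative only).
import numpy as np

def attend(children: np.ndarray) -> np.ndarray:
    """Self-attention over the child vectors (k x d), then mean-pool into one parent vector."""
    d = children.shape[-1]
    scores = children @ children.T / np.sqrt(d)            # (k, k) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    mixed = weights @ children                               # each child attends to its siblings
    return mixed.mean(axis=0)                                # assumed composition: mean-pool

def encode(tree, embed):
    """Bottom-up traversal: leaves are word embeddings, internal nodes attend over their children."""
    if isinstance(tree, str):                                # leaf = word token
        return embed[tree]
    child_vecs = np.stack([encode(child, embed) for child in tree])
    return attend(child_vecs)

# Toy usage: the constituency tree ((the cat) sat) with random 8-d word embeddings.
rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(8) for w in ["the", "cat", "sat"]}
root_vec = encode((("the", "cat"), "sat"), embed)
print(root_vec.shape)  # (8,) -- a fixed-size sentence representation at the root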

Cited by 21 publications (34 citation statements)
References 14 publications
“…We can also conclude that language models with more successful structural knowledge can better help to encode effective intrinsic language patterns, which is consistent with the prior studies (Kim et al., 2019b; Drozdov et al., 2019). We also compare the constituency parsing with state-of-the-art structure-aware models, including 1) recurrent-based models described in §2: PRPN (Shen et al., 2018a), ON-LSTM (Shen et al., 2018b), URNNG (Kim et al., 2019b), DIORA (Drozdov et al., 2019), PCFG (Kim et al., 2019a), and 2) Transformer-based methods: Tree+Trm, RvTrm (Ahmed et al., 2019), PI+TrmXL, and the BERT model initialized with rich weights. As shown in Table 2, all the structure-aware models can give good parsing results, compared with non-structured models.…”
Section: Structure-aware Language Modeling (supporting)
confidence: 67%
“…They find that the latent language structure knowledge is best retained at the middle layers in BERT (Vig and Belinkov, 2019; Jawahar et al., 2019; Goldberg, 2019). Ahmed et al. (2019) employ a decomposable attention mechanism to recursively learn the tree structure for the Transformer.…”
Section: Related Work (mentioning)
confidence: 99%
“…In these works, a straightforward strategy is to augment the conventional transformer with structural positional embeddings (Wang et al., 2019a; Shiv and Quirk, 2019). On the other hand, Tree Transformer is proposed to attend over nearer neighbor nodes (Ahmed et al., 2019; Wang et al., 2019b). Our proposed method is a substantial extension of Tree Transformer for modeling propagation tree structures for detecting rumors on microblogging websites.…”
Section: Related Work (mentioning)
confidence: 99%
“…In traditional linguistics, dependency parses are used to represent the relationships among words as triples of a relation between pairs of words [7,8]. In this paper, we propose a novel edge encoding mechanism of a dependency parse tree and extend the design of one of the existing dependency tree transformer models [9] using very few extra parameters. To the best of our knowledge, no work has been done on encoding these head-dependent relations into a dependency tree edge.…”
Section: Introduction (mentioning)
confidence: 99%
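As a small illustration of the triple representation mentioned in the last quoted passage, a dependency parse can be stored as (head, relation, dependent) tuples and looked up per edge. This is a made-up example for clarity only; the sentence, the relation labels, and the `edge_label` lookup are assumptions, not the cited model's actual edge encoding.

```python
# Illustrative only: a dependency parse as (head, relation, dependent) triples.
triples = [
    ("sat", "nsubj", "cat"),   # "cat" is the subject of "sat"
    ("cat", "det",   "the"),
    ("sat", "prep",  "on"),
    ("on",  "pobj",  "mat"),
    ("mat", "det",   "the"),
]

# One way to expose head-dependent relations as labels on tree edges:
edge_label = {(head, dep): rel for head, rel, dep in triples}
print(edge_label[("sat", "cat")])  # -> "nsubj"
```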