Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1030

You Only Need Attention to Traverse Trees

Abstract: In recent NLP research, a topic of interest is universal sentence encoding, sentence representations that can be used in any supervised task. At the word sequence level, fully attention-based models suffer from two problems: a quadratic increase in memory consumption with respect to the sentence length and an inability to capture and use syntactic information. Recursive neural nets can extract very good syntactic information by traversing a tree structure. To this end, we propose Tree Transformer, a model that…
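To make the idea in the abstract concrete, the following is a minimal sketch, assuming a bottom-up tree traversal in which each parent node is composed from its children via scaled dot-product attention over the child vectors. It illustrates the general recipe only, not the authors' exact Tree Transformer: the `attend` and `encode` helpers, the mean-pooling composition step, and the 8-dimensional toy embeddings are all assumptions made for this example.

```python
# Minimal sketch: composing tree nodes with attention, bottom-up (illustrative only).
import numpy as np

def attend(children: np.ndarray) -> np.ndarray:
    """Self-attention over the child vectors (k x d), then mean-pool into one parent vector."""
    d = children.shape[-1]
    scores = children @ children.T / np.sqrt(d)            # (k, k) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    mixed = weights @ children                               # each child attends to its siblings
    return mixed.mean(axis=0)                                # assumed composition: mean-pool

def encode(tree, embed):
    """Bottom-up traversal: leaves are word embeddings, internal nodes attend over their children."""
    if isinstance(tree, str):                                # leaf = word token
        return embed[tree]
    child_vecs = np.stack([encode(child, embed) for child in tree])
    return attend(child_vecs)

# Toy usage: the constituency tree ((the cat) sat) with random 8-d word embeddings.
rng = np.random.default_rng(0)
embed = {w: rng.standard_normal(8) for w in ["the", "cat", "sat"]}
root_vec = encode((("the", "cat"), "sat"), embed)
print(root_vec.shape)  # (8,) -- a fixed-size sentence representation at the root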

Cited by 21 publications (34 citation statements)
References 14 publications
“…We can also conclude that language models with more successful structural knowledge can better help to encode effective intrinsic language patterns, which is consistent with the prior studies (Kim et al., 2019b; Drozdov et al., 2019). We also compare the constituency parsing with state-of-the-art structure-aware models, including 1) recurrent-based models described in §2: PRPN (Shen et al., 2018a), ON-LSTM (Shen et al., 2018b), URNNG (Kim et al., 2019b), DIORA (Drozdov et al., 2019), PCFG (Kim et al., 2019a), and 2) Transformer-based methods: Tree+Trm, RvTrm (Ahmed et al., 2019), PI+TrmXL, and the BERT model initialized with rich weights. As shown in Table 2, all the structure-aware models can give good parsing results, compared with non-structured models.…”
Section: Structure-aware Language Modeling (supporting)
confidence: 67%
“…They find that the latent language structure knowledge is best retained at the middle layers in BERT (Vig and Belinkov, 2019; Jawahar et al., 2019; Goldberg, 2019). Ahmed et al. (2019) employ a decomposable attention mechanism to recursively learn the tree structure for the Transformer.…”
Section: Related Work (mentioning)
confidence: 99%
“…In these works, a straightforward strategy is to augment the conventional transformer with structural positional embeddings (Wang et al., 2019a; Shiv and Quirk, 2019). On the other hand, Tree Transformer is proposed to attend over nearer neighbor nodes (Ahmed et al., 2019; Wang et al., 2019b). Our proposed method is a substantial extension of Tree Transformer for modeling propagation tree structures for detecting rumors on microblogging websites.…”
Section: Related Work (mentioning)
confidence: 99%
“…In traditional linguistics, dependency parses are used to represent the relationships among words as triples of a relation between pairs of words [7,8]. In this paper, we propose a novel edge encoding mechanism of a dependency parse tree and extend the design of one of the existing dependency tree transformer models [9] using very few extra parameters. To the best of our knowledge, no work has been done on encoding these head-dependent relations into a dependency tree edge.…”
Section: Introduction (mentioning)
confidence: 99%
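As a small illustration of the triple representation mentioned in the last quoted passage, a dependency parse can be stored as (head, relation, dependent) tuples and looked up per edge. This is a made-up example for clarity only; the sentence, the relation labels, and the `edge_label` lookup are assumptions, not the cited model's actual edge encoding.

```python
# Illustrative only: a dependency parse as (head, relation, dependent) triples.
triples = [
    ("sat", "nsubj", "cat"),   # "cat" is the subject of "sat"
    ("cat", "det",   "the"),
    ("sat", "prep",  "on"),
    ("on",  "pobj",  "mat"),
    ("mat", "det",   "the"),
]

# One way to expose head-dependent relations as labels on tree edges:
edge_label = {(head, dep): rel for head, rel, dep in triples}
print(edge_label[("sat", "cat")])  # -> "nsubj"
```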