2019
DOI: 10.1017/s1351324919000184
Jointly learning sentence embeddings and syntax with unsupervised Tree-LSTMs

Abstract: We introduce a neural network that represents sentences by composing their words according to induced binary parse trees. We use Tree-LSTM as our composition function, applied along a tree structure found by a fully differentiable natural language chart parser. Our model simultaneously optimises both the composition function and the parser, thus eliminating the need for externally-provided parse trees which are normally required for Tree-LSTM. It can therefore be seen as a tree-based RNN that is unsupervised w…
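To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' code, of a fully differentiable chart parser: each span's representation is a learned composition of its sub-spans, and the candidate split points are mixed with a softmax so the whole chart stays differentiable. The dimensions, the linear composition standing in for the Tree-LSTM cell, and the split-point scorer are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentiableChartParser(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.compose = nn.Linear(2 * dim, dim)   # toy stand-in for the Tree-LSTM cell
        self.score = nn.Linear(dim, 1)           # energy used to gate split points

    def forward(self, words: torch.Tensor) -> torch.Tensor:
        # words: (n, dim) word embeddings; returns a single sentence embedding.
        n, _ = words.shape
        chart = {(i, i): words[i] for i in range(n)}          # spans [i, j], inclusive
        for length in range(1, n):                            # span length minus one
            for i in range(n - length):
                j = i + length
                # candidate representations, one per split point k
                cands = torch.stack([
                    torch.tanh(self.compose(torch.cat([chart[(i, k)], chart[(k + 1, j)]])))
                    for k in range(i, j)
                ])                                            # (length, dim)
                # soft-gated mix over split points keeps everything differentiable
                weights = F.softmax(self.score(cands).squeeze(-1), dim=0)
                chart[(i, j)] = (weights.unsqueeze(-1) * cands).sum(0)
        return chart[(0, n - 1)]

# usage: embed a 5-word sentence of 16-dimensional word vectors
parser = DifferentiableChartParser(16)
sentence = torch.randn(5, 16)
print(parser(sentence).shape)  # torch.Size([16])
```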

Cited by 62 publications (98 citation statements)
References 24 publications
“…Recently, there has been growing interest in providing an inductive bias in neural networks by forcing layers to represent tree structures (Kim et al., 2017; Maillard et al., 2017; Choi et al., 2018; Niculae et al., 2018; Williams et al., 2018a; Liu and Lapata, 2018). Maillard et al. (2017) also operate on a chart but, rather than modeling discrete trees, use a soft-gating approach to mix representations of constituents in each given cell. While these models showed consistent improvements over comparable baselines, they do not seem to explicitly capture syntactic or semantic structures (Williams et al., 2018a).…”
Section: Related Work (mentioning, confidence: 99%)
“…Typically, recursive neural network models assume that an annotated treebank or a pretrained syntactic parser is available (Socher et al., 2013; Tai et al., 2015; Kim et al., 2019a), but recent work pays more attention to learning syntactic structures in an unsupervised manner. Yogatama et al. (2017) propose to use reinforcement learning, and Maillard et al. (2017) introduce the Tree-LSTM to jointly learn sentence embeddings and syntax trees, later combined with a Straight-Through Gumbel-Softmax estimator by Choi et al. (2018). In addition to sentence classification tasks, recent research has focused on unsupervised structure learning for language modeling (Shen et al., 2018, 2019; Drozdov et al., 2019; Kim et al., 2019b).…”
Section: Related Work (mentioning, confidence: 99%)
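The Straight-Through Gumbel-Softmax estimator mentioned above (Choi et al., 2018) can be sketched as follows: the forward pass takes a hard one-hot sample over candidate merges, while gradients flow through the soft relaxation. The temperature and the 4-way example below are assumptions for illustration, not values from the cited work.

```python
import torch
import torch.nn.functional as F

def st_gumbel_softmax(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # Sample Gumbel(0, 1) noise and form the relaxed (soft) sample.
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    soft = F.softmax((logits + gumbel) / temperature, dim=-1)
    # Hard one-hot of the argmax in the forward pass, gradients flow through `soft`.
    index = soft.argmax(dim=-1, keepdim=True)
    hard = torch.zeros_like(soft).scatter_(-1, index, 1.0)
    return (hard - soft).detach() + soft

# usage: pick one of 4 candidate adjacent merges, differentiably
scores = torch.randn(4, requires_grad=True)
choice = st_gumbel_softmax(scores, temperature=0.5)
print(choice)            # one-hot selection in the forward pass
choice.sum().backward()  # gradients reach `scores` via the soft relaxation
```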
“…In early approaches, unsupervised parsers were trained by optimizing the marginal likelihood of sentences (Klein and Manning, 2014). More recent deep learning approaches (Yogatama et al., 2017; Maillard et al., 2017; Choi et al., 2018) obtain latent tree structures by reinforcement learning (RL). Typically, this involves a secondary task, e.g., a language modeling objective or a semantic task.…”
Section: Introduction (mentioning, confidence: 99%)
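A hedged sketch of the REINFORCE-style (score-function) update this passage refers to: a parser samples a discrete action, a secondary task supplies a scalar reward, and the surrogate loss is the negative log-probability of the sampled action weighted by that reward. The action space and the reward value below are placeholders, not details from any of the cited systems.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, requires_grad=True)   # scores for 4 candidate parsing actions
probs = F.softmax(logits, dim=-1)
dist = torch.distributions.Categorical(probs)
action = dist.sample()                        # non-differentiable discrete choice

reward = torch.tensor(0.7)                    # placeholder: reward from the secondary task
loss = -dist.log_prob(action) * reward        # score-function (REINFORCE) surrogate loss
loss.backward()                               # gradient w.r.t. the parser's action scores
print(action.item(), logits.grad)
```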
“…For instance, Yogatama et al. [27] used REINFORCE algorithms [28] to train the shift-reduce parser without ground truth. Instead of the shift-reduce parsers, Maillard et al. [29] used a chart parser, which is fully differentiable by introducing a softmax annealing but suffers from O(n³) time and space complexity. Gumbel Tree-LSTM is a parsing strategy proposed by [14], which introduces Tree-LSTM and calculates the merging score for each adjacent node pair based on a learnable query vector and greedily merges the best pair with the highest score in the next layer.…”
Section: Learning Tree Structures for Language (mentioning, confidence: 99%)
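The greedy merge step described for Gumbel Tree-LSTM can be sketched as follows (assumed dimensions and a toy composition, not the original model): every adjacent pair of nodes gets a merge score from a learnable query vector, and the highest-scoring pair is merged at each layer until a single node remains.

```python
import torch
import torch.nn as nn

class GreedyMerger(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))  # learnable query vector
        self.compose = nn.Linear(2 * dim, dim)       # toy stand-in for the Tree-LSTM cell

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (n, dim) leaf representations; returns one root representation.
        while nodes.size(0) > 1:
            # candidate parents for every adjacent pair, and their merge scores
            parents = torch.tanh(self.compose(torch.cat([nodes[:-1], nodes[1:]], dim=-1)))
            scores = parents @ self.query
            best = scores.argmax().item()            # greedily merge the best-scoring pair
            nodes = torch.cat([nodes[:best], parents[best:best + 1], nodes[best + 2:]], dim=0)
        return nodes[0]

# usage: reduce a 6-node sequence of 8-dimensional vectors to a single root vector
merger = GreedyMerger(8)
print(merger(torch.randn(6, 8)).shape)  # torch.Size([8])
```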