2019
DOI: 10.48550/arxiv.1902.09393
Preprint

Cooperative Learning of Disjoint Syntax and Semantics

Cited by 3 publications (3 citation statements)
References 0 publications
“…Models such as a flat RNN will fail to capture the hierarchical structure of this task. However, if a model can induce an explicit latent z, the parse tree of the expression, then the task is easy to learn by a tree-RNN model p(y|x, z) (Yogatama et al., 2016; Havrylov et al., 2019).…”
Section: Motivating Case Study
Confidence: 99%
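The role of the latent parse z in the statement above can be illustrated with a minimal sketch (a hypothetical toy example, not code from the cited works): given the parse tree of an arithmetic expression, a tree-structured model only needs to learn one local composition rule per operator, whereas a flat left-to-right RNN must implicitly rediscover the bracketing.

```python
# Minimal sketch: evaluating an arithmetic expression x given its parse tree z.
# The latent tree z makes the composition order explicit, so evaluation reduces
# to a simple bottom-up recursion over subtrees.

def eval_tree(node):
    """node is either a number (leaf) or a tuple (op, left, right)."""
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    a, b = eval_tree(left), eval_tree(right)
    return a + b if op == "+" else a * b

# x = "2 * (3 + 4)" with latent parse z:
z = ("*", 2, ("+", 3, 4))
print(eval_tree(z))  # 14
```

A tree-RNN model p(y|x, z) replaces the hand-written combination rule with a learned composition function applied at each internal node of z.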
“…The contextual transfer boosts the performance of both participating models by enriching the intermediate-level representations that share backpropagation from both tasks. We note works [7,14,39] that recommend joint training of tasks on complementary contexts such as emotion and sentiment classification. However, in contrast to these, we use joint training to leverage more complex modalities such as syntax and semantics.…”
Section: Introduction
Confidence: 99%
“…In fact, the Gumbel-Softmax trick naturally translates to structured variables when the arg max operator is applied over a structured domain rather than component-wise [34]. In contrast, score function estimators are now less common in structured domains, with a few exceptions such as [50,14]. The primary difficulty is the sample score function: neither Gibbs distributions nor distributions defined through a generative process have a general shortcut to compute it.…”
Section: Introduction
Confidence: 99%
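For context on the component-wise Gumbel-Softmax trick mentioned in this excerpt, here is a minimal self-contained sketch of the standard formulation (a generic illustration, not code from the cited works): perturb each logit with independent Gumbel noise, then take a temperature-controlled softmax; taking the arg max of the perturbed logits instead yields an exact categorical sample.

```python
# Gumbel-Softmax (Concrete) relaxation of a categorical sample.
import math
import random

def sample_gumbel():
    # Gumbel(0, 1) noise via -log(-log(U)), with small offsets for stability.
    u = random.random()
    return -math.log(-math.log(u + 1e-20) + 1e-20)

def gumbel_softmax(logits, tau=1.0):
    """Differentiable relaxation: softmax of Gumbel-perturbed logits / tau."""
    g = [l + sample_gumbel() for l in logits]
    m = max(x / tau for x in g)                      # max-shift for stability
    exps = [math.exp(x / tau - m) for x in g]
    s = sum(exps)
    return [e / s for e in exps]

logits = [2.0, 0.5, -1.0]
soft = gumbel_softmax(logits, tau=0.5)               # relaxed one-hot vector
hard = soft.index(max(soft))                         # exact categorical sample
```

Applying the arg max over a structured domain (e.g. over all parse trees) rather than component-wise is what extends this trick to structured variables, as the excerpt notes.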