Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.171

Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

Abstract: Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning…
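
To make the "null gradient" point concrete, here is a minimal sketch of an argmax layer with a surrogate gradient in PyTorch. This is not code from the paper; the class name ArgmaxSTE and the identity-backward choice are illustrative. The true derivative of the hard argmax is zero almost everywhere, so without the surrogate no useful gradient would reach the scores.

    import torch

    class ArgmaxSTE(torch.autograd.Function):
        # Forward: hard one-hot argmax, whose true gradient is null
        # almost everywhere. Backward: surrogate gradient that replaces
        # the null Jacobian with the identity (the "identity STE").
        @staticmethod
        def forward(ctx, scores):
            index = scores.argmax(dim=-1, keepdim=True)
            return torch.zeros_like(scores).scatter_(-1, index, 1.0)

        @staticmethod
        def backward(ctx, grad_output):
            # Pass the downstream gradient straight through to the scores.
            return grad_output

    scores = torch.tensor([0.2, 1.5, -0.3], requires_grad=True)
    z = ArgmaxSTE.apply(scores)                # tensor([0., 1., 0.])
    loss = ((z - torch.tensor([0.0, 0.0, 1.0])) ** 2).sum()
    loss.backward()
    print(scores.grad)                         # nonzero: learning can proceed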

Cited by 2 publications (4 citation statements)
References 39 publications
“…Effectively, this procedure performs a one-step gradient descent to induce an approximation of the optimal structure. Mihaylova et al. (2020) noted that this formulation recovers the identity STE in Equation (1):…”
Section: Surrogate Gradients
confidence: 90%
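
A hedged reconstruction of the step being described, not the citing paper's exact Equation (1): the notation (scores s, decoded structure \hat{z}, step size \eta) is assumed here. Taking one gradient step on the structure and treating the resulting displacement as the surrogate gradient for the scores collapses to the identity STE.

    % One-step gradient descent on the decoded structure (notation assumed):
    \tilde{z} \;=\; \hat{z} \;-\; \eta \, \nabla_{\hat{z}} L
    \qquad\Rightarrow\qquad
    \frac{\partial L}{\partial s}
    \;\approx\;
    \frac{\hat{z} - \tilde{z}}{\eta}
    \;=\;
    \nabla_{\hat{z}} L

In words: the surrogate gradient handed to the scores is exactly the downstream gradient with respect to the structure, i.e., the Jacobian of argmax is replaced by the identity.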
“…Being mindful of the structural constraints for STE can help improve the learning process. To see this, we first examine an alternative formulation of STE as proposed by Mihaylova et al. (2020). Consider the hypothetical case where the optimal structure z * is accessible.…”
Section: Surrogate Gradients
confidence: 99%
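
A sketch of that alternative formulation, under the assumptions stated in the excerpt (the symbols s, \hat{z}, and z^{*} follow the description above; this is a reconstruction, not the papers' typeset equations): if the optimal structure z^{*} were accessible, the scores could be fit to it through an intermediate loss whose gradient the identity STE reproduces.

    % Hypothetical intermediate loss against the (inaccessible) optimum z*:
    J(s) \;=\; \tfrac{1}{2} \,\lVert \hat{z}(s) - z^{*} \rVert^{2}
    % Under the identity STE (Jacobian of argmax replaced by I):
    \nabla_{s} J \;\approx\; \hat{z} - z^{*}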
“…where T is the set of all possible structures and Z is a normalization often called partition function. This equation can be thought of as a softmax equivalent over an extremely large set of structured outputs that share sub-structures (Sutton and McCallum, 2007; Mihaylova et al., 2020).…”
Section: Structured Distributions
confidence: 99%
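
The equation itself is quoted without its display. The standard form it describes, written with a generic score function and the symbols T and Z as named in the excerpt (a reconstruction, not the exact typeset equation), is:

    p(z) \;=\; \frac{\exp\big(\mathrm{score}(z)\big)}{Z},
    \qquad
    Z \;=\; \sum_{z' \in \mathcal{T}} \exp\big(\mathrm{score}(z')\big)

This is a softmax over the (typically exponentially large) set of structures, which is why the normalizer Z, the partition function, must exploit shared sub-structures to be computed efficiently.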