Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.171

Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

Abstract: Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data. One challenge with end-to-end training of these models is the argmax operation, which has null gradient. In this paper, we focus on surrogate gradients, a popular strategy to deal with this problem. We explore latent structure learning through the angle of pulling back the downstream learning…
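
To make the "null gradient" point concrete, here is a minimal sketch of an argmax layer with a surrogate gradient in PyTorch. This is not code from the paper; the class name ArgmaxSTE and the identity-backward choice are illustrative. The true derivative of the hard argmax is zero almost everywhere, so without the surrogate no useful gradient would reach the scores.

    import torch

    class ArgmaxSTE(torch.autograd.Function):
        # Forward: hard one-hot argmax, whose true gradient is null
        # almost everywhere. Backward: surrogate gradient that replaces
        # the null Jacobian with the identity (the "identity STE").
        @staticmethod
        def forward(ctx, scores):
            index = scores.argmax(dim=-1, keepdim=True)
            return torch.zeros_like(scores).scatter_(-1, index, 1.0)

        @staticmethod
        def backward(ctx, grad_output):
            # Pass the downstream gradient straight through to the scores.
            return grad_output

    scores = torch.tensor([0.2, 1.5, -0.3], requires_grad=True)
    z = ArgmaxSTE.apply(scores)                # tensor([0., 1., 0.])
    loss = ((z - torch.tensor([0.0, 0.0, 1.0])) ** 2).sum()
    loss.backward()
    print(scores.grad)                         # nonzero: learning can proceed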

Cited by 2 publications (4 citation statements)
References 39 publications
“…Effectively, this procedure performs a one-step gradient descent to induce an approximation of the optimal structure. Mihaylova et al. (2020) noted that this formulation recovers the identity STE in Equation (1):…”
Section: Surrogate Gradients
confidence: 90%
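
A hedged reconstruction of the step being described, not the citing paper's exact Equation (1): the notation (scores s, decoded structure \hat{z}, step size \eta) is assumed here. Taking one gradient step on the structure and treating the resulting displacement as the surrogate gradient for the scores collapses to the identity STE.

    % One-step gradient descent on the decoded structure (notation assumed):
    \tilde{z} \;=\; \hat{z} \;-\; \eta \, \nabla_{\hat{z}} L
    \qquad\Rightarrow\qquad
    \frac{\partial L}{\partial s}
    \;\approx\;
    \frac{\hat{z} - \tilde{z}}{\eta}
    \;=\;
    \nabla_{\hat{z}} L

In words: the surrogate gradient handed to the scores is exactly the downstream gradient with respect to the structure, i.e., the Jacobian of argmax is replaced by the identity.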
“…Being mindful of the structural constraints for STE can help improve the learning process. To see this, we first examine an alternative formulation of STE as proposed by Mihaylova et al. (2020). Consider the hypothetical case where the optimal structure z * is accessible.…”
Section: Surrogate Gradients
confidence: 99%
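
A sketch of that alternative formulation, under the assumptions stated in the excerpt (the symbols s, \hat{z}, and z^{*} follow the description above; this is a reconstruction, not the papers' typeset equations): if the optimal structure z^{*} were accessible, the scores could be fit to it through an intermediate loss whose gradient the identity STE reproduces.

    % Hypothetical intermediate loss against the (inaccessible) optimum z*:
    J(s) \;=\; \tfrac{1}{2} \,\lVert \hat{z}(s) - z^{*} \rVert^{2}
    % Under the identity STE (Jacobian of argmax replaced by I):
    \nabla_{s} J \;\approx\; \hat{z} - z^{*}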
“…where T is the set of all possible structures and Z is a normalization often called partition function. This equation can be thought of as a softmax equivalent over an extremely large set of structured outputs that share sub-structures (Sutton and McCallum, 2007; Mihaylova et al., 2020).…”
Section: Structured Distributions
confidence: 99%
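
The equation itself is quoted without its display. The standard form it describes, written with a generic score function and the symbols T and Z as named in the excerpt (a reconstruction, not the exact typeset equation), is:

    p(z) \;=\; \frac{\exp\big(\mathrm{score}(z)\big)}{Z},
    \qquad
    Z \;=\; \sum_{z' \in \mathcal{T}} \exp\big(\mathrm{score}(z')\big)

This is a softmax over the (typically exponentially large) set of structures, which is why the normalizer Z, the partition function, must exploit shared sub-structures to be computed efficiently.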