Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
DOI: 10.18653/v1/2021.naacl-main.210
Smoothing and Shrinking the Sparse Seq2Seq Search Space

Abstract: Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences. While this setup has led to strong results in a variety of tasks, one unsatisfying aspect is its length bias: models give high scores to short, inadequate hypotheses and often make the empty string the argmax (the so-called "cat got your tongue" problem). Recently proposed entmax-based sparse sequence-to-sequence models present a possible solution, since t…
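The entmax family referenced in the abstract replaces softmax with a mapping that can assign exactly zero probability to low-scoring tokens, which is what lets these models prune hypotheses (including the empty string) out of the search space. The snippet below is a minimal NumPy sketch of the α-entmax mapping computed by bisection; it is illustrative only and not the authors' implementation (a reference PyTorch implementation is available in the released `entmax` package).

```python
import numpy as np

def entmax(z, alpha=1.5, n_iter=50):
    """alpha-entmax: p_i = [(alpha - 1) * z_i - tau]_+ ** (1 / (alpha - 1)),
    with tau chosen by bisection so that p sums to 1.
    alpha -> 1 approaches softmax; alpha = 2 is sparsemax."""
    z = np.asarray(z, dtype=float)
    if alpha == 1.0:
        e = np.exp(z - z.max())
        return e / e.sum()
    zs = (alpha - 1.0) * z
    # The threshold tau lies in [max(zs) - 1, max(zs)]: the total mass is at
    # least 1 at the lower bound, 0 at the upper bound, and monotonically
    # decreasing in tau, so plain bisection converges.
    lo, hi = zs.max() - 1.0, zs.max()
    for _ in range(n_iter):
        tau = 0.5 * (lo + hi)
        mass = (np.clip(zs - tau, 0.0, None) ** (1.0 / (alpha - 1.0))).sum()
        if mass >= 1.0:
            lo = tau
        else:
            hi = tau
    p = np.clip(zs - lo, 0.0, None) ** (1.0 / (alpha - 1.0))
    return p / p.sum()

scores = np.array([2.0, 1.0, -0.5, -1.5])
print(entmax(scores, alpha=1.5))   # approx. [0.83, 0.17, 0.0, 0.0] -- the tail gets exactly zero
print(entmax(scores, alpha=2.0))   # sparsemax is sparser still: [1.0, 0.0, 0.0, 0.0]
```

Because whole tokens receive zero probability, any continuation through them also scores zero, which is the "shrinking" of the search space that the title refers to.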

Cited by 11 publications (7 citation statements)
References 38 publications (43 reference statements)
“…Hyperparameters α and τ. In all experiments we set α = 1.5, because this value was recommended by Peters et al. (2019) and Peters and Martins (2021) as the middle ground between α = 1 (softmax) and α = 2 (sparsemax).…”
Section: Setup (mentioning)
confidence: 99%
“…We remind the reader that the cat got your tongue problem (Stahlberg and Byrne, 2019) is one of the main motivations for using sparse transformations when generating text. As Peters and Martins (2021) have shown, 1.5-entmax successfully tackles this problem by significantly lowering the proportion of cases where an empty string is more likely than the beam search hypothesis. For 1.5-ReLU, we also calculated this proportion, and compared it with the proportions for softmax and sparsemax (Table 2).…”
Section: Empty Translations (mentioning)
confidence: 99%
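The proportion described in the excerpt above can be computed with any scorer that supports forced decoding of a given target sequence. The sketch below is a hypothetical illustration of that count; the `seq_logprob` callable and the `eos_id` default are assumptions for the example, not part of the paper's code.

```python
from typing import Callable, List

def empty_beats_beam_rate(sources: List[str],
                          beam_hyps: List[List[int]],
                          seq_logprob: Callable[[str, List[int]], float],
                          eos_id: int = 2) -> float:
    """Fraction of sentences where the empty target (just EOS) scores at least
    as high as the beam-search hypothesis -- the 'cat got your tongue' cases."""
    hits = 0
    for src, hyp in zip(sources, beam_hyps):
        empty_score = seq_logprob(src, [eos_id])   # score of generating only EOS
        beam_score = seq_logprob(src, hyp)         # score of the beam hypothesis
        if empty_score >= beam_score:
            hits += 1
    return hits / max(len(sources), 1)
```

Under a sparse output distribution, `empty_score` is often exactly negative infinity (zero probability), which is why the reported proportions drop sharply compared with softmax.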
“…When α = 1, this recovers cross entropy. Entmax-based sparse sequence-to-sequence models have been shown to work well on machine translation (Peters and Martins, 2021) as well as morphological and phonological (Peters and Martins, 2020) tasks. Beyond the topline results, they have also been shown to be better calibrated than models trained with cross entropy loss (Peters and Martins, 2021).…”
Section: Model (mentioning)
confidence: 99%
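As a quick numeric illustration of the α = 1 limit mentioned in this excerpt, the illustrative `entmax` helper sketched earlier (not the paper's code) converges to the softmax distribution as α approaches 1, and the corresponding loss approaches cross-entropy; the check below assumes that helper is in scope.

```python
import numpy as np

z = np.array([1.3, 0.2, -0.4, -2.0])
softmax = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
# With alpha just above 1, the entmax output is numerically close to softmax.
print(np.allclose(entmax(z, alpha=1.001), softmax, atol=1e-2))  # True
```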
“…IWSLT'14 De→En (Cettolo et al.). Hyperparameters α and τ. In all experiments we set α = 1.5, because this value was recommended by Peters et al. (2019) and Peters and Martins (2021a) as the middle ground between α = 1 (softmax) and α = 2 (sparsemax).…”
Section: Setup (mentioning)
confidence: 99%