Proceedings of the 3rd Workshop on Neural Generation and Translation 2019
DOI: 10.18653/v1/d19-5616

Generalization in Generation: A closer look at Exposure Bias

Abstract: Exposure bias refers to the train-test discrepancy that seemingly arises when an autoregressive generative model uses only ground-truth contexts at training time but generated ones at test time. We separate the contributions of the model and the learning framework to clarify the debate on consequences and review proposed counter-measures. In this light, we argue that generalization is the underlying property to address and propose unconditional generation as its fundamental benchmark. Finally, we combine laten…
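
To make the train-test discrepancy concrete, the sketch below (not code from the paper; the model, hyperparameters, and BOS id are illustrative assumptions) contrasts teacher-forced training, where the model only ever conditions on ground-truth prefixes, with free-running decoding at test time, where it conditions on its own samples. The mismatch between these two conditioning distributions is what the abstract calls exposure bias.

# Minimal sketch (assumptions noted above): a toy autoregressive LM trained
# with teacher forcing and then decoded free-running.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    def __init__(self, vocab_size=100, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens, hidden=None):
        states, hidden = self.rnn(self.embed(tokens), hidden)
        return self.out(states), hidden          # logits over the vocabulary

model = TinyLM()
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training: teacher forcing. The contexts x_{<t} are always ground-truth
# tokens, never the model's own predictions.
x = torch.randint(0, 100, (4, 16))               # toy batch of token ids
logits, _ = model(x[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 100), x[:, 1:].reshape(-1))
optim.zero_grad()
loss.backward()
optim.step()

# Test time: free-running decoding. The contexts are now the model's own
# samples, i.e. they come from a different distribution than the one seen
# during training (the discrepancy described as exposure bias).
tok = torch.zeros(1, 1, dtype=torch.long)        # assumed BOS id = 0
hidden, generated = None, []
with torch.no_grad():
    for _ in range(16):
        logits, hidden = model(tok, hidden)
        tok = torch.multinomial(F.softmax(logits[:, -1], dim=-1), 1)
        generated.append(tok.item())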

Cited by 45 publications (27 citation statements) | References 26 publications

“…1. b) If the model only observes states resulting from correct past decisions at training time, it will not be prepared to recover from its own mistakes during prediction, suffering from exposure bias (Schmidt, 2019; Fried and Klein, 2018). In the experiment section, we demonstrate how this phenomenon will significantly hurt the language model performance and, to a lesser extent, also hurt the parsing performance.…”
Section: Syntactic Ordered Memory (mentioning)
confidence: 95%
“…The predictions for MP, ME and Qual spans in Stage 3 are heavily dependent on the Q spans from Stage 1, and there does not exist any mechanism to rectify errors in Stage 1 later, in our approach. There is also an exposure bias (Schmidt, 2019; Galloway et al., 2019) as the model is trained on the ground truth, while tested on the predicted Q spans. Moreover, we believe that having common weights between the BERT models of Stage 1 and Stage 3 will not only make our approach faster and lighter, but also more performant through multi-task learning.…”
Section: Future Work (mentioning)
confidence: 99%
“…It is important to mention that, all smoothing methods mentioned in this paper are only used during training, while during testing, the one-hot vectors are again used, following the idea that a model trained with smoothed representation of words should better generalize during testing. This can be further related to the topic of exposure bias (Schmidt, 2019), where the training-testing mismatch may lead to unseen or corrupted context during search. Although we do not conduct experiments to further validate the point, qualitatively we think input smoothing effectively lets the model see more unique contexts during training, mitigating the exposure bias problem.…”
Section: Input Smoothing + Output Smoothing (mentioning)
confidence: 99%
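
As an illustration of the smoothing scheme the statement above describes, here is a minimal sketch rather than the cited paper's code: the uniform mixing distribution, the smoothing weight eps, and the helper name smoothed_embed are assumptions. Inputs are embedded as a smoothed mixture over the vocabulary during training, while ordinary one-hot lookups are used at test time.

# Hedged sketch of input smoothing (assumptions noted above).
import torch
import torch.nn as nn

def smoothed_embed(tokens, embedding: nn.Embedding, eps=0.1, training=True):
    """Embed token ids; mix each one-hot with the uniform distribution when training."""
    vocab_size = embedding.num_embeddings
    if not training:                               # test time: plain one-hot lookup
        return embedding(tokens)
    one_hot = torch.nn.functional.one_hot(tokens, vocab_size).float()
    smoothed = (1.0 - eps) * one_hot + eps / vocab_size
    # (B, T, V) @ (V, dim) -> (B, T, dim): expected embedding under the smoothed distribution
    return smoothed @ embedding.weight

emb = nn.Embedding(100, 32)
x = torch.randint(0, 100, (4, 16))
train_repr = smoothed_embed(x, emb, training=True)    # model sees "softened" contexts
test_repr = smoothed_embed(x, emb, training=False)    # standard embeddings at test time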