Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1432

Distributionally Robust Language Modeling

Abstract: Language models are generally trained on data spanning a wide range of topics (e.g., news, reviews, fiction), but they might be applied to an a priori unknown target distribution (e.g., restaurant reviews). In this paper, we first show that training on text outside the test distribution can degrade test performance when using standard maximum likelihood (MLE) training. To remedy this without the knowledge of the test distribution, we propose an approach which trains a model that performs well over a wide range…
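
The DRO idea sketched in the abstract can be made concrete. Below is a minimal, hedged sketch of a group-level robust objective, assuming per-example topic labels: instead of the MLE average over all examples, it optimizes the loss of the worst topic group in each batch. This is a simplified worst-group variant, not the paper's exact topic-CVaR procedure, and every name in it (worst_topic_loss, topic_ids, num_topics) is illustrative.

    import torch
    import torch.nn.functional as F

    def worst_topic_loss(logits, targets, topic_ids, num_topics):
        # Per-example negative log-likelihood, kept unreduced so it can
        # be regrouped by topic.
        nll = F.cross_entropy(logits, targets, reduction="none")
        topic_losses = []
        for t in range(num_topics):
            mask = topic_ids == t
            if mask.any():  # skip topics absent from this batch
                topic_losses.append(nll[mask].mean())
        # MLE would return nll.mean(); the robust objective instead
        # trains against the hardest topic so no topic is neglected.
        return torch.stack(topic_losses).max()

In training, this loss simply replaces the usual cross-entropy average, pushing the model to keep its loss low on every topic rather than only on the dominant ones.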

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
1
1
1
1

Citation Types

1
94
0

Year Published

2020
2020
2024
2024

Publication Types

Select...
5
4
1

Relationship

0
10

Authors

Journals

Cited by 75 publications (102 citation statements). References 24 publications (25 reference statements).
“…Our data union can be seen as a simple approach toward this end, but we found that current models do not exploit useful information beyond each target dataset. More sophisticated approaches, such as distributionally robust optimization (Delage and Ye, 2010; Oren et al., 2019), may help. Another promising way is relying on strong pretrained language models, including BERT (Devlin et al., 2019).…”
Section: Discussion
confidence: 99%
“…While large pre-trained models have been shown to work well, many questions and challenges remain. Recent work has shown that these models degrade on out-of-domain data, that maximum likelihood training makes them over-confident (Oren et al., 2019), and that calibration is particularly important for out-of-domain generalization (Hendrycks et al., 2020). An acknowledged issue with fine-tuning is the brittleness of the process (Phang et al., 2018; Dodge et al., 2020).…”
Section: Pre-training
confidence: 99%
“…The question-answering dataset SQuAD 2.0 was created in response to the observation that existing systems could not reliably demur when presented with an unanswerable question (Rajpurkar et al., 2016, 2018). The perplexity of language models rises when given out-of-domain text (Oren et al., 2019).…”
Section: Robustness
confidence: 99%
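
The statement above turns on perplexity, which is the exponential of the mean per-token negative log-likelihood: a model that assigns out-of-domain tokens lower probability therefore shows higher perplexity. A minimal sketch of the computation, where token_nlls is a hypothetical list of per-token NLLs in nats:

    import math

    def perplexity(token_nlls):
        # Perplexity = exp(average negative log-likelihood per token).
        return math.exp(sum(token_nlls) / len(token_nlls))

    # Sanity check: assigning every token probability 1/4 gives perplexity 4.
    print(perplexity([math.log(4)] * 10))  # -> 4.0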