Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.242
Posterior Calibrated Training on Sentence Classification Tasks

Abstract: Most classification models work by first predicting a posterior probability distribution over all classes and then selecting the class with the largest estimated probability. In many settings, however, the quality of the posterior probability itself (e.g., a 65% chance of having diabetes) gives more reliable information than the final predicted class alone. When these methods are shown to be poorly calibrated, most fixes to date have relied on posterior calibration, which rescales the predicted probabilities but often…
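The post-hoc fixes referred to in the abstract rescale predicted probabilities after training. As a concrete illustration, below is a minimal sketch of one such method, temperature scaling (Guo et al., 2017), which fits a single temperature T on held-out data and replaces softmax(z) with softmax(z / T). The function and variable names are illustrative assumptions, not code from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Fit a single temperature T by minimizing held-out negative log-likelihood.

    Illustrative sketch only; not the paper's released code.
    logits: (N, C) pre-softmax scores, labels: (N,) integer gold labels.
    """
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
```

Since dividing the logits by a single positive temperature preserves the argmax, this kind of rescaling changes the confidence scores but not the predicted class.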

Cited by 7 publications (11 citation statements) | References 8 publications
“…Since NCE training requires more computation (because of the noise ratio), we have tried finetuning the baseline with more steps, but we find that gives worse ECE and very little or no improvement on accuracy. We compare EBM training with three strong baselines for calibration: posterior calibrated training (PosCal) (Jung et al., 2020), temperature scaling (T-Scal) (Guo et al., 2017), and scaling-binning calibrator (Scal-bin) (Kumar et al., 2019). For PosCal and Scal-bin, we use the published code.…”
Section: Methods
confidence: 99%
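For intuition about what calibration-aware training objectives such as PosCal aim for, here is a rough, hypothetical sketch of a loss that adds a binned confidence-accuracy gap penalty to cross-entropy. The function name, the penalty form, and the weighting are assumptions for illustration; they are not claimed to match the PosCal objective or its published code.

```python
import torch
import torch.nn.functional as F

def calibration_aware_loss(logits, labels, lam=1.0, n_bins=20):
    """Cross-entropy plus a binned (confidence - accuracy)^2 penalty.

    Hypothetical illustration of calibration-aware training in the spirit of
    PosCal (Jung et al., 2020); the actual PosCal objective may differ.
    """
    ce = F.cross_entropy(logits, labels)

    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    correct = (pred == labels).float()

    penalty = logits.new_zeros(())
    edges = torch.linspace(0.0, 1.0, n_bins + 1, device=logits.device)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = conf[in_bin].mean() - correct[in_bin].mean()
            penalty = penalty + in_bin.float().mean() * gap.pow(2)

    return ce + lam * penalty
```

Used in place of plain cross-entropy, a penalty of this kind pushes the model's confidence toward its empirical accuracy during training, rather than relying only on post-hoc rescaling.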
“…ECE first partitions all predictions into B equally-spaced bins by their confidence. Following Jung et al. (2020) and Grathwohl et al. (2019), we set B = 20, which means the width of each bin is 0.05. For example, the first bin contains all predictions that have confidence in the range of [0, 0.05).…”
Section: Definition Of ECE
confidence: 99%
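To make the binning in the quoted definition concrete, the snippet below computes ECE with B = 20 equal-width bins (width 0.05), with the first bin covering [0, 0.05). The function name and array layout are illustrative assumptions.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=20):
    """ECE over B equal-width confidence bins (bin width 1 / B = 0.05 for B = 20).

    probs:  (N, C) predicted class probabilities
    labels: (N,)   integer gold labels
    """
    conf = probs.max(axis=1)                  # confidence of the predicted class
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)

    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # bins are [lo, hi); the last bin also includes confidence exactly 1.0
        in_bin = (conf >= lo) & (conf < hi) if hi < 1.0 else (conf >= lo)
        if in_bin.any():
            ece += in_bin.mean() * abs(conf[in_bin].mean() - correct[in_bin].mean())
    return ece
```

Each bin contributes its share of predictions times the absolute gap between average confidence and accuracy within the bin, which is the standard weighted-average form of ECE.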
“…This is because the number of classes is exponentially large and estimation of every posterior density or marginal posterior density is not possible. Previous works such as (Jung et al., 2020; Nguyen and O'Connor, 2015) propose to use the downstream task with a small number of classes to perform calibration and estimation of the calibration error. In structured prediction models, calibration is also important for the generation of the structured outputs as the decoding algorithm relies on the posterior estimates to efficiently search through the space of sequences.…”
Section: Related Work
confidence: 99%
“…Predicted probabilities of direct models are usually calibrated to present refined confidence scores, because no models are perfect, and even within high-probability predicted labels, there could be errors. [23][24][25][26] An important distinction in machine learning algorithms, including deep learning algorithms, is whether they are supervised or unsupervised. Supervised learning algorithms predict target variables, which may be either continuous or categorical, such as prognosis, diagnosis, or toxicity.…”
Section: Recent Technical Advances In NLP
confidence: 99%