Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.242
Posterior Calibrated Training on Sentence Classification Tasks

Abstract: Most classification models work by first predicting a posterior probability distribution over all classes and then selecting the class with the largest estimated probability. In many settings, however, the quality of the posterior probability itself (e.g., a 65% chance of having diabetes) gives more reliable information than the final predicted class alone. When these methods are shown to be poorly calibrated, most fixes to date have relied on posterior calibration, which rescales the predicted probabilities but often…
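The post-hoc fixes referred to in the abstract rescale predicted probabilities after training. As a concrete illustration, below is a minimal sketch of one such method, temperature scaling (Guo et al., 2017), which fits a single temperature T on held-out data and replaces softmax(z) with softmax(z / T). The function and variable names are illustrative assumptions, not code from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def fit_temperature(logits, labels):
    """Fit a single temperature T by minimizing held-out negative log-likelihood.

    Illustrative sketch only; not the paper's released code.
    logits: (N, C) pre-softmax scores, labels: (N,) integer gold labels.
    """
    def nll(T):
        z = logits / T
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x
```

Since dividing the logits by a single positive temperature preserves the argmax, this kind of rescaling changes the confidence scores but not the predicted class.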

Cited by 7 publications (11 citation statements) | References 8 publications
“…Since NCE training requires more computation (because of the noise ratio), we have tried finetuning the baseline with more steps, but we find that gives worse ECE and very little or no improvement on accuracy. We compare EBM training with three strong baselines for calibration: posterior calibrated training (PosCal) (Jung et al., 2020), temperature scaling (T-Scal) (Guo et al., 2017), and scaling-binning calibrator (Scal-bin) (Kumar et al., 2019). For PosCal and Scal-bin, we use the published code.…”
Section: Methods
confidence: 99%
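For intuition about what calibration-aware training objectives such as PosCal aim for, here is a rough, hypothetical sketch of a loss that adds a binned confidence-accuracy gap penalty to cross-entropy. The function name, the penalty form, and the weighting are assumptions for illustration; they are not claimed to match the PosCal objective or its published code.

```python
import torch
import torch.nn.functional as F

def calibration_aware_loss(logits, labels, lam=1.0, n_bins=20):
    """Cross-entropy plus a binned (confidence - accuracy)^2 penalty.

    Hypothetical illustration of calibration-aware training in the spirit of
    PosCal (Jung et al., 2020); the actual PosCal objective may differ.
    """
    ce = F.cross_entropy(logits, labels)

    probs = F.softmax(logits, dim=1)
    conf, pred = probs.max(dim=1)
    correct = (pred == labels).float()

    penalty = logits.new_zeros(())
    edges = torch.linspace(0.0, 1.0, n_bins + 1, device=logits.device)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = conf[in_bin].mean() - correct[in_bin].mean()
            penalty = penalty + in_bin.float().mean() * gap.pow(2)

    return ce + lam * penalty
```

Used in place of plain cross-entropy, a penalty of this kind pushes the model's confidence toward its empirical accuracy during training, rather than relying only on post-hoc rescaling.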
“…ECE first partitions all predictions into B equally-spaced bins by their confidence. Following Jung et al. (2020) and Grathwohl et al. (2019), we set B = 20, which means the width of each bin is 0.05. For example, the first bin contains all predictions that have confidence in the range of [0, 0.05).…”
Section: Definition Of ECE
confidence: 99%
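To make the binning in the quoted definition concrete, the snippet below computes ECE with B = 20 equal-width bins (width 0.05), with the first bin covering [0, 0.05). The function name and array layout are illustrative assumptions.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=20):
    """ECE over B equal-width confidence bins (bin width 1 / B = 0.05 for B = 20).

    probs:  (N, C) predicted class probabilities
    labels: (N,)   integer gold labels
    """
    conf = probs.max(axis=1)                  # confidence of the predicted class
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)

    ece = 0.0
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        # bins are [lo, hi); the last bin also includes confidence exactly 1.0
        in_bin = (conf >= lo) & (conf < hi) if hi < 1.0 else (conf >= lo)
        if in_bin.any():
            ece += in_bin.mean() * abs(conf[in_bin].mean() - correct[in_bin].mean())
    return ece
```

Each bin contributes its share of predictions times the absolute gap between average confidence and accuracy within the bin, which is the standard weighted-average form of ECE.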
“…This is because the number of classes is exponentially large and estimation of every posterior density or marginal posterior density is not possible. Previous works such as (Jung et al., 2020; Nguyen and O'Connor, 2015) propose to use the downstream task with a small number of classes to perform calibration and estimation of the calibration error. In structured prediction models, calibration is also important for the generation of the structured outputs as the decoding algorithm relies on the posterior estimates to efficiently search through the space of sequences.…”
Section: Related Work
confidence: 99%
“…Predicted probabilities of direct models are usually calibrated to present refined confidence scores, because no models are perfect, and even within high-probability predicted labels, there could be errors. [23][24][25][26] An important distinction in machine learning algorithms, including deep learning algorithms, is whether they are supervised or unsupervised. Supervised learning algorithms predict target variables, which may be either continuous or categorical, such as prognosis, diagnosis, or toxicity.…”
Section: Recent Technical Advances In NLP
confidence: 99%