Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.151
Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models

Abstract: In this work, we explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders (e.g., RoBERTa) for natural language understanding (NLU) tasks. Our experiments show that EBM training can help the model reach a better calibration that is competitive with strong baselines, with little or no loss in accuracy. We discuss three variants of energy functions (namely scalar, hidden, and sharp-hidden) that can be defined on top of a text encoder, and compare them in experiments. Due to …
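As a rough illustration of how such energy functions can sit on top of a text encoder, the sketch below defines three candidate energy heads over a pooled encoder representation. The module name, head shapes, and the exact forms of the hidden and sharp-hidden energies (LogSumExp and max over the label logits, respectively) are illustrative assumptions; the paper's precise definitions may differ.

```python
# Hedged sketch: three possible energy heads on top of a text encoder.
# Assumes a RoBERTa-style encoder whose pooled output feeds a classifier;
# exact definitions in the paper may differ in detail.
import torch
import torch.nn as nn

class EnergyHeads(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)  # standard NLU classification head
        self.scalar_head = nn.Linear(hidden_size, 1)          # extra head for the "scalar" variant

    def forward(self, pooled: torch.Tensor):
        logits = self.classifier(pooled)                       # [batch, num_labels]
        e_scalar = self.scalar_head(pooled).squeeze(-1)        # energy from a dedicated scalar output
        e_hidden = -torch.logsumexp(logits, dim=-1)            # energy from the label logits (JEM-style)
        e_sharp = -logits.max(dim=-1).values                   # "sharp" variant: only the top logit
        return logits, {"scalar": e_scalar, "hidden": e_hidden, "sharp-hidden": e_sharp}
```

During finetuning, a joint objective would combine the usual cross-entropy loss on `logits` with an EBM loss on one of these energies; the training procedure itself is described in the paper.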

Cited by 5 publications (15 citation statements)
References 13 publications

“…Implicit Models for Dynamics Learning. Implicit models (Teh et al. 2003; Welling and Hinton 2002) have been widely used in many areas of machine learning, including image generation (Du and Mordatch 2019), natural language processing (Bakhtin et al. 2021; He et al. 2021), and density estimation (Saremi et al. 2018; Song et al. 2019). This is largely due to their ability to generalize probabilistic and deterministic approaches to classification, regression, and estimation (LeCun et al. 2006; Song and Kingma 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent studies [20,57,58] have investigated training techniques that enable the application of EBMs to high-dimensional data and address issues of training stability. EBMs have been applied to various areas including image generation [4,18,20,29,76,77], graph generation [63], image classification [24], regression [27,28], continual learning [50] and natural language processing [17,32,71].…”
Section: Related Work (mentioning)
confidence: 99%
“…Calibration of uncertainty. We also propose to look at the models' calibration since this was previously shown to be improved by energy-based training [24,32]. A model is considered well-calibrated if its confidence (e.g.…”
Section: Supervised Energy-based Training (mentioning)
confidence: 99%
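Since this statement turns on what "well-calibrated" means, a small sketch of expected calibration error (ECE), the metric typically used in this literature, may help. The equal-width binning, bin count, and function name below are illustrative assumptions, not taken from the cited works.

```python
# Hedged sketch of expected calibration error (ECE): the gap between a model's
# confidence and its accuracy, averaged over equal-width confidence bins.
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of examples
    return ece
```

Given a batch of softmax confidences, argmax predictions, and gold labels, this returns a single scalar; lower values indicate better calibration.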
“…For example, Gal and Ghahramani (2016) propose to adopt multiple predictions with different dropout masks and then combine them to get the confidence estimate. Recently, several works have focused on the calibration of PLMs for NLP tasks (Hendrycks et al., 2019; Desai and Durrett, 2020; Jung et al., 2020; He et al., 2021; Park and Caragea, 2022; Bose et al., 2022). Dan and Roth (2021) investigate the calibration properties of different transformer architectures and sizes of BERT.…”
Section: Related Work (mentioning)
confidence: 99%