Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume 2021
DOI: 10.18653/v1/2021.eacl-main.151
Joint Energy-based Model Training for Better Calibrated Natural Language Understanding Models

Abstract: In this work, we explore joint energy-based model (EBM) training during the finetuning of pretrained text encoders (e.g., RoBERTa) for natural language understanding (NLU) tasks. Our experiments show that EBM training can help the model reach a better calibration that is competitive with strong baselines, with little or no loss in accuracy. We discuss three variants of energy functions (namely scalar, hidden, and sharp-hidden) that can be defined on top of a text encoder, and compare them in experiments. Due to …
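As a rough illustration of how such energy functions can sit on top of a text encoder, the sketch below defines three candidate energy heads over a pooled encoder representation. The module name, head shapes, and the exact forms of the hidden and sharp-hidden energies (LogSumExp and max over the label logits, respectively) are illustrative assumptions; the paper's precise definitions may differ.

```python
# Hedged sketch: three possible energy heads on top of a text encoder.
# Assumes a RoBERTa-style encoder whose pooled output feeds a classifier;
# exact definitions in the paper may differ in detail.
import torch
import torch.nn as nn

class EnergyHeads(nn.Module):
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)  # standard NLU classification head
        self.scalar_head = nn.Linear(hidden_size, 1)          # extra head for the "scalar" variant

    def forward(self, pooled: torch.Tensor):
        logits = self.classifier(pooled)                       # [batch, num_labels]
        e_scalar = self.scalar_head(pooled).squeeze(-1)        # energy from a dedicated scalar output
        e_hidden = -torch.logsumexp(logits, dim=-1)            # energy from the label logits (JEM-style)
        e_sharp = -logits.max(dim=-1).values                   # "sharp" variant: only the top logit
        return logits, {"scalar": e_scalar, "hidden": e_hidden, "sharp-hidden": e_sharp}
```

During finetuning, a joint objective would combine the usual cross-entropy loss on `logits` with an EBM loss on one of these energies; the training procedure itself is described in the paper.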

Cited by 5 publications (15 citation statements)
References 13 publications

“…Implicit Models for Dynamics Learning. Implicit models (Teh et al. 2003; Welling and Hinton 2002) have been widely used in many areas of machine learning, including image generation (Du and Mordatch 2019), natural language processing (Bakhtin et al. 2021; He et al. 2021), and density estimation (Saremi et al. 2018; Song et al. 2019). This is largely due to their ability to generalize probabilistic and deterministic approaches to classification, regression, and estimation (LeCun et al. 2006; Song and Kingma 2021).…”
Section: Related Work (mentioning)
confidence: 99%
“…Recent studies [20,57,58] have investigated training techniques that enable the application of EBMs to high-dimensional data and address issues of training stability. EBMs have been applied to various areas including image generation [4,18,20,29,76,77], graph generation [63], image classification [24], regression [27,28], continual learning [50] and natural language processing [17,32,71].…”
Section: Related Work (mentioning)
confidence: 99%
“…Calibration of uncertainty. We also propose to look at the models' calibration since this was previously shown to be improved by energy-based training [24,32]. A model is considered well-calibrated if its confidence (e.g.…”
Section: Supervised Energy-based Training (mentioning)
confidence: 99%
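Since this statement turns on what "well-calibrated" means, a small sketch of expected calibration error (ECE), the metric typically used in this literature, may help. The equal-width binning, bin count, and function name below are illustrative assumptions, not taken from the cited works.

```python
# Hedged sketch of expected calibration error (ECE): the gap between a model's
# confidence and its accuracy, averaged over equal-width confidence bins.
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight each bin by its share of examples
    return ece
```

Given a batch of softmax confidences, argmax predictions, and gold labels, this returns a single scalar; lower values indicate better calibration.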
“…For example, Gal and Ghahramani (2016) propose to adopt multiple predictions with different dropout masks and then combine them to get the confidence estimate. Recently, several works have focused on the calibration of PLMs for NLP tasks (Hendrycks et al., 2019; Desai and Durrett, 2020; Jung et al., 2020; He et al., 2021; Park and Caragea, 2022; Bose et al., 2022). Dan and Roth (2021) investigate the calibration properties of different transformer architectures and sizes of BERT.…”
Section: Related Work (mentioning)
confidence: 99%