2022
DOI: 10.48550/arxiv.2203.07472
Preprint

Uncertainty Estimation for Language Reward Models

Abstract: Language models can learn a range of capabilities from unsupervised training on text corpora. However, to solve a particular problem (such as text summarization) it is typically necessary to fine-tune them on a task-specific dataset. It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons. However, collecting a large preference comparison dataset is still expensive…

Cited by 2 publications (5 citation statements)
References 26 publications
“…Still, it can be safely stated already that the two Softmax Ensemble techniques (KLD and VE) perform far worse than any other technique. This confirms the work of Gleave and Irving (2022), where the usefulness of softmax ensemble methods for Transformer models was investigated, with the same conclusion as ours: ensemble methods perform far worse than even random sampling for Transformer models.…”
Section: Test Accuracies (supporting)
confidence: 91%
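The two softmax-ensemble scores named in the quoted statement (KLD and VE) both measure disagreement between ensemble members' softmax outputs. A minimal NumPy sketch is given below; the variance-based and KL-based formulas used here are illustrative assumptions chosen to match the names, not the exact definitions from the cited papers.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_softmax_uncertainty(member_logits):
    """Per-sample uncertainty from an ensemble of softmax classifiers.

    member_logits: array of shape (n_members, n_samples, n_classes).
    Returns a VE-style score (mean variance of class probabilities across
    members) and a KLD-style score (mean KL divergence of each member from
    the ensemble mean). Both are zero when all members agree exactly.
    """
    probs = softmax(member_logits)                  # (M, N, C)
    mean_probs = probs.mean(axis=0)                 # (N, C)
    # VE-style: variance across members, averaged over classes.
    ve = probs.var(axis=0).mean(axis=-1)            # (N,)
    # KLD-style: KL(member || ensemble mean), averaged over members.
    kld = (probs * (np.log(probs + 1e-12)
                    - np.log(mean_probs + 1e-12))).sum(axis=-1).mean(axis=0)
    return ve, kld
```

Higher scores flag samples where ensemble members disagree; the quoted finding is that, for Transformer models, ranking samples by such scores underperforms even random sampling.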
“…The aforementioned uncertainty measures can directly be used in AL strategies to sort the pool of unlabeled samples, selecting exactly those samples for labeling that have the lowest confidence/highest uncertainty. As repeatedly reported by others (Karamcheti et al., 2021; Gleave and Irving, 2022), … We propose three easily implementable methods for improving uncertainty-based AL strategies by preventing potentially harmful outliers from being selected for labeling. An uncertainty-based AL strategy always selects those samples for labeling first where the uncertainty is the highest.…”
Section: Uncertainty-Clipping (UC) (mentioning)
confidence: 86%
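The selection rule described in the quoted statement — label the highest-uncertainty samples first, but skip the extreme tail where harmful outliers concentrate — can be sketched as follows. The `clip_fraction` parameter and the skip-the-top clipping scheme are illustrative assumptions in the spirit of the quoted uncertainty-clipping idea, not the cited method's exact procedure.

```python
import numpy as np

def select_for_labeling(uncertainties, batch_size, clip_fraction=0.0):
    """Pick indices of unlabeled samples for annotation.

    uncertainties: 1-D array, one uncertainty score per unlabeled sample.
    batch_size: number of samples to select.
    clip_fraction: fraction of the most-uncertain samples to skip before
        selecting, to avoid outliers at the extreme tail (0.0 disables
        clipping and recovers plain uncertainty-based selection).
    """
    order = np.argsort(uncertainties)[::-1]       # most uncertain first
    n_clip = int(len(order) * clip_fraction)      # outliers to skip
    return order[n_clip:n_clip + batch_size]
```

With `clip_fraction=0.0` this is the standard uncertainty-based strategy the statement describes; a small positive value drops the very top of the ranking before filling the labeling batch.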
“…While scholars have studied model uncertainty, prior work has focused on more accurately extracting model confidence (Kuhn et al., 2023; Sun et al., 2022; Gleave and Irving, 2022), measuring model calibration (Kwiatkowski et al., 2019b; Radford et al., 2019; Liang et al., 2023), and improving it. … (2022) teach models to be linguistically calibrated when answering math questions.…”
Section: Related Work (mentioning)
confidence: 99%