Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
DOI: 10.18653/v1/2021.emnlp-main.583
Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering

Abstract: In this paper we propose a novel approach towards improving the efficiency of Question Answering (QA) systems by filtering out questions that will not be answered by them. This is based on an interesting new finding: the answer confidence scores of state-of-the-art QA systems can be approximated well by models solely using the input question text. This enables preemptive filtering of questions that are not answered by the system due to their answer confidence scores being lower than the system threshold. Speci…
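The filtering idea in the abstract — train a small model on the question text alone to approximate the answer confidence the full QA system would produce, then skip questions below the system threshold — can be sketched as follows. This is a minimal illustration only: `score_question` is a hypothetical toy heuristic standing in for the paper's distilled question-only model, and the threshold value is made up.

```python
# Minimal sketch of confidence-based question filtering.
# `score_question` stands in for a small distilled model that predicts,
# from the question text alone, the answer confidence the full QA
# system would assign. Here it is a hypothetical toy heuristic.

THRESHOLD = 0.5  # system answer-confidence threshold (illustrative)

def score_question(question: str) -> float:
    """Toy stand-in for a distilled question-only confidence model."""
    score = 0.3
    # In this toy rule, wh-questions and longer questions score higher.
    if question.lower().startswith(("what", "who", "when", "where")):
        score += 0.4
    if len(question.split()) >= 5:
        score += 0.2
    return min(score, 1.0)

def filter_questions(questions):
    """Keep only questions predicted to clear the confidence threshold."""
    return [q for q in questions if score_question(q) >= THRESHOLD]

kept = filter_questions([
    "What year did the EMNLP conference start?",
    "Hmm?",
])
print(kept)  # only the first question survives filtering
```

Questions the scorer rejects never reach the (expensive) answer model, which is where the efficiency gain comes from.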


Cited by 13 publications (11 citation statements)
References 55 publications (52 reference statements)
“…The softmax probability assigned to class '1' by this calibrator is used as the confidence estimator for selective prediction. We refer to this approach as Calib C. We also train a transformer-based model for calibration (Calib T) that leverages the entire input text for this classification task instead of the syntactic features (Garg and Moschitti, 2021).…”
Section: Calibration
confidence: 99%
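The mechanism described in the statement above — taking the softmax probability of class '1' from a binary calibrator as the selective-prediction confidence — can be sketched as follows. The logits here are made-up toy values; the actual Calib C / Calib T models produce them from syntactic features or a transformer encoder.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_from_calibrator(logits):
    """Confidence = softmax probability of class '1' (index 1)."""
    return softmax(logits)[1]

# Toy two-class calibrator output: [logit_class0, logit_class1]
conf = confidence_from_calibrator([0.0, 2.0])
print(round(conf, 3))  # probability assigned to class '1' -> 0.881
```

At prediction time, the system answers only when this confidence clears a chosen threshold and abstains otherwise.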
“…These confidence scores can be used in a "selective QA" setting (Kamath et al., 2020), where the model can abstain on a certain fraction of questions where it assigns low confidence to its answers. We use the area under the coverage-accuracy curve (AUC) to evaluate how well a model is calibrated, as in past literature (Kamath et al., 2020; Garg and Moschitti, 2021; Ye and Durrett, 2022). The curve plots the average accuracy with varying fractions (coverage) of questions being answered (examples in Figure 5).…”
Section: Calibrating AdvHotpot
confidence: 99%
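The coverage-accuracy AUC referenced above can be computed by ranking examples by confidence and averaging the accuracy observed at each coverage level. A minimal sketch (function and variable names are illustrative, not from the cited papers):

```python
def coverage_accuracy_auc(confidences, correct):
    """Area under the coverage-accuracy curve.

    Sort examples by confidence (descending); at coverage k/n,
    accuracy is the fraction correct among the k most-confident
    answers. The AUC averages accuracy over all coverage levels.
    """
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    accuracies = []
    n_correct = 0
    for k, i in enumerate(order, start=1):
        n_correct += correct[i]
        accuracies.append(n_correct / k)
    return sum(accuracies) / len(accuracies)

# A well-calibrated model ranks its correct answers first:
auc = coverage_accuracy_auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 1])
print(round(auc, 3))  # -> 0.854
```

A higher AUC means the model's confidence ordering concentrates its errors at low coverage, which is exactly what selective QA rewards.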
“…The performance of AS2 systems in practical applications is typically measured (Garg and Moschitti, 2021) using the Accuracy in providing correct answers for the questions (the percentage of correct responses provided by the system), also called the Precision-at-1 (P@1). In addition to P@1, we use Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR) to evaluate the ranking the model produces over the set of candidates.…”
Section: B3 Metrics
confidence: 99%
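For reference, the three ranking metrics named above can be computed as follows for a single question's ranked candidate list. This is a minimal sketch with made-up labels; real AS2 evaluation averages AP and RR over all questions to obtain MAP and MRR.

```python
def precision_at_1(labels):
    """P@1: is the top-ranked candidate a correct answer?"""
    return float(labels[0])

def average_precision(labels):
    """AP for one ranked list; MAP is the mean of AP over questions."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank  # precision at each correct hit
    return total / hits if hits else 0.0

def reciprocal_rank(labels):
    """RR for one ranked list; MRR is the mean of RR over questions."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0

# Ranked candidates for one question: 1 = correct answer, 0 = incorrect
labels = [0, 1, 0, 1]
print(precision_at_1(labels))                 # -> 0.0
print(round(average_precision(labels), 3))    # -> 0.5
print(reciprocal_rank(labels))                # -> 0.5
```

P@1 only credits a correct answer in the top position, while MAP and MRR give partial credit for correct answers ranked lower.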