2021
DOI: 10.48550/arxiv.2105.04505
Preprint

Towards Benchmarking the Utility of Explanations for Model Debugging

Abstract: Post-hoc explanation methods are an important class of approaches that help understand the rationale underlying a trained model's decisions. But how useful are they to an end-user accomplishing a given task? In this vision paper, we argue the need for a benchmark to facilitate evaluations of the utility of post-hoc explanation methods. As a first step to this end, we enumerate desirable properties that such a benchmark should possess for the task of debugging text classifiers. Additionally, we highlight…

Cited by 3 publications (3 citation statements) · References 3 publications (3 reference statements)
“…The second piece of related work concerns debugging ML models used in text classification [21], search [27,29,31], and many language tasks in general [6,9] by using explanations or interpretable machine learning approaches, called explanation-based human debugging (EBHD). Lertvittayakumjorn and Toni [12] recently review EBHD approaches that exploit explanations to enable humans to give feedback and debug NLP models.…”
Section: Related Work
confidence: 99%
“…The prediction model can still perform well even if the attention weights don't correlate with the (sub-)token weights as desired by humans. Finally, there has been recent work on devising decoy datasets to measure the utility of explanation methods for NLP models [25]. Our approach to rationale-based explanations differs in the type of architectures, objectives, and the general nature of its utility.…”
Section: Select-and-Predict Models
confidence: 99%
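The decoy-dataset idea referenced above can be illustrated with a minimal sketch: inject a spurious token into one class of a text dataset so that a model can learn the shortcut; a useful explanation method should then surface the decoy token. The names (`inject_decoy`, `DECOY`) and the crude count-based "explanation" below are illustrative assumptions, not taken from the cited paper.

```python
# Sketch of a decoy dataset for probing explanation utility.
# A decoy token is appended to every positive example; any faithful
# explanation method should rank it as the top positive feature.
from collections import Counter

DECOY = "zqx"  # hypothetical decoy token, not a real word

def inject_decoy(texts, labels, target_label):
    """Append the decoy token to every example of target_label."""
    return [t + " " + DECOY if y == target_label else t
            for t, y in zip(texts, labels)]

texts = ["great movie", "awful plot", "amazing cast", "boring scenes"]
labels = [1, 0, 1, 0]
decoy_texts = inject_decoy(texts, labels, target_label=1)

# A crude stand-in for an explanation method: per-token class-association
# score (count in positive examples minus count in negative examples).
pos = Counter(w for t, y in zip(decoy_texts, labels) if y == 1 for w in t.split())
neg = Counter(w for t, y in zip(decoy_texts, labels) if y == 0 for w in t.split())
scores = {w: pos[w] - neg[w] for w in set(pos) | set(neg)}
top = max(scores, key=scores.get)
print(top)  # the injected decoy token scores highest
```

A real benchmark would train a classifier on the decoyed data and check whether a post-hoc explanation method (rather than this raw count score) attributes the prediction to the decoy token.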
“…If the model is incorrect in its assessment, it can be provided with a corrective input such as "The keywords 'great' and 'amazing' are important cues in predicting the sentiment of this sentence" where the keywords themselves are automatically identified by a post hoc explanation method. While post hoc explanations have generally been considered valuable tools for deepening our understanding of model behavior [11] and for identifying root causes of errors made by ML models [12,13], our work is the first to explore their utility in improving the performance of LLMs.…”
Section: Introduction
confidence: 99%