2021
DOI: 10.1162/tacl_a_00440

Explanation-Based Human Debugging of NLP Models: A Survey

Abstract: Debugging a machine learning model is hard since the bug usually involves the training data and the learning process. This becomes even harder for an opaque deep learning model if we have no clue about how the model actually works. In this survey, we review papers that exploit explanations to enable humans to give feedback and debug NLP models. We call this problem explanation-based human debugging (EBHD). In particular, we categorize and discuss existing work along three dimensions of EBHD (the bug context, t…

Cited by 25 publications (16 citation statements) · References 84 publications
“…Another context where AXPLR could be useful is explanation-based human debugging of the model [49]. The individual model weight w_i for the pattern feature p_i may not make sense to humans when p_i is in fact related to other pattern features (as we can see in Experiment 2, where τ(α_i) does not quite correlate with human reasoning).…”
Section: General Considerations on AXPLR (mentioning)
Confidence: 99%
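To make the concern about related pattern features concrete, here is a minimal sketch (hypothetical features and data, not taken from the cited paper) of how a linear classifier can split the signal of two overlapping pattern features in a way the data cannot pin down, so that an individual weight w_i read in isolation need not match human reasoning:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# p1: "excellent" occurs; p2: "excellent service" occurs. p2 can only fire
# when p1 does, so these two hypothetical pattern features are highly correlated.
p1 = rng.integers(0, 2, size=n)
p2 = p1 * rng.integers(0, 2, size=n)
y = (p1 & (rng.random(n) > 0.1)).astype(int)   # label driven by the shared signal

X = np.column_stack([p1, p2])
clf = LogisticRegression().fit(X, y)

# The shared signal is divided between the two weights; inspecting either
# weight alone can therefore be misleading to a human reader.
print(dict(zip(["w_excellent", "w_excellent_service"], clf.coef_[0].round(2))))
```

The exact split between the two weights depends on the sample and the regularization strength, which is precisely why per-feature weights can fail as standalone explanations when features overlap.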
“…Besides, Lertvittayakumjorn and Toni (2021) recognize that there are explanations which lie between the local and the global scopes. These amount to explanations for groups of examples, such as a group of false positives of a certain class or a cluster of examples sharing some fixed features.…”
Section: Scopes of Explanations (mentioning)
Confidence: 99%
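As a hypothetical illustration of such a group-scope explanation (feature names, weights, and data invented here, not from the cited works), one can average per-example feature contributions over just the false positives of a class:

```python
import numpy as np

# Hypothetical bag-of-words features and the weights of an already trained
# linear classifier.
feature_names = np.array(["cheap", "refund", "excellent", "delivery", "broken"])
weights = np.array([-0.8, -1.2, 1.5, 0.1, -1.4])

# Feature counts for four examples, their gold labels, and the predictions.
X    = np.array([[1, 0, 1, 2, 0],
                 [0, 1, 1, 0, 0],
                 [0, 0, 2, 1, 0],
                 [1, 1, 0, 0, 1]])
gold = np.array([0, 0, 1, 0])
pred = np.array([1, 1, 1, 0])

# Local explanations: per-example contribution of each feature (weight * count).
local = X * weights

# Group-scope explanation: average contributions over the false positives only,
# sitting between a single-example (local) and a whole-model (global) explanation.
fp = (pred == 1) & (gold == 0)
group_explanation = local[fp].mean(axis=0)
print(dict(zip(feature_names, group_explanation.round(2))))
```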
“…For instance, one may develop an annotation tool which shows explanations for MT metrics as supporting information and measure human annotators' efficiency compared to the case where they use the system with no explanation. Also, developing a new framework for incorporating human feedback on different types of explanations to improve the metric is another way to evaluate the explanations with respect to a downstream task (i.e., metric improvement) (Lertvittayakumjorn and Toni 2021). Lastly, it is also possible to measure user trust in the metrics with and without the explanations, so as to assess whether explanations can boost user trust and promote adoption of complex model-based metrics (Hoffman et al. 2018).…”
Section: Extrinsic Evaluation of Explanations for MT Metrics (mentioning)
Confidence: 99%
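A minimal sketch of the kind of feedback loop alluded to here, assuming scikit-learn and using hypothetical data with a simple bag-of-words classifier rather than an MT metric: show a global explanation, let a human flag a spurious feature, and retrain without it. This only illustrates the general shape of explanation-based feedback, not any specific method from the cited survey.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny hypothetical corpus; imagine "friday" is a spurious artefact of the data.
texts  = ["great movie friday", "loved it friday", "boring movie", "awful plot",
          "great acting friday", "terrible and dull"]
labels = [1, 1, 0, 0, 1, 0]

vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# Step 1: global explanation = features with the largest absolute weights.
vocab = np.array(vec.get_feature_names_out())
top = np.argsort(-np.abs(clf.coef_[0]))[:5]
print(list(zip(vocab[top], clf.coef_[0][top].round(2))))

# Step 2: human feedback marks "friday" as irrelevant to the label.
banned = {"friday"}

# Step 3: incorporate the feedback by dropping the flagged feature and retraining.
keep = [i for i, w in enumerate(vocab) if w not in banned]
clf2 = LogisticRegression().fit(X[:, keep], labels)
```

The same loop could in principle be run with any explanation type and any model family; what changes is how the human feedback is translated back into a constraint on the model.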
“…Different from computer vision, the basic input units for neural models in NLP are discrete language tokens rather than continuous pixels in images [20,38,15]. This discrete nature of language poses a challenge for interpreting neural NLP models, making interpretation methods from CV hard to apply directly to the NLP domain [22,114]. To accommodate the discrete nature of text, a great variety of works on neural model interpretability has rapidly emerged over the past few years.…”
Section: Introduction (mentioning)
Confidence: 99%
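One common workaround for this discreteness, sketched below with a hypothetical toy model in PyTorch, is to compute attributions on the continuous token embeddings (e.g., gradient × input) rather than on the tokens themselves:

```python
import torch
import torch.nn as nn

# Toy bag-of-embeddings classifier; sizes and token ids are hypothetical.
vocab_size, emb_dim, n_classes = 100, 16, 2
emb = nn.Embedding(vocab_size, emb_dim)
clf = nn.Linear(emb_dim, n_classes)

token_ids = torch.tensor([[5, 17, 42, 8]])     # one toy "sentence"
embedded = emb(token_ids)                      # (1, seq_len, emb_dim)
embedded.retain_grad()                         # keep gradients on this non-leaf tensor

logits = clf(embedded.mean(dim=1))             # average embeddings, then classify
logits[0, 1].backward()                        # gradient of the target class score

# Saliency per token: dot product of gradient and embedding (gradient x input).
saliency = (embedded.grad * embedded).sum(dim=-1)
print(saliency)
```

Because gradients cannot be taken with respect to discrete token ids, the embedding layer is where CV-style attribution techniques are typically grafted onto NLP models.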