Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.24
FIND: Human-in-the-Loop Debugging Deep Text Classifiers

Abstract: Since obtaining a perfect training dataset (i.e., a dataset which is considerably large, unbiased, and well-representative of unseen cases) is hardly possible, many real-world text classifiers are trained on the available, yet imperfect, datasets. These classifiers are thus likely to have undesirable properties. For instance, they may have biases against some sub-populations or may not work effectively in the wild due to overfitting. In this paper, we propose FIND, a framework which enables humans to debug deep …

Cited by 27 publications (44 citation statements)
References 45 publications
“…Improvements can be applied by retraining and either removing input features (Ribeiro et al., 2016) or integrating explanation annotations into the objective function via explanation regularization (Ross et al., 2017; Liu and Avci, 2019; Rieger et al., 2020). Alternatively, features can also be disabled on the representation level (Lertvittayakumjorn et al., 2020).…”
Section: Setup III: Identify and Improve (mentioning)
confidence: 99%
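To make the explanation-regularization idea in the statement above concrete, here is a minimal PyTorch sketch in the spirit of Ross et al. (2017): the usual cross-entropy loss is augmented with a penalty on input gradients over features a human has annotated as irrelevant. The function name, the `irrelevant_mask` tensor, and the weight `lam` are illustrative assumptions, not the exact objective of any cited paper.

```python
import torch
import torch.nn.functional as F

def explanation_regularized_loss(model, x, y, irrelevant_mask, lam=1.0):
    # x: dense inputs (e.g., word embeddings), shape (batch, seq_len, dim)
    # irrelevant_mask: 1.0 where a human marked a feature irrelevant, else 0.0
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)

    # Input-gradient explanation: how the log-probabilities change with x
    log_probs = F.log_softmax(logits, dim=-1)
    grads = torch.autograd.grad(log_probs.sum(), x, create_graph=True)[0]

    # Penalize attribution mass falling on human-flagged irrelevant features,
    # nudging retraining toward being "right for the right reasons"
    penalty = (irrelevant_mask * grads).pow(2).sum()
    return ce + lam * penalty
```

Minimizing this loss trades off prediction accuracy against agreement with the human annotations, which is the retraining-based route the citing papers contrast with representation-level feature disabling.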
“…Further, example-based methods, such as influence functions (Koh and Liang, 2017), identify training data points which are the most important for particular predictions. Existing works have proposed ways to improve models by incorporating human feedback, in response to the explanations, by: adding model constraints by fixing certain parameters (Stumpf et al., 2009; Lertvittayakumjorn et al., 2020), adding training samples (Teso and Kersting, 2019), and adjusting models' weights directly (Kulesza et al., 2015).…”
Section: Introduction (mentioning)
confidence: 99%
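For reference, the influence of upweighting a training point z on the loss at a test point z_test, as defined by Koh and Liang (2017), is (a standard statement of their result, not a quote from the citing papers):

```latex
\mathcal{I}_{\text{up,loss}}(z, z_{\text{test}})
  = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}
     H_{\hat{\theta}}^{-1}\,
     \nabla_{\theta} L(z, \hat{\theta}),
\qquad
H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_{\theta}^{2} L(z_i, \hat{\theta})
```

Training points with the largest-magnitude influence are surfaced as the ones most responsible for the model's behavior at z_test, which is how such example-based methods point humans at problematic data.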
“…Recently, explanatory debugging has been applied to more complex models using refined interpretability methods. In FIND (Lertvittayakumjorn et al., 2020), a masking matrix is added at the end of a CNN text classifier so as to disable particular CNN filters based on human feedback in response to LRP-based explanations (Arras et al., 2016). In CAIPI (Teso and Kersting, 2019), the user investigates and corrects a LIME-based explanation (Ribeiro et al., 2016) for each prediction.…”
Section: Introduction (mentioning)
confidence: 99%
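The masking mechanism described in the statement above can be sketched as a small PyTorch module. This is a minimal illustration of the disable-by-masking idea, assuming max-pooled CNN filter activations feeding a linear classifier head; the names `FilterMask` and `disable` are hypothetical, not the paper's actual code.

```python
import torch
import torch.nn as nn

class FilterMask(nn.Module):
    """Element-wise mask over pooled CNN filter activations.

    Sketch of the masking idea in FIND (Lertvittayakumjorn et al., 2020):
    filters a human judges irrelevant are zeroed out, removing their
    features from every prediction without retraining the convolutions.
    """
    def __init__(self, num_filters):
        super().__init__()
        # 1.0 = keep filter, 0.0 = disabled by human feedback
        self.register_buffer("mask", torch.ones(num_filters))

    def disable(self, filter_ids):
        self.mask[filter_ids] = 0.0

    def forward(self, pooled):          # pooled: (batch, num_filters)
        return pooled * self.mask

# Usage: insert between the pooled CNN features and the classifier head.
# mask = FilterMask(num_filters=30)
# mask.disable([3, 17])   # filters flagged by humans from LRP-based word clouds
# logits = classifier_head(mask(cnn_features))
```

In the paper, each filter is judged via a word cloud built from LRP relevance scores, and the final dense layer is then fine-tuned with the mask in place, so the classifier redistributes its reliance onto the remaining, human-approved features.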