2021
DOI: 10.48550/arxiv.2111.07367
Preprint

"Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification

Abstract: Feature attribution (a.k.a. input salience) methods, which assign an importance score to each input feature, are abundant but may produce surprisingly different results for the same model on the same input. While differences are expected if disparate definitions of importance are assumed, most methods claim to provide faithful attributions and to point at the features most relevant for a model's prediction. Existing work on faithfulness evaluation is not conclusive and does not provide a clear answer as to how different metho…
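
As a minimal illustration of the evaluation idea sketched in the abstract (assuming the ground-truth shortcut tokens planted in an example are known), a faithful salience method should rank those tokens at the top. The helper below is a hypothetical sketch, not the paper's code.

```python
# Sketch only: given per-token salience scores and the positions of planted
# shortcut tokens, measure how many of the k most salient positions are shortcuts.
def precision_at_k(salience_scores, shortcut_positions, k):
    """Fraction of the k top-ranked token positions that are shortcut tokens."""
    top_k = sorted(range(len(salience_scores)),
                   key=lambda i: salience_scores[i],
                   reverse=True)[:k]
    hits = sum(1 for i in top_k if i in set(shortcut_positions))
    return hits / k

# Toy example: the tokens at positions 2 and 7 are the planted shortcut.
scores = [0.10, 0.05, 0.90, 0.20, 0.00, 0.30, 0.10, 0.80, 0.02]
print(precision_at_k(scores, shortcut_positions={2, 7}, k=2))  # -> 1.0
```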

Cited by 6 publications (8 citation statements)
References 25 publications
“…The desiderata of AI interpretability for advanced AI systems (and especially FMs and GenAI systems) are broadly agreed upon. Ensuring sufficient interpretability can help AI research scientists and developers to debug the models they are building and to uncover otherwise hidden or unforeseeable failure modes, thereby improving downstream model functioning and performance (Bastings et al., 2022; Luo & Specia, 2024). It can also help detect and mitigate discriminatory biases that may be buried within model architectures (Alikhademi et al., 2021; Zhao, Chen, et al., 2024; Zhou et al., 2020).…”
Section: Risks From Model Scaling: Model Opacity and Complexity
confidence: 99%
“…In this research, we used a relatively small set of measures to compare model performance. A promising area of future work is to employ salience methods (Bastings et al., 2021) or training data attribution (Pruthi et al., 2020) to determine which parts of the input affect model performance. Using these methods together with visual analysis tools (such as LIT; Tenney et al., 2020) could enable deeper insight into the relationships between prompt designs, inputs, and model outputs vis-à-vis API design strategies.…”
Section: Future Work
confidence: 99%
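
As a hedged sketch of what "salience methods" refers to in the statement above, the snippet below computes Gradient x Input scores for a toy text classifier; the model, token ids, and names are invented for illustration and do not come from the cited papers.

```python
# Sketch only: Gradient x Input salience for a tiny bag-of-embeddings classifier.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, vocab_size=100, dim=16, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, embedded):                 # embedded: (seq_len, dim)
        return self.out(embedded.mean(dim=0))    # logits: (num_classes,)

model = TinyClassifier()
token_ids = torch.tensor([5, 17, 42, 8])         # a made-up tokenized input

# Embed the tokens and track gradients with respect to the embeddings themselves.
embedded = model.emb(token_ids).detach().requires_grad_(True)
logits = model(embedded)
logits[logits.argmax()].backward()               # d(predicted logit)/d(embedding)

# One score per token: sum the elementwise gradient-input product over the
# embedding dimension, then rank tokens from most to least salient.
salience = (embedded.grad * embedded).sum(dim=-1).detach()
print(salience.argsort(descending=True).tolist())
```
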
“…Current NLP methods tend to implicitly learn superficial cues instead of the causal associations between the input and labels, as evidenced by (Geirhos et al., 2020; Guo et al., 2023b), and thus usually show their brittleness when deployed in real-world scenarios. Recent work (Sugawara et al., 2018, 2020; Lai et al., 2021; Wang et al., 2021b; Du et al., 2021a; Zhu et al., 2021; Bastings et al., 2021) indicates that current PLMs unintentionally learn shortcuts (i.e., syntactic heuristics, lexical overlap, and relevant words) that exploit partial evidence to trick specific benchmarks and produce unreliable output, which is particularly serious in the open domain.…”
Section: Features
confidence: 99%