2024
DOI: 10.1145/3639372

Explainability for Large Language Models: A Survey

Haiyan Zhao,
Hanjie Chen,
Fan Yang
et al.

Abstract: Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear, and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this paper, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models…

Cited by 38 publications (7 citation statements)
References 101 publications
“…Moreover, LLMs are often seen as "black boxes" due to their complex and opaque nature, making it difficult to understand how they process data and arrive at specific outputs [63]. This lack of transparency can hinder the identification and rectification of privacy and security issues within the model.…”
Section: Limitations
Citation type: mentioning (confidence: 99%)
“…Ensuring sufficient interpretability can help AI research scientists and developers to debug the models they are building and to uncover otherwise hidden or unforeseeable failure modes, thereby improving downstream model functioning and performance (Bastings et al., 2022; Luo & Specia, 2024). It can also help detect and mitigate discriminatory biases that may be buried within model architectures (Alikhademi et al., 2021; Zhao, Chen, et al., 2024; Zhou et al., 2020). Furnishing understandable and accessible explanations of the rationale behind system outputs can likewise help to establish the lawfulness of AI systems (e.g., their compliance with data protection law and equality law) (Chuang et al., 2024; ICO/Turing, 2020), as well as to ensure responsible and trustworthy implementation by system deployers, who are better equipped to grasp system capabilities, limitations, and flaws and to integrate system outputs into their own reasoning, judgment, and experience (ICO/Turing, 2020; Leslie, Rincón, et al., 2024).…”
Section: Risks From Model Scaling: Model Opacity and Complexity
Citation type: mentioning (confidence: 99%)
“…While the field of explainable AI (often referred to simply as XAI) has made notable progress over the past several years in advancing knowledge about the behaviors and potential flaws of opaque AI systems (Angelov et al., 2021; Räuker et al., 2023; Zhao, Chen, et al., 2024), myriad critical voices have emphasized that applications of contemporary AI explainability methods to black-box AI systems are rife with shortcomings that continue to hamper their real-world utility. These critics have cautioned against 'false hopes' that current explainability techniques provide justified reassurance about the safety, accuracy, reliability, and fairness of black-box models, stressing that contemporary approaches often generate misleading or unfaithful explanations (Ghassemi et al., 2021, p. e746).…”
Section: Risks From Model Scaling: Model Opacity and Complexity
Citation type: mentioning (confidence: 99%)