2021
DOI: 10.48550/arxiv.2111.07367
Preprint

"Will You Find These Shortcuts?" A Protocol for Evaluating the Faithfulness of Input Salience Methods for Text Classification

Abstract: Feature attribution (a.k.a. input salience) methods, which assign an importance score to each input feature, are abundant but may produce surprisingly different results for the same model on the same input. While differences are expected if disparate definitions of importance are assumed, most methods claim to provide faithful attributions and to point at the features most relevant for a model's prediction. Existing work on faithfulness evaluation is not conclusive and does not provide a clear answer as to how different metho…
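
As a minimal illustration of the evaluation idea sketched in the abstract (assuming the ground-truth shortcut tokens planted in an example are known), a faithful salience method should rank those tokens at the top. The helper below is a hypothetical sketch, not the paper's code.

```python
# Sketch only: given per-token salience scores and the positions of planted
# shortcut tokens, measure how many of the k most salient positions are shortcuts.
def precision_at_k(salience_scores, shortcut_positions, k):
    """Fraction of the k top-ranked token positions that are shortcut tokens."""
    top_k = sorted(range(len(salience_scores)),
                   key=lambda i: salience_scores[i],
                   reverse=True)[:k]
    hits = sum(1 for i in top_k if i in set(shortcut_positions))
    return hits / k

# Toy example: the tokens at positions 2 and 7 are the planted shortcut.
scores = [0.10, 0.05, 0.90, 0.20, 0.00, 0.30, 0.10, 0.80, 0.02]
print(precision_at_k(scores, shortcut_positions={2, 7}, k=2))  # -> 1.0
```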

Cited by 6 publications (8 citation statements)
References 25 publications
“…The desiderata of AI interpretability for advanced AI systems (and especially FMs and GenAI systems) are broadly agreed upon. Ensuring sufficient interpretability can help AI research scientists and developers to debug the models they are building and to uncover otherwise hidden or unforeseeable failure modes, thereby improving downstream model functioning and performance (Bastings et al., 2022; Luo & Specia, 2024). It can also help detect and mitigate discriminatory biases that may be buried within model architectures (Alikhademi et al., 2021; Zhao, Chen, et al., 2024; Zhou et al., 2020).…”
Section: Risks From Model Scaling: Model Opacity and Complexity
confidence: 99%
“…In this research, we used a relatively small set of measures to compare model performance. A promising area of future work is to employ salience methods (Bastings et al., 2021) or training data attribution (Pruthi et al., 2020) to determine which parts of the input affect model performance. Using these methods together with visual analysis tools (such as LIT; Tenney et al., 2020) could enable deeper insight into the relationships between prompt designs, inputs, and model outputs vis-à-vis API design strategies.…”
Section: Future Work
confidence: 99%
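
As a hedged sketch of what "salience methods" refers to in the statement above, the snippet below computes Gradient x Input scores for a toy text classifier; the model, token ids, and names are invented for illustration and do not come from the cited papers.

```python
# Sketch only: Gradient x Input salience for a tiny bag-of-embeddings classifier.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, vocab_size=100, dim=16, num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.out = nn.Linear(dim, num_classes)

    def forward(self, embedded):                 # embedded: (seq_len, dim)
        return self.out(embedded.mean(dim=0))    # logits: (num_classes,)

model = TinyClassifier()
token_ids = torch.tensor([5, 17, 42, 8])         # a made-up tokenized input

# Embed the tokens and track gradients with respect to the embeddings themselves.
embedded = model.emb(token_ids).detach().requires_grad_(True)
logits = model(embedded)
logits[logits.argmax()].backward()               # d(predicted logit)/d(embedding)

# One score per token: sum the elementwise gradient-input product over the
# embedding dimension, then rank tokens from most to least salient.
salience = (embedded.grad * embedded).sum(dim=-1).detach()
print(salience.argsort(descending=True).tolist())
```
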
“…Current NLP methods tend to implicitly learn superficial cues instead of the causal associations between the input and labels, as evidenced by (Geirhos et al., 2020; Guo et al., 2023b), and thus usually show their brittleness when deployed in real-world scenarios. Recent work (Sugawara et al., 2018, 2020; Lai et al., 2021; Wang et al., 2021b; Du et al., 2021a; Zhu et al., 2021; Bastings et al., 2021) indicates that current PLMs unintentionally learn shortcuts (i.e., syntactic heuristics, lexical overlap, and relevant words) that exploit partial evidence to trick specific benchmarks and produce unreliable output, which is particularly serious in the open domain.…”
Section: Features
confidence: 99%