Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations) 2023
DOI: 10.18653/v1/2023.acl-demo.40
Inseq: An Interpretability Toolkit for Sequence Generation Models

Abstract: Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python library to democratize access to interpretability analyses of sequence generation models. Inseq enables intuitive and optimized extraction of models' internal information and feature importance scores for popular decoder-only and encoder-decoder Transformers architectures. We …
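To make the abstract's claim concrete, here is a minimal usage sketch based on Inseq's documented entry points (inseq.load_model, model.attribute, out.show); the checkpoint and attribution method below are illustrative choices, not prescribed by the paper:

```python
import inseq

# Load a Hugging Face encoder-decoder model together with an attribution method.
# "Helsinki-NLP/opus-mt-en-fr" and "integrated_gradients" are illustrative choices.
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "integrated_gradients")

# Attribute a generation: feature importance scores are computed for each
# generated token with respect to the input (and previously generated) tokens.
out = model.attribute(input_texts="Hello everyone, hope you're enjoying the demo!")

# Visualize the token-level attribution scores.
out.show()
```

The same load_model call also accepts decoder-only checkpoints (e.g., GPT-2-style models), matching the two architecture families mentioned in the abstract.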


Cited by 13 publications (10 citation statements)
References 28 publications
“…However, the Translation stage is still opaque, meaning it is not self-interpretable how the LM generates the reasoning chain from the question. It is still an under-explored question whether it is possible to improve the interpretability of the LM generation process in general, and a few recent studies have made promising early progress (Yin and Neubig, 2022; Sarti et al., 2023) that might be used to improve the faithfulness of the Translation stage.…”
Section: Limitations (mentioning)
confidence: 99%
“…More recent solutions enable the study of the impact of source and target tokens (Ferrando et al., 2022) or discover the causes of hallucinations (Dale et al., 2023). Concurrent work by Sarti et al. (2023b) uses post-hoc XAI methods to uncover gender bias in Turkish-English neural MT models. We expand their setup to more complex sentences and the notional-to-grammatical gender MT in two more languages.…”
Section: Related Work (mentioning)
confidence: 99%
“…We aggregate first over f and then g because we expect token-level per-unit scores to represent token attribution more expressively, and we do not want to lose such information with an initial pooling along the hidden size. We use Inseq (Sarti et al., 2023b) to compute and aggregate the scores.…”
Section: B.1 Interpretability (mentioning)
confidence: 99%
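The two-stage aggregation described in the statement above can be sketched as follows; the excerpt does not define f and g, so the mean and L2 norm below are assumptions, as is the shape of the attribution tensor:

```python
import torch

# Hypothetical per-unit attribution scores: one value per
# (source token, target token, hidden unit). Shape: (src_len, tgt_len, hidden).
scores = torch.randn(7, 5, 768)

def f(x: torch.Tensor) -> torch.Tensor:
    # First-stage aggregation (assumed: mean over target positions), applied
    # per hidden unit so unit-level information is not lost early.
    return x.mean(dim=1)  # -> (src_len, hidden)

def g(x: torch.Tensor) -> torch.Tensor:
    # Second-stage pooling along the hidden size (assumed: L2 norm),
    # yielding one attribution score per source token.
    return x.norm(p=2, dim=-1)  # -> (src_len,)

token_attribution = g(f(scores))  # aggregate over f first, then g
```

Pooling along the hidden size only in the second stage matches the ordering the authors motivate: an initial pooling over hidden units would discard the per-unit detail that f is meant to exploit.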
“…These logits vary vastly among different languages. To focus on the relation between the original and edited fact, we normalize the logits following the previous work (Sarti et al., 2023) as…”
Section: Subword Vocabulary Overlap (mentioning)
confidence: 99%
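The normalization equation itself is truncated in the excerpt, so the following is a loudly hypothetical illustration of the idea (making the original-fact and edited-fact logits comparable across languages whose raw logit magnitudes differ); a softmax over the two competing logits is one plausible form:

```python
import torch

def normalize_logits(logit_original: torch.Tensor, logit_edited: torch.Tensor):
    # Hypothetical form: the equation quoted above is cut off, so this
    # softmax over the two competing logits is only one plausible reading.
    probs = torch.softmax(torch.stack([logit_original, logit_edited]), dim=0)
    return probs[0], probs[1]

# Raw logits vary vastly across languages; after normalization the two scores
# sum to 1 and reflect only the original-vs-edited relation.
p_orig, p_edit = normalize_logits(torch.tensor(12.3), torch.tensor(9.8))
```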