Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.496
Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

Abstract: Selecting input features of top relevance has become a popular method for building self-explaining models. In this work, we extend this selective rationalization approach to text matching, where the goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction. Our approach employs optimal transport (OT) to find a minimal-cost alignment between the inputs. However, directly applying OT often produces dense and therefore uninterpretable alignments.…
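The abstract's core idea can be illustrated with a minimal sketch of entropy-regularized optimal transport between two sets of token embeddings, solved with Sinkhorn iterations. This is not the paper's implementation; the function name, the cosine cost, and the uniform marginals are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_alignment(X, Y, eps=0.1, n_iters=200):
    """Entropy-regularized OT between token embeddings X (n x d) and
    Y (m x d), solved by Sinkhorn iterations. Returns an n x m
    transport plan whose entry (i, j) is the alignment mass between
    token i of the first text and token j of the second."""
    # Cost: one minus cosine similarity (an illustrative choice).
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T

    # Uniform marginals: every token must send/receive equal mass.
    a = np.full(X.shape[0], 1.0 / X.shape[0])
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])

    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):          # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # P = diag(u) @ K @ diag(v)

# With entropic regularization every entry of P is strictly positive,
# i.e. the alignment is dense; the paper's point is that sparsity
# constraints are needed before the plan reads as a rationale.
P = sinkhorn_alignment(np.random.randn(5, 32), np.random.randn(7, 32))
```

As the abstract notes, the dense plan this relaxation produces is exactly what the paper's sparse alignment constraints are designed to avoid.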

Cited by 22 publications (37 citation statements). References 39 publications (23 reference statements).
“…Deep learning models typically function as black boxes, offering very little insight into their decision-making mechanics. To expose model understanding at various depths, researchers have proposed structural probing (Tenney et al., 2018; Hewitt and Manning, 2019; Lin et al., 2019) and behavioral probing methods (McCoy et al., 2020; Goldberg, 2019; Warstadt et al., 2019; Ettinger, 2020), as well as input saliency maps that highlight the most important tokens/sentences in the input for each prediction (Serrano and Smith, 2019; Ribeiro et al., 2016; Swanson et al., 2020; Tenney et al., 2019) and input token relationships (Lamm et al., 2020). Alongside these, there is work on producing textual rationales (Lei et al., 2016), snippets of natural language that help explain model predictions.…”
Section: Related Work (mentioning)
confidence: 99%
“…But their approach is unable to extract more fine-grained alignments (e.g., one-to-one continuous alignments). Bastings et al. (2019) and Swanson et al. (2020) design sparse attention for hard alignments. However, these methods trade performance for interpretability, and they cannot be applied to analyze already-trained models.…”
Section: Explaining Models in NLP (mentioning)
confidence: 99%
“…Moreover, co-attention assigns scores among words, which prevents us from observing phrase-level alignments, a flaw that generally exists for attribution explanations, as shown in Figure 1(c). Other works build hard alignments by resorting to sparse attention (Yu et al., 2019; Bastings et al., 2019; Swanson et al., 2020). But these self-explanatory architectures pay for interpretability with a drop in accuracy (Molnar, 2020).…”
Section: Introduction (mentioning)
confidence: 99%
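The sparse attention discussed in the two citation statements above can be made concrete with a sparsemax-style projection (Martins and Astudillo, 2016), which maps alignment scores onto the probability simplex while zeroing out low-scoring candidates. This is an illustrative sketch, not the mechanism of Bastings et al. (2019) or Swanson et al. (2020); the function name and the example scores are assumptions.

```python
import numpy as np

def sparsemax(scores):
    """Euclidean projection of a score vector onto the probability
    simplex; unlike softmax, low-scoring entries become exactly zero."""
    z = np.sort(scores)[::-1]              # scores in descending order
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z)
    support = 1 + k * z > cssv             # which entries stay nonzero
    k_z = k[support][-1]                   # support size
    tau = (cssv[support][-1] - 1.0) / k_z  # shared threshold
    return np.maximum(scores - tau, 0.0)

# softmax would give all three candidates nonzero weight; sparsemax
# keeps only the dominant one, yielding a hard, readable alignment.
print(sparsemax(np.array([2.0, 1.0, 0.1])))  # -> [1. 0. 0.]
```

Zeroed-out entries are what make the resulting alignments readable, and also what the quoted passages blame for the accuracy drop: pruned candidates contribute no gradient signal.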
“…In the machine learning literature, Sinkhorn-based networks have been gaining popularity as a means of learning latent permutations of visual or synthetic data (Mena et al., 2018) or imposing permutation invariance for set-theoretic learning (Grover et al., 2019), with so far limited adoption in the linguistic setting (Tay et al., 2020; Swanson et al., 2020). In contrast to prior applications of Sinkhorn as a final classification layer, we use it over chain-element representations that have been mutually contextualized, rather than over set elements vectorized in isolation.…”
Section: Related Work (mentioning)
confidence: 99%
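As a rough sketch of the Sinkhorn normalization these works build on: repeatedly normalizing the rows and columns of a temperature-scaled score matrix yields a doubly stochastic matrix that approaches a hard permutation as the temperature drops. The Gumbel noise of Mena et al. (2018) is omitted here, and the function name, temperature, and iteration count are assumptions, not the cited papers' code.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_permutation(scores, tau=0.1, n_iters=30):
    """Turn a square score matrix into a doubly stochastic matrix by
    alternating row/column normalization in log space; lowering tau
    sharpens the result toward a hard permutation matrix."""
    log_alpha = scores / tau
    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1, keepdims=True)
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0, keepdims=True)
    return np.exp(log_alpha)

# A near-permutation emerges from noisy scores of a 4-element reordering.
P = sinkhorn_permutation(np.eye(4)[[2, 0, 3, 1]] + 0.01 * np.random.randn(4, 4))
```

Because every step is differentiable, the relaxation can sit inside a network and be trained end to end, which is what makes it attractive for the latent-permutation and alignment uses the passage describes.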