Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.496
Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport

Abstract: Selecting input features of top relevance has become a popular method for building self-explaining models. In this work, we extend this selective rationalization approach to text matching, where the goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction. Our approach employs optimal transport (OT) to find a minimal-cost alignment between the inputs. However, directly applying OT often produces dense and therefore uninterpretable alignments.…
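The abstract's core idea can be illustrated with a minimal sketch of entropy-regularized optimal transport between two sets of token embeddings, solved with Sinkhorn iterations. This is not the paper's implementation; the function name, the cosine cost, and the uniform marginals are all illustrative assumptions.

```python
import numpy as np

def sinkhorn_alignment(X, Y, eps=0.1, n_iters=200):
    """Entropy-regularized OT between token embeddings X (n x d) and
    Y (m x d), solved by Sinkhorn iterations. Returns an n x m
    transport plan whose entry (i, j) is the alignment mass between
    token i of the first text and token j of the second."""
    # Cost: one minus cosine similarity (an illustrative choice).
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    C = 1.0 - Xn @ Yn.T

    # Uniform marginals: every token must send/receive equal mass.
    a = np.full(X.shape[0], 1.0 / X.shape[0])
    b = np.full(Y.shape[0], 1.0 / Y.shape[0])

    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):          # alternating marginal scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # P = diag(u) @ K @ diag(v)

# With entropic regularization every entry of P is strictly positive,
# i.e. the alignment is dense; the paper's point is that sparsity
# constraints are needed before the plan reads as a rationale.
P = sinkhorn_alignment(np.random.randn(5, 32), np.random.randn(7, 32))
```

As the abstract notes, the dense plan this relaxation produces is exactly what the paper's sparse alignment constraints are designed to avoid.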

Cited by 22 publications (37 citation statements). References 39 publications (23 reference statements).
“…Deep learning models typically function as black boxes, offering very little insight into their decision-making mechanics. To expose model understanding at various depths, researchers have proposed structural probing (Tenney et al., 2018; Hewitt and Manning, 2019; Lin et al., 2019) and behavioral probing methods (McCoy et al., 2020; Goldberg, 2019; Warstadt et al., 2019; Ettinger, 2020), as well as input saliency maps that highlight the most important tokens/sentences in the input for each prediction (Serrano and Smith, 2019; Ribeiro et al., 2016; Swanson et al., 2020; Tenney et al., 2019) and input token relationships (Lamm et al., 2020). Alongside these, there is work on producing textual rationales (Lei et al., 2016), snippets of natural language that help explain model predictions.…”
Section: Related Work (mentioning)
confidence: 99%
“…But their approach is unable to extract more fine-grained alignments (e.g., one-to-one continuous alignments). Bastings et al. (2019) and Swanson et al. (2020) design sparse attention for hard alignments. However, these methods trade performance for interpretability, and they cannot be applied to analyze already-trained models.…”
Section: Explaining Models in NLP (mentioning)
confidence: 99%
“…Moreover, co-attention assigns scores among words, which prevents us from observing phrase-level alignments, a flaw that generally exists for attribution explanations, as shown in Figure 1(c). Other works build hard alignments by resorting to sparse attention (Yu et al., 2019; Bastings et al., 2019; Swanson et al., 2020). But these self-explanatory architectures pay for interpretability with a drop in accuracy (Molnar, 2020).…”
Section: Introduction (mentioning)
confidence: 99%
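The sparse attention discussed in the two citation statements above can be made concrete with a sparsemax-style projection (Martins and Astudillo, 2016), which maps alignment scores onto the probability simplex while zeroing out low-scoring candidates. This is an illustrative sketch, not the mechanism of Bastings et al. (2019) or Swanson et al. (2020); the function name and the example scores are assumptions.

```python
import numpy as np

def sparsemax(scores):
    """Euclidean projection of a score vector onto the probability
    simplex; unlike softmax, low-scoring entries become exactly zero."""
    z = np.sort(scores)[::-1]              # scores in descending order
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z)
    support = 1 + k * z > cssv             # which entries stay nonzero
    k_z = k[support][-1]                   # support size
    tau = (cssv[support][-1] - 1.0) / k_z  # shared threshold
    return np.maximum(scores - tau, 0.0)

# softmax would give all three candidates nonzero weight; sparsemax
# keeps only the dominant one, yielding a hard, readable alignment.
print(sparsemax(np.array([2.0, 1.0, 0.1])))  # -> [1. 0. 0.]
```

Zeroed-out entries are what make the resulting alignments readable, and also what the quoted passages blame for the accuracy drop: pruned candidates contribute no gradient signal.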
“…In the machine learning literature, Sinkhorn-based networks have been gaining popularity as a means of learning latent permutations of visual or synthetic data (Mena et al., 2018) or imposing permutation invariance for set-theoretic learning (Grover et al., 2019), with so far limited adoption in the linguistic setting (Tay et al., 2020; Swanson et al., 2020). In contrast to prior applications of Sinkhorn as a final classification layer, we use it over chain-element representations that have been mutually contextualized, rather than over set elements vectorized in isolation.…”
Section: Related Work (mentioning)
confidence: 99%
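As a rough sketch of the Sinkhorn normalization these works build on: repeatedly normalizing the rows and columns of a temperature-scaled score matrix yields a doubly stochastic matrix that approaches a hard permutation as the temperature drops. The Gumbel noise of Mena et al. (2018) is omitted here, and the function name, temperature, and iteration count are assumptions, not the cited papers' code.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_permutation(scores, tau=0.1, n_iters=30):
    """Turn a square score matrix into a doubly stochastic matrix by
    alternating row/column normalization in log space; lowering tau
    sharpens the result toward a hard permutation matrix."""
    log_alpha = scores / tau
    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1, keepdims=True)
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0, keepdims=True)
    return np.exp(log_alpha)

# A near-permutation emerges from noisy scores of a 4-element reordering.
P = sinkhorn_permutation(np.eye(4)[[2, 0, 3, 1]] + 0.01 * np.random.randn(4, 4))
```

Because every step is differentiable, the relaxation can sit inside a network and be trained end to end, which is what makes it attractive for the latent-permutation and alignment uses the passage describes.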