Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/561
Attention-Fused Deep Matching Network for Natural Language Inference

Abstract: Natural language inference aims to predict whether a premise sentence can infer another hypothesis sentence. Recent progress on this task relies only on a shallow interaction between sentence pairs, which is insufficient for modeling complex relations. In this paper, we present an attention-fused deep matching network (AF-DMN) for natural language inference. Unlike existing models, AF-DMN takes two sentences as input and iteratively learns the attention-aware representations for each side by multi-level interactions.
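As a rough illustration of the iterative, attention-aware matching the abstract describes, the NumPy sketch below stacks a few cross-attention blocks in which each sentence attends to the other and fuses the attended context back into its own representation. This is a minimal sketch of the general idea only: the fusion step (concatenate, project, tanh) and the shared random projection `W` are illustrative assumptions, not the paper's exact equations, and a trained AF-DMN would learn these parameters and re-encode each side (e.g. with a BiLSTM) inside every block.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def matching_block(P, H, W):
    """One illustrative matching block: each sentence attends to the
    other, and the attended context is fused back into its
    representation so the next block can refine the alignment."""
    scores = P @ H.T                        # (len_p, len_h) alignment scores
    P_ctx = softmax(scores, axis=1) @ H     # premise attends to hypothesis
    H_ctx = softmax(scores.T, axis=1) @ P   # hypothesis attends to premise
    # Fuse the original and attended views, then project back to d dims
    # so blocks can be stacked; W is random here for illustration only.
    P = np.tanh(np.concatenate([P, P_ctx], axis=-1) @ W)
    H = np.tanh(np.concatenate([H, H_ctx], axis=-1) @ W)
    return P, H

# Toy usage: 5-token premise, 7-token hypothesis, d = 8, three stacked blocks.
d = 8
P, H = rng.normal(size=(5, d)), rng.normal(size=(7, d))
W = rng.normal(size=(2 * d, d)) / np.sqrt(2 * d)
for _ in range(3):
    P, H = matching_block(P, H, W)
print(P.shape, H.shape)  # (5, 8) (7, 8)
```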

Cited by 44 publications (46 citation statements)
References 4 publications
“…10 We leave additional NLI datasets, such as the Diverse NLI Collection (Poliak et al., 2018a), for future work. 11 Many NLI models encode P and H separately (Rocktäschel et al., 2016; Mou et al., 2016; Liu et al., 2016; Cheng et al., 2016; Chen et al., 2017), although some share information between the encoders via attention (Duan et al., 2018). 12 Specifically, representations are concatenated, subtracted, and multiplied element-wise.…”
Section: Methods 2: Negative Sampling (mentioning)
confidence: 99%
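Footnote 12 of the statement above describes a common heuristic for combining two sentence vectors before classification. The snippet below sketches one standard instantiation, [h1; h2; h1 − h2; h1 ⊙ h2]; the exact feature set used by any particular cited model may differ.

```python
import numpy as np

def matching_features(h1, h2):
    """Combine two sentence vectors by concatenation, element-wise
    difference, and element-wise product; the result typically feeds
    a classifier such as an MLP."""
    return np.concatenate([h1, h2, h1 - h2, h1 * h2], axis=-1)

v = matching_features(np.ones(4), np.arange(4.0))
print(v.shape)  # (16,) for two 4-dim inputs
```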
“…• AF-DMN (Duan et al., 2018) stacks multiple computational blocks in its matching layer to better learn the interaction of the sentence pair.…”
Section: Models for Comparing (mentioning)
confidence: 99%
“…Tay et al. (2018) compare and compress alignment pairs using factorization layers, which leverage the rich history of standard machine learning literature to achieve this task. AF-DMN (Duan et al., 2018) stacks multiple computational blocks in its matching layer to better learn the interaction of the sentence pair. KIM is capable of leveraging external knowledge in its co-attention, local inference collection, and inference composition components to improve performance.…”
Section: Related Work (mentioning)
confidence: 99%
“…The technique has applications in natural language inference, to judge whether a hypothesis sentence can be inferred from a premise sentence (Bowman et al., 2015), and in paraphrase identification, to determine whether two sentences express equivalent meanings (Yin et al., 2015). The core issue for sentence matching is to model the relatedness between two sentences (Rocktäschel et al., 2015; Parikh et al., 2016; Wang et al., 2017; Duan et al., 2018).…”
Section: Introduction (mentioning)
confidence: 99%
“…Recently, neural network-based models for sentence matching have attracted more attention for their powerful ability to learn sentence representations (Bowman et al., 2015; Wang et al., 2017; Duan et al., 2018). There are mainly two types of framework: sentence-encoding-based and attention-based.…”
Section: Introduction (mentioning)
confidence: 99%
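The two framework types named in this statement differ in where the interaction between the sentences happens. The sketch below illustrates the sentence-encoding side for contrast with the attention-based matching block sketched earlier; mean pooling stands in for a trained encoder, and the comparison features are an assumption borrowed from the heuristic above, not a specific cited model.

```python
import numpy as np

def encode(sentence_vecs):
    """Sentence-encoding framework: summarize each sentence into one
    fixed vector independently (mean pooling as a stand-in for a
    trained encoder); only these vectors are ever compared."""
    return sentence_vecs.mean(axis=0)

def match_sentence_encoding(P, H):
    p, h = encode(P), encode(H)
    return np.concatenate([p, h, np.abs(p - h), p * h])

# Attention-based frameworks instead compare the token matrices P and H
# directly, letting each word condition on the other sentence before pooling.
rng = np.random.default_rng(1)
P, H = rng.normal(size=(5, 8)), rng.normal(size=(7, 8))
print(match_sentence_encoding(P, H).shape)  # (32,)
```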