Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-long.417

Alignment Rationale for Natural Language Inference

Abstract: Deep learning models have achieved great success on the task of Natural Language Inference (NLI), though only a few attempts try to explain their behaviors. Existing explanation methods usually pick prominent features such as words or phrases from the input text. However, for NLI, alignments among words or phrases are more enlightening clues to explain the model. To this end, this paper presents AREC, a post-hoc approach to generate alignment rationale explanations for co-attention based models in NLI. The exp…

Cited by 12 publications (7 citation statements)
References 39 publications
“…Ideally, a model should learn rational (Jiang et al., 2021; Lu et al., 2022) features for robust generalization. Take sentiment classification for example.…”
Section: Features
confidence: 99%
“…where ŷ is the predicted label, N is the number of examples, p(ŷ|x_i^{(k)}) is the probability on the predicted class, and x_i^{(k)} is the modified sample. A higher AOPC is better, meaning that the features chosen by the attribution scores are more important (Feng et al.). Besides these works, many others (Shrikumar et al., 2017; Chen et al., 2019; Nguyen, 2018; DeYoung et al., 2020; Hao et al., 2020; Jiang et al., 2021) use similar metrics to perform evaluation and comparison. The main difference between the evaluation metrics in these works is the modification strategy.…”
Section: Part II Evaluation 2: Evaluation Based on Meaningful Perturbation
confidence: 99%
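The AOPC metric described in the statement above can be sketched as follows. This is a minimal illustration, assuming the standard formulation of AOPC (Area Over the Perturbation Curve): average, over examples and perturbation steps, the drop in the predicted-class probability after removing the top-k attributed features. The function name and input layout are illustrative, not from the cited works.

```python
import numpy as np

def aopc(probs: np.ndarray) -> float:
    """Sketch of the AOPC metric (illustrative, not the cited papers' code).

    probs: array of shape (N, K+1), where probs[i, k] = p(ŷ | x_i^{(k)}),
    the model's probability on the originally predicted class after the
    top-k features (by attribution score) are removed; column 0 holds the
    unperturbed input (k = 0).
    """
    # Drop in confidence relative to the unperturbed prediction, for each
    # example i and each perturbation step k.
    drops = probs[:, [0]] - probs  # p(ŷ|x_i) - p(ŷ|x_i^{(k)})
    # Average over the N examples and the K+1 perturbation steps.
    return float(drops.mean())

# Two examples, two perturbation steps each: larger probability drops
# (better attributions) yield a larger AOPC.
scores = np.array([[0.9, 0.6, 0.3],
                   [0.8, 0.8, 0.2]])
print(aopc(scores))  # → 0.25
```

Because each row is compared against its own unperturbed probability, attribution methods that remove genuinely important features first produce steeper drops and hence higher AOPC, which is exactly the comparison the perturbation-based evaluations above rely on.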
“…Recently, many large-scale standard datasets have been released, such as SciTail [Khot et al. 2018], SNLI [Bowman et al. 2015], and Multi-NLI [Williams et al. 2018]. These datasets greatly facilitate the study of NLI, and some state-of-the-art neural models have achieved very competitive performance on them [Belinkov et al. 2019; Chen et al. 2021b; Jiang et al. 2021; Meissner et al. 2021; Zhou and Bansal 2020]. From the definition of NLI we can see that it is based on (and assumes) a common human understanding of language as well as common background knowledge; thus it has been considered by many as an important evaluation measure for language understanding [Bowman et al. 2015; Dagan et al. 2006; Williams et al. 2018; Zylberajch et al. 2021].…”
Section: Related Work
confidence: 99%