Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.657

Discriminatively-Tuned Generative Classifiers for Robust Natural Language Inference

Abstract: While discriminative neural network classifiers are generally preferred, recent work has shown advantages of generative classifiers in term of data efficiency and robustness. In this paper, we focus on natural language inference (NLI). We propose GenNLI, a generative classifier for NLI tasks, and empirically characterize its performance by comparing it to five baselines, including discriminative models and large-scale pretrained language representation models like BERT. We explore training objectives for discr… Show more

Cited by 8 publications (4 citation statements) · References 36 publications
“…We are motivated by the insight that a generative classifier is more effective in the low-data regime than a discriminative classifier, as demonstrated by [17]. Although that conclusion was drawn on simple linear models [17], similar results have recently been observed on deep neural networks (DNNs) [18], [19]. Note that in the online CIL setting the data is seen only once rather than fully trained on, so it is analogous to the low-data regime in which the generative classifier is preferable.…”
Section: Introduction (mentioning)
confidence: 86%
“…As discussed above, online CIL is a low-data setting in which generative classifiers are preferable to discriminative classifiers [17]. Moreover, generative classifiers are more robust in continual learning [18] and in imbalanced-data settings [19]. At each iteration, B_n ∪ B_r is also highly imbalanced.…”
Section: B. Inference With a Generative Classifier (mentioning)
confidence: 99%
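Both excerpts above lean on the classic generative-vs-discriminative data-efficiency result. A minimal sketch of that comparison, assuming [17] refers to the familiar naive Bayes vs. logistic regression contrast (the dataset and sample sizes here are arbitrary illustrations, not from the cited papers):

```python
# Compare a generative classifier (Gaussian naive Bayes) against a
# discriminative one (logistic regression) as the training set shrinks.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for n in (10, 50, 200, 1000):  # progressively less data
    idx = np.random.RandomState(0).choice(len(X_tr), n, replace=False)
    gen = GaussianNB().fit(X_tr[idx], y_tr[idx])
    disc = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    print(f"n={n:4d}  generative={gen.score(X_te, y_te):.3f}  "
          f"discriminative={disc.score(X_te, y_te):.3f}")
```

The typical pattern is that the generative model reaches its (lower) asymptotic accuracy with far fewer examples, which is the behavior the citing papers invoke for the online CIL setting.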
“…The trigger bias proposed in our paper belongs to selection bias and model over-amplification bias. Bias has also been investigated in natural language inference [1,6,7,13,21–23], question answering [24], the ROC story cloze task [2,28], lexical inference [17], visual question answering [12], etc. To the best of our knowledge, we are the first to present the biases in FSEC, i.e., trigger overlapping and trigger separability.…”
Section: Few-shot Event Classification (mentioning)
confidence: 99%
“…Sanchez et al (2018) analysed the behaviour of NLI models and the factors to be more robust. Ding et al (2020) proposed efficient methods to mitigate a particular known bias in NLI. Benchmark collection in NLI: GLUE (Wang et al, 2019b,a) benchmark contains several NLIrelated benchmark datasets.…”
Section: Related Workmentioning
confidence: 99%