Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.411

Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference

Abstract: Intent detection is one of the core components of goal-oriented dialog systems, and detecting out-of-scope (OOS) intents is also a practically important skill. Few-shot learning is attracting much attention as a way to mitigate data scarcity, but it makes OOS detection even more challenging. In this paper, we present a simple yet effective approach: discriminative nearest neighbor classification with deep self-attention. Unlike softmax classifiers, we leverage BERT-style pairwise encoding to train a binary classifier that…
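
As a rough illustration of the pairwise nearest-neighbor idea sketched in the abstract, the snippet below scores a query against each labeled training example with a binary match classifier and rejects the query as OOS when even the best match scores low. The `toy_score` function and the threshold value are illustrative stand-ins, not the paper's BERT-based pairwise encoder.

```python
import numpy as np

def nn_predict(query, examples, labels, score_fn, oos_threshold=0.5):
    """Discriminative nearest-neighbor intent prediction with OOS rejection.

    score_fn stands in for a BERT-style pairwise encoder: any function
    mapping a (query, example) text pair to a match score in [0, 1].
    """
    scores = np.array([score_fn(query, ex) for ex in examples])
    best = int(np.argmax(scores))
    # If even the best-matching training example scores below the
    # threshold, treat the utterance as out-of-scope.
    if scores[best] < oos_threshold:
        return "OOS"
    return labels[best]

# Toy usage with a trivial lexical-overlap "classifier".
def toy_score(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

examples = ["book a flight to boston", "play some jazz music"]
labels = ["book_flight", "play_music"]
print(nn_predict("book me a flight", examples, labels, toy_score))
```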

Cited by 46 publications (67 citation statements). References 25 publications.

“…Stage 2 fine-tuning is inspired by metric-based meta-learning (Vinyals et al., 2016; Musgrave et al., 2020) and exemplar-based (also termed prototype-based) learning (Snell et al., 2017; Zhang et al., 2020), which is especially suited to few-shot scenarios. We assume the existence of $N_a$ annotated in-task examples; see Henderson et al. (2019a).…”
Section: Stage 2: Task-based Sentence Encoders
confidence: 99%
“…In addition to the standard few-shot learning evaluation, where the model is evaluated only on samples from the in-scope class distribution, a more realistic setting involves an Out-of-Scope (OOS) class, whose samples come from a different distribution, e.g., random utterances unrelated to any registered intent class in a dialogue. We adopt the OOS evaluation strategy (Zhang et al., 2020; Larson et al., 2019), which adds an extra OOS class at the meta-testing stage while the meta-training stage remains the same. A sample is assigned to the OOS class if the probabilistic prediction for the best class falls below a specified threshold $T \in (0, 1)$.…”
Section: Out-of-scope Evaluation
confidence: 99%
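
A minimal sketch of the thresholding rule described in the excerpt above: the prediction falls back to an extra OOS class whenever the top in-scope probability is below $T$. The function name and the choice of -1 as the OOS index are illustrative assumptions, not part of the cited protocol.

```python
import numpy as np

def predict_with_oos(class_probs, threshold):
    """Assign OOS when the top in-scope probability falls below threshold.

    class_probs: shape (num_classes,), a probability distribution over
    the registered intent classes from whatever few-shot classifier is
    in use. threshold: T in (0, 1), as in the evaluation protocol above.
    """
    best = int(np.argmax(class_probs))
    if class_probs[best] < threshold:
        return -1  # conventional index for the extra OOS class
    return best

# A confident prediction stays in scope; a flat one is rejected as OOS.
print(predict_with_oos(np.array([0.85, 0.10, 0.05]), threshold=0.5))  # 0
print(predict_with_oos(np.array([0.40, 0.35, 0.25]), threshold=0.5))  # -1
```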
“…In this study, we focus on cross-domain few-shot classification, with the goal of investigating whether we can meta-train a large pre-trained language model (e.g., BERT) in a semi-supervised fashion without access to a large amount of labeled data or many meta-training tasks. The resulting representation should generalize and adapt well to a new domain, and provide clear separation between in-domain and out-of-scope (OOS) examples (Zhang et al., 2020). Our base meta-learner consists of an embedding function (e.g., BERT) and ProtoNet (Snell et al., 2017) as the general supervised classifier, which can be fine-tuned either on supervised N-way K-shot classification tasks (supervised meta-training) or together with the self-supervised SMLMT tasks (semi-supervised meta-training).…”
Section: Introduction
confidence: 99%
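
For concreteness, here is a small NumPy sketch of the ProtoNet-style classifier the excerpt refers to: class prototypes are support-set mean embeddings, and a query is classified by a softmax over negative squared distances to the prototypes. The random vectors stand in for BERT sentence embeddings; this is a sketch of the general technique, not the cited authors' code.

```python
import numpy as np

def prototypes(support_emb, support_labels, num_classes):
    """Class prototypes: mean embedding of each class's support set."""
    return np.stack([
        support_emb[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])

def protonet_probs(query_emb, protos):
    """Softmax over negative squared Euclidean distances to prototypes."""
    d2 = ((query_emb[None, :] - protos) ** 2).sum(axis=1)
    logits = -d2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy 2-way 2-shot episode with 4-dim embeddings (random stand-ins).
rng = np.random.default_rng(0)
support = rng.normal(size=(4, 4))
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, num_classes=2)
print(protonet_probs(rng.normal(size=4), protos))
```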
“…Unlike Shridhar et al. (2021), during encoding we concatenate the initial encoding $h_{\mathrm{RNN}}$ and $z_\mu$ as an input to obtain $h_t$, namely $h_{\mathrm{RNN}} = \mathrm{Encode}(g, \tau_t)$ and $h_t = \mathrm{GRU}(\mathrm{ReLU}(W(h_{\mathrm{RNN}} \oplus z_\mu) + b),\, h_{t-1})$, where $\oplus$ denotes the concatenation operation, $W \in \mathbb{R}^{d_e \times 2d_e}$ is a weight matrix, $b \in \mathbb{R}^{d_e}$ is a bias vector, $h_{\mathrm{RNN}} \in \mathbb{R}^{d_e}$, $h_t \in \mathbb{R}^{d_h}$, $d_e$ is the dimension of $z_\mu$, $d_h$ is the dimension of $h_t$, GRU denotes a gated recurrent unit, and ReLU denotes a ReLU activation function. Compared to selecting text actions from a set of valid actions, generating text actions word by word is more likely to explore multiple ways of performing actions and thus achieve higher rewards (Yao et al., 2020). However, Shridhar et al. (2021) show that generation-based methods struggle to perform well when trained from a sparse reinforcement learning signal in ALFWorld.…”
Section: Execution Policy
confidence: 99%
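
A short PyTorch sketch of the encoding step in the excerpt above, assuming the stated dimensions; the weights are randomly initialized stand-ins for a trained model, and the `Encode(g, tau_t)` output is stubbed with a random vector.

```python
import torch
import torch.nn as nn

d_e, d_h = 8, 16  # embedding and hidden sizes (illustrative values)

# Components named after the excerpt's equation.
W = nn.Linear(2 * d_e, d_e)      # W in R^{d_e x 2d_e}, plus bias b
gru_cell = nn.GRUCell(d_e, d_h)  # one step of the GRU

h_rnn = torch.randn(1, d_e)   # h_RNN = Encode(g, tau_t), stubbed out
z_mu = torch.randn(1, d_e)    # latent code z_mu
h_prev = torch.zeros(1, d_h)  # h_{t-1}

# h_t = GRU(ReLU(W(h_RNN concat z_mu) + b), h_{t-1})
h_t = gru_cell(torch.relu(W(torch.cat([h_rnn, z_mu], dim=-1))), h_prev)
print(h_t.shape)  # torch.Size([1, 16])
```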
“…In contrast, generation-based methods can generate more possibilities and potentially generalize better. Therefore, to allow a text agent to fully explore an environment and obtain the best performance, a generation-based method is needed (Yao et al., 2020). However, the combinatorial action space precludes reinforcement learning from working well with a generation-based policy network.…”
Section: Introduction
confidence: 99%