Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.411

Discriminative Nearest Neighbor Few-Shot Intent Detection by Transferring Natural Language Inference

Abstract: Intent detection is one of the core components of goal-oriented dialog systems, and detecting out-of-scope (OOS) intents is also a practically important skill. Few-shot learning is attracting much attention as a way to mitigate data scarcity, but it makes OOS detection even more challenging. In this paper, we present a simple yet effective approach: discriminative nearest neighbor classification with deep self-attention. Unlike softmax classifiers, we leverage BERT-style pairwise encoding to train a binary classifier that…
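
As a rough illustration of the pairwise nearest-neighbor idea sketched in the abstract, the snippet below scores a query against each labeled training example with a binary match classifier and rejects the query as OOS when even the best match scores low. The `toy_score` function and the threshold value are illustrative stand-ins, not the paper's BERT-based pairwise encoder.

```python
import numpy as np

def nn_predict(query, examples, labels, score_fn, oos_threshold=0.5):
    """Discriminative nearest-neighbor intent prediction with OOS rejection.

    score_fn stands in for a BERT-style pairwise encoder: any function
    mapping a (query, example) text pair to a match score in [0, 1].
    """
    scores = np.array([score_fn(query, ex) for ex in examples])
    best = int(np.argmax(scores))
    # If even the best-matching training example scores below the
    # threshold, treat the utterance as out-of-scope.
    if scores[best] < oos_threshold:
        return "OOS"
    return labels[best]

# Toy usage with a trivial lexical-overlap "classifier".
def toy_score(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

examples = ["book a flight to boston", "play some jazz music"]
labels = ["book_flight", "play_music"]
print(nn_predict("book me a flight", examples, labels, toy_score))
```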

Cited by 46 publications (67 citation statements). References 25 publications.

“…Stage 2 fine-tuning is inspired by metric-based meta-learning (Vinyals et al., 2016; Musgrave et al., 2020) and exemplar-based (also termed prototype-based) learning (Snell et al., 2017; Zhang et al., 2020), which is especially suited to few-shot scenarios. We assume the existence of $N_a$ annotated in-task examples; see Henderson et al. (2019a).…”
Section: Stage 2: Task-based Sentence Encoders
confidence: 99%
“…In addition to the standard few-shot learning evaluation, where the model is evaluated only on samples from the in-scope class distribution, a more realistic setting involves an Out-of-Scope (OOS) class, whose samples come from a different distribution, e.g., random utterances unrelated to any registered intent class in a dialogue. We adopt the OOS evaluation strategy (Zhang et al., 2020; Larson et al., 2019), which adds an extra OOS class at the meta-testing stage while the meta-training stage remains the same. A sample is assigned to the OOS class if the probabilistic prediction for the best class falls below a specified threshold $T \in (0, 1)$.…”
Section: Out-of-scope Evaluation
confidence: 99%
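
A minimal sketch of the thresholding rule described in the excerpt above: the prediction falls back to an extra OOS class whenever the top in-scope probability is below $T$. The function name and the choice of -1 as the OOS index are illustrative assumptions, not part of the cited protocol.

```python
import numpy as np

def predict_with_oos(class_probs, threshold):
    """Assign OOS when the top in-scope probability falls below threshold.

    class_probs: shape (num_classes,), a probability distribution over
    the registered intent classes from whatever few-shot classifier is
    in use. threshold: T in (0, 1), as in the evaluation protocol above.
    """
    best = int(np.argmax(class_probs))
    if class_probs[best] < threshold:
        return -1  # conventional index for the extra OOS class
    return best

# A confident prediction stays in scope; a flat one is rejected as OOS.
print(predict_with_oos(np.array([0.85, 0.10, 0.05]), threshold=0.5))  # 0
print(predict_with_oos(np.array([0.40, 0.35, 0.25]), threshold=0.5))  # -1
```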
“…In this study, we focus on cross-domain few-shot classification, with the goal of investigating whether we can meta-train a large pre-trained language model (e.g., BERT) in a semi-supervised fashion without access to a large amount of labeled data or many meta-training tasks. The resulting representation should generalize and adapt well to a new domain, and provide clear separation between in-domain and out-of-scope (OOS) examples (Zhang et al., 2020). Our base meta-learner consists of an embedding function (e.g., BERT) and ProtoNet (Snell et al., 2017) as the general supervised classifier, which can be fine-tuned either on supervised N-way K-shot classification tasks (supervised meta-training) or together with the self-supervised SMLMT tasks (semi-supervised meta-training).…”
Section: Introduction
confidence: 99%
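
For concreteness, here is a small NumPy sketch of the ProtoNet-style classifier the excerpt refers to: class prototypes are support-set mean embeddings, and a query is classified by a softmax over negative squared distances to the prototypes. The random vectors stand in for BERT sentence embeddings; this is a sketch of the general technique, not the cited authors' code.

```python
import numpy as np

def prototypes(support_emb, support_labels, num_classes):
    """Class prototypes: mean embedding of each class's support set."""
    return np.stack([
        support_emb[support_labels == c].mean(axis=0)
        for c in range(num_classes)
    ])

def protonet_probs(query_emb, protos):
    """Softmax over negative squared Euclidean distances to prototypes."""
    d2 = ((query_emb[None, :] - protos) ** 2).sum(axis=1)
    logits = -d2
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Toy 2-way 2-shot episode with 4-dim embeddings (random stand-ins).
rng = np.random.default_rng(0)
support = rng.normal(size=(4, 4))
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, num_classes=2)
print(protonet_probs(rng.normal(size=4), protos))
```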
“…Unlike Shridhar et al. (2021), during encoding we concatenate the initial encoding $h_{\mathrm{RNN}}$ and $z_\mu$ as an input to obtain $h_t$, namely $h_{\mathrm{RNN}} = \mathrm{Encode}(g, \tau_t)$ and $h_t = \mathrm{GRU}(\mathrm{ReLU}(W(h_{\mathrm{RNN}} \oplus z_\mu) + b),\, h_{t-1})$, where $\oplus$ denotes the concatenation operation, $W \in \mathbb{R}^{d_e \times 2d_e}$ is a weight matrix, $b \in \mathbb{R}^{d_e}$ is a bias vector, $h_{\mathrm{RNN}} \in \mathbb{R}^{d_e}$, $h_t \in \mathbb{R}^{d_h}$, $d_e$ is the dimension of $z_\mu$, $d_h$ is the dimension of $h_t$, GRU denotes a gated recurrent unit, and ReLU denotes a ReLU activation function. Compared to selecting text actions from a set of valid actions, generating text actions word by word is more likely to explore multiple ways of performing actions and thus achieve higher rewards (Yao et al., 2020). However, Shridhar et al. (2021) show that generation-based methods struggle to perform well when trained from a sparse reinforcement learning signal in ALFWorld.…”
Section: Execution Policy
confidence: 99%
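
A short PyTorch sketch of the encoding step in the excerpt above, assuming the stated dimensions; the weights are randomly initialized stand-ins for a trained model, and the `Encode(g, tau_t)` output is stubbed with a random vector.

```python
import torch
import torch.nn as nn

d_e, d_h = 8, 16  # embedding and hidden sizes (illustrative values)

# Components named after the excerpt's equation.
W = nn.Linear(2 * d_e, d_e)      # W in R^{d_e x 2d_e}, plus bias b
gru_cell = nn.GRUCell(d_e, d_h)  # one step of the GRU

h_rnn = torch.randn(1, d_e)   # h_RNN = Encode(g, tau_t), stubbed out
z_mu = torch.randn(1, d_e)    # latent code z_mu
h_prev = torch.zeros(1, d_h)  # h_{t-1}

# h_t = GRU(ReLU(W(h_RNN concat z_mu) + b), h_{t-1})
h_t = gru_cell(torch.relu(W(torch.cat([h_rnn, z_mu], dim=-1))), h_prev)
print(h_t.shape)  # torch.Size([1, 16])
```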
“…In contrast, generation-based methods can generate more possibilities and potentially generalize better. Therefore, to allow a text agent to fully explore an environment and obtain the best performance, a generation-based method is needed (Yao et al., 2020). However, the combinatorial action space precludes reinforcement learning from working well with a generation-based policy network.…”
Section: Introduction
confidence: 99%