Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
DOI: 10.18653/v1/2021.emnlp-main.88

ConvFiT: Conversational Fine-Tuning of Pretrained Language Models

Abstract: Transformer-based language models (LMs) pretrained on large text collections have been shown to store a wealth of semantic knowledge. However, 1) they are not effective as sentence encoders when used off-the-shelf, and 2) they thus typically lag behind conversationally pretrained encoders (e.g., pretrained via response selection) on conversational tasks such as intent detection (ID). In this work, we propose ConvFiT, a simple and efficient two-stage procedure which turns any pretrained LM into a universal conversational encoder (…)
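
The abstract only sketches the two-stage procedure, so a rough illustration may help. The sketch below shows what the second, task-tailored stage could look like if implemented with the sentence-transformers library: utterances that share an intent label are pulled together with an in-batch-negatives contrastive objective so the LM becomes a stronger sentence encoder for intent detection. The library choice, checkpoint name, hyperparameters, and the toy `labelled` data are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): ConvFiT-style task-tailored
# fine-tuning that adapts a pretrained LM into an intent-detection encoder
# by pulling together utterances with the same intent via in-batch negatives.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Toy few-shot data (utterance, intent); a real setup would use an ID dataset.
labelled = [
    ("set an alarm for 7 am", "alarm_set"),
    ("wake me up at seven", "alarm_set"),
    ("play some jazz", "play_music"),
    ("put on a jazz playlist", "play_music"),
]

# Build positive pairs from utterances sharing an intent label.
by_intent = {}
for text, intent in labelled:
    by_intent.setdefault(intent, []).append(text)
pairs = [
    InputExample(texts=[a, b])
    for utts in by_intent.values()
    for i, a in enumerate(utts)
    for b in utts[i + 1:]
]

# Any pretrained LM checkpoint can be wrapped as a sentence encoder;
# "bert-base-uncased" is only an example starting point.
encoder = SentenceTransformer("bert-base-uncased")
loader = DataLoader(pairs, shuffle=True, batch_size=2)
contrastive = losses.MultipleNegativesRankingLoss(encoder)  # in-batch negatives
encoder.fit(train_objectives=[(loader, contrastive)], epochs=1, warmup_steps=10)
```

After fine-tuning, `encoder.encode(...)` yields sentence vectors that can feed a simple similarity-based intent classifier rather than a trained classification head.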

Cited by 19 publications (22 citation statements). References 52 publications (34 reference statements).

Citation statements:
“…Other work on few-shot intent classification explores fine-tuning dialogue-specific LMs as classifiers as well as using similarity-based classifiers instead of MLP-based ones on top of BERT (Vulić et al., 2021). We believe that improvements brought by data augmentation would be complementary to the gains brought by these methods.…”
Section: GPT-3 Predictions (mentioning)
Confidence: 99%
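
As a purely illustrative companion to the statement above, a similarity-based intent classifier can be as simple as nearest-centroid matching over sentence embeddings, in contrast to training an MLP head on top of the encoder. The checkpoint name and toy data below are assumptions made for the sketch, not details from either paper.

```python
# Hypothetical nearest-centroid intent classifier over sentence embeddings,
# as an alternative to an MLP classification head.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder

train = {  # toy few-shot training utterances per intent
    "alarm_set": ["set an alarm for 7 am", "wake me up at seven"],
    "play_music": ["play some jazz", "put on a jazz playlist"],
}

# One L2-normalised centroid embedding per intent.
centroids = {}
for intent, utts in train.items():
    emb = encoder.encode(utts, normalize_embeddings=True).mean(axis=0)
    centroids[intent] = emb / np.linalg.norm(emb)

def classify(utterance: str) -> str:
    """Assign the intent whose centroid is most cosine-similar to the query."""
    q = encoder.encode([utterance], normalize_embeddings=True)[0]
    return max(centroids, key=lambda intent: float(q @ centroids[intent]))

print(classify("could you wake me at 6:30 tomorrow"))  # expected: "alarm_set"
```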
“…For example, formulated intent recognition as a sentence similarity task and pre-trained on natural language inference (NLI) datasets. Vulić et al. (2021); Zhang et al. (2021e) pre-trained with a contrastive loss on intent detection tasks. Our multi-task pre-training method is inspired from Zhang et al. (2021d), which leverages publicly available intent datasets and unlabeled data in the current domain for pre-training to improve the performance of few-shot intent detection.…”
Section: Related Work (mentioning)
Confidence: 99%
“…Recent advances in pre-trained language models have resulted in impressive performance on open-domain text generation, such as story completion (See et al., 2019; Yao et al., 2019; Fan et al., 2019; Ippolito et al., 2020), dialogue generation (Rashkin et al., 2019b; Zhang et al., 2020b; Li, 2020; Vulić et al., 2021), question generation (Cheng et al., 2021; Wang et al., 2021), and so on. For example, in dialogue generation, Zhang et al. (2020b) […] Despite the success of generative pre-trained language models on a series of open-ended text generation tasks, they still suffer in maintaining coherence throughout multiple sentences due to the left-to-right word-by-word generation style (Fan et al., 2019; …).…”
Section: Related Work (mentioning)
Confidence: 99%