Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.21
New Intent Discovery with Pre-training and Contrastive Learning

Abstract: New intent discovery aims to uncover novel intent categories from user utterances to expand the set of supported intent classes. It is a critical task for the development and service expansion of a practical dialogue system. Despite its importance, this problem remains underexplored in the literature. Existing approaches typically rely on a large amount of labeled utterances and employ pseudo-labeling methods for representation learning and clustering, which are label-intensive, inefficient, and inaccurate. In…

Cited by 12 publications (7 citation statements)
References 27 publications
“…15 to balance different losses. Moreover, we adopt random token replacement (Zhang et al. 2022) as data augmentation for second-view generation. For other general hyperparameters, the learning rate is set to 5e-5, training epochs are set to 80, and the batch size of labeled and unlabeled instances is set to 128 for all datasets equally.…”
Section: Implementation Details
confidence: 99%
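The random token replacement augmentation referenced in this statement can be sketched roughly as follows. This is a minimal Python illustration, not the cited implementation: the function name, replacement probability, and special-token IDs are assumptions.

```python
import random

def random_token_replacement(token_ids, vocab_size, replace_prob=0.25,
                             special_ids=(0, 101, 102)):
    # Hypothetical helper: build a second "view" of an utterance by replacing
    # a random subset of non-special tokens with random vocabulary tokens.
    # replace_prob and special_ids are illustrative assumptions, not values
    # taken from the cited implementation.
    augmented = list(token_ids)
    for i, tok in enumerate(augmented):
        if tok in special_ids:                 # keep [PAD]/[CLS]/[SEP]-style tokens
            continue
        if random.random() < replace_prob:
            augmented[i] = random.randrange(vocab_size)
    return augmented
```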
“…After pretraining, most existing works (Lin, Xu, and Zhang 2020; Zhang et al. 2021b, 2022) … For labeled data, we take the average of all instance embeddings belonging to the same category as labeled prototypes…”
Section: Learning Category Prototypes
confidence: 99%
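The labeled-prototype computation described in this statement (averaging all instance embeddings that share a category) could look like the following PyTorch sketch; the function name and tensor layout are illustrative assumptions.

```python
import torch

def labeled_prototypes(embeddings, labels, num_classes):
    # Hypothetical helper: one prototype per known intent class, computed as
    # the mean of all labeled instance embeddings assigned to that class.
    # embeddings: (N, d) float tensor; labels: (N,) long tensor in [0, num_classes).
    dim = embeddings.size(1)
    protos = torch.zeros(num_classes, dim, device=embeddings.device)
    counts = torch.zeros(num_classes, device=embeddings.device)
    protos.index_add_(0, labels, embeddings)
    counts.index_add_(0, labels, torch.ones(labels.size(0), device=embeddings.device))
    return protos / counts.clamp(min=1).unsqueeze(1)
```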
“…is the total number of categories. We presume prior knowledge of K following previous works (Zhang et al. 2021b, 2022) to make a fair comparison, and we tackle the problem of estimating this parameter in the experiment. Then we take the average of all instance embeddings belonging to the same cluster as unlabeled prototypes…”
Section: Learning Category Prototypes
confidence: 99%
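The unlabeled-prototype step could be sketched as below, assuming k-means (here via scikit-learn) produces the cluster assignments for a presumed-known K and prototypes are the per-cluster means; the clustering backend and helper name are assumptions, not details confirmed by the cited work.

```python
import torch
from sklearn.cluster import KMeans

def unlabeled_prototypes(embeddings, num_clusters):
    # Hypothetical helper: cluster unlabeled embeddings into K groups (K is
    # presumed known, as in the statement above) and average each cluster's
    # members to obtain unlabeled prototypes. k-means is one possible choice
    # of clustering backend, used here purely for illustration.
    feats = embeddings.detach().cpu().numpy()
    assignments = KMeans(n_clusters=num_clusters, n_init=10).fit_predict(feats)
    assignments = torch.as_tensor(assignments, device=embeddings.device)
    protos = torch.stack([
        embeddings[assignments == k].mean(dim=0)
        for k in range(num_clusters)
    ])
    return protos, assignments
```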
“…Thus, we propose a KCL loss to keep large inter-class variance and help downstream transfer. For OOD clustering, Zhang et al. (…, 2022) use k-means to learn cluster assignments but ignore jointly learning intent representations. Mou et al. (2022) use contrastive clustering, where the instance-level contrastive loss for learning intent features has a gap with the cluster-level loss for clustering.…”
Section: Related Work
confidence: 99%
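The instance-level contrastive loss discussed in this statement is typically an NT-Xent-style objective over two augmented views of each utterance; a minimal sketch follows, with the temperature and exact formulation assumed rather than taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def instance_contrastive_loss(z1, z2, temperature=0.07):
    # Hypothetical sketch of an NT-Xent-style instance-level contrastive loss:
    # two augmented views of the same utterance are positives, all other
    # in-batch embeddings are negatives. The temperature value is an assumption.
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                       # (2N, d)
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    n = z1.size(0)
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))      # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```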