Yuanmeng Yan scite author profile

Learning high-quality sentence representations benefits a wide range of natural language processing tasks. Although BERT-based pretrained language models achieve high performance on many downstream tasks, the sentence representations derived natively from them have been shown to collapse and thus perform poorly on semantic textual similarity (STS) tasks. In this paper, we present ConSERT, a Contrastive Framework for Self-Supervised SEntence Representation Transfer, which adopts contrastive learning to fine-tune BERT in an unsupervised and effective way. By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations and makes them more applicable to downstream tasks. Experiments on STS datasets demonstrate that ConSERT achieves an 8% relative improvement over the previous state-of-the-art, even comparable to the supervised SBERT-NLI. When further incorporating NLI supervision, we achieve new state-of-the-art performance on STS tasks. Moreover, ConSERT obtains comparable results with only 1000 samples available, showing its robustness in data scarcity scenarios.
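The abstract does not state which contrastive objective is used, so the following is only a minimal sketch of one common choice, the NT-Xent loss, assuming each sentence in a batch has been encoded twice under different augmentations. The function name `nt_xent_loss`, the `temperature` default, and the random arrays standing in for BERT sentence embeddings are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def nt_xent_loss(view_a, view_b, temperature=0.1):
    """NT-Xent loss for N sentences, each encoded under two augmentations.

    view_a, view_b: arrays of shape (N, d); row i of both is the same sentence.
    The temperature value is an assumed default, not a value from the source.
    """
    n = view_a.shape[0]
    z = np.concatenate([view_a, view_b], axis=0)        # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit length, so dot = cosine
    sim = z @ z.T / temperature                         # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # never contrast a row with itself

    # Row i's positive is the other view of the same sentence.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob = sim[np.arange(2 * n), positives] - log_norm
    return float(-log_prob.mean())

# Random vectors stand in for sentence embeddings (illustration only).
rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
aligned = nt_xent_loss(a, a + 0.01 * rng.normal(size=a.shape))
mismatched = nt_xent_loss(a, rng.normal(size=a.shape))
print(aligned < mismatched)
```

The loss is low when the two views of a sentence are closer to each other than to every other sentence in the batch, which is the property that counteracts collapsed representations.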

ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Yan, Wang³ et al. 2021 · Preprint

A Deep Generative Distance-Based Classifier for Out-of-Domain Detection with Mahalanobis Space

Yan et al. 2020

Detecting out-of-domain (OOD) input intents is critical in task-oriented dialog systems. Unlike most existing methods, which rely heavily on manually labeled OOD samples, we focus on the unsupervised OOD detection scenario, where only labeled in-domain data is available and there are no labeled OOD samples. In this paper, we propose a simple but strong generative distance-based classifier to detect OOD samples. We estimate the class-conditional distribution on the feature spaces of DNNs via Gaussian discriminant analysis (GDA) to avoid over-confidence problems. We use two distance functions, Euclidean and Mahalanobis distances, to measure the confidence score of whether a test sample is OOD. Experiments on four benchmark datasets show that our method consistently outperforms the baselines.
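As a rough illustration of the distance-based scoring described above, the sketch below fits class means and a single shared covariance (one common form of GDA, assumed here) and scores a sample by its smallest squared Mahalanobis distance to any in-domain class. The names `fit_gda` and `mahalanobis_ood_score` are hypothetical, and the 2-D synthetic points stand in for DNN features; none of this is taken from the authors' implementation.

```python
import numpy as np

def fit_gda(features, labels):
    """Estimate per-class means and a shared (tied) covariance from in-domain data."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Centre every sample on its own class mean before pooling the covariance.
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)
    precision = np.linalg.pinv(cov)   # pseudo-inverse tolerates a singular covariance
    return classes, means, precision

def mahalanobis_ood_score(x, means, precision):
    """Smallest squared Mahalanobis distance from x to any class mean.

    Larger values suggest the sample is further from every in-domain class.
    """
    diff = x[None, :] - means                                   # (C, d)
    d2 = np.einsum("cd,de,ce->c", diff, precision, diff)        # (C,)
    return float(d2.min())

# Synthetic 2-D "features" for two in-domain intents (illustration only).
rng = np.random.default_rng(0)
feats = np.concatenate([
    rng.normal(loc=[0.0, 0.0], size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], size=(50, 2)),
])
labels = np.array([0] * 50 + [1] * 50)
classes, means, precision = fit_gda(feats, labels)

near = mahalanobis_ood_score(np.array([0.1, -0.1]), means, precision)
far = mahalanobis_ood_score(np.array([20.0, -20.0]), means, precision)
print(near < far)
```

In practice the score would be compared against a threshold chosen on held-out data; replacing `precision` with the identity matrix gives the Euclidean variant mentioned in the abstract.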

Modeling Discriminative Representations for Out-of-Domain Detection with Supervised Contrastive Learning

Zeng¹, He², Yan³ et al. 2021

Detecting Out-of-Domain (OOD) or unknown intents from user queries is essential in a task-oriented dialog system. A key challenge of OOD detection is to learn discriminative semantic features. Traditional cross-entropy loss only focuses on whether a sample is correctly classified, and does not explicitly distinguish the margins between categories. In this paper, we propose a supervised contrastive learning objective that minimizes intra-class variance by pulling together in-domain intents belonging to the same class and maximizes inter-class variance by pushing apart samples from different classes. In addition, we employ an adversarial augmentation mechanism to obtain pseudo diverse views of a sample in the latent space. Experiments on two public datasets demonstrate the effectiveness of our method in capturing discriminative representations for OOD detection.
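The pull-together and push-apart behaviour described above can be sketched with a generic supervised contrastive loss. This is a minimal illustration under assumptions: the function name `supervised_contrastive_loss`, the `temperature` default, and the toy 3-D features are hypothetical, the adversarial augmentation step is omitted, and the exact form of the authors' objective is not given in the abstract.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch.

    Assumes at least one class appears twice in the batch, so that some
    anchor has a positive. features: (N, d); labels: (N,).
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax over all other samples in the batch (self excluded).
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_norm = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_norm

    # Positives: other samples that share the anchor's label.
    positive = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positive.sum(axis=1)
    has_pos = pos_count > 0
    pos_log_prob = np.where(positive, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())

# Toy features: two tight clusters, one per intent class (illustration only).
rng = np.random.default_rng(0)
centers = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
feats = centers[labels] + 0.05 * rng.normal(size=(6, 3))

matched = supervised_contrastive_loss(feats, labels)
scrambled = supervised_contrastive_loss(feats, np.array([0, 1, 0, 1, 0, 1]))
print(matched < scrambled)
```

The loss is smaller when samples with the same label sit close together and samples with different labels sit far apart, which is the margin behaviour that plain cross-entropy does not enforce explicitly.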

Contrastive Zero-Shot Learning for Cross-Domain Slot Filling with Adversarial Attack

He¹, Zhang², Yan³ et al. 2020

Zero-shot slot filling has emerged as a common way to cope with data scarcity in target domains. However, previous approaches often ignore the constraints between slot value representations and the related slot description representations in the latent space, and they lack sufficient model robustness. In this paper, we propose a Contrastive Zero-Shot Learning with Adversarial Attack (CZSL-Adv) method for cross-domain slot filling. The contrastive loss aims to map slot value contextual representations to the corresponding slot description representations. We also introduce an adversarial attack training strategy to improve model robustness. Experimental results show that our model significantly outperforms state-of-the-art baselines under both zero-shot and few-shot settings.
