Learning high-quality sentence representations benefits a wide range of natural language processing tasks. Although BERT-based pretrained language models achieve high performance on many downstream tasks, the natively derived sentence representations have been shown to collapse, yielding poor performance on semantic textual similarity (STS) tasks. In this paper, we present ConSERT, a Contrastive Framework for Self-Supervised SEntence Representation Transfer, which adopts contrastive learning to fine-tune BERT in an unsupervised and effective way. By making use of unlabeled texts, ConSERT solves the collapse issue of BERT-derived sentence representations and makes them more applicable to downstream tasks. Experiments on STS datasets demonstrate that ConSERT achieves an 8% relative improvement over the previous state-of-the-art, even comparable to the supervised SBERT-NLI. When further incorporating NLI supervision, we achieve new state-of-the-art performance on the STS tasks. Moreover, ConSERT obtains comparable results with only 1000 training samples available, showing its robustness in data-scarcity scenarios.
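As a rough illustration of the unsupervised contrastive fine-tuning described above, the sketch below shows an NT-Xent-style loss over two augmented views of the same batch of sentence embeddings. The encoder, the choice of augmentations, and the temperature are illustrative assumptions, not ConSERT's exact configuration.

```python
# Minimal sketch: contrastive (NT-Xent) loss over two views of the same sentences.
# Hyperparameters and the random embeddings below are placeholders, not the paper's setup.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two augmented views of the same sentences."""
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=-1)            # (2B, d)
    sim = z @ z.t() / temperature                                   # cosine similarities
    self_mask = torch.eye(2 * batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))                 # exclude self-pairs
    # For anchor i, its positive is the other view of the same sentence (i + B or i - B).
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Usage: embeddings produced by two stochastic augmentations of one batch of sentences.
z1 = torch.randn(8, 768, requires_grad=True)
z2 = torch.randn(8, 768, requires_grad=True)
nt_xent_loss(z1, z2).backward()
```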
Detecting out-of-domain (OOD) input intents is critical in task-oriented dialog systems. Unlike most existing methods, which rely heavily on manually labeled OOD samples, we focus on the unsupervised OOD detection scenario, where no labeled OOD samples are available apart from labeled in-domain data. In this paper, we propose a simple but strong generative distance-based classifier to detect OOD samples. We estimate the class-conditional distributions on the feature spaces of DNNs via Gaussian discriminant analysis (GDA) to avoid the over-confidence problem, and we use two distance functions, the Euclidean and Mahalanobis distances, to measure the confidence that a test sample is OOD. Experiments on four benchmark datasets show that our method consistently outperforms the baselines.
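The sketch below illustrates the generative distance-based scoring described above: fit one Gaussian per in-domain class with a shared covariance (GDA) on encoder features, then score a test feature by its minimum Mahalanobis distance to the class means. The function names, the tied-covariance estimate, and the thresholding note are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch: GDA fit on in-domain features + Mahalanobis-distance OOD score.
import numpy as np

def fit_gda(features: np.ndarray, labels: np.ndarray):
    """features: (N, d) in-domain features; labels: (N,) class ids. Returns class means and shared precision."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)            # tied (shared) covariance
    precision = np.linalg.pinv(cov)
    return means, precision

def mahalanobis_score(x: np.ndarray, means: np.ndarray, precision: np.ndarray) -> float:
    """Confidence score: negative minimum Mahalanobis distance to any class mean."""
    diffs = means - x                                       # (C, d)
    dists = np.einsum("cd,de,ce->c", diffs, precision, diffs)
    return -dists.min()

# Usage: a test sample whose score falls below a validation-chosen threshold is flagged as OOD.
feats, labels = np.random.randn(200, 64), np.random.randint(0, 5, 200)
means, precision = fit_gda(feats, labels)
score = mahalanobis_score(np.random.randn(64), means, precision)
```

Replacing the quadratic form with a squared Euclidean distance to the class means gives the second distance function mentioned in the abstract.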
Detecting Out-of-Domain (OOD) or unknown intents from user queries is essential in a task-oriented dialog system. A key challenge of OOD detection is to learn discriminative semantic features. Traditional cross-entropy loss only focuses on whether a sample is correctly classified and does not explicitly distinguish the margins between categories. In this paper, we propose a supervised contrastive learning objective to minimize intra-class variance by pulling together in-domain intents belonging to the same class and to maximize inter-class variance by pushing apart samples from different classes. Besides, we employ an adversarial augmentation mechanism to obtain pseudo-diverse views of a sample in the latent space. Experiments on two public datasets demonstrate the effectiveness of our method in capturing discriminative representations for OOD detection.
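A minimal sketch of a supervised contrastive objective of the kind described above is given below: intent representations sharing a label are pulled together, while samples from different classes are pushed apart. The temperature, normalization, and handling of anchors without positives are illustrative assumptions.

```python
# Minimal sketch: supervised contrastive loss over labeled in-domain intent representations.
import torch
import torch.nn.functional as F

def sup_con_loss(features: torch.Tensor, labels: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """features: (B, d) intent representations; labels: (B,) intent class ids."""
    z = F.normalize(features, dim=-1)
    sim = z @ z.t() / temperature                                   # (B, B) similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Log-probability of each candidate, with the anchor itself excluded from the denominator.
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1)
    loss_per_anchor = -(log_prob * pos_mask).sum(dim=1) / pos_counts.clamp(min=1)
    return loss_per_anchor[pos_counts > 0].mean()                   # skip anchors with no positive

# Usage
feats = torch.randn(16, 768, requires_grad=True)
labels = torch.randint(0, 4, (16,))
sup_con_loss(feats, labels).backward()
```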
Zero-shot slot filling has been widely adopted to cope with data scarcity in target domains. However, previous approaches often ignore constraints between slot value representations and the related slot description representations in the latent space, and they lack sufficient model robustness. In this paper, we propose a Contrastive Zero-Shot Learning with Adversarial Attack (CZSL-Adv) method for cross-domain slot filling. The contrastive loss aims to map slot value contextual representations to the corresponding slot description representations, and we introduce an adversarial attack training strategy to improve model robustness. Experimental results show that our model significantly outperforms state-of-the-art baselines under both zero-shot and few-shot settings.
Neural-based context-aware models for slot tagging have achieved state-of-the-art performance. However, the presence of out-of-vocabulary (OOV) words significantly degrades the performance of neural-based models, especially in few-shot scenarios. In this paper, we propose a novel knowledge-enhanced slot tagging model to integrate the contextual representation of the input text with large-scale lexical background knowledge. Besides, we use multi-level graph attention to explicitly model lexical relations. The experiments show that our proposed knowledge integration mechanism achieves consistent improvements across settings with different sizes of training data on two public benchmark datasets.
Detecting out-of-domain (OOD) intents is crucial for deployed task-oriented dialogue systems. Previous unsupervised OOD detection methods only extract discriminative features of different in-domain intents, while supervised counterparts can directly distinguish OOD from in-domain intents but require extensive labeled OOD data. To combine the benefits of both types, we propose a self-supervised contrastive learning framework to model discriminative semantic features of both in-domain and OOD intents from unlabeled data. Besides, we introduce an adversarial augmentation neural module to improve the efficiency and robustness of contrastive learning. Experiments on two public benchmark datasets show that our method consistently outperforms the baselines by a statistically significant margin.
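One common way to realize adversarial augmentation in the latent space is to perturb an embedding along the gradient of the loss (FGSM-style) and use the perturbed copy as a second view for the contrastive objective; whether this matches the paper's module is an assumption, and the sketch below only illustrates the general idea. The epsilon, normalization, and placeholder loss are hypothetical.

```python
# Minimal sketch: gradient-based adversarial perturbation used as an extra "view" for contrastive learning.
import torch

def adversarial_view(embeddings: torch.Tensor, loss: torch.Tensor, epsilon: float = 1e-2) -> torch.Tensor:
    """Return an adversarially perturbed copy of `embeddings` in the latent space."""
    grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
    perturbation = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-12)
    return (embeddings + perturbation).detach()

# Usage with any differentiable loss computed from the embeddings.
emb = torch.randn(8, 768, requires_grad=True)
loss = emb.pow(2).sum()                      # stand-in for the model's contrastive loss
adv_emb = adversarial_view(emb, loss)        # second view fed to the contrastive objective
```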
Recently, conversational agents have effectively improved their understanding capabilities through neural networks. Such deep neural models, however, do not apply to most human languages due to the lack of annotated training data for various NLP tasks. In this paper, we propose a multi-level cross-lingual transfer model with language-shared and language-specific knowledge to improve the spoken language understanding of low-resource languages. Our method explicitly separates the model into a language-shared part and a language-specific part to transfer cross-lingual knowledge and improve monolingual slot tagging, especially for low-resource languages. To refine the shared knowledge, we add a language discriminator and employ adversarial training to reinforce information separation. Besides, we adopt novel multi-level knowledge transfer in an incremental and progressive way to acquire multi-granularity shared knowledge rather than relying on a single layer. To mitigate the discrepancies between the feature distributions of language-specific and shared knowledge, we propose neural adapters to fuse knowledge automatically. Experiments show that our proposed model consistently outperforms the monolingual baseline by a statistically significant margin of up to 2.09%, with an even higher improvement of 12.21% in the zero-shot setting.
INDEX TERMS Spoken language understanding, cross-lingual learning, linguistic knowledge transfer, adversarial learning, multi-level knowledge representation.
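The adversarial separation described above is commonly implemented with a gradient-reversal layer feeding shared-encoder features into a language discriminator, so the shared encoder is pushed toward language-invariant features while the discriminator tries to tell languages apart. The sketch below assumes this standard construction; the layer sizes, reversal weight, and module names are illustrative, not the paper's exact architecture.

```python
# Minimal sketch: language discriminator behind a gradient-reversal layer (DANN-style adversarial training).
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the shared encoder.
        return -ctx.lambd * grad_output, None

class LanguageDiscriminator(nn.Module):
    def __init__(self, hidden_dim: int, num_languages: int, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                        nn.Linear(hidden_dim, num_languages))

    def forward(self, shared_features: torch.Tensor) -> torch.Tensor:
        reversed_features = GradReverse.apply(shared_features, self.lambd)
        return self.classifier(reversed_features)

# Usage: the discriminator loss is added to the slot-tagging loss; reversed gradients
# encourage the shared encoder to produce language-indistinguishable features.
disc = LanguageDiscriminator(hidden_dim=256, num_languages=3)
feats = torch.randn(4, 256, requires_grad=True)
logits = disc(feats)
nn.functional.cross_entropy(logits, torch.tensor([0, 1, 2, 0])).backward()
```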