Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)
DOI: 10.18653/v1/2021.acl-long.393
ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Abstract: Learning high-quality sentence representations benefits a wide range of natural language processing tasks. Though BERT-based pretrained language models achieve high performance on many downstream tasks, the natively derived sentence representations have been shown to collapse and thus perform poorly on semantic textual similarity (STS) tasks. In this paper, we present ConSERT, a Contrastive Framework for Self-Supervised SEntence Representation Transfer, which adopts contrastive learning to fine-tune…
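The contrastive objective used in frameworks of this kind is commonly an NT-Xent style loss over in-batch negatives. The sketch below is a minimal PyTorch illustration under that assumption, not the paper's exact implementation; the temperature value and the pair-per-row batch layout are illustrative.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """NT-Xent (normalized temperature-scaled cross entropy) over in-batch negatives.

    z1, z2: [batch, dim] sentence embeddings of two views of the same sentences.
    Row i of z1 and row i of z2 form a positive pair; all other rows act as negatives.
    """
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    # Cosine similarity between every view in z1 and every view in z2.
    sim = z1 @ z2.t() / temperature                      # [batch, batch]
    labels = torch.arange(z1.size(0), device=z1.device)  # positives sit on the diagonal
    # Symmetric cross-entropy: each view should pick out its own positive.
    return (F.cross_entropy(sim, labels) + F.cross_entropy(sim.t(), labels)) / 2
```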

Cited by 265 publications (181 citation statements)
References 23 publications
“…Pagliardini et al. (2018) show that simply augmenting the idea of word2vec (Mikolov et al., 2013) with n-gram embeddings leads to strong results. Several recent (and concurrent) approaches adopt contrastive objectives (Zhang et al., 2020; Giorgi et al., 2021; Meng et al., 2021; Carlsson et al., 2021; Kim et al., 2021; Yan et al., 2021) by taking different views (from data augmentation or different copies of models) of the same sentence or document. Compared to these works, SimCSE uses the simplest idea, taking different outputs of the same sentence from standard dropout, and performs the best on STS tasks.…”
Section: Related Work (mentioning)
confidence: 99%
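The dropout-as-augmentation idea attributed to SimCSE above can be sketched as two forward passes of the same batch with dropout active. The checkpoint names and [CLS] pooling choice below are illustrative assumptions, not the cited papers' exact setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative checkpoint; any BERT-style encoder would work the same way.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")
enc.train()  # keep dropout active so the two passes differ

sentences = ["a man is playing a guitar", "the weather is nice today"]
batch = tok(sentences, padding=True, return_tensors="pt")

# Two forward passes of the *same* batch: dropout noise turns them into two "views".
with torch.no_grad():  # no_grad only for this illustration; training would keep gradients
    z1 = enc(**batch).last_hidden_state[:, 0]  # [CLS] pooling (an assumption)
    z2 = enc(**batch).last_hidden_state[:, 0]

# z1[i] and z2[i] form a positive pair; the other sentences in the batch serve as
# negatives and can be fed to an NT-Xent-style loss such as the one sketched earlier.
```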
“…CERT (Fang and Xie, 2020) mainly uses back-and-forth translation, while CLINE proposes synonym substitution to build positive samples and antonym substitution to build negative samples, and then minimizes a triplet loss over the positive and negative cases as well as the original text. ConSERT (Yan et al., 2021) uses adversarial attack, token shuffling, cutoff, and dropout as data augmentation. CLAE (Ho and Vasconcelos, 2020) also introduces the Fast Gradient Sign Method, an adversarial attack method, as text data augmentation.…”
Section: Related Work (mentioning)
confidence: 99%
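Two of the ConSERT augmentations named above, token shuffling and cutoff, operate on the embedding layer rather than on raw text. The sketch below shows one plausible way to implement them on a token-embedding tensor; the paper's exact formulation may differ, and the cutoff rate is an illustrative value.

```python
import torch

def token_shuffle(embeddings: torch.Tensor) -> torch.Tensor:
    """Randomly permute the token order of each sequence.

    embeddings: [batch, seq_len, hidden] token embeddings fed to the encoder.
    """
    batch, seq_len, _ = embeddings.shape
    perm = torch.stack([torch.randperm(seq_len, device=embeddings.device)
                        for _ in range(batch)])              # [batch, seq_len]
    index = perm.unsqueeze(-1).expand_as(embeddings)          # broadcast over hidden dim
    return torch.gather(embeddings, 1, index)

def token_cutoff(embeddings: torch.Tensor, rate: float = 0.15) -> torch.Tensor:
    """Zero out a random subset of token positions (token-level cutoff)."""
    batch, seq_len, _ = embeddings.shape
    keep = (torch.rand(batch, seq_len, 1, device=embeddings.device) > rate).float()
    return embeddings * keep
```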
“…Specifically, Reimers and Gurevych (2019) mainly use a classification objective on an NLI dataset, and adopt contrastive learning to utilize self-supervision from a large corpus. Yan et al. (2021) and Gao et al. (2021) incorporate a parallel corpus such as NLI datasets into their contrastive learning frameworks.…”
Section: Semantic Textual Similarity (mentioning)
confidence: 99%
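Incorporating a parallel corpus such as NLI, as described in the statement above, typically means treating premise–hypothesis pairs labeled as entailment as ready-made positive pairs for the contrastive objective. A minimal sketch of that pairing step, with illustrative examples rather than real dataset records:

```python
# Illustrative NLI-style records; real corpora such as SNLI/MNLI share this structure.
nli_examples = [
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A person is making music.", "label": "entailment"},
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A man is sleeping.", "label": "contradiction"},
]

# Entailment pairs become positives for contrastive fine-tuning;
# contradiction hypotheses can additionally serve as hard negatives.
positive_pairs = [(ex["premise"], ex["hypothesis"])
                  for ex in nli_examples if ex["label"] == "entailment"]
hard_negatives = [(ex["premise"], ex["hypothesis"])
                  for ex in nli_examples if ex["label"] == "contradiction"]
```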