2020
DOI: 10.48550/arxiv.2005.12766
Preprint

CERT: Contrastive Self-supervised Learning for Language Understanding

Abstract: Pretrained language models such as BERT and GPT have shown great effectiveness in language understanding. The auxiliary predictive tasks in existing pretraining approaches are mostly defined on tokens and thus may not capture sentence-level semantics very well. To address this issue, we propose CERT: Contrastive self-supervised Encoder Representations from Transformers, which pretrains language representation models using contrastive self-supervised learning at the sentence level. CERT creates augmentation…
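
The abstract is cut off above, so the exact CERT training recipe is not visible here. As a rough illustration of sentence-level contrastive pretraining with augmented positives (e.g., back-translated paraphrases), a minimal PyTorch sketch is shown below; the model name, the in-batch NT-Xent loss, and the [CLS] pooling are illustrative assumptions, not necessarily CERT's actual choices.

```python
# Illustrative sketch only: the model name, pooling choice and NT-Xent loss
# are assumptions, not necessarily the configuration used by CERT.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

def nt_xent_loss(z1, z2, temperature=0.1):
    """In-batch contrastive loss: pull each sentence toward its augmented
    view and push it away from every other sentence in the batch."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))        # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    # Take the [CLS] hidden state as the sentence representation (one common choice).
    return encoder(**batch).last_hidden_state[:, 0]

originals = ["The cat sat on the mat.", "He quickly read the report."]
# Assumed to come from a back-translation pipeline (e.g., en -> de -> en).
paraphrases = ["The cat was sitting on the mat.", "He read the report quickly."]

loss = nt_xent_loss(embed(originals), embed(paraphrases))
loss.backward()  # gradients flow into the encoder, tuning sentence-level semantics
```

In this setup every other sentence in the batch serves as a negative, which is the simplest way to realize the push/pull behavior described in the citation statements below.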

Cited by 100 publications (122 citation statements)
References 28 publications
“…Common approaches consider sentences within the same context as semantically similar samples (Kiros et al., 2015; Logeswaran & Lee, 2018). To create positive training pairs with augmented samples, a diverse set of text augmentation operations have been explored, including lexicon-based distortion (Wei & Zou, 2019), synonym replacement (Kobayashi, 2018), back-translation (Fang & Xie, 2020), cut-off (Shen et al., 2020) and dropout (Gao et al., 2021). However, unsupervised sentence embedding models still perform notably worse than supervised sentence encoders.…”
Section: Related Work
confidence: 99%
“…Contrastive learning was introduced in computer vision by Wu et al. (2018), followed by several modifications to improve the training (He et al., 2020; Caron et al., 2020). In the context of natural language processing, Fang et al. (2020) proposed to apply MoCo where positive pairs of sentences are obtained using back-translation. Different works augmented the masked language modeling objective with a contrastive loss (Giorgi et al., 2020; Meng et al., 2021).…”
Section: Related Work
confidence: 99%
“…For the image modality, such tasks include predicting artificial rotations [13], colourisation [41,42] and feature clustering [4]. Recently, Contrastive Learning [16] has become increasingly popular for learning visual [9,19], audio [6,28] and natural language [11] representations. The method pushes positive pairs' embeddings closer together while pulling negative pairs' embeddings further apart.…”
Section: Related Work
confidence: 99%
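
The push/pull behavior described in this last excerpt is commonly formalized as an InfoNCE-style objective; the generic form below uses assumed notation rather than that of any particular cited paper:

$$
\mathcal{L}_i = -\log \frac{\exp\big(\mathrm{sim}(z_i, z_i^{+})/\tau\big)}{\sum_{j \ne i} \exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}
$$

where $z_i^{+}$ is the embedding of a positive (augmented or contextually related) sample for $z_i$, $\mathrm{sim}(\cdot,\cdot)$ is usually cosine similarity, and $\tau$ is a temperature; minimizing this loss raises similarity to the positive while lowering similarity to the negatives in the sum.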