2022
DOI: 10.1007/978-3-030-97546-3_59

SimSCL: A Simple Fully-Supervised Contrastive Learning Framework for Text Representation

Abstract: HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Cited by 3 publications (1 citation statement)
References 19 publications (19 reference statements)
“…Following common practice in contrastive learning, we first study the importance of adding a projection head that maps representations to a new space where the supervised contrastive loss is applied. Similar to [9], [42], we tested three different MLP architectures: (1) identity mapping; (2) linear projection $z = g(h) = W^{(1)} h \in \mathbb{R}^{512}$; (3) non-linear projection with one additional hidden layer, as used by several previous approaches, $z = g(h) = W^{(2)} \mathrm{ReLU}(W^{(1)} h) \in \mathbb{R}^{512}$. Similar to what was found in previous works, we observe that a non-linear architecture is better than both the linear and the identity functions for the projection head network (see Table 2).…”
Section: Classification Accuracy
confidence: 99%
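For illustration, here is a minimal PyTorch sketch of the three projection-head variants compared in the statement above. The encoder width of 768, the hidden width, and the bias-free layers are assumptions for the sake of a runnable example; the quote only fixes the 512-dimensional output space.

import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    """Maps encoder representations h to the space where the
    supervised contrastive loss is applied."""
    def __init__(self, in_dim: int, out_dim: int = 512, kind: str = "nonlinear"):
        super().__init__()
        if kind == "identity":
            # (1) identity mapping: z = h
            self.g = nn.Identity()
        elif kind == "linear":
            # (2) linear projection: z = W^(1) h
            self.g = nn.Linear(in_dim, out_dim, bias=False)
        elif kind == "nonlinear":
            # (3) one hidden layer: z = W^(2) ReLU(W^(1) h)
            self.g = nn.Sequential(
                nn.Linear(in_dim, in_dim, bias=False),
                nn.ReLU(),
                nn.Linear(in_dim, out_dim, bias=False),
            )
        else:
            raise ValueError(f"unknown projection head kind: {kind}")

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.g(h)

# Usage: project a batch of (hypothetical) 768-d encoder outputs to 512-d.
h = torch.randn(32, 768)
z = ProjectionHead(768, kind="nonlinear")(h)
print(z.shape)  # torch.Size([32, 512])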