“…Nevertheless, we focus on unsupervised contrastive learning and form positive pairs via data augmentation, since such methods are more cost-effective and applicable across different domains and languages. Along this line, many approaches have been developed recently, where the augmentations are obtained via sampling from surrounding or nearby contexts (Logeswaran and Lee, 2018; Giorgi et al., 2020), word- or feature-level perturbation (Yan et al., 2021), back-translation (Fang and Xie, 2020), sentence-level corruption using an auxiliary language model (Meng et al., 2021), intermediate representations of BERT (Kim et al., 2021), and dropout (Yan et al., 2021; Gao et al., 2021).…”
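To make the dropout-based augmentation concrete, the following is a minimal NumPy sketch, not the actual method of any cited paper: encoding the same input twice through an encoder with independent dropout masks yields two slightly different embeddings that serve as a positive pair, while a different input serves as a negative. The `encode` function and its fixed random projection are toy stand-ins for a real sentence encoder.

```python
import numpy as np

# Fixed "learned" projection, shared across all encoder calls (toy stand-in).
W = np.random.default_rng(0).standard_normal((4, 8))
rng = np.random.default_rng(42)

def encode(x, p=0.5):
    """Toy encoder: fixed projection followed by dropout.

    Because the dropout mask is resampled on every call, two passes over
    the SAME input produce two different views of it (hypothetical setup).
    """
    h = x @ W
    mask = (rng.random(h.shape) > p).astype(float)
    return h * mask / (1.0 - p)  # inverted-dropout scaling

def cos(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

x = np.ones(4)                    # one "sentence"
z1, z2 = encode(x), encode(x)     # two dropout views -> positive pair
y = -np.ones(4)                   # a different "sentence" -> negative
z3 = encode(y)

sim_pos, sim_neg = cos(z1, z2), cos(z1, z3)
```

In a full contrastive objective (e.g. InfoNCE), `sim_pos` would be pulled up and `sim_neg` pushed down over a batch of such pairs; the sketch only illustrates how the two views of a positive pair arise from dropout noise alone, with no discrete text edits.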