“…Unlike knowledge distillation (Hinton et al., 2015), mutual learning does not require a powerful teacher network, which is not always available. Mutual learning was first proposed to leverage information from multiple models and to enable effective dual knowledge transfer in image processing tasks (Zhang et al., 2018; Zhao et al., 2021).

Contrastive learning. Contrastive learning aims to learn example representations by minimizing the distance between positive pairs in the vector space and maximizing the distance between negative pairs (Saunshi et al., 2019; Liang et al., 2022; Liu et al., 2022a). It was first proposed in the field of computer vision (Chopra et al., 2005; Schroff et al., 2015; Sohn, 2016; Chen et al., 2020a; Wang and Liu, 2021). In the NLP area, contrastive learning has been applied to learning sentence embeddings (Giorgi et al., 2021; Yan et al., 2021), machine translation (Pan et al., 2021; Ye et al., 2022), and summarization (Cao and Wang, 2021).…”
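The pull-together/push-apart objective described above is commonly instantiated as an InfoNCE-style loss. The sketch below is a minimal stand-alone illustration, not the method of any cited paper: the function name `info_nce`, the temperature value, and the toy vectors are all assumptions chosen for clarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor.

    The loss is low when the anchor is close to its positive and far
    from all negatives; minimizing it pulls positive pairs together
    and pushes negative pairs apart in the embedding space.
    """
    # Similarity of the anchor to the positive and to each negative,
    # scaled by a temperature hyperparameter.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]

    # Cross-entropy with the positive as the target class
    # (softmax computed with the usual max-shift for stability).
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return -math.log(exps[0] / sum(exps))


# A well-aligned positive yields a much lower loss than a misaligned one.
loss_aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
loss_misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In practice the anchor/positive pair is produced by two augmented views (or two models') encodings of the same example, and the negatives are other examples in the batch.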