“…There is growing interest in developing protein language models (pLMs) at the scale of evolution, given the abundance of 1D amino acid sequences. Examples trained in masked language modeling (MLM) fashion include the ESM series (Rives et al., 2019; Lin et al., 2022), TAPE (Rao et al., 2019), ProtTrans (Elnaggar et al., 2021), PRoBERTa (Nambiar et al., 2020), PMLM (He et al., 2021), ProteinLM (Xiao et al., 2021), PLUS (Min et al., 2021), Adversarial MLM (McDermott et al., 2021), ProteinBERT (Brandes et al., 2022), and CARP (Yang et al., 2022a); ProtGPT2 (Ferruz et al., 2022) is trained in causal language modeling fashion, and several others follow related objectives (Melnyk et al., 2022a; Madani et al., 2021; Unsal et al., 2022; Nourani et al., 2021; Lu et al., 2020; Sturmfels et al., 2020; Strodthoff et al., 2020). These protein language models generalize across a wide range of downstream applications and can capture evolutionary information about secondary and tertiary structure from sequences alone.…”
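To make the MLM objective mentioned above concrete, the following is a minimal sketch of the sequence-corruption step used in BERT-style pretraining on amino acid sequences: a random subset of residues is replaced with a mask token, and the model is trained to recover the originals. The 15% mask rate, the token names, and the example sequence are illustrative assumptions, not details from any specific pLM.

```python
import random

# The 20 standard amino acids; real pLM vocabularies also include
# special tokens (padding, unknown residues, etc.) -- omitted here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # illustrative mask token name

def mask_sequence(seq, mask_rate=0.15, rng=None):
    """Corrupt a protein sequence for MLM pretraining.

    Returns the corrupted token list and the (position, original residue)
    pairs the model would be trained to predict.
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    tokens, targets = [], []
    for i, aa in enumerate(seq):
        if rng.random() < mask_rate:
            tokens.append(MASK)        # hide this residue from the model
            targets.append((i, aa))    # ...and make it a prediction target
        else:
            tokens.append(aa)
    return tokens, targets

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary example sequence
tokens, targets = mask_sequence(seq)
```

A causal-LM objective (as in ProtGPT2) would instead leave the sequence intact and predict each residue from its left context only; the masking step above is what distinguishes the MLM family of models listed in the text.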