“…Also, deep language models such as BERT 91 and ELMo 46 were originally developed for NLP and later employed for protein representations 23,28 . Furthermore, Convolutional Neural Networks (CNNs), which can learn to summarize the data with adaptive filters, have been employed to represent proteins 23,63,86,102,103 . Additionally, architectures capable of inferring patterns from sequential data (e.g., protein sequences), including Long Short-Term Memory (LSTM) neural networks 23,28,44,104,105 and attention-based 23,55 transformer algorithms 106 , are used in representation methods.…”
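To make the CNN idea above concrete, the sketch below shows how adaptive filters can summarize a variable-length protein sequence into a fixed-length vector: residues are one-hot encoded, each filter is slid along the sequence, and global max pooling collapses the positional axis. This is a minimal illustration, not any of the cited methods; the filters here are random rather than learned, and the sequence `"MKTAYIAKQR"` is an arbitrary example.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot(seq):
    """Encode a protein sequence as a (length, 20) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AA_INDEX[aa]] = 1.0
    return x

def conv1d_embed(x, filters, kernel_size=3):
    """Slide each filter over the sequence, then max-pool over positions.

    Returns a vector with one entry per filter, so the output size is
    independent of the input sequence length.
    """
    n_pos = x.shape[0] - kernel_size + 1
    # Feature map: one response per (filter, sequence position) pair.
    fmap = np.array([
        [np.sum(x[p:p + kernel_size] * f) for p in range(n_pos)]
        for f in filters
    ])
    return fmap.max(axis=1)  # global max pooling

# Random (untrained) filters stand in for learned ones in this sketch.
rng = np.random.default_rng(0)
n_filters, kernel_size = 8, 3
filters = rng.standard_normal((n_filters, kernel_size, len(AMINO_ACIDS)))

emb = conv1d_embed(one_hot("MKTAYIAKQR"), filters, kernel_size)
print(emb.shape)  # (8,): one value per filter, regardless of sequence length
```

In a trained model the filters would be fitted by backpropagation so that each one responds to an informative local motif; the pooling step is what makes the representation usable for proteins of differing lengths.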