“…Machine learning methods have been developed to identify complex relationships and patterns in large-scale DNA sequences (including MPRA data) that may not be apparent through conventional statistical methods. For example, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used to capture local dependencies in DNA sequences and/or genomic features and to predict binding affinities 1,31,32, chromatin features 33,34, DNA methylation 35,36, RNA-binding protein (RBP) binding 37,38,39, and gene expression levels 40. Transformers, by contrast, are a type of neural network architecture that has gained popularity in recent years for its ability to process sequential data, such as text and speech, more efficiently and effectively than traditional RNNs and CNNs 41.…”