“…Additionally, BERT, which essentially consists of stacked Transformer encoder layers, shows enhanced performance on downstream task-specific predictions after pre-training on a massive dataset (Devlin et al., 2019). In the field of bioinformatics, several BERT architectures pre-trained on massive corpora of protein sequences have recently been proposed, demonstrating their capability to decode the context of biological sequences (Rao et al., 2019; Rives et al., 2021; Elnaggar et al., 2021; Iuchi et al., 2021). Beyond these protein language models, Ji et al. (2021) pre-trained a BERT model, named DNABERT, on the whole human reference genome and demonstrated its broad applicability for predicting promoter regions, splicing sites, and transcription factor binding sites upon fine-tuning.…”
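As a concrete illustration of the pre-train/fine-tune paradigm described above, the following minimal sketch fine-tunes a BERT-style encoder for binary sequence classification using the Hugging Face `transformers` API. The checkpoint identifier and the toy k-mer inputs are placeholders for illustration only, not the exact setup of Devlin et al. (2019) or Ji et al. (2021); in practice one would substitute a genome-pre-trained checkpoint such as DNABERT and a labeled dataset (e.g., promoter vs. non-promoter sequences).

```python
# Sketch of task-specific fine-tuning of a pre-trained BERT-style encoder.
# Assumptions: the checkpoint name is a placeholder, and the "sequences"
# are toy space-separated k-mer tokens, not real training data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-uncased"  # placeholder; swap in a DNA-pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# A classification head is attached on top of the pre-trained encoder;
# its weights are randomly initialized and learned during fine-tuning.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy batch: two k-merized DNA sequences with binary labels (1 = promoter).
sequences = ["ATGCGT TGCGTA GCGTAC", "TTTAAA TTAAAC TAAACG"]
labels = torch.tensor([1, 0])

batch = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")
outputs = model(**batch, labels=labels)
outputs.loss.backward()  # one gradient step of task-specific fine-tuning
```

Because the encoder weights already encode sequence context from pre-training, fine-tuning typically requires far less labeled data than training from scratch; only the small classification head starts from random initialization.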